Tips & Tricks: Guide to using Regular Expressions to filter incoming email
One of the features that gets the most love from our customers is Routes. Routes let you accept, parse, and POST or forward your incoming emails. We've done some work lately to make Routes even more useful by adding the ability to test your webhook endpoints when you want to POST your incoming emails to your app. None of this matters, though, if you can't accept the incoming emails effectively. Today, we'll walk you through some examples of the most common ways to group incoming emails for processing. These methods use Regular Expressions, which are extremely powerful but can be tricky to work with. Hence today's post. Hope you enjoy it!
Using Regular Expressions
The Regular Expression notation for Routes is based on the Python spec and there is an excellent review of Regular Expressions over at python.org. We're not going to go into all the details, but we do want to point out one error that developers who are new to regular expressions often make:
- The asterisk "*" by itself is not a wildcard in regular expression syntax; it only repeats whatever comes before it. To match any single character, use a period ".". To match any series of characters, use a period followed by an asterisk ".*".
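To make the difference concrete, here is a minimal sketch using Python's own `re` module, which the Routes notation is based on. The addresses and the helper names `matches` and `compiles` are made up for illustration, and we assume matching starts at the beginning of the string, as `re.match` does.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True when the pattern matches from the start of the text."""
    return re.match(pattern, text) is not None

def compiles(pattern: str) -> bool:
    """True when Python accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# "." stands for exactly one character.
print(matches(r"^b.b@example\.com$", "bob@example.com"))      # True

# ".*" stands for any run of characters, including none.
print(matches(r"^.*@example\.com$", "anyone@example.com"))    # True

# "b*" means "zero or more b characters", not "b followed by anything".
print(matches(r"^b*@example\.com$", "bob@example.com"))       # False

# A leading "*" has nothing to repeat, so the pattern is rejected.
print(compiles("*@example.com"))                              # False
```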
Examples of using Regular Expressions
Here are some of the most common regular expressions that we see customers using when receiving messages. This list only scratches the surface of what you can do, but hopefully it will get you thinking about the power of regular expressions.
To make things easier, we've sorted the expressions into three groups:
I. Matching on variations of the recipient of the email.
a. Match defined recipient on any domain for the account:
Explanation- we are looking for a specific recipient on any of the domains currently listed in the Mailgun "Domains" tab. Keep in mind that your MX records must point to Mailgun before Mailgun will accept messages for a domain.
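A pattern for this case could look something like the sketch below. The recipient name "chris" and the exact pattern are assumptions for illustration, not the expression from the original example, and the code uses Python's `re.match` as a stand-in for how a route would evaluate it.

```python
import re

# Hypothetical pattern: the local part is fixed, the domain is captured.
CHRIS_ANY_DOMAIN = re.compile(r"^chris@(.*)$")

def domain_for_chris(recipient: str):
    """Return the domain if the recipient is chris@<anything>, else None."""
    found = CHRIS_ANY_DOMAIN.match(recipient)
    return found.group(1) if found else None

print(domain_for_chris("chris@example.com"))
print(domain_for_chris("bob@example.com"))
```

In a route, the same pattern would presumably go inside `match_recipient(...)`.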
b. Match defined recipient with plus addressing for a specific domain:
Explanation- Mailgun's inbound mail servers are configured to accept recipients with plus addressing. You could also limit which plus addresses are accepted by using the syntax from the example below.
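As a rough sketch of plus addressing, the snippet below shows an open pattern and a restricted one. The name "chris", the tags "billing" and "support", and both patterns are hypothetical; note that the "+" has to be escaped, because an unescaped "+" is a repetition operator.

```python
import re

# Any tag after the plus sign is accepted and captured.
ANY_TAG = re.compile(r"^chris\+(.*)@example\.com$")

# Only a fixed set of tags is accepted.
LIMITED_TAGS = re.compile(r"^chris\+(billing|support)@example\.com$")

def plus_tag(recipient: str, pattern=ANY_TAG):
    """Return the plus tag if the recipient fits the pattern, else None."""
    found = pattern.match(recipient)
    return found.group(1) if found else None

print(plus_tag("chris+news@example.com"))
print(plus_tag("chris+news@example.com", LIMITED_TAGS))
print(plus_tag("chris+billing@example.com", LIMITED_TAGS))
```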
c. Match several defined recipients sent to a specific domain:
Explanation- we've used regular expressions to define a set of recipients that are valid for a specific domain.
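One way to express a set of recipients is alternation with "|" inside a group. The three names below are placeholders of our own, and the pattern is a sketch rather than the original example.

```python
import re

# Hypothetical set of valid local parts for one domain.
TEAM = re.compile(r"^(chris|bob|alice)@example\.com$")

def team_member(recipient: str):
    """Return the matched name, or None if the recipient is not in the set."""
    found = TEAM.match(recipient)
    return found.group(1) if found else None

print(team_member("bob@example.com"))
print(team_member("bobby@example.com"))
```

Because the pattern is anchored with "^" and "$", a longer name such as "bobby" does not slip through just because it starts with "bob".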
d. Match any recipient sent to a specific domain:
Explanation- we've created a "catch all" for a specific domain. Don't confuse this with the global catch all that Mailgun Routes provides, where all emails received are forwarded.
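A domain-level catch all can be sketched as below. The pattern is our own illustration. It also shows a small gotcha in plain Python `re`: an unescaped "." in the domain matches any character, so escaping it keeps the pattern strict.

```python
import re

# Strict version: the dot in the domain is escaped.
CATCH_ALL = re.compile(r"^(.*)@example\.com$")

# Loose version: the unescaped dot matches any single character.
LOOSE_CATCH_ALL = re.compile(r"^(.*)@example.com$")

def accepted(recipient: str, pattern=CATCH_ALL) -> bool:
    """True when the recipient falls under the catch all."""
    return pattern.match(recipient) is not None

print(accepted("anything@example.com"))
print(accepted("anything@other.org"))
print(accepted("anything@exampleXcom", LOOSE_CATCH_ALL))
```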
e. Use a named capture to forward a message to an external recipient:
match_recipient('(?P<user>.*?)@example.com') -> forward('\g<user>@externaldomain.com')
Explanation- we want Mailgun to receive and forward the incoming message to an external domain, but retain the user to user mapping. To do this, we use a named capture. The named capture will remember the "user" and use it in the forward action.
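To see what the named capture does, here is a minimal sketch in plain Python `re`. It reuses the pattern and the `\g<user>` template from the route above; the function name `forward_address` is ours, and whether Routes expands the template exactly like `Match.expand` is an assumption.

```python
import re

ROUTE_PATTERN = re.compile(r"(?P<user>.*?)@example.com")
FORWARD_TEMPLATE = r"\g<user>@externaldomain.com"

def forward_address(recipient: str):
    """Map user@example.com to user@externaldomain.com, or None if no match."""
    found = ROUTE_PATTERN.match(recipient)
    if found is None:
        return None
    # expand() substitutes the captured "user" group into the template.
    return found.expand(FORWARD_TEMPLATE)

print(forward_address("alice@example.com"))   # alice@externaldomain.com
print(forward_address("alice@other.org"))     # None
```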
II. Matching on specific headers from the email.
a. Match a defined from: attribute:
Explanation- we want the route to trigger for any email that is from "email@example.com". Notice that we add wildcards before and after the email address. This is because a "From" field can contain other attributes besides the address, such as the sender's name: "Mailgun Bob <firstname.lastname@example.org>".
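The effect of the surrounding wildcards can be sketched like this. The address "bob@example.com" and both patterns are placeholders, and we assume the header value is matched from its first character, as with `re.match`.

```python
import re

# Anchored to the bare address only.
BARE = re.compile(r"^bob@example\.com$")

# Wildcards on both sides allow a display name and angle brackets.
WITH_WILDCARDS = re.compile(r"^.*bob@example\.com.*$")

def from_matches(header_value: str, pattern=WITH_WILDCARDS) -> bool:
    """True when the From header value fits the pattern."""
    return pattern.match(header_value) is not None

print(from_matches("Mailgun Bob <bob@example.com>", BARE))
print(from_matches("Mailgun Bob <bob@example.com>"))
print(from_matches("notbob@example.com"))
```

The last call also matches, because the leading wildcard accepts any prefix; that is worth keeping in mind if similar addresses exist on the same domain.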
b. Match several defined keywords in the subject:
Explanation- we are looking for any messages with a subject that contains "urgent" OR "help" OR "asap". We add wildcards at both the beginning and the end. This example would trigger on a subject like this: "My request is urgent!".
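Here is one way such a keyword filter could be written, as a sketch in plain Python `re`. The pattern is our own guess at the shape of the expression. In Python, matching is case sensitive unless you ask otherwise; whether Routes treats case the same way is not something this post states.

```python
import re

KEYWORDS = re.compile(r"^.*(urgent|help|asap).*$")

# Same pattern with case-insensitive matching switched on.
KEYWORDS_ANY_CASE = re.compile(r"^.*(urgent|help|asap).*$", re.IGNORECASE)

def is_priority(subject: str, pattern=KEYWORDS) -> bool:
    """True when the subject contains one of the keywords."""
    return pattern.match(subject) is not None

print(is_priority("My request is urgent!"))
print(is_priority("URGENT: server down"))
print(is_priority("URGENT: server down", KEYWORDS_ANY_CASE))
```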
c. Collect incoming spam messages to an external mailbox:
match_header('X-Mailgun-Sflag', 'Yes') -> forward('email@example.com')
Explanation- Mailgun provides spam filtering for inbound messages. When we determine a message is spam, we inject a special header. You can use Routes to filter messages based on these headers. In this case, we are forwarding the message to an external mailbox, so we can review it later.
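If you review the forwarded messages in your own code, checking for the header could look roughly like this. The sample message and the function name `is_flagged_spam` are invented for illustration; only the header name and its "Yes" value come from the route above.

```python
from email import message_from_string

# Hypothetical raw message carrying the spam flag header.
SAMPLE = (
    "From: someone@example.net\n"
    "X-Mailgun-Sflag: Yes\n"
    "Subject: You have won\n"
    "\n"
    "Body text\n"
)

def is_flagged_spam(raw_message: str) -> bool:
    """True when the message carries X-Mailgun-Sflag: Yes."""
    message = message_from_string(raw_message)
    return message.get("X-Mailgun-Sflag", "").strip() == "Yes"

print(is_flagged_spam(SAMPLE))
```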
III. Chaining Regular Expressions to match on multiple attributes
a. Match any recipient when the message is a reply:
match_recipient('^(.*)@example.com$') and match_header("subject", "^(Re:|re:|RE:).*$")
Explanation- we've created a "catch all" for a specific domain, but we only route messages when it's a reply to our original thread. You could also use "Fw" to represent a forwarded message.
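The "and" in the route means both conditions have to hold. A minimal sketch of that logic, reusing the two patterns from the route above and assuming `re.match` semantics (the function name `route_applies` is ours):

```python
import re

RECIPIENT = re.compile(r"^(.*)@example.com$")
REPLY_SUBJECT = re.compile(r"^(Re:|re:|RE:).*$")

def route_applies(recipient: str, subject: str) -> bool:
    """True only when the recipient AND the subject both match."""
    recipient_ok = RECIPIENT.match(recipient) is not None
    subject_ok = REPLY_SUBJECT.match(subject) is not None
    return recipient_ok and subject_ok

print(route_applies("anyone@example.com", "Re: Your ticket"))   # True
print(route_applies("anyone@example.com", "New ticket"))        # False
print(route_applies("anyone@other.org", "RE: Your ticket"))     # False
```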
b. Match any recipient only if the message is in English:
match_recipient('^(.*)@example.com$') and match_header("Content-Language", "^(.*)en-US(.*)$")
Explanation- we've created a "catch all" for a specific domain, but we only route messages when the content language is English. For other languages, check out the ISO specification for content languages, https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes.
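A quick sketch of how the Content-Language pattern behaves in plain Python `re`. The header values are made up. Two things stand out: the wildcards let the pattern cope with a header that lists several languages, and the pattern as written only accepts the "en-US" tag, so other English variants such as "en-GB" would need their own alternative.

```python
import re

# Same pattern as in the route above.
US_ENGLISH = re.compile(r"^(.*)en-US(.*)$")

def is_us_english(header_value: str) -> bool:
    """True when the Content-Language value mentions en-US."""
    return US_ENGLISH.match(header_value) is not None

print(is_us_english("en-US"))
print(is_us_english("fr, en-US"))
print(is_us_english("en-GB"))
```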
We hope these examples will help you write more effective regular expressions for filtering your incoming emails. Something else you'd like to know how to write? Let us know in the comments and we'll see if we can come up with an example.