Weekly product update - Improvements to email validation API
A few weeks ago, we launched a new API to validate email addresses submitted through web forms (API reference & demo). Sending billions of emails, we noticed significant error rates that come from users mistyping email addresses, and the pains that developers go through to validate them using complex regular expressions. We thought people would be happy with the API, but we were blown away by the response from the developer community. Our announcement made it all the way to the top of HackerNews, even higher than an article about how legal weed is hurting San Francisco hippies, which is quite an accomplishment. :)
We have several customers using the validator on their production sites already, including the awesome folks at bellechic.com, lolshirts.com and tanga.com. Our users noticed a few bugs in the validator, so today we wanted to provide an update on what we've been doing since launch to make the validator even more accurate, and on some things that are on the horizon.
Improvements to email validation algorithm
Since launching the email validator, we've been validating thousands of email addresses every day for our customers. Interestingly, the majority of these are from e-commerce stores, but we're also seeing usage from apps with user accounts. Since these customers have real dollars on the line, limiting errors in the validator was a big priority for us.
Based on user feedback (thanks y'all!), we've advanced the validator in a few areas, most importantly with custom grammar for Gmail, the largest email service provider. Here are some of the things that we've worked on this week:
1) Better custom grammar for Gmail
- Gmail does allow consecutive dots (.) in the local part of an email address (that's the part before the @) but strips them before processing the address. So Gmail treats michael...is...firstname.lastname@example.org the same as email@example.com. This (along with tags) is one of the many ways people keep track of which email addresses they've given to which site (they can count the "extra" dots). The validator now accounts for this and strips extra dots, just like Gmail, when validating addresses.
michael....is....firstname.lastname@example.org now returns as valid
- Gmail usernames can be between 6 and 30 characters, so you can't have email@example.com. However, Gmail supports tags in email addresses so, again, you can keep track of which site you gave your email address to (this one is a little easier than dots to manage!). Previously, the validator calculated username length by counting all characters, including the tag ("+mailgun" in the example "firstname.lastname@example.org"). This was incorrect, however, since the tag is not part of the address registered at Gmail.
email@example.com now returns as invalid
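The two Gmail rules above can be sketched in a few lines of Python. This is only an illustration, not the validator's actual code: the function names are hypothetical, and it assumes that a tag is everything from the first "+" onward and that the 6 to 30 character limit is measured after the tag is removed and runs of dots are collapsed.

```python
import re

# Length limits for Gmail usernames, as stated in the post.
GMAIL_MIN_LEN = 6
GMAIL_MAX_LEN = 30


def gmail_username(local_part: str) -> str:
    """Return the local part with the "+tag" removed and dot runs collapsed.

    Assumption: the tag starts at the first "+" in the local part.
    """
    base = local_part.split("+", 1)[0]   # drop the "+tag" suffix, if any
    return re.sub(r"\.{2,}", ".", base)  # "a....b" becomes "a.b"


def has_valid_gmail_length(local_part: str) -> bool:
    """Apply the 6 to 30 character rule to the username without its tag."""
    return GMAIL_MIN_LEN <= len(gmail_username(local_part)) <= GMAIL_MAX_LEN
```

With this sketch, `has_valid_gmail_length("short+mailgun")` is false because only the five characters of "short" are counted, while a local part with extra dots is measured as if the dots had been typed once.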
2) ai is a real (if obscure) TLD, so we process it correctly now
- Even though it's uncommon, a top-level domain (TLD) such as .com, .net, or .org can itself receive email if its owner has set up MX records. While .com, .net, and .org do not have MX records set, the TLD .ai does. Previously we were marking @ai addresses as invalid, but we've now fixed this.
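As a rough sketch of the idea, deliverability can be decided by whether a domain has MX records rather than by what the domain looks like. The code below is an assumption-laden illustration, not how the validator is built: the lookup is passed in as a function so the example runs without network access, and the table of records is stand-in data (the hostname in it is a placeholder, not a real MX host).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a function that returns the MX hostnames for a domain.
MxLookup = Callable[[str], Sequence[str]]


def domain_can_receive_mail(domain: str, lookup_mx: MxLookup) -> bool:
    """Treat a domain (even a bare TLD) as deliverable if it has any MX host."""
    normalized = domain.rstrip(".").lower()
    return len(lookup_mx(normalized)) > 0


# Stand-in data for illustration only; a real check would query DNS.
FAKE_MX_TABLE: Dict[str, List[str]] = {
    "ai": ["mx.placeholder.invalid"],  # placeholder hostname, not a real record
    "com": [],
}


def fake_lookup(domain: str) -> Sequence[str]:
    """Look a domain up in the stand-in table instead of querying DNS."""
    return FAKE_MX_TABLE.get(domain, [])
```

Here `domain_can_receive_mail("ai", fake_lookup)` is true and `domain_can_receive_mail("com", fake_lookup)` is false, which mirrors the behavior described above.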
3) More logical functions in the API
Originally we included support for display-name parsing (in "Bobby <firstname.lastname@example.org>", the display name is "Bobby"), but we realized that the main use case was the address itself, not display names. That's why we've made the /validate endpoint strictly parse addresses, while /parse handles display names as well.
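To make the distinction concrete, here is a small local illustration using Python's standard `email.utils.parseaddr`. It does not call the API or reproduce its behavior; the two helper names are hypothetical and only show the difference between accepting a bare address and also accepting a display name.

```python
from email.utils import parseaddr
from typing import Tuple


def split_display_name(value: str) -> Tuple[str, str]:
    """Lenient parsing: return (display_name, address) for the input."""
    return parseaddr(value)


def is_bare_address(value: str) -> bool:
    """Strict parsing: accept the input only if it is an address by itself."""
    name, address = parseaddr(value)
    return name == "" and address == value.strip()
```

`split_display_name("Bobby <firstname.lastname@example.org>")` yields the name "Bobby" and the address separately, while `is_bare_address` rejects that same input because it carries a display name.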
Things you can look forward to
We want to continue improving the validator so developers can say goodbye to complex (and ineffective) regex validation. So, here are some things that we are working on:
- We are working to open source the entire validator service along with our awesome MIME parsing library. Watch out for those, they are coming soon!
- We have been working on adding more ESP-specific custom grammar. We'll be able to recognize mistakes in more addresses, and give even better validation results, when we detect ESPs we know about.
- We are working on improving the spelling corrector. We have some tricks up our sleeve that we will be using to give better and more accurate suggestions when someone mistypes a domain.
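For readers curious what a domain suggestion can look like in its simplest form, here is a minimal sketch using Python's standard `difflib`. It is not the spelling corrector described above: the list of known domains, the similarity cutoff, and the function name are all assumptions made for illustration.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Assumed list of common domains; a real service would use a much larger one.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com"]


def suggest_domain(domain: str,
                   known: Sequence[str] = KNOWN_DOMAINS,
                   cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known domain, or None if nothing is close enough.

    A domain that is already in the list needs no suggestion.
    """
    domain = domain.lower()
    if domain in known:
        return None
    matches = get_close_matches(domain, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Calling `suggest_domain("gmial.com")` proposes "gmail.com", while a correctly spelled or unrecognized domain produces no suggestion.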
Till next week!