And that's a great philosophy until your email gets rejected by some service tha...

jwecker · on Sept 8, 2016

Back when SMTP servers still had remnants of UUCP etc.- where the address could actually contain characters that specify intermediate servers to route to, I would have argued that front-end sanitization was important as, for example, html sanitization from end-users from a security perspective.

However, IETF made lots of progress simplifying things- to the point where, at the very least, the standard tells us specifically that we should leave it up to the destination host to interpret the local part of the email address-- that is, the thing to the right of the @ should be given the thing on the left unmolested ideally- even being ignored by intermediate relay servers. Since that's what most people complain about, any validation to the left of the @ should become extinct.

But off the top of my head that still leaves the thing on the right of the @ (such as localhost), buffer overflows by allowing longer strings than the standard allows (those limits do exist), and the problem with multiple @ which the MTA may or may not handle well... Since I'm not a security expert I'm going to go out on a limb and assume that I'm missing a bunch of other things.

My point is not that the article is wrong, though- my point is that if he wanted to convince me to only validate by sending the email on any string, he should convince me that those security concerns are not an issue- not that it's not good at catching type-os.

Kadin · on Sept 8, 2016

I think there is a reasonable middle-ground for validating the domain side of an email address. There are RFCs on all this stuff; it's not just a total free-for-all. The RFCs just aren't nearly as strict as a lot of badly-designed validation regexps are, presumably because most people are unaware of the diversity of acceptable email addresses.

The currently operative RFC is 5322, specifically section 3.4.1: https://tools.ietf.org/html/rfc5322#section-3.4.1

There are some basic rules that you could safely apply to an address, which would prevent some attacks (buffer overflows, etc.) while also not blocking any legitimate addresses. E.g. limiting the overall length to 255 characters, for instance, could be defensible practice. There are also well-defined rules for validating the domain portion, since it has to be a routable address by definition.

What nobody ought to be doing is looking too hard at the string to the left of the @ symbol, because it's designed purely as instructions to the recipient server. Nobody else needs to care about it; only the receiving mailserver needs to actually parse that part of the address, in order to put the message into the right mailbox. From what I've seen, the vast majority of false-positive validation failures occur because people are looking at the mailbox portion of an email address when they have no business doing so.