Email content filtering is evil
Posted by simonw on Tue 11 Jul 2006 at 08:59
As those who have read my blog and my article on Postfix spam prevention will know, I'm not keen on content filtering to detect spam: it inevitably leads to false positives, and it doesn't take much imagination for a spammer to work around it.
This week I finally implemented some additional spam filtering at work. Amongst other minor changes, I enabled the requirement that the domain of the envelope sender (the MAIL FROM address) in an email message must exist. For Postfix this is something like:
smtpd_sender_restrictions = reject_unknown_sender_domain, ...other restrictions here...
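To make the check concrete, here is a minimal Python sketch of the logic, modelled loosely on the behaviour documented for Postfix's reject_unknown_sender_domain: the MAIL FROM domain must have an MX record or, failing that, an A record, and a temporary DNS failure defers the message rather than rejecting it. The resolver is injected as a plain function (a hypothetical interface, so the sketch runs without network access); in practice you would back it with a real DNS library. The names sender_domain, check_sender_domain and TempDNSError are invented for illustration, and this is not how Postfix itself is implemented.

```python
from typing import Callable, List, Optional


class TempDNSError(Exception):
    """Raised by a resolver when DNS is temporarily unavailable (timeout, SERVFAIL)."""


# Hypothetical resolver interface (an assumption of this sketch): given a domain
# and a record type ("MX" or "A"), return the record values, [] if none exist,
# or raise TempDNSError on a temporary failure.
Resolver = Callable[[str, str], List[str]]


def sender_domain(envelope_sender: str) -> Optional[str]:
    """Return the lower-cased domain of a MAIL FROM address, or None for the null sender."""
    address = envelope_sender.strip().strip("<>")
    if not address:
        return None  # null sender <>, as used by bounces: nothing to check
    if "@" not in address:
        return ""  # no domain part at all
    return address.rsplit("@", 1)[1].rstrip(".").lower()


def check_sender_domain(envelope_sender: str, resolve: Resolver) -> str:
    """Return "accept", "reject" or "defer" for a MAIL FROM address."""
    domain = sender_domain(envelope_sender)
    if domain is None:
        return "accept"
    if domain == "":
        return "reject"
    try:
        mx_hosts = resolve(domain, "MX")
        if mx_hosts:
            # Treat an MX record with an empty (or ".") host name as malformed.
            return "accept" if all(h.strip(".") for h in mx_hosts) else "reject"
        # No MX: fall back to an address record, as SMTP mail routing does.
        return "accept" if resolve(domain, "A") else "reject"
    except TempDNSError:
        return "defer"  # temporary problem: ask the client to retry later
```

The null sender (<>), which bounce messages use, has no domain to check, so this sketch simply lets it through.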
Not too controversial or radical, you might think, since all your legitimate email has a valid sender domain, right? But have you actually checked every legitimate email you receive (or would have received)?
Well, I was very dubious about this change; historically, the standards expected that a mail server would not impose such a requirement. I had tried it before and had to back it out because it caused a few problems.
First, you want to be sure that your own internal administration emails, such as those alerting you to problems, all have valid domains. But computers aren't clients or bosses, so you can always switch the check on and watch the logs (see also "warn_if_reject", the Postfix way of doing this without rejecting the emails, and "soft_bounce").
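The two safety nets mentioned above can be modelled roughly as follows. In Postfix itself, warn_if_reject changes the meaning of the next restriction so that it logs a warning instead of rejecting, and soft_bounce turns permanent 5xx rejections into temporary 4xx ones, so the sending server keeps the mail and retries later instead of bouncing it. The Python below is a hypothetical model of that behaviour (the names soften and enforce are invented for illustration), not Postfix code.

```python
import logging
from typing import Optional

log = logging.getLogger("sender-check")


def soften(reply: str) -> str:
    """Turn a permanent 5xx reply (and a 5.x.y enhanced status code) into 4xx form."""
    code, _, rest = reply.partition(" ")
    if len(code) == 3 and code.isdigit() and code.startswith("5"):
        code = "4" + code[1:]
        if rest.startswith("5."):
            rest = "4" + rest[1:]
    return f"{code} {rest}" if rest else code


def enforce(reject_reply: Optional[str], warn_only: bool, soft_bounce: bool) -> Optional[str]:
    """Return the SMTP reply to send, or None to let the message through.

    reject_reply is what the restriction would answer (None if it passed).
    warn_only models warn_if_reject; soft_bounce models Postfix's soft_bounce.
    """
    if reject_reply is None:
        return None
    if warn_only:
        # Log what would have happened and accept the message anyway.
        log.warning("would reject: %s", reject_reply)
        return None
    return soften(reject_reply) if soft_bounce else reject_reply
```

With both flags off, enforce passes the original 5xx reply straight through, which is the behaviour you want once you trust the rule.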
After some careful checking, I finally enabled this check on our email servers this week and watched the logs closely. At the end of the first day, the top domain rejected by this rule in the log file was "ebayco.uk" (obviously not a real domain, as UK companies get a third-level domain name under ".co.uk", such as "ebay.co.uk"). Great, you think, I've stopped an eBay scammer. But no: the emails were coming from an eBay SMTP server.
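Finding the top of that list takes only a few lines of script. The sketch below assumes that each relevant log line contains the rejection text "Sender address rejected" (the same text appears after "reject_warning:" when warn_if_reject is in use) together with a from=<address> field; that assumption is based on the usual Postfix log layout, so check it against your own logs before relying on the counts. The function name rejected_sender_domains is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: the rejection text and a from=<address> field on the same line.
FROM_FIELD = re.compile(r"\bfrom=<([^>]*)>")


def rejected_sender_domains(lines: Iterable[str], marker: str = "Sender address rejected") -> Counter:
    """Count sender domains on log lines that contain `marker`."""
    counts: Counter = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = FROM_FIELD.search(line)
        if match is None or "@" not in match.group(1):
            continue  # no usable sender address (e.g. the null sender)
        counts[match.group(1).rsplit("@", 1)[1].lower()] += 1
    return counts
```

For example, passing an open /var/log/mail.log (the usual location on Debian) and calling .most_common(10) on the result lists the most frequently rejected sender domains; change marker if your Postfix version logs different wording.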
I think requiring a valid sending domain is a reasonable thing to do. I've had two doubts: losing internal emails (especially infrequent ones), and adding tests that encourage spammers to use real domain names (without also requiring that they have some right to use those domains). But because we forward so much email to other servers, and many of those servers impose this requirement, it ultimately results in a cleaner queue on our end (and less spam delivered) if we only accept emails whose sender domain both exists and is resolvable at the time the email is sent to us.
Lesson 1 - even the simplest and most basic checks on email content can result in false positives. On the other hand, by doing the check early, during the SMTP conversation, we avoid the backscatter that could result if we did it after queuing the email.
Lesson 2 - some types of false positive make the sender look so daft (they can't type their own domain name) that they are unlikely to complain too loudly. Learning to recognise which policy changes fall into this category is by far the most useful takeaway here.
Lesson 3 - good email administration is hard work. I'd suggest most companies outsource it, along with DNS management, as economies of scale quickly accumulate in providing these sorts of services; but you need to choose a provider carefully.
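The backscatter point in Lesson 1 comes down to who gets told about a refusal. The toy model below (all names hypothetical, reply texts illustrative) contrasts refusing during the SMTP dialogue, where the connected client receives the error and we send nothing, with accepting the message and refusing it later, where we have to generate a bounce to the envelope sender, an address spammers routinely forge.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outcome:
    smtp_reply: str  # what the connected SMTP client is told
    bounces_to: List[str] = field(default_factory=list)  # addresses we must send a bounce to


def refuse_at_smtp_time(envelope_sender: str, acceptable: bool) -> Outcome:
    """Refuse during the SMTP dialogue: the connected client owns the failure."""
    # envelope_sender is unused here: nobody but the connected client is notified.
    if acceptable:
        return Outcome("250 2.0.0 Ok")
    return Outcome("550 5.7.1 Message refused")


def refuse_after_queue(envelope_sender: str, acceptable: bool) -> Outcome:
    """Accept first, decide later: a refusal becomes a bounce we generate ourselves."""
    if acceptable or envelope_sender == "":
        # Bounces are never sent to the null sender, so there is no one to notify.
        return Outcome("250 2.0.0 Ok")
    # The envelope sender may be forged, so this bounce can land on an innocent third party.
    return Outcome("250 2.0.0 Ok", bounces_to=[envelope_sender])
```

Refusing early costs the spammer's server the error message; refusing late makes our server the one sending mail to whoever was forged as the sender.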