Weblog entry #185 for Steve

Spam statistics & greylisting...
Posted by Steve on Sun 16 Sep 2007 at 14:11

Here are my incoming spam stats for yesterday:

                   Spam Rejecting Plugin      Count
--------------------------------------------------------------
                                   dnsbl       6799
                             hosts_allow       1596
                             greylisting        740
                       check_earlytalker        429
                         check_badrcptto        239
             require_resolvable_fromhost        217
                          check_spamhelo         85
                           virus::clamav         69
                       check_badmailfrom         11
--------------------------------------------------------------
                                          Total Mails    : 10610
                                          Total SPAM     : 10185
                                          Total Accepted : 425

                                          Spam Percentage: 95.99%

Of over 10 thousand incoming mails I accepted only 425 for delivery. The rest were rejected at SMTP time, so just 4.01% of my total mail was non-spam.
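The totals can be re-derived from the per-plugin counts. A small Python sketch, using the numbers from the table above (the variable names are only for illustration):

```python
# Per-plugin rejection counts copied from the table above.
rejections = {
    "dnsbl": 6799,
    "hosts_allow": 1596,
    "greylisting": 740,
    "check_earlytalker": 429,
    "check_badrcptto": 239,
    "require_resolvable_fromhost": 217,
    "check_spamhelo": 85,
    "virus::clamav": 69,
    "check_badmailfrom": 11,
}
total_mails = 10610

# Everything a plugin rejected counts as spam; the rest was accepted.
total_spam = sum(rejections.values())
total_accepted = total_mails - total_spam
spam_percentage = 100 * total_spam / total_mails

print(f"Total SPAM     : {total_spam}")
print(f"Total Accepted : {total_accepted}")
print(f"Spam Percentage: {spam_percentage:.2f}%")
```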

Most were rejected because of DNSRBL (6799). Then 1596 were rejected because they were sent by hosts which have been on a DNSRBL in the past five days.
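For readers unfamiliar with the mechanics, here is a minimal Python sketch of the two ideas: how a DNSBL lookup name is built for an IPv4 address (octets reversed, list zone appended), and a cache that keeps rejecting hosts seen on a list within the last five days. The zone `dnsbl.example.org` and the `RecentlyListed` class with its methods are hypothetical; this is not the plugin code behind the stats above.

```python
import ipaddress
from datetime import datetime, timedelta


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets + list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


class RecentlyListed:
    """Remember hosts that were on a DNSBL and reject them for a while."""

    def __init__(self, window: timedelta = timedelta(days=5)):
        self.window = window
        self._last_listed = {}  # ip -> time the host was last seen listed

    def record_listing(self, ip: str, when: datetime) -> None:
        self._last_listed[ip] = when

    def should_reject(self, ip: str, now: datetime) -> bool:
        last = self._last_listed.get(ip)
        return last is not None and now - last <= self.window


cache = RecentlyListed()
cache.record_listing("192.0.2.10", datetime(2007, 9, 12, 9, 0))
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
print(cache.should_reject("192.0.2.10", datetime(2007, 9, 15, 9, 0)))
```

An actual lookup would resolve the generated name and treat any answer as "listed"; that step is left out so the sketch runs without network access.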

Finally, the reason for this post: of the remaining mail, 740 senders sent a message which was greylisted and never retried.

Since mail was already rejected by DNSRBLs prior to that test it is unclear how effective greylisting would have been on its own, but it rejected 33% of all mail not already rejected.
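The 33% figure can be reproduced as follows. This sketch assumes that only the dnsbl and hosts_allow checks run before greylisting, which is how the post describes it; the counts are those from the table.

```python
total_mails = 10610
rejected_before_greylisting = 6799 + 1596  # dnsbl + hosts_allow
greylisted = 740

# Mail that survived the DNSBL-based checks and so reached greylisting.
remaining = total_mails - rejected_before_greylisting
greylist_share = 100 * greylisted / remaining

print(f"{greylisted} of {remaining} = {greylist_share:.1f}%")
```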

I guess that means it is worth keeping. However, it is clear that greylisting alone does virtually nothing for my system, since most of my spam is routed via my @debian.org mail address (via master.debian.org), which will happily retry until the greylisting succeeds.

 

Comments on this Entry

Posted by ajt (84.12.xx.xx) on Sun 16 Sep 2007 at 15:38
What's your email stack that this data is from?

I'm assuming you have talked about it in the past, I'm just curious what the exact combination is at the moment?

--
"It's Not Magic, It's Work"
Adam


Posted by Steve (82.32.xx.xx) on Sun 16 Sep 2007 at 15:43

All stats coming from a soon-to-be-launched SMTP proxy service, currently in use by myself and a few local people.

Internally it uses MySQL for saving per-domain configuration data (with a web-based control panel allowing different filtering types to be enabled/disabled on a per-domain basis and logs to be viewed).

On the SMTP side I use a combination of Exim4 + QPSMTPD + a collection of custom plugins (completely replacing the standard set) to do the detection/processing and logging.

Steve


Posted by simonw (84.45.xx.xx) on Mon 17 Sep 2007 at 22:15
Forwarding defeats Greylisting; in other news dog bites man.

I have a domain that only gets spam, with no known forwarding, that makes the maths easier.

Yesterday:

49 spam email delivered.
15400 Rejected connections.

In order (roughly).

34 HELO'ed with our own server's details.
14046 on SPAMHAUS zen list.
1105 Greylisted connections.
161 on ix.dnsbl.manitu.net
54 sender domain did not exist.

So Greylisting is still getting about 80% of the spam email exposed to it. This is a lot less effective than the 99% I measured for Greylisting when I introduced it, but one needs to account for other changes in the processes used. For example Spamhaus now lists IP address space declared as dead by ISPs, which means it should catch a greater proportion of spambots.
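The arithmetic behind the 80% figure, as a short Python sketch. It assumes the checks run in exactly the order listed (the comment only says "roughly"), so that greylisting sees whatever the HELO check and Spamhaus Zen let through.

```python
delivered = 49
# Rejections in the order the checks are listed above.
rejected = [
    ("own HELO", 34),
    ("Spamhaus Zen", 14046),
    ("greylisting", 1105),
    ("ix.dnsbl.manitu.net", 161),
    ("sender domain did not exist", 54),
]

total = delivered + sum(count for _, count in rejected)

# Connections that got past the two checks ahead of greylisting.
exposed_to_greylisting = total - 34 - 14046
greylisting_rate = 100 * 1105 / exposed_to_greylisting

# Overall share of connections rejected by all checks together.
kill_rate = 100 * (total - delivered) / total

print(f"greylisting: {greylisting_rate:.1f}% of {exposed_to_greylisting}")
print(f"overall kill rate: {kill_rate:.1f}%")
```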

The domain is slightly unusual in that I accept all email for all addresses. Real domains are best configured without catch-alls, which usually kills another fair chunk.

I've no doubt Greylisting has been supplanted by Spamhaus Zen, as the most effective single measure, due to the advent of spambots that retry. Greylisting is still very effective where no forwarding occurs.

For my personal email I'm using the policyd-weight package, and I believe it substantially improves on the ~99.7% spam kill rate above, with (so far) no discovered false positives, but I've not documented the kill rate (I don't get that much rubbish email sent to my own email address).

The filtering above is still almost entirely devoid of content based filtering, and so errors are all related to sender behaviour and reputation, and won't block email that just happens to look like spam, or talk about spam/abuse issues. On the other hand it doesn't stop spam forwarded by Debian's mail servers either, or outscatter.

Almost all my spam is forwarded from the GNU email servers, who claim they have just finished a roll-out of a new anti-spam system. A close second is a mail server run by a guy who works for Message Labs (but at least he has the excuse it is too much like his day job).


Posted by simonw (84.45.xx.xx) on Mon 17 Sep 2007 at 22:24
PS: Constant Contact have the dishonour of being the only recognizably "non-bot" sender of any of the 49 spams that got through (30 of those spams were sent by the same bot).

Does that mean Constant Contact constitute 7% of all spam sources on the Internet? Lies, damned lies, and statistics.


Posted by endecotp (86.6.xx.xx) on Tue 18 Sep 2007 at 16:51
Do you (or simonw) have any idea about the false positive rates, e.g. from the blacklists?

My most serious recent spam problem has been backscatter, i.e. someone uses my address as the from: in their spam, and I get the bounces (about 5000 in one weekend). Of course this has different characteristics from other spam and it's harder to filter; I'm currently rejecting mail 'from:<>' from IPs in backscatterer.org, but this includes sites like the sourceforge and gnu.org mail servers that do "call out" verification.....

Phil.


Posted by simonw (212.24.xx.xx) on Tue 18 Sep 2007 at 18:00
Blacklist error rates depend on the blacklist.

I tested the two I mention extensively, and have never seen a false positive with either (unlike Greylisting). In one place this covers many tens of millions of (mostly reject) decisions, with no complaints.

That said I like Spamhaus because they have end user removal, i.e. if a real person is blocked, they can click on a link, and say "I'm a real person not a spambot", and so there is no need for people to bug me (postmaster) to get themselves removed.

I think a bigger issue is big chunks of IP space, controlled by static IP address spammers, not in spamhaus. They can be a tad conservative, which is I guess why they have so few false positives.

Backscatter/outscatter is discussed by Wietse on the Postfix site. If you control the sending hosts for all email for a domain you can try DomainKeys (or [spits] SPF), but Wietse explains how you can simply tag outgoing mail, so that you can recognize a lot of junk that isn't a genuine bounce. Blocklists are useless against outscatter, since most of the sources are genuine email servers (or Barracuda boxes [spit]).
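To make the "tag outgoing mail" idea concrete, here is one possible Python sketch: sign the envelope sender with a keyed hash and a day number, then only accept bounces addressed to a correctly signed, recent tag. This is not necessarily the method described on the Postfix site; the address format, the `SECRET` key and the function names `tag_sender` and `is_genuine_bounce` are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-site key


def _mac(local: str, domain: str, day: int) -> str:
    msg = f"{day}:{local}@{domain}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]


def tag_sender(local: str, domain: str, day: int) -> str:
    """Envelope sender for outgoing mail: local+DAY-MAC@domain."""
    return f"{local}+{day}-{_mac(local, domain, day)}@{domain}"


def is_genuine_bounce(recipient: str, today: int, max_age_days: int = 7) -> bool:
    """True only if the bounce is addressed to a tag we issued recently."""
    local_part, at, domain = recipient.rpartition("@")
    local, plus, tag = local_part.rpartition("+")
    day_text, dash, mac = tag.partition("-")
    if not (at and plus and dash):
        return False  # untagged address: we never sent mail from it
    if not (day_text.isascii() and day_text.isdigit()):
        return False
    day = int(day_text)
    if not 0 <= today - day <= max_age_days:
        return False  # tag too old (or from the future)
    expected = _mac(local, domain, day)
    return hmac.compare_digest(mac.encode(), expected.encode())


tagged = tag_sender("postmaster", "example.org", 13773)
print(tagged, is_genuine_bounce(tagged, today=13775))
```

A bounce (from:<>) sent to the plain, untagged address can then be rejected, since no genuine outgoing message ever used it as its envelope sender.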

"Call out" is bad karma, as it shifts the cost of forged senders onto the person whose address was forged, and he probably doesn't have the resources that the spammer does. Bad GNU (I've told them this several times). It also messes with greylisting, since the call out is declined first time, delaying email unnecessarily, but I think that is a minor point, since the host is usually whitelisted soon enough.


Posted by endecotp (86.6.xx.xx) on Tue 18 Sep 2007 at 18:30
> never seen a false positive ... (unlike Greylisting)

So you mean that you get false positives with greylisting, i.e. genuine messages where the sending MTA doesn't retry? Or do you just mean that the messages are delayed? If the former, I'm worried!

> If you control the sending hosts for all email for a domain
> you can try Domain keys (or [spits] SPF)

Ah, SPF, that's something else that you & Steve don't mention. I have published SPF records, and some of the *clueless* backscatter bounces were sent to me because the SPF had failed! *D'oh*

> Blocklists are useless against outscatter, since most of the sources are
> genuine email servers

Have a look at backscatterer.org if you start to find it a problem. You won't want to treat it like other blacklists though; only for from:<> messages.

> (or Barracuda boxes [spit]).

Double spit.

> "Call out" is bad karma, as it shifts forged sender costs, onto the
> forged sender, and he probably doesn't have the resources that the
> spammer does.

Agreed, though wasting my bandwidth is orders of magnitude less bad than putting equivalent volumes of spam in my inbox. This leads to a problem with backscatterer.org: they treat sites that do callout the same as sites that send backscatter bounces, and I'd like to be able to be selective.

> Bad GNU (I've told them this several times).

And I've told sourceforge. One problem is that some sourceforge lists, like Debian lists, don't require subscription. So they do callout to keep the spam down. Although it can be useful to let anyone post messages without subscribing, I think the callout issue outweighs it; if you require subscription before posting then there's no need for callout.

However, I did find a solution that lets me post to sourceforge and gnu lists (based on something Google found in the Exim docs I think): my backscatterer blacklist rule is applied after DATA, not after RCPT; the callout verification sees a normal response to the RCPT command and is happy.


Posted by simonw (84.45.xx.xx) on Wed 19 Sep 2007 at 23:05
Greylisting has a non-zero false positive rate.

Some mail servers don't retry. Since I'd reject email from these anyway if the server was overwhelmed with spam (or even genuine email), I regard it as acceptable to reject email from such systems.

Some email is retried from servers in a different /24, thus failing Postgrey greylisting. I've not seen one of these in anger, but I did follow up on a report of such on the Postgrey mailing list, so they exist but are extremely rare.
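Why a retry from a different /24 fails can be seen in a toy greylist keyed on (client /24, sender, recipient). This is a simplified Python sketch, not Postgrey's actual code; the `Greylist` class, the 300 second delay and the IPv4-only handling are assumptions for illustration.

```python
import ipaddress


class Greylist:
    """Toy greylist: defer the first attempt, accept a late-enough retry."""

    def __init__(self, delay_seconds: int = 300):
        self.delay = delay_seconds
        self._first_seen = {}  # (network, sender, recipient) -> timestamp

    @staticmethod
    def _key(client_ip: str, sender: str, recipient: str):
        # Key on the /24 so retries from a neighbouring host still match.
        network = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        return (str(network.network_address), sender.lower(), recipient.lower())

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> bool:
        """Return True to accept, False to answer with a temporary failure."""
        key = self._key(client_ip, sender, recipient)
        first = self._first_seen.setdefault(key, now)
        return now - first >= self.delay


gl = Greylist()
print(gl.check("192.0.2.10", "a@example.com", "b@example.org", now=0))      # first try
print(gl.check("192.0.2.77", "a@example.com", "b@example.org", now=600))    # same /24
print(gl.check("198.51.100.5", "a@example.com", "b@example.org", now=600))  # other /24
```

The third attempt is treated as a brand new triplet and deferred again, which is the failure mode described above for senders that retry from a pool of servers spread across networks.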

SPF I don't mention as I don't reject on this basis. So much email is forwarded without envelope rewriting that rejecting on SPF fail/softfail would reject more genuine email than spam (after other filters have been applied); as such I regard SPF as broken by design.

Note I regard almost all spam filtering as a tactical fix, the real problem (in most cases) is the large number of compromised nodes on the Internet.


Posted by Utumno (60.248.xx.xx) on Fri 21 Sep 2007 at 08:13

You guys talk from a mail administrator standpoint; let me add my $0.02 from a user's standpoint.

I have a shell account at a friend's FreeBSD server and have been receiving all my private mail there for the last 12 years. He provides a simple SpamAssassin+procmail setup. Here's a snippet from my .procmailrc:

(...)

# deal with spam:
# SPAM_LEVEL <= 4.0 --> good
# 4.0 < SPAM_LEVEL <= 15.0 --> possible spam
# 15.0 < SPAM_LEVEL --> /dev/null

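# this bastard just won't remove me from his mailing list
# (no lockfile needed when delivering to /dev/null; the condition is a
# regex, not a glob, so no trailing '*')
:0
* ^From.*pgr_art
/dev/null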

:0fw:spamassassin.lock
| /usr/local/bin/spamc -U /var/run/spamd

# delete all mail with SPAM STATUS >= 15
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
/dev/null

# move all mail above SPAM threshold to the SPAM folder
:0:
* ^X-Spam-Status: Yes
$SPAM

First, I tested this for 2 years without the /dev/null option. During those two years I got 3 or 4 false positives, but all of them scored well below the 15 point threshold. As I receive about 100 spams a day, and cleaning the spam/ folder was a big nuisance, I decided to just send everything above 15 penalty points straight to /dev/null.
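For anyone who doesn't read procmail, the three-way routing done by the last two recipes in the snippet can be restated in Python. This is only a sketch of the same decision logic; `route_message` is a hypothetical name, and it takes the values of the X-Spam-Level and X-Spam-Status headers as plain strings.

```python
def route_message(spam_level: str, spam_status: str) -> str:
    """Mirror the recipes: 15+ stars are dropped, flagged mail goes to spam."""
    # X-Spam-Level carries one '*' per whole point of score.
    if spam_level.startswith("*" * 15):
        return "/dev/null"
    # X-Spam-Status starts with "Yes" once the score passes the threshold.
    if spam_status.startswith("Yes"):
        return "spam"
    return "inbox"


print(route_message("*" * 17, "Yes, score=17.3"))
print(route_message("*" * 6, "Yes, score=6.2"))
print(route_message("*", "No, score=1.4"))
```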

I don't keep exact statistics, but my procmail.log shows that about 80% of the mail goes to /dev/null, the next 15% to spam/, and about 5% to the Inbox.

Even with that, 1-2 spams/day manage to break into my Inbox.
