Question: Preventing Apache referer spam?

Posted by Steve on Mon 29 Aug 2005 at 16:00

Until recently referer spam only affected weblogs. However, it is now on the rise generally, and many webservers are seeing incoming requests with spammed HTTP Referer headers.

Referer spam is simply an incoming request to your webserver with the spammer's website listed in the "referer" field. The intention is that the resulting logs will be archived somewhere public, that search engines will spider them, and that the score of the spammed websites will increase as a result.

There are two popular approaches to dealing with Referer spam on Apache webservers, both of which require you to maintain a blacklist of the referer strings, or IP addresses, you wish to ignore:

  • Using mod_rewrite to redirect bogus requests.
  • Using mod_security to deny incoming requests.

Each of these approaches suffers from the same problem: You must have a list of the invalid referers to block.

For example with mod_security you can block referers which mention "poker" with rules like this:

SecFilterSelective "HTTP_REFERER" "(holdem|poker|casino)"

This will match on all incoming requests which have a referer string containing the words "poker", "holdem", or "casino".

The mod_rewrite equivalent is:


  RewriteEngine   on
  RewriteCond %{HTTP_REFERER} poker  [OR]
  RewriteCond %{HTTP_REFERER} holdem [OR]
  RewriteCond %{HTTP_REFERER} casino 
  RewriteRule .* - [F,L]

Both of these solutions are simple to set up if you're using one of the modules already. (We've previously covered installing mod_security and enabling mod_rewrite for Apache/Apache2.)

The real problem is keeping the blacklists/rules current.

So, my question is how do you deal with this problem?


Posted by blackm (212.202.xx.xx) on Mon 29 Aug 2005 at 18:03
Hi Steve,

Referer spam is a problem on my Apache server, but I haven't dealt with it yet. So now is a good time to start.

My logfiles are evaluated with webalizer, so I have a list of all referers. That list could be compared with a blacklist (my mod_security rules) and a whitelist. Referers that are on neither list are mailed to me, and I put them on one of the two lists.

Maintaining these lists shouldn't be that much work.

by, Martin
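
The triage Martin describes could be sketched roughly as follows. This is only a minimal sketch, not his actual setup: the function name `triage_referers` is hypothetical, and it assumes the referers have already been extracted from the logs (for example from a webalizer report) and that both lists are kept as regular expressions.

```python
import re

def triage_referers(referers, blacklist_patterns, whitelist_patterns):
    """Split referers into (blocked, allowed, unknown) lists.

    A referer is blocked if any blacklist pattern matches it, allowed if
    any whitelist pattern matches it, and unknown otherwise. A blacklist
    match wins over a whitelist match.
    """
    black = [re.compile(p, re.IGNORECASE) for p in blacklist_patterns]
    white = [re.compile(p, re.IGNORECASE) for p in whitelist_patterns]
    blocked, allowed, unknown = [], [], []
    for ref in referers:
        if any(p.search(ref) for p in black):
            blocked.append(ref)
        elif any(p.search(ref) for p in white):
            allowed.append(ref)
        else:
            unknown.append(ref)  # candidates to mail to the admin for review
    return blocked, allowed, unknown
```

Only the `unknown` list would need human attention, which is what keeps the maintenance work small.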

Posted by Anonymous (82.227.xx.xx) on Mon 29 Aug 2005 at 18:09
I have a good list of faked referrers at http://www.glop.org/referrers/
Feel free to use it ;)

Posted by analogue (82.227.xx.xx) on Mon 29 Aug 2005 at 18:28
I previously posted this anonymously, and I suspect that comment was moderated.

I have a list of 15k+ spammed referrers at the bottom of http://www.glop.org/referrers/
You can grab it and use it as a starting point for dealing with referrer spam. Another good start is to filter out all referrers coming from clients with an empty user-agent string. They are spam 99% of the time.

I also have a list of the IPs sending those referrers, but I don't feel like spreading it as it would shut out some legitimate clients =(
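
The empty user-agent check could be tried offline against an existing log before any blocking is set up. A minimal sketch, assuming the log uses Apache's "combined" format and that an absent header is logged as `-` or as an empty string; the function name `suspect_referers` is hypothetical, and fields containing escaped quotes are not handled:

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def suspect_referers(lines):
    """Return the sorted referers of requests sent without a User-Agent."""
    suspects = set()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        referer, agent = m.group("referer"), m.group("agent")
        if referer not in ("", "-") and agent in ("", "-"):
            suspects.add(referer)
    return sorted(suspects)
```

Calling `suspect_referers(open("access.log"))` would yield candidates for a blacklist; the path is only an example.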

Posted by Steve (82.41.xx.xx) on Mon 29 Aug 2005 at 19:12
(Anonymous comments are not currently moderated; if abuse becomes a problem this might change, but I have no immediate plans to change the status quo).

Thanks for the list pointer, I see a few more offered as well. I guess I have the idea that once I start using a blacklist I'm going to have to keep adding to it - and that strikes me as fighting a losing battle.

As for empty User-Agent headers I've honestly not noticed that. Most of the bogus referers I've seen have spoofed Mozilla, or IE.

One big giveaway is that they will typically request only the index page - repeatedly - and never fetch the favicon.ico / stylesheets / etc.

I did wonder if I could auto-blocklist clients which request only a single page (more than a given number of times) without requesting any of the referenced content - although I guess this would drop things like googlebot...

Steve
-- Steve.org.uk
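
The "index page only" heuristic Steve wonders about could be prototyped offline before anything is blocked. A minimal sketch, assuming the log has already been reduced to (host, path) pairs; the function name `index_only_clients` and the default threshold of 5 are hypothetical choices, not values from this thread:

```python
from collections import defaultdict

def index_only_clients(requests, index_path="/", min_hits=5):
    """Return hosts that fetched index_path at least min_hits times
    and never requested anything else.

    `requests` is an iterable of (host, path) pairs.
    """
    hits = defaultdict(int)  # index page requests, per host
    other = set()            # hosts that also fetched some other path
    for host, path in requests:
        if path == index_path:
            hits[host] += 1
        else:
            other.add(host)
    return sorted(h for h, n in hits.items() if n >= min_hits and h not in other)
```

As noted above, crawlers such as googlebot may look exactly the same, so any real use would presumably need a whitelist on top of this.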

Posted by analogue (82.227.xx.xx) on Tue 30 Aug 2005 at 11:42
"As for empty User-Agent headers I've honestly not noticed that. Most of the bogus referers I've seen have spoofed Mozilla, or IE."

Out of the 127108 spammed referrers I have received to date, 15128 were sent with an empty user-agent string. That's a nice first filter, catching roughly 12% ;)

Posted by SanctimoniousHypocrite (12.221.xx.xx) on Mon 29 Aug 2005 at 19:28
I also use webalizer, but it's (I hope) only accessible from within my network. Under what circumstances would I want my access.log to be generally available?

Posted by Steve (82.41.xx.xx) on Mon 29 Aug 2005 at 19:44
If you host domains for friends who want to be able to see stats online? (I guess Basic-Auth could prevent this from being public though).

Or to share with other company members? (Although in that case you can use Apache's security directives to limit access to IP addresses in your LAN).

However, whether the statistics are shared is pretty irrelevant. It appears that mass referer spam requests are also being sent to hosts which don't share their stats.

Steve
-- Steve.org.uk

Posted by SanctimoniousHypocrite (12.221.xx.xx) on Mon 29 Aug 2005 at 20:28
"mass-referer spam requests are being sent to hosts which don't share their stats"

I've seen them in my log files. I guess it's the nature of spam that it's sent where it isn't wanted by the recipient, and where it isn't even useful to the sender.

In partial answer to my own question, these things do add cruft to my log file and pollute my stats, so I'm happy to know of a way to keep them out. Plus, anything that makes life harder for a spammer is a good thing.

Posted by eszpee (81.196.xx.xx) on Mon 29 Aug 2005 at 18:19
Maybe this could be a good source, it's often updated:

http://www.jayallen.org/comment_spam/blacklist.txt

(actually it's for the MTBlackList antispam plugin for MovableType, but I think the spam issues for comments are very close to the ones of referrer spam)

Some small modification to the /etc/init.d/apache script to fetch the updated version of this list and generate an /etc/apache/conf.d/refererspam.conf file at every restart might be a good idea... or anything else?

--
root.log
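
The generation step eszpee suggests could look something like the sketch below. It assumes the blacklist is a text file with one regular expression per line and `#` starting a comment, and that the patterns are acceptable to Apache's regex engine; neither is verified here. The function name `build_referer_conf` is hypothetical, and fetching the list is left out.

```python
def build_referer_conf(blacklist_lines):
    """Build mod_rewrite directives that forbid any request whose
    Referer matches one of the blacklist patterns."""
    patterns = []
    for line in blacklist_lines:
        pattern = line.split("#", 1)[0].strip()  # drop comments and padding
        # Skip blank lines and patterns that would need quoting in the config.
        if len(pattern.split()) != 1 or '"' in pattern:
            continue
        patterns.append(pattern)
    if not patterns:
        return ""  # nothing to block, so emit no rules at all
    out = ["RewriteEngine on"]
    for i, pattern in enumerate(patterns):
        # Every condition except the last is chained to the next with OR.
        flags = "[NC]" if i == len(patterns) - 1 else "[NC,OR]"
        out.append("RewriteCond %{HTTP_REFERER} " + pattern + " " + flags)
    out.append("RewriteRule .* - [F,L]")
    return "\n".join(out) + "\n"
```

The returned text could be written to a file such as the /etc/apache/conf.d/refererspam.conf mentioned above. Note that mod_rewrite rules defined at server level are not inherited by virtual hosts by default, so the file may need to be included per virtual host.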

Posted by fsateler (201.214.xx.xx) on Mon 29 Aug 2005 at 21:18
How come this kind of spam is useful for the spammers? I thought search engines created their own referral databases scanning the web pages directly.
--------
Felipe Sateler

Posted by Steve (69.13.xx.xx) on Mon 29 Aug 2005 at 21:22
It seems to have started shortly after it became well-known that Google used inbound links as a measure of website popularity / importance.

The intention is that if the "fake referers" get archived publicly then search engine spiders will count those links when assessing the relevance of the target - and the site rank will be boosted artificially.

Steve
-- Steve.org.uk

Posted by simonw (84.45.xx.xx) on Mon 29 Aug 2005 at 21:23
No referer spam problem here, but one client is having an issue with email forms being submitted automatically, so he gets blank emails. It looks like some sort of failed abuse attempt, but there is nothing "obvious", and we don't usually log that much detail on the server in question.

Some of the source IPs are "well known" open proxies.

Is there a simple way to use the DNS-accessible lists of open proxies in Apache2, I wonder? I suspect the lookups would be too slow for HTTP.

Obviously we can spot all the "blank" messages for this specific hosting client, but I noticed the same thing happening to forms on sites owned by other clients.
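
Outside Apache, a DNS blacklist lookup is easy to sketch: the octets of the IPv4 address are reversed and prepended to the list's zone, and an address counts as listed if that name resolves. The example below is a minimal sketch under that assumption. The function names are hypothetical, no particular open-proxy list is implied, and each lookup blocks until DNS answers, which is the latency concern raised above.

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name for an IPv4 address: reversed octets + zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not an IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """Return True if the zone publishes an A record for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # no such name (or the lookup failed): treat as not listed
```

Running this over the source IPs already in the logs, rather than per request, would sidestep the speed problem at the cost of only catching repeat offenders.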

Posted by Steve (69.13.xx.xx) on Mon 29 Aug 2005 at 21:26
There's not any obvious way to do this, short of adding IPs to a blacklist / firewall manually.

Steve
-- Steve.org.uk

Posted by sno (62.254.xx.xx) on Tue 30 Aug 2005 at 00:01
nothing to add except thanks for the links

cheers

Posted by SanctimoniousHypocrite (12.221.xx.xx) on Tue 30 Aug 2005 at 15:46
I implemented the mod_security filter. It worked but now there's an entry in access.log showing the spam url with a 500 error, and an entry in audit_log showing the spam url that was kept out. So by implementing this I now get two spam referrer entries. That's kind of amusing. I guess I should tell mod_security not to log those:

SecFilterSelective "HTTP_REFERER" "(holdem|poker|casino)" deny,nolog,status:500

I think this will stop the referrer from appearing in audit_log, but how do I stop the entry from appearing in access.log? Maybe if they keep getting an error they'll stop. I also wonder, is a 500 error the best one to have mod_security generate? Or is there another error that will more effectively discourage the spammers? Or should it just fail silently?

Posted by Steve (82.41.xx.xx) on Tue 30 Aug 2005 at 18:15
You can avoid logging particular status codes if you like - previously mentioned here briefly.

But that applies globally to a host's logging, and it might not make sense, because you probably want to be sure you see all legitimate 500s.

Steve
-- Steve.org.uk

Posted by dopehouse (84.130.xx.xx) on Sat 3 Sep 2005 at 19:47
I think a good approach is to block web statistics from being indexed by any search engines. I know that will take a long time to have an effect, but it should be the right way.

Posted by Steve (82.41.xx.xx) on Sat 3 Sep 2005 at 20:33
I already do that ... but this doesn't prevent the malicious requests from coming in.

Steve
-- Steve.org.uk

Posted by dopehouse (84.130.xx.xx) on Sat 3 Sep 2005 at 21:32
That's right. But if 90% of webmasters do so, then the spamming will be reduced. *I think*

Posted by Anonymous (209.149.xx.xx) on Fri 9 Sep 2005 at 14:42
I've got an article that is a brief tour of what I've been able to do (and not been able to do), here.

Posted by Anonymous (66.93.xx.xx) on Tue 18 Oct 2005 at 23:40
ReferrerCop is the answer to all your problems.
