In the quest to get someone, anyone, to click through to their advertising-laden (and often pr0n-laden) sites, filthy spammers have taken to spamming referrer logs. I check through my log files periodically (in fact, Textpattern stores this data for me too) to see if anyone else has posted something in response to my posts. It’s like trackback, but retro. Of course spamming referrer logs is nothing new, but it seems to be getting to the point now where it’s really becoming a nuisance.
The process is simple. The spammer writes a script to trawl through a list of URLs (something like the home page of weblogs.com is ideal) and performs an HTTP GET on each site, setting the address of their own site in the referrer header. This results in an entry in the site’s access logs showing that, apparently, the spammer’s site is linking to you. Of course, when the owner of a site goes through and clicks to see who’s linking to them, they’re driven directly to the spammer’s site. Often, they’ll register interesting sounding domain names to throw you off the scent – but of course they all point to the same place.
This weekend my site was hit with a pretty intensive campaign of referrer log spamming – I was getting several hits an hour on various domain names all pointing to the same site. Fortunately for me (and stupidly on the part of the spammer) all the hits were originating from the same host name – a colocated server with an ISP called Jupiter Hosting. The answer was simple:
deny from jupiterhosting.com
Adding this to my .htaccess file results in my site not being served to any requests originating from Jupiter Hosting. So that blocks the spammer, but also every other Jupiter Hosting customer, right? Well, I could be more specific in my rule but I’m not feeling that charitable. If ISPs like Jupiter Hosting don’t take responsibility for malicious activity originating directly from their networks, then I’m more than happy to block them. (Yeah, I know I’m evil).
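Spotting that a batch of spam referrers shares one origin, as described above, can be sketched in a few lines of Python. This is only an illustration, not the method actually used here: it assumes the server writes the common "combined" log format, and the function names and the threshold of three distinct referrer domains are made up for the example.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)


def referrers_by_host(lines):
    """Map each client host to the set of referrer domains it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        domain = urlparse(match.group("referer")).hostname
        if domain:  # a referer of "-" has no hostname and is ignored
            seen[match.group("host")].add(domain)
    return dict(seen)


def suspicious_hosts(lines, threshold=3):
    """Return hosts that sent at least `threshold` distinct referrer domains."""
    return sorted(
        host
        for host, domains in referrers_by_host(lines).items()
        if len(domains) >= threshold
    )
```

A single client claiming to arrive from many unrelated domains in a short window is a hint rather than proof, so the output is best treated as a list of candidates to look at by hand before anything goes into .htaccess.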



Comments
* With one-way requests (the spammers don’t need or want a response), it’s easy for spammers to forge IPs.
* I don’t know the scale of the attack, but recognizing something like that automatically is usually hard and can’t be done with 100% accuracy, since these are regular HTTP requests that don’t look any more suspicious than what your browser sends when surfing normally.
For small-scale sites, pre-moderation of referrer addresses (by viewing a summary, for example) might be a possibility. It doesn’t scale well, though.
I’m also getting lots of referrer spam from 66.230.218.67 and .66.
There’s a fairly simple workaround to this. Make your stats packages check that a site that supposedly refers to you actually contains a link to you before they tally it up. Sure, it’s a hit to their site, but not one that’s going to be loading any ads, etc.
Of course that could put more of a burden on your host, especially if all your stats packages work incrementally and at the same time (eg every hour), but I’m sure there are ways to work around that as well.
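The link check suggested above might look roughly like the following Python sketch. It is not taken from any particular stats package: the class and function names are invented for illustration, and the page HTML is passed in as a string so the example runs offline. In practice the HTML would first have to be fetched from the referring URL (for instance with `urllib.request.urlopen`), ideally with a timeout and some caching to limit the extra load mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the referring page's URL.
                self.links.append(urljoin(self.base_url, value))


def page_links_to(html, page_url, my_host):
    """Return True if the page contains at least one link to `my_host`."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return any(urlparse(link).hostname == my_host for link in collector.links)
```

A referrer would then only be tallied when `page_links_to(...)` returns True for the fetched page.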
It’s very frustrating, however, and monitoring access logs, finding IPs, researching for abuse, blocking via .htaccess, etc. is all very time consuming.
I have found that since getting web hosting from TextDrive ;-) that my referral spam has decreased, but since I drive pretty much zero traffic I’m a bad judge.
I’ve begun to store abusive IPs and hosts in a ‘temporary ban’ area of my .htaccess files. I’ve also made all referrer logs private. Other than that – it’s just shootin at flies with spitballs (stupid analogy acknowledged).
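One way to keep such a ‘temporary ban’ area manageable is to generate it from a list of hosts with expiry dates. The sketch below is only a guess at how that could work: the expiry dates, the function name and the data layout are assumptions for the example, and the only thing taken from the post is the `deny from` directive itself.

```python
from datetime import date


def render_temporary_bans(bans, today):
    """Return 'deny from' lines for bans that have not yet expired.

    `bans` maps a host name or IP address to the date its ban ends
    (a hypothetical structure chosen for this example).
    """
    active = sorted(host for host, expires in bans.items() if expires >= today)
    return "\n".join(f"deny from {host}" for host in active)
```

The returned text can be pasted into the ban section of the .htaccess file by hand; expired entries simply drop out the next time it is regenerated.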
There are some domains you could immediately whitelist, of course, which might help a little. Then again, if you did implement a whitelist/blacklist system, and that became a wide-scale solution, it’d be worth it to these people to send the bot around and have a page which does include a link to the sites they’re spamming, only to replace it with advertising-encrusted crap when real browsers go to it.
Still, while I don’t filter my logs, I do, when a php page picks up a referrer, go and spider that site, and grab a quote, verify the link is there, etc. And that’s probably more intensive processing than a simple verification would require.
Of course I started logging the details and a surprising number of referrers are blogspot sites with some adult references in the name. So now I also filter a lot of four letter words in the referrer. It has little effect because the referrer spammers don’t check their error logs, but it feels like I’m doing something.
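A word filter like the one described above can be very small. This Python sketch assumes a plain case-insensitive substring match against the whole referrer URL; the word list is a harmless placeholder and the function name is made up, since the comment doesn’t say how the real filter is implemented.

```python
# Placeholder list; a real filter would hold whatever words keep turning up.
BLOCKED_WORDS = ("casino", "poker")


def referrer_is_blocked(referrer, blocked_words=BLOCKED_WORDS):
    """Return True if any blocked word appears anywhere in the referrer URL."""
    text = referrer.lower()
    return any(word in text for word in blocked_words)
```

Substring matching is blunt and will occasionally catch innocent URLs that happen to contain a blocked word, so it suits hiding entries from a referrer list better than refusing requests outright.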
Since my referrers are public, and I like it that way, I keep the search engines off with robots.txt NOINDEX and NOFOLLOW, thus denying any as-yet un-blacklisted referrer spammers the linkage advantage.