All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

Referrer Log Spam

21 July 2004

In the quest to get someone, anyone, to click through to their advertising-laden (and often pr0n-laden) sites, filthy spammers have taken to spamming referrer logs. I check through my log files periodically (in fact, Textpattern stores this data for me too) to see if anyone else has posted something in response to my posts. It’s like trackback, but retro. Of course spamming referrer logs is nothing new, but it seems to be getting to the point now where it’s really becoming a nuisance.

The process is simple. The spammer writes a script to trawl through a list of URLs (something like the home page of weblogs.com is ideal) and performs an HTTP GET on each site, setting the address of their own site in the HTTP Referer header. This results in an entry in the site’s access logs showing that, apparently, the spammer’s site is linking to you. Of course, when the owner of a site goes through and clicks to see who’s linking to them, they’re driven directly to the spammer’s site. Often, they’ll register interesting-sounding domain names to throw you off the scent – but of course they all point to the same place.
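To make that log entry concrete, here is a minimal Python sketch (not part of the original setup) that reads access log lines and groups the claimed referrer hosts by client address. It assumes Apache's Combined Log Format; the sample entries, the function names and the threshold of three distinct referrer hosts are all invented for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"$'
)


def referrers_by_client(lines):
    """Map each client address to the set of referrer hosts it claimed."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not a Combined Log Format line
        referrer_host = urlsplit(match.group("referrer")).hostname
        if referrer_host:  # skips "-" and empty referrers
            seen[match.group("host")].add(referrer_host)
    return dict(seen)


def suspicious_clients(lines, min_distinct=3):
    """Clients that claimed at least min_distinct different referrer hosts."""
    return {
        host: refs
        for host, refs in referrers_by_client(lines).items()
        if len(refs) >= min_distinct
    }


# Made-up entries using reserved documentation addresses and domains.
SAMPLE_LOG = [
    '203.0.113.9 - - [18/Jul/2004:10:01:00 +0000] "GET / HTTP/1.0" 200 5120 "http://cheap-pills.example/" "Mozilla/4.0"',
    '203.0.113.9 - - [18/Jul/2004:10:14:00 +0000] "GET / HTTP/1.0" 200 5120 "http://casino-fun.example/" "Mozilla/4.0"',
    '203.0.113.9 - - [18/Jul/2004:10:31:00 +0000] "GET / HTTP/1.0" 200 5120 "http://hot-pics.example/" "Mozilla/4.0"',
    '198.51.100.20 - - [18/Jul/2004:10:40:00 +0000] "GET / HTTP/1.1" 200 5120 "http://friendly-blog.example/post" "Mozilla/5.0"',
    '198.51.100.21 - - [18/Jul/2004:10:45:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for host, refs in suspicious_clients(SAMPLE_LOG).items():
        print(host, sorted(refs))
```

A single client claiming many unrelated referrer hosts is only a hint, not proof, so the threshold would need tuning against real traffic.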

This weekend my site was hit with a pretty intensive campaign of referrer log spamming – I was getting several hits an hour on various domain names all pointing to the same site. Fortunately for me (and stupidly on the part of the spammer) all the hits were originating from the same host name – a colocated server with an ISP called Jupiter Hosting. The answer was simple:

deny from jupiterhosting.com

Adding this to my .htaccess file results in my site not being served to any requests originating from Jupiter Hosting. So that blocks the spammer, but also every other Jupiter Hosting customer, right? Well, I could be more specific in my rule but I’m not feeling that charitable. If ISPs like Jupiter Hosting don’t take responsibility for malicious activity originating directly from their networks, then I’m more than happy to block them. (Yeah, I know I’m evil).
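For readers wondering what a partial-domain rule like this actually matches, the Python sketch below mimics the idea rather than Apache's real implementation. It assumes the documented behaviour that only whole name components are compared and that the client's host name is confirmed with a reverse and then a forward DNS lookup. The function names and the host name `colo-12.jupiterhosting.com` are hypothetical.

```python
import socket


def host_matches_domain(hostname, denied_domain):
    """True if hostname is denied_domain or a subdomain of it.

    Only whole dot-separated labels count, so "notjupiterhosting.com"
    does not match "jupiterhosting.com".
    """
    host = hostname.lower().rstrip(".")
    domain = denied_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def verified_hostname(ip):
    """Reverse-resolve ip, then check the name resolves back to the same ip.

    Returns the host name, or None if either lookup fails or the forward
    lookup does not include ip. IPv4 only, and it needs working DNS.
    """
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
        _name, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:
        return None
    return name if ip in addresses else None


def is_denied(ip, denied_domain="jupiterhosting.com"):
    """Deny the request only when the verified host name falls under denied_domain."""
    name = verified_hostname(ip)
    return name is not None and host_matches_domain(name, denied_domain)


if __name__ == "__main__":
    # Hypothetical host names, purely to show the matching rule.
    print(host_matches_domain("colo-12.jupiterhosting.com", "jupiterhosting.com"))
    print(host_matches_domain("notjupiterhosting.com", "jupiterhosting.com"))
```

The DNS round trips are the price of matching on a name instead of an address. A more specific rule would match on the offending IP addresses directly and skip the lookups.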

- Drew McLellan

Comments

  1. § Sencer: Things to remember:

    * With one-way requests (the spammers don’t need or want a response), it is easy for spammers to forge IPs.
    * I don’t know the scale of the attack, but recognizing something like that automatically is usually hard and can’t be done with 100% accuracy, since these are regular HTTP requests that don’t look any more suspicious than what your browser sends when surfing normally.

    For small-scale sites, pre-moderation of referrer addresses (by viewing a summary, for example) might be a possibility. It doesn’t scale well, though.
  2. § Manuzhai: I actually got a lot of referrer spam from probably the same host, some colo at jupiterhosting. I haven’t blocked it, but I did send email to abuse@jupiterhosting.com yesterday. Haven’t heard anything from them, though.

    I’m also getting lots of referrer spam from 66.230.218.67 and .66.
  3. § Lach: ‘Through you off the scent’?

    There’s a fairly simple workaround for this. Make your stats packages check that a site that supposedly refers to you actually contains a link to you before it will tally it up. Sure, it’s a hit to their site, but not one that’s going to be loading any ads, etc.

    Of course that could put more of a burden on your host, especially if all your stats packages work incrementally and at the same time (e.g. every hour), but I’m sure there are ways to work around that as well.
  4. § andrew: @Lach: With the amount of referral spam that I’ve seen on some domains, this option would most likely drain your bandwidth.

    It’s very frustrating, however, and monitoring access logs, finding IPs, researching for abuse, blocking via .htaccess, etc. is all very time-consuming.

    I have found that since getting web hosting from TextDrive ;-) that my referral spam has decreased, but since I drive pretty much zero traffic I’m a bad judge.

    I’ve begun to store abusive IPs and hosts in a ‘temporary ban’ area of my .htaccess files. I’ve also made all referrer logs private. Other than that – it’s just shootin at flies with spitballs (stupid analogy acknowledged).
  5. § Lach: Yes. Again, it’s hard to judge because I don’t get that much referrer spam or referrals apart from google et al, myself.

    There are some domains you could immediately whitelist, of course, which might help a little. Then again, if you did implement a whitelist/blacklist system, and that became a wide-scale solution, it’d be worth it to these people to send the bot around and have a page which does include a link to the sites they’re spamming, only to replace it with advertising-encrusted crap when real browsers go to it.

    Still, while I don’t filter my logs, I do, when a php page picks up a referrer, go and spider that site, and grab a quote, verify the link is there, etc. And that’s probably more intensive processing than a simple verification would require.
  6. § jadwigo: I thought it was funny to send the people in my blacklist an “HTTP/1.0 402 Payment Required”, with a nice little explanation on the error page for the ones that might be infected with some kind of redirection trojan.

    Of course I started logging the details and a surprising number of referrers are blogspot sites with some adult references in the name. So now I also filter a lot of four-letter words in the referrer. It has little effect because the referrer spammers don’t check their error logs, but it feels like I’m doing something.
  7. § Paulo: I tried emailing Jupiterhosting about the abuse, but they don’t seem to be doing anything, so into my “deny from” referrer blacklist go the offending IPs.

    Since my referrers are public, and I like it that way, I keep the search engines off with robots.txt NOINDEX and NOFOLLOW, thus denying any as-yet un-blacklisted referrer spammers the linkage advantage.
  8. § Jim Amos: I’ve found that the most effective way for me to prevent referrer spam is to simply block each ISP through my server’s cPanel. Of course, I have to wait until one of them shows up before I can add them to my block-list, but it’s not too much hassle. It usually takes spammers a couple of weeks before they come back under a new ISP.
  9. § charon: I would block requests that don’t have a link to my page on the referrer page. But what about links that come from a POST, or from randomly generated link lists or blogrolls?
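The check Lach describes in comment 3, confirming that a claimed referrer really links back, could be sketched in Python roughly as follows. This is only an illustration using the standard library's HTML parser. The names `LinkCollector` and `page_links_to`, and the `.example` domains, are hypothetical. Fetching the page is left out on purpose, since that is where the bandwidth cost raised in comment 4 comes in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def page_links_to(html, page_url, my_host):
    """True if the page at page_url contains a link whose host is my_host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    wanted = my_host.lower()
    return any(urlsplit(link).hostname == wanted for link in collector.links)


if __name__ == "__main__":
    genuine = '<p>See <a href="http://myblog.example/2004/post">this post</a>.</p>'
    spam = '<p><a href="/buy">Buy now</a></p>'
    print(page_links_to(genuine, "http://friendly-blog.example/", "myblog.example"))
    print(page_links_to(spam, "http://cheap-pills.example/", "myblog.example"))
```

As comment 5 points out, a spammer could serve a page containing the link to the verifier and something else to real browsers, and comment 9 notes that legitimate referrers such as POST results or rotating blogrolls may fail the check. A result from this test is best treated as one signal among several.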

About Drew McLellan

Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.