All in the <head> – Ponderings and code by Drew McLellan –

Preventing Comment Spam

Spam in blog comments is a very real problem for a lot of bloggers, and to keep their sites spam-free, a good number of people are taking steps to prevent it being posted. Some have taken to switching comments off after a set time period, others require registration, and some have turned comments off altogether. More behind-the-scenes techniques involve complete comment moderation, shared blacklists and such. Nearly all methods restrict either the freedom of site owners to run their sites how they want to, or the interaction of those who visit.

The ‘smart’ spammers have figured out that popular blogging tools like MovableType use the same comment field names on every site, so writing a bot to post using those field names is pretty straightforward. Less advanced (or more authentic, depending on how you see it) spammers simply cruise and post manually.

Although this may be a recent phenomenon for blogs, the problem is a combination of two old friends – email spam and forum trolls. Surely, then, we can reuse what we already know about these two problems to help devise solutions for comment spam.

Something that comment spam often has in common with email spam is its subject matter. For email spam we use keyword filters to pick up likely spam and flag it for attention. So how about we do the same for comment spam? If a comment triggers certain keywords, flag it for moderation and hide it until it’s approved.
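As a rough illustration of the keyword idea, here is a minimal sketch in Python. The keyword list, the function names and the "held"/"published" labels are all made up for the example and aren't taken from any particular blogging tool; a real site would build up its own list from the spam it actually receives.

```python
import re

# Hypothetical keyword list, purely for illustration.
SPAM_KEYWORDS = {"viagra", "casino", "poker"}


def needs_moderation(comment_text, keywords=SPAM_KEYWORDS):
    """Return True if the comment contains any flagged keyword."""
    # Lower-case the text and split it into words so that matching is
    # case-insensitive and ignores punctuation.
    words = set(re.findall(r"[a-z0-9']+", comment_text.lower()))
    return not words.isdisjoint(keywords)


def classify_comment(comment_text):
    """Hold flagged comments for approval; publish everything else."""
    return "held" if needs_moderation(comment_text) else "published"
```

Calling `classify_comment("Cheap CASINO chips here")` would hold the comment, while an ordinary reply with no flagged words goes straight through. Matching on whole words rather than substrings is a deliberate choice here, so that a harmless word which merely contains a keyword doesn't get flagged.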

Of course, not all comment spam has a direct message. A lot of it just says stuff like ‘I agree’ and then links to the site the spammer is trying to promote. Keyword matching is no use here, as we’re looking at the quality of the post rather than the words used. This is a problem solved in many discussion forums, mailing lists and other online communities by moderating all new users until they are proven trustworthy. This is usually tied to some sort of user account or list subscription, which isn’t desirable for a blog. But so long as you require the user to comment with an email address, and don’t publish commenters’ email addresses on the site (not a bad idea in itself, and it stops a trusted address simply being copied from the page), you can tie the moderation to the email address. The first time an address is used, the comment gets moderated. Once it has been approved, later comments from that address don’t need to be checked again.
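The email-address approach could be sketched along these lines. This is only an assumption about how it might be wired up: the `CommentModerator` class and its method names are hypothetical, and it keeps everything in memory where a real blog would use its database.

```python
class CommentModerator:
    """Hold comments from unknown email addresses until one is approved."""

    def __init__(self):
        self.trusted_emails = set()  # addresses with an approved comment
        self.pending = []            # (email, text) pairs awaiting review

    @staticmethod
    def _normalise(email):
        # Treat "Ann@Example.com " and "ann@example.com" as the same address.
        return email.strip().lower()

    def submit(self, email, text):
        """Publish at once for trusted addresses, otherwise hold."""
        email = self._normalise(email)
        if email in self.trusted_emails:
            return "published"
        self.pending.append((email, text))
        return "held"

    def approve(self, email):
        """Trust an address and release its held comments."""
        email = self._normalise(email)
        self.trusted_emails.add(email)
        released = [c for c in self.pending if c[0] == email]
        self.pending = [c for c in self.pending if c[0] != email]
        return released
```

A first comment from `ann@example.com` comes back as "held"; once the site owner calls `approve("ann@example.com")`, anything further from that address is published without review.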

Both these techniques (ideally used together) might give the site owner the moderation options they need without forcing moderation on all comments, which kills conversation and adds extra admin overhead.