
Akismet Losing Its Mojo?

I have long praised the free spam-fighting service Akismet, but yesterday a horribly obvious spam comment wasn’t filtered, which is very unusual. I’ll include the comment here so people can see what I’m referring to:

Name: Armond
Site: http://groups.google.com/group/otekal/web/free-bestiality-sex-stories
Message: free bestiality sex stories…
accessories distributed at most major retailers for such

Automattic has never disclosed in any detail how Akismet works internally; however, it is more than reasonable to assume that Bayesian filtering is part of their spam-fighting tool belt. For those who aren’t aware, a Bayesian filter is trained on messages that have already been labelled as spam or as legitimate (often called "ham"), and it learns how often each word appears in each group. When a new message arrives, the filter combines the evidence from all of its words into an overall probability that the message is spam; if that probability is above a threshold, the message is considered spam.
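To make that description a little more concrete, here is a minimal naive Bayes word filter in Python. It is only a sketch of the general technique, not Akismet’s implementation (which, as noted, hasn’t been disclosed). The class and method names, the "ham" label for legitimate messages, the add-one (Laplace) smoothing and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case the text and split it into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical illustration of a word-based Bayesian spam filter."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cut-off for P(spam | words)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham" (legitimate).
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior P(label), smoothed so an untrained class is never log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                # Laplace-smoothed likelihood P(word | label).
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalise the two log scores into P(spam | words) without overflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold


# Tiny, made-up training set purely for demonstration.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("free bestiality sex stories", "spam")
spam_filter.train("cheap pills free offer", "spam")
spam_filter.train("great post about bayesian filtering", "ham")
spam_filter.train("thanks for the akismet write up", "ham")

print(spam_filter.is_spam("free sex stories"))           # True
print(spam_filter.is_spam("thanks for the great post"))  # False
```

Working with log probabilities rather than multiplying raw probabilities keeps long messages from underflowing to zero, which is the usual reason real filters take this approach.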

Given that it is a learning-based system, so to speak, I find it hard to believe that words like "bestiality" and "sex", appearing both in the URL and in the body of the comment, aren’t throwing up great big red flags. I’m going to put this slip-up down to one of two things:

  1. I was one of the first people to receive and register that particular spam signature
  2. When the comment was submitted, the Akismet service couldn’t be reached
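The second explanation only holds if comments are let through when the spam check can’t be performed, a policy usually called "failing open". The sketch below illustrates that idea in Python. It is a hypothetical example: `moderate`, `check_spam`, the `Comment` type and the status strings are invented for illustration and are not how Akismet or its WordPress plugin actually behave.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    body: str


def moderate(comment, check_spam, fail_open=True):
    """Return "spam", "approved" or "pending" for a comment.

    check_spam is a hypothetical stand-in for the remote spam service:
    it returns True for spam and raises ConnectionError or TimeoutError
    when the service can't be reached.
    """
    try:
        is_spam = check_spam(comment)
    except (ConnectionError, TimeoutError):
        # Service unreachable: fail open (publish unchecked) or
        # fail closed (hold the comment for manual moderation).
        return "approved" if fail_open else "pending"
    return "spam" if is_spam else "approved"


def unreachable_service(_comment):
    # Simulates the spam service being down.
    raise ConnectionError("spam service unreachable")


comment = Comment(author="Armond", body="free bestiality sex stories")
print(moderate(comment, unreachable_service))                   # approved
print(moderate(comment, unreachable_service, fail_open=False))  # pending
print(moderate(comment, lambda _c: True))                       # spam
```

Failing open keeps legitimate comments flowing during an outage at the cost of letting spam like the comment above slip through; failing closed does the opposite.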

I’m leaning heavily towards the latter, if only because there are literally hundreds of thousands of blogs on the internet; the chance that I was one of the first to receive that particular spam signature is very small.

Long live Akismet!