Plumbing Life's Depths - Maybe it's time to add a Bayesian classifier for comments (More spam... evil spam...)

Spambayes is just sitting there, beckoning me. The idea is that you would configure a Hammie instance as a Zope object pointing to an on-disk database (to avoid ballooning the ZODB). You would train it on all the comments already in your blog (assuming they are all good comments), then train it on any spam that comes in.
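To make the train-then-score idea concrete, here is a minimal sketch. It is a toy naive Bayes scorer, not Spambayes itself and not its Hammie API (Spambayes tokenizes and combines evidence differently). The class name `CommentClassifier`, the SQLite schema, and the smoothing are all assumptions made for illustration; the one point it shares with the plan above is that the token counts live in an on-disk file rather than in the ZODB.

```python
import math
import re
import sqlite3


class CommentClassifier:
    """Toy naive Bayes comment scorer; a stand-in, not the Spambayes API."""

    def __init__(self, path=":memory:"):
        # Pass a file path to keep the token counts on disk, outside the ZODB.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tokens "
            "(token TEXT PRIMARY KEY, spam INTEGER NOT NULL, ham INTEGER NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS totals "
            "(kind TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        self.db.executemany(
            "INSERT OR IGNORE INTO totals VALUES (?, 0)", [("spam",), ("ham",)]
        )
        self.db.commit()

    @staticmethod
    def tokenize(text):
        # Each distinct lowercase word counts once per comment.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        col = "spam" if is_spam else "ham"  # fixed set, safe to interpolate
        for tok in self.tokenize(text):
            self.db.execute("INSERT OR IGNORE INTO tokens VALUES (?, 0, 0)", (tok,))
            self.db.execute(
                f"UPDATE tokens SET {col} = {col} + 1 WHERE token = ?", (tok,)
            )
        self.db.execute("UPDATE totals SET n = n + 1 WHERE kind = ?", (col,))
        self.db.commit()

    def score(self, text):
        """Return an estimated spam probability between 0 and 1."""
        totals = dict(self.db.execute("SELECT kind, n FROM totals"))
        n_spam, n_ham = totals["spam"], totals["ham"]
        # Start from the prior odds, then add each token's evidence.
        # Add-one smoothing keeps unseen tokens from producing log(0).
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokenize(text):
            row = self.db.execute(
                "SELECT spam, ham FROM tokens WHERE token = ?", (tok,)
            ).fetchone()
            s, h = row if row else (0, 0)
            log_odds += math.log(((s + 1) / (n_spam + 2)) / ((h + 1) / (n_ham + 2)))
        # Clamp so exp() cannot overflow on very long comments.
        log_odds = max(-30.0, min(30.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))
```

You would call `train(text, False)` once for every existing comment, `train(text, True)` for each spam that arrives, and `score(text)` on anything new.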

The problem is that you wouldn't have a sufficiently large spam corpus from just one blog. You'd need a way of collecting spam samples from everyone running the system in order to produce a useful corpus. Marking a comment as spam would have to be a one-click operation that trains the classifier on it, triggers a reclassification of any new comments against the retrained classifier, and reports the spam to some central location (where it would need to be reviewed before being accepted as spam).
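The one-click flow might look roughly like the sketch below. Everything in it is hypothetical: the `Moderation` and `Comment` classes, the `"suspect"` status, the 0.9 threshold, and the `outbox` list standing in for whatever would actually ship reports to the central location. `SeenInSpam` is a deliberately crude placeholder for the real classifier; any object with `train(text, is_spam)` and `score(text)` would slot in.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    id: int
    text: str
    status: str = "pending"  # "pending", "suspect", or "spam"
    score: float = 0.0


class SeenInSpam:
    """Placeholder scorer: fraction of a comment's words already seen in spam."""

    def __init__(self):
        self.spam_words = set()

    def train(self, text, is_spam):
        if is_spam:
            self.spam_words |= set(text.lower().split())

    def score(self, text):
        words = set(text.lower().split())
        return len(words & self.spam_words) / len(words) if words else 0.0


@dataclass
class Moderation:
    classifier: object            # needs train(text, is_spam) and score(text)
    threshold: float = 0.9        # assumed cutoff for flagging a comment
    comments: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # reports awaiting central review

    def add(self, comment):
        comment.score = self.classifier.score(comment.text)
        self.comments[comment.id] = comment

    def mark_spam(self, comment_id):
        """The one click: train, queue a report, rescore everything pending."""
        comment = self.comments[comment_id]
        comment.status = "spam"
        self.classifier.train(comment.text, True)
        # The central site still has to review it before accepting it as spam.
        self.outbox.append({"text": comment.text, "reviewed": False})
        self.rescore_pending()

    def rescore_pending(self):
        for comment in self.comments.values():
            if comment.status == "pending":
                comment.score = self.classifier.score(comment.text)
                if comment.score >= self.threshold:
                    # Flag for the owner to review rather than deleting outright.
                    comment.status = "suspect"
```

Flagging instead of deleting is a design choice here: it leaves the final call with the blog owner, which fits the need to review the classifier's judgements.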

You'd want to be able to review the judgements, of course. That'll require a whole new page somewhere.

Spambayes, incidentally, seems pretty darn simple to set up. I'm definitely going to have to do some playing with it when I get a chance.

Written by Mike on May 18, 2005 in Snaking.
