[Catalog-sig] SpamBayes filter for submissions?

skip at pobox.com
Wed Jun 10 04:51:32 CEST 2009


    Martin> I'm fairly skeptical about SpamBayes, so no interest from me.

    JP> Ditto, at least for this case.  How about a "This is spam" button
    JP> that logged in users can click?  Clicking it notifies an admin who
    JP> can take the appropriate action.

There's nothing wrong with SpamBayes per se, but if you don't get much spam,
it's certainly overkill.  I don't watch PyPI very closely, so I don't know how
bad its spam problem is.  I only noticed because someone complained
about it on comp.lang.python.

If you have enough bad and good inputs in text format, you can generally
create a pretty good detector with SpamBayes.  I built it into a simple web
proxy several years ago to discourage my then pre-teen son from visiting
websites he shouldn't have.  Some guy wrote a "stupidity filter" for YouTube
using SpamBayes:

    http://userscripts.org/scripts/show/13839

which I thought was kind of cool.
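For the curious, here is roughly what "training a detector" involves.  SpamBayes scores a message from per-token spam probabilities, smoothed with Gary Robinson's formula and combined with a chi-squared test.  The following is a heavily simplified, self-contained sketch of that idea, not SpamBayes' own code or API: the tokenizer, the `TinyClassifier` name and the constants (loosely modelled on SpamBayes' defaults) are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Illustrative constants, loosely modelled on SpamBayes defaults (assumption).
UNKNOWN_WORD_STRENGTH = 0.45  # s: how strongly to trust the prior for rare tokens
UNKNOWN_WORD_PROB = 0.5       # x: prior spam probability of an unseen token
MIN_PROB_STRENGTH = 0.1       # ignore tokens whose probability is this close to 0.5
MAX_DISCRIMINATORS = 150      # use at most this many of the strongest clues


def tokenize(text):
    """Crude stand-in tokenizer: lowercased runs of 3-12 letters, digits or apostrophes."""
    return set(re.findall(r"[a-z0-9']{3,12}", text.lower()))


def chi2q(x2, v):
    """Probability that a chi-squared variable with v (even) degrees of freedom is >= x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class TinyClassifier:
    """Hypothetical minimal trainable spam/ham scorer."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        spam, ham = self.spam_counts[token], self.ham_counts[token]
        n = spam + ham
        if n == 0:
            return UNKNOWN_WORD_PROB
        spam_ratio = spam / max(self.nspam, 1)
        ham_ratio = ham / max(self.nham, 1)
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Robinson's smoothing pulls rarely seen tokens toward the prior x.
        s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
        return (s * x + n * p) / (s + n)

    def spamprob(self, text):
        """Score in [0, 1]: near 1 means spam, near 0 means ham, 0.5 means no evidence."""
        probs = [self.token_prob(t) for t in tokenize(text)]
        clues = [p for p in probs if abs(p - 0.5) >= MIN_PROB_STRENGTH]
        clues.sort(key=lambda p: abs(p - 0.5), reverse=True)
        clues = clues[:MAX_DISCRIMINATORS]
        if not clues:
            return 0.5
        n = len(clues)
        # Sum logs rather than multiplying probabilities, to avoid underflow.
        s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
        h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
        return (s - h + 1.0) / 2.0
```

Usage is just `learn()` on labelled examples of both kinds, then `spamprob()` on new text.  Real SpamBayes also reports scores in the middle band as "unsure", which is one reason a reasonably sized corpus of both good and bad inputs matters.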

It's also sitting there fairly quietly in the python.org Roundup instance.
This is probably the example that makes Martin skeptical of its capability.
It turned out that a simple timer Martin implemented, requiring a minimum
time between registration and first submission, was pretty much all that was
needed to block the bulk of the spammish submissions.  Consequently, there
is simply not much spam for it to munch on.  I was training the filter for
a while but haven't done that in a month or so.
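The registration timer described above is easy to express.  This sketch assumes a hypothetical 30-minute threshold and hypothetical function names; the post doesn't say what delay the tracker actually uses or how Martin's check is written inside Roundup.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: the real delay used on the tracker isn't stated.
MIN_DELAY = timedelta(minutes=30)


def may_submit(registered_at, now=None, min_delay=MIN_DELAY):
    """Return True once enough time has passed since the account was registered."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - registered_at >= min_delay


def seconds_until_allowed(registered_at, now, min_delay=MIN_DELAY):
    """Whole seconds the user still has to wait (0 if already allowed)."""
    remaining = (registered_at + min_delay) - now
    return max(0, math.ceil(remaining.total_seconds()))
```

The appeal of this kind of check is that it costs nothing to run and needs no training data: scripted spammers tend to register and post immediately, while real users rarely mind a short wait.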

I'm finding it less useful for my own email these days.  The large email
providers like Gmail and Yahoo! recently forced email forwarding
services like pobox.com to enable their spam filtering for all their
users.  (Pobox.com had always left it off.)  Consequently, most spam is now
rejected during the SMTP session using DNS blacklists, greylisting and such.
Not much spam makes it to my Gmail mailbox, and even less reaches my laptop:
just a couple of messages each day.
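Greylisting and DNS blacklist checks are both cheap tests made during the SMTP conversation, before a message is accepted.  The sketch below shows the core of each, assuming hypothetical delay values and names (nothing here reflects pobox.com's actual setup).  Greylisting temporarily defers the first delivery attempt from an unseen (client IP, sender, recipient) triplet and accepts a later retry, on the theory that much spamware never retries.  A DNSBL check looks up a name built from the client's reversed IPv4 octets under the blacklist's zone; the DNS query itself is left out so the example runs offline.

```python
import ipaddress
import time

# Hypothetical parameters; real greylisting setups pick their own values.
RETRY_DELAY = 300           # seconds a new triplet must wait before a retry is accepted
ENTRY_LIFETIME = 4 * 3600   # forget first attempts that are never retried within this window


class Greylist:
    """Sketch of greylisting keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, retry_delay=RETRY_DELAY, lifetime=ENTRY_LIFETIME):
        self.retry_delay = retry_delay
        self.lifetime = lifetime
        self.first_seen = {}  # triplet -> time of first (deferred) attempt
        self.passed = set()   # triplets that have already retried successfully

    def check(self, client_ip, sender, recipient, now=None):
        """Return an SMTP reply code: 250 to accept, 451 (temporary failure) to defer."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender, recipient)
        if triplet in self.passed:
            return 250
        first = self.first_seen.get(triplet)
        if first is None or now - first > self.lifetime:
            # Unknown or stale triplet: remember it and ask the sender to retry later.
            self.first_seen[triplet] = now
            return 451
        if now - first < self.retry_delay:
            return 451  # retried too soon
        self.passed.add(triplet)
        del self.first_seen[triplet]
        return 250


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone.

    The octets are reversed (the convention described in RFC 5782), so
    192.0.2.1 checked against "dnsbl.example" becomes "1.2.0.192.dnsbl.example".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone
```

A listed address is one for which that name resolves (conventionally to an address in 127.0.0.0/8); an unlisted one gets NXDOMAIN.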

Skip

