[spambayes-dev] Any MoinMoin experts here?

skip at pobox.com
Sat Jan 27 03:47:11 CET 2007


Is there anyone here with experience working with the MoinMoin code base?  I
think using SpamBayes to deflect spam instead of the current
BadContent/LocalBadContent approach would be useful.  I wrote a couple of
messages to the moin-users mailing list, but received no responses.  (In
scanning the archive I don't see my message.  Must have disappeared into a
black hole.)  In case someone's interested, here's what I wrote in my second
post:

    We all know wikis get spammed.  I'm not up-to-speed on the latest
    versions of MoinMoin, but I think the concept used at least through the
    1.3 series (the use of BadContent and LocalBadContent pages) is
    fundamentally flawed since it relies on the users to manually update
    "bad" words.  You're always trying to catch up with the spammers.

    Instead, let me suggest that you incorporate a SpamBayes-based
    classifier into MoinMoin.  I did this recently for a couple of other
    websites I manage (Mojam and Musi-Cal - not wikis).  It worked
    marvelously there.  I now reject 100% of the spam submissions and also
    catch submission mistakes by good users that I would never have caught
    before.

    Here's how I envision it working.  Whenever a form submission happens,
    the new page is scored against the current SpamBayes database.  If it
    scores as possible or probable spam, it is automatically reverted to
    the last revision that scores as okay, and the full URL for the suspect
    revision is mailed to all people in AdminGroup.  An admin reviews that
    URL.  If it's okay, the URL is added to the HamPages page.  If not, it's
    added to the SpamPages page (both suitably protected for AdminGroup
    write only and not themselves checked by SpamBayes).  Whenever those
    pages are saved, the entire database is retrained from scratch.  This
    should not generally be a problem, as there will probably only be a few
    pages in the database, so retraining should be quick.  It should also be
    a relatively rare occurrence.  If the suspect page was actually ham,
    score it again after retraining.  It should score as ham now.  If so,
    just revert to it.  If not, add it to the HamPages page a second time.
    I'm not entirely sure how to handle new pages which are spam, but I
    think you should be able to automatically DeletePage them, then revive
    them later if they turn out to be good.

    This all said, I can help from the SpamBayes side of things (write the
    tokenizer, suggest some synthetic tokens that might help improve the
    discrimination of ham and spam), but I'm not familiar with the MoinMoin
    code base, certainly not the latest versions.  It's unlikely that I
    could implement it quickly on that side of things.  If someone familiar
    with MoinMoin's code base would like to team up with me on this, let me
    know.  Together we should be able to knock this off very quickly.
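
To make the flow in the quoted post concrete, here is a rough Python sketch.
It is not SpamBayes or MoinMoin code.  ToyClassifier is a stand-in naive
Bayes scorer (SpamBayes combines token evidence differently), and the names
tokenize, retrain and handle_save, the "links:N" synthetic token, and the
0.20 cutoff are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

HAM_CUTOFF = 0.20  # illustrative: anything scoring above this is "suspect"


def tokenize(text):
    """Yield lowercase word tokens plus one synthetic link-count token."""
    yield from re.findall(r"[a-z0-9']+", text.lower())
    n_links = len(re.findall(r"https?://", text, re.I))
    yield "links:%d" % min(n_links, 5)  # hypothetical synthetic token


class ToyClassifier:
    """Stand-in for a real SpamBayes classifier (plain naive Bayes)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.docs = {"ham": 0, "spam": 0}

    def learn(self, text, is_spam):
        label = "spam" if is_spam else "ham"
        self.counts[label].update(set(tokenize(text)))
        self.docs[label] += 1

    def score(self, text):
        """Return a spam probability between 0 (ham) and 1 (spam)."""
        log_odds = 0.0
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


def retrain(ham_pages, spam_pages):
    """Rebuild the database from scratch from the HamPages/SpamPages texts."""
    clf = ToyClassifier()
    for text in ham_pages:
        clf.learn(text, is_spam=False)
    for text in spam_pages:
        clf.learn(text, is_spam=True)
    return clf


def handle_save(clf, revisions, new_text):
    """Decide what to keep after a page save.

    revisions holds earlier page texts, oldest first.  Returns a pair
    (text_to_keep, flagged).  text_to_keep is None when the page is new
    and suspect, i.e. it should be deleted until an admin reviews it.
    """
    if clf.score(new_text) <= HAM_CUTOFF:
        return new_text, False
    for old_text in reversed(revisions):  # newest earlier revision first
        if clf.score(old_text) <= HAM_CUTOFF:
            return old_text, True
    return None, True
```

When flagged is true, the caller would mail the suspect revision's URL to
AdminGroup.  Once an admin files it under HamPages or SpamPages, calling
retrain() with the updated lists rebuilds the classifier, and a wrongly
rejected page can be scored again and restored.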

Skip

