[Tracker-discuss] On spammish submissions

skip at pobox.com skip at pobox.com
Wed Mar 21 20:04:49 CET 2007

    Paul> For my 2 cents worth, preventing registration (even including a
    Paul> manual review) is a much narrower job with less activity. Manual
    Paul> screening of submissions is not practical.

I didn't mean to imply that all submissions should be manually reviewed in
the SpamBayes-based system I proposed, only those that don't pass muster.
When SpamBayes is properly trained (knows a bit about what you think is and
is not acceptable) the distribution of its scores tends to be bimodal and
quite accurate.  You generally have obviously valid submissions (which no
human would normally need to review) or obviously spammy submissions (which
you would just delete and forget about).  The only cases I encountered which
required manual intervention were those where the submission was incorrectly
submitted (e.g., city misspelled, date omitted) or where the filter wasn't
sure about the submission (e.g., spam which is significantly different in
structure from previously seen spams).  In the former case someone else
suggested adding a SPAM button to the ticket page (only visible to admins).
In fact, unsure and spam submissions could (in theory) be accepted but not
displayed in lists or located by searches except by logged-in admins.  Based
on the classification, as an admin you would see:

    Classified as                       Button(s) displayed
    -------------                       -------------------
    Valid (score <= 0.15)               SPAM
    Unsure (0.15 < score < 0.60)        OKAY, SPAM
    Spam (0.60 <= score)                OKAY, DELETE

The OKAY button would classify the ticket as okay, retrain the database and
release the ticket for viewing by the general public.  The DELETE button
would ban the user who submitted the ticket then delete the ticket.  The
SPAM button would do the DELETE operation but also classify the submission
as spam and retrain the database.
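
To make the table and button behavior concrete, here is a rough Python
sketch.  It is only an illustration: the cutoffs come from the table above,
but every name in it (classify, admin_buttons, button_steps, the step
labels) is made up for this example and is not part of SpamBayes or the
tracker.  Training before deleting in the SPAM case is also my assumption,
since the ticket text has to be around to train on.

```python
# Hypothetical sketch of the scheme described above; not SpamBayes API.
HAM_CUTOFF = 0.15   # score <= 0.15 is treated as valid
SPAM_CUTOFF = 0.60  # score >= 0.60 is treated as spam

# Buttons an admin would see for each classification (from the table).
BUTTONS = {
    "valid": ("SPAM",),
    "unsure": ("OKAY", "SPAM"),
    "spam": ("OKAY", "DELETE"),
}


def classify(score):
    """Map a score in [0, 1] to 'valid', 'unsure' or 'spam'."""
    if score <= HAM_CUTOFF:
        return "valid"
    if score < SPAM_CUTOFF:
        return "unsure"
    return "spam"


def admin_buttons(score):
    """Return the buttons shown to an admin for a ticket with this score."""
    return BUTTONS[classify(score)]


def visible_to_public(score):
    """Unsure and spam tickets are accepted but hidden from non-admins."""
    return classify(score) == "valid"


def button_steps(button):
    """Return the ordered (hypothetical) steps each button would trigger."""
    if button == "OKAY":
        return ["train_as_ham", "release_ticket"]
    if button == "DELETE":
        return ["ban_submitter", "delete_ticket"]
    if button == "SPAM":
        # Assumption: train first, while the ticket text still exists.
        return ["train_as_spam"] + button_steps("DELETE")
    raise ValueError("unknown button: %r" % (button,))
```

For example, admin_buttons(0.40) gives ("OKAY", "SPAM"), and a ticket with
that score stays hidden from the public until an admin presses OKAY.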

    Paul> One should also add defenses on the mailing lists that are the
    Paul> TARGET of the mail from the tracker, since they need those anyway
    Paul> irregardless of the tracker.

That's the original use for which SpamBayes was designed.  I see no
particular reason you couldn't front those lists using it.  Many other lists
hosted on python.org are already filtered that way.  Just ask
postmaster at python.org to set things up.

