[Spambayes] "No filterable mail items are selected."

skip at pobox.com
Tue Oct 14 22:10:39 CEST 2008


    Jesse> (The quickness with which SpamBayes learns to discern spam
    Jesse> impresses me, so I retrain periodically just to watch it learn.
    Jesse> Perhaps I am in need of professional help.)

In my experience you will eventually do one of the following:

    * Incorrectly train some hams as spam, or vice versa.

    * Nominally train some messages correctly, but have that training tip
      the balance of some common clues in the wrong direction.

Training from scratch periodically erases these mistakes and meta-mistakes.

A brief explanation of the second item (the "meta-mistake").  We are all
used to seeing these lines at the bottom of each list message:

    SpamBayes at python.org
    http://mail.python.org/mailman/listinfo/spambayes
    Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
    Check the FAQ before asking: http://spambayes.sf.net/faq.html

All those words are clues.  Consider "FAQ".  That's probably a pretty hammy
clue because it's seen in lots of SpamBayes messages, several of which
you've trained on.  Now suppose we get a couple messages with just a single
similar line:

    Enlargement: http://some.where/zowie

or

    Please her: http://some.where.else/wow

SpamBayes trains on what it sees, including the tokens generated from the
one-line content of those messages and the clues in the footer of every
SpamBayes message.  (There are often lots of clues in the message headers,
which you typically don't see.)  If your training database is kind of small
(as mine tends to be), then after training on a couple of spams which sneak
through, "FAQ" and the other clues in the footer and message headers might
start to look kind of spammy.  So, even though you were "correct" to train
those messages as spam, that training might have the effect of
"overwhelming" the scoring of some good messages, pushing them into the
unsure category (or worse, generating a false positive).
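
To make the arithmetic concrete, here is a rough sketch of how a single
clue's spam probability can drift.  It is not the SpamBayes source; it is a
simplified Robinson-style smoothed estimate, and the function name, the
message counts and the smoothing constants (s, x) are all assumptions chosen
for illustration.

```python
def clue_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Simplified, Robinson-style estimate of how spammy one clue looks.

    spam_count / ham_count: trained spams / hams containing the clue.
    nspam / nham: total trained spams / hams.
    s, x: assumed smoothing strength and prior for rarely seen clues.
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    # Blend the raw estimate with the prior, weighted by evidence count.
    return (s * x + n * p) / (s + n)


def drift(nham, nspam, ham_with_clue, new_spams_with_clue=2):
    """Clue probability before and after training a few spams containing it."""
    before = clue_spam_prob(0, ham_with_clue, nspam, nham)
    after = clue_spam_prob(new_spams_with_clue, ham_with_clue,
                           nspam + new_spams_with_clue, nham)
    return before, after


# Hypothetical small database: 5 hams, 5 spams, "FAQ" seen in 3 hams.
small = drift(nham=5, nspam=5, ham_with_clue=3)
# Hypothetical large database: 2000 of each, "FAQ" seen in 800 hams.
large = drift(nham=2000, nspam=2000, ham_with_clue=800)

print("small database: %.3f -> %.3f" % small)
print("large database: %.4f -> %.4f" % large)
```

Under these assumed numbers, the same two spams move "FAQ" much further
toward the spammy end in the small database than in the large one, which is
the tipping effect described above.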

Skip