[spambayes-dev] RE: [Spambayes] How low can you go?

Skip Montanaro skip at pobox.com
Mon Dec 22 13:15:36 EST 2003


    >> [Tim Peters]
    >> I don't want to expire a hapax if it's been used recently in
    >> *scoring*.  Message times can't distinguish used from unused
    >> features.  If you're doing train-on-everything (with or without
    >> whole-msg expiration), a hapax used in scoring becomes a non-hapax
    >> the first time it's used in scoring.  For

    Seth> But for really unusual messages of the type you were concerned
    Seth> about, this may only happen once a year, or so, which is too long
    Seth> for a hapax-expiration scheme.

Under the heading of "practicality beats purity"...

If you know a given type of message is ham but is seen infrequently, train
on it twice.  That makes sure none of its tokens are hapaxes, so they are
never candidates for deletion.
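
A rough sketch of why training twice works, assuming a hapax is simply a
token whose training count is exactly 1.  The ToyTokenStore class and its
method names are made up for illustration and are not the SpamBayes
classifier API; it also tracks only ham counts to keep things short.

```python
from collections import Counter


class ToyTokenStore:
    """Hypothetical stand-in for a token database (not the SpamBayes API)."""

    def __init__(self):
        # token -> number of ham messages it was trained on
        self.ham_counts = Counter()

    def train_ham(self, tokens):
        # Count each distinct token once per training pass.
        for tok in set(tokens):
            self.ham_counts[tok] += 1

    def hapaxes(self):
        # Assumption: a hapax is a token seen in exactly one training pass.
        return {tok for tok, n in self.ham_counts.items() if n == 1}

    def expire_hapaxes(self):
        # Drop every token that is currently a hapax.
        for tok in self.hapaxes():
            del self.ham_counts[tok]


store = ToyTokenStore()
rare_ham = ["annual", "tax", "statement"]

store.train_ham(rare_ham)
once = store.hapaxes()    # all three tokens are hapaxes here

store.train_ham(rare_ham)
twice = store.hapaxes()   # empty: every count is now 2

store.expire_hapaxes()    # nothing from rare_ham gets deleted
```

The cost is that every token in the message is now weighted as if two
separate hams had contained it.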

Hmmm...  That violates my "never train on a message twice" dictum.

Skip


