[spambayes-dev] RE: [Spambayes] How low can you go?
Skip Montanaro
skip at pobox.com
Mon Dec 22 13:15:36 EST 2003
>> [Tim Peters]
>> I don't want to expire a hapax if it's been used recently in
>> *scoring*. Message times can't distinguish used from unused
>> features. If you're doing train-on-everything (with or without
>> whole-msg expiration), a hapax used in scoring becomes a non-hapax
>> the first time it's used in scoring. For
Seth> But for really unusual messages of the type you were concerned
Seth> about, this may only happen once a year, or so, which is too long
Seth> for a hapax-expiration scheme.
Under the heading of "practicality beats purity"...
If you know a given type of message is ham but is seen infrequently, train
on it twice. That ensures none of its tokens are hapaxes, so they are
never candidates for deletion.
Hmmm... That violates my "never train on a message twice" dictum.
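A toy sketch of the train-twice trick, in case it helps. This is not
SpamBayes code: the names train, expire_hapaxes and ham_counts are made up
for illustration, and it assumes a hapax is simply a token whose training
count is exactly 1 and that expiration drops all such tokens at once.

```python
from collections import Counter


def train(counts, tokens):
    """One training pass: bump the count of each distinct token by one."""
    for tok in set(tokens):
        counts[tok] += 1


def expire_hapaxes(counts):
    """Delete every token that has been trained on exactly once."""
    for tok in [t for t, n in counts.items() if n == 1]:
        del counts[tok]


ham_counts = Counter()
rare_ham = ["annual", "shareholder", "proxy"]   # hypothetical rare ham
other_ham = ["lunch", "meeting"]                # hypothetical ordinary ham

train(ham_counts, rare_ham)
train(ham_counts, rare_ham)    # second pass: every count becomes 2
train(ham_counts, other_ham)   # trained once: these stay hapaxes

expire_hapaxes(ham_counts)
print(sorted(ham_counts))
```

The rare message's tokens survive expiration only because their counts
reached 2. The cost is that the message is counted twice in the training
data.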
Skip