[spambayes-dev] RE: [Spambayes] How low can you go?
T. Alexander Popiel
popiel at wolfskeep.com
Fri Dec 19 11:55:37 EST 2003
In message: <LNBBLJKPBEHFEDALKOLCIEBMHOAB.tim.one at comcast.net>
"Tim Peters" <tim.one at comcast.net> writes:
>> Sounds like _you're_ arguing for expiration of whole messages :)
>Oh yes, I do want that -- eventually. We have no experience with that in
>this project, though; we have a lot of experience with the consequences of
>hapaxes, and I have no fears remaining about picking on them.
Actually, there have been experiments done (by me) with expiry of whole
messages. I invite you to look at the 'expire4months' regime for my
incremental testing harness. Performance was worse than remembering
everything, but significantly better than mistake-based training (with
the 'fpfnunsure' regime).
I have not done any experiments with just nuking hapaxes; I didn't see
any reason to do a partial job instead of a full one.
>> I know you're not arguing that, but if there were bidirectional msg_id
>> <-> feature_ID maps, it would be fairly easy to expire whole messages.
>> That would obviate the need to track last time seen for every token.
>Only if you don't want also to be able to expire tokens on their own.
No... just find the most recent message that the token appeared in,
which would be a quick search through a few message times. A really
quick search if you're only looking to expire hapaxes.
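
To make that concrete, here is a rough sketch of the lookup I have in
mind. It is not code from SpamBayes or from my testing harness; the
class name MessageIndex, its methods, and the use of plain numeric
timestamps are all made up for illustration. It assumes each trained
message has an id, a training time, and a set of tokens, and it keeps
the msg_id <-> token maps in both directions so that "last time seen"
is derived from message times instead of being stored per token.

```python
from collections import defaultdict


class MessageIndex:
    """Hypothetical bidirectional msg_id <-> token index (illustration only)."""

    def __init__(self):
        self.msg_time = {}                  # msg_id -> training timestamp
        self.msg_tokens = {}                # msg_id -> set of tokens in that message
        self.token_msgs = defaultdict(set)  # token -> set of msg_ids containing it

    def add(self, msg_id, timestamp, tokens):
        """Record a trained message in both directions of the map."""
        tokens = set(tokens)
        self.msg_time[msg_id] = timestamp
        self.msg_tokens[msg_id] = tokens
        for tok in tokens:
            self.token_msgs[tok].add(msg_id)

    def last_seen(self, token):
        """Newest training time among messages containing token, or None."""
        times = [self.msg_time[m] for m in self.token_msgs.get(token, ())]
        return max(times) if times else None

    def stale_hapaxes(self, cutoff):
        """Tokens that occur in exactly one message, trained before cutoff."""
        return sorted(
            tok for tok, msgs in self.token_msgs.items()
            if len(msgs) == 1 and self.msg_time[next(iter(msgs))] < cutoff
        )

    def expire_messages(self, cutoff):
        """Forget every message trained before cutoff; return their ids."""
        old = [m for m, t in self.msg_time.items() if t < cutoff]
        for m in old:
            for tok in self.msg_tokens.pop(m):
                msgs = self.token_msgs[tok]
                msgs.discard(m)
                if not msgs:
                    # No remaining message contains this token.
                    del self.token_msgs[tok]
            del self.msg_time[m]
        return old


idx = MessageIndex()
idx.add("m1", 100, ["cheap", "pills"])
idx.add("m2", 200, ["cheap", "meeting"])
print(idx.last_seen("cheap"))
print(idx.stale_hapaxes(150))
```

For a hapax the search is a single lookup, since the token maps to
exactly one message. A real classifier would also have to decrement
its per-token spam/ham counts when a message is expired; the sketch
leaves that out.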