[spambayes-dev] RE: [Spambayes] How low can you go?

T. Alexander Popiel popiel at wolfskeep.com
Fri Dec 19 11:55:37 EST 2003


In message:  <LNBBLJKPBEHFEDALKOLCIEBMHOAB.tim.one at comcast.net>
             "Tim Peters" <tim.one at comcast.net> writes:
>
>> Sounds like _you're_ arguing for expiration of whole messages :)
>
>Oh yes, I do want that -- eventually.  We have no experience with that in
>this project, though; we have a lot of experience with the consequences of
>hapaxes, and I have no fears remaining about picking on them.

Actually, I have run experiments with expiry of whole messages.  I
invite you to look at the 'expire4months' regime in my incremental
testing harness.  Performance was worse than remembering everything,
but significantly better than mistake-based training (the 'fpfnunsure'
regime).

I have not done any experiments with just nuking hapaxes; I didn't see
any reason to do a partial job instead of a full one.

>> I know you're not arguing that, but if there were bidirectional msg_id
>> <-> feature_ID maps, it would be fairly easy to expire whole
>> messages.

>> That would obviate the need to track last time seen for every token.
>
>Only if you don't want also to be able to expire tokens on their own.

No... you'd just find the most recent message that the token appeared
in, which would be a quick search through a few message times.  It's a
really quick search if you're only looking to expire hapaxes, since a
hapax appears in only one message.
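A minimal Python sketch of the bidirectional msg_id <-> token map idea, assuming an in-memory index kept alongside the classifier. MessageIndex and all of its methods are hypothetical names, not SpamBayes's API, and this is not how the 'expire4months' regime in the testing harness is implemented. It stores one timestamp per message rather than per token, and derives a token's last-seen time from the messages it appears in. The classifier's per-token spam/ham counts would also need decrementing on expiry; that part is left out.

```python
class MessageIndex:
    """Hypothetical msg_id <-> token index; names are illustrative only."""

    def __init__(self):
        self.tokens_by_msg = {}   # msg_id -> set of tokens in that message
        self.msgs_by_token = {}   # token  -> set of msg_ids containing it
        self.msg_time = {}        # msg_id -> time the message was trained

    def train(self, msg_id, tokens, when):
        toks = set(tokens)
        self.tokens_by_msg[msg_id] = toks
        self.msg_time[msg_id] = when
        for tok in toks:
            self.msgs_by_token.setdefault(tok, set()).add(msg_id)

    def expire_message(self, msg_id):
        """Forget a whole message; drop tokens no longer in any message."""
        for tok in self.tokens_by_msg.pop(msg_id):
            ids = self.msgs_by_token[tok]
            ids.discard(msg_id)
            if not ids:
                del self.msgs_by_token[tok]
        del self.msg_time[msg_id]

    def expire_messages_older_than(self, cutoff):
        """Age-based expiry of whole messages (a sliding window)."""
        old = [m for m, t in self.msg_time.items() if t < cutoff]
        for msg_id in old:
            self.expire_message(msg_id)
        return old

    def last_seen(self, tok):
        """A token's last-seen time, derived from its messages' times."""
        ids = self.msgs_by_token.get(tok, ())
        return max((self.msg_time[m] for m in ids), default=None)

    def expire_stale_hapaxes(self, cutoff):
        """Drop tokens seen in exactly one message trained before cutoff."""
        stale = [tok for tok, ids in self.msgs_by_token.items()
                 if len(ids) == 1 and self.last_seen(tok) < cutoff]
        for tok in stale:
            (msg_id,) = self.msgs_by_token.pop(tok)
            self.tokens_by_msg[msg_id].discard(tok)
        return stale
```

The trade-off is that last_seen() scans a token's messages instead of reading a stored per-token timestamp; for a hapax that scan is a single lookup.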

- Alex
