[Spambayes] expiration ideas.
Anthony Baxter
anthony@interlink.com.au
Mon Oct 21 07:06:53 2002
>>> Tim Peters wrote
> OTOH, what's the purpose of expiration? I can think of two:
>
> 1. To reduce database size.
>
> 2. To accelerate adaptation to changes in ham and/or spam.
The former. I'm trying to think about how this could be deployed "in
the real world".
Note also that I'm not so much worried about adapting to spam as
adapting to changing ham patterns. I know that my own email changes
over time (for instance, until this project started, I doubt the word
"Nigerian" would have been considered a strong ham indicator for me :)
(somewhat off-topic, but related: I also suspect that if the spambayes
code is vulnerable to being deliberately sabotaged, it'll be the
tokeniser that's the weak point, not the classifier. For instance,
I already have a couple of persistent FNs (false negatives) with message
bodies entirely encoded in JavaScript. I don't want to think about having
to decode JavaScript, or run it, to check whether something's spam.)
I'm somewhat nervous about the "purge all unique words" approach. One
obvious failing is that if you _are_ doing ongoing training, you'd want
to batch up a bunch of messages, since a word seen in only one freshly
trained message would be purged before it had a chance to recur. I also
suspect that deliberately perverting the real world in that way goes
against the "stupid beats smart" meta-rule that's served us so far...
but-then-maybe-stupider-beats-stupid, too.
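To make the batching point concrete, here is a minimal Python sketch. It
assumes "unique words" means words seen in exactly one training message,
and it uses a plain collections.Counter as a hypothetical stand-in for the
token database. The names train and purge_unique are illustrative only,
not spambayes code.

```python
from collections import Counter

# Hypothetical token store: word -> number of training messages containing it.
# This is a sketch, not the real spambayes classifier's data structure.


def train(counts, tokens):
    """Record one message; each distinct token counts once per message."""
    for tok in set(tokens):
        counts[tok] += 1


def purge_unique(counts):
    """Drop every word seen in exactly one training message."""
    for tok in [t for t, n in counts.items() if n == 1]:
        del counts[tok]


messages = [
    {"meeting", "tuesday", "nigerian"},
    {"meeting", "nigerian", "prince"},
]

# Ongoing training, purging after every single message.
incremental = Counter()
for msg in messages:
    train(incremental, msg)
    purge_unique(incremental)

# Batch the same messages first, then purge once.
batched = Counter()
for msg in messages:
    train(batched, msg)
purge_unique(batched)

print(sorted(incremental))  # []
print(sorted(batched))      # ['meeting', 'nigerian']
```

With a purge after every message, no word ever gets past a count of 1, so
nothing learned incrementally survives; batching lets words that recur
across messages reach a count of 2 and be kept.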
Anthony.