[Spambayes] A proposal for mail filtering

Paul Wagland spambayes at kungfoocoder.org
Wed Dec 3 17:57:42 EST 2003

On Dec 3, 2003, at 23:34, Tim Stone wrote:

> On Wed, 3 Dec 2003 23:16:29 +0100, Paul Wagland 
> <spambayes at kungfoocoder.org> wrote:
>> My suggestion is to implement some form of mistakes based training.
> This strategy has been debated on this list ad-infinitum.  The people 
> who currently write code for spambayes by-and-large believe that it is 
> less valid than a training regimen that includes positive 
> reinforcement of correct decisions made by spambayes, as well as 
> proper training of unsures and correction of mistakes.

  Hmm. OK, I can see why this might be "interesting". I think that a 
rigorous testing regime for this could be quite difficult to set up... 
:-) I can see why it can be useful to "reinforce" good training, since 
then it can help to pick up a slowly evolving corpus of HAM or SPAM. 
The reason that I am suggesting this is that I would really like to be 
able to just "set and forget" this thing :-) And so I would like some 
form of automatic training that works better than the current 
built-in default, since for most people the default is going to end up 
badly skewed towards a high SPAM or HAM count.

Perhaps it might be interesting to have another set of regions for the 
HAM/SPAM probabilities that we train on. Then we could positively 
reinforce the database with known very safe HAM/SPAM, or maybe try to 
positively reinforce it with marginally good HAM/SPAM (figuring that 
this would lead to the best overall improvement).
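
To make the "regions" idea a bit more concrete, here is a minimal sketch 
of the decision I have in mind. It is not spambayes code: should_train, 
the mode names and the width parameter are all made up for illustration, 
and the 0.20/0.90 cutoffs are just assumed values. Mistakes and unsures 
are always trained; correctly classified mail is trained only if its 
score falls in the chosen region.

```python
# Hypothetical sketch, not part of spambayes. Cutoff values are assumptions.
HAM_CUTOFF = 0.20    # scores at or below this are classified as ham
SPAM_CUTOFF = 0.90   # scores at or above this are classified as spam


def should_train(score, is_spam, mode="marginal", width=0.05):
    """Return True if a message with a known class should be trained on.

    score   -- spam probability in [0, 1] given by the classifier
    is_spam -- the true class of the message
    mode    -- "safe": reinforce only very confident correct decisions
               "marginal": reinforce only correct decisions near the cutoff
    width   -- size of the region, measured in score units
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")

    correct = score >= SPAM_CUTOFF if is_spam else score <= HAM_CUTOFF
    if not correct:
        # Mistakes and unsures are always trained.
        return True

    if mode == "safe":
        # Distance from the extreme end of the scale for this class.
        distance = 1.0 - score if is_spam else score
    elif mode == "marginal":
        # Distance from the cutoff for this class.
        distance = score - SPAM_CUTOFF if is_spam else HAM_CUTOFF - score
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return distance <= width
```

With mode="marginal" a spam scoring 0.92 would be trained and one 
scoring 0.999 would be skipped; with mode="safe" it is the other way 
around. Which of the two actually helps would need testing.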

The other thing that I have been noticing is that most people seem to 
say that a low message count is good. If I positively train on all my 
HAM/SPAM then I very quickly get quite large message counts. Maybe then 
we need some way to "retire" old tokens and/or messages, something that 
I know cannot currently be done since we don't store any dates with the 
token information.
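
As a rough sketch of what retiring could look like if a date were kept 
with each token: the class below is purely hypothetical (DatedTokenStore 
and its methods are my own names, and this is not how spambayes stores 
its data). It records a "last seen" date per token and drops tokens that 
have not been seen within a given number of days.

```python
from datetime import date, timedelta


class DatedTokenStore:
    """Hypothetical token store that remembers when a token was last seen."""

    def __init__(self):
        # token -> [spam_count, ham_count, last_seen_date]
        self.tokens = {}

    def train(self, tokens, is_spam, when):
        """Count each distinct token once for this message."""
        for tok in set(tokens):
            rec = self.tokens.setdefault(tok, [0, 0, when])
            rec[0 if is_spam else 1] += 1
            rec[2] = max(rec[2], when)

    def retire(self, today, max_age_days):
        """Drop tokens not seen in the last max_age_days days.

        Returns the number of tokens removed.
        """
        cutoff = today - timedelta(days=max_age_days)
        stale = [t for t, rec in self.tokens.items() if rec[2] < cutoff]
        for t in stale:
            del self.tokens[t]
        return len(stale)


store = DatedTokenStore()
store.train(["viagra", "free"], True, date(2003, 1, 1))
store.train(["free", "meeting"], False, date(2003, 12, 1))
removed = store.retire(date(2003, 12, 3), max_age_days=90)
```

Note that this only retires tokens. The overall HAM/SPAM message counts 
are left alone, so retiring whole messages would need the per-message 
dates (and tokens) to be stored as well.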

Anyway, as to me submitting code... I will look into it, but am 
currently busy trying to get a FLAC codec for QuickTime to work ;-)


