[Spambayes] A proposal for mail filtering
Paul Wagland
spambayes at kungfoocoder.org
Wed Dec 3 17:57:42 EST 2003
On Dec 3, 2003, at 23:34, Tim Stone wrote:
> On Wed, 3 Dec 2003 23:16:29 +0100, Paul Wagland
> <spambayes at kungfoocoder.org> wrote:
>
>> My suggestion is to implement some form of mistakes based training.
>
> This strategy has been debated on this list ad-infinitum. The people
> who currently write code for spambayes by-and-large believe that it is
> less valid than a training regimen that includes positive
> reinforcement of correct decisions made by spambayes, as well as
> proper training of unsures and correction of mistakes.

Hmm. OK, I can see why this might be "interesting". I think that a
rigorous testing regime for this could be quite difficult to set up...
:-) I can see why it can be useful to reinforce good training, since
it can then help to pick up a slowly evolving corpus of HAM or SPAM.

The reason that I am suggesting this is that I would really like to be
able to just "set and forget" this thing :-) And so I would like some
form of automatic training that works better than the current
built-in default, since for most people the default is going to suffer
from some horrible kind of skew towards high SPAM or HAM counts.

Perhaps it might be interesting to have another set of regions for the
HAM/SPAM probabilities that we train on. Then we could positively
reinforce the database with known very safe HAM/SPAM, or maybe try to
positively reinforce it with marginally good HAM/SPAM (figuring that
this would lead to the best overall improvement).
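
As a rough illustration of the regions idea, the sketch below picks
which automatically classified messages to train on. Everything in it
is an assumption made for the example: the function name, the "safe"
and "marginal" mode names, the band width, and the cutoff values are
hypothetical and are not taken from the spambayes code.

```python
# Hypothetical sketch: none of these names come from the spambayes code base.
def pick_training_label(score, ham_cutoff=0.20, spam_cutoff=0.90,
                        band=0.10, mode="marginal"):
    """Return "ham", "spam", or None for a message scored in [0.0, 1.0].

    None means "do not train on this message automatically".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0.0, 1.0]")

    if mode == "safe":
        # Train only on messages scored very close to 0.0 or 1.0.
        if score <= band:
            return "ham"
        if score >= 1.0 - band:
            return "spam"
        return None

    if mode == "marginal":
        # Train only on messages that fall just inside a cutoff,
        # i.e. classified as ham or spam, but not by much.
        if ham_cutoff - band <= score <= ham_cutoff:
            return "ham"
        if spam_cutoff <= score <= spam_cutoff + band:
            return "spam"
        return None

    raise ValueError("mode must be 'safe' or 'marginal'")
```

With these assumed values, a message scored 0.15 would be trained as
ham in "marginal" mode and skipped in "safe" mode. Unsures (scores
between the two cutoffs) are never trained automatically in either
mode. Which mode actually helps, if either, would have to be tested.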

The other thing that I have been noticing is that most people seem to
say that a low message count is good. If I positively train on all my
HAM/SPAM then I very quickly get quite large message counts. Maybe then
we need some way to "retire" old tokens and/or messages. That is
something I know cannot currently be done, since we don't store any
dates with the token information.
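
To show what storing dates with the token information could look like,
here is a minimal sketch of a token store that records when each token
was last seen and can drop stale ones. The class, its methods, and the
120-day default are all hypothetical; this is not how the spambayes
database is laid out.

```python
from datetime import date, timedelta


# Hypothetical sketch: not the spambayes storage format or API.
class DatedTokenCounts:
    def __init__(self):
        # token -> [spam_count, ham_count, last_seen_date]
        self.counts = {}

    def train(self, tokens, is_spam, today):
        """Count each distinct token once and stamp it with today's date."""
        for token in set(tokens):
            record = self.counts.setdefault(token, [0, 0, today])
            record[0 if is_spam else 1] += 1
            record[2] = today

    def retire(self, today, max_age_days=120):
        """Delete tokens not seen for max_age_days; return how many went."""
        cutoff = today - timedelta(days=max_age_days)
        stale = [t for t, rec in self.counts.items() if rec[2] < cutoff]
        for token in stale:
            del self.counts[token]
        return len(stale)


db = DatedTokenCounts()
db.train(["viagra", "free"], is_spam=True, today=date(2003, 6, 1))
db.train(["free", "meeting"], is_spam=False, today=date(2003, 12, 1))
removed = db.retire(today=date(2003, 12, 3))
```

The sketch only retires tokens. Retiring whole messages, and keeping
the overall HAM and SPAM message counts consistent after tokens are
dropped, would need extra bookkeeping that is left out here.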

Anyway, as to me submitting code... I will look into it, but I am
currently busy trying to get a FLAC codec for QuickTime to work ;-)

Cheers,
Paul