[Spambayes] Outlook plugin - training
Tim Peters
tim.one@comcast.net
Fri Nov 8 09:15:24 2002
[Tim]
> ...
> I'm going to try an experiment: I'm going to wipe my home database and
> start over from scratch, training first on one ham and one spam, then
> only on mistakes and unsures. This should be fun <wink>.
It is! The msg from me I'm replying to here scored 94 (solid spam). I've
now got 5 ham and 5 spam in my training set, most of the new ones from
Unsures. The latest spam was a blatant false negative, from Hapax City:
'*H*' 0.998601
'*S*' 8.60833e-005
'can' 0.0652174
'have' 0.0652174
"don't" 0.0918367
'never' 0.0918367
'number' 0.0918367
'one' 0.0918367
'what' 0.0918367
'"the' 0.155172 ham hapaxes from here
'able' 0.155172
'about' 0.155172
'against' 0.155172
'also' 0.155172
'any' 0.155172
'anything' 0.155172
'back' 0.155172
'because' 0.155172
'been' 0.155172
'check' 0.155172
'even' 0.155172
'find' 0.155172
'found' 0.155172
'heard' 0.155172
'how' 0.155172
'into' 0.155172
"it's" 0.155172
'more' 0.155172
'needed' 0.155172
'other' 0.155172
'out' 0.155172
'own' 0.155172
'people' 0.155172
'skip:a 10' 0.155172
'skip:i 10' 0.155172
'special' 0.155172
'subject:.' 0.155172
'subject:: ' 0.155172
'their' 0.155172
'them.' 0.155172
'they' 0.155172
'those' 0.155172
'time' 0.155172
'time.' 0.155172
'unsubscribe' 0.155172
'until' 0.155172
'useful' 0.155172
'using' 0.155172 to here
'and' 0.275281
'for' 0.275281
'subject: ' 0.275281
'you' 0.275281
'from' 0.355072
'not' 0.355072
'off' 0.355072
'our' 0.355072
'when' 0.355072
'new' 0.644928
'see' 0.644928
'url:gif' 0.724719
'url:www' 0.724719
'call' 0.844828 spam hapaxes from here
'contact' 0.844828
'credit' 0.844828
'email.' 0.844828
'every' 0.844828
'further' 0.844828
'header:Received:2' 0.844828
'made' 0.844828
'more!' 0.844828
'most' 0.844828
'now' 0.844828
'plus,' 0.844828
'receive' 0.844828
'search' 0.844828
'skip:1 10' 0.844828
'url:jpg' 0.844828 to here
'email' 0.908163
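The hapax values come straight from the count arithmetic. With 5 ham and 5 spam trained, a word seen in exactly one ham and no spam gets (0.45*0.5 + 1*0)/(0.45 + 1) = 0.155172, and its spam mirror image gets 0.844828. Here is a minimal Python sketch, assuming Gary Robinson's smoothing with strength s = 0.45 and unknown-word probability x = 0.5 (assumed defaults, not stated in this post). It also assumes the (S - H + 1)/2 rule for folding the chi-squared '*H*' and '*S*' clues into one score. The function names are hypothetical, not the SpamBayes API.

```python
# Sketch of Robinson-style smoothed per-word probabilities plus the
# chi-squared score combination. Function names are hypothetical; the
# constants s = 0.45 and x = 0.5 are assumed defaults, not quoted from the post.

def word_spamprob(hamcount, spamcount, nham, nspam, s=0.45, x=0.5):
    """Smoothed probability that a message containing this word is spam."""
    n = hamcount + spamcount          # total evidence for this word
    if n == 0:
        return x                      # never seen: fall back to the prior
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    p = spamratio / (hamratio + spamratio)   # raw, unsmoothed estimate
    # Shrink p toward x; with little evidence (small n), x dominates.
    return (s * x + n * p) / (s + n)

def final_score(h, s):
    """Fold the '*H*' and '*S*' clues into a single score in [0, 1]."""
    return (s - h + 1) / 2

if __name__ == "__main__":
    nham = nspam = 5
    cases = [
        ("1 ham, 0 spam", 1, 0),   # ham hapax
        ("0 ham, 1 spam", 0, 1),   # spam hapax
        ("2 ham, 0 spam", 2, 0),
        ("3 ham, 0 spam", 3, 0),
        ("3 ham, 1 spam", 3, 1),
        ("0 ham, 2 spam", 0, 2),
    ]
    for label, hc, sc in cases:
        print(f"{label}: {word_spamprob(hc, sc, nham, nspam):.6g}")
    # Score of the false negative, on the 0-100 scale used in the post.
    print(f"score: {100 * final_score(0.998601, 8.60833e-5):.2f}")
```

The six cases print 0.155172, 0.844828, 0.0918367, 0.0652174, 0.275281 and 0.908163, which reproduce values in the list above. The combined score comes out near 0.07, solid ham, which is why this spam counts as a blatant false negative. With only 5+5 messages, almost every word sits at n = 1 or 2, so the x = 0.5 prior keeps pulling each clue toward neutral.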
I think I've established that 5+5 isn't enough for great results <snort>.
However, 80% of its decisions have been correct so far!
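The regime from the quoted text reduces to a small decision rule: seed the database with one ham and one spam, then train only on mistakes and Unsures. Below is a minimal sketch that assumes three-way classification with hypothetical cutoffs of 20 and 90 on the 0-100 scale. `decide` and `should_train` are illustrative names, not Outlook plugin functions.

```python
# Hypothetical cutoffs on the 0-100 scale; assumed for illustration,
# not taken from the post or from the plugin's configuration.
HAM_CUTOFF = 20.0
SPAM_CUTOFF = 90.0

def decide(score):
    """Map a 0-100 score to 'ham', 'unsure', or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"

def should_train(true_label, decision):
    """Train only on mistakes and unsures; correct calls are skipped.

    'unsure' never equals a true label, so every Unsure is trained on.
    """
    return decision != true_label

if __name__ == "__main__":
    # (score, true label) pairs modeled on the messages described above:
    # a ham that scored 94, the spam that scored near 0, and a correct ham.
    for score, truth in [(94.0, "ham"), (0.07, "spam"), (5.0, "ham")]:
        d = decide(score)
        print(f"{score:6.2f} {truth:4s} -> {d:6s} train={should_train(truth, d)}")
```

Under these assumed cutoffs, the first two messages get trained on and the third is skipped. That skipping is why the training set grows slowly, to just 5+5 so far.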