[Spambayes] Outlook plugin - training

Tim Peters tim.one@comcast.net
Fri Nov 8 09:15:24 2002


[Tim]
> ...
> I'm going to try an experiment:  I'm going to wipe my home database and
> start over from scratch, training first on one ham and one spam, then
> only on mistakes and unsures.  This should be fun <wink>.

It is!  The msg from me I'm replying to here scored 94 (solid spam).  I've
now got 5 ham and 5 spam in my training set, most of the new ones from
Unsures.  The latest spam was a blatant false negative, from Hapax City:

'*H*'                          0.998601
'*S*'                          8.60833e-005
'can'                          0.0652174
'have'                         0.0652174
"don't"                        0.0918367
'never'                        0.0918367
'number'                       0.0918367
'one'                          0.0918367
'what'                         0.0918367
'"the'                         0.155172   ham hapaxes from here
'able'                         0.155172
'about'                        0.155172
'against'                      0.155172
'also'                         0.155172
'any'                          0.155172
'anything'                     0.155172
'back'                         0.155172
'because'                      0.155172
'been'                         0.155172
'check'                        0.155172
'even'                         0.155172
'find'                         0.155172
'found'                        0.155172
'heard'                        0.155172
'how'                          0.155172
'into'                         0.155172
"it's"                         0.155172
'more'                         0.155172
'needed'                       0.155172
'other'                        0.155172
'out'                          0.155172
'own'                          0.155172
'people'                       0.155172
'skip:a 10'                    0.155172
'skip:i 10'                    0.155172
'special'                      0.155172
'subject:.'                    0.155172
'subject:: '                   0.155172
'their'                        0.155172
'them.'                        0.155172
'they'                         0.155172
'those'                        0.155172
'time'                         0.155172
'time.'                        0.155172
'unsubscribe'                  0.155172
'until'                        0.155172
'useful'                       0.155172
'using'                        0.155172   to here
'and'                          0.275281
'for'                          0.275281
'subject: '                    0.275281
'you'                          0.275281
'from'                         0.355072
'not'                          0.355072
'off'                          0.355072
'our'                          0.355072
'when'                         0.355072
'new'                          0.644928
'see'                          0.644928
'url:gif'                      0.724719
'url:www'                      0.724719
'call'                         0.844828   spam hapaxes from here
'contact'                      0.844828
'credit'                       0.844828
'email.'                       0.844828
'every'                        0.844828
'further'                      0.844828
'header:Received:2'            0.844828
'made'                         0.844828
'more!'                        0.844828
'most'                         0.844828
'now'                          0.844828
'plus,'                        0.844828
'receive'                      0.844828
'search'                       0.844828
'skip:1 10'                    0.844828
'url:jpg'                      0.844828   to here
'email'                        0.908163

I think I've established that 5+5 isn't enough for great results <snort>.
However, 80% of its decisions have been correct so far!
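
For anyone wondering where the repeated numbers in that table come from, here is a minimal sketch. It assumes the classifier applies Gary Robinson's adjustment with an unknown-word strength s = 0.45 and an unknown-word probability x = 0.5, on a training set of 5 ham and 5 spam. Those two constants and the function names are assumptions made for illustration, not quoted from the code, but they do reproduce the hapax values above (0.155172 and 0.844828). The per-word counts in the usage lines are likewise inferred from the listed probabilities. The second function assumes '*H*' and '*S*' are folded into the final score as (S - H + 1) / 2.

```python
def word_spamprob(hamcount, spamcount, nham=5, nspam=5, s=0.45, x=0.5):
    """Robinson-adjusted spam probability for one word.

    hamcount/spamcount: number of trained ham/spam msgs containing the word.
    nham/nspam: total trained ham/spam (5 and 5 in this experiment).
    s, x: assumed unknown-word strength and probability.
    """
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    p = spamratio / (hamratio + spamratio)   # raw estimate from the counts
    n = hamcount + spamcount                 # how much evidence we have
    return (s * x + n * p) / (s + n)         # pull p toward x when n is small


def combined_score(H, S):
    """Assumed final score from the '*H*' and '*S*' indicators."""
    return (S - H + 1.0) / 2.0


if __name__ == "__main__":
    print("ham hapax   %.6f" % word_spamprob(1, 0))   # seen in one ham only
    print("spam hapax  %.6f" % word_spamprob(0, 1))   # seen in one spam only
    print("3 ham       %.6f" % word_spamprob(3, 0))   # inferred counts for 'can'
    print("3 ham 1 spam %.6f" % word_spamprob(3, 1))  # inferred counts for 'and'
    print("score       %.6f" % combined_score(0.998601, 8.60833e-05))
```

With only one msg behind each hapax, s dominates, so no single word can get closer to 0 or 1 than about 0.155 or 0.845. The ham hapaxes win here simply by outnumbering the spam ones.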



