[Spambayes] understanding high false negative rate
Jeremy Hylton
jeremy@alum.mit.edu
Sat, 7 Sep 2002 16:15:03 -0400
Here's clarification of what I did:
First test results using tokenizer.Tokenizer.tokenize_headers()
unmodified.
Training on 644 hams & 557 spams
0.000 10.413
1.398 6.104
1.398 5.027
Training on 644 hams & 557 spams
0.000 8.259
1.242 2.873
1.242 5.745
Training on 644 hams & 557 spams
1.398 5.206
1.398 4.488
0.000 9.336
Training on 644 hams & 557 spams
1.553 5.206
1.553 5.027
0.000 9.874
total false pos 139 5.39596273292
total false neg 970 43.5368043088
Second test results using mboxtest.MyTokenizer.tokenize_headers().
This uses all headers except Received, Date, and X-From_.
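
A minimal sketch of that kind of header tokenizer, for illustration only. This is not the actual mboxtest.MyTokenizer code: the SKIPPED_HEADERS set, the WORD_RE pattern, and the "name:word" token format are all assumptions made here, and the real tokenizer's word-splitting rules may well differ.

```python
import email
import re

# Headers left out of the token stream, compared case-insensitively.
SKIPPED_HEADERS = {"received", "date", "x-from_"}

# Hypothetical word pattern; assumed here, not taken from the real tokenizer.
WORD_RE = re.compile(r"[\w$\-]+")


def tokenize_headers(msg):
    """Yield 'headername:word' tokens for every header not in SKIPPED_HEADERS."""
    for name, value in msg.items():
        name = name.lower()
        if name in SKIPPED_HEADERS:
            continue
        for word in WORD_RE.findall(str(value)):
            yield "%s:%s" % (name, word.lower())


# Small made-up message to exercise the function.
raw = (
    "Received: from mail.example.com by mx.example.org\n"
    "Date: Sat, 7 Sep 2002 16:15:03 -0400\n"
    "Subject: Cheap offers\n"
    "X-Mailer: ExampleMailer\n"
    "\n"
    "body text\n"
)
tokens = list(tokenize_headers(email.message_from_string(raw)))
print(tokens)
```

Prefixing each word with its header name keeps a word seen in Subject distinct from the same word seen in another header.
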
Training on 644 hams & 557 spams
0.000 7.540
0.932 4.847
0.932 3.232
Training on 644 hams & 557 spams
0.000 7.181
0.621 2.873
0.621 4.847
Training on 644 hams & 557 spams
1.087 4.129
1.087 3.052
0.000 6.822
Training on 644 hams & 557 spams
0.776 3.411
0.776 3.411
0.000 6.463
total false pos 97 3.76552795031
total false neg 738 33.1238779174
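
For a rough sense of the difference between the two runs, the totals can be compared directly. The sketch below only does arithmetic on the counts reported above; relative_reduction is a helper named here for illustration and is not part of the test driver.

```python
# Total error counts copied from the two runs above.
first = {"false pos": 139, "false neg": 970}
second = {"false pos": 97, "false neg": 738}


def relative_reduction(before, after):
    """Fraction by which the error count dropped from `before` to `after`."""
    return (before - after) / before


reductions = {kind: relative_reduction(first[kind], second[kind]) for kind in first}
for kind, r in reductions.items():
    print("%s: %d -> %d (%.1f%% fewer)" % (kind, first[kind], second[kind], 100 * r))
```

On these totals that works out to roughly 30% fewer false positives and 24% fewer false negatives with the second tokenizer.
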
Jeremy