[Spambayes] Two Scheme Enter, One Scheme Leave.
Anthony Baxter
anthony@interlink.com.au
Thu, 26 Sep 2002 20:29:01 +1000
Quick bit on max_discriminators.
Using the settings from before, I get
md     fp   fn   total
5      69   17   86    (cutoff 0.55)
15     11   30   41    (cutoff 0.625)
100    10   21   31
150     9   21   30
400     9   21   30
--
Next is Graham vs. the best settings for Robinson, with
a bunch of different seeds. I let the best cutoff win in
all cases for Graham, to compensate for the work put into
tuning Robinson in this trial (the best cutoff for Graham
is nearly always 0.95, it seems).
(format is fp+fn=total)
seed       Graham      Robinson
1010101    41+12=53    19+21=40
12346      30+12=42     9+21=30
271170     29+16=45    25+16=41
432104     48+19=67    18+24=42
56743      29+13=42    13+17=30
774213     44+17=61    21+19=40
999111     39+21=60    20+26=46
mean       37+16=53    18+21=39
mean fp%   1.85%       0.9%
mean fn%   0.80%       1.05%
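As a rough check on the mean rows above, the per-seed counts can be averaged with a few lines of Python. This is only a sketch: the `results` dict is typed in by hand from the table, `summarize` is a made-up helper name, and the figure of 2000 ham and 2000 spam per run is an assumption inferred from the percentages (37 fp = 1.85%), not something stated in this message. The percentages in the table appear to be computed from the rounded means, so the sketch rounds first as well.

```python
# Per-seed (fp, fn) counts copied by hand from the table above.
results = {
    "Graham": [(41, 12), (30, 12), (29, 16), (48, 19),
               (29, 13), (44, 17), (39, 21)],
    "Robinson": [(19, 21), (9, 21), (25, 16), (18, 24),
                 (13, 17), (21, 19), (20, 26)],
}

# ASSUMPTION: 2000 ham and 2000 spam per run, inferred from 37 fp == 1.85%.
N_HAM = N_SPAM = 2000


def summarize(pairs):
    """Return (mean fp, mean fn, fp %, fn %) for a list of (fp, fn) pairs.

    Means are rounded to whole messages before the percentages are taken,
    which is what the table above seems to do.
    """
    mean_fp = round(sum(fp for fp, _ in pairs) / len(pairs))
    mean_fn = round(sum(fn for _, fn in pairs) / len(pairs))
    return mean_fp, mean_fn, 100.0 * mean_fp / N_HAM, 100.0 * mean_fn / N_SPAM


for scheme, pairs in results.items():
    mean_fp, mean_fn, fp_pct, fn_pct = summarize(pairs)
    print("%-9s mean fp %d  mean fn %d  fp%% %.2f  fn%% %.2f"
          % (scheme, mean_fp, mean_fn, fp_pct, fn_pct))
```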
The final test, of course, is to feed the full corpus into the
two tweaked schemes. That's 11,000 spam and 19,000 ham split into
10 sets. Note that I was a _bit_ naughty here - these two tests
are done with the version of tokenizer.py that strips out and
discards style sheets.
Graham:
-> <stat> Ham scores for all runs: 19224 items; mean 0.78; sdev 8.59
-> <stat> Spam scores for all runs: 11303 items; mean 99.38; sdev 7.61
-> best cutoff for all runs: 0.8
-> with 139 fp + 74 fn = 213 mistakes
-> matched at 0.825 (139 fp + 74 fn)
-> matched at 0.85 (137 fp + 76 fn)
-> matched at 0.9 (135 fp + 78 fn)
total unique false pos 135
total unique false neg 78
average fp % 0.702258059105
average fn % 0.690069873164
Robinson, with the cutoff set to 0.65 (because I care more about fp than fn):
-> <stat> Ham scores for all runs: 19224 items; mean 31.30; sdev 13.09
-> <stat> Spam scores for all runs: 11303 items; mean 83.51; sdev 9.47
-> best cutoff for all runs: 0.6
-> with 76 fp + 108 fn = 184 mistakes
total unique false pos 10
total unique false neg 258
average fp % 0.0520210194464
average fn % 2.28248163189
This is a completely _spooky_ level of accuracy, given the sheer
unpleasantness of the corpus (and how badly the default Graham did
on it).
If the change to make the HTML regexp a bit greedier were in the code,
it would fix 2 of my 10 fps.
I've appended the histograms from this run to the email.
Comparing the final two versions...
----
graham->robinson
test1_alls -> test2_alls
false positive percentages
0.572 0.000 won -100.00%
0.676 0.104 won -84.62%
0.780 0.052 won -93.33%
0.884 0.052 won -94.12%
0.728 0.052 won -92.86%
0.832 0.000 won -100.00%
0.676 0.104 won -84.62%
0.676 0.104 won -84.62%
0.572 0.000 won -100.00%
0.624 0.052 won -91.67%
won 10 times
tied 0 times
lost 0 times
total unique fp went from 135 to 10 won -92.59%
mean fp % went from 0.702258059105 to 0.0520210194464 won -92.59%
false negative percentages
0.088 1.062 lost +1106.82%
0.619 2.653 lost +328.59%
1.150 2.389 lost +107.74%
0.796 2.829 lost +255.40%
0.973 3.363 lost +245.63%
0.531 2.124 lost +300.00%
0.796 2.476 lost +211.06%
0.531 1.858 lost +249.91%
0.885 1.770 lost +100.00%
0.531 2.301 lost +333.33%
won 0 times
tied 0 times
lost 10 times
total unique fn went from 78 to 258 lost +230.77%
mean fn % went from 0.690069873164 to 2.28248163189 lost +230.76%
ham mean ham sdev
0.63 31.50 +4900.00% 7.79 12.94 +66.11%
0.67 30.95 +4519.40% 8.16 13.05 +59.93%
0.79 31.35 +3868.35% 8.75 13.24 +51.31%
1.00 31.47 +3047.00% 9.72 13.11 +34.88%
0.84 31.30 +3626.19% 8.99 13.13 +46.05%
0.91 31.12 +3319.78% 9.24 13.17 +42.53%
0.71 31.09 +4278.87% 8.19 13.11 +60.07%
0.85 31.90 +3652.94% 8.71 12.88 +47.88%
0.63 31.20 +4852.38% 7.67 13.10 +70.80%
0.78 31.14 +3892.31% 8.43 13.19 +56.47%
ham mean and sdev for all runs
0.78 31.30 +3912.82% 8.59 13.09 +52.39%
spam mean spam sdev
99.91 83.86 -16.06% 2.97 8.83 +197.31%
99.38 83.59 -15.89% 7.80 9.57 +22.69%
98.86 83.62 -15.42% 10.53 9.75 -7.41%
99.40 82.76 -16.74% 7.07 9.56 +35.22%
99.19 82.82 -16.50% 8.64 9.60 +11.11%
99.52 83.31 -16.29% 6.69 9.43 +40.96%
99.22 83.61 -15.73% 8.58 9.46 +10.26%
99.62 83.94 -15.74% 5.74 9.41 +63.94%
99.17 83.57 -15.73% 8.93 9.44 +5.71%
99.53 83.98 -15.62% 6.52 9.58 +46.93%
spam mean and sdev for all runs
99.38 83.51 -15.97% 7.61 9.47 +24.44%
ham/spam mean difference: 98.60 52.21 -46.39
----
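The won/lost percentages in the comparison above are plain relative changes, (after - before) / before. A minimal sketch of that arithmetic follows. The helper names `pct_change` and `verdict` are made up for illustration and this is not the comparison script's actual code.

```python
def pct_change(before, after):
    """Relative change from before to after, as a percentage."""
    return 100.0 * (after - before) / before


def verdict(before, after):
    """Label a change in an error count or rate: lower is better."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


# Totals taken from the report above (Graham -> Robinson).
for label, before, after in [("total unique fp", 135, 10),
                             ("total unique fn", 78, 258)]:
    print("%s went from %d to %d %s %+.2f%%"
          % (label, before, after, verdict(before, after),
             pct_change(before, after)))
```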
I'd say that the Robinson scheme is a clear winner for this data set.
Anthony
.........
Histogram for the final full run of the Robinson scheme:
-> <stat> Ham scores for all runs: 19224 items; mean 31.30; sdev 13.09
* = 33 items
0.00 448 **************
2.50 843 **************************
5.00 385 ************
7.50 283 *********
10.00 349 ***********
12.50 416 *************
15.00 484 ***************
17.50 512 ****************
20.00 519 ****************
22.50 610 *******************
25.00 803 *************************
27.50 1389 *******************************************
30.00 1590 *************************************************
32.50 1768 ******************************************************
35.00 1963 ************************************************************
37.50 1885 **********************************************************
40.00 1591 *************************************************
42.50 1245 **************************************
45.00 871 ***************************
47.50 533 *****************
50.00 325 **********
52.50 177 ******
55.00 106 ****
57.50 53 **
60.00 43 **
62.50 23 *
65.00 7 *
67.50 2 *
70.00 0
72.50 1 *
75.00 0
77.50 0
80.00 0
82.50 0
85.00 0
87.50 0
90.00 0
92.50 0
95.00 0
97.50 0
-> <stat> Spam scores for all runs: 11303 items; mean 83.51; sdev 9.47
* = 26 items
0.00 0
2.50 0
5.00 0
7.50 0
10.00 0
12.50 0
15.00 0
17.50 0
20.00 0
22.50 0
25.00 0
27.50 0
30.00 0
32.50 0
35.00 0
37.50 1 *
40.00 0
42.50 2 *
45.00 6 *
47.50 10 *
50.00 10 *
52.50 13 *
55.00 29 **
57.50 37 **
60.00 59 ***
62.50 91 ****
65.00 159 *******
67.50 260 **********
70.00 428 *****************
72.50 741 *****************************
75.00 1113 *******************************************
77.50 1463 *********************************************************
80.00 1521 ***********************************************************
82.50 1044 *****************************************
85.00 624 ************************
87.50 507 ********************
90.00 618 ************************
92.50 668 **************************
95.00 870 **********************************
97.50 1029 ****************************************
-> best cutoff for all runs: 0.6
-> with 76 fp + 108 fn = 184 mistakes
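The fp and fn counts quoted for Robinson can be read straight off the two histograms: ham in buckets at or above the cutoff are false positives, spam in buckets below it are false negatives. The sketch below does that bookkeeping. The bucket counts are copied by hand from the histograms, the names `mistakes`, `best_cutoff` and `fp_weight` are hypothetical, and treating a score exactly at the cutoff as spam is an assumption. It is not the test driver's code, and it is only exact for cutoffs that fall on a bucket boundary.

```python
BUCKET_WIDTH = 2.5  # each histogram line covers 2.5 points on the 0-100 scale

# Bucket counts copied by hand from the histograms above (0.00 .. 97.50).
HAM = [448, 843, 385, 283, 349, 416, 484, 512, 519, 610,
       803, 1389, 1590, 1768, 1963, 1885, 1591, 1245, 871, 533,
       325, 177, 106, 53, 43, 23, 7, 2, 0, 1] + [0] * 10
SPAM = [0] * 15 + [1, 0, 2, 6, 10, 10, 13, 29, 37, 59,
                   91, 159, 260, 428, 741, 1113, 1463, 1521, 1044, 624,
                   507, 618, 668, 870, 1029]


def mistakes(cutoff):
    """Return (fp, fn) for a cutoff on the 0-1 scale.

    ASSUMPTION: a message counts as spam when its score is >= cutoff.
    """
    first_spam_bucket = int(round(cutoff * 100 / BUCKET_WIDTH))
    fp = sum(HAM[first_spam_bucket:])   # ham scored at or above the cutoff
    fn = sum(SPAM[:first_spam_bucket])  # spam scored below the cutoff
    return fp, fn


def best_cutoff(fp_weight=1):
    """Bucket-boundary cutoff minimizing fp_weight*fp + fn (first wins ties).

    fp_weight is a hypothetical knob for caring more about fp than fn.
    """
    def cost(cutoff):
        fp, fn = mistakes(cutoff)
        return fp_weight * fp + fn

    candidates = [i * BUCKET_WIDTH / 100 for i in range(len(HAM) + 1)]
    return min(candidates, key=cost)


for cutoff in (0.6, 0.65):
    fp, fn = mistakes(cutoff)
    print("cutoff %.3f: %d fp + %d fn = %d mistakes" % (cutoff, fp, fn, fp + fn))
print("cutoff with fewest total mistakes: %.3f" % best_cutoff())
```

Raising `fp_weight` pushes the chosen cutoff higher, which is the same trade made by hand above in picking 0.65 over 0.6.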
Final bonus note: here is the false positive that scored over 72.5
(with names obscured). It's a false positive, because it wasn't spam,
but at the same time, I don't think anyone would've missed it had it
"gone astray". :)
Subject: Inquiry
I'm looking for the right person at your company to contact regarding our
services. Frankly, I don't know if we can increase your sales as we have for
our current partners, but you do have a GREAT site and I'd like to see if a
relationship would make sense.
Would you put me in touch with the person who is responsible for driving
your online sales? Any help would be most appreciated.
Thanks!
Uuu
==============================
Uuu Uuuuuuu ......... Vice President, NNNNNNNN Inc.
Tel: 3x0-6x0-xxxx ...... NNNN S. SS SSSSSSS, Suite 706
eFax: (xxx) 843-xxxx...... PPPPPPPPP, CA ppppp
uuuuuuu@NNNNNNNN.com .... http://www.NNNNNNNN.com/inc/solutions/
NNNNNNNN, Inc's Product Rating, E-Mail Sharing, Gift Registry and Shopping
List services are cost effective, simple to integrate, and GUARANTEED to
increase sales:
http://www.NNNNNNNN.com/inc/demo/index.htm?m=solutions