[Spambayes] Two Scheme Enter, One Scheme Leave.

Thu, 26 Sep 2002 20:29:01 +1000

First, a quick note on max_discriminators.
Using the same settings as before, I get:

md      fp      fn      total
5       69      17      86      (cutoff 0.55)
15      11      30      41      (cutoff 0.625)
100     10      21      31
150     9       21      30
400     9       21      30

--

Next is Graham vs. the best settings for Robinson, across
a bunch of different seeds. I let the best cutoff win in
every case for Graham, to compensate for the work put into
tuning Robinson in this trial (the best cutoff for Graham is
nearly always 0.95, it seems).

(format is fp+fn=total)

seed       Graham       Robinson
1010101    41+12=53     19+21=40
12346      30+12=42      9+21=30
271170     29+16=45     25+16=41
432104     48+19=67     18+24=42
56743      29+13=42     13+17=30
774213     44+17=61     21+19=40
999111     39+21=60     20+26=46

mean       37+16=53     18+21=39
mean fp%   1.85%        0.90%
mean fn%   0.80%        1.05%
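
For anyone who wants to re-check the mean rows, here is a minimal
sketch that re-derives them from the per-seed counts. The names are
mine, and the 2,000 ham / 2,000 spam per run is only an inference from
the percentages (37/2000 = 1.85%), not something stated in this post.

```python
# Per-seed (fp, fn) counts copied from the table above.
graham = [(41, 12), (30, 12), (29, 16), (48, 19), (29, 13), (44, 17), (39, 21)]
robinson = [(19, 21), (9, 21), (25, 16), (18, 24), (13, 17), (21, 19), (20, 26)]

# ASSUMPTION: each run classified 2000 ham and 2000 spam. This is inferred
# from the reported percentages and is not given in the post itself.
HAM_PER_RUN = 2000
SPAM_PER_RUN = 2000


def mean_row(rows):
    """Return (mean fp, mean fn), each rounded to the nearest whole message."""
    n = len(rows)
    mean_fp = round(sum(fp for fp, _ in rows) / n)
    mean_fn = round(sum(fn for _, fn in rows) / n)
    return mean_fp, mean_fn


for name, rows in (("Graham", graham), ("Robinson", robinson)):
    fp, fn = mean_row(rows)
    fp_pct = 100 * fp / HAM_PER_RUN
    fn_pct = 100 * fn / SPAM_PER_RUN
    print(f"{name:9s} {fp}+{fn}={fp + fn}  fp% {fp_pct:.2f}  fn% {fn_pct:.2f}")
```

Note that the "total" in the mean row is the sum of the two rounded
means, which is how the table arrives at 18+21=39 for Robinson.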

The final test, of course, is to feed the full corpus into the
two tweaked schemes. That's roughly 11,000 spam and 19,000 ham, split
into 10 sets. Note that I was a _bit_ naughty here - these two tests
were done with the version of tokenizer.py that strips out and
discards style sheets.

Graham:

  -> <stat> Ham scores for all runs: 19224 items; mean 0.78; sdev 8.59
  -> <stat> Spam scores for all runs: 11303 items; mean 99.38; sdev 7.61
  -> best cutoff for all runs: 0.8
  ->     with 139 fp + 74 fn = 213 mistakes
  ->     matched at 0.825 (139 fp + 74 fn)
  ->     matched at 0.85 (137 fp + 76 fn)
  ->     matched at 0.9 (135 fp + 78 fn)
  total unique false pos 135
  total unique false neg 78
  average fp % 0.702258059105
  average fn % 0.690069873164

Robinson, with the cutoff set to 0.65 (because I care more about fp than fn):
  -> <stat> Ham scores for all runs: 19224 items; mean 31.30; sdev 13.09
  -> <stat> Spam scores for all runs: 11303 items; mean 83.51; sdev 9.47
  -> best cutoff for all runs: 0.6
  ->     with 76 fp + 108 fn = 184 mistakes
  total unique false pos 10
  total unique false neg 258
  average fp % 0.0520210194464
  average fn % 2.28248163189

This is a completely _spooky_ level of accuracy, given the sheer
unpleasantness of the corpus (and how badly the default Graham did
on it).

If the change to make the HTML regexp a bit greedier were in the code,
it would fix 2 of my 10 fps.

I've appended the histograms from this run to the email.

Comparing the final two versions...
---- 
graham->robinson

test1_alls -> test2_alls

false positive percentages
    0.572  0.000  won   -100.00%
    0.676  0.104  won    -84.62%
    0.780  0.052  won    -93.33%
    0.884  0.052  won    -94.12%
    0.728  0.052  won    -92.86%
    0.832  0.000  won   -100.00%
    0.676  0.104  won    -84.62%
    0.676  0.104  won    -84.62%
    0.572  0.000  won   -100.00%
    0.624  0.052  won    -91.67%

won  10 times
tied  0 times
lost  0 times

total unique fp went from 135 to 10 won    -92.59%
mean fp % went from 0.702258059105 to 0.0520210194464 won    -92.59%

false negative percentages
    0.088  1.062  lost  +1106.82%
    0.619  2.653  lost  +328.59%
    1.150  2.389  lost  +107.74%
    0.796  2.829  lost  +255.40%
    0.973  3.363  lost  +245.63%
    0.531  2.124  lost  +300.00%
    0.796  2.476  lost  +211.06%
    0.531  1.858  lost  +249.91%
    0.885  1.770  lost  +100.00%
    0.531  2.301  lost  +333.33%

won   0 times
tied  0 times
lost 10 times

total unique fn went from 78 to 258 lost  +230.77%
mean fn % went from 0.690069873164 to 2.28248163189 lost  +230.76%

ham mean                     ham sdev
   0.63   31.50 +4900.00%        7.79   12.94  +66.11%
   0.67   30.95 +4519.40%        8.16   13.05  +59.93%
   0.79   31.35 +3868.35%        8.75   13.24  +51.31%
   1.00   31.47 +3047.00%        9.72   13.11  +34.88%
   0.84   31.30 +3626.19%        8.99   13.13  +46.05%
   0.91   31.12 +3319.78%        9.24   13.17  +42.53%
   0.71   31.09 +4278.87%        8.19   13.11  +60.07%
   0.85   31.90 +3652.94%        8.71   12.88  +47.88%
   0.63   31.20 +4852.38%        7.67   13.10  +70.80%
   0.78   31.14 +3892.31%        8.43   13.19  +56.47%

ham mean and sdev for all runs
   0.78   31.30 +3912.82%        8.59   13.09  +52.39%

spam mean                    spam sdev
  99.91   83.86  -16.06%        2.97    8.83 +197.31%
  99.38   83.59  -15.89%        7.80    9.57  +22.69%
  98.86   83.62  -15.42%       10.53    9.75   -7.41%
  99.40   82.76  -16.74%        7.07    9.56  +35.22%
  99.19   82.82  -16.50%        8.64    9.60  +11.11%
  99.52   83.31  -16.29%        6.69    9.43  +40.96%
  99.22   83.61  -15.73%        8.58    9.46  +10.26%
  99.62   83.94  -15.74%        5.74    9.41  +63.94%
  99.17   83.57  -15.73%        8.93    9.44   +5.71%
  99.53   83.98  -15.62%        6.52    9.58  +46.93%

spam mean and sdev for all runs
  99.38   83.51  -15.97%        7.61    9.47  +24.44%

ham/spam mean difference: 98.60 52.21 -46.39
----
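
The won/lost percentages in the comparison above look like plain
relative changes from the Graham figure to the Robinson figure. A small
sketch that reproduces the two summary lines; the function names are
mine, and this is not the actual comparison script.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def verdict(before, after):
    """For error counts lower is better, so a drop counts as a win."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


# Totals copied from the comparison output above.
for label, before, after in (("total unique fp", 135, 10),
                             ("total unique fn", 78, 258)):
    change = pct_change(before, after)
    print(f"{label} went from {before} to {after} "
          f"{verdict(before, after)} {change:+.2f}%")
```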

I'd say the Robinson scheme is a clear winner for this data set.

Anthony

.........

Histograms for the final full run of the Robinson scheme:

-> <stat> Ham scores for all runs: 19224 items; mean 31.30; sdev 13.09
* = 33 items
  0.00  448 **************
  2.50  843 **************************
  5.00  385 ************
  7.50  283 *********
 10.00  349 ***********
 12.50  416 *************
 15.00  484 ***************
 17.50  512 ****************
 20.00  519 ****************
 22.50  610 *******************
 25.00  803 *************************
 27.50 1389 *******************************************
 30.00 1590 *************************************************
 32.50 1768 ******************************************************
 35.00 1963 ************************************************************
 37.50 1885 **********************************************************
 40.00 1591 *************************************************
 42.50 1245 **************************************
 45.00  871 ***************************
 47.50  533 *****************
 50.00  325 **********
 52.50  177 ******
 55.00  106 ****
 57.50   53 **
 60.00   43 **
 62.50   23 *
 65.00    7 *
 67.50    2 *
 70.00    0 
 72.50    1 *
 75.00    0 
 77.50    0 
 80.00    0 
 82.50    0 
 85.00    0 
 87.50    0 
 90.00    0 
 92.50    0 
 95.00    0 
 97.50    0 

-> <stat> Spam scores for all runs: 11303 items; mean 83.51; sdev 9.47
* = 26 items
  0.00    0 
  2.50    0 
  5.00    0 
  7.50    0 
 10.00    0 
 12.50    0 
 15.00    0 
 17.50    0 
 20.00    0 
 22.50    0 
 25.00    0 
 27.50    0 
 30.00    0 
 32.50    0 
 35.00    0 
 37.50    1 *
 40.00    0 
 42.50    2 *
 45.00    6 *
 47.50   10 *
 50.00   10 *
 52.50   13 *
 55.00   29 **
 57.50   37 **
 60.00   59 ***
 62.50   91 ****
 65.00  159 *******
 67.50  260 **********
 70.00  428 *****************
 72.50  741 *****************************
 75.00 1113 *******************************************
 77.50 1463 *********************************************************
 80.00 1521 ***********************************************************
 82.50 1044 *****************************************
 85.00  624 ************************
 87.50  507 ********************
 90.00  618 ************************
 92.50  668 **************************
 95.00  870 **********************************
 97.50 1029 ****************************************
-> best cutoff for all runs: 0.6
->     with 76 fp + 108 fn = 184 mistakes
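
The fp/fn trade-off between the tester's "best" cutoff of 0.6 and the
0.65 I actually used can be read straight off the histograms above.
A rough sketch: the bucket counts are copied from the two histograms,
the names are mine, and I'm assuming a message is called spam when its
score is at or above the cutoff.

```python
BUCKET_WIDTH = 2.5  # each histogram row covers 2.5 points on the 0-100 scale

# Bucket counts copied from the histograms above, starting at the 0.00 bucket.
ham = [448, 843, 385, 283, 349, 416, 484, 512, 519, 610,
       803, 1389, 1590, 1768, 1963, 1885, 1591, 1245, 871, 533,
       325, 177, 106, 53, 43, 23, 7, 2, 0, 1] + [0] * 10
spam = [0] * 15 + [1, 0, 2, 6, 10, 10, 13, 29, 37, 59,
                   91, 159, 260, 428, 741, 1113, 1463, 1521, 1044, 624,
                   507, 618, 668, 870, 1029]


def errors_at(cutoff):
    """Return (fp, fn) for a cutoff in 0..1 that sits on a bucket boundary.

    ASSUMPTION: a message is classified as spam when score >= cutoff.
    """
    first_spam_bucket = round(cutoff * 100 / BUCKET_WIDTH)
    fp = sum(ham[first_spam_bucket:])    # ham scored at or above the cutoff
    fn = sum(spam[:first_spam_bucket])   # spam scored below the cutoff
    return fp, fn


for cutoff in (0.60, 0.65):
    fp, fn = errors_at(cutoff)
    print(f"cutoff {cutoff:.2f}: {fp} fp + {fn} fn = {fp + fn} mistakes")
```

This gives 76 fp + 108 fn at 0.60 and 10 fp + 258 fn at 0.65, matching
the numbers reported above: the higher cutoff gives up 150 extra fn
to get rid of 66 fp.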

Final bonus note: here is the false positive that scored over 72.5
(with names obscured). It counts as a false positive because it wasn't
spam, but at the same time, I don't think anyone would've missed it had
it "gone astray". :)

Subject: Inquiry

I'm looking for the right person at your company to contact regarding our
services. Frankly, I don't know if we can increase your sales as we have for
our current partners, but you do have a GREAT site and I'd like to see if a
relationship would make sense.

Would you put me in touch with the person who is responsible for driving
your online sales? Any help would be most appreciated.

Thanks!
Uuu

 ==============================
Uuu Uuuuuuu ......... Vice President, NNNNNNNN Inc.
Tel:  3x0-6x0-xxxx ...... NNNN S. SS SSSSSSS, Suite 706
eFax: (xxx) 843-xxxx...... PPPPPPPPP, CA ppppp
uuuuuuu@NNNNNNNN.com .... http://www.NNNNNNNN.com/inc/solutions/

NNNNNNNN, Inc's Product Rating, E-Mail Sharing, Gift Registry and Shopping
List services are cost effective, simple to integrate, and GUARANTEED to
increase sales:
http://www.NNNNNNNN.com/inc/demo/index.htm?m=solutions