[Spambayes] There Can Be Only One

Skip Montanaro skip@pobox.com
Wed, 25 Sep 2002 13:54:08 -0500


And now, coming in from faaaar out in right field we have Skip, the weirdo
with the wacky results:

    grahams -> fws
    -> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
    ...

    false positive percentages
        0.000  0.000  tied          
        0.000  0.500  lost  +(was 0)
        0.000  0.000  tied          
        0.000  0.000  tied          
        0.000  0.000  tied          
        0.000  0.000  tied          
        0.500  0.500  tied          
        0.000  0.000  tied          
        0.500  0.000  won   -100.00%
        0.000  0.000  tied          

    won   1 times
    tied  8 times
    lost  1 times

    total unique fp went from 2 to 2 tied          
    mean fp % went from 0.1 to 0.1 tied          

    false negative percentages
        9.000  10.500  lost   +16.67%
        11.000  14.000  lost   +27.27%
        10.000  12.000  lost   +20.00%
        7.000  8.500  lost   +21.43%
        14.000  19.500  lost   +39.29%
        12.000  12.000  tied          
        13.000  17.000  lost   +30.77%
        9.000  12.500  lost   +38.89%
        9.000  13.000  lost   +44.44%
        9.500  12.000  lost   +26.32%

    won   0 times
    tied  1 times
    lost  9 times

    total unique fn went from 207 to 262 lost   +26.57%
    mean fn % went from 10.35 to 13.1 lost   +26.57%

    ham mean                     ham sdev
       0.00   21.97 +(was 0)        0.00    6.82 +(was 0)
       0.58   23.40 +3934.48%        5.62    8.78  +56.23%
       0.00   22.83 +(was 0)        0.00    7.58 +(was 0)
       0.00   22.52 +(was 0)        0.00    7.65 +(was 0)
       0.00   22.96 +(was 0)        0.00    7.78 +(was 0)
       0.03   22.27 +74133.33%        0.42    7.70 +1733.33%
       0.50   22.79 +4458.00%        7.07    7.67   +8.49%
       0.00   22.77 +(was 0)        0.00    6.88 +(was 0)
       0.50   22.44 +4388.00%        7.07    7.61   +7.64%
       0.00   22.76 +(was 0)        0.00    7.13 +(was 0)

    ham mean and sdev for all runs
       0.16   22.67 +14068.75%        3.63    7.57 +108.54%

    spam mean                    spam sdev
      91.02   69.87  -23.24%       28.49   12.08  -57.60%
      89.95   67.92  -24.49%       29.71   12.32  -58.53%
      90.62   69.54  -23.26%       28.80   11.87  -58.78%
      93.70   71.21  -24.00%       23.90   10.83  -54.69%
      86.81   66.41  -23.50%       33.55   13.55  -59.61%
      88.40   69.92  -20.90%       31.84   12.23  -61.59%
      88.89   68.79  -22.61%       30.41   13.18  -56.66%
      91.15   69.36  -23.91%       28.24   12.28  -56.52%
      91.38   69.29  -24.17%       27.94   12.55  -55.08%
      90.73   70.44  -22.36%       28.83   12.34  -57.20%

    spam mean and sdev for all runs
      90.27   69.27  -23.26%       29.26   12.38  -57.69%

    ham/spam mean difference: 90.11 46.60 -43.51
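
For anyone checking the arithmetic in the comparison above, here is a minimal
sketch of how the percentage column and the final "mean difference" line appear
to be derived. The function names are hypothetical and the formatting rules are
inferred from the output itself, not taken from the comparison script.

```python
def percent_change(before, after):
    """Describe the change from `before` to `after` in the table's style.

    Assumption: the percentage is relative to the `before` value, and a
    zero `before` value is reported as "+(was 0)" instead of dividing.
    """
    if before == after:
        return "tied"
    if before == 0:
        return "+(was 0)"
    return f"{(after - before) / before * 100.0:+.2f}%"


def mean_gap(ham_mean, spam_mean):
    """Distance between the overall spam mean and the overall ham mean."""
    return round(spam_mean - ham_mean, 2)


# Total unique false negatives went from 207 to 262.
print(percent_change(207, 262))    # +26.57%
# A false positive rate that dropped from 0.5% to 0%.
print(percent_change(0.5, 0.0))    # -100.00%

# Overall means from the "for all runs" lines: before, then after.
gap_before = mean_gap(0.16, 90.27)
gap_after = mean_gap(22.67, 69.27)
print(gap_before, gap_after, round(gap_after - gap_before, 2))
```

The three values on the last line correspond to the 90.11, 46.60 and -43.51
reported in the "ham/spam mean difference" line.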

Here are the overall graphs from the f(w) run:

    -> <stat> Ham scores for all runs: 2000 items; mean 22.67; sample sdev 7.57
    * = 5 items
      0.00   5 *
      2.50   6 **
      5.00  16 ****
      7.50  48 **********
     10.00  80 ****************
     12.50 121 *************************
     15.00 195 ***************************************
     17.50 277 ********************************************************
     20.00 278 ********************************************************
     22.50 260 ****************************************************
     25.00 257 ****************************************************
     27.50 168 **********************************
     30.00 125 *************************
     32.50  55 ***********
     35.00  40 ********
     37.50  20 ****
     40.00  18 ****
     42.50  12 ***
     45.00   7 **
     47.50   4 *
     50.00   1 *
     52.50   5 *
     55.00   0 
     57.50   0 
     60.00   2 *
     62.50   0 
     65.00   0 
     67.50   0 
     70.00   0 
     72.50   0 
     75.00   0 
     77.50   0 
     80.00   0 
     82.50   0 
     85.00   0 
     87.50   0 
     90.00   0 
     92.50   0 
     95.00   0 
     97.50   0 

    -> <stat> Spam scores for all runs: 2000 items; mean 69.27; sample sdev 12.38
    * = 3 items
      0.00   0 
      2.50   0 
      5.00   0 
      7.50   0 
     10.00   0 
     12.50   0 
     15.00   0 
     17.50   0 
     20.00   1 *
     22.50   0 
     25.00   1 *
     27.50   2 *
     30.00   1 *
     32.50   6 **
     35.00   4 **
     37.50  10 ****
     40.00  23 ********
     42.50  20 *******
     45.00  37 *************
     47.50  41 **************
     50.00  60 ********************
     52.50  56 *******************
     55.00  70 ************************
     57.50 107 ************************************
     60.00 124 ******************************************
     62.50 121 *****************************************
     65.00 151 ***************************************************
     67.50 163 *******************************************************
     70.00 166 ********************************************************
     72.50 169 *********************************************************
     75.00 139 ***********************************************
     77.50 126 ******************************************
     80.00 102 **********************************
     82.50  95 ********************************
     85.00  79 ***************************
     87.50  53 ******************
     90.00  46 ****************
     92.50  23 ********
     95.00   4 **
     97.50   0 
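
The graphs above put scores from 0 to 100 into 40 buckets of width 2.5 and
draw one star per N items, rounding up. A rough sketch of that layout follows.
The function names are hypothetical, and putting a score of exactly 100 into
the last bucket is an assumption, not something the output shows.

```python
import math


def bucket_counts(scores, nbuckets=40, lo=0.0, hi=100.0):
    """Count how many scores fall into each of `nbuckets` equal-width buckets."""
    width = (hi - lo) / nbuckets
    counts = [0] * nbuckets
    for s in scores:
        i = int((s - lo) / width)
        # Assumption: out-of-range scores (including exactly `hi`) are
        # clamped into the first or last bucket.
        i = min(max(i, 0), nbuckets - 1)
        counts[i] += 1
    return counts


def render(counts, items_per_star, lo=0.0, hi=100.0):
    """Return histogram lines: bucket lower bound, count, and a bar of stars."""
    width = (hi - lo) / len(counts)
    lines = [f"* = {items_per_star} items"]
    for i, n in enumerate(counts):
        # Round up, so any non-empty bucket gets at least one star.
        stars = "*" * math.ceil(n / items_per_star)
        lines.append(f"{lo + i * width:6.2f} {n:3d} {stars}")
    return lines


# Hypothetical scores, just to exercise the two functions.
sample = [1.0, 3.0, 3.5, 21.0, 22.4, 22.6, 24.9, 60.2]
for line in render(bucket_counts(sample), items_per_star=1):
    print(line)
```

The rounding up is why a bucket holding 6 items gets two stars at
"* = 5 items" in the ham graph.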

It looks to me like the problem (whatever it is) with my data is that it
causes the standard deviation to get quite large, though the mean spam score
seems to be a bit lower than other stuff I've seen.  What would cause that?
More variability in my spam?
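
One rough way to look at the means and the standard deviations together is to
divide the ham/spam mean gap by a pooled standard deviation. This is only a
sketch: nothing in the test output computes it, and both the function name and
the choice of pooling formula are assumptions made for illustration.

```python
import math


def separation(ham_mean, ham_sdev, spam_mean, spam_sdev):
    """Gap between the spam and ham means, in units of a pooled sdev.

    Assumption: both populations are the same size (2000 items each in
    the runs above), so the two variances are simply averaged.
    """
    pooled = math.sqrt((ham_sdev ** 2 + spam_sdev ** 2) / 2.0)
    return (spam_mean - ham_mean) / pooled


# Overall figures from the "for all runs" lines: before, then after.
before = separation(0.16, 3.63, 90.27, 29.26)
after = separation(22.67, 7.57, 69.27, 12.38)
print(f"before: {before:.2f}   after: {after:.2f}")
```

A larger value means the two score distributions overlap less relative to
their spread, which is a different question from how far apart the means are.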

Skip