[Spambayes] RE: spam detection via probability - actual results!

Anthony Baxter anthony@interlink.com.au
Fri, 20 Sep 2002 14:23:41 +1000


>>> Tim Peters wrote
> max_discriminators: 150
 
For me, adding only this line to my previous run (the "after" in
http://mail.python.org/pipermail-21/spambayes/2002-September/000360.html)
is producing impressive results!
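
For anyone who hasn't followed the option: as I understand it,
max_discriminators caps how many of a message's words the classifier
uses when scoring, keeping the ones whose spam probabilities sit
furthest from a neutral 0.5. Here is a minimal sketch of that selection
step only. It is not the actual classifier code; the function name, the
example words and their probabilities are all made up for illustration.

```python
def strongest_clues(word_probs, max_discriminators=150):
    """Return up to max_discriminators (word, prob) pairs, most extreme first.

    word_probs maps each word in a message to an estimated spam
    probability in [0.0, 1.0].  "Extreme" here means far from 0.5
    (an assumed neutral point).
    """
    ranked = sorted(word_probs.items(),
                    key=lambda item: abs(item[1] - 0.5),
                    reverse=True)
    return ranked[:max_discriminators]


# Hypothetical probabilities, invented for this example.
probs = {"viagra": 0.99, "python": 0.02, "the": 0.50, "free": 0.80}
print(strongest_clues(probs, max_discriminators=2))
```

With a cap of 2, only the two words furthest from 0.5 survive; raising
the cap lets weaker clues contribute to the score as well.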


false positive percentages
    0.334  0.056  won    -83.23%
    0.278  0.000  won   -100.00%
    0.167  0.056  won    -66.47%
    0.278  0.333  lost   +19.78%
    0.389  0.222  won    -42.93%

won   4 times
tied  0 times
lost  1 times

total unique fp went from 26 to 12 won    -53.85%
mean fp % went from 0.289173231033 to 0.133419869849 won    -53.86%

false negative percentages
    0.517  0.840  lost   +62.48%
    0.388  0.647  lost   +66.75%
    0.516  0.710  lost   +37.60%
    0.453  0.712  lost   +57.17%
    0.712  1.165  lost   +63.62%

won   0 times
tied  0 times
lost  5 times

total unique fn went from 40 to 63 lost   +57.50%
mean fn % went from 0.517406493303 to 0.814957202608 lost   +57.51%
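
The percentage column in these tables looks like a plain relative
change, (after - before) / before, with "won" meaning the error rate
went down. A small sketch that reproduces the two "total unique" lines
under that assumption (the helper names are mine, not the comparison
script's):

```python
def relative_change(before, after):
    """Percentage change going from before to after (negative = decrease)."""
    if before == 0:
        raise ValueError("relative change is undefined for before == 0")
    return (after - before) * 100.0 / before


def verdict(before, after):
    """Lower error rates are better, so a decrease counts as a win."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


for label, before, after in [
    ("total unique fp", 26, 12),
    ("total unique fn", 40, 63),
]:
    print("%s went from %d to %d %-4s %+7.2f%%" % (
        label, before, after, verdict(before, after),
        relative_change(before, after)))
```

This gives -53.85% for the false positives and +57.50% for the false
negatives, matching the totals above.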

The new histograms:
Ham distribution for all runs:
* = 22 items
  0.00  468 **********************
  2.50  352 ****************
  5.00   66 ***
  7.50   83 ****
 10.00  123 ******
 12.50   62 ***
 15.00   66 ***
 17.50  157 ********
 20.00  532 *************************
 22.50  912 ******************************************
 25.00 1241 *********************************************************
 27.50 1235 *********************************************************
 30.00 1281 ***********************************************************
 32.50 1121 ***************************************************
 35.00  721 *********************************
 37.50  337 ****************
 40.00  133 *******
 42.50   53 ***
 45.00   21 *
 47.50   15 *
 50.00    7 *
 52.50    1 *
 55.00    2 *
 57.50    1 *
 60.00    1 *
 62.50    0 
 65.00    0 
 67.50    0 
 70.00    0 
 72.50    0 
 75.00    0 
 77.50    0 
 80.00    0 
 82.50    0 
 85.00    0 
 87.50    0 
 90.00    0 
 92.50    0 
 95.00    0 
 97.50    0 

Spam distribution for all runs:
* = 17 items
  0.00   0 
  2.50   0 
  5.00   0 
  7.50   0 
 10.00   0 
 12.50   0 
 15.00   0 
 17.50   0 
 20.00   2 *
 22.50   0 
 25.00   0 
 27.50   0 
 30.00   1 *
 32.50   0 
 35.00   2 *
 37.50   5 *
 40.00   5 *
 42.50   6 *
 45.00  17 *
 47.50  25 **
 50.00  33 **
 52.50  65 ****
 55.00 116 *******
 57.50 199 ************
 60.00 397 ************************
 62.50 725 *******************************************
 65.00 917 ******************************************************
 67.50 978 **********************************************************
 70.00 811 ************************************************
 72.50 554 *********************************
 75.00 397 ************************
 77.50 357 *********************
 80.00 327 ********************
 82.50 271 ****************
 85.00 180 ***********
 87.50 172 ***********
 90.00 101 ******
 92.50 114 *******
 95.00 102 ******
 97.50 852 ***************************************************
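
For reading the bars: each row is a 2.5-wide score bucket, and each "*"
stands for up to 22 items (ham) or 17 items (spam). Judging from the
output, any remainder is rounded up, so a bucket with a single message
still gets one star. A minimal sketch of that rendering, inferred from
the printed histograms rather than taken from the test driver (the
function names and the column widths are my assumptions):

```python
def stars(count, items_per_star):
    """One '*' per items_per_star items, rounding any remainder up."""
    return "*" * ((count + items_per_star - 1) // items_per_star)


def format_row(boundary, count, items_per_star, count_width):
    """Format one bucket: lower boundary, count, then the bar."""
    return "%6.2f %*d %s" % (boundary, count_width, count,
                             stars(count, items_per_star))


# First three ham buckets from the histogram above.
ham_buckets = [(0.00, 468), (2.50, 352), (5.00, 66)]
for boundary, count in ham_buckets:
    print(format_row(boundary, count, items_per_star=22, count_width=4))
```

The rounding up means very small buckets look bigger than they are, so
the counts are the thing to trust in the tails.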