[Spambayes] Moving closer to Gary's ideal

Neil Schemenauer nas@python.ca
Sat, 21 Sep 2002 12:47:13 -0700


Tim Peters wrote:
>               distance = abs(prob - 0.5)
> +             if distance < 0.1:
> +                 continue
>               if distance > smallest_best:

For me, that's enough to match the performance of the default setup:

    [Classifier]
    use_robinson_probability: True
    use_robinson_combining: True
    max_discriminators: 1500

    [TestDriver]
    spam_cutoff: 0.6
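
For reference, the quoted change amounts to ignoring any word whose spam
probability lies within 0.1 of 0.5 before the strongest discriminators are
picked. A minimal sketch of that idea follows. The function name, the
word_probs mapping and the use of heapq.nlargest are assumptions made for
illustration; this is not the classifier's actual code, which keeps its own
running list of best candidates.

```python
import heapq


def pick_discriminators(word_probs, max_discriminators=1500, min_distance=0.1):
    """Return the (word, prob) pairs farthest from 0.5, skipping bland words.

    word_probs is a hypothetical dict mapping each word in a message to its
    estimated spam probability.
    """
    candidates = []
    for word, prob in word_probs.items():
        distance = abs(prob - 0.5)
        # The quoted patch: words this close to 0.5 carry little evidence
        # either way, so they are skipped.
        if distance < min_distance:
            continue
        candidates.append((distance, word, prob))
    # Keep only the strongest clues, largest distance first.
    strongest = heapq.nlargest(max_discriminators, candidates)
    return [(word, prob) for _, word, prob in strongest]


if __name__ == "__main__":
    probs = {"viagra": 0.99, "the": 0.52, "python": 0.05, "maybe": 0.45}
    for word, prob in pick_discriminators(probs, max_discriminators=2):
        print(word, prob)
```

With max_discriminators as large as 1500, the distance test is likely to be
the main thing limiting which words take part in the score.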


    false positive percentages
        0.667  0.000  won   -100.00%
        0.000  0.000  tied
        1.000  0.667  won    -33.30%
        0.333  0.333  tied
        0.000  0.333  lost  +(was 0)
        0.000  0.000  tied

    won   2 times
    tied  3 times
    lost  1 times

    total unique fp went from 6 to 4 won    -33.33%
    mean fp % went from 0.333333333333 to 0.222222222222 won    -33.33%

    false negative percentages
        0.333  1.667  lost  +400.60%
        1.333  1.333  tied
        1.667  2.000  lost   +19.98%
        0.333  0.000  won   -100.00%
        1.333  1.333  tied
        1.667  1.333  won    -20.04%

    won   2 times
    tied  2 times
    lost  2 times

    total unique fn went from 20 to 23 lost   +15.00%
    mean fn % went from 1.11111111111 to 1.27777777778 lost   +15.00%
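
The won/lost percentages above appear to be plain relative changes computed
from the two printed rates. A minimal sketch, assuming that reading; the
function name is hypothetical and this is not the comparison script itself:

```python
def percent_change(before, after):
    """Relative change from before to after, in percent.

    Returns None when before is zero, which corresponds to the
    "+(was 0)" entry in the output, where no percentage is defined.
    """
    if before == 0:
        return None
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Pairs copied from the false negative table and the fn totals.
    for before, after in [(0.333, 1.667), (1.667, 2.000), (20, 23)]:
        print("%.3f -> %.3f: %+.2f%%" % (before, after, percent_change(before, after)))
```

Because the per-run rates are rounded to three places before the ratio is
taken, a row such as 0.333 -> 1.667 comes out as +400.60% rather than an
even +400%.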

    Ham distribution for all runs:
    * = 4 items
      5.00   0
      7.50   1 *
     10.00   0
     12.50   3 *
     15.00  18 *****
     17.50  29 ********
     20.00  70 ******************
     22.50 155 ***************************************
     25.00 206 ****************************************************
     27.50 231 **********************************************************
     30.00 227 *********************************************************
     32.50 220 *******************************************************
     35.00 186 ***********************************************
     37.50 127 ********************************
     40.00  91 ***********************
     42.50  72 ******************
     45.00  51 *************
     47.50  31 ********
     50.00  36 *********
     52.50  23 ******
     55.00  13 ****
     57.50   6 **
     60.00   3 *
     62.50   1 *
     65.00   0

    Spam distribution for all runs:
    * = 5 items
     47.50   0
     50.00   4 *
     52.50   1 *
     55.00   7 **
     57.50  11 ***
     60.00  34 *******
     62.50  33 *******
     65.00  58 ************
     67.50 106 **********************
     70.00 169 **********************************
     72.50 210 ******************************************
     75.00 243 *************************************************
     77.50 264 *****************************************************
     80.00 217 ********************************************
     82.50 167 **********************************
     85.00 143 *****************************
     87.50  93 *******************
     90.00  37 ********
     92.50   3 *
     95.00   0
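
The histograms can be checked against the error totals. A minimal sketch,
assuming each bucket label is the lower edge of a 2.5-wide bucket and that
scores are shown multiplied by 100, so spam_cutoff: 0.6 corresponds to 60.00.
The dict names and helper functions are hypothetical; the counts are copied
from the buckets near the cutoff in the output above.

```python
# Buckets near the cutoff, copied from the two histograms (lower edge -> count).
HAM_TAIL = {55.0: 13, 57.5: 6, 60.0: 3, 62.5: 1, 65.0: 0}
SPAM_HEAD = {47.5: 0, 50.0: 4, 52.5: 1, 55.0: 7, 57.5: 11, 60.0: 34, 62.5: 33}

CUTOFF = 60.0  # spam_cutoff 0.6 on the histogram's 0-100 scale (assumption)


def count_at_or_above(hist, cutoff):
    """Items in buckets whose lower edge is at or above the cutoff."""
    return sum(n for edge, n in hist.items() if edge >= cutoff)


def count_below(hist, cutoff):
    """Items in buckets whose lower edge is below the cutoff."""
    return sum(n for edge, n in hist.items() if edge < cutoff)


if __name__ == "__main__":
    # Ham scoring at or above the cutoff is a false positive.
    print("false positives:", count_at_or_above(HAM_TAIL, CUTOFF))
    # Spam scoring below the cutoff is a false negative.
    print("false negatives:", count_below(SPAM_HEAD, CUTOFF))
```

Under those assumptions the tallies come to 4 ham at or above the cutoff and
23 spam below it, which agrees with the "total unique fp" and "total unique
fn" figures reported for this run.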