[spambayes-dev] A URL experiment

Tony Meyer tameyer at ihug.co.nz
Tue Dec 30 22:17:49 EST 2003


My results (this is with a chunk of my most recent mail, run with
timcv.py -n10).

Tim's patch:

bases.txt -> nntims.txt
-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams
-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams

false positive percentages
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.246  0.246  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.557  0.557  tied          
    0.559  0.559  tied          
    0.287  0.287  tied          
    0.000  0.000  tied          

won   0 times
tied 10 times
lost  0 times

total unique fp went from 6 to 6 tied          
mean fp % went from 0.164881884948 to 0.164881884948 tied          

false negative percentages
    0.253  0.253  tied          
    0.781  0.781  tied          
    0.462  0.462  tied          
    0.756  0.756  tied          
    0.243  0.243  tied          
    0.247  0.247  tied          
    0.240  0.240  tied          
    0.494  0.494  tied          
    0.973  0.973  tied          
    0.454  0.454  tied          

won   0 times
tied 10 times
lost  0 times

total unique fn went from 20 to 20 tied          
mean fn % went from 0.490257037938 to 0.490257037938 tied          

ham mean                     ham sdev
   1.18    1.17   -0.85%        7.76    7.67   -1.16%
   0.99    0.99   +0.00%        6.64    6.64   +0.00%
   0.84    0.85   +1.19%        6.14    6.14   +0.00%
   1.99    2.10   +5.53%        9.46    9.73   +2.85%
   0.49    0.49   +0.00%        3.59    3.57   -0.56%
   0.85    0.87   +2.35%        5.45    5.46   +0.18%
   1.16    1.16   +0.00%        9.30    9.29   -0.11%
   1.20    1.30   +8.33%        8.13    8.66   +6.52%
   1.55    1.55   +0.00%        8.05    8.05   +0.00%
   0.47    0.47   +0.00%        3.22    3.15   -2.17%

ham mean and sdev for all runs
   1.08    1.10   +1.85%        7.13    7.21   +1.12%

spam mean                    spam sdev
  98.75   98.75   +0.00%        8.72    8.72   +0.00%
  97.67   97.69   +0.02%       11.26   11.24   -0.18%
  98.08   98.14   +0.06%       10.12    9.97   -1.48%
  98.16   98.16   +0.00%       10.19   10.20   +0.10%
  98.35   98.41   +0.06%        8.77    8.69   -0.91%
  98.45   98.47   +0.02%        8.97    8.86   -1.23%
  98.35   98.41   +0.06%        9.73    9.69   -0.41%
  98.25   98.36   +0.11%        9.16    8.96   -2.18%
  97.93   97.97   +0.04%       11.99   11.98   -0.08%
  98.92   98.93   +0.01%        7.62    7.62   +0.00%

spam mean and sdev for all runs
  98.30   98.34   +0.04%        9.72    9.66   -0.62%

ham/spam mean difference: 97.22 97.24 +0.02
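
The "mean fp %" and "mean fn %" lines above look like plain averages of the ten per-fold rates rather than pooled totals. A rough check in Python, assuming per-fold error counts that are inferred from the printed percentages and fold sizes (the counts themselves do not appear in the output, and the helper name is made up for illustration):

```python
# Per-fold test set sizes, copied from the "<stat> tested" lines.
hams = [357, 397, 385, 407, 350, 338, 359, 358, 348, 369]
spams = [395, 384, 433, 397, 412, 405, 416, 405, 411, 441]

# Assumption: per-fold error counts, inferred from the printed
# percentages (e.g. 0.246% of 407 hams is 1 false positive).
fps = [0, 0, 0, 1, 0, 0, 2, 2, 1, 0]
fns = [1, 3, 2, 3, 1, 1, 1, 2, 4, 2]


def mean_rate_percent(errors, totals):
    """Average the per-fold error rates, each expressed as a percentage."""
    rates = [100.0 * e / t for e, t in zip(errors, totals)]
    return sum(rates) / len(rates)


mean_fp = mean_rate_percent(fps, hams)
mean_fn = mean_rate_percent(fns, spams)

print("total fp:", sum(fps), "mean fp %:", mean_fp)
print("total fn:", sum(fns), "mean fn %:", mean_fn)
```

Under those inferred counts the totals come to 6 and 20, and the two means land near 0.1649 and 0.4903, in line with the figures reported above.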

Skip's patch:

bases.txt -> pickskips.txt
-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams
-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams

false positive percentages
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.246  0.246  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.557  0.557  tied          
    0.559  0.559  tied          
    0.287  0.287  tied          
    0.000  0.000  tied          

won   0 times
tied 10 times
lost  0 times

total unique fp went from 6 to 6 tied          
mean fp % went from 0.164881884948 to 0.164881884948 tied          

false negative percentages
    0.253  0.253  tied          
    0.781  0.781  tied          
    0.462  0.462  tied          
    0.756  0.756  tied          
    0.243  0.243  tied          
    0.247  0.247  tied          
    0.240  0.240  tied          
    0.494  0.494  tied          
    0.973  0.973  tied          
    0.454  0.454  tied          

won   0 times
tied 10 times
lost  0 times

total unique fn went from 20 to 20 tied          
mean fn % went from 0.490257037938 to 0.490257037938 tied          

ham mean                     ham sdev
   1.18    1.18   +0.00%        7.76    7.76   +0.00%
   0.99    0.99   +0.00%        6.64    6.64   +0.00%
   0.84    0.84   +0.00%        6.14    6.14   +0.00%
   1.99    1.99   +0.00%        9.46    9.46   +0.00%
   0.49    0.50   +2.04%        3.59    3.60   +0.28%
   0.85    0.87   +2.35%        5.45    5.55   +1.83%
   1.16    1.16   +0.00%        9.30    9.30   +0.00%
   1.20    1.21   +0.83%        8.13    8.14   +0.12%
   1.55    1.55   +0.00%        8.05    8.06   +0.12%
   0.47    0.47   +0.00%        3.22    3.22   +0.00%

ham mean and sdev for all runs
   1.08    1.08   +0.00%        7.13    7.14   +0.14%

spam mean                    spam sdev
  98.75   98.78   +0.03%        8.72    8.56   -1.83%
  97.67   97.70   +0.03%       11.26   11.25   -0.09%
  98.08   98.08   +0.00%       10.12   10.12   +0.00%
  98.16   98.17   +0.01%       10.19   10.15   -0.39%
  98.35   98.38   +0.03%        8.77    8.73   -0.46%
  98.45   98.46   +0.01%        8.97    8.97   +0.00%
  98.35   98.38   +0.03%        9.73    9.68   -0.51%
  98.25   98.29   +0.04%        9.16    9.05   -1.20%
  97.93   97.95   +0.02%       11.99   11.98   -0.08%
  98.92   98.93   +0.01%        7.62    7.62   +0.00%

spam mean and sdev for all runs
  98.30   98.32   +0.02%        9.72    9.68   -0.41%

ham/spam mean difference: 97.22 97.24 +0.02

3-way compare:

filename:    bases  nntims pickskips
ham:spam:  3668:4099       3668:4099
                   3668:4099      
fp total:        6       6       6
fp %:         0.16    0.16    0.16
fn total:       20      20      20
fn %:         0.49    0.49    0.49
unsure t:      178     173     175
unsure %:     2.29    2.23    2.25
real cost: $115.60 $114.60 $115.00
best cost:  $93.00  $91.20  $92.40
h mean:       1.08    1.10    1.08
h sdev:       7.13    7.21    7.14
s mean:      98.30   98.34   98.32
s sdev:       9.72    9.66    9.68
mean diff:   97.22   97.24   97.24
k:            5.77    5.76    5.78
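
The derived rows in that table can be reproduced from the raw counts. The sketch below is only a sanity check, not the code that produced the table. It assumes a cost of $10.00 per fp, $1.00 per fn and $0.20 per unsure, and assumes that k is the mean difference divided by the sum of the two sdevs. Neither assumption is stated in this message, but both match the printed numbers.

```python
# Assumed weights: not stated in the message, chosen because they
# reproduce the "real cost" row.
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20

TOTAL_HAM, TOTAL_SPAM = 3668, 4099

# Values copied from the 3-way compare table.
runs = {
    "bases": dict(fp=6, fn=20, unsure=178,
                  h_sdev=7.13, s_sdev=9.72, mean_diff=97.22),
    "nntims": dict(fp=6, fn=20, unsure=173,
                   h_sdev=7.21, s_sdev=9.66, mean_diff=97.24),
    "pickskips": dict(fp=6, fn=20, unsure=175,
                      h_sdev=7.14, s_sdev=9.68, mean_diff=97.24),
}


def real_cost(r):
    """Weighted cost of the errors and unsures in one run."""
    return r["fp"] * FP_COST + r["fn"] * FN_COST + r["unsure"] * UNSURE_COST


def unsure_percent(r):
    """Unsures as a percentage of all messages tested."""
    return 100.0 * r["unsure"] / (TOTAL_HAM + TOTAL_SPAM)


def separation_k(r):
    """Assumed definition: mean difference over the summed sdevs."""
    return r["mean_diff"] / (r["h_sdev"] + r["s_sdev"])


for name, r in runs.items():
    print(f"{name:10s} cost ${real_cost(r):.2f}  "
          f"unsure {unsure_percent(r):.2f}%  k {separation_k(r):.2f}")
```

With these assumptions the only thing separating the three runs is the unsure count, so the whole spread in real cost is five unsures at $0.20 each, or one dollar.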

Rather like Tim's results, really, at least to my ignorant eyes.

=Tony Meyer



