[Spambayes] experimental_ham_spam_imbalance_adjustment result

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Mar 11 17:45:42 EST 2003


> (d) Something went wrong somewhere.  The listings of means and sdevs
> are supremely sensitive to even the tiniest changes:  I've never seen
> them all zero unless the classifiers and tokenizers going into them
> were actually identical.

Which was the case here.  <blush>  The mistake was that timtest wasn't finding the new config, so it ran the same test twice and compared the run with itself.  Not surprisingly, option=false did the same as option=false :)  Thanks for the help :)
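One way to catch this kind of mistake before a run is to diff the two option sets and confirm that they really differ. The following is only a minimal sketch of that idea, not how timtest itself loads its options: the helper name `differing_options` is hypothetical, and the `[Classifier]` section name in the sample text is an assumption.

```python
import configparser


def differing_options(ini_a: str, ini_b: str):
    """Return {(section, option): (value_a, value_b)} for options that differ."""
    a = configparser.ConfigParser()
    b = configparser.ConfigParser()
    a.read_string(ini_a)
    b.read_string(ini_b)
    # Collect every (section, option) pair that appears in either config.
    keys = {(s, o) for cfg in (a, b) for s in cfg.sections() for o in cfg.options(s)}
    diffs = {}
    for s, o in sorted(keys):
        va = a.get(s, o, fallback=None)  # None when the option is missing
        vb = b.get(s, o, fallback=None)
        if va != vb:
            diffs[(s, o)] = (va, vb)
    return diffs


# Sample configs; the section name is assumed for illustration only.
base = "[Classifier]\nexperimental_ham_spam_imbalance_adjustment: False\n"
changed = "[Classifier]\nexperimental_ham_spam_imbalance_adjustment: True\n"

if not differing_options(base, base):
    print("configs are identical: the comparison would be a run against itself")
print(differing_options(base, changed))
```

An empty result means both runs would use the same options, which is exactly the situation that produces all-zero deltas.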

> Given that you have more ham than spam, the expected effect of
> enabling the option is to decrease your FN rate (which, at 4%, is
> high), and possibly increase your FP rate (which is 0).

Which is what happened: the FN rate went from 4% to 1%, and the FP rate from 0% to 0.2%.  The 3 fp's were (1) a "you're almost ready to start using" email from habeas.com (this does better in my personal set since I check for the habeas headers), (2) an announcement from mtnsms.com about their new smspop service, and (3) a "thank you for installing" message from Real.

I think this says that for me, it's a loss.  All three of these (particularly the first two) were important at the time, and I would not have wanted to wade through the spam folder for them.  I would much rather put up with the fn's.

Here are (hopefully) correct results:
imbal_falses.txt -> imbal_trues.txt
-> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
-> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
-> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
-> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams
-> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
-> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
-> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
-> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams

false positive percentages
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.304  lost  +(was 0)
    0.000  0.623  lost  +(was 0)

won   0 times
tied  2 times
lost  2 times

total unique fp went from 0 to 3 lost  +(was 0)
mean fp % went from 0.0 to 0.231751081821 lost  +(was 0)

false negative percentages
    6.250  2.083  won    -66.67%
    0.000  0.000  tied          
    6.250  2.083  won    -66.67%
    3.922  0.000  won   -100.00%

won   3 times
tied  1 times
lost  0 times

total unique fn went from 8 to 2 won    -75.00%
mean fn % went from 4.10539215686 to 1.04166666667 won    -74.63%

ham mean                     ham sdev
   0.39    1.45 +271.79%        3.46    6.76  +95.38%
   0.09    1.30 +1344.44%        0.91    6.05 +564.84%
   0.65    2.56 +293.85%        4.57    9.96 +117.94%
   1.40    3.37 +140.71%        7.93   14.06  +77.30%

ham mean and sdev for all runs
   0.62    2.14 +245.16%        4.87    9.65  +98.15%

spam mean                    spam sdev
  87.62   94.09   +7.38%       28.34   16.45  -41.95%
  90.83   99.06   +9.06%       18.01    3.61  -79.96%
  91.17   94.81   +3.99%       25.61   17.83  -30.38%
  85.65   94.52  +10.36%       25.97   14.35  -44.74%

spam mean and sdev for all runs
  88.85   95.74   +7.75%       24.68   14.10  -42.87%

ham/spam mean difference: 88.23 93.60 +5.37
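
The summary figures above can be re-derived from the per-run numbers. This is a minimal sketch, not the comparison script itself: the per-run error counts are not stated in the listing and are inferred here from the percentages and test-set sizes (for example, 6.250% of 48 spams is 3 fn's), and the helper names are hypothetical.

```python
# Messages tested per run, taken from the "<stat> tested" lines.
hams_tested = [372, 333, 329, 321]
spams_tested = [48, 56, 48, 51]

# Per-run error counts, inferred from the percentages (an assumption).
fp_before, fp_after = [0, 0, 0, 0], [0, 0, 1, 2]
fn_before, fn_after = [3, 0, 3, 2], [1, 0, 1, 0]


def mean_rate(errors, tested):
    """Mean of the per-run error percentages (not the pooled rate)."""
    rates = [100.0 * e / n for e, n in zip(errors, tested)]
    return sum(rates) / len(rates)


def pct_change(before, after):
    """Relative change in percent, as in the won/lost column."""
    return 100.0 * (after - before) / before


fn_b = mean_rate(fn_before, spams_tested)
fn_a = mean_rate(fn_after, spams_tested)
print("mean fp %% after: %.6f" % mean_rate(fp_after, hams_tested))
print("mean fn %%: %.6f -> %.6f (%.2f%%)" % (fn_b, fn_a, pct_change(fn_b, fn_a)))

# The ham/spam mean difference is the spam mean minus the ham mean over all runs.
print("mean difference: %.2f -> %.2f" % (88.85 - 0.62, 95.74 - 2.14))
```

Note that the mean is taken over the four per-run percentages, so it differs slightly from the pooled rate (total errors divided by total messages).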

=Tony Meyer


