[Spambayes] experimental_ham_spam_imbalance_adjustment result

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Mar 11 15:54:44 EST 2003


> Here are my current results on the imbalance option.  
And here are mine.

imbalance_false4s.txt -> imbalance_true4s.txt
-> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
-> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
-> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
-> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams
-> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
-> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
-> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
-> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams

false positive percentages
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          

won   0 times
tied  4 times
lost  0 times

total unique fp went from 0 to 0 tied          
mean fp % went from 0.0 to 0.0 tied          

false negative percentages
    6.250  6.250  tied          
    0.000  0.000  tied          
    6.250  6.250  tied          
    3.922  3.922  tied          

won   0 times
tied  4 times
lost  0 times

total unique fn went from 8 to 8 tied          
mean fn % went from 4.10539215686 to 4.10539215686 tied          

ham mean                     ham sdev
   0.39    0.39   +0.00%        3.46    3.46   +0.00%
   0.09    0.09   +0.00%        0.91    0.91   +0.00%
   0.65    0.65   +0.00%        4.57    4.57   +0.00%
   1.40    1.40   +0.00%        7.93    7.93   +0.00%

ham mean and sdev for all runs
   0.62    0.62   +0.00%        4.87    4.87   +0.00%

spam mean                    spam sdev
  87.62   87.62   +0.00%       28.34   28.34   +0.00%
  90.83   90.83   +0.00%       18.01   18.01   +0.00%
  91.17   91.17   +0.00%       25.61   25.61   +0.00%
  85.65   85.65   +0.00%       25.97   25.97   +0.00%

spam mean and sdev for all runs
  88.85   88.85   +0.00%       24.68   24.68   +0.00%

ham/spam mean difference: 88.23 88.23 +0.00

My ham:spam ratio is about 7:1 (Mark's was about 1:2.5).  Forgive the newbie question, but does this mean that:
(a) for my corpus, the option makes no difference at all?
(b) I haven't tested with a big enough corpus?
(c) I did something wrong ;)
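
As a sanity check on the figures above, here is a minimal sketch that recomputes the ham:spam ratio and the mean fn % from the per-run numbers in the output. The variable names are my own, not anything from the Spambayes code, and the fn counts are inferred by rounding the reported percentages, which is an assumption.

```python
# Test-set sizes (hams, spams) copied from the "<stat> tested ..." lines.
runs = [(372, 48), (333, 56), (329, 48), (321, 51)]
# Per-run false negative percentages copied from the table above.
fn_pct = [6.250, 0.000, 6.250, 3.922]

total_ham = sum(ham for ham, spam in runs)
total_spam = sum(spam for ham, spam in runs)
ratio = total_ham / total_spam  # hams per spam in the test sets

# Assumption: recover whole-message fn counts by rounding pct * spams.
fn_counts = [round(pct / 100 * spam) for pct, (ham, spam) in zip(fn_pct, runs)]

# Mean of the per-run fn percentages, recomputed from the inferred counts.
mean_fn = sum(100 * n / spam for n, (ham, spam) in zip(fn_counts, runs)) / len(runs)

print("hams:", total_ham, "spams:", total_spam, "ratio: %.2f:1" % ratio)
print("fn counts:", fn_counts, "total:", sum(fn_counts))
print("mean fn %%: %.11f" % mean_fn)
```

This only confirms that the reported totals (8 unique fn, mean fn % of about 4.105, ratio a little under 7:1) are consistent with each other. It says nothing about why the option left every score unchanged.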

Thanks,
Tony Meyer


