[spambayes-dev] spammy subject lines

Mon Oct 13 21:18:59 EDT 2003

[Tony Meyer]
> ...
> So, I (and anyone else ;) should do timcv.py,

That's right, and with -n10 if possible.

> rates.py and cmp.py and post the results of that?

Right.  Also table.py, because, unfortunately, cmp.py predates the idea of
unsures, and doesn't tell us anything about the effect on the unsure rate.

> Along with any necessary histograms from the timcv.py output, then.

Optional.  The change here looks so tiny that the histograms aren't going to
say much (but see below).

> So here's a new attempt (the first data was, as you said in your next
> message, basically spam and 'hard' ham, this data is spam and hard and
> 'soft' ham).

> cv_octs.txt -> cv_oct_subjs.txt
> -> <stat> tested 488 hams & 897 spams against 1824 hams & 3501 spams
> -> <stat> tested 462 hams & 863 spams against 1850 hams & 3535 spams
> -> <stat> tested 475 hams & 863 spams against 1837 hams & 3535 spams
> -> <stat> tested 430 hams & 887 spams against 1882 hams & 3511 spams
> -> <stat> tested 457 hams & 888 spams against 1855 hams & 3510 spams
> -> <stat> tested 488 hams & 897 spams against 1824 hams & 3501 spams
> -> <stat> tested 462 hams & 863 spams against 1850 hams & 3535 spams
> -> <stat> tested 475 hams & 863 spams against 1837 hams & 3535 spams
> -> <stat> tested 430 hams & 887 spams against 1882 hams & 3511 spams
> -> <stat> tested 457 hams & 888 spams against 1855 hams & 3510 spams
>
> false positive percentages
>     0.000  0.000  tied
>     0.000  0.000  tied
>     0.000  0.000  tied
>     0.000  0.000  tied
>     0.219  0.219  tied
>
> won   0 times
> tied  5 times
> lost  0 times

So all 5 runs tied on FP.  That tells us much more than that the *net*
effect across 5 runs was nil on FP:  it tells us that there are no glitches
hiding behind that "net nothing" -- it was no change across the board.

> total unique fp went from 1 to 1 tied
> mean fp % went from 0.0437636761488 to 0.0437636761488 tied
>
> false negative percentages
>     2.007  2.007  tied
>     1.390  1.390  tied
>     1.622  1.622  tied
>     2.029  1.917  won     -5.52%
>     2.703  2.477  won     -8.36%
>
> won   2 times
> tied  3 times
> lost  0 times

When evaluating a small change, I'm heartened to see that in no run did it
lose.  At worst it tied, and twice it helped a little.  That's encouraging.
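
For anyone who wants to redo this kind of per-run tally by hand, here is a
minimal sketch.  It is not cmp.py's code; compare_runs is a made-up helper,
and the only inputs are the five false negative percentages quoted above.

```python
# Hypothetical helper, not taken from cmp.py: label each run as won/tied/lost.
def compare_runs(before, after, lower_is_better=True):
    results = []
    for b, a in zip(before, after):
        if a == b:
            results.append(("tied", 0.0))
            continue
        # Relative change in percent, measured against the "before" value.
        pct = (a - b) / b * 100.0 if b else float("inf")
        improved = (a < b) if lower_is_better else (a > b)
        results.append(("won" if improved else "lost", pct))
    return results

# False negative percentages per run, copied from the quoted output.
fn_before = [2.007, 1.390, 1.622, 2.029, 2.703]
fn_after = [2.007, 1.390, 1.622, 1.917, 2.477]

results = compare_runs(fn_before, fn_after)
tally = {"won": 0, "tied": 0, "lost": 0}
for label, _ in results:
    tally[label] += 1

for (label, pct), b, a in zip(results, fn_before, fn_after):
    print(f"{b:8.3f} {a:8.3f}  {label:5s} {pct:+.2f}%")
print(tally)
```

The point of keeping the per-run labels, rather than only the totals, is that
a single "lost" run would show up even if the overall mean still improved.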

What the histograms would tell us that we can't tell from this is whether
you could have done just as well without the change by raising your ham
cutoff a little.  That would also tie on FP, and *may* also get rid of the
same number (or even more) of FN.

> total unique fn went from 86 to 83 won     -3.49%
> mean fn % went from 1.95029003772 to 1.88269707836 won     -3.47%
>
> ham mean                     ham sdev
>    0.57    0.58   +1.75%        4.63    4.77   +3.02%
>    0.08    0.07  -12.50%        1.20    1.01  -15.83%
>    0.36    0.29  -19.44%        3.61    3.23  -10.53%
>    0.08    0.11  +37.50%        0.89    1.18  +32.58%
>    0.72    0.76   +5.56%        6.80    7.06   +3.82%
>
> ham mean and sdev for all runs
>    0.37    0.37   +0.00%        4.10    4.16   +1.46%

That's a good example of grand averages hiding the truth:  the averaged
change in the mean ham score was 0 across all 5 runs, but *within* the 5
runs it slobbered around wildly, from decreasing 20% to increasing 40%(!).
It can be fun <wink> to track down why that is.  My guess is that you've got
just a handful of ham that generate new tokens here, and that a few of these
new tokens appeared more often in spam than in ham.  Sometimes they help ham
a little, sometimes they hurt ham a little.
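
To make the "averages hide the truth" point concrete, here is a small sketch
using only the five per-run ham means quoted above.  One assumption to flag:
it takes the plain unweighted mean of the per-run means as a stand-in for the
"all runs" line, which the test driver presumably computes over all messages
instead, so the aggregate here need not match the quoted 0.37 exactly.

```python
# Per-run ham means (before, after), copied from the quoted output.
ham_mean_before = [0.57, 0.08, 0.36, 0.08, 0.72]
ham_mean_after = [0.58, 0.07, 0.29, 0.11, 0.76]

def pct_change(before, after):
    """Relative change in percent, measured against the 'before' value."""
    return (after - before) / before * 100.0

per_run = [pct_change(b, a) for b, a in zip(ham_mean_before, ham_mean_after)]

# Assumption: unweighted mean of per-run means, not a per-message average.
avg_before = sum(ham_mean_before) / len(ham_mean_before)
avg_after = sum(ham_mean_after) / len(ham_mean_after)

print("per-run changes:", ", ".join(f"{p:+.2f}%" for p in per_run))
print(f"aggregate: {avg_before:.3f} -> {avg_after:.3f}")
print(f"spread: {min(per_run):+.2f}% to {max(per_run):+.2f}%")
```

The aggregate does not move, while the individual runs swing in both
directions, which is exactly the pattern that a single summary line conceals.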

> spam mean                    spam sdev
>   96.43   96.44   +0.01%       15.89   15.89   +0.00%
>   97.01   97.07   +0.06%       13.79   13.70   -0.65%
>   97.14   97.16   +0.02%       14.05   14.02   -0.21%
>   96.52   96.56   +0.04%       15.65   15.52   -0.83%
>   95.53   95.63   +0.10%       17.47   17.31   -0.92%
>
> spam mean and sdev for all runs
>   96.52   96.57   +0.05%       15.46   15.37   -0.58%

That's good to see:  it's a consistent win for spam scores across runs,
although an almost imperceptible one.  It's good when the mean spam score
rises, and it's good when sdev (for ham or spam) decreases.
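
The "consistent across runs" claim is easy to check mechanically.  A rough
sketch, again using nothing but the numbers quoted above (the variable names
are mine, not anything from the test scripts):

```python
# (before, after) pairs per run, copied from the quoted spam table.
spam_mean = [(96.43, 96.44), (97.01, 97.07), (97.14, 97.16),
             (96.52, 96.56), (95.53, 95.63)]
spam_sdev = [(15.89, 15.89), (13.79, 13.70), (14.05, 14.02),
             (15.65, 15.52), (17.47, 17.31)]

# A higher spam mean is better; a lower sdev is better.
mean_rose_every_run = all(after > before for before, after in spam_mean)
sdev_never_rose = all(after <= before for before, after in spam_sdev)
sdev_drops = sum(1 for before, after in spam_sdev if after < before)

print("spam mean rose in every run:", mean_rose_every_run)
print("spam sdev never rose:", sdev_never_rose)
print("runs where spam sdev dropped:", sdev_drops)
```

Note that the sdev comparison uses "never rose" rather than "always dropped",
since the first run shows the sdev unchanged at two decimal places.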

> ham/spam mean difference: 96.15 96.20 +0.05
>
> I *think* ;) that this is back to a slight win for the change...

I agree, although seeing the details gives cause to worry some about the
effect on ham:  the ham sdev increased overall, and the effects on ham mean
and ham sdev varied wildly across runs.  OTOH, the "before" numbers for ham
mean and ham sdev varied wildly across runs already.  That gives cause to
worry some about the data <wink>.