[Spambayes] Need testers!

Tim Peters tim.one@comcast.net
Mon, 16 Sep 2002 14:45:01 -0400


[Neale Pickett]
> Here's the result of my inbox vs. spambox.  The inbox is my 1000
> personal messages, and some back posts to the thttpd list and the
> virgule-dev list.  I've checked them all over for spam and I'm pretty
> sure (but not certain) that they're clean.  The spambox is my spam
> folder, which I'm pretty sure (but again not certain) is totally dirty.
> For the second run I added the lines:
>
>   [Classifier]
>   adjust_probs_by_evidence_mass: True
>   min_spamprob: 0.001
>   max_spamprob: 0.999
>   hambias: 1.5
>
> to bayescustomize.ini.
>
>
> results1s -> results2s
> -> <stat> tested 382 hams & 417 spams against 1002 hams & 1000 spams
> -> <stat> tested 327 hams & 690 spams against 1002 hams & 1000 spams
> -> <stat> tested 1002 hams & 1000 spams against 382 hams & 417 spams
> -> <stat> tested 327 hams & 690 spams against 382 hams & 417 spams
> -> <stat> tested 1002 hams & 1000 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 1002 hams & 1000 spams
> -> <stat> tested 327 hams & 690 spams against 1002 hams & 1000 spams
> -> <stat> tested 1002 hams & 1000 spams against 382 hams & 417 spams
> -> <stat> tested 327 hams & 690 spams against 382 hams & 417 spams
> -> <stat> tested 1002 hams & 1000 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 327 hams & 690 spams
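
For context, the quoted [Classifier] knobs feed into the per-word spam
probability roughly as sketched below.  This is only an illustration of the
Graham-style computation with my guesses about how those options are applied;
the real classifier.py / Options.py may differ in detail.

    def word_spamprob(spamcount, hamcount, nspam, nham,
                      hambias=1.5, min_spamprob=0.001, max_spamprob=0.999):
        """Estimate P(spam | word) from per-corpus counts.

        hambias inflates the ham evidence, and min_spamprob/max_spamprob
        clamp the result so that no single word is ever treated as
        perfectly damning or perfectly innocent.
        """
        hamratio = min(1.0, hambias * float(hamcount) / nham) if nham else 0.0
        spamratio = min(1.0, float(spamcount) / nspam) if nspam else 0.0
        if hamratio + spamratio == 0.0:
            return 0.5   # word never seen in training; stay neutral
        prob = spamratio / (hamratio + spamratio)
        return max(min_spamprob, min(max_spamprob, prob))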

I *assume* you're not using timcv.py to run the tests, that you aren't using
"the standard" one-msg-per-file setup and so can't use rebal.py to balance
the # of hams and spams across sets, and that you aren't doing
cross-validation runs here (this looks like you were running a 3x3 test
grid).
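
Sketch of what I mean by a 3x3 grid, in case it helps anyone else reading
along: each (ham, spam) set pair is trained on once and scored against each
of the other pairs, which is what produces the six "tested X against Y"
lines per results file above.  The set names here are just placeholders, not
whatever directories Neale actually used:

    from itertools import permutations

    sets = ["Set1", "Set2", "Set3"]

    for test, train in permutations(sets, 2):
        # A real driver would train a classifier on `train` and then score
        # the messages in `test`; this only shows the 3*2 = 6 pairings.
        print("tested %s against %s" % (test, train))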

> false positive percentages
>     2.094  3.665  lost   +75.02%
>     0.612  2.141  lost  +249.84%
>     34.331  40.419  lost   +17.73%
>     13.456  11.927  won    -11.36%
>     52.695  59.481  lost   +12.88%
>     44.503  57.330  lost   +28.82%

Those numbers are spectacularly bad, both the "before" and "after" numbers,
much worse than anyone else has reported.  It's hard not to suspect a bug in
whichever test driver script you used (I'm not set up to run, e.g.,
mboxtest.py, so don't have any feel for how that's working these days -- but
if that's what you were using, it looks like it's not working these days
<wink>).

> won   1 times
> tied  0 times
> lost  5 times
>
> total unique fp went from 798 to 898 lost   +12.53%
> mean fp % went from 24.6150141717 to 29.1603737144 lost   +18.47%
>
> false negative percentages
>     0.959  0.480  won    -49.95%
>     1.014  1.014  tied
>     0.800  0.700  won    -12.50%
>     1.449  1.304  won    -10.01%
>     0.100  0.100  tied
>     0.240  0.240  tied
>
> won   3 times
> tied  3 times
> lost  0 times
>
> total unique fn went from 29 to 25 won    -13.79%
> mean fn % went from 0.760468147221 to 0.639710840023 won    -15.88%
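
(Quick sanity check on the summary arithmetic, assuming "mean fp %" is just
the unweighted average of the six per-run false positive percentages quoted
earlier:

    before = [2.094, 0.612, 34.331, 13.456, 52.695, 44.503]
    print(sum(before) / len(before))   # ~24.615, matching the "from" figure
                                       # to within the 3-decimal rounding

so at least the reporting side looks self-consistent.)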

Too weird.  Who else runs mboxtest.py (assuming that's what Neale runs)?
Has it gone to hell recently for you?