[Spambayes] Need testers!
Tim Peters
tim.one@comcast.net
Mon, 16 Sep 2002 14:45:01 -0400
[Neale Pickett]
> Here's the result of my inbox vs. spambox. The inbox is my 1000
> personal messages, and some back posts to the thttpd list and the
> virgule-dev list. I've checked them all over for spam and I'm pretty
> sure (but not certain) that they're clean. The spambox is my spam
> folder, which I'm pretty sure (but again not certain) is totally dirty.
> For the second run I added the lines:
>
> [Classifier]
> adjust_probs_by_evidence_mass: True
> min_spamprob: 0.001
> max_spamprob: 0.999
> hambias: 1.5
>
> to bayescustomize.ini.
>
>
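[Editorial note: for readers unfamiliar with those options, here's a rough sketch of what they plausibly control. This is my own illustration in the Graham-style scheme the project used at the time, not the actual Spambayes classifier code, and the function name is hypothetical: hambias weights the ham evidence against false positives, and min_spamprob/max_spamprob clamp each word's estimate away from absolute certainty at 0.0 and 1.0.]

```python
def word_spamprob(spam_count, ham_count, nspam, nham,
                  hambias=1.5, min_spamprob=0.001, max_spamprob=0.999):
    """Hypothetical sketch: estimate P(spam | word) from corpus counts.

    hambias multiplies the ham evidence, biasing the classifier against
    false positives; min/max_spamprob clamp the per-word estimate so no
    single word can be treated as certain proof either way.
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = hambias * ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0.0:
        return 0.5  # word never seen: no evidence either way
    prob = spam_ratio / (spam_ratio + ham_ratio)
    return min(max_spamprob, max(min_spamprob, prob))
```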
> results1s -> results2s
> -> <stat> tested 382 hams & 417 spams against 1002 hams & 1000 spams
> -> <stat> tested 327 hams & 690 spams against 1002 hams & 1000 spams
> -> <stat> tested 1002 hams & 1000 spams against 382 hams & 417 spams
> -> <stat> tested 327 hams & 690 spams against 382 hams & 417 spams
> -> <stat> tested 1002 hams & 1000 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 1002 hams & 1000 spams
> -> <stat> tested 327 hams & 690 spams against 1002 hams & 1000 spams
> -> <stat> tested 1002 hams & 1000 spams against 382 hams & 417 spams
> -> <stat> tested 327 hams & 690 spams against 382 hams & 417 spams
> -> <stat> tested 1002 hams & 1000 spams against 327 hams & 690 spams
> -> <stat> tested 382 hams & 417 spams against 327 hams & 690 spams
I *assume* you're not using timcv.py to run the tests, and aren't using "the
standard" one-msg-per-file setup. That means you also can't use rebal.py to
balance the number of hams and spams across sets, and aren't doing
cross-validation runs here (it looks like you were running a 3x3 test grid
instead).
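[Editorial note: a sketch of what a 3x3 test grid means, inferred from the "tested ... against ..." lines above -- not any actual Spambayes driver. Without the cross-validation machinery, each set is simply scored against a classifier trained on each of the other sets, giving N*(N-1) ordered runs, i.e. 6 runs for 3 sets; the set names and counts below are taken from the quoted output.]

```python
from itertools import permutations

sets = {                      # (hams, spams) per set, from the run above
    "set1": (382, 417),
    "set2": (327, 690),
    "set3": (1002, 1000),
}

# Every ordered (train, test) pair of distinct sets is one run.
runs = [(train, test) for test, train in permutations(sets, 2)]
for train, test in runs:
    (test_h, test_s), (train_h, train_s) = sets[test], sets[train]
    print(f"tested {test_h} hams & {test_s} spams "
          f"against {train_h} hams & {train_s} spams")
```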
> false positive percentages
> 2.094 3.665 lost +75.02%
> 0.612 2.141 lost +249.84%
> 34.331 40.419 lost +17.73%
> 13.456 11.927 won -11.36%
> 52.695 59.481 lost +12.88%
> 44.503 57.330 lost +28.82%
Those numbers are spectacularly bad, both the "before" and "after" numbers,
much worse than anyone else has reported. It's hard not to suspect a bug in
whichever test driver script you used (I'm not set up to run, e.g.,
mboxtest.py, so I don't have any feel for how that's working these days --
but if that's what you were using, it looks like it's not working these days
<wink>).
> won 1 times
> tied 0 times
> lost 5 times
>
> total unique fp went from 798 to 898 lost +12.53%
> mean fp % went from 24.6150141717 to 29.1603737144 lost +18.47%
>
> false negative percentages
> 0.959 0.480 won -49.95%
> 1.014 1.014 tied
> 0.800 0.700 won -12.50%
> 1.449 1.304 won -10.01%
> 0.100 0.100 tied
> 0.240 0.240 tied
>
> won 3 times
> tied 3 times
> lost 0 times
>
> total unique fn went from 29 to 25 won -13.79%
> mean fn % went from 0.760468147221 to 0.639710840023 won -15.88%
Too weird. Who else runs mboxtest.py (assuming that's what Neale runs)?
Has it gone to hell recently for you?
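[Editorial note: the won/lost figures in the comparison above are simple relative changes from the "before" to the "after" error rate. A sketch with a hypothetical helper, not cmp.py itself:]

```python
def delta(before, after):
    """Relative change in an error rate, tagged the way the report tags it:
    a lower "after" rate wins, a higher one loses."""
    change = (after - before) / before * 100.0
    tag = "tied" if after == before else ("won" if after < before else "lost")
    return tag, round(change, 2)

# First false-positive row above:
print(delta(2.094, 3.665))   # -> ('lost', 75.02)
# First false-negative row above:
print(delta(0.959, 0.480))   # -> ('won', -49.95)
```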