Anthony Baxter anthony at interlink.com.au
Wed Mar 12 01:22:47 EST 2003

>>> "Meyer, Tony" wrote
> Curious, and (sort of) able to now run tests (thanks Tim & Mark), I
> changed the "prob = (S-H + 1.0) / 2.0" equation in classifier.py to
> use this method. I had to also fiddle with 0's since log(0) isn't nice
> (how does CRM114 do this?), plus I rescaled it from -350..+350 to 0..1.
> Surprisingly I got good (well, perfect, actually) results. Is this
> just my tiny-weeny sets? A fluke? *Another* mistake on my part?

Um, I'd say "mistake". Look at the numbers. Your ham mean has gone
from around 3 to around 55, while the spam mean's gone from around
92 to around 45. So you've moved everything solidly into the "unsure"
range.

This, of course, will drive your FN/FP numbers to zero, since nothing
gets called ham or spam with any confidence. But then, dumping
your email directly into the unsure folder without running spambayes
will do that, too <wink>

Worse yet, your spam is scoring, on average, less than your ham! Oops.


> ham mean and sdev for all runs
>    3.05   55.83 +1730.49%       10.83    3.14  -71.01%
> spam mean and sdev for all runs
>   91.98   45.07  -51.00%       18.75    3.57  -80.96%
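
To make the point concrete, here is a minimal sketch of a three-way
ham/unsure/spam split applied to the four means quoted above. The
cutoffs (20 and 90 on a 0-100 scale) and the classify() helper are
assumptions for illustration only, not the actual spambayes code or
necessarily the settings used in these runs.

```python
# Hypothetical cutoffs on a 0-100 score scale; assumed for illustration.
HAM_CUTOFF = 20.0
SPAM_CUTOFF = 90.0


def classify(score):
    """Bucket a 0-100 score into ham, unsure, or spam (illustrative helper)."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


# Mean scores taken from the quoted "all runs" figures.
means = [
    ("ham before", 3.05),
    ("ham after", 55.83),
    ("spam before", 91.98),
    ("spam after", 45.07),
]

for label, mean in means:
    print("%-12s %6.2f -> %s" % (label, mean, classify(mean)))
```

Under those assumed cutoffs, both "after" means land in the unsure
bucket, and with sdevs of only 3.14 and 3.57 very few messages would
score far enough from the mean to land anywhere else.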

