[Spambayes] Proposing to remove 4 combining schemes

Tim Peters tim.one@comcast.net
Thu Oct 17 22:32:46 2002


[Sean True]
> I changed to chi-scoring. That was easy.
> I rescored the /News filter, which is not used for any training
> whatsoever.  The spread of scores is now 0 to .99, as advertised.

That's a bit odd:  in all test reports to date, the median spam score under
chi-combining was 1.00, and that matches what I've seen on my personal email
too (a large majority of spam scores 1.00, to the precision of the Hammie
display).

> The scores for the financial newsletters are now much higher,

Meaning 0.99, or ...?  The rule of thumb under chi-combining so far has been
to say a thing is spam if its score exceeds 0.95, else to call it a "middle
ground" msg (with "it's ham" under 0.05).
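
As a minimal sketch of that rule of thumb (the names SPAM_CUTOFF, HAM_CUTOFF
and classify are made up for illustration and are not taken from the Hammie
code; only the 0.95 and 0.05 cutoffs come from the text above):

```python
# Rule-of-thumb cutoffs for chi-combining scores, as quoted above.
# The constant and function names here are hypothetical.
SPAM_CUTOFF = 0.95
HAM_CUTOFF = 0.05


def classify(score):
    """Map a score in [0.0, 1.0] to 'spam', 'ham' or 'unsure'."""
    if score > SPAM_CUTOFF:   # "exceeds 0.95" -> spam
        return "spam"
    if score < HAM_CUTOFF:    # "under 0.05" -> ham
        return "ham"
    return "unsure"           # everything else is middle ground


if __name__ == "__main__":
    for s in (1.00, 0.99, 0.95, 0.50, 0.01):
        print(f"{s:.2f} -> {classify(s)}")
```

Under these cutoffs a score of 0.99 lands in "spam", while anything from 0.05
through 0.95 inclusive is treated as middle ground.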

> which does not meet my qualitative assessment.

The scores will change over time, of course -- the system learns what it's
taught.

> If I'm not sure it's spam, I'd prefer a score that matched that.

Under chi-combining, a score under .95 (as a rule of thumb so far) does mean
"I'm not sure it's spam".  So quantifying this would be helpful.

> All in all, for exposed scoring, I still prefer the old scores,
> but not enough to keep complaining about.

You're allowed to -- as I said in the msg that started this thread, I too
re-found a fondness for the fuzzy scores when using the same UI you're using.
I'm not sure how much that has to do with the reality of the scoring, and
how much to do with the UI, but we have lots of test results saying that
chi-combining does objectively better *if* you need to pick your cutoffs in
advance.  If you're happy to live with fuzzy and shifting cutoffs (which is
allowed, but I expect will be as much a minority position as my position
that I'd rather have false positives than false negatives), the all-default
scheme may work just as well.

> And since getting whacked by Tim for lack of intellectual rigor and
> laziness is familiar, rousing, but not all that much fun <grin>, I think
> I'll go back to doing systems engineering and just _use_ the results.

I think you're confusing me with the geometric mean of "the MREC group" <0.9
wink>, Sean.  I simply challenged you to use a scheme before strongly
dissing it.

>> A UI that makes most sense under a middle-ground scheme would
>> shuffle "I'm pretty sure it's spam" into a Spam folder, and the
>> ones it knows it's confused about into an Unsure folder.  The
>> rest ("I'm pretty sure it's ham") would be left in your inbox.
>> We can't really do that with the default combining scheme because
>> about half your email would end up in the Unsure folder -- the
>> separation it makes between populations is too fuzzy.

> Depends on whether one wants the machine making the decision, or wants
> help making the decision oneself.

I only want help, but shuffling msgs off to Spam and Unsure folders for
later review is exactly the help I want.  For example, I don't want to be
bothered *at all* with probable spam until I put my brain in "spam mode" and
go on a mass-delete spree dedicated to reviewing probable spam.  Until then,
I don't want it in my inbox at all.

> The ability to label a message, and then sort using the label is really
> handy if you spend your time classifying mail.

With respect to labels specifically measuring spamness, I expect we're
destined never to agree on this.  I found sorting Outlook displays by Hammie
score (yes, I've tried rescoring under all schemes) to be intellectually
interesting, but not the way I'd want to work in real life.  Spam vs
non-spam is a boring decision I want to spend as little time on as possible;
personal email vs work email is an example of an interesting decision.

> ...
> I'm off to rescore all the mail in my Outlook folders, which
> takes about an hour on an Athlon 2000 XP.

This is something else to look into:  scoring thru the Outlook wrappers is
*much* slower than scoring msg-per-plain-text-file (which is what I do
during large tests, which routinely score over 100,000 times).  I score
about 80 msgs per second the latter way.  When scoring about 500 Outlook
Inbox msgs, I take a much-needed bathroom break <wink>.
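
For a rough sense of scale, here is the back-of-the-envelope arithmetic for
the plain-text-file rate quoted above.  The helper name estimated_seconds is
hypothetical; the 80 msgs/sec and 100,000 figures are the ones from this msg,
and no rate is assumed for the Outlook path since none was measured here:

```python
def estimated_seconds(n_msgs, msgs_per_second):
    """Wall-clock estimate for scoring n_msgs at a steady rate."""
    return n_msgs / msgs_per_second


if __name__ == "__main__":
    # 100,000 scorings at about 80 msgs per second (plain-text files).
    secs = estimated_seconds(100_000, 80)
    print(f"{secs:.0f} s, about {secs / 60:.0f} min")
    # prints "1250 s, about 21 min"
```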