[Spambayes] Proposing to remove 4 combining schemes

Sean True seant@iname.com
Thu Oct 17 21:02:57 2002

> [Sean True]
> > I hate to try to speak for Joe User (like speaking for the "common man",
> > always a red flag), but I _am_ just a user of these scoring schemes.
> > I have several hundred messages (commercial email) tucked away in a
> > folder that score in the non-chi scheme in the range .4 to .6. That
> > score appears to reflect my own real uncertainty about the value of
> > Motley Fool newsletters.  No snickering, please. A system like chi-combining
> > looks like a very good choice for black-and-white, upstream discards of
> > offers to increase body part size.
> Try rescoring your msgs using chi-combining before simply guessing that it's
> inappropriate for you.  You don't have to retrain your database, you just
> have to change the spamprob() used for scoring.
> In my email so far, chi-combining is correctly and extremely certain that
> most of my commercial ham is in fact ham.  The worst it does is on the one
> HTML newsletter I have from Strong Investments so far, of which there are no
> examples in my training data.  It gets a score of about 0.6, very solidly in
> its "middle ground".
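
For readers who have not seen the scheme under discussion, here is a minimal
sketch of chi-squared (Fisher-style) combining of per-word spam probabilities.
It is an illustration only, not the project's actual spamprob(): the names
chi2q, chi_combine and eps are hypothetical, and real code may differ in
details such as underflow handling.

```python
import math

def chi2q(x2, v):
    """Upper-tail probability of a chi-squared variate with v degrees of
    freedom exceeding x2.  This closed form is valid for even v only."""
    if v <= 0 or v % 2 != 0:
        raise ValueError("v must be a positive even integer")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.0.
    return min(total, 1.0)

def chi_combine(probs, eps=1e-6):
    """Combine per-word spam probabilities into one score in [0, 1].
    eps is a hypothetical clamp that keeps log() away from 0 and 1."""
    if not probs:
        return 0.5  # no evidence either way
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    n = len(clamped)
    ln_p = sum(math.log(p) for p in clamped)        # ln of prod(p)
    ln_q = sum(math.log(1.0 - p) for p in clamped)  # ln of prod(1 - p)
    s = 1.0 - chi2q(-2.0 * ln_q, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(-2.0 * ln_p, 2 * n)  # evidence for ham
    return (s - h + 1.0) / 2.0

if __name__ == "__main__":
    print(chi_combine([0.99] * 10))             # strongly spammy clues
    print(chi_combine([0.01] * 10))             # strongly hammy clues
    print(chi_combine([0.99] * 5 + [0.01] * 5)) # conflicting clues
```

When strong clues point both ways, s and h are both large and roughly cancel,
so the score lands near 0.5 rather than at either extreme. That is the "middle
ground" behaviour described above.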

I changed to chi-scoring. That was easy.
I rescored the /News filter, which is not used for any training whatsoever.
The spread of scores is now 0 to .99, as advertised. The scores for the
financial newsletters are now much higher, which does not meet my
qualitative assessment. If I'm not sure it's spam, I'd prefer a score
that matched that.

All in all, for exposed scoring, I still prefer the old scores,
but not enough to keep complaining about. And since getting whacked
by Tim for laziness and lack of intellectual rigor is familiar, rousing,
but not all that much fun <grin>, I think I'll go back to doing systems
engineering and just _use_ the results.

> A UI that makes most sense under a middle-ground scheme would shuffle "I'm
> pretty sure it's spam" into a Spam folder, and the ones it knows it's
> confused about into an Unsure folder.  The rest ("I'm pretty sure it's ham")
> would be left in your inbox.  We can't really do that with the default
> combining scheme because about half your email would end up in the Unsure
> folder -- the separation it makes between populations is too fuzzy.
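
The three-way split described above can be sketched as a pair of cutoffs on
the score. The function name classify and the cutoff values 0.20 and 0.90 are
assumptions chosen for illustration, not settings taken from this thread.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0, 1] to a destination folder.
    Both cutoffs are hypothetical and would need tuning per user."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= spam_cutoff:
        return "Spam"    # "I'm pretty sure it's spam"
    if score > ham_cutoff:
        return "Unsure"  # the classifier knows it is confused
    return "Inbox"       # "I'm pretty sure it's ham"

if __name__ == "__main__":
    for s in (0.02, 0.6, 0.97):
        print(s, classify(s))
```

With these cutoffs a newsletter scoring about 0.6 goes to Unsure. The scheme
is only practical when few messages score between the two cutoffs, which is
the objection raised against the default combining scheme.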

Depends on whether one wants the machine making the decision, or wants
help making the decision oneself. The ability to label a message, and
then sort using the label is really handy if you spend your time
classifying mail.

> (yet), because there are *many* customers here, and the union of what they
> want can't be had with a single scheme.  But a fan of a particular scheme
> gets a lot more credibility after they've tried the alternatives and given
> them thought based on actual experience.

Whack, whack, whack. <vlg>

Always a pleasure, Mr. Peters.

I'm off to rescore all the mail in my Outlook folders, which takes about an
hour on an Athlon 2000 XP.

-- Sean