[Spambayes] RE Spam
Amedee Van Gasse
amedee at amedee.be
Wed May 24 14:01:52 CEST 2006
On Wed, May 24, 2006 12:48, skip at pobox.com said:
>
> Amedee> However, English is not my mother language and most of my
> Amedee> correspondence is in Dutch. As a consequence, most common
> Amedee> English words are quite uncommon for me. The result is that
> Amedee> common English words will score a bit above 0.5. Perhaps not
> Amedee> much, but enough to be significant after a while.
>
> Thanks, I didn't realize that. Do you have an example in your training
> database you can share with us (both message and word scores) where you
> think the English disclaimer text has tipped the scales and caused a ham
> message to later be scored as spam? If you simply train on one or two of
> those misclassified hams does the problem go away? How skewed is your
> training database (number of spams vs number of hams)? Have you considered
> throwing out your current training database and starting fresh?
>
> One thing that might help is to further break messages which score as spam
> into "low" and "high" spam. Based on my current settings that gives me
> these four categories:
>
> ham 0.00-0.14
> unsure 0.15-0.59
> low spam 0.60-0.74
> high spam 0.75-1.00
>
> High spam is tossed without further consideration. Ham is sorted into the
> appropriate mailbox by procmail. Unsure and low spam messages each wind up
> in their own mailboxes for further consideration. I train on most unsure
> messages but only train on low spam messages which are actually ham.
>
> My suspicion is that if you have ham messages which are erroneously winding
> up as spam they are at the very low end of the spam scale. It might be
> sufficient to move your spam threshold up a bit so they are more likely to
> land in the unsure category.
>
> Skip
Skip,
I think you have hit the mark there.
I already use something like your low spam/high spam split.
I have 5 categories: high ham, low ham, unsure, low spam, high spam.
High ham goes on to procmail and high spam goes to /dev/null.
And indeed, the misclassified hams all wind up in unsure or low spam.
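
For illustration, the four bands Skip quotes above could be sketched in Python roughly like this. The function name, the parameter names and the way the boundaries are handled are assumptions made for this example; none of it is SpamBayes code or its configuration format.

```python
def classify(score, ham_cutoff=0.15, spam_cutoff=0.60, high_spam_cutoff=0.75):
    """Map a spam score in [0.0, 1.0] to one of four bands.

    The default cutoffs mirror the table quoted above:
    ham 0.00-0.14, unsure 0.15-0.59, low spam 0.60-0.74, high spam 0.75-1.00.
    Hypothetical helper, not part of SpamBayes.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < ham_cutoff:
        return "ham"
    if score < spam_cutoff:
        return "unsure"
    if score < high_spam_cutoff:
        return "low spam"
    return "high spam"


# A borderline ham that scored just over the spam threshold:
print(classify(0.62))                    # lands in "low spam"
# Same message after moving the spam threshold up a bit:
print(classify(0.62, spam_cutoff=0.70))  # lands in "unsure"
```

With the default cutoffs a ham scoring 0.62 falls in low spam; raising the spam cutoff to 0.70 (an arbitrary value picked for the example) moves it into unsure, which is the effect Skip describes.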
--
Amedee