[Spambayes] Language distribution

Tony Meyer tameyer at ihug.co.nz
Thu Feb 12 20:07:11 EST 2004

> However I discovered a certain weakness. Most of the incoming 
> spam is in English, while a large portion of my ham is 
> German. So when I get English ham, it's often classified not 
> near 0 but at about 0.20, while German spam (which is 
> currently evolving) is often not recognized as such.

You could try tailoring your training.  Train on roughly the same number of
German ham as German spam, and roughly the same number of English ham as
English spam.  Maybe even the same amount of German mail as English mail.
(Just a couple of hundred trained messages each of ham & spam is often
enough to get excellent results, so you don't need vast amounts of each.)
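
A minimal sketch of that kind of balancing, assuming you have already
labelled each message with a language and a ham/spam label yourself.  The
function name and the (language, label, message_id) tuple layout are made up
for illustration; nothing here is part of spambayes.

```python
import random
from collections import defaultdict


def balanced_training_sample(messages, per_bucket=200, seed=0):
    """Pick the same number of messages from every (language, label) bucket.

    `messages` is an iterable of (language, label, message_id) tuples,
    where label is "ham" or "spam".  This layout is hypothetical.
    """
    buckets = defaultdict(list)
    for language, label, message_id in messages:
        buckets[(language, label)].append(message_id)
    if not buckets:
        return {}
    # The smallest bucket limits how many we can take from each one.
    size = min(per_bucket, min(len(ids) for ids in buckets.values()))
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return {key: rng.sample(ids, size) for key, ids in buckets.items()}
```

You would then train only on the ids that come back, so that no one
language/label combination dominates.  The default of 200 per bucket just
echoes the "couple of hundred" figure above; it isn't a tuned value.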

> I also have a similar ratio with HTML mails and non-HTML mails.

This shouldn't make any difference - spambayes strips out the vast majority
of HTML markup.

> Spambayes also has some problems distinguishing real MDA 
> error messages from those MyDoom stuff with the typical 
> attachments.

Enough training should solve this.  Take a look at the clues for the
misclassified messages and see if you can pick where it's going wrong - that
often gives a hint as to how you can fix it.
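
To illustrate what "looking at the clues" amounts to, here is a small
stand-alone sketch.  It is not the spambayes API: the function name is
invented, and it assumes you already have a per-token spam probability
table from somewhere.  It simply lists the tokens in a message whose
probabilities are furthest from 0.5, since those have the most pull on the
score.

```python
def strongest_clues(token_probs, tokens, n=5):
    """Return up to n (token, prob) pairs furthest from 0.5, strongest first.

    `token_probs` maps token -> spam probability (hypothetical input);
    tokens with no known probability are ignored.
    """
    seen = {t for t in tokens if t in token_probs}
    # Sort by distance from the neutral 0.5, breaking ties alphabetically.
    ranked = sorted(seen, key=lambda t: (-abs(token_probs[t] - 0.5), t))
    return [(t, token_probs[t]) for t in ranked[:n]]
```

If a genuine bounce keeps scoring as spam, the tokens at the top of a list
like this usually show which words are pushing it the wrong way, and so
which kind of message you need to train more of.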

