[Spambayes] multiple languages

Francois Granger francois.granger at free.fr
Thu May 29 18:34:39 EDT 2003


(Sorry, I mishandled the To: field for this list!)


At 09:49 -0500 on 29/05/2003, in message Re: [Spambayes] multiple 
languages, Skip Montanaro wrote:
>     Alex> 99% of my ham is in Swedish.
>     Alex> 99% of my spam is in English.
>
>     Alex> Because of this I get quite a number of false negatives written in
>     Alex> Swedish and false positives written in english.
>
>I believe it's been discussed a bit here, though not recently.

I raised the issue here a long time ago and did not get a really good 
answer from Tim.

>I'm not sure there's an easy way out of this.  If you've saved all your
>training messages you can try deleting a bunch (maybe 75%) of the Swedish
>ham and English spam from your database and retrain on the remaining
>messages.  Then starting from that point, only train on the mistakes
>(messages which are completely misclassified or wind up marked "unsure").
>This probably won't improve things immediately, but it should make it easier
>for Swedish spam or English ham to begin to tip the scales.
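
As a rough illustration of the thinning step suggested above, here is a 
minimal Python sketch. It assumes the saved training messages for one 
group (say, the Swedish ham) are available as a plain list. The function 
name thin_corpus and the keep_fraction and seed parameters are 
hypothetical and not part of SpamBayes.

```python
import random


def thin_corpus(messages, keep_fraction=0.25, seed=0):
    """Return a random subset holding about keep_fraction of the messages.

    keep_fraction=0.25 corresponds to deleting roughly 75% of the group.
    A fixed seed makes the selection repeatable.
    """
    rng = random.Random(seed)
    keep = round(len(messages) * keep_fraction)
    # sample() picks without replacement, so no message is kept twice.
    return rng.sample(messages, keep)
```

The thinned lists would then be used to rebuild the database from 
scratch before switching to training on mistakes only.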

I am French and have a problem similar to the one described here. I also 
get occasional Spanish and Portuguese spam. I am using the 
Pop3proxy version.
I have been using various versions of SpamBayes since Sept 2002.

My current database was created on 1 Feb 2003. I trained on about 100 
messages to start with, then trained mostly on unsure and 
misclassified messages. I kept an eye on the ham/spam balance, and 
tried to add some English ham to the training set whenever I trained 
an English unsure as spam, and likewise for the other combination. I 
have now trained on Spam: 639  Ham: 486.
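
The regime described above (train only on unsure and misclassified 
messages while watching the ham/spam balance) can be sketched in Python 
as follows. This is only an illustration under stated assumptions: 
ToyClassifier is a hypothetical stand-in with a crude averaged token 
score, not the SpamBayes classifier or its API, and the cutoff values 
are assumed defaults.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a real filter; NOT the SpamBayes API."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.trained = {"ham": 0, "spam": 0}  # ham/spam balance so far

    def train(self, text, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(text.lower().split()))
        self.trained[label] += 1

    def score(self, text):
        """Return a spam score in [0, 1]; 0.5 means no evidence."""
        probs = []
        for tok in set(text.lower().split()):
            s = self.counts["spam"][tok] / max(self.trained["spam"], 1)
            h = self.counts["ham"][tok] / max(self.trained["ham"], 1)
            if s + h > 0:  # skip tokens never seen in training
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a score to 'ham', 'spam' or 'unsure' (cutoffs are assumptions)."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


def train_on_mistakes(clf, stream):
    """Train only when the verdict is wrong or unsure; return the balance."""
    for text, true_label in stream:
        if classify(clf.score(text)) != true_label:
            clf.train(text, true_label)
    return clf.trained
```

The returned counts play the role of the "Spam: 639  Ham: 486" figures: 
if one side runs far ahead, that is the cue to add some correctly 
classified messages of the other kind, in the relevant language.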

The success rate has been astonishing for a long time. I get only a few 
unsure messages and no misclassified messages in either language.



-- 
Hofstadter's Law :
It always takes longer than you expect, even when you take into 
account Hofstadter's Law.


