[spambayes-dev] RE: [Spambayes] multiple languages

Meyer, Tony T.A.Meyer at massey.ac.nz
Thu Jun 5 17:16:52 EDT 2003


[Moving to spambayes-dev since it seems more in place there]

> Nope!  It's a statistical classifier with no semantic 
> knowledge, and you already explained the consequences of 
> that.

It would be interesting (IMO) to see what effect a token-by-token translation (into a common language, whichever one) would have, i.e. giving it a bit of semantic knowledge.  Tokens that weren't in the translation dictionary could be left alone (which would include the garbage that is often added).

This would then (in theory) mean that if I get email offering me "pornografía", Spambayes would give it the same score as one offering me "pornography".  Hopefully the email itself would say that it was in Spanish, but otherwise a range of dictionaries could be consulted.
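
To make the idea concrete, here is a rough sketch of what such a translation pass might look like if it sat between the tokenizer and the classifier.  Everything in it is an assumption for illustration: the tiny Spanish dictionary, the DICTIONARIES table and the translate_tokens name are all made up and are not part of Spambayes.

```python
# Hypothetical sketch only: these dictionaries and this function are
# illustrative and do not exist in Spambayes.
SPANISH_TO_ENGLISH = {
    "pornografía": "pornography",
    "gratis": "free",
}

# Maps a language code to its (source token -> common-language token) table.
DICTIONARIES = {"es": SPANISH_TO_ENGLISH}


def translate_tokens(tokens, language=None, dictionaries=DICTIONARIES):
    """Yield each token translated into the common language where possible.

    If the message declares a language we have a dictionary for, only that
    dictionary is used; otherwise every dictionary is tried in turn.
    Tokens found in no dictionary (including added garbage) pass through
    unchanged.
    """
    if language in dictionaries:
        candidates = [dictionaries[language]]
    else:
        candidates = list(dictionaries.values())

    for token in tokens:
        for mapping in candidates:
            if token in mapping:
                yield mapping[token]
                break
        else:
            # No dictionary knew this token, so leave it alone.
            yield token


translated = list(translate_tokens(["pornografía", "gratis", "xqzv17"], "es"))
```

With this, "pornografía" and "pornography" end up as the same token before training or scoring, while the unknown token "xqzv17" is untouched.  Exact-match lookup is the simplest thing that could work; a real version would presumably need to deal with case, inflections and words that mean different things in different languages.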

It would be easy enough to implement (I presume) - either using some (f/oss) translation tool or even babelfish or translate.google.com.

OTOH, Francois's message indicated that he gets good results without all this bother, so maybe it isn't worth it...

Thoughts?

=Tony Meyer


