    Jan> My original question was whether mixed case should be penalized:

It's easy enough to tweak the spambayes tokenizer to generate a synthetic
token for unusually capitalized words.  Then you don't assign a penalty to
it yourself; you let the classifier decide whether it is hammy, spammy, or
neither.
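
Roughly what I have in mind, as a standalone sketch.  This is not the
actual spambayes tokenizer code: tokenize(), is_unusually_capitalized(),
and the token string "synthetic:mixed-case" are names I made up for
illustration, and the definition of "unusual" (anything that isn't all
lower, all upper, or Titlecase) is just one assumption you could vary.

```python
import re

# Hypothetical word pattern for this sketch: runs of ASCII letters only.
WORD_RE = re.compile(r"[A-Za-z]+")

# Hypothetical synthetic token name; any string that cannot collide with
# a real word token would do.
MIXED_CASE_TOKEN = "synthetic:mixed-case"


def is_unusually_capitalized(word):
    """Return True for mixed case such as 'FrEe' or 'vIAGRA'.

    All-lowercase, all-uppercase, and Titlecase words are treated as
    usual.  Note that names like 'McDonald' are flagged too; the
    classifier, not this function, decides whether that matters.
    """
    if len(word) < 2:
        return False
    return not (word.islower() or word.isupper() or word.istitle())


def tokenize(text):
    """Yield lowercased word tokens, plus a synthetic token after each
    unusually capitalized word.  No score or penalty is attached here."""
    for word in WORD_RE.findall(text):
        yield word.lower()
        if is_unusually_capitalized(word):
            yield MIXED_CASE_TOKEN
```

The point of the design is that the tokenizer only reports the feature.
Whether the synthetic token ends up as a spam clue, a ham clue, or noise
falls out of training like it does for any other token.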

A weird idea just crossed my mind.  Has anyone ever tested the performance
of the system using only synthetic tokens, no real content?


