[Spambayes] All Cap or Cap Word Subjects

Tim Peters tim.one@comcast.net
Sun, 08 Sep 2002 16:56:59 -0400


[Brad Clements]
> Just curious if subject line capitalization can be used as an indicator.
>
> Either the percentage of characters that are caps..
>
> Or, percentage starting with a capital letter (if number of words > xx)

Supply a mod to tokenizer.py and I'll test it (eventually <wink>).  Note
that the tokenizer already *preserves* case in subject-line words, because
experiment showed that this was better than folding case away in this
specific context (but experiment also showed-- against my
expectations --that preserving case everywhere didn't make a significant
difference to either error rate -- the subject line is a special case for
this).