[Spambayes] loosen up address_headers option?

Tim Peters tim.one at comcast.net
Tue Jan 14 14:34:32 EST 2003


[Skip Montanaro]
> The tokenizer's address_headers option only examines "from".  The code has
> this comment:
>
> ...
>
> As spambayes moves out of the experimental stage, perhaps it's
> worth looking at adding "to" and "cc" (and maybe "reply-to" and "sender") to the
> default list of analyzed headers.

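They remain killer-strong clues, for bad reasons, when training on
mixed-source corpora (say, ham collected from one mailbox and spam from
another, so the addresses alone tell the two apart), so caution is still in
order.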

In the Outlook client, life is so constrained (meaning mixed-source corpora
are darned hard to get at there) that the Outlook client's default has been:

    [Tokenizer]
    address_headers: from to cc sender reply-to

for a long time.  This works fine in practice, except when python.org has to
turn off SpamAssassin and lots of spam leaks through.  Then the classifier
piles up lots of "this came from python.org, so it's probably not spam"
tokens, which increases the incidence of false negatives (FN) and,
especially, of spam rated Unsure.
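A rough sketch of that failure mode, in Python.  The token format
("to:domain:python.org") and every function name here are hypothetical
illustrations, not the real spambayes tokenizer; the [Tokenizer] text is read
from a string only to keep the example self-contained; and the score is a bare
spam_ratio / (ham_ratio + spam_ratio), without the smoothing a real classifier
would apply.  If every trained ham was addressed to python.org and no trained
spam was, the python.org address token scores as a pure ham clue, so spam
leaking through python.org gets pulled toward ham.

```python
import configparser
from email import message_from_string
from email.utils import getaddresses

# Same [Tokenizer] setting as the Outlook client's default.
CONFIG_TEXT = """
[Tokenizer]
address_headers: from to cc sender reply-to
"""


def address_headers(config_text):
    """Return the list of header names named by address_headers."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("Tokenizer", "address_headers").split()


def address_tokens(raw_message, headers):
    """Yield hypothetical 'header:domain:<domain>' tokens for each address."""
    msg = message_from_string(raw_message)
    for name in headers:
        values = msg.get_all(name, [])  # header lookup is case-insensitive
        for _realname, addr in getaddresses(values):
            if "@" in addr:
                domain = addr.rsplit("@", 1)[1].lower()
                yield f"{name}:domain:{domain}"


def token_spamprob(token, ham_msgs, spam_msgs, headers):
    """Crude per-token spam probability from training-set frequencies."""
    ham_count = sum(token in set(address_tokens(m, headers)) for m in ham_msgs)
    spam_count = sum(token in set(address_tokens(m, headers)) for m in spam_msgs)
    ham_ratio = ham_count / len(ham_msgs)
    spam_ratio = spam_count / len(spam_msgs)
    if ham_ratio + spam_ratio == 0:
        return 0.5  # token never seen: neutral
    return spam_ratio / (ham_ratio + spam_ratio)


def make_msg(sender, to):
    """Build a minimal RFC 822-style message for the demo."""
    return f"From: {sender}\nTo: {to}\nSubject: test\n\nbody\n"


headers = address_headers(CONFIG_TEXT)
ham = [make_msg("guido@python.org", "python-dev@python.org"),
       make_msg("tim@example.com", "python-dev@python.org")]
spam = [make_msg("deals@spam.example", "victim@example.net"),
        make_msg("deals@spam.example", "victim@example.net")]
print(token_spamprob("to:domain:python.org", ham, spam, headers))  # 0.0
```

A spam that arrives via python.org carries that same 0.0-rated token, which
is exactly the kind of ham clue that drags it toward FN or Unsure.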
