[Spambayes] RE: [Spambayes-checkins] spambayes tokenizer.py,1.40,1.41

Neil Schemenauer nas@python.ca
Sat, 28 Sep 2002 11:15:37 -0700


Tim Peters wrote:
> Neil, is there a reason to make this an option?  That is, as opposed to just
> doing it all the time?  Like, could this screw up a mixed-source corpus
> somehow?

I don't think so.  In your ham collection there are probably certain
hosts that appear in the message IDs often.  The hosts for the spam
message IDs should be pretty random.  Either they are generated by the
spammer or by the open relay MTA.
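
A minimal sketch of the idea, for illustration only: pull the host part out of the Message-ID header and emit it as a single token, so hosts that recur in ham pick up a strong ham score while the effectively random spam hosts stay near neutral. The regex, the function name `message_id_tokens`, and the token strings are assumptions made for this example and are not necessarily what tokenizer.py does.

```python
import re
from email import message_from_string

# Assumed pattern: capture the text after the last "@" inside <...>.
MSGID_HOST_RE = re.compile(r"<[^<>]*@([^<>@\s]+)>")

def message_id_tokens(msg_text):
    """Yield one token describing the host in the Message-ID header."""
    msg = message_from_string(msg_text)
    msgid = msg.get("Message-ID", "")  # header lookup is case-insensitive
    m = MSGID_HOST_RE.search(msgid)
    if m:
        # Lowercase so the same host always maps to the same token.
        yield "message-id:@" + m.group(1).lower()
    else:
        # A missing or malformed Message-ID is itself a clue.
        yield "message-id:invalid"

if __name__ == "__main__":
    sample = "Message-ID: <20020928.GA1234@Mail.Python.Org>\n\nhello\n"
    print(list(message_id_tokens(sample)))
```

Emitting a distinct token for a missing or malformed header keeps that case visible to the classifier instead of silently dropping it.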

  Neil