[Spambayes] New option: summarize_email_prefixes
Skip Montanaro
skip at pobox.com
Tue Dec 10 23:01:18 EST 2002
I just checked in code for a new option: summarize_email_prefixes. It tries
to take advantage of clumps of related email addresses in a single message,
e.g.:
    To: <itinerart@videotron.ca>
    Cc: <itinerant@skyful.com>, <itinerant@netillusions.net>,
        <itineraries@musi-cal.com>, <itinerario@rullet.leidenuniv.nl>,
        <itinerance@sorengo.com>
It's not a big win overall, but the "pfxlen:big" token it generates is a
very strong spam indicator. It might help on small messages without many
other clues. I'd like others to give it a try and post their results.
The code is pretty straightforward, so I won't go into more detail. Just
gaze at tokenizer.py for a few seconds.
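
For anyone without the source handy, here is a rough sketch of the idea,
not the checked-in code. It assumes the option collects the To and Cc
addresses, takes their common leading characters, and turns the result
into a "pfxlen:" token. The function name, the scoring rule (prefix length
times address count, divided by 10) and the cutoff of 3 are illustrative
assumptions; tokenizer.py is the authority.

```python
import os.path
from email.message import Message
from email.utils import getaddresses


def summarize_prefix_tokens(msg, big_threshold=3):
    """Return a list with at most one 'pfxlen:*' token for msg.

    Hypothetical helper: the scoring rule and threshold are assumptions
    made for illustration, not taken from tokenizer.py.
    """
    headers = msg.get_all("to", []) + msg.get_all("cc", [])
    addrs = [addr.lower() for _name, addr in getaddresses(headers) if addr]
    if len(addrs) < 2:
        return []  # a lone recipient has nothing to clump with

    # Despite living in os.path, commonprefix compares plain strings
    # character by character, which is what is wanted here.
    prefix = os.path.commonprefix(addrs)
    if not prefix:
        return []

    # Assumed score: longer prefixes shared by more addresses score higher.
    score = (len(prefix) * len(addrs)) // 10
    if score > big_threshold:
        return ["pfxlen:big"]  # collapse all high scores into one token
    return ["pfxlen:%d" % score]


# The headers from the example above.
msg = Message()
msg["To"] = "<itinerart@videotron.ca>"
msg["Cc"] = ("<itinerant@skyful.com>, <itinerant@netillusions.net>, "
             "<itineraries@musi-cal.com>, <itinerario@rullet.leidenuniv.nl>, "
             "<itinerance@sorengo.com>")
print(summarize_prefix_tokens(msg))
```

Under these assumed rules the six addresses share the prefix "itinera"
(7 characters), so the score is 7 * 6 // 10 = 4 and the sketch emits
"pfxlen:big". Collapsing every high score into a single token avoids a
scatter of rare tokens, one per exact score.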
Skip