[Spambayes] New option: summarize_email_prefixes

Skip Montanaro skip at pobox.com
Tue Dec 10 23:01:18 EST 2002


I just checked in code for a new option: summarize_email_prefixes.  It tries
to take advantage of clumps of email addresses in a single message that
share a common prefix, e.g.:

   To: <itinerart@videotron.ca>
   Cc: <itinerant@skyful.com>, <itinerant@netillusions.net>,
       <itineraries@musi-cal.com>, <itinerario@rullet.leidenuniv.nl>,
       <itinerance@sorengo.com>

It's not a big win overall, but the "pfxlen:big" token it generates is a
very strong spam indicator.  It might help on small messages without many
other clues.  I'd like others to give it a try and post their results.

The code is pretty straightforward, so I won't go into more detail.  Just
gaze at tokenizer.py for a few seconds.
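
For anyone who would rather not open the file, the idea can be sketched
roughly as follows.  This is not the checked-in code: the function name
prefix_tokens, the scoring formula, and the cutoff of 3 are assumptions
made for illustration, and only the "pfxlen:big" token name comes from
above.  It uses os.path.commonprefix, which compares strings character by
character, to find the prefix shared by all To/Cc addresses.

```python
import os.path
from email.utils import getaddresses


def prefix_tokens(header_values, big_threshold=3):
    """Return a list with at most one 'pfxlen:*' token.

    header_values is a list of raw To/Cc header strings.  The score
    formula and big_threshold are illustrative guesses, not values
    confirmed by the post.
    """
    # Pull out the bare addresses and normalize case.
    addrs = [addr.lower() for _name, addr in getaddresses(header_values) if addr]
    if len(addrs) < 2:
        return []  # a single recipient cannot form a clump

    # Despite living in os.path, commonprefix works on arbitrary strings.
    prefix = os.path.commonprefix(addrs)
    if not prefix:
        return []

    # Hypothetical score: a longer shared prefix and more recipients
    # both push the value up.
    score = len(prefix) * len(addrs) // 10
    if score > big_threshold:
        return ["pfxlen:big"]  # collapse all high scores into one token
    return ["pfxlen:%d" % score]


if __name__ == "__main__":
    to_hdr = "<itinerart@videotron.ca>"
    cc_hdr = ("<itinerant@skyful.com>, <itinerant@netillusions.net>, "
              "<itineraries@musi-cal.com>, <itinerario@rullet.leidenuniv.nl>, "
              "<itinerance@sorengo.com>")
    print(prefix_tokens([to_hdr, cc_hdr]))
```

Collapsing every high score into a single token keeps the classifier from
seeing a scatter of rare one-off values, at the cost of losing the exact
prefix length.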

Skip



