[Spambayes] mboxtrain.py chokes on bugtraq email messages

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Mon Apr 14 15:45:19 EDT 2003


4/14/2003 1:07:05 PM, "T. Alexander Popiel" <popiel at wolfskeep.com> wrote:

>In message:  <GCXV83TNRQ1THDWTGCOM61SMGDOM4X87.3e9af5c6 at myst>
>             <tim at fourstonesExpressions.com> writes:
>>
>>This is a multipart/digest message, and a known problem.  Keep an eye out for
>>the fix checkin.  It'll get fixed one of these days.
>
>Here's a question: what is the proper behaviour for these messages?
>
>Should the entire message get a ham/spam score, should the individual
>sub-messages get their own scores, or both?  If both, how should the
>individual scores be combined into the overall score?  Should the digest
>be broken into multiple messages: one containing ham, one containing
>spam, and one containing unsure?

I've spent a bit of time thinking about this, and I can't come up with a 
really good answer.  Splitting the digest into three digests (ham, spam, and 
unsure) makes the most sense, but I don't think the current email package 
offers much to help manage this.
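
A rough sketch of the three-way split, written against the present-day Python
standard-library email package rather than the 2003 one discussed here.  The
names split_digest, split_digest_text and classify are hypothetical; classify
stands in for whatever scorer returns "ham", "spam" or "unsure" for a single
message, and is not a Spambayes API.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart


def split_digest(digest, classify):
    """Split a multipart/digest into up to three digests keyed by verdict.

    `classify` is a hypothetical callable (an assumption, not a Spambayes
    function) that takes a Message and returns "ham", "spam" or "unsure".
    """
    if not digest.is_multipart():
        raise ValueError("expected a multipart message")

    buckets = {"ham": [], "spam": [], "unsure": []}
    for part in digest.get_payload():
        # Parts of a multipart/digest default to message/rfc822; the
        # payload of such a part is a one-element list holding the
        # enclosed message.
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)
            buckets[classify(inner)].append(inner)

    result = {}
    for verdict, messages in buckets.items():
        if not messages:
            continue  # no digest is produced for an empty bucket
        out = MIMEMultipart("digest")
        out["Subject"] = "%s [%s]" % (digest.get("Subject", "digest"), verdict)
        for inner in messages:
            out.attach(MIMEMessage(inner))
        result[verdict] = out
    return result


def split_digest_text(text, classify):
    """Convenience wrapper: parse raw message text, then split it."""
    return split_digest(message_from_string(text), classify)
```

Each resulting digest could then be trained on or filed separately; how the
original headers and any debug clue headers should carry over is left open.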

>
>My initial impulse is to score each sub-message individually, and if
>any of them are ham, mark the entire thing as ham.  If none are ham,
>but some are unsure, mark the overall message as unsure.  Otherwise
>mark it as spam.  As to the debug clue headers, I have no idea how
>to handle them...

This might handle things, but it doesn't make training work.  Again, splitting 
the digest makes the most sense to me.
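
For comparison, the combining rule Alex describes above (any ham wins, then
any unsure, otherwise spam) can be sketched as follows.  The function name
combine_verdicts and the verdict strings are assumptions made for
illustration, and treating a digest with no sub-messages as "unsure" is a
choice the thread does not cover.

```python
def combine_verdicts(verdicts):
    """Reduce per-sub-message verdicts to one verdict for the whole digest.

    `verdicts` is an iterable of "ham", "spam" or "unsure" strings
    (hypothetical labels, not taken from the Spambayes code).
    """
    verdicts = list(verdicts)
    if not verdicts:
        return "unsure"  # assumption: nothing to score, so no opinion
    if "ham" in verdicts:
        return "ham"     # any ham sub-message marks the digest as ham
    if "unsure" in verdicts:
        return "unsure"  # no ham, but at least one unsure
    return "spam"        # every sub-message scored as spam
```

This yields a single label for classification only; it says nothing about
which sub-messages a later training pass should learn from.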

>
>- Alex
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
  those who understand binary,
  and those who don't.




