[Spambayes] mboxtrain.py chokes on bugtraq email messages
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Mon Apr 14 15:45:19 EDT 2003
4/14/2003 1:07:05 PM, "T. Alexander Popiel" <popiel at wolfskeep.com> wrote:
>In message: <GCXV83TNRQ1THDWTGCOM61SMGDOM4X87.3e9af5c6 at myst>
> <tim at fourstonesExpressions.com> writes:
>>This is a multipart/digest message, and a known problem. Keep an eye out
>>for the fix checkin. It'll get fixed one of these days.
>Here's a question: what is the proper behaviour for these messages?
>Should the entire message get a ham/spam score, should the individual
>sub-messages get their own scores, or both? If both, how should the
>individual scores be combined into the overall score? Should the digest
>be broken into multiple messages: one containing ham, one containing
>spam, and one containing unsure?
I've spent a bit of time thinking about this, and I can't come up with a
really good answer. Splitting the digest into three digests (ham/spam/unsure)
makes the most sense, but I don't think the current email package offers much
support for managing this.
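For illustration only, here is a rough sketch of the three-way split using Python's standard email package. It assumes each digest part is a message/rfc822 (the default type inside multipart/digest). `split_digest` and the `classify` callback are hypothetical names; `classify` stands in for a real Spambayes classifier, and the Subject-based rule in the example is a toy.

```python
from email import message_from_string
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from typing import Callable, Dict


def split_digest(digest: Message,
                 classify: Callable[[Message], str]) -> Dict[str, MIMEMultipart]:
    """Split a multipart/digest into up to three digests keyed by verdict.

    `classify` is a hypothetical callback returning 'ham', 'spam' or
    'unsure' for one sub-message (a stand-in for a real classifier).
    """
    if digest.get_content_type() != "multipart/digest":
        raise ValueError("not a multipart/digest message")
    buckets: Dict[str, MIMEMultipart] = {}
    for part in digest.get_payload():
        # Inside a digest, parts default to message/rfc822; unwrap the inner message.
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)
        else:
            inner = part
        label = classify(inner)
        if label not in buckets:
            # Start a new digest for this verdict, tagging the original subject.
            new = MIMEMultipart("digest")
            new["Subject"] = f"{digest.get('Subject', '')} [{label}]"
            buckets[label] = new
        buckets[label].attach(MIMEMessage(inner))
    return buckets


# Example: a two-message digest, classified with a toy rule on the Subject.
RAW = """\
From: list@example.com
Subject: Digest
MIME-Version: 1.0
Content-Type: multipart/digest; boundary="NEXT"

--NEXT

From: a@example.com
Subject: meeting notes

See you at 3.
--NEXT

From: b@example.com
Subject: cheap pills

Buy now.
--NEXT--
"""

buckets = split_digest(message_from_string(RAW),
                       lambda m: "spam" if "pills" in m["Subject"] else "ham")
for label, sub in buckets.items():
    print(label, len(sub.get_payload()))
```

Each resulting digest could then be filed (and trained on) separately, which is the appeal of splitting over a single combined score.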
>My initial impulse is to score each sub-message individually, and if
>any of them are ham, mark the entire thing as ham. If none are ham,
>but some are unsure, mark the overall message as unsure. Otherwise
>mark it as spam. As to the debug clue headers, I have no idea how
>to handle them...
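The combining rule proposed above can be sketched in a few lines of Python. The 0.20/0.90 cutoffs are assumed values for illustration, not read from Spambayes' configuration; `verdict` and `digest_verdict` are hypothetical names, and returning "unsure" for an empty digest is an added assumption the proposal doesn't cover.

```python
from typing import Iterable

# Assumed cutoffs for illustration only; not taken from Spambayes' options.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def verdict(score: float) -> str:
    """Map one sub-message's spam probability to 'ham', 'unsure' or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def digest_verdict(part_scores: Iterable[float]) -> str:
    """Any ham part -> ham; else any unsure part -> unsure; else spam."""
    verdicts = {verdict(s) for s in part_scores}
    if not verdicts:
        # Assumption: an empty digest gives no evidence either way.
        return "unsure"
    if "ham" in verdicts:
        return "ham"
    if "unsure" in verdicts:
        return "unsure"
    return "spam"


print(digest_verdict([0.95, 0.50, 0.99]))  # one unsure part, no ham
```

Note that this only yields a score for the whole message; it says nothing about which sub-messages to train on.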
This might handle scoring, but it doesn't make training work. Again, splitting
the digest makes the most sense to me.
c'est moi - TimS
There are 10 kinds of people in the world:
those who understand binary,
and those who don't.