[Spambayes] error with sb_imapfilter.py alpha 7

Mikhail Yakoubov qub at qub.com
Tue Feb 3 11:26:17 EST 2004

Christopher Messina wrote:
Disclaimer: I'm not a Spambayes developer, just a user like Christopher.
And, before anything else, I'd like to thank everyone who has worked and
still works on Spambayes -- with 300-400 spam messages received daily,
all of them filtered out by the software without a single false positive
in recent _months_, I really appreciate the work done.

> Hello.  I've been using the imap filter succesfully for a while, but I hit
> a wall today.
> Apparently a message that I've trained as either spam or ham is causing a
> problem.  

No. In my experience, it's a classification problem, not a training one.
There is a malformed message in your Inbox, declared as multipart in
its headers but without multipart boundaries in the body. You have to
find it and weed it out.

If you get only a few messages in your inbox, you can view them one by
one, looking for a multipart header with no boundaries (or with just an
opening and no closing boundary) in the corresponding body. If you get
many messages, like I do, you can use a divide-in-half strategy:
temporarily move half of your inbox messages to a temp folder, run the
classifier, and repeat on whichever half still fails. I take care to
preserve the "unread" flags on the messages (i.e., I set them manually),
though I'm not sure it's really necessary for Spambayes to treat the
inbox messages as still unclassified. Or you can check whether you've
got a message with the subject "acts quicker and lasts much longer!" --
this was my malformed message of the day today.
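
The manual check described above could also be scripted. What follows is
only a rough sketch, not part of Spambayes: it assumes the inbox messages
have already been fetched as raw RFC 822 text (for example with imaplib),
and it uses the email package from a current Python 3, which records
parse problems in a "defects" list instead of raising the way Python 2.3
does. The name find_malformed is made up for illustration.

```python
import email


def find_malformed(raw_messages):
    """Return (index, subject, defect names) for suspicious multipart messages.

    raw_messages is assumed to be a list of raw RFC 822 strings.
    """
    suspects = []
    for index, raw in enumerate(raw_messages):
        msg = email.message_from_string(raw)
        # Collect defects from the top-level message and from any parts
        # the parser still managed to split out.
        names = [type(defect).__name__
                 for part in msg.walk()
                 for defect in part.defects]
        # Only report messages that claim to be multipart and were flagged.
        if msg.get_content_maintype() == "multipart" and names:
            subject = msg.get("Subject", "(no subject)")
            suspects.append((index, subject, names))
    return suspects


if __name__ == "__main__":
    # Made-up sample: multipart header, but no boundary anywhere.
    sample = ("Subject: sample\n"
              "Content-Type: multipart/mixed\n"
              "\n"
              "plain text where MIME parts should be\n")
    for index, subject, names in find_malformed([sample]):
        print(index, subject, ", ".join(names))
```

The reported index and subject should be enough to locate the message in
a mail client without moving half the inbox around.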

To the dev team: in recent days I've run into this kind of error every
day, so it has become a problem. The divide-in-half approach is really
time-consuming for large IMAP inboxes like mine. The solution seems to
be to trap the underlying Python MIME library exception, and then treat
the message as a non-multipart one in spite of the multipart header. An
effort-saving temporary workaround would be to include a clue (subject,
date or Message-ID) about which message caused the error in the
exception output, so a user can quickly locate and delete it herself.
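
To illustrate both ideas, here is a rough sketch. It is not the actual
sb_imapfilter.py code, and it is written against the email package of a
current Python 3 rather than the 2.3 one in the traceback. The name
parse_leniently and the format of the clue string are my own invention.
A strict policy is used so that the parser raises on a malformed
multipart message; the fallback then re-reads the message with
HeaderParser, which keeps the headers and leaves the body as one plain
string.

```python
import email
import email.errors
import email.parser
import email.policy

# Raise on any parse defect instead of silently recording it.
STRICT = email.policy.compat32.clone(raise_on_defect=True)


def parse_leniently(raw):
    """Return (message, clue); clue is None when the message parsed cleanly."""
    try:
        return email.message_from_string(raw, policy=STRICT), None
    except (email.errors.MessageDefect, email.errors.MessageError) as exc:
        # Fall back: parse the headers only, so the body stays a single
        # string and the message is treated as non-multipart.
        msg = email.parser.HeaderParser().parsestr(raw)
        clue = "%s: Message-ID=%s Subject=%r Date=%s" % (
            type(exc).__name__,
            msg.get("Message-ID"),
            msg.get("Subject"),
            msg.get("Date"),
        )
        return msg, clue


if __name__ == "__main__":
    # Made-up sample message with a multipart header and no boundary.
    sample = ("Message-ID: <sample@example.invalid>\n"
              "Subject: sample\n"
              "Content-Type: multipart/mixed\n"
              "\n"
              "plain text where MIME parts should be\n")
    message, clue = parse_leniently(sample)
    if clue is not None:
        print("malformed message, treated as plain text:", clue)
```

With something along these lines the filter could carry on classifying
the rest of the folder, and the printed clue would tell the user which
message to look at.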

Hope this helps,
Mike Yakoubov.

> Here is the session.
> [13 waynes cmessina]: python2.3 /usr/bin/sb_imapfilter.py -c -e y -D hammie.db
> SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
> using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
> and engine SpamBayes Beta2, version 0.2 (July 2003).
> Traceback (most recent call last):
>   File "/usr/bin/sb_imapfilter.py", line 825, in ?
>     run()
>   File "/usr/bin/sb_imapfilter.py", line 815, in run
>     imap_filter.Filter()
>   File "/usr/bin/sb_imapfilter.py", line 675, in Filter
>     self.unsure_folder)
>   File "/usr/bin/sb_imapfilter.py", line 590, in Filter
>     for msg in self:
>   File "/usr/bin/sb_imapfilter.py", line 485, in __iter__
>     yield self[key]
>   File "/usr/bin/sb_imapfilter.py", line 533, in __getitem__
>     msg.get_substance()
>   File "/usr/bin/sb_imapfilter.py", line 364, in get_substance
>     new_msg = email.Parser.Parser().parsestr(data["RFC822"])
>   File "/var/tmp/python2.3-2.3.2-root/usr/lib/python2.3/email/Parser.py", line 75, in parsestr
>   File "/var/tmp/python2.3-2.3.2-root/usr/lib/python2.3/email/Parser.py", line 64, in parse
>   File "/var/tmp/python2.3-2.3.2-root/usr/lib/python2.3/email/Parser.py", line 245, in _parsebody
> email.Errors.BoundaryError: multipart message with no defined boundary
> I train the filter with the following command:
> python2.3 /usr/bin/sb_imapfilter.py -t -D hammie.db
> Is there some way I can track down the offending message?
> Please let me know if I can provide any more information.
> --
> Chris
> _______________________________________________
> Spambayes at python.org
> http://mail.python.org/mailman/listinfo/spambayes
> Check the FAQ before asking: http://spambayes.sf.net/faq.html
