[spambayes-dev] bug in imap filter or in email package
Tony Meyer
tameyer at ihug.co.nz
Tue Aug 3 02:07:38 CEST 2004
> I noticed that I had way too many Unsures so I did some investigating.
> One message I looked at carefully was a pure HTML message (i.e. not a
> multipart/alternative) which was encoded with base64. Ordinarily
> Spambayes should decode that and tokenize the decoded message.
[...]
> My Python is almost fully up-to-date, the email package is completely
> up-to-date (my last cvs update was after the last change to the email
> component).
This sounds a lot like the bug in the email package that Neil Schemenauer
brought up here very recently. He said that he'd raised it with Barry, but
hadn't submitted a bug report. I'm not sure whether he has since (and I
haven't had a chance to look into it further), but if not, it would probably
be worth your submitting one, so that Barry doesn't forget about it (and maybe
the fix could squeeze into Python 2.4a2, if it's really simple and Barry isn't
too busy).
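For reference, here is a minimal sketch of the decoding step the report expects, using the standard library `email` package. It is written against the modern Python 3 API, not the 2004-era email package discussed in this thread, and the message text is a made-up example rather than the one from the report. The point is that the raw payload of a single-part, base64-encoded `text/html` message is opaque base64, and only `get_payload(decode=True)` yields the HTML a tokenizer should see.

```python
import base64
import email

# Made-up single-part HTML message with a base64 body
# (not multipart/alternative), as described in the report.
html = "<html><body><p>Hello, world</p></body></html>"
raw = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(html.encode("ascii")).decode("ascii")
    + "\n"
)

msg = email.message_from_string(raw)

# Without decoding, the payload is still the base64 text...
encoded = msg.get_payload()
# ...while decode=True undoes the Content-Transfer-Encoding and returns bytes.
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset() or "us-ascii")

print(msg.get_content_type())  # text/html
print(decoded)                 # <html><body><p>Hello, world</p></body></html>
```

If a filter ends up tokenizing `encoded` rather than `decoded`, it sees meaningless base64 tokens. That would be consistent with the surplus of Unsures described above.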
> I went through the steps of what sb_imapfilter.py does by hand and I
> noticed a few things:
>
> Message.asTokens is defined as follows:
>     def asTokens(self):
>         return tokenize(self.as_string())
> and tokenize (which is really Tokenizer.tokenize) does this:
>     def tokenize(self, obj):
>         msg = self.get_message(obj)
> [...]
> and finally, self.get_message (which is really get_message in
> tokenizer.py) creates a Message instance from the argument string.
>
> I have the feeling that this can be made more efficient by having
>     def asTokens(self):
>         return tokenize(self)
> instead. get_message just returns its argument if it is a Message
> instance (which self in Message.asTokens is).
+1 to checking this in.
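A minimal sketch of why passing `self` helps, assuming `get_message` behaves roughly as described above (it returns `Message` instances unchanged and parses strings). Every name here (`get_message`, `tokenize`, `IMAPMessage`, `asTokens_current`, `asTokens_proposed`, `parse_count`) is a hypothetical stand-in, not the actual spambayes code. The counter records how often text is parsed back into a `Message`.

```python
import email
from email.message import Message

parse_count = 0  # how many times get_message() had to parse a string


def get_message(obj):
    """Hypothetical stand-in for tokenizer.get_message."""
    global parse_count
    if isinstance(obj, Message):
        return obj  # already parsed: hand it straight back
    parse_count += 1
    return email.message_from_string(obj)


def tokenize(obj):
    """Hypothetical toy tokenizer: yields lowercased words from the body."""
    msg = get_message(obj)
    for word in msg.get_payload().split():
        yield word.lower()


class IMAPMessage(Message):
    """Hypothetical stand-in for the IMAP filter's Message subclass."""

    def asTokens_current(self):
        # Flatten to text, then get_message() parses it all over again.
        return tokenize(self.as_string())

    def asTokens_proposed(self):
        # Reuse the already-parsed object; no second parse.
        return tokenize(self)


msg = email.message_from_string("Subject: hi\n\nBuy Cheap Stuff\n", _class=IMAPMessage)

parse_count = 0
current = list(msg.asTokens_current())
current_parses = parse_count

parse_count = 0
proposed = list(msg.asTokens_proposed())
proposed_parses = parse_count

print(current, current_parses)    # ['buy', 'cheap', 'stuff'] 1
print(proposed, proposed_parses)  # ['buy', 'cheap', 'stuff'] 0
```

Both variants produce the same tokens for this toy message. The proposed one skips the serialise-and-reparse round trip because the object is already a `Message` subclass.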
=Tony Meyer