[Spambayes] Re: imapfilter progress

David Abrahams dave at boost-consulting.com
Fri Apr 25 15:27:27 EDT 2003


"Meyer, Tony" <T.A.Meyer at massey.ac.nz> writes:

>> > could you run imapfilter with the option "-i4"?
> [...]
>> ...so I set up some test "inbox, spam, unsure" folders and 
>> tried to run:
>> 
>>    python imapfilter.py -v -i 5 -t -c -D ~/bayes.db
>> 
>> Just so we'd have some data to look at.  I got a traceback 
>> while training on ham; the tail of the session is visible here:
>>   http://users.rcn.com/abrahams/imap/bugout.txt
>
> This is odd.  The message that it crashes on doesn't have any crlf's
> (even just a cr or lf) separating the headers.  The ones above it do,
> though.
>
> Does your mailer show this message correctly?  

Believe it or not, yes.  My mailer (Oort GNUs 0.18) also doesn't seem
to be having any problems with IMAP protocols.  Maybe youse guys
should be reading elisp code to see how to handle things ;-)

> If the mailer has an option to show the message (or even the
> headers) in its original form, does it show it correctly?

The header part shows as one long string with no newlines, just as you
said.  I'm guessing (pure conjecture) that header parsing works this
way:  

1. search for a colon
2. search backwards through lower-case letters.
3. If you're now looking at a capital letter, this is a "best
   beginning of a header" and the colon is its end.  If the previous
   character is '-', back up one character and go to step 2.

The last "best beginning of a header" is the beginning of the header.
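
To make the conjecture concrete, here is a rough Python sketch of those
steps.  It is only my guess at the heuristic, not what Gnus or
imapfilter actually does; the function names header_name_start and
split_headers and the sample string are made up for illustration.

```python
import string

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)


def header_name_start(text, colon):
    """Guess where the header name ending at index `colon` begins.

    Returns the index of the name's first character, or None if the
    colon does not look like the end of a header name.
    """
    best = None
    i = colon - 1
    while i >= 0:
        # Step 2: search backwards through lower-case letters.
        while i >= 0 and text[i] in LOWER:
            i -= 1
        # Step 3: a capital letter is a "best beginning of a header".
        if i < 0 or text[i] not in UPPER:
            break
        best = i
        # A preceding '-' means the name may continue (e.g. Content-Type);
        # skip past the hyphen and repeat step 2.
        if i >= 1 and text[i - 1] == '-':
            i -= 2
        else:
            break
    return best


def split_headers(text):
    """Split a run-together header block into (name, value) pairs."""
    found = []
    for colon, ch in enumerate(text):
        if ch == ':':
            start = header_name_start(text, colon)
            if start is not None:
                found.append((start, colon))
    fields = []
    for n, (start, colon) in enumerate(found):
        end = found[n + 1][0] if n + 1 < len(found) else len(text)
        fields.append((text[start:colon], text[colon + 1:end].strip()))
    return fields


if __name__ == "__main__":
    squashed = "From: dave@example.comSubject: helloX-Mailer: Gnus"
    for name, value in split_headers(squashed):
        print("%s: %s" % (name, value))
```

Note that this is easily fooled: a value such as "Re: something" in a
Subject line would be mistaken for a header named "Re", so a real fix
would need a list of known header names or some other sanity check.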

> I'd understand it more if it was imaplib or our imapfilter that was
> stripping the cr/lf chars, but from the trace, it looks like the raw
> text from the server doesn't have them - and the length of the message
> (with no cr/lf) matches the literal length (4278).
>
> I'm baffled.  It would be easy to catch this error, but I don't know
> what to do then - if the headers can't be parsed, then when we try and
> rewrite the message into the correct folder all the headers will be lost
> (again :( ).

Somehow, GNUs is getting it right.

> I could add in a test so that training continues and messages like these
> are just ignored, but I can't think of anything else to try (well,
> except for writing a script to try and insert line endings where there
> should be line endings).  Anyone else got any ideas?

Check out the emacs source, I guess...

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com



