[Spambayes] Eliminating duplicates from mbox file

Skip Montanaro skip at pobox.com
Fri Mar 7 22:28:22 EST 2003


    >> 2. Why did I have to subclass mailbox.PortableUnixMailbox?

    Tim> You shouldn't have to, and you shouldn't have to check for "msg is
    Tim> None" either.  Note that some of the earliest scripts in the
    Tim> codebase don't do either.  For example, from split.py:

        mbox = mailbox.PortableUnixMailbox(infp, mboxutils.get_message)
        for msg in mbox:
            if random.random() < percent:
                outfp = bin1out
        ...

Yeah, I know.  That's how I originally wrote it.  Without the test against
None it just went into an infinite loop.

    >> 3. Is there a better way to emit the unique messages that doesn't
    >> require me to manually escape leading "From " sequences?

    Tim> Looks to me like the email pkg (at least the one in Python CVS)
    Tim> already does the ">From" bit within msg bodies.  

I figured it must.  It has to be something other than the .as_string()
method, though, since that clearly doesn't escape "\nFrom " as "\n>From ".

    Tim> The *leading* "From " isn't supposed to be escaped --

Correct.

    Tim> "From " at the start of a line within a body is supposed to be
    Tim> escaped precisely so that an unescaped "From " at the start of a
    Tim> line is recognized as the start of a new msg.

I guess I was really asking if there's something better than .as_string() to
call when I want to emit a message.  I don't see anything obvious in the
online docs though.
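
One candidate is to drive email.generator.Generator directly rather than go
through .as_string().  The sketch below assumes a current Python 3 email
package, not the 2003 CVS version discussed here; the helper name
to_mbox_text and the example address are made up for illustration.  In
current versions .as_string() builds its Generator with mangle_from_=False,
which would explain the unescaped body lines.

```python
import io
from email.generator import Generator
from email.message import Message


def to_mbox_text(msg):
    """Flatten msg with its envelope line and ">From " body escaping.

    Hypothetical helper: passes mangle_from_=True, which .as_string()
    does not do.
    """
    buf = io.StringIO()
    Generator(buf, mangle_from_=True).flatten(msg, unixfrom=True)
    return buf.getvalue()


# Build a small message whose body has a line starting with "From ".
msg = Message()
msg["Subject"] = "demo"
msg.set_unixfrom("From skip@example.com Fri Mar  7 22:28:22 2003")  # made-up envelope
msg.set_payload("Hello\nFrom here on, the body continues.\n")

plain = msg.as_string(unixfrom=True)   # body line left unescaped
mangled = to_mbox_text(msg)            # body line written as ">From ..."
```

Only body lines are touched; the leading envelope "From " line is written
as-is, so a reader can still find the start of each message.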

Skip


