[Spambayes] Eliminating duplicates from mbox file
skip at pobox.com
Fri Mar 7 22:28:22 EST 2003
>> 2. Why did I have to subclass mailbox.PortableUnixMailbox?
Tim> You shouldn't have to, and you shouldn't have to check for "msg is
Tim> None" either. Note that some of the earliest scripts in the
Tim> codebase don't do either. For example, from split.py:
    mbox = mailbox.PortableUnixMailbox(infp, mboxutils.get_message)
    for msg in mbox:
        if random.random() < percent:
            outfp = bin1out
Yeah, I know. That's how I originally wrote it. Without the test against
None it just went into an infloop.
>> 3. Is there a better way to emit the unique messages that doesn't
>> require me to manually escape leading "From " sequences?
Tim> Looks to me like the email pkg (at least the one in Python CVS)
Tim> already does the ">From" bit within msg bodies.
I figured it must have. Must be something other than the .as_string()
method though. It clearly doesn't escape "\nFrom " as "\n>From ".
Tim> The *leading* "From " isn't supposed to be escaped --
Tim> "From " at the start of a line within a body is supposed to be
Tim> escaped precisely so that an unescaped "From " at the start of a
Tim> line is recognized as the start of a new msg.
I guess I was really asking if there's something better than .as_string() to
call when I want to emit a message. I don't see anything obvious in the
online docs though.
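
For what it's worth, one candidate is the email package's Generator class, which takes a mangle_from_ flag. The sketch below assumes the behaviour of the current standard-library email package, where .as_string() leaves body lines starting with "From " alone and Generator(..., mangle_from_=True) rewrites them as ">From ". The helper name to_mbox_text, the sample message, and the envelope address are made up for illustration, not taken from the spambayes code.

```python
import io
from email import message_from_string
from email.generator import Generator


def to_mbox_text(msg):
    """Flatten msg for an mbox file (hypothetical helper, not from spambayes)."""
    buf = io.StringIO()
    # mangle_from_=True rewrites body lines that begin with "From " as ">From ".
    gen = Generator(buf, mangle_from_=True)
    # unixfrom=True writes the leading "From " envelope line, unescaped.
    gen.flatten(msg, unixfrom=True)
    return buf.getvalue()


# Made-up sample message whose body contains a line starting with "From ".
raw = "Subject: test\n\nHello\nFrom here on it gets tricky\n"
msg = message_from_string(raw)
msg.set_unixfrom("From skip@example.invalid Fri Mar  7 22:28:22 2003")

text = to_mbox_text(msg)
```

Here text starts with the unescaped envelope line and carries ">From here on ..." in the body, while msg.as_string() on the same message keeps the body line unescaped.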