[Tutor] Mailbox.UnixMailbox vs Mailbox.PortableUnixMailbox
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Mon Mar 17 13:50:03 2003
> PS: Just curious about line 28, why you chose UnixMailbox over
> PortableUnixMailbox.
Hi Erik,
There's a reason: PortableUnixMailbox was much too permissive in what it
considered to be "boundaries" between emails in a mail file, and broke
when I tried feeding it the total wisdom of the Tutor mailing archive.
A mailbox file consists of all the emails, concatenated end to end. How
does the system distinguish where one message ends and the next begins?
One way is to look for a line that begins with something like
From <sender address> <date>
(note: "From" followed by a space, not the "From:" header) and treat that
as the start of a new message. This is probably the strategy that
UnixMailbox takes (I think it checks the rest of the line to make sure
it's really seeing the start of a message and not ordinary text).
PortableUnixMailbox is a little looser: it looks for anything like
From ...
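To make the difference concrete, here is a minimal sketch of the two rules in plain Python. It does not call the mailbox module; the names is_boundary_loose, is_boundary_strict and STRICT_FROM are made up for illustration, and the regular expression is only my approximation of a "sender plus date" check, not the module's actual pattern.

```python
import re

# Assumption: an approximate "strict" separator pattern, i.e.
# "From <sender> <weekday> <month> <day> <hh:mm[:ss]> <year>".
# This is an illustration, not the regex the mailbox module uses.
STRICT_FROM = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?\s+\d{4}\s*$"
)

def is_boundary_loose(line):
    """Loose rule: any line that starts with 'From ' counts as a boundary."""
    return line.startswith("From ")

def is_boundary_strict(line):
    """Strict rule: the rest of the line must look like a sender and a date."""
    return STRICT_FROM.match(line) is not None
```

With these definitions, an ordinary sentence that happens to start with "From " passes the loose check but fails the strict one, which is the whole difference between the two classes as far as I can tell.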
At first, I tried PortableUnixMailbox because it sounded, well, more
portable. *grin* But I ran into severe problems because people like to
use the word "From" in their own emails, so that, in a pathological case
where a line in the message body began with "From ", PortableUnixMailbox
wasn't able to reliably distinguish between emails! A concrete example of this
was on line 8928 of the tutor archive file
(http://mail.python.org/pipermail/tutor.mbox/tutor.mbox):
"""
8927: Great, isn't it?
8928: From here on, you can continue endlessly and keep it fun
8929: to the audience.
"""
PortableUnixMailbox broke when it saw that "From here on" line, and
thought that it was the start of a new mail message. The documentation on
the 'mailbox' module does mention this,
http://www.python.org/doc/lib/module-mailbox.html
so at least I was warned.
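Here is a small, self-contained reproduction of that failure. It is only a sketch: the two-message mailbox, the example.com addresses and the split_messages helper are invented for the demonstration (only the three body lines come from the archive), and the strict pattern is an approximation rather than what the mailbox module really uses.

```python
import re

# Assumption: approximate strict separator check (sender followed by a date).
STRICT_FROM = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?\s+\d{4}\s*$"
)

# Hypothetical two-message mailbox; the first body contains the troublesome line.
SAMPLE = (
    "From alice@example.com Mon Mar 17 13:50:03 2003\n"
    "Subject: storytelling\n"
    "\n"
    "Great, isn't it?\n"
    "From here on, you can continue endlessly and keep it fun\n"
    "to the audience.\n"
    "\n"
    "From bob@example.com Mon Mar 17 14:02:11 2003\n"
    "Subject: Re: storytelling\n"
    "\n"
    "Agreed!\n"
)

def split_messages(text, is_boundary):
    """Group lines into messages, starting a new one at each boundary line."""
    messages = []
    for line in text.splitlines():
        if is_boundary(line):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
    return messages

loose_split = split_messages(SAMPLE, lambda line: line.startswith("From "))
strict_split = split_messages(SAMPLE, lambda line: STRICT_FROM.match(line) is not None)

print(len(loose_split), "messages with the loose rule")    # 3
print(len(strict_split), "messages with the strict rule")  # 2
```

The loose rule cuts the first message in two at "From here on", so it reports three messages where there are really two.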
Hmmm... now that you mention it, I should definitely comment why I'm using
UnixMailbox rather than PortableUnixMailbox in the code. *grin*
Thanks for the question!