[Python-checkins] python/nondist/sandbox/mailbox libmailbox.tex, 1.12, 1.13 mailbox.py, 1.12, 1.13
gregorykjohnson@users.sourceforge.net
Sun Aug 21 21:02:58 CEST 2005
- Previous message: [Python-checkins] python/dist/src/Doc/lib lib.tex, 1.239, 1.240 libhmac.tex, 1.1, 1.2 libmd5.tex, 1.21, 1.22 libsha.tex, 1.12, 1.13
- Next message: [Python-checkins] python/nondist/sandbox/mailbox mailbox.py, 1.13, 1.14 test_mailbox.py, 1.8, 1.9
Update of /cvsroot/python/python/nondist/sandbox/mailbox
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3406
Modified Files:
libmailbox.tex mailbox.py
Log Message:
* Add mailbox classes and add next() method to Maildir, for backward
compatibility.
* Document backward compatibility.
* Add examples to documentation.
* Remove unnecessary len() check before string slicing (as pointed out by
A.M. Kuchling).
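For reference, the reason the len() check is redundant can be shown in a few lines. This is a minimal sketch and not the code from mailbox.py: `get_flags` here is a hypothetical free function that takes the info string as an argument, whereas the real method reads `self._info`. Slicing a string past its end yields an empty string instead of raising, so the `startswith('2,')` test is the only guard needed.

```python
def get_flags(info):
    """Return the flag characters of a Maildir-style info string.

    Hypothetical standalone helper for illustration only.
    """
    # No len() check is needed: '2,'[2:] is simply '' and never raises.
    if info.startswith('2,'):
        return info[2:]
    return ''


# A few inputs, including the edge cases the old len() check guarded against.
samples = ['2,FS', '2,', '', '1,x']
results = [get_flags(s) for s in samples]
```

The edge case of interest is `'2,'`, which has length 2 and for which the slice returns an empty string.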
Index: libmailbox.tex
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/mailbox/libmailbox.tex,v
retrieving revision 1.12
retrieving revision 1.13
diff -u -d -r1.12 -r1.13
--- libmailbox.tex 17 Aug 2005 20:32:36 -0000 1.12
+++ libmailbox.tex 21 Aug 2005 19:02:47 -0000 1.13
@@ -14,27 +14,6 @@
class with format-specific state and behavior. Supported mailbox formats are
Maildir, mbox, MH, Babyl, and MMDF.
-An example of using the module to sort mail:
-
-\begin{verbatim}
->>> import mailbox
->>> inbox = mailbox.Maildir('~/Maildir', None)
->>> python_box = mailbox.Maildir('~/email/python-list', None)
->>> len(inbox) # Number of messages.
-13
->>> len(python_box)
-818
->>> for key, message in inbox.iteritems():
-...     if 'python-list' in message['list-id']:
-...         python_box.add(message) # Add the message to python_box
-...         del inbox[key] # and remove it from inbox.
-...
->>> len(inbox)
-2
->>> len(python_box)
-829
-\end{verbatim}
-
\begin{seealso}
\seemodule{email}{Represent and manipulate messages.}
\end{seealso}
@@ -1185,8 +1164,192 @@
\lineii{A flag}{A flag}
\end{tableii}
-\subsection{Deprecated classes}
-\label{mailbox-deprecated-classes}
+\subsection{Deprecated classes and methods}
+\label{mailbox-deprecated}
+
+Older versions of the \module{mailbox} module do not support modification of
+mailboxes, such as adding or removing messages, and do not provide classes to
+represent format-specific message properties. For backward compatibility, the
+older mailbox classes are still available, but the newer classes should be used
+in preference to them.
+
+Older mailbox objects support only iteration and provide a single public
+method:
+
+\begin{methoddesc}{next}{}
+Return the next message in the mailbox, created with the optional \var{factory}
+argument passed into the mailbox object's constructor. By default this is an
+\class{rfc822.Message} object (see the \refmodule{rfc822} module). Depending
+on the mailbox implementation the \var{fp} attribute of this object may be a
+true file object or a class instance simulating a file object, taking care of
+things like message boundaries if multiple mail messages are contained in a
+single file, etc. If no more messages are available, this method returns
+\code{None}.
+\end{methoddesc}
+
+Most of the older mailbox classes have names that differ from the current
+mailbox class names, except for \class{Maildir}. For this reason, the new
+\class{Maildir} class defines a \method{next()} method and its constructor
+differs slightly from those of the other new mailbox classes.
+
+The older mailbox classes whose names are not the same as their newer
+counterparts are as follows:
+
+\begin{classdesc}{UnixMailbox}{fp\optional{, factory}}
+Access to a classic \UNIX-style mailbox, where all messages are
+contained in a single file and separated by \samp{From }
+(a.k.a.\ \samp{From_}) lines. The file object \var{fp} points to the
+mailbox file. The optional \var{factory} parameter is a callable that
+should create new message objects. \var{factory} is called with one
+argument, \var{fp}, by the \method{next()} method of the mailbox
+object. The default is the \class{rfc822.Message} class (see the
+\refmodule{rfc822} module -- and the note below).
+
+\begin{notice}
+ For reasons of this module's internal implementation, you will
+ probably want to open the \var{fp} object in binary mode. This is
+ especially important on Windows.
+\end{notice}
+
+For maximum portability, messages in a \UNIX-style mailbox are
+separated by any line that begins exactly with the string \code{'From
+'} (note the trailing space) if preceded by exactly two newlines.
+Because of the wide range of variations in practice, nothing else on
+the From_ line should be considered. However, the current
+implementation doesn't check for the leading two newlines. This is
+usually fine for most applications.
+
+The \class{UnixMailbox} class implements a more strict version of
+From_ line checking, using a regular expression that usually correctly
+matches From_ delimiters. It considers messages to be separated
+by \samp{From \var{name} \var{time}} lines. For maximum portability,
+use the \class{PortableUnixMailbox} class instead. This class is
+identical to \class{UnixMailbox} except that individual messages are
+separated by only \samp{From } lines.
+
+For more information, see
+\citetitle[http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html]{Configuring
+Netscape Mail on \UNIX: Why the Content-Length Format is Bad}.
+\end{classdesc}
+
+\begin{classdesc}{PortableUnixMailbox}{fp\optional{, factory}}
+A less-strict version of \class{UnixMailbox}, which considers only the
+\samp{From } at the beginning of the line separating messages. The
+``\var{name} \var{time}'' portion of the From line is ignored, to
+protect against some variations that are observed in practice. This
+works since lines in the message which begin with \code{'From '} are
+quoted by mail handling software at delivery-time.
+\end{classdesc}
+
+\begin{classdesc}{MmdfMailbox}{fp\optional{, factory}}
+Access an MMDF-style mailbox, where all messages are contained
+in a single file and separated by lines consisting of 4 control-A
+characters. The file object \var{fp} points to the mailbox file.
+Optional \var{factory} is as with the \class{UnixMailbox} class.
+\end{classdesc}
+
+\begin{classdesc}{MHMailbox}{dirname\optional{, factory}}
+Access an MH mailbox, a directory with each message in a separate
+file with a numeric name.
+The name of the mailbox directory is passed in \var{dirname}.
+\var{factory} is as with the \class{UnixMailbox} class.
+\end{classdesc}
+
+\begin{classdesc}{BabylMailbox}{fp\optional{, factory}}
+Access a Babyl mailbox, which is similar to an MMDF mailbox. In
+Babyl format, each message has two sets of headers, the
+\emph{original} headers and the \emph{visible} headers. The original
+headers appear before a line containing only \code{'*** EOOH ***'}
+(End-Of-Original-Headers) and the visible headers appear after the
+\code{EOOH} line. Babyl-compliant mail readers will show you only the
+visible headers, and \class{BabylMailbox} objects will return messages
+containing only the visible headers. You'll have to do your own
+parsing of the mailbox file to get at the original headers. Mail
+messages start with the EOOH line and end with a line containing only
+\code{'\e{}037\e{}014'}. \var{factory} is as with the
+\class{UnixMailbox} class.
+\end{classdesc}
+
+If you wish to use the older mailbox classes with the \module{email} module
+rather than the deprecated \module{rfc822} module, you can do so as follows:
+
+\begin{verbatim}
+import email
+import email.Errors
+import mailbox
+
+def msgfactory(fp):
+    try:
+        return email.message_from_file(fp)
+    except email.Errors.MessageParseError:
+        # Don't return None since that will
+        # stop the mailbox iterator
+        return ''
+
+mbox = mailbox.UnixMailbox(fp, msgfactory)
+\end{verbatim}
+
+Alternatively, if you know your mailbox contains only well-formed MIME
+messages, you can simplify this to:
+
+\begin{verbatim}
+import email
+import mailbox
+
+mbox = mailbox.UnixMailbox(fp, email.message_from_file)
+\end{verbatim}
\subsection{Examples}
\label{mailbox-examples}
+
+A simple example of printing the subjects of all messages in a mailbox that
+seem interesting:
+
+\begin{verbatim}
+import mailbox
+for message in mailbox.mbox('~/mbox'):
+    subject = message['subject'] # Could possibly be None.
+    if subject and 'python' in subject.lower():
+        print subject
+\end{verbatim}
+
+A (surprisingly) simple example of copying all mail from a Babyl mailbox to an
+MH mailbox, converting all of the format-specific information that can be
+converted:
+
+\begin{verbatim}
+import mailbox
+destination = mailbox.MH('~/Mail')
+for message in mailbox.Babyl('~/RMAIL'):
+    destination.add(mailbox.MHMessage(message))
+\end{verbatim}
+
+An example of sorting mail from numerous mailing lists, being careful to avoid
+mail corruption due to concurrent modification by other programs, mail loss due
+to interruption of the program, or premature termination due to malformed
+messages in the mailbox:
+
+\begin{verbatim}
+import mailbox
+import email.Errors
+list_names = ('python-list', 'python-dev', 'python-bugs')
+boxes = dict((name, mailbox.mbox('~/email/%s' % name)) for name in list_names)
+inbox = mailbox.Maildir('~/Maildir', None)
+for key in inbox.iterkeys():
+    try:
+        message = inbox[key]
+    except email.Errors.MessageParseError:
+        continue # The message is malformed. Just leave it.
+    for name in list_names:
+        list_id = message['list-id']
+        if list_id and name in list_id:
+            box = boxes[name]
+            box.lock()
+            box.add(message)
+            box.flush() # Write copy to disk before removing original.
+            box.unlock()
+            inbox.discard(key)
+            break # Found destination, so stop looking.
+for box in boxes.itervalues():
+    box.close()
+\end{verbatim}
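The difference between the strict and portable From_ line checks described in the documentation above can be sketched as follows. The regular expression is copied from `UnixMailbox._fromlinepattern` in this commit; the function names `is_strict_from_line` and `is_portable_from_line` and the sample lines are hypothetical, and the sketch is written as standalone modern Python, not as the module's own code.

```python
import re

# Pattern copied from UnixMailbox._fromlinepattern in this commit.
FROM_LINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")


def is_strict_from_line(line):
    """Strict check: 'From ' followed by a sender and a date-like tail."""
    return line.startswith('From ') and FROM_LINE.match(line) is not None


def is_portable_from_line(line):
    """Portable check: any line that starts with 'From ' is a delimiter."""
    return line.startswith('From ')


# Hypothetical sample lines: a real envelope line and an unquoted body line.
envelope = 'From alice@example.com Sun Aug 21 21:02:58 2005\n'
body_text = 'From the beginning, it was clear\n'

strict_results = (is_strict_from_line(envelope), is_strict_from_line(body_text))
portable_results = (is_portable_from_line(envelope), is_portable_from_line(body_text))
```

The portable check accepts the body line as a delimiter, which is why it relies on delivery software quoting such lines; the strict check rejects it but may also reject unusual yet legitimate envelope lines.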
Index: mailbox.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/mailbox/mailbox.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -u -d -r1.12 -r1.13
--- mailbox.py 19 Aug 2005 19:25:55 -0000 1.12
+++ mailbox.py 21 Aug 2005 19:02:47 -0000 1.13
@@ -21,7 +21,8 @@
 __all__ = [ 'Mailbox', 'Maildir', 'mbox', 'MH', 'Babyl', 'MMDF',
             'Message', 'MaildirMessage', 'mboxMessage', 'MHMessage',
-            'BabylMessage', 'MMDFMessage' ]
+            'BabylMessage', 'MMDFMessage', 'UnixMailbox',
+            'PortableUnixMailbox', 'MmdfMailbox', 'MHMailbox', 'BabylMailbox' ]
class Mailbox:
@@ -447,6 +448,16 @@
         except KeyError:
             raise KeyError('No message with key: %s' % key)
+    # This method is for backward compatibility only.
+    def next(self):
+        """Return the next message in a one-time iteration."""
+        if not hasattr(self, '_onetime_iterator'):
+            self._onetime_iterator = self.itervalues()
+        try:
+            return self._onetime_iterator.next()
+        except StopIteration:
+            return None
+
 class _singlefileMailbox(Mailbox):
     """A single-file mailbox."""
@@ -1310,7 +1321,7 @@
     def get_flags(self):
         """Return as a string the flags that are set."""
-        if len(self._info) > 2 and self._info.startswith('2,'):
+        if self._info.startswith('2,'):
             return self._info[2:]
         else:
             return ''
@@ -1843,6 +1854,184 @@
socket.gethostname(),
os.getpid()))
+
+## Start: classes from the original module (for backward compatibility).
+
+# Note that the Maildir class, whose name is unchanged, itself offers a next()
+# method for backward compatibility.
+
+class _Mailbox:
+
+    def __init__(self, fp, factory=rfc822.Message):
+        self.fp = fp
+        self.seekp = 0
+        self.factory = factory
+
+    def __iter__(self):
+        return iter(self.next, None)
+
+    def next(self):
+        while 1:
+            self.fp.seek(self.seekp)
+            try:
+                self._search_start()
+            except EOFError:
+                self.seekp = self.fp.tell()
+                return None
+            start = self.fp.tell()
+            self._search_end()
+            self.seekp = stop = self.fp.tell()
+            if start != stop:
+                break
+        return self.factory(_PartialFile(self.fp, start, stop))
+
+# Recommended to use PortableUnixMailbox instead!
+class UnixMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line[:5] == 'From ' and self._isrealfromline(line):
+                self.fp.seek(pos)
+                return
+
+    def _search_end(self):
+        self.fp.readline() # Throw away header line
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line[:5] == 'From ' and self._isrealfromline(line):
+                self.fp.seek(pos)
+                return
+
+    # An overridable mechanism to test for From-line-ness. You can either
+    # specify a different regular expression or define a whole new
+    # _isrealfromline() method. Note that this only gets called for lines
+    # starting with the 5 characters "From ".
+    #
+    # BAW: According to
+    #http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html
+    # the only portable, reliable way to find message delimiters in a BSD (i.e
+    # Unix mailbox) style folder is to search for "\n\nFrom .*\n", or at the
+    # beginning of the file, "^From .*\n". While _fromlinepattern below seems
+    # like a good idea, in practice, there are too many variations for more
+    # strict parsing of the line to be completely accurate.
+    #
+    # _strict_isrealfromline() is the old version which tries to do stricter
+    # parsing of the From_ line. _portable_isrealfromline() simply returns
+    # true, since it's never called if the line doesn't already start with
+    # "From ".
+    #
+    # This algorithm, and the way it interacts with _search_start() and
+    # _search_end() may not be completely correct, because it doesn't check
+    # that the two characters preceding "From " are \n\n or the beginning of
+    # the file. Fixing this would require a more extensive rewrite than is
+    # necessary. For convenience, we've added a PortableUnixMailbox class
+    # which uses the more lenient _fromlinepattern regular expression.
+
+    _fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
+                       r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
+    _regexp = None
+
+    def _strict_isrealfromline(self, line):
+        if not self._regexp:
+            import re
+            self._regexp = re.compile(self._fromlinepattern)
+        return self._regexp.match(line)
+
+    def _portable_isrealfromline(self, line):
+        return True
+
+    _isrealfromline = _strict_isrealfromline
+
+
+class PortableUnixMailbox(UnixMailbox):
+    _isrealfromline = UnixMailbox._portable_isrealfromline
+
+
+class MmdfMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line[:5] == '\001\001\001\001\n':
+                return
+
+    def _search_end(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line == '\001\001\001\001\n':
+                self.fp.seek(pos)
+                return
+
+
+class MHMailbox:
+
+    def __init__(self, dirname, factory=rfc822.Message):
+        import re
+        pat = re.compile('^[1-9][0-9]*$')
+        self.dirname = dirname
+        # the three following lines could be combined into:
+        # list = map(long, filter(pat.match, os.listdir(self.dirname)))
+        list = os.listdir(self.dirname)
+        list = filter(pat.match, list)
+        list = map(long, list)
+        list.sort()
+        # This only works in Python 1.6 or later;
+        # before that str() added 'L':
+        self.boxes = map(str, list)
+        self.boxes.reverse()
+        self.factory = factory
+
+    def __iter__(self):
+        return iter(self.next, None)
+
+    def next(self):
+        if not self.boxes:
+            return None
+        fn = self.boxes.pop()
+        fp = open(os.path.join(self.dirname, fn))
+        msg = self.factory(fp)
+        try:
+            msg._mh_msgno = fn
+        except (AttributeError, TypeError):
+            pass
+        return msg
+
+
+class BabylMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line == '*** EOOH ***\n':
+                return
+
+    def _search_end(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line == '\037\014\n' or line == '\037':
+                self.fp.seek(pos)
+                return
+
+## End: classes from the original module (for backward compatibility).
+
+
 class Error(Exception):
     """Raised for module-specific errors."""
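The backward-compatible next() protocol used in this commit (return messages one at a time, then None) pairs naturally with the two-argument form of iter(), as the old classes do in `__iter__`. The following is a minimal sketch in modern Python, assuming a hypothetical `OneTimeBox` class that holds its messages in a list; it is not the Maildir implementation, and it combines the lazy one-time iterator from `Maildir.next()` with the sentinel-based `__iter__` from `_Mailbox` purely for illustration.

```python
class OneTimeBox:
    """Hypothetical container illustrating the old next() protocol."""

    def __init__(self, messages):
        self._messages = list(messages)

    def itervalues(self):
        return iter(self._messages)

    def next(self):
        """Return the next message, or None once the box is exhausted."""
        # The iterator is created lazily on first use and then reused,
        # so the box can only be walked through once.
        if not hasattr(self, '_onetime_iterator'):
            self._onetime_iterator = self.itervalues()
        try:
            return next(self._onetime_iterator)
        except StopIteration:
            return None

    def __iter__(self):
        # iter(callable, sentinel) calls self.next() until it returns None.
        return iter(self.next, None)


box = OneTimeBox(['first', 'second', 'third'])
head = box.next()   # consumes one message
rest = list(box)    # iteration resumes where next() left off
tail = box.next()   # the box is now exhausted
```

Because None is the sentinel, a factory that returns None for an unparseable message would end iteration early, which is why the msgfactory example in the documentation returns an empty string instead.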