[Python-checkins] python/nondist/sandbox/mailbox libmailbox.tex, 1.12, 1.13 mailbox.py, 1.12, 1.13

gregorykjohnson at users.sourceforge.net
Sun Aug 21 21:02:58 CEST 2005


Update of /cvsroot/python/python/nondist/sandbox/mailbox
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3406

Modified Files:
	libmailbox.tex mailbox.py 
Log Message:
* Add the older mailbox classes and a next() method to Maildir, for backward
  compatibility.
* Document backward compatibility.
* Add examples to documentation.
* Remove unnecessary len() check before string slicing (as pointed out by
  A.M. Kuchling).
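
The last log item relies on a property of Python slicing: a slice that starts
at or past the end of a string yields an empty string rather than raising, so
once startswith('2,') has succeeded no length check is needed. A minimal
sketch, assuming a hypothetical standalone function in place of the real
get_flags() method:

```python
def get_flags(info):
    """Return the flag characters from a Maildir "info" string such as '2,FS'.

    Hypothetical standalone version of the method, for illustration only.
    """
    if info.startswith('2,'):
        # Slicing never raises here: '2,'[2:] is simply ''.
        return info[2:]
    return ''


flags_present = get_flags('2,FS')   # info string with flags
flags_empty = get_flags('2,')       # info string with no flags
flags_other = get_flags('1,abc')    # not a "2," info string
```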


Index: libmailbox.tex
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/mailbox/libmailbox.tex,v
retrieving revision 1.12
retrieving revision 1.13
diff -u -d -r1.12 -r1.13
--- libmailbox.tex	17 Aug 2005 20:32:36 -0000	1.12
+++ libmailbox.tex	21 Aug 2005 19:02:47 -0000	1.13
@@ -14,27 +14,6 @@
 class with format-specific state and behavior. Supported mailbox formats are
 Maildir, mbox, MH, Babyl, and MMDF.
 
-An example of using the module to sort mail:
-
-\begin{verbatim}
->>> import mailbox
->>> inbox = mailbox.Maildir('~/Maildir', None)
->>> python_box = mailbox.Maildir('~/email/python-list', None)
->>> len(inbox)          # Number of messages.
-13
->>> len(python_box)
-818
->>> for key, message in inbox.iteritems():
-...     if 'python-list' in message['list-id']:
-...         python_box.add(message)         # Add the message to python_box
-...         del inbox[key]                  # and remove it from inbox.
-...
->>> len(inbox)
-2
->>> len(python_box)
-829
-\end{verbatim}
-
 \begin{seealso}
     \seemodule{email}{Represent and manipulate messages.}
 \end{seealso}
@@ -1185,8 +1164,192 @@
 \lineii{A flag}{A flag}
 \end{tableii}
 
-\subsection{Deprecated classes}
-\label{mailbox-deprecated-classes}
+\subsection{Deprecated classes and methods}
+\label{mailbox-deprecated}
+
+Older versions of the \module{mailbox} module do not support modification of
+mailboxes, such as adding or removing messages, and do not provide classes to
+represent format-specific message properties. For backward compatibility, the
+older mailbox classes are still available, but the newer classes should be used
+in preference to them.
+
+Older mailbox objects support only iteration and provide a single public
+method:
+
+\begin{methoddesc}{next}{}
+Return the next message in the mailbox, created with the optional \var{factory}
+argument passed into the mailbox object's constructor. By default this is an
+\class{rfc822.Message} object (see the \refmodule{rfc822} module).  Depending
+on the mailbox implementation the \var{fp} attribute of this object may be a
+true file object or a class instance simulating a file object, taking care of
+things like message boundaries if multiple mail messages are contained in a
+single file, etc.  If no more messages are available, this method returns
+\code{None}.
+\end{methoddesc}
+
+Most of the older mailbox classes have names that differ from the current
+mailbox class names, except for \class{Maildir}. For this reason, the new
+\class{Maildir} class defines a \method{next()} method and its constructor
+differs slightly from those of the other new mailbox classes.
+
+The older mailbox classes whose names are not the same as their newer
+counterparts are as follows:
+
+\begin{classdesc}{UnixMailbox}{fp\optional{, factory}}
+Access to a classic \UNIX-style mailbox, where all messages are
+contained in a single file and separated by \samp{From }
+(a.k.a.\ \samp{From_}) lines.  The file object \var{fp} points to the
+mailbox file.  The optional \var{factory} parameter is a callable that
+should create new message objects.  \var{factory} is called with one
+argument, \var{fp}, by the \method{next()} method of the mailbox
+object.  The default is the \class{rfc822.Message} class (see the
+\refmodule{rfc822} module -- and the note below).
+
+\begin{notice}
+  For reasons of this module's internal implementation, you will
+  probably want to open the \var{fp} object in binary mode.  This is
+  especially important on Windows.
+\end{notice}
+
+For maximum portability, messages in a \UNIX-style mailbox are
+separated by any line that begins exactly with the string \code{'From
+'} (note the trailing space) if preceded by exactly two newlines.
+Because of the wide range of variations in practice, nothing else on
+the From_ line should be considered.  However, the current
+implementation doesn't check for the leading two newlines.  This is
+usually fine for most applications.
+
+The \class{UnixMailbox} class implements a stricter version of
+From_ line checking, using a regular expression that usually matches
+From_ delimiters correctly.  It considers messages to be separated
+by \samp{From \var{name} \var{time}} lines.  For maximum portability,
+use the \class{PortableUnixMailbox} class instead.  This class is
+identical to \class{UnixMailbox} except that individual messages are
+separated by only \samp{From } lines.
+
+For more information, see
+\citetitle[http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html]{Configuring
+Netscape Mail on \UNIX: Why the Content-Length Format is Bad}.
+\end{classdesc}
+
+\begin{classdesc}{PortableUnixMailbox}{fp\optional{, factory}}
+A less-strict version of \class{UnixMailbox}, which considers only the
+\samp{From } at the beginning of the line separating messages.  The
+``\var{name} \var{time}'' portion of the From line is ignored, to
+protect against some variations that are observed in practice.  This
+works since lines in the message which begin with \code{'From '} are
+quoted by mail handling software at delivery-time.
+\end{classdesc}
+
+\begin{classdesc}{MmdfMailbox}{fp\optional{, factory}}
+Access an MMDF-style mailbox, where all messages are contained
+in a single file and separated by lines consisting of 4 control-A
+characters.  The file object \var{fp} points to the mailbox file.
+Optional \var{factory} is as with the \class{UnixMailbox} class.
+\end{classdesc}
+
+\begin{classdesc}{MHMailbox}{dirname\optional{, factory}}
+Access an MH mailbox, a directory with each message in a separate
+file with a numeric name.
+The name of the mailbox directory is passed in \var{dirname}.
+\var{factory} is as with the \class{UnixMailbox} class.
+\end{classdesc}
+
+\begin{classdesc}{BabylMailbox}{fp\optional{, factory}}
+Access a Babyl mailbox, which is similar to an MMDF mailbox.  In
+Babyl format, each message has two sets of headers, the
+\emph{original} headers and the \emph{visible} headers.  The original
+headers appear before a line containing only \code{'*** EOOH ***'}
+(End-Of-Original-Headers) and the visible headers appear after the
+\code{EOOH} line.  Babyl-compliant mail readers will show you only the
+visible headers, and \class{BabylMailbox} objects will return messages
+containing only the visible headers.  You'll have to do your own
+parsing of the mailbox file to get at the original headers.  Mail
+messages start with the EOOH line and end with a line containing only
+\code{'\e{}037\e{}014'}.  \var{factory} is as with the
+\class{UnixMailbox} class.
+\end{classdesc}
+
+If you wish to use the older mailbox classes with the \module{email} module
+rather than the deprecated \module{rfc822} module, you can do so as follows:
+
+\begin{verbatim}
+import email
+import email.Errors
+import mailbox
+
+def msgfactory(fp):
+    try:
+        return email.message_from_file(fp)
+    except email.Errors.MessageParseError:
+        # Don't return None since that will
+        # stop the mailbox iterator
+        return ''
+
+mbox = mailbox.UnixMailbox(fp, msgfactory)
+\end{verbatim}
+
+Alternatively, if you know your mailbox contains only well-formed MIME
+messages, you can simplify this to:
+
+\begin{verbatim}
+import email
+import mailbox
+
+mbox = mailbox.UnixMailbox(fp, email.message_from_file)
+\end{verbatim}
 
 \subsection{Examples}
 \label{mailbox-examples}
+
+A simple example of printing the subjects of all messages in a mailbox that
+seem interesting:
+
+\begin{verbatim}
+import mailbox
+for message in mailbox.mbox('~/mbox'):
+    subject = message['subject']       # Could possibly be None.
+    if subject and 'python' in subject.lower():
+        print subject
+\end{verbatim}
+
+A (surprisingly) simple example of copying all mail from a Babyl mailbox to an
+MH mailbox, converting all of the format-specific information that can be
+converted:
+
+\begin{verbatim}
+import mailbox
+destination = mailbox.MH('~/Mail')
+for message in mailbox.Babyl('~/RMAIL'):
+    destination.add(mailbox.MHMessage(message))
+\end{verbatim}
+
+An example of sorting mail from numerous mailing lists, being careful to avoid
+mail corruption due to concurrent modification by other programs, mail loss due
+to interruption of the program, or premature termination due to malformed
+messages in the mailbox:
+
+\begin{verbatim}
+import mailbox
+import email.Errors
+list_names = ('python-list', 'python-dev', 'python-bugs')
+boxes = dict((name, mailbox.mbox('~/email/%s' % name)) for name in list_names)
+inbox = mailbox.Maildir('~/Maildir', None)
+for key in inbox.iterkeys():
+    try:
+        message = inbox[key]
+    except email.Errors.MessageParseError:
+        continue                # The message is malformed. Just leave it.
+    for name in list_names:
+        list_id = message['list-id']
+        if list_id and name in list_id:
+            box = boxes[name]
+            box.lock()
+            box.add(message)
+            box.flush()         # Write copy to disk before removing original.
+            box.unlock()
+            inbox.discard(key)
+            break               # Found destination, so stop looking.
+for box in boxes.itervalues():
+    box.close()
+\end{verbatim}
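
The first documentation example above (printing interesting subjects) can be
tried without touching a real mailbox. The sketch below is an assumption-laden
variant, not part of the commit: it uses Python 3 syntax and the mailbox module
as it later shipped in the standard library, and the function name, temporary
path, and sample subjects are all hypothetical.

```python
import mailbox
import os
import tempfile


def interesting_subjects(path, keyword="python"):
    """Return the subjects in the mbox at `path` that contain `keyword`."""
    box = mailbox.mbox(path)
    try:
        subjects = []
        for message in box:
            subject = message["subject"]   # None when the header is absent
            if subject and keyword in subject.lower():
                subjects.append(subject)
        return subjects
    finally:
        box.close()


# Build a small throwaway mbox so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.mbox")
box = mailbox.mbox(path)
for subject in ("Python 2.5 plans", "Lunch on Friday", None):
    msg = mailbox.mboxMessage()
    if subject is not None:
        msg["Subject"] = subject
    msg.set_payload("body\n")
    box.add(msg)
box.flush()                                # write the messages to disk
box.close()

found = interesting_subjects(path)
```

The message without a Subject header exercises the "could possibly be None"
case noted in the documentation example.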

Index: mailbox.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/mailbox/mailbox.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -u -d -r1.12 -r1.13
--- mailbox.py	19 Aug 2005 19:25:55 -0000	1.12
+++ mailbox.py	21 Aug 2005 19:02:47 -0000	1.13
@@ -21,7 +21,8 @@
 
 __all__ = [ 'Mailbox', 'Maildir', 'mbox', 'MH', 'Babyl', 'MMDF',
             'Message', 'MaildirMessage', 'mboxMessage', 'MHMessage',
-            'BabylMessage', 'MMDFMessage' ]
+            'BabylMessage', 'MMDFMessage', 'UnixMailbox',
+            'PortableUnixMailbox', 'MmdfMailbox', 'MHMailbox', 'BabylMailbox' ]
 
 
 class Mailbox:
@@ -447,6 +448,16 @@
         except KeyError:
             raise KeyError('No message with key: %s' % key)
 
+    # This method is for backward compatibility only.
+    def next(self):
+        """Return the next message in a one-time iteration."""
+        if not hasattr(self, '_onetime_iterator'):
+            self._onetime_iterator = self.itervalues()
+        try:
+            return self._onetime_iterator.next()
+        except StopIteration:
+            return None
+
 
 class _singlefileMailbox(Mailbox):
     """A single-file mailbox."""
@@ -1310,7 +1321,7 @@
 
     def get_flags(self):
         """Return as a string the flags that are set."""
-        if len(self._info) > 2 and self._info.startswith('2,'):
+        if self._info.startswith('2,'):
             return self._info[2:]
         else:
             return ''
@@ -1843,6 +1854,184 @@
                                               socket.gethostname(),
                                               os.getpid()))
 
+
+## Start: classes from the original module (for backward compatibility).
+
+# Note that the Maildir class, whose name is unchanged, itself offers a next()
+# method for backward compatibility.
+
+class _Mailbox:
+
+    def __init__(self, fp, factory=rfc822.Message):
+        self.fp = fp
+        self.seekp = 0
+        self.factory = factory
+
+    def __iter__(self):
+        return iter(self.next, None)
+
+    def next(self):
+        while 1:
+            self.fp.seek(self.seekp)
+            try:
+                self._search_start()
+            except EOFError:
+                self.seekp = self.fp.tell()
+                return None
+            start = self.fp.tell()
+            self._search_end()
+            self.seekp = stop = self.fp.tell()
+            if start != stop:
+                break
+        return self.factory(_PartialFile(self.fp, start, stop))
+
+# Recommended to use PortableUnixMailbox instead!
+class UnixMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line[:5] == 'From ' and self._isrealfromline(line):
+                self.fp.seek(pos)
+                return
+
+    def _search_end(self):
+        self.fp.readline()      # Throw away header line
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line[:5] == 'From ' and self._isrealfromline(line):
+                self.fp.seek(pos)
+                return
+
+    # An overridable mechanism to test for From-line-ness.  You can either
+    # specify a different regular expression or define a whole new
+    # _isrealfromline() method.  Note that this only gets called for lines
+    # starting with the 5 characters "From ".
+    #
+    # BAW: According to
+    #http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html
+    # the only portable, reliable way to find message delimiters in a BSD (i.e
+    # Unix mailbox) style folder is to search for "\n\nFrom .*\n", or at the
+    # beginning of the file, "^From .*\n".  While _fromlinepattern below seems
+    # like a good idea, in practice, there are too many variations for more
+    # strict parsing of the line to be completely accurate.
+    #
+    # _strict_isrealfromline() is the old version which tries to do stricter
+    # parsing of the From_ line.  _portable_isrealfromline() simply returns
+    # true, since it's never called if the line doesn't already start with
+    # "From ".
+    #
+    # This algorithm, and the way it interacts with _search_start() and
+    # _search_end() may not be completely correct, because it doesn't check
+    # that the two characters preceding "From " are \n\n or the beginning of
+    # the file.  Fixing this would require a more extensive rewrite than is
+    # necessary.  For convenience, we've added a PortableUnixMailbox class
+    # which uses the more lenient _portable_isrealfromline() check.
+
+    _fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
+                       r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
+    _regexp = None
+
+    def _strict_isrealfromline(self, line):
+        if not self._regexp:
+            import re
+            self._regexp = re.compile(self._fromlinepattern)
+        return self._regexp.match(line)
+
+    def _portable_isrealfromline(self, line):
+        return True
+
+    _isrealfromline = _strict_isrealfromline
+
+
+class PortableUnixMailbox(UnixMailbox):
+    _isrealfromline = UnixMailbox._portable_isrealfromline
+
+
+class MmdfMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line[:5] == '\001\001\001\001\n':
+                return
+
+    def _search_end(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line == '\001\001\001\001\n':
+                self.fp.seek(pos)
+                return
+
+
+class MHMailbox:
+
+    def __init__(self, dirname, factory=rfc822.Message):
+        import re
+        pat = re.compile('^[1-9][0-9]*$')
+        self.dirname = dirname
+        # the three following lines could be combined into:
+        # list = map(long, filter(pat.match, os.listdir(self.dirname)))
+        list = os.listdir(self.dirname)
+        list = filter(pat.match, list)
+        list = map(long, list)
+        list.sort()
+        # This only works in Python 1.6 or later;
+        # before that str() added 'L':
+        self.boxes = map(str, list)
+        self.boxes.reverse()
+        self.factory = factory
+
+    def __iter__(self):
+        return iter(self.next, None)
+
+    def next(self):
+        if not self.boxes:
+            return None
+        fn = self.boxes.pop()
+        fp = open(os.path.join(self.dirname, fn))
+        msg = self.factory(fp)
+        try:
+            msg._mh_msgno = fn
+        except (AttributeError, TypeError):
+            pass
+        return msg
+
+
+class BabylMailbox(_Mailbox):
+
+    def _search_start(self):
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                raise EOFError
+            if line == '*** EOOH ***\n':
+                return
+
+    def _search_end(self):
+        while 1:
+            pos = self.fp.tell()
+            line = self.fp.readline()
+            if not line:
+                return
+            if line == '\037\014\n' or line == '\037':
+                self.fp.seek(pos)
+                return
+
+## End: classes from the original module (for backward compatibility).
+
+
 class Error(Exception):
     """Raised for module-specific errors."""
 
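
The backward-compatibility classes above all share one iteration idiom: next()
returns None when the mailbox is exhausted, and __iter__() wraps it with the
two-argument form of iter(), which keeps calling next() until it sees the
sentinel None. A minimal sketch of that idiom in isolation, assuming a
hypothetical OneTimeReader class that stands in for a mailbox:

```python
class OneTimeReader:
    """Hypothetical stand-in for an old-style mailbox object."""

    def __init__(self, items):
        # Keep items reversed so pop() hands them out in the original
        # order, much as MHMailbox does with self.boxes.
        self._pending = list(items)
        self._pending.reverse()

    def next(self):
        """Return the next item, or None when nothing is left."""
        if not self._pending:
            return None
        return self._pending.pop()

    def __iter__(self):
        # iter(callable, sentinel): stop as soon as next() returns None.
        return iter(self.next, None)


reader = OneTimeReader(["a", "b", "c"])
first = reader.next()      # consume one item by hand
rest = list(reader)        # iteration resumes where next() left off
after = reader.next()      # exhausted, so the sentinel comes back
```

Because None is the sentinel, a message factory that returns None ends the
iteration early, which is why the msgfactory example in the documentation
returns an empty string for unparseable messages.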


