Distinguishing between maildir, mbox, and MH files/directories?

Cameron Simpson cs at zip.com.au
Mon Sep 1 04:07:51 CEST 2014

On 31Aug2014 13:45, Tim Chase <python.list at tim.thechases.com> wrote:
>Tinkering around with a little script, I found myself with the need
>to walk a directory tree and process mail messages found within.
>Sometimes these end up being mbox files (with multiple messages
>within), sometimes it's a Maildir structure with messages in each
>individual file and extra holding directories, and sometimes it's a
>MH directory.  To complicate matters, there's also the possibility of
>non-{mbox,maildir,mh} files such as binary MUA caches appearing
>alongside these messages.
>Python knows how to handle each just fine as long as I tell it what
>type of file to expect.  But is there a straight-forward way to
>distinguish them?  (FWIW, the *nix "file" utility is just reporting
>"ASCII text", sometimes "with very long lines", and sometimes
>erroneously flags them as C or C++ files‽).
>All I need is "is it maildir, mbox, mh, or something else" (I don't
>have to get more complex for the "something else") inside an os.walk

Here is my code for these tests:

     import os

     def ismhdir(path):
       ''' Test if `path` points at an MH directory.
       '''
       return os.path.isfile(os.path.join(path, '.mh_sequences'))

     def ismaildir(path):
       ''' Test if `path` points at a Maildir directory.
       '''
       for subdir in ('new', 'cur', 'tmp'):
         if not os.path.isdir(os.path.join(path, subdir)):
           return False
       return True

     def ismbox(path):
       ''' Open path and check that its first line begins with "From ".
       '''
       fp = None
       try:
         # binary mode, so that non-text files cannot raise decode errors
         fp = open(path, 'rb')
         from_ = fp.read(5)
       except IOError:
         if fp is not None:
           fp.close()
         return False
       fp.close()
       return from_ == b'From '

I would use these in code somewhat like this (imagining your use case):

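   if ismaildir(path):
     pass  # process as a Maildir
   elif ismhdir(path):
     pass  # process as an MH directory
   elif ismbox(path):
     pass  # process as an mbox file
   else:
     # reject other known special files here;
     # continue traversing downward otherwise
     pass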
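
For the os.walk case specifically, the tests above could be wired together roughly as follows. This is only a sketch: `find_mailboxes` is a hypothetical name, the predicates are restated in compact form so the block runs on its own, and it assumes that a directory recognised as a Maildir or MH folder should not be descended into any further (which would miss nested folders, if you have those).

```python
import os

def ismhdir(path):
    # An MH folder is recognised by its .mh_sequences file.
    return os.path.isfile(os.path.join(path, '.mh_sequences'))

def ismaildir(path):
    # A Maildir has all three of new/, cur/ and tmp/.
    return all(os.path.isdir(os.path.join(path, sub))
               for sub in ('new', 'cur', 'tmp'))

def ismbox(path):
    # An mbox file starts with a "From " separator line.
    try:
        with open(path, 'rb') as fp:
            return fp.read(5) == b'From '
    except OSError:
        return False

def find_mailboxes(top):
    """Yield (kind, path) pairs for mailboxes found under `top`.

    Hypothetical helper: `kind` is 'maildir', 'mh' or 'mbox'.
    Anything else is silently skipped.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        if ismaildir(dirpath):
            dirnames[:] = []  # assumption: do not walk inside a Maildir
            yield 'maildir', dirpath
            continue
        if ismhdir(dirpath):
            dirnames[:] = []  # assumption: do not walk inside an MH folder
            yield 'mh', dirpath
            continue
        # A plain directory: test each file, keep walking downward.
        for name in sorted(filenames):
            filepath = os.path.join(dirpath, name)
            if ismbox(filepath):
                yield 'mbox', filepath
```

Each kind then maps onto the matching class in the standard mailbox module (mailbox.Maildir, mailbox.MH, mailbox.mbox). Emptying `dirnames` in place is what stops os.walk from descending; it only works with the default top-down traversal.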

Cameron Simpson <cs at zip.com.au>

Gabriel Genellina: See PEP 234 http://www.python.org/dev/peps/pep-0234/
Angus Rodgers:
   You've got to love a language whose documentation contains sentences
   beginning like this:
     "Among its chief virtues are the following four -- no, five -- no,
     six -- points: [...]"
from python-list at python.org
