[Mailman-Users] Inlining MIME attachments on Mailman web archives

Thu Mar 11 19:52:47 CET 1999

On 11 Mar 1999, Michael Alan Dorman wrote:
> Unfortunately, a discussion from a month back reveals that pipermail
> doesn't have the capability, and no one has at present stepped up to
> bat to implement it.  There was some talk of making it easier to use
> other archiving tools (like hypermail or mhonarc), but I don't know if
> this is going to go forward.
> 
> Which is unfortunate since as far as I can tell, it's the one wart on
> an otherwise stunningly good program.

I'd "step up to the bat" if only I had more time.  I'm attaching a script
which reads a message on stdin, prints out the header and (for demo
purposes) some decoded fields from the header, then decodes the message
body, storing attachments (if any) as separate files.  Note that it
doesn't handle attachments recursively.  It should: some mailers send
multipart/alternative parts nested inside another multipart, so the only
text/plain part is buried two levels down.  It also doesn't handle broken
mailers like elm which send a Content-type of "text"!  Anyway, the main
message keeps the first plain text part, plus references to the attachments.
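
For what it's worth, here is a rough sketch of what the recursive handling
could look like.  It assumes a present-day Python 3 and its standard `email`
package rather than the mimetools/multifile modules the attached script uses,
and the function name split_parts is made up for illustration.  It is not a
drop-in patch for the script.

```python
from email import message_from_string

def split_parts(raw_message):
    """Return (body_text, attachments) for a raw RFC 822 message string.

    body_text is the first text/plain part found at any nesting depth;
    attachments is a list of (content_type, filename, payload_bytes).
    """
    msg = message_from_string(raw_message)
    body_text = None
    attachments = []
    part_number = 0
    for part in msg.walk():            # walk() descends into nested multiparts
        if part.is_multipart():
            continue                   # containers carry no payload of their own
        part_number += 1
        payload = part.get_payload(decode=True) or b''  # undoes base64 / QP
        ctype = part.get_content_type()
        if body_text is None and ctype == 'text/plain':
            charset = part.get_content_charset() or 'us-ascii'
            body_text = payload.decode(charset, 'replace')
        else:
            # hypothetical fallback name, mirroring the script's "partN" scheme
            name = part.get_filename() or 'part%d' % part_number
            attachments.append((ctype, name, payload))
    return body_text, attachments
```

As far as I can tell, get_content_type() falls back to text/plain when the
header value has no slash in it, so this should also cope with a bare
Content-type of "text".  An unknown charset would still raise an error here.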

btw: as a stopgap measure you can simply have a link to a script which
would print out
Content-type: message/rfc822

<email message verbatim>

Netscape (and maybe other browsers) knows what to do with a Content-type: 
message/rfc822; it handles displaying attachments and parsing the header
and everything.  The downside, of course, is that you can't add anything
else like links to the next message.
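
A minimal sketch of such a stopgap script, assuming Python 3 and a CGI-style
setup where whatever the script writes to stdout is sent to the browser.  The
function names and the idea of passing the archived message's path as an
argument are my own placeholders, not anything Mailman provides.

```python
import sys

def rfc822_response(raw_message):
    """Prefix the untouched message bytes with a message/rfc822 CGI header."""
    return b'Content-type: message/rfc822\n\n' + raw_message

def main(path):
    # 'path' is a hypothetical location of one archived message on disk.
    with open(path, 'rb') as f:
        sys.stdout.buffer.write(rfc822_response(f.read()))

if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv[1])
```

The message is handled as bytes throughout so that nothing gets re-encoded
on the way to the browser.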

Hopefully this will serve as a good starting point.  One of the great
things about Python is that it comes standard with all these wonderful
libraries!

--
James Strickland
Perforce Software
-------------- next part --------------
#!/usr/local/bin/python
# Read message from stdin, decode MIME attachments (if any), write result
# as n parts, n>=1
#***************yet to do: recognize uuencode in body of message.
#*****simply recognize "^begin [0-7]{3} " and spit out to file until "^end"
#*****then run uu.decode()

import sys, os, mimetools, multifile
dest_dir='/tmp' #***for testing

def copy(input,output,prepend=''):
  '''Copy input to output using readline(), optionally prepending a string
  to each line.  Note that multifile requires using readline()!'''

  while 1:
    line = input.readline()
    if not line: break
    output.write(prepend+line)

def store_message():

  # initialize input (stdin) and output (for now stdout)
  mainoutput = sys.stdout
  input  = multifile.MultiFile(sys.stdin)
  main   = mimetools.Message(input,0) # reads the mail header

#  # print out the original header lines verbatim
#  for line in main.headers:
#    print line # each line is already terminated with \n

  # print out header
  for field in main.keys():
    print '%s: %s' % (field,main[field])

  # - decoded name and email address
  name,email = main.getaddr('from')
  print 'Name: %s\n' % name
  print 'Email: %s\n' % email

  # - decoded timestamp -> timezone and UTC timestamp
  dateplustz = main.getdate_tz('date')

  if dateplustz: # valid date parsed
    tzoffset = dateplustz[-1]
    if not tzoffset: tzoffset=0
    import time, rfc822 # just to do the conversion to UTC
    secssince1970utc = rfc822.mktime_tz(dateplustz)
    utc = time.gmtime(secssince1970utc)
    print 'tzoffset: %d\n' % tzoffset # add to UTC to get local for sender
    print 'utc: %04d/%02d/%02d %02d:%02d:%02d\n' % utc[:6]
  else:
    print 'Invalid date format!'

  # Deal with MIME encoding, if applicable.
  # RFC2046 says 'The Content-Type field for multipart entities
  # requires one parameter, "boundary".'  I don't bother to check
  # that the Content-Type matches, because the objective here is
  # to store the thing, not to complain about broken mailers.

  attachments = []
  boundary = main.getparam('boundary')

  if not boundary: # single part message

    copy(input,mainoutput,'\t')

  else: # multipart message

    # make a directory to store the parts
    message_number = 42 #****assign a unique identifier
    try: os.mkdir('%s/%d' % (dest_dir,message_number))
    except os.error: pass # most likely it's already there; open() below fails loudly otherwise

    # skip to the start of the first part
    input.push(boundary)
    while input.readline(): pass # throw away lines before first boundary line
    input.next() # skip over boundary

    part = 1
    foundplaintext=0

    while 1:
      # read header for this part
      subm = mimetools.Message(input,0)
      type = subm.gettype()

      # If it's the first plain text part, store it as part of the default msg
      # (so that it's indexed and self-contained)
      # Otherwise decode the part and store it in an appropriately named file
      if (not foundplaintext) and type == 'text/plain':
        foundplaintext=1
        copy(input,mainoutput,'\t')
      else:
        # strip any directory components so a hostile name can't escape dest_dir
        name = os.path.basename(subm.getparam('name') or '')
        if not name: name = "part"+repr(part)
        name = '%d/%s' % (message_number,name)
        attachments.append( (type,name) )
        output = open(dest_dir+'/'+name,'wb') # binary mode: decoded data need not be text
        encoding = subm.getencoding()
        if encoding in ('base64','quoted-printable','uuencode'):
          mimetools.decode(input,output,encoding)
        else: # copy input to output using readline() - multifile requires readline()!
          copy(input,output)
        output.close()

      # go to the next part
      if not input.next(): break
      part = part + 1

    # write out information gathered about the attachments
    if attachments: mainoutput.write('Attachments:\n')
    for type,name in attachments:
      mainoutput.write('\t%s:%s\n' % (type,name))
  mainoutput.flush() # don't close it: for now this is sys.stdout

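# Execution starts here
try:
  store_message()
except:
  #****need to improve this, obviously...
  import traceback
  traceback.print_exc() # show what actually failed, on stderr
  sys.stderr.write('something went horribly wrong - send email to james bitching about it\n')
  sys.exit(1)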
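
The to-do note at the top of the script (recognize "^begin [0-7]{3} " in the
body and collect lines until "^end") could be sketched roughly as follows.
This assumes Python 3 and uses binascii.a2b_uu to decode line by line instead
of uu.decode(); the name extract_uuencoded is hypothetical, and a block with
no closing "end" line is silently dropped.

```python
import binascii
import re

# "begin", a three-digit octal mode, then the file name
BEGIN_RE = re.compile(r'^begin [0-7]{3} (.+)$')

def extract_uuencoded(lines):
    """Split body lines into (plain_lines, files).

    files is a list of (filename, data_bytes), one per begin/end block.
    """
    plain, files = [], []
    name, chunks = None, None
    for line in lines:
        stripped = line.rstrip('\r\n')
        if name is None:
            m = BEGIN_RE.match(stripped)
            if m:
                name, chunks = m.group(1), []   # start collecting a block
            else:
                plain.append(line)              # ordinary body text
        elif stripped == 'end':
            files.append((name, b''.join(chunks)))
            name = None
        elif stripped:
            chunks.append(binascii.a2b_uu(stripped))  # decode one data line
    return plain, files
```

The plain lines can then go to the main message as before, and each decoded
file can be stored next to the MIME attachments.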