[Mailman-Users] Archive Bug in CVS Scrubber.py

David Eisner cradle at umd.edu
Tue Feb 4 01:53:52 CET 2003


I checked out the latest sources from CVS this afternoon and
discovered what I think is a bug in Scrubber.py.


I. PROBLEM

If I send a message with an attachment to a list from Outlook (XP),
the Pipermail-archived version of the message is missing the
content of the message, although the attachment is there.

Here's an example:

   --snip--

   Skipped content of type multipart/alternative-------------- next part --------------
   A non-text attachment was scrubbed...
   Name: test.doc
   Type: application/msword
   Size: 23040 bytes
   Desc: not available
   Url : http://calcetalk.umd.edu/pipermail/test/attachments/20030203/fb37e2d4/test-0001.doc

   --snip--


If, however, I send a message without an attachment, it works correctly:

   --snip--

   This is Test 6, from Outlook, with no attachment.



   -David


   -------------- next part --------------
   An HTML attachment was scrubbed...
   URL: http://calcetalk.umd.edu/pipermail/test/attachments/20030203/1d720250/attachment.htm
   --snip--


II.  ANALYSIS

I poked around Scrubber.py and added some syslog statements.  The
problem is occurring in the process() method.

a. The structure of an Outlook message with an attachment looks like this:

    multipart/mixed
        multipart/alternative
            text/plain
            text/html
        application/msword
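
To make the nesting concrete, here is a minimal sketch that builds a message
with the same shape and prints its MIME tree.  It assumes the standard-library
email package of a current Python, not the version Mailman shipped with in
2003, and build_outlook_like_message() and outline() are hypothetical helper
names, not anything from Scrubber.py.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_outlook_like_message():
    """Build a message shaped like an Outlook mail with one attachment."""
    outer = MIMEMultipart("mixed")

    # Outlook sends the body twice, as plain text and as HTML.
    body = MIMEMultipart("alternative")
    body.attach(MIMEText("This is a test.", "plain"))
    body.attach(MIMEText("<p>This is a test.</p>", "html"))
    outer.attach(body)

    # Placeholder bytes stand in for a real Word document.
    doc = MIMEApplication(b"not a real document", "msword")
    doc.add_header("Content-Disposition", "attachment", filename="test.doc")
    outer.attach(doc)
    return outer


def outline(part, depth=0):
    """Return the content types of part and its subparts, indented by depth."""
    lines = ["    " * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.get_payload():
            lines.extend(outline(sub, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_outlook_like_message())))
```

The point to notice is that text/plain and text/html sit two levels down,
inside multipart/alternative, rather than directly under the top-level part.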

Here are the results of the syslog statements I put in process():

Feb 03 20:48:01 2003 (32291) Processing message part multipart/mixed
Feb 03 20:48:01 2003 (32291) Processing message part multipart/alternative
Feb 03 20:48:01 2003 (32291) Processing message part text/plain
Feb 03 20:48:01 2003 (32291) Processing message part text/html
Feb 03 20:48:01 2003 (32291) Processing message part application/msword
Feb 03 20:48:01 2003 (32291) Out of for loop, final sanitizing
Feb 03 20:48:01 2003 (32291) Processing payload part multipart/alternative
Feb 03 20:48:01 2003 (32291)    continuing
Feb 03 20:48:01 2003 (32291) Processing payload part text/plain

In the final sanitizing, the text/plain and text/html subparts within
the multipart/alternative are lost.  The last text/plain part is the
placeholder that replaced the Word attachment when it was scrubbed.


b.  The structure of an Outlook message with no attachment looks like this:

    multipart/alternative
        text/plain
        text/html

The syslog statements:

Feb 03 20:48:59 2003 (32291) Processing message part multipart/alternative
Feb 03 20:48:59 2003 (32291) Processing message part text/plain
Feb 03 20:48:59 2003 (32291) Processing message part text/html
Feb 03 20:48:59 2003 (32291) Out of for loop, final sanitizing
Feb 03 20:48:59 2003 (32291) Processing payload part text/plain
Feb 03 20:48:59 2003 (32291) Processing payload part text/plain

In this case, there's no problem, because the text/plain and text/html
subparts are returned directly by msg.get_payload().  In case a., the
first thing returned by msg.get_payload() is the multipart/alternative
part, which is skipped since it's not 'text/plain', so the subparts
nested inside it are never examined.
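
The difference can be reproduced outside Mailman.  The sketch below is not the
Scrubber.py code; it only mimics the behaviour described above, using the
standard-library email package of a current Python.  flat_text(),
recursive_text() and scrubbed_example() are hypothetical names, and the
recursive version is one possible approach, not a tested patch.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def flat_text(msg):
    """Collect text/plain payloads from the direct children of msg only.

    Expects a multipart message.  A nested multipart container is skipped
    whole, so everything inside it is lost.
    """
    texts = []
    for part in msg.get_payload():
        if part.get_content_type() != "text/plain":
            continue
        texts.append(part.get_payload())
    return texts


def recursive_text(msg):
    """Collect text/plain payloads at any depth by walking the whole tree."""
    return [part.get_payload() for part in msg.walk()
            if part.get_content_type() == "text/plain"]


def scrubbed_example():
    """Case a. after scrubbing: the HTML and Word parts are now text notices."""
    alt = MIMEMultipart("alternative")
    alt.attach(MIMEText("This is the message body."))
    alt.attach(MIMEText("An HTML attachment was scrubbed..."))
    outer = MIMEMultipart("mixed")
    outer.attach(alt)
    outer.attach(MIMEText("A non-text attachment was scrubbed..."))
    return outer


if __name__ == "__main__":
    msg = scrubbed_example()
    print(flat_text(msg))       # only the attachment notice survives
    print(recursive_text(msg))  # body and both notices
```

With the flat loop, only the attachment notice is kept, which matches the
archived output in section I.  Walking the tree keeps the body as well.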


-David


------------------------+--------------------------+
David Eisner            | E-mail: cradle at umd.edu   |
CALCE EPSC              | Phone:  301-405-5341     |
University of Maryland  | Fax:    301-314-9269     |
------------------------+--------------------------+



