[ python-Feature Requests-795081 ] email.Message param parsing problem II

SourceForge.net noreply at sourceforge.net
Fri Nov 21 15:45:54 EST 2003


Feature Requests item #795081, was opened at 2003-08-25 23:37
Message generated for change (Comment added) made by bwarsaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=795081&group_id=5470

>Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stuart D. Gathman (customdesigned)
>Assigned to: Nobody/Anonymous (nobody)
Summary: email.Message param parsing problem II

Initial Comment:
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments.  This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type.  The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'.  email.Utils.unqoute
apparently gives up and doesn't remove any quotes.

I believe that unqoute should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'.  In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote using this heuristic.  I believe that header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should use the heuristic.

Here is a brief script to show the problem (attached
file in test/virus5): 
----------t.py----------
import email

msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary',
'"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority:
Normal\n\tX-Mailer: Microsoft Outlook Express
5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft
MimeOLE V5.50.4522.1300')]


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-11-21 15:45

Message:
Logged In: YES 
user_id=12800

Moving this to feature requests for Python 2.4.  If
appropriate, the email-sig should address this in the
intended new lax parser for email 3.0 / Python 2.4.  We
can't add this to the Python 2.3 (or earlier) maintenance
releases.


----------------------------------------------------------------------

Comment By: Stuart D. Gathman (customdesigned)
Date: 2003-08-25 23:57

Message:
Logged In: YES 
user_id=142072

Here is a proposed fix for email.Util.unquote (except it
should test for a 'strict' mode flag, which is current only
in Parser):

def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
          if str.endswith('"'):
            str = str[1:-1]
          else: # remove garbage after trailing quote
            try: str = str[1:str[1:].index('"')+1]
            except: return str
          return str.replace('\\', '\').replace('\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str

Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact.  That would also be a
good place to check for a STRICT flag stored with the
message object.  Perhaps the Parser should set the Message
_strict flag from its own _strict flag. 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=795081&group_id=5470



More information about the Python-bugs-list mailing list