[ python-Feature Requests-795081 ] email.Message param parsing problem II
SourceForge.net
noreply at sourceforge.net
Fri Mar 30 16:58:14 CEST 2007
Feature Requests item #795081, was opened at 2003-08-25 23:37
Message generated for change (Comment added) made by collinwinter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=795081&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stuart D. Gathman (customdesigned)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Message param parsing problem II
Initial Comment:
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments. This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type. The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'. email.Utils.unquote
apparently gives up and doesn't remove any quotes.
I believe that unquote should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'. In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote using this heuristic. I believe that the header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should use the heuristic.
Here is a brief script to show the problem (attached
file in test/virus5):
----------t.py----------
import email
msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary',
'"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority:
Normal\n\tX-Mailer: Microsoft Outlook Express
5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft
MimeOLE V5.50.4522.1300')]
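
The same behaviour can be seen without the attached file by calling unquote directly on a value shaped like the one above. This is only a minimal sketch. It assumes Python 3, where the module is spelled email.utils rather than email.Utils, and the shortened header text is illustrative rather than copied from test/virus5.

```python
# Assumes Python 3, where the module is spelled email.utils
# (email.Utils is the Python 2 spelling used in this report).
from email.utils import unquote

# A parameter value shaped like the one reported: a quoted boundary
# followed by headers that were folded into Content-Type.
raw = '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal'

# The value starts with a quote but does not end with one,
# so unquote() returns it untouched.
result = unquote(raw)
print(repr(result))

# A well-formed quoted value is unquoted as expected.
clean = unquote('"bound"')
print(repr(clean))
```

The first call returns the value unchanged because unquote only strips quotes when the string both starts and ends with one.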
----------------------------------------------------------------------
>Comment By: Collin Winter (collinwinter)
Date: 2007-03-30 10:58
Message:
Logged In: YES
user_id=1344176
Originator: NO
I'm still seeing this behaviour as of Python 2.6a0.
Barry: I take it email-sig didn't get around to discussing this?
----------------------------------------------------------------------
Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-11-21 15:45
Message:
Logged In: YES
user_id=12800
Moving this to feature requests for Python 2.4. If
appropriate, the email-sig should address this in the
intended new lax parser for email 3.0 / Python 2.4. We
can't add this to the Python 2.3 (or earlier) maintenance
releases.
----------------------------------------------------------------------
Comment By: Stuart D. Gathman (customdesigned)
Date: 2003-08-25 23:57
Message:
Logged In: YES
user_id=142072
Here is a proposed fix for email.Utils.unquote (except it
should test for a 'strict' mode flag, which is currently only
in Parser):
def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
            if str.endswith('"'):
                str = str[1:-1]
            else: # remove garbage after trailing quote
                try: str = str[1:str[1:].index('"')+1]
                except ValueError: return str
            return str.replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str
Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact. That would also be a
good place to check for a STRICT flag stored with the
message object. Perhaps the Parser should set the Message
_strict flag from its own _strict flag.
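
One way the strict/lax switch might look is sketched below. The names unquote_lax and unquote_value and the strict keyword are hypothetical and exist only for this illustration; nothing like them is in the email package. The sketch also assumes that strict mode should simply keep the existing unquote behaviour, written here against Python 3's email.utils.

```python
from email.utils import unquote as unquote_strict  # existing stdlib behaviour


def unquote_lax(value):
    """Hypothetical tolerant unquote: keep only what is between the quotes."""
    if len(value) > 1:
        if value.startswith('"'):
            if value.endswith('"'):
                value = value[1:-1]
            else:
                # Drop any garbage after the first closing quote.
                end = value.find('"', 1)
                if end == -1:
                    return value  # no closing quote at all; leave untouched
                value = value[1:end]
            return value.replace('\\\\', '\\').replace('\\"', '"')
        if value.startswith('<') and value.endswith('>'):
            return value[1:-1]
    return value


def unquote_value(value, strict=False):
    """Hypothetical dispatcher: pick the unquote flavour from a strict flag."""
    return unquote_strict(value) if strict else unquote_lax(value)


raw = '"bound"\n\tX-Priority: 3'
print(repr(unquote_value(raw)))               # lax: text between the quotes
print(repr(unquote_value(raw, strict=True)))  # strict: value left as-is
```

Whether strict mode should leave the value alone, as here, or reject the header outright is a design question this sketch does not settle.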
----------------------------------------------------------------------