[ python-Feature Requests-795081 ] email.Message param parsing problem II
SourceForge.net
noreply at sourceforge.net
Fri Nov 21 15:45:54 EST 2003
Feature Requests item #795081, was opened at 2003-08-25 23:37
Message generated for change (Comment added) made by bwarsaw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=795081&group_id=5470
>Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stuart D. Gathman (customdesigned)
>Assigned to: Nobody/Anonymous (nobody)
Summary: email.Message param parsing problem II
Initial Comment:
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments. This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type. The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'. email.Utils.unquote
apparently gives up and doesn't remove any quotes.
I believe that unquote should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'. In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote using this heuristic. I believe that the header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should use the heuristic.
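The heuristic above, together with the strict/tolerant split, can be sketched in modern Python 3. This is a minimal illustration under stated assumptions, not the email package's API: the names `lax_unquote` and `_closing_quote` and the `strict` parameter are hypothetical. Tolerant mode keeps only what lies between the opening quote and the first unescaped closing quote; strict mode raises ValueError for such a value instead.

```python
import re


def _closing_quote(value):
    """Return the index of the first unescaped '"' after index 0, or -1."""
    i = 1
    while i < len(value):
        if value[i] == '\\':
            i += 2  # skip the backslash-escaped character
        elif value[i] == '"':
            return i
        else:
            i += 1
    return -1


def lax_unquote(value, strict=False):
    """Remove quotes from a header parameter value (hypothetical helper).

    Tolerant mode (strict=False) keeps only what lies between the quotes,
    so '"abc" def' becomes 'abc'.  Strict mode raises ValueError for a
    quoted value that is unterminated or followed by extra text.
    """
    if len(value) > 1 and value.startswith('"'):
        end = _closing_quote(value)
        if end < 0:
            if strict:
                raise ValueError('unterminated quoted string: %r' % value)
            return value  # no closing quote: leave the value untouched
        if strict and end != len(value) - 1:
            raise ValueError('text after closing quote: %r' % value)
        # Keep only the quoted part and undo backslash escapes.
        return re.sub(r'\\(.)', r'\1', value[1:end], flags=re.DOTALL)
    if len(value) > 1 and value.startswith('<') and value.endswith('>'):
        return value[1:-1]
    return value
```

Scanning for the first unescaped quote, rather than only checking whether the last character is a quote, also treats a value such as '"a" "b"' as trailing garbage (giving 'a' in tolerant mode) instead of returning 'a" "b'.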
Here is a brief script to show the problem (attached
file in test/virus5):
----------t.py----------
import email
msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary', '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal\n\tX-Mailer: Microsoft Outlook Express 5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1300')]
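Why the boundary swallows the following headers can be illustrated without the email package. This Python 3 sketch assumes a simplified parser: indented lines are appended to the previous header (keeping the line break, as in the output above), and the Content-Type value is then split naively on ';' and the first '='. The raw header text of test/virus5 is not shown in the report, so `RAW_HEADERS` is an assumed, abbreviated reconstruction; `unfold_headers`, `naive_params` and `strict_unquote` are hypothetical helpers, the last one a simplified stand-in for the unquote behavior the report describes.

```python
# Header block assumed from the output above (abbreviated).
RAW_HEADERS = (
    'Content-Type: multipart/mixed;\n'
    '\tboundary="bound"\n'
    '\tX-Priority: 3\n'
    '\tX-MSMail-Priority: Normal\n'
)


def unfold_headers(raw):
    """Split a raw header block into (name, value) pairs.

    A line starting with a space or tab continues the previous header;
    the line break is kept, as in the parser output above.
    """
    headers = []
    for line in raw.splitlines():
        if line[:1] in (' ', '\t') and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + '\n' + line)
        elif line:
            name, _, value = line.partition(':')
            headers.append((name.strip(), value.strip()))
    return headers


def naive_params(value):
    """Split a Content-Type value into (name, value) pairs on ';' and '='."""
    parts = value.split(';')
    params = [(parts[0].strip(), '')]
    for part in parts[1:]:
        name, _, val = part.partition('=')
        params.append((name.strip().lower(), val.strip()))
    return params


def strict_unquote(value):
    """Strip quotes only when they enclose the entire value."""
    if len(value) > 1 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value


ctype = dict(unfold_headers(RAW_HEADERS))['Content-Type']
params = naive_params(ctype)
boundary = strict_unquote(params[1][1])
# The indented "headers" were folded into the boundary parameter, so the
# value does not end with a quote and strict_unquote leaves it unchanged.
```

Because the whole folded tail becomes part of the boundary value, no "--bound" delimiter line in the body can match it, which is why the multipart attachments are not found.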
----------------------------------------------------------------------
>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-11-21 15:45
Message:
Logged In: YES
user_id=12800
Moving this to feature requests for Python 2.4. If
appropriate, the email-sig should address this in the
intended new lax parser for email 3.0 / Python 2.4. We
can't add this to the Python 2.3 (or earlier) maintenance
releases.
----------------------------------------------------------------------
Comment By: Stuart D. Gathman (customdesigned)
Date: 2003-08-25 23:57
Message:
Logged In: YES
user_id=142072
Here is a proposed fix for email.Utils.unquote (except it
should test for a 'strict' mode flag, which is currently only
in Parser):
def unquote(s):
    """Remove quotes from a string."""
    if len(s) > 1:
        if s.startswith('"'):
            if s.endswith('"'):
                s = s[1:-1]
            else:  # remove garbage after the closing quote
                try:
                    # index of the closing quote, relative to s
                    s = s[1:s[1:].index('"') + 1]
                except ValueError:  # no closing quote at all
                    return s
            # undo backslash escaping inside the quoted string
            return s.replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s
Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact. That would also be a
good place to check for a STRICT flag stored with the
message object. Perhaps the Parser should set the Message
_strict flag from its own _strict flag.
----------------------------------------------------------------------
More information about the Python-bugs-list mailing list