[New-bugs-announce] [issue26686] email.parser stops parsing headers too soon.

Mark Sapiro report at bugs.python.org
Fri Apr 1 13:46:30 EDT 2016


New submission from Mark Sapiro:

Given an admittedly defective message part (the folded Content-Type: header isn't indented) with the following headers and body:

-------------------------------
Content-Disposition: inline; filename="04EBD_xxxx.xxxx_A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
name="04EBD_xxxx.xxxx_A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQAAAAAAAABKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIAAAAGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
-------------------------------

email.parser parses the headers as

-------------------------------
Content-Disposition: inline; filename="04EBD_xxxx.xxxx_A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
-------------------------------

and the body as

-------------------------------
name="04EBD_xxxx.xxxx_A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQAAAAAAAABKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIAAAAGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
-------------------------------

and shows no defects.

This is wrong. RFC 5322 section 2.1 is clear that everything up to the first empty line is headers. Even the docstring in the email/parser.py module says "The header block is terminated either by the end of the string or by a blank line."
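
A minimal sketch for reproducing the split described above, assuming the default (compat32) policy and email.message_from_string. The base64 body is truncated in the report, so only the two lines shown there are used, and the variable names are illustrative. Whether anything appears in msg.defects may depend on the Python version, so the script prints it rather than assuming a value.

```python
import email

# Message part reproduced from the report. The body is truncated there
# ("..."), so only the two base64 lines that are shown are included.
RAW = (
    'Content-Disposition: inline; filename="04EBD_xxxx.xxxx_A546BB.zip"\n'
    'Content-Type: application/x-rar-compressed; x-unix-mode=0600;\n'
    'name="04EBD_xxxx.xxxx_A546BB.zip"\n'   # unindented fold: the defect
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'UmFyIRoHAM+QcwAADQAAAAAAAABKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIAAAAGEw\n'
    'ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6\n'
)

msg = email.message_from_string(RAW)

header_names = msg.keys()                    # header names the parser accepted
cte = msg['Content-Transfer-Encoding']       # None if it was not parsed as a header
payload = msg.get_payload()                  # what the parser treated as the body

if __name__ == '__main__':
    print('headers:', header_names)
    print('Content-Transfer-Encoding:', cte)
    print('body starts with:', payload.splitlines()[0])
    # Reported defects, if any; this may differ between Python versions.
    print('defects:', list(msg.defects))
```

The first line of the body is the stray name= line, and Content-Transfer-Encoding is missing from the headers.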

Since the message is defective, it isn't clear what the correct result should be, but I think

Headers:
Content-Disposition: inline; filename="04EBD_xxxx.xxxx_A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
Content-Transfer-Encoding: base64

Body:
UmFyIRoHAM+QcwAADQAAAAAAAABKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIAAAAGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...

Defects:
name="04EBD_xxxx.xxxx_A546BB.zip"

would be more appropriate. The problem is that, because the Content-Transfer-Encoding: base64 line is not parsed as a header, get_payload(decode=True) doesn't decode the base64-encoded body, which makes malware recognition difficult.
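
As an illustration of the result suggested above, here is a hedged sketch of a pre-processing workaround. It is not part of email.parser and is not a proposed patch: drop_stray_header_lines and HEADER_START are hypothetical names, and the pattern used to recognise a header line is an assumption. The helper removes lines in the header block that are neither header starts nor indented continuations, and returns them separately so they can be reported.

```python
import email
import re

# Assumed pattern for "looks like the start of a header": printable ASCII
# without spaces or colons, followed by a colon. Hypothetical, for illustration.
HEADER_START = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')


def drop_stray_header_lines(raw):
    """Return (cleaned_text, stray_lines) for a message string.

    Lines before the first empty line that are neither header starts nor
    whitespace-indented continuations are removed and collected.
    """
    kept, stray = [], []
    in_headers = True
    for line in raw.splitlines(keepends=True):
        if in_headers:
            if line.rstrip('\r\n') == '':
                in_headers = False            # empty line ends the header block
            elif not (line[0] in ' \t' or HEADER_START.match(line)):
                stray.append(line.rstrip('\r\n'))
                continue                      # drop the stray line
        kept.append(line)
    return ''.join(kept), stray


# Same message part as in the report (body truncated there).
RAW = (
    'Content-Disposition: inline; filename="04EBD_xxxx.xxxx_A546BB.zip"\n'
    'Content-Type: application/x-rar-compressed; x-unix-mode=0600;\n'
    'name="04EBD_xxxx.xxxx_A546BB.zip"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'UmFyIRoHAM+QcwAADQAAAAAAAABKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIAAAAGEw\n'
    'ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6\n'
)

cleaned, stray = drop_stray_header_lines(RAW)
msg = email.message_from_string(cleaned)
decoded = msg.get_payload(decode=True)        # bytes, base64-decoded

if __name__ == '__main__':
    print('headers:', msg.keys())
    print('stray lines:', stray)
    print('decoded body starts with:', decoded[:4])
```

With the stray line set aside, Content-Transfer-Encoding is seen as a header and the decoded body begins with the RAR signature bytes, which is what a malware scanner would need.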

----------
components: Library (Lib)
messages: 262750
nosy: msapiro
priority: normal
severity: normal
status: open
title: email.parser stops parsing headers too soon.
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26686>
_______________________________________

