[Email-SIG] Some parsing/generation issues of email in Python 3

Hans-Peter Jansen hpj at urpla.net
Wed Jun 8 05:56:40 EDT 2016


Dear audience,

When I came back to this list, I couldn't believe my eyes at the low volume, 
but after checking the archives I have to accept that it really is that quiet 
here, a bit too quiet from my POV. Hmm.

Well, I'm currently replacing a special-purpose Postfix email filter dating 
back to 2004 with a redeveloped Python 3 version.

Basically, all it does is (in pseudocode):

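	import email
	msg = email.message_from_file(fp)   # fp: incoming mail as a text file
	processing(msg)                     # save parts, add headers
	write(msg.as_string(True))          # True: keep the Unix "From " line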

It has done that for a few hundred million mails over the years.

I replaced it with:

	import email.policy
	from email.generator import BytesGenerator
	msg = email.message_from_binary_file(fp, policy=email.policy.SMTP)
	processing(msg)
	BytesGenerator(pipe).flatten(msg)

Here, processing mostly saves bodies and attachments depending on pattern 
matches, and adds some headers.
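A minimal, self-contained sketch of such a pipeline follows, assuming the message arrives as bytes. The `processing()` body (collecting parts with an attachment disposition and adding a header) and the `X-Filter-Processed` header name are hypothetical stand-ins for the real filter logic, which is not shown here; the output is buffered in a `BytesIO` instead of being written to a pipe.

```python
import email
import email.policy
from email.generator import BytesGenerator
from io import BytesIO


def processing(msg):
    """Hypothetical stand-in for the filter's processing step:
    collect attachment payloads and add a marker header."""
    saved = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # decode=True undoes base64/quoted-printable transfer encoding
            saved.append((part.get_filename(), part.get_payload(decode=True)))
    msg["X-Filter-Processed"] = "yes"  # hypothetical header name
    return saved


def filter_message(raw: bytes) -> bytes:
    """Parse raw bytes with the SMTP policy, process, and re-serialize."""
    msg = email.message_from_bytes(raw, policy=email.policy.SMTP)
    processing(msg)
    out = BytesIO()
    # With no explicit policy, the generator uses msg.policy (CRLF line endings)
    BytesGenerator(out).flatten(msg)
    return out.getvalue()
```

Keeping parsing, processing and generation in one function makes it easy to feed recorded problem mails through the same path in a test suite.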

I was quite astonished to find out that this procedure doesn't work that well 
anymore: the email module appears way more sensitive in its current state. 
This is a bit disappointing, as the docs convey that some effort was put into 
reliability and robustness. Given the much improved Unicode handling of 
Python 3 itself and the ever-growing experience in handling email, I have to 
confess this runs contrary to my expectations.

Minutes after switching to the new code, I stumbled across a traceback in 
msg.get_all('to') from a header like this:

To: unlisted-recipients: ;,
        ""@pop.kundenserver.de (no To-header on input)

Hmm, not nice. http://bugs.python.org/issue27257
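Until such a parser bug is fixed, one possible defensive pattern is to catch the exception and fall back to the legacy `compat32` policy, which returns header values as unparsed strings. A sketch, where `safe_get_all()` is a hypothetical helper; whether the structured branch raises at all depends on the Python version in use.

```python
import email
import email.policy

# The problematic header from above, wrapped in a minimal message.
RAW = (b'To: unlisted-recipients: ;,\r\n'
       b'        ""@pop.kundenserver.de (no To-header on input)\r\n'
       b'Subject: test\r\n'
       b'\r\n'
       b'body\r\n')


def safe_get_all(data: bytes, name: str) -> list:
    """Hypothetical helper: return header values as strings, falling back
    to unparsed compat32 values if the structured parser raises."""
    try:
        msg = email.message_from_bytes(data, policy=email.policy.SMTP)
        # Structured headers are parsed lazily, i.e. here on access
        return [str(v) for v in msg.get_all(name, [])]
    except Exception:
        legacy = email.message_from_bytes(data, policy=email.policy.compat32)
        return legacy.get_all(name, [])
```

The fallback loses the structured address objects, so code consuming the result has to cope with plain strings.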

Next, I wondered why arbitrary header data appeared in the body of some mails 
in my MUA. I tracked it down to a mangled header that had lost its proper 
indentation:

X-Microsoft-Exchange-Diagnostics: 
 =?utf-8?B?MTtCTDJQUjAyTUI1MTQ7MjM6bEtRRlNaUHQvVTk5WCttdktlOUVrUGQvVFBH?=
 =?utf-8?B?cDFJemVUeXFzOGNzYnZOYWlwMDZpR0YzbXZyY09WaTBKM2pkeUl4S1VDMkxw?=
 =?utf-8?B?eVRkNWthRW9waUhJTzczTWd5WDZOQ3hMNU1haGFvQTVzVTdRZmxJUnZlblpW?=
 ...

versus the regenerated output:

X-Microsoft-Exchange-Diagnostics:
 1;BL2PR02MB514;23:lKQFSZPt/U99X+mvKe9EkPd/TPG
p1IzeTyqs8csbvNaip06iGF3mvrcOVi0J3jdyIxKUC2Lp
yTd5kaEopiHIO73MgyX6NCxL5MahaoA5sU7QflIRvenZV

Oh, well. http://bugs.python.org/issue27256
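The second block is what the encoded words in the first one decode to, which can be checked with `email.header.decode_header()`. The sketch below also adds a rough, hypothetical `folding_is_intact()` heuristic that flags a header block containing a line that neither starts a new `Name:` field nor begins with whitespace, as in the mangled output above. It is an approximation for spotting the problem, not a full RFC 5322 validator.

```python
import re
from email.header import decode_header

# A field name is printable ASCII except ":" (RFC 5322), followed by ":".
_FIELD = re.compile(r"[!-9;-~]+:")


def decode_words(value: str) -> str:
    """Decode RFC 2047 encoded words in a raw header value."""
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded words come back as bytes, plain text may come back as str
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def folding_is_intact(header_block: str) -> bool:
    """Hypothetical heuristic: every header line must either start a
    new field or be a continuation beginning with whitespace."""
    for line in header_block.splitlines():
        if not line:
            break  # blank line ends the header section
        if line[0] in " \t" or _FIELD.match(line):
            continue
        return False
    return True
```

For example, `decode_words("=?utf-8?B?MTtCTDJQUjAyTUI1MTQ7MjM6bEtRRlNaUHQvVTk5WCttdktlOUVrUGQvVFBH?=")` returns `"1;BL2PR02MB514;23:lKQFSZPt/U99X+mvKe9EkPd/TPG"`, the first continuation line of the regenerated header.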

Before I had added code to work around those cases, I stumbled across a 
traceback in flatten: http://bugs.python.org/issue27258
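Since `BytesGenerator(pipe).flatten(msg)` writes into the pipe itself, a traceback during generation may leave a partially written message behind. One possible mitigation, sketched here with a hypothetical `flatten_or_passthrough()` helper, is to serialize into a buffer first and pass the original bytes through unchanged if parsing, processing or flattening raises. Whether passing a mail through unmodified is acceptable is a policy decision for the filter.

```python
import email
import email.policy
from email.generator import BytesGenerator
from io import BytesIO


def flatten_or_passthrough(raw: bytes, process) -> bytes:
    """Hypothetical helper: parse, process and re-serialize a message;
    on any exception, return the original bytes so the mail still
    gets delivered unmodified."""
    try:
        msg = email.message_from_bytes(raw, policy=email.policy.SMTP)
        process(msg)
        out = BytesIO()
        BytesGenerator(out).flatten(msg)  # nothing reaches the pipe yet
        return out.getvalue()
    except Exception:
        return raw
```

Only the returned bytes would then be written to the pipe, in one go.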

I ran into all of these issues in less than half an hour. What really troubles 
me, in light of this experience, is the quietness around here. Don't people use 
Python (3) for these kinds of tasks yet, or anymore? Does anybody care? Am I 
missing something?

I will do my best to dig into these issues over the next days and weeks, but I 
would appreciate a dialogue with somebody who is already involved in the email 
module code.

Thanks,
Pete

