MIME encoding change in Python 2.4.3 (or 2.4.2? 2.4.1?) - problem and solution
nobody at nowhere.org
Tue Nov 28 02:10:57 CET 2006
I have an application that processes MIME messages. It reads a message from a file,
looks for text/html and text/plain parts in it, performs some processing on these
parts, and outputs the new message.
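Here is a minimal sketch of the part-selection step, for illustration only. It assumes Python 3's email package, where walk() and get_content_type() behave as described here, and it uses a hypothetical helper name, text_parts. Note that walk() also yields the multipart container itself, and the content-type test filters it out.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_parts(msg):
    """Return the text/plain and text/html parts of msg, in walk() order.

    Hypothetical helper for illustration: walk() visits the message
    itself first, then every subpart depth-first.
    """
    return [part for part in msg.walk()
            if part.get_content_type() in ('text/plain', 'text/html')]


# Build a small multipart/alternative message to try it on.
outer = MIMEMultipart('alternative')
outer.attach(MIMEText('Hello', 'plain', 'utf-8'))
outer.attach(MIMEText('<p>Hello</p>', 'html', 'utf-8'))

# Round-trip through a string, as the application does when reading a file.
msg = email.message_from_string(outer.as_string())
print([p.get_content_type() for p in text_parts(msg)])
# prints ['text/plain', 'text/html']
```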
Ever since I upgraded my Python to 2.4.3, the output messages have come out
garbled, as a block of junk characters.
I traced the problem back to a few lines that were removed from the email package:
the new Python no longer encodes the payload when converting the MIME message to a
string.
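To make "encoding the payload according to the charset" concrete: for UTF-8, the email package's Charset object picks base64 as the body (transfer) encoding, and body_encode() applies it. The small sketch below assumes Python 3, where the module is spelled email.charset instead of 2.4's email.Charset.

```python
import base64
from email.charset import Charset

utf8 = Charset('utf-8')

# The transfer encoding the email package associates with UTF-8 bodies.
print(utf8.get_body_encoding())  # prints base64

# body_encode() converts the text to UTF-8 bytes and base64-encodes them,
# producing lines that can go straight into a MIME body.
encoded = utf8.body_encode('h\u00e9llo')
print(repr(encoded))  # prints 'aMOpbGxv\n'

# Decoding the base64 recovers the original text.
print(base64.b64decode(encoded).decode('utf-8'))  # prints héllo
```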
Since my program must run on several computers, each with a different version of
Python, I had to find a way to make it work correctly whether or not msg.as_string()
encodes the payload.
Here is a piece of code that demonstrates how to work around this problem:
.................. code start ................
import email
import email.Charset
import email.MIMEText

def do_some_processing(s):
    """Return the input text or HTML string after processing it in some way."""
    # For the sake of this example, we only do some trivial processing
    # (placeholder: the real processing is not shown here).
    return s

# Starting with Python 2.4.3 or so, msg.as_string() no longer encodes the payload
# according to the charset, so we have to do it ourselves.
# The trick is to create a message part with 'x' as payload and see whether it
# got encoded or not. The answer is the same for every part, so test only once.
should_encode = (email.MIMEText.MIMEText('x', 'html', 'UTF-8').get_payload() != 'x')

msg = email.message_from_string(open('input_mime_msg', 'r').read())
utf8 = email.Charset.Charset('UTF-8')
for part in msg.walk():
    if part.get_content_type() in ('text/plain', 'text/html'):
        # True means undo the Content-Transfer-Encoding (normally base64).
        s = part.get_payload(None, True)
        # s is now just the text or HTML of the part, no longer transfer-encoded.
        s = do_some_processing(s)
        if should_encode:
            s = utf8.body_encode(s)
        part.set_payload(s)
        # The next two lines may be necessary if the original input message uses
        # a different encoding than the one used in the email package. In that
        # case we have to replace the Content-Transfer-Encoding header to indicate
        # the new encoding; assigning without deleting would add a second header.
        del part['Content-Transfer-Encoding']
        part['Content-Transfer-Encoding'] = utf8.get_body_encoding()

print msg.as_string()  # output the new message
.................. code end ................
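For anyone on a current Python, here is a minimal sketch of the same workaround under Python 3, which uses the modules email.charset and email.mime.text instead of email.Charset and email.MIMEText. It is an adaptation, not the code above. The function name reencode_text_parts and its process argument are hypothetical. Each part is decoded using its declared charset (us-ascii if none), and the result is always re-encoded as UTF-8, so the charset parameter is updated too. On recent Python 3 releases the probe should come out True; it is kept only to mirror the original logic.

```python
import email
from email.charset import Charset
from email.mime.text import MIMEText

TEXT_TYPES = ('text/plain', 'text/html')


def reencode_text_parts(raw, process):
    """Apply process() to each text/plain and text/html part of the raw
    message string and return the re-serialized message.

    Hypothetical helper: parts are decoded with their declared charset
    (us-ascii if none) and always re-encoded as UTF-8.
    """
    msg = email.message_from_string(raw)
    utf8 = Charset('utf-8')
    # Same probe as in the original: does building a UTF-8 part
    # transfer-encode its payload right away?
    should_encode = MIMEText('x', 'html', 'utf-8').get_payload() != 'x'
    for part in msg.walk():
        if part.get_content_type() not in TEXT_TYPES:
            continue
        charset = part.get_content_charset() or 'us-ascii'
        # decode=True undoes the Content-Transfer-Encoding (base64, QP, ...).
        text = part.get_payload(decode=True).decode(charset, 'replace')
        text = process(text)
        if should_encode:
            text = utf8.body_encode(text)
        part.set_payload(text)
        # Replace, rather than add to, the transfer-encoding header and
        # record the new charset in Content-Type.
        del part['Content-Transfer-Encoding']
        part['Content-Transfer-Encoding'] = utf8.get_body_encoding()
        part.set_param('charset', 'utf-8')
    return msg.as_string()


if __name__ == '__main__':
    sample = ('MIME-Version: 1.0\n'
              'Content-Type: text/plain; charset="iso-8859-1"\n'
              'Content-Transfer-Encoding: quoted-printable\n'
              '\n'
              'caf=E9\n')
    print(reencode_text_parts(sample, str.upper))
```

Deleting Content-Transfer-Encoding before setting it matters: assigning to a header name on an email Message adds another header instead of replacing the existing one.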
Hope this helps someone out there.
(Permission is hereby granted for anybody to use this piece of code for any purpose whatsoever)