Email headers and non-ASCII characters
Max M
maxm at mxm.dk
Fri Nov 24 08:19:23 EST 2006
Christoph Haas wrote:
> On Thursday 23 November 2006 16:31, Max M wrote:
>> Christoph Haas wrote:
>>> Hello, everyone...
>>>
>>> I'm trying to send an email to people with non-ASCII characters in
>>> their names. A recipient's address may look like:
>>>
>>> "Jörg Nørgens" <joerg@nowhere>
>>>
>>> My example code:
>>>
>>> =================================
>>> import smtplib
>>> from email.MIMEText import MIMEText
>>> from email.Header import Header
>>>
>>> def sendmail(sender, recipient, body, subject):
>>>     message = MIMEText(body)
>>>     message['Subject'] = Header(subject, 'iso-8859-1')
>>>     message['From'] = Header(sender, 'iso-8859-1')
>>>     message['To'] = Header(recipient, 'iso-8859-1')
>>>
>>>     s = smtplib.SMTP()
>>>     s.connect()
>>>     s.sendmail(sender, recipient, message.as_string())
>>>     s.close()
>>> =================================
>>>
>>> However the Header() method encodes the whole expression in
>>> ISO-8859-1:
>>>
>>> =?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=
>>>
>>> However I had expected something like:
>>>
>>> "=?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?=" <joerg@nowhere>
>>>
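The format Christoph expected, with only the display name turned into an RFC 2047 encoded-word and the `<...>` address left as plain ASCII, can be built with `email.utils.formataddr`. This is a minimal sketch assuming Python 3.3 or later (where `formataddr` accepts a `charset` argument), not the Python 2 `email` API used in this thread; `build_message`, the sender address and the subject/body values are hypothetical. RFC 2047 does not allow encoded-words inside a quoted string, so the result carries no surrounding double quotes.

```python
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(sender, to_name, to_addr, subject, body):
    """Hypothetical helper: build a message whose To header encodes only the name."""
    message = MIMEText(body, 'plain', 'iso-8859-1')
    message['Subject'] = Header(subject, 'iso-8859-1')
    message['From'] = sender
    # formataddr() RFC 2047-encodes a non-ASCII display name in the given
    # charset and leaves the <addr-spec> part as plain ASCII for the MTA.
    message['To'] = formataddr((to_name, to_addr), charset='iso-8859-1')
    return message


msg = build_message('sender@example.org', 'Jörg Nørgens', 'joerg@nowhere',
                    'Hallo', 'Hej Jörg!')
print(msg['To'])
# smtplib's sendmail() wants the bare address(es) as envelope recipients,
# e.g. s.sendmail(sender, ['joerg@nowhere'], msg.as_string())
```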
>>> Of course my mail transfer agent is not happy with the first string.
>> Why "of course"?
>
> Because my MTA doesn't care about MIME. It just transports the email. And
> it expects an email address in <...> but doesn't decode =?iso...? strings.
>
>> But it seems that you are passing the Header object a
>> utf-8 encoded string, not a latin-1 encoded one.
>> You are telling the header the encoding. Not asking it to encode.
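The mismatch is visible in the first encoded header: `=C3=B6` is the two-byte UTF-8 sequence for `ö`, but the encoded-word is labelled `iso-8859-1`, so a decoder maps each byte to its own Latin-1 character. A short sketch of that effect, written for Python 3 (in the thread's Python 2, the byte string handed to `Header` is expected to already be in the named charset):

```python
from email.header import Header

name = 'Jörg'
utf8_bytes = name.encode('utf-8')          # b'J\xc3\xb6rg': 'ö' is two bytes
latin1_bytes = name.encode('iso-8859-1')   # b'J\xf6rg': 'ö' is one byte

# UTF-8 bytes read back under an ISO-8859-1 label turn into mojibake.
print(utf8_bytes.decode('iso-8859-1'))     # JÃ¶rg
# Bytes that really are ISO-8859-1 decode cleanly under that label.
print(latin1_bytes.decode('iso-8859-1'))   # Jörg

# In Python 3, Header() takes text and does the encoding to the charset itself.
print(Header(name, 'iso-8859-1').encode())
```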
>
> Uhm, okay. Let's see:
>
> u'"Jörg Nørgens" <joerg@nowhere>'.encode('latin-1')
>
> => '"J\xc3\xb6rg N\xc3\xb8rgens" <joerg@nowhere>'
>
> So far so good. Now run Header() on it:
>
> => '=?utf-8?b?IkrDtnJnIE7DuHJnZW5zIiA8am9lcmdAbm93aGVyZT4=?='
>
> Still nothing like <...> in it and my MTA is unhappy again. What am I
> missing? Doesn't anyone know how mail clients handle that encoding?
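Christoph's complaint can be reproduced without an MTA by handing both header styles to an address parser. In this Python 3 sketch, `email.utils.parseaddr` is only a stand-in (an assumption) for whatever parsing the MTA does; the first string is the fully encoded header quoted earlier in the thread, and the second is a name-only variant written out by hand for contrast. Because the quotes, angle brackets and `@` of the first string all sit inside a single encoded-word, the parser sees one opaque atom and no usable address.

```python
from email.utils import parseaddr

# The whole '"Name" <addr>' string encoded as one encoded-word (quoted above).
whole = '=?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?='
# Only the display name encoded; the <addr-spec> stays plain ASCII.
name_only = '=?iso-8859-1?q?J=F6rg_N=F8rgens?= <joerg@nowhere>'

# parseaddr stands in for the MTA's own address parsing (an assumption).
for header in (whole, name_only):
    realname, addr = parseaddr(header)
    print(repr(addr))
```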

>>> address = u'"Jörg Nørgens" <joerg@nowhere>'.encode('latin-1')
>>> address
'"J\xf6rg N\xf8rgens" <joerg@nowhere>'
>>> from email.Header import Header
>>> hdr = str(Header(address, 'latin-1'))
>>> hdr
'=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?='
Is this not correct?
At least round-tripping works:
>>> from email.Header import decode_header
>>> encoded, coding = decode_header(hdr)[0]
>>> encoded, coding
('"J\xf6rg N\xf8rgens" <joerg@nowhere>', 'iso-8859-1')
>>> encoded.decode(coding)
u'"J\xf6rg N\xf8rgens" <joerg@nowhere>'
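On Python 3, where the `email.Header` and `email.Utils` module names used above no longer exist, the same round trip might look like the following sketch: `Header` takes a `str` and encodes it to the named charset, and `decode_header` returns `(bytes, charset)` pairs that `make_header` turns back into text.

```python
from email.header import Header, decode_header, make_header

address = '"Jörg Nørgens" <joerg@nowhere>'
# Header encodes the str to ISO-8859-1 and wraps it in an RFC 2047 encoded-word.
hdr = Header(address, 'iso-8859-1').encode()
print(hdr)

# decode_header() yields (bytes, charset) pairs; make_header() reassembles text.
decoded = str(make_header(decode_header(hdr)))
print(decoded == address)  # True
```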
And parsing the address works too.
>>> from email.Utils import parseaddr
>>> parseaddr(encoded.decode(coding))
(u'J\xf6rg N\xf8rgens', u'joerg@nowhere')
>>>
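A Python 3 sketch of the parsing step: the address is recoverable only after the encoded-word has been decoded, which is the step Christoph says his MTA does not perform. The header string is the ISO-8859-1 encoding of the sample address shown above.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

hdr = '=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?='
# Decode the encoded-word first, then parse the resulting '"Name" <addr>' text.
text = str(make_header(decode_header(hdr)))
realname, addr = parseaddr(text)
print(realname, addr)  # Jörg Nørgens joerg@nowhere
```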
--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science