Email headers and non-ASCII characters

Max M maxm at mxm.dk
Fri Nov 24 08:19:23 EST 2006


Christoph Haas wrote:
> On Thursday 23 November 2006 16:31, Max M wrote:
>> Christoph Haas wrote:
>>> Hello, everyone...
>>>
>>> I'm trying to send an email to people with non-ASCII characters in
>>> their names. A recipient's address may look like:
>>>
>>> "Jörg Nørgens" <joerg at nowhere>
>>>
>>> My example code:
>>>
>>> =================================
>>> def sendmail(sender, recipient, body, subject):
>>>    message = MIMEText(body)
>>>    message['Subject'] = Header(subject, 'iso-8859-1')
>>>    message['From'] = Header(sender, 'iso-8859-1')
>>>    message['To'] = Header(recipient, 'iso-8859-1')
>>>
>>>    s = smtplib.SMTP()
>>>    s.connect()
>>>    s.sendmail(sender, recipient, message.as_string())
>>>    s.close()
>>> =================================
>>>
>>> However the Header() method encodes the whole expression in
>>> ISO-8859-1:
>>>
>>> =?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=
>>>
>>> I had expected something like:
>>>
>>> "=?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?=" <joerg at nowhere>
>>>
>>> Of course my mail transfer agent is not happy with the first string
>> Why "of course"?
> 
> Because my MTA doesn't care about MIME. It just transports the email. And 
> it expects an email address in <...> but doesn't decode =?iso...? strings.
> 
>> But it seems that you are passing the Header object a 
>> utf-8 encoded string, not a latin-1 encoded one.
>> You are telling the Header the encoding, not asking it to encode.
> 
> Uhm, okay. Let's see:
> 
> u'"Jörg Nørgens" <joerg at nowhere>'.encode('latin-1')
> 
> => '"J\xc3\xb6rg N\xc3\xb8rgens" <joerg at nowhere>'
> 
> So far so good. Now run Header() on it:
> 
> => '=?utf-8?b?IkrDtnJnIE7DuHJnZW5zIiA8am9lcmdAbm93aGVyZT4=?='
> 
> Still nothing like <...> in it and my MTA is unhappy again. What am I 
> missing? Doesn't anyone know how mail clients handle that encoding?


 >>> address = u'"Jörg Nørgens" <joerg at nowhere>'.encode('latin-1')
 >>> address
'"J\xf6rg N\xf8rgens" <joerg at nowhere>'
 >>> from email.Header import Header
 >>> hdr = str(Header(address, 'latin-1'))
 >>> hdr
'=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?='

Is this not correct?
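
For comparison, the =C3=B6 sequences in the earlier output are what you get when the bytes handed to Header are really UTF-8 while the label says iso-8859-1. A minimal sketch of the difference, written in Python 3 spelling (an assumption; the sessions in this mail are Python 2), using only str.encode and bytes.decode:

```python
# The same name, encoded two ways. Header() only labels the bytes it is
# given; it does not convert them from one charset to another.
name = 'Jörg Nørgens'

latin1_bytes = name.encode('iso-8859-1')   # one byte per accented letter
utf8_bytes = name.encode('utf-8')          # two bytes per accented letter

print(latin1_bytes)
print(utf8_bytes)

# Reading UTF-8 bytes as if they were iso-8859-1 gives the familiar
# mojibake, which is what a mislabelled header decodes to.
mislabelled = utf8_bytes.decode('iso-8859-1')
print(mislabelled)
```

If the encoded header contains =C3=B6 under an iso-8859-1 label, the input was most likely UTF-8 bytes, for example from a UTF-8 terminal or source file.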

At least roundtripping works:

 >>> from email.Header import decode_header
 >>> encoded, coding = decode_header(hdr)[0]
 >>> encoded, coding
('"J\xf6rg N\xf8rgens" <joerg at nowhere>', 'iso-8859-1')
 >>> encoded.decode(coding)
u'"J\xf6rg N\xf8rgens" <joerg at nowhere>'

And parsing the address works too.

 >>> from email.Utils import parseaddr
 >>> parseaddr(encoded.decode(coding))
(u'J\xf6rg N\xf8rgens', u'joerg at nowhere')
 >>>
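
If the MTA needs to see a plain <...> address, one option is to encode only the display name and leave the address part alone. The following is a rough sketch, not a tested recipe for the setup discussed here: it uses the Python 3 module names (email.utils rather than email.Utils), encode_address is a hypothetical helper name, and the address is written with a real "@" where the list archive shows " at ".

```python
from email.utils import formataddr, parseaddr


def encode_address(address, charset='iso-8859-1'):
    """Return `address` with only the display name MIME-encoded.

    Hypothetical helper: splits the address with parseaddr() and lets
    formataddr() encode a non-ASCII name in `charset`, so the <...>
    part stays readable for software that does not decode MIME words.
    """
    name, addr = parseaddr(address)
    # formataddr() leaves pure-ASCII names untouched and returns just
    # the address when there is no display name.
    return formataddr((name, addr), charset)


print(encode_address('"Jörg Nørgens" <joerg@nowhere>'))
```

The envelope recipient passed to smtplib's sendmail() would still be the bare address (the second element returned by parseaddr), not the encoded header value.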

-- 

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science


