Email headers and non-ASCII characters

Fri Nov 24 07:59:56 EST 2006

Christoph Haas wrote:
> Hello, everyone...
>
> I'm trying to send an email to people with non-ASCII characters in their
> names. A recipient's address may look like:
>
> "Jörg Nørgens" <joerg@nowhere>
>
> My example code:
>
> =================================
> import smtplib
> from email.Header import Header
> from email.MIMEText import MIMEText
>
> def sendmail(sender, recipient, body, subject):
>    message = MIMEText(body)
>    message['Subject'] = Header(subject, 'iso-8859-1')
>    message['From'] = Header(sender, 'iso-8859-1')
>    message['To'] = Header(recipient, 'iso-8859-1')
>
>    s = smtplib.SMTP()
>    s.connect()
>    s.sendmail(sender, recipient, message.as_string())
>    s.close()
> =================================
>
> However, the Header() method encodes the whole expression in ISO-8859-1:
>
> =?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=
>
> I had expected something like:
>
> "=?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?=" <joerg@nowhere>
>
> Of course my mail transfer agent is not happy with the first string
> although I see that Header() is just doing its job. I'm looking for a way
> though to encode just the non-ASCII parts like any mail client does. Does
> anyone have a recipe on how to do that? Or is there a method in
> the "email" module of the standard library that does what I need? Or
> should I split by regular expression to extract the email address
> beforehand? Or use a list comprehension to just look for non-ASCII
> characters and Header() them? Sounds dirty.

Why dirty?

# Python 2: append each run of ASCII or non-ASCII characters separately.
from email.Header import Header
from itertools import groupby

h = Header()
addr = u'"Jörg Nørgens" <joerg@nowhere>'

def is_ascii(char):
    return ord(char) < 128

# groupby() splits addr into alternating ASCII and non-ASCII runs.
for ascii, group in groupby(addr, is_ascii):
    h.append(''.join(group), "latin-1")

print h
=>
"J =?iso-8859-1?q?=F6?= rg N =?iso-8859-1?q?=F8?= rgens"
<joerg@nowhere>
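
On Python 3, one possible alternative is to split the address with
email.utils.parseaddr() and rebuild it with email.utils.formataddr(), which
accepts a charset and encodes only a non-ASCII display name. The sketch below
is an illustration under assumptions, not a drop-in fix: encode_address() and
build_message() are hypothetical helper names, the addresses are placeholders,
and nothing is actually sent. Since the whole name becomes one encoded-word,
it should also avoid the extra spaces around each chunk seen in the output
above.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText
from email.utils import formataddr, parseaddr


def encode_address(address, charset="utf-8"):
    """Return the address with only a non-ASCII display name encoded."""
    # parseaddr() splits '"Name" <user@host>' into ('Name', 'user@host').
    name, addr = parseaddr(address)
    # formataddr() leaves an ASCII name as text and encodes a non-ASCII one
    # as an RFC 2047 encoded-word; the address part must be plain ASCII.
    return formataddr((name, addr), charset)


def build_message(sender, recipient, body, subject, charset="utf-8"):
    """Build (but do not send) a message; hypothetical helper."""
    message = MIMEText(body, "plain", charset)
    message["Subject"] = Header(subject, charset)
    message["From"] = encode_address(sender, charset)
    message["To"] = encode_address(recipient, charset)
    return message


if __name__ == "__main__":
    msg = build_message(
        "sender@nowhere",                   # placeholder sender
        '"Jörg Nørgens" <joerg@nowhere>',   # address from the question
        "Hello Jörg",
        "Test",
    )
    print(msg["To"])
    # Decode the display name again to check that it survives the round trip.
    encoded_name, _ = parseaddr(msg["To"])
    print(make_header(decode_header(encoded_name)))
```

When actually sending, it is probably safest to hand smtplib only the bare
addresses (the second element returned by parseaddr()) rather than the
formatted header strings.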

  -- Leo