[Tutor] Fixing garbled email addresses

Daniel Yoo dyoo at cs.wpi.edu
Tue May 1 17:59:58 CEST 2007


Hi Dotan,


Just for reference, the weirdness that you're seeing before the email 
addresses in your text file is a set of MIME "encoded-word" strings 
(the format is defined in RFC 2047).

     http://en.wikipedia.org/wiki/MIME

Concretely, the string

     "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="

is an encoding of a string in MIME format.  The "UTF-8" field names the 
character set, the "B" field says the payload is base-64, and the rest 
is the utf-8 text itself in base-64 form.

(See http://www.joelonsoftware.com/articles/Unicode.html for details on 
unicode if you need to brush up.)
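
To make that structure concrete, here is a minimal sketch that takes the 
encoded word apart by hand.  It assumes Python 3 and a single well-formed 
"B" (base-64) encoded word; the variable names are only for illustration.

```python
import base64

encoded_word = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="

# Drop the leading "=?" and trailing "?=", then split the three fields:
# character set, transfer encoding, and payload.
charset, encoding, payload = encoded_word[2:-2].split("?")

# This sketch only handles the "B" (base-64) encoding, not "Q".
if encoding.upper() != "B":
    raise ValueError("expected a base-64 encoded word")

# base-64 gives back raw bytes; the charset says how to read them as text.
text = base64.b64decode(payload).decode(charset)
print(text)
```

This is only meant to show what the pieces mean.  Real headers can mix 
plain text, several encoded words, and the "Q" encoding, so a library is 
the safer route.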


There are libraries in Python to help decode this stuff, in particular 
the 'email' library and its 'email.header' module.

     http://docs.python.org/lib/module-email.header.html

###################################################
>>> from email.header import decode_header
>>> from email.header import make_header
>>> s = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="
>>> h = make_header(decode_header(s))
###################################################


At this point, h is a "Header" object, whose unicode characters are:

#########################################################
>>> unicode(h)
u'\u05de\u05e8\u05d9\u05d4 \u05d9\u05e0\u05d8\u05e6\u05df'
#########################################################
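
The session above uses unicode(), which exists only in Python 2.  As a 
sketch, assuming Python 3 (where str is already a unicode string), the 
same steps would look like this:

```python
from email.header import decode_header, make_header

s = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="

# decode_header() splits the value into (bytes, charset) pairs;
# make_header() reassembles those pairs into a Header object.
h = make_header(decode_header(s))

# In Python 3, str() plays the role that unicode() plays in Python 2.
decoded = str(h)
print(decoded)
```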

I have a console that supports printing utf-8, and when I look at this, it 
looks like Hebrew.  Spelled out letter by letter, the characters are:

     "Mem" "Resh" "Yod" "He" "Yod" "Nun" "Tet" "Tsadi" "Final Nun"

I'm sure these consonants make more sense to you than they do to me, since 
I don't speak Hebrew.  In any case, the point is that you may be able to 
maintain the name-to-email correspondence in your email lists by using 
Python's support for decoding those base64-encoded strings.
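
As a rough sketch of that idea, assuming Python 3 and assuming each entry 
in your file looks like "encoded-name <address>" (I haven't seen the real 
file, so that layout, the decode_entry helper, and the example.com 
address are all made up for illustration):

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_entry(line):
    """Split one 'name <address>' entry and decode the name part."""
    # parseaddr() separates the display name from the address, but
    # leaves any MIME encoded words in the name untouched.
    raw_name, address = parseaddr(line)
    # Decode the encoded words into a readable unicode string.
    name = str(make_header(decode_header(raw_name)))
    return name, address


# Hypothetical entry: the name is from this thread, the address is not.
entry = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?= <someone@example.com>"
name, address = decode_entry(entry)
print(name, address)
```

Running decode_entry over every line would give you (name, address) 
pairs, provided the real entries follow that layout.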

