[Tutor] Fixing garbled email addresses
Daniel Yoo
dyoo at cs.wpi.edu
Tue May 1 17:59:58 CEST 2007
Hi Dotan,
Just for reference, the weird strings that you're seeing before the email
addresses in your text file are "MIME-encoded" strings.
http://en.wikipedia.org/wiki/MIME
Concretely, the string
"=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="
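is an encoding of a string in MIME format. The "UTF-8" part names the
character set, the "B" says that the payload is base64, and everything
between the last two question marks is the base64 text itself.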
(See http://www.joelonsoftware.com/articles/Unicode.html for details on
Unicode if you need to brush up.)
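To make the format less mysterious, here is a rough sketch of pulling that string apart by hand. It is written for Python 3 (the interactive session in this message is Python 2), and the variable names are just mine. It assumes a single "B"-encoded word, so it ignores the "Q" encoding and headers made of several pieces; real code should lean on the email package instead.

```python
import base64

encoded_word = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="

# An encoded word has the shape  =?charset?encoding?payload?=
# Splitting on "?" yields: "=", charset, encoding, payload, "="
_, charset, encoding, payload, _ = encoded_word.split("?")

# Only handle the base64 ("B") case in this sketch.
if encoding.upper() != "B":
    raise ValueError("this sketch only handles base64 encoded words")

raw_bytes = base64.b64decode(payload)  # base64 text -> UTF-8 bytes
text = raw_bytes.decode(charset)       # UTF-8 bytes -> Unicode string

print(repr(text))
```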
Python has libraries to help decode this stuff, in particular the 'email'
package and its email.header module.
http://docs.python.org/lib/module-email.header.html
###################################################
>>> from email.header import decode_header
>>> from email.header import make_header
>>> s = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="
>>> h = make_header(decode_header(s))
###################################################
At this point, h is a "Header" object. Converting it to unicode gives the
decoded characters:
#########################################################
>>> unicode(h)
u'\u05de\u05e8\u05d9\u05d4 \u05d9\u05e0\u05d8\u05e6\u05df'
#########################################################
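If you are on Python 3, where unicode() no longer exists, I believe the equivalent is to call str() on the Header object. A minimal sketch, reusing the same string:

```python
from email.header import decode_header, make_header

s = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?="

# decode_header() returns a list of (bytes, charset) pairs;
# make_header() wraps them in a Header object.
h = make_header(decode_header(s))

# In Python 3, str() on a Header gives the decoded Unicode text.
decoded = str(h)
print(repr(decoded))
```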
I have a console that supports printing utf-8, and when I look at this, it
looks like Hebrew. A direct letter-for-letter transliteration would be:
"Mem" "Resh" "Yod" "He" "Yod" "Nun" "Tet" "Tsadi" "Final Nun"
I'm sure these consonants make more sense to you than they do to me, since
I don't speak Hebrew. In any case, the point is that you may be able to
maintain the name-to-email correspondence in your email lists by using
Python's support for decoding those base64-encoded strings.
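As a sketch of what that might look like, here is one way to keep each name paired with its address. I haven't seen your file, so I'm assuming each entry looks like an encoded name followed by the address in angle brackets. The decode_address helper and the example.com address are made up for illustration, and the code is written for Python 3.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_address(entry):
    """Hypothetical helper: split 'Name <addr>' and decode the name part."""
    # parseaddr() separates the display name from the address itself.
    raw_name, address = parseaddr(entry)
    # Decode any MIME encoded words in the name; plain names pass through.
    name = str(make_header(decode_header(raw_name)))
    return name, address


# Assumed input format; the address is a placeholder.
entry = "=?UTF-8?B?157XqNeZ15Qg15nXoNeY16bXnw==?= <someone@example.com>"
name, address = decode_address(entry)
print(repr(name), address)
```

Running this over every line of the list should give you (name, address) pairs that you can write back out in whatever format you need.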