How do I decode unicode characters in the subject using email.message_from_string()?
rdmurray at bitdance.com
Wed Feb 25 08:39:58 EST 2009
John Machin <sjmachin at lexicon.net> wrote:
> On Feb 25, 11:07 am, "Roy H. Han" <starsareblueandfara... at gmail.com>
> wrote:
> > Dear python-list,
> >
> > I'm having some trouble decoding an email header using the standard
> > imaplib.IMAP4 class and email.message_from_string method.
> >
> > In particular, email.message_from_string() does not seem to properly
> > decode unicode characters in the subject.
> >
> > How do I decode unicode characters in the subject?
>
> You don't. You can't. You decode str objects into unicode objects. You
> encode unicode objects into str objects. If your input is not a str
> object, you have a problem.
I can't speak for the OP, but I had a similar (and possibly
identical-in-intent) question. Suppose you have a Subject line that
looks like this:
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?= =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
How do you get the email module to decode that into unicode? The same
question applies to the other header lines. The answer is that it isn't
easy: I had to read and reread the docs and experiment for a while
to figure it out. I understand there's going to be a sprint on the
email module at PyCon; maybe some of this will get improved then.
Here's the final version of my test program. The third to last line is
one I thought ought to work given that Header has a __unicode__ method.
The final line is the one that did work (note the kludge to turn None
into 'ascii'...IMO 'ascii' is what decode_header _should_ be returning,
and this code shows why!)
-------------------------------------------------------------------
from email import message_from_string
from email.header import Header, decode_header

# A blank line must separate the headers from the body.
x = message_from_string("""\
To: test
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?= =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=

this is a test.
""")
print x
print "--------------------"
# For each header, print its type, the Header() attempt, the raw
# decode_header() pairs, and the manual join that worked.
for key, header in x.items():
    print key, 'type', type(header)
    print key+":", unicode(Header(header)).decode('utf-8')
    print key+":", decode_header(header)
    print key+":", ''.join([s.decode(t or 'ascii') for (s, t) in decode_header(header)]).encode('utf-8')
-------------------------------------------------------------------
From nobody Wed Feb 25 08:35:29 2009
To: test
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
 =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=

this is a test.
--------------------
To type <type 'str'>
To: test
To: [('test', None)]
To: test
Subject type <type 'str'>
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?= =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
Subject: [("'u' Obselete type", None), ("-- it is identical to 'd'. (7)", 'iso-8859-1')]
Subject: 'u' Obselete type-- it is identical to 'd'. (7)
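For what it's worth, a shorter route may be to let email.header.make_header reassemble the (string, charset) pairs that decode_header returns, since it accepts a charset of None and treats it as us-ascii. This is only a minimal sketch, assuming make_header is available in your Python version; header_to_text and text_type are names made up for this example and are not part of the email module.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Hypothetical helper name: resolves to unicode on Python 2 and str on
# Python 3, so the sketch is not tied to one interpreter.
text_type = type(u"")


def header_to_text(raw_value):
    """Decode RFC 2047 encoded words in a raw header value into text."""
    # decode_header() splits the value into (chunk, charset) pairs;
    # make_header() rebuilds a Header from those pairs, and converting
    # that Header to text yields the decoded value.
    return text_type(make_header(decode_header(raw_value)))


msg = message_from_string(
    "To: test\n"
    "Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?="
    " =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=\n"
    "\n"
    "this is a test.\n"
)

for key, value in msg.items():
    print(key + ": " + header_to_text(value))
```

The result is a text object (unicode on Python 2, str on Python 3), so encode it, for example to UTF-8, before writing it to a byte stream.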
--RDM