How do I decode unicode characters in the subject using email.message_from_string()?

Roy H. Han starsareblueandfaraway at
Wed Feb 25 15:09:49 CET 2009

Thanks for writing back, RDM and John Machin.  Tomorrow I'll try the
code you suggested, RDM.  It looks quite helpful and I'll report the results.

In the meantime, John asked for more data.  The sender's email client
is Microsoft Outlook 11.  The recipient email client is Lotus Notes.

Actual Subject

Expected Subject
Inteum C/SR User Tip: Quick Access to Recently Opened Inteum C/SR Records

Microsoft Office Outlook 11

Produced By Microsoft MimeOLE V6.00.2900.5579


On Wed, Feb 25, 2009 at 8:39 AM,  <rdmurray at> wrote:
> John Machin <sjmachin at> wrote:
>> On Feb 25, 11:07 am, "Roy H. Han" <starsareblueandfara... at>
>> wrote:
>> > Dear python-list,
>> >
>> > I'm having some trouble decoding an email header using the standard
>> > imaplib.IMAP4 class and email.message_from_string method.
>> >
>> > In particular, email.message_from_string() does not seem to properly
>> > decode unicode characters in the subject.
>> >
>> > How do I decode unicode characters in the subject?
>> You don't. You can't. You decode str objects into unicode objects. You
>> encode unicode objects into str objects. If your input is not a str
>> object, you have a problem.
> I can't speak for the OP, but I had a similar (and possibly
> identical-in-intent) question.  Suppose you have a Subject line that
> looks like this:
>    Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=   =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
> How do you get the email module to decode that into unicode?  The same
> question applies to the other header lines, and the answer is it isn't
> easy, and I had to read and reread the docs and experiment for a while
> to figure it out.  I understand there's going to be a sprint on the
> email module at PyCon; maybe some of this will get improved then.
> Here's the final version of my test program.  The third to last line is
> one I thought ought to work given that Header has a __unicode__ method.
> The final line is the one that did work (note the kludge to turn None
> into 'ascii'...IMO 'ascii' is what decode_header _should_ be returning,
> and this code shows why!)
> -------------------------------------------------------------------
> from email import message_from_string
> from email.header import Header, decode_header
> x = message_from_string("""\
> To: test
> Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=   =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
> this is a test.
> """)
> print x
> print "--------------------"
> for key, header in x.items():
>    print key, 'type', type(header)
>    print key+":", unicode(Header(header)).decode('utf-8')
>    print key+":", decode_header(header)
>    print key+":", ''.join([s.decode(t or 'ascii') for (s, t) in decode_header(header)]).encode('utf-8')
> -------------------------------------------------------------------
>    From nobody Wed Feb 25 08:35:29 2009
>    To: test
>    Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
>            =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>    this is a test.
>    --------------------
>    To type <type 'str'>
>    To: test
>    To: [('test', None)]
>    To: test
>    Subject type <type 'str'>
>    Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=   =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>    Subject: [("'u' Obselete type", None), ("-- it is identical to 'd'. (7)", 'iso-8859-1')]
>    Subject: 'u' Obselete type-- it is identical to 'd'. (7)
> --RDM
> --
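For anyone reading this thread on Python 3, here is a minimal sketch of the same idea as RDM's test program. It assumes Python 3's email package, where decode_header() returns (bytes, charset) pairs and make_header() reassembles them into a Header that str() can render. The helper name decode_subject is made up for illustration, and the sample message is the one from RDM's test.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Same sample message as in the quoted test program, with a blank line
# separating the headers from the body.
RAW = (
    "To: test\n"
    "Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?="
    "   =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=\n"
    "\n"
    "this is a test.\n"
)


def decode_subject(raw_message):
    """Return the Subject header with RFC 2047 encoded words decoded.

    Hypothetical helper, not part of the email package.
    """
    msg = message_from_string(raw_message)
    # decode_header() splits the raw header into (value, charset) pairs;
    # charset is None for the parts that were not encoded.
    parts = decode_header(msg["Subject"])
    # make_header() builds a Header from those pairs, and str() on it
    # yields ordinary text, so no manual None -> 'ascii' kludge is needed.
    return str(make_header(parts))


subject = decode_subject(RAW)
print(subject)
```

The same call works for any other header value returned by msg[...], as long as the header is present; msg[...] returns None for a missing header, which decode_header() does not accept.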
