[Email-SIG] Patch: Improve recognition of attachment file name, with encodings
Stuart Bishop
stuart at stuartbishop.net
Tue Feb 26 05:11:51 CET 2008
Nando wrote:
> OK, I get question number 2 now. My question was:
>
> 2) Is there some flaw in decode_header()? Something that Thunderbird
> displays as "Eduardo & Mônica" is being decoded with the wrong character
> in place of the ô:
> repr(decode_header(m["subject"])[0][0])
> 'Eduardo & M\xf4nica'
> The header being tested is:
> Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=
> In case we are again doing the Right Thing, then why does Thunderbird
> display it the way it was intended?
>
>
> The answer is I have to use codecs.decode():
>
> import codecs
>
> In [20]: [(s, encoding)] = decode_header("=?iso-8859-1?Q?P=F4nei?=")
>
> In [21]: s
> Out[21]: 'P\xf4nei'
>
> In [22]: encoding
> Out[22]: 'iso-8859-1'
>
> In [23]: print codecs.decode(s, encoding)
> Pônei
>
> Well, that just makes it even harder to use the return value of the
> decode_header() function. And instead of encapsulating all that
> complexity in the email library, you are forcing every user of the
> library to find all this out by himself, just as I had to.
It gets harder, as you are not handling Unicode domain names. Code to
convert email addresses between their ASCII and Unicode representations can
be found at http://stuartbishop.net/Software/EmailAddress/
(Barry - we should discuss getting code to do this into the standard library
again. I think I opened a bug on this soon after I wrote it - in 2004!)
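As a rough illustration of the conversion described above (not the code at that URL), here is a minimal sketch that relies on Python's built-in "idna" codec. It is written for Python 3. The function names address_to_ascii and address_to_unicode are hypothetical, and the sketch assumes that only the domain part needs converting, leaving the local part untouched.

```python
def address_to_ascii(address):
    """Return the address with its domain in ASCII (IDNA) form."""
    # Split on the last '@' so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: %r" % (address,))
    # Assumption: the local part is passed through unchanged.
    return local + "@" + domain.encode("idna").decode("ascii")


def address_to_unicode(address):
    """Return the address with its domain in Unicode form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: %r" % (address,))
    return local + "@" + domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(address_to_ascii("user@b\u00fccher.example"))
    print(address_to_unicode("user@xn--bcher-kva.example"))
```

The standard library codec implements the older IDNA 2003 rules, so treat this as a starting point rather than a complete solution.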
It is a bit of a learning curve, and I suspect that most users of the
library have written the same or similar helpers, possibly several times.
For example, the nearly mandatory header decoder:
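import email.Header

def decode_header(s):
    '''Decode an RFC2047 email header into a Unicode string.'''
    # email.Header.decode_header() returns (string, charset) pairs;
    # charset is None for unencoded parts, so fall back to ASCII.
    s = email.Header.decode_header(s)
    s = [b[0].decode(b[1] or 'ascii') for b in s]
    return u''.join(s)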
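For readers on Python 3, where the module is spelled email.header and decode_header() can return either bytes or str for each part, a helper in the same spirit might look like the sketch below. The name decode_header_to_text is hypothetical, and falling back to ASCII with errors="replace" is an assumption about how to treat parts with no declared charset.

```python
from email.header import decode_header


def decode_header_to_text(raw):
    """Decode an RFC 2047 header value into a single text string."""
    parts = []
    for value, charset in decode_header(raw):
        # Encoded words come back as bytes; plain headers may already be str.
        if isinstance(value, bytes):
            # Assumption: treat parts without a charset as ASCII.
            value = value.decode(charset or "ascii", errors="replace")
        parts.append(value)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_header_to_text("=?iso-8859-1?Q?Eduardo_&_M=F4nica?="))
```

Applied to the Subject header quoted earlier, this should produce the text Thunderbird displays.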
--
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/