[Email-SIG] email.header.decode_header eats my spaces
Tokio Kikuchi
tkikuchi at is.kochi-u.ac.jp
Tue Mar 27 02:40:30 CEST 2007
Jasper Spaans wrote:
> Hello SIG,
>
> Today I was playing around with the decode_header function of the
> email.header module, and it is eating my spaces.
> Some people have filed bugs about this [1] [2] and have proposed the
> following patch, which to me seems to be obviously correct:
>
> etchy:/usr/lib/python2.5/email# diff -u header.py{~,}
> --- header.py~ 2007-03-27 01:10:31.000000000 +0200
> +++ header.py 2007-03-27 01:10:31.000000000 +0200
> @@ -77,7 +77,7 @@
>              continue
>          parts = ecre.split(line)
>          while parts:
> -            unenc = parts.pop(0).strip()
> +            unenc = parts.pop(0).rstrip()
>              if unenc:
>                  # Should we continue a long line?
>                  if decoded and decoded[-1][1] is None:
>
> (Doing a test-run on a corpus of about 23k messages posted to a
> public mailing list with these two variants shows that several (imho)
> bugs disappear and no new bugs appear; typical example:
> -RenéPfeiffer <> vs =?utf-8?B?UmVuw6k=?= Pfeiffer <>
> +René Pfeiffer <> vs =?utf-8?B?UmVuw6k=?= Pfeiffer <>
> )
What program produced this output?
Python 2.5 (r25:51908, Feb 7 2007, 19:53:49)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.header
>>> t = email.header.decode_header('=?utf-8?B?UmVuw6k=?= Pfeiffer <>')
>>> t
[('Ren\xc3\xa9', 'utf-8'), ('Pfeiffer <>', None)]
>>> h = email.header.make_header(t)
>>> unicode(h)
u'Ren\xe9 Pfeiffer <>'
>>> unicode(h).encode('iso-8859-1')
'Ren\xe9 Pfeiffer <>'
Use email.header.make_header() to reconstruct your header from the
decoded tuple list, as in the session above.
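For anyone on a newer interpreter, here is a minimal sketch of the same
round trip as a small helper. It assumes Python 3, where str() takes the
place of the unicode() call in the session above; the name
decode_to_text is only illustrative and is not part of the email package.

```python
from email.header import decode_header, make_header


def decode_to_text(raw_header):
    """Decode an RFC 2047 header value into one text string.

    Hypothetical helper: it hands the (bytes, charset) tuples from
    decode_header() to make_header() instead of joining them by hand,
    so the Header object decides where whitespace goes between parts.
    """
    parts = decode_header(raw_header)
    return str(make_header(parts))


if __name__ == "__main__":
    print(decode_to_text("=?utf-8?B?UmVuw6k=?= Pfeiffer <>"))
```

The point of going through make_header() is that the caller never
concatenates the decoded parts directly, which is where the space
between the encoded word and the plain text tends to get lost.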
HTH
BTW, there is another space-eating problem in the current email package
and a patch is in the tracker:
http://sourceforge.net/tracker/index.php?func=detail&aid=1681333&group_id=5470&atid=305470
--
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/