[Email-SIG] email.header.decode_header eats my spaces

Tue Mar 27 02:40:30 CEST 2007

Jasper Spaans wrote:
> Hello SIG,
> 
> Today I was playing around with the decode_header function of the  
> email.header module, and it is eating my spaces.
> Some people have filed bugs about this [1] [2] and have proposed the  
> following patch, which to me seems to be obviously correct:
> 
> etchy:/usr/lib/python2.5/email# diff -u header.py{~,}
> --- header.py~  2007-03-27 01:10:31.000000000 +0200
> +++ header.py   2007-03-27 01:10:31.000000000 +0200
> @@ -77,7 +77,7 @@
>               continue
>           parts = ecre.split(line)
>           while parts:
> -            unenc = parts.pop(0).strip()
> +            unenc = parts.pop(0).rstrip()
>               if unenc:
>                   # Should we continue a long line?
>                   if decoded and decoded[-1][1] is None:
> 
> (Doing a test-run on a corpus of about 23k messages posted to a  
> public mailing list with these two variants shows that several (imho)  
> bugs disappear and no new bugs appear; typical example:
> -RenéPfeiffer <> vs =?utf-8?B?UmVuw6k=?= Pfeiffer <>
> +René Pfeiffer <> vs =?utf-8?B?UmVuw6k=?= Pfeiffer <>
> )

What program produces this output?

Python 2.5 (r25:51908, Feb  7 2007, 19:53:49)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.header
>>> t = email.header.decode_header('=?utf-8?B?UmVuw6k=?= Pfeiffer <>')
>>> t
[('Ren\xc3\xa9', 'utf-8'), ('Pfeiffer <>', None)]
>>> h = email.header.make_header(t)
>>> unicode(h)
u'Ren\xe9 Pfeiffer <>'
>>> unicode(h).encode('iso-8859-1')
'Ren\xe9 Pfeiffer <>'

Use the email.header module (make_header) to reconstruct your header from
the decoded tuple list; as the session above shows, it puts the space back
between the decoded parts.
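
The same round trip can be wrapped in a small function. This is only a
minimal sketch, assuming Python 3 (where str() takes the place of the
unicode() call in the session above); header_to_text is a hypothetical
helper name for illustration, not something provided by the email package.

```python
from email.header import decode_header, make_header


def header_to_text(raw):
    """Decode an RFC 2047 header value into one text string.

    header_to_text is a hypothetical name used for illustration only.
    """
    # decode_header() splits the value into (data, charset) pairs.
    parts = decode_header(raw)
    # make_header() rebuilds a Header object from those pairs; converting
    # it to text joins the parts and takes care of the whitespace between
    # encoded and unencoded pieces.
    return str(make_header(parts))


if __name__ == "__main__":
    print(header_to_text("=?utf-8?B?UmVuw6k=?= Pfeiffer <>"))
```

Joining the raw tuples from decode_header() by hand is where the space
tends to get lost, so going through make_header() avoids that step.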

HTH

BTW, there is another space-eating problem in the current email package 
and a patch is in the tracker:
http://sourceforge.net/tracker/index.php?func=detail&aid=1681333&group_id=5470&atid=305470

-- 
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/