[Email-SIG] email.header.decode_header eats my spaces

Tokio Kikuchi tkikuchi at is.kochi-u.ac.jp
Tue Mar 27 09:06:03 CEST 2007


Hi,

Barry Warsaw wrote:
> 
>> Today I was playing around with the decode_header function of the
>> email.header module, and it is eating my spaces.
>> Some people have filed bugs about this [1] [2] and have proposed the
>> following patch, which to me seems to be obviously correct:
>>
>> Is there any reason for this not to be incorporated into the package?
> 
> Have you run the test suite with this change?

Don't commit this patch.  As I've written earlier, header 
manipulation should be done through the email.header module, and the 
current code is not broken with regard to this person's example.
> 
> I've been working on a branch since Pycon, which tries to fix this  
> and pass all the unit tests.  ISTR that this patch causes several  
> tests to fail.  However, resolving the tests was like pulling a  
> thread from a sweater.  It now leads me to think that we really  
> aren't true to RFC 2822 wrt folding whitespace.  However, I haven't  
> been able to fix that without breaking some current assumptions in  
> the email package.  I've been trying to get my branch to a point  
> where it passes all the tests before I posted a message here, but I  
> haven't had a chance to finish it yet.

In my opinion (which may not be true to RFC 2822 in detail), ASCII 
strings in a header object should be strip()ped and separated by FWS 
(including '\r\n ' or '\r\n\t').  If you would like 
[('Hi! ', None), ('there.', None)] to be represented as 'Hi!  there.' 
(note the two spaces between '!' and 't'), you may have to use a 
workaround like:

 >>> import email.Header
 >>> h = email.Header.Header('Hi! ', 'iso-8859-1')
 >>> h.append('there.', 'us-ascii')
 >>> print h
=?iso-8859-1?q?Hi!_?= there.
 >>> print str(unicode(h))
Hi!  there.

The use of a space between encoded/unencoded words should be:

lastcs \ nextcs | ascii | other |
          ascii | sp    | sp    |
          other | sp    | nosp  |
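
As a rough illustration of the table above, here is a minimal sketch of 
a joining rule.  The names is_ascii() and join_chunks() are hypothetical 
helpers and not part of the email package; treating a charset of None as 
ASCII and strip()ping only the ASCII chunks are assumptions based on my 
reading of the proposal in this message.

```python
def is_ascii(charset):
    # Assumption: None (as decode_header reports for unencoded text)
    # counts as us-ascii.
    return charset is None or str(charset).lower() in ('us-ascii', 'ascii')


def join_chunks(chunks):
    """Join (text, charset) pairs following the spacing table.

    A space goes between two chunks unless BOTH are non-ASCII
    (the 'other'/'other' cell, marked 'nosp').
    """
    parts = []
    last_charset = None
    for index, (text, charset) in enumerate(chunks):
        if is_ascii(charset):
            # Assumption: ASCII chunks are strip()ped, as proposed above.
            text = text.strip()
        if index > 0 and (is_ascii(last_charset) or is_ascii(charset)):
            parts.append(' ')
        parts.append(text)
        last_charset = charset
    return ''.join(parts)


print(join_chunks([('Hi!', 'us-ascii'), ('there.', 'us-ascii')]))
print(join_chunks([('Hi! ', 'iso-8859-1'), ('there.', 'us-ascii')]))
```

Under these assumptions the ascii/ascii pair is joined with a single 
space, while the iso-8859-1 chunk keeps its own trailing space and so 
yields two spaces before 'there.'.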

The current code for generating the unicode string breaks this in the 
ascii/ascii case; see:

 >>> h = email.Header.Header('Hi!', 'us-ascii')
 >>> h.append('there.', 'us-ascii')
 >>> print h
Hi! there.
 >>> unicode(h)
u'Hi!there.'

Cheers,
-- 
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/


