[Python-Dev] Adding .decode() method to Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 12 Jun 2001 20:18:45 +0200


> > Having just followed this thread tangentially, I do have to say it
> > seems quite cool to be able to do something like the following in
> > Python 2.2:
> > 
> > >>> s = msg['from']
> > >>> parts = s.split('?')
> > >>> if parts[2].lower() == 'q':
> > ...   name = parts[3].decode('quopri')
> > ... elif parts[2].lower() == 'b':
> > ...   name = parts[3].decode('base64')
> > ...
> 
> I think that the central point is that if code like the above is useful
> and supported then it needs to be the same for Unicode strings as for
> 8-bit strings. 

Why is that? An encoding, by nature, is something that produces a byte
sequence from some input. So you can only decode byte sequences, not
character strings.
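
To make the direction of the two operations concrete, here is a minimal
sketch. The sample name and the choice of UTF-8 are assumptions made for
illustration only; nothing in the thread depends on them.

```python
# Illustrative sketch: the sample text and the UTF-8 codec are assumptions.
text = u"L\u00f6wis"               # character string: 5 characters

data = text.encode("utf-8")        # encode: characters -> byte sequence
roundtrip = data.decode("utf-8")   # decode: byte sequence -> characters

# The byte sequence is a different kind of object from the text it encodes:
# it has a different type and, here, a different length (6 bytes).
same_type = type(data) is type(text)
```

Decoding only has a defined meaning on the right-hand side of that round
trip, which is why a decode operation belongs on byte sequences.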

> If the code above is NOT useful and should NOT be supported then we
> need to undo it before 2.2 ships. This unicode.decode argument is
> just a proxy for the real argument about the above.

No, it isn't. The code is useful for byte strings, but not for Unicode
strings.
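
As a rough sketch of the byte-string case, assuming a made-up header value
and using codecs.decode() from the standard library for the transfer
decoding step (the variable names are hypothetical):

```python
import codecs

# Hypothetical header value, held as bytes because it came off the wire.
raw_header = b"=?iso-8859-1?q?L=F6wis?="
parts = raw_header.split(b"?")      # [b'=', b'iso-8859-1', b'q', b'L=F6wis', b'=']

# Step 1: undo the transfer encoding. Input and output are both bytes.
if parts[2].lower() == b"q":
    payload = codecs.decode(parts[3], "quopri")
elif parts[2].lower() == b"b":
    payload = codecs.decode(parts[3], "base64")
else:
    payload = parts[3]

# Step 2: only now turn bytes into characters, using the declared charset.
charset = parts[1].decode("ascii")
name = payload.decode(charset)
```

Both steps start from bytes; at no point is a character string the input
to a decoding step.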

> I don't feel strongly one way or another about this (ab?)use of the
> codecs concept, myself, but I do feel strongly that Unicode strings
> should behave as much as possible like 8-bit strings.

Not at all. Byte strings and character strings are as different as are
byte strings and lists of DOM child nodes (i.e. the only common thing
is that they are sequences).

Regards,
Martin