[Python-Dev] "data".decode(encoding) ?!
M.-A. Lemburg
mal@lemburg.com
Fri, 11 May 2001 12:07:40 +0200
Fredrik Lundh wrote:
>
> mal wrote:
>
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > ->> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > > File "<input>", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first and this fails due to the character not being
> > in the ASCII range.
>
> can you take that again? shouldn't michael's example be
> equivalent to:
>
> unicode(u"\u00e3".encode("latin-1"), "latin-1")
>
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...
Well, it is sort of broken, I agree. The reason is that
PyString_Encode() and PyString_Decode() guarantee that the returned
object is a string object. To be able to reuse the Unicode codecs,
I added code which converts the result back into a string in case
the codec returns a Unicode object (which the latin-1 decoder does).
That back-conversion uses the default ASCII encoding, and that is
the step which fails here.
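To make that concrete, here is a rough Python-level sketch of the
effective call sequence (the real code is in C, and the intermediate
names below are mine):

    # The constructor form Fredrik mentions goes through the codec
    # exactly once, so it succeeds:
    ok = unicode(u'\u00e3'.encode('latin-1'), 'latin-1')   # u'\xe3'

    # str.decode() adds a second step. Step 1, the codec call itself,
    # also works and returns a Unicode object:
    decoded = unicode('\xe3', 'latin-1')                   # u'\xe3'

    # Step 2: to honour the string-only return guarantee, the result
    # is coerced back to a string with the default (ASCII) codec;
    # \xe3 is not in the ASCII range, hence:
    # UnicodeError: ASCII encoding error: ordinal not in range(128)
    result = str(decoded)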
Perhaps I should simply remove the restriction and have both
APIs return the codec's return object as-is?! (I would be in
favour of this, but I'm not sure whether the current behaviour
is already being relied upon by someone...)
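If the restriction were removed, the failing example would simply
pass the codec's Unicode result through unchanged (hypothetical
behaviour under this proposal, not what the interpreter does today):

    >>> '\xe3'.decode('latin-1')
    u'\xe3'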
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/