[Python-Dev] "data".decode(encoding) ?!
M.-A. Lemburg
mal@lemburg.com
Sun, 13 May 2001 18:53:55 +0200
Michael Hudson wrote:
>
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
> > Fredrik Lundh wrote:
> > > can you take that again? shouldn't michael's example be
> > > equivalent to:
> > >
> > > unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > >
> > > if not, I'd argue that your "decode" design is broken, instead
> > > of just buggy...
> >
> > Well, it is sort of broken, I agree. The reason is that
> > PyString_Encode() and PyString_Decode() guarantee the returned
> > object to be a string object. To be able to reuse Unicode codecs
> > I added code which converts Unicode back to a string in case the
> > codec returns a Unicode object (which the .decode() method does).
> > This is what's failing.
>
> It strikes me that if someone executes
>
> aString.decode("latin-1")
>
> they're going to expect a unicode string. AIUI, what's currently
> happening is that the string is converted from a latin-1 8-bit string
> to the 16-bit unicode string I expected and then there is an attempt
> to convert it back to an 8-bit string using the default encoding. So
> if I'd done a
>
> sys.setdefaultencoding("latin-1")
>
> in my sitecustomize.py, then aString.decode("latin-1") would just be
> aString again? This doesn't seem optimal.
True, and that's why I am proposing to loosen the restriction
that the two APIs return strings only.
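
To make the failure mode concrete, here is a minimal sketch of the
round trip described above. It uses modern Python 3 syntax rather than
2.1, and decode_forcing_string() with its default_encoding parameter is
a hypothetical model of the restricted API, not the actual C code:

```python
def decode_forcing_string(data, encoding, default_encoding="ascii"):
    """Hypothetical model: decode to Unicode, then coerce back to 8-bit."""
    decoded = data.decode(encoding)           # the codec returns Unicode
    return decoded.encode(default_encoding)   # the API forces a string again


raw = "\u00e3".encode("latin-1")  # one 8-bit byte, 0xE3

# With an ASCII default encoding the coercion step fails.
try:
    decode_forcing_string(raw, "latin-1")
    ascii_result = "ok"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"

# With a latin-1 default encoding the call silently returns its input.
roundtrip = decode_forcing_string(raw, "latin-1", default_encoding="latin-1")
```

Dropping the second step and returning the codec's result as-is avoids
both outcomes.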
> > Perhaps I should simply remove the restriction and have both APIs
> > return the codec's return object as-is ?! (I would be in favour of
> > this, but I'm not sure whether this is already in use by someone...)
>
> Are all the codecs distributed with Python 2.1 unicode-related? If
> that's the case, PyString_Decode isn't terribly useful is it? It
> seems unlikely that it received much use. Could be wrong of course.
All standard codecs in 2.0 and 2.1 are Unicode-related. Next week,
though, I am planning to write a bunch of string-to-string codecs,
which will then be the first non-Unicode-related codecs in 2.2.
> OTOH, maybe I'm trying to wedge too much behaviour onto a particular
> operation. Do we want
>
> open(file).read().decode("jpeg") -> some kind of PIL object
>
> to be possible?
This would be possible indeed. Even though some may find this
coding style obscure, I think this technique has the same
usefulness as e.g. piping at OS level.
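
As a rough sketch of what an object-returning codec could look like,
the following registers a toy codec through the standard codecs
registry. It is written against modern Python 3, where codecs.decode()
passes the codec's result through unchanged. The "toyimage" codec name,
the ToyImage class and the two-byte header format are all made up for
illustration; a real JPEG codec would of course look different:

```python
import codecs


class ToyImage:
    """Hypothetical stand-in for an image object."""

    def __init__(self, width, height, pixels):
        self.width = width
        self.height = height
        self.pixels = pixels


def toyimage_decode(data, errors="strict"):
    # Toy format: byte 0 is the width, byte 1 the height, the rest pixels.
    width, height = data[0], data[1]
    pixels = bytes(data[2:])
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the header")
    # Codecs return a tuple of (object, number of input items consumed).
    return ToyImage(width, height, pixels), len(data)


def toyimage_encode(image, errors="strict"):
    data = bytes([image.width, image.height]) + image.pixels
    return data, 1  # one input object consumed


def toyimage_search(name):
    # The registry calls this with the requested codec name.
    if name == "toyimage":
        return codecs.CodecInfo(
            encode=toyimage_encode, decode=toyimage_decode, name="toyimage"
        )
    return None


codecs.register(toyimage_search)

image = codecs.decode(b"\x02\x01\xff\x00", "toyimage")
encoded_again = codecs.encode(image, "toyimage")
```

The codec is free to return any object it likes; the caller only has to
know which codec name produces which type.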
I am thinking of these use cases:
    "äöü".decode("latin-1")                   -> Unicode (object construction)
    "...jpeg data...".decode("jpeg")          -> JpegImage object (ditto)
    "äöü".decode("latin-1").encode("cp1521")  -> string (recoding data)
    "...long data...".encode("gzip")          -> string (transfer encoding)
    "...gzipped data...".decode("gzip")       -> string (transfer decoding)
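
The recoding and transfer-encoding cases can be sketched as follows,
again in modern Python 3 syntax. Two substitutions are assumptions on my
part: "cp437" is used as the target code page only because it maps these
characters to different byte values than latin-1, and the standard
library's "zlib_codec" stands in for the "gzip" codec named above:

```python
import codecs

text = "\u00e4\u00f6\u00fc"  # the characters "äöü"

# Recoding: decode from one 8-bit encoding, encode into another.
latin1_bytes = text.encode("latin-1")
recoded = latin1_bytes.decode("latin-1").encode("cp437")

# Transfer encoding: bytes in, bytes out, no Unicode involved.
payload = b"long data " * 100
compressed = codecs.encode(payload, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")
```

In both cases the calls chain from left to right, much like a pipe.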
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/