[Python-Dev] Unicode support in getargs.c
M.-A. Lemburg
mal@lemburg.com
Wed, 02 Jan 2002 11:24:45 +0100
"Martin v. Loewis" wrote:
>
> > True; "u#" does exactly the same as "s#" -- it interprets the
> > input as binary buffer.
>
> It doesn't do exactly the same. If s# is applied to a Unicode object,
> it transparently invokes the default encoding, which is sensible. If
> u# is applied to a byte string, it does not apply the default encoding.
That's because the buffer interface on Unicode objects doesn't
return the raw binary buffer. If you pass in a memory-mapped
file or a buffer object wrapping some memory area, u# will
take the input as a raw binary stream.
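
To illustrate what "raw binary stream" means here, a rough Python model
(not the getargs.c code): the bytes of the buffer are reinterpreted as
code units with no default-encoding step. The helper name is made up for
this sketch, and it assumes 2-byte code units in native byte order; the
real width of Py_UNICODE depends on how the interpreter was built.

```python
import sys


def reinterpret_as_code_units(buf):
    """Hypothetical model: view the bytes of `buf` as 2-byte code units.

    Assumption: 2-byte code units in native byte order. No default
    encoding is consulted; the memory is taken as it is.
    """
    # memoryview() rejects objects that do not expose a buffer.
    raw = memoryview(buf).tobytes()
    if len(raw) % 2:
        raise ValueError("buffer length is not a multiple of the code unit size")
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return raw.decode(codec)
```

The point of the sketch is only that nothing is decoded from the default
encoding: a byte string holding ASCII text would come out as nonsense
code units (or be rejected on length), which is the asymmetry Martin
describes.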
All this weird behaviour is needed to make Unicode objects
behave well together with s#.

The implementation of u# is completely symmetric to that of s#,
though. I agree that it would make more sense to special-case
Unicode objects here and have u# return a pointer to the
raw internal buffer of the Unicode object.
Jack will probably also need a way to say "decode this encoded
object into Unicode using the encoding xyz", something like a
Unicode version of "es#". How about "eu#", which passes Unicode
objects through as-is and decodes all other objects according to
the given encoding?
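
As a minimal sketch of the semantics proposed for "eu#", modelled in
Python rather than in getargs.c: the function name and signature are
made up for illustration, and "eu#" itself is only a suggestion at this
point, not an existing format code.

```python
def convert_eu(obj, encoding):
    """Hypothetical model of the proposed "eu#" conversion.

    Unicode objects pass through unchanged; any other object must
    expose a buffer, whose bytes are decoded with `encoding`.
    """
    if isinstance(obj, str):
        # Unicode input is passed through as-is.
        return obj
    # memoryview() raises TypeError for objects without a buffer,
    # so non-buffer arguments are rejected rather than guessed at.
    return memoryview(obj).tobytes().decode(encoding)
```

With this model, convert_eu(b"h\xe9llo", "latin-1") yields the Unicode
string "h\xe9llo", while a Unicode argument comes back untouched no
matter which encoding is named.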
--
Marc-Andre Lemburg