[Python-Dev] Unicode support in getargs.c

M.-A. Lemburg mal@lemburg.com
Wed, 02 Jan 2002 11:24:45 +0100


"Martin v. Loewis" wrote:
> 
> > True; "u#" does exactly the same as "s#" -- it interprets the
> > input as binary buffer.
> 
> It doesn't do exactly the same. If s# is applied to a Unicode object,
> it transparently invokes the default encoding, which is sensible.  If
> u# is applied to a byte string, it does not apply the default encoding.

That's because the buffer interface on Unicode objects doesn't
return the raw binary buffer. If you pass in a memory-mapped
file or a buffer object wrapping some memory area, u# will
take the input as a raw binary stream.
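
To make the difference concrete, here is a rough Python model of the two
behaviours. It is only a sketch, not the code in getargs.c: the helper
names are made up for illustration, and it assumes 16-bit code units in
little-endian order and ASCII as the default encoding.

```python
import struct


def as_raw_code_units(buf):
    """Model a raw-buffer read: reinterpret the bytes as 16-bit code units.

    Assumption: 2-byte code units, little-endian. No decoding takes place.
    """
    raw = bytes(memoryview(buf))
    if len(raw) % 2:
        raise ValueError("buffer length is not a multiple of the code unit size")
    count = len(raw) // 2
    return struct.unpack("<%dH" % count, raw)


def via_default_encoding(buf, encoding="ascii"):
    """Model the other path: decode the bytes with a default encoding."""
    return bytes(memoryview(buf)).decode(encoding)


# The same two bytes give very different results on the two paths.
print(as_raw_code_units(b"ab"))     # one code unit built from both bytes
print(via_default_encoding(b"ab"))  # two characters
```

Read raw, the two bytes of "ab" collapse into a single code unit, which
is why feeding a byte string to a raw-buffer parser is not the same as
decoding it.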

All this weird behaviour is needed to make Unicode objects
behave well together with s#.

The implementation of u# is nevertheless completely symmetric to
that of s#. I agree, though, that it would make more sense to
special-case Unicode objects here and have u# return a pointer to the
raw internal buffer of the Unicode object.

Jack will probably also need a way to say "decode this encoded
object into Unicode using the encoding xyz". Something like the
Unicode version of "es#". How about "eu#", which would pass
Unicode objects through as-is while decoding all other objects
according to the given encoding?
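
The intended semantics of such an "eu#" code could be modelled in Python
roughly as follows. This is only a sketch of the proposal: "eu#" does
not exist as a format code, and the function name convert_eu is
hypothetical.

```python
def convert_eu(obj, encoding):
    """Model of the proposed "eu#" conversion (hypothetical name).

    Unicode objects pass through unchanged; anything else that exposes
    a buffer is decoded with the given encoding.
    """
    if isinstance(obj, str):
        # Already Unicode: hand it back as-is, no re-encoding.
        return obj
    # Any other buffer-like object: take its bytes and decode them.
    return bytes(memoryview(obj)).decode(encoding)


print(convert_eu("abc", "latin-1"))               # passed through
print(convert_eu(b"h\xe4llo", "latin-1"))         # decoded from Latin-1
print(convert_eu(bytearray(b"abc"), "ascii"))     # other buffers work too
```

The point of the design is that the caller always gets Unicode back and
only has to name the encoding once, for the non-Unicode case.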

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/