[Python-Dev] Unicode support in getargs.c

M.-A. Lemburg mal@lemburg.com
Wed, 02 Jan 2002 20:40:56 +0100


"Martin v. Loewis" wrote:
> 
> > That's because the buffer interface on Unicode objects doesn't
> > return the raw binary buffer. If you pass in a memory mapped
> > file or a buffer object wrapping some memory area, u# will
> > take the input as raw binary stream.
> >
> > All this weird behaviour is needed to make Unicode objects
> > behave well together with s#.
> 
> I don't believe this. Why would the implementation of u# have any
> effect on making s# work?

To make s# work, we had to map the read buffer interface to the
encoded version of Unicode -- not the binary version which would
have been the "right" choice in terms of the buffer interface (s#
maps to the read buffer interface, while t# maps to the character
buffer interface).
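
To make the two views concrete, here is a small illustration written in
modern Python rather than in the C of getargs.c. It is only a sketch:
ASCII as the default encoding and 2-byte little-endian code units for
Py_UNICODE are assumptions made for the example, not something stated
above.

```python
text = "spam"

# "Encoded version": the text run through the default encoding
# (ASCII is assumed here). This is the view s# callers expect.
encoded_view = text.encode("ascii")

# "Binary version": the internal array of code units, assumed here to
# be 2 bytes each in little-endian order.
binary_view = text.encode("utf-16-le")

print(encoded_view)  # b'spam'
print(binary_view)   # b's\x00p\x00a\x00m\x00'
```

The two byte strings differ in both length and content, so a consumer
has to know which of the two views it is being handed.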
 
u# is simply a copy-and-paste implementation of s# that interprets the
result of the read buffer interface as a Py_UNICODE array. As I mentioned
in another mail, we should probably let u# pass through Unicode
objects as-is without going through the read buffer interface.
This functionality is clearly missing and should be added to
make u# useful.
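
A rough Python model of the difference, not the actual getargs.c code:
the function names u_hash_current and u_hash_proposed are hypothetical,
and the sketch assumes an ASCII default encoding and 2-byte
little-endian Py_UNICODE units.

```python
import struct


def u_hash_current(obj):
    """Model of u# as described: always reinterpret the read buffer."""
    if isinstance(obj, str):
        # Assumption: the read buffer of a Unicode object yields the
        # default-encoded (here ASCII) bytes, not the internal array.
        buf = obj.encode("ascii")
    else:
        # Any other buffer object is taken as raw memory.
        buf = bytes(obj)
    count = len(buf) // 2
    # Treat the bytes as an array of 2-byte code units.
    units = struct.unpack("<%dH" % count, buf[:count * 2])
    return "".join(chr(u) for u in units)


def u_hash_proposed(obj):
    """Model of the suggested fix: Unicode objects pass through as-is."""
    if isinstance(obj, str):
        return obj
    return u_hash_current(obj)


print(repr(u_hash_current("abcd")))   # two unrelated characters
print(repr(u_hash_proposed("abcd")))  # 'abcd'
```

In this model the current behaviour turns the four encoded bytes of
"abcd" into two unrelated code units, while raw buffers such as
"abcd".encode("utf-16-le") are read correctly by both variants.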

> > Jack will probably also need a way to say "decode this encoded
> > object into Unicode using the encoding xyz". Something like the
> > Unicode version of "es#". How about "eu#" which then passes through
> > Unicode as-is while decoding all other objects according to the
> > given encoding ?!
> 
> I'd like to see the requirements, in terms of real-world problems,
> before considering any extensions.

Agreed. Jack should post some examples of what he needs for his
application.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/