[Python-Dev] Unicode support in getargs.c
M.-A. Lemburg
mal@lemburg.com
Wed, 02 Jan 2002 20:40:56 +0100
"Martin v. Loewis" wrote:
>
> > That's because the buffer interface on Unicode objects doesn't
> > return the raw binary buffer. If you pass in a memory mapped
> > file or a buffer object wrapping some memory area, u# will
> > take the input as raw binary stream.
> >
> > All this weird behaviour is needed to make Unicode objects
> > behave well together with s#.
>
> I don't believe this. Why would the implementation of u# have any
> effect on making s# work?
To make s# work, we had to map the read buffer interface to the
encoded version of Unicode -- not the binary version which would
have been the "right" choice in terms of the buffer interface (s#
maps to the read buffer interface, while t# maps to the character
buffer interface).
u# is simply a copy-and-paste implementation of s# that interprets the
result of the read buffer interface as a Py_UNICODE array. As I mentioned
in another mail, we should probably let u# pass through Unicode
objects as-is without going through the read buffer interface.
This functionality is clearly missing and should be added to
make u# useful.
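
A rough sketch of the behaviour I have in mind, written as a Python model
rather than as the actual getargs.c code. Everything here is an assumption
for illustration: parse_u_hash is a made-up name, a Python 3 str stands in
for a Unicode object, a bytes-like object stands in for anything exposing a
read buffer, and Py_UNICODE is assumed to be 2 bytes wide in native byte
order.

```python
import sys

# Hypothetical model only; none of these names exist in getargs.c.
# Assumes a 2-byte Py_UNICODE stored in the machine's native byte order.
UNIT_SIZE = 2
NATIVE_CODEC = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def parse_u_hash(obj):
    """Return (text, length) roughly the way the proposed u# would."""
    if isinstance(obj, str):
        # Proposed change: Unicode objects pass through as-is and never
        # touch the read buffer interface.
        return obj, len(obj)
    # Everything else: take the raw buffer contents and reinterpret them
    # as an array of code units, which is what u# does today.
    raw = memoryview(obj).tobytes()
    if len(raw) % UNIT_SIZE:
        raise TypeError("buffer length is not a multiple of the code unit size")
    text = raw.decode(NATIVE_CODEC)
    return text, len(text)
```

With this model, parse_u_hash("abc") hands back the object untouched, while a
buffer holding the same three code units in native order is reinterpreted
and yields equal text.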
> > Jack will probably also need a way to say "decode this encoded
> > object into Unicode using the encoding xyz". Something like the
> > Unicode version of "es#". How about "eu#" which then passes through
> > Unicode as-is while decoding all other objects according to the
> > given encoding ?!
>
> I'd like to see the requirements, in terms of real-world problems,
> before considering any extensions.
Agreed. Jack should post some examples of what he needs for his
application.
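
For reference while collecting those examples, the "eu#" idea quoted above
could be modelled roughly as follows. This is only a Python sketch under
stated assumptions, not a proposed implementation: parse_eu_hash is a
hypothetical name, a Python 3 str stands in for a Unicode object, and any
bytes-like object stands in for the encoded input.

```python
def parse_eu_hash(obj, encoding):
    """Hypothetical model of "eu#": Unicode in, Unicode out."""
    if isinstance(obj, str):
        # Unicode objects are passed through as-is; the encoding is ignored.
        return obj
    # Any other object is read as raw bytes and decoded using the
    # encoding supplied by the caller.
    return memoryview(obj).tobytes().decode(encoding)
```

So parse_eu_hash(b"\xe4", "latin-1") and parse_eu_hash("\xe4", "latin-1")
would both give the same one-character Unicode result.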
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/