[Python-Dev] Bug or feature? Unicode vs t#

M.-A. Lemburg mal@lemburg.com
Fri, 12 Oct 2001 10:19:18 +0200


Guido van Rossum wrote:
> 
> > My real question is whether there is any value in having Unicode objects
> > expose their internal representation to Python programmers through the
> > buffer interface?
> 
> I used to think so, but I no longer believe this.  UTF-16 should be an
> encoding and that's that.

... and later ...

> > And I think I agree, even though that /could/ break code.  Then again,
> > maybe Paul's suggestion that hexlify() should reject Unicode strings
> > is the better approach.
> 
> +1

Since hexlify() uses a parser marker which does not involve a
type check, there's no way to have it reject Unicode objects.

BTW, the "s#" parser marker does *not* map to getreadbuffer
for Unicode objects. Long ago we decided that the difference
between "s#" and "t#" does not make sense for Unicode objects.
To increase compatibility of Unicode objects with existing code
which uses "s#", we made both parser markers map to
getcharbuffer.

As a result, both parser markers return the default-encoded
version of the Unicode object. The getreadbuffer interface
is still in place, though... perhaps we ought to consider
removing it?!
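
To make the mapping concrete, here is a toy model in plain Python. It
is only a sketch, not the C API: ToyUnicode and parse_arg are
hypothetical names, and the choice of ASCII as the default encoding and
UTF-16-LE as the internal representation is an assumption made for
illustration.

```python
# Toy model only: this is not the CPython C API.
DEFAULT_ENCODING = "ascii"       # assumption: stand-in for the default encoding
INTERNAL_ENCODING = "utf-16-le"  # assumption: stand-in for the internal representation


class ToyUnicode:
    """Hypothetical stand-in for a Unicode object with two buffer slots."""

    def __init__(self, text):
        self.text = text

    def getreadbuffer(self):
        # Exposes the raw internal representation.
        return self.text.encode(INTERNAL_ENCODING)

    def getcharbuffer(self):
        # Exposes the default-encoded version.
        return self.text.encode(DEFAULT_ENCODING)


def parse_arg(marker, obj):
    """Resolve a parser marker to bytes, following the mapping described above."""
    if marker not in ("s#", "t#"):
        raise ValueError("unsupported marker: %r" % (marker,))
    # No type check here: for Unicode objects both markers use getcharbuffer.
    return obj.getcharbuffer()


u = ToyUnicode("abc")
print(parse_arg("s#", u))   # b'abc'
print(parse_arg("t#", u))   # b'abc'
print(u.getreadbuffer())    # b'a\x00b\x00c\x00'
```

In this model nothing reachable through "s#" or "t#" ever calls
getreadbuffer, which is why the slot looks like a candidate for removal.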

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/