[Python-Dev] Bug or feature? Unicode vs t#

M.-A. Lemburg mal@lemburg.com
Thu, 11 Oct 2001 23:08:49 +0200


Jeremy Hylton wrote:
> 
> >>>>> "PP" == Paul Prescod <paul@ActiveState.com> writes:
> 
>   PP> Translating the data before handing it to the user violates the
>   PP> "raw"-ness of the byte-oriented format and the principle about
>   PP> not needing to copy it. It behaves quite differently than other
>   PP> implementations of getcharbuffer.
> 
> As far as I can tell, all other implementations of getcharbuffer are
> exact duplicates of getreadbuffer.  If the Unicode object doesn't have
> an appropriate implementation of getcharbuffer, one wonders why the
> method exists at all:  In every case it would be redundant or
> incorrect.

Well, the question is whether hexlify() should use "t#", which is
specifically intended to return *character* data, rather than "s#",
which means *binary* data. I think the latter is more appropriate
for hexlify(), since its purpose is to encode binary data.
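
To make the binary-data view concrete, here is a minimal sketch. It
assumes a current Python 3 interpreter, not the code base under
discussion; hex_of() and rejects_text() are hypothetical helper names
used only for illustration. Every bytes-like object yields the same
hex digits for the same raw bytes, and text is refused outright:

```python
import binascii

raw = b"\x00\xffabc"  # binary data: a plain sequence of bytes


def hex_of(data):
    """Hexlify any bytes-like object and return the digits as str."""
    return binascii.hexlify(data).decode("ascii")


# bytes, bytearray and memoryview all expose the same raw bytes,
# so all three produce identical hex output.
same = {hex_of(raw), hex_of(bytearray(raw)), hex_of(memoryview(raw))}


def rejects_text(text):
    """Return True if hexlify() refuses a str argument (Python 3)."""
    try:
        binascii.hexlify(text)
    except TypeError:
        return True
    return False
```

In Python 3 the caller has to pick an encoding before text can be
hexlified at all, which is one way of resolving the ambiguity
discussed here.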

Regarding Paul's question: the getcharbuffer interface was designed
with Unicode in mind before a Unicode implementation even existed.
It turned out to be a more or less useless design :-( since knowing
that a buffer holds "character data" is not enough -- you also need
to be able to specify an encoding. The Unicode object only supports
this feature because some IO objects rely on it (e.g. the file object
when opened in text mode).
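
The encoding point can be illustrated with a short sketch, again
assuming a current Python 3 interpreter rather than the interpreter
discussed in this thread. The same one-character string turns into
different bytes depending on the encoding chosen, so "character data"
on its own does not determine what a buffer contains:

```python
import binascii

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one character

# Three encodings of the same character give three different buffers.
utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")
utf16le = text.encode("utf-16-le")

# Hex view of each buffer, keyed by encoding name.
hex_by_encoding = {
    "utf-8": binascii.hexlify(utf8).decode("ascii"),
    "latin-1": binascii.hexlify(latin1).decode("ascii"),
    "utf-16-le": binascii.hexlify(utf16le).decode("ascii"),
}
```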

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/