[Python-Dev] Unicode charnames impl.

M.-A. Lemburg mal@lemburg.com
Sat, 25 Mar 2000 11:47:30 +0100


"Andrew M. Kuchling" wrote:
> 
> M.-A. Lemburg writes:
> >.encode() should translate Unicode to a string. Since the
> >named char thing is probably only useful on input, I'd say:
> >don't do anything, except maybe return input.encode('unicode-escape').
> 
> Wait... then you can't stack it on top of unicode-escape, because it
> would already be Unicode escaped.

Sorry for the mixup (I guess yesterday wasn't my day...). I had
stream codecs in mind: these are stackable, meaning that you can
wrap one codec around another. It's also their interface API
that was changed -- not that of the basic stateless encoders/decoders.
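
A minimal sketch of what "stackable" means here, written against the
present-day Python 3 codecs module rather than the API under discussion.
The choice of hex_codec as the inner layer and utf-8 as the outer one is
only an assumption for illustration:

```python
import codecs
import io

# Innermost layer: a plain byte stream holding hex digits.
raw = io.BytesIO(b"636166c3a9")

# First codec layer: a stream reader that turns hex digits into bytes.
hex_reader = codecs.getreader("hex_codec")(raw)

# Second codec layer, wrapped around the first: bytes to text.
text_reader = codecs.getreader("utf-8")(hex_reader)

# .read() returns only the decoded data, like a file object would.
text = text_reader.read()
print(repr(text))
```

Each reader only needs the object below it to provide .read(), which is
why the layers can be wrapped around one another.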

Stacking of .encode()/.decode() must be done "by hand", for example in
the way I described above. Another approach would be to subclass
the unicode-escape Codec and then call the base class method.
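
A rough sketch of the subclassing approach, in Python 3 syntax. The class
name NamedCharCodec, the helper _to_hex_escape and the rewrite-then-delegate
strategy are all hypothetical; the sketch also ignores escaped backslashes
and lets unicodedata.lookup() raise KeyError for unknown names:

```python
import re
import unicodedata
from encodings import unicode_escape

# Matches \N{NAME} in the raw (still escaped) input bytes.
_NAMED = re.compile(rb"\\N\{([^}]*)\}")


def _to_hex_escape(match):
    # Hypothetical helper: turn \N{NAME} into the equivalent \UXXXXXXXX.
    char = unicodedata.lookup(match.group(1).decode("ascii"))
    return b"\\U%08x" % ord(char)


class NamedCharCodec(unicode_escape.Codec):
    """Hypothetical codec: expands named characters, then delegates."""

    def decode(self, input, errors="strict"):
        rewritten = _NAMED.sub(_to_hex_escape, bytes(input))
        # Call the base class method for the actual escape handling.
        text, _ = unicode_escape.Codec.decode(rewritten, errors)
        # Report the length consumed in terms of the original input.
        return text, len(input)


text, consumed = NamedCharCodec().decode(b"a\\N{BULLET}b")
print(repr(text), consumed)
```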

> >> 4) What do you do with the error \N{...... no closing right bracket?
> >I'd suggest to take the upper bound of all Unicode name
> >lengths as limit.
> 
> Seems like a hack.

It is... but what other way would there be?
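
To make the idea concrete, a small sketch of the bounded search. The name
find_name_end and the value of MAX_NAME_LENGTH are made up for illustration;
a real implementation would derive the bound from the Unicode name table:

```python
# Hypothetical upper bound on the length of any Unicode character name.
MAX_NAME_LENGTH = 100


def find_name_end(text, name_start, limit=MAX_NAME_LENGTH):
    # Look for the closing bracket no further than `limit` characters
    # past the start of the name; return -1 if there is none in range.
    return text.find("}", name_start, name_start + limit + 1)


source = r"\N{BULLET} and \N{BROKEN"
closed = find_name_end(source, 3)     # name starts right after "\N{"
unclosed = find_name_end(source, 18)  # second name is never closed
print(closed, unclosed)
```

A result of -1 can then be reported as an error right away, without
scanning the rest of the input for a bracket that may belong to something
else.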
 
> >Note that .decode() must only return the decoded data.
> >The "bytes read" integer was removed in order to make
> >the Codec APIs compatible with the standard file object
> >APIs.
> 
> Huh? Why does Misc/unicode.txt describe decode() as "Decodes the
> object input and returns a tuple (output object, length consumed)"?
> Or are you talking about a different .decode() method?

You're right... I was thinking about .read() and .write().
.decode() should return a tuple, just as documented in
unicode.txt.
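
For illustration, the difference between the two interfaces as it looks
in the present-day Python 3 codecs module (the sample inputs are arbitrary):

```python
import codecs
import io

# Stateless decoder: returns a tuple (output object, length consumed).
decode = codecs.getdecoder("unicode-escape")
text, consumed = decode(b"caf\\xe9")

# Stream reader: .read() returns only the decoded data, which keeps it
# compatible with the standard file object API.
reader = codecs.getreader("utf-8")(io.BytesIO(b"abc"))
data = reader.read()

print(repr(text), consumed, repr(data))
```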

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/