[I18n-sig] Re: [Python-Dev] Pre-PEP: Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 06 Feb 2001 10:27:10 -0800


"M.-A. Lemburg" wrote:
> 
> ...
> 
> Unicode is the de facto international standard for unified
> script encodings. Discussing whether Unicode is good or bad is
> really beyond the scope of language design and should be dealt
> with in other more suitable forums, IMHO.

We are in violent agreement.

>...
> 
> I don't understand your statement about allowing string objects
> to support "higher" ordinals... are you proposing to add a third
> character type ?

Yes and no. I want to make a type with a superset of the functionality
of strings and Unicode strings.

> > Similarly, we could improve socket objects so that they have different
> > readtext/readbinary and writetext/writebinary without unifying the
> > string objects. There are lots of small changes we can make without
> > breaking anything. 

Before we go on: do you agree that we could add fopen and
readtext/readbinary on various I/O types without breaking anything? And
that we should do so?
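
A minimal sketch of what separate text and binary methods on an I/O
object could look like, written against current Python rather than 2.0.
Only the method names readtext/readbinary and writetext/writebinary come
from the discussion above; the TextBinaryStream class, its constructor,
and the encoding parameter are assumptions made for illustration.

```python
import io


class TextBinaryStream:
    """Hypothetical wrapper that keeps text and binary access explicit."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # any object with read()/write() on bytes
        self._encoding = encoding  # assumed parameter, not from the thread

    def readbinary(self, size=-1):
        """Return raw bytes, untouched."""
        return self._raw.read(size)

    def readtext(self):
        """Return the rest of the stream decoded into characters."""
        return self._raw.read().decode(self._encoding)

    def writebinary(self, data):
        """Write bytes as they are."""
        return self._raw.write(data)

    def writetext(self, text):
        """Encode characters to bytes before writing."""
        return self._raw.write(text.encode(self._encoding))
```

Because these are new methods beside the existing read() and write(),
code that never calls them keeps working unchanged.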

> > One I would like to see right now is a unification of
> > chr() and unichr().
> 
> This won't work: programs simply do not expect to get Unicode
> characters out of chr() and would break. 

Why would a program pass a large integer to chr() if it cannot handle
the resulting wide string?

> OTOH, programs using
> unichr() don't expect 8bit-strings as output.

Where would an 8bit string break code that expected a Unicode string?
The upward conversion is automatic and lossless! 
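
A small check of the "lossless" claim, in current Python where bytes
stands in for the 8-bit string and str for the Unicode string. It assumes
each 8-bit value is read as the character with the same ordinal (that is,
Latin-1); under an ASCII default only values below 128 would convert.
The widen() helper is hypothetical.

```python
def widen(narrow):
    """Promote an 8-bit string to a character string, ordinal for ordinal.

    Assumes the Latin-1 interpretation of the 8-bit values.
    """
    return narrow.decode("latin-1")


all_bytes = bytes(range(256))
wide = widen(all_bytes)

# Every ordinal survives the trip up, and the trip back down.
ordinals_preserved = [ord(c) for c in wide] == list(range(256))
round_trips = wide.encode("latin-1") == all_bytes
```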

Having chr() and unichr() is like having a special function for adding
integers versus longs. IMO it is madness.
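
A sketch of the unified function, again in current Python with bytes
standing in for the narrow string type and str for the wide one. The name
unified_chr() and the 256 cut-off between narrow and wide results are
assumptions for illustration, not an existing API.

```python
def unified_chr(ordinal):
    """Hypothetical single replacement for chr() and unichr().

    Small ordinals yield a narrow (8-bit) string, larger ones a wide
    string, much as integer arithmetic yields a long only when needed.
    """
    if not 0 <= ordinal <= 0x10FFFF:
        raise ValueError("ordinal out of range: %r" % (ordinal,))
    if ordinal < 256:
        return bytes([ordinal])  # narrow result, as chr() gave
    return chr(ordinal)          # wide result, as unichr() gave
```

A caller that only ever passes ordinals below 256 sees exactly what
chr() returned before; only a caller that passes a larger ordinal gets a
wide string back.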

> Let's keep the two worlds well separated for a while and
> unify afterwards (this is much easier to do when everything's
> in place and well tested).

No, the more we keep the worlds separated, the more code will be written
that expects to deal with two separate types. We need to get people
thinking in terms of strings of characters, not strings of bytes, and we
need to do it as soon as possible.

 Paul Prescod