[Python-Dev] Pre-PEP: Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 06 Feb 2001 06:44:12 -0800


"M.-A. Lemburg" wrote:
> 
> [pre-PEP]
> 
> You have a lot of good points in there (also some inaccuracies) and
> I agree that Python should move to using Unicode for text data
> and arrays for binary data.

That's my primary goal. If we can all agree that is the goal then we can
start to design new features with that in mind. I'm overjoyed to have you
on board. I'm pretty sure Fredrik agrees with the goals (probably not
every implementation detail). I'll send it to the i18n sig and see if I
can get buy-in from Andy Robinson et al. Then it's just Guido.

> Some things you may be missing though is that Python already
> has support for a few features you mention, e.g. codecs.open()
> provides more or less what you have in mind with fopen() and
> the compiler can already unify Unicode and string literals using
> the -U command line option.

The problem with unifying string literals without unifying string
*types* is that many functions probably check for type("") and not
type(u"").
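To make the concern concrete, here is a minimal sketch. It assumes an
interpreter with two distinct string types; where "" and u"" already
share a type, bytes stands in for the old 8-bit string. The function
names are hypothetical and not taken from any real module.

```python
# Stand-ins for the two string types (an assumption for illustration):
# bytes plays the role of type(""), and the type of u"" plays type(u"").
OLD_STRING_TYPE = type(b"")
UNICODE_TYPE = type(u"")


def describe_strict(value):
    """Hypothetical library function that checks for exactly one string type."""
    if type(value) == OLD_STRING_TYPE:
        return "string"
    return "not a string"


def describe_tolerant(value):
    """The same check, written to accept either string type."""
    if isinstance(value, (OLD_STRING_TYPE, UNICODE_TYPE)):
        return "string"
    return "not a string"


# The strict version silently misclassifies the other string type.
strict_result = describe_strict(u"abc")
tolerant_result = describe_tolerant(u"abc")
```

Code written like describe_strict() is what would misbehave if only the
literals were unified and the types stayed separate.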

> What you don't talk about in the PEP is that Python's stdlib isn't
> even Unicode aware yet, and whatever unification steps we take,
> this project will have to precede it.

I'm not convinced that is true. We should be able to figure it out
quickly though.

> The problem with making the
> stdlib Unicode aware is that of deciding which parts deal with
> text data or binary data -- the code sometimes makes assumptions
> about the nature of the data and at other times it simply doesn't
> care.

Can you give an example? If the new string type is 100% backwards
compatible in every way with the old string type then the only code that
should break is silly code that did stuff like:

try:
    something = chr(somethingelse)
except ValueError:
    print "Unicode is evil!"

Note that I expect types.StringType == type(chr(10000)) etc.
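A small sketch of the behaviour I have in mind. It only runs where
chr() is not limited to 0..255 and returns the one unified string type,
which is exactly the assumption under discussion. The helper name is
hypothetical.

```python
def is_plain_string(value):
    """True if value has exactly the type of an ordinary string literal."""
    return type(value) is type("")


# A code point well outside the 8-bit range.
ch = chr(10000)

# Under the unified model this is an ordinary one-character string,
# and ord() gives back the code point we started with.
same_type = is_plain_string(ch)
length = len(ch)
code_point = ord(ch)
```

With a chr() that still stops at 255, the first call raises ValueError
instead, which is the case the "silly code" above depends on.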

> In this light I think you ought to focus Python 3k with your
> PEP. This will also enable better merging techniques due to the
> lifting of the type/class difference.

Python3K is a beautiful dream but we have problems we need to solve
today. We could start moving to a Unicode future in baby steps right
now. Your codecs.open() function could be moved into builtins as "fopen".
Python's "binary" open function could be deprecated under its current
name and perhaps renamed.
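Roughly what I am picturing, as a sketch only: "fopen" here is a
hypothetical thin wrapper over the existing codecs.open(). The
signature and the UTF-8 default are my assumptions, not a settled
design, and roundtrip() exists only to exercise it.

```python
import codecs
import os
import tempfile


def fopen(filename, mode="r", encoding="utf-8"):
    """Hypothetical text-oriented open(): delegates to codecs.open()."""
    return codecs.open(filename, mode, encoding=encoding)


def roundtrip(text, encoding="utf-8"):
    """Write text through fopen() and read it back as Unicode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Writing encodes the Unicode text into the chosen encoding.
        with fopen(path, "w", encoding) as out:
            out.write(text)
        # Reading decodes the bytes on disk back into Unicode text.
        with fopen(path, "r", encoding) as inp:
            return inp.read()
    finally:
        os.remove(path)
```

The caller only ever sees Unicode text; the encoding is a property of
the file object rather than of every string that passes through it.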

The sooner we start the sooner we finish. You and /F laid some beautiful
groundwork. Now we just need to keep up the momentum. I think we can do
this without a big backwards compatibility earthquake. VB and TCL
figured out how to do it...

 Paul Prescod