[I18n-sig] Pre-PEP: Proposed Python Character Model

M.-A. Lemburg mal@lemburg.com
Thu, 08 Feb 2001 14:34:07 +0100

Toby Dickenson wrote:
> > > There is already a large body of code that mixes text and binary data
> > > in the same type. If we have separate text/binary types, then we need
> > > to plan a transition period to allow code to distinguish between the
> > > two uses.
> >
> > I think the current Unicode implementation has this property: Unicode
> > is the type for representing character strings; the string type the
> > one for representing byte strings.
> The problem isn't so much in the current implementation; it's in the code that
> has been written to that implementation. At the moment it is unnatural to write
> print u"hello world"
> rather than the easier
> print "hello world"
> even though the message is clearly text.

Sure, but how is Python going to deduce this information from the
string?

I once proposed using a new prefix for binary data literals, e.g.
b"binary data" or d"binary data". I don't remember the outcome, though,
as this was during the heated debate about how to do Unicode right
early last year.
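
For comparison only: Python 3 (much later than this thread) does use a `b` prefix for bytes literals, and there the prefix alone fixes a literal's type at compile time, so the interpreter never has to guess. A minimal sketch in Python 3 syntax, which is not available in the Python 2.x discussed here:

```python
# Python 3: the literal prefix alone decides the type.
text = "hello world"   # str: a sequence of Unicode code points
data = b"binary data"  # bytes: a sequence of integers in range 0..255

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes

# Moving between the two requires an explicit encoding step.
word = "h\u00e9llo"           # 5 code points; U+00E9 is e-acute
encoded = word.encode("utf-8")
print(len(word), len(encoded))           # 5 6 (e-acute takes 2 bytes in UTF-8)
print(encoded.decode("utf-8") == word)   # True
```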

Perhaps the only new type we need is an easy-to-manage
binary data type that behaves very much like the old-school 8-bit string.

In Py3K we could then fit them all into a new class hierarchy that
comes close to unification:

                  binary data string
                  text data string 
                    |           |
                    |           |
         Unicode string      encoded 8-bit string (with encoding
                                                   information!)
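
To make the text branch of that diagram concrete, here is a minimal sketch in modern Python 3. All class and method names (`TextDataString`, `UnicodeString`, `EncodedString`, `to_unicode`) are hypothetical, made up for illustration; the sketch covers only the text branch and does not commit to where the binary data string sits. The point it shows is that an encoded 8-bit string which carries its own encoding can always be turned into Unicode without guesswork:

```python
from abc import ABC, abstractmethod


class TextDataString(ABC):
    """Hypothetical abstract base for the 'text data string' node."""

    @abstractmethod
    def to_unicode(self) -> str:
        """Return the text as a Unicode string."""


class UnicodeString(TextDataString):
    """Text stored directly as Unicode code points."""

    def __init__(self, value: str) -> None:
        self.value = value

    def to_unicode(self) -> str:
        return self.value


class EncodedString(TextDataString):
    """8-bit data that carries its encoding with it."""

    def __init__(self, data: bytes, encoding: str) -> None:
        self.data = data
        self.encoding = encoding  # the "encoding information" from the diagram

    def to_unicode(self) -> str:
        # The encoding travels with the bytes, so decoding is unambiguous.
        return self.data.decode(self.encoding)


latin1 = EncodedString("caf\u00e9".encode("latin-1"), "latin-1")
utf8 = EncodedString("caf\u00e9".encode("utf-8"), "utf-8")
plain = UnicodeString("caf\u00e9")

# Different byte representations, same text once viewed as Unicode.
print(latin1.to_unicode() == utf8.to_unicode() == plain.to_unicode())  # True
print(len(latin1.data), len(utf8.data))  # 4 5
```

Using an abstract base here means code written against `TextDataString` can accept either representation and only call `to_unicode()` when it actually needs code points.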

> I think we agree that, eventually, we would like the simple notation for a
> string literal to create a Unicode string. What I'm not sure about is whether
> we can make that change soon. How often are string literals used to create
> what is logically just binary data?

Often enough to make "python -U" fail badly...
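
A small illustration of the failure mode, sketched in Python 3, where unprefixed literals are already Unicode (roughly the situation `-U` creates). The record layout below (a 4-byte marker followed by a big-endian length) is hypothetical. Code that builds binary data by concatenating a plain literal with packed bytes breaks until the literal is explicitly marked as binary:

```python
import struct

# Hypothetical record: 4-byte marker followed by a big-endian 32-bit length.
payload_len = struct.pack(">I", 42)  # bytes: b'\x00\x00\x00*'

try:
    record = "MAGC" + payload_len  # text literal + bytes: rejected
except TypeError as exc:
    print("TypeError:", exc)

record = b"MAGC" + payload_len  # marking the literal as binary fixes it
print(record)  # b'MAGC\x00\x00\x00*'
```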

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/