Python - Next Release Questions

Dennis E. Hamilton infonuovo at email.com
Tue Mar 28 12:30:43 EST 2000


I'd say the transition is dangerous as a one-step gear-shift.  Transitioning
through a period of overlap is conceivable, but the legacy requirement may
be a very extended one, especially in Internet years.

Here's my thinking based on experience with non-Python settings and not so
much direct experience with Python.

With the international and cross-platform application of Python, I see lots
of uses for octet-string character codes (ASCII and the 8-bit code pages).
The adoption of 16-bit Unicode (or duet-string character codes, or whatever
term we use for that) is not all that universal just yet, and there are all
those font issues and locale issues to deal with.  There are also language communities
that have difficulty with Unicode for cultural as well as application
reasons.

More than that, I see painful interoperability issues if the current
octet-string functionality is not preserved.  I am thinking of all of those
bindings through standard C functions and POSIX functions that use octet
strings, including HTML and XML data streams, and of bindings to all of those
other packages out there.  Going straight to Unicode would either break a
potful of libraries and applications or else introduce inscrutable failures
related to hidden translation machinery.  Figuring out automatic
down-shifting from Unicode to octet-string is going to be messy, and for
this transition period I would like a solution that lets applications deal
with it.
And this is just the beginning.  I'm told that the next move down the road
is to quartet-string character codes, and I think that's pretty painful all
the way around.  I'd be surprised if anyone makes that a default character
representation any time soon, so there is no easy way to rule out having to
deal with multiple string formats to some degree.
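
A minimal sketch of what "letting applications deal with it" could look
like.  It assumes a modern Python 3, where str is the Unicode string and
bytes is the octet string, which is not the interpreter under discussion in
this thread.  The helper names to_octets and from_octets are hypothetical,
chosen only for illustration.

```python
def to_octets(text, encoding="ascii", errors="strict"):
    """Explicitly down-shift Unicode text to an octet string.

    The caller picks the encoding and the error policy, so nothing is
    translated behind the application's back.
    """
    return text.encode(encoding, errors)


def from_octets(data, encoding="ascii", errors="strict"):
    """Explicitly up-shift an octet string to Unicode text."""
    return data.decode(encoding, errors)


# Pure ASCII text passes through unchanged.
plain = to_octets("hello")

# Text outside ASCII fails loudly under the strict policy...
try:
    to_octets("na\u00efve")
    failed = False
except UnicodeEncodeError:
    failed = True

# ...unless the application opts into a lossy policy or a wider encoding.
lossy = to_octets("na\u00efve", errors="replace")
utf8 = to_octets("na\u00efve", encoding="utf-8")
round_trip = from_octets(utf8, encoding="utf-8")
```

The point of the sketch is only that the conversion is visible at the call
site: a C or POSIX binding that wants octets gets them because the
application asked, and a failure shows up where the conversion happens.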

The Java world has taken a purist approach to this.  Can someone from an
area of wide-character practice (i.e., China, Japan, or Korea) comment on
how well that works out?

Even though the COM world also features duet-strings (aka BSTRs), a sudden
gear-shift in Python would still make life difficult for PythonWIN, I'd say.
And then there are those Python on DOS and Win16 configurations that still
run and have to deal with code pages.
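
To illustrate why code pages matter here, a short sketch (again assuming a
modern Python 3 and its standard cp437 and cp1252 codecs, not the
configurations named above): the same octets read as different text
depending on which code page the application assumes.

```python
# One octet string, as a DOS program might have written it.
raw = b"caf\x82"

# Under the original IBM PC code page, 0x82 is e-acute.
as_dos = raw.decode("cp437")

# Under the Windows Western code page, 0x82 is a low quotation mark.
as_win = raw.decode("cp1252")

print(repr(as_dos), repr(as_win))
```

Nothing in the octets themselves says which reading is right, so that
knowledge has to stay with the application.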

Prudent steps that let existing code run without fear would be very wise.

-- Dennis

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org] On Behalf Of Will Ware
Sent: Tuesday, March 28, 2000 05:35
To: python-list at python.org
Subject: Re: Python - Next Release Questions


Moshe Zadka (moshez at math.huji.ac.il) wrote:
> -- Python now supports unicode. Unicode strings are marked with
>    u"string".
> -- Strings (both unicode and regular) have methods

It sounds like there will be two distinct kinds of strings, Unicode
and ASCII. Do the memory savings available from keeping 8-bit ASCII
around justify the potential confusion? It seems like a messy thing,
and there are so few messy things in Python now.
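
As a rough way to put numbers on the memory question, a sketch assuming a
modern Python 3 (not the release discussed here).  It measures only the
encoded payload of one ASCII-only sample string, not the per-object overhead
of any particular interpreter.

```python
text = "Resistance is futile."

# Encoded size of the same ASCII-only text at one, two, and four
# octets per character.  The "-le" codecs are used so that no
# byte-order mark is added to the count.
sizes = {
    "ascii": len(text.encode("ascii")),
    "utf-16-le": len(text.encode("utf-16-le")),
    "utf-32-le": len(text.encode("utf-32-le")),
}

for name, size in sizes.items():
    print(f"{name}: {size} octets for {len(text)} characters")
```

For text like this, the 16-bit form is exactly twice the size of the 8-bit
form, and the 32-bit form four times.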
--
 - - - - - - - - - - - - - - - - - - - - - - - -
Resistance is futile. Capacitance is efficacious.
Will Ware	email:    wware @ world.std.com
--
http://www.python.org/mailman/listinfo/python-list