[XML-SIG] Re: [I18n-sig] Re: [Python-Dev] Unicode debate

Mark Hammond mhammond@skippinet.com.au
Tue, 2 May 2000 12:17:09 +1000


> Guido van Rossum wrote, about how to represent strings:
>
> > Paul, we're both just saying the same thing over and over
> > without convincing each other.  I'll wait till someone who
> > wasn't in this debate before chimes in.

I've chimed in a little, but I'll chime in again :-)

> I'm with Paul and Fredrik on this one - at least about characters
> being the atoms of a string.  We **have** to be able to refer to
> **characters** in a string, and without guessing.  Otherwise, how
> could you

I see the point, and agree 100% with the intent.  However, reality
does bite.

As far as I can see, the following are immutable:
* There will be 2 types - a string type and a Unicode type.
* History dictates that the string type may hold binary data.

Thus, it is clear that Python simply cannot treat characters as the
smallest atoms of strings.  If I understand things correctly, this
is key to Guido's point, and a bit of a communication block.

The issue, to my mind, is how we handle these facts to produce "the
principle of least surprise".  We simply need to accept that Python
1.x will never be able to treat string objects as sequences of
"characters" - only bytes.

However, with my limited understanding of the full issues, it does
appear that the proposal championed by Fredrik, Just and Paul is the
best solution - not because it magically causes Python to treat
strings as characters in all cases, but because it offers the
principle of least surprise.

As I said, I don't really have a deep enough understanding of the
issues, so this is probably (hopefully!?) my last word on the
matter - but that doesn't mean I don't share the concerns raised
here...

Mark.