[I18n-sig] Re: [Python-Dev] Unicode debate
Neil Hodgson
nhodgson@bigpond.net.au
Tue, 2 May 2000 21:40:44 +1000
> u = aUnicodeStringFromSomewhere
> s = an8bitStringFromSomewhere
>
> DoSomething(s + u)
> in Guido's design, the first example may or may not result in
> an "UTF-8 decoding error: UTF-8 decoding error: unexpected
> code byte" exception.
I would say it is less surprising for most people for this to follow the
silent-widening of each byte - the Fredrik-Paul position. With the current
scarcity of UTF-8 code, very few people will expect an automatic UTF-8 to
UTF-16 conversion. While complete prohibition of automatic conversion has
some appeal, it will just be more noise to many.
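The two behaviours under discussion can be sketched roughly as follows. This is written in present-day Python 3 (an assumption, since the thread predates it), where 8-bit data is a separate `bytes` type and mixing it with `str` must be spelled out explicitly. The sample value of `s` is hypothetical.

```python
# Hypothetical 8-bit string: "café" encoded as Latin-1 (0xE9 is "é").
s = b"caf\xe9"
u = "!"

# Silent widening: every byte becomes the code point with the same value,
# which is exactly what decoding as Latin-1 does. This cannot fail.
widened = s.decode("latin-1") + u

# UTF-8 interpretation: 0xE9 announces a multi-byte sequence that never
# arrives, so the same concatenation raises a decoding error instead.
try:
    decoded = s.decode("utf-8") + u
except UnicodeDecodeError:
    decoded = None
```

With this input the widening path always yields a five-character string, while the UTF-8 path succeeds or fails depending on the bytes that happen to be in `s`.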
> u = aUnicodeStringFromSomewhere
> s = an8bitStringFromSomewhere
>
> if len(u) + len(s) == len(u + s):
>     print "true"
> else:
>     print "not true"
> the second example may result in a
> similar error, print "true", or print "not true", depending on the
> contents of the 8-bit string.
I don't see this as important, as it's trying to take the idea that Unicode
strings are equivalent to 8-bit strings too far. How much further before you have to
break? I always thought of len measuring the number of bytes rather than
characters when applied to strings. The same as strlen in C when you have a
DBCS string.
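A small sketch of the bytes-versus-characters distinction, again in present-day Python 3 as an assumption, with hypothetical sample text. It also shows why the quoted `len` comparison depends on the contents of the 8-bit string when that string is read as UTF-8.

```python
# Three characters of Japanese text (hypothetical sample).
text = "日本語"

# Shift_JIS is a double-byte encoding: two bytes per character here.
as_sjis = text.encode("shift_jis")
# UTF-8 needs three bytes for each of these characters.
as_utf8 = text.encode("utf-8")

# len() on the encoded form counts bytes, much like strlen in C;
# len() on the text counts characters.
char_count = len(text)
sjis_bytes = len(as_sjis)
utf8_bytes = len(as_utf8)

# The quoted comparison, with the 8-bit string interpreted as UTF-8:
u = "abc"
results = []
for s in (b"xyz", "é".encode("utf-8")):
    results.append(len(u) + len(s) == len(u + s.decode("utf-8")))
```

For pure ASCII bytes the lengths add up, but a two-byte UTF-8 sequence collapses to one character and the comparison fails.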
I should correct some of the stuff Mark wrote about me. At Fujitsu we did
a lot more DBCS work than Unicode because that's what Japanese code uses.
Even with Java most storage is still DBCS. I was more involved with Unicode
architecture at Reuters 6 or so years ago.
Neil