[I18n-sig] Unicode strings: an alternative
Guido van Rossum
guido@python.org
Fri, 05 May 2000 12:07:10 -0400
> > I'll just say that I am very happy with ASCII as the default.
>
> It's better than UTF-8, but 8bit Unicode would be better, because
> it's the least surprising alternative.
>
> People who use Python with "funny" languages are already used to
> converting their strings around, and they treat their Python
> strings as byte arrays anyway. With Python 1.6 they can start
> to switch to Python's Unicode strings without any problems.
> That isn't so with UTF-8. I wonder how it will work with ASCII.
> Will this ASCII restriction only be enforced when converting
> to Unicode, or will the string type itself be restricted to
> ASCII?
No, 8-bit strings will always be 8-bit clear, of course! The ASCII
restriction is only used for conversion to Unicode when no explicit
encoding is given. For example, "abc" + u"xyz" is u"abcxyz", but "èé"
+ u"xyz" raises an exception. However, you can write
unicode("èé","latin-1") and it will yield u"\350\351".
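
A minimal sketch of the rule just described, translated to Python 3 as an
assumption: the message is about the 1.6 string and unicode types, while
modern bytes and str never mix implicitly. The helper concat() is
hypothetical and only emulates the implicit ASCII conversion; the bytes
0xE8 0xE9 are the octal \350\351 from the example.

```python
def concat(raw, text, encoding="ascii"):
    """Hypothetical helper: decode an 8-bit string, then append Unicode text.

    The default encoding is ASCII, mirroring the rule for conversions
    where no explicit encoding is given.
    """
    return raw.decode(encoding) + text


# Pure ASCII bytes convert without an explicit encoding.
print(concat(b"abc", "xyz"))

# Bytes outside the ASCII range are rejected by the default conversion.
try:
    concat(b"\xe8\xe9", "xyz")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# Naming the encoding explicitly makes the same bytes convert cleanly.
print(ascii(concat(b"\xe8\xe9", "xyz", encoding="latin-1")))
```

The 8-bit value itself is never restricted here; only the step that turns
it into Unicode without a named encoding can fail.
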
> IMHO the long term goal should be to have only one string type
> (being Unicode) and one byte array type (being our current string
> type?)
The byte array type should not support string literals at all. The
Java model is right.
--Guido van Rossum (home page: http://www.python.org/~guido/)