[Python-Dev] Unicode
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Wed, 17 May 2000 00:02:10 +0200
> perfectionist or not, I only want Python's Unicode support to
> be as intuitive as anything else in Python. as it stands right
> now, Perl and Tcl's Unicode support is intuitive. Python's not.
I haven't much experience with Perl, but I don't think Tcl is
intuitive in this area. I really think that they got it all wrong.
They use the string type for "plain bytes", just as we do, but then
have the notion of "correct" and "incorrect" UTF-8 (i.e. strings with
violations of the encoding rule). For a "plain bytes" string, the
following might happen:
- the string is scanned for byte sequences that are not valid UTF-8
- if any are found, the string is converted into UTF-8, essentially
  treating the original string as Latin-1
- it then continues to use the UTF-8 "version" of the original string,
  and converts it back on demand.
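
To make the steps above concrete, here is a minimal sketch in present-day
Python of the behaviour as I have described it. It is not Tcl's actual
implementation, and the helper names tcl_like_normalize and back_to_bytes
are hypothetical, chosen only for this illustration.

```python
def tcl_like_normalize(raw: bytes) -> bytes:
    """Hypothetical helper mirroring the steps described above.

    If raw is already valid UTF-8, keep it as is. Otherwise treat the
    bytes as Latin-1 and re-encode them as UTF-8.
    """
    try:
        raw.decode("utf-8")          # the "scan" step: is this valid UTF-8?
        return raw
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0..255 to a code point, so this never fails.
        return raw.decode("latin-1").encode("utf-8")


def back_to_bytes(utf8: bytes) -> bytes:
    """Hypothetical "convert back on demand" step.

    Raises UnicodeEncodeError if the text contains characters outside
    Latin-1.
    """
    return utf8.decode("utf-8").encode("latin-1")
```

The worrying part is that the first step is a guess: the two bytes
C3 A9 are perfectly good Latin-1 (A-tilde followed by a copyright sign), but
because they also happen to be valid UTF-8 they are left alone and later
read as a single e-acute.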
Maybe I got something wrong, but the Unicode support in Tcl worries me
very much.
> btw, I thought we'd all agreed on GvR's solution for 1.6?
>
> what did I miss?
I like the 'only ASCII is converted' approach very much, so I'm not
objecting to that solution - just as I wasn't objecting to the
previous one.
> so tell me, if "good enough" is what we're aiming at, why isn't
> my counter-proposal good enough?
Do you mean the one in
http://www.python.org/pipermail/python-dev/2000-April/005218.html
which I suppose is the same one as the "java-like approach"? AFAICT,
all it does is to change the default encoding from UTF-8 to Latin-1.
I can't see why this should be *better*, but it would certainly be
as good... In comparison, restricting the "character" interpretation
of the string type (in terms of your proposal) to 7-bit characters
has the advantage that it is less error-prone, as Guido points out.
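
As a rough illustration of the difference between the three defaults, here
is a sketch in present-day Python. It only models the implicit conversion
as a plain decode with a chosen default encoding; the function names
implicit_convert and outcome are hypothetical, and this is not the code
proposed for 1.6.

```python
def implicit_convert(raw: bytes, default_encoding: str) -> str:
    """Hypothetical stand-in for an implicit bytes-to-Unicode conversion."""
    return raw.decode(default_encoding)


def outcome(raw: bytes, default_encoding: str) -> str:
    """Return the converted text, or "<error>" if the conversion is refused."""
    try:
        return implicit_convert(raw, default_encoding)
    except UnicodeDecodeError:
        return "<error>"


if __name__ == "__main__":
    samples = {
        "pure ASCII": b"cafe",
        "Latin-1 bytes": b"caf\xe9",
        "UTF-8 bytes": b"caf\xc3\xa9",
    }
    for label, raw in samples.items():
        for enc in ("utf-8", "latin-1", "ascii"):
            # ascii() keeps the output printable on any terminal.
            print(label, enc, ascii(outcome(raw, enc)))
```

With a Latin-1 default, every byte string converts, including UTF-8 data,
which silently turns into two wrong characters per non-ASCII character.
With an ASCII-only default, anything with the high bit set is refused, so
the mistake shows up at the point of conversion instead of later.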
Regards,
Martin