[I18n-sig] Re: [Python-Dev] Unicode debate
M.-A. Lemburg
mal@lemburg.com
Wed, 03 May 2000 01:05:28 +0200
Paul Prescod wrote:
>
> Combining characters are a whole 'nother level of complexity. Character
> sets are hard. I don't accept the argument that "Unicode itself has
> complexities so that gives us license to introduce even more
> complexities at the character representation level."
>
> > FYI: Normalization is needed to make comparing Unicode
> > strings robust, e.g. u"é" should compare equal to u"e\u0301".
>
> That's a whole 'nother debate at a whole 'nother level of abstraction. I
> think we need to get the bytes/characters level right and then we can
> worry about display-equivalent characters (or leave that to the Python
> programmer to figure out...).
I just wanted to point out that the argument "slicing doesn't
work with UTF-8" is moot.
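
To illustrate the normalization point quoted above, here is a minimal sketch. It assumes a modern Python 3 interpreter and the standard unicodedata module, neither of which is what this thread was written against, and the variable names are made up for illustration. It shows that slicing by code point can already split a base character from its combining mark, and that the two spellings of "é" only compare equal after normalization:

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings are different code point sequences.
raw_equal = composed == decomposed

# After normalizing both sides to the same form (NFC here) they compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Slicing by code point separates the base letter from its accent.
first = decomposed[:1]

print(raw_equal, nfc_equal, repr(first))
```

NFC is just one choice of normalization form; any form works for the comparison as long as both sides use the same one.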
I do see a point against UTF-8 auto-conversion given the example
that Guido mailed me:
"""
s = 'ab\341\210\264def' # == str(u"ab\u1234def")
s.find(u"def")
This prints 3 -- the wrong result since "def" is found at s[5:8], not
at s[3:6].
"""
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/