[Python-3000] String comparison
Rauli Ruohonen
rauli.ruohonen at gmail.com
Fri Jun 8 00:47:07 CEST 2007
On 6/8/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> How would you expect them to work on arrays of code points?
Just like they do with Python 2.5 unicode objects, as long as the
"array of code points" is str, not e.g. a numpy array or tuple of ints,
which I don't expect to grow string methods :-)
> What sort of answer should the following produce?
That depends on what Python does when it reads in the source code.
I think it should normalize to NFC (which Python 2.5 does not do).
> # matches by codepoints, but doesn't look like it
> "Löwis".startswith("Lo")
> # if the above did match, then people will assume ö folds to o
> "Löwis".startswith("Lo")
> # looks like it matches. Matches as text. Does not match as bytes.
> "Löwis".startswith("Lö")
Normalized to NFC:
"Löwis".startswith("Lo")
"Löwis".startswith("Lo")
"Löwis".startswith("Lö")
After this Python lexes, parses and executes. The first two are false,
the last one true. All of the examples should look the same in your editor
(at least ideally). The following would, OTOH, be true false false:
"Lo\u0308wis".startswith("Lo")
"L\u00F6wis".startswith("Lo")
"Lo\u0308wis".startswith("L\u00F6")
As here the source code is pure ASCII, it's WYSIWYG everywhere.
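For anyone who wants to check both sets of answers, here is a minimal sketch in Python 3 syntax. It assumes str is a sequence of code points and applies NFC explicitly with unicodedata.normalize, standing in for the normalization I'd like the source reader to do. The names decomposed, precomposed and nfc are made up for illustration.

```python
import unicodedata

# Pure-ASCII spellings of the two visually identical strings.
decomposed = "Lo\u0308wis"   # 'o' followed by U+0308 COMBINING DIAERESIS
precomposed = "L\u00F6wis"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS


def nfc(s):
    """Return s in Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)


# Plain code-point comparison, as in the escaped examples above.
raw = (
    decomposed.startswith("Lo"),
    precomposed.startswith("Lo"),
    decomposed.startswith("L\u00F6"),
)

# The same comparisons after normalizing both operands to NFC.
normalized = (
    nfc(decomposed).startswith(nfc("Lo")),
    nfc(precomposed).startswith(nfc("Lo")),
    nfc(decomposed).startswith(nfc("L\u00F6")),
)

print(raw)         # (True, False, False)
print(normalized)  # (False, False, True)
```

After NFC both spellings are the same four code points, which is why the third comparison flips to true and the first flips to false.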
Python 2.5's output with each:
>>> u"Löwis".startswith(u"Lo")
True
>>> u"Löwis".startswith(u"Lo")
False
>>> u"Löwis".startswith(u"Lö")
False
>>> u"Lo\u0308wis".startswith(u"Lo")
True
>>> u"L\u00F6wis".startswith(u"Lo")
False
>>> u"Lo\u0308wis".startswith(u"L\u00F6")
False
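Since the first three inputs look identical on screen, it can help to dump the code points before trusting what the editor shows. A small sketch in Python 3 syntax follows; describe is a hypothetical helper, not something in the standard library.

```python
import unicodedata


def describe(s):
    """Return one 'U+XXXX NAME' entry per code point in s."""
    return [
        "U+%04X %s" % (ord(c), unicodedata.name(c, "<unnamed>"))
        for c in s
    ]


# The decomposed prefix is three code points, the precomposed one is two.
for entry in describe("Lo\u0308"):
    print(entry)
for entry in describe("L\u00F6"):
    print(entry)
```

The first loop lists LATIN SMALL LETTER O followed by COMBINING DIAERESIS, the second lists the single LATIN SMALL LETTER O WITH DIAERESIS.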