python 2.7.12 on Linux behaving differently than on Windows

Chris Angelico rosuav at gmail.com
Thu Dec 8 17:31:55 EST 2016


On Fri, Dec 9, 2016 at 8:42 AM, BartC <bc at freeuk.com> wrote:
> Python3 tells me that original, lower-case and upper-case versions are:
>
> ßẞıİiIÅσςσ
> ßßıi̇iiåσςσ
> SSẞIİIIÅΣΣΣ

Now lower-case the upper-case version and see what you get. And
upper-case the lower-case version. Because x.upper().lower() should be
the same as x.lower(), right? And x.lower().upper().lower() is the
same too. Right?

>>> x = "ßẞıİiIÅσςσ"
>>> x.upper().lower() == x.lower()
False
>>> x.upper().lower() == x.lower().upper().lower()
False
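
To see which characters are responsible, the same comparison can be run one character at a time. This is only a sketch: the helper name round_trip_differs and the choice of sample characters are illustrative and not part of the session above, and it assumes a recent Python 3, where str.upper() turns "ß" into the two letters "SS".

```python
# Sketch: check, per character, whether upper-then-lower matches a plain lower.
# The samples are written as escapes so this file stays pure ASCII.
samples = {
    "LATIN SMALL LETTER SHARP S": "\u00df",
    "LATIN SMALL LETTER DOTLESS I": "\u0131",
    "GREEK SMALL LETTER FINAL SIGMA": "\u03c2",
    "LATIN SMALL LETTER A": "a",
}

def round_trip_differs(ch):
    """Return True when ch.upper().lower() is not the same as ch.lower()."""
    return ch.upper().lower() != ch.lower()

for name, ch in samples.items():
    # ascii() shows the exact code points instead of relying on the terminal font.
    print(name, ascii(ch.upper()), ascii(ch.upper().lower()), round_trip_differs(ch))
```

Plain ASCII letters survive the round trip; the sharp s and the dotless i do not, because their upper-case forms lower-case to something else.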

> (Python2 failed to run the code:
>
> s="ßẞıİiIÅσςσ"
> print (s)
> print (s.lower())
> print (s.upper())
> )

I don't know what you mean by "failed", but you shouldn't have
non-ASCII characters in Python 2 source code without a coding cookie.
Also, you should be using a Unicode string. Or just stick with Py3.
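
For reference, a version of the quoted script with both fixes applied might look like the following. It is a minimal sketch, assuming the file is saved as UTF-8; the u"" prefix is what makes the literal a Unicode string on Python 2, and it is also accepted by Python 3.3 and later. The output on Python 2 is not guaranteed to match what Python 3 prints.

```python
# -*- coding: utf-8 -*-
# The coding cookie above has to be on the first or second line of the file.

s = u"ßẞıİiIÅσςσ"  # Unicode string, not a byte string
print(s)
print(s.lower())
print(s.upper())
```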

> But, given that, what's your point? That some esoteric Unicode characters
> have ill-defined upper and lower case versions, and therefore it is
> essential to treat them distinctly in EVERY SINGLE ALPHABET including
> English?

Yes, because it's Unicode's fault, isn't it. The world was so nice and
simple before Unicode came along and created all these messes for us
to have to deal with. And you're quite right - these characters are
esoteric and hardly ever used. [1] And they're so ill-defined that
nobody could predict or figure out what the upper-case and lower-case
forms are. It's not like there are rules that come from the languages
where they're used.

> I guess that means that if I try to write a book about a character called
> HarrY potter or james BOND then I cannot be sued.

This is not legal advice, but I think you'll agree that "HarrY" is not
equal to "Harry". Whether it's a good idea to have two files in the
same directory that have those names, it's certainly the case that the
names can be distinguished. (Sir HarrY is a very distinguished name.)
And if you decide not to distinguish between "Y" and "y", then which
of the above characters should be not-distinguished?

ChrisA

[1] Okay, to be fair, one of the ones I used *is* esoteric. But most
of them aren't.


