[IronPython] Issue about string.upper and string.lower
dinov at microsoft.com
Tue Dec 16 08:12:33 CET 2008
I actually looked at this not too long ago, and I think your proposal of calling the Invariant functions is the correct solution. I was looking at a few things: this bug, http://bugs.python.org/issue1528802; the 3.0 decimal.py module; and also just using Turkish I at the command prompt. If you follow the comments, the bug says:
"String upper and lower conversion are locale dependent and
implemented by the underlying libc, whereas Unicode
upper/lower conversion is not and only depends on the
Unicode character database."
That's a pretty clear statement that we shouldn't be using the current locale for our upper/lower string conversions. It wouldn't surprise me if that breaks something somewhere, because we won't be doing locale-dependent conversions on a string that someone expects to be a str rather than unicode. But in this case I think it'd be better to be consistent with the Unicode side of Python, as that's our future.
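To make the Unicode-side behavior concrete, here is a minimal sketch for Python 3, where every str is Unicode. The helper name casing_under_locale and the POSIX-style locale name "tr_TR.UTF-8" are assumptions for illustration only; the locale may not be installed on a given machine, in which case the sketch just runs under the current one.

```python
import locale


def casing_under_locale(name):
    """Hypothetical helper: try to switch to locale `name`, then case ASCII i/I.

    Returns (switched, "i".upper(), "I".lower()).
    """
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        switched = True
    except locale.Error:
        switched = False  # locale not available; keep the current one
    try:
        # Unicode str casing consults the Unicode character database,
        # not the process locale.
        return switched, "i".upper(), "I".lower()
    finally:
        try:
            locale.setlocale(locale.LC_ALL, previous)
        except locale.Error:
            pass  # best effort restore


result = casing_under_locale("tr_TR.UTF-8")  # assumed locale name
print(result)
```

Whether or not the switch succeeds, the last two items should be plain ASCII "I" and "i", which is the behavior the Invariant functions would give on the .NET side.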
As for the decimal module, it doesn't change from 2.x to 3.0, so .upper() apparently doesn't have this problem when CPython switches to Unicode strings. Or at least no one's hit it, and when they do I think the resolution would be the same as for 1528802.
Finally at the command prompt I could never get CPython to do a culture-sensitive operation. I hadn't fully convinced myself on that part though because I hadn't yet escalated to a Turkish install of the OS running IronPython.
But I'm still pretty confident we're at fault and we should change our lower/upper implementation. Obviously the change is easy but I'll do a full test pass to see if it breaks anything.
I also think calling ToUpper to get non-Pythonic results is easy enough (I actually think it's kind of better this way - it saves typing out the framework-friendly ToUpperInvariant :)).
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Glenn Jones
Sent: Friday, December 12, 2008 7:27 AM
To: Discussion of IronPython
Subject: [IronPython] Issue about string.upper and string.lower
We ran across http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=13629 (Turkish collation issues) today while trying to port Resolver One to IronPython 2.0.
This is in a test that tries to import decimal while in the Turkish locale (it was actually reported by a user!). It does an .upper on a string with an 'i', and that doesn't give the expected results.
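The failure mode can be emulated in plain Python. This is only a sketch: upper_turkish and upper_invariant are hypothetical helpers, not IronPython or .NET APIs, and the string "inf" is just an illustrative input containing an 'i', not necessarily the one decimal uses. The Turkish rule modeled here is that lowercase dotted i upper-cases to dotted capital İ (U+0130) rather than ASCII I.

```python
def upper_turkish(text):
    """Hypothetical helper: emulate Turkish-culture upper-casing of `text`."""
    # Under Turkish rules, 'i' pairs with dotted capital I (U+0130).
    # Dotless ı (U+0131) already maps to ASCII 'I' in the default mapping.
    return text.replace("i", "\u0130").upper()


def upper_invariant(text):
    """Hypothetical helper: culture-independent upper-casing."""
    return text.upper()


word = "inf"  # illustrative input only
print(upper_invariant(word) == "INF")  # comparison code written for ASCII expects this
print(upper_turkish(word) == "INF")    # culture-sensitive result no longer matches
```

Any code that upper-cases a string and then compares it against an ASCII constant would take the wrong branch under the culture-sensitive rule, which is why the problem could surface in unexpected places.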
This is a very big issue because all the Python code out there expects string transformations to be locale-independent, and there may be strange bugs in strange places.
Is mapping .upper and .lower to ToUpperInvariant and ToLowerInvariant an acceptable solution? People that want to do locale-dependent transformations can always use the .NET specific ToUpper/ToLower.
We can work around decimal being unimportable by hacking it, but clearly this is not a general solution. We will report other modules that might fail from this as we find them.
Glenn and Orestis