[Tutor] While loop issue, variable not equal to var or var
Steven D'Aprano
steve at pearwood.info
Sat Jul 12 14:19:50 CEST 2014
On Sat, Jul 12, 2014 at 11:27:17AM +0100, Alan Gauld wrote:
> On 12/07/14 10:28, Steven D'Aprano wrote:
>
> >If you're using Python 3.3 or higher, it is better to use
> >message.casefold rather than lower. For English, there's no real
> >difference:
> >...
> >but it can make a difference for non-English languages:
> >
> >py> "Große".lower() # German for "great" or "large"
> >'große'
> >py> "Große".casefold()
> >'grosse'
>
> You learn something new etc...
>
> But I'm trying to figure out what difference this makes in
> practice?
>
> If you were targeting a German audience wouldn't you just test
> against the German alphabet? After all you still have to expect 'grosse'
> which isn't English, so if you know to expect grosse
> why not just test against große instead?
Because the person might have typed any of:
grosse
GROSSE
gROSSE
große
Große
GROßE
GROẞE
etc., and you want to accept them all, just like in English you'd want
to accept any of GREAT great gREAT Great gReAt etc. Hence you want to
fold everything to a single, known, canonical version. Case-fold will do
that, while lowercasing won't.
(The last example includes a character which might not be visible to
many people, since it is quite unusual and not supported by many fonts
yet. If it looks like a box or empty space for you, it is supposed
to be capital sharp-s, matching the small sharp-s ß.)
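To make that concrete, here is a rough sketch assuming Python 3.3 or later. The names variants, folded and lowered are made up for illustration; the strings are the ones listed above.

```python
# Hypothetical illustration: these variable names are invented for the example.
variants = ["grosse", "GROSSE", "gROSSE", "große", "Große", "GROßE", "GROẞE"]

# casefold() maps every spelling to one canonical form.
folded = {s.casefold() for s in variants}

# lower() keeps the sharp-s, so two distinct forms survive.
lowered = {s.lower() for s in variants}

print(folded)
print(lowered)
```

With casefold the set collapses to the single string 'grosse'; with lower you are left with both 'grosse' and 'große', so a plain equality test would still miss some inputs.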
Oh, here's another example of the difference, this one from Greek:
py> 'Σσς'.lower() # three versions of sigma
'σσς'
py> 'Σσς'.upper()
'ΣΣΣ'
py> 'Σσς'.casefold()
'σσσ'
I suspect that there probably aren't a large number of languages where
casefold and lower do something different, since most languages don't
distinguish between upper and lower case at all. But there's no
harm in using it, since at worst it returns the same as lower().
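If you want to wrap this up for reuse, something like the following would do. It is only a sketch: caseless_equal is a name I have made up, not a built-in, and it needs Python 3.3 or later.

```python
def caseless_equal(a, b):
    """Hypothetical helper: compare two strings, ignoring case."""
    return a.casefold() == b.casefold()


# For plain English text, casefold() and lower() give the same result.
english_same = "gReAt".casefold() == "gReAt".lower()

# The German example: casefold matches, lower does not.
german_casefold = caseless_equal("GROSSE", "große")
german_lower = "GROSSE".lower() == "große".lower()

print(english_same, german_casefold, german_lower)
```

This does not attempt any Unicode normalization, so it only deals with differences in case, not with different ways of encoding the same accented character.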
--
Steven