convert Unicode to lower/uppercase?

jallan jallan at smrtytrek.com
Thu Sep 25 20:50:41 CEST 2003


martin at v.loewis.de (Martin v. Löwis) wrote in message news:<m3fzinp4k0.fsf at mira.informatik.hu-berlin.de>...
> jallan at smrtytrek.com (jallan) writes:
> 
> > A: No. The UnicodeData.txt file includes all of the 1:1 case mappings,
> > but doesn't include 1:many mappings such as the one needed for
> > uppercasing ß. Since many parsers now expect this file to have at most
> > single characters in the case mapping fields, an additional file
> > (SpecialCasing.txt) was added to provide the 1:many mappings. For more
> > information, see UTR #21- Case Mappings [MD]
> > >>
> > 
> > Python specifications make an implied claim of full support for
> > Unicode and an implied claim that the function upper() uppercases a
> > string properly.
> 
> This is a contradiction: SpecialCasing contains 1:n mappings, whereas
> .upper() can only return a single result. So how do you think
> SpecialCasing should be considered in the implementation of .upper()?

I am not aware that it is philosophically a *necessary* feature of
.upper() that a single character not be replaced by a string of two or
more characters.
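
To illustrate, an uppercasing routine that lets one character expand to
several is easy to sketch. This is only a sketch under stated
assumptions: the helper name full_upper and the two-entry table are
hypothetical, with the entries hand-copied from the 1:many mappings in
SpecialCasing.txt. It is not a description of how .upper() is
implemented.

```python
# Hypothetical two-entry excerpt of the unconditional 1:many uppercase
# mappings listed in SpecialCasing.txt (for illustration only).
SPECIAL_UPPER = {
    "\u00df": "SS",  # LATIN SMALL LETTER SHARP S
    "\ufb01": "FI",  # LATIN SMALL LIGATURE FI
}


def full_upper(text):
    """Uppercase text, allowing one character to become several."""
    # Consult the special table first; otherwise fall back to the
    # built-in per-character uppercasing.
    return "".join(SPECIAL_UPPER.get(ch, ch.upper()) for ch in text)


print(full_upper("stra\u00dfe"))  # STRASSE
```

The result is simply a string that may be longer than the input, which
is the only change in behavior a caller would see.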

One should fix the contradiction by either changing the behavior of
.upper() so that it will properly case all strings or documenting
clearly that .upper() does not handle particular kinds of casing. Of
course users often don't read the documentation. :-(

> > Users should not have to know such details. They may wish to know
> > where a particular function does not do what might be expected of it.
> 
> Things are more difficult than they appear to be.

Yes.

Again and again one thinks one has a solution for a problem and then
exceptions turn up.

Again and again one finds things that one's code doesn't handle, often
because one failed to analyze the problem fully in the initial stages
and adopted algorithms that prove insufficient to handle the data
found in reality.

Jim Allan



