[Python-Dev] Re: test_unicode_file failing on Mac OS X
Martin v. Löwis
martin at v.loewis.de
Mon Dec 8 12:53:37 EST 2003
David Eppstein <eppstein at ics.uci.edu> writes:
> > For the test, it would be best to compare normal forms, and have the
> > test pass if the normal forms (NFD) are equal.
>
> Shouldn't that be what happens in general for equality testing of
> unicodes, not just for this test?
There was a BDFL pronouncement once that this should not be done
automatically, in general. Normalization is a very slow algorithm, and
it might not be meaningful in all cases.
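For the test itself, an explicit comparison of normal forms could look
roughly like the sketch below. It relies only on unicodedata.normalize
from the standard library and is written for a current Python 3; the
helper name nfd_equal is made up for illustration.

```python
import unicodedata


def nfd_equal(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFD (hypothetical helper)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE (precomposed)
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Plain equality compares code points and does not normalize.
print(composed == decomposed)
# The explicit helper treats the two spellings as equal.
print(nfd_equal(composed, decomposed))
```

The caller pays for normalization only where it asks for it, which is
the point of not doing it implicitly in ==.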
E.g. XML 1.1 will require that all documents are in NFC. So the
XML-generating application will have to normalize on output; all
XML-processing applications can then assume that all strings are
normalized. Converting them to NFD all the time would be wasteful.
Python is still lacking an efficient test function to determine
whether a string is in normal form already; reportedly, a yes-no-maybe
function with a reasonably low rate of maybe answers can be
implemented much more efficiently than performing the actual
conversion.
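As a rough sketch of the idea, and not of any existing implementation:
the function below answers "yes" only for pure-ASCII strings, which are
unchanged by every normalization form, and "maybe" for everything else.
A real quick check would consult the Unicode Quick_Check properties and
could also answer "no"; the names quick_check and is_normalized are
assumptions chosen for this example.

```python
import unicodedata


def quick_check(s: str) -> str:
    """Return "yes" if s is certainly normalized, otherwise "maybe".

    Simplifying assumption: only pure-ASCII strings are recognized as
    certain. This sketch never answers "no".
    """
    return "yes" if s.isascii() else "maybe"


def is_normalized(form: str, s: str) -> bool:
    """Answer cheaply when possible; run the full conversion only on "maybe"."""
    if quick_check(s) == "yes":
        return True
    # Slow path: normalize and compare with the input.
    return unicodedata.normalize(form, s) == s


print(quick_check("plain ascii"))
print(is_normalized("NFC", "\u00e9"))
print(is_normalized("NFD", "\u00e9"))
```

The fewer "maybe" answers the cheap check gives, the less often the slow
path runs. Python 3.8 and later provide unicodedata.is_normalized(form,
unistr) for this purpose.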
Regards,
Martin