[Python-3000] String comparison

Fri Jun 8 19:36:30 CEST 2007

> The additional field is 8 bits, two bits for each normalization (a
> Yes/Maybe/No value). In Unicode 4.1 only 5 different combinations are
> used, but I don't know if that's true of later versions. As
> _PyUnicode_Database_Records stores only unique records, this also results
> in an increase of the number of records, from 219 to 304. Each record
> looks like this:

If I count correctly, this gives roughly 900 additional bytes. That's
fine.
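
To make the layout concrete, here is a minimal sketch of how four
Yes/Maybe/No values could share one 8-bit field. The order of the forms,
the numeric codes and the helper names are assumptions for illustration,
not the layout used in the patch.

```python
# Hypothetical layout: 2 bits per normalization form, lowest bits first.
FORMS = ("NFD", "NFKD", "NFC", "NFKC")

# Hypothetical numeric codes for the three quick-check answers.
YES, MAYBE, NO = 0, 1, 2


def pack_quickcheck(values):
    """Pack a {form: YES/MAYBE/NO} mapping into one 8-bit integer.

    Forms missing from the mapping default to YES.
    """
    field = 0
    for i, form in enumerate(FORMS):
        field |= (values.get(form, YES) & 0x3) << (2 * i)
    return field


def unpack_quickcheck(field, form):
    """Extract the 2-bit answer for one normalization form."""
    return (field >> (2 * FORMS.index(form))) & 0x3


# A character that is "Maybe" for NFC and "No" for NFKC.
field = pack_quickcheck({"NFC": MAYBE, "NFKC": NO})
```

Since the record table stores only unique records, every distinct value
of this field that occurs together with otherwise identical properties
produces one more record, which is where the growth comes from.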

> It doesn't affect behavior or the API much(*), only performance. Current
> test_normalize.py uses a test suite it fetches from UCD, so it
> should be adequate.

I assumed you wanted to expose it to Python as well, as an is_normalized
function. I guess not having such a function is fine if applications
can do normalize(form, s) == s and have that be efficient whenever the
outcome is true (i.e. if it is more expensive only when the string is
not normalized).
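
As a sketch of what applications would write, using
unicodedata.normalize; the is_normalized wrapper below is only a
hypothetical helper built on the comparison, not something the patch
provides:

```python
import unicodedata


def is_normalized(form, s):
    """Hypothetical helper: True if s is already in the given form.

    Implemented purely as normalize(form, s) == s, so its cost depends
    on how quickly normalize() handles already-normalized input.
    """
    return unicodedata.normalize(form, s) == s


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

nfc_ok = is_normalized("NFC", composed)
nfc_bad = is_normalized("NFC", decomposed)
```

Here nfc_ok is the case that should be cheap: the string is already in
NFC, so a quick check can answer without building a new string.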

Regards,
Martin