[Python-Dev] Normalizing unicode?

Thu Dec 11 05:34:37 EST 2003

Edward Loper <edloper at gradient.cis.upenn.edu> writes:

> Scott David Daniels wrote:
>> I naïvely wrote:
>>  >Could we perhaps use a comparison that, in effect, did:
>>  >     def uni_equal(first, second):
>>  >         if first == second:
>>  >             return True
>>  >         return first.normalize() == second.normalize()
>>  >That is, take advantage of the fact that normalization is often
>>  >unnecessary for "trivial" reasons.
>> [...]
>
> Before we start considering how it's possible to make
> unicode.__eq__ act encoding-insensitively[1], I think we need to
> consider whether that's really the behavior we want.  In some ways,
> this seems like case-insensitive equality to me: it's certainly a
> useful operation, but I don't think it should be the object's builtin
> notion of equality.
>    - I think people will be confused if s1==s2 but s1[0]!=s2[0].
>    - Sometimes you might *want* to distinguish different encodings of
>      the "same" string; a "normalized" equality test makes that very
>      difficult.
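
For concreteness, here is a minimal sketch of the quoted uni_equal proposal. It assumes modern Python 3, with str standing in for the 2003-era unicode type. Python strings have no normalize() method, so the sketch calls unicodedata.normalize() instead. Defaulting to NFC is also an assumption, because the proposal does not name a normalization form. The example strings illustrate Edward's first point: the two spellings compare equal after normalization, yet their first characters differ.

```python
import unicodedata


def uni_equal(first: str, second: str, form: str = "NFC") -> bool:
    """Compare two strings, treating canonically equivalent ones as equal.

    Sketch of the quoted proposal. The cheap exact comparison runs first,
    and normalization happens only when it fails. The default form (NFC)
    is an assumption; the proposal does not specify one.
    """
    if first == second:
        # Fast path: identical code point sequences need no normalization.
        return True
    return unicodedata.normalize(form, first) == unicodedata.normalize(form, second)


s1 = "\u00e9t\u00e9"    # "été" spelled with precomposed U+00E9
s2 = "e\u0301te\u0301"  # "été" spelled as 'e' + COMBINING ACUTE ACCENT

print(s1 == s2)           # False: different code point sequences
print(uni_equal(s1, s2))  # True: canonically equivalent
print(len(s1), len(s2))   # 3 5
print(s1[0] == s2[0])     # False: '\u00e9' versus plain 'e'
```

The last line is exactly the s1[0] != s2[0] surprise described above. It would appear if == itself behaved like uni_equal.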

In general it seems to me that == should, given a choice, err on the
side of being an overly tight equivalence relation -- i.e. return True
less often.
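
Here is a minimal sketch of what a tight == still allows. It again assumes Python 3 and uses nfc(), a hypothetical helper that does not come from the thread. The built-in comparison keeps the two spellings apart. Code that wants the looser relation opts in by normalizing first, much as case-insensitive code compares casefold() results instead of relying on ==.

```python
import unicodedata


def nfc(s: str) -> str:
    """Hypothetical helper: map a string to its NFC normal form."""
    return unicodedata.normalize("NFC", s)


s1 = "\u00e9t\u00e9"    # precomposed spelling of "été"
s2 = "e\u0301te\u0301"  # decomposed spelling of "été"

# The strict built-in == (and therefore hashing) keeps both spellings.
strict = {s1, s2}
# The looser equivalence is applied explicitly, only where it is wanted.
loose = {nfc(s) for s in (s1, s2)}

print(len(strict))                   # 2
print(len(loose))                    # 1
print(nfc(s1) == nfc(s2), s1 == s2)  # True False
```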

Cheers,
mwh

-- 
81. In computing, turning the obvious into the useful is a living
    definition of the word "frustration".
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html