[Python-Dev] Hash values and comparing objects

Ka-Ping Yee pingster@ilm.com
Thu, 6 Jul 2000 14:01:49 -0700 (PDT)

On Thu, 6 Jul 2000, M.-A. Lemburg wrote:
> Previously, Unicode used UTF-8 as basis for calculating the
> hash value

Right, and i was trying to suggest (in a previous message)
that the hash value should be calculated from the actual
Unicode character values themselves.  Then for any case where
it's possible for an 8-bit string to be == to a Unicode
string, they will have the same hash.  Doesn't this solve the
problem?  Have i misunderstood?
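
As a rough illustration of the idea above (a sketch in present-day
Python, not the interpreter's actual hash function), both string
types can be hashed over the same sequence of integer character
values. The helper names codepoint_hash, hash_8bit and hash_unicode
are hypothetical, and the mixing step is an arbitrary toy choice.

```python
def codepoint_hash(values):
    """Toy hash over a sequence of integer character values.

    Hypothetical helper for illustration only; the multiplier and
    32-bit mask are arbitrary and do not mirror any real hash.
    """
    h = 0
    for v in values:
        h = ((h * 1000003) ^ v) & 0xFFFFFFFF
    return h


def hash_8bit(s: bytes) -> int:
    # Iterating over bytes yields integers in the range 0..255.
    return codepoint_hash(s)


def hash_unicode(s: str) -> int:
    # ord() gives the character value of each character.
    return codepoint_hash(ord(c) for c in s)


# Same character values in both representations, so the same hash.
same = hash_8bit(b"spam") == hash_unicode("spam")
```

Because the hash depends only on the character values, any 8-bit
string whose bytes match the character values of a Unicode string
hashes identically to it under this scheme.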

> How serious is the need for objects which compare equal to
> have the same hash value ?

For basic, immutable types like strings -- quite serious indeed,
i would imagine.
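
To show why this matters, here is a minimal sketch in present-day
Python of what goes wrong when two objects compare equal but hash
differently. The Key class is a made-up example with a deliberately
inconsistent __hash__; it is not taken from any real code.

```python
class Key:
    """Hypothetical key type whose hash disagrees with its equality."""

    def __init__(self, text, salt):
        self.text = text
        self.salt = salt

    def __eq__(self, other):
        # Equality looks only at the text.
        return isinstance(other, Key) and self.text == other.text

    def __hash__(self):
        # Deliberately broken: the hash depends on salt, which
        # equality ignores.
        return self.salt


a = Key("spam", 1)
b = Key("spam", 2)

table = {a: "found"}

equal = (a == b)          # the two keys compare equal
lookup = table.get(b)     # but the lookup with b misses the entry
```

The dictionary compares hash values before it ever calls __eq__, so
a key that is equal to a stored key but hashes differently is simply
not found.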

> 2. In some locales 'äöü' == u'äöü' is true, while in others this is
>    not the case. If they do compare equal, the hash values
>    must match.

This sounds very bad.  I thought we agreed that attempting to
compare (or add) a Unicode string and an 8-bit string containing
non-ASCII characters (as in your example) should raise an exception.

Such an attempt constitutes an ambiguous request -- you haven't
specified how to turn the 8-bit bytes into Unicode, and it's better
to be explicit than to have the interpreter guess (and guess
differently depending on the environment!!)
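
For illustration only, here is how the explicit approach looks in
present-day Python 3, which is an assumption on my part and not the
interpreter under discussion in this thread. There, mixing the two
types in a concatenation raises TypeError, and the caller has to name
an encoding to turn the bytes into text.

```python
raw = b"\xe4\xf6\xfc"          # three non-ASCII 8-bit bytes
text = "\u00e4\u00f6\u00fc"    # the Unicode text 'äöü'

# Mixing the two types without saying how to decode is refused.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The same bytes mean different text under different encodings,
# so the caller states the encoding explicitly.
as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")

matches_latin1 = (as_latin1 == text)
matches_cp437 = (as_cp437 == text)
```

Only the Latin-1 decoding matches the text here; the CP437 decoding
of the very same bytes does not, which is exactly the kind of guess
that should not be left to the environment.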

-- ?!ng