[Python-Dev] Hash values and comparing objects
Ka-Ping Yee
pingster@ilm.com
Thu, 6 Jul 2000 14:01:49 -0700 (PDT)
On Thu, 6 Jul 2000, M.-A. Lemburg wrote:
> Previously, Unicode used UTF-8 as basis for calculating the
> hash value
Right, and i was trying to suggest (in a previous message)
that the hash value should be calculated from the actual
Unicode character values themselves. Then for any case where
it's possible for an 8-bit string to be == to a Unicode
string, they will have the same hash. Doesn't this solve the
problem? Have i misunderstood?
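A minimal sketch of the idea, written in Python 3 with `bytes` standing in for 8-bit strings and `str` for Unicode strings. It assumes a simple polynomial hash; the function names and the multiplier are illustrative and are not CPython's actual implementation. If both string types are hashed over their character values, an 8-bit string and a Unicode string holding the same values get the same hash. Hashing the UTF-8 encoding only agrees in the ASCII case.

```python
MASK = (1 << 64) - 1  # truncate intermediate results to 64 bits


def value_hash(values):
    """Illustrative polynomial hash over a sequence of integer character values."""
    h = 0
    for v in values:
        h = (h * 1000003 + v) & MASK
    return h


def hash_8bit(raw: bytes) -> int:
    # Iterating over bytes yields ints 0..255: the byte values themselves.
    return value_hash(raw)


def hash_unicode(text: str) -> int:
    # Hash the Unicode code points themselves, as proposed above.
    return value_hash(ord(ch) for ch in text)


def hash_unicode_via_utf8(text: str) -> int:
    # The earlier scheme: hash the UTF-8 encoding of the text.
    return value_hash(text.encode("utf-8"))


# ASCII: all three schemes agree.
assert hash_8bit(b"abc") == hash_unicode("abc") == hash_unicode_via_utf8("abc")

# Non-ASCII: bytes E4 F6 FC equal the code points of "äöü", but UTF-8
# produces a different, longer byte sequence (C3 A4 C3 B6 C3 BC).
assert hash_8bit(b"\xe4\xf6\xfc") == hash_unicode("äöü")
assert list("äöü".encode("utf-8")) != list(b"\xe4\xf6\xfc")
```

The non-ASCII case only lines up when the bytes are read as Latin-1, where each byte value equals its code point. That assumption is exactly what the rest of this message questions.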
> How serious is the need for objects which compare equal to
> have the same hash value ?
For basic, immutable types like strings -- quite serious indeed,
i would imagine.
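Why the requirement matters in practice: dictionaries and sets use the hash to locate a key before equality is considered, so a key that compares equal but hashes differently is simply not found. Below is a small Python 3 sketch built around a deliberately broken, hypothetical `EncodedHashText` class. It compares by characters but hashes the bytes of a chosen encoding. The byte sum is a weak hash, used here only so the hash values are deterministic.

```python
class EncodedHashText:
    """Hypothetical string wrapper that breaks the rule: a == b implies hash(a) == hash(b)."""

    def __init__(self, text: str, encoding: str):
        self.text = text
        self.encoding = encoding

    def __eq__(self, other):
        # Equality looks only at the characters.
        return isinstance(other, EncodedHashText) and self.text == other.text

    def __hash__(self):
        # The hash also depends on the encoding (a weak byte sum, for illustration).
        return sum(self.text.encode(self.encoding))


a = EncodedHashText("äöü", "latin-1")
b = EncodedHashText("äöü", "utf-8")
table = {a: "found"}

print(a == b)      # True: same characters
print(b in table)  # False: the hashes differ (726 vs 1119), so the dict never matches b to a

# For ASCII text both encodings give the same bytes, so the lookup succeeds.
print(EncodedHashText("abc", "utf-8") in {EncodedHashText("abc", "latin-1"): "found"})  # True
```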
> 2. In some locales 'äöü' == u'äöü' is true, while in others this is
> not the case. If they do compare equal, the hash values
> must match.
This sounds very bad. I thought we agreed that attempting to
compare (or add) a Unicode string and an 8-bit string containing
non-ASCII characters (as in your example) should raise an exception.
Such an attempt constitutes an ambiguous request -- you haven't
specified how to turn the 8-bit bytes into Unicode, and it's better
to be explicit than to have the interpreter guess (and guess
differently depending on the environment!!).
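A minimal Python 3 sketch of the rule proposed above, again with `bytes` standing in for 8-bit strings. `guessing_equals` is a hypothetical model of an interpreter that silently decodes with whatever default encoding the environment supplies. `strict_equals` is a hypothetical version of the stricter rule: it allows the implicit comparison only for ASCII and otherwise raises, so the caller has to decode explicitly. Both names are illustrative, not a real API.

```python
def guessing_equals(raw: bytes, text: str, default_encoding: str) -> bool:
    """Hypothetical implicit comparison: decode with an environment-supplied default."""
    return raw.decode(default_encoding) == text


def strict_equals(raw: bytes, text: str) -> bool:
    """Hypothetical strict rule: implicit comparison is allowed only for ASCII bytes."""
    try:
        decoded = raw.decode("ascii")
    except UnicodeDecodeError:
        raise TypeError(
            "ambiguous comparison of non-ASCII 8-bit string with Unicode; "
            "decode it explicitly first"
        ) from None
    return decoded == text


raw = b"\xe4\xf6\xfc"

# The same bytes compare differently depending on the assumed encoding.
print(guessing_equals(raw, "äöü", "latin-1"))  # True
print(guessing_equals(raw, "äöü", "cp1251"))   # False: the bytes decode to Cyrillic letters

# The strict rule refuses to guess...
try:
    strict_equals(raw, "äöü")
except TypeError as exc:
    print("TypeError:", exc)

# ...but ASCII still compares, and an explicit decode states the intent.
print(strict_equals(b"abc", "abc"))    # True
print(raw.decode("latin-1") == "äöü")  # True
```

For comparison, Python 3 keeps `bytes` and `str` as separate types. There, `b'abc' == 'abc'` is simply False, concatenating them with `+` raises `TypeError`, and running the interpreter with `-bb` turns bytes/str comparisons into errors.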
-- ?!ng