[Python-Dev] Hash values and comparing objects

M.-A. Lemburg mal@lemburg.com
Thu, 06 Jul 2000 18:28:12 +0200


There currently is a problem with the Unicode objects which
I'd like to resolve:

Since Unicode objects compare with strings, they should have
the same hash value as their 8-bit string counterparts (the
strings which compare equal; which ones those are depends on
the default encoding, which in turn depends on the locale
setting).

Previously, Unicode used UTF-8 as the basis for calculating the
hash value: the Unicode object created a UTF-8 string object
and delegated the hash value calculation to it, caching both
the result and the string for future use. Since I would like
to make the internal encoding cache use the default encoding
instead, I have two problems to solve:
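The previous UTF-8 scheme can be sketched in modern Python as
follows. This is only a model under stated assumptions: the class
name CachedUtf8Hash is hypothetical, and Python 3's bytes hash
stands in for the 8-bit string hash of the time.

```python
class CachedUtf8Hash:
    """Hypothetical stand-in for a Unicode object that hashes via UTF-8."""

    def __init__(self, text):
        self.text = text
        self._utf8 = None  # cached UTF-8 encoded string
        self._hash = None  # cached hash value

    def __hash__(self):
        if self._hash is None:
            # Encode once, then delegate the hash calculation to the
            # encoded (8-bit) string and cache both results.
            self._utf8 = self.text.encode("utf-8")
            self._hash = hash(self._utf8)
        return self._hash
```

With this scheme the hash only matches 8-bit strings holding the
UTF-8 bytes of the same text.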

1. It is sometimes not possible to encode the Unicode value
   using the default encoding, so a different strategy for
   calculating the hash value would be needed for such values.

2. In some locales 'äöü' == u'äöü' is true, while in others this is
   not the case. If they do compare equal, the hash values
   must match.
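Both problems can be illustrated with a small sketch. The function
hash_with_default_encoding is hypothetical, and falling back to
UTF-8 is just one assumed choice of "different strategy". Note that
Python 3 no longer treats bytes and str as equal, so this only models
the situation described above.

```python
def hash_with_default_encoding(text, encoding):
    """Hypothetical: hash text via the given default encoding."""
    try:
        return hash(text.encode(encoding))
    except UnicodeEncodeError:
        # Problem 1: the text cannot be represented in this encoding,
        # so some other strategy is needed (assumed here: UTF-8).
        return hash(text.encode("utf-8"))


# Problem 2: which 8-bit string "equals" u'äöü' depends on the encoding.
latin1_bytes = "äöü".encode("latin-1")  # 3 bytes
utf8_bytes = "äöü".encode("utf-8")      # 6 bytes
```

Under a Latin-1 default the hash must match the 3-byte string, under
a UTF-8 default the 6-byte one, and under ASCII there is no matching
8-bit string at all.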

How serious is the need for objects which compare equal to
have the same hash value?
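One concrete consequence of breaking the rule is dictionary lookup:
a dict compares hash values before calling __eq__, so a key that
compares equal but hashes differently is simply not found. The
classes below are hypothetical, written in modern Python.

```python
class TextKey:
    """Hypothetical key type that compares equal to a plain str."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, TextKey):
            return self.text == other.text
        if isinstance(other, str):
            return self.text == other
        return NotImplemented


class GoodKey(TextKey):
    def __hash__(self):
        # Consistent with __eq__: same hash as the equal str.
        return hash(self.text)


class BadKey(TextKey):
    def __hash__(self):
        # Flips one bit, so it never equals hash(self.text).
        return hash(self.text) ^ 2


table = {"abc": 1}
```

BadKey("abc") == "abc" holds, yet the lookup fails; GoodKey("abc") is
found. Python's data model documentation requires that objects which
compare equal have the same hash value for exactly this reason.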

(I would much prefer to calculate the hash value using the
internal UTF-16 buffer rather than first creating an
encoded string.)
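Hashing the internal UTF-16 buffer directly could look roughly like
this. The multiplicative hash below is only loosely modelled on the
classic 8-bit string hash (an assumption, not the actual
implementation), and it is masked to 64 bits to mimic fixed-width C
arithmetic; the function names are hypothetical.

```python
MASK = (1 << 64) - 1  # assumed word size


def utf16_code_units(text):
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def multiplicative_hash(units):
    """Hash a sequence of integers (bytes or UTF-16 code units)."""
    units = list(units)
    x = (units[0] << 7) if units else 0
    for u in units:
        x = ((1000003 * x) ^ u) & MASK
    x ^= len(units)
    return x
```

Because the same function runs over bytes and code units alike, ASCII
text hashes identically in both forms, and 'äöü' matches its Latin-1
encoding but not its UTF-8 encoding. So this approach lines up with
8-bit strings only when their bytes equal the code point values.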

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/