[Python-Dev] Why not using the hash when comparing strings?

Fri Oct 19 03:07:17 CEST 2012

On 19/10/12 12:03, Victor Stinner wrote:
> Hi,
>
> I would like to know if there is a reason for not using the hash of
> (bytes or unicode) strings when comparing two objects whose hashes
> have already been computed. Using the hash would speed up the
> comparison of long strings when the two strings are different.

Assuming the hash has already been computed, then I imagine it would be
faster.

> Something like:
>
>      if ((op == Py_EQ || op == Py_NE)
>          &&  a->ob_shash != -1
>          &&  b->ob_shash != -1
>          &&  a->ob_shash != b->ob_shash) {
>          /* strings are not equal */
>      }
>
> There are hash collisions, so a->ob_shash == b->ob_shash doesn't mean
> that the two strings are equal. But if the two hashes are different,
> the two strings are different. Aren't they?

I would certainly hope so :)
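
To make the idea concrete, here is a minimal standalone sketch of the
shortcut. It is not CPython's code: the cached_str struct and the
str_hash and str_equal functions are hypothetical names invented for
illustration, and the hash is a simple FNV-1a style stand-in rather than
the real string hash. As in the snippet above, -1 is assumed to mean
"hash not computed yet".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object that caches its hash.
   As in the quoted snippet, -1 means "hash not computed yet". */
typedef struct {
    const char *data;
    size_t len;
    long cached_hash;
} cached_str;

/* Illustrative FNV-1a style hash; NOT CPython's actual string hash.
   The result is masked to 31 bits so it can never collide with the
   reserved value -1. */
static long str_hash(cached_str *s)
{
    assert(s != NULL);
    if (s->cached_hash != -1)
        return s->cached_hash;          /* already cached */

    unsigned long h = 2166136261UL;
    for (size_t i = 0; i < s->len; i++) {
        h ^= (unsigned char)s->data[i];
        h *= 16777619UL;
    }
    s->cached_hash = (long)(h & 0x7fffffffUL);
    return s->cached_hash;
}

/* Equality test with the proposed shortcut. */
static int str_equal(const cached_str *a, const cached_str *b)
{
    /* Different lengths: cannot be equal. */
    if (a->len != b->len)
        return 0;

    /* Both hashes cached and different: cannot be equal, so the
       byte-by-byte comparison is skipped. */
    if (a->cached_hash != -1 && b->cached_hash != -1
        && a->cached_hash != b->cached_hash)
        return 0;

    /* Equal hashes may still be a collision, and a missing hash tells
       us nothing, so fall back to comparing the bytes. */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

Note that the shortcut only ever answers "not equal". Matching hashes
still require the full comparison, and the comparison never computes a
missing hash itself, since hashing a long string costs at least as much
as comparing it.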

-- 
Steven