[Python-Dev] Why not using the hash when comparing strings?

Fri Oct 19 03:03:53 CEST 2012

Hi,

I would like to know if there is a reason for not using the hash of
(bytes or unicode) strings when comparing two objects whose hashes
have already been computed. Using the hash would speed up comparison
of long strings when the two strings are different.

Something like:

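    if ((op == Py_EQ || op == Py_NE)
        && a->ob_shash != -1
        && b->ob_shash != -1
        && a->ob_shash != b->ob_shash) {
        /* Both hashes are cached (-1 means "not computed yet") and they
           differ: the strings are not equal, skip the character comparison. */
        if (op == Py_NE)
            Py_RETURN_TRUE;
        Py_RETURN_FALSE;
    }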

There are hash collisions, so a->ob_shash == b->ob_shash doesn't mean
that the two strings are equal. But if the two hashes are different,
the two strings are different, right?
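A minimal standalone sketch of the idea, outside CPython. It assumes a
hypothetical str_obj type (not CPython's real object layout) whose cached
hash is -1 until computed, and uses 64-bit FNV-1a as a stand-in hash, not
CPython's string hash. A counter records how often the full byte comparison
actually runs, so the effect of the shortcut is visible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string object: hash is -1 until computed,
   mirroring the ob_shash convention. */
typedef struct {
    const char *data;
    size_t len;
    int64_t hash; /* -1 means "not computed yet" */
} str_obj;

/* Number of times the full byte comparison was performed. */
static int full_compares = 0;

/* Stand-in hash (64-bit FNV-1a), cached on first use.
   Shifting right by one keeps the value non-negative, so it is never -1. */
static int64_t str_hash(str_obj *s)
{
    if (s->hash == -1) {
        uint64_t h = 14695981039346656037ULL;
        for (size_t i = 0; i < s->len; i++) {
            h ^= (unsigned char)s->data[i];
            h *= 1099511628211ULL;
        }
        s->hash = (int64_t)(h >> 1);
    }
    return s->hash;
}

static bool str_eq(const str_obj *a, const str_obj *b)
{
    /* Cheap check first: different lengths can never be equal. */
    if (a->len != b->len)
        return false;
    /* Both hashes cached and different: the strings cannot be equal. */
    if (a->hash != -1 && b->hash != -1 && a->hash != b->hash)
        return false;
    /* Equal (or unknown) hashes prove nothing because of collisions,
       so the bytes must still be compared. */
    full_compares++;
    return memcmp(a->data, b->data, a->len) == 0;
}
```

Note that the shortcut only ever answers "not equal": when the hashes match,
the comparison still falls through to memcmp(), which is what keeps it
correct in the presence of collisions.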

Victor