[Python-Dev] Why not using the hash when comparing strings?

Victor Stinner victor.stinner at gmail.com
Fri Oct 19 03:03:53 CEST 2012


Hi,

I would like to know if there is a reason for not using the hash of
(bytes or unicode) strings when comparing two objects whose hashes
have already been computed. Using the hash would speed up the
comparison of long strings when the two strings are different.

Something like:

    if ((op == Py_EQ || op == Py_NE)
        && a->ob_shash != -1
        && b->ob_shash != -1
        && a->ob_shash != b->ob_shash) {
        /* the hashes differ, so the strings cannot be equal */
        return PyBool_FromLong(op == Py_NE);
    }

There are hash collisions, so a->ob_shash == b->ob_shash doesn't mean
that the two strings are equal. But if the two hashes are different,
the two strings must be different, right?

Victor

