[Python-Dev] [Python-checkins] cpython: Further reduce the cost of hash collisions by inspecting an additional nearby

Eli Bendersky eliben at gmail.com
Mon Sep 2 00:57:02 CEST 2013


On Sat, Aug 31, 2013 at 9:29 PM, raymond.hettinger <
python-checkins at python.org> wrote:

> http://hg.python.org/cpython/rev/d40a65658ff0
> changeset:   85486:d40a65658ff0
> parent:      85482:4d604f1f0219
> user:        Raymond Hettinger <python at rcn.com>
> date:        Sat Aug 31 21:27:08 2013 -0700
> summary:
>   Further reduce the cost of hash collisions by inspecting an additional
> nearby entry.
>


Hi Raymond,

I'm curious about the benchmarks used to test such micro-optimizations. Do
you use one of the existing Python benchmark suites or do you have a
particular set of micro-benchmarks you're running this on?

Eli




>
> files:
>   Objects/setobject.c |  43 +++++++++++++++++++++++++++++---
>   1 files changed, 39 insertions(+), 4 deletions(-)
>
>
> diff --git a/Objects/setobject.c b/Objects/setobject.c
> --- a/Objects/setobject.c
> +++ b/Objects/setobject.c
> @@ -65,10 +65,11 @@
>  The initial probe index is computed as hash mod the table size. Subsequent
>  probe indices are computed as explained in Objects/dictobject.c.
>
> -To improve cache locality, each probe is done in pairs.
> -After the probe is examined, an adjacent entry is then examined as well.
> -The likelihood is that an adjacent entry is in the same cache line and
> -can be examined more cheaply than another probe elsewhere in memory.
> +To improve cache locality, each probe inspects nearby entries before
> +moving on to probes elsewhere in memory.  Depending on alignment and the
> +size of a cache line, the nearby entries are cheaper to inspect than
> +other probes elsewhere in memory.  This probe strategy reduces the cost
> +of hash collisions.
>
>  All arithmetic on hash should ignore overflow.
>
> @@ -130,6 +131,26 @@
>          if (entry->key == dummy && freeslot == NULL)
>              freeslot = entry;
>
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
> +        if (entry->key == key)
> +            return entry;
> +        if (entry->hash == hash && entry->key != dummy) {
> +            PyObject *startkey = entry->key;
> +            Py_INCREF(startkey);
> +            cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
> +            Py_DECREF(startkey);
> +            if (cmp < 0)
> +                return NULL;
> +            if (table != so->table || entry->key != startkey)
> +                return set_lookkey(so, key, hash);
> +            if (cmp > 0)
> +                return entry;
> +        }
> +        if (entry->key == dummy && freeslot == NULL)
> +            freeslot = entry;
> +
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
> @@ -190,6 +211,17 @@
>          if (entry->key == dummy && freeslot == NULL)
>              freeslot = entry;
>
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
> +        if (entry->key == key
> +            || (entry->hash == hash
> +            && entry->key != dummy
> +            && unicode_eq(entry->key, key)))
> +            return entry;
> +        if (entry->key == dummy && freeslot == NULL)
> +            freeslot = entry;
> +
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
> @@ -258,6 +290,9 @@
>          entry = &table[j ^ 1];
>          if (entry->key == NULL)
>              break;
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
>
> --
> Repository URL: http://hg.python.org/cpython
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20130901/1873bb9b/attachment.html>


More information about the Python-Dev mailing list