[Python-Dev] [Python-checkins] cpython: Further reduce the cost of hash collisions by inspecting an additional nearby
Eli Bendersky
eliben at gmail.com
Mon Sep 2 00:57:02 CEST 2013
On Sat, Aug 31, 2013 at 9:29 PM, raymond.hettinger <
python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/d40a65658ff0
> changeset: 85486:d40a65658ff0
> parent: 85482:4d604f1f0219
> user: Raymond Hettinger <python at rcn.com>
> date: Sat Aug 31 21:27:08 2013 -0700
> summary:
> Further reduce the cost of hash collisions by inspecting an additional
> nearby entry.
>
Hi Raymond,
I'm curious about the benchmarks used to test such micro-optimizations. Do
you use one of the existing Python benchmark suites or do you have a
particular set of micro-benchmarks you're running this on?
Eli
>
> files:
> Objects/setobject.c | 43 +++++++++++++++++++++++++++++---
> 1 files changed, 39 insertions(+), 4 deletions(-)
>
>
> diff --git a/Objects/setobject.c b/Objects/setobject.c
> --- a/Objects/setobject.c
> +++ b/Objects/setobject.c
> @@ -65,10 +65,11 @@
> The initial probe index is computed as hash mod the table size. Subsequent
> probe indices are computed as explained in Objects/dictobject.c.
>
> -To improve cache locality, each probe is done in pairs.
> -After the probe is examined, an adjacent entry is then examined as well.
> -The likelihood is that an adjacent entry is in the same cache line and
> -can be examined more cheaply than another probe elsewhere in memory.
> +To improve cache locality, each probe inspects nearby entries before
> +moving on to probes elsewhere in memory. Depending on alignment and the
> +size of a cache line, the nearby entries are cheaper to inspect than
> +other probes elsewhere in memory. This probe strategy reduces the cost
> +of hash collisions.
>
> All arithmetic on hash should ignore overflow.
>
> @@ -130,6 +131,26 @@
> if (entry->key == dummy && freeslot == NULL)
> freeslot = entry;
>
> + entry = &table[j ^ 2];
> + if (entry->key == NULL)
> + break;
> + if (entry->key == key)
> + return entry;
> + if (entry->hash == hash && entry->key != dummy) {
> + PyObject *startkey = entry->key;
> + Py_INCREF(startkey);
> + cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
> + Py_DECREF(startkey);
> + if (cmp < 0)
> + return NULL;
> + if (table != so->table || entry->key != startkey)
> + return set_lookkey(so, key, hash);
> + if (cmp > 0)
> + return entry;
> + }
> + if (entry->key == dummy && freeslot == NULL)
> + freeslot = entry;
> +
> i = i * 5 + perturb + 1;
> j = i & mask;
> perturb >>= PERTURB_SHIFT;
> @@ -190,6 +211,17 @@
> if (entry->key == dummy && freeslot == NULL)
> freeslot = entry;
>
> + entry = &table[j ^ 2];
> + if (entry->key == NULL)
> + break;
> + if (entry->key == key
> + || (entry->hash == hash
> + && entry->key != dummy
> + && unicode_eq(entry->key, key)))
> + return entry;
> + if (entry->key == dummy && freeslot == NULL)
> + freeslot = entry;
> +
> i = i * 5 + perturb + 1;
> j = i & mask;
> perturb >>= PERTURB_SHIFT;
> @@ -258,6 +290,9 @@
> entry = &table[j ^ 1];
> if (entry->key == NULL)
> break;
> + entry = &table[j ^ 2];
> + if (entry->key == NULL)
> + break;
> i = i * 5 + perturb + 1;
> j = i & mask;
> perturb >>= PERTURB_SHIFT;
>
> --
> Repository URL: http://hg.python.org/cpython
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20130901/1873bb9b/attachment.html>
More information about the Python-Dev
mailing list