[Python-Dev] C coding experiment

Raymond Hettinger raymond.hettinger at verizon.net
Fri Sep 16 18:58:08 CEST 2005

> -----Original Message-----
> From: Andrew Durdin [mailto:adurdin at gmail.com]
> Sent: Friday, September 16, 2005 8:25 AM
> To: python at rcn.com
> Cc: python-dev at python.org
> Subject: Re: [Python-Dev] C coding experiment
> On 9/1/05, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> > The goal is to determine whether the setobject.c implementation
would be
> > improved by recoding the set_lookkey() function to optimize key
> > insertion order using Brent's variation of Algorithm D (See Knuth
> > III, section 6.4, page 525).
> Brent's variation depends on the next probe position for a key being
> derivable just from the key and its current position. The use of
> perturbation in set_lookkey() prevents that, as we cannot say, given a
> key at a certain position, where the next probe location for that key
> would have been, without doing a complete lookup of that key
> (obviously too expensive).

Right.  For Brent's variation to work, I've had to replace perturbation
with a secondary hash function with linear spacing.  

> It would be interesting to see whether Brent's variation or
> perturbation give better expected overall performance -- the latter
> may well prove better in most cases, as it should reduce the number of
> probes needed for both successful and unsuccessful lookups, while
> Brent's variation only speeds up successful lookups. Well, this sort
> of question is what the experiment is meant to resolve, no?

Right again.

I'm also experimenting with a simpler approach -- whenever there are
more than three probes, always swap the new key into the first position
and then unconditionally re-insert the swapped-out key.  Most of the
time that gives an improvement and it doesn't require changing the
perturbation logic -- it is equivalent to changing the key insertion
order.  The only implementation wrinkle is that the approach needs
separate lookup and insert functions (only the latter does the swap).

The simpler approach is cheap to implement as it doesn't slow-down the
first three probes.  OTOH, te benefits are smaller than with Brent's
variation -- it only relieves the worst collisions.  I'm still
interested in it because it helps the two most important use cases for
sets (uniquification and membership testing).


More information about the Python-Dev mailing list