On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian@python.org>
wrote:
On the other hand every lookup and collision checks needs an
additional multiplication, addition and pointer dereferencing:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup
functions based on whether the keys are all strings.
The choice of lookup function is stored in a function pointer.
That lets each lookup use the currently active lookup
function without having to make any computations or branches.
An indirect function call is technically a branch, as seen from the CPU
(and not necessarily a very predictable one, although recent Intel
CPUs are said to be quite good at that).
FWIW, we already have an indirection to the lookup function.
I would piggyback on that, so no new indirections are required.
My plan now is to apply the space compaction idea to sets.
That code is less complex than dicts, and set operations
stand to benefit the most from improved iteration speed.
The steps would be:
* Create a good set of benchmarks for set operations
for both size and speed.
* Start with the simplest version of the idea: separate the
entries table from the hash table. Keep the hash table at
Py_ssize_t, and pre-allocate the entry array to two-thirds the size
of the hash table. This should give about a 25% space savings
and speed-up iteration for all the set-to-set operations.
* If that all works out, I want to trim the entry table for frozensefs
so that the entry table has no over-allocations. This should
give a much larger space savings.
* Next, I want to experiment with the idea of using char/short/long
sizes for the hash table. Since there is already an existing
lookup indirection, this can be done with no additional overhead.
Small sets will get the most benefit for the space savings and
the cache performance for hash lookups should improve nicely
(for a hash table of size 32 could fit in a single cache line).
At each step, I'll run the benchmarks to make sure the expected
speed and space benefits are being achieved.
As a side-effect, sets without deletions will retain their insertion
order. If this is of concern, it would be no problem to implement
Antoine's idea of scrambling the entries during iteration.
Raymond
P.S. I've gotten a lot of suggestions for improvements to the
proof-of-concept code. Thank you for that. The latest version
In that code, entries are stored in regular Python lists
and inherit their over-allocation characteristics (about
12.5% overallocated for large lists). There are many
other possible allocation strategies with their own
respective speed/space trade-offs.