<br><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Dec 9, 2012 at 5:44 PM, Raymond Hettinger <span dir="ltr"><<a href="mailto:raymond.hettinger@gmail.com" target="_blank">raymond.hettinger@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The current memory layout for dictionaries is<br>

unnecessarily inefficient.  It has a sparse table of<br>

24-byte entries containing the hash value, key pointer,<br>

and value pointer.<br>

<br>

Instead, the 24-byte entries should be stored in a<br>

dense table referenced by a sparse table of indices.<br>

<br>

For example, the dictionary:<br>

<br>

    d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}<br>

<br>

is currently stored as:<br>

<br>

    entries = [['--', '--', '--'],<br>

               [-8522787127447073495, 'barry', 'green'],<br>

               ['--', '--', '--'],<br>

               ['--', '--', '--'],<br>

               ['--', '--', '--'],<br>

               [-9092791511155847987, 'timmy', 'red'],<br>

               ['--', '--', '--'],<br>

               [-6480567542315338377, 'guido', 'blue']]<br>

<br>

Instead, the data should be organized as follows:<br>

<br>

    indices =  [None, 1, None, None, None, 0, None, 2]<br>

    entries =  [[-9092791511155847987, 'timmy', 'red'],<br>

                [-8522787127447073495, 'barry', 'green'],<br>

                [-6480567542315338377, 'guido', 'blue']]<br>

<br>

Only the data layout needs to change.  The hash table<br>

algorithms would stay the same.  All of the current<br>

optimizations would be kept, including key-sharing<br>

dicts and custom lookup functions for string-only<br>

dicts.  There is no change to the hash functions, the<br>

table search order, or collision statistics.<br>

<br>

The memory savings are significant (from 30% to 95%<br>

compression depending on the how full the table is).<br>

Small dicts (size 0, 1, or 2) get the most benefit.<br>

<br>

For a sparse table of size t with n entries, the sizes are:<br>

<br>

    curr_size = 24 * t<br>

    new_size = 24 * n + sizeof(index) * t<br>

<br>

In the above timmy/barry/guido example, the current<br>

size is 192 bytes (eight 24-byte entries) and the new<br>

size is 80 bytes (three 24-byte entries plus eight<br>

1-byte indices).  That gives 58% compression.<br>

<br>

Note, the sizeof(index) can be as small as a single<br>

byte for small dicts, two bytes for bigger dicts and<br>

up to sizeof(Py_ssize_t) for huge dict.<br>

<br>

In addition to space savings, the new memory layout<br>

makes iteration faster.  Currently, keys(), values, and<br>

items() loop over the sparse table, skipping-over free<br>

slots in the hash table.  Now, keys/values/items can<br>

loop directly over the dense table, using fewer memory<br>

accesses.<br>

<br>

Another benefit is that resizing is faster and<br>

touches fewer pieces of memory.  Currently, every<br>

hash/key/value entry is moved or copied during a<br>

resize.  In the new layout, only the indices are<br>

updated.  For the most part, the hash/key/value entries<br>

never move (except for an occasional swap to fill a<br>

hole left by a deletion).<br>

<br>

With the reduced memory footprint, we can also expect<br>

better cache utilization.<br>

<br>

For those wanting to experiment with the design,<br>

there is a pure Python proof-of-concept here:<br>

<br>

   <a href="http://code.activestate.com/recipes/578375" target="_blank">http://code.activestate.com/recipes/578375</a><br>

<br>

YMMV: Keep in mind that the above size statics assume a<br>

build with 64-bit Py_ssize_t and 64-bit pointers.  The<br>

space savings percentages are a bit different on other<br>

builds.  Also, note that in many applications, the size<br>

of the data dominates the size of the container (i.e.<br>

the weight of a bucket of water is mostly the water,<br>

not the bucket).<br></blockquote><div><br></div><div>+1  I like it.  The plethora of 64-bit NULL values we cart around in our internals bothers me, this gets rid of some.</div><div><br></div><div>The worst case of ~30% less memory consumed for the bucket (ignoring the water) is still decent.  For a program with a lot of instances of small to medium sized objects I'd expect a measurable win.</div>


<div><br></div><div>I'm interested in seeing performance and memory footprint numbers from a C implementation of this.</div><div><br></div><div>-gps</div></div></div>