More compact dictionaries with faster iteration
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer. Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]

Only the data layout needs to change. The hash table algorithms would stay the same. All of the current optimizations would be kept, including key-sharing dicts and custom lookup functions for string-only dicts. There is no change to the hash functions, the table search order, or collision statistics.

The memory savings are significant (from 30% to 95% compression depending on how full the table is). Small dicts (size 0, 1, or 2) get the most benefit.

For a sparse table of size t with n entries, the sizes are:

curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t

In the above timmy/barry/guido example, the current size is 192 bytes (eight 24-byte entries) and the new size is 80 bytes (three 24-byte entries plus eight 1-byte indices). That gives 58% compression.

Note that sizeof(index) can be as small as a single byte for small dicts, two bytes for bigger dicts, and up to sizeof(Py_ssize_t) for huge dicts.

In addition to the space savings, the new memory layout makes iteration faster. Currently, keys(), values(), and items() loop over the sparse table, skipping over free slots in the hash table. Now, keys/values/items can loop directly over the dense table, using fewer memory accesses.

Another benefit is that resizing is faster and touches fewer pieces of memory. Currently, every hash/key/value entry is moved or copied during a resize. In the new layout, only the indices are updated. For the most part, the hash/key/value entries never move (except for an occasional swap to fill a hole left by a deletion).

With the reduced memory footprint, we can also expect better cache utilization.

For those wanting to experiment with the design, there is a pure Python proof-of-concept here:

http://code.activestate.com/recipes/578375

YMMV: Keep in mind that the above size statistics assume a build with 64-bit Py_ssize_t and 64-bit pointers. The space savings percentages are a bit different on other builds. Also, note that in many applications, the size of the data dominates the size of the container (i.e. the weight of a bucket of water is mostly the water, not the bucket).

Raymond
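To make the lookup path concrete, here is a minimal sketch of a probe over the two tables. It assumes integer sentinels in place of the None shown above (a real index array cannot hold None) and uses simplified linear probing; CPython's actual probe sequence is perturbed, and the recipe linked above has the real code.

    FREE = -1    # index slot never used
    DUMMY = -2   # index slot whose entry was deleted

    def lookup(indices, entries, key, hashvalue):
        # Probe the sparse index table exactly as the current algorithm
        # probes the entry table; only the final dereference into the
        # dense entries table is new.
        n = len(indices)
        i = hashvalue % n
        while True:
            idx = indices[i]
            if idx == FREE:
                raise KeyError(key)
            if idx != DUMMY:
                h, k, v = entries[idx]
                if h == hashvalue and k == key:
                    return v
            i = (i + 1) % n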
On 2012-12-10 01:44, Raymond Hettinger wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
Only the data layout needs to change. The hash table algorithms would stay the same. All of the current optimizations would be kept, including key-sharing dicts and custom lookup functions for string-only dicts. There is no change to the hash functions, the table search order, or collision statistics.
The memory savings are significant (from 30% to 95% compression depending on how full the table is). Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current size is 192 bytes (eight 24-byte entries) and the new size is 80 bytes (three 24-byte entries plus eight 1-byte indices). That gives 58% compression.
Note that sizeof(index) can be as small as a single byte for small dicts, two bytes for bigger dicts, and up to sizeof(Py_ssize_t) for huge dicts.
In addition to the space savings, the new memory layout makes iteration faster. Currently, keys(), values(), and items() loop over the sparse table, skipping over free slots in the hash table. Now, keys/values/items can loop directly over the dense table, using fewer memory accesses.
Another benefit is that resizing is faster and touches fewer pieces of memory. Currently, every hash/key/value entry is moved or copied during a resize. In the new layout, only the indices are updated. For the most part, the hash/key/value entries never move (except for an occasional swap to fill a hole left by a deletion).
With the reduced memory footprint, we can also expect better cache utilization.
For those wanting to experiment with the design, there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a build with 64-bit Py_ssize_t and 64-bit pointers. The space savings percentages are a bit different on other builds. Also, note that in many applications, the size of the data dominates the size of the container (i.e. the weight of a bucket of water is mostly the water, not the bucket).
What happens when a key is deleted? Will that leave an empty slot in the entry array? If so, will the array be compacted if the proportion of empty slots grows beyond a certain limit?

Adding a key would involve simply appending to the entry array (filling the last empty slot), but if there could be empty slots in other places, would it look for the first?

Actually, as I think about it, you could form a linked list (using the 'hash' field) of those empty slots that aren't part of the final contiguous block and fill those preferentially.
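One way to picture that free-list idea, as a sketch with invented names: the hash field of a dead entry doubles as the link to the next free slot, so no extra storage is needed.

    NO_FREE = -1

    def free_entry(entries, free_head, index):
        # Thread the freed slot onto the free list: the unused hash
        # field now stores the index of the next free slot.
        entries[index] = [free_head, None, None]
        return index                   # freed slot becomes the new head

    def next_open_index(entries, free_head):
        if free_head != NO_FREE:
            index = free_head          # pop a recycled slot first
            return index, entries[index][0]
        entries.append([None, None, None])
        return len(entries) - 1, NO_FREE   # extend the dense block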
On Dec 9, 2012, at 10:03 PM, MRAB <python@mrabarnett.plus.com> wrote:
What happens when a key is deleted? Will that leave an empty slot in the entry array?
Yes. See the __delitem__() method in the pure python implementation at http://code.activestate.com/recipes/578375
If so, will the array be compacted if the proportion of empty slots grows beyond a certain limit?
Yes. Compaction happens during resizing() which triggers when the dict reaches two-thirds full (including dummy entries). See the __setitem__() method in the pure python implementation.
Adding a key would involve simply appending to the entry array (filling the last empty slot), but if there could be empty slots in other places, would it look for the first?
Yes. See the _next_open_index() helper method in the pure python implementation.
Actually, as I think about it, you could form a linked list (using the 'hash' field) of those empty slots that aren't part of the final contiguous block and fill those preferentially.
That's the plan. See the comment above the keylist.index(UNUSED) line in the _next_open_index() method in the pure python implementation. Raymond
On 10.12.12 05:30, Raymond Hettinger wrote:
On Dec 9, 2012, at 10:03 PM, MRAB <python@mrabarnett.plus.com <mailto:python@mrabarnett.plus.com>> wrote:
What happens when a key is deleted? Will that leave an empty slot in the entry array? Yes. See the __delitem__() method in the pure python implementation at http://code.activestate.com/recipes/578375
You can move the last entry into the freed place. This will preserve the compactness of the entries array, and simplify and speed up iteration and some other operations.

def __delitem__(self, key, hashvalue=None):
    if hashvalue is None:
        hashvalue = hash(key)
    found, i = self._lookup(key, hashvalue)
    if found:
        index = self.indices[i]
        self.indices[i] = DUMMY
        self.size -= 1
        if index != self.size:
            # Move the last entry into the freed slot so the
            # entry arrays stay dense.
            lasthash = self.hashlist[self.size]
            lastkey = self.keylist[self.size]
            found, j = self._lookup(lastkey, lasthash)
            assert found
            assert i != j
            self.indices[j] = index
            self.hashlist[index] = lasthash
            self.keylist[index] = lastkey
            self.valuelist[index] = self.valuelist[self.size]
            index = self.size
        self.hashlist[index] = UNUSED
        self.keylist[index] = UNUSED
        self.valuelist[index] = UNUSED
    else:
        raise KeyError(key)
On Dec 10, 2012, at 4:02 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 10.12.12 05:30, Raymond Hettinger wrote:
On Dec 9, 2012, at 10:03 PM, MRAB <python@mrabarnett.plus.com <mailto:python@mrabarnett.plus.com>> wrote:
What happens when a key is deleted? Will that leave an empty slot in the entry array? Yes. See the __delitem__() method in the pure python implementation at http://code.activestate.com/recipes/578375
You can move the last entry into the freed place. This will preserve the compactness of the entries array, and simplify and speed up iteration and some other operations.
def __delitem__(self, key, hashvalue=None):
    if hashvalue is None:
        hashvalue = hash(key)
    found, i = self._lookup(key, hashvalue)
    if found:
        index = self.indices[i]
        self.indices[i] = DUMMY
        self.size -= 1
        if index != self.size:
            # Move the last entry into the freed slot so the
            # entry arrays stay dense.
            lasthash = self.hashlist[self.size]
            lastkey = self.keylist[self.size]
            found, j = self._lookup(lastkey, lasthash)
            assert found
            assert i != j
            self.indices[j] = index
            self.hashlist[index] = lasthash
            self.keylist[index] = lastkey
            self.valuelist[index] = self.valuelist[self.size]
            index = self.size
        self.hashlist[index] = UNUSED
        self.keylist[index] = UNUSED
        self.valuelist[index] = UNUSED
    else:
        raise KeyError(key)
That is a clever improvement. Thank you. Using your idea (plus some tweaks) cleans up the code a bit (simplifying iteration, simplifying the resizing logic, and eliminating the UNUSED constant). I'm updating the posted code to reflect your suggestion.

Thanks again,
Raymond
Yet some comments about your Python implementation.

1. Don't use "is FREE" and "is DUMMY", as an array doesn't preserve identity.

2. Wrong limits are used in _make_index(): 128 overflows 'b', 65536 overflows 'h', and 'l' may not be enough for ssize_t.

3. round_upto_powtwo() can be implemented as 1 << n.bit_length().

4. i * 5 is faster than (i << 2) + i in Python.

5. You can get rid of the "size" attribute and use len(self.keylist) instead.
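For concreteness, here is a sketch of what suggestions 2 and 3 might look like, using the array module's signed typecodes. The limits, sentinel value, and function bodies here are assumptions for illustration, not the recipe's actual code:

    from array import array

    def round_upto_powtwo(n):
        # Suggestion 3: the smallest power of two greater than n.
        return 1 << n.bit_length()

    def _make_index(size):
        # Suggestion 2: signed 'b' holds -128..127 and signed 'h' holds
        # -32768..32767, so switch typecodes before those limits; 'q'
        # (8 bytes) covers a 64-bit ssize_t where 'l' may be only 4 bytes.
        if size < 128:
            typecode = 'b'
        elif size < 32768:
            typecode = 'h'
        else:
            typecode = 'q'
        return array(typecode, [-1]) * size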
On Sun, Dec 9, 2012 at 5:44 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
Only the data layout needs to change. The hash table algorithms would stay the same. All of the current optimizations would be kept, including key-sharing dicts and custom lookup functions for string-only dicts. There is no change to the hash functions, the table search order, or collision statistics.
The memory savings are significant (from 30% to 95% compression depending on how full the table is). Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current size is 192 bytes (eight 24-byte entries) and the new size is 80 bytes (three 24-byte entries plus eight 1-byte indices). That gives 58% compression.
Note that sizeof(index) can be as small as a single byte for small dicts, two bytes for bigger dicts, and up to sizeof(Py_ssize_t) for huge dicts.
In addition to the space savings, the new memory layout makes iteration faster. Currently, keys(), values(), and items() loop over the sparse table, skipping over free slots in the hash table. Now, keys/values/items can loop directly over the dense table, using fewer memory accesses.
Another benefit is that resizing is faster and touches fewer pieces of memory. Currently, every hash/key/value entry is moved or copied during a resize. In the new layout, only the indices are updated. For the most part, the hash/key/value entries never move (except for an occasional swap to fill a hole left by a deletion).
With the reduced memory footprint, we can also expect better cache utilization.
For those wanting to experiment with the design, there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a build with 64-bit Py_ssize_t and 64-bit pointers. The space savings percentages are a bit different on other builds. Also, note that in many applications, the size of the data dominates the size of the container (i.e. the weight of a bucket of water is mostly the water, not the bucket).
+1. I like it.

The plethora of 64-bit NULL values we cart around in our internals bothers me; this gets rid of some of them. The worst case of ~30% less memory consumed for the bucket (ignoring the water) is still decent. For a program with a lot of instances of small to medium sized objects I'd expect a measurable win.

I'm interested in seeing performance and memory footprint numbers from a C implementation of this.

-gps
Hello Raymond,

On 10.12.2012 02:44, Raymond Hettinger wrote:
Another benefit is that resizing is faster and touches fewer pieces of memory. Currently, every hash/key/value entry is moved or copied during a resize. In the new layout, only the indices are updated. For the most part, the hash/key/value entries never move (except for an occasional swap to fill a hole left by a deletion).
With the reduced memory footprint, we can also expect better cache utilization.
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:

    entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx

It's hard to predict how the extra CPU instructions are going to affect the performance on different CPUs and scenarios. My guts tell me that your proposal is going to perform better on modern CPUs with lots of L1 and L2 cache and in an average case scenario. A worst case scenario with lots of collisions might be measurably slower for large dicts and a small CPU cache. But this is pure speculation and my guts could be terribly wrong. :)

I'm +1 to give it a shot. Good luck!

Christian
On 10.12.12 09:48, Christian Heimes wrote:
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
I think the main slowdown will be in getting the index from an array with a variable element size. It requires a conditional jump, which is not as cheap as an addition or a shift.

    switch (self->index_size) {
    case 1:
        idx = ((uint8_t *)self->indices)[i];
        break;
    case 2:
        idx = ((uint16_t *)self->indices)[i];
        break;
    ...
    }

I don't think variable-size indices are worth the effort.
On Dec 10, 2012, at 08:48 AM, Christian Heimes wrote:
It's hard to predict how the extra CPU instructions are going to affect the performance on different CPUs and scenarios. My guts tell me that your proposal is going to perform better on modern CPUs with lots of L1 and L2 cache and in an average case scenario. A worst case scenario with lots of collisions might be measurably slower for large dicts and a small CPU cache.
I'd be interested to see what effect this has on memory constrained platforms such as many current ARM applications (most likely 32bit for now). Python's memory consumption is an oft-heard complaint for folks developing for those platforms.

Cheers,
-Barry
Indeed, I had to change the dict tuning parameters to make dicts behave reasonably on the PS3. Interesting stuff indeed.
-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Barry Warsaw
Sent: 10 December 2012 15:28
To: python-dev@python.org
Subject: Re: [Python-Dev] More compact dictionaries with faster iteration
I'd be interested to see what effect this has on memory constrained platforms such as many current ARM applications (most likely 32bit for now). Python's memory consumption is an oft-heard complaint for folks developing for those platforms.
Cheers, -Barry
On Mon, 10 Dec 2012 16:06:23 +0000, kristjan@ccpgames.com wrote:
Indeed, I had to change the dict tuning parameters to make dicts behave reasonably on the PS3.
Interesting stuff indeed.
Out of both curiosity and a possible application of my own for the information, what values did you end up with? --David
On Mon, Dec 10, 2012 at 10:28 AM, Barry Warsaw <barry@python.org> wrote:
On Dec 10, 2012, at 08:48 AM, Christian Heimes wrote:
It's hard to predict how the extra CPU instructions are going to affect the performance on different CPUs and scenarios. My guts tell me that your proposal is going to perform better on modern CPUs with lots of L1 and L2 cache and in an average case scenario. A worst case scenario with lots of collisions might be measurably slower for large dicts and a small CPU cache.
I'd be interested to see what effect this has on memory constrained platforms such as many current ARM applications (most likely 32bit for now). Python's memory consumption is an oft-heard complaint for folks developing for those platforms.
Maybe Kristjan can shed some light on how this would have helped him on the ps3 work he has been doing for Dust 514 as he has had memory issues there.
On 10.12.2012 16:28, Barry Warsaw wrote:
I'd be interested to see what effect this has on memory constrained platforms such as many current ARM applications (most likely 32bit for now). Python's memory consumption is an oft-heard complaint for folks developing for those platforms.
I had ARM platforms in mind, too. Unfortunately we don't have any ARM platforms for testing and performance measurement. Even Trent's Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi, that's all. Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7 Cortex-A9) and similar. Christian
On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:
I had ARM platforms in mind, too. Unfortunately we don't have any ARM platforms for testing and performance measurement. Even Trent's Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi, that's all.
I have a few ARM machines that I can use for testing, though I can't provide external access to them. * http://buildbot.python.org/all/buildslaves/warsaw-ubuntu-arm (Which oops, I see is down. Why did I not get notifications about that?) * I have a Nexus 7 flashed with Ubuntu 12.10 (soon to be 13.04). * Pandaboard still sitting in its box. ;)
Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7 Cortex-A9) and similar.
Suitable ARM boards can be had cheap, probably overwhelmed by labor costs of getting them up and running. I am not offering for *that*. ;) Cheers, -Barry
On Mon, 10 Dec 2012 11:53:17 -0500 Barry Warsaw <barry@python.org> wrote:
On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:
I had ARM platforms in mind, too. Unfortunately we don't have any ARM platforms for testing and performance measurement. Even Trent's Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi, that's all.
I have a few ARM machines that I can use for testing, though I can't provide external access to them.
* http://buildbot.python.org/all/buildslaves/warsaw-ubuntu-arm (Which oops, I see is down. Why did I not get notifications about that?)
Because buildbot.python.org is waiting for someone (mail.python.org, perhaps) to accept its SMTP requests. Feel free to ping the necessary people ;-) Regards Antoine.
On Mon, Dec 10, 2012 at 08:53:17AM -0800, Barry Warsaw wrote:
On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:
I had ARM platforms in mind, too. Unfortunately we don't have any ARM platforms for testing and performance measurement. Even Trent's Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi, that's all.
Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7 Cortex-A9) and similar.
Suitable ARM boards can be had cheap, probably overwhelmed by labor costs of getting them up and running. I am not offering for *that*. ;)
If someone donates the hardware, I'll take care of everything else. Trent.
On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian@python.org> wrote:
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup functions based on whether the keys are all strings. The choice of lookup function is stored in a function pointer. That lets each lookup use the currently active lookup function without having to make any computations or branches.

Likewise, the lookup functions could be swapped between char, short, long, and ulong index sizes during the resize step. IOW, the selection only needs to be made once per resize, not once per lookup.

Raymond
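A sketch of that dispatch, transposed to Python with assumed names (in C, the choice would live in the dict's existing lookup function pointer; the specialized lookups are left as stubs here):

    from array import array

    # Specialized lookup stubs; in C these would be separate functions
    # compiled for each index width.
    def lookup_8(table, key, hashvalue): ...
    def lookup_16(table, key, hashvalue): ...
    def lookup_ssize(table, key, hashvalue): ...

    _LOOKUPS = {'b': lookup_8, 'h': lookup_16, 'q': lookup_ssize}

    def resize(table, new_size):
        typecode = 'b' if new_size < 128 else 'h' if new_size < 32768 else 'q'
        table.indices = array(typecode, [-1] * new_size)
        # Rebind once per resize, not once per lookup, so the hot path
        # never branches on the index width.
        table.lookup = _LOOKUPS[typecode]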
On Mon, 10 Dec 2012 18:17:57 -0500, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian@python.org> wrote:
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup functions based on whether the keys are all strings. The choice of lookup function is stored in a function pointer. That lets each lookup use the currently active lookup function without having to make any computations or branches.
An indirect function call is technically a branch, as seen from the CPU (and not necessarily a very predictable one, although recent Intel CPUs are said to be quite good at that). Regards Antoine.
On Dec 11, 2012, at 1:13 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian@python.org> wrote:
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup functions based on whether the keys are all strings. The choice of lookup function is stored in a function pointer. That lets each lookup use the currently active lookup function without having to make any computations or branches.
An indirect function call is technically a branch, as seen from the CPU (and not necessarily a very predictable one, although recent Intel CPUs are said to be quite good at that).
FWIW, we already have an indirection to the lookup function. I would piggyback on that, so no new indirections are required.

My plan now is to apply the space compaction idea to sets. That code is less complex than dicts, and set operations stand to benefit the most from improved iteration speed.

The steps would be:

* Create a good set of benchmarks for set operations for both size and speed.

* Start with the simplest version of the idea: separate the entries table from the hash table. Keep the hash table at Py_ssize_t, and pre-allocate the entry array to two-thirds the size of the hash table. This should give about a 25% space savings and speed up iteration for all the set-to-set operations.

* If that all works out, I want to trim the entry table for frozensets so that the entry table has no over-allocations. This should give a much larger space savings.

* Next, I want to experiment with the idea of using char/short/long sizes for the hash table. Since there is already an existing lookup indirection, this can be done with no additional overhead. Small sets will get the most benefit from the space savings, and the cache performance for hash lookups should improve nicely (a hash table of size 32 could fit in a single cache line).

At each step, I'll run the benchmarks to make sure the expected speed and space benefits are being achieved.

As a side-effect, sets without deletions will retain their insertion order. If this is of concern, it would be no problem to implement Antoine's idea of scrambling the entries during iteration.

Raymond

P.S. I've gotten a lot of suggestions for improvements to the proof-of-concept code. Thank you for that. The latest version is at: http://code.activestate.com/recipes/578375/ In that code, entries are stored in regular Python lists and inherit their over-allocation characteristics (about 12.5% over-allocated for large lists). There are many other possible allocation strategies with their own respective speed/space trade-offs.
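A minimal sketch of the kind of size/speed benchmark the first step calls for (the workloads, sizes, and names here are illustrative only):

    import sys
    import timeit

    def size_benchmark(sizes=(0, 1, 5, 100, 10000)):
        # Track container overhead across representative sizes.
        for n in sizes:
            s = set(range(n))
            print("n=%-6d getsizeof=%d bytes" % (n, sys.getsizeof(s)))

    def speed_benchmark(n=10000):
        # Time the set-to-set operations that iterate over entries.
        setup = "s = set(range(%d)); t = set(range(%d, %d))" % (n, n // 2, n + n // 2)
        for stmt in ("s & t", "s | t", "s - t", "list(s)"):
            secs = timeit.timeit(stmt, setup=setup, number=1000)
            print("%-8s %.4f s per 1000 runs" % (stmt, secs))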
Good plan!

On Tue, Dec 18, 2012 at 11:35 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Dec 11, 2012, at 1:13 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Dec 10, 2012, at 2:48 AM, Christian Heimes <christian@python.org> wrote:
On the other hand, every lookup and collision check needs an additional multiplication, addition, and pointer dereference:
entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
Currently, the dict implementation allows alternative lookup functions based on whether the keys are all strings. The choice of lookup function is stored in a function pointer. That lets each lookup use the currently active lookup function without having to make any computations or branches.
An indirect function call is technically a branch, as seen from the CPU (and not necessarily a very predictable one, although recent Intel CPUs are said to be quite good at that).
FWIW, we already have an indirection to the lookup function. I would piggyback on that, so no new indirections are required.
My plan now is to apply the space compaction idea to sets. That code is less complex than dicts, and set operations stand to benefit the most from improved iteration speed.
The steps would be:
* Create a good set of benchmarks for set operations for both size and speed.
* Start with the simplest version of the idea: separate the entries table from the hash table. Keep the hash table at Py_ssize_t, and pre-allocate the entry array to two-thirds the size of the hash table. This should give about a 25% space savings and speed up iteration for all the set-to-set operations.
* If that all works out, I want to trim the entry table for frozensets so that the entry table has no over-allocations. This should give a much larger space savings.
* Next, I want to experiment with the idea of using char/short/long sizes for the hash table. Since there is already an existing lookup indirection, this can be done with no additional overhead. Small sets will get the most benefit from the space savings, and the cache performance for hash lookups should improve nicely (a hash table of size 32 could fit in a single cache line).
At each step, I'll run the benchmarks to make sure the expected speed and space benefits are being achieved.
As a side-effect, sets without deletions will retain their insertion order. If this is of concern, it would be no problem to implement Antoine's idea of scrambling the entries during iteration.
Raymond
P.S. I've gotten a lot of suggestions for improvements to the proof-of-concept code. Thank you for that. The latest version is at: http://code.activestate.com/recipes/578375/ In that code, entries are stored in regular Python lists and inherit their over-allocation characteristics (about 12.5% overallocated for large lists). There are many other possible allocation strategies with their own respective speed/space trade-offs.
-- Thanks, Andrew Svetlov
Hello everyone.

Thanks Raymond for writing down a pure python version ;-)

I did an initial port to RPython for experiments. The results (on large dicts only) are inconclusive - it's either a bit faster or a bit slower, depending what exactly you do. There is a possibility I messed something up too (there is a branch rdict-experiments in PyPy, in a very sorry state though).

One effect we did not think about is that besides the extra indirection, there is an issue with data locality - having to look in two different large lists seems to be a hit. Again, while I tried, the approach is not scientific at all, but unless someone points out a clear flaw in the code (it's in pypy/rlib/dict.py in the rdict-experiments branch), I'm probably abandoning this for now.

Cheers,
fijal
On Thu, 3 Jan 2013 12:22:27 +0200 Maciej Fijalkowski <fijall@gmail.com> wrote:
Hello everyone.
Thanks Raymond for writing down a pure python version ;-)
I did an initial port to RPython for experiments. The results (on large dicts only) are inconclusive - it's either a bit faster or a bit slower, depending what exactly you do. There is a possibility I messed something up too (there is a branch rdict-experiments in PyPy, in a very sorry state though).
But what about the memory consumption? This seems to be the main point of Raymond's proposal. Regards Antoine.
On Sat, Jan 5, 2013 at 1:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 3 Jan 2013 12:22:27 +0200 Maciej Fijalkowski <fijall@gmail.com> wrote:
Hello everyone.
Thanks Raymond for writing down a pure python version ;-)
I did an initial port to RPython for experiments. The results (on large dicts only) are inconclusive - it's either a bit faster or a bit slower, depending what exactly you do. There is a possibility I messed something up too (there is a branch rdict-experiments in PyPy, in a very sorry state though).
But what about the memory consumption? This seems to be the main point of Raymond's proposal.
Er. The memory consumption can be measured with pen and paper; you don't actually need to see it, right?

After a lot more experimentation I came up with something that behaves better for large dictionaries. This was for a long time a weak point of PyPy, because of some GC details. So I guess I'll try to implement it fully and see how it goes. Stay tuned, I'll keep you posted.

PS. PyPy does not have lots of small dictionaries because of maps (so you don't have a dict per instance), hence their performance is not at all that interesting for PyPy.

Cheers,
fijal
The memory part is also why I am interested in this approach.

Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it ought to be worth investigating if we cannot allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter.

K

________________________________________
From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com@python.org] on behalf of Maciej Fijalkowski [fijall@gmail.com]
Sent: 5 January 2013 21:03
To: Antoine Pitrou
Cc: <python-dev@python.org>
Subject: Re: [Python-Dev] More compact dictionaries with faster iteration

On Sat, Jan 5, 2013 at 1:34 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 3 Jan 2013 12:22:27 +0200 Maciej Fijalkowski <fijall@gmail.com> wrote:
Hello everyone.
Thanks Raymond for writing down a pure python version ;-)
I did an initial port to RPython for experiments. The results (on large dicts only) are inconclusive - it's either a bit faster or a bit slower, depending what exactly you do. There is a possibility I messed something up too (there is a branch rdict-experiments in PyPy, in a very sorry state though).
But what about the memory consumption? This seems to be the main point of Raymond's proposal.
Er. The memory consumption can be measured with pen and paper; you don't actually need to see it, right?

After a lot more experimentation I came up with something that behaves better for large dictionaries. This was for a long time a weak point of PyPy, because of some GC details. So I guess I'll try to implement it fully and see how it goes. Stay tuned, I'll keep you posted.

PS. PyPy does not have lots of small dictionaries because of maps (so you don't have a dict per instance), hence their performance is not at all that interesting for PyPy.

Cheers,
fijal
On Jan 6, 2013, at 1:40 PM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
The memory part is also why I am interested in this approach. Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%.
Dicts resize when the table is over two-thirds full. Small dicts contain zero to five keys.
Since the small table is small by definition, I think it ought to be worth investigating if we cannot allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter.
I looked at this a few years ago and found that it hurt performance considerably. Uncle Timmy (and others) had done a darned fine job of tuning dictionaries. Raymond
On Sun, Jan 6, 2013 at 10:40 PM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter.
Something I was thinking about is whether, for small tables, it makes sense to actually do the hashing... a small table could be kept with just key/value pairs (also saving the hash space), with a linear scan first for identity and then a second scan for equality. Given the level of optimization of dicts, however, I'm 99% sure this was already tried before.
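A toy version of that idea, purely for illustration (not how CPython stores small dicts):

    class TinyDict:
        """A handful of keys held as parallel lists: no hashing,
        just two linear scans (identity first, then equality)."""

        def __init__(self):
            self._keys = []
            self._values = []

        def __getitem__(self, key):
            for i, k in enumerate(self._keys):
                if k is key:            # identity scan: no __eq__ calls
                    return self._values[i]
            for i, k in enumerate(self._keys):
                if k == key:            # equality scan
                    return self._values[i]
            raise KeyError(key)

        def __setitem__(self, key, value):
            for i, k in enumerate(self._keys):
                if k is key or k == key:
                    self._values[i] = value
                    return
            self._keys.append(key)
            self._values.append(value)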
On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
The memory part is also why I am interested in this approach. Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it ought to be worth investigating if we cannot allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter. K
If you're very interested in the memory footprint you should do what PyPy does. It gives you no speed benefits without the JIT, but it makes all the instances behave as if they had slots. The trick is to keep a dictionary mapping name -> index into a list on the class (you go through a hierarchy of such dicts as you add or remove attributes) and a list of attribute values on the instance.

We call it maps, V8 calls it hidden classes; it's well documented on the PyPy blog.

Cheers,
fijal
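A toy rendering of the maps technique with invented names (see the PyPy blog posts for the real design):

    class Map:
        def __init__(self):
            self.indexes = {}        # attribute name -> slot in storage
            self.transitions = {}    # attribute name -> successor Map

        def add_attribute(self, name):
            # Adding the same attribute to the same map always yields
            # the same successor, so instances with the same attribute
            # history share one Map.
            if name not in self.transitions:
                succ = Map()
                succ.indexes = dict(self.indexes)
                succ.indexes[name] = len(self.indexes)
                self.transitions[name] = succ
            return self.transitions[name]

    class Instance:
        def __init__(self, map):
            self.map = map
            self.storage = []        # just the attribute values

        def setattr(self, name, value):
            idx = self.map.indexes.get(name)
            if idx is None:
                self.map = self.map.add_attribute(name)
                self.storage.append(value)
            else:
                self.storage[idx] = value

        def getattr(self, name):
            return self.storage[self.map.indexes[name]]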
On 07/01/13 10:55, Maciej Fijalkowski wrote:
On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
The memory part is also why I am interested in this approach. Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it ought to be worth investigating if we cannot allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter. K
In 3.3 dicts no longer have a small table.
If you're very interested in the memory footprint you should do what PyPy does. It gives you no speed benefits without the JIT, but it makes all the instances behave as if they had slots. The trick is to keep a dictionary mapping name -> index into a list on the class (you go through a hierarchy of such dicts as you add or remove attributes) and a list of attribute values on the instance.
We call it maps, V8 calls it hidden classes, it's well documented on the pypy blog.
They are called shared keys in CPython 3.3 :)
Cheers,
fijal
On Mon, Jan 7, 2013 at 1:08 PM, Mark Shannon <mark@hotpy.org> wrote:
On 07/01/13 10:55, Maciej Fijalkowski wrote:
On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
The memory part is also why I am interested in this approach. Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it ought to be worth investigating if we cannot allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with a larger number of entries that the O(1) property starts to matter. K
In 3.3 dicts no longer have a small table.
If you're very interested in the memory footprint you should do what PyPy does. It gives you no speed benefits without the JIT, but it makes all the instances behave as if they had slots. The trick is to keep a dictionary mapping name -> index into a list on the class (you go through a hierarchy of such dicts as you add or remove attributes) and a list of attribute values on the instance.
We call it maps, V8 calls it hidden classes, it's well documented on the pypy blog.
They are called shared keys in CPython 3.3 :)
shared keys don't go to the same lengths, do they?
Cheers,
fijal
On Jan 3, 2013, at 2:22 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Hello everyone.
Thanks Raymond for writing down a pure python version ;-)
Thanks for running with it.
I did an initial port to RPython for experiments. The results (on large dicts only) are inconclusive - it's either a bit faster or a bit slower, depending what exactly you do. There is a possibility I messed something up too (there is a branch rdict-experiments in PyPy, in a very sorry state though).
One effect we did not think about is that besides the extra indirection, there is an issue with data locality - having to look in two different large lists seems to be a hit.
In pure python, I didn't see a way to bring the hash/key/value entries side-by-side as they currently are in CPython. How does PyPy currently handle this issue? Is there a change I could make to the recipe that would restore data locality? Raymond
On 10/12/12 01:44, Raymond Hettinger wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
What minimum size and resizing factor do you propose for the entries array? Cheers, Mark.
On Dec 10, 2012, at 4:06 AM, Mark Shannon <mark@hotpy.org> wrote:
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
What minimum size and resizing factor do you propose for the entries array?
There are many ways to do this. I don't know which is best. The proof-of-concept code uses the existing list resizing logic. Another approach is to pre-allocate the two-thirds maximum (this is simple and fast but gives the smallest space savings). A hybrid approach is to allocate in two steps (1/3 and then 2/3 if needed). Since hash tables aren't a new problem, there may already be published research on the best way to handle the entries array.

Raymond
On 10/12/12 23:39, Raymond Hettinger wrote:
On Dec 10, 2012, at 4:06 AM, Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
What minimum size and resizing factor do you propose for the entries array?
There are many ways to do this. I don't know which is best. The proof-of-concept code uses the existing list resizing logic. Another approach is to pre-allocate the two-thirds maximum
What do you mean by maximum?
(This is simple and fast but gives the smallest space savings.) A hybrid approach is to allocate in two steps (1/3 and then 2/3 if needed).
I think you need to do some more analysis on this. It is possible that there is some improvement to be had from your approach, but I don't think the improvements will be as large as you have claimed. I suspect that you may be merely trading performance for reduced memory use, which can be done much more easily by reducing the minimum size and increasing the load factor. Cheers, Mark.
On Dec 10, 2012, at 7:04 PM, Mark Shannon <mark@hotpy.org> wrote:
Another approach is to pre-allocate the two-thirds maximum (This is simple and fast but gives the smallest space savings.)
What do you mean by maximum?
A dict with an index table size of 8 gets resized when it is two-thirds full, so the maximum number of entries is 5. If you pre-allocate five entries for the initial dict, you've spent 5 * 24 bytes + 8 bytes for the indices, for a total of 128 bytes. This compares to the current table of 8 * 24 bytes, totaling 192 bytes.

Many other strategies are possible. The proof-of-concept code uses the one employed by regular python lists. Their growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, .... This produces nice memory savings for entry lists.

If you have a suggested allocation pattern or other constructive suggestion, it would be welcome. Further sniping and unsubstantiated FUD would not.

Raymond
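For reference, a sketch of the over-allocation rule behind that growth pattern, mirroring CPython's list resize formula (an implementation detail that may change):

    def grown_allocation(n):
        # CPython's list resize rule: about n + n/8, plus a small
        # constant that favors small lists.
        return n + (n >> 3) + (3 if n < 9 else 6)

    sizes, allocated = [], 0
    for n in range(1, 89):
        if n > allocated:
            allocated = grown_allocation(n)
            sizes.append(allocated)
    # sizes == [4, 8, 16, 25, 35, 46, 58, 72, 88]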
On 11/12/12 03:45, Raymond Hettinger wrote:
On Dec 10, 2012, at 7:04 PM, Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
Another approach is to pre-allocate the two-thirds maximum (This is simple and fast but gives the smallest space savings.)
What do you mean by maximum?
A dict with an index table size of 8 gets resized when it is two-thirds full, so the maximum number of entries is 5. If you pre-allocate five entries for the initial dict, you've spent 5 * 24 bytes + 8 bytes for the indices for a total of 128 bytes. This compares to the current table of 8 * 24 bytes totaling 192 bytes.
Many other strategies are possible. The proof-of-concept code uses the one employed by regular python lists. Their growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, .... This produces nice memory savings for entry lists.
If you have a suggested allocation pattern or other constructive suggestion, it would be welcome.
It seems like a reasonable starting point. Trying to avoid resizing the index array and the entries array at the same time is probably a good idea.
Further sniping and unsubstantiated FUD would not.
Is asking that you back up your claims with some analysis unreasonable? When you make claims such as

"""
The memory savings are significant (from 30% to 95% compression depending on how full the table is). Small dicts (size 0, 1, or 2) get the most benefit.
"""

is it a surprise that I am sceptical?

I like your idea. I just don't want everyone getting their hopes up for what may turn out to be a fairly minor improvement. Don't forget Unladen Swallow :)

Cheers,
Mark.
On Tue, 11 Dec 2012 08:41:32 +0000, Mark Shannon <mark@hotpy.org> wrote:
If you have a suggested allocation pattern or other constructive suggestion, it would be welcome.
It seems like a reasonable starting point. Trying to avoid resizing the index array and the entries array at the same time is probably a good idea.
Why would you want to avoid that? If we want to allocate the dict's data as a single memory block (which saves a bit in memory consumption and also makes dict allocations faster), we need to resize both arrays at the same time. Regards Antoine.
Hi Raymond,

On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/ A bientôt, Armin.
Hi!

2012/12/10 Armin Rigo <arigo@tunes.org>:
Hi Raymond,
On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/
I'm not sure about "relying", because currently Python supports this feature only with the OrderedDict object. So it's common for Python developers not to rely on insertion ordering when using a generic dict.
A bientôt,
Armin.
--
Kind regards,
Alexey S. Kachayev, CTO at Kitapps Inc.
----------
http://codemehanika.org
Skype: kachayev
Tel: +380-996692092
On Mon, 10 Dec 2012 10:40:30 +0100, Armin Rigo <arigo@tunes.org> wrote:
Hi Raymond,
On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/
If that's really an issue, we can deliberately scramble the iteration order a bit :-) (of course it might negatively impact HW prefetching) Regards Antoine.
On Mon, Dec 10, 2012 at 10:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 10 Dec 2012 10:40:30 +0100, Armin Rigo <arigo@tunes.org> wrote:
Hi Raymond,
On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/
If that's really an issue, we can deliberately scramble the iteration order a bit :-) (of course it might negatively impact HW prefetching)
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
On Mon, 10 Dec 2012 11:16:49 -0500, PJ Eby <pje@telecommunity.com> wrote:
On Mon, Dec 10, 2012 at 10:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 10 Dec 2012 10:40:30 +0100, Armin Rigo <arigo@tunes.org> wrote:
Hi Raymond,
On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/
If that's really an issue, we can deliberately scramble the iteration order a bit :-) (of course it might negatively impact HW prefetching)
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
I suspect that's what Raymond has in mind :-) Regards Antoine.
Hi Philip,

On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <pje@telecommunity.com> wrote:
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys). I'd vaguely argue that dictionary orders are one of the few non-reproducible factors in a Python program, so it might be a good thing. But only vaguely --- maybe I got far too used to random orders over time... A bientôt, Armin.
On Mon, Dec 10, 2012 at 1:01 PM, Armin Rigo <arigo@tunes.org> wrote:
On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <pje@telecommunity.com> wrote:
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys).
What about IronPython?

Also, note that using ordered dictionaries carries a performance cost for dictionaries whose keys change a lot. This probably wouldn't affect most dictionaries in most programs, because module and object dictionaries generally don't delete and re-add a lot of keys very often. But in cases where a dictionary is used as a queue or stack or something of that sort, the packing costs could add up. Under the current scheme, as long as collisions were minimal, the contents wouldn't be repacked very often.

Without numbers it's hard to say for certain, but the advantage to keeping ordered dictionaries a distinct type is that the standard dictionary type can then get that extra bit of speed in exchange for dropping the ordering requirement.
On 12/10/2012 1:38 PM, PJ Eby wrote:
On Mon, Dec 10, 2012 at 1:01 PM, Armin Rigo <arigo@tunes.org> wrote:
On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <pje@telecommunity.com> wrote:
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys).
What about IronPython?
Also, note that using ordered dictionaries carries a performance cost for dictionaries whose keys change a lot. This probably wouldn't affect most dictionaries in most programs, because module and object dictionaries generally don't delete and re-add a lot of keys very often. But in cases where a dictionary is used as a queue or stack or something of that sort, the packing costs could add up. Under the current scheme, as long as collisions were minimal, the contents wouldn't be repacked very often.
Without numbers it's hard to say for certain, but the advantage to keeping ordered dictionaries a distinct type is that the standard dictionary type can then get that extra bit of speed in exchange for dropping the ordering requirement.
I think that there should be a separate OrderedDict class (or subclass) and that the dict docs should warn (if not now) that while iterating a dict *may*, in some circumstances, give items in insertion *or* sort order, it *will not* in all cases. Example of the latter:
d = {8:0, 9:0, 15:0, 13:0, 14:0}
for k in d: print(k)

8
9
13
14
15

If one entered the keys in sorted order, as might be natural, one might mistakenly think that insertion order was being reproduced.

-- Terry Jan Reedy
On Dec 10, 2012, at 1:38 PM, PJ Eby <pje@telecommunity.com> wrote:
Without numbers it's hard to say for certain, but the advantage to keeping ordered dictionaries a distinct type is that the standard dictionary type can then get that extra bit of speed in exchange for dropping the ordering requirement.
I expect that dicts and OrderedDicts will remain separate for reasons of speed, space, and respect for people's mental models. Raymond
On 11 December 2012 05:01, Armin Rigo <arigo@tunes.org> wrote:
Hi Philip,
On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <pje@telecommunity.com> wrote:
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys). I'd vaguely argue that dictionary orders are one of the few non-reproducible factors in a Python program, so it might be a good thing. But only vaguely --- maybe I got far too used to random orders over time...
Whilst I think Python should not move to ordered dictionaries everywhere, I would say there is an argument (no pun intended) for making **kwargs a dictionary that maintains insertion order *if there are no deletions*. It sounds like we could get that for free with this implementation, although from another post IronPython might not have something suitable.

I think there are real advantages to doing so - a trivial one being the ability to easily initialise an ordered dictionary from another ordered dictionary.

I could also see an argument for having this property for all dicts. There are many dictionaries that are never deleted from (probably most dict literals) and it would be nice to have an iteration order for them that matched the source code. However if deletions occur all bets would be off. If you need to maintain insertion order in the face of deletions, use an explicit ordereddict.

Tim Delaney
On Mon, Dec 10, 2012 at 11:29 PM, Tim Delaney <tim.delaney@aptare.com> wrote:
On 11 December 2012 05:01, Armin Rigo <arigo@tunes.org> wrote:
Hi Philip,
On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <pje@telecommunity.com> wrote:
On the other hand, this would also make a fast ordered dictionary subclass possible, just by not using the free list for additions, combined with periodic compaction before adds or after deletes.
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys). I'd vaguely argue that dictionary orders are one of the few non-reproducible factors in a Python program, so it might be a good thing. But only vaguely --- maybe I got far too used to random orders over time...
Whilst I think Python should not move to ordered dictionaries everywhere, I would say there is an argument (no pun intended) for making **kwargs a dictionary that maintains insertion order *if there are no deletions*. It sounds like we could get that for free with this implementation, although from another post IronPython might not have something suitable.
Please, no. dict and kwargs should be unordered without any assumptions.
I think there are real advantages to doing so - a trivial one being the ability to easily initialise an ordered dictionary from another ordered dictionary.
It can be done by adding a short-circuit for the OrderedDict class to accept another OrderedDict instance.
I could also see an argument for having this property for all dicts. There are many dictionaries that are never deleted from (probably most dict literals) and it would be nice to have an iteration order for them that matched the source code.
However if deletions occur all bets would be off. If you need to maintain insertion order in the face of deletions, use an explicit ordereddict.
Tim Delaney
-- Thanks, Andrew Svetlov
On Mon, Dec 10, 2012 at 4:29 PM, Tim Delaney <tim.delaney@aptare.com> wrote:
Whilst I think Python should not move to ordered dictionaries everywhere, I would say there is an argument (no pun intended) for making **kwargs a dictionary that maintains insertion order *if there are no deletions*.
Oooh. Me likey. There have been many times where I've wished kwargs were ordered when designing an API. (Oddly, I don't remember any one of the APIs specifically, so don't ask me for a good example. I just remember a bunch of different physical locations where I was when I thought, "Ooh, what if I could... no, that's not going to work.")

One other useful place for ordered dictionaries is class definitions processed by class decorators: no need to write a metaclass just to know what order stuff was defined in.
It sounds like we could get that for free with this implementation, although from another post IronPython might not have something suitable.
Actually, IronPython may already have ordered dictionaries by default; see: http://mail.python.org/pipermail/ironpython-users/2006-May/002319.html It's described as an implementation detail that may change, perhaps that could be changed to being unchanging. ;-)
I think there are real advantages to doing so - a trivial one being the ability to easily initialise an ordered dictionary from another ordered dictionary.
Or to merge two of them together, either at creation or via .update(). I'm really starting to wonder if it might not be worth paying the compaction overhead to just make all dictionaries ordered, all the time.

The problem is that if you are always adding new keys and deleting old ones (as might be done in an LRU cache, a queue, or other things like that) you'll probably see a lot of compaction overhead compared to today's dicts. OTOH... with a good algorithm for deciding *when* to compact, we can actually make the amortized cost O(1) or so, so maybe that's not a big deal.

The cost to do a compaction is, at worst, the current size of the table. So if you wait until a table has twice as many entries (more precisely, until the index of the last entry is twice what it was at the last compaction), you will amortize the compaction cost down to one entry move per add, or O(1). That would handle the case of a cache or queue, but I'm not sure how it would work with supersized dictionaries that are then deleted down to a fraction of their original size. I suppose if you delete your way down to half the entries being populated, then you end up with two moves per delete, or O(2). (Yes, I know that's not a valid O number.)

So, offhand, it seems pretty doable, and unlikely to significantly change worst-case performance even for pathological use cases. (Like using a dict when you should be using a deque.)
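A minimal pure-Python sketch of that amortized policy (illustrative only: the class, its names, and the use of a plain dict for the sparse part are assumptions made for brevity, not the proposed CPython code):

class CompactingEntries:
    """Tombstone deletions; repack only once the dense entries list
    has doubled since the last compaction, so moves amortize to O(1)."""

    def __init__(self):
        self.entries = []      # dense list of (key, value) or None
        self.index = {}        # stand-in for the sparse index part
        self.last_len = 1      # entries length at the last compaction

    def __setitem__(self, key, value):
        if key in self.index:
            self.entries[self.index[key]] = (key, value)
            return
        self.index[key] = len(self.entries)
        self.entries.append((key, value))
        if len(self.entries) >= 2 * self.last_len:
            self._compact()    # amortized: ~one entry move per add

    def __delitem__(self, key):
        self.entries[self.index.pop(key)] = None   # leave a tombstone

    def _compact(self):
        self.entries = [e for e in self.entries if e is not None]
        self.index = {k: i for i, (k, _) in enumerate(self.entries)}
        self.last_len = max(len(self.entries), 1)

    def __iter__(self):        # insertion order, skipping tombstones
        return (k for k, _ in filter(None, self.entries))

Even under steady add/delete churn, each entry is moved at most once per doubling of the entries list, which is the amortized O(1) argument above.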
PJ wrote:
Actually, IronPython may already have ordered dictionaries by default; see:
http://mail.python.org/pipermail/ironpython-users/2006-May/002319.html
It's described as an implementation detail that may change, perhaps that could be changed to being unchanging. ;-)
I think this has changed since 2006. IronPython was originally using the .NET dictionary class and just locking while using it, but it now has a custom dictionary which is thread safe for multiple readers and allows 1 writer. But it doesn't do anything to preserve order of insertions. OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov@microsoft.com> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
Which reminds me, I was going to note that one of the main gains with ordered keyword arguments is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another. Initialising an ordered dict itself is one obvious use case, but anything else where you want to control the iteration order *and* set field names and values in a single call will potentially benefit.

Independently of that, I'll note that this change would make it possible to add a .sort() method to dictionaries. Any subsequent mutation of the dictionary would require re-sorting, though (which isn't really all that different from maintaining a sorted list).

The performance impact definitely needs to be benchmarked, though, as the need to read two memory locations rather than one for a dictionary read could have weird caching effects. (Fortunately, many of the benchmarks run on Python 3.3 now, so it should be possible to get that data fairly easily.)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
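A sketch of the contrast Nick describes (hypothetical: make_record assumes **kwargs preserves insertion order, which is exactly the behaviour under discussion here, not the behaviour of Python at the time):

from collections import OrderedDict, namedtuple

# Today: iteration order and values are fixed in two separate steps.
Point = namedtuple('Point', ['x', 'y'])   # step 1: declare the order
p = Point(x=2, y=3)                       # step 2: supply the values

# With insertion-ordered **kwargs, a single call could do both:
def make_record(**kwargs):
    return OrderedDict(kwargs)            # order would match the call site

r = make_record(x=2, y=3)                 # would yield keys ['x', 'y'] only
print(list(r))                            # if kwargs preserved call order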
On 12/12/2012 01:15 AM, Nick Coghlan wrote:
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov@microsoft.com <mailto:dinov@microsoft.com>> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
Which reminds me, I was going to note that one of the main gains with ordered keyword arguments, is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another.
So here's a brand new argument against ordered dicts: the existence of perfect hashing schemes. They fundamentally conflict with ordered dicts.

I played with using them for vtable dispatches in Cython this summer, and they can perform really, really well for branch-predicted lookups in hot loops, because you always/nearly always eliminate linear probing, so there are no branch misses or extra comparisons. (The overhead of a perfect hash table lookup over a traditional vtable lookup was only a couple of cycles in my highly artificial, fully branch-predicted micro-benchmark.)

There's some overhead in setup; IIRC, ~20 microseconds for 64 elements on a 2 GHz CPU, though that was a first prototype implementation and both algorithmic improvements and tuning should be possible. So it's not useful for everything, but perhaps for things like module dictionaries and classes an "optionally perfect" dict can make sense.

Note: I'm NOT suggesting the use of perfect hashing, just making sure its existence is mentioned, and that one is aware that if always-ordered dicts become the language standard it precludes this option far off in the future. (Something like a sort() method could still work and make the dict "unperfect"; one could also have a pack() method that made the dict perfect again.)

That concludes the on-topic parts of my post.

-- Dag Sverre Seljebotn

APPENDIX: Going off-topic for those who are interested, here are the long-winded and ugly details.

My code [1] is based on the paper [2] (pseudo-code in Appendix A), but I adapted it a bit to be useful for tens/hundreds of elements rather than billions. The ingredients:

1) You need the hash to be 32 bits (or 64) of good entropy (md5 or murmurhash or similar). (Yes, that's a tall order for CPython; I'm just describing the scheme.) (If the hash collides on all bits you *will* collide, so some fallback is still necessary, just unlikely.)

2) To look up, the idea is (pseudo-code!):

typedef struct {
    int m_f, m_g, r, k;
    int16_t d[k];  /* "small" int, like current proposal */
} table_header_t;

and then one computes the index of an element with hash "h" using the function

    ((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]

rather than the usual "h % n". While this is more arithmetic, arithmetic is cheap and branch misses are not.

3) To set up/repack a table one needs to find the parameters. The general idea is:

a) Partition the hashes into k bins by using "h & m_g". There will be collisions, but the number of bins with many collisions will be very small; most bins will have 2 or 1 or 0 elements.

b) Starting with the largest bin, distribute the elements according to the hash function. If a bin collides with the existing contents, try another value for d[binindex] until it doesn't. The r parameter lets you try again 32 (or 64) times to find a solution.

In my test cases there was a ~0.1% chance of not finding a solution (that is, exhausting possible choices of r) with 64-bit hashes with 4 or 8 elements and no empty table elements. For any other number of elements, or with some empty elements, the chance of failure was much lower.

[1] It's not exactly a great demo, but it contains the algorithm. If there's much interest I should clean it up and make a proper benchmark demo out of it: https://github.com/dagss/pyextensibletype/blob/perfecthash/include/perfectha...

[2] Pagh (1999) http://www.brics.dk/RS/99/13/BRICS-RS-99-13.ps.gz
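For readers who want to experiment, here is a rough pure-Python sketch of the setup described in a) and b) above (the function names, the displacement search range, and the retry limit are all illustrative choices, not Dag's actual code; it assumes distinct hashes with good entropy in all bits):

def build_displacement_table(hashes, m_f, m_g, tries=64):
    # m_f and m_g are bit masks: table size t = m_f + 1, bins k = m_g + 1.
    t, k = m_f + 1, m_g + 1
    for r in range(tries):                 # the retry parameter "r"
        bins = [[] for _ in range(k)]
        for h in hashes:
            bins[h & m_g].append(h)        # a) partition into k bins
        d = [0] * k
        table = [None] * t
        # b) place the largest bins first; small bins fit more easily
        for b in sorted(range(k), key=lambda i: -len(bins[i])):
            for disp in range(t):          # try displacement values
                slots = [((h >> r) & m_f) ^ disp for h in bins[b]]
                if (len(set(slots)) == len(slots)
                        and all(table[s] is None for s in slots)):
                    for h, s in zip(bins[b], slots):
                        table[s] = h
                    d[b] = disp
                    break
            else:
                break                      # this r failed; try the next
        else:
            return r, d, table             # success: a perfect table
    return None                            # the rare failure case

def slot_of(h, r, d, m_f, m_g):
    # mirrors ((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]
    return ((h >> r) & m_f) ^ d[h & m_g]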
On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 12/12/2012 01:15 AM, Nick Coghlan wrote:
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov@microsoft.com <mailto:dinov@microsoft.com>> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
Which reminds me, I was going to note that one of the main gains with ordered keyword arguments, is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another.
So here's a brand new argument against ordered dicts: The existence of perfect hashing schemes. They fundamentally conflict with ordered dicts.
If I understand your explanation, then they don't conflict with the type of ordering described in this thread. Raymond's optimization separates the "hash table" part from the "contents" part of a dictionary, and there is no requirement that these two parts be the same size or in the same order. Indeed, Raymond's split design lets you re-parameterize the hashing all you want, without perturbing the iteration order at all. You would in fact be able to take a dictionary at any moment and say, "optimize the 'hash table' part to a non-colliding state based on the current contents", without touching the 'contents' part at all. (One could do this at class creation time on a class dictionary, and just after importing on a module dictionary, for example.)
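In the indices/entries layout, that re-parameterization is just a rebuild of the small index array. A toy sketch of PJ's point (rebuild_index is an illustrative name, and it uses linear probing for brevity rather than CPython's actual probing sequence):

def rebuild_index(entries, size):
    # Rebuild only the sparse index for a dense entries table; the
    # entries themselves -- and hence iteration order -- never move.
    indices = [None] * size
    for pos, (h, key, value) in enumerate(entries):
        i = h % size
        while indices[i] is not None:      # linear probing, for the sketch
            i = (i + 1) % size
        indices[i] = pos
    return indices

entries = [(hash('timmy'), 'timmy', 'red'),
           (hash('barry'), 'barry', 'green'),
           (hash('guido'), 'guido', 'blue')]
indices = rebuild_index(entries, 8)   # fresh sparse part; dense part untouched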
On 12/12/2012 10:31 PM, PJ Eby wrote:
On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 12/12/2012 01:15 AM, Nick Coghlan wrote:
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov@microsoft.com <mailto:dinov@microsoft.com>> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
Which reminds me, I was going to note that one of the main gains with ordered keyword arguments, is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another.
So here's a brand new argument against ordered dicts: The existence of perfect hashing schemes. They fundamentally conflict with ordered dicts.
If I understand your explanation, then they don't conflict with the type of ordering described in this thread. Raymond's optimization separates the "hash table" part from the "contents" part of a dictionary, and there is no requirement that these two parts be in the same size or the same order.
I don't fully agree. Perfect hashing already separates "hash table" from "contents" (sort of), and saves the memory in much the same way. Yes, you can repeat the trick and have 2 levels of indirection, but that then requires an additional table of small ints which is pure overhead present for the sorting; in short, it's no longer an optimization but just overhead for the sortability. Dag Sverre
Indeed, Raymond's split design lets you re-parameterize the hashing all you want, without perturbing the iteration order at all. You would in fact be able to take a dictionary at any moment, and say, "optimize the 'hash table' part to a non-colliding state based on the current contents", without touching the 'contents' part at all.
(One could do this at class creation time on a class dictionary, and just after importing on a module dictionary, for example.)
On 12/12/2012 11:06 PM, Dag Sverre Seljebotn wrote:
On 12/12/2012 10:31 PM, PJ Eby wrote:
On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 12/12/2012 01:15 AM, Nick Coghlan wrote:
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov@microsoft.com <mailto:dinov@microsoft.com>> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be ordered would certainly be possible. Personally I just wouldn't want to see it be the default as that seems like unnecessary overhead when the specialized class exists.
Which reminds me, I was going to note that one of the main gains with ordered keyword arguments, is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple where you define the order of iteration in one operation, and set the values in another.
So here's a brand new argument against ordered dicts: The existence of perfect hashing schemes. They fundamentally conflict with ordered dicts.
If I understand your explanation, then they don't conflict with the type of ordering described in this thread. Raymond's optimization separates the "hash table" part from the "contents" part of a dictionary, and there is no requirement that these two parts be in the same size or the same order.
I don't fully agree.
Perfect hashing already separates "hash table" from "contents" (sort of), and saves the memory in much the same way.
This was a bit inaccurate, but the point is: The perfect hash function doesn't need any fill-in to avoid collisions, you can (except in exceptional circumstances) fill the table 100%, so the memory is already saved. Dag Sverre
Yes, you can repeat the trick and have 2 levels of indirection, but that then requires an additional table of small ints which is pure overhead present for the sorting; in short, it's no longer an optimization but just overhead for the sortability.
Dag Sverre
Indeed, Raymond's split design lets you re-parameterize the hashing all you want, without perturbing the iteration order at all. You would in fact be able to take a dictionary at any moment, and say, "optimize the 'hash table' part to a non-colliding state based on the current contents", without touching the 'contents' part at all.
(One could do this at class creation time on a class dictionary, and just after importing on a module dictionary, for example.)
On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Perfect hashing already separates "hash table" from "contents" (sort of), and saves the memory in much the same way.
Yes, you can repeat the trick and have 2 levels of indirection, but that then requires an additional table of small ints which is pure overhead present for the sorting; in short, it's no longer an optimization but just overhead for the sortability.
I'm confused. I understood your algorithm to require repacking, rather than it being a suitable algorithm for incremental change to an existing dictionary. ISTM that that would mean you still pay some sort of overhead (either in time or space) while the dictionary is still being mutated. Also, I'm not sure how 2 levels of indirection come into it. The algorithm you describe has, as I understand it, only up to 12 perturbation values ("bins"), so it's a constant space overhead, not a variable one. What's more, you can possibly avoid the extra memory access by using a different perfect hashing algorithm, at the cost of a slower optimization step or using a little more memory.
Note: I'm NOT suggesting the use of perfect hashing, just making sure it's existence is mentioned and that one is aware that if always-ordered dicts become the language standard it precludes this option far off in the future.
Not really. It means that some forms of perfect hashing might require adding a few more ints worth of overhead for the dictionaries that use it. If it's really a performance benefit for very-frequently-used dictionaries, that might still be worthwhile.
On 12/13/2012 06:11 AM, PJ Eby wrote:
On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Perfect hashing already separates "hash table" from "contents" (sort of), and saves the memory in much the same way.
Yes, you can repeat the trick and have 2 levels of indirection, but that then requires an additional table of small ints which is pure overhead present for the sorting; in short, it's no longer an optimization but just overhead for the sortability.
I'm confused. I understood your algorithm to require repacking, rather than it being a suitable algorithm for incremental change to an existing dictionary. ISTM that that would mean you still pay some sort of overhead (either in time or space) while the dictionary is still being mutated.
As-is the algorithm just assumes all key-value-pairs are available at creation time. So yes, if you don't reallocate when making the dict perfect then it could make sense to combine it with the scheme discussed in this thread. If one does leave some free slots open there's some probability of an insertion working without complete repacking, but the probability is smaller than with a normal dict. Hybrid schemes and trade-offs in this direction could be possible.
Also, I'm not sure how 2 levels of indirection come into it. The algorithm you describe has, as I understand it, only up to 12 perturbation values ("bins"), so it's a constant space overhead, not a variable one. What's more, you can possibly avoid the extra memory access by using a different perfect hashing algorithm, at the cost of a slower optimization step or using a little more memory.
I said there are k perturbation values; you need an additional array some_int_t d[k], where some_int_t is large enough to hold n (the number of entries) - just like what's proposed in this thread. The paper recommends k > 2*n, but in my experiments I could get away with k = n in 99.9% of the cases (given perfect entropy in the hashes...). So the overhead is roughly the same as what's proposed here. I think the most promising thing would be to always have a single integer table and either use it for indirection (the usual case) or for the perfect hash function parameters (say, after a pack() method has been called and before new insertions).
Note: I'm NOT suggesting the use of perfect hashing, just making sure it's existence is mentioned and that one is aware that if always-ordered dicts become the language standard it precludes this option far off in the future.
Not really. It means that some forms of perfect hashing might require adding a few more ints worth of overhead for the dictionaries that use it. If it's really a performance benefit for very-frequently-used dictionaries, that might still be worthwhile.
As mentioned above, the overhead is larger. I think the main challenge is to switch to a hashing scheme with larger entropy for strings, like murmurhash3. Having lots of zero bits in the hash of short strings will kill the scheme, since it needs several attempts to succeed (the "r" parameter). So string hashing is slowed down a bit (given the caching, I don't know how important this is). Ideally one should make sure the hashes are 64-bit on 64-bit platforms too (IIUC long is 32-bit on Windows, but I don't know Windows well). But the main reason I say I'm not proposing it is that I don't have time to code it up for demonstration, and people like to have something to look at when they get proposals :-) Dag Sverre
On Mon, Dec 10, 2012 at 10:01 AM, Armin Rigo <arigo@tunes.org> wrote:
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys). I honestly hope this doesn't happen - we use ConcurrentHashMap for our dictionaries (which lack ordering) and I'm sure getting it to preserve insertion order would cost us.
-Frank
On Mon, Dec 10, 2012 at 3:13 PM, fwierzbicki@gmail.com <fwierzbicki@gmail.com> wrote:
On Mon, Dec 10, 2012 at 10:01 AM, Armin Rigo <arigo@tunes.org> wrote:
Technically, I could see Python switching to ordered dictionaries everywhere. Raymond's insight suddenly makes it easy for CPython and PyPy, and at least Jython could use the LinkedHashMap class (although this would need checking with Jython guys). I honestly hope this doesn't happen - we use ConcurrentHashMap for our dictionaries (which lacks ordering) and I'm sure getting it to preserve insertion order would cost us.

I just found this: http://code.google.com/p/concurrentlinkedhashmap/ so maybe it wouldn't be all that bad. I still personally like the idea of leaving basic dict order undetermined (there is already an OrderedDict if you need it, right?). But if ConcurrentLinkedHashMap is as good as is suggested on that page, then Jython doesn't need to be the thing that blocks the argument.
-Frank
On 10/12/12 20:40, Armin Rigo wrote:
As a side note, your suggestion also enables order-preserving dictionaries: iter() would automatically yield items in the order they were inserted, as long as there was no deletion. People will immediately start relying on this "feature"... and be confused by the behavior of deletion. :-/
If we want to avoid the attractive nuisance of iteration order being almost, but not quite, order-preserving, there is a simple fix: when iterating over the dict, instead of starting at the start of the table, start at some arbitrary point in the middle and wrap around. That will increase the cost of iteration slightly, but avoid misleading behaviour.

I think all we need do is change the existing __iter__ method from this:

def __iter__(self):
    for key in self.keylist:
        if key is not UNUSED:
            yield key

to this:

# Untested!
def __iter__(self):
    # Choose an arbitrary starting point.
    # 103 is chosen because it is prime.
    n = (103 % len(self.keylist)) if self.keylist else 0
    for key in self.keylist[n:] + self.keylist[:n]:
        # I presume the C version will not duplicate the keylist.
        if key is not UNUSED:
            yield key

This mixes the order of iteration up somewhat, just enough to avoid misleading people into thinking that this is order-preserving. The order will depend on the size of the dict, not the history. For example, with keys a, b, c, ... added in that order, the output is:

len=1 key=a
len=2 key=ba
len=3 key=bca
len=4 key=dabc
len=5 key=deabc
len=6 key=bcdefa
len=7 key=fgabcde
len=8 key=habcdefg
len=9 key=efghiabcd

which I expect is mixed up enough to discourage programmers from jumping to conclusions about dicts having guaranteed order.

-- Steven
On Mon, Dec 10, 2012 at 3:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
...
One question, Raymond. The compression ratios stay true provided you don't overallocate the entry list. If you do overallocate, you don't really gain that much (it all depends vastly on details), or you even lose in some cases. What do you think the strategy should be?
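A back-of-the-envelope sketch of the trade-off being asked about, assuming the 24-byte entries and one-byte indices of the 64-bit examples in this thread (the specific numbers are illustrative):

ENTRY, INDEX = 24, 1        # bytes each, on a 64-bit build with a small dict

def new_size(t, capacity):
    # dense entry array of `capacity` slots + sparse index of t slots
    return ENTRY * capacity + INDEX * t

t, n = 8, 5                 # a size-8 table holding 5 entries
print(ENTRY * t)            # current layout:          192 bytes
print(new_size(t, n))       # exact-fit dense array:   128 bytes
print(new_size(t, t))       # fully over-allocated:    200 bytes -- a loss

So with no over-allocation the savings are large, while a fully over-allocated entry array can indeed cost more than the current layout.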
Hi Raymond, On 08.01.13 15:49, Maciej Fijalkowski wrote:
On Mon, Dec 10, 2012 at 3:44 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
...
One question, Raymond. The compression ratios stay true provided you don't overallocate the entry list. If you do overallocate, you don't really gain that much (it all depends vastly on details), or you even lose in some cases. What do you think the strategy should be?
What is the current status of this discussion? I'd like to know whether it is being considered as an alternative implementation.

There is also a discussion in python-ideas right now where this alternative is mentioned, and I think especially for small dicts as **kwargs, it could be a cheap way to introduce order.

Is this going on somewhere? I'm quite interested in that.

cheers - chris

-- Christian Tismer :^) <mailto:tismer@stackless.com>
Hi Chris, On 15.05.13 13:32 Christian Tismer wrote:
...
+1. I am also interested in the status. Many people seem to have copied the recipe from the activestate site (was it?), but I wonder whether it was maybe too cool to be progressed into "the field", or whether it was simply some understandable lack of resources. All the best, Stefan
On 15.05.13 14:01, Stefan Drees wrote:
...
Right, found the references:

http://mail.python.org/pipermail/python-dev/2012-December/123028.html
http://stackoverflow.com/questions/14664620/python-dictionary-details
http://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-space...

cheers - chris

-- Christian Tismer :^) <mailto:tismer@stackless.com>
On Wed, May 15, 2013 at 2:36 PM, Christian Tismer <tismer@stackless.com> wrote:
...
I implemented one for pypy btw (it's parked on a branch for messiness reasons) Cheers, fijal
On May 15, 2013, at 4:32 AM, Christian Tismer <tismer@stackless.com> wrote:
What is the current status of this discussion? I'd like to know whether it is being considered as an alternative implementation.
As far as I can tell, I'm the only one working on it (and a bit slowly at that). My plan is to implement it for frozensets to see how it works out. Frozensets are a nice first experiment for several reasons:

* The current implementation is cleaner than dictionaries (which have become more complicated due to key-sharing).
* It will be easy to benchmark (by racing sets vs frozen sets) for an apples-to-apples comparison.
* There is no need to have a list-like over-allocation scheme since frozensets can't grow after they are created. That will guarantee a significant space savings and it will simplify the coding.
* I wrote the code for setobject.c so I know all the ins-and-outs.
There is also a discussion in python-ideas right now where this alternative is mentioned, and I think especially for small dicts as **kwargs, it could be a cheap way to introduce order.
The compaction of keys and values into a dense array was intended to save space, improve cache performance, and improve iteration speed. The ordering was just a side-effect, and one that is easily disturbed if keys ever get deleted. So a compacted dict might be a cheap way to introduce order for kwargs, but it would need special handling if the user decided to delete keys.

BTW, I'm +1 on the idea of ordering keyword-args. It makes it easier to debug if the arguments show up in the order they were created. AFAICT, no purpose is served by scrambling them (which is exacerbated by the new randomized hashing security feature).

Raymond
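A toy illustration of how a single deletion can disturb the order under a swap-to-fill-the-hole scheme (keys only, ignoring hashes and values; this is one possible deletion strategy, not necessarily what any implementation does):

entries = ['timmy', 'barry', 'guido']   # dense part, in insertion order
# Deleting 'timmy': the last entry is swapped into the hole,
# then the array shrinks by one.
hole = entries.index('timmy')
entries[hole] = entries[-1]
entries.pop()
print(entries)                          # ['guido', 'barry'] -- order lost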
On Sun, May 19, 2013 at 7:27 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On May 15, 2013, at 4:32 AM, Christian Tismer <tismer@stackless.com> wrote:
What is the current status of this discussion? I'd like to know whether it is being considered as an alternative implementation.
...
The completely ordered dict is easy to get, too: you mark deleted entries instead of removing them (then all the keys stay in order), and every now and then you compact the whole thing by removing all the deleted entries, presumably on resize or so. Cheers, fijal
On 10.12.12 03:44, Raymond Hettinger wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
FYI PHP 7 will use this technique [1]. In conjunction with other optimizations this will decrease memory consumption of PHP hashtables up to 4 times. [1] http://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html
On Wed, Dec 31, 2014 at 3:12 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 10.12.12 03:44, Raymond Hettinger wrote:
The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.
Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.
FYI PHP 7 will use this technique [1]. In conjunction with other optimizations this will decrease memory consumption of PHP hashtables up to 4 times.
"up to 4 times" is a bit of a stretch, given that most of their savings come from: * saving on the keeping of ordering * other optimizations in zval None of it applies to python PHP does not implement differing sizes of ints in key dict, which makes memory saving php-only (if we did the same thing as PHP, we would save more or less nothing, depending on how greedy you are with the list overallocation) We implemented the same strategy in PyPy as of last year, testing it to become a default "dict" and "OrderedDict" for PyPy in the next release. Cheers, fijal PS. I wonder who came up with the idea first, PHP or rhettinger and who implemented it first (I'm pretty sure it was used in hippy before it was used in Zend PHP)
Hi all, On 1 January 2015 at 14:52, Maciej Fijalkowski <fijall@gmail.com> wrote:
PS. I wonder who came up with the idea first, PHP or rhettinger and who implemented it first (I'm pretty sure it was used in hippy before it was used in Zend PHP)
We'd need to look more in detail to that question, but a quick look made me find this Java code from 2012: https://code.google.com/r/wassermanlouis-guava/source/browse/guava/src/com/g... which implements almost exactly the original idea of Raymond. (It has a twist because Java doesn't support arrays of (int, int, Object, Object), and instead encodes it as one array of long and one array of Objects. It also uses a chain of buckets instead of open addressing.) A bientôt, Armin.