[Python-Dev] A new dict for Xmas?

Stefan Behnel stefan_ml at behnel.de
Fri Dec 23 13:03:17 CET 2011

Mark Shannon, 23.12.2011 12:21:
> Martin v. Löwis wrote:
>>>> - it would be useful to have a specialized representation for
>>>> all-keys-are-strings. In that case, me_hash could be dropped
>>>> from the representation. You would get savings compared to
>>>> the status quo even in the non-shared case.
>>> It might be tricky switching key tables, and I don't think it would save
>>> much memory, as keys that are widely shared take up very little memory
>>> anyway, and not many other dicts are long-lived.
>> Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
>> objects (see script below). Of these, 563 (so nearly all of them) had
>> only strings as keys. Among those, I found 286 different key sets,
>> where 231 key sets occurred only once (i.e. wouldn't be shared).
>> Together, the string dictionaries had 13282 keys, and you could save
>> as many pointers (actually more, because there will be more key slots
>> than keys).
> The question is how much memory needs to be saved to be worth adding the
> complexity. 10 KB: no. 100 MB: yes.
> So data from "real" benchmarks would be useful.
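
For reference, a count like the one Martin describes could be sketched
roughly as follows. This is not his script; the function names are made up
for illustration. It also assumes that gc.get_objects() is a good enough
source of live dicts, although it only reports GC-tracked objects and so can
miss dicts that the interpreter has left untracked.

```python
import gc
from collections import Counter


def summarize_string_key_dicts(dicts):
    """Summarize how many of the given dicts have only str keys.

    Hypothetical helper, not part of any posted script.
    Note: an empty dict counts as "all keys are strings".
    """
    total = 0
    string_keyed = 0
    total_keys = 0
    key_sets = Counter()  # frozenset of keys -> number of dicts using it
    for d in dicts:
        total += 1
        keys = list(d)  # snapshot of the keys
        if all(type(k) is str for k in keys):
            string_keyed += 1
            total_keys += len(keys)
            key_sets[frozenset(keys)] += 1
    # Key sets used by exactly one dict could not be shared.
    unshared = sum(1 for count in key_sets.values() if count == 1)
    return {
        "dicts": total,
        "string_keyed": string_keyed,
        "distinct_key_sets": len(key_sets),
        "unshared_key_sets": unshared,
        "string_keys": total_keys,
    }


def live_dicts():
    """Return the plain dicts currently tracked by the garbage collector."""
    return [obj for obj in gc.get_objects() if type(obj) is dict]
```

Calling summarize_string_key_dicts(live_dicts()) in a fresh interpreter gives
numbers of the same kind as those quoted above, though the exact values will
differ between versions and builds.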

Consider taking a parsed MiniDOM tree as a benchmark. It contains so many 
instances of just a couple of different classes that it just has to make a 
huge difference if each of those instances is even just a bit smaller. It 
should also make a clear difference for plain Python ElementTree.
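
To make the kind of measurement concrete, here is a minimal sketch. It is not
the attached script. It assumes the tracemalloc module (Python 3.4 and later)
is an acceptable way to estimate the memory held by the tree, and the names
measure_parse and benchmark_files are invented for this example.

```python
import time
import tracemalloc
from xml.dom import minidom


def measure_parse(parse, source):
    """Return (tree, seconds, retained_bytes) for parse(source).

    The input is parsed twice: once untraced for timing, since tracing
    slows allocation down, and once under tracemalloc to see how many
    bytes are still allocated while the tree is alive.
    """
    start = time.perf_counter()
    parse(source)  # timed run, result discarded
    elapsed = time.perf_counter() - start

    tracemalloc.start()
    before = tracemalloc.get_traced_memory()[0]
    tree = parse(source)  # traced run, tree kept alive
    after = tracemalloc.get_traced_memory()[0]
    tracemalloc.stop()
    return tree, elapsed, after - before


def benchmark_files(paths, parse=minidom.parse):
    """Print parse time and retained memory for each XML file path."""
    for path in paths:
        _tree, seconds, retained = measure_parse(parse, path)
        print("%s: %.3f s, %d bytes retained" % (path, seconds, retained))
```

Running benchmark_files(sys.argv[1:]) on a few large XML files before and
after a dict change should show whether the per-instance savings add up.
Any other parser callable, such as an ElementTree parse function, can be
passed as the parse argument.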

I attached a benchmark script that measures the parsing speed as well as
the total memory usage of the in-memory tree. You can get data files from
the following places; just download them and pass their file names on the
command line:



Here are some results from my own machine for comparison:


-------------- next part --------------
A non-text attachment was scrubbed...
Name: etbenchmark.py
Type: text/x-python
Size: 4760 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111223/cfc8eb7d/attachment.py>
