Experimenting with dict performance, and an immutable dict

Let me first say that the code and discussion, as the title suggests, are also about possible performance improvements to the dict base type.

TL;DR: I implemented a frozendict using CPython 3.9 code. It seems that an immutable dict *could* be faster than dict in some cases. Furthermore, some of the optimizations could be applied to dict too.

Long explanation: since I now have some time, I decided to experiment a little with the creation of an immutable dict in Python. Unluckily, I started this experiment many months ago, so the CPython code I used is old; some or all of my considerations may be outdated.

Initially, I wrote a quick and dirty implementation: https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c5...

The code was very simple, and the performance was identical to dict. So in theory, adding a frozendict to CPython is not hard. But there's more.

After the first implementation, I started trying to improve the performance of frozendict. The results of the improvements are here: https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt

For the benchmarks I simply used timeit, with autorange and repeat, and, as suggested in the module documentation, I took the minimum of the results (a minimal sketch is below). Here is the code: https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py

I have not tested with an optimized build, since the optimization data is collected using the unit tests, and I didn't write tests for frozendict in the official CPython test framework. The tests and benchmarks were done using CPython 3.9a0. CPU and other machine resources were not restricted using pyperf or similar tools, in order to see the "real" speed. CPython was compiled using gcc and g++.

In the benchmarks, I compared methods and operators on dict and frozendict. The benchmark uses dicts with all-integer keys and dicts with all-string keys. Furthermore, I tested dicts of size 8 (the most optimized size in CPython) and of 1000 elements (maybe too much, but I wanted to see how they perform with a high number of elements).

Every benchmark produces a line in the output. Name describes the benchmarked method, operator or code snippet. Size is the size of the dict, 8 or 1000. Keys is the key type, str or int. Type is the dictionary type, dict or frozendict. In Name, "o" represents the object itself (dict or frozendict); "d", in benchmarks with dict, is "o", while in benchmarks with frozendict it is an equivalent instance of type dict.

Some considerations:

1. frozendict is very fast, as expected, at copying. But it's also faster at creation from a (not frozen) dict, from kwargs or from a sequence; speedups range from 20% to 45%.
2. frozendict is also a bit faster when you iterate over it, especially over values, where it is ~15% faster.
3. hash seems really fast, but this is because it is cached the first time hash() is invoked.
4. frozendict is slower when you unpickle it, and with fromkeys and repr. This is because I wrote a very naive implementation of these methods, without optimizing them.

The other methods have a comparable speed.

Here is the code: https://github.com/Marco-Sulla/cpython
Here is the diff between the CPython code and my commits: https://github.com/python/cpython/compare/master...Marco-Sulla:master

About the code: I wrote the implementation by doing a simple copy/paste of the existing dict functions, modifying their code and renaming them. This way I'm sure dict continues to work as before, and I can compare the speed gain. Some of the optimizations I adopted could be implemented in `dict` too.
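To make the methodology concrete, here is a minimal sketch of the timeit setup described above, in the spirit of bench.py (the helper name `bench` and the sample dict are illustrative, not taken from the actual benchmark script):

    import timeit

    d = {k: k for k in range(8)}

    def bench(stmt, globals_):
        t = timeit.Timer(stmt, globals=globals_)
        loops, _ = t.autorange()                  # find a reasonable number of loops
        raw = t.repeat(repeat=5, number=loops)    # repeat the measurement
        return min(raw) / loops                   # take the minimum, per the timeit docs

    print("dict(d):", bench("dict(d)", {"d": d}))
    # With a build that provides frozendict, the same helper could be reused:
    # print("frozendict(d):", bench("frozendict(d)", {"d": d, "frozendict": frozendict}))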
For example, instead of creating an empty dict and resizing it, I create it with the "maximum" size and fill it. It seems to work, even if I did not explore the possibility that a mutable object could change while a frozendict creation is in progress.

Some problems with optimizing dict and maintaining a frozendict:

1. Duplication of code. To gain a significant boost, I had to copy and paste a lot of code. The functions could be merged again, but the speedup would probably be reduced.
2. Split vs combined dicts. As I wrote, split dicts seem to be faster at reading than combined dicts. For example, iterating over values is faster with a split dict, as expected. But writing was not tested; furthermore, some of the optimizations can be adopted for dict too, so the advantage of a split table may shrink. dict continues to maintain both split and combined tables, so this may not be a problem, but the code could be smaller and faster if only one table layout were supported.
3. The CPython code I used is old, so some of the improvements I adopted may already be implemented.

About frozendict: apart from the considerations in [PEP 416](https://www.python.org/dev/peps/pep-0416/), which was rejected because there was little gain from its implementation, I think frozendict could be useful as a substitute for MappingProxyType, which is really slow. MappingProxyType is not used much, but it's present in critical parts of the CPython code, for example in _sre. We have to see whether a mapping proxy *can* be substituted with an immutable map in some critical part of the CPython code. Furthermore, frozendicts could be used for implementing "immutable" classes and modules, and can be used as a faster dict when their content does not change.
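For a rough sense of the MappingProxyType overhead mentioned above, the following sketch (numbers are machine-dependent, and the sample dict is mine) compares a key lookup on a plain dict with the same lookup through a proxy:

    import timeit
    from types import MappingProxyType

    d = {str(i): i for i in range(8)}
    proxy = MappingProxyType(d)

    # Every proxy read is delegated to the wrapped dict, which adds overhead.
    print("dict lookup :", min(timeit.repeat("d['5']", globals={"d": d}, number=10**6)))
    print("proxy lookup:", min(timeit.repeat("p['5']", globals={"p": proxy}, number=10**6)))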

What, exactly, is frozen? My understanding is that one problem with frozen dicts in the past is deciding exactly what is mutable and what is immutable. Can you change what object a key maps to, so long as the set of keys stays the same? Can you modify the contents of a mutable object that is a value? On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:

Yes. Values can be mutable, as in a tuple. A "pure" immutable dict is a frozendict with only immutable values (strings, tuples of numbers, etc.); in that case you also have a hash. frozendict apart, I think some of my "tricks" could be applied to dict, if the implementation is sound and there's a real benefit. I'm not a C expert and my knowledge of the Python C API is very limited, but if you core devs think it's worth it, I could try to submit some patches to the CPython trunk.
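The hash-caching idea is easy to show at the Python level. This toy sketch (mine, not the C implementation; it does not actually block mutation) caches a hash computed from the items on first use, and only works when every value is itself hashable:

    class FrozenDictSketch(dict):
        """Toy dict subclass: only illustrates lazy hash caching, not immutability."""
        __slots__ = ("_hash",)

        def __hash__(self):
            try:
                return self._hash
            except AttributeError:
                # frozenset() raises TypeError if any value is unhashable,
                # i.e. hashing only works for a "pure" immutable dict.
                self._hash = hash(frozenset(self.items()))
                return self._hash

    d = FrozenDictSketch(a=1, b=(2, 3))
    assert hash(d) == hash(d)   # the second call reuses the cached value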
one problem with frozen dicts in the past is deciding exactly what is mutable and what is immutable.
Well, I had the same problem... Indeed, it seems there's no way to use duck typing to understand whether a type is mutable or not. I suppose this can only be done using a convention, or a contract. If we know that an object is immutable, we can adopt some caching and speedup strategies. On Wed, 22 Jul 2020 at 06:53, Guido van Rossum <guido@python.org> wrote:
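The closest thing to a duck-typing test is the hashability convention. A small sketch of that heuristic, with its obvious limit (hashable does not imply immutable, e.g. instances of plain user classes), could look like this:

    def looks_immutable(obj):
        """Heuristic only: hashability is a convention, not a guarantee of immutability."""
        try:
            hash(obj)
        except TypeError:
            return False
        return True

    assert looks_immutable((1, 2)) and looks_immutable("x")
    assert not looks_immutable([1, 2]) and not looks_immutable({})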

On Wed, Jul 22, 2020 at 7:31 AM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
For the benchmarks I simply used timeit, with autorange and repeat, and, as suggested in the module documentation, I took the minimum of the results. Here is the code:
https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
I strongly recommend using pyperf for benchmarking. Otherwise, you will see random performance changes caused by random factors, including ASLR. https://pypi.org/project/pyperf/ https://pyperf.readthedocs.io/en/latest/ Regards, -- Inada Naoki <songofacandy@gmail.com>
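For reference, a minimal pyperf version of one of the micro-benchmarks might look like the sketch below (the benchmark name and sample dict are illustrative, not taken from the actual benchmark script). pyperf runs the statement in several isolated worker processes and reports a mean with standard deviation:

    import pyperf

    runner = pyperf.Runner()
    runner.timeit("dict(d), size 8, int keys",
                  stmt="dict(d)",
                  setup="d = {k: k for k in range(8)}")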

Well, I tried it. The result is interesting: https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench_pyp... The main speedups are confirmed, and maybe even accentuated. But some others are diminished or have vanished. The most notable is pickle.dumps(). I have to say I don't really know why pickle.dumps() is faster on my PC under "normal" conditions; I wrote the implementation of pickle.dumps() in _pickle.c by quickly copy/pasting the dict and frozenset implementations. pickle.loads(), indeed, is really slow. Furthermore, it seems that pyperf has not disabled ASLR: after `sudo python -m pyperf system tune`, ASLR is still in "Full randomization" mode. On Wed, 22 Jul 2020 at 07:48, Inada Naoki <songofacandy@gmail.com> wrote:

On Wed, Jul 22, 2020 at 4:29 PM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
Furthermore, it seems that pyperf has not disabled ASLR: after `sudo python -m pyperf system tune`, ASLR is still in "Full randomization" mode.
You are right. pyperf doesn't disable ASLR, because code performance is affected by code layout. Instead, pyperf runs the benchmark multiple times in isolated processes and measures statistics. Victor Stinner, the author of pyperf, wrote a lot of information about measuring performance. It's well worth reading before benchmarking. https://vstinner.readthedocs.io/benchmark.html Regards, -- Inada Naoki <songofacandy@gmail.com>

The article is really interesting. Anyway, I think pyperf was developed mainly for macro benchmarks: its goal is to make different benchmarks, run on different machines or at different times, comparable. What I'm doing, on the contrary, is a micro benchmark: I want to check whether an algorithm is faster in a particular case (an immutable dict). I ran the benchmarks many times, and even if the absolute values are not stable, the delta between dict and frozendict is consistent. Anyway, I think that also adding the sigma to the output is a good idea: if the sigma is too high, maybe the PC is under load and the benchmark is not reliable. Furthermore, I noticed that frozendict is a little slower at checking whether a key is present in the dictionary, when the key is not present, the dictionary has only string keys and the dictionary is small. I suppose that small dicts with only string keys are the majority, so I'll take a look. On Wed, 22 Jul 2020 at 10:03, Inada Naoki <songofacandy@gmail.com> wrote:
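If one stays with timeit, the sigma idea mentioned above could be as simple as keeping all the repeat() results and reporting their standard deviation next to the minimum (a sketch, not the actual bench.py code):

    import statistics
    import timeit

    t = timeit.Timer("d.copy()", setup="d = {k: k for k in range(1000)}")
    loops, _ = t.autorange()
    times = [x / loops for x in t.repeat(repeat=10, number=loops)]
    print(f"min = {min(times):.3e} s   sigma = {statistics.stdev(times):.3e} s")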

Did you study PEP 416 (frozendict) and PEP 603 (frozenmap)? -- Guido van Rossum (python.org/~guido) Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

pyrsistent.PMap and PRecord may be worth a look:

https://pyrsistent.readthedocs.io/en/latest/api.html#pyrsistent.PMap :

pmap() to create an instance.

Was originally written as a very close copy of the Clojure equivalent but was later rewritten to closer re-assemble the python dict. This means that a sparse vector (a PVector) of buckets is used. The keys are hashed and the elements inserted at position hash % len(bucket_vector). Whenever the map size exceeds 2/3 of the containing vectors size the map is reallocated to a vector of double the size. This is done to avoid excessive hash collisions.

This structure corresponds most closely to the built in dict type and is intended as a replacement. Where the semantics are the same (more or less) the same function names have been used but for some cases it is not possible, for example assignments and deletion of values.

PMap implements the Mapping protocol and is Hashable. It also supports dot-notation for element access.

Random access and insert is log32(n) where n is the size of the map.

... https://pyrsistent.readthedocs.io/en/latest/intro.html#pyrsistent :

throughout its lifetime and need not worry that somewhere five stack levels below you in the darkest corner of your application someone has decided to remove that element that you expected to be there.

Pyrsistent is influenced by persistent data structures such as those found in the standard library of Clojure. The data structures are designed to share common elements through path copying. It aims at taking these concepts and make them as pythonic as possible so that they can be easily integrated into any python program without hassle. On Wed, Jul 22, 2020, 12:28 PM Guido van Rossum <guido@python.org> wrote:

participants (6): Antoine Pitrou, Guido van Rossum, Inada Naoki, Marco Sulla, Todd, Wes Turner