Let me first say that the code and discussion, as per title, is also about possible performance improvements to the dict base type.

TL;DR I implemented a frozendict using CPython 3.9 code. It seems that an immutable dict *could* be faster than dict in some cases. Furthermore, some optimization could be applied to dict too.

Long explaining:

Since now I have some time, I decided to experiment a little with the creation of an immutable dict in Python.

Unluckily, I started this experiment many months ago, so the CPython code I used is old. Maybe some or all of my considerations are outdated.

Initially, I wrote a quick and dirty implementation:
https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334

The code was very simple, and the performance was identical to dict. So in theory, adding a frozendict to CPython is not hard. But there's more.

After the first implementation, I started to try to improve the performance of frozendict. The result of the improvements are here:

https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt

For benchmarks, I used simply timeit, with autorange and repeat and, as suggested in the module documentation, I got the minimum of the results. Here is the code:

https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py

I have not tested with an optimized build, since optimization data is collected using the unit tests, and I didn't write tests for frozendict in the official CPython test framework.

The tests and benchmarks were done using CPython 3.9a0. CPU and other pc resources were not restricted using pyperf or similar tools, to see the "real" speed. CPython was compiled using gcc and g++.

In benchmarks, I compared methods and operators using dict and frozendict. The benchmark uses dicts with all integers and dicts with all strings. Furthermore, I tested dicts with size 8 (the most optimized size in CPython) and 1000 elements (maybe too much, but I wanted to see how they perform with a high number of elements).

Every benchmark has a line in the output. The Name describes the benchmarked method, operator or code snippet. Size is the size of the dict, 8 or 1000. Keys are the keys type, str or int. Type is the dictionary type, dict or frozendict.

In Name, the "o" represents the object itself (dict or frozendict). "d", in benchmark with dict, is "o"; in benchmarks with frozendict is an equivalent instance of type dict.

Some consideration:

1. frozendict is very fast, as expected, at copying. But it's also faster at creation, using a (not frozen) dict, kwargs or a sequence2. Speedups range from 20% to 45%.
2. frozendict is also a bit faster when you iterate over it, especially over values, where is ~15% faster
3. hash seems really fast, but this is because it's cached the first time hash() is invoked
4. where frozendict is slower is when you unpickle it and with fromkeys and repr. This is because I wrote a very naif implementation of these methods, without optimizing them. The other methods have a comparable speed.

Here is the code:
https://github.com/Marco-Sulla/cpython

Here is the diff between the CPython code and my commits:
https://github.com/python/cpython/compare/master...Marco-Sulla:master

About code

I coded the implementation doing a simple copy/paste of the existing dict functions, modifying their code and renaming them. This way I'm sure dict continues to work as before, and I can compare the speed gain.

Some of the optimization I adopted can be implemented in `dict` too. For example, instead of creating an empty dict and resizing it, I create it with the "maximum" size and I fill it. It seems to work, even if I did not explore the possibility that a mutable object can change while a frozendict creation is in progress.

Some problems with optimizing dict and maintaining a frozendict:

1. duplication of code. To gain a significant boost, I had to copy and paste a lot of code. Functions can be remerged again, but maybe the speedup will be reduced.
2. split vs combined dicts. As I wrote, split dicts seem to be faster in reading than combined dicts. For example, iterating over values is faster with a split dict, as expected.
But writing was not tested; furthermore, some of the optimizations can be adopted for dicts too, so the convenience of a split table can be lowered.
dict continues to maintain both split and combined tables, so this could be not a problem. But the code could be less and more fast if only a table layout is supported
3. the CPython code I used is old, so some of the improvements I adopted could be already implemented

About frozendict

Apart the considerations done in the [PEP 416](https://www.python.org/dev/peps/pep-0416/), that was rejected since there was little gain from its implementation, I think that frozendict can be useful as a substitute of MappingProxyType, that is really slow. MappingProxyType is not much used, but it's present in critical parts of CPython code, for example in _sre. We have to see if a mapping proxy type *can* be substituted with an immutable map in some critical part of CPython code.

Furthermore, frozendicts could be used for implementing "immutable" classes and modules, and can be used as a faster dict if its content does not change.