[Python-ideas] Dict literal use for custom dict classes
Andrew Barnert
abarnert at yahoo.com
Thu Dec 17 01:00:40 EST 2015
On Wednesday, December 16, 2015 9:36 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Dec 16, 2015 at 9:12 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> On Dec 16, 2015, at 15:17, Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>>> On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller <python-ideas at mgmiller.net> wrote:
> [...]
>>> That's not so clear, actually! It turns out that PyPy was able to make
>>> their regular 'dict' implementation ordered, while at the same time
>>> making it faster and more memory-efficient compared to their previous
>>> (CPython-like) implementation:
>>>
>>> http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html
>>>
>>> So in PyPy all these issues are automatically solved for free. The
>>> $1e6-question these other proposals have to answer is, why not do what
>>> PyPy did?
>>
>> You don't even need that; a dict that's ordered as long as you never
>> delete from it or expand it after initial creation is good enough, and
>> that may be simpler. (For example, Raymond Hettinger's two-table prototype
>> guarantees this much, even though it isn't order-preserving on deletes, and
>> it should be faster and more compact than the current design, although I
>> don't know if anyone's proven that part, and it's the "dead-simple once you
>> see it" kind of brilliant rather than the deep-magic kind.)
>
> IIUC, the PyPy dict is exactly a fully-worked-out version of Raymond
> Hettinger's two-table design, and they claim that it is in fact faster
> and more compact than the current design, so I suppose one could argue
> that someone has indeed proven that part :-).
According to the blog post, some cases are slower, "for example when repeatedly adding and removing keys in equal number".

For PyPy, that's obviously fine. PyPy generally isn't worried about small performance regressions between versions for relatively uncommon edge cases, especially when the new code is still much faster than CPython and enables "various optimization possibilities which we're going to explore in the near future", and so on.

I don't know whether the same applies to CPython. So it may be better to stick with Raymond's original design, which should be faster than 3.5 in all cases, not just most, require less and simpler code, and still provide the guarantee we actually need here (which will hopefully be an easier requirement for the other implementations as well).
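For anyone who hasn't seen the two-table idea, here's a minimal sketch of the shape of it: a sparse index table holds small integers pointing into a dense, insertion-ordered entries list. This is illustrative only, not Raymond's prototype or PyPy's implementation, and the names and the resize threshold are made up for the example; note that insertion order falls out for free as long as you never delete.

```python
class CompactDict:
    """Toy sketch of a two-table ("compact") dict: sparse index table
    plus a dense, insertion-ordered entries list. Illustrative only."""

    FREE = -1

    def __init__(self, size=8):
        self.indices = [self.FREE] * size   # sparse hash table of entry indices
        self.entries = []                   # dense (hash, key, value) triples

    def _probe(self, key):
        # Linear probing for simplicity; real designs use fancier open addressing.
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            idx = self.indices[i]
            if idx == self.FREE or self.entries[idx][1] == key:
                return i, idx
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        # Grow the sparse table before it gets too full (arbitrary 2/3 load).
        if len(self.entries) * 3 >= len(self.indices) * 2:
            self._resize()
        slot, idx = self._probe(key)
        if idx == self.FREE:
            self.indices[slot] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:
            self.entries[idx] = (hash(key), key, value)

    def __getitem__(self, key):
        _, idx = self._probe(key)
        if idx == self.FREE:
            raise KeyError(key)
        return self.entries[idx][2]

    def _resize(self):
        # Only the small sparse table is rebuilt; the dense entries stay put.
        self.indices = [self.FREE] * (len(self.indices) * 2)
        mask = len(self.indices) - 1
        for n, (h, _, _) in enumerate(self.entries):
            i = h & mask
            while self.indices[i] != self.FREE:
                i = (i + 1) & mask
            self.indices[i] = n

    def keys(self):
        # Iteration order is exactly insertion order, by construction.
        return [k for _, k, _ in self.entries]
```

The memory win comes from the sparse table holding one small integer per slot instead of a full hash/key/value triple, and the ordering guarantee comes from appending to the dense list.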
As an aside, IIRC, the rejected "blist" proposal from a few years ago sped up every benchmark, and also provided all the guarantees we need here. (I think it used a flat a-list for tiny dicts, a B-tree variant for medium-sized dicts and all non-tiny literals, and a hash table once you resized a dict beyond a certain cutoff, or something like that.) It's obviously more complicated than the Raymond design or the PyPy design, and was presumably rejected for a good reason, but it's more evidence that the requirement may not be too strenuous.
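The size-tiered idea can be sketched in a few lines: keep a flat association list while the mapping is tiny, and switch representations past a cutoff. This is not the actual blist code (which used a B-tree for the middle tier); the class name and the cutoff value are invented for the example, and I've used a plain dict as the large-size fallback.

```python
CUTOFF = 8  # arbitrary switch-over point for this sketch

class TieredMap:
    """Toy sketch of a size-tiered mapping: a flat (key, value) list for
    tiny sizes, a hash table beyond CUTOFF. Illustrative only."""

    def __init__(self):
        self._pairs = []    # tiny tier: flat, insertion-ordered a-list
        self._table = None  # large tier: plain dict once we pass CUTOFF

    def __setitem__(self, key, value):
        if self._table is not None:
            self._table[key] = value
            return
        # Linear scan is cheap at tiny sizes and needs no hashing at all.
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))
        if len(self._pairs) > CUTOFF:
            self._table = dict(self._pairs)  # promote to the hash tier
            self._pairs = None

    def __getitem__(self, key):
        if self._table is not None:
            return self._table[key]
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)
```

The appeal for literals is that a small literal never pays for a hash table at all, while big dicts behave like big dicts.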