[Python-ideas] Contiguous-array-based ordering for OrderedDict

Andrew Barnert abarnert at yahoo.com
Sat Dec 12 06:06:55 EST 2015


On Dec 12, 2015, at 01:27, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> 
> Here's Tim Peters's 2013 explanation of OrderedDict:
> http://stackoverflow.com/a/18951209/2963903
> 
> He said:
>> But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.
> 
> I didn't see why this would need to be the case.
> 1. If you start searching from the front, then each element would be
> inspected or shifted, never both. This would mean only one full
> iteration.

Even if Tim is wrong about it being O(N) twice over, it's still O(N) once over, instead of O(1) like the current implementation.

To remove the Kth element takes K compares, and N-K moves. Yes, that N-K can be one big memmove, which can reduce the constant factor by orders of magnitude... But it's still linear work that the linked list version doesn't have to do, and N/100 is still worse than 1. 

Plus, the K compares can't be similarly accelerated, and on average K is N/2.

The current implementation, on the other hand, takes one hash lookup, one hash deletion, and two node pointer twiddles, which means it's O(1).
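To make the contrast concrete, here is a stripped-down sketch of the linked-list approach (roughly what the pure-Python OrderedDict does internally; class and method names here are illustrative, not the real internals). Deletion is one hash pop plus two pointer assignments, independent of N:

```python
class _Node:
    __slots__ = ('key', 'prev', 'next')
    def __init__(self, key):
        self.key = key
        self.prev = self.next = None

class TinyOrderedSet:
    """Keeps keys in insertion order; discard() is O(1)."""
    def __init__(self):
        self._map = {}                    # key -> node: O(1) find
        self._root = root = _Node(None)   # circular sentinel node
        root.prev = root.next = root

    def add(self, key):
        if key in self._map:
            return
        root = self._root
        node = _Node(key)
        last = root.prev
        last.next = node
        node.prev = last
        node.next = root
        root.prev = node
        self._map[key] = node

    def discard(self, key):
        node = self._map.pop(key)   # one hash lookup + one hash deletion
        node.prev.next = node.next  # two pointer twiddles; no other
        node.next.prev = node.prev  # node or hash entry is touched

    def __iter__(self):
        node = self._root.next
        while node is not self._root:
            yield node.key
            node = node.next
```

No compares against other keys, no shifting: the rest of the structure is untouched by a deletion.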

> 2. You don't have to shift for each deletion. You can wait until some
> threshold is reached before shifting, and maybe that will spread out
> the cost of shift among deletions enough to make an impact.

I don't think this would help nearly as much as you think. Keeping up to half the array as deleted slots makes things more complex and doubles the extra storage. Worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are lost in the increased search time: if the array is twice as big, the Kth real element sits around index 2K. So, instead of K + (N-K), you have 2K + x(N-K), and no matter how good that x speedup is, the 2K term is going to kill you.

Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine you had two functions: one always takes 1.1ms, the other usually takes 100us but every 1000th call takes 1s (and freezes up the whole interpreter while doing so). Which one would you choose?
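Working out the arithmetic for those two hypothetical functions (numbers taken from the example above, in microseconds) shows why: the averages come out almost identical, but the worst cases differ by almost three orders of magnitude.

```python
# Steady function: always 1.1 ms.
steady_cost = 1100

# Spiky function: 100 us per call, except every 1000th call costs 1 s.
spiky_costs = [100] * 999 + [1_000_000]

steady_avg = steady_cost                              # 1100 us
spiky_avg = sum(spiky_costs) / len(spiky_costs)       # ~1099.9 us

steady_worst = steady_cost                            # 1100 us
spiky_worst = max(spiky_costs)                        # 1_000_000 us
```

Nearly the same throughput, but the spiky version stalls for a full second at unpredictable points, which matters for anything latency-sensitive.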

> 3. He then mentions that the linkedlist implementation uses a second
> dict to map keys to their list nodes. Well, that would solve the cost
> of lookup for lists, too.

No it won't. With a linked list, you only have to update a single hash entry when a node is inserted or deleted, because the rest of the nodes are unchanged. But with an array, you have to update half the hash entries, because half of the indices are shifted. (And, again, spreading out the work isn't going to help unless you can actually reduce the actual amount of work to something sublinear.)
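Here is a small illustration of that invalidation cost, using a plain list for the order and a dict mapping keys to indices (a toy model of the proposal, not anyone's actual implementation): one deletion forces an update for every key that got shifted.

```python
order = ['a', 'b', 'c', 'd', 'e']
index_of = {k: i for i, k in enumerate(order)}

# Delete 'b': the list shift is O(n), and every key after the
# deletion point now has a stale index in the map.
i = index_of.pop('b')
del order[i]                 # O(n) shift of the tail
for k in order[i:]:          # O(n) hash-entry updates
    index_of[k] -= 1
```

With a linked list, none of those entries would need touching, because nodes never move.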

> So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm
> not really sure what to do next, in terms of testing and comparing it
> against the existing implementation,

There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.
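As an even cheaper first step, timeit can compare the two classes on a toy insert/delete workload (the workload here is made up purely for illustration; real benchmark suites will exercise more realistic patterns):

```python
import timeit
from collections import OrderedDict

def churn(cls, n=1000):
    """Insert n keys, then delete every other one."""
    d = cls()
    for i in range(n):
        d[i] = i
    for i in range(0, n, 2):
        del d[i]
    return len(d)

dict_time = timeit.timeit(lambda: churn(dict), number=100)
od_time = timeit.timeit(lambda: churn(OrderedDict), number=100)
```

Swapping in the array-based ListDict for `cls` would give a rough first comparison before doing anything more rigorous.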

> and whether it's worth pushing
> for such a design for OrderedDict (current Python version or future C
> version).

What would be the benefit of such a design? Unless it's a huge performance win, I don't see what your argument would be. (Sure, it's simpler to describe, but the implementation is more complex, so it probably doesn't serve as better example code.)

> It's enough for me if I can figure out how this compares in
> performance, both algorithmically and actually, to the current
> implementation.
> 
> --------------------------------------------------------------------
> 
> First idea: (link: http://pastebin.com/LESRktJw)
> 
>    Data members (with ListDict() as od):
>    - super() : dict[key] => value
>    - od.__map : dict[key] => index where od.__list[index] = key
>    - od.__list : list[index] => key or `_sentinel`
>    - od.__size : int - number of non-sentinel elements in list (aka len(dict))
>        - This might be replaceable by len(self).
> 
> 
>    Some algorithms to note:
>    - .__setitem__: Add it to the end of __list if it's new.
>    - .__delitem__: Replace the __list entry with `sentinel`, and
> attempt to compact the list.
>    - .__compact: Called when items are removed. Naive threshold: If
> the list is 50% empty, then make a new list and update the indices for
> each key.
>    - (iterator views): Simply yield things that aren't `sentinel`.
>    - .move_to_end(last=False): The biggest loss. Adding to the front
> of the list will shift potentially many things up. Maybe if the list
> were a deque, this would be faster.

Honestly, I'd just leave that out of the preliminary tests. If everything else gets faster, then you can worry about how to make this faster as well.
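For reference, the quoted design compresses down to something like the following sketch: tombstone deletions plus compaction once the list is half empty. (This is illustrative only; the pastebin version implements the full MutableMapping interface and the naming differs.)

```python
_sentinel = object()

class ListDict(dict):
    """Toy array-ordered dict: order kept in a list with tombstones."""
    def __init__(self):
        super().__init__()
        self._map = {}     # key -> index into self._list
        self._list = []    # key, or _sentinel for a deleted slot

    def __setitem__(self, key, value):
        if key not in self:
            self._map[key] = len(self._list)
            self._list.append(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._list[self._map.pop(key)] = _sentinel
        if len(self._list) >= 2 * len(self):   # naive 50% threshold
            self._compact()

    def _compact(self):
        # Rebuild the list and re-index every surviving key: O(n).
        self._list = [k for k in self._list if k is not _sentinel]
        self._map = {k: i for i, k in enumerate(self._list)}

    def __iter__(self):
        return (k for k in self._list if k is not _sentinel)
```

The `_compact` rebuild is exactly the linear cost being argued about above, just paid in batches instead of per-deletion.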

>    The actual length of __list will slow down iteration through the
> list, but the factor of O(n) is dependent on the threshold value. And
> the code itself MIGHT be faster per-element, and maybe (big maybe)
> easier for the hardware.

Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.) 

>    Needs special review:
>    - __sizeof__: I have no idea if I did this right.
>    - (iterators): Can they be made to sometimes detect
> additions/removals during iteration?
>    - Exception safety: E.g. in __setitem__, the order of updates for
> the metadata might leave the dict in an invalid state.
> 
>    Also needs review:
>    - I put a lot of global calls as default args, but the existing
> use of default args for static name lookup seemed inconsistent.
> 
> --------------------------------------------------------------------
> 
> Second idea: thicker wrapper.
> 
>    Store a key,value,index triple in the internal dictionary, instead
> of just the value.
> 
>    This will have a wrapper cost on __getitem__ (have to get the
> value from the triple), but will save many lookups for adding and
> removing elements (which is probably the less-common use).

I actually thought about collapsing the linked list implementation in the same way. But I think there are many uses where a 2x slower lookup hurts a lot more than a 50% faster mutate helps. And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)
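For concreteness, the triple-based storage looks roughly like this (a hedged sketch of the second idea as quoted below; names and details are mine, not from the pastebin code). Mutations save a lookup, but every read pays an extra unwrap:

```python
class TripleDict:
    """Toy version of idea 2: inner dict holds [key, value, index]."""
    def __init__(self):
        self._store = {}   # key -> [key, value, index] triple
        self._list = []    # triple, or None for a deleted slot

    def __setitem__(self, key, value):
        triple = self._store.get(key)
        if triple is not None:
            triple[1] = value                      # update in place
        else:
            triple = [key, value, len(self._list)]
            self._store[key] = triple
            self._list.append(triple)

    def __getitem__(self, key):
        return self._store[key][1]   # the extra unwrap on every read

    def __delitem__(self, key):
        triple = self._store.pop(key)   # one lookup yields the index
        self._list[triple[2]] = None

    def __iter__(self):
        return (t[0] for t in self._list if t is not None)
```

Since the list holds the same triple objects, no second key-to-index dict is needed, which is the point of the design.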

>    (At the C level, it should be possible to keep multiple storage
> dictionaries in lockstep. Also, the triple doesn't need to be exposed
> to the Python interface, right?)

Are you suggesting increasing every hash bucket in every dictionary by an extra pointer, just to speed up OrderedDict? That would definitely slow down regular dicts, and increase their memory use, and they get used a lot more often.

>    Type:
>        class _DictStorage(object):
>            __slots__ = ['key', 'value', 'index']
> 
> 
>    Members:
>        (external): ListDict[key] => value
>        super() : dict[key] => [key, value, index]
>        .__list : list[i] => [key, value, index] (same as above) or None
>        .__size : as in the other implementation. Probably just as unnecessary.
>        .__map : (removed; unneeded)
> 
> 
>    Some method implementations:
>        .__getitem__:
>            Grab the [k,v,i] from the inner dict and unwrap it.
>        .__setitem__:
>            If the triple exists, update its value.
>            If it doesn't exist, add in [k, v, len(__list)-1]
>                and append it to __list.
>            (This saves a lookup in the internal dict.)
>        .__delitem__:
>            inner.pop(key), and use the index to None it on the list.
>        (iteration):
>            Same as the first ListDict implementation, except using None
>            instead of _sentinel.
>        .move_to_end, .__compact:
>            Same as the first ListDict, except I won't have to
>            dict.get the keys to update their indices.
> 
> 
> Third idea: Well, I have to learn PyPy's implementation of OrderedDict.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

