Contiguous-array-based ordering for OrderedDict

Here's Tim Peters's 2013 explanation of OrderedDict: http://stackoverflow.com/a/18951209/2963903 He said:
But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.
I didn't see why this would need to be the case.

1. If you start searching from the front, then each element would be inspected or shifted, never both. This would mean only one full iteration.
2. You don't have to shift for each deletion. You can wait until some threshold is reached before shifting, and maybe that will spread out the cost of shifts among deletions enough to make an impact.
3. He then mentions that the linked-list implementation uses a second dict to map keys to their list nodes. Well, that would solve the cost of lookup for lists, too.

So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm not really sure what to do next, in terms of testing and comparing it against the existing implementation, and whether it's worth pushing for such a design for OrderedDict (current Python version or future C version). It's enough for me if I can figure out how this compares in performance, both algorithmically and actually, to the current implementation.

--------------------------------------------------------------------

First idea: (link: http://pastebin.com/LESRktJw)

Data members (with ListDict() as od):
- super() : dict[key] => value
- od.__map : dict[key] => index, where od.__list[index] = key
- od.__list : list[index] => key or `_sentinel`
- od.__size : int - number of non-sentinel elements in the list (aka len(dict)). This might be replaceable by len(self).

Some algorithms to note:
- .__setitem__: Add it to the end of __list if it's new.
- .__delitem__: Replace the __list entry with `_sentinel`, and attempt to compact the list.
- .__compact: Called when items are removed. Naive threshold: if the list is 50% empty, then make a new list and update the indices for each key.
- (iterator views): Simply yield things that aren't `_sentinel`.
- .move_to_end(last=False): The biggest loss. Adding to the front of the list will shift potentially many things up. Maybe if the list were a deque, this would be faster.

The actual length of __list will slow down iteration through the list, but the factor of O(n) depends on the threshold value. And the code itself MIGHT be faster per-element, and maybe (big maybe) easier for the hardware.

Needs special review:
- __sizeof__: I have no idea if I did this right.
- (iterators): Can they be made to sometimes detect additions/removals during iteration?
- Exception safety: e.g. in __setitem__, the order of updates for the metadata might leave the dict in an invalid state.

Also needs review:
- I put a lot of global calls as default args, but the existing use of default args for static name lookup seemed inconsistent.

--------------------------------------------------------------------

Second idea: thicker wrapper. Store a (key, value, index) triple in the internal dictionary, instead of just the value. This will have a wrapper cost on __getitem__ (have to get the value from the triple), but will save many lookups for adding and removing elements (which is probably the less common use). (At the C level, it should be possible to keep multiple storage dictionaries in lockstep. Also, the triple doesn't need to be exposed to the Python interface, right?)

Type:
    class _DictStorage(object):
        __slots__ = ['key', 'value', 'index']

Members:
- (external): ListDict[key] => value
- super() : dict[key] => [key, value, index]
- .__list : list[i] => [key, value, index] (same object as above) or None
- .__size : as in the other implementation. Probably just as unnecessary.
- .__map : (removed; unneeded)

Some method implementations:
- .__getitem__: Grab the [k,v,i] from the inner dict and unwrap it.
- .__setitem__: If the triple exists, update its value. If it doesn't exist, add [k, v, len(__list)] and append it to __list. (This saves a lookup in the internal dict.)
- .__delitem__: inner.pop(key), and use the index to None it on the list.
- (iteration): Same as the first ListDict implementation, except using None instead of _sentinel.
- .move_to_end, .__compact: Same as the first ListDict, except I won't have to dict.get the keys to update their indices.

(A rough sketch of this second idea appears just after this message.)

Third idea: Well, I have to learn PyPy's implementation of OrderedDict.
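A minimal sketch of the second idea, for concreteness. The names TripleDict and _Entry are invented here, and the sketch wraps an inner dict rather than subclassing dict as the message proposes, to stay short; compaction is elided:

    from collections.abc import MutableMapping

    class _Entry:
        # The (key, value, index) triple, as a slotted object.
        __slots__ = ['key', 'value', 'index']

    class TripleDict(MutableMapping):
        def __init__(self):
            self._storage = {}  # key => _Entry (the internal dictionary)
            self._order = []    # position => _Entry, or None for a hole

        def __getitem__(self, key):
            # The wrapper cost: unwrap the triple on every lookup.
            return self._storage[key].value

        def __setitem__(self, key, value):
            entry = self._storage.get(key)  # a single internal lookup
            if entry is not None:
                entry.value = value         # existing key: update in place
                return
            entry = _Entry()
            entry.key, entry.value, entry.index = key, value, len(self._order)
            self._storage[key] = entry
            self._order.append(entry)

        def __delitem__(self, key):
            # One pop serves both structures: the triple carries its index.
            entry = self._storage.pop(key)
            self._order[entry.index] = None
            # (Compaction at a density threshold would go here, as in idea 1.)

        def __iter__(self):
            return (e.key for e in self._order if e is not None)

        def __len__(self):
            return len(self._storage)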

On Dec 12, 2015, at 01:27, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Even if Tim is wrong about it being O(N) twice over, it's still O(N) once over, instead of O(1) like the current implementation. To remove the Kth element takes K compares, and N-K moves. Yes, that N-K can be one big memmove, which can reduce the constant factor by orders of magnitude... But it's still linear work that the linked list version doesn't have to do, and N/100 is still worse than 1. Plus, the K compares can't be similarly accelerated, and on average K is N/2. The current implementation, on the other hand, takes one hash lookup, one hash deletion, and two node pointer twiddles, which means it's O(1).
I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex and doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the whole interpreter while doing so). Which one would you choose?
No it won't. With a linked list, you only have to update a single hash entry when a node is inserted or deleted, because the rest of the nodes are unchanged. But with an array, you have to update half the hash entries, because half of the indices are shifted. (And, again, spreading out the work isn't going to help unless you can actually reduce the actual amount of work to something sublinear.)
There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.
What would be the benefit of such a design? Unless it's a huge performance win, I don't see what your argument would be. (Sure, it's simpler to describe--but the implementation is more complex, so it probably doesn't serve as better example code.)
Honestly, I'd just leave that out of the preliminary tests. If everything else gets faster, then you can worry about how to make this faster as well.
Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.)
I actually thought about collapsing the linked list implementation in the same way. But I think there are many uses where a 2x slower lookup hurts a lot more than a 50% faster mutate helps. And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)
Are you suggesting increasing every hash bucket in every dictionary by an extra pointer, just to speed up OrderedDict? That would definitely slow down regular dicts, and increase their memory use, and they get used a lot more often.

On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote:
All this is true for an ordinary dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need to resize the storage and fill the new storage, with O(N) complexity. Due to the unpredictability of hashes of strings and pointers, the number of collisions and resizings is not predictable, and the run time can vary significantly from run to run.

On Dec 12, 2015, at 04:19, Serhiy Storchaka <storchaka@gmail.com> wrote:
That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times.

On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote:
Either you or I misunderstood the OP's proposition. An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. Therefore the amortized cost is constant in the worst case. All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1]

[1] https://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-spac...

On Dec 12, 2015, at 15:20, Serhiy Storchaka <storchaka@gmail.com> wrote:
You've got two halves of a process that both take N/2 time. If you turn one of those halves into amortized constant time, your total time is still linear. (Maybe in some cases you've made it twice as fast, but that's still linear--but at any rate, in this case, he hasn't really made it twice as fast, because the first half, the linear search, now has twice as much to search.) So he's adding linear chunkiness, without reducing the total time. That isn't the same as a normal dict, which adds only logarithmic chunkiness (it does its extra N work log N times, instead of extra N work N/C times), and in service of reducing the total time from linear to amortized constant. It's possible that there's some other optimization here that the OP didn't include (or that I didn't notice and reply to here) that you've intuitively assumed, which can get us back to constant time. If so, that would be great. (But if there were a way to get constant-time inserts and deletes at arbitrary locations within an array, I suspect nobody would have invented linked lists; am I missing something?)
All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1]
But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). For example:

>>> d = Dict(((1,1), (2,2), (3,3), (4,4)))
>>> list(d)
[1, 2, 3, 4]
>>> del d[1]
>>> list(d)
[4, 2, 3]

And constant-time deletes (or moves) without destroying order is exactly the problem that the hash of linked list nodes in the current design is there to solve. If you don't care about that problem, you don't need that design, but if you're building a general replacement for OrderedDict, you need it (or something to replace it). Of course an ordered mapping that doesn't support deletions (or moves) could still be useful for lots of things (e.g., most of the time people say they want ordering for a **kw or a class dict, they don't need to delete or move anything).

I'm not following this thread all that closely, but am I the only one who thinks that the LRU cache (discussed in another thread) could be implemented with much less code on top of OrderedDict? Basically whenever you access a key you just move it to the front, and when you add a key when it is already at capacity you delete the first one. -- --Guido van Rossum (python.org/~guido)

On 13.12.15 02:27, Guido van Rossum wrote:
I have doubts about this. First, an ordered dictionary is only a part of the LRU cache implementation; using OrderedDict wouldn't make the code much smaller. Second, the LRU cache needs only a small and very limited part of OrderedDict, with different requirements for reentrancy, thread-safety, and error handling (partially stronger, partially more lenient). The code currently used by the LRU cache is much simpler, likely faster, and less buggy.

On Sat, Dec 12, 2015 at 5:16 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Fair enough. Perhaps it would have made more sense if we had a C version of OrderedDict but not of lru_cache. Still, two things: (1) Anything written in Python that implements a linked list feels like a code smell to me (it's hard to make up for the constant factor with a better algorithm unless the data structure is pretty large). (2) If you want to implement something that's close to lru_cache but not quite (e.g. one where items also come with a timeout, like DNS lookups), you can't subclass its implementation, because it's all hidden in a function. But you can wrap a little bit of extra logic around the 10 or so lines it takes to implement a decent LRU cache on top of OrderedDict. -- --Guido van Rossum (python.org/~guido)
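For concreteness, a sketch of the kind of thing being described here -- about a dozen lines on top of OrderedDict, using the most-recently-used-at-the-end convention that move_to_end() and popitem(last=False) make natural (the class name is invented):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, maxsize):
            self.maxsize = maxsize
            self._data = OrderedDict()

        def __getitem__(self, key):
            value = self._data[key]
            self._data.move_to_end(key)  # accessed: now most recently used
            return value

        def __setitem__(self, key, value):
            if key in self._data:
                self._data.move_to_end(key)
            elif len(self._data) >= self.maxsize:
                self._data.popitem(last=False)  # evict least recently used
            self._data[key] = value

Subclassing or wrapping this to add, say, per-entry timeouts is exactly the kind of tweak that the hidden-in-a-closure lru_cache doesn't allow.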

On 13.12.15 02:06, Andrew Barnert via Python-ideas wrote:
What is linear time in what you are saying? I don't see anything that doesn't have amortized constant complexity. What have I missed?
All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1]
But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that).
This is only because preserving ordering is not a goal for a dict. Moving the last entry into the place of the deleted one is the simplest design if you don't care about ordering. But it is not hard to make the ordering be preserved (at the cost of a larger amortized constant time and memory usage). A deleted entry should just be marked as deleted, and the storage should be packed only when half (or 1/3, or 1/4) of the entries are deleted.

On Dec 12, 2015, at 17:03, Serhiy Storchaka <storchaka@gmail.com> wrote:
Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Sure. But it is the goal for OrderedDict. Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. But Raymond's array doesn't (and isn't intended to).
Moving the last entry into the place of the deleted one is the simplest design if you don't care about ordering.
But it is not hard to make the ordering be preserved (at the cost of a larger amortized constant time and memory usage). A deleted entry should just be marked as deleted, and the storage should be packed only when half (or 1/3, or 1/4) of the entries are deleted.
The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array ever sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.)

FWIW, I think PyPy's dict implementation is an advanced version of Raymond's. It is more compact and preserves order on deletion. http://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.ht...
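For readers who don't want to chase the links: the core of both designs is a small sparse table of indices plus a dense array of entries kept in insertion order. A rough sketch of that layout follows (field names invented; linear probing stands in for the real perturbed probing, and deletion and resizing -- the contested parts of this thread -- are omitted):

    class CompactDict:
        # Illustrative only: fixed 8-slot table, so keep it under 8 entries.
        FREE = -1

        def __init__(self):
            self.indices = [self.FREE] * 8  # sparse: slot -> entry position
            self.entries = []               # dense: [hash, key, value] rows

        def _slot_for(self, key):
            mask = len(self.indices) - 1
            i = hash(key) & mask
            while True:
                idx = self.indices[i]
                if idx == self.FREE or self.entries[idx][1] == key:
                    return i
                i = (i + 1) & mask          # probe the next slot

        def __setitem__(self, key, value):
            i = self._slot_for(key)
            if self.indices[i] == self.FREE:
                self.indices[i] = len(self.entries)
                self.entries.append([hash(key), key, value])
            else:
                self.entries[self.indices[i]][2] = value

        def __getitem__(self, key):
            i = self._slot_for(key)
            if self.indices[i] == self.FREE:
                raise KeyError(key)
            return self.entries[self.indices[i]][2]

Iteration just walks self.entries, which is why insertion order falls out as a side effect.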

On 13.12.15 06:57, Andrew Barnert via Python-ideas wrote:
Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time.
Searching a linked list for a value takes linear time too. To avoid it, an auxiliary hashtable is used. In the current Python implementation it maps each key to a list node; in the proposed implementation it maps each key to an index in a contiguous array. There is no linear search anymore; this problem is already solved.
He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here.
He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below.
What do you mean by "change half the hash values for each shift", and why is this a problem?
What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Let's distinguish between two scenarios.

In the first scenario, the dictionary is only growing. Every O(N) additions you need to resize and rebuild a hashtable of size O(N). This gives amortized constant time per addition. In the proposed OrderedDict design you also need to resize an array and resize and rebuild an auxiliary hashtable; both have linear complexity. The resulting amortized time is constant again.

In the second scenario, the dictionary has an equal number of additions and deletions, and its size doesn't change. For a dict this doesn't add anything to the base cost of addition and deletion. In the proposed OrderedDict design, since almost every addition appends an item to an array, periodic repacking is needed. It has linear cost (repacking the array and updating the auxiliary hashtable). The total amortized cost of one addition or deletion operation is constant.

The implementation with a contiguous array has about the same complexity as the implementation with a linked list. The difference is only in the constant multiplier, and without real C code we can't determine which constant is lower.

Actually, the current C implementation of OrderedDict uses a contiguous array for mapping an index in the base hashtable to a list node. It is rebuilt from the linked list when the base hashtable is rebuilt. I had planned to experiment with getting rid of the linked list altogether.
The benefit of Raymond's design is that it lets you iterate over a contiguous array instead of jumping forward and backward between list nodes randomly distributed in memory. And on modern computers, iterating even a sparse array can be faster than iterating a linked list. [*] This is not the only benefit of Raymond's design; the other is a presumably more compact representation.

[*] Of course, benchmarks are needed to prove this.

On Dec 14, 2015, at 06:58, Serhiy Storchaka <storchaka@gmail.com> wrote:
This is already covered in the very next paragraph.
If you delete a linked list node, none of the other list nodes change. So, if you have hash entries pointing at all of the list nodes, only one hash entry has to be changed for a delete. If you delete an array element, all of the subsequent indices change. So, if you have hash entries pointing at the indices, on average, half of the hash entries have to be changed for a delete. And if you're about to say "but combining the two solves that problem", that's already in the very next paragraph, so please read on.
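A tiny demonstration of the difference, with a plain list and dict standing in for the order array and the key-to-index hashtable:

    order = ['a', 'b', 'c', 'd', 'e']
    index = {k: i for i, k in enumerate(order)}  # key -> position, like __map

    del order[1]  # remove 'b' in place, shifting the tail down

    stale = [k for k in order if index[k] != order.index(k)]
    print(stale)  # ['c', 'd', 'e']: every key after the hole now has a
                  # wrong stored index, so half the entries need updating
                  # on average; a map of list nodes has no such invalidation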
What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Let's distinguish between two scenarios. In the first scenario, the dictionary is only growing. Every O(N) additions you need to resize and rebuild a hashtable of size O(N).
If that were true, dicts would be amortized linear time. You do O(N) work O(N) times, divided over O(N) operations, which means amortized O(N*N/N), which is linear. The fact that dict (and list) grow exponentially rather than linearly is critical. It means that, even in the case of nothing but inserts, you only rehash O(log N) times, not O(N), and the work done each time also only grows logarithmically. The total work done for all resizes ends up being just a constant times N (think about the sum of 1 + 1/2 + 1/4 + 1/8 + ... vs. 1 + 9/10 + 8/10 + ...), so, divided by N operations, you get amortized O(C*N/N), which is constant.
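A toy model makes the difference concrete -- count the elements copied during resizes under each growth policy (numbers in the comments are approximate):

    def copy_work(n_inserts, grow):
        cap, size, work = 8, 0, 0
        for _ in range(n_inserts):
            if size == cap:
                work += size        # a resize copies every element
                cap = grow(cap)
            size += 1
        return work

    n = 10**6
    print(copy_work(n, lambda c: c * 2) / n)     # ~1: doubling, amortized O(1)
    print(copy_work(n, lambda c: c + 1024) / n)  # ~490: fixed steps, amortized O(N)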
This gives amortized constant time per addition. In the proposed OrderedDict design you also need to resize an array and resize and rebuild an auxiliary hashtable; both have linear complexity. The resulting amortized time is constant again.
Sure, but this just shows that the fact that deletes are linear time doesn't matter if you never do any deletes.
In the second scenario, the dictionary has an equal number of additions and deletions, and its size doesn't change. For a dict this doesn't add anything to the base cost of addition and deletion. In the proposed OrderedDict design, since almost every addition appends an item to an array, periodic repacking is needed. It has linear cost (repacking the array and updating the auxiliary hashtable). The total amortized cost of one addition or deletion operation is constant.
No, deletion operations do O(N) work O(M) times amortized over M, where N is the dict size and M the number of operations. It's linear in the max size of the dict. In the special case where the dict never changes size, you could just declare that a constant--but that's pretty misleading. For one thing, it's usually going to be a very large constant. For example, if you start with 1000 elements and do 2000 deletes and 2000 inserts, your constant is N/4. More importantly, a constant-sized dict with lots of mutations is a pretty rare use case, and in the more common case where there are more inserts than deletes, so the dict is growing, the delete times are increasing with the dict size. Unless the ratio of deletes to inserts is falling at the same rate (another rare special case), this isn't going to be constant. And if you're wondering how dict gets away with this, it's kind of cheating by just not generally shrinking on deletes, but that's a cheat that works just fine for the majority of real-life use cases (and it's hard to work around in the vast majority of cases where it doesn't). That obviously won't work here--if a dict later regrows, it automatically uses up all the deleted buckets, but if an array-based OrderedDict later regrows, it will just append onto the end, leaving all the holes behind. You have no choice but to compact those holes at some point, and there's no way to make that constant time in an array.
The implementation with a contiguous array has about the same complexity as the implementation with a linked list. The difference is only in the constant multiplier, and without real C code we can't determine which constant is lower.
If your argument were correct, you could always simulate a linked list with an array with only a constant-factor cost, and that just isn't true.
The benefit of Raymond's design over the existing dict, as he explains in the ActiveState post, is that you can iterate (and store some of the data in) the dense array rather than iterating the hash table (which is a sparse array). He can maintain that denseness even in the face of deletions because he isn't trying to preserve order, so he can always just swap the last value into the deleted slot. If you try to use the same design to preserve order, you can't do that. If you instead mark deleted slots as deleted, and recompact every time it gets to 50% deleted, then you're iterating a sparse array that's more sparse than the hash table, so you do not get the time (or space) benefits over a dict that Raymond's design offers.
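The swap-delete trick in isolation, reusing the entries-plus-index-map shape from the sketch upthread (names invented):

    def swap_delete(entries, index_of, key):
        i = index_of.pop(key)       # position of the entry being removed
        last = entries.pop()        # O(1): detach the final entry
        if i < len(entries):        # unless we just removed that very entry,
            entries[i] = last       # drop it into the hole...
            index_of[last[1]] = i   # ...and fix exactly one index: O(1),
                                    # still dense, but insertion order is gone

    entries = [[hash(k), k, k.upper()] for k in 'abcd']
    index_of = {k: i for i, k in enumerate('abcd')}
    swap_delete(entries, index_of, 'a')
    print([e[1] for e in entries])  # ['d', 'b', 'c']: density kept, order lost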
And on modern computers, iterating even a sparse array can be faster than iterating a linked list. [*]
Now you're proposing a completely different theoretical benefit from the one Raymond designed for and explained, so the analogy with his design doesn't buy you anything. You'd be better off sticking to analogy with the current plain dict, which, like your design, iterates over a sparse array, and is simpler to explain and analyze.
This is not the only benefit of Raymond's design; the other is a presumably more compact representation.
Yes, if you can move each 48-byte hash bucket into a dense array, leaving only a 4-byte integer in the sparse hash table, then you save 48 * load - 4 bytes per entry. But the design under discussion doesn't get that, because you're putting the large data in an array that's even sparser than the hash table, so you're losing space.

One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc.
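A bare-bones sketch of that list-of-arrays idea (names invented; a real version would bound chunk sizes and merge undersized neighbors):

    class Chunk:
        def __init__(self, items, nxt=None):
            self.items = items  # a contiguous array of elements
            self.next = nxt     # link to the following chunk

    def delete_at(chunk, i):
        # Split at position i: copy only this chunk's tail, never the
        # rest of the structure.
        chunk.next = Chunk(chunk.items[i + 1:], chunk.next)
        chunk.items = chunk.items[:i]

    head = Chunk(list('abcdef'))
    delete_at(head, 2)          # delete 'c'
    node, out = head, []
    while node:
        out.extend(node.items)
        node = node.next
    print(out)                  # ['a', 'b', 'd', 'e', 'f']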

Despite all the nice discussion about runtime and so on and so forth: couldn't he just provide a working implementation to prove his point? Maybe he'll find out himself that it's not working as intended, and in the course of doing so, he'll find an even better solution. Best, Sven On 14.12.2015 18:30, Andrew Barnert via Python-ideas wrote:

On Dec 14, 2015 1:11 PM, "Sven R. Kunze" <srkunze@mail.de> wrote:
Despite all the nice discussion about runtime and so on and so forth:
couldn't he just provide a working implementation to prove his point? Maybe he'll find out himself that it's not working as intended, and in the course of doing so, he'll find an even better solution.
I did. It is in a pastebin link in my original message, based on the 3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance on evaluating it. Maybe it wasn't seen because I used '-'*n to separate it from my intro, or maybe pastebin is so disliked that people couldn't see it. Here it is again: http://pastebin.com/LESRktJw

Sorry I haven't been responding. I wanted to respond on a PC instead of my phone because I don't see a plaintext option on the app, but I get distracted when on the PC.

My implementation combines the ideas I listed. Raymond's idea seems similar, but more clever with indexing and less focused on ordering. Regarding PyPy's, I get the impression that CPython can make an efficient implementation based on it. I think that's the real way to go for an efficient OrderedDict implementation.

By the way, here are the costs for some of the algorithms, compared to the raw Python OrderedDict:

- getitem: Unchanged.
- setitem: Amortized O(1), but for slightly different reasons (list expansion in addition to expansion of two maps). If new key, it will usually be better: no creation of a new object, and a lot less constant work in bytecode if the list isn't expanded (incrementing and appending instead of linking).
- delitem: Amortized O(1). A lot less bytecode overhead as above, but there can be a compaction here.
- (iteration): Iterating over a 50% empty array may be better than iterating over a linked list, especially if cache is taken into account.
  - Python version: For a 50% empty array, you pay len(odict) extra "if"s and "is"s, plus the cost of iterating over a (reversed, for reverse iteration) list of length 2*len(odict). The list version has a lot of extra dereferences instead.
  - Potential C version: It's just a simple for loop with a check of pointer identity, versus a chain of dereferences. It should definitely be better on the cache, maybe less good for branch prediction.
- pop: Same as delitem: possible compaction.
- move_to_end: Moving to the back is less constant work (though I do a lot of checks, and the C version might not have the same benefits). Moving to the front is awful: it shifts everything up. This can be alleviated by using a deque, or using the list as a list-backed deque: treat the array as circular. This would add overhead to everything else, though, so I don't like it. (How do I find out how popular the second parameter to move_to_end is?)

If using my second idea of storing inner_dict[key] = [key, value, index] (where index is into the order array), there would be a lot less overhead on many things, but more overhead on getitem. In C code, it might be barely any overhead.

On Sat, Dec 12, 2015 at 6:06 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.
I, uh, have no professional experience in programming. So it's not as easy for me to know how to start.
Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.)
When iterating the array, here are the costs per element:

- used: Same as the linked list, but without the linked-list overhead.
- unused: Check if the entry is valid. This doesn't require dereferencing.

It's possible that the branch for checking validity costs more than the linked-list overhead, but that's the only way I can think of that the array would be worse at iterating, even when sparse.
And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)
I'm pretty sure that all C code uses the C version of __getitem__ on the dict, because it's NOT statically known which __getitem__ function to call. There's a pointer in each dict to its __getitem__ function. On Sat, Dec 12, 2015 at 7:34 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
The performance of the Python implementation doesn't matter much now (as long as it has the same computational complexity).
It might matter a little, for alternative Python implementations which haven't yet made their own optimized ODicts. On Mon, Dec 14, 2015 at 12:30 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc.
I think it's much better to just live with a half-empty array. But this would have to be proven with code and benchmarks.

As an aside, have you seen Stroustrup and Sutter talking about linked lists versus vectors?
- Stroustrup: https://www.youtube.com/watch?v=YQs6IC-vgmo
- Sutter: https://channel9.msdn.com/Events/Build/2014/2-661

Here's an example:
- Vector: Linear search for an element, then remove it by shifting everything down.
- List: Linear search for an element, then remove it by delinking.

Vector will beat list for even very large sizes, simply because the dereferencing in the search will be worse than the shifting (which would take advantage of prefetching, branch prediction, etc.).

For iteration of an OrderedDict based on an array, there won't be a branch prediction benefit, since the holes will be distributed solely based on how it's used. There might not even be a prefetching benefit, usually, because of how many objects might be involved during the iteration (that is, more memory is touched than with equivalent C++ code). But that's what testing is for, I guess.
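A rough Python analogue of that experiment, as a starting point -- CPython's constant factors are so different from C++'s that the outcome here proves nothing by itself:

    import textwrap
    import timeit

    setup = textwrap.dedent("""
        items = list(range(100000))

        class Node:
            __slots__ = ['value', 'next']

        head = None
        for v in reversed(items):
            n = Node(); n.value, n.next = v, head
            head = n
    """)

    # Vector style: linear search, then shift everything down (and undo).
    find_in_vector = "items.remove(50000); items.insert(50000, 50000)"

    # List style: linear node walk, then delink (and relink, to undo).
    find_in_list = textwrap.dedent("""
        prev, node = None, head
        while node.value != 50000:
            prev, node = node, node.next
        prev.next = node.next  # delink the found node
        prev.next = node       # relink it so every run sees the same list
    """)

    print(timeit.timeit(find_in_vector, setup, number=100))
    print(timeit.timeit(find_in_list, setup, number=100))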

In a message of Tue, 15 Dec 2015 04:09:55 -0500, "Franklin? Lee" writes:
That is the problem. We absolutely do not want links to things like pastebin. We want the code here, as part of the text. 5 years from now, when other people are scratching their heads saying, I wonder why Guido decided things the way he did, and whether that decision can and should be revisited, the first thing we will do is to go back and read all this discussion. And if the discussion is about code we can no longer see, because the pastebin has expired, then we won't be able to learn much. Anything that matters needs to be part of the archived discussion. Laura

On Tue, Dec 15, 2015 at 9:53 AM, Laura Creighton <lac@openend.se> wrote:
I feel similarly about information history, which is why I always set "Expire: Never" when I use pastebin :D. But alright. The rest of this message is the code as I had it when I sent the original message.

import sys as _sys  # needed by __sizeof__ below; missing from the paste
from operator import eq as _eq  # needed by __eq__ below; also missing
from collections.abc import *
from reprlib import recursive_repr as _recursive_repr
from collections import OrderedDict


class _ListDictKeysView(KeysView):

    def __reversed__(self):
        yield from reversed(self._mapping)


class _ListDictItemsView(ItemsView):

    def __reversed__(self):
        for key in reversed(self._mapping):
            yield (key, self._mapping[key])


class _ListDictValuesView(ValuesView):

    def __reversed__(self):
        for key in reversed(self._mapping):
            yield self._mapping[key]


_sentinel = object()


class ListDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    def __init__(*args, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries, but keyword arguments are not recommended
        because their insertion order is arbitrary.

        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'ListDict' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        try:
            self.__list  # (plays the role of self.__root in OrderedDict)
        except AttributeError:
            self.__map = {}
            self.__list = []
            self.__size = 0
        self.__update(*args, **kwds)

    def __setitem__(self, key, value, dict_setitem=dict.__setitem__, len=len):
        'od.__setitem__(i, y) <==> od[i]=y'
        # If it's a new key, we need to track it.
        if key not in self:
            self.__map[key] = len(self.__list)
            self.__list.append(key)
            self.__size += 1
        dict_setitem(self, key, value)

    def __delitem__(self, key, dict_delitem=dict.__delitem__,
                    sentinel=_sentinel):
        'od.__delitem__(y) <==> del od[y]'
        dict_delitem(self, key)
        # Remove the tracking for this item.
        index = self.__map.pop(key)
        self.__list[index] = sentinel
        self.__size -= 1
        self.__compact()

    def __iter__(self, sentinel=_sentinel):
        'od.__iter__() <==> iter(od)'
        for key in self.__list:
            if key is not sentinel:
                yield key

    def __reversed__(self, sentinel=_sentinel, reversed=reversed):
        'od.__reversed__() <==> reversed(od)'
        for key in reversed(self.__list):
            if key is not sentinel:
                yield key

    def clear(self):
        'od.clear() -> None.  Remove all items from od.'
        self.__list.clear()
        self.__map.clear()
        self.__size = 0
        dict.clear(self)  # dict.clear isn't cached?

    def popitem(self, last=True, sentinel=_sentinel, reversed=reversed,
                next=next):
        '''od.popitem() -> (k, v), return and remove a (key, value) pair.
        Pairs are returned in LIFO order if last is true or FIFO order if
        false.

        '''
        if not self:
            raise KeyError('dictionary is empty')
        if last:
            lst = reversed(self.__list)
        else:
            lst = self.__list
        # Could use the key lookup to find this, but... meh.
        # Note that attempting to compact first might have helped.
        index, key = next((i, k) for i, k in enumerate(lst)
                          if k is not sentinel)
        if last:
            # enumerate() counted from the back of the reversed view, so
            # convert to a real index.  (The paste used the reversed index
            # directly, which broke popitem(last=True).)
            index = len(self.__list) - 1 - index
        # We're calling dict.pop later, which won't handle
        # the metadata.
        del self.__map[key]
        self.__list[index] = sentinel
        self.__size -= 1
        self.__compact()
        value = dict.pop(self, key)  # dict.pop isn't cached?
        return key, value

    def __compact(self, sentinel=_sentinel, enumerate=enumerate,
                  reversed=reversed):
        '''Compact the order __list if necessary.'''
        # May need to use this a lot in the upcoming `else`.
        lst = self.__list
        if not lst:
            return
        if self.__size / len(lst) <= 0.5:  # chosen at non-random
            # Implementation 1: list comprehension.
            self.__list = [k for k in lst if k is not sentinel]
            # Implementation 2:
            # If only `list` had a `remove_all` method...

            # Update all indices after a reordering.  Should only be done
            # when full (because it shouldn't be necessary otherwise).
            inner_map = self.__map
            for index, key in enumerate(self.__list):
                inner_map[key] = index
        else:
            # Even if the list isn't mostly empty,
            # we can try to clear the back.
            # TODO: How can this be more efficient?
            # Note: There exists a non-sentinel because
            # otherwise, .__size/positive == 0 < positive.
            #
            # Implementation 1: Pop until it drops.
            # while lst[-1] is sentinel:
            #     lst.pop()
            # Implementation 2: Count the number of sentinels at the end.
            emptys = next(i for i, k in enumerate(reversed(lst))
                          if k is not sentinel)
            # Guaranteed not to StopIteration since .__size > 0.
            if emptys:
                # (The paste had `del lst[:-emptys]`, which trims the wrong
                # end; the guard avoids `del lst[-0:]` clearing everything.)
                del lst[-emptys:]

    def move_to_end(self, key, last=True, sentinel=_sentinel,
                    enumerate=enumerate, len=len):
        '''Move an existing element to the end (or beginning if last==False).

        Raises KeyError if the element does not exist.
        When last=True, acts like a fast version of self[key]=self.pop(key).

        '''
        index = self.__map[key]
        lst = self.__list
        if last:
            if index + 1 == len(lst):  # already last
                # Not sure if this is the right path to optimize.
                # But I think redundant move_to_ends shouldn't
                # blow up the __list.
                return
            lst[index] = sentinel
            if lst[-1] is sentinel:
                # Can just swap with last.
                lst[-1] = key
                self.__map[key] = len(lst) - 1
            else:
                # Append and maybe compact.
                lst.append(key)
                self.__map[key] = len(lst) - 1
                self.__compact()
        else:
            # This is costly.  But this shouldn't
            # be a common case anyway, right?
            # I mean, who repeatedly adds to the front
            # of an OrderedDict?
            # And this is basically the only costly
            # operation I'm adding.
            lst[index] = sentinel
            # Propagate forward from the front.
            for i, newkey in enumerate(lst):
                self.__map[key] = i
                lst[i], key = key, newkey
                if key is sentinel:
                    break

    def __sizeof__(self):
        sizeof = _sys.getsizeof
        size = sizeof(self.__dict__)    # instance dictionary
        size += sizeof(self.__map) * 2  # internal dict and inherited dict
        size += sizeof(self.__list)
        size += sizeof(self.__size)
        return size

    update = __update = MutableMapping.update

    def keys(self):
        "D.keys() -> a set-like object providing a view on D's keys"
        return _ListDictKeysView(self)

    def items(self):
        "D.items() -> a set-like object providing a view on D's items"
        return _ListDictItemsView(self)

    def values(self):
        "D.values() -> an object providing a view on D's values"
        return _ListDictValuesView(self)

    __ne__ = MutableMapping.__ne__

    __marker = object()

    def pop(self, key, default=__marker):
        '''od.pop(k[,d]) -> v, remove specified key and return the
        corresponding value.  If key is not found, d is returned if given,
        otherwise KeyError is raised.

        '''
        if key in self:
            result = self[key]
            del self[key]
            return result
        if default is self.__marker:
            raise KeyError(key)
        return default

    def setdefault(self, key, default=None):
        'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od'
        if key in self:
            return self[key]
        self[key] = default
        return default

    @_recursive_repr()
    def __repr__(self):
        'od.__repr__() <==> repr(od)'
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self.items()))

    def __reduce__(self):
        'Return state information for pickling.'
        inst_dict = vars(self).copy()
        for k in vars(ListDict()):
            inst_dict.pop(k, None)
        return self.__class__, (), inst_dict or None, None, iter(self.items())

    def copy(self):
        'od.copy() -> a shallow copy of od'
        return self.__class__(self)

    @classmethod
    def fromkeys(cls, iterable, value=None):
        '''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
        If not specified, the value defaults to None.

        '''
        self = cls()
        for key in iterable:
            self[key] = value
        return self

    def __eq__(self, other):
        '''od.__eq__(y) <==> od==y.  Comparison to another OD is
        order-sensitive while comparison to a regular mapping is
        order-insensitive.

        '''
        if isinstance(other, ListDict):
            return dict.__eq__(self, other) and all(map(_eq, self, other))
        return dict.__eq__(self, other)
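A quick sanity check of the class above (not part of the original message):

    ld = ListDict([('a', 1), ('b', 2), ('c', 3)])
    del ld['b']
    print(list(ld))            # ['a', 'c']: order survives the delete
    ld.move_to_end('a')        # triggers an append and then a compaction
    print(list(ld.items()))    # [('c', 3), ('a', 1)]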

Wow. Thanks for that. :) Well, is there some performance test scenario? Something which tests all possible interactions with OrderedDict, which could be used as the authoritative benchmark on this. Otherwise, I fear, it's just talking about possibilities nobody really evaluated properly. Best, Sven On 15.12.2015 16:00, Franklin? Lee wrote:

Franklin? Lee wrote:
I feel similarly about information history, which is why I always set "Expire: Never" when I use pastebin :D.
Never is a long time. In this context it really means "as long as pastebin itself continues to exist", which could be less than the time the python mailing list archives continue to exist. -- Greg

On Wed, Dec 16, 2015 at 8:07 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Or, more simply: As long as pastebin is accessible. It's entirely possible to create an offline archive of python-ideas posts (whether you see it as a mailing list or a newsgroup), and then to browse it at a time when you have no internet connection. That archive will be incomplete to the value of all externally-linked content. ChrisA

On Tue, Dec 15, 2015 at 7:00 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
But alright. The rest of this message is the code as I had it when I sent the original message.
Be sure to run the full OrderedDict test suite. [1] Also, I posted a rudimentary benchmark when working on the C implementation of OrderedDict that may help you. [2] Finally, be sure to respect OrderedDict's invariants. You may find my notes useful. [3]

-eric

[1] (on default branch) make && ./python -m test.test_ordered_dict
[2] "odict-speed.diff" on https://bugs.python.org/issue16991
[3] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l1

On 12.12.15 11:27, Franklin? Lee wrote:
Did you try to replace the standard OrderedDict implementation with your implementation and run the tests? Be aware that some details of the C implementation were changed and new tests were added in 3.5.1, so it is better to take 3.5.1 for testing.

Both current implementations of OrderedDict try to make OrderedDict thread-safe and to not leave an OrderedDict in an inconsistent state if hashing or comparison raises an exception. There is also an attempt to handle cases when an OrderedDict is changed bypassing the OrderedDict API (with direct calls of dict modifying methods: dict.__setitem__, dict.__delitem__, dict.update, etc.).

The performance of the Python implementation doesn't matter much now (as long as it has the same computational complexity). As for the C implementation, I can't forecast which approach will be faster.

On Dec 12, 2015, at 01:27, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Even if Tim is wrong about it being O(N) twice over, it's still O(N) once over, instead of O(1) like the current implementation. To remove the Kth element takes K compares, and N-K moves. Yes, that N-K can be one big memmove, which can reduce the constant factor by orders of magnitude... But it's still linear work that the linked list version doesn't have to do, and N/100 is still worse than 1. Plus, the K compares can't be similarly accelerated, and on average K is N/2. The current implementation, on the other hand, takes one hash lookup, one hash deletion, and two node pointer twiddles, which means it's O(1).
I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex, doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) are going to be lost in the increased search time: if the array is twice as big, the Kth real element is at 2K. So, instead of K + (N-K), you have 2K + x(N - K), and no matter how good that x speedup is, the 2K part is going to kill you. Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine that you had two functions, one which always takes 1.1ms, one which usually takes 100us but every 1000 times it takes 1s (and freezes up the interpreter whole doing so). Which one would you choose?
No it won't. With a linked list, you only have to update a single hash entry when a node is inserted or deleted, because the rest of the nodes are unchanged. But with an array, you have to update half the hash entries, because half of the indices are shifted. (And, again, spreading out the work isn't going to help unless you can actually reduce the actual amount of work to something sublinear.)
There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.
What would be the benefit of such a design? Unless it's a huge performance win, I don't see what your argument would be. (Sure, it's simpler to describe--but the implementation is more complex, so it probably doesn't serve as better example code.)
Honestly, I'd just leave that out of the preliminary tests. If everything else gets faster, then you can worry about how to make this faster as well.
Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.)
I actually thought about collapsing the linked list implementation in the same way. But I think there are many uses where a 2x slower lookup hurts a lot more than a 50% faster mutate helps. And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)
Are you suggesting increasing every hash bucket in every dictionary by an extra pointer, just to speed up OrderedDict? That would definitely slow down regular dicts, and increase their memory use, and they get used a lot more often.

On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote:
All this is true for ordinal dict too. A hashtable needs extra storage, and iterating needs to skip unused entries. From time to time you need resize the storage and fill new storage with O(N) complexity. Due to unpredictability of hashes of strings and pointers, the number of collisions an resizings is not predicable, and a time of work can significantly vary from run to run.

On Dec 12, 2015, at 04:19, Serhiy Storchaka <storchaka@gmail.com> wrote:
That's not the same at all. The benefit of dict isn't that it's amortized, it's that it's amortized _constant_ instead of linear, which is a huge improvement, worth a bit of chunkiness. Going from linear to amortized linear, as the OP proposes, doesn't get you anything good for the cost. I think you may be forgetting that dict doesn't rehash "from time to time" as this proposal does, it only does it when you grow. And it expands exponentially rather than linearly, so even in the special case where 100% of your operations are inserts, it's still only going to happen a few times.

On 13.12.15 00:11, Andrew Barnert via Python-ideas wrote:
Either you or me misunderstood the OP proposition. An array of indices needs to be "compacted" (that costs O(n)) only after at least O(n) addition/deletion/moving operations. Therefore the amortized cost is constant in worst case. All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1] [1] https://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-spac...

On Dec 12, 2015, at 15:20, Serhiy Storchaka <storchaka@gmail.com> wrote:
You've got two halves of a process that both take N/2 time. If you turn one of those halves into amortized constant time, your total time is still linear. (Maybe in some cases you've made it twice as fast, but that's still linear--but at any rate, in this case, he hasn't really made it twice as fast, because the first half, the linear search, now has twice as much to search.) So he's adding linear chunkiness, without reducing the total time. That isn't the same as a normal dict, which adds only logarithmic chunkiness (it does its extra N work log N times, instead of extra N work N/C times), and in service of reducing the total time from linear to amortized constant. It's possible that there's some other optimization here that the OP didn't include (or that I didn't notice and reply to here) that you've intuitively assumed, which can get us back to constant time. If so, that would be great. (But if there were a way to get constant-time inserts and deletes at arbitrary locations within an array, I suspect nobody would have invented linked lists; am I missing something?)
All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1]
But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that). For example: >>> d = Dict(((1,1), (2,2), (3,3), (4,4))) >>> list(d) [1, 2, 3, 4] >>> del d[1] >>> list(d) [4, 2, 3] And constant time deletes (or moves) without destroying order, which is exactly the problem that the hash of linked list nodes in the current design is there to solve. If you don't care about that problem, you don't need that design, but if you're building a general replacement for OrderedDict, you need it (or something to replace it). Of course an ordered mapping that doesn't support deletions (or moves) could still be useful for lots of things (e.g., most of the time people say they want ordering for a **kw or a class dict, they don't need to delete or move anything).

I'm not following this thread all that closely, but am I the only one who thinks that the LRU cache (discussed in another thread) could be implemented with much less code on top of OrderedDict? Basically whenever you access a key you just move it to the front, and when you add a key when it is already at capacity you delete the first one. -- --Guido van Rossum (python.org/~guido)

On 13.12.15 02:27, Guido van Rossum wrote:
I have doubts about this. First, an ordered dictionary is only a part of the LRU cache implementation, using OrderedDict wouldn't make the code much less. Second, the LRU cache needs only small and very limited part of OrderedDict, with different requirements to reentrancy, thread-safety, and errors handling (partially more strong, partially more lenient). The code currently used by the LRU cache is much simpler, likely faster, and more bug-free.

On Sat, Dec 12, 2015 at 5:16 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Fair enough. Perhaps it would have made more sense if we had a C version of OrderedDict but not of lru_cache. Still, two things: (1) Anything written in Python that implements a linked list feels like a code smell to me (it's hard to make up for the constant factor with a better algorithm unless the data structure is pretty large). (2) If you want to implement something that's close to lru_cache but not quite (e.g. one where items also come with a timeout, like DNS lookups), you can't subclass its implementation, because it's all hidden in a function. But you can wrap a little bit of extra logic around the 10 or so lines it takes to implement a decent LRU cache on top of OrderedDict. -- --Guido van Rossum (python.org/~guido)

On 13.12.15 02:06, Andrew Barnert via Python-ideas wrote:
What is linear time about what you are saying? I don't see anything that has no amortized constant complexity. What I have missed?
All this looks similar to Raymond's proof-of-concept for a compact dictionary (ordering is a side effect). [1]
But his design only preserves ordering if you never delete (or move, but he doesn't have an API for that).
This is only because preserving ordering is not a goal for a dict. Moving the last entry to the place of deleted one is the simplest design if don't care about ordering. But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted.

On Dec 12, 2015, at 17:03, Serhiy Storchaka <storchaka@gmail.com> wrote:
Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time. He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here. He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below. What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Sure. But it is the goal for OrderedDict. Which is exactly why a design that makes sense for dict doesn't necessarily make sense for OrderedDict, unless you add something else. The hash table of list nodes works as such a something else. But Raymond's array doesn't (and isn't intended to).
Moving the last entry to the place of deleted one is the simplest design if don't care about ordering.
But it is not hard to make the ordering be preserved (at the cost of larger amortized constant time and memory usage). Deleted entry should be just marked as deleted, and the storage should be packed only when a half (or 1/3, or 1/4) of entries is deleted.
The entire benefit of Raymond's design is that it lets you store, and iterate, a dense array instead of a sparse one. If you're going to make that array every sparser than the hash table, you've lost all of the benefits. (But you're still paying the cost of an extra index or pointer, extra dereference, etc.)

FWIW, I think PyPy's dict implementation is advanced version of Raymond's. More compact and preserves order on deletion. http://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.ht...

On 13.12.15 06:57, Andrew Barnert via Python-ideas wrote:
Let's go back to the original message. He started off talking about Tim Peters' post, which explains the need for a linked list because in an array, searching for the value takes linear time, and then removing it in-place takes linear time.
Searching a linked list for a value takes linear time too. To avoid that, an auxiliary hashtable is used. In the current Python implementation it maps each key to its list node; in the proposed implementation it maps each key to an index in a contiguous array. There is no linear search anymore; that problem is already solved.
He suggested fixing that by batching up the deletes. That solves the cost of removing, but it doesn't solve the problem with the linear search at all--in fact, it makes things worse, because you're now searching a sparse array. That's what I was commenting on here.
He later suggested maybe using a second hash for indices, as the current design does. This of course does solve the problem of the search, but then you have to change half the hash values for each shift, which makes that part even worse. I commented on that further below.
What do you mean by "change half the hash values for each shift", and why is this a problem?
What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Let's distinguish between two scenarios.

In the first scenario, the dictionary is only growing. Every O(N) additions, you need to resize and rebuild a hashtable of size O(N). This gives amortized constant addition time. In the proposed OrderedDict design you also need to resize an array and resize and rebuild an auxiliary hashtable; both have linear complexity. The resulting amortized time is constant again.

In the second scenario, the dictionary has an equal number of additions and deletions, and its size does not change. For a dict this doesn't add anything to the base cost of addition and deletion. In the proposed OrderedDict design, since almost every addition appends an item to an array, periodic repacking is needed. It has linear cost (repacking the array and updating the auxiliary hashtable). The total amortized cost of one addition or deletion operation is constant.

The implementation with a contiguous array has about the same complexity as the implementation with a linked list. The difference is only in the constant multiplier, and without real C code we can't determine which constant is lower.

Actually, the current C implementation of OrderedDict uses a contiguous array for mapping an index in the base hashtable to a list node. It is rebuilt from the linked list when the base hashtable is rebuilt. I was planning to experiment with getting rid of the linked list altogether.
The benefit of Raymond's design is that it lets you iterate a contiguous array instead of jumping forward and backward among list nodes randomly distributed in memory. And on modern computers, iterating even a sparse array can be faster than iterating a linked list. [*] This is not the only benefit of Raymond's design; the other is a presumably more compact representation.

[*] Of course, benchmarks are needed to prove this.

On Dec 14, 2015, at 06:58, Serhiy Storchaka <storchaka@gmail.com> wrote:
This is already covered in the very next paragraph.
If you delete a linked list node, none of the other list nodes change. So, if you have hash entries pointing at all of the list nodes, only one hash entry has to be changed for a delete. If you delete an array element, all of the subsequent indices change. So, if you have hash entries pointing at the indices, on average, half of the hash entries have to be changed for a delete. And if you're about to say "but combining the two solves that problem", that's already in the very next paragraph, so please read on.
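The index-invalidation problem is easy to see with a toy example (illustrative only):

order = ['a', 'b', 'c', 'd', 'e']
index = {k: i for i, k in enumerate(order)}  # key -> position

del order[1]  # remove 'b' in place; 'c', 'd', 'e' all shift down
# index still claims {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}: every
# entry after the deletion point is now stale and must be rewritten,
# i.e. O(n) hash-table updates per delete on average.
assert order[index['d']] != 'd'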
What about combining the two? He didn't mention that, but if you do that, every time you compact the array, you have to do a hash rebuild. Yes, a normal dict does this, but only logarithmically often, not every N/2 operations.
Let's distinguish between two scenarios. In the first scenario, the dictionary is only growing. Every O(N) additions, you need to resize and rebuild a hashtable of size O(N).
If that were true, dicts would be amortized linear time. You do O(N) work O(N) times, divided over O(N) operations, which means amortized O(N*N/N), which is linear. The fact that dict (and list) grow exponentially rather than linearly is critical. It means that, even in the case of nothing but inserts, you only rehash O(log N) times, not O(N), and the work done each time also only grows logarithmically. The total work done for all resizes ends up being just a constant times N (think about the sum of 1 + 1/2 + 1/4 + 1/8 + ... vs. 1 + 9/10 + 8/10 + ...), so, divided by N operations, you get amortized O(C*N/N), which is constant.
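The difference is easy to check numerically; a quick sketch (hypothetical growth policies and cost model, not CPython's actual ones):

def total_resize_work(n_inserts, grow):
    """Total cost of all rehashes, charging each rehash the new
    capacity; `grow` maps old capacity to new capacity."""
    cap, size, work = 8, 0, 0
    for _ in range(n_inserts):
        if size == cap:
            cap = grow(cap)
            work += cap  # a rehash touches every slot
        size += 1
    return work

n = 1000000
print(total_resize_work(n, lambda c: c * 2))     # ~2n total: amortized O(1)
print(total_resize_work(n, lambda c: c + 1024))  # ~n**2/2048: amortized O(n)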
This gives amortized constant addition time. In the proposed OrderedDict design you also need to resize an array and resize and rebuild an auxiliary hashtable; both have linear complexity. The resulting amortized time is constant again.
Sure, but this just shows that the fact that deletes are linear time doesn't matter if you never do any deletes.
In the second scenario, the dictionary has an equal number of additions and deletions, and its size does not change. For a dict this doesn't add anything to the base cost of addition and deletion. In the proposed OrderedDict design, since almost every addition appends an item to an array, periodic repacking is needed. It has linear cost (repacking the array and updating the auxiliary hashtable). The total amortized cost of one addition or deletion operation is constant.
No: deletion operations do O(N) work O(M) times, amortized over M, where N is the dict size and M is the number of operations. That's linear in the max size of the dict. In the special case where the dict never changes size, you could just declare that a constant--but that's pretty misleading. For one thing, it's usually going to be a very large constant. For example, if you start with 1000 elements and do 2000 deletes and 2000 inserts, your constant is N/4.

More importantly, a constant-sized dict with lots of mutations is a pretty rare use case, and in the more common case where there are more inserts than deletes, so the dict is growing, the delete times are increasing with the dict size. Unless the ratio of deletes to inserts is falling at the same rate (another rare special case), this isn't going to be constant.

And if you're wondering how dict gets away with this, it's kind of cheating by just not generally shrinking on deletes, but that's a cheat that works just fine for the majority of real-life use cases (and it's hard to work around in the vast majority of cases where it doesn't). That obviously won't work here--if a dict later regrows, it automatically uses up all the deleted buckets, but if an array-based OrderedDict later regrows, it will just append onto the end, leaving all the holes behind. You have no choice but to compact those holes at some point, and there's no way to make that constant time in an array.
The implementation with a contiguous array has about the same complexity as the implementation with a linked list. The difference is only in the constant multiplier, and without real C code we can't determine which constant is lower.
If your argument were correct, you could always simulate a linked list with an array with only a constant-factor cost, and that just isn't true.
The benefit of Raymond's design over the existing dict, as he explains in the ActiveState post, is that you can iterate (and store some of the data in) the dense array rather than iterating the hash table (which is a sparse array). He can maintain that denseness even in the face of deletions because he isn't trying to preserve order, so he can always just swap the last value into the deleted slot. If you try to use the same design to preserve order, you can't do that. If you instead mark deleted slots as deleted, and recompact every time it gets to 50% deleted, then you're iterating a sparse array that's more sparse than the hash table, so you do not get the time (or space) benefits over a dict that Raymond's design offers.
And on modern computers, iterating even a sparse array can be faster than iterating a linked list. [*]
Now you're proposing a completely different theoretical benefit from the one Raymond designed for and explained, so the analogy with his design doesn't buy you anything. You'd be better off sticking to analogy with the current plain dict, which, like your design, iterates over a sparse array, and is simpler to explain and analyze.
This is not the only benefit of Raymond's design; the other is a presumably more compact representation.
Yes, if you can move each 48-byte hash bucket into a dense array, leaving only a 4-byte integer in the sparse one in the hash table, then you save 48 * load - 4 bytes per entry. But the design under discussion doesn't get that, because you're putting the large data in an array that's even sparser than the hash table, so you're losing space.

One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc.
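A minimal sketch of that linked-list-of-arrays shape (an unrolled linked list; names invented, insertion and search omitted):

class _Chunk:
    __slots__ = ('items', 'next')

    def __init__(self, items, next=None):
        self.items = items  # a small contiguous list of keys
        self.next = next

class ChunkedList:
    """Deleting splits one chunk instead of shifting the whole list,
    so a delete copies at most one chunk's tail, not half the structure."""

    def __init__(self):
        self.head = None

    def delete_at(self, chunk, offset):
        # Move everything after `offset` into a new chunk; only that
        # tail is copied, and no other chunk is touched.
        tail = _Chunk(chunk.items[offset + 1:], chunk.next)
        del chunk.items[offset:]
        chunk.next = tail

    def __iter__(self):
        node = self.head
        while node is not None:
            yield from node.items
            node = node.next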

Despite all the nice discussion about runtime and so on and so forth, couldn't he just provide a working implementation to prove his point? Maybe he'll find out himself that it's not working as intended and, in the course of doing so, find an even better solution.

Best,
Sven

On 14.12.2015 18:30, Andrew Barnert via Python-ideas wrote:

On Dec 14, 2015 1:11 PM, "Sven R. Kunze" <srkunze@mail.de> wrote:
Despite all the nice discussion about runtime and so on and so forth,
couldn't he just provide a working implementation to prove his point? Maybe he'll find out himself that it's not working as intended and, in the course of doing so, find an even better solution.
I did. It is in a pastebin link in my original message, based on the 3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance on evaluating it. Maybe it wasn't seen because I used '-'*n to separate it from my intro, or maybe pastebin is so disliked that people couldn't see it. Here it is again: http://pastebin.com/LESRktJw

Sorry I haven't been responding. I wanted to respond on a PC instead of my phone because I don't see a plaintext option in the app, but I get distracted when on the PC.

My implementation combines the ideas I listed. Raymond's idea seems similar, but more clever with indexing and less focused on ordering. Regarding PyPy's, I get the impression that CPython can make an efficient implementation based on it. I think that's the real way to go for an efficient OrderedDict implementation.

By the way, here are the costs for some of the algorithms, compared to the raw Python OrderedDict:
- getitem: Unchanged.
- setitem: Amortized O(1), but for slightly different reasons (list expansion in addition to expansion of the two maps). If the key is new, it will usually be better: no creation of a new object, and a lot less constant work in bytecode if the list isn't expanded (incrementing and appending instead of linking).
- delitem: Amortized O(1). A lot less bytecode overhead, as above, but there can be a compaction here.
- (iteration): Iterating over a 50%-empty array may be better than iterating over a linked list, especially if cache is taken into account.
  - Python version: For a 50%-empty array, you pay len(odict) extra "if"s and "is"s, plus the cost of iterating over a (reversed, for reverse iteration) list of length 2*len(odict). The linked-list version has a lot of extra dereferences instead.
  - Potential C version: It's just a simple for loop with a check of pointer identity, versus a chain of dereferences. It should definitely be better on the cache, maybe less good for branch prediction.
- pop: Same as delitem: possible compaction.
- move_to_end: Moving to the back is less constant work (though I do a lot of checks, and the C version might not have the same benefits). Moving to the front is awful: it shifts everything up. This can be alleviated by using a deque, or by using the list as a list-backed deque: treat the array as circular (see the sketch below). This would add overhead to everything else, though, so I don't like it. (How do I find out how popular the second parameter to move_to_end is?)

If using my second idea of storing inner_dict[key] = [key, value, index] (where index is into the order array), there would be a lot less overhead on many things, but more overhead on getitem. In C code, it might be barely any overhead.
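The circular-array variant mentioned above could look roughly like this (hypothetical helper showing only the index arithmetic; growth is elided):

class CircularOrder:
    """Treat the backing list as a ring, so prepending is O(1):
    logical position p lives at slot (head + p) % len(buf)."""

    def __init__(self, capacity=8):
        self.buf = [None] * capacity
        self.head = 0   # slot holding the logical front
        self.size = 0

    def append_front(self, key):
        assert self.size < len(self.buf)  # real code would grow here
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = key
        self.size += 1

    def append_back(self, key):
        assert self.size < len(self.buf)  # real code would grow here
        self.buf[(self.head + self.size) % len(self.buf)] = key
        self.size += 1

    def __iter__(self):
        for p in range(self.size):
            yield self.buf[(self.head + p) % len(self.buf)]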
On Sat, Dec 12, 2015 at 6:06 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.
I, uh, have no professional experience in programming. So it's not as easy for me to know how to start.
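One low-effort way to start (a sketch, not a rigorous benchmark): time a mixed insert/delete/iterate workload with timeit and swap implementations:

import timeit
from collections import OrderedDict

def churn(d, n=10000):
    """Insert, delete half, reinsert, then iterate."""
    for i in range(n):
        d[i] = i
    for i in range(0, n, 2):
        del d[i]
    for i in range(n, 2 * n):
        d[i] = i
    return sum(1 for _ in d)

print(timeit.timeit(lambda: churn(OrderedDict()), number=20))
# ...then the same line with ListDict (the pastebin class) for comparison.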
Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.)
When iterating the array, here are the costs per element:
- used: Same as a linked list, but without the linked-list overhead.
- unused: Check whether the entry is valid. This doesn't require dereferencing. It's possible that the branch for checking validity costs more than the linked-list overhead, but that's the only way I can think of that the array would be worse at iterating, even when sparse.
And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)
I'm pretty sure that all C code uses the C version of __getitem__ on the dict, because it's NOT statically known which __getitem__ function to call: there's a pointer in each dict to its __getitem__ function.

On Sat, Dec 12, 2015 at 7:34 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
The performance of the Python implementation doesn't matter much now (as long as it has the same computational complexity).
It might matter a little for alternative Python implementations which haven't yet made their own optimized ODicts.

On Mon, Dec 14, 2015 at 12:30 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc.
I think it's much better to just live with a half-empty array. But this would have to be proven with code and benchmarks.

As an aside, have you seen Stroustrup and Sutter talking about linked lists versus vectors?
- Stroustrup: https://www.youtube.com/watch?v=YQs6IC-vgmo
- Sutter: https://channel9.msdn.com/Events/Build/2014/2-661

Here's an example:
- Vector: Linear search for an element, then remove it by shifting everything down.
- List: Linear search for an element, then remove it by delinking.

Vector will beat list for even very large sizes, simply because the dereferencing in the search costs more than the shifting (which takes advantage of prefetching, branch prediction, etc.).

For iteration of an OrderedDict based on an array, there won't be a branch-prediction benefit, since the holes will be distributed solely based on how it's used. There might not even be a prefetching benefit, usually, because of how many objects might be involved during the iteration (that is, more memory is touched than with equivalent C++ code). But that's what testing is for, I guess.
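A crude pure-Python probe of exactly that question (a sketch; the constants will differ wildly from a C implementation):

import timeit

N = 100000
sentinel = object()

# A 50%-empty array: every other slot is a hole.
sparse = [i if i % 2 else sentinel for i in range(2 * N)]

# A singly linked list of N live values, built from (value, next) tuples.
head = None
for i in reversed(range(N)):
    head = (i, head)

def iter_sparse():
    return sum(1 for x in sparse if x is not sentinel)

def iter_linked():
    count, node = 0, head
    while node is not None:
        count += 1
        node = node[1]
    return count

print(timeit.timeit(iter_sparse, number=100))
print(timeit.timeit(iter_linked, number=100))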

In a message of Tue, 15 Dec 2015 04:09:55 -0500, "Franklin? Lee" writes:
That is the problem. We absolutely do not want links to things like pastebin. We want the code here, as part of the text. 5 years from now, when other people are scratching their heads saying, I wonder why Guido decided things the way he did, and whether that decision can and should be revisited, the first thing we will do is to go back and read all this discussion. And if the discussion is about code we can no longer see, because the pastebin has expired, then we won't be able to learn much. Anything that matters needs to be part of the archived discussion. Laura

On Tue, Dec 15, 2015 at 9:53 AM, Laura Creighton <lac@openend.se> wrote:
I feel similarly about information history, which is why I always set "Expire: Never" when I use pastebin :D.

But alright. The rest of this message is the code as I had it when I sent the original message (reflowed, with missing imports added and two small bugs fixed, each marked with a "FIX:" comment).

import sys as _sys  # FIX: needed by __sizeof__ below
from operator import eq as _eq  # FIX: needed by __eq__ below
from collections.abc import *
from reprlib import recursive_repr as _recursive_repr
from collections import OrderedDict


class _ListDictKeysView(KeysView):

    def __reversed__(self):
        yield from reversed(self._mapping)


class _ListDictItemsView(ItemsView):

    def __reversed__(self):
        for key in reversed(self._mapping):
            yield (key, self._mapping[key])


class _ListDictValuesView(ValuesView):

    def __reversed__(self):
        for key in reversed(self._mapping):
            yield self._mapping[key]


_sentinel = object()


class ListDict(dict):
    'Dictionary that remembers insertion order'

    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    def __init__(*args, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries, but keyword arguments are not recommended
        because their insertion order is arbitrary.
        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'ListDict' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        try:
            # self.__root
            self.__list
        except AttributeError:
            self.__map = {}
            self.__list = []
            self.__size = 0
        self.__update(*args, **kwds)

    def __setitem__(self, key, value, dict_setitem=dict.__setitem__, len=len):
        'od.__setitem__(i, y) <==> od[i]=y'
        # If it's a new key, we need to track it.
        if key not in self:
            self.__map[key] = len(self.__list)
            self.__list.append(key)
            self.__size += 1
        dict_setitem(self, key, value)

    def __delitem__(self, key, dict_delitem=dict.__delitem__,
                    sentinel=_sentinel):
        'od.__delitem__(y) <==> del od[y]'
        dict_delitem(self, key)
        # Remove the tracking for this item.
        index = self.__map.pop(key)
        self.__list[index] = sentinel
        self.__size -= 1
        self.__compact()

    def __iter__(self, sentinel=_sentinel):
        'od.__iter__() <==> iter(od)'
        for key in self.__list:
            if key is not sentinel:
                yield key

    def __reversed__(self, sentinel=_sentinel, reversed=reversed):
        'od.__reversed__() <==> reversed(od)'
        for key in reversed(self.__list):
            if key is not sentinel:
                yield key

    def clear(self):
        'od.clear() -> None.  Remove all items from od.'
        self.__list.clear()
        self.__map.clear()
        self.__size = 0
        dict.clear(self)  # dict.clear isn't cached?

    def popitem(self, last=True, sentinel=_sentinel, reversed=reversed,
                next=next):
        '''od.popitem() -> (k, v), return and remove a (key, value) pair.
        Pairs are returned in LIFO order if last is true or FIFO order if
        false.
        '''
        if not self:
            raise KeyError('dictionary is empty')
        if last:
            lst = reversed(self.__list)
        else:
            lst = self.__list
        # Could use the key lookup to find this, but... meh.
        # Note that attempting to compact first might have helped.
        index, key = next((i, k) for i, k in enumerate(lst)
                          if k is not sentinel)
        if last:
            # FIX: enumerate() counted positions in the *reversed* view;
            # translate back to a real index into __list.
            index = len(self.__list) - 1 - index
        # We're calling dict.pop later, which won't handle the metadata.
        del self.__map[key]
        self.__list[index] = sentinel
        self.__size -= 1
        self.__compact()
        value = dict.pop(self, key)  # dict.pop isn't cached?
        return key, value

    def __compact(self, sentinel=_sentinel, enumerate=enumerate,
                  reversed=reversed):
        'Compact the order __list if necessary.'
        lst = self.__list
        if not lst:
            return
        if self.__size / len(lst) <= 0.5:  # chosen at non-random
            # Implementation 1: list comprehension.
            self.__list = [k for k in lst if k is not sentinel]
            # Implementation 2:
            # If only `list` had a `remove_all` method...

            # Update all indices after the reordering.
            inner_map = self.__map
            for index, key in enumerate(self.__list):
                inner_map[key] = index
        else:
            # Even if the list isn't mostly empty,
            # we can try to clear the back.
            # TODO: How can this be more efficient?
            # Note: There exists a non-sentinel because
            # otherwise, .__size/positive == 0 < positive.
            #
            # Implementation 1: Pop until it drops.
            # while lst[-1] is sentinel:
            #     lst.pop()
            # Implementation 2: Count the number of sentinels at the end.
            emptys = next(i for i, k in enumerate(reversed(lst))
                          if k is not sentinel)
            # Guaranteed not to StopIteration since .__size > 0.
            # FIX: was `del lst[:-emptys]`, which would have deleted
            # everything *except* the trailing sentinels.
            del lst[len(lst) - emptys:]  # safe even if 0

    def move_to_end(self, key, last=True, sentinel=_sentinel,
                    enumerate=enumerate, len=len):
        '''Move an existing element to the end (or beginning if last==False).
        Raises KeyError if the element does not exist.
        When last=True, acts like a fast version of self[key]=self.pop(key).
        '''
        index = self.__map[key]
        lst = self.__list
        if last:
            if index + 1 == len(lst):  # already last
                # Not sure if this is the right path to optimize.
                # But I think redundant move_to_ends shouldn't
                # blow up the __list.
                return
            if lst[-1] is sentinel:  # can just swap with last
                lst[index] = sentinel
                lst[-1] = key
                self.__map[key] = len(lst) - 1
            else:  # append and maybe compact
                lst[index] = sentinel
                lst.append(key)
                self.__map[key] = len(lst) - 1
                self.__compact()
        else:
            # This is costly. But this shouldn't be a common case anyway,
            # right? I mean, who repeatedly adds to the front of an
            # OrderedDict? And this is basically the only costly operation
            # I'm adding.
            lst[index] = sentinel
            # Propagate forward from the front.
            for i, newkey in enumerate(lst):
                self.__map[key] = i
                lst[i], key = key, newkey
                if key is sentinel:
                    break

    def __sizeof__(self):
        sizeof = _sys.getsizeof
        size = sizeof(self.__dict__)    # instance dictionary
        size += sizeof(self.__map) * 2  # internal dict and inherited dict
        size += sizeof(self.__list)
        size += sizeof(self.__size)
        return size

    update = __update = MutableMapping.update

    def keys(self):
        "D.keys() -> a set-like object providing a view on D's keys"
        return _ListDictKeysView(self)

    def items(self):
        "D.items() -> a set-like object providing a view on D's items"
        return _ListDictItemsView(self)

    def values(self):
        "D.values() -> an object providing a view on D's values"
        return _ListDictValuesView(self)

    __ne__ = MutableMapping.__ne__

    __marker = object()

    def pop(self, key, default=__marker):
        '''od.pop(k[,d]) -> v, remove specified key and return the
        corresponding value.  If key is not found, d is returned if given,
        otherwise KeyError is raised.
        '''
        if key in self:
            result = self[key]
            del self[key]
            return result
        if default is self.__marker:
            raise KeyError(key)
        return default

    def setdefault(self, key, default=None):
        'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od'
        if key in self:
            return self[key]
        self[key] = default
        return default

    @_recursive_repr()
    def __repr__(self):
        'od.__repr__() <==> repr(od)'
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self.items()))

    def __reduce__(self):
        'Return state information for pickling'
        inst_dict = vars(self).copy()
        for k in vars(ListDict()):
            inst_dict.pop(k, None)
        return self.__class__, (), inst_dict or None, None, iter(self.items())

    def copy(self):
        'od.copy() -> a shallow copy of od'
        return self.__class__(self)

    @classmethod
    def fromkeys(cls, iterable, value=None):
        '''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
        If not specified, the value defaults to None.
        '''
        self = cls()
        for key in iterable:
            self[key] = value
        return self

    def __eq__(self, other):
        '''od.__eq__(y) <==> od==y.  Comparison to another OD is
        order-sensitive while comparison to a regular mapping is
        order-insensitive.
        '''
        if isinstance(other, ListDict):
            return dict.__eq__(self, other) and all(map(_eq, self, other))
        return dict.__eq__(self, other)
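A quick smoke test of the class above (not part of the original message):

ld = ListDict()
ld['a'] = 1
ld['b'] = 2
ld['c'] = 3
del ld['b']
assert list(ld) == ['a', 'c']
ld.move_to_end('a')
assert list(ld.items()) == [('c', 3), ('a', 1)]
assert ld.popitem(last=False) == ('c', 3)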

Wow. Thanks for that. :)

Well, is there some performance test scenario? Something that tests all possible interactions with OrderedDict and could be used as the authoritative benchmark here. Otherwise, I fear, we're just talking about possibilities that nobody has really evaluated properly.

Best,
Sven

On 15.12.2015 16:00, Franklin? Lee wrote:

Franklin? Lee wrote:
I feel similarly about information history, which is why I always set "Expire: Never" when I use pastebin :D.
Never is a long time. In this context it really means "as long as pastebin itself continues to exist", which could be less than the time the python mailing list archives continue to exist. -- Greg

On Wed, Dec 16, 2015 at 8:07 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Or, more simply: as long as pastebin is accessible. It's entirely possible to create an offline archive of python-ideas posts (whether you see it as a mailing list or a newsgroup) and then browse it at a time when you have no internet connection. That archive will be missing all externally-linked content.

ChrisA

On Tue, Dec 15, 2015 at 7:00 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
But alright. The rest of this message is the code as I had it when I sent the original message.
Be sure to run the full OrderedDict test suite. [1] Also, I posted a rudimentary benchmark when working on the C implementation of OrderedDict that may help you. [2] Finally, be sure to respect OrderedDict's invariants. You may find my notes useful. [3]

-eric

[1] (on default branch) make && ./python -m test.test_ordered_dict
[2] "odict-speed.diff" on https://bugs.python.org/issue16991
[3] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l1

On 12.12.15 11:27, Franklin? Lee wrote:
Did you try to replace the standard OrderedDict implementation with your implementation and run the tests? Be aware that some details of the C implementation have changed, and new tests were added in 3.5.1; it is better to take 3.5.1 for testing.

Both current implementations of OrderedDict try to make OrderedDict thread-safe and not leave an OrderedDict in an inconsistent state if hashing or comparison raises an exception. There is also an attempt to handle cases when an OrderedDict is changed bypassing the OrderedDict API (with direct calls of dict modifying methods: dict.__setitem__, dict.__delitem__, dict.update, etc.).

The performance of the Python implementation doesn't matter much now (as long as it has the same computational complexity). As for the C implementation, I can't forecast which approach will be faster.
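Short of wiring it into the full test suite, a cheap differential test against collections.OrderedDict (a sketch, assuming the ListDict class from earlier in the thread is in scope; random workloads rather than the real invariant checks) can catch ordering bugs early:

import random
from collections import OrderedDict

def fuzz(n_ops=10000, seed=0):
    """Apply identical random operations to ListDict and OrderedDict
    and demand identical ordering after every step."""
    rng = random.Random(seed)
    ld, od = ListDict(), OrderedDict()
    for _ in range(n_ops):
        key = rng.randrange(50)
        roll = rng.random()
        if roll < 0.5:                    # insert or overwrite
            value = rng.random()
            ld[key] = value
            od[key] = value
        elif roll < 0.8 and key in od:    # delete
            del ld[key]
            del od[key]
        elif od:                          # move to either end
            k = rng.choice(list(od))
            last = rng.random() < 0.9
            ld.move_to_end(k, last=last)
            od.move_to_end(k, last=last)
        assert list(ld.items()) == list(od.items())
    print('ok')

fuzz()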
participants (10)
- Andrew Barnert
- Chris Angelico
- Eric Snow
- Franklin? Lee
- Greg Ewing
- Guido van Rossum
- INADA Naoki
- Laura Creighton
- Serhiy Storchaka
- Sven R. Kunze