
On Nov 4, 2017, at 7:04 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
When I asked Damien George about this for MicroPython, he indicated that they'd have to choose between guaranteed order and O(1) lookups given their current dict implementation. That surprised me a bit (since PyPy and CPython both *saved* memory by switching to their guaranteed order implementations, hence the name "compact dict representation"), but my (admittedly vague) understand is that the presence of a space/speed trade-off in their case has something to do with MicroPython deliberately running with a much higher chance of hash collisions in general (since the data sets it deals with are naturally smaller).
So if we make the change, MicroPython will likely go along with it, but it may mean that dict lookups there become O(N), and folks will be relying on "N" being consistently small due to memory constraints (but some typically O(N) algorithms will still become O(N^2) when run on MicroPython).
I don't think that situation should change the decision, but I do think it would be helpful if folks that understand CPython's dict implementation could take a look at MicroPython's dict implementation and see if it might be possible for them to avoid having to make that trade-off and instead be able to use a naturally insertion ordered hashmap implementation.
I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... Pretty much any implementation hashed lookup of keys and values is amenable to being compact and ordered. Whatever existing logic that looks up an entry becomes a lookup into a table of indices which in turn references a sequential array of keys and values. This logic is independent of hashing scheme or density, and it has no effect on the number of probes or collision rate. The cost is an extra level of indirection and an extra array of indices (typically very small). The benefit is faster iteration over the smaller dense key/value array, overall memory savings resulting in improved cache utilization, and the side-effect of remembering insertion order. Summary: I think MicroPython will be just fine and if needed I will help create the patch that implements compact-and-ordered behavior. Raymond