[Python-ideas] Contiguous-array-based ordering for OrderedDict
Serhiy Storchaka
storchaka at gmail.com
Sat Dec 12 07:19:08 EST 2015
On 12.12.15 13:06, Andrew Barnert via Python-ideas wrote:
> On Dec 12, 2015, at 01:27, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
>> 2. You don't have to shift for each deletion. You can wait until some
>> threshold is reached before shifting, and maybe that will spread out
>> the cost of shift among deletions enough to make an impact.
>
> I don't think this would help nearly as much as you think. Keeping up to half the array for deleted slots also makes things more complex and doubles the extra storage. But, worst of all, whatever savings you get in the shift time (minus the increased time for the extra logic) will be lost in the increased search time: if the array is twice as big, the Kth real element is at index 2K. So, instead of K + (N-K), you have 2K + x(N-K), and no matter how good that x speedup is, the 2K part is going to kill you.
>
> Plus, an API that spreads out about the same work but does it in large batches is less usable. Imagine you had two functions: one always takes 1.1ms; the other usually takes 100us, but every 1000 calls it takes 1s (and freezes up the interpreter while doing so). Which one would you choose?
All this is true for an ordinary dict too. A hashtable needs extra storage,
and iterating over it needs to skip unused entries. From time to time you
need to resize the storage and refill it with O(N) complexity. Due to the
unpredictability of the hashes of strings and pointers, the number of
collisions and resizings is not predictable, and the running time can vary
significantly from run to run.
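This batched-cost behaviour of a plain dict is easy to observe in CPython (an implementation detail, not a language guarantee): the backing storage grows in occasional O(N) resize steps rather than smoothly.

```python
# Observe that most dict insertions leave the allocated size unchanged,
# and only a handful trigger a resize (the batched O(N) step).
import sys

d = {}
sizes = []
for i in range(1000):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# Count insertions after which the allocated size changed.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if a != b)
print(resizes)  # small compared to 1000 insertions
```

Since the table grows geometrically, the number of resizes is logarithmic in the number of insertions, which is why the per-insertion cost is amortized O(1) despite the occasional expensive step.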