[Python-ideas] `OrderedDict.items().__getitem__`

spir denis.spir at gmail.com
Thu Jan 16 11:20:08 CET 2014


On 01/11/2014 03:36 PM, Chris Angelico wrote:
> On Sun, Jan 12, 2014 at 1:18 AM, Ram Rachum <ram.rachum at gmail.com> wrote:
>> I think that `OrderedDict.items().__getitem__` should be implemented, to
>> solve this ugliness:
>>
>> http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3
>>
>> What do you think?
>
> Well, the first problem with that is that __getitem__ already exists,
> and it's dict-style :) So you can't fetch out an item by its position
> that way. But suppose you create a method that returns the Nth
> element.
>
> The implementation in CPython 3.4 is a linked list, so getting an
> arbitrary element by index would be quite inefficient. [...]

I have occasionally implemented ordered sets or associative arrays in a really
simple, stupid manner [1]: just store the items or entries (key/value pairs) in
an array (instead of, say, anywhere in memory at the allocator's convenience),
in addition to the usual array of "buckets".

Regarding efficiency, there is a kind of balance of benefits & costs: on
average, common operations should in principle be faster, thanks to memory
compactness and the resulting cache friendliness. There is no cost in memory
size (the entries must live somewhere anyway). There is a small cost when the
whole data structure grows, since two arrays now have to be resized up; to
mitigate this, I provide a 'predim' (pre-dimension) method that avoids most
growing operations.
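
To make the layout concrete, here is a toy sketch in Python of what I mean
(the name ArrayDict and the chained buckets are only for illustration; this
is nothing like CPython's actual dict code, just the two-array idea plus
'predim'):

class ArrayDict:
    def __init__(self, capacity=8):
        # Entries live in one dense array that records insertion order;
        # the buckets only hold indices into that array.
        self._entries = []
        self._buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for index in bucket:
            entry = self._entries[index]
            if entry[0] == key:
                entry[1] = value        # overwriting keeps the original position
                return
        if (len(self._entries) + 1) * 3 > len(self._buckets) * 2:
            self._resize(len(self._buckets) * 2)    # double the bucket array;
            bucket = self._bucket(key)              # the entry list grows by append
        bucket.append(len(self._entries))
        self._entries.append([key, value])

    def __getitem__(self, key):
        for index in self._bucket(key):
            entry = self._entries[index]
            if entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def predim(self, expected):
        # Pre-dimension both arrays so that inserting `expected` items
        # triggers no further resizing.
        self._resize(max(8, expected * 2))

    def _resize(self, capacity):
        # Only the array of indices has to be rebuilt; entries stay in place.
        self._buckets = [[] for _ in range(capacity)]
        for index, entry in enumerate(self._entries):
            self._buckets[hash(entry[0]) % capacity].append(index)

    def __iter__(self):
        # Traversal simply follows the entry array, i.e. insertion order.
        return (entry[0] for entry in self._entries)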

The point is that the entries now form an array that keeps insertion order and
can be traversed and even indexed. One issue, however, as for any flexible-size
array, is item deletion. I don't delete at once, which would require compacting
the array every time an entry is removed; instead I just mark entries as
deleted. Whenever a large proportion of the items has been removed [2],
compaction happens automatically. But with such a trick, positional indexing is
then invalid (and prevented); to regain the indexing feature after items have
been deleted, clients must first run a 'compact' method, which actually removes
all deleted items at once (and, if it is worth it, resizes down). Traversal,
however, poses no problem: the implementation just needs to skip the items
marked as deleted.
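
Continuing the toy ArrayDict sketch above, deletion by marking and the
'compact' method could look roughly like this (again only an illustration of
the scheme, with made-up names such as nth for positional access):

class DeletableArrayDict(ArrayDict):
    def __init__(self, capacity=8):
        super().__init__(capacity)
        self._deleted = 0               # count of tombstones in the entry array

    def __delitem__(self, key):
        bucket = self._bucket(key)
        for pos, index in enumerate(bucket):
            if self._entries[index][0] == key:
                self._entries[index] = None     # mark as deleted; no shifting
                del bucket[pos]
                self._deleted += 1
                live = len(self._entries) - self._deleted
                if live * 8 < len(self._entries) * 3:
                    self.compact()              # automatic compaction, see [2]
                return
        raise KeyError(key)

    def compact(self):
        # Remove all deleted entries at once; positional indexing becomes
        # valid again, and the rebuild shrinks the bucket array if worth it.
        self._entries = [e for e in self._entries if e is not None]
        self._deleted = 0
        self._resize(max(8, len(self._entries) * 2))

    def nth(self, position):
        # Positional access is only allowed on a compacted table.
        if self._deleted:
            raise IndexError("indexing is invalid after deletions; call compact()")
        key, value = self._entries[position]
        return key, value

    def __iter__(self):
        # Traversal is unaffected: just skip the entries marked as deleted.
        return (e[0] for e in self._entries if e is not None)

    def _resize(self, capacity):
        # Same rebuild as in ArrayDict, but skipping tombstones.
        self._buckets = [[] for _ in range(capacity)]
        for index, entry in enumerate(self._entries):
            if entry is not None:
                self._buckets[hash(entry[0]) % capacity].append(index)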

I have no idea whether such a stupid way to make ordered sets/dicts is
compatible with the present requirements or implementation of Python's
OrderedDict (but I suspect it is not). And I guess the constraint on indexing
does not really fit the Python way, in that an implementation constraint leaks
into the client interface. I just wanted to say a few words about that scheme
because of its simplicity and practicality. Comments welcome.

Denis

[1] The common need, for me, is what I call "mod tables", used as symbol
tables: a mod table is like a hash table, but with unsigned integer keys, so
there is no hashing, just a plain modulo. This makes a sort of sparse array,
but an ordered one. The numeric keys actually represent interned strings, which
themselves are keys in symbol tables (scopes, namespaces...).
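
For what it is worth, the mod-table idea boils down to something like this
(function names made up; the keys are assumed to be the ids of interned
strings):

CAPACITY = 64
buckets = [[] for _ in range(CAPACITY)]

def mod_put(key, value):
    slot = buckets[key % CAPACITY]      # plain modulo, no hash(key) step
    for pair in slot:
        if pair[0] == key:
            pair[1] = value
            return
    slot.append([key, value])

def mod_get(key):
    for k, v in buckets[key % CAPACITY]:
        if k == key:
            return v
    raise KeyError(key)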

[2] To avoid a kind of threshold effect, compaction happens when the count of
items drops below 3/8 of capacity, not half of it. There must be a hysteresis
(a difference between the thresholds for resizing up versus down) to avoid
instability in the hypothetical case where the count of items sits close to a
resizing boundary and items are constantly being added and removed.
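
In code, the asymmetric thresholds amount to something like this (only the
3/8 shrinking side is fixed above; the 2/3 fraction on the growing side is
just an example value):

def should_grow(count, capacity):
    return count * 3 >= capacity * 2    # resize up around 2/3 load (example)

def should_compact(count, capacity):
    return count * 8 < capacity * 3     # compact only below 3/8, well under 1/2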

