PEP 372 -- Adding an ordered directory to collections

dbpokorny at gmail.com dbpokorny at gmail.com
Wed Jun 18 17:58:04 EDT 2008


Wow, I was completely wrong about sorted dicts and odicts.

On Jun 17, 4:21 am, bearophileH... at lycos.com wrote:
> mean. I think for this data structure it's important to keep all the
> normal dict operations at the same speed. If you use a C

Why keep the normal dict operations at the same speed? There is a
substantial cost this entails.

It seems that some odict solutions take up 4n words per entry in the
limit (ignoring the fixed percentage growth space for lists and
dicts). This is n words for _keys and 3n words for the underlying
dict. What about storing the key:value pairs as a pair of lists (maybe
interleaved)? Key lookup then becomes an O(n) operation, but the
storage requirements are reduced to 2n from 4n.

Here is a hypothetical (somewhat contrived) example: odicts are used
to store the attributes of XML elements in a medium-sized XML document
(say 250K). If the required munging does some surgery on elements 3-4
layers deep in a few places, then the number of key lookups may be
relatively small. (Assuming that very few odicts will have more than
30 entries, each individual lookup will be fairly quick as well). The
document has to be loaded into memory anyway, so the extra cost of key
lookups is more than compensated for by the substantial reduction in
the number of cache line misses.

The main concern is that doubling the size of the data structure in
order to turn key lookup into an O(1) operation may give worse
performance in the common case (this is a personal impression from
reading the use case list in the rationale), and a data structure like
this seems to be way off by itself in the time-space complexity
landscape.

In this case the odict no longer inherits from the dict.

Cheers,
David



More information about the Python-list mailing list