Large Dictionaries

Duncan Booth duncan.booth at invalid.invalid
Tue May 16 15:03:05 CEST 2006


jantod at gmail.com wrote:

> BTW why are python dicts implemented as hash tables and not judy
> arrays? 
> 
Probably because:

Python predates judy arrays.
Nobody has proposed replacing the current dict implementation.
Nobody has demonstrated that it would actually be an advantage to
replace the current implementation. 
Or maybe Python dicts are just more flexible?

You choose.

A quick scan of the Judy IV Shop Manual reveals:

> Judy is a dessert topping and a floor wax, but it’s not intended for
> use as a house paint. 
...
> Judy is a programming library that provides a relatively simple
> interface (API) for array-like storage of word or string indexes with
> optional mapping of indexes to single-word values.

It doesn't sound to me as though they match the flexibility of Python
dictionaries. Without reading further through the documentation, the
implication is that keys are restricted to simple word or string values.
Remember, all that Python requires of a dict key is that it has a hash
value and can be compared for equal/not equal with other keys. It
doesn't require the key to be expressible as a sequence of bits which is
what the Judy manual seems to imply: 

> You can think of digital trees as peeling (decoding) leading bits off
> a key until only one bit is left, but in the case of an unbounded
> variable-size key there is no definite “bottom” (that is, a definite
> last bit or maximum length for every key). However, there are always
> unpopulated subexpanses, except with a fixed-size key where every
> possible key value is stored in the data structure. When decoding keys
> top-down in this way, each (sub)expanse is defined by the bits already
> decoded and the number of bits remaining (if finite). 



More information about the Python-list mailing list