splitting a large dictionary into smaller ones
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Mon Mar 23 03:31:54 EDT 2009
On Sun, 22 Mar 2009 23:10:21 -0400, Terry Reedy wrote:
> Searching for a key in, say, 10 dicts will be slower than searching for
> it in just one. The only reason I would do this would be if the dict
> had to be split, say over several machines. But then, you could query
> them in parallel.
That surely depends on the number of collisions? With millions of keys,
the number of collisions could be high (although it almost certainly
won't be). As I understand it, searching a dict is O(1) on average, but
if there are collisions it can degrade to O(m) (where m is the number of
items colliding for some hash). In principle, m could be large.
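
Just to make that worst case concrete (entirely artificial, naturally): give a key type a constant hash and every entry collides, so each lookup has to crawl through the colliding entries comparing them with __eq__:

import timeit

class Collider(object):
    """Toy key whose hash is always the same, so every instance collides."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42          # every key lands on the same hash
    def __eq__(self, other):
        return isinstance(other, Collider) and self.n == other.n

bad = dict((Collider(i), i) for i in range(1000))   # all 1000 keys collide
good = dict((i, i) for i in range(1000))            # small ints barely collide

probe = Collider(999)
print(timeit.timeit(lambda: bad[probe], number=1000))   # noticeably slower
print(timeit.timeit(lambda: good[999], number=1000))
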
Hypothetically speaking, if you could predict where collisions occur and
split your data into multiple smaller dicts with fewer collisions, you
could delegate to the appropriate smaller dict and reduce an O(m) lookup
to two O(1) lookups, which of course is still O(1).
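
A minimal sketch of that two-level arrangement (bucket_of is a stand-in for
whatever property you've predicted separates the colliding keys; here I've
just used the first character of a string key for the sake of argument):

def split_dicts(items, bucket_of):
    """Partition (key, value) pairs into smaller dicts, one per bucket."""
    buckets = {}
    for key, value in items:
        buckets.setdefault(bucket_of(key), {})[key] = value
    return buckets

def lookup(buckets, bucket_of, key):
    # two O(1) steps: pick the sub-dict, then find the key in it
    return buckets[bucket_of(key)][key]

data = [("apple", 1), ("avocado", 2), ("banana", 3), ("cherry", 4)]
by_first_char = lambda key: key[0]

buckets = split_dicts(data, by_first_char)
print(lookup(buckets, by_first_char, "banana"))   # prints 3
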
Of course, this entirely assumes that collisions in Python's hash
function are both predictable and frequent in the OP's data. If either
of those doesn't hold, then this suggestion is unlikely to be of
practical use. Could be an interesting experiment though :)
--
Steven