Populating a dictionary, fast [SOLVED]
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Nov 13 17:31:57 EST 2007
On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote:
> I don't know exactly why do you need a dictionary for keeping the data,
> but in case you want ultra-fast access to values, there is no
> replacement for keeping a sorted list of keys and a list with the
> original indices to values, and the proper list of values. Then, to
> access a value, you only have to do a binary search on the sorted list,
> another lookup in the original indices list and then go straight to the
> value in the value list. This should be the faster approach I can think
> of.
Maybe on Bizarro World, but not on Planet Earth.
Searching a dict takes on average a constant amount of time, regardless
of the key and the size of the dict. Because dicts are the basis of
Python's namespaces, it is optimized up the wazoo, and being implemented
in C it is extremely fast.
Using your suggestion, to get a value: you have to do a binary search
which is O(log N) instead of O(1), then ANOTHER lookup into a list of
indices, and finally a THIRD lookup to actually retrieve the value -- all
implemented in relatively slow Python instead of fast C.
And let's not even discuss the overhead of keeping these three lists
synchronized.
But don't take my word for it: measure for yourself. I'm not even going
to attempt to code your suggestion, but here's how you measure the time
it takes for dictionary lookup.
# Create a dataset to search on.
D = {}
chars = "abcdefghijklmnopqrstuvwxyz"
triplets = (a+b+c for a in chars for b in chars for c in chars)
for i, s in enumerate(triplets):
D[s] = i # D is of the form {'abc': 12345}
# Now time a successful search for an arbitrary triplet.
import timeit
min(timeit.Timer("D['foo']", "from __main__ import D").repeat(6))
On my PC, it takes about 0.2 seconds per million searches.
--
Steven.
More information about the Python-list
mailing list