Populating a dictionary, fast [SOLVED]

Francesc Altet faltet at carabos.com
Mon Nov 19 20:33:53 CET 2007


A Wednesday 14 November 2007, Francesc Altet escrigué:
> I think I've got messed on some benchmarks that I've done on that
> subject some time ago, but either my memory is bad or I've made some
> mistake on those experiments.  My apologies.

Just for the record.  I was unable to stop thinking about this, and 
after some investigation, I guess that my rememberings were gathered 
from some benchmarks with code in Pyrex (i.e. pretty near to C speed).

In the attached benchmark, I compare the speed of dictionary lookups in 
a big dictionary (more than 8 millions of entries) against doing the 
same, but using a binary search approach.  Here are the results:

$ python2.5 gq-binary-search.py
Creating the dictionary...
Time for dict creation: 13.001
Calibrating loop...
Calibrating time: 3.095
Timing queries with a dict...
Time for dict query: 7.747
Timing queries with binary search...
Time for binary queries: 3.551

so, a dictionary lookup takes around 950 ns, while a binary search 
lookup takes around 380 ns, i.e. binary searches are more than 2x 
faster than dictionary lookups, at least in this benchmark.

It might well be that the difference is due to the fact that the binary 
lookup in this case is made for only 64-bit integers, while the 
dictionary code has to check the type of index data first.  OTOH, it is 
also apparent that the binary code can be optimized for cache misses by 
using two levels of indirection for binary lookups, but this is too 
much work for today ;).  At any rate, it seems rather clear that binary 
lookups can be competive against hashes, when using Pyrex at least!

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gq-binary-search.py
Type: application/x-python
Size: 1704 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20071119/4feec254/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup.py
Type: application/x-python
Size: 360 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20071119/4feec254/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: query_ext.pyx
Type: text/x-c++src
Size: 1124 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20071119/4feec254/attachment.c>


More information about the Python-list mailing list