[issue9520] Add Patricia Trie high performance container

Antoine Pitrou report at bugs.python.org
Sat Aug 14 13:25:51 CEST 2010


Antoine Pitrou <pitrou at free.fr> added the comment:

So, here is the modified benchmark. It first creates a cache of the random wordlist, because the list takes quite a long time to compute (for N up to 10000000). The cache is reused in subsequent runs. This takes some memory, though (it probably won't run if you have less than 2GB of RAM). The cache itself is quite small on disk (around 80 MB).
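The caching scheme could be sketched along these lines (a hypothetical reconstruction, not the attached script; the file name and word-length parameters are assumptions):

```python
import os
import pickle
import random
import string

CACHE = "wordlist.pickle"  # hypothetical cache file name

def random_words(n, maxlen=10):
    # Generate n random lowercase words. Duplicates are possible,
    # which is why the key count in the results is smaller than n.
    return ["".join(random.choices(string.ascii_lowercase,
                                   k=random.randint(1, maxlen)))
            for _ in range(n)]

def load_words(n):
    # Build the wordlist once and pickle it, so subsequent runs
    # only pay the (much cheaper) unpickling cost.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)
    words = random_words(n)
    with open(CACHE, "wb") as f:
        pickle.dump(words, f)
    return words
```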

I'm outputting the total dict size and the space occupied per individual key, and I've also timed lookups separately. Here are the results on my machine:

    1000 words (     961 keys), 3269137 words/s, 16980987 lookups/s, 51 bytes/key (0.0MB)
   10000 words (    9042 keys), 2697648 words/s, 13680052 lookups/s, 87 bytes/key (0.8MB)
  100000 words (   83168 keys), 2462269 words/s, 6956074 lookups/s, 37 bytes/key (3.0MB)
  500000 words (  389442 keys), 1802431 words/s, 4733774 lookups/s, 64 bytes/key (24.0MB)
 1000000 words (  755372 keys), 1702130 words/s, 4485229 lookups/s, 66 bytes/key (48.0MB)
 2000000 words ( 1463359 keys), 1616658 words/s, 4251021 lookups/s, 68 bytes/key (96.0MB)
 5000000 words ( 3501140 keys), 1608889 words/s, 3909212 lookups/s, 57 bytes/key (192.0MB)
10000000 words ( 6764089 keys), 1531136 words/s, 3600395 lookups/s, 59 bytes/key (384.0MB)

As you can see, O(1) behaviour is almost observed up to the 10000000-word limit (given the way the data is computed, I'm afraid I can't go further than that). Of course, very small dicts are faster because everything fits in the CPU caches.
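The measurement itself might look roughly like this (a minimal sketch, assuming insertions and lookups are timed in separate passes and that per-key space is the container's own footprint divided by the number of distinct keys; function names are illustrative):

```python
import sys
import time

def bench(words):
    # Insertion throughput: words per second to fill the dict.
    t0 = time.perf_counter()
    d = {}
    for w in words:
        d[w] = True
    insert_rate = len(words) / (time.perf_counter() - t0)

    # Lookup throughput, timed separately over the same words.
    t0 = time.perf_counter()
    for w in words:
        d[w]
    lookup_rate = len(words) / (time.perf_counter() - t0)

    # Rough per-key footprint: size of the dict structure itself
    # (not counting the key strings) over the distinct key count.
    bytes_per_key = sys.getsizeof(d) / len(d)
    return len(d), insert_rate, lookup_rate, bytes_per_key
```

Since duplicate words collapse onto existing keys, the "keys" column stays below the "words" column, as in the results above.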

----------
Added file: http://bugs.python.org/file18520/dcbench-py3k.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9520>
_______________________________________
