optimizing large dictionaries

bearophileHUGS at lycos.com
Thu Jan 15 23:49:29 CET 2009

Matimus, your suggestions are all good.

Try-except is slower than an explicit membership test:
if x in adict: ... else: ...
A defaultdict is generally faster (there are conditions where it isn't,
but they aren't very common; I think it's when the ratio of duplicates
is really low). Creating just a tuple instead of a class instance helps
a lot, and when the CPU/OS allow it, Psyco may help some too.
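
For example, if the task is just counting how many times each key
appears (I'm assuming that here, so the functions below are only a
rough sketch), the three approaches look like this:

from collections import defaultdict

def count_with_if(keys):
    counts = {}
    for k in keys:
        if k in counts:      # explicit membership test
            counts[k] += 1
        else:
            counts[k] = 1
    return counts

def count_with_try(keys):
    counts = {}
    for k in keys:
        try:                 # slow when many keys are new, because
            counts[k] += 1   # raising KeyError costs a lot
        except KeyError:
            counts[k] = 1
    return counts

def count_with_defaultdict(keys):
    counts = defaultdict(int)   # missing keys start at 0
    for k in keys:
        counts[k] += 1
    return counts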

If the resulting speed isn't enough yet, consider that Python dicts
are quite fast, so you may need a lot of care to write D/C/C++/Clisp
code that's faster for this problem.

I also suspect that when they become very large, Python dicts lose
some of their efficiency. If this is true, you may switch to a new
dictionary for every chunk of the file, and then merge the dicts at
the end. I don't actually know whether this speeds up your Python code
even more (if you experiment with this, I'd like to know if the
suspicion turns out to be false).
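
The chunked idea may look something like this (untested; I'm assuming
the input is an iterable of simple string keys that you're counting,
and the chunk size below is just a guess):

from collections import defaultdict
from itertools import islice

def count_in_chunks(lines, chunk_size=1000000):
    partials = []
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        d = defaultdict(int)       # fresh dict for each chunk
        for key in chunk:
            d[key.strip()] += 1
        partials.append(d)
    total = defaultdict(int)       # merge the partial dicts at the end
    for d in partials:
        for key, n in d.items():
            total[key] += n
    return total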

