optimizing large dictionaries

Thu Jan 15 17:49:29 EST 2009

Matimus, your suggestions are all good.

Try-except is slower than:
if x in adict: ... else: ...
A defaultdict is generally faster (there are some cases where it
isn't, but they aren't very common; I think it's when the ratio of
duplicates is really low). Creating just a tuple instead of a class
instance helps a lot, and when the CPU/OS allow it, Psyco may help
somewhat here too.
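
To make the comparison concrete, here is a minimal sketch of the three
lookup styles. It assumes the task is counting how often each key
occurs, which may not match the original code exactly; the function
names and the sample data are made up for illustration.

```python
from collections import defaultdict

def count_try_except(keys):
    # Pays for an exception every time a key is seen for the first time.
    counts = {}
    for k in keys:
        try:
            counts[k] += 1
        except KeyError:
            counts[k] = 1
    return counts

def count_in_check(keys):
    # Explicit membership test, no exception machinery involved.
    counts = {}
    for k in keys:
        if k in counts:
            counts[k] += 1
        else:
            counts[k] = 1
    return counts

def count_defaultdict(keys):
    # Missing keys are created with int() == 0 by the dict itself.
    counts = defaultdict(int)
    for k in keys:
        counts[k] += 1
    return counts

# Hypothetical sample data, only to show that all three agree.
sample = ["a", "b", "a", "c", "a", "b"]
```

All three return the same counts, so the choice is only about speed.
Which one wins depends on the ratio of duplicates in the real data, so
it is worth timing them (for example with the timeit module) on a
representative slice of the input.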

If the resulting speed still isn't enough, consider that Python dicts
are quite fast, so you may need a lot of care to write D/C/C++/Clisp
code that's faster for this problem.

I also suspect that Python dicts lose some of their efficiency when
they become very large. If this is true, you could switch to a new
dictionary for every chunk of the file, and then merge the dicts at
the end. I don't actually know whether this would speed up your Python
code any further (if you try it, I'd like to know the result, even if
my guess turns out to be false).
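
The chunking idea could be tried with something like the sketch below.
It again assumes a counting task; count_in_chunks and chunk_size are
hypothetical names, and the default chunk size is an arbitrary guess.
Note that the final merge still builds one large dict, so this is an
experiment to measure, not a known improvement.

```python
from collections import defaultdict
from itertools import islice

def count_in_chunks(keys, chunk_size=100000):
    # chunk_size is an arbitrary value; tune it by measurement.
    it = iter(keys)
    partials = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # A fresh, small dict for each chunk of the input.
        counts = defaultdict(int)
        for k in chunk:
            counts[k] += 1
        partials.append(counts)

    # Merge the per-chunk dicts at the end.
    merged = defaultdict(int)
    for part in partials:
        for k, n in part.items():
            merged[k] += n
    return dict(merged)
```

Timing this against a single-dict version on the same input would show
whether the suspicion about very large dicts holds for your data.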

Bye,
bearophile