[Python-Dev] Rethinking intern() and its data structure

Fri Apr 10 10:58:56 CEST 2009

John Arbash Meinel wrote:

> Not as big of a difference as I thought it would be... But I bet if
> there was a way to put the random shuffle in the inner loop, so you
> weren't accessing the same identical 25k keys internally, you might get
> more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s"from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7),25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective, here is the same loop when every pass reuses the identical 25k keys:

$ python -m timeit -s"d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop
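
If the one-liners are awkward to adapt, the same comparison can be sketched as a small script. This is only a rough sketch, written for Python 3 (so range and next() instead of xrange and .next); the names time_lookups and compare, the seed, and the smaller default sizes are my own assumptions for illustration and are not part of the commands above. Each timed call consumes exactly one pre-built batch of keys, so building the samples stays outside the measurement.

```python
import random
import timeit


def time_lookups(key_batches, d):
    """Return the best time in seconds for looking up one batch of keys in d."""
    batches = iter(key_batches)

    def run():
        # Each call pulls the next pre-built batch, so sampling cost
        # is paid before timing starts, not inside the timed loop.
        for x in next(batches):
            d.get(x)

    # number=1 and repeat=len(key_batches): one batch per timing, no batch reused.
    return min(timeit.repeat(run, number=1, repeat=len(key_batches)))


def compare(dict_size=10**6, batch_size=25000, n_batches=5, seed=0):
    """Time random-key batches against the same sequential keys every time."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatable samples
    d = dict.fromkeys(range(dict_size))
    random_batches = [rng.sample(range(dict_size), batch_size)
                      for _ in range(n_batches)]
    same_batches = [list(range(batch_size))] * n_batches
    return {
        "random": time_lookups(random_batches, d),
        "sequential": time_lookups(same_batches, d),
    }


if __name__ == "__main__":
    for name, secs in compare().items():
        print("%s: %.2f msec per batch" % (name, secs * 1e3))
```

The absolute numbers will depend on the machine and on dict_size; only the gap between the two cases is of interest here.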

Peter