Cache memory and its effect on list searching

Chris Angelico rosuav at gmail.com
Fri Dec 16 21:27:01 EST 2016


On Sat, Dec 17, 2016 at 1:20 PM,  <sohcahtoa82 at gmail.com> wrote:
> I thought this was curious behavior.  I created a list of random-looking strings, then made a sorted copy.  I then found that using "in" to see if a string exists in the sorted list took over 9 times as long!
>

My numbers replicate yours (though my computer's faster). But my
conclusion is different:

Python 3.7.0a0 (default:ecd218c41cd4, Dec 16 2016, 03:08:47)
[GCC 6.2.0 20161027] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> import timeit
>>> hashes =  [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(10000000)]
>>> sortedhashes = sorted(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.8167107938788831
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.029693723190576
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.855183657258749
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.0585526106879115
>>> sethashes = set(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': sethashes}, number=10)
6.13601878285408e-06

You want containment checks? Use a set :)

ChrisA


More information about the Python-list mailing list