Cache memory and its effect on list searching
Chris Angelico
rosuav at gmail.com
Fri Dec 16 21:27:01 EST 2016
On Sat, Dec 17, 2016 at 1:20 PM, <sohcahtoa82 at gmail.com> wrote:
> I thought this was curious behavior. I created a list of random-looking strings, then made a sorted copy. I then found that using "in" to see if a string exists in the sorted list took over 9 times as long!
>
My numbers replicate yours (though my computer's faster). But my
conclusion is different:
Python 3.7.0a0 (default:ecd218c41cd4, Dec 16 2016, 03:08:47)
[GCC 6.2.0 20161027] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> import timeit
>>> hashes = [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(10000000)]
>>> sortedhashes = sorted(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.8167107938788831
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.029693723190576
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.855183657258749
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.0585526106879115
>>> sethashes = set(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': sethashes}, number=10)
6.13601878285408e-06
You want containment checks? Use a set :)
ChrisA
More information about the Python-list
mailing list