Why searching in a set is much faster than in a list ?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Sep 28 04:23:23 EDT 2016
On Wednesday 28 September 2016 15:51, ast wrote:
> Hello
>
> I noticed that searching in a set is faster than searching in a list.
[...]
> I tried a search in a tuple, it's not different that in a list.
> Any comments ?
A list, array or tuple is basically a linear array that needs to be searched
one item at a time:
[ a | b | c | d | ... | x | y | z ]
To find x, Python has to start at the beginning and look at every item in turn,
until it finds x.
But a set is basically a hash table. It is an array, but laid out differently,
with blank cells:
[ # | # | h | p | a | # | m | y | b | # | # | f | x | ... | # ]
Notice that the items are jumbled up in arbitrary order. So how does Python
find them?
Python calls hash() on the value, which returns a number, and that points
directly to the cell which would contain the value if it were there. So if you
search for x, Python calls hash(x) which will return (say) 12. Python then
looks in cell 12, and if x is there, it returns True, and if it's not, it
returns False. So instead of looking at 24 cells in the array to find x, it
calculates a hash (which is usually fast), then looks at 1 cell.
(This is a little bit of a simplification -- the reality is a bit more
complicated, but you can look up "hash tables" on the web or in computer
science books. They are a standard data structure, so there is plenty of
information available.)
On average, if you have a list with 1000 items, you need to look at 500 items
before finding the one you are looking for. With a set or dict with 1000 items,
on average you need to look at 1 item before finding the one you are looking
for. And that is why sets and dicts are usually faster than lists.
--
Steven
git gets easier once you get the basic idea that branches are homeomorphic
endofunctors mapping submanifolds of a Hilbert space.
More information about the Python-list
mailing list