[Tutor] Best Known Method for Filtering redundant list items.

John Fouhy john at fouhy.net
Thu Nov 30 22:56:19 CET 2006


On 01/12/06, Chris Hengge <pyro9219 at gmail.com> wrote:
> Nice! Thank you.
>
> Curious as to why this happens though...
>
> >>> list1 = ['1','1','2','3','4']
> >>> list2 = list(set(list1))
> >>> list2
> ['1', '3', '2', '4'] <-- here the order has changed.

For the same reason that dictionaries don't preserve order.
Basically, sets are (I think) implemented using a hash table.  You can
read about hash tables on Wikipedia (or many other places), but one of
the components of a hash table is a function mapping keys to integers
in a particular range.  To produce an efficient hash table, you want
this function to spread the input out evenly across the range, which
means the integer a key maps to should look random and bear no
relation to the order in which the keys were added.

Hence you lose the order of strings :-)
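
To make that concrete, here is a rough sketch of a toy hash table.  It
is an illustration only, not how Python's sets actually work: the names
toy_hash and toy_set_order and the table size of 8 are made up for this
example.  Each key goes into the slot picked by the hash function, and
reading the items back walks the slots from first to last, so the
result follows slot numbers rather than insertion order.

```python
def toy_hash(key):
    # Hypothetical hash function for illustration only; not CPython's.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def toy_set_order(items, size=8):
    # One bucket (a small list) per slot in the table.
    buckets = [[] for _ in range(size)]
    for item in items:
        # The hash value, reduced to the table size, picks the slot.
        bucket = buckets[toy_hash(item) % size]
        if item not in bucket:  # duplicates are dropped here
            bucket.append(item)
    # Reading back walks the slots in slot order, not insertion order.
    return [item for bucket in buckets for item in bucket]


print(toy_set_order(['4', '1', '1', '3', '2']))  # ['1', '2', '3', '4']
```

The items went in as '4', '1', '3', '2' but come out in slot order.
With a real hash function the slot numbers look arbitrary, so the
output order does too.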

In fact, you can think of a set as a dictionary where the values are
always True (or something), and you only care about the keys.

e.g.:

>>> import itertools
>>> list1 = ['1', '1', '2', '3', '4']
>>> list2 = dict(zip(list1, itertools.repeat(True))).keys()
>>> list2
['1', '3', '2', '4']
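
If you need to keep the original order, one possible approach (a
sketch only; unique_in_order is a name made up for this example) is to
walk the list yourself and use a set just to remember what you have
already seen:

```python
def unique_in_order(items):
    seen = set()   # fast membership test for items already kept
    result = []    # keeps the first occurrence of each item, in order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


list1 = ['1', '1', '2', '3', '4']
print(unique_in_order(list1))  # ['1', '2', '3', '4']
```

This only works if the items are hashable (strings, numbers, tuples
and so on), the same restriction that set() itself has.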

-- 
John.

