[Tutor] Select distinct item from list
Gregor Lingl
glingl@aon.at
Mon Feb 24 05:32:02 2003
janos.juhasz@VELUX.com wrote:
>Dear All,
>
>Can someone show me a simple list comprehension that does the same thing as
>"Select distinct item from list" does in SQL?
>
>So I have a list
>
>>>> l = (1,2,3,4,5,5,6,7,7,7,2)
>
>but I would like to have just
>l = (1,2,3,4,5,6,7)
>
>I know I have seen this somewhere, but I cannot find it :(
>
Me neither! And I can't come up with a clear and fast example either.
But here are two functions that do what you need, the second one being
*much* faster than the first, although it takes a kind of detour:
>>> def uniques(list):
...     u=[]
...     for l in list:
...         if l not in u:
...             u.append(l)
...     return u
...
>>> def distincts(list):
...     d = {}
...     for l in list:
...         d[l]=None
...     return d.keys()
...
>>> from time import clock
>>> from random import randrange
>>> example = [randrange(100) for i in range(1000)]
>>> if 1:
...     a=clock()
...     result1 = uniques(example)
...     b=clock()
...     result2 = distincts(example)
...     c=clock()
...     print b-a, c-b
...
0.0271014815845 0.00253942818447
>>> example = [randrange(1000) for i in range(1000)]
>>> if 1:
...     a=clock()
...     result1 = uniques(example)
...     b=clock()
...     result2 = distincts(example)
...     c=clock()
...     print b-a, c-b
...
0.15894725197 0.00390971368995
>>> len(result1)
637
>>> len(result2)
637
>>> result1
[798, 106, 230, 694, 163, 709, 666, 29, 481, 115, 682, 467, 872, 195,
311, 800, 420, 423, 881, ...
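For comparison, here is a minimal sketch of the same two ideas in present-day Python 3 (an assumption; the session above is Python 2). It relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on, so the dictionary detour also keeps first-seen order. The names uniques_slow and distincts_ordered are made up for this illustration.

```python
from random import randrange
from timeit import timeit


def uniques_slow(items):
    """Quadratic: every membership test scans the result list."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def distincts_ordered(items):
    """Linear on average: dict keys are hashed, and in Python 3.7+
    they keep the order in which they were first inserted."""
    return list(dict.fromkeys(items))


example = [randrange(1000) for _ in range(1000)]

# Both functions return the distinct items in first-seen order.
assert uniques_slow(example) == distincts_ordered(example)

# Timings depend on the machine, so no expected numbers are given.
print(timeit(lambda: uniques_slow(example), number=10))
print(timeit(lambda: distincts_ordered(example), number=10))
```

If the order of the result does not matter, set(items) is an even shorter way to get the distinct values.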
Regards, Gregor
P.S.: The following is also possible, but certainly not what you had in
mind:
>>> def weird(list):
...     e = []
...     u = [e.append(l) for l in list if l not in e]
...     return e
...
>>> result3 = weird(example)
>>> result3[:20]
[798, 106, 230, 694, 163, 709, 666, 29, 481, 115, 682, 467, 872, 195,
311, 800, 420, 423, 881, 8]
>>> len(result3)
637
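If a single comprehension is really wanted, one possible variant keeps the side-effect trick but tracks the items already seen in a set, so that each membership test is cheap. This is only a sketch in modern Python 3; distinct_in_order is a hypothetical name, and it assumes the items are hashable.

```python
def distinct_in_order(items):
    seen = set()
    # seen.add(x) returns None (falsy), so "x in seen or seen.add(x)"
    # is truthy only when x was seen before; otherwise it records x.
    return [x for x in items if not (x in seen or seen.add(x))]


print(distinct_in_order((1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 2)))
# prints [1, 2, 3, 4, 5, 6, 7]
```

Like weird() above, it depends on a side effect inside the comprehension, so a plain loop is arguably easier to read.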
OOPS! Michael's idea just arrived (including a typo):
>>> def uniqs(inp, was_there=[]):
...     if not inp in was_there:
...         was_there.append(inp)
...         return 1 # sending "True" to filter
...
>>> if 1:
...     a = clock()
...     result4 = filter(uniqs, example)
...     b = clock()
...     print b - a, len(result4)
...
0.167666793498 637
It uses a similar idea to my first example.
!!! But it has a severe disadvantage, as you can see
if you use it twice: !!!
>>> if 1:
...     a = clock()
...     result4 = filter(uniqs, example)
...     b = clock()
...     print b - a, len(result4)
...
0.168840126653 0
>>> result4
[]
>>>
Now the resulting list is empty! This comes from using
a mutable object, namely a list, as the default value for a parameter.
The default list is created only once, when the function is defined,
and is shared by all later calls. After the first run was_there contains
all the numbers in example, so filter rejects every element the second time.
So, I think, it's better to discard that idea. Sorry.
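The shared default can be seen, and avoided, in a small sketch like the following. It is written for modern Python 3, where filter returns an iterator (hence the list() calls), and make_uniqs is a hypothetical helper that is not part of Michael's suggestion: it builds a fresh predicate with its own private list for each run.

```python
def buggy_uniqs(inp, was_there=[]):
    # The default list is created once and reused across all calls.
    if inp not in was_there:
        was_there.append(inp)
        return True
    return False


def make_uniqs():
    """Return a new predicate that remembers items in its own list."""
    was_there = []  # created anew on every make_uniqs() call

    def uniqs(inp):
        if inp not in was_there:
            was_there.append(inp)
            return True
        return False

    return uniqs


example = [3, 1, 3, 2, 1]

print(list(filter(buggy_uniqs, example)))   # prints [3, 1, 2]
print(list(filter(buggy_uniqs, example)))   # prints []

print(list(filter(make_uniqs(), example)))  # prints [3, 1, 2]
print(list(filter(make_uniqs(), example)))  # prints [3, 1, 2]
```

This fixes the reuse problem only; the membership test is still a linear scan of a list, so it is no faster than uniques().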
>Please CC me.
>Best regards,
>-----------------------
>Juhász János
>IT department