[Tutor] List intersect
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Mon Sep 27 20:52:56 CEST 2004
> > I never wrote a python script. I am looking for a script that I can use
> > to generate a uniq list out of five lists. These lists are bunch of
> > usernames.
[text cut]
> Unless I'm mistaken about what you're trying to do, writing a Python
> script sounds like making things harder than necessary. Assuming you're
> on a Unix-like system and your lists are in separate files:
>
> cat list1 list2 list3 list4 list5 | uniq > newlist
Yes, the Unix shell solution should be really straightforward. But don't
forget to sort!
$ cat list1 list2 list3 list4 list5 | sort | uniq > newlist
The 'uniquing' algorithm that uniq uses only compares each line with the
one just before it, so it won't see duplicates unless they're adjacent to
each other. In Python terms, the 'uniq' utility does something like this:
###
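def unique(sequence):
    # An empty or one-element sequence has no duplicates to remove.
    if len(sequence) <= 1:
        return sequence
    # Keep the first element, then every element that differs from its
    # immediate predecessor; only adjacent duplicates get dropped.
    results = [sequence[0]]
    i = 1
    while i < len(sequence):
        if sequence[i] != sequence[i-1]:
            results.append(sequence[i])
        i += 1
    return results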
###
And we can see that it works, just as long as the sequence is sorted:
###
>>> unique([1, 1, 2, 4, 6, 8, 9, 9, 10])
[1, 2, 4, 6, 8, 9, 10]
###
But if the elements are not in sorted order, then unique() won't catch all
duplicate elements:
###
>>> import random
>>> l = [1, 1, 2, 4, 6, 8, 9, 9, 10]
>>> random.shuffle(l)
>>> l
[4, 6, 8, 1, 10, 1, 9, 9, 2]
>>> unique(l)
[4, 6, 8, 1, 10, 1, 9, 2]
###
So if you use this approach, don't forget to sort first.
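Here is a rough sketch of the sort-first approach in Python, mirroring what
"sort | uniq" does in the shell. The name unique_sorted() is made up for
illustration, and itertools.groupby stands in for the hand-written loop in
unique(); it groups runs of equal neighbours, which is the same idea.

```python
from itertools import groupby

def unique_sorted(sequence):
    """Sort a copy of the input, then keep one element per run of equal neighbours."""
    # sorted() puts duplicates next to each other, so groupby() sees each
    # distinct value exactly once as a group key.
    return [key for key, _group in groupby(sorted(sequence))]

shuffled = [4, 6, 8, 1, 10, 1, 9, 9, 2]
print(unique_sorted(shuffled))  # [1, 2, 4, 6, 8, 9, 10]
```

Note that the result comes back in sorted order, not in the order the
elements first appeared, just like the output of the shell pipeline.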
An alternative way to solve the uniqueness problem is to use dictionaries
or sets to maintain a unique list of elements. All of the tutorials on:
http://www.python.org/topics/learn/non-prog.html
should cover how to use dictionaries.
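As a minimal sketch of the set-based idea, assuming a reasonably modern
Python 3: the function name merge_unique() and the sample usernames are
hypothetical, just standing in for the five lists from the original
question. No sorting is needed, because a set remembers every element seen
so far, not only the previous one.

```python
def merge_unique(*lists):
    """Combine several lists into one, keeping the first occurrence of each item."""
    seen = set()      # everything we have already emitted
    result = []       # unique items, in first-seen order
    for names in lists:
        for name in names:
            if name not in seen:
                seen.add(name)
                result.append(name)
    return result

# Hypothetical sample data; real lists would be read from the five files.
list1 = ["alice", "bob"]
list2 = ["bob", "carol"]
list3 = ["dave", "alice"]

print(merge_unique(list1, list2, list3))  # ['alice', 'bob', 'carol', 'dave']
```

If the original order doesn't matter, sorted(set(list1 + list2 + list3))
gives the same elements in sorted order with less code.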