[Tutor] Filtering out unique list elements

Steven D'Aprano steve at pearwood.info
Wed May 4 12:12:44 CEST 2011


Spyros Charonis wrote:
> Dear All,
> 
> I have built a list with multiple occurrences of a string after some text
> processing that goes something like this:
> 
> [cat, dog, cat, cat, cat, dog, dog, tree, tree, tree, bird, bird, woods,
> woods]
> 
> I am wondering how to truncate this list so that I only print out the unique
> elements, i.e. the same list but with one occurrence per element:
> 
> [cat, dog, tree, bird, woods]

Others have already mentioned set(), but unless I missed something, 
nobody pointed out that sets are unordered, and so will lose whatever 
order was in the list:


 >>> # words = [cat, dog, cat, cat, cat etc...]
 >>> set(words)
set(['bird', 'woods', 'tree', 'dog', 'cat'])
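If losing the original order is acceptable but you still want reproducible output, one option is to sort the set (this sketch assumes the elements are all comparable with each other, e.g. all strings):

```python
# Sorting the set gives deterministic (alphabetical) order,
# though not the order the items first appeared in.
words = ['cat', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog',
         'tree', 'tree', 'tree', 'bird', 'bird', 'woods', 'woods']
print(sorted(set(words)))
# ['bird', 'cat', 'dog', 'tree', 'woods']
```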

They also didn't mention that sets require the items to be hashable:

 >>> set(['bird', {}, 'cow'])
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable


If neither of those limitations matter to you, then sets will be the 
fastest and easiest solution.

Alternatively, if you only have a few elements:


unique = []
for element in items:
    if element not in unique:
        unique.append(element)

However this will be SLOW if you have many items, because each "not in" 
test has to scan the whole unique list from the start.
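If your items are hashable but you still need to preserve order, you can combine both ideas: keep the list for order and a set for fast membership tests. A small sketch (the helper name unique_in_order is my own, not a standard function):

```python
def unique_in_order(items):
    # Keep first occurrence of each element, in original order.
    # The set gives O(1) membership tests instead of scanning the list.
    seen = set()
    unique = []
    for element in items:
        if element not in seen:
            seen.add(element)
            unique.append(element)
    return unique

words = ['cat', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog',
         'tree', 'tree', 'tree', 'bird', 'bird', 'woods', 'woods']
print(unique_in_order(words))
# ['cat', 'dog', 'tree', 'bird', 'woods']
```

This does a single pass over the input, so it stays fast even for large lists, at the cost of the extra memory for the set.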


Here are some more recipes:

http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/



-- 
Steven


