sorted unique elements from a list; using 2.3 features
Andrew Dalke
adalke at mindspring.com
Sat Jan 4 14:26:18 EST 2003
I often need to get the unique elements of a list, in
sorted order. For example, if I have
[1, 4, 3, 4, 3, 1]
I want to get
[1, 3, 4]
The usual way to do this is
# Let 'data' be a list or iterable object
# For example, data = [1, 4, 3, 4, 3, 1]
# or, data = sys.stdin
d = {}
for x in data:
    d[x] = 1
subset = d.keys()
subset.sort()
# Use 'subset' as needed
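As a rough sketch, the dictionary idiom above can be wrapped in a small helper. The name 'unique_sorted' is made up for illustration, and 'list(d.keys())' is an assumption added so the same code also runs on Python versions where 'keys()' does not return a list.

```python
def unique_sorted(data):
    """Return the distinct elements of 'data' as a sorted list."""
    d = {}
    for x in data:
        d[x] = 1             # the value is a placeholder; only the keys matter
    subset = list(d.keys())  # list() is a no-op copy where keys() is already a list
    subset.sort()
    return subset

print(unique_sorted([1, 4, 3, 4, 3, 1]))   # [1, 3, 4]
```

Because duplicates collapse onto the same key, the dictionary ends up with one entry per distinct element regardless of how often it appears.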
Python 2.3 offers at least two new ways to do this. The first is
with the new 'Set' class in the 'sets' module
# Let 'data' be a list or iterable object
import sets
subset = list(sets.Set(data))
subset.sort()
# Use 'subset' as needed
(The 'list()' is needed because that's the only way to get the elements
out of a Set as a list. It provides an __iter__ but no 'tolist()' method.)
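Here is a sketch of the Set approach on the example data. The 'try/except' fallback to a built-in 'set' type is an assumption for interpreters that have no 'sets' module; it is not part of the 2.3 recipe itself.

```python
try:
    from sets import Set      # Python 2.3's sets module
except ImportError:
    Set = set                 # assumption: fall back to a built-in set type

data = [1, 4, 3, 4, 3, 1]
s = Set(data)                 # duplicates are dropped on construction
print(len(s))                 # 3

subset = list(s)              # iteration order is arbitrary, so sort afterwards
subset.sort()
print(subset)                 # [1, 3, 4]
```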
The other is with the new 'dict.fromkeys' class method, which constructs
a dictionary from a list -- the elements of the list become the
keys and you can choose the value shared by every item, or just use the
default of None. To show you what that means
>>> dict.fromkeys([1,2,5,3,2,1], 0)
{1: 0, 2: 0, 3: 0, 5: 0}
>>> dict.fromkeys([1,2,5,3,2,1])
{1: None, 2: None, 3: None, 5: None}
>>>
So for the task at hand,
# Let 'data' be a list or iterable object
subset = dict.fromkeys(data).keys()
subset.sort()
# Use 'subset' as needed
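Two properties of this idiom may be worth a small sketch: 'data' can be any iterable (a generator here), and its elements must be hashable because they become dictionary keys. The generator 'squares' and its values are invented for the example, and 'list(...)' over the dictionary stands in for '.keys()' only so that the snippet also runs where 'keys()' does not return a list.

```python
def squares():
    # A generator stands in for "any iterable"; the numbers are made up
    for n in [3, -3, 2, -2, 1]:
        yield n * n

subset = list(dict.fromkeys(squares()))   # iterating a dict yields its keys
subset.sort()
print(subset)                             # [1, 4, 9]

# Elements become dictionary keys, so unhashable ones (like lists) are rejected
try:
    dict.fromkeys([[1, 2], [1, 2]])
    hashable = True
except TypeError:
    hashable = False
print(hashable)                           # False
```

For unhashable elements, a different approach is needed, such as sorting first and then skipping adjacent duplicates.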
For a real-life example, suppose you want to get unique lines
from the stdin input stream, sort them, and dump the results
to stdout. Here's how to do it in Python 2.3
import sys
unique_lines = dict.fromkeys(sys.stdin).keys()
unique_lines.sort()
sys.stdout.writelines(unique_lines)
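To try the same steps without a real stdin, the following sketch passes file-like objects to a helper. The function name 'write_unique_sorted_lines' and the sample text are hypothetical, and 'io.StringIO' assumes a Python version that ships the 'io' module (2.3 itself does not).

```python
import io

def write_unique_sorted_lines(infile, outfile):
    # Iterating a file yields lines with their trailing newline intact,
    # so writelines() needs no extra separator.
    unique_lines = list(dict.fromkeys(infile))
    unique_lines.sort()
    outfile.writelines(unique_lines)

source = io.StringIO(u"pear\napple\npear\nfig\napple\n")
sink = io.StringIO()
write_unique_sorted_lines(source, sink)
print(sink.getvalue() == u"apple\nfig\npear\n")   # True
```

Note that if the last input line lacks a trailing newline, it counts as distinct from the same text with one, and the output lines can run together.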
Andrew
dalke at dalkescientific.com