sorted unique elements from a list; using 2.3 features

Delaney, Timothy tdelaney at avaya.com
Mon Jan 6 00:49:47 EST 2003


> From: Andrew Dalke [mailto:adalke at mindspring.com]
> 
> Python 2.3 offers at least two new ways to do this.  The first is
> with the new 'Set' class
> 
> # Let 'data' be a list or iterable object
> import sets
> subset = list(sets.Set(data))
> subset.sort()
> # Use 'subset' as needed

Using sets is definitely the Right Way (TM) to do it. This is one of the
primary use cases for sets (*everyone* wants to do this).
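
For anyone reading this on a later Python, here is a minimal sketch of the same idiom. It assumes a version where `set` is a builtin (2.4 onwards) rather than the `sets` module quoted above, and `data` is a made-up sample list, not anything from the original post.

```python
# Hypothetical sample input; any iterable of hashable, comparable items works.
data = ["pear", "apple", "pear", "fig", "apple"]

# Build a set to drop duplicates, then copy it into a list so it can be sorted.
subset = list(set(data))
subset.sort()

print(subset)
```

The set itself has no defined order, which is why the explicit sort is still needed.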

> (The 'list()' is needed because that's the only way to get elements
> out from a list.  It provides an __iter__ but no 'tolist()' method.)

And this is the canonical way to transform any iterable to a list. Why
should every class that you want to transform to a list have to supply a
`tolist` method? Why not a `totuple` method?
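
To illustrate the point, a small sketch: `Countdown` is a hypothetical class invented for this example that defines nothing but `__iter__`, yet both `list()` and `tuple()` can consume it without any `tolist` or `totuple` method.

```python
class Countdown:
    """Hypothetical iterable that only defines __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield start, start - 1, ..., 1.
        n = self.start
        while n > 0:
            yield n
            n -= 1


# The constructors pull the elements out through the iterator protocol.
as_list = list(Countdown(3))
as_tuple = tuple(Countdown(3))

print(as_list, as_tuple)
```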

> The other is with the new 'fromkeys' class, which constructs

Actually, `fromkeys` is a dictionary class method, not a class.

> # Let 'data' be a list or iterable object
> subset = dict.fromkeys(data).keys()
> subset.sort()
> # Use 'subset' as needed

This, whilst slightly shorter (there is no import - and the import will be
going away in future versions anyway), is definitely *not* the Right Way (TM)
to do it. It is likely to confuse people, because it builds a dictionary only
to throw the values away.
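
A rough side-by-side sketch of why the dictionary version reads oddly. It assumes a later Python with a builtin `set` and `sorted()` (both 2.4 onwards), and `data` is made-up sample input.

```python
# Hypothetical sample input.
data = ["b", "a", "b", "c", "a"]

# dict.fromkeys builds a mapping whose values are all None; only the keys matter.
via_dict = dict.fromkeys(data)

# A set holds the unique values and nothing else.
via_set = set(data)

# Both give the same sorted result, but the set states the intent directly.
print(sorted(via_dict))
print(sorted(via_set))
```

The dictionary carries a `None` for every key that the reader has to mentally discard.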

> For a real-life example, suppose you want to get unique lines
> from the stdin input stream, sort them, and dump the results
> to stdout.  Here's how to do it in Python 2.3
> 
> import sys
> unique_lines = dict.fromkeys(sys.stdin).keys()
> unique_lines.sort()
> sys.stdout.writelines(unique_lines)

Nope - this is better done as:

    import sets
    import sys

    unique_lines = list(sets.Set(sys.stdin))
    unique_lines.sort()
    sys.stdout.writelines(unique_lines)

It says explicitly what you are doing - creating a set of unique *values*
(since that is the definition of a set), then sorting the result.
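
The same filter can be sketched as a function that takes any input and output stream, which makes it easy to try without a real stdin. This assumes a later Python with a builtin `set` and `sorted()`; the function name `write_unique_sorted` and the `io.StringIO` stand-ins are hypothetical, chosen only for this example.

```python
import io


def write_unique_sorted(instream, outstream):
    """Write the unique lines of instream to outstream in sorted order."""
    # Iterating a file-like object yields its lines, newline included.
    unique_lines = sorted(set(instream))
    outstream.writelines(unique_lines)


# In-memory streams standing in for sys.stdin and sys.stdout.
fake_in = io.StringIO("pear\napple\npear\nfig\n")
fake_out = io.StringIO()
write_unique_sorted(fake_in, fake_out)
result = fake_out.getvalue()
```

In a real script the call would be `write_unique_sorted(sys.stdin, sys.stdout)`.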

Tim Delaney




