[Patches] [ python-Patches-460402 ] PEP 270: uniq() method for list objects

noreply@sourceforge.net noreply@sourceforge.net
Thu, 13 Sep 2001 09:59:47 -0700


Patches item #460402, was opened at 2001-09-10 11:48
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Jason Petrone (jpetrone)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 270: uniq() method for list objects

Initial Comment:
Remove duplicate elements from a list object.

3 Patches:

uniq.doc.patch - Documentation changes in libstdtypes.tex

uniq.ordered.patch - Changes to listmodule.c for
        brute-force duplicate removal with uniq().
        Maintains list order.

uniq.unordered.patch - Changes to listmodule.c for
        duplicate removal with uniq().  First tries
        using a mapping object.  If the list elements
        aren't mappable, tries sorted removal.  If
        the list isn't sortable, finally falls back
        on brute-force removal.  Does not maintain
        list order.  (A sketch of this fallback
        strategy appears below.)
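
The three-stage strategy the unordered patch describes is,
roughly, the following pure-Python sketch (the name
uniq_unordered is ours, for illustration only; the actual
patch implements this in C in listmodule.c):

  def uniq_unordered(L):
      # Stage 1: dict-based removal, O(N); needs hashable
      # elements.  Result order is not guaranteed.
      try:
          d = {}
          for x in L:
              d[x] = None
          return list(d)
      except TypeError:
          pass  # some element isn't hashable
      # Stage 2: sort, then drop consecutive duplicates,
      # O(N log N); needs mutually comparable elements.
      try:
          s = sorted(L)
      except TypeError:
          pass  # elements can't be ordered
      else:
          result = []
          for x in s:
              if not result or x != result[-1]:
                  result.append(x)
          return result
      # Stage 3: brute force, O(N**2); needs only equality.
      result = []
      for x in L:
          if x not in result:
              result.append(x)
      return result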
                       


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-13 09:59

Message:
Logged In: YES 
user_id=6380

I talked this over with Tim Peters (whose Cookbook entry
is the source of your original code, we presume).  He
doesn't want a Unix-style uniq(), because the dict approach
is usually fastest, and it is O(N).  But he also thinks
there is little reason to make this a built-in, given that
the cookbook example exists and is easy enough to follow.
The only reason to make it a built-in would be if it could
run much faster than the pure-Python version, which we
doubt, since most of the time goes into the dictionary
implementation anyway.  Finally, this is the realm of sets,
which (if Greg V. Wilson ever finishes his work for PEP
218) will eventually become a standard Python datatype.
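
(The dict approach referred to is, in outline, something
like the following minimal sketch; this is not the actual
Cookbook recipe, and the name uniq here is only
illustrative:

  def uniq(L):
      # One pass with dict lookups: O(N) overall.
      # Elements must be hashable.
      seen = {}
      result = []
      for x in L:
          if x not in seen:
              seen[x] = None
              result.append(x)
      return result

As a bonus, this variant also preserves the order of first
occurrence.)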

So, I'm rejecting this patch. Sorry!

----------------------------------------------------------------------

Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 08:56

Message:
Logged In: YES 
user_id=71210

I don't really have any motivation for this beyond what is
already apparent.  If you don't immediately see it as
useful, it probably should be rejected.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-13 08:34

Message:
Logged In: YES 
user_id=6380

I've always assumed that the reason uniq(1) only removes
consecutive duplicates is so that it doesn't have to use
an O(N**2) algorithm where an O(N log N) algorithm will
do.  If you need all duplicates removed, you can do it as
follows:

  L.sort()
  L.uniq()

In many cases, you can ensure uniqueness by simply choosing
the right data structures. The keys of a dictionary can be
used to represent a set; there's also talk of adding a set
data type (see PEP 218).
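
(For instance, here is a minimal sketch of the
keys-as-a-set idiom; the names are only illustrative:

  d = {}
  for word in ["spam", "eggs", "spam", "ham"]:
      d[word] = 1           # re-adding a key is a no-op
  unique_words = list(d)    # each distinct word once

Duplicates never enter the structure in the first place,
so no separate uniq() pass is needed.)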

I still haven't seen a good motivation for this addition.
If you don't post one here, I'll reject this patch.

----------------------------------------------------------------------

Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 07:54

Message:
Logged In: YES 
user_id=71210

I've always assumed the unix uniq(1) program only removes
consecutive duplicates so that it can operate on very
large files without buffering them entirely in memory.

In Python, buffering the data in memory isn't an issue,
since it is already there.  Also, in most cases a hash
table gives slightly better performance than pre-sorting
and then removing consecutive duplicates.

My main reason for not wanting to mimic unix uniq's
functionality is that I've never really been in a 
situation where I've only needed consecutive duplicates 
removed.  I think achieving global uniqueness in a list is 
a much more common task.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-12 14:18

Message:
Logged In: YES 
user_id=6380

Hm.  The unix uniq(1) program only removes *consecutive*
duplicates.  That's an O(N) algorithm, requires only that
equality be defined, and leaves the sorting to the caller.
Would it make sense to support only that?
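
(A minimal sketch of that behavior in Python; the name
uniq_consecutive is only illustrative:

  def uniq_consecutive(L):
      # One O(N) pass; needs only == on the elements,
      # like unix uniq(1).
      result = []
      for x in L:
          if not result or x != result[-1]:
              result.append(x)
      return result

Removing *all* duplicates then amounts to
uniq_consecutive(sorted(L)), for sortable elements.)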

I guess I'm looking for a use case here...

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470