[Patches] [ python-Patches-460402 ] PEP 270: uniq() method for list objects
noreply@sourceforge.net
Thu, 13 Sep 2001 07:54:54 -0700
Patches item #460402, was opened at 2001-09-10 11:48
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Petrone (jpetrone)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 270: uniq() method for list objects
Initial Comment:
Remove duplicate elements from a list object.
3 patches:
uniq.doc.patch - Documentation changes in libstdtypes.tex.
uniq.ordered.patch - Changes to listmodule.c for brute-force
duplicate removal with uniq(). Maintains list order.
uniq.unordered.patch - Changes to listmodule.c for duplicate
removal with uniq(). First tries using a mapping object. If
the list elements aren't mappable, tries sorted removal. If
the list isn't sortable, finally falls back on brute-force
removal. Does not maintain list order.
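The fallback strategy the unordered patch describes can be sketched in
pure Python (the patch itself modifies listmodule.c; the function name
uniq_unordered here is hypothetical):

```python
def uniq_unordered(lst):
    """Remove duplicates without preserving order:
    mapping -> sorted removal -> brute force."""
    # 1. Try a mapping object: elements must be hashable; O(n).
    try:
        return list({item: None for item in lst})
    except TypeError:
        pass
    # 2. Try sorting, then drop consecutive duplicates:
    #    elements must be orderable; O(n log n).
    try:
        srt = sorted(lst)
    except TypeError:
        pass
    else:
        result = srt[:1]
        for item in srt[1:]:
            if item != result[-1]:
                result.append(item)
        return result
    # 3. Brute force: only equality required; O(n**2).
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result
```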
----------------------------------------------------------------------
>Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 07:54
Message:
Logged In: YES
user_id=71210
I've always assumed the unix uniq(1) program only removes
consecutive duplicates so that it can operate on very large
files without buffering them entirely into memory.
In Python, buffering the data in memory isn't an issue
since it is already there. Also, in most cases a hash
table can be used, giving better performance than
pre-sorting and then removing consecutive duplicates.
My main reason for not wanting to mimic unix uniq's
functionality is that I've never really been in a
situation where I've only needed consecutive duplicates
removed. I think achieving global uniqueness in a list is
a much more common task.
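The hash-table approach mentioned above could look like this in pure
Python (a sketch, not the actual patch): one pass, O(n) expected time,
global uniqueness, and the original order is preserved as a bonus.

```python
def uniq(lst):
    # Track elements already emitted in a hash-based set,
    # keeping only the first occurrence of each element.
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```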
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-12 14:18
Message:
Logged In: YES
user_id=6380
Hm. The unix uniq(1) program only removes *consecutive*
duplicates. That's an O(N) algorithm and only requires
equality to be defined, and leaves the sorting to the
caller. Would it make sense to support only that?
I guess I'm looking for a use case here...
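The consecutive-only behavior being described, were it adopted, would
amount to something like this generator (a sketch; the name
uniq_consecutive is hypothetical):

```python
def uniq_consecutive(iterable):
    # Mimic unix uniq(1): drop only *consecutive* duplicates.
    # O(N), requires only equality, and works on a stream
    # without buffering it; sorting is left to the caller.
    sentinel = object()
    prev = sentinel
    for item in iterable:
        if prev is sentinel or item != prev:
            yield item
        prev = item
```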
----------------------------------------------------------------------