[Patches] [ python-Patches-460402 ] PEP 270: uniq() method for list objects
noreply@sourceforge.net
Thu, 13 Sep 2001 07:54:54 -0700
Patches item #460402, was opened at 2001-09-10 11:48
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Petrone (jpetrone)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 270: uniq() method for list objects
Initial Comment:
Remove duplicate elements from a list object.
3 patches:
uniq.doc.patch - Documentation changes in libstdtypes.tex.
uniq.ordered.patch - Changes to listmodule.c for brute-force
duplicate removal with uniq(). Maintains list order.
uniq.unordered.patch - Changes to listmodule.c for duplicate
removal with uniq(). First tries using a mapping object. If
the list elements aren't mappable, tries sorted removal. If
the list isn't sortable, finally falls back on brute-force
removal. Does not maintain list order.
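The fallback strategy the unordered patch describes can be sketched in
pure Python (the patch itself modifies listmodule.c; the function name
uniq_unordered here is hypothetical):

```python
def uniq_unordered(lst):
    """Remove duplicates without preserving order:
    mapping -> sorted removal -> brute force."""
    # 1. Try a mapping object: elements must be hashable; O(n).
    try:
        return list({item: None for item in lst})
    except TypeError:
        pass
    # 2. Try sorting, then drop consecutive duplicates:
    #    elements must be orderable; O(n log n).
    try:
        srt = sorted(lst)
    except TypeError:
        pass
    else:
        result = srt[:1]
        for item in srt[1:]:
            if item != result[-1]:
                result.append(item)
        return result
    # 3. Brute force: only equality required; O(n**2).
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result
```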
----------------------------------------------------------------------
>Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 07:54
Message:
Logged In: YES
user_id=71210
I've always assumed the unix uniq(1) program only removes
consecutive duplicates so that it can operate on very large
files without buffering them entirely into memory.
In Python, buffering the data in memory isn't an issue
since it is already there. Also, in most cases a hash
table can be used, giving better performance than
pre-sorting and then removing consecutive duplicates.
My main reason for not wanting to mimic unix uniq's
functionality is that I've never really been in a
situation where I've only needed consecutive duplicates
removed. I think achieving global uniqueness in a list is
a much more common task.
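The hash-table approach mentioned above could look like this in pure
Python (a sketch, not the actual patch): one pass, O(n) expected time,
global uniqueness, and the original order is preserved as a bonus.

```python
def uniq(lst):
    # Track elements already emitted in a hash-based set,
    # keeping only the first occurrence of each element.
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```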
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-12 14:18
Message:
Logged In: YES
user_id=6380
Hm. The unix uniq(1) program only removes *consecutive*
duplicates. That's an O(N) algorithm and only requires
equality to be defined, and leaves the sorting to the
caller. Would it make sense to support only that?
I guess I'm looking for a use case here...
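The consecutive-only behavior being described, were it adopted, would
amount to something like this generator (a sketch; the name
uniq_consecutive is hypothetical):

```python
def uniq_consecutive(iterable):
    # Mimic unix uniq(1): drop only *consecutive* duplicates.
    # O(N), requires only equality, and works on a stream
    # without buffering it; sorting is left to the caller.
    sentinel = object()
    prev = sentinel
    for item in iterable:
        if prev is sentinel or item != prev:
            yield item
        prev = item
```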
----------------------------------------------------------------------