[Python-ideas] Uniquify attribute for lists
Andrew Barnert
abarnert at yahoo.com
Sat Nov 17 04:17:59 CET 2012
I created a fork of more-itertools, with unique_everseen modified to work with
non-hashable elements. It starts off assuming everything is hashable, and using
a set; if it gets a TypeError, it converts the set to a list and carries on.
Very simple, and it's within 5% of the speed of the original in the best case.
>From a quick glance at your recipe, it looks like you had the same idea long
ago.
Meanwhile, I've been thinking that you don't really have to fall all the way
back to a list if things aren't hashable; there are plenty of intermediate
steps:
There are plenty of types that aren't generally hashable, but you can be sure
that your algorithm won't mutate them, and you can devise a hash function that
guarantees the hashable requirement. For some easy examples, hash mutable
sequences as (type(x), tuple(x)), mutable sets as (type(x), frozenset(x)),
mutable mappings as (type(x), tuple(dict(x).items())), mutable buffers as
(type(x), bytes(x)), etc. Or, if you have a bunch of your own mutable classes,
maybe add a "fakehash" method to them and use that.
As another option, a set of pickle.dumps(x) would work for many types, and it's
still O(N), although with a huge multiplier, so it's not worth it
unless len(sequence) >> avg(len(element)). Also, it's not guaranteed that x==y
implies dumps(x)==dumps(y), so you'd need to restrict it to types for which this
is known to be true.
There are plenty of types that are not hashable, but are fully ordered, and
(type(x), x) is fully ordered as long as all of the types are, so in such cases
you can use a sorted collection (blist.sortedset?) and get O(N log N) time.
Of course none of these works for all types, so you'd still have to fall back to
linear searching through a list in some cases.
At any rate, I don't think any of these alternatives needs to be added to a
general-purpose uniquifier, but they should all be doable if your use case
requires better performance for, e.g., a list of lists or a generator of mutable
class objects or a huge collection of quickly-picklable objects.
----- Original Message ----
> From: Steven D'Aprano <steve at pearwood.info>
> To: python-ideas at python.org
> Sent: Fri, November 16, 2012 5:18:02 PM
> Subject: Re: [Python-ideas] Uniquify attribute for lists
>
> On 17/11/12 05:21, Steven D'Aprano wrote:
>
> > Here's an old recipe predating sets that solves the problem in a number
> > of different ways. Read the comments for versions that don't lose order.
>
> And it would help if I actually included the URL. Sorry about that.
>
>
> http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
>
>
>
>
> -- Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
More information about the Python-ideas
mailing list