[Python-Dev] LinkedHashSet/LinkedHashMap equivalents

Steven Bethard steven.bethard at gmail.com
Thu Mar 10 06:46:16 CET 2005


Delaney, Timothy C (Timothy) <tdelaney at avaya.com> wrote:
> Steven Bethard wrote:
> 
> > def filterdups(iterable):
> >      seen = set()
> >      for item in iterable:
> >          if item not in seen:
> >              seen.add(item)
> >              yield item
> >
> > Adding this to, say, itertools would cover all my use cases.  And as
> > long as you don't have too many duplicates, filterdups as above should
> > keep memory consumption down better.
> 
> Thinking about this further - memory usage would be almost identical. By
> the time you completed the iterable, you would have built up exactly the
> same set internally - although probably not as memory efficient since it
> would be being built piecemeal. OTOH, an ordered set has a bit of extra
> memory for maintaining the order, so it's going to be pretty close.

Definitely true that if you iterate through your entire iterable, it
doesn't gain you anything over an OrderedSet.  OTOH, if you only end
up looking at the first N elements of the iterable, then this would be
more efficient than putting the entire iterable into an OrderedSet and
taking the first N from there.  Of course you can put only the first
elements into the OrderedSet, but note that you can't just put in the
first N; you have to add elements of the iterable into the OrderedSet
until it has len(obj) == N.  Not that this should be more than a few
lines of code, but it's code that you wouldn't have to write with
fitlerdups.

That said, you're right of course that filterdups is certainly not a
big win over OrderedSet.  I was really just trying to point out that
if we were just trying to provide a solution to the
filtering-duplicates-while-maintaining-order problem that OrderedSet
wasn't the only path to that goal.  I think since then there have been
a number of other reasonable cases suggested for wanting an
OrderedSet, e.g.:

* A DB result set that is indexed by a key but also maintains row
order [Eli Stevens, Jeff Bauer]

* A config-file data structure that indexes by option names but
maintains the order of elements read from a config file [John
Williams]

* Drop-down field selectors indexed by both name and sequence position
[Jeff Bauer]

So I'm now probably +0.5 on adding an OrderedSet to collections.  Not
that my opinion is particularly influential, of course. ;-)

Steve
-- 
You can wordify anything if you just verb it.
        --- Bucky Katt, Get Fuzzy


More information about the Python-Dev mailing list