[Python-Dev] LinkedHashSet/LinkedHashMap equivalents

Brett C. bac at OCF.Berkeley.EDU
Wed Mar 9 02:03:04 CET 2005


Steven Bethard wrote:
> Delaney, Timothy C (Timothy) <tdelaney at avaya.com> wrote:
> 
>>The perennial "how do I remove duplicates from a list" topic came up on
>>c.l.py and in the discussion I mentioned the java 1.5 LinkedHashSet and
>>LinkedHashMap. I'd thought about proposing these before, but couldn't
>>think of where to put them. It was pointed out that the obvious place
>>would be the collections module.
>>
>>For those who don't know, LinkedHashSet and LinkedHashMap are simply
>>hashed sets and maps that iterate in the order that the keys were added
>>to the set/map. I almost invariably use them for the above scenario -
>>removing duplicates without changing order.
>>
>>Does anyone else think it would be worthwhile adding these to
>>collections, or should I just make a cookbook entry?
> 
> 
> I guess I'm -0 on this.
> 
> Though I was the one that suggested that collections is the right
> place to put them, I'm not really certain how much we gain by
> including them.  I too would only ever use them for removing
> duplicates from a list.  But if we're trying to provide a solution to
> this problem, I'd rather see an iterable-friendly one.  See a previous
> thread on this issue[1] where I suggest something like:
> 
> def filterdups(iterable):
>      seen = set()
>      for item in iterable:
>          if item not in seen:
>              seen.add(item)
>              yield item
> 
> Adding this to, say, itertools would cover all my use cases.  And as
> long as you don't have too many duplicates, filterdups as above should
> keep memory consumption down better.
> 

I am -1 on adding LinkedHash*.  While I can understand wanting to get rid of 
duplicates easily and wanting a good solutoin, Steven's snippet of code shows 
rolling your own solution is easy.

Plus this can even be simplified down to a one-liner using itertools already::

   itertools.ifilterfalse(lambda item, _set=set():
                            (item in _set) or (_set.add(item) and False),
                          iterable)

I don't think it is the prettiest solution, but it does show that coming up 
with one is not hard nor restricted to only a single solution that requires 
knowing some Python idiom (well, mine does for the default arg to the lambda, 
but Steven's doesn't).

The last thing I want to happen is for Python's stdlib to grow every possible 
data structure out there like Java seems to have.  I don't ever want to have to 
think about what *variant* of a data structure I should use, only what *type* 
of data structure (and no, I don't count collections.deque and Queue.Queue an 
overlap since the latter is meant more for thread communication, at least in my 
mind).

-Brett


More information about the Python-Dev mailing list