[Python-Dev] LinkedHashSet/LinkedHashMap equivalents
Brett C.
bac at OCF.Berkeley.EDU
Wed Mar 9 02:03:04 CET 2005
Steven Bethard wrote:
> Delaney, Timothy C (Timothy) <tdelaney at avaya.com> wrote:
>
>>The perennial "how do I remove duplicates from a list" topic came up on
>>c.l.py and in the discussion I mentioned the java 1.5 LinkedHashSet and
>>LinkedHashMap. I'd thought about proposing these before, but couldn't
>>think of where to put them. It was pointed out that the obvious place
>>would be the collections module.
>>
>>For those who don't know, LinkedHashSet and LinkedHashMap are simply
>>hashed sets and maps that iterate in the order that the keys were added
>>to the set/map. I almost invariably use them for the above scenario -
>>removing duplicates without changing order.
>>
>>Does anyone else think it would be worthwhile adding these to
>>collections, or should I just make a cookbook entry?
>
>
> I guess I'm -0 on this.
>
> Though I was the one that suggested that collections is the right
> place to put them, I'm not really certain how much we gain by
> including them. I too would only ever use them for removing
> duplicates from a list. But if we're trying to provide a solution to
> this problem, I'd rather see an iterable-friendly one. See a previous
> thread on this issue[1] where I suggest something like:
>
> def filterdups(iterable):
> seen = set()
> for item in iterable:
> if item not in seen:
> seen.add(item)
> yield item
>
> Adding this to, say, itertools would cover all my use cases. And as
> long as you don't have too many duplicates, filterdups as above should
> keep memory consumption down better.
>
I am -1 on adding LinkedHash*. While I can understand wanting to get rid of
duplicates easily and wanting a good solutoin, Steven's snippet of code shows
rolling your own solution is easy.
Plus this can even be simplified down to a one-liner using itertools already::
itertools.ifilterfalse(lambda item, _set=set():
(item in _set) or (_set.add(item) and False),
iterable)
I don't think it is the prettiest solution, but it does show that coming up
with one is not hard nor restricted to only a single solution that requires
knowing some Python idiom (well, mine does for the default arg to the lambda,
but Steven's doesn't).
The last thing I want to happen is for Python's stdlib to grow every possible
data structure out there like Java seems to have. I don't ever want to have to
think about what *variant* of a data structure I should use, only what *type*
of data structure (and no, I don't count collections.deque and Queue.Queue an
overlap since the latter is meant more for thread communication, at least in my
mind).
-Brett
More information about the Python-Dev
mailing list