[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists
Michael Lenzen
m.lenzen at gmail.com
Sat Jul 18 17:11:47 CEST 2009
On 07/18/2009 01:30 AM, Chris Rebert wrote:
> On Fri, Jul 17, 2009 at 11:18 PM, Chris Rebert<pyideas at rebertia.com> wrote:
>> Truth be told, it's more than just defaultdict(int). It adds
>> .elements() and .most_common().
>>
>> Seems bag-like to me.
>> - Unordered? Check.
>> - Allows duplicates? Check.
>> - O(1) containment test? Check.
>> - Counts multiplicity of elements? Check.
>> - Iterable? Check.
>>
>> The only non-bag thing about it is allowing 0 and negative
>> multiplicities, which I agree is unintuitive; I don't like that
>> "feature" either.
>
> Actually, from the docs, it also appears (I don't have 3.0 handy to
> test) to get len() wrong, using the dict definition of "number of
> unique items" rather than just "number of items" as would be more
> appropriate for a bag.
>
> In the event a Bag is not added, +1 for adding a method to Counter to
> return `sum(count if count > 0 else 0 for count in
> a_counter.values())`
>
> Cheers,
> Chris
Besides getting len() wrong, Counter gets iteration wrong. It iterates
over elements whose counts are 0 or negative, and it yields an element
only once even when that element appears multiple times. Yes, you can
iterate over .elements(), but that should be the default, not a special case.
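
A minimal sketch of the len() and iteration behaviour described above, assuming the collections.Counter that ships with Python 3. The variable names are only for illustration.

```python
from collections import Counter

c = Counter("aab")   # a: 2, b: 1
c["z"] = 0           # a zero multiplicity is stored as an ordinary key
c["y"] -= 1          # missing keys read as 0, so this stores -1

keys = sorted(c)                  # plain iteration yields each key once
expanded = sorted(c.elements())   # elements() repeats by count and skips counts below 1
distinct = len(c)                 # dict-style length: number of keys
total = sum(n for n in c.values() if n > 0)   # bag-style size

print(keys)       # ['a', 'b', 'y', 'z']
print(expanded)   # ['a', 'a', 'b']
print(distinct)   # 4
print(total)      # 3
```

Plain iteration and len() follow dict semantics, while .elements() and the explicit sum give the multiset view.
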
As for most_common(), it just calls
    heapq.nlargest(n, self.items(), key=_itemgetter(1))
which anyone can do, and which my bag class does as well.
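
The heapq.nlargest call above can be applied to any mapping of element to count. A small sketch, where the helper name most_common and the sample data are assumptions made for illustration:

```python
import heapq
from collections import Counter
from operator import itemgetter


def most_common(counts, n):
    """Return the n (element, count) pairs with the largest counts."""
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))


counts = {"a": 3, "b": 1, "c": 2}
top_two = most_common(counts, 2)
print(top_two)   # [('a', 3), ('c', 2)]
```

For this data the result matches Counter(counts).most_common(2).
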
My bag class behaves like a collection and provides a .unique_elements()
method that returns the underlying set. You can .add(elem) and
.delete(elem) much as you would with a set, or you can change an
element's multiplicity directly, as in Counter, with bag[elem] = 5 or
bag[elem] -= 2.
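
A rough sketch of a bag with the interface described above. This is not the class from the post: the storage (a dict of counts), the rule that a count of zero or less removes the element, and the KeyError raised by delete() are all assumptions, and unique_elements() here builds a new set rather than returning an underlying one.

```python
class Bag:
    """Hypothetical multiset sketch; only positive multiplicities are kept."""

    def __init__(self, iterable=()):
        self._counts = {}
        for elem in iterable:
            self.add(elem)

    def add(self, elem):
        """Add one occurrence of elem."""
        self._counts[elem] = self._counts.get(elem, 0) + 1

    def delete(self, elem):
        """Remove one occurrence; raise KeyError if absent (assumption)."""
        if elem not in self._counts:
            raise KeyError(elem)
        self[elem] = self._counts[elem] - 1

    def unique_elements(self):
        """Return the distinct elements as a new set."""
        return set(self._counts)

    def __getitem__(self, elem):
        # Absent elements have multiplicity 0 and are not stored.
        return self._counts.get(elem, 0)

    def __setitem__(self, elem, count):
        # Assumption: a count of zero or less removes the element.
        if count > 0:
            self._counts[elem] = count
        else:
            self._counts.pop(elem, None)

    def __contains__(self, elem):
        return elem in self._counts

    def __len__(self):
        # Total number of items, counting duplicates.
        return sum(self._counts.values())

    def __iter__(self):
        # Yield each element once per occurrence.
        for elem, count in self._counts.items():
            for _ in range(count):
                yield elem


bag = Bag("aab")
bag["c"] = 5
bag["c"] -= 2      # reads 5 via __getitem__, stores 3 via __setitem__
bag.delete("b")    # the last 'b' goes away, so 'b' is no longer a member
```

With these choices len(bag) is 5, iteration yields 'a' twice and 'c' three times, and an element whose count drops to zero stops being a member.
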
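If Counter is supposed to be a collection of elements, this makes no
sense ('a' is added once, removed once, and still reported as present):
>>> from collections import Counter
>>> c = Counter()
>>> c['a'] += 1
>>> c['a'] -= 1
>>> 'a' in c
True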
-Michael Lenzen