[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists

Nick Coghlan ncoghlan at gmail.com
Tue Jul 21 15:12:40 CEST 2009


Michael Lenzen wrote:
> 
> On 07/18/2009 01:30 AM, Chris Rebert wrote:
>> On Fri, Jul 17, 2009 at 11:18 PM, Chris Rebert <pyideas at rebertia.com> wrote:
>>> Truth be told, it's more than just defaultdict(int). It adds
>>> .elements() and .most_common().
>>>
>>> Seems bag-like to me.
>>> - Unordered? Check.
>>> - Allows duplicates? Check.
>>> - O(1) containment test? Check.
>>> - Counts multiplicity of elements? Check.
>>> - Iterable? Check.
>>>
>>> The only non-bag thing about it is allowing 0 and negative
>>> multiplicities, which I agree is unintuitive; I don't like that
>>> "feature" either.
>>
>> Actually, from the docs, it also appears (I don't have 3.1 handy to
>> test) to get len() wrong, using the dict definition of "number of
>> unique items" rather than just "number of items" as would be more
>> appropriate for a bag.
>>
>> In the event a Bag is not added, +1 for adding a method to Counter to
>> return `sum(count if count > 0 else 0 for count in
>> a_counter.values())`
>>
>> Cheers,
>> Chris
> 
> 
> As well as getting len() wrong, it gets iteration wrong.  It iterates
> over elements with counts of 0 and -1 as well as only iterating once
> over elements that appear multiple times.  Yes you can iterate over
> .elements(), but this should be the default, not a special case.
> 
> As for adding most_common, it just calls
> heapq.nlargest(n, self.items(), key=_itemgetter(1))
> which anyone can do, and my bag class does.
> 
> My bag class behaves like a collection and provides a .unique_elements()
> method that returns the underlying set.  You can .add(elem) and
> .delete(elem), much as you add to and remove from a set, or you can manually change
> their multiplicities like in Counter with bag[elem] = 5 or bag[elem] -= 2.
> 
> If Counter is supposed to be a collection of elements, this makes no sense:
>>>> c = Counter()
>>>> c['a'] += 1
>>>> c['a'] -= 1
>>>> 'a' in c
> True
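
The Counter behaviours debated above can be checked directly. The sketch below is a minimal illustration, assuming Python 3.3 or later (unary `+` on a Counter, which drops zero and negative counts, arrived in 3.3). `positive_total` is a hypothetical helper implementing Chris's suggested method; it is not part of the collections API.

```python
import heapq
from collections import Counter
from operator import itemgetter


def positive_total(counter):
    """Hypothetical helper (not in collections): total multiplicity,
    ignoring zero and negative counts, as proposed in the thread."""
    return sum(count if count > 0 else 0 for count in counter.values())


c = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1
c["z"] += 1
c["z"] -= 1                 # 'z' keeps its key with a count of 0
c["q"] -= 1                 # a missing key reads as 0, so 'q' is stored as -1

# len() follows dict semantics: distinct keys, zero/negative ones included.
print(len(c))               # 7
# Plain iteration yields each key once, again including 'z' and 'q'.
print(sorted(c))            # ['a', 'b', 'c', 'd', 'q', 'r', 'z']
# elements() repeats each key by its count and skips counts below 1.
print(len(list(c.elements())))  # 11
print(positive_total(c))    # 11
# Membership is a key lookup, so a zero count still tests True.
print("z" in c)             # True
# Unary + builds a copy holding only the positive counts.
print("z" in +c)            # False
# most_common(n) agrees with heapq.nlargest over (key, count) pairs.
print(c.most_common(1) == heapq.nlargest(1, c.items(), key=itemgetter(1)))  # True
```

Python 3.10 later added `Counter.total()`, but it sums every count, negatives included, so on this example it would return 10 rather than 11.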

I encourage you to put your questions/concerns regarding the new
collections.Counter class into a separate email and send them to
python-dev. It seems possible to me that some revisions could be made
to the API. Whether or not that happens will depend on the precise
use cases Raymond had in mind when he added it, but even if nothing
changes such an email thread should provide some more insight into the
rationale driving the API design choices.

However, rather than calling the current API wrong from the outset, I'd
suggest phrasing it as a question: "why is the interface this way?". We don't
know whether or not the API is actually wrong without knowing the
objectives Raymond was setting out to achieve.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
