negative "counts" in collections.Counter?
Raymond Hettinger
python at rcn.com
Mon Mar 8 01:31:00 EST 2010
On Mar 7, 5:46 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> Given that Counter supports negative counts, it looks to me that the
> behaviour of __add__ and __sub__ is fundamentally flawed. You should
> raise a bug report (feature enhancement) on the bug tracker.
It isn't a bug. I designed it that way.
There were several possible design choices,
each benefitting different use cases.
FWIW, here is the reasoning behind the design.
The basic approach to Counter() is to be a dict subclass
that supplies zero for missing values. This approach
places almost no restrictions on what can be stored in it.
You can store floats, decimals, fractions, etc.
Numbers can be positive, negative, or zero.
This design leaves users with a good deal of flexibility,
but more importantly it keeps the API very close to
regular dictionaries (thus lowering the learning curve).
It also makes basic counter operations very fast.
The update() method differs from regular dictionaries
because dict.update() isn't very helpful in a counter context.
Instead, it allows one counter to update another. Like
the other basic counter operations, it places no restrictions
on type (i.e. it works on with ints, floats, decimals, fractions,
etc).
or on sign (positive, negative, or zero). The only requirement
is that the values support addition.
The basic API also adds some convenience methods.
The most_common() method tries to not be restrictive.
It only requires the count values be orderable.
For obvious reasons, the elements() method *does* have
restrictions and won't work with float values and won't
emit entries with non-positive counts.
Beyond the basic API listed above which is straight-forward,
the area where there were more interesting design choices
were the Counter-to-Counter operations (__add__, __sub__,
__or__, and __and__).
One possible choice (the one preferred by the OP) was to
has addition and subtraction be straight adds and subtracts
without respect to sign and to not support __and__ and __or__.
Straight addition was already supported via the update() method.
But no direct support was provided for straight subtractions
that leave negative values. Sorry about that.
Instead the choice was to implement the four methods as
multiset operations. As such, they need to correspond
to regular set operations. For example, with sets:
set('abc') - set('cde') # gives set('ab')
Notice how subtracting 'e' did not produce a negative result?
Now with multisets, we want the same result:
Counter({'a':1, 'b':1, 'c':1'}) - Counter({'c':1, 'd':1, 'e':1})
So the design decision was to support multiset operations
that produces only results with positive counts.
These multiset-style mathematical operations are discussed in:
Knuth's TAOCP Volume II section 4.6.3 exercise 19
and at http://en.wikipedia.org/wiki/Multiset .
Also C++ has multisets that support only positive counts.
So, there you have it. There design tries to be as
unrestrictive as possible. When a choice had to be
made, it favored multisets over behaviors supporting
negative values.
Fortunately, it is trivially easy to extend the class
to add any behavior you want.
class MyCounter(Counter):
def __sub__(self, other):
result = self.copy()
for elem, cnt in other.items():
result[elem] -= cnt
Hopes this gives you some insight into the design choices.
Raymond Hettinger
More information about the Python-list
mailing list