negative "counts" in collections.Counter?

Mon Mar 8 16:44:34 EST 2010

[Vlastimil Brom]
> Thank you very much for the exhaustive explanation Raymond!

You're welcome.

> I am by far not able to follow all of the mathematical background, but
> even for zero-truncating multiset, I would expect the truncation on
> input rather than on output of some operations.

I debated about this and opted for
be-loose-in-receiving-and-strict-on-output.
One thought is that use cases for multisets would have real multisets
as inputs (no negative counts) and as outputs.  The user controls
the inputs, and the method only has a say in what its outputs are.

Also, truncating input would complicate the mathematical definition
of what is happening.  Compare:

    r = a[x] - b[x]
    if r > 0:
        emit(r)

vs.

    r = max(0, a[x]) - max(0, b[x])
    if r > 0:
        emit(r)
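
To make the two definitions above concrete, here is a rough runnable
sketch using plain dicts.  The function names are made up for
illustration and are not part of the collections module; missing
keys are assumed to count as zero.

```python
def subtract_truncate_output(a, b):
    """First definition: subtract raw counts, keep only positive results."""
    result = {}
    for x in set(a) | set(b):
        r = a.get(x, 0) - b.get(x, 0)
        if r > 0:
            result[x] = r
    return result


def subtract_truncate_input(a, b):
    """Second definition: clamp each input count to zero before subtracting."""
    result = {}
    for x in set(a) | set(b):
        r = max(0, a.get(x, 0)) - max(0, b.get(x, 0))
        if r > 0:
            result[x] = r
    return result


a = {'x': 3, 'y': -2}
b = {'x': 1, 'y': -5}
print(sorted(subtract_truncate_output(a, b).items()))  # [('x', 2), ('y', 3)]
print(sorted(subtract_truncate_input(a, b).items()))   # [('x', 2)]
```

The two only disagree when an input already holds a negative count,
which a real multiset never does.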

Also, the design parallels what is done in the decimal module
where rounding is applied only to the results of operations,
not to the inputs.
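
A small sketch of that decimal behavior, for comparison.  The helper
name and its round_inputs switch are invented for this example; the
switch exists only to show what rounding on input would have done
instead.

```python
from decimal import Decimal, localcontext


def add_with_precision(a, b, prec, round_inputs=False):
    """Add two Decimals under a temporary context with `prec` digits."""
    with localcontext() as ctx:
        ctx.prec = prec
        if round_inputs:
            # Unary plus applies the context rounding to each operand.
            a, b = +a, +b
        # The result of the addition is always rounded to the context.
        return a + b


x = Decimal('1.004')  # the constructor keeps all four digits
print(add_with_precision(x, x, 3))                     # 2.01
print(add_with_precision(x, x, 3, round_inputs=True))  # 2.00
```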

> Probably a kind of negative_update()  or some better named method will
> be handy, like the one you supplied or simply the current module code
> without the newcount > 0: ... condition.

See my other post on this subject.  There is no doubt that
such a method would be handy for signed arithmetic.
The question is whether conflating two different models hurts
the API more than it helps.  Right now, the Counter() class
has no explicit support for negative values.  It is
designed around natural numbers and counting numbers.
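
A quick illustration of the two models rubbing against each other:
the underlying dict will store whatever you assign, while the
multiset operators only ever hand back positive counts.  The keys
and values are arbitrary.

```python
from collections import Counter

c = Counter(apples=1, pears=2)
c['apples'] -= 3               # plain item assignment, nothing is truncated

stored = dict(c)               # the negative count is still there
positive_only = c + Counter()  # adding an empty Counter drops counts <= 0

print(sorted(stored.items()))         # [('apples', -2), ('pears', 2)]
print(sorted(positive_only.items()))  # [('pears', 2)]
```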

> Or would it be an option to
> have a keyword argument like zero_truncate=False which would influence
> this behaviour?

Guido's view on behavior flags is that they are usually a signal
that you need two different classes.  That is why itertools has
ifilter() and ifilterfalse() or izip() and izip_longest() instead
of having behavior flags.

In this case, we have an indication that what you really want is
a separate class supporting elementwise binary and unary operations
on vectors (where the vector fields are accessed by a dictionary
key instead of a positional value).
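
Something along these lines, as a minimal sketch only.  The class
name KeyedVector and its methods are hypothetical; nothing like it
exists in the standard library.  Missing keys are treated as zero
and no result is ever truncated.

```python
import operator


class KeyedVector(dict):
    """Hypothetical vector whose fields are looked up by dictionary key."""

    def __missing__(self, key):
        # Reading an absent field yields 0 without storing it.
        return 0

    def _elementwise(self, other, op):
        keys = set(self) | set(other)
        return KeyedVector((k, op(self.get(k, 0), other.get(k, 0)))
                           for k in keys)

    def __add__(self, other):
        return self._elementwise(other, operator.add)

    def __sub__(self, other):
        return self._elementwise(other, operator.sub)

    def __neg__(self):
        return KeyedVector((k, -v) for k, v in self.items())


v = KeyedVector(x=1, y=-2)
w = KeyedVector(y=5, z=3)
print(sorted((v - w).items()))  # [('x', 1), ('y', -7), ('z', -3)]
print(sorted((-v).items()))     # [('x', -1), ('y', 2)]
```

Keeping this apart from Counter means neither class needs a flag to
say which arithmetic it is doing.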

> Additionally, were issubset and issuperset considered for this
> interface (not sure whether symmetric_difference would be applicable)?

If the need arises, these could be included.  Right now, you
can get the same result with:  "if a - b: ..."
(an empty difference means a is a subset of b).
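
Spelled out as a sketch, with is_subset being a made-up helper name
rather than a Counter method, and assuming both inputs are real
multisets with no negative counts:

```python
from collections import Counter


def is_subset(a, b):
    """True when no element occurs more often in a than in b."""
    # a - b keeps only the elements whose count in a exceeds the count in b,
    # so an empty (falsy) result means a fits entirely inside b.
    return not (a - b)


small = Counter('aab')    # a: 2, b: 1
large = Counter('aaabc')  # a: 3, b: 1, c: 1
print(is_subset(small, large))  # True
print(is_subset(large, small))  # False
```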

FWIW, I never liked those two method names.  Can't remember whether
a.issubset(b) means "a is a subset of b" or "b is a subset of a".

Raymond