[Python-ideas] collections.Counter should implement __mul__, __rmul__
Serhiy Storchaka
storchaka at gmail.com
Wed Apr 18 16:24:13 EDT 2018
18.04.18 22:34, Tim Peters wrote:
> Counter supports a wonderfully weird mix of methods driven by use
> cases, not by ideology.
>
> + (binary)
> - (binary)
> |
> &
>
> have semantics driven by viewing a Counter as a multiset
> implementation. That's why they discard values <= 0. They
> correspond, respectively, to "the standard" multiset operations of sum
> (disjoint union), difference, union, and intersection.
This explains only why binary "-" discards non-positive values and "&"
discards keys that are only in one Counter. Multisets contain only
positive counts.
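A small illustration of that behaviour, using only the existing Counter operators (the sample counts are made up for the example):

```python
from collections import Counter

x = Counter(a=3, b=1)
y = Counter(a=1, b=2, c=5)

# Binary "-" subtracts counts, then drops every non-positive result:
# b would be 1 - 2 = -1 and c would be 0 - 5 = -5, so only a survives.
diff = x - y

# "&" takes the minimum of the counts, so a key present in only one
# Counter (here c) does not appear in the result.
common = x & y

print(diff)    # Counter({'a': 2})
print(common)  # Counter({'a': 1, 'b': 1})
```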
> Nothing else in Counter is trying to cater to the multiset view, but
> to other use cases. And that's why "*" and "/" should do what
> everyone _expects_ them to do ;-) There are no analogous multiset
> operations to justify them caring at all what the values are.
Doesn't everyone expect that x*2 == x + x? Isn't this the definition of
multiplication? And once we have multiplication, it can be generalized
to division.
> But there's no good reason for "*" or "/" to care at all. They
> don't make sense for multisets.
I disagree. "+" and "*" are defined for sequences, and these operations
can be defined for multisets in terms of sequences of their elements.
x + y = multiset(x.elements() + y.elements())
x * n = multiset(x.elements() * n)
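The two definitions above are notation rather than runnable code, since elements() returns an iterator and not a list. A minimal runnable sketch, assuming Counter stands in for "multiset" and using hypothetical helper names multiset_add and multiset_mul (neither exists in the standard library):

```python
from collections import Counter

def multiset_add(x, y):
    # Hypothetical helper: concatenate the element sequences, then recount.
    return Counter(list(x.elements()) + list(y.elements()))

def multiset_mul(x, n):
    # Hypothetical helper: repeat the element sequence n times, then recount.
    return Counter(list(x.elements()) * n)

x = Counter(a=2, b=1)
y = Counter(b=3, c=1)

sum_xy = multiset_add(x, y)
double_x = multiset_mul(x, 2)

# For positive counts these agree with the existing "+" operator,
# and doubling agrees with x + x.
print(sum_xy == x + y)    # True
print(double_x == x + x)  # True
```

Note that elements() skips keys whose count is less than one, so this sketch only covers the multiset case with positive counts.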
> After, e.g.,
>
> c /= sum(c.values())
>
> it's sane to expect that the new sum(c.values()) is close to 1
> regardless of the numeric types or signs of the original values.
> Indeed, normalizing values so that their sum _is_ close to 1 is a
> primary use case motivating the current change.
If there are negative values, then their sum can be very small, and the
relative error of the sum can be large. Dividing by it can result in
values with large magnitude, significantly larger than 1, and large errors.
What is the use case for dividing a Counter with negative values by the
sum of its values?
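A minimal sketch of the concern, with made-up counts. Counter does not implement "/", so a dict comprehension stands in here for the proposed c /= sum(c.values()):

```python
from collections import Counter

c = Counter(a=1001, b=-1000)

# The positive and negative counts nearly cancel, so the sum is tiny
# compared with the individual values.
total = sum(c.values())

# Stand-in for the proposed in-place division (an assumption, not an
# existing Counter method).
normalized = {key: value / total for key, value in c.items()}

print(total)       # 1
print(normalized)  # {'a': 1001.0, 'b': -1000.0}
```

The normalized values still sum to 1, but each of them is about a thousand times larger than 1 in magnitude, which is hard to read as a share or a probability.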