Mailman 3 Counter comparison - Python-ideas

Counter comparison

Antony Lee

Nov. 21, 2014

5:55 p.m.

As Counter objects effectively behave like multi-sets, it seems reasonable to overload <, <=, >, >= as for set objects to check whether a Counter is a sub/super-set of another Counter: c < d <===> all(c[k] < d[k] for k in c) Thoughts? Antony

Attachments:

attachment.htm (text/html — 366 bytes)

Show replies by date

MRAB

November 2014

6:27 p.m.

On 2014-11-21 23:55, Antony Lee wrote:

...

As Counter objects effectively behave like multi-sets, it seems reasonable to overload <, <=, >, >= as for set objects to check whether a Counter is a sub/super-set of another Counter:

c < d <===> all(c[k] < d[k] for k in c)

Thoughts?

More correctly, it would be: all(c[k] < d[k] for k in set(c) | set(d))

Raymond Hettinger

9:47 a.m.

...

On Nov 21, 2014, at 3:55 PM, Antony Lee <antony.lee@berkeley.edu> wrote:

As Counter objects effectively behave like multi-sets, it seems reasonable to overload <, <=, >, >= as for set objects to check whether a Counter is a sub/super-set of another Counter:

c < d <===> all(c[k] < d[k] for k in c)

Thoughts

This is something that could be done, but I think the demonstrated need is nearly zero. It doesn't seem to arise in practice. Conceptually, a counter is just a dictionary that returns zero instead of raising a KeyError. So, while they can be used as multisets, they have other uses as well (for example, allowing negative counts or fractional counts). Another thought, is that overloading comparison operators risks isn't always a good thing. Even for regular sets, people sometimes get surprised that the sort order isn't deterministic (because sets have a partial ordering instead of a total ordering). The idea of adding counter comparisons was considered from the outset and was intentionally not included. If compelling use cases were arising in practice, we could add comparison support, but until then it is more prudent to be conservative with the API. Raymond

Antoine Pitrou

10:06 a.m.

On Sat, 22 Nov 2014 07:47:51 -0800 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

...

...
On Nov 21, 2014, at 3:55 PM, Antony Lee <antony.lee@berkeley.edu> wrote:

As Counter objects effectively behave like multi-sets, it seems reasonable to overload <, <=, >, >= as for set objects to check whether a Counter is a sub/super-set of another Counter:

c < d <===> all(c[k] < d[k] for k in c)

Thoughts

This is something that could be done, but I think the demonstrated need is nearly zero. It doesn't seem to arise in practice.

You say this while you closed an issue where the poster asked for this very feature, and clearly stated what his needs were: http://bugs.python.org/issue22515

...

Conceptually, a counter is just a dictionary that returns zero instead of raising a KeyError.

Come on. If that were the case, it wouldn't expose other operators such as addition, subtraction, intersection and union. https://docs.python.org/3.3/library/collections.html#collections.Counter

...

So, while they can be used as multisets, they have other uses as well (for example, allowing negative counts or fractional counts).

The demonstrated need for those "other uses" is nearly zero. It doesn't seem to arise in practice. Your insistance on defending irrational API decisions is staggering. Regards Antoine.

Raymond Hettinger

8:51 p.m.

...

On Nov 22, 2014, at 8:06 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

Your insistance on defending irrational API decisions is staggering.

Gee, that's a little harsh. I'm trying to exercise good judgment over API design decisions. You may not agree, but I should at least share my thought process: * Even for regular sets, overriding the comparison operators was a bit dodgy leading to some oddities -- one because it is partial ordering that confuses sort() and another because of people's propensity to write "set_a < set_b" when they really meant "set_a <= set_b". * Also, I have reservations about having added any multi-set operations to counters in the first place. That seems to be the only the part of counter API that has agitated or confused anyone. The problems arise the counter is an extension of the dict API rather than being a fully encapsulated bag or multiset that can prevent counts from being zero, negative, or non-integral. In other words, the multi-set operations may have added some value but they weren't a perfect fit. * With those two thoughts in mind, I had some reluctance to add anything "doubly dodgy" by adding multi-set comparisons. * Next, there is some question of what the comparison operations should do in the presence of zeros or negative counts. Inspired by the design principles in the decimal module (all numbers are exact and rounding is only applied after operations rather than before), the current multi-set operations eliminate non-negative results after the operation rather than before. That said, I don't know whether it makes sense multi-set comparison to follow that model (for consistency) or to clean-up the inputs prior to the operation. For example, should Counter(a=-1, b=0, c=1) be considered a proper subset of Counter(c=2, d=3, e=0, f=-1)? There are reasons to say yes and reasons you could say no. To inform the best possible decision, it would be great to have real use cases to guide the design process. * Which gets us to use cases. When Ram suggested multiset operations in a feature request, all he said was "I suggest implementing `Counter.__lt__` which will be a partial order, similarly to `set.__lt__`.". There was no mention of use case until later where all he said was "I needed it for an internal calculation in a combinatorics package I'm writing. " and he did not elaborate further. As the discussion evolved, no further use case detail emerged. However, later in the discussion it because clear that several of the posters either thought the idea didn't make sense or were confused about what the it should do. Near the end of the discussion, the OP said he had given up on the idea because of the issues regarding the "hybrid mapping/multiset" design. After the -1s, examples of people being confused, and the OP's withdrawal, I stepped in and closed the feature request. * This current thread did not begin with a use case either. It was more along the lines of "in for a penny, in for a pound". Since some multiset operations were implemented, maybe you should do them all. For the reasons listed above, I prefer to be conservative and not expand the API further. For the most part, people seem to use counters for counting. They are good at that task and popular. I don't feel a strong need to expand the API into dodgy territory unless really good use cases arise to guide the design and make it feel like the added benefits were worth moving further on to shaky ground. Also, I've taken some comfort in knowing that since a counter is a dictionary subclass, it is trivially easy to extend for users. If the user can already meet any perceived need by writing "all(c[k] < d[k] for k in c)", then I'm not sure we can make much of a value add. You may not agree with the outcome of my thought process, but it is unfair to call it irrational. not-everything-that-can-be-done-should-be-done-ly yours, Raymond

Nick Coghlan

11:33 p.m.

On 23 November 2014 at 12:51, Raymond Hettinger <raymond.hettinger@gmail.com> wrote: [snip]

...

For the reasons listed above, I prefer to be conservative and not expand the API further. For the most part, people seem to use counters for counting. They are good at that task and popular. I don't feel a strong need to expand the API into dodgy territory unless really good use cases arise to guide the design and make it feel like the added benefits were worth moving further on to shaky ground.

Also, I've taken some comfort in knowing that since a counter is a dictionary subclass, it is trivially easy to extend for users. If the user can already meet any perceived need by writing "all(c[k] < d[k] for k in c)", then I'm not sure we can make much of a value add.

You may not agree with the outcome of my thought process, but it is unfair to call it irrational.

Thanks for the detailed explanation.

...

not-everything-that-can-be-done-should-be-done-ly yours,

This is a key point. Providing APIs that have odd edge case behaviour can be worse than not providing a particular feature at all. When the feature doesn't exist, folks are more inclined to just write their own custom class or function that does exactly what they need. It's generally feasible to keep those quite concise, since they don't need to handle all possible variations that may arise. With the existing Counter-as-multiset features already offering some potential for confusion, and the potential for further confusing interactions between additional multiset features and Counter's support for non-integer values, zero values (distinct from keys being missing entirely) and negative values, there may be scope for a separate true multiset API that draws more heavily on the set API design than the dict API design. A dedicated multiset API would potentially be able to avoid the confusing aspects of Counters-as-multisets by only allowing non-negative integer values. Is there sufficient value in such an API to justify adding it? Or would it just create confusion as folks tried to decide between using Counter or using the new multiset/bag container for their algorithm? That's an open question, but at the very least, it's worth considering as an alternative to further elaborating on an already confusing aspect of the collections.Counter design. There's at least one such example of a bag API already available on PyPI: https://pypi.python.org/pypi/data-structures/0.1.4#bag (you need "-Counter" in a Google search to find that, as most current hits describe the use of Counter as a multiset, rather than using a dedicated container type) Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ethan Furman

9:40 a.m.

On 11/22/2014 09:33 PM, Nick Coghlan wrote:

...

With the existing Counter-as-multiset features already offering some potential for confusion, and the potential for further confusing interactions between additional multiset features and Counter's support for non-integer values, zero values (distinct from keys being missing entirely) and negative values, there may be scope for a separate true multiset API that draws more heavily on the set API design than the dict API design.

A dedicated multiset API would potentially be able to avoid the confusing aspects of Counters-as-multisets by only allowing non-negative integer values. Is there sufficient value in such an API to justify adding it? Or would it just create confusion as folks tried to decide between using Counter or using the new multiset/bag container for their algorithm?

That's an open question, but at the very least, it's worth considering as an alternative to further elaborating on an already confusing aspect of the collections.Counter design. There's at least one such example of a bag API already available on PyPI: https://pypi.python.org/pypi/data-structures/0.1.4#bag (you need "-Counter" in a Google search to find that, as most current hits describe the use of Counter as a multiset, rather than using a dedicated container type)

I have also wondered about the feasibility of separating out the multiset features into a distinct type. Seems like that would avoid a bunch of confusion. -- ~Ethan~

Antony Lee

1:30 p.m.

My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0 (as should be the case for "natural" counters). Antony 2014-11-23 7:40 GMT-08:00 Ethan Furman <ethan@stoneleaf.us>:

...

On 11/22/2014 09:33 PM, Nick Coghlan wrote:

...
With the existing Counter-as-multiset features already offering some potential for confusion, and the potential for further confusing interactions between additional multiset features and Counter's support for non-integer values, zero values (distinct from keys being missing entirely) and negative values, there may be scope for a separate true multiset API that draws more heavily on the set API design than the dict API design.

A dedicated multiset API would potentially be able to avoid the confusing aspects of Counters-as-multisets by only allowing non-negative integer values. Is there sufficient value in such an API to justify adding it? Or would it just create confusion as folks tried to decide between using Counter or using the new multiset/bag container for their algorithm?

That's an open question, but at the very least, it's worth considering as an alternative to further elaborating on an already confusing aspect of the collections.Counter design. There's at least one such example of a bag API already available on PyPI: https://pypi.python.org/pypi/data-structures/0.1.4#bag (you need "-Counter" in a Google search to find that, as most current hits describe the use of Counter as a multiset, rather than using a dedicated container type)

I have also wondered about the feasibility of separating out the multiset features into a distinct type. Seems like that would avoid a bunch of confusion.

-- ~Ethan~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Steven D'Aprano

6:04 p.m.

On Sun, Nov 23, 2014 at 11:30:25AM -0800, Antony Lee wrote:

...

My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0 (as should be the case for "natural" counters).

Do you mean this? py> from collections import Counter py> c1 = Counter({'a': 5, 'b': 2}) py> c2 = Counter({'a': 1, 'b': 4}) py> c1 - c2 Counter({'a': 4}) -- Steven

Ethan Furman

7:03 p.m.

On 11/23/2014 04:04 PM, Steven D'Aprano wrote:

...

On Sun, Nov 23, 2014 at 11:30:25AM -0800, Antony Lee wrote:

...
My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0 (as should be the case for "natural" counters).

Do you mean this?

py> from collections import Counter py> c1 = Counter({'a': 5, 'b': 2}) py> c2 = Counter({'a': 1, 'b': 4}) py> c1 - c2 Counter({'a': 4})

Um, how does that show that 'b' went below zero? -- ~Ethan~

Steven D'Aprano

7:20 p.m.

On Sun, Nov 23, 2014 at 05:03:27PM -0800, Ethan Furman wrote:

...

On 11/23/2014 04:04 PM, Steven D'Aprano wrote:

...
On Sun, Nov 23, 2014 at 11:30:25AM -0800, Antony Lee wrote:

...
My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0 (as should be the case for "natural" counters).

Do you mean this?

py> from collections import Counter py> c1 = Counter({'a': 5, 'b': 2}) py> c2 = Counter({'a': 1, 'b': 4}) py> c1 - c2 Counter({'a': 4})

Um, how does that show that 'b' went below zero?

It doesn't, which is the point. Antony asked for removal WITHOUT counts going below zero. -- Steve

Ethan Furman

7:30 p.m.

On 11/23/2014 05:20 PM, Steven D'Aprano wrote:

...

On Sun, Nov 23, 2014 at 05:03:27PM -0800, Ethan Furman wrote:

...
On 11/23/2014 04:04 PM, Steven D'Aprano wrote:

...
On Sun, Nov 23, 2014 at 11:30:25AM -0800, Antony Lee wrote:

...
My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0 (as should be the case for "natural" counters).

Do you mean this?

py> from collections import Counter py> c1 = Counter({'a': 5, 'b': 2}) py> c2 = Counter({'a': 1, 'b': 4}) py> c1 - c2 Counter({'a': 4})

Um, how does that show that 'b' went below zero?

It doesn't, which is the point. Antony asked for removal WITHOUT counts going below zero.

Ah. I read that as "being able to tell if any count would go below zero" -- as in, counter1 is my inventory, counter2 is an order, can I fill the order? -- ~Ethan~

Antony Lee

11:37 p.m.

I guess my post was unclear but my intent was indeed Ethan's (can I satisfy an order from my inventory?), not Steven's. 2014-11-23 17:30 GMT-08:00 Ethan Furman <ethan@stoneleaf.us>:

...

On 11/23/2014 05:20 PM, Steven D'Aprano wrote:

...
On Sun, Nov 23, 2014 at 05:03:27PM -0800, Ethan Furman wrote:

...
On 11/23/2014 04:04 PM, Steven D'Aprano wrote:

...
On Sun, Nov 23, 2014 at 11:30:25AM -0800, Antony Lee wrote:

...
My goal was very simply to check whether it was possible to remove a multi-set of elements from another, without any counts going below 0

(as

...
should be the case for "natural" counters).

Do you mean this?

py> from collections import Counter py> c1 = Counter({'a': 5, 'b': 2}) py> c2 = Counter({'a': 1, 'b': 4}) py> c1 - c2 Counter({'a': 4})

Um, how does that show that 'b' went below zero?

It doesn't, which is the point. Antony asked for removal WITHOUT counts going below zero.

Ah. I read that as "being able to tell if any count would go below zero" -- as in, counter1 is my inventory, counter2 is an order, can I fill the order?

-- ~Ethan~

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Devin Jeanpierre

10:43 a.m.

On Sat, Nov 22, 2014 at 7:47 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

...

Conceptually, a counter is just a dictionary that returns zero instead of raising a KeyError. So, while they can be used as multisets, they have other uses as well (for example, allowing negative counts or fractional counts).

This does little to nothing to explain most of the behavior of counters, like that of &. They are most naturally thought of as multisets.

...

Another thought, is that overloading comparison operators risks isn't always a good thing. Even for regular sets, people sometimes get surprised that the sort order isn't deterministic (because sets have a partial ordering instead of a total ordering).

Sure. OTOH it is possible to still be conservative while adding this: these risks do not exist for the set comparison operators: isdisjoint, issubset, issuperset. -- Devin

random832＠fastmail.us

12:09 a.m.

On Sat, Nov 22, 2014, at 10:47, Raymond Hettinger wrote:

...

Another thought, is that overloading comparison operators risks isn't always a good thing. Even for regular sets, people sometimes get surprised that the sort order isn't deterministic (because sets have a partial ordering instead of a total ordering).

Thank you, incidentally, for an example better than floats (in presence of nan) of a type that will break sorting algorithms.

Guido van Rossum

9:43 a.m.

On Sun, Nov 23, 2014 at 10:09 PM, <random832@fastmail.us> wrote:

...

On Sat, Nov 22, 2014, at 10:47, Raymond Hettinger wrote:

...
Another thought, is that overloading comparison operators risks isn't always a good thing. Even for regular sets, people sometimes get surprised that the sort order isn't deterministic (because sets have a partial ordering instead of a total ordering).

Thank you, incidentally, for an example better than floats (in presence of nan) of a type that will break sorting algorithms.

But does it matter? If you've got a list of sets, and you want it sorted, you have to specify how you want it sorted anyways. You can complain that < is only a partial order, but that is the mathematical root of the situation, not a problem caused by an poorly chosen language default. (Leaving < undefined and sorting by object address, now *that* would be an example of the latter. :-) Math is often surprising. This how you learn. (You may also learn more about sorting algorithms this way.) -- --Guido van Rossum (python.org/~guido)

random832＠fastmail.us

11:37 a.m.

On Mon, Nov 24, 2014, at 10:43, Guido van Rossum wrote:

...

But does it matter? If you've got a list of sets, and you want it sorted, you have to specify how you want it sorted anyways. You can complain that < is only a partial order, but that is the mathematical root of the situation, not a problem caused by an poorly chosen language default. (Leaving < undefined and sorting by object address, now *that* would be an example of the latter. :-)

You seem to be suggesting that sorting a list of partially ordered items is not a meaningful thing to ask for, rather than merely not being implemented by the list.sort method.

Antoine Pitrou

11:53 a.m.

On Mon, 24 Nov 2014 12:37:39 -0500 random832@fastmail.us wrote:

...

On Mon, Nov 24, 2014, at 10:43, Guido van Rossum wrote:

...
But does it matter? If you've got a list of sets, and you want it sorted, you have to specify how you want it sorted anyways. You can complain that < is only a partial order, but that is the mathematical root of the situation, not a problem caused by an poorly chosen language default. (Leaving < undefined and sorting by object address, now *that* would be an example of the latter. :-)

You seem to be suggesting that sorting a list of partially ordered items is not a meaningful thing to ask for, rather than merely not being implemented by the list.sort method.

Since list.sort is guaranteed to be stable, the sort order should ideally be deterministic even in the face of partial ordering. Regards Antoine.

Guido van Rossum

12:11 p.m.

On Mon, Nov 24, 2014 at 9:37 AM, <random832@fastmail.us> wrote:

...

On Mon, Nov 24, 2014, at 10:43, Guido van Rossum wrote:

...
But does it matter? If you've got a list of sets, and you want it sorted, you have to specify how you want it sorted anyways. You can complain that < is only a partial order, but that is the mathematical root of the situation, not a problem caused by an poorly chosen language default. (Leaving < undefined and sorting by object address, now *that* would be an example of the latter. :-)

You seem to be suggesting that sorting a list of partially ordered items is not a meaningful thing to ask for, rather than merely not being implemented by the list.sort method.

I assume you want a topological sort. But isn't that topic by itself a useful math lesson? -- --Guido van Rossum (python.org/~guido)

Serhiy Storchaka

12:03 p.m.

On 22.11.14 01:55, Antony Lee wrote:

...

As Counter objects effectively behave like multi-sets, it seems reasonable to overload <, <=, >, >= as for set objects to check whether a Counter is a sub/super-set of another Counter:

c < d <===> all(c[k] < d[k] for k in c)

Thoughts?

http://bugs.python.org/issue22515

3745

Age (days ago)

3748

Last active (days ago)

List overview

Download

19 comments

11 participants

participants (11)

Antoine Pitrou
Antony Lee
Devin Jeanpierre
Ethan Furman
Guido van Rossum
MRAB
Nick Coghlan
random832＠fastmail.us
Raymond Hettinger
Serhiy Storchaka
Steven D'Aprano

Counter comparison

Antony Lee

Antony Lee

Antony Lee

random832＠fastmail.us

random832＠fastmail.us

tags

participants (11)