Fwd: collections.Counter should implement fromkeys

Reposting because the original got bounced from Google Groups.

---------- Forwarded message ---------
From: Tim Peters <tim.peters@gmail.com>
Date: Fri, Jun 29, 2018 at 5:54 PM
Subject: Re: [Python-ideas] collections.Counter should implement fromkeys
To: <abedillon@gmail.com>
Cc: python-ideas <python-ideas@googlegroups.com>

[Abe Dillon <abedillon@gmail.com>]
More because Counter.fromkeys() could be incoherent. From the implementation (in your Lib/collections/__init__.py):

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because setting v=1
        # means that no element can have a count greater than one.
        raise NotImplementedError(
            'Counter.fromkeys() is undefined. Use Counter(iterable) instead.')

For a dict, a value appearing multiple times in the iterable doesn't matter. But a fundamental use case for Counters is to tally the _number_ of times duplicate keys appear. So, e.g., someone will be unpleasantly surprised no matter what:

    Counter.fromkeys("aaaaa", 2)

returned.

    "It should set key 'a' to 2! that's what I said it should do!"

    "No! It should set key 'a' to 10! that's what a Counter _always_ does - sums the values associated with duplicate keys!"

    "You're both right - and wrong! It should raise an exception if there's a duplicate key, because there's no compelling answer to what it should do!"

I expect Raymond called it NotImplementedError instead so he could release the code instead of waiting 3 years for that debate to end ;-)
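To make the three competing readings concrete, here is a minimal sketch of each one as a standalone helper. The function names (`fromkeys_overwrite`, `fromkeys_summing`, `fromkeys_strict`) are hypothetical and exist only for illustration; none of them is part of `collections`.

```python
from collections import Counter


def fromkeys_overwrite(iterable, value):
    """dict-style reading: duplicates are ignored, every key gets `value`."""
    return Counter(dict.fromkeys(iterable, value))


def fromkeys_summing(iterable, value):
    """Counter-style reading: every occurrence of a key adds `value`."""
    counter = Counter()
    for key in iterable:
        counter[key] += value
    return counter


def fromkeys_strict(iterable, value):
    """Strict reading: a duplicate key is treated as an error."""
    counter = Counter()
    for key in iterable:
        if key in counter:
            raise ValueError(f"duplicate key: {key!r}")
        counter[key] = value
    return counter


overwrite_result = fromkeys_overwrite("aaaaa", 2)  # 'a' maps to 2
summing_result = fromkeys_summing("aaaaa", 2)      # 'a' maps to 10
```

All three are defensible, which is the point being made: whichever one a built-in `Counter.fromkeys()` picked, callers expecting one of the other two would be surprised.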

[Tim Peters]
a fundamental use case for Counters is to tally the _number_ of times duplicate keys appear.
Yes, that's why the default constructor already does just that.

[Tim Peters]
So, e.g., someone will be unpleasantly surprised no matter what
Sure, but in Hettinger's own words <https://www.youtube.com/watch?v=HTLu2DFOdTg&t=24m46s> "whenever you have a constructor war, everyone should get their wish". People that want a counting constructor have that, people that want the ability to initialize values don't have that.

[Tim Peters]
I'm tempted to indulge in the meta argument which you're obviously striving to avoid, but I will say this: "that's what a Counter _always_ does" makes no sense. It's *almost* tantamount to saying that all constructors have to do exactly the same thing, which makes multiple constructors useless. Technically, there is no constructor for counting by X, but if enough people really wanted that, I suppose a third constructor would be in order.

On Fri, Jun 29, 2018 at 05:32:54PM -0700, Abe Dillon wrote:
*scratches head*

I can initialise a Counter just fine.

py> Counter({'a': 0, 'b': 0, 'ab': 2})
Counter({'ab': 2, 'a': 0, 'b': 0})

The supported API for setting initial values of a counter is to either count the supplied keys:

    Counter(['a', 'b', 'ab'])

or supply initial counts in a dict:

    Counter({'a': 0, 'b': 0, 'ab': 2})

In the case where all the initial counts are zero, the obvious API is to call the dict fromkeys method:

    Counter(dict.fromkeys(['a', 'b', 'ab'], 0))

So what you're really asking for is a convenience method to bypass the need to create a temporary dict first:

    Counter.fromkeys(['a', 'b', 'ab'], 0)

Presumably the initial value will default to 0 rather than None, and take any integer value.

I'm sympathetic to the idea of this as a convenience, but I don't think it's an obvious feature to have. Tim's point about duplicate keys is valid. Should it raise an exception, silently swallow duplicates, or count them?

The dict constructors, both the standard dict() and dict.fromkeys(), silently swallow duplicates. As they should. But Counter() does not, and should not.

There's a discrepancy if Counter() doesn't and Counter.fromkeys() does, and it requires a value judgement to decide whether that discrepancy is sufficiently unimportant.

[...]
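The discrepancy described above can be seen side by side in a short sketch. The variable names are illustrative only; every call is to the standard `dict` and `collections.Counter` APIs.

```python
from collections import Counter

keys = ['a', 'b', 'ab', 'a']  # note the duplicate 'a'

# dict.fromkeys() silently swallows the duplicate.
swallowed = dict.fromkeys(keys, 0)

# Counter() counts it instead.
counted = Counter(keys)

# The existing spelling for "all counts start at zero, duplicates ignored".
zeroed = Counter(dict.fromkeys(keys, 0))
```

Here `swallowed` and `zeroed` each hold three keys mapped to 0, while `counted['a']` is 2.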
Technically, there is no constructor for counting by X, but if enough people really wanted that, I suppose a third constructor would be in order.
How about a fourth constructor? A fifth? A fiftieth? How many constructors is too many before the class becomes unwieldy?

Not every way you might count with a counter needs to be a constructor method. You can always just count:

    c = Counter()
    for key in keys:
        c[key] += X

I think you make a *reasonable* case for Counter.fromkeys to silently ignore duplicates, as a convenience method for Counter(dict.fromkeys(keys, 0)), but it's not (in my opinion) a *compelling* argument. I think it comes down to the taste of the designer.

You can always subclass it. Or even monkey-patch it.

py> def fromkeys(cls, seq, value=0):
...     c = cls()
...     for key in seq:
...         c[key] = value
...     return c
...
py> from collections import Counter
py> Counter.fromkeys = classmethod(fromkeys)
py> Counter.fromkeys(['a', 'b', 'ab', 'a', 'b', 'c'])
Counter({'a': 0, 'ab': 0, 'b': 0, 'c': 0})

(Subclassing is safer :-)

--
Steve
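For comparison with the monkey-patch above, the subclassing route might look like the sketch below. The class name `KeyedCounter` is hypothetical; the method body mirrors the monkey-patched function, so duplicates are ignored and the default value is 0.

```python
from collections import Counter


class KeyedCounter(Counter):
    """Hypothetical Counter subclass that re-enables a dict-style fromkeys()."""

    @classmethod
    def fromkeys(cls, iterable, value=0):
        # Assign rather than add, so a repeated key keeps `value`.
        counter = cls()
        for key in iterable:
            counter[key] = value
        return counter


kc = KeyedCounter.fromkeys(['a', 'b', 'ab', 'a', 'b', 'c'])
kc.update('aab')  # normal counting still works afterwards
```

Because the override lives on the subclass, `collections.Counter` itself is left untouched and other code that relies on `Counter.fromkeys()` raising is unaffected.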

[Steven D'Aprano]
Yes, I've discussed this, but since my replies have been misaddressed, it may have gotten lost. I'll quote it below:

[Abe Dillon]
[Steven D'Aprano]
So what you're really asking for is a convenience method to bypass the need to create a temporary dict first
I'm not asking for anything all that new. Just that the existing .fromkeys inherited from dict not be disabled.

[Steven D'Aprano]
Presumably the initial value will default to 0 rather than None, and take any integer value.
Yes. I think that would make the most sense. 0 or 1. As long as it's documented it doesn't matter to me.

[Steven D'Aprano]
It should do exactly what dict.fromkeys does (except with a numeric default): ignore duplicates

[Steven D'Aprano]
That's fine. I don't think that's confusing.

[Steven D'Aprano]
How about a fourth constructor? A fifth? A fiftieth? How many constructors is too many before the class becomes unwieldy?
I think this is a little overboard on the slippery-slope, no? I'm asking for a constructor that already exists, but was deliberately disabled. As far as I can tell, the only people pointing out that others will complain are playing devil's advocate. I can't tell if there are any people that actually believe that Counter.fromkeys should have a multiplier effect. I wouldn't expect the campaign for the third type of constructor to get very far. Especially if Counter multiplication gets accepted.

[Tim]
I think the missing bit here is that there weren't any "constructor wars" for Counter...
15 years later you're jumping up & down about Counter.fromkeys() not being
there, and that's why nobody much cares ;-)
I haven't been part of the conversation for 15 years, but most of the arguments against the idea (yours especially) seem to focus on the prospect of a constructor war and imply that was the original motivation behind actively disabling the fromkeys method in Counters.

I don't mean to give the impression that I'm fanatical about this. It really is a minor inconvenience. It doesn't irk me nearly as much as other minor things, like the fact that all the functions in the heapq package begin with the redundant word 'heap'.

[Tim]
Raymond may have a different judgment about that, though. I don't believe he reads python-ideas anymore
He actually did reply a few comments back!

I think I'm having more fun chatting with people that I deeply respect than "jumping up and down". I'm sorry if I'm coming off as an asshole. We can kill this thread if everyone thinks I'm wasting their time. It doesn't look like anyone else shares my minor annoyance. Thanks for indulging me!

On Fri, Jun 29, 2018 at 9:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:

[Abe Dillon]
I quoted the source code verbatim - its comment said fromkeys() didn't make sense for Counters. From which it's an easy inference that it makes more than one _kind_ of sense, hence "constructor wars". Not that it matters. Giving some of the history was more a matter of giving a plausible reason for why you weren't getting all that much feedback: it's quite possible that most readers of this list didn't even remember that `dict.fromkeys()` is a thing.
You have to blame Guido for that one, which is even more futile than arguing with Raymond ;-) It never much bothered me, but I do recall doing this once:

    from heapq import heappush as push, heappop as pop # etc
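As a small illustration of that aliasing trick, the sketch below pushes a few values and drains them in sorted order. The sample values are made up; `heappush` and `heappop` are the real `heapq` functions.

```python
from heapq import heappush as push, heappop as pop

heap = []
for priority in (5, 1, 3):
    push(heap, priority)  # same call as heapq.heappush(heap, priority)

# Popping repeatedly yields the smallest remaining item each time.
drained = [pop(heap) for _ in range(3)]
```

After this runs, `drained` holds the values in ascending order and `heap` is empty.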
He actually did reply a few comments back!
Ya, I saw that! He's always trying to make me look bad ;-)
I think I'm having more fun chatting with people that I deeply respect than "jumping up and down". I'm sorry if I'm coming off as an asshole.
Not at all! I've enjoyed your messages. They have tended to be more on the side of forceful advocacy than questioning, though, which may grate after a few more years. As to my "jumping up and down", I do a lot of leg-pulling. I'm old. It's not meant to offend, but I'm too old to care if it does :-)
Raymond's reply didn't leave any hope for adding Counter.fromkeys(), so in the absence of a killer argument that hasn't yet been made, ya, it would be prudent to move on. Unless people want to keep talking about it, knowing that Raymond won't buy it in the end. Decisions, decisions ;-)

[ Note to repliers to Abe (and others recently): replies to Google Groups posts are broken for now on this list, so be sure to replace python-ideas@googlegroups.com with python-ideas@python.org in your reply. Else the mailing list (neither Google Groups nor the python.org archive) won't get it. ]

[Tim]
So, e.g., someone will be unpleasantly surprised no matter what
[Abe Dillon]
[...] that.

I think the missing bit here is that there weren't any "constructor wars" for Counter. In all this time, I don't believe I've heard anyone say they wanted a Counter.fromkeys() before. For that matter, I'd bet a dollar that most Python programmers don't know that dict.fromkeys() exists, despite that it was added in Python 2.3.

As I recall, the primary motivation for adding dict.fromkeys() was to make using dicts to mimic sets a little easier, by providing a constructor that threw away duplicates and didn't really care about the values (so no value was required, and nobody cared that it defaulted to the _seemingly_ insane `None` - using `None` values for sets-implemented-as-dicts was a de facto informal standard at the time).

But one release later (2.4) a set type was added too, so the primary motivation for fromkeys() went away.

15 years later you're jumping up & down about Counter.fromkeys() not being there, and that's why nobody much cares ;-)
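The dict-as-set idiom described above can be sketched as follows, next to the `set` type that replaced it. The sample data and variable names are illustrative only.

```python
words = ['spam', 'eggs', 'spam', 'ham', 'eggs']

# Old idiom: the keys act as the set, the values are all None placeholders.
seen_as_dict = dict.fromkeys(words)

# What most code reaches for now that a set type exists.
seen_as_set = set(words)

# Membership tests work the same way on either container.
has_spam = 'spam' in seen_as_dict and 'spam' in seen_as_set
```

Both containers end up with the same three distinct members; the dict just carries a `None` value alongside each one.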
And succeeding! I can't be sucked into it :-) FWIW, fine by me if Counter.fromkeys() is added, doing exactly what you want. Raymond may have a different judgment about that, though. I don't believe he reads python-ideas anymore, so opening an enhancement request on bugs.python.org is the way to get his attention.

On Jun 29, 2018, at 5:32 PM, Abe Dillon <abedillon@gmail.com> wrote:
Sure, but in Hettinger's own words "whenever you have a constructor war, everyone should get their wish". People that want a counting constructor have that, people that want the ability to initialize values don't have that.
Sorry Abe, but you're twisting my words and pushing very hard for a proposal that doesn't make sense and isn't necessary.

* Counts initialized to zero: This isn't necessary. The whole point of counters is that counts default to zero without pre-initialization.

* Counts initialized to one: This is already done by the regular constructor. Use "Counter(keys)" if the keys are known to be unique and "Counter(set(keys))" to ignore duplicates.

    >>> Counter('abc')
    Counter({'a': 1, 'b': 1, 'c': 1})
    >>> Counter(set('abbacac'))
    Counter({'a': 1, 'b': 1, 'c': 1})

* Counts initialized to some other value: That would be an unusual thing to do but would be easy with the current API.

    >>> Counter(dict.fromkeys('abc', 21))
    Counter({'a': 21, 'b': 21, 'c': 21})

* Note, the reason that fromkeys() is disabled is that it has nonsensical or surprising interpretations:

    >>> Counter.fromkeys('aaabbc', 2) # What should this do that doesn't surprise at least some users?

* That reason is already shown in the source code.

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because setting v=1
        # means that no element can have a count greater than one.
        raise NotImplementedError(
            'Counter.fromkeys() is undefined. Use Counter(iterable) instead.')
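The first bullet, that counts default to zero without pre-initialization, is easy to check with a short sketch. The variable names are illustrative; the behaviour shown is that of the standard `collections.Counter`.

```python
from collections import Counter

tally = Counter()

# Reading a key that was never counted gives 0 and does not create the key.
missing_count = tally['never seen']
key_was_created = 'never seen' in tally

# So incrementing works without any zero-initialization step.
tally['a'] += 1

# The other two cases, using only the existing constructor.
ones = Counter(set('abbacac'))                   # duplicates ignored, counts of 1
twenty_ones = Counter(dict.fromkeys('abc', 21))  # arbitrary initial value
```

Here `missing_count` is 0 and `key_was_created` is False, which is why pre-seeding keys with zero is rarely needed just to start counting.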
Obviously, Python breaks SOLID principles successfully all over the place for pragmatic reasons. I don't think this is one of those cases.
No amount of citing generic design principles will justify adding an API that doesn't make sense. Besides, any possible use cases already have reasonable solutions using the existing API. That is likely why no one has ever requested this behavior before.

Based on what I've read in this thread, I see nothing that would change the long-standing decision not to have a fromkeys() method for collections.Counter. The original reasoning still holds.

Raymond

participants (4)
- Abe Dillon
- Raymond Hettinger
- Steven D'Aprano
- Tim Peters