[Python-ideas] Support floating-point values in collections.Counter

Ned Batchelder ned at nedbatchelder.com
Wed Dec 20 16:47:18 EST 2017


On 12/20/17 5:05 AM, Paul Moore wrote:
> On 20 December 2017 at 03:09, Joel Croteau <jcroteau at gmail.com> wrote:
>> Well here is some code I wrote recently to build a histogram over a weighted
>> graph, before becoming aware that Counter existed (score is a float here):
>>
>> from collections import defaultdict
>> import numpy as np
>>
>> total_score_by_depth = defaultdict(float)
>> total_items_by_depth = defaultdict(int)
>> num_nodes_by_score = defaultdict(int)
>> num_nodes_by_log_score = defaultdict(int)
>> num_edges_by_score = defaultdict(int)
>> for state in iter_graph_components():
>>      try:
>>          # There is probably some overlap here
>>          ak = state['ak']
>>          _, c = ak.score_paths(max_depth=15)
>>          for edge in state['graph'].edges:
>>              num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1
>>          for node in c.nodes:
>>              total_score_by_depth[node.depth] += node.score
>>              total_items_by_depth[node.depth] += 1
>>              num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1
>>              num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1
>>          num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes)
>>          num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - len(c.nodes)
>>      except MemoryError:
>>          print("Skipped massive.")
>>
>> Without going too much into what this does, note that I could replace the
>> integer defaultdicts with Counters, but I can't do the same with
>> total_score_by_depth, at least not without violating the API.
> Hmm, OK. I can't see any huge benefit from switching to a Counter,
> though. You're not using any features of a Counter that aren't shared
> by a defaultdict, nor is there any code here that could be simplified
> or replaced by using such features...
>
>> I would
>> suggest that, with a name like Counter, treating the class as a counter
>> should be the more common use case. If it's meant to be a multiset, we
>> should call it a Multiset.
> Personally, I consider "counting" to be something we do with integers
> (whole numbers), not with floats. So for me the name Counter clearly
> implies an integer. Multiset would be a reasonable alternative name,
> but Python has a tradition of using "natural language" names over
> "computer science" names, so I'm not surprised Counter was chosen
> instead.
>
> I guess it's ultimately a matter of opinion whether a float-based
> Counter is a natural extension or not.
>
>
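For reference, the accumulation pattern in the quoted code does run with `Counter` in place of `defaultdict`. The `collections` documentation notes that `Counter` puts no type restriction on its values, and that in-place operations such as `c[key] += x` only need the value type to support addition. A minimal sketch, assuming hypothetical `(depth, score)` pairs in place of the graph nodes (`iter_graph_components()` and `score_paths()` are not shown in the thread), and using `math.ceil` instead of NumPy for the 0.05-wide score bins:

```python
from collections import Counter
import math

# Hypothetical (depth, score) pairs standing in for the nodes that the
# quoted code gets from ak.score_paths(); the real data source is not shown.
nodes = [(0, 1.0), (1, 0.5), (1, 0.25), (2, 0.125)]

total_score_by_depth = Counter()   # float values: summed scores per depth
total_items_by_depth = Counter()   # int values: node count per depth
num_nodes_by_score = Counter()     # int values, keyed by 0.05-wide score bins

for depth, score in nodes:
    # Missing keys start at 0, so += works for both float and int values.
    total_score_by_depth[depth] += score
    total_items_by_depth[depth] += 1
    num_nodes_by_score[math.ceil(20.0 * score) / 20.0] += 1

# Mean score per depth, combining the float and int Counters.
mean_score_by_depth = {
    depth: total_score_by_depth[depth] / total_items_by_depth[depth]
    for depth in total_items_by_depth
}
print(mean_score_by_depth)
```

Whether this amounts to "violating the API" is the point under debate; mechanically, the `+=` updates behave the same as with `defaultdict(float)` and `defaultdict(int)`.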
One thing to note is that Counter supports negative numbers, so we are 
already outside the natural numbers :)

     Python 3.6.4 (default, Dec 19 2017, 08:11:42)
     >>> from collections import Counter
     >>> c = Counter(a=4, b=2, c=0, d=-2)
     >>> d = Counter(a=1, b=2, c=3, d=4)
     >>> c.subtract(d)
     >>> c
     Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
     >>> list(c.elements())
     ['a', 'a', 'a']
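The `collections` documentation also spells out which `Counter` operations tolerate non-integer values. A short sketch with float values, assuming CPython 3: `+=`, `update()`, `subtract()` and `most_common()` work, the `+` operator keeps only positive results just as it does for integer counts, and `elements()`, which repeats each key `count` times, needs integer counts and raises `TypeError` otherwise.

```python
from collections import Counter

weights = Counter()
weights['a'] += 0.5                      # missing key starts at 0, so 'a' is 0.5
weights.update({'a': 1.25, 'b': 2.0})    # mapping input: values are added
weights.subtract({'b': 2.5})             # results may go negative
print(weights)
print(weights.most_common(1))            # ordering only needs comparable values

# Multiset-style '+' keeps only positive results, as with integer counts.
positive_only = weights + Counter()
print(positive_only)

# elements() repeats each key `count` times, so float counts are rejected.
elements_error = None
try:
    list(weights.elements())
except TypeError as exc:
    elements_error = exc
print('elements() raised:', repr(elements_error))
```

So the integer assumption is enforced in only one place, `elements()`, which also skips zero and negative counts.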

--Ned.

