[Python-ideas] Support floating-point values in collections.Counter

Paul Moore p.f.moore at gmail.com
Wed Dec 20 05:05:22 EST 2017


On 20 December 2017 at 03:09, Joel Croteau <jcroteau at gmail.com> wrote:
> Well here is some code I wrote recently to build a histogram over a weighted
> graph, before becoming aware that Counter existed (score is a float here):
>
> from collections import defaultdict
>
> import numpy as np
>
> total_score_by_depth = defaultdict(float)
> total_items_by_depth = defaultdict(int)
> num_nodes_by_score = defaultdict(int)
> num_nodes_by_log_score = defaultdict(int)
> num_edges_by_score = defaultdict(int)
> for state in iter_graph_components():
>     try:
>         # There is probably some overlap here
>         ak = state['ak']
>         _, c = ak.score_paths(max_depth=15)
>         for edge in state['graph'].edges:
>             num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1
>         for node in c.nodes:
>             total_score_by_depth[node.depth] += node.score
>             total_items_by_depth[node.depth] += 1
>             num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1
>             num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1
>         num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes)
>         num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - len(c.nodes)
>     except MemoryError:
>         print("Skipped massive.")
>
> Without going too much into what this does, note that I could replace the
> other defaultdicts with Counters, but I can't do the same thing with
> total_score_by_depth, at least not without violating the API.

Hmm, OK. I can't see any huge benefit from switching to a Counter,
though. You're not using any features of a Counter that aren't shared
by a defaultdict, nor is there any code here that could be simplified
or replaced by using such features...
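
To make that concrete, here is a rough sketch of the kind of Counter-only
features I mean. The depths list is made-up sample data, not anything from
your code:

```python
from collections import Counter, defaultdict

# Hypothetical sample data standing in for node depths.
depths = [1, 2, 2, 3, 3, 3]

as_counter = Counter(depths)

as_defaultdict = defaultdict(int)
for d in depths:
    as_defaultdict[d] += 1

# Plain incrementing produces the same mapping either way.
same_counts = dict(as_counter) == dict(as_defaultdict)

# Things only Counter offers: ranking and multiset-style arithmetic.
top = as_counter.most_common(1)          # most frequent (key, count) pairs
combined = as_counter + Counter([3, 4])  # adds counts key by key
```

If all the code does is `d[key] += 1`, the two are interchangeable; Counter
only pays off once something like most_common() or the arithmetic gets used.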

> I would
> suggest that with a name like Counter, treating a class like a Counter
> should be the more common use case. If it's meant to be a multiset, we
> should call it a Multiset.

Personally, I consider "counting" to be something we do with integers
(whole numbers), not with floats. So for me the name Counter clearly
implies an integer. Multiset would be a reasonable alternative name,
but Python has a tradition of using "natural language" names over
"computer science" names, so I'm not surprised Counter was chosen
instead.

I guess it's ultimately a matter of opinion whether a float-based
Counter is a natural extension or not.
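
For what it's worth, here is a small sketch of what I understand the current
behaviour to be when floats are stored in a Counter. The keys and values are
invented for illustration, and the elements() failure is what I'd expect on
CPython, since the docs say that method requires integer counts:

```python
from collections import Counter

scores = Counter()
scores['depth-1'] += 0.25  # hypothetical keys and float "counts"
scores['depth-1'] += 0.5
scores['depth-2'] += 1.5

# Accumulating and ranking do not care that the values are floats.
ranked = scores.most_common()

# elements() repeats each key "count" times, so a float count is rejected.
try:
    list(scores.elements())
    elements_ok = True
except TypeError:
    elements_ok = False
```

So storing floats works for accumulation, but not every method treats them
as meaningful counts.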

Paul

