[Python-ideas] Support floating-point values in collections.Counter

Joel Croteau jcroteau at gmail.com
Tue Dec 19 22:09:07 EST 2017

Well, here is some code I wrote recently to build a histogram over a
weighted graph, before I became aware that Counter existed (score is a
float here):

from collections import defaultdict

import numpy as np

total_score_by_depth = defaultdict(float)
total_items_by_depth = defaultdict(int)
num_nodes_by_score = defaultdict(int)
num_nodes_by_log_score = defaultdict(int)
num_edges_by_score = defaultdict(int)
for state in iter_graph_components():
    try:
        # There is probably some overlap here
        ak = state['ak']
        _, c = ak.score_paths(max_depth=15)
        for edge in state['graph'].edges:
            num_edges_by_score[np.ceil(20.0 * edge.score) / 20.0] += 1
        for node in c.nodes:
            total_score_by_depth[node.depth] += node.score
            total_items_by_depth[node.depth] += 1
            num_nodes_by_score[np.ceil(20.0 * node.score) / 20.0] += 1
            num_nodes_by_log_score[np.ceil(-np.log10(node.score))] += 1
        num_nodes_by_score[0.0] += len(state['graph'].nodes) - len(c.nodes)
        num_nodes_by_log_score[100.0] += len(state['graph'].nodes) - len(c.nodes)
    except MemoryError:
        print("Skipped massive.")

Without going too much into what this does, note that I could replace the
other defaultdicts with Counters, but I can't do the same with
total_score_by_depth, at least not without violating the API. I would
suggest that, with a name like Counter, treating the class as a counter
should be the more common use case. If it's meant to be a multiset, we
should call it a Multiset. Here is an example from Stack Overflow of
someone else also wanting a float counter, where the only suggestion was to
use defaultdict:
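
To make the distinction above concrete, here is a minimal sketch (an
illustration only, not the Stack Overflow example). The (depth, score) pairs
are made up and stand in for the graph nodes in the code above. It suggests
that plain accumulation runs with either container, while a method such as
elements(), which treats values as integer counts, does not accept float
totals:

```python
from collections import Counter, defaultdict

# Hypothetical (depth, score) pairs standing in for the graph nodes above.
samples = [(1, 0.25), (1, 0.5), (2, 0.125)]

total_by_depth_dd = defaultdict(float)
total_by_depth_counter = Counter()
for depth, score in samples:
    total_by_depth_dd[depth] += score
    # A missing Counter key reads as 0, so in-place addition also runs here.
    total_by_depth_counter[depth] += score

# Both containers end up holding the same float totals.
same_totals = dict(total_by_depth_dd) == dict(total_by_depth_counter)

# elements() repeats each key "count" times, so it needs integer values.
try:
    list(total_by_depth_counter.elements())
    elements_ok = True
except TypeError:
    elements_ok = False

print(same_totals, elements_ok)
```

So the float totals can be stored, but the result no longer behaves like a
multiset for every method, which is the mismatch described above.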


On Tue, Dec 19, 2017 at 3:08 AM Paul Moore <p.f.moore at gmail.com> wrote:

> On 18 December 2017 at 23:51, Joel Croteau <jcroteau at gmail.com> wrote:
> > It would be useful in many scenarios for values in collections.Counter
> > to be allowed to be floating point.
> Do you have any evidence of this? Code examples that would be
> significantly improved by such a change?  I can't think of any myself.
> I might consider writing
>     totals = defaultdict(float)
>     for ...:
>         totals[something] = calculation(something)
> but using a counter is neither noticeably easier, nor clearer...
> One way of demonstrating such a need would be if your proposed
> behaviour were available on PyPI and getting used a lot - I'm not
> aware of any such module if it is.
> Paul
