collections.Counter surprisingly slow
stefan_ml at behnel.de
Mon Jul 29 13:46:58 CEST 2013
Steven D'Aprano, 28.07.2013 22:51:
> Calling Counter ends up calling essentially this code:
> for elem in iterable:
>     self[elem] = self.get(elem, 0) + 1
> (although micro-optimized), where "iterable" is your data (lines).
> Calling the get method has higher overhead than dict[key], that will also
Counter comes with a C accelerator (at least in Py3.4dev), but it seems to
stumble a bit over its own feet. The accelerator function special-cases the
(exact) dict type, but the Counter class is a subtype of dict and thus takes
the generic path, so it benefits a bit less than it could.
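To make the exact-type distinction concrete, here is a rough sketch in pure Python. The names count_elements() and takes_exact_dict_path() are made up for illustration and are not the accelerator's real code. The check only mirrors the idea of testing for the exact dict type, which a subclass such as Counter fails.

```python
from collections import Counter


def count_elements(mapping, iterable):
    """Pure-Python version of the counting loop quoted above."""
    mapping_get = mapping.get  # bind the method once, outside the loop
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1


def takes_exact_dict_path(mapping):
    # Hypothetical helper: an exact-type test, as opposed to isinstance().
    # A dict subclass passes isinstance(mapping, dict) but fails this check.
    return type(mapping) is dict


words = ["a", "b", "a", "c", "a"]

plain = {}
counter = Counter()
count_elements(plain, words)
count_elements(counter, words)
```

Both containers end up with the same counts. Only the plain dict passes the exact-type test, which is why a Counter instance would land on the generic path described above.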
Look for _count_elements() in
Nevertheless, even the generic C code path looks fast enough in general. I
think the problem is just that the OP used Python 2.7, which doesn't have
this accelerator function.
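One way to check what a given interpreter offers is sketched below. Note that _count_elements is a private name in the collections module, so relying on it is an assumption that may not hold on every version. The helper names has_c_accelerator() and count_lines() are invented for this example. The fallback branch is the plain loop, which is what an interpreter without the helper would have to run anyway.

```python
import collections
import inspect

# Private helper; may be missing (e.g. on Python 2.7), hence getattr().
_helper = getattr(collections, "_count_elements", None)


def has_c_accelerator():
    """Return True if the helper exists and is implemented in C."""
    return _helper is not None and inspect.isbuiltin(_helper)


def count_lines(lines):
    """Count items into a plain dict, using the helper when available."""
    counts = {}
    if _helper is not None:
        # A plain dict is the exact type the accelerator special-cases.
        _helper(counts, lines)
    else:
        # Fallback: the same loop written out in Python.
        for line in lines:
            counts[line] = counts.get(line, 0) + 1
    return counts
```

Timing count_lines() against Counter(lines) with the timeit module on both interpreters should show whether the missing accelerator accounts for the difference the OP measured.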