collections.Counter surprisingly slow
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Jul 28 16:51:11 EDT 2013
On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote:
[...]
> I'm rather shocked to discover that count() is the slowest
> of all! I expected it to be the fastest. Or, certainly, no slower than
> default().
>
> The full profiler dump is at the end of this message, but the gist of it
> is:
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
>         1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
>         1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
>         1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
>
> Why is count() [i.e. collections.Counter] so slow?
By your own numbers (0.322 seconds against 0.159, 0.114 and 0.097), count()
is within a factor of 2 of test(), and about 3 of exception() or default(),
give or take. I don't think that's surprisingly slow. In 2.7, Counter is
written in Python, while defaultdict has an accelerated C version. I expect
that has something to do with it.
Calling Counter ends up calling essentially this code:

    for elem in iterable:
        self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines).
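
To make that loop concrete, here is a rough sketch of it as a standalone function. It assumes a plain dict in place of self and a made-up list standing in for your lines; count_with_get is a hypothetical name for illustration, not anything in collections.

```python
from collections import Counter


def count_with_get(iterable):
    """Tally items with the same get-then-store pattern as the loop above."""
    counts = {}
    for elem in iterable:
        # One method call (get) plus one item assignment per element.
        counts[elem] = counts.get(elem, 0) + 1
    return counts


# Hypothetical sample data standing in for "lines".
lines = ["a", "b", "a", "c", "a", "b"]

# The hand-written loop and Counter should agree on the tallies.
assert count_with_get(lines) == dict(Counter(lines))
print(count_with_get(lines))
```

The point is only that the per-element work is a method call and a store, both executed as Python bytecode when Counter itself is written in Python.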
Calling the get method has higher overhead than dict[key], which will also
contribute.
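
If you want to check the get-versus-subscript overhead on your own machine, something like the following should do. It is a sketch only: the one-key dict and the time_lookup helper are made up for illustration, it is written for Python 3 (the globals argument to timeit does not exist in 2.7), and the size of the gap will depend on the interpreter version, so I make no claim about the numbers you will see.

```python
import timeit

# Hypothetical one-key dict; only the lookup cost is of interest here.
d = {"spam": 1}


def time_lookup(stmt, number=200000):
    """Return the best of three timings, in seconds, for running stmt."""
    return min(timeit.repeat(stmt, globals={"d": d}, repeat=3, number=number))


t_index = time_lookup('d["spam"]')        # plain subscript lookup
t_get = time_lookup('d.get("spam", 0)')   # method call with a default

print("d[key]        : %.4f s" % t_index)
print("d.get(key, 0) : %.4f s" % t_get)
```

Taking the minimum of several repeats is the usual way to reduce noise from whatever else the machine is doing.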
--
Steven