collections.Counter surprisingly slow
Roy Smith
roy at panix.com
Sun Jul 28 15:59:04 EDT 2013
I've been doing an informal "intro to Python" lunchtime series for some
co-workers (who are all experienced programmers, in other languages).
This week I was going to cover list comprehensions, exceptions, and
profiling. So, I did a little demo showing different ways to build a
dictionary counting how many times a string appears in some input:
test() implements a "look before you leap" python loop
exception() uses a try/catch construct in a similar python loop
default() uses a defaultdict
count() uses a Counter
I profiled it, to show how the profiler works. The full code is below.
The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
unique strings). Python 2.7.3 on Ubuntu Precise.
As I expected, test() is slower than exception(), which is slower than
default(). I'm rather shocked to discover that count() is the slowest
of all! I expected it to be the fastest. Or, certainly, no slower
than default().
The full profiler dump is at the end of this message, but the gist of
it is:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.322 0.322 ./stations.py:42(count)
1 0.159 0.159 0.159 0.159 ./stations.py:17(test)
1 0.114 0.114 0.114 0.114 ./stations.py:27(exception)
1 0.097 0.097 0.097 0.097 ./stations.py:36(default)
Why is count() [i.e. collections.Counter] so slow?
-------------------------------------------------------------
from collections import defaultdict, Counter
def main():
lines = open('stations.txt').readlines()
d1 = test(lines)
d2 = exception(lines)
d3 = default(lines)
d4 = count(lines)
print d1 == d2
print d1 == d3
print d1 == d4
def test(lines):
d = {}
for station in lines:
if station in d:
d[station] += 1
else:
d[station] = 1
return d
def exception(lines):
d = {}
for station in lines:
try:
d[station] += 1
except KeyError:
d[station] = 1
return d
def default(lines):
d = defaultdict(int)
for station in lines:
d[station] += 1
return d
def count(lines):
d = Counter(lines)
return d
if __name__ == '__main__':
import cProfile
import pstats
cProfile.run('main()', 'stations.stats')
p = pstats.Stats('stations.stats')
p.sort_stats('cumulative').print_stats()
-------------------------------------------------------------
570335 function calls (570332 primitive calls) in 0.776 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.023 0.023 0.776 0.776 <string>:1(<module>)
1 0.006 0.006 0.753 0.753 ./stations.py:5(main)
1 0.000 0.000 0.322 0.322 ./stations.py:42(count)
1 0.000 0.000 0.322 0.322 /usr/lib/python2.7/collections.py:407(__init__)
1 0.242 0.242 0.322 0.322 /usr/lib/python2.7/collections.py:470(update)
1 0.159 0.159 0.159 0.159 ./stations.py:17(test)
1 0.114 0.114 0.114 0.114 ./stations.py:27(exception)
1 0.097 0.097 0.097 0.097 ./stations.py:36(default)
570285 0.080 0.000 0.080 0.000 {method 'get' of 'dict' objects}
1 0.055 0.055 0.055 0.055 {method 'readlines' of 'file' objects}
1 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:128(__instancecheck__)
2/1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:148(__subclasscheck__)
3/1 0.000 0.000 0.000 0.000 {issubclass}
4 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:58(__iter__)
1 0.000 0.000 0.000 0.000 {open}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:26(__exit__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:20(__enter__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:36(__init__)
3 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:68(__contains__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:52(_commit_removals)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:81(add)
3 0.000 0.000 0.000 0.000 {getattr}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:16(__init__)
2 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}
2 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_abcoll.py:97(__subclasshook__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
4 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
More information about the Python-list
mailing list