collections.Counter surprisingly slow
joshua at landau.ws
Mon Jul 29 13:49:53 CEST 2013
On 29 July 2013 07:25, Serhiy Storchaka <storchaka at gmail.com> wrote:
> 28.07.13 22:59, Roy Smith wrote:
>> The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
>> unique strings).
> Repeat your tests with totally unique lines.
Counter is about ½ the speed of defaultdict in that case (as opposed to ⅓).
>> The full profiler dump is at the end of this message, but the gist of
>> it is:
> The profiler affects execution time. In particular, it slows down the
> Counter implementation, which uses more function calls. For real-world
> measurements, use a different approach.
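As suggested, timeit avoids the per-call profiler overhead. A sketch of how such a measurement might look (the data sizes and distribution here are made up for illustration, not Roy's actual file):

```python
# Time Counter vs. a defaultdict loop with timeit, which runs the code
# without profiler instrumentation.  Absolute numbers vary by machine;
# only the relative gap is of interest.
import timeit

setup = """
from collections import Counter, defaultdict
lines = [str(i % 1000) for i in range(100000)]  # ~100 repeats per key
"""

counter_stmt = "Counter(lines)"

dd_stmt = """
counts = defaultdict(int)
for line in lines:
    counts[line] += 1
"""

# min() of several repeats is the conventional way to reduce noise.
print("Counter:    ", min(timeit.repeat(counter_stmt, setup, number=10)))
print("defaultdict:", min(timeit.repeat(dd_stmt, setup, number=10)))
```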
Doing some re-timings, it seems that his original numbers for defaultdict,
the exception-based version and Counter were about right. I haven't timed
the other.
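For reference, the three counting idioms being compared in this thread look roughly like this (a sketch of what the benchmark presumably ran; the exact original code isn't shown in the thread):

```python
from collections import Counter, defaultdict

def count_defaultdict(lines):
    # Missing keys are created as 0 automatically by the int factory.
    counts = defaultdict(int)
    for line in lines:
        counts[line] += 1
    return counts

def count_exception(lines):
    # Plain dict; the first occurrence of a key raises KeyError.
    counts = {}
    for line in lines:
        try:
            counts[line] += 1
        except KeyError:
            counts[line] = 1
    return counts

def count_counter(lines):
    # Counter's constructor does the whole loop internally.
    return Counter(lines)

lines = ["spam", "eggs", "spam"]
assert count_defaultdict(lines) == count_exception(lines) == count_counter(lines)
```

All three produce the same mapping; they differ only in how the "key not seen yet" case is handled, which is where the timing differences come from.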
>> Why is count() [i.e. collections.Counter] so slow?
> Feel free to contribute a patch which fixes this "wart". Note that Counter
> shouldn't be slowed down on mostly-unique data.
I find it hard to agree that Counter should be optimised for the
unique-data case, as surely it's much more often used when there's a point
to counting, i.e. when the data contains duplicates.
Also, couldn't Counter just extend defaultdict?
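(Counter is in fact a plain dict subclass whose `__missing__` returns 0. A minimal counter built on defaultdict, along the lines suggested above, could look like this; this is a sketch for discussion, not the stdlib implementation:)

```python
from collections import defaultdict

class DefaultDictCounter(defaultdict):
    """A Counter-like class built on defaultdict(int) -- a hypothetical
    sketch, not how collections.Counter is actually implemented."""

    def __init__(self, iterable=()):
        super().__init__(int)          # missing keys default to 0
        for item in iterable:
            self[item] += 1

c = DefaultDictCounter("abracadabra")
assert c["a"] == 5
assert c["z"] == 0   # missing keys read as 0, like Counter
```

One behavioural difference to note: reading a missing key on a defaultdict inserts it into the mapping, whereas Counter's `__missing__` returns 0 without storing anything.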