collections.Counter surprisingly slow

Joshua Landau joshua at landau.ws
Mon Jul 29 13:49:53 CEST 2013


On 29 July 2013 07:25, Serhiy Storchaka <storchaka at gmail.com> wrote:

> 28.07.13 22:59, Roy Smith написав(ла):
>
>    The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
>> unique strings).
>>
>
> Repeat you tests with totally unique lines.


Counter is about ½ the speed of defaultdict in that case (as opposed to ⅓).


>  The full profiler dump is at the end of this message, but the gist of
>> it is:
>>
>
> Profiler affects execution time. In particular it slowdown Counter
> implementation which uses more function calls. For real world measurement
> use different approach.


Doing some re-times, it seems that his originals for defaultdict, exception
and Counter were about right. I haven't timed the other.


>  Why is count() [i.e. collections.Counter] so slow?
>>
>
> Feel free to contribute a patch which fixes this "wart". Note that Counter
> shouldn't be slowdowned on mostly unique data.


I find it hard to agree that counter should be optimised for the
unique-data case, as surely it's much more oft used when there's a point to
counting?

Also, couldn't Counter just extend from defaultdict?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20130729/5ee9d686/attachment.html>


More information about the Python-list mailing list