collections.Counter surprisingly slow
stefan_ml at behnel.de
Tue Jul 30 08:39:00 CEST 2013
Serhiy Storchaka, 29.07.2013 21:37:
> 29.07.13 20:19, Ian Kelly wrote:
>> On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote:
>>> Also, couldn't Counter just extend from defaultdict?
>> It could, but I expect the C helper function in 3.4 will be faster
>> since it doesn't even need to call __missing__ in the first place.
> I'm surprised, but the Counter constructor with commented out import of
> this accelerator is faster (at least for some data).
Read my post. The accelerator doesn't take the fast path for dicts because
Counter is only a subtype of dict, not exactly a dict. That means that it
raises and catches a KeyError exception for each new value that it finds,
which is apparently more costly than the overhead of calling get().
So, my expectation is that it's faster for highly repetitive data and
slower for mostly unique data.
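
To make the trade-off concrete, here is a rough pure-Python sketch of the
two lookup strategies. It is not the C accelerator itself; the function
names and the test data are made up for illustration, and the absolute
timings will differ from what the C code shows.

```python
import timeit

def count_with_get(iterable):
    # Strategy 1: one get() call per element, no exceptions involved.
    counts = {}
    get = counts.get
    for item in iterable:
        counts[item] = get(item, 0) + 1
    return counts

def count_with_keyerror(iterable):
    # Strategy 2: optimistic lookup that raises and catches a KeyError
    # the first time each distinct value is seen.
    counts = {}
    for item in iterable:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts

# Hypothetical test data: few distinct values vs. all distinct values.
repetitive = [i % 10 for i in range(20000)]
unique = list(range(20000))

for label, data in (("repetitive", repetitive), ("unique", unique)):
    for func in (count_with_get, count_with_keyerror):
        seconds = timeit.timeit(lambda: func(data), number=3)
        print("%-10s %-20s %.4f s" % (label, func.__name__, seconds))
```

If the reasoning above holds, the KeyError variant should do comparatively
better on the repetitive input than on the unique one.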
Maybe a "fast_dict_lookup" option for the accelerator that forces the fast
path would fix this. The Counter class, just like many (most?) other
subtypes of dict, definitely doesn't need the fallback behaviour.
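
Roughly what I have in mind, sketched in Python rather than C. Note that
"fast_dict_lookup" is only the proposed option name and count_elements()
is a stand-in for the accelerator; neither exists in the stdlib with this
signature.

```python
from collections import Counter

def count_elements(mapping, iterable, fast_dict_lookup=False):
    # Hypothetical stand-in for the accelerator, not the real C helper.
    if fast_dict_lookup or type(mapping) is dict:
        # Fast path: call dict's own methods directly, bypassing any
        # overrides (__getitem__, __missing__, ...) in a subtype.
        dict_get = dict.get
        dict_set = dict.__setitem__
        for item in iterable:
            dict_set(mapping, item, dict_get(mapping, item, 0) + 1)
    else:
        # Generic fallback: honours subtype overrides, and pays for an
        # exception whenever the lookup of a new value fails.
        for item in iterable:
            try:
                mapping[item] += 1
            except KeyError:
                mapping[item] = 1

# Counter doesn't need the fallback behaviour, so it could opt in:
c = Counter()
count_elements(c, "abracadabra", fast_dict_lookup=True)
print(c)
```

The caller opts in explicitly, so subtypes that really do override item
access keep the safe generic path by default.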