[New-bugs-announce] [issue6370] Bad performance of collections.Counter at initialisation from an iterable

Mon Jun 29 16:56:28 CEST 2009

New submission from SilentGhost <michael.mischurow+bpo at gmail.com>:

I'm comparing initialisation of Counter from an iterable with the
following function:

def unique(seq):
	"""Dict of unique values (keys) & their counts in original sequence"""

	out_dict = dict.fromkeys(set(seq), 0)
	for i in seq:
		out_dict[i] += 1
	return out_dict

iterable = list(range(43)) + list(range(43, 0, -1))

The values obtained with timeit show that Counter takes four (4) times
longer to finish. Comparing my function with lines 429-430 of
collections.py, the only apparent difference is that my function
preallocates the final dictionary. When line 430 of collections.py is
replaced with:

self[elem] = self.get(elem, 0) + 1

I was able to get about a 25% improvement in run time (I assume because
__missing__ is bypassed). I hope that it's possible to improve the
implementation even further.
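
For anyone wanting to reproduce the comparison, here is a minimal timeit
sketch. The names count_with_get and benchmark are made up for this
example and are not part of collections.py; count_with_get only imitates
the proposed get()-based loop on a plain dict. Absolute numbers and
ratios will depend on the interpreter version and machine, so the
figures quoted above may not be reproduced exactly.

```python
import timeit
from collections import Counter


def unique(seq):
    """Dict of unique values (keys) and their counts in the original sequence."""
    out_dict = dict.fromkeys(set(seq), 0)
    for i in seq:
        out_dict[i] += 1
    return out_dict


def count_with_get(seq):
    """Hypothetical stand-in for the proposed loop: dict.get() avoids __missing__."""
    counts = {}
    for elem in seq:
        counts[elem] = counts.get(elem, 0) + 1
    return counts


iterable = list(range(43)) + list(range(43, 0, -1))


def benchmark(number=10000):
    """Return total seconds per candidate for `number` calls on `iterable`."""
    candidates = {
        "Counter": Counter,
        "unique": unique,
        "count_with_get": count_with_get,
    }
    return {
        name: timeit.timeit(lambda f=func: f(iterable), number=number)
        for name, func in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("%-15s %.4f s" % (name, seconds))
```

All three candidates should produce the same counts for the test
iterable; only the time taken to build them is being compared.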

----------
components: Library (Lib)
messages: 89846
nosy: SilentGhost
severity: normal
status: open
title: Bad performance of collections.Counter at initialisation from an iterable
type: performance
versions: Python 3.1

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6370>
_______________________________________