[Python-Dev] counterintuitive behavior (bug?) in Counter with +=
Lars Buitinck
L.J.Buitinck at uva.nl
Mon Oct 3 12:12:47 CEST 2011
Hello,
[First off, I'm not a member of this list, so please Cc: me in a reply!]
I've found some counterintuitive behavior in collections.Counter while
hacking on the scikit-learn project [1]. I wanted to use a bunch of
Counters to do some simple term counting in a set of documents,
roughly as follows:
count_total = Counter()
for doc in documents:
    count_current = Counter(analyze(doc))
    count_total += count_current
    count_per_doc.append(count_current)
Because we target Python 2.5+, I implemented a lightweight replacement
with just the functionality we need, including __iadd__, but then my
co-developer ran the above code on Python 2.7 and performance was
horrible. After some digging, I found out that Counter [2] does not
have __iadd__, so += falls back to __add__, which copies the entire
left-hand side!
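To illustrate the fallback described above, here is a minimal sketch. AddOnlyCounter is a hypothetical stand-in (a plain dict subclass, not the real Counter) that defines __add__ but not __iadd__, so it behaves the same way regardless of what a given Python version's Counter does:

```python
class AddOnlyCounter(dict):
    """Hypothetical counter-like type: has __add__, deliberately no __iadd__."""

    def __add__(self, other):
        # Build a brand-new object; every entry of the left operand is copied.
        result = AddOnlyCounter(self)
        for key, count in other.items():
            result[key] = result.get(key, 0) + count
        return result


total = AddOnlyCounter(a=1)
alias = total

# With no __iadd__ available, this statement is executed as
# total = total.__add__(...), which rebinds the name to a new object.
total += AddOnlyCounter(a=2, b=1)

print(total is alias)  # False: 'total' now names the freshly built copy
print(alias)           # the original object is left unchanged
```

Inside a loop, that copy is made on every iteration, so the cost of each += grows with the size of the accumulated total rather than with the size of the right-hand side.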
I also figured out that I should use the update method instead, which
I will, but I still find that uglier than +=. I would submit a patch
to implement __iadd__, but I first want to know if that's considered
the right behavior, since it changes the semantics of +=:
>>> from collections import Counter
>>> a = Counter([1,2,3])
>>> b = a
>>> a += Counter([3,4,5])
>>> a is b
False
would become
# snip
>>> a is b
True
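A minimal sketch of the proposed semantics, assuming __iadd__ simply delegates to update. InPlaceCounter is a hypothetical subclass written for illustration only, not the standard library's implementation:

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical subclass whose += mutates the left-hand side."""

    def __iadd__(self, other):
        # Counter.update adds counts to the existing ones instead of
        # replacing them, so no copy of self is made.
        self.update(other)
        return self


a = InPlaceCounter([1, 2, 3])
b = a
a += Counter([3, 4, 5])

print(a is b)  # True: the object was modified in place
print(a[3])    # 2: the count for 3 was accumulated
```

Calling count_total.update(count_current) directly on an unmodified Counter gives the same in-place accumulation. One difference to keep in mind: + drops entries whose count is zero or negative, whereas update keeps them, so this sketch does not match + exactly when such counts occur.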
TIA,
Lars
[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
[2] http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l399
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam