A little advice please? (Convert my boss to Python)
Alex Martelli
aleax at aleax.it
Tue Apr 16 03:00:02 EDT 2002
Christophe Delord wrote:
...
>> > U = P = 0
>> > for value in dict.values():
>> >     if value == 1:
>> >         U += 1
>> >     elif value == 2:
>> >         P += 1
>>
>> Again I think it might be faster to avoid the if/else with yet
>> another dictionary:
>>
>> counts = {}
>> for count in adict.values():
>>     counts[count] = 1 + counts.get(count, 0)
>>
>> U = counts.get(1, 0)
>> P = counts.get(2, 0)
>>
>> > return U*S/(U*S+P*(1-S))
>>
>
> As indices are integers (1 and 2), a simple array may be faster :
I don't see any indication that a count may not be whatever
positive integer it pleases. The OP's code singles out 1
and 2, but why shouldn't a certain tuple turn up 3497 times?
So, I think in your code below:
> counts = [None, 0, 0] # counts[0] not used
> for count in adict.values():
>     counts[count] += 1
>
> U = counts[1]
> P = counts[2]
you need to guard the increment with either an "if count < 3" or
(probably better if large counts are unusual) a try/except
IndexError, since otherwise any count of 3 or more raises
IndexError. Once you've done that you might see some
performance increase, yes. I tend to think of dicts for
any typical "sparse array" need in Python, but that's because
in most cases where I'm histogramming I can't ignore counts
greater than some threshold -- in this special case where
ignoring is what I want, a list and some care may indeed
be faster (you gotta benchmark and profile it of course).
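
For concreteness, here is a rough sketch of the two guards next to the
dict version, so they can be timed against each other. It assumes adict
maps keys to positive integer counts (a count of 0 would hit the unused
None slot and fail). The function names and the sample data are made up
for illustration and are not from the OP's code.

```python
def count_with_if(adict):
    """List-based count of 1s and 2s, skipping larger counts with a test."""
    counts = [None, 0, 0]  # counts[0] not used
    for count in adict.values():
        if count < 3:  # assumes count >= 1, per the thread
            counts[count] += 1
    return counts[1], counts[2]


def count_with_try(adict):
    """Same idea, but let the list raise on the (rare) large counts."""
    counts = [None, 0, 0]  # counts[0] not used
    for count in adict.values():
        try:
            counts[count] += 1
        except IndexError:
            pass  # count >= 3: deliberately ignored
    return counts[1], counts[2]


def count_with_dict(adict):
    """Dict histogram: keeps every count, then picks out 1 and 2."""
    counts = {}
    for count in adict.values():
        counts[count] = 1 + counts.get(count, 0)
    return counts.get(1, 0), counts.get(2, 0)


# Hypothetical sample: three keys seen once, two seen twice, one seen 3497 times.
sample = {"a": 1, "b": 2, "c": 1, "d": 3497, "e": 2, "f": 1}
```

All three return the same (U, P) pair for the same input, so whichever
one wins on realistic data can be dropped in without changing the result.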
Alex