A little advice please? (Convert my boss to Python)

Tue Apr 16 03:00:02 EDT 2002

Christophe Delord wrote:
        ...
>> >     U = P = 0
>> >     for value in dict.values():
>> >         if value == 1:
>> >             U += 1
>> >         elif value == 2:
>> >             P += 1
>> 
>> Again I think it might be faster to avoid the if/else with yet
>> another dictionary:
>> 
>> counts = {}
>> for count in adict.values():
>>     counts[count] = 1 + counts.get(count, 0)
>> 
>> U = counts.get(1, 0)
>> P = counts.get(2, 0)
>> 
>> >     return U*S/(U*S+P*(1-S))
>> 
> 
> As indices are integers (1 and 2), a simple array may be faster:

I don't see any indication that a count may not be whatever
positive integer it pleases.  The OP's code singles out 1
and 2, but why shouldn't a certain tuple turn up 3497 times?
So, I think in your code below:

> counts = [None, 0, 0]         # counts[0] not used
> for count in adict.values():
>     counts[count] += 1
> 
> U = counts[1]
> P = counts[2]

you need to guard the increment with either an if count<3 or
(probably better if large counts are unusual) a try/except
IndexError.  Once you've done that you might see some
performance increase, yes.  I tend to think of dicts for
any typical "sparse array" need in Python, but that's because
in most cases where I'm histogramming I can't ignore counts
greater than some threshold -- in this special case, where
ignoring them is exactly what I want, a list and some care may
indeed be faster (you gotta benchmark and profile it, of course).
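To make the comparison concrete, here is a sketch of the dict approach and both guarded-list variants side by side, with a small timeit harness (modern Python 3 syntax). The function names and the random test data are hypothetical. Like the thread, it assumes every count is a positive integer. A count of 0 would land on the unused None slot and raise TypeError, and a negative count would silently index from the end of the list.

```python
import random
import timeit


def count_with_dict(adict):
    """Histogram every count with a dict, then pick out the 1s and 2s."""
    counts = {}
    for count in adict.values():
        counts[count] = 1 + counts.get(count, 0)
    return counts.get(1, 0), counts.get(2, 0)


def count_with_list_if(adict):
    """List histogram; an explicit test skips counts of 3 or more."""
    counts = [None, 0, 0]  # counts[0] not used; counts assumed >= 1
    for count in adict.values():
        if count < 3:
            counts[count] += 1
    return counts[1], counts[2]


def count_with_list_try(adict):
    """List histogram; IndexError skips counts of 3 or more.

    Entering a try block is cheap and raising is comparatively expensive,
    so this variant tends to pay off only when large counts are rare.
    """
    counts = [None, 0, 0]  # counts[0] not used; counts assumed >= 1
    for count in adict.values():
        try:
            counts[count] += 1
        except IndexError:
            pass  # count >= 3: deliberately ignored
    return counts[1], counts[2]


if __name__ == "__main__":
    # Hypothetical test data: mostly 1s and 2s, plus some large counts.
    data = {i: random.choice([1, 1, 1, 2, 2, 3497]) for i in range(10000)}
    for fn in (count_with_dict, count_with_list_if, count_with_list_try):
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Each function returns the pair (U, P) that feeds the original formula. Which variant wins depends on the mix of counts and on the Python version, so run the timing loop on realistic input rather than trusting any single result.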

Alex