determining which value is the first to appear five times in a list?

Sat Feb 6 13:56:25 EST 2010

On 2/6/2010 1:24 PM, Chris Colbert wrote:
> I'm working on a naive K-nearest-neighbors selection criterion for an
> optical character recognition problem.
>
> After I build my training set, I test each new image against the
> trained feature vectors and record the scores as follows:
>
> match_vals = [(match_val_1, identifier_a), (match_val_2, identifier_b)
> .... ] and so on..
>
> Then I sort the list so the smallest match_vals appear first
> (indicating a strong match), so I may end up with something like this:
>
> [(match_val_291, identifier_b), (match_val_23, identifier_b),
> (match_val_22, identifier_k) .... ]
>
> Now, what I would like to do is step through this list and find the
> first identifier to appear K times.
>
> Naively, I could make a dict and iterate through the list AND the dict
> at the same time and keep a tally, breaking when the criterion is met.
>
> such as:
>
> from collections import defaultdict
>
> def getnn(match_vals):
>      tallies = defaultdict(lambda: 0)
>      for match_val, ident in match_vals:
>          tallies[ident] += 1
>          for ident, tally in tallies.iteritems():
>              if tally == 5:
>                  return ident
>
> I would think there is a better way to do this. Any ideas?

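You only need to check whether the incremented tally is 5, which is to say,
whether the about-to-be-incremented tally is 4. Only the tally for the
current ident can change on a given step, so the inner loop over the dict
is unnecessary. The body of the outer loop becomes:

         t = tallies[ident]
         if t < 4: tallies[ident] = t+1
         else: return ident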
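
Put together, the whole function might look roughly like the sketch below. It is written for Python 3 and makes two assumptions that are not in the original post: the threshold is passed in as a parameter k (defaulting to 5), and None is returned when no identifier reaches k occurrences. The sample data is made up for illustration.

```python
from collections import defaultdict


def getnn(match_vals, k=5):
    """Return the first identifier to appear k times, or None if none does.

    match_vals is expected to be sorted already, best match first.
    The k parameter and the None fallback are assumptions of this sketch.
    """
    tallies = defaultdict(int)
    for _match_val, ident in match_vals:
        tallies[ident] += 1
        # Only the tally just incremented can have reached k on this step.
        if tallies[ident] == k:
            return ident
    return None


# Hypothetical scores, sorted so the smallest (strongest) match comes first.
sample = [(0.1, "b"), (0.2, "k"), (0.3, "b"), (0.4, "k"), (0.5, "b")]
result = getnn(sample, k=3)  # "b" is the first identifier seen three times
```

This does one dict lookup per list element instead of scanning every tally on every step, and it stops as soon as the answer is known.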