finding most common elements between thousands of multiple arrays.

Raymond Hettinger python at rcn.com
Wed Jul 8 03:11:50 EDT 2009


[Scott David Daniels]
> import heapq
>
> def most_frequent(arr, N):
>      '''Return the top N (freq, val) elements in arr'''
>      if N <= 0:
>          return [] # nothing to select; also avoids heap[0] on an empty heap
>      # frequency() is not shown here; assumed to yield (freq, val) pairs
>      counted = frequency(arr) # get an iterator for freq-val pairs
>      heap = []
>      # First, just fill up the heap list with the first N distinct pairs
>      for i in range(N):
>          try:
>              heap.append(next(counted))
>          except StopIteration:
>              break # If we run out here, no need for a heap
>      else:
>          # more to go, switch to a min-heap, and replace the least
>          # element every time we find something better
>          heapq.heapify(heap)
>          for pair in counted:
>              if pair > heap[0]:
>                  heapq.heapreplace(heap, pair)
>      return sorted(heap, reverse=True) # put most frequent first.
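For a current Python, here is a self-contained sketch of the same bounded min-heap technique. The quoted code never shows frequency(), so the version below uses a hypothetical stand-in that counts with a plain dict; itertools.islice replaces the try/for/else seeding loop. Ties on frequency fall back to comparing the values themselves, so the values must be mutually orderable.

```python
import heapq
from itertools import islice


def frequency(arr):
    """Hypothetical stand-in for the helper the quoted code assumes:
    return an iterator of (count, value) pairs, one per distinct value."""
    counts = {}
    for value in arr:
        counts[value] = counts.get(value, 0) + 1
    return ((count, value) for value, count in counts.items())


def most_frequent(arr, n):
    """Return the top n (freq, val) pairs in arr, most frequent first."""
    if n <= 0:
        return []
    counted = frequency(arr)
    # Seed the heap with up to n pairs; fewer if arr has < n distinct values.
    heap = list(islice(counted, n))
    # Turn the list into a min-heap: heap[0] is the weakest pair kept so far.
    heapq.heapify(heap)
    # Any remaining pair that beats the weakest one replaces it.
    for pair in counted:
        if pair > heap[0]:
            heapq.heapreplace(heap, pair)
    return sorted(heap, reverse=True)


print(most_frequent("abracadabra", 2))  # → [(5, 'a'), (2, 'r')]
```

On top of the counting dict, the heap never holds more than n pairs, and each of the d distinct values costs at most O(log n), so the selection step is O(d log n).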

In Py2.4 and later, see heapq.nlargest().
In Py3.1 and later, see collections.Counter(data).most_common(n).
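Both suggestions can be sketched as follows, assuming Python 3 and a few small hypothetical sample arrays. itertools.chain.from_iterable pools the arrays so a single Counter tallies every element; heapq.nlargest then works on (count, value) pairs like the quoted code, while most_common returns (value, count) pairs.

```python
import heapq
from collections import Counter
from itertools import chain

# Hypothetical sample data: several arrays whose elements are pooled together.
arrays = [
    ["a", "b", "a"],
    ["b", "c", "a"],
    ["a", "d"],
]

# Count every element across all the arrays in one pass.
counts = Counter(chain.from_iterable(arrays))

# heapq.nlargest over (count, value) pairs, same shape as most_frequent().
top_pairs = heapq.nlargest(2, ((c, v) for v, c in counts.items()))

# Counter.most_common returns (value, count) pairs, most common first.
top_common = counts.most_common(2)

print(top_pairs)   # → [(4, 'a'), (2, 'b')]
print(top_common)  # → [('a', 4), ('b', 2)]
```

The two differ only for items tied on count: nlargest on tuples breaks ties by comparing the values, while in current Python versions most_common keeps tied elements in the order they were first encountered.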


Raymond


