finding most common elements between thousands of multiple arrays.
Raymond Hettinger
python at rcn.com
Wed Jul 8 03:11:50 EDT 2009
[Scott David Daniels]
> def most_frequent(arr, N):
>     '''Return the top N (freq, val) elements in arr'''
>     counted = frequency(arr) # get an iterator for freq-val pairs
>     heap = []
>     # First, just fill up the array with the first N distinct
>     for i in range(N):
>         try:
>             heap.append(counted.next())
>         except StopIteration:
>             break # If we run out here, no need for a heap
>     else:
>         # more to go, switch to a min-heap, and replace the least
>         # element every time we find something better
>         heapq.heapify(heap)
>         for pair in counted:
>             if pair > heap[0]:
>                 heapq.heapreplace(heap, pair)
>     return sorted(heap, reverse=True) # put most frequent first.
In Py2.4 and later, see heapq.nlargest().
In Py3.1 and later, see collections.Counter(data).most_common(n).
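
A minimal sketch of both suggestions, assuming the data is a flat iterable of
hashable values. The function names and the sample data are illustrative only
and do not come from the thread.

```python
import heapq
from collections import Counter

def top_n_with_heapq(data, n):
    """Return the n largest (freq, val) pairs using heapq.nlargest()."""
    counts = {}
    for val in data:
        counts[val] = counts.get(val, 0) + 1
    # Put the frequency first so pairs compare by count.
    pairs = ((freq, val) for val, freq in counts.items())
    return heapq.nlargest(n, pairs)

def top_n_with_counter(data, n):
    """Return the n most common (val, freq) pairs using Counter."""
    return Counter(data).most_common(n)

# Hypothetical sample data with no tied counts.
data = ['a', 'b', 'a', 'c', 'b', 'a']
print(top_n_with_heapq(data, 2))
print(top_n_with_counter(data, 2))
```

Note the differing pair order: the heapq version yields (freq, val), matching
the quoted function, while most_common() yields (val, freq).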
Raymond