finding most common elements between thousands of multiple arrays.

Sat Jul 4 03:33:52 EDT 2009

Currently I need to find the most common elements in thousands of
arrays within one large array (arround 2 million instances with ~70k
unique elements)

so I set up a dictionary to handle the counting so when I am
iterating  I up the count on the corrosponding dictionary element. I
then iterate through the dictionary and find the 25 most common
elements.

the elements are initially held in a array within an array. so I am am
just trying to find the most common elements between all the arrays
contained in one large array.
my current code looks something like this:
d = {}
for arr in my_array:
-----for i in arr:
#elements are numpy integers and thus are not accepted as dictionary
keys
-----------d[int(i)]=d.get(int(i),0)+1

then I filter things down. but with my algorithm that only takes about
1 sec so I dont need to show it here since that isnt the problem.

But there has to be something better. I have to do this many many
times and it seems silly to iterate through 2 million things just to
get 25. The element IDs are integers and are currently being held in
numpy arrays in a larger array. this ID is what makes up the key to
the dictionary.

 It currently takes about 5 seconds to accomplish this with my current
algorithm.

So does anyone know the best solution or algorithm? I think the trick
lies in matrix intersections but I do not know.