Match items in large list

Fisherking anders.mackey at gmail.com
Thu Feb 12 05:40:15 EST 2009


Hi,

I hope you can help me with this optimizing problem!
I have a large list of dictionaries (about 250 000 of them). Two or
more of these dictionaries contains the same data (not more than five
or six of them per item) e.g. [{'a':1,'b':'2','c':3} , {'d':
4,'e':'5','f':6},{'a':1,'b':'2','c':3} , {'d':4,'e':'5','f':6},...].
(the dictionaries are really containg phone numbers, duration time and
call time.)

Which are the best way of searching through the list and extract the
items that are the same. In the first run I want to find every item
that are the same as {'a':1,'b':'2','c':3}, the second {'d':
4,'e':'5','f':6} etc. The list are not read-only and I can pop them
out of the list when I have found them!

(How can I found items that are almost the same? e.g. {'a':
1,'b':'2','c':3.1} and {'a':1,'b':'2','c':3.2} )

In the second part of this program I need to take each of these items
(similar items are treated as one) and find these in a database that
contains 2.5M posts!

The python version I am bound to is Python 2.3

thanks for your help!



More information about the Python-list mailing list