sickcodemonkey at gmail.com
Wed Jan 24 21:05:24 EST 2007
I am trying to write a Python script that compares two files containing
names (millions of them).
More specifically, I have two files (Files1.txt and Files2.txt).
Files1.txt contains 180 thousand names and
Files2.txt contains 34 million names.
I have a script that reads these two files and stores the names in two
different lists (fileList1 and fileList2, respectively). I have imported the
difflib library and, after the lists are created, match them with difflib on
the following criteria " " (just the names that are similar between the two
lists).
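Here is a minimal sketch of the list-based approach described above. It assumes one name per line and that "similar" means `difflib.get_close_matches()` with a hypothetical cutoff of 0.8, because the exact criteria are not given in the post. The function names `load_names` and `similar_names` are illustrative only. Each `get_close_matches()` call scores one name against every entry in `list2`, so the total work grows with `len(list1) * len(list2)`.

```python
import difflib


def load_names(lines):
    # One name per line; surrounding whitespace is trimmed, blank lines skipped.
    return [line.strip() for line in lines if line.strip()]


def similar_names(list1, list2, cutoff=0.8):
    # Hypothetical cutoff: the post does not state the real matching criteria.
    # For each name in list1, difflib compares it against *every* name in list2,
    # which is why this is fine for hundreds of names but not for millions.
    matches = {}
    for name in list1:
        close = difflib.get_close_matches(name, list2, n=3, cutoff=cutoff)
        if close:
            matches[name] = close
    return matches
```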
This works fine for hundreds of names but takes forever for millions of
them, so it is not really efficient.
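As a rough check on why it slows down so badly, assume that every name in the smaller list is scored against every name in the larger one, as a plain difflib loop does. With the file sizes given above, the number of pairwise comparisons is:

```latex
(1.8 \times 10^{5}) \times (3.4 \times 10^{7}) = 6.12 \times 10^{12}
```

Even at a hypothetical (and optimistic) one million comparisons per second, that works out to about 6.12 × 10^6 seconds, or roughly 71 days.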
Does anyone have any ideas on how to make this more efficient, in terms of
both time and RAM?
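If "similar" can be relaxed to "identical after trimming whitespace and ignoring case", one way to cut both time and RAM is to hold only the smaller list (about 180 thousand names) in a set. The 34-million-name file is then streamed one line at a time, so each check is an average constant-time hash lookup rather than a scan of the whole list. This is a swapped-in technique (set membership, not difflib). The `normalize()` rule and the function names are hypothetical.

```python
def normalize(name):
    # Hypothetical normalization: trim whitespace and ignore case.
    return name.strip().lower()


def exact_matches(small_lines, large_lines):
    """Return the normalized names from small_lines that also occur in large_lines.

    Only the small input is kept in memory (as a set); the large input is
    consumed one line at a time, so an open file object can be passed directly.
    """
    wanted = {normalize(line) for line in small_lines if line.strip()}
    found = set()
    for line in large_lines:
        key = normalize(line)
        if key in wanted:  # average O(1) hash lookup
            found.add(key)
            if len(found) == len(wanted):
                break  # every name from the small input has been seen; stop early
    return found
```

Because file objects iterate line by line, the open files can be passed straight in, e.g. `exact_matches(open("Files1.txt"), open("Files2.txt"))`. Only the small file's names are held in memory, and the large file is read once.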
Any advice would be greatly appreciated. (NOTE: I have been trying to
study multithreading, but have not really grasped the concept. So I may need
More information about the Python-list