<div id="mb_0">
<div>I am trying to write a python script that will compare 2 files which contains names (millions of them). </div>
<div> </div>
<div>More specifically, I have 2 files (Files1.txt and Files2.txt). Files1.txt contains 180 thousand names and Files2.txt contains 34 million names. </div>
<div> </div>
<div>I have a script which will analyze these two files and store them into 2 different lists (fileList1 and fileList2 respectivly). I have imported the diflib library and after the lists are created, matching on the following criteria " " for diflib -> (just the names that are similar between the two files).
</div>
<div> </div>
<div>This works perfectly for hundreds of names but is taking forever for millions of them; thus not really efficient.</div>
<div> </div>
<div>Does anyone have any idea on how to get this more efficient? (speaking of Time and RAM)</div>
<div> </div>
<div>Any advice would be greatly appreciated. (NOTE: I have been trying to study multithreading, but have not really grasp the concept. So I may need some examples.)</div>
<div> </div>
<div>~~~~~~~~~~~~~~</div>
<div>S.C.M.</div></div>