[Tutor] Find (list) strings in large textfile

Alan Gauld alan.gauld at yahoo.co.uk
Thu Feb 9 20:24:02 EST 2017


On 09/02/17 19:15, Sylwester Graczyk wrote:
> Hi all,
> I'm trying to write code to search for strings (from one text file
> containing a list of strings) in a second, large text file,
> but the script doesn't search for my whole list in the entire file
> (the output file is incomplete).

The problem is that you open the data file once,
before the loop, but you never reset the file cursor.
When it reaches the end of the file after the first
iteration of the outer loop, later iterations read no
more data. You need to call file_large.seek(0) at the
start of each iteration of the outer loop.
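
To see the effect in isolation, here is a minimal sketch. It uses
io.StringIO as a stand-in for the real data file, and the sample rows
are made up for illustration; a file object returned by open() runs
out of lines in the same way once it has been read to the end.

```python
import io

# Stand-in for large_file.txt; the rows are invented sample data.
data = io.StringIO("1654    2017/02/02\n666445    2017/02/02\n")

first_pass = [line for line in data]    # reads both lines
second_pass = [line for line in data]   # cursor is at the end: reads nothing

data.seek(0)                            # rewind to the start of the file
third_pass = [line for line in data]    # reads both lines again

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```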

Even with that fix, your approach looks very inefficient.
You will read 500,000 lines 2000 times. That's
a lot of file access - about 1 billion line reads!

It is probably better to store your key file in memory,
then loop over the large data file once and check each
line against the keys. Checking 2000 keys in memory for
each data line is far cheaper than rereading the file.
That way you only read the key file and data file
once each - 502,000 reads instead of a billion.
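
A rough sketch of that approach, assuming each key sits on its own
line and the key is the first whitespace-separated field of each data
line, as in your samples. The helper name find_matches and the sample
data are made up for illustration. Note that it stores the keys in a
set, which is a substitution on my part: instead of comparing the line
against each key in turn, it does one membership test per line.

```python
def find_matches(key_lines, data_lines):
    """Yield the data lines whose first field is one of the keys.

    key_lines and data_lines can be any iterables of strings,
    for example open file objects. Hypothetical helper name.
    """
    # Read every key once; strip() drops the trailing newline.
    keys = {line.strip() for line in key_lines if line.strip()}

    for line in data_lines:              # single pass over the big file
        fields = line.split()
        if fields and fields[0] in keys:
            yield line


# Invented sample data in the shape shown in the post.
sample_keys = ["1654\n", "964563\n"]
sample_data = [
    "1654    2017/02/02\n",
    "666445    2017/02/02\n",
    "964563    2017/02/02\n",
]

matched = list(find_matches(sample_keys, sample_data))
print(matched)

# With real files you would pass the open file objects instead, e.g.
#   save_file.writelines(find_matches(file_list, file_large))
```

One difference to be aware of: the matches come out in the order of
the data file, not in the order of the key file as with the nested
loops.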

Also instead of splitting the line you could
just use

if line_large.startswith(key):

If you need to restrict the comparison to part of
the line, use the optional start and end positions:

if line_large.startswith(key, start, stop):

That should save a small amount of time compared
to splitting and indexing both lines each time.
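
A small sketch of the startswith() comparison, with one caveat worth
checking against your data: a plain prefix test also matches longer
IDs that merely begin with the key (key 1654 matches a line that
starts with 16540). The sample strings and the helper name
starts_with_key are made up for illustration, and the extra separator
check is just one possible workaround, assuming the ID is always
followed by whitespace. Keys read from a file also need their trailing
newline stripped before comparing.

```python
def starts_with_key(line, key):
    """True if line begins with key followed by whitespace or nothing.

    Hypothetical helper, not part of the original script.
    """
    if not line.startswith(key):
        return False
    rest = line[len(key):]
    # Require the key to be the whole first field, not just a prefix of it.
    return rest == "" or rest[0].isspace()


line_large = "1654" + " " * 4 + "2017/02/02\n"
longer_id = "16540" + " " * 4 + "2017/02/02\n"
key = "1654\n".strip()                        # drop the newline from the key

print(line_large.startswith(key))             # True
print(longer_id.startswith(key))              # True: prefix match only
print(starts_with_key(longer_id, key))        # False
print(starts_with_key(line_large, key))       # True

# The optional start/stop arguments limit the slice that is compared:
print(line_large.startswith("2017", 8, 12))   # True
```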


> *[list_life.txt]*
> 1654
> 964563
> ... +2000 row's
> 
> *[large_file.txt]*
> 1654    2017/02/02
> 666445    2017/02/02
> 964563    2017/02/02
> ... + 500000 rows
> 
> *[code]*
> file_list = open("list_life.txt")
> file_large = open("large_file.txt")
> save_file = open('output.txt', 'w')
> 
> for line_list in file_list:
>      splitted_line_list = line_list.split()

       file_large.seek(0)    # reset the data file cursor here

>      for line_large in file_large:
>          splitted_line_large = line_large.split()
>          if splitted_line_large[0] == splitted_line_list[0]:
>              save_file.write(line_large+"\n")
> 
> file_large.close()
> file_list.close()
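
For reference, a minimal sketch of the quoted script with the seek(0)
fix applied. Two further changes are my own suggestions: the with
statement also closes output.txt (the quoted script never closes it,
so buffered output may not be flushed), and the line is written
without an extra "\n" because it already ends in one. The function
name filter_with_rewind and the sample data are made up for
illustration; the demonstration writes to a temporary directory.

```python
import os
import tempfile


def filter_with_rewind(list_path, large_path, out_path):
    # 'with' closes all three files, including the output file.
    with open(list_path) as file_list, \
         open(large_path) as file_large, \
         open(out_path, "w") as save_file:
        for line_list in file_list:
            splitted_line_list = line_list.split()
            if not splitted_line_list:
                continue                      # skip blank key lines
            file_large.seek(0)                # rewind for every key
            for line_large in file_large:
                splitted_line_large = line_large.split()
                if (splitted_line_large
                        and splitted_line_large[0] == splitted_line_list[0]):
                    save_file.write(line_large)   # already ends in "\n"


# Demonstration with invented sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list_life.txt")
    large_path = os.path.join(tmp, "large_file.txt")
    out_path = os.path.join(tmp, "output.txt")
    with open(list_path, "w") as f:
        f.write("1654\n964563\n")
    with open(large_path, "w") as f:
        f.write("1654    2017/02/02\n"
                "666445    2017/02/02\n"
                "964563    2017/02/02\n")
    filter_with_rewind(list_path, large_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result)
```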

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



