what happens when the file being read is too big for all lines to be read with "readlines()"

Xiao Jianfeng fdu.xiaojf at gmail.com
Sun Nov 20 05:49:42 EST 2005


bonono at gmail.com wrote:

>Xiao Jianfeng wrote:
>  
>
>>  First, I must say thanks to all of you, and I'm really sorry that I
>>  didn't describe my problem clearly.
>>
>>  There are many tokens in the file. Every time I find a token, I have
>>  to get the data on the next line and do some operation with it. It
>>  would be easy to find just one token using the above method, but
>>  there is more than one.
>>
>>  My method was:
>>
>>  f_in = open('input_file', 'r')
>>  data_all = f_in.readlines()
>>  f_in.close()
>>
>>  for i in range(len(data_all)):
>>      line = data_all[i]
>>      if token in line:
>>          # do something with data_all[i + 1]
>>
>>  Since my method needs to read the whole file into memory, I think it
>>  may not be efficient when processing very big files.
>>
>>  I really appreciate all suggestions! Thanks again.
>>
>Something like this:
>
>for x in fh:
>    if has_token(x):
>        process(fh.next())
>
>You can also create an iterator with iter(fh), but I don't think that
>is necessary.
>
>This uses the iterator's "side effect" to your advantage. I was bitten
>by that side effect before, but for your particular application it
>becomes an advantage.

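A minimal, self-contained version of the suggested pattern for modern
Python 3, where the iterator method is next(fh) rather than fh.next();
the file name, token string, and process() step below are placeholders:

    def process(line):
        # placeholder: operate on the line that follows a token
        print(line.rstrip())

    token = "TOKEN"  # hypothetical token string
    with open("input_file", "r") as fh:
        for line in fh:
            if token in line:
                # consuming the next line here is the "side effect":
                # the for loop resumes after the line taken by next()
                next_line = next(fh, None)
                if next_line is not None:
                    process(next_line)
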
  Thanks, all of you!

  I have compared the two methods:
  (1) "for x in fh:"
  (2) reading the whole file into memory first.

  I tested them on two files, one of 80 MB and one of 815 MB. The first
  method gave a speedup of about 40% on the first file and about 25% on
  the second.
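
A rough sketch of how such a comparison can be timed (the path and token
are placeholders; results depend heavily on file size and the OS disk
cache, so the numbers above need not reproduce exactly):

    import time

    def scan_iter(path, token):
        # method (1): iterate over the file object line by line
        hits = 0
        with open(path, "r") as f:
            for line in f:
                if token in line:
                    next(f, None)  # consume the data line
                    hits += 1
        return hits

    def scan_readlines(path, token):
        # method (2): read the whole file into memory first
        with open(path, "r") as f:
            data_all = f.readlines()
        hits = 0
        for i in range(len(data_all) - 1):
            if token in data_all[i]:
                hits += 1  # data_all[i + 1] would be processed here
        return hits

    for func in (scan_iter, scan_readlines):
        start = time.perf_counter()
        func("input_file", "TOKEN")  # placeholder path and token
        print(func.__name__, time.perf_counter() - start)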

  Sorry for my bad English; I hope I haven't confused anyone.

  Regards,

  xiaojf


