what happens when the file being read is too big for all lines to be read with "readlines()"

Steve Holden steve at holdenweb.com
Sun Nov 20 05:46:19 CET 2005


Xiao Jianfeng wrote:
> Steven D'Aprano wrote:
> 
>>On Sun, 20 Nov 2005 11:05:53 +0800, Xiao Jianfeng wrote:
>>
>>>I have some other questions:
>>>
>>>When will "fh" be closed?
>>
>>When all references to the file are no longer in scope:
>>
>>def handle_file(name):
>>   fp = file(name, "r")
>>   # reference to file now in scope
>>   do_stuff(fp)
>>   return fp
>>
>>
>>f = handle_file("myfile.txt")
>># reference to file is now in scope
>>f = None
>># reference to file is no longer in scope
>>
>>At this point, Python *may* close the file. CPython currently closes the
>>file as soon as all references are out of scope. Jython does not -- it
>>will close the file eventually, but you can't guarantee when.
>>
>>>And what should I do if I want to explicitly close the file immediately 
>>>after reading all data I want?
>>
>>That is the best practice.
>>
>>f.close()
>>
> 
>  Let me first describe the problem I came across last night.
> 
>  I need to read a file (which may be small or very big) and check it line
>  by line to find a specific token; the data on the next line is what I
>  want.
> 
>  If I use readlines(), it will be a problem when the file is too big.
> 
>  If I use "for line in OPENED_FILE:" to read one line at a time, how can
>  I get the next line when I find the specific token?
>  And I think reading one line at a time is less efficient, am I right?
> 
Not necessarily. Try this:

     import sys

     f = file("filename.txt")
     for line in f:
         if token in line: # or whatever you need to identify it
             break
     else:
         # the loop ran to the end without hitting break
         sys.exit("File does not contain token")
     line = f.next()

Then line will be the one you want. Since this will use code written in 
C to do the processing you will probably be pleasantly surprised by its 
speed. Only if this isn't fast enough should you consider anything more 
complicated.
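
For readers on Python 3, where file() and f.next() no longer exist, the 
same idea can be sketched as below. This is only an illustration under 
stated assumptions: the helper name line_after_token and the temporary 
demo file are made up for the example, open() stands in for file(), and 
the built-in next() stands in for f.next(). The with block also closes 
the file explicitly, which covers the earlier question about when "fh" 
gets closed.

```python
import os
import tempfile


def line_after_token(path, token):
    """Return the line following the first line containing token, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    # The with block closes the file as soon as it is left, on any Python
    # implementation, so nothing depends on reference counting.
    with open(path) as f:
        for line in f:
            if token in line:
                # next() advances the same iterator the for loop uses;
                # the default None covers a token on the very last line.
                return next(f, None)
    return None


# Demonstration on a small temporary file (assumed sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nTOKEN here\nwanted data\ntrailer\n")

result = line_after_token(tmp.name, "TOKEN")
missing = line_after_token(tmp.name, "absent")
os.remove(tmp.name)
print(repr(result))
```

Only one line is held in memory at a time, so the size of the file does 
not matter, and the function returns None rather than raising when the 
token is absent or sits on the last line.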

Premature optimization can waste huge amounts of programming time. 
Don't do it. First get a solution that works, then measure it!
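
One possible way to do that measuring, sketched in Python 3 with the 
standard timeit module. The two function names, the generated test file 
and the repeat count are all assumptions chosen for the example; the 
numbers you get will depend on your machine and your data, so no 
expected timings are shown.

```python
import os
import tempfile
import timeit


def find_with_readlines(path, token):
    # Loads every line into a list first; memory use grows with file size.
    with open(path) as f:
        lines = f.readlines()
    for i in range(len(lines) - 1):
        if token in lines[i]:
            return lines[i + 1]
    return None


def find_with_iteration(path, token):
    # Reads one line at a time and stops as soon as the token is found.
    with open(path) as f:
        for line in f:
            if token in line:
                return next(f, None)
    return None


# Build a throwaway test file with the token near the end (assumed data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(20000):
        tmp.write("filler line %d\n" % i)
    tmp.write("TOKEN\nwanted data\n")

candidates = (find_with_readlines, find_with_iteration)
results = [func(tmp.name, "TOKEN") for func in candidates]

timings = {}
for func in candidates:
    # Time five calls of each candidate on the same file.
    timings[func.__name__] = timeit.timeit(
        lambda: func(tmp.name, "TOKEN"), number=5
    )
    print(func.__name__, "%.4f s" % timings[func.__name__])

os.remove(tmp.name)
```

Checking that both versions return the same line before comparing their 
timings keeps the measurement honest: a fast answer is only worth 
having if it is also the right one.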

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/



