[Tutor] open/closing files and system limits
Michael P. Reilly
arcege@shore.net
Wed, 13 Sep 2000 17:39:54 -0400 (EDT)
>
> Hello all:
>
> I've got a small python script that rapidly opens a file, reads the lines
> and closes the file. This procedure is in a for loop.
>
> for file in catalogoffilestoprocess.readlines():
>     currentfile = open(file[:-1])  # Only way I could figure to strip newline from filename
>     for line in currentfile.readlines():
>         if blah
>             blah
>         elif blah
>             blah
>     currentfile.close()
>
>
> Well that works alright except for this. My list of files contains about
> 1100 files to process. It takes an extraordinary amount of time to
> run. Watching it work, I can see that it rushes through a couple hundred,
> stops for several (i.e. 1-4) minutes, then continues.
>
> I believe it has something to do with the default file limits set in my
> kernel (Linux). I was thinking that the system wasn't keeping track of
> the fact that the files were closed? Somehow hitting that system ceiling?
>
> Also running "time ./script catalogoffiles"
> returns (look at the time elapsed! also just now noticed the pagefault
> and swap info, but am not familiar with time's output or what this
> indicates).
>
> 463.40user 8.89system 8:05.39elapsed 97%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (11017major+100694minor)pagefaults 1556swaps
>
I think this is in the FAQ somewhere as a performance problem. There
are a few solutions, but my "favorite" is:
    while True:
        lines = catalog.readlines(8192)  # get a "block" of lines
        if not lines:                    # empty list: end of the catalog
            break
        for line in lines:
            file = open(line[:-1])       # strip the trailing newline
            ...
This reads roughly one block of data (the 8192 is a size hint in bytes,
not an exact count), breaks it into lines, and returns those lines. If
the block ends in the middle of a line, that line is read through to its
end and returned whole, so the next call starts on a fresh line. You
then iterate through that set of lines, and repeat until the outer loop
gets no more lines back.
You can think of the breakdown as:
    Block 0   line 0
              line 1
              line 2
              line 3
              line 4
              line 5
    Block 1   line 6
              line 7
              long line 8
              line 9 (runs past the end of the block, but is still
                      returned whole by the second call)
    Block 2   line 10
              line 11
              line 12
              line 13
    Block 3   line 14
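If it helps to see the block-at-a-time idea run on its own, here is a
minimal sketch. The helper name iter_line_blocks, the io.StringIO
stand-in for the catalog file, and the tiny size hint of 40 are all
assumptions made for illustration; none of them come from the original
script, and with a real catalog you would pass the open file and a
larger hint such as 8192.

```python
import io


def iter_line_blocks(fileobj, sizehint=8192):
    """Yield lists of whole lines, roughly sizehint bytes per list.

    Hypothetical helper: it only wraps the readlines(sizehint) loop.
    """
    while True:
        lines = fileobj.readlines(sizehint)
        if not lines:  # an empty list means end of file
            break
        yield lines


# Stand-in for the catalog: 14 made-up file names, one per line.
catalog = io.StringIO("".join("file%02d.txt\n" % i for i in range(14)))

# A deliberately small hint so the catalog is split into several blocks.
blocks = list(iter_line_blocks(catalog, sizehint=40))

# Every line arrives whole, so stripping the newline gives the names back.
names = [line.rstrip("\n") for lines in blocks for line in lines]
print(len(blocks), "blocks,", len(names), "names")
```

Each list that comes back holds only complete lines, so the inner loop
can open each name without worrying about a line being cut in half.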
I hope this helps.
-Arcege
--
------------------------------------------------------------------------
| Michael P. Reilly, Release Manager | Email: arcege@shore.net |
| Salem, Mass. USA 01970 | |
------------------------------------------------------------------------