[Tutor] file question

Clay Shirky clay@shirky.com
Tue Aug 5 11:30:12 EDT 2003


> 
> Hi everyone,
> 
> Is there a way to pull a specific line from a file without reading the
> whole thing into memory with .readlines()?

One simple way to do that is to loop through the file, skip every line you
don't want, grab the one you do want, and then break out of the loop.

---

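import sys

test_file = "spam_and_eggs.txt"

i = 0
print_line = 100000   # zero-based: 100000 is the 100,001st line

f = open(test_file)

for line in f:   # iterating over a file object needs Python 2.2 or higher
    if i == print_line:
        sys.stdout.write(line)   # works in Python 2 and 3; line keeps its "\n"
        break
    i += 1

f.close()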

---

spam_and_eggs.txt is an 8 meg file of ~150K lines. Compared with
.readlines(), the for line in f/break method runs markedly faster for lines
near the top of the file (in my setup, better than 3 times faster when
print_line = 10,000), while for lines near the end of the file the speed is
about the same (though the for line in f method still doesn't read the whole
file into memory).
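
The post doesn't show the .readlines() side of the comparison. A rough way to
time both approaches yourself is sketched here; the generated 150,000-line file
and the function names line_by_loop and line_by_readlines are hypothetical
stand-ins for spam_and_eggs.txt, and the timings you get will depend on your
machine and Python version.

```python
import os
import tempfile
import timeit


def line_by_loop(path, n):
    # Stop reading as soon as zero-based line n is reached.
    with open(path) as f:
        for i, line in enumerate(f):
            if i == n:
                return line
    return None


def line_by_readlines(path, n):
    # Reads every line into a list before indexing into it.
    with open(path) as f:
        lines = f.readlines()
    return lines[n] if n < len(lines) else None


# Hypothetical test file: 150,000 short numbered lines.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for k in range(150000):
        out.write("line %d\n" % k)

# Compare a line near the top with the very last line.
for n in (10000, 149999):
    t_loop = timeit.timeit(lambda: line_by_loop(path, n), number=5)
    t_read = timeit.timeit(lambda: line_by_readlines(path, n), number=5)
    print("line %d: loop %.3fs, readlines %.3fs" % (n, t_loop, t_read))

os.remove(path)   # clean up the generated file
```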

If you want more than one line, put the line numbers you want in a list (or a
set, for fast membership tests) and break after the last one.
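
A minimal sketch of the multi-line version, assuming zero-based line numbers as
in the loop above; get_lines is a hypothetical helper name, not part of any
library. It returns a dict so you can tell which requested lines were actually
found.

```python
def get_lines(path, wanted):
    """Return {line_number: line} for the zero-based line numbers in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    with open(path) as f:
        for i, line in enumerate(f):
            if i in wanted:
                found[i] = line
            if i == last:
                break   # no need to read past the last requested line
    return found


# Hypothetical usage:
# get_lines("spam_and_eggs.txt", [10, 500, 10000])
```

Requested numbers past the end of the file are simply missing from the result.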

If you are likely to want the same lines again later, after an f.close(),
you may want to consider memoizing the lines you retrieve: save them in a
dict with the line number as the key, and check that dict before going
through the file again.
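
One way the memoizing idea might look, as a sketch: cached_line is a
hypothetical name, and the caller passes in one dict per file, keyed by
zero-based line number as described above. The cache is never invalidated, so
this assumes the file doesn't change between calls.

```python
def cached_line(path, n, cache):
    """Return zero-based line n of path, consulting `cache` (a dict) first.

    `cache` maps line number -> line text; use one dict per file.
    Returns None if the file has fewer than n + 1 lines.
    """
    if n in cache:
        return cache[n]          # hit: no file access at all
    with open(path) as f:
        for i, line in enumerate(f):
            if i == n:
                cache[n] = line  # remember it for next time
                return line
    return None                  # ran off the end; nothing cached


# Hypothetical usage:
# spam_cache = {}
# cached_line("spam_and_eggs.txt", 10000, spam_cache)  # reads the file
# cached_line("spam_and_eggs.txt", 10000, spam_cache)  # served from the dict
```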

And of course, there may well be ways of manipulating the file object more
directly. The above is just a simple way to avoid the memory problem.
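
As one example of working with the file object more directly (this technique
is not from the original post), you could scan the file once, record the byte
offset where each line starts, and afterwards seek() straight to any line. The
index holds one integer per line rather than the text of every line. The
helper names are hypothetical, and decoding assumes the file is UTF-8.

```python
def build_line_index(path):
    """Return a list where entry n is the byte offset at which line n starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:   # binary mode, so lengths are exact byte counts
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def get_line_by_offset(path, offsets, n, encoding="utf-8"):
    """Jump directly to zero-based line n using a precomputed offset index."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode(encoding)


# Hypothetical usage:
# index = build_line_index("spam_and_eggs.txt")   # one full pass
# get_line_by_offset("spam_and_eggs.txt", index, 100000)
```

Building the index still costs one pass over the file, so this only pays off
when you look up many lines from the same unchanged file.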

-clay

More information about the Tutor mailing list