what happens when the file being read is too big for all lines to be read with "readlines()"

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sun Nov 20 00:10:58 EST 2005


On Sun, 20 Nov 2005 12:28:07 +0800, Xiao Jianfeng wrote:

>  Let me introduce my problem I came across last night first.
> 
>  I need to read a file (which may be small or very big) and to check
>  line by line to find a specific token, then the data on the next line
>  will be what I want.
> 
>  If I use readlines(), it will be a problem when the file is too big.
> 
>  If I use "for line in OPENED_FILE:" to read one line each time, how can
>  I get the next line when I find the specific token?

Here is one solution using a flag:

done = False
next_line = None
for line in open("myfile", "r"):
    if done:
        next_line = line  # the line *after* the one with the token
        break
    done = line == "token\n"  # note the newline
# we expect Python to close the file when we exit the loop
if next_line is not None:
    DoSomethingWith(next_line)
else:
    # either the token never appeared, or it was on the very last line
    print("Token not found!")


Here is another solution, without using a flag:

def get_line(filename, token):
    """Returns the next line following a token, or None if not found.
    Leading and trailing whitespace is ignored when looking for
    the token.
    """
    fp = open(filename, "r")
    for line in fp:
        if line.strip() == token:
            break
    else:
        # runs only if we didn't break
        print("Token not found")
        fp.close()
        return None
    # Use next() rather than readline(): in Python 2 the file iterator
    # reads ahead, so mixing it with readline() can skip data.
    # The default of None covers a token on the last line.
    result = next(fp, None)  # read the next line only
    fp.close()
    return result


Here is a third solution that raises an exception instead of printing an
error message:

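def get_line(filename, token):
    fp = open(filename, "r")  # keep a name for the file so we can read on
    for line in fp:
        if line.strip() == token:
            break
    else:
        raise ValueError("Token not found")
    # we rely on Python to close the file when we are done
    # next() continues the same iterator; "" if the token was the last line
    return next(fp, "")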
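
If you would rather not rely on Python to close the file for you, the
same search can be sketched with a "with" block, which closes the file
as soon as the function returns. Note that "with" needs Python 2.6 or
later. The name get_line_after and the choice to return None both when
the token is missing and when it sits on the last line are assumptions
made for this illustration, not part of the original question.

```python
def get_line_after(filename, token):
    """Return the line following the first line equal to token, else None.

    Hypothetical helper: the name and the None convention are assumptions.
    Leading and trailing whitespace is ignored when comparing to the token.
    """
    with open(filename, "r") as fp:
        for line in fp:
            if line.strip() == token:
                # next() continues the same iterator, so it yields the line
                # right after the token; the default of None avoids a
                # StopIteration when the token is on the last line.
                return next(fp, None)
    # The loop finished without finding the token.
    return None
```

Only one line is held in memory at a time, so this should behave the
same for small and very big files.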



>  And I think reading one line each time is less efficient, am I right?

Less efficient than what? Spending hours or days writing more complex code
that only saves you a few seconds, or even runs slower?

I believe Python will take advantage of your file system's buffering
capabilities. Try it and see, you'll be surprised how fast it runs. If you
try it and it is too slow, then come back and we'll see what can be done
to speed it up. But don't try to speed it up before you know if it is fast
enough.
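
If you want to "try it and see", a rough timing harness along these
lines might do. It is only a sketch: the helper names, the sample file
and its size are all made up for the illustration, and the numbers you
get will depend on your machine, your disk and your Python version.

```python
import os
import tempfile
import time


def count_by_iteration(path):
    """Count lines by reading one line at a time."""
    count = 0
    with open(path, "r") as fp:
        for _ in fp:
            count += 1
    return count


def count_by_readlines(path):
    """Count lines by loading the whole file into a list first."""
    with open(path, "r") as fp:
        return len(fp.readlines())


def timed(func, path):
    """Return (result, elapsed seconds) for a single call of func(path)."""
    start = time.time()
    result = func(path)
    return result, time.time() - start


def make_sample(num_lines=200000):
    """Write a throwaway sample file (arbitrary size) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as fp:
        for i in range(num_lines):
            fp.write("line %d of some sample data\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample()
    try:
        for func in (count_by_iteration, count_by_readlines):
            lines, seconds = timed(func, sample)
            print("%s: %d lines in %.3f s" % (func.__name__, lines, seconds))
    finally:
        os.remove(sample)
```

A single run is a crude measurement, since the second function benefits
from the file already being in the operating system's cache. Run it a
few times, or swap the order, before drawing conclusions.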


-- 
Steven.



