Python too slow for real world

Skip Montanaro skip at mojam.com
Fri Apr 23 18:41:19 EDT 1999


Arne Mueller wrote:
> However the problem of reading/writing large files line by
> line is the source of slowing down the whole process.
> 
> def rw(input, output):
>     while 1:
>         line = input.readline()
>         if not line: break
>         output.write(line)
> 
> f = open('very_large_file','r')
> rw(f, stdout)
> 
> The file I read in contains 2053927 lines and it takes 382 sec to
> read/write it where perl does it in 15 sec.

I saw a mention of using readlines with a buffer size to get the
benefits of large reads without requiring that you read the entire file
into memory at once.  Here's a concrete example.  I use this idiom
(while loop over readlines() and a nested for loop processing each line)
all the time for processing large files that I don't need to have in
memory all at once.
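
For the case where each line needs real processing rather than a straight
copy, the nested-loop form of the idiom looks roughly like this. It is only a
sketch: the names process_lines and handle and the 100000 size hint are
illustrative choices of mine, not anything from your code. Note that
readlines() treats the hint as approximate and always returns whole lines, so
a batch may run slightly over the hint.

```python
import io

def process_lines(infile, handle, sizehint=100000):
    """Call handle(line) for every line, reading in batches of roughly sizehint bytes."""
    count = 0
    lines = infile.readlines(sizehint)
    while lines:
        # Inner loop: per-line work on the batch currently held in memory.
        for line in lines:
            handle(line)
            count += 1
        lines = infile.readlines(sizehint)
    return count

# Small in-memory demonstration; a real run would pass an open file object.
src = io.StringIO("alpha\nbeta\ngamma\n")
seen = []
n = process_lines(src, seen.append, sizehint=8)
```

Only one batch of lines is held in memory at a time, so memory use is bounded
by the size hint rather than by the size of the file.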

The input file, /tmp/words2, was generated from /usr/dict/words:

    sed -e 's/\(.*\)/\1 \1 \1 \1 \1/' < /usr/dict/words > /tmp/words
    cat /tmp/words /tmp/words /tmp/words /tmp/words /tmp/words > /tmp/words2

It's not as big as your input file (10.2MB, 227k lines), but still big
enough to measure differences.  The script below prints (on the second
of two runs, to make sure the file is already cached in memory):

    68.9596179724
    7.96663999557

suggesting roughly an 8.7x speedup between your original function and my
readlines version. It's still not going to be as fast as Perl, but it's
probably close enough that some other bottleneck will pop up now...

import time

# Original approach: one readline() call per line.
def rw(input, output):
    while 1:
        line = input.readline()
        if not line: break
        output.write(line)

f = open('/tmp/words2','r')
devnull = open('/dev/null','w')

t = time.time()
rw(f, devnull)
print(time.time() - t)
f.close()

# Batched approach: readlines() with a size hint returns a list of
# complete lines totalling roughly that many bytes per call.
def rw2(input, output):
    lines = input.readlines(100000)
    while lines:
        output.writelines(lines)
        lines = input.readlines(100000)

f = open('/tmp/words2','r')

t = time.time()
rw2(f, devnull)
print(time.time() - t)
f.close()
devnull.close()
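
If /usr/dict/words isn't available on your system, something like the
following should let you run the same comparison on a generated file and
check that both functions produce identical output. Treat it as a sketch
under assumptions: the timed_copy helper, the temporary file names, and the
20000-line synthetic input are all made up for illustration, and the timings
you get will depend on your machine and Python version.

```python
import os
import tempfile
import time

def rw(infile, outfile):
    # One readline() call per line.
    while 1:
        line = infile.readline()
        if not line:
            break
        outfile.write(line)

def rw2(infile, outfile, sizehint=100000):
    # Batches of whole lines, roughly sizehint bytes per readlines() call.
    lines = infile.readlines(sizehint)
    while lines:
        outfile.writelines(lines)
        lines = infile.readlines(sizehint)

def timed_copy(copier, src_path, dst_path):
    """Copy src_path to dst_path using copier; return elapsed seconds."""
    src = open(src_path, 'r')
    dst = open(dst_path, 'w')
    try:
        t = time.time()
        copier(src, dst)
        return time.time() - t
    finally:
        src.close()
        dst.close()

# Build a small synthetic input (hypothetical data, far smaller than /tmp/words2).
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, 'input.txt')
out = open(src_path, 'w')
for i in range(20000):
    out.write('word%d word%d word%d\n' % (i, i, i))
out.close()

dst1 = os.path.join(tmpdir, 'out1.txt')
dst2 = os.path.join(tmpdir, 'out2.txt')
t1 = timed_copy(rw, src_path, dst1)
t2 = timed_copy(rw2, src_path, dst2)
print('readline:  %.4f sec' % t1)
print('readlines: %.4f sec' % t2)
```

Writing to real output files instead of /dev/null makes it possible to
confirm that the two copies match the input byte for byte.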



Cheers,

-- 
Skip Montanaro	| Mojam: "Uniting the World of Music"
http://www.mojam.com/
skip at mojam.com  | Musi-Cal: http://www.musi-cal.com/
518-372-5583



