python disk i/o speed
jepler at unpythonic.net
Wed Aug 7 10:40:10 EDT 2002
On Wed, Aug 07, 2002 at 07:21:28AM -0700, nnes wrote:
> I generated a file about 7MB long, with 3 numbers on each line. Then I
> wrote a program in each of Python, Java and ANSI C, generating a second
> file based on the first one, with 4 numbers: the original 3 plus their sum,
> e.g. "2","5","1" ----> "2","5","1","8"
> I wondered about the reason for the almost tenfold difference between C
> and Python, since the programs should be mostly I/O bound and not CPU
> bound. Is there also a way of improving the speed for Python in this
> situation? If somebody wants to comment on the C and the Java code,
> that would be OK too, since I am not an expert programmer.
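To reproduce a setup like this, an input file in the described format could be generated with something like the following Python 3 sketch. The function names, the value range 0 to 1000 and the fixed seed are illustrative assumptions, not details of the original programs.

```python
import random

def make_test_lines(n_lines, max_value=1000, seed=0):
    """Yield n_lines lines of three quoted integers, e.g. "2","5","1".

    Hypothetical helper: values are random integers in [0, max_value].
    """
    rng = random.Random(seed)  # fixed seed so the generated file is repeatable
    for _ in range(n_lines):
        a, b, c = (rng.randint(0, max_value) for _ in range(3))
        yield '"%d","%d","%d"\n' % (a, b, c)

def write_test_file(path, n_lines, **kwargs):
    """Write the generated lines to path (hypothetical helper)."""
    with open(path, "w") as f:
        f.writelines(make_test_lines(n_lines, **kwargs))
```

With values up to 1000 a line averages roughly 18 bytes, so write_test_file("input.txt", 400000) should give a file of about 7MB ("input.txt" is a placeholder name).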
On any modern machine, reading a 7MB file a second time will not be "I/O
bound": the file will already be in the operating system's cache, so it
can be read at nearly the speed of memcpy(), or even faster with mmap().
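One rough way to see the cache effect is to time several full reads of the same file; on most systems later passes are much faster than a first read from a cold disk. This Python 3 sketch (time_reads is a hypothetical name) measures raw reading only, with no parsing. If the file was just written, it may already be cached on the first pass as well.

```python
import time

def time_reads(path, repeats=3, chunk_size=1 << 20):
    """Read the whole file `repeats` times; return wall-clock seconds per pass."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            # read in 1MB chunks until EOF; the data itself is discarded
            while f.read(chunk_size):
                pass
        timings.append(time.perf_counter() - start)
    return timings
```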
BTW, here's my attempt at a Python program. Not having your programs, I
can't compare performance:
import sys, re

pat = re.compile(r'"(\d+)","(\d+)","(\d+)"')
for line in sys.stdin:
    match = pat.match(line)
    if not match:
        continue  # skip lines that are not three quoted integers
    a, b, c = map(int, match.group(1, 2, 3))
    sys.stdout.write('"%s","%s","%s","%s"\n' % (a, b, c, a + b + c))
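A variant that avoids the regular expression entirely might also be worth timing. This is a Python 3 sketch whose speed I have not measured: it assumes every valid line is exactly three comma-separated, double-quoted integers, and the name add_sums is hypothetical. Like the version above, it skips lines that do not fit that format.

```python
def add_sums(lines):
    """Yield '"a","b","c","a+b+c"' lines for input lines of three quoted integers.

    Lines that do not split into exactly three integer fields are skipped.
    """
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue
        try:
            a, b, c = [int(field.strip('"')) for field in fields]
        except ValueError:
            continue  # a field was not a (quoted) integer
        yield '"%d","%d","%d","%d"\n' % (a, b, c, a + b + c)
```

As a filter it would be driven by sys.stdout.writelines(add_sums(sys.stdin)).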
Remember that you can shave another ~5% off Python's runtime by running
'python -O'. You could also measure startup time separately; it is likely
to be small for C and larger for Python and Java.
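Startup cost can be estimated by timing an interpreter that does nothing and subtracting that from a full run. The Python 3 sketch below (run_time is a hypothetical helper) takes the best of several runs to reduce noise and discards the program's output so terminal printing is not timed.

```python
import os
import subprocess
import sys
import time

def run_time(cmd, stdin_path=None, runs=5):
    """Return the best wall-clock time in seconds over `runs` executions of cmd.

    stdin is read from stdin_path (or the null device); stdout is discarded.
    """
    source = stdin_path if stdin_path is not None else os.devnull
    best = float("inf")
    for _ in range(runs):
        with open(source, "rb") as stdin:
            start = time.perf_counter()
            subprocess.run(cmd, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
            best = min(best, time.perf_counter() - start)
    return best

# Baseline: start the interpreter and exit immediately.
python_startup = run_time([sys.executable, "-c", "pass"])
```

For a full run, something like run_time([sys.executable, "-O", "addsum.py"], stdin_path="input.txt") would time a script under 'python -O'; addsum.py and input.txt are placeholder names.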