python disk i/o speed

Martin Franklin mfranklin1 at gatwick.westerngeco.slb.com
Wed Aug 7 11:52:13 EDT 2002


On Wednesday 07 Aug 2002 2:40 pm, Jeff Epler wrote:
> On Wed, Aug 07, 2002 at 07:21:28AM -0700, nnes wrote:
> > I generated a file about 7MB long, with 3 numbers on each line. Then I
> > wrote programs in Python, Java and ANSI C, each generating a second file
> > from the first one, with 4 numbers per line: the original 3 plus their
> > sum.
> > e.g. "2","5","1" ----> "2","5","1","8"
>
> [...]
>
> > I wondered about the reason for the almost 10x difference between C
> > and Python, since the programs should be mostly I/O bound and not CPU
> > bound. Is there also a way of improving the speed of Python in this
> > situation? If somebody wants to comment on the C and the Java code,
> > that would be OK too, since I am not an expert programmer.
>
> On any modern machine, reading a 7MB file a second time will not be "I/O
> bound", because it will be in cache, and should be read at nearly the
> speed of memcpy(), if not mmap().
>
> BTW, here's my attempt at a Python program.  Not having your programs, I
> can't compare performance:
>
> import re
> import sys
>
> # Each input line looks like: "2","5","1"
> pat = re.compile(r'"(\d+)","(\d+)","(\d+)"')
> for line in sys.stdin:
>     match = pat.match(line)
>     if not match:
>         # pass through lines that don't match the expected format
>         sys.stdout.write(line)
>         continue
>     a, b, c = map(int, match.group(1, 2, 3))
>     sys.stdout.write('"%s","%s","%s","%s"\n' % (a, b, c, a + b + c))
>
> Remember that you can shave another ~5% off of Python runtime by using
> 'python -O'.  Also, you could attempt to measure the startup time, which
> is likely to be smaller for C, and larger for Python and Java.
>
> Jeff
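For the quoted format in the original question ("2","5","1"), the standard library's csv module can handle the quoting instead of a hand-written regex. This is a swapped-in technique, not what either program above does, and it has not been timed against them; the helper name add_sum_column is hypothetical, and it assumes every non-blank row holds exactly three integers.

```python
import csv
import io


def add_sum_column(fin, fout):
    """Copy rows of three quoted integers, appending their sum as a fourth field.

    Hypothetical helper: assumes each non-blank row has exactly three
    integer fields, e.g. "2","5","1" -> "2","5","1","8".
    """
    reader = csv.reader(fin)
    # QUOTE_ALL reproduces the "..." quoting used in the input format
    writer = csv.writer(fout, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        a, b, c = map(int, row)
        writer.writerow([a, b, c, a + b + c])


# Small in-memory demo; for real files pass open file objects instead.
demo_out = io.StringIO()
add_sum_column(io.StringIO('"2","5","1"\n"10","20","30"\n'), demo_out)
print(demo_out.getvalue(), end="")
```

To filter a file the same way as Jeff's script, call add_sum_column(sys.stdin, sys.stdout); when opening files yourself, the csv documentation recommends passing newline='' to open().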


And here is my python version:-



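# bigdata.dat: three whitespace-separated integers per line
# (not the quoted, comma-separated format in the original question)
with open('bigdata.dat') as fin, open('bigdata.out', 'w') as fout:
    for line in fin:
        a, b, c = map(int, line.split())
        fout.write("%i %i %i %i\n" % (a, b, c, a + b + c))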
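To put numbers on Jeff's two points (cached reads and startup cost), here is a rough timing sketch. It writes hypothetical test data in the whitespace-separated layout the loop above reads, times the conversion with time.perf_counter, and separately times starting and exiting a bare interpreter via subprocess. The line count, value range and temporary file names are assumptions; results will vary by machine, and because the input was just written it will most likely be read from the OS cache.

```python
import os
import random
import subprocess
import sys
import tempfile
import time

N_LINES = 50_000  # hypothetical; roughly 500_000 lines gives a file near 7MB


def convert(in_path, out_path):
    """Same loop as above: three whitespace-separated ints in, four out."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            a, b, c = map(int, line.split())
            fout.write("%i %i %i %i\n" % (a, b, c, a + b + c))


def make_input(path, n_lines, seed=0):
    """Write n_lines of three random ints (hypothetical test data)."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write("%i %i %i\n" % (rng.randint(0, 9999),
                                    rng.randint(0, 9999),
                                    rng.randint(0, 9999)))


def startup_seconds():
    """Wall-clock time to start and exit an empty Python interpreter."""
    t0 = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - t0


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bigdata.dat")
    dst = os.path.join(tmp, "bigdata.out")
    make_input(src, N_LINES)
    t0 = time.perf_counter()
    convert(src, dst)  # only the conversion is inside the timed region
    elapsed = time.perf_counter() - t0
    with open(dst) as f:
        out_lines = f.readlines()

print("convert: %.3f s for %d lines" % (elapsed, N_LINES))
print("interpreter startup: %.3f s" % startup_seconds())
```

Timing startup on its own makes it easier to see how much of a whole-program comparison against C or Java is interpreter launch rather than per-line work.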


And the results from my 'c' version:-

cc speedTest2.c 
speedTest2.c:19:1: warning: no newline at end of file
speedTest2.c: In function `main':
speedTest2.c:7: `FILE' undeclared (first use in this function)
speedTest2.c:7: (Each undeclared identifier is reported only once
speedTest2.c:7: for each function it appears in.)
speedTest2.c:7: `fin' undeclared (first use in this function)
speedTest2.c:8: `fout' undeclared (first use in this function)
[bpse at m-franklin scripts]$ cc speedTest2.c 
speedTest2.c:19:1: warning: no newline at end of file
speedTest2.c:2: parse error before `*'
speedTest2.c:2: warning: data definition has no type or storage class
speedTest2.c:3: parse error before `*'
speedTest2.c:3: warning: data definition has no type or storage class
speedTest2.c: In function `main':
speedTest2.c:11: warning: assignment makes pointer from integer without a cast
speedTest2.c:12: warning: assignment makes pointer from integer without a cast
[bpse at m-franklin scripts]$ cc speedTest2.c 
speedTest2.c:2: parse error before `*'
speedTest2.c:2: warning: data definition has no type or storage class
speedTest2.c:3: parse error before `*'
speedTest2.c:3: warning: data definition has no type or storage class
speedTest2.c: In function `main':
speedTest2.c:11: warning: assignment makes pointer from integer without a cast
speedTest2.c:12: warning: assignment makes pointer from integer without a cast


Yes, that's right: I could not compile the 'C' version <wink>


Martin








More information about the Python-list mailing list