Improve module performance by reducing disk reads

kian tern kian.tern at gmail.com
Mon Mar 30 03:51:11 EDT 2009


Hi all,

I've been writing Python for about two weeks (I moved over from Perl), and
I've ported one of my modules, a parser for a binary format (see the link
below for the format spec):

http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf

In the first version of the parser I read exactly the amount of data I
needed at each step, for example 4 bytes for each record header. The STDF
files tend to be 40+ MB with 100K+ records, so I was doing at least one
disk read per record, sometimes two.
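
Roughly, that first version looked like this (a simplified sketch, not the
actual module code; the '<HBB' header layout assumes little-endian byte
order, which in real STDF depends on the file's CPU_TYPE field):

import struct

def parse_records(path):
    # One read() for the 4-byte header and a second for the record body,
    # i.e. one or two disk reads per record.
    records = []
    f = open(path, 'rb')
    try:
        while True:
            header = f.read(4)            # REC_LEN, REC_TYP, REC_SUB
            if len(header) < 4:
                break
            rec_len, rec_typ, rec_sub = struct.unpack('<HBB', header)
            body = f.read(rec_len)        # second read for the record body
            records.append((rec_typ, rec_sub, body))
    finally:
        f.close()
    return records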

Obviously that's not an efficient way to do it, so I created a buffer that
reads 4 KB chunks per disk read and lets the module parse out of the buffer.
When the buffer drops below 512 bytes I read another chunk, and so on.
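
The buffering scheme looks roughly like this (again a simplified sketch
rather than the real code, with the same little-endian header assumption):

import struct

class BufferedReader(object):
    # Pull 4 KB from disk per read() and top the buffer up whenever it
    # drops below 512 bytes.
    CHUNK = 4096
    LOW_WATER = 512

    def __init__(self, path):
        self.f = open(path, 'rb')
        self.buf = ''

    def take(self, n):
        # Return the next n bytes, refilling the buffer from disk as needed.
        while len(self.buf) < max(n, self.LOW_WATER):
            chunk = self.f.read(self.CHUNK)
            if not chunk:
                break
            self.buf += chunk
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

def parse_records(path):
    reader = BufferedReader(path)
    records = []
    while True:
        header = reader.take(4)
        if len(header) < 4:
            break
        rec_len, rec_typ, rec_sub = struct.unpack('<HBB', header)
        records.append((rec_typ, rec_sub, reader.take(rec_len)))
    return records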

Reducing 100K+ reads to around 15K reads should have improved performance,
but for some reason it didn't. I've played with the chunk size, but nothing
came of it. Is there a Python-specific way to optimise reading from disk?

I'm using Python 2.5.2 on Ubuntu 8.10 (32-bit).

Thanks in advance.