<div dir="ltr">Hi all,<br><br>I've been writing Python for about 2 weeks (moved from Perl).<br>I've ported one of my modules, a parser for a binary format (see the link below for the format spec):<br><br><a href="http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf" class="postlink">http://etidweb.tamu.edu/cdrom0/image/stdf/spec.pdf</a><br>


<br>In the first version of the parser I read exactly the amount of data I needed to parse at each step,<br>for example 4 bytes for each record header.<br>STDF files tend to be 40+ MB in size with 100K+ records, so I had at least 1 disk read per record, sometimes 2.<br>
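<br>To make the first version concrete, here is a minimal sketch of the per-record approach, not the actual module. It assumes little-endian data and the 4-byte STDF header (REC_LEN, REC_TYP, REC_SUB); the function name is made up for illustration, and it is written in current Python 3 syntax rather than 2.5.<br>

```python
import struct

# STDF record header: REC_LEN (2 bytes), REC_TYP (1 byte), REC_SUB (1 byte).
# Little-endian byte order is an assumption for this sketch.
HEADER = struct.Struct("<HBB")


def read_records_unbuffered(f):
    """Yield (rec_typ, rec_sub, payload) using one or two read() calls per record."""
    while True:
        head = f.read(HEADER.size)          # read #1: the 4-byte header
        if len(head) < HEADER.size:
            return                          # end of file (or a truncated header)
        rec_len, rec_typ, rec_sub = HEADER.unpack(head)
        # read #2 happens only when the record actually has a body
        payload = f.read(rec_len) if rec_len else b""
        if len(payload) < rec_len:
            raise ValueError("truncated record")
        yield rec_typ, rec_sub, payload
```

With 100K+ records this pattern issues 100K+ calls to read(), which is where the read count in this post comes from.<br>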


<br>Obviously that's not an efficient way to do it.<br>So I've created a buffer that reads the file in 4K chunks, and the module parses the data from it.<br>When the unparsed data left in the buffer drops below 512B, I read another chunk, and so on.<br>
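<br>The buffering scheme can be sketched roughly as follows. This is a simplified illustration under the same assumptions (little-endian data, 4-byte header, hypothetical function and parameter names, current Python 3 syntax), not the real code. Since REC_LEN is a 2-byte field, a record can be larger than one chunk, so the sketch keeps reading until the whole record is in the buffer.<br>

```python
import struct

HEADER = struct.Struct("<HBB")  # REC_LEN, REC_TYP, REC_SUB; little-endian assumed
CHUNK_SIZE = 4096               # bytes requested per read()
LOW_WATER = 512                 # refill when fewer unparsed bytes than this remain


def read_records_buffered(f, chunk_size=CHUNK_SIZE, low_water=LOW_WATER):
    """Yield (rec_typ, rec_sub, payload), reading the file in chunk_size pieces."""
    buf = b""
    pos = 0        # offset of the first unparsed byte in buf
    eof = False
    threshold = max(low_water, HEADER.size)
    while True:
        # Top up the buffer when the unparsed remainder gets small.
        while not eof and len(buf) - pos < threshold:
            chunk = f.read(chunk_size)
            if not chunk:
                eof = True
                break
            buf = buf[pos:] + chunk   # drop parsed bytes, append the new chunk
            pos = 0
        if len(buf) - pos < HEADER.size:
            return                    # nothing left (a partial trailing header is ignored)
        rec_len, rec_typ, rec_sub = HEADER.unpack_from(buf, pos)
        end = pos + HEADER.size + rec_len
        # A record may be longer than what is buffered; keep reading until it fits.
        while not eof and len(buf) < end:
            chunk = f.read(chunk_size)
            if not chunk:
                eof = True
                break
            buf += chunk
        if len(buf) < end:
            raise ValueError("truncated record")
        yield rec_typ, rec_sub, buf[pos + HEADER.size:end]
        pos = end
```

Note that every refill and every payload slice copies bytes, so this version trades read() calls for extra copying inside Python.<br>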


<br>Reducing 100K+ reads to around 15K reads should have improved the performance.<br>For some reason that did not happen.<br>I've played with the chunk size, but it made no difference.<br>Is there a Python-specific way to optimise reading from disk?<br>


<br>I'm using Python 2.5.2 on Ubuntu 8.10 (32-bit).<br><br>Thanks in advance.<br><br></div>