Improve module performance by reducing disk reads
davea at ieee.org
Mon Mar 30 15:17:50 CEST 2009
Since the email contains no code, I can only assume you're using the
built-in open() call and file.read(). If you didn't specify a bufsize,
a "system default" is used; I seriously doubt the default is 0.
Since the file is already buffered, additional buffering by you may
have little effect, or even a negative one. I'd suggest explicitly
specifying a buffer size in the open() call, with 4096 as a good
starting point. Then I'd benchmark larger and smaller values to see
what difference they make.
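A minimal benchmark along those lines might look like the following
(written in modern Python rather than 2.5 syntax; the scratch file,
the read_all() helper, and the specific buffer sizes are all just
illustrative assumptions):

```python
import os
import tempfile
import timeit

def read_all(path, bufsize):
    # Open with an explicit buffer size and read in tiny 4-byte
    # pieces, mimicking one read per record header.
    total = 0
    f = open(path, 'rb', bufsize)
    try:
        while True:
            chunk = f.read(4)
            if not chunk:
                break
            total += len(chunk)
    finally:
        f.close()
    return total

# Write a 256 KB scratch file to read back.
fd, path = tempfile.mkstemp()
os.write(fd, b'x' * (1 << 18))
os.close(fd)

nbytes = read_all(path, 4096)   # sanity check: reads the whole file

# Time each candidate buffer size a few times.
for bufsize in (512, 4096, 65536):
    t = timeit.timeit(lambda: read_all(path, bufsize), number=3)
    print('bufsize %6d: %.4f s' % (bufsize, t))

os.remove(path)
```

The absolute numbers will vary by machine and OS cache state, so run
each size several times and compare trends, not single samples.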
The only time additional buffering tends to be useful is if you know the
file usage pattern, and it's predictable and not sequential. Even then,
it's good to know the underlying buffer's behavior, so that your
optimizations are not at cross purposes.
I'd expect your performance problems are elsewhere.
kian tern wrote:
> Hi all,
> I've been writing in Python for about 2 weeks (moved from Perl)
> I've ported one of my modules which is a parser for a binary format (see
> link below for the format specs)
> In the first version of the parser I was reading exactly the amount of data
> I need to parse
> For example 4 bytes per each header
> The STDF files tend to be about 40+ MB size with 100K+ records, so I had at
> least 1 disk read per record, sometimes 2.
> Obviously it's not an efficient way to do it.
> I've created a buffer which reads 4K chunks per read and then the module
> parses the data.
> If the buffer becomes less than 512B I read another chunk and so on.
> Reducing 100K+ reads to around 15K reads should improve the performance.
> For some reason it did not happen.
> I've played with the chunk size, but nothing came out of it.
> Is there a Python specific way to optimise reading from disk?
> I'm using Python 2.5.2 with Ubuntu 8.10 32bit
> Thanks in advance.
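The chunked-read scheme described above (4K reads, refill when the
local buffer drops below 512 bytes) can be sketched roughly as
follows; the class and method names are hypothetical, and it's
written in modern Python rather than 2.5 syntax:

```python
import io

CHUNK = 4096      # bytes pulled from disk per read
LOW_WATER = 512   # refill threshold, per the description above

class BufferedRecordReader(object):
    def __init__(self, fileobj):
        self.f = fileobj
        self.buf = b''

    def next_bytes(self, n):
        # Top up the buffer whenever it runs low, then hand the
        # parser an exact-sized slice.
        while len(self.buf) < max(n, LOW_WATER):
            chunk = self.f.read(CHUNK)
            if not chunk:
                break          # end of file; return what's left
            self.buf += chunk
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

# Usage: read 4-byte record headers from an in-memory stand-in
# for the STDF file.
r = BufferedRecordReader(io.BytesIO(b'\x00\x01\x02\x03' * 10))
header = r.next_bytes(4)
```

Note this duplicates work the stdlib already does: file objects
opened with a bufsize maintain an internal buffer, which is likely
why the hand-rolled layer showed no measurable gain.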