[FEEDBACK] Is this script efficient...is there a better way?

Oren Tirosh oren-py-l at hishome.net
Thu Sep 12 03:44:00 EDT 2002


On Wed, Sep 11, 2002 at 10:08:35PM -0400, Michael Schneider wrote:
> 
> 
> Bob X wrote:
> 
> >
> >>1) readlines() loads the entire file into a list, so if you have a
> >>30+ MB file you just ate 30+ MB of memory.  Try using xreadlines()
> >>instead; it reads the file line by line and is much more memory
> >>friendly.
> >
> >Very cool...I had missed that! 
> 
> 
> It is much slower, though.  I am not sure about your config, but I am
> running with a gig of memory.  I would much rather have the speed than
> the 30 MB of memory.  Again, check your config.
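
(Just on the memory point before getting to speed: readlines()
materializes every line in a list up front, while xreadlines() hands
them out one at a time.  A minimal sketch of the two styles in the
Python of the day - the file name and the do-nothing handler are made
up for illustration:)

    def process(line):
        pass                        # stand-in for real per-line work

    f = open('big.log')             # hypothetical 30+ MB file

    # readlines() builds the whole list first; a 30 MB file costs at
    # least 30 MB of memory before the loop even starts.
    for line in f.readlines():
        process(line)

    f.seek(0)                       # rewind and do it again, lazily

    # xreadlines() yields lines as it goes, reading the file in
    # fixed-size chunks, so memory use stays small however big the
    # file is.
    for line in f.xreadlines():
        process(line)

    f.close()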

Have you actually tested this? I used to believe that larger buffers are 
always better for performance.

Wrong.

I ran some tests on Linux of the effect of buffer size on file reading
speed, and the results were very interesting. I started with a buffer
size of 32 bytes, measured file I/O throughput, and increased the size
logarithmically by about 2% per step.  As expected, the time to read
1 MB improved as the buffer size increased, until it hit a minimum at a
buffer size of around 4-8 KB (the graph is very noisy, so it's hard to
tell exactly) and then rose back up to a value 10-20% worse for buffer
sizes of 32-64 KB, remaining more-or-less constant for anything higher.
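
(The test loop looked roughly like the sketch below - this is not the
actual script, and the file name is made up; any file of at least 1 MB
will do.  After the first pass the file sits in the OS cache, so this
mostly measures buffer-handling overhead rather than disk speed:)

    import time

    FNAME = 'testfile'          # hypothetical: any file of 1 MB or more
    TARGET = 1024 * 1024        # time how long it takes to read 1 MB

    size = 32.0
    while size <= 256 * 1024:
        bufsize = int(size)
        f = open(FNAME, 'rb')
        start = time.time()
        total = 0
        while total < TARGET:
            data = f.read(bufsize)
            if not data:
                break
            total = total + len(data)
        f.close()
        print bufsize, time.time() - start
        size = size * 1.02      # grow the buffer by about 2% per step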

This performance curve may be a result of the CPU cache (once the
buffer outgrows the cache, each pass over it touches cold memory) or
of the OS architecture.

The chunk size used by xreadlines is 8 KB, which is about the optimal
value (at least for Linux).
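
(In other words, xreadlines behaves roughly as if you read the file in
8 KB chunks and split them into lines yourself.  A simplified sketch
with a made-up file name, just to show where the chunk size comes in:)

    leftover = ''
    f = open('big.log', 'rb')       # hypothetical file name
    while 1:
        chunk = f.read(8192)        # the same 8 KB chunk size
        if not chunk:
            break
        lines = (leftover + chunk).split('\n')
        leftover = lines.pop()      # partial line, saved for next chunk
        for line in lines:
            pass                    # each complete line handled here
    f.close()
    if leftover:
        pass                        # last line if no trailing newline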

	Oren
