[FEEDBACK] Is this script efficient...is there a better way?

Oren Tirosh oren-py-l at hishome.net
Thu Sep 12 03:44:00 EDT 2002


On Wed, Sep 11, 2002 at 10:08:35PM -0400, Michael Schneider wrote:
> 
> 
> Bob X wrote:
> 
> >
> >>1) readlines() loads the entire file into a list, so if you have a
> >>30+ MB file you just ate 30+ MB of memory.  Try using xreadlines()
> >>instead; it reads the file line by line and is much more memory
> >>friendly.
> >
> >Very cool...I had missed that! 
> 
> 
> It is much slower, though.  I am not sure about your config, but I am
> running with a gig of memory.  I would much rather have the speed than
> the 30 MB of memory.  Again, check your config.
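
(Just on the memory point before getting to speed: readlines()
materializes every line in a list up front, while xreadlines() hands
them out one at a time.  A minimal sketch of the two styles in the
Python of the day - the file name and the do-nothing handler are made
up for illustration:)

    def process(line):
        pass                        # stand-in for real per-line work

    f = open('big.log')             # hypothetical 30+ MB file

    # readlines() builds the whole list first; a 30 MB file costs at
    # least 30 MB of memory before the loop even starts.
    for line in f.readlines():
        process(line)

    f.seek(0)                       # rewind and do it again, lazily

    # xreadlines() yields lines as it goes, reading the file in
    # fixed-size chunks, so memory use stays small however big the
    # file is.
    for line in f.xreadlines():
        process(line)

    f.close()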

Have you actually tested this? I used to believe that larger buffers are 
always better for performance.

Wrong.

I ran some tests on Linux of the effect of buffer size on file reading
speed, and the results were very interesting. I started with a buffer
size of 32 bytes, measured file I/O throughput, and increased the size
logarithmically by about 2% per step.  As expected, the time to read
1 MB improved as the buffer size increased, until it hit a minimum at a
buffer size of around 4-8 KB (the graph is very noisy, so it's hard to
tell exactly) and then rose back up to a value 10-20% worse for buffer
sizes of 32-64 KB, remaining more-or-less constant for anything higher.
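
(The test loop looked roughly like the sketch below - this is not the
actual script, and the file name is made up; any file of at least 1 MB
will do.  After the first pass the file sits in the OS cache, so this
mostly measures buffer-handling overhead rather than disk speed:)

    import time

    FNAME = 'testfile'          # hypothetical: any file of 1 MB or more
    TARGET = 1024 * 1024        # time how long it takes to read 1 MB

    size = 32.0
    while size <= 256 * 1024:
        bufsize = int(size)
        f = open(FNAME, 'rb')
        start = time.time()
        total = 0
        while total < TARGET:
            data = f.read(bufsize)
            if not data:
                break
            total = total + len(data)
        f.close()
        print bufsize, time.time() - start
        size = size * 1.02      # grow the buffer by about 2% per step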

This performance curve may be a result of the CPU cache (once the
buffer outgrows the cache, each pass over it touches cold memory) or
of the OS architecture.

The chunk size used by xreadlines is 8 KB, which is about the optimal
value (at least for Linux).
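
(In other words, xreadlines behaves roughly as if you read the file in
8 KB chunks and split them into lines yourself.  A simplified sketch
with a made-up file name, just to show where the chunk size comes in:)

    leftover = ''
    f = open('big.log', 'rb')       # hypothetical file name
    while 1:
        chunk = f.read(8192)        # the same 8 KB chunk size
        if not chunk:
            break
        lines = (leftover + chunk).split('\n')
        leftover = lines.pop()      # partial line, saved for next chunk
        for line in lines:
            pass                    # each complete line handled here
    f.close()
    if leftover:
        pass                        # last line if no trailing newline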

	Oren
