[FEEDBACK] Is this script efficient...is there a better way?
oren-py-l at hishome.net
Thu Sep 12 09:44:00 CEST 2002
On Wed, Sep 11, 2002 at 10:08:35PM -0400, Michael Schneider wrote:
> Bob X wrote:
> >>1) readlines() loads the entire file into a list, so if you have a 30+
> >>MB file you just ate 30+ MB of memory. Try using xreadlines()
> >>instead; it reads the file line by line and is much more memory efficient.
> >Very cool...I had missed that!
> It is much slower, though. I am not sure about your config, but I am
> running with a gig of memory.
> I would much rather have the speed than save the 30 MB of memory. Again,
> check your config.
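For anyone who wants to see the difference the quoted posters are arguing about, here is a small sketch (file name and line count are made up for illustration). readlines() materializes every line in one list, while iterating the file a line at a time — which is what xreadlines() does, and what plain iteration over a file object does in later Pythons — holds only one line in memory:

```python
import os
import tempfile

# Build a throwaway test file (hypothetical path, 1000 short lines).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    for i in range(1000):
        f.write("line %d\n" % i)

# Eager: the whole file becomes a list of strings at once.
with open(path) as f:
    all_lines = f.readlines()

# Lazy: only one line is held at a time.
count = 0
with open(path) as f:
    for line in f:
        count += 1

print(len(all_lines))  # 1000
print(count)           # 1000
```

Both loops see the same 1000 lines; the difference is purely in how much of the file sits in memory at once.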
Have you actually tested this? I used to believe that larger buffers are
always better for performance.
I ran some tests on Linux for the effect of buffer size on file reading
speed and the results were very interesting. I started with a buffer size
of 32 bytes, tested file I/O throughput, and increased it logarithmically by
about 2% for each step. As expected, the time to read 1 MB improved as the
buffer size increased until it hit a minimum for a buffer size of around
4-8 KB (the graph is very noisy so it's hard to tell) and then rose back up
to a value that is 10-20% worse for buffer sizes of 32-64 KB and remained
more-or-less constant for anything higher.
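A rough sketch of that experiment looks like the following. This is not the original test harness — file size, sweep step, and names are illustrative (a power-of-two sweep instead of the ~2% steps, to keep it short) — but it measures the same thing: time to read a fixed-size file as a function of read buffer size.

```python
import os
import tempfile
import time

# Hypothetical 1 MB test file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"x" * (1 << 20))

def read_time(bufsize):
    """Time one full pass over the file using reads of `bufsize` bytes.

    buffering=0 opens the file unbuffered, so each read() hits the OS
    directly and the sweep actually exercises the chosen size.
    """
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(bufsize):
            pass
    return time.perf_counter() - start

# Sweep buffer sizes from 32 bytes to 64 KB.
results = {}
size = 32
while size <= 64 * 1024:
    results[size] = read_time(size)
    size *= 2

for size in sorted(results):
    print("%6d bytes: %.6f s" % (size, results[size]))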
This performance curve may be a result of the CPU cache or of OS buffering.
The chunk size used by xreadlines is 8 KB, which is about the optimum
value (at least for Linux).
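To make the connection concrete, here is a hand-rolled sketch of line reading on top of 8 KB chunks — the chunk size the post attributes to xreadlines. This is not xreadlines itself, just an illustration of reading fixed-size chunks and splitting lines out of them (file contents are made up):

```python
import os
import tempfile

CHUNK = 8192  # the 8 KB chunk size discussed above

# Hypothetical test file: 20000 two-byte lines ("a\n").
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "wb") as f:
    f.write(b"a\n" * 20000)

lines = 0
leftover = b""
with open(path, "rb") as f:
    while True:
        chunk = f.read(CHUNK)
        if not chunk:
            break
        data = leftover + chunk
        pieces = data.split(b"\n")
        # The last piece may be a partial line; carry it into the next chunk.
        leftover = pieces.pop()
        lines += len(pieces)
if leftover:  # file did not end with a newline
    lines += 1

print(lines)  # 20000
```

Only one 8 KB chunk (plus any partial line) is ever in memory, regardless of file size.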