Looking under Python's hood: Will we find a high performance or clunky engine?
steve+comp.lang.python at pearwood.info
Mon Jan 23 04:11:39 EST 2012
On Sun, 22 Jan 2012 07:50:59 -0800, Rick Johnson wrote:
> What does Python do when presented with this code?
> py> [line.strip('\n') for line in f.readlines()]
> If Python reads all the file lines first and THEN iterates AGAIN to do
> the strip; we are driving a Fred flintstone mobile.
Nonsense. File-like objects offer two APIs: there is a lazy iterator
approach, using the file-like object itself as an iterator, and an eager
read-it-all-at-once approach, offered by the venerable readlines()
method. readlines *deliberately* reads the entire file, and if you as a
developer do so by accident, you have no-one to blame but yourself. Only
a poor tradesman blames his tools instead of taking responsibility for
learning how to use them himself.
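To make the two APIs concrete, here is a minimal sketch of both (the file
name "example.txt" and its contents are just illustrative):

```python
# Create a small sample file (illustrative data).
with open("example.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Eager: readlines() deliberately pulls every line into a list at once.
with open("example.txt") as f:
    eager = [line.strip("\n") for line in f.readlines()]

# Lazy: iterating over the file object itself yields one line at a time,
# never holding more than the current line (plus buffering) in memory.
lazy = []
with open("example.txt") as f:
    for line in f:
        lazy.append(line.strip("\n"))

print(eager)  # ['alpha', 'beta', 'gamma']
print(lazy)   # same result, different memory profile
```

Both produce the same lines; the difference is purely in when and how much
is read into memory.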
You should use whichever approach is more appropriate for your situation.
You might want to consider reading from the file as quickly as possible,
in one big chunk if you can, so you can close it again and let other
applications have access to it. Or you might not care. The choice is yours.
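If releasing the file quickly is what matters, a sketch of the
read-it-all-then-close pattern (again with an illustrative sample file):

```python
# Write a small sample file (illustrative).
with open("example.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Read everything in one chunk so the file can be closed promptly;
# leaving the with-block releases the handle for other applications.
with open("example.txt") as f:
    data = f.read()

# The file is already closed here; split and process at leisure.
lines = data.splitlines()
print(lines)  # ['alpha', 'beta', 'gamma']
```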
For small files, readlines() will probably be faster, although for small
files it won't make much practical difference. Who cares whether it takes
0.01ms or 0.02ms? For medium sized files, say, a few thousand lines, it
could go either way, depending on memory use, the size of the internal
file buffer, and implementation details. Only for files large enough that
allocating memory for all the lines at once becomes significant will lazy
iteration be a clear winner.
But if the file is that big, are you sure that a list comprehension is
the right tool in the first place?
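For a file that large, a generator expression keeps the pipeline lazy
instead of materialising every line in a list. A sketch (sample file and
the character-count task are illustrative):

```python
# Sample file (illustrative).
with open("example.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open("example.txt") as f:
    # Generator expression: no list of all lines is ever built.
    stripped = (line.strip("\n") for line in f)
    # sum() pulls the lines through one at a time, inside the with-block.
    total_chars = sum(len(line) for line in stripped)

print(total_chars)  # 14
```

Note the consumer must run while the file is still open, since the
generator reads lazily.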
In general, you should not care greatly which of the two you use, unless
profiling your application shows that this is the bottleneck.
But it is extremely unlikely that copying even a few thousand lines
around memory will be slower than reading them from disk in the first
place. Unless you expect to be handling truly large files, you've got
more important things to optimize before wasting time caring about this.
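And if you do suspect this is your bottleneck, measure rather than guess.
A rough timeit sketch (file size and repeat count are arbitrary; absolute
numbers will vary by machine, so no result is hard-coded here):

```python
import timeit

# Build a modest test file (illustrative size).
with open("example.txt", "w") as f:
    f.writelines(f"line {i}\n" for i in range(1000))

def eager():
    # readlines(): one list of all lines, then the comprehension copies it.
    with open("example.txt") as f:
        return [line.strip("\n") for line in f.readlines()]

def lazy():
    # Iterator: the comprehension consumes the file line by line.
    with open("example.txt") as f:
        return [line.strip("\n") for line in f]

t_eager = timeit.timeit(eager, number=100)
t_lazy = timeit.timeit(lazy, number=100)
print(f"readlines(): {t_eager:.4f}s  iterator: {t_lazy:.4f}s")
```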