[issue5445] codecs.StreamWriter.writelines problem when passed generator
Daniel Lescohier
report at bugs.python.org
Tue Mar 10 15:48:52 CET 2009
Daniel Lescohier <daniel.lescohier at cbs.com> added the comment:
In Python's file protocol, readlines and writelines form a protocol for
iterating over a file. If one doesn't want to iterate over the file,
one calls read() with no argument to read the whole file in, or one
calls write() with the complete contents one wants to write.
If writelines is using join, then when one passes an iterator as the
parameter to writelines, it will not write to the file iteratively.
Instead, it will accumulate everything in memory until the iterator
raises StopIteration, and only then write to the file. So, if one is
tailing the output file, one is not going to see anything in the file
until the end, instead of seeing content appear iteratively. That
breaks the promise that the file protocol's writelines means "write
iteratively".
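
To illustrate the ordering problem, here is a minimal sketch. It is not
the codecs module's code: RecordingStream, writelines_join,
writelines_iter and trace are hypothetical names made up for this
example. It records when each line is produced by a generator and when
each write call happens.

```python
def writelines_join(stream, sequence):
    # Join-based variant: the whole iterable is consumed before the
    # single write call.
    stream.write(''.join(sequence))


def writelines_iter(stream, sequence):
    # Iterative variant: each item is written as soon as it is produced.
    for value in sequence:
        stream.write(value)


def trace(writelines):
    """Return the ordered list of (event, payload) tuples for one run."""
    events = []

    def lines():
        # Generator that records the moment each line is produced.
        for i in range(3):
            events.append(("produce", i))
            yield "line %d\n" % i

    class RecordingStream:
        # Hypothetical stand-in for a stream; records every write call.
        def write(self, data):
            events.append(("write", data))

    writelines(RecordingStream(), lines())
    return events


join_events = trace(writelines_join)
iter_events = trace(writelines_iter)
print([name for name, _ in join_events])
print([name for name, _ in iter_events])
```

With the join-based variant, all three "produce" events come before the
only "write" event. With the iterative variant, they alternate, which
is what someone tailing the output file would expect to see.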
I think following the protocol is more important than performance. If
the application is having performance problems, it's up to the
application to buffer the data in memory and make a single write call.
However, here is an alternative implementation that is slightly more
complicated, but possibly has better performance for the passed-a-list
case. It covers three cases:
1. Passed an empty sequence; do not call self.write at all.
2. Passed a sequence with a length. That implies that all the data is
available immediately, so one can concatenate and write with one
self.write call.
3. Passed a sequence with no length. That implies that the data may
not all be available immediately, so write it iteratively.
    def writelines(self, sequence):
        """ Writes the sequence of strings to the stream
            using .write().
        """
        try:
            sequence_len = len(sequence)
        except TypeError:
            write = self.write
            for value in sequence:
                write(value)
            return
        if sequence_len:
            self.write(''.join(sequence))
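
As a rough check of the three cases, the method can be dropped into a
small stand-in class. SketchWriter and calls_for are hypothetical names
used only for this sketch; the real codecs.StreamWriter encodes data and
writes to an underlying stream, which is left out here.

```python
class SketchWriter:
    """Hypothetical stand-in for a stream writer; records write calls."""

    def __init__(self):
        self.calls = []  # every argument passed to write(), in order

    def write(self, data):
        self.calls.append(data)

    def writelines(self, sequence):
        # Same logic as the proposed method.
        try:
            sequence_len = len(sequence)
        except TypeError:
            # No length (e.g. a generator): write each item as it arrives.
            for value in sequence:
                self.write(value)
            return
        if sequence_len:
            # Has a length and is non-empty: one joined write.
            self.write(''.join(sequence))


def calls_for(sequence):
    """Run writelines on a fresh writer and return the recorded calls."""
    writer = SketchWriter()
    writer.writelines(sequence)
    return writer.calls


empty_calls = calls_for([])                       # case 1
list_calls = calls_for(["a\n", "b\n"])            # case 2
gen_calls = calls_for(s for s in ["a\n", "b\n"])  # case 3
print(empty_calls, list_calls, gen_calls)
```

The empty list produces no write call, the list produces a single
joined write, and the generator produces one write per item.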
I'm not sure which is better. But one last point is that Python is
moving more in the direction of using iterators; e.g., in Py3K,
replacing dict's keys, values, and items with the implementation of
iterkeys, itervalues, and iteritems.
----------
message_count: 3.0 -> 4.0
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5445>
_______________________________________