[issue5445] codecs.StreamWriter.writelines problem when passed generator

Daniel Lescohier report at bugs.python.org
Tue Mar 10 15:48:52 CET 2009


Daniel Lescohier <daniel.lescohier at cbs.com> added the comment:

In Python's file protocol, readlines and writelines are the methods for 
iterating over a file. If one doesn't want to iterate over the file, 
one calls read() with no argument in order to read the whole file in, 
or one calls write() with the complete contents one wants to write.

If writelines uses join and one passes an iterator as the parameter, 
it will not write to the file iteratively: it will accumulate 
everything in memory until the iterator raises StopIteration, and only 
then write to the file.  So, if one is tailing the output file, one 
sees nothing in the file until the end, instead of seeing content 
appear as it is produced.  That breaks the promise of the file 
protocol, in which writelines means "write iteratively".
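
To make the difference concrete, here is a minimal sketch that records 
the order of generator yields and write() calls for a join-based and a 
loop-based writelines.  The names trace_writes, writelines_join and 
writelines_loop are hypothetical helpers for illustration only; they 
are not part of the codecs module.

```python
def trace_writes(writelines_impl):
    """Return the interleaving of generator yields and write() calls."""
    events = []

    def write(data):
        events.append(("write", data))

    def lines():
        for i in range(3):
            events.append(("yield", i))
            yield "line %d\n" % i

    writelines_impl(write, lines())
    return events


def writelines_join(write, lines):
    # join-based: drains the whole iterator before the single write
    write("".join(lines))


def writelines_loop(write, lines):
    # loop-based: one write per item, as soon as the item is produced
    for line in lines:
        write(line)


join_events = trace_writes(writelines_join)
loop_events = trace_writes(writelines_loop)
```

With the join-based version all three yields are recorded before the 
only write; with the loop-based version each yield is followed 
immediately by its write.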

I think following the protocol is more important than performance. If 
the application is having performance problems, it's up to the 
application to buffer the data in memory and make a single write call.

However, here is an alternative implementation that is slightly more 
complicated, but possibly has better performance for the passed-a-list 
case.  It covers three cases:

1. Passed an empty sequence; do not call self.write at all.
2. Passed a sequence with a length. That implies that all the data is 
available immediately, so one can concatenate and write with one 
self.write call.
3. Passed a sequence with no length.  That implies that not all the 
data is available immediately, so iteratively write it.

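    def writelines(self, sequence):

        """ Writes the sequence of strings to the stream
            using .write().
        """
        try:
            sequence_len = len(sequence)
        except TypeError:
            # Case 3: no len() (e.g. a generator); write item by item.
            write = self.write
            for value in sequence:
                write(value)
            return
        # Case 2: sized and non-empty; join and write once.
        # Case 1: sized and empty; self.write is never called.
        if sequence_len:
            self.write(''.join(sequence))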
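
As a rough check of the three cases, the sketch below puts the method 
above on a small stand-in class that only records its write() calls. 
RecordingWriter is a hypothetical name used for illustration; it is not 
codecs.StreamWriter and does no encoding.

```python
class RecordingWriter:
    """Hypothetical stand-in for a stream writer; records write() calls."""

    def __init__(self):
        self.calls = []

    def write(self, data):
        self.calls.append(data)

    def writelines(self, sequence):
        try:
            sequence_len = len(sequence)
        except TypeError:
            # No len(): write each item as it is produced.
            write = self.write
            for value in sequence:
                write(value)
            return
        if sequence_len:
            # Sized and non-empty: a single joined write.
            self.write(''.join(sequence))


empty = RecordingWriter()
empty.writelines([])                      # case 1: empty list

sized = RecordingWriter()
sized.writelines(["a\n", "b\n"])          # case 2: list with a length

lazy = RecordingWriter()
lazy.writelines(s for s in ["a\n", "b\n"])  # case 3: generator, no len()
```

Here empty.calls stays empty, sized.calls holds one joined string, and 
lazy.calls holds one entry per item.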

I'm not sure which is better.  But one last point is that Python is 
moving more in the direction of using iterators; e.g., in Py3K, 
replacing dict's keys, values, and items with the implementation of 
iterkeys, itervalues, and iteritems.

----------
message_count: 3.0 -> 4.0

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5445>
_______________________________________

