[issue5445] codecs.StreamWriter.writelines problem when passed generator

Tue Mar 10 18:36:03 CET 2009

Daniel Lescohier <daniel.lescohier at cbs.com> added the comment:

OK, I think I see where I went wrong in my perceptions of the file 
protocol.  I thought that readlines() returned an iterator, not a 
list, but I see in the library reference manual on File Objects that 
it returns a list.  I think I got confused because there is no 
equivalent of __iter__ for writing to streams.  For input, I'm always 
using 'for line in file_object' (in other words, 
file_object.__iter__), so I had assumed that writelines was the mirror 
image of that, because I never use the readlines method.  Then, in my 
mind, readlines became the mirror image of writelines, which I had 
assumed took an iterator, so I assumed that readlines returned an 
iterator.  I wonder if this perception problem is common or not.

So, the StreamWriter interface matches the file protocol; readlines() 
and writelines() deal with lists.  There shouldn't be any change to 
it, because it follows the protocol.

Then, the example I wrote would be instead:

rows = (line[:-1].split('\t') for line in in_file)
projected = (keep_fields(row, 0, 3, 7) for row in rows)
filtered = (row for row in projected if row[2]=='1')
formatted = (u'\t'.join(row)+'\n' for row in filtered)
write = out_file.write
for line in formatted:
    write(line)

I think it's correct that the file object write C code only does 
1000-line chunks for sequences that have a defined length: if it has a 
defined length, then that implies that the data exists now, and can be 
concatenated and written now.  Something without a defined length may 
be a generator with items arriving later.

----------
message_count: 6.0 -> 7.0

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5445>
_______________________________________