[issue21910] File protocol should document if writelines must handle generators sensibly
New submission from Jan Kanis: The resolution of issue 5445 should be documented somewhere properly, so people can depend on it or not. IOBase.writelines handles generator arguments without problems, i.e. without first draining the entire generator and then writing the result in one go. That would require large amounts of memory if the generator is large, and fail entirely if the generator is infinite. codecs.StreamWriter.writelines uses self.write(''.join(argument)) as implementation, which fails on very large or infinite arguments. According to issue 5445 it is not part of the file protocol that .writelines must handle (large/infinite) generators, only list-like iterables. However as far as I know this is not documented anywhere, and sometimes people assume that writelines is meant for this case. E.g. jinja (https://github.com/mitsuhiko/jinja2/blob/master/jinja2/environment.py#L1153, the dump method is explicitly documented to stream). The guarantees that .writelines makes or does not make in this regard should be documented somewhere, so that either .writeline implementations that don't handle large generators can be pointed out as bugs, or code that makes assumptions on .writeline handling large generators can be. I personally think .writelines should handle large generators, since in the python 3 world a lot of apis were iterator-ified and it is wat a lot of people would probably expect. But having a clear and documented decision on this is more important. (note: I've copied most of the nosy list from #5445) ---------- assignee: docs@python components: Documentation, IO messages: 222165 nosy: JanKanis, benjamin.peterson, dlesco, docs@python, hynek, lemburg, pitrou, stutzbach, terry.reedy priority: normal severity: normal status: open title: File protocol should document if writelines must handle generators sensibly versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue21910> _______________________________________
Josh Rosenberg added the comment: +1. I've been assuming writelines handled arbitrary generators without an issue; guess I've gotten lucky and only used the ones that do. I've fed stuff populated by enormous (though not infinite) generators created from stuff like itertools.product and the like into it on the assumption that it would safely write it without generating len(seq) ** repeat values in memory. I'd definitely appreciate a documented guarantee of this. I don't need it to explicitly guarantee that each item is written before the next item is pulled off the iterator or anything; if it wants to buffer a reasonable amount of data in memory before triggering a real I/O that's fine (generators returning mutable objects and mutating them when the next object comes along are evil anyway, and forcing one-by-one output can prevent some useful optimizations). But anything that uses argument unpacking, collection as a list, ''.join (or at the C level, PySequence_Fast and the like), forcing the whole generator to exhaust before writing byte one, is a bad idea. ---------- nosy: +josh.rosenberg _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue21910> _______________________________________
Terry J. Reedy added the comment: Security fix only versions do not get doc fixes. ---------- versions: -Python 3.1, Python 3.2, Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue21910> _______________________________________
Dan Haffey added the comment: +1, I just lost an hour-plus compute job to this. It sure violates POLA. I've been passing large generators to file.writelines since about as long as generators have existed, so I never would have guessed that a class named "StreamWriter" of all things wouldn't, you know, stream its writelines argument. ---------- nosy: +dhaffey _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue21910> _______________________________________
Changes by Martin Panter <vadmium+py@gmail.com>: ---------- stage: -> needs patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue21910> _______________________________________
participants (5)
-
Dan Haffey
-
Jan Kanis
-
Josh Rosenberg
-
Martin Panter
-
Terry J. Reedy