On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:
It turns out to be even simpler than I expected.
I reused the "newline" parameter of open and TextIOWrapper.__init__, adding a param of the same name to the constructors for BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and FileIO.
For text files, just remove the check for newline being one of the standard values and it all works. For binary files, remove the check for truthy, make open pass each Buffered* constructor newline=(newline if binary else None), make each Buffered* class store it, and change two lines in RawIOBase.readline to use it. And that's it.
All the words are in English, but I have no idea what you're actually saying... :-)
You seem to be talking about the implementation of the change, but what is the interface? Having made all these changes, how does it affect Python code? You have a use-case of splitting on something other than the standard newlines, so how does one do that? E.g. suppose I have a file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line character. How would I iterate over lines in this file?
The way I understand it is this:

for line in open("spam.txt", newline="\u0085"):
    process(line)

If that's the case, I would be strongly in favour of this. Nice and clean, and should break nothing; there'll be special cases for newline=None and newline='', and the only change is that, instead of a small number of permitted values ('\n', '\r', '\r\n'), any string (or maybe any one-character string plus '\r\n'?) would be permitted.

Effectively, it's not "iterate over this file, divided by \0 instead of newlines", but it's "this file uses the unusual encoding of newline=\0, now iterate over lines in the file". Seems a smart way to do it IMO.

ChrisA
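
For comparison, on a Python where open() still rejects newline="\u0085" with a ValueError, the same loop needs a helper that does the splitting by hand. A rough sketch of what that could look like, assuming a hypothetical iter_lines() function (not part of the io module) and an arbitrary chunk_size default; it is not the patch discussed above, only an approximation of the behaviour the proposed parameter would give:

```python
import io


def iter_lines(stream, sep, chunk_size=8192):
    """Yield sep-terminated records from a text stream.

    Hypothetical helper, not a stdlib function. The separator is kept on
    each record, as with ordinary line iteration; a final record with no
    trailing separator is yielded as-is.
    """
    if not sep:
        raise ValueError("sep must be a non-empty string")
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        start = 0
        while True:
            end = buf.find(sep, start)
            if end == -1:
                break
            end += len(sep)
            yield buf[start:end]
            start = end
        # Keep the unfinished tail, which may hold part of a separator.
        buf = buf[start:]
    if buf:
        yield buf


# Demo with an in-memory stream standing in for "spam.txt".
demo = io.StringIO("spam\u0085eggs\u0085ham")
for line in iter_lines(demo, "\u0085"):
    print(repr(line))
```

With a real file this would be called as iter_lines(open("spam.txt", newline=""), "\u0085"), where newline="" keeps open() from translating any '\r' or '\n' that happens to appear inside a record. The one-line version above is a good deal easier to read.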