Re: [Python-ideas] Iterating non-newline-separated files should be easier

17 Jul 2014

      On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve@pearwood.info> wrote:
...
On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:
...
It turns out to be even simpler than I expected.
I reused the "newline" parameter of open and TextIOWrapper.__init__,
adding a param of the same name to the constructors for
BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and
FileIO.
For text files, just remove the check for newline being one of the
standard values and it all works. For binary files, remove the check
for truthy, make open pass each Buffered* constructor newline=(newline
if binary else None), make each Buffered* class store it, and change
two lines in RawIOBase.readline to use it. And that's it.
All the words are in English, but I have no idea what you're actually
saying... :-)
You seem to be talking about the implementation of the change, but what
is the interface? Having made all these changes, how does it affect
Python code? You have a use-case of splitting on something other than
the standard newlines, so how does one do that? E.g. suppose I have a
file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
character. How would I iterate over lines in this file?
The way I understand it is this:

for line in open("spam.txt", newline="\u0085"):
    process(line)

If that's the case, I would be strongly in favour of this. Nice and
clean, and should break nothing; there'll be special cases for
newline=None and newline='', and the only change is that, instead of a
small number of permitted values ('\n', '\r', '\r\n'), any string (or
maybe any one-character string plus '\r\n'?) would be permitted.

Effectively, it's not "iterate over this file, divided by \0 instead
of newlines" but "this file uses the unusual encoding of newline=\0,
now iterate over lines in the file". Seems a smart way to do it IMO.
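
For comparison, here is a rough sketch of how the same iteration can be
emulated today, since open() currently rejects any newline value other
than None, '', '\n', '\r' and '\r\n'. The helper name iter_lines and its
chunk_size parameter are made up for illustration; they are not part of
the proposal or of the io module.

```python
import io


def iter_lines(stream, newline, chunk_size=8192):
    """Yield lines from a text stream, treating `newline` as the line ending.

    Hypothetical helper: like normal file iteration, each yielded line
    keeps its terminator, and a final unterminated line is still yielded.
    """
    if not newline:
        raise ValueError("newline must be a non-empty string")
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete line currently in the buffer.
        start = 0
        while True:
            end = buffer.find(newline, start)
            if end == -1:
                break
            end += len(newline)
            yield buffer[start:end]
            start = end
        # Keep the incomplete tail (it may hold part of a multi-character
        # terminator) until more data arrives.
        buffer = buffer[start:]
    if buffer:
        yield buffer


# In-memory stand-in for a NEL-separated "spam.txt".
for line in iter_lines(io.StringIO("spam\u0085eggs\u0085"), "\u0085"):
    print(repr(line))
```

With a real file, opening it with newline="" first keeps the standard
newline translation out of the way, e.g.
iter_lines(open("spam.txt", newline=""), "\u0085").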

ChrisA

Chris Angelico