[Python-ideas] Iterating non-newline-separated files should be easier
Steven D'Aprano
steve at pearwood.info
Sat Jul 19 11:01:59 CEST 2014
On Sat, Jul 19, 2014 at 04:18:35AM -0400, Nick Coghlan wrote:
> On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> > On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> I still favour my proposal there to add a separate "readrecords()"
> >> method, rather than reusing the line based iteration methods - lines
> >> and arbitrary records *aren't* the same thing
> >
> > But they might well be the same thing. Look at all the Unix commands
> > that usually separate output with \n, but can be told to separate with
> > \0 instead. If you're reading from something like that, it should be
> > just as easy to split on \n as on \0.
>
> Python isn't Unix, and Python has never supported \0 as a "line
> ending". Changing the meaning of existing constructs is fraught with
> complexity, and should only be done when there is absolutely no
> alternative. In this case, there's an alternative: a new method,
> specifically for reading arbitrary records.
I don't have an opinion one way or the other, but I don't quite see why
you're worried about allowing the newline parameter to be set to some
arbitrary separator. The best I can come up with is a scenario something
like this:
I open a file with some record-separator
fp = open(filename, newline="\0")
then pass it to a function:
spam(fp)
which assumes that each chunk ends with a linefeed:
assert next(fp).endswith('\n')
But in a case like that, the function is already buggy. I can see at
least two problems with such an assumption:
- what if universal newlines has been turned off and you're reading
a file created under (e.g.) classic Mac OS or RISC OS?
- what if the file contains a single line which does not end with an
end of line character at all?
open('/tmp/junk', 'wb').write("hello world!")
next(open('/tmp/junk', 'r'))
Have I missed something?
Although I'm don't mind whether files grow a readrecords() method, or
re-use the readlines() method, I'm not convinced that API decisions
should be driven solely by the needs of programs which are already
buggy.
--
Steven
More information about the Python-ideas
mailing list