Python MSI not installing, log file showing name of a Vietnamese communist revolutionary
steve+comp.lang.python at pearwood.info
Sun Mar 23 02:07:32 CET 2014
On Sun, 23 Mar 2014 02:09:20 +1100, Chris Angelico wrote:
> On Sun, Mar 23, 2014 at 1:50 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Line endings are terminators: they end the line. Whether you consider
>> the terminator part of the line or not is a matter of opinion (is the
>> cover of a book part of the book?) but consider this:
>> If you say that the end of lines are *not* part of the line, then
>> that implies that some parts of the file are not inside any line at
>> all. And that would be just weird.
> Not so weird IMO. A file is not a concatenation of lines; it is a stream
> of bytes.
But a *text file* is a concatenation of lines. The "text file" model is
important enough that nearly all programming languages offer a line-based
interface to files, and some (Python at least, possibly others) make it
the default interface so that iterating over the file gives you lines
rather than bytes -- even in "binary" mode.
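To illustrate the point about binary mode, here is a minimal sketch that uses io.BytesIO as a stand-in for a file opened with "rb" (the contents are made up). Iterating over it still yields lines, and each line keeps its trailing b"\n":

```python
import io

# In-memory stand-in for a file opened in binary mode ("rb").
raw = io.BytesIO(b"first line\nsecond line\nno trailing newline")

# Iteration splits after each b"\n" and leaves the newline attached.
lines = list(raw)
print(lines)
# [b'first line\n', b'second line\n', b'no trailing newline']
```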
> Now, if you ask Python to read you 512 bytes from a binary
> file, and then ask for another 512 bytes, and so on until you reach the
> end, then it would indeed be VERY weird if there were parts of the file
> that weren't in the returned (byte) strings. But if you ask for a line,
> and then another line, and another line, then it's quite reasonable to
> interpret U+000A as "line separation" rather than "line termination",
> and not return it. (Both interpretations make sense. I just wish the
> most obvious form of iteration gave the cleaner/tidier version, or at
> very least that there be some really obvious way to ask for
There is: call strip('\n') on the line after reading it. Perl and Ruby
spell it chomp(). Other languages may spell it differently. I don't know
of any language that automatically strips newlines, probably because you
can easily strip the newline from the line, but if the language did it
for you, you cannot reliably reverse it.
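A small sketch of why stripping can't be reversed: a text whose last line ends in a newline and one whose last line doesn't become indistinguishable once the newlines are removed. It uses rstrip("\n"), which for a single line has the same effect as strip("\n"); the helper read_stripped is hypothetical and only for illustration:

```python
import io

def read_stripped(text):
    """Hypothetical helper: iterate over `text` as a file, dropping each line's newline."""
    return [line.rstrip("\n") for line in io.StringIO(text)]

with_newline = "alpha\nbeta\n"   # final line terminated
without_newline = "alpha\nbeta"  # final line unterminated

# After stripping, the two inputs look identical...
stripped_a = read_stripped(with_newline)
stripped_b = read_stripped(without_newline)
print(stripped_a == stripped_b)  # True

# ...but the unstripped lines still join back to each original exactly.
kept_a = list(io.StringIO(with_newline))
kept_b = list(io.StringIO(without_newline))
print("".join(kept_a) == with_newline)     # True
print("".join(kept_b) == without_newline)  # True
```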
> Imagine the output of GNU find as a series of
> records. You can ask for those to be separated by newlines (the default,
> or -print), or by NULs (with the -print0 command). In either case, the
> records do not *contain* that value, they're separated by it; the
> records consist of file names.
I have no problem with that: when interpreting text as a record with
delimiters, e.g. from a CSV file, you normally exclude the delimiter.
Sometimes the line terminator does double-duty as a record delimiter as
Reading from a file is considered a low-level operation. Reading
individual bytes in binary mode is the lowest level; reading lines in
text mode is the next level, built on top of the lower binary mode. You
build higher-level protocols on top of one or the other of those modes, e.g.
"read a zip file" would be built on top of binary mode, "read a csv file"
would be built on top of text mode.
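Python's io module exposes this layering directly. Here is a minimal sketch, with made-up CSV data: io.BytesIO plays the binary layer, io.TextIOWrapper decodes it into text lines, and csv.reader sits on top. newline="" is passed because the csv module expects to do its own newline handling:

```python
import csv
import io

# Lowest level: raw bytes, as a binary-mode file would deliver them.
raw = io.BytesIO(b"name,qty\r\nspam,3\r\neggs,12\r\n")

# Next level up: decode bytes into str; newline="" disables translation.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

# Higher-level protocol built on the text layer.
rows = list(csv.reader(text))
print(rows)  # [['name', 'qty'], ['spam', '3'], ['eggs', '12']]
```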
Because line reading is a low-level protocol, you ought to be able to copy
a file without changing it by reading it in and then writing it out:
    for blob in infile:
        outfile.write(blob)
ought to work whether you are in text mode or binary mode, so long as the
infile and outfile are opened in the same mode. If Python were to strip
newlines, that would no longer be the case.
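A sketch of that round trip, using in-memory files and a hypothetical copy_lines() helper. Binary mode copies byte for byte. In Python 3 text mode, the default newline=None translates "\r\n" to "\n" on input, so this sketch passes newline="" to keep the text-mode copy exact. The last part shows what a newline-stripping iterator would do to the copy:

```python
import io

def copy_lines(infile, outfile):
    """Hypothetical helper: copy infile to outfile one line ("blob") at a time."""
    for blob in infile:
        outfile.write(blob)

original = b"one\r\ntwo\nthree"  # mixed line endings, no final newline

# Binary mode: each line keeps its b"\n", so the copy is byte-for-byte identical.
src, dst = io.BytesIO(original), io.BytesIO()
copy_lines(src, dst)
binary_copy = dst.getvalue()

# Text mode with newline="": lines keep their endings and nothing is
# translated on read or write, so the text round-trips as well.
text = original.decode("ascii")
src, dst = io.StringIO(text, newline=""), io.StringIO(newline="")
copy_lines(src, dst)
text_copy = dst.getvalue()

# If iteration stripped the "\n", the lines would run together.
src, dst = io.BytesIO(original), io.BytesIO()
for blob in src:
    dst.write(blob.rstrip(b"\n"))
stripped_copy = dst.getvalue()

print(binary_copy == original)    # True
print(text_copy == text)          # True
print(stripped_copy == original)  # False
```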
(Even high-level protocols should avoid unnecessary modifications to
files. One of the more annoying, if not crippling, limitations of the
configparser module is that reading an INI file in and then writing it out
again destroys the high-level structure of the file: comments and blank
lines are stripped, and records may be re-ordered.)
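The comment loss is easy to demonstrate with Python 3's configparser.ConfigParser, using read_string() and write(). The INI content here is invented, and the sketch checks only comments. In recent Python 3 versions, sections and keys keep their order:

```python
import configparser
import io

ini_text = """\
# Database settings
[db]
host = localhost

; Cache settings
[cache]
size = 10
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

out = io.StringIO()
config.write(out)
result = out.getvalue()

print("# Database settings" in result)  # False: comments are discarded
print(config.get("db", "host"))          # localhost: the values survive
```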