[Tutor] Blank line added after reading a line from a file

Jeff Shannon jeff@ccvcorp.com
Thu, 06 Dec 2001 09:57:34 -0800

> On Thu, 6 Dec 2001 09:52:14 -0000,
> "Kelly, Phelim" <KellyPhe@logica.com> wrote:
> Jeff, Andrei,
>             I got that problem fixed using one of the lines you gave me. The
> line of text was read in with an extra blank line, it wasn't just the print
> command that added it on. I got rid of the blank line using line 4 below:

Well, really, the line was not read in with an *extra* blank line.  The line was read in, including the newline character (and trust me, that newline character *does* exist in the file on disk).  It's just that, when you read a line of the file into a string, you expect the end of the string to be the end of the line, so the (somewhat redundant) existence of a character there marking the end of the line can be a bit confusing.  If you were to look directly at the file on disk, it would look something like this (using the convention of '\n' to indicate newline characters):

This is a line of text.\nThis is a second line.\nThis textfile contains\na number of different lines,\nin fact.\n
------------------end of textfile.txt----------

Now, when you normally view the file, the newlines are translated, i.e.,

$ cat textfile.txt
This is a line of text.
This is a second line.
This textfile contains
a number of different lines,
in fact.

But when you read the file into Python using readline() or readlines(), it does *not* translate the newline characters; it just breaks the file into chunks after each one, so that you get:

>>> fp = open('textfile.txt')
>>> text = fp.readlines()
>>> fp.close()
>>> for line in text:
...  repr(line)
"'This is a line of text.\\012'"
"'This is a second line.\\012'"
"'This textfile contains\\012'"
"'a number of different lines,\\012'"
"'in fact.\\012'"

(The extra quotes appear because the interpreter displays the string that repr() returns, quotes and all.)  This shows that the newline characters are still there, just as they were in the original file; it's just that the file has now been chunked up.  We can verify that it's the same as the original file, too:

>>> fp = open('textfile.txt')
>>> text = fp.read()
>>> fp.close()
>>> repr(text)
"'This is a line of text.\\012This is a second line.\\012This textfile contains\\012a number of different lines,\\012in fact.\\012'"

You can see that the first example has all of the same characters as the second; it's just been partitioned into several pieces.  The problem comes when you try to do anything with those individual lines.  Most functions (or statements such as print) expect that what they're given is a complete line, and will act appropriately.  They get thrown, however, by the existence of that trailing newline character that readlines() left in for the sake of completeness.
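
Here is a minimal sketch of the usual way around that trailing newline.  It assumes a current Python 3 interpreter (where print is a function rather than a statement), and it uses io.StringIO with a shortened copy of the text as a stand-in for textfile.txt so that it runs without a file on disk.  Those choices are mine for illustration, not part of the original problem:

```python
import io

# Stand-in for open('textfile.txt'); the contents are a shortened,
# hypothetical copy of the example file.
fp = io.StringIO("This is a line of text.\nThis is a second line.\nin fact.\n")
lines = fp.readlines()
fp.close()

# Every element still carries its newline character.
assert lines[0] == "This is a line of text.\n"

# rstrip("\n") drops only trailing newlines and leaves other whitespace alone.
stripped = [line.rstrip("\n") for line in lines]

for line in stripped:
    # print supplies its own newline, so no blank lines show up in between.
    print(line)
```

Stripping with rstrip("\n") rather than a bare strip() is a deliberate choice here, since strip() would also eat leading spaces and tabs that might matter.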

And as a further note, I'm pretty sure that readline[s]() leaves the newline in place, in order to maintain symmetry with the writelines() function, which writes a list of strings to a file, but does not add newlines on its own (it's up to you to add newlines anywhere you want them).
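
To illustrate that symmetry, here is a small sketch, again assuming Python 3 and using io.StringIO objects as stand-ins for real files (the names src, dst and run_together are just for this example).  Lines that come from readlines() can be handed straight to writelines() and the text comes back unchanged, whereas lines that have had their newlines stripped get run together:

```python
import io

original = "This is a line of text.\nThis is a second line.\nin fact.\n"

# Read the text as a list of lines; each one keeps its newline.
src = io.StringIO(original)
lines = src.readlines()

# writelines() writes the strings back to back and adds nothing of its own,
# so the round trip reproduces the original text exactly.
dst = io.StringIO()
dst.writelines(lines)
copy = dst.getvalue()

# If the newlines were stripped first, the lines simply run together.
run_together = io.StringIO()
run_together.writelines(line.rstrip("\n") for line in lines)
joined = run_together.getvalue()

print(copy == original)
print(repr(joined))
```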

Of course, all of this is just background, since you've found a way to solve your problem, but it's good to know the background so you know *why* your problem is fixed (and why it was a problem to begin with)...  :)

Jeff Shannon
Credit International