Idiomatic portable way to strip line endings?

John Roth johnroth at ameritech.net
Sun Dec 16 16:04:01 EST 2001


"Jason Orendorff" <jason at jorendorff.com> wrote in message
news:mailman.1008532640.22171.python-list at python.org...
> > I've been trying to figure out the canonical way to strip
> > the line endings from a text file.
>
> There isn't one.  Almost always, either rstrip() is sufficient,
> *or* you're doing text slinging, in which case you can leave the
> newline characters on there, do the regex stuff, and file.write()
> the lines out the same way they came in.
>
> That said, the function you want is:
>
>   def chomp(line):
>       if line.endswith('\r\n'):
>           return line[:-2]
>       elif line.endswith('\r') or line.endswith('\n'):
>           return line[:-1]
>       else:
>           return line
>
> If you're getting these strings from a text file, you could:
>
>   for unstripped_line in file:
>       for line in unstripped_line.splitlines():
>           ...process line...
>
> Why is this necessary?  Unfortunately readline() doesn't
> interpret a bare '\r' as a line ending on Windows or Unix.
> So if the file contains bare '\r's, then the above code
> will read the entire file into the unstripped_line
> variable, then break it into lines with splitlines().

Unfortunately, this won't work in all cases, either. Let's
go back to basics. If your file has the line ending convention
defined for the operating system you're running on, then
everything works nicely. If it doesn't, then you're in a
great deal of difficulty, because there are cases where
readline() and readlines() will not parse the file into
lines for you, or if they do, you get extra characters
at the end. You need to read it in without having the
system do the parse, and do it yourself.
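
Here is a rough sketch of what I mean by doing the parse yourself.
It assumes a current Python 3 and a file small enough to read in
one gulp; split_lines and read_lines are names I made up for
illustration, not anything in the standard library. The file is
opened in binary mode so the system never gets a chance to touch
the line endings.

```python
import re

def split_lines(data):
    """Split raw bytes on \\r\\n, bare \\r, or bare \\n, dropping the endings."""
    # '\r\n' is listed first so a DOS ending counts as one break, not two.
    lines = re.split(rb'\r\n|\r|\n', data)
    # A trailing line ending leaves an empty final element; drop it.
    if lines and lines[-1] == b'':
        lines.pop()
    return lines

def read_lines(path):
    """Read a whole file in binary mode and split it into lines ourselves."""
    with open(path, 'rb') as f:
        return split_lines(f.read())
```

The result is a list of bytes objects with no line endings attached,
whichever convention (or mix of conventions) the file used.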

These are two different cases, and the second one does come
up in practice if you're importing files from the Internet.
Some browsers will fix the line endings, and some won't.
I had a lot of files with Unix line endings that I had to
convert to Windows line endings because Notepad
will not handle Unix line endings. At all.
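
For what it's worth, a conversion like that can be sketched in a
few lines. This assumes a current Python 3 and works on the raw
bytes; to_crlf is a made-up name for illustration, and it is not
the script I actually used.

```python
import re

def to_crlf(data):
    """Rewrite every line ending (\\r\\n, bare \\r, bare \\n) as \\r\\n."""
    # '\r\n' comes first in the pattern so existing DOS endings
    # are matched whole and not doubled.
    return re.sub(rb'\r\n|\r|\n', b'\r\n', data)
```

Read the file with open(path, 'rb'), pass the contents through
to_crlf, and write the result back with open(path, 'wb') so nothing
translates the endings a second time.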

John Roth






