Newby: how to transform text into lines of text
sjmachin at lexicon.net
Mon Jan 26 01:44:33 CET 2009
On 26/01/2009 10:34 AM, Tim Chase wrote:
> I believe that using the formulaic "for line in file(FILENAME)"
> iteration guarantees that each "line" will have at most only one '\n'
> and it will be at the end (again, a malformed text-file with no terminal
> '\n' may cause it to be absent from the last line)
It seems that you are right -- not that I can find such a guarantee
written anywhere. I had armchair-philosophised that writing
"foo\n\r\nbar\r\n" to a file in binary mode and reading it on Windows in
text mode would be strict and report the first line as "foo\n\n"; I was
>> So, we are left with the unfortunately awkward
>>     if line.endswith('\n'):
>>         line = line[:-1]
> You're welcome to it, but I'll stick with my more DWIM solution of "get
> rid of anything that resembles an attempt at a CR/LF".
Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell
the OP exactly what you meant).
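For anyone following along, here is a minimal sketch of how the two approaches differ. The function names are made up for illustration, and I am assuming that the "DWIM solution" amounts to something like rstrip('\r\n'); that is a guess, not a quote.

```python
def strip_one_newline(line):
    # Strict version: remove exactly one trailing '\n', nothing else.
    if line.endswith("\n"):
        line = line[:-1]
    return line


def strip_crlf(line):
    # Assumed DWIM version: remove any run of trailing '\r' and '\n'.
    return line.rstrip("\r\n")


for sample in ["foo\n", "foo\r\n", "foo\n\r", "foo"]:
    print(repr(sample), repr(strip_one_newline(sample)), repr(strip_crlf(sample)))
```

The strict version leaves a stray '\r' behind when a line ends in "\r\n"; the rstrip version removes it, but would also eat trailing carriage returns that happened to be real data.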
My approach to DWIM with data is, given
    norm_space = lambda s: u' '.join(s.split())
to break up the line into fields first (just in case the field delimiter
== '\t') then apply norm_space to each field. This gets rid of your '\r'
at end (or start!) of line, and multiple whitespace characters are
replaced by a single space. Whitespace includes NBSP (U+00A0) as an
added bonus for being righteous and using Unicode :-)
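A rough sketch of that field-first approach, written for Python 3 (where str is already Unicode). The helper name clean_fields, the default tab delimiter, and the sample line are illustrative assumptions, not part of anyone's real data.

```python
def norm_space(s):
    # Collapse every run of whitespace to one space and trim both ends.
    # str.split() with no argument treats '\r', '\t' and NBSP (U+00A0)
    # as whitespace when s is a Unicode string.
    return " ".join(s.split())


def clean_fields(line, delimiter="\t"):
    # Split on the delimiter first, so that a tab delimiter is not
    # swallowed by the whitespace normalisation, then clean each field.
    return [norm_space(field) for field in line.split(delimiter)]


raw = "\r  foo\u00a0 bar \tbaz   qux\r\n"
print(clean_fields(raw))
```

Doing it in the other order (normalising the whole line, then splitting on '\t') would lose the field boundaries, because the tabs would already have been turned into spaces.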
> Thank goodness I haven't found any of my data-sources using "\n\r"
> instead, which would require me to left-strip '\r' characters as well.
> Sigh. My kingdom for competency. :-/
Indeed. I actually got data in that format once from a *x programmer who
was so kind as to do it that way just for me because he knew that I use
Windows and he thought that's what Windows text files looked like. No