[Tutor] why is unicode converted file double spaced?

Marc Tompkins marc.tompkins at gmail.com
Tue Apr 7 19:11:58 CEST 2009

On Tue, Apr 7, 2009 at 9:52 AM, Pirritano, Matthew <MPirritano at ochca.com> wrote:

> So Kent's syntax worked to convert my Unicode file to plain text. But
> now my data is double spaced. How can I fix this?  Here is the code I'm
> using.

Sounds like you're being stung by the difference in newline handling between
operating systems - to recap, MS-DOS and Windows terminate a line with a
carriage return and linefeed (aka CRLF or '\r\n'); *nixes use just LF
('\n'); Mac OS up to version 9 uses just CR ('\r').  You will have noticed
this, on Windows, if you ever open a text file in Notepad that was created
on a different OS - instead of breaking into separate lines, everything
appears on one long line with funky characters where the breaks should be.
If you use a more sophisticated text editor such as Notepad++ or Textpad,
everything looks normal.  Python has automatic newline conversion;
generally, you can read a text file from any OS and write to it correctly
regardless of the OS that you happen to be running yourself.
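
To make the three conventions concrete, here is a small sketch (the sample
strings are made up for illustration, not taken from the original data).  It
shows that str.splitlines() recognises all three terminators:

```python
# The same three lines of text, as three different systems would save them.
# These sample strings are invented for illustration.
samples = {
    "Windows (CRLF)": "alpha\r\nbeta\r\ngamma\r\n",
    "Unix (LF)": "alpha\nbeta\ngamma\n",
    "classic Mac OS (CR)": "alpha\rbeta\rgamma\r",
}

for name, text in samples.items():
    # splitlines() treats '\r\n', '\n' and '\r' as line boundaries
    # and drops the terminator from each returned line.
    print(name, text.splitlines())
```

Each of the three inputs splits into the same list of lines, which is roughly
what a newline-aware editor does when it displays the file.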

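However, the automatic newline handling (from my perfunctory Googling)
appears to break down when you're also converting between Unicode and ASCII.
The likely reason: codecs.open() always opens the file in binary mode, so no
newline translation happens on reading and each line still ends in '\r\n'.
When that line is written to a file opened in text mode on Windows, the '\n'
is expanded to '\r\n' again, giving '\r\r\n', which many programs display as
an extra blank line.  Anyway, try this -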

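import codecs
inp = codecs.open('g:\\data\\amm\\text files\\test20090320.txt', 'r',
                  'utf-16')
outp = open('g:\\data\\amm\\text files\\new_text_file.txt', 'w')
for outLine in inp:
    # drop the trailing CR/LF, then write exactly one newline back
    outp.write(outLine.rstrip('\r\n') + '\n')
inp.close()
outp.close()

rstrip('\r\n') removes any trailing CR or LF characters - which should
include the leftover CR - and adding '\n' back keeps each record on its
own line.  (A bare strip() would also remove the line breaks themselves,
along with any leading whitespace in the data.)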
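
An alternative sketch, assuming Python 2.6 or later (where io.open is
available): open both files in text mode with explicit encodings, so the
'\r\n' is translated on reading and nothing is expanded again on writing.
The function name utf16_to_plain and the temporary file names are
hypothetical, chosen only for this example.

```python
import io
import os
import tempfile


def utf16_to_plain(src_path, dst_path, dst_encoding="ascii"):
    """Copy a UTF-16 text file to dst_path with a single LF per line.

    Hypothetical helper for illustration only.
    """
    # Text mode with the default newline handling decodes UTF-16 and
    # translates '\r\n' (and bare '\r') to '\n' while reading.
    with io.open(src_path, "r", encoding="utf-16") as inp:
        # newline="\n" writes '\n' unchanged on every platform, so
        # Windows does not expand it a second time.
        with io.open(dst_path, "w", encoding=dst_encoding, newline="\n") as outp:
            for line in inp:
                outp.write(line)


# Small self-check using temporary files instead of the real data.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "in.txt")
dst = os.path.join(tmp_dir, "out.txt")

with io.open(src, "wb") as f:
    f.write(u"one\r\ntwo\r\n".encode("utf-16"))

utf16_to_plain(src, dst)

with io.open(dst, "rb") as f:
    result = f.read()
print(repr(result))
```

If the output should keep Windows line endings, passing newline="\r\n" to the
second io.open call instead would write exactly one CRLF per line.  Note that
the "ascii" default raises an error on any non-ASCII character in the input.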

