[Tutor] why is unicode converted file double spaced?

Kent Johnson kent37 at tds.net
Tue Apr 7 19:10:06 CEST 2009


On Tue, Apr 7, 2009 at 12:52 PM, Pirritano, Matthew
<MPirritano at ochca.com> wrote:
> So Kent's syntax worked to convert my Unicode file to plain text. But
> now my data is double spaced. How can I fix this?  Here is the code I'm
> using.
>
> import codecs
>
> inp = codecs.open('g:\\data\\amm\\text files\\test20090320.txt', 'r',
> 'utf-16')
> outp = open('g:\\data\\amm\\text files\\new_text_file.txt', 'w')
> outp.writelines(inp)
> inp.close()
> outp.close()

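I guess there is something funny going on with the conversion of
newlines. It would help to know what line endings are in the original
data, and what are in the new data. The codecs module opens the input
file as binary, so if the original has Windows line endings, each line
still ends in '\r\n' after decoding. The output file is opened in text
mode, which on Windows turns '\n' into '\r\n' when writing, so you
would end up with '\r\r\n', and many programs show that as an extra
blank line. One thing to try is to open the output file as binary -
'wb' instead of 'w'.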
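
To see what that guess would look like on disk, here is a minimal
sketch. It is not the original script: the helper name write_lines and
the sample lines are made up for illustration, and it uses io.open with
an explicit newline argument to imitate Windows text mode, so it
behaves the same on any platform.

```python
import io
import os
import shutil
import tempfile

def write_lines(path, lines, newline):
    # newline='\r\n' imitates Windows text mode: every '\n' written becomes '\r\n'.
    # newline='' switches translation off, much like opening the file with 'wb'.
    with io.open(path, 'w', encoding='ascii', newline=newline) as outp:
        outp.writelines(lines)

# Lines as a binary-mode reader would hand them over from a Windows file
# (assumed sample data, the '\r\n' is still attached).
decoded_lines = [u'first\r\n', u'second\r\n']

tmpdir = tempfile.mkdtemp()
translated_path = os.path.join(tmpdir, 'translated.txt')
untranslated_path = os.path.join(tmpdir, 'untranslated.txt')

write_lines(translated_path, decoded_lines, '\r\n')
write_lines(untranslated_path, decoded_lines, '')

# Read the raw bytes back to see the actual line endings.
with open(translated_path, 'rb') as f:
    translated = f.read()
with open(untranslated_path, 'rb') as f:
    untranslated = f.read()

shutil.rmtree(tmpdir)

print(repr(translated))    # each line now ends in \r\r\n
print(repr(untranslated))  # line endings are unchanged
```

If the real output file shows '\r\r\n' when read back in binary mode,
the newline translation is the likely cause.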

If that doesn't work, you could try to strip line endings from the
original, then add them back in to the new file:

import codecs

inp = codecs.open('g:\\data\\amm\\text files\\test20090320.txt', 'r',
'utf-16')
outp = open('g:\\data\\amm\\text files\\new_text_file.txt', 'w')
for line in inp:
    # Drop the original line ending (and any other trailing white space)
    line = line.rstrip()
    outp.write(line)
    # Add back a single newline
    outp.write('\n')
inp.close()
outp.close()

Note that this will strip all trailing white space from each line of
the input. I don't know if that is an issue...
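
If trailing white space does matter, a variation is to strip only the
line-ending characters with rstrip('\r\n'). The sketch below is one way
to do that, not the code from this thread: the function name
convert_utf16 and the utf-8 output encoding are assumptions, and it
uses io.open (Python 2.6 or later) so that newline handling is explicit
on both files.

```python
import io

def convert_utf16(src_path, dst_path, dst_encoding='utf-8'):
    """Copy a UTF-16 text file to dst_path with one line ending per line.

    convert_utf16 and dst_encoding are hypothetical names for this sketch.
    """
    # newline='' keeps the original line endings on each line, so they
    # can be removed explicitly below.
    inp = io.open(src_path, 'r', encoding='utf-16', newline='')
    # Default text mode: each '\n' written becomes the platform's line ending.
    outp = io.open(dst_path, 'w', encoding=dst_encoding)
    try:
        for line in inp:
            # Remove only '\r' and '\n'; other trailing white space is kept.
            outp.write(line.rstrip(u'\r\n') + u'\n')
    finally:
        inp.close()
        outp.close()
```

You would call it with the two file names from the original script,
for example convert_utf16(input_name, output_name).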

Kent

