[Tutor] unicode to plain text conversion
Pirritano, Matthew
MPirritano at ochca.com
Tue Apr 7 17:56:16 CEST 2009
Thanks all!
Kent, this syntax worked. I was able to figure out the encoding just
by trial and error. It is UTF-16. Now the only remaining problem is that the
conversion is double-spacing the lines of data. I'm thinking this must
be something that I need to fix in my syntax. I will continue to try and
figure it out, but any pointing out of the obvious or other ideas would
be much appreciated. Again, newbie here.
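
A likely cause of the double-spacing, assuming the source file uses Windows
line endings: the decoded lines still end in '\r\n', and a text-mode output
file on Windows translates each '\n' to '\r\n' again, giving '\r\r\n'. The
sketch below (Python 3) turns off newline translation on both files so line
endings pass through unchanged. The function name and the UTF-8 output
encoding are assumptions made for illustration, not something from this thread.

```python
def convert_without_newline_translation(src_path, dst_path,
                                        src_encoding="utf-16",
                                        dst_encoding="utf-8"):
    """Copy a text file to a new encoding, one line at a time.

    newline="" disables newline translation on both files, so a '\r\n'
    in the input is written out as '\r\n' and never becomes '\r\r\n'.
    dst_encoding is an assumption; use whatever "plain text" means for you.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as inp, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as outp:
        for line in inp:  # only one line is held in memory at a time
            outp.write(line)
```

Usage would be something like
convert_without_newline_translation('Unicode_file.txt', 'new_text_file.txt').
In Python 2, opening the output file in binary mode ('wb') should have a
similar effect.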
Thanks
Matt
Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648
-----Original Message-----
From: tutor-bounces+mpirritano=ochca.com at python.org
[mailto:tutor-bounces+mpirritano=ochca.com at python.org] On Behalf Of Kent
Johnson
Sent: Monday, April 06, 2009 5:51 PM
To: Pirritano, Matthew
Cc: Python Tutor
Subject: Re: [Tutor] unicode to plain text conversion
On Mon, Apr 6, 2009 at 6:48 PM, Pirritano, Matthew
<MPirritano at ochca.com> wrote:
> Hello python people,
>
> I am a total newbie. I have a very large file > 4GB that I need to
> convert from Unicode to plain text. I used to just use dos when the file
> was < 4GB but it no longer seems to work. Can anyone point me to some
> python code that might perform this function?
What is the encoding of the Unicode file?
Assuming that the file has lines that will each fit in memory, you can
use the codecs module to decode the unicode. Something like this:
import codecs
inp = codecs.open('Unicode_file.txt', 'r', 'utf-16le')
outp = open('new_text_file.txt', 'w')  # 'w' is needed; the default mode is read-only
outp.writelines(inp)
inp.close()
outp.close()
The above code assumes UTF-16LE encoding; change it to the correct one
if that is not right. A list of supported encodings is here:
http://docs.python.org/library/codecs.html#id3
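
If the encoding is not known, one option is to look for a byte order mark
(BOM) at the start of the file instead of guessing. The sketch below
(Python 3) is only a heuristic: it assumes the file actually begins with a
BOM, which many files do not, and the function name is made up for this
example. It returns a codec name that consumes the BOM when decoding.

```python
import codecs

def guess_encoding_from_bom(path):
    """Return a codec name based on the file's BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None  # no BOM found; the encoding has to be determined another way
```

Note that the 'utf-16' codec reads the BOM to pick the byte order and drops
it from the decoded text, while 'utf-16le' leaves it in as a U+FEFF character
at the start of the first line.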
Kent