[Tutor] Decoding from strange symbols
Peter Otten
__peter__ at web.de
Wed Jan 19 14:53:43 CET 2011
Oleg Oltar wrote:
> I am trying to decode a string I took from file:
>
> file = open ("./Downloads/lamp-post.csv", 'r')
> data = file.readlines()
> data[0]
>
> '\xff\xfeK\x00e\x00y\x00w\x00o\x00r\x00d\x00\t\x00C\x00o\x00m\x00p\x00e\x00t\x00i\x00t\x00i\x00o\x00n\x00\t\x00G\x00l\x00o\x00b\x00a\x00l\x00
> How do I convert this to something human readable?
If you stare at it long enough you'll see the usual ASCII characters
interspersed with zero bytes, shown by Python as "\x00". The leading
"\xff\xfe" is a byte order mark.
This is a UTF-16 file. Open it with
import codecs

filename = "./Downloads/lamp-post.csv"
with codecs.open(filename, "r", encoding="utf-16") as file:
    for line in file:
        print line
Note that 'line' will now contain a unicode string instead of a byte string.
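To see what the codec does without touching the file, here is a small sketch
that decodes the bytes from your data[0] in memory. The byte string is copied
from your output, cut off after "Global" where the quoted line ends; the names
'raw', 'text' and 'fields' are mine. It should run under Python 2.6+ as well as
Python 3.

```python
import codecs

# First bytes of data[0] as shown in the question, cut off after "Global".
raw = b'\xff\xfeK\x00e\x00y\x00w\x00o\x00r\x00d\x00\t\x00C\x00o\x00m\x00p\x00e\x00t\x00i\x00t\x00i\x00o\x00n\x00\t\x00G\x00l\x00o\x00b\x00a\x00l\x00'

# The leading '\xff\xfe' is the UTF-16 little-endian byte order mark (BOM).
assert raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the BOM to pick the byte order and then drops it.
text = raw.decode("utf-16")

# The file is tab-separated, so splitting on tabs gives the column headers.
fields = text.split(u"\t")
print(fields)
```

The zero bytes are gone after decoding because each character was stored as
two bytes, with the high byte being zero for plain ASCII.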
If you want to write that to a file you have to encode it manually:

line = u"äöü"
with open("tmp.txt", "w") as f:
    # turn the unicode string into UTF-8 bytes before writing
    f.write(line.encode("utf-8"))

or use codecs.open() again:

with codecs.open("tmp.txt", "w", encoding="utf-8") as f:
    f.write(line)
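
If you want to check that the round trip works, something like the following
sketch may help. Note that I'm using io.open() here instead of codecs.open();
it also takes an encoding argument and exists in Python 2.6+ and Python 3. The
scratch file location and the names 'path', 'raw' and 'decoded' are made up for
the example.

```python
import io
import os
import tempfile

# u"äöü" written with escapes so the snippet needs no source encoding line.
line = u"\xe4\xf6\xfc"

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

# Write: io.open() encodes the unicode string to UTF-8 for us.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(line)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

# Read again, this time decoding, to get the unicode string back.
with io.open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

print(repr(raw))
print(decoded == line)
```

Each of the three characters takes two bytes in UTF-8, so the file is six
bytes long even though the string has only three characters.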