[Tutor] Decoding from strange symbols

Peter Otten __peter__ at web.de
Wed Jan 19 14:53:43 CET 2011


Oleg Oltar wrote:

> I am trying to decode a string I took from file:
> 
> file = open ("./Downloads/lamp-post.csv", 'r')
> data = file.readlines()
> data[0]
> 
> 
'\xff\xfeK\x00e\x00y\x00w\x00o\x00r\x00d\x00\t\x00C\x00o\x00m\x00p\x00e\x00t\x00i\x00t\x00i\x00o\x00n\x00\t\x00G\x00l\x00o\x00b\x00a\x00l\x00

> How do I convert this to something human readable?

If you stare at it long enough you'll see the usual ASCII characters 
interspersed with zero bytes, which Python shows as "\x00". The leading 
"\xff\xfe" is a byte order mark.

This is a UTF-16 file (little-endian, judging by that mark). Open it with

import codecs
filename = "./Downloads/lamp-post.csv"
# codecs.open() decodes while reading, so each line arrives as unicode
with codecs.open(filename, "r", encoding="utf-16") as f:
    for line in f:
        print(line)
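
The UTF-16 guess can be checked by hand on the first few bytes of the line 
quoted above. This is only a small sketch: 'raw' is a prefix I copied from 
that output, and the names 'raw', 'has_le_bom' and 'text' are made up for 
the example.

```python
import codecs

# First bytes of data[0] as quoted above (a short prefix, for illustration).
raw = b"\xff\xfeK\x00e\x00y\x00w\x00o\x00r\x00d\x00\t\x00"

# The two leading bytes match the UTF-16 little-endian byte order mark.
has_le_bom = raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the mark, picks the byte order, and drops the mark.
text = raw.decode("utf-16")

print(has_le_bom)
print(repr(text))
```

Each ASCII character is followed by a zero byte because UTF-16 stores it 
in two bytes, low byte first in the little-endian variant.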

Note that 'line' will now contain a unicode string instead of a byte string. 
If you want to write that to a file you have to encode it manually:

line = u"äöü"
# binary mode, because encode() returns a byte string
with open("tmp.txt", "wb") as f:
    f.write(line.encode("utf-8"))

or use codecs.open() again:

with codecs.open("tmp.txt", "w", encoding="utf-8") as f:
    f.write(line)
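
If you want to convince yourself that the two ways of writing are 
equivalent, something like the following should do. It is a rough check, 
not part of the recipe: the temporary directory and the file names 
"manual.txt" and "codecs.txt" are invented for the example, and the string 
is spelled with escapes so the snippet does not depend on the encoding of 
the source file.

```python
import codecs
import os
import tempfile

text = u"\u00e4\u00f6\u00fc"  # the same three characters as u"äöü"

# Hypothetical scratch location so no existing file is overwritten.
tmpdir = tempfile.mkdtemp()
manual_path = os.path.join(tmpdir, "manual.txt")
codecs_path = os.path.join(tmpdir, "codecs.txt")

# Variant 1: encode by hand and write the resulting bytes.
with open(manual_path, "wb") as f:
    f.write(text.encode("utf-8"))

# Variant 2: let codecs.open() do the encoding on write.
with codecs.open(codecs_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read both files back as raw bytes and compare them.
with open(manual_path, "rb") as f:
    manual_bytes = f.read()
with open(codecs_path, "rb") as f:
    codecs_bytes = f.read()

same = manual_bytes == codecs_bytes
print(same)
```

Both files should end up with the same six bytes, two per character in 
UTF-8.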


