changing string encoding to different charset?
philip at semanchuk.com
Sun Dec 14 16:07:36 CET 2008
On Dec 14, 2008, at 9:21 AM, Daniel Woodhouse wrote:
> Is it possible to re-encode a string to a different character set in
> python? To be more specific, I want to change a text file encoded in
> windows-1251 to UTF-8.
> I've tried using string.encode, but get the error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position
> ordinal not in range(128)
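Without seeing your code I can't be sure, but I suspect you need to
decode the file to Unicode first, and only then encode it as UTF-8.
Calling .encode() on a byte string makes Python 2 implicitly decode it
with the default 'ascii' codec, and that is the step that fails on 0xce.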
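# Untested --
s = open("in.txt", "rb").read()  # raw windows-1251 bytes
s = s.decode("cp1251")           # bytes -> Unicode ("windows-1251" is an accepted alias)
s = s.encode("utf-8")            # Unicode -> UTF-8 bytes
open("out.txt", "wb").write(s)   # save the re-encoded bytes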
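On Python 3, file() is gone and text-mode open() accepts an encoding argument, so the same decode-then-encode conversion can be streamed without holding the whole file in memory. This is a minimal sketch that assumes the input really is windows-1251. The function name convert_file and the temporary file names are hypothetical, chosen only for illustration.

```python
import os
import shutil
import tempfile


def convert_file(src_path, dst_path, src_encoding="cp1251", dst_encoding="utf-8"):
    """Re-encode a text file (hypothetical helper, Python 3).

    Reading in text mode with src_encoding performs the decode step;
    writing in text mode with dst_encoding performs the encode step.
    newline="" disables newline translation, so CRLF endings are preserved.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Copies the decoded text in chunks rather than all at once.
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "wb") as f:
            # "Привет" followed by CRLF, encoded as windows-1251
            f.write(b"\xcf\xf0\xe8\xe2\xe5\xf2\r\n")
        convert_file(src, dst)
        with open(dst, "rb") as f:
            print(f.read() == "Привет\r\n".encode("utf-8"))  # True
```

With the default strict error handling, a byte that the source codec cannot map raises UnicodeDecodeError instead of being silently replaced. That is usually what you want when you aren't sure of the input encoding.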