Convert unicode escape sequences to unicode in a file

Alex Willmer alex at moreati.org.uk
Tue Jan 11 17:36:26 EST 2011


On Jan 11, 8:53 pm, Jeremy <jlcon... at gmail.com> wrote:
> I have a file that has unicode escape sequences, i.e.,
>
> J\u00e9r\u00f4me
>
> and I want to replace all of them in a file and write the results to a new file.  The simple script I've created is copied below.  However, I am getting the following error:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 947: ordinal not in range(128)
>
> It appears that the data isn't being converted when writing to the file.  Can someone please help?

Are you _sure_ that your file contains the characters '\', 'u', '0',
'0', 'e' and '9'? I suspect your file actually contains a single byte
with value 0xe9, and that you inspected it with Python, which printed
that byte as an escape sequence. Open the file in a text editor or hex
editor and look at the bytes around offset 947 to be sure. (The
position in the error message counts characters in the string being
encoded, so the matching byte offset in the file may differ somewhat.)
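To check without any editor, a short Python 3 sketch can read the file as raw bytes, so nothing is decoded or escaped on the way in, and report which case applies. The helper names describe and bytes_around are made up for illustration, and the demo writes two sample files of its own rather than touching yours.

```python
import os
import re
import tempfile

# Matches the six ASCII characters of a literal escape such as \u00e9.
ESCAPE = re.compile(rb'\\u[0-9a-fA-F]{4}')


def describe(path):
    """Hypothetical helper: (has literal \\uXXXX escapes, has bytes above 0x7F)."""
    with open(path, 'rb') as f:   # binary mode: no decoding happens here
        data = f.read()
    has_escapes = ESCAPE.search(data) is not None
    has_high_bytes = any(b > 0x7F for b in data)
    return has_escapes, has_high_bytes


def bytes_around(path, offset, width=8):
    """Hypothetical helper: the raw bytes within `width` of `offset`."""
    with open(path, 'rb') as f:
        data = f.read()
    return data[max(0, offset - width):offset + width]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        escaped = os.path.join(d, 'escaped.txt')
        raw = os.path.join(d, 'raw.txt')
        with open(escaped, 'wb') as f:
            f.write(b'J\\u00e9r\\u00f4me\n')  # backslash, u, 0, 0, e, 9, ...
        with open(raw, 'wb') as f:
            f.write(b'J\xe9r\xf4me\n')        # single bytes 0xE9 and 0xF4
        print(describe(escaped))          # (True, False)
        print(describe(raw))              # (False, True)
        print(bytes_around(raw, 1, 2))    # b'J\xe9r'
```

On your own file you would call describe() on its path and bytes_around() with an offset near 947.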

If so, replace 'unicode-escape' in your codecs.open() call with the
actual encoding of the file (for example 'latin-1', in which 0xe9 is
'é').
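The codec matters because the same name can be stored on disk in several ways. In this Python 3 sketch (the byte strings are illustrative samples, not taken from your file), each sample decodes correctly only with its matching codec. One subtlety: 'unicode_escape' treats every byte that is not part of an escape as Latin-1, so it happens to give the right answer for the Latin-1 sample too; the UTF-8 sample is where it visibly fails. ascii() is used for printing so the demo itself cannot trip over a console encoding.

```python
# Three ways the name Jerome-with-accents might be stored (illustrative samples).
samples = {
    'literal escapes': b'J\\u00e9r\\u00f4me',   # ASCII: backslash, u, 0, 0, e, 9, ...
    'latin-1 bytes': b'J\xe9r\xf4me',           # one byte per accented letter
    'utf-8 bytes': b'J\xc3\xa9r\xc3\xb4me',     # two bytes per accented letter
}

# The codec that matches each way of storing the text.
matching_codec = {
    'literal escapes': 'unicode_escape',
    'latin-1 bytes': 'latin-1',
    'utf-8 bytes': 'utf-8',
}

for name, raw in samples.items():
    right = raw.decode(matching_codec[name])
    wrong = raw.decode('unicode_escape')
    # ascii() shows non-ASCII code points as \x.. escapes, so printing
    # never depends on the console's encoding.
    print(f'{name:16} {matching_codec[name]:15} {ascii(right):16} '
          f'unicode_escape: {ascii(wrong)}')
```

Every row decodes to 'J\xe9r\xf4me' with its matching codec; only the UTF-8 row's unicode_escape column differs, showing the mojibake 'J\xc3\xa9r\xc3\xb4me'.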

> if __name__ == "__main__":
>     f = codecs.open(filename, 'r', 'unicode-escape')
>     lines = f.readlines()
>     line = ''.join(lines)
>     f.close()
>
>     utFound = re.sub('STRINGDECODE\((.+?)\)', r'\1', line)
>     print(utFound[:1000])
>
>     o = open('newDice.sql', 'w')
>     o.write(utFound.decode('utf-8'))
>     o.close()
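For comparison, here is a sketch of the quoted script rewritten for Python 3. It is not from the original thread: it assumes the input really does hold literal \uXXXX escapes in otherwise ASCII text and that UTF-8 output is wanted, and the function name convert and the demo file names are hypothetical. The key change is to decode once on input and encode once on output. In the quoted Python 2 code, utFound is already a unicode string because codecs.open() decoded it, so utFound.decode('utf-8') makes Python 2 first encode it with the default ASCII codec, which raises exactly this kind of UnicodeEncodeError. Note also that 'unicode_escape' interprets every backslash sequence, not only \uXXXX, so a \n in the file becomes a real newline.

```python
import os
import re
import tempfile

# Same pattern as the quoted script, written as a raw string.
STRINGDECODE = re.compile(r'STRINGDECODE\((.+?)\)')


def convert(src_path, dst_path, in_encoding='unicode_escape', out_encoding='utf-8'):
    """Decode once on input, edit the text, encode once on output."""
    with open(src_path, 'rb') as f:
        text = f.read().decode(in_encoding)  # text is now a str (unicode)
    text = STRINGDECODE.sub(r'\1', text)
    # The output file does the encoding; no .decode() on already-decoded text.
    # newline='' writes line endings exactly as they appear in `text`.
    with open(dst_path, 'w', encoding=out_encoding, newline='') as o:
        o.write(text)
    return text


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, 'dice.sql')      # hypothetical input name
        dst = os.path.join(d, 'newDice.sql')
        with open(src, 'wb') as f:
            f.write(b"INSERT INTO t VALUES (STRINGDECODE('J\\u00e9r\\u00f4me'));\n")
        print(ascii(convert(src, dst)))
        with open(dst, 'rb') as f:
            print(f.read())                    # the UTF-8 bytes actually written
```

If the file turns out to contain raw bytes instead, the same function works with in_encoding set to the file's real encoding, such as 'latin-1'.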




More information about the Python-list mailing list