Interpreting string containing \u000a

Peter Otten __peter__ at
Wed Jun 18 14:21:18 CEST 2008

Francis Girard wrote:

> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
> What is the easiest way to automatically translate these codes into
> unicode characters ?

If the file really contains the escape sequences use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes use "iso-8859-1":

>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'
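The sessions above are Python 2, where byte strings have a .decode() method. As a rough Python 3 sketch of the same two cases (the variable names are made up for illustration), the input has to be a bytes object:

```python
# Python 3 sketch: only bytes objects have .decode() here.
# Variable names are illustrative, not from the original post.

# Case 1: the data holds the literal characters '\', 'u', '0', '0', '0', 'd', ...
escaped = b"Hello\\u000d\\u000aWorld"

# Case 2: the data holds the actual CR and LF bytes.
raw = b"Hello\x0d\x0aWorld"

# "unicode-escape" interprets the \uXXXX sequences.
text_from_escapes = escaped.decode("unicode-escape")

# "iso-8859-1" maps each byte directly to the code point of the same value.
text_from_raw = raw.decode("iso-8859-1")

# Both routes should yield the same text: Hello, CR, LF, World.
print(repr(text_from_escapes))
print(repr(text_from_raw))
```

Bytes that are not part of an escape sequence are read as Latin-1 by "unicode-escape", which happens to match an ISO-8859-1 file like the one in the question.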

Open the file with codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().
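A minimal sketch of reading such a file, assuming Python 3, where the builtin open() accepts an encoding argument itself. The temporary file and its contents are invented for the example. newline="" is passed so that the decoded CR/LF pair is not translated to a single "\n" on reading.

```python
import os
import tempfile

# Create a hypothetical sample file containing literal \u escape sequences.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Hello\\u000d\\u000aWorld")

# Read it back as text, letting the codec interpret the escapes.
# newline="" disables newline translation so "\r\n" survives as-is.
with open(path, encoding="unicode-escape", newline="") as f:
    text = f.read()

os.remove(path)

print(repr(text))
```

If the file contains the raw bytes instead, the same call with encoding="iso-8859-1" should apply.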

