convert string with raw binary data to unicode

Neil Hodgson nhodgson at bigpond.net.au
Thu Feb 12 15:38:24 EST 2004


Achim Domma:

> data = codecs.open('path_to_file','rb','???').read()
>
> I tried to use UCS2 for the ???, but this encoding does not exist. A posting
> found via Google suggests using UTF-16, but that is not the same and raises
> an error.

   When posting a query to a newsgroup, it is better to include the actual
error message. You may want to look at the 'errors' argument of codecs.open,
which can be one of:

'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character (U+FFFD when decoding).
'xmlcharrefreplace' Replace with the appropriate XML character reference (encoding only).
'backslashreplace' Replace with backslashed escape sequences (decoding support arrived in Python 3.5).
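A minimal Python 3 sketch of how these handlers differ when decoding. It assumes a hypothetical five-byte input: "Hi" encoded as UTF-16-LE followed by one stray byte, so the data is not a whole number of 2-byte code units. The helper name decode_with is illustrative only. The same errors value can be passed to codecs.open (as in the question) or to the built-in open() in Python 3; 'xmlcharrefreplace' is left out because it only applies when encoding.

```python
# Hypothetical sample: "Hi" in UTF-16-LE plus one stray byte (0x41), giving an
# odd length that cannot be split into 2-byte code units.
data = "Hi".encode("utf-16-le") + b"\x41"


def decode_with(raw: bytes, handler: str) -> str:
    """Decode raw as UTF-16-LE using the named error handler."""
    return raw.decode("utf-16-le", errors=handler)


# 'strict' (the default) raises UnicodeDecodeError, a subclass of ValueError.
try:
    decode_with(data, "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# The lenient handlers keep going past the stray byte.
for handler in ("ignore", "replace", "backslashreplace"):
    print(f"{handler}:", ascii(decode_with(data, handler)))
```

Here 'ignore' silently drops the stray byte, 'replace' substitutes U+FFFD, and 'backslashreplace' leaves a visible \x41 escape, which makes it the most useful of the three for finding out where a file goes wrong.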

   Try decoding with, say, 'backslashreplace' and look at the result: you may
find that much of your file is not UTF-16 at all, that it is byte-swapped
(big-endian rather than little-endian, or vice versa), or that there are
just a few bad characters in a header or similar.
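One way to act on that advice is a rough Python 3 sketch (not from the original post) that guesses the byte order before decoding. It trusts a byte order mark when one is present; otherwise it assumes mostly Latin-1 text, where every 2-byte code unit has a zero high byte, and checks whether those zeros sit at odd offsets (little-endian) or even offsets (big-endian). The function names guess_utf16_codec and inspect_utf16 are hypothetical, and the heuristic will mislead on text that is mostly outside the Latin-1 range.

```python
import codecs


def guess_utf16_codec(data: bytes) -> str:
    """Heuristic guess at the UTF-16 variant of data (illustrative only)."""
    # A byte order mark settles it: the plain 'utf-16' codec reads the BOM
    # and strips it from the decoded text.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Assumption: mostly Latin-1 text, so each code unit has a zero high byte.
    # Little-endian puts that zero at odd offsets, big-endian at even offsets.
    zeros_at_even = sum(1 for b in data[0::2] if b == 0)
    zeros_at_odd = sum(1 for b in data[1::2] if b == 0)
    return "utf-16-le" if zeros_at_odd >= zeros_at_even else "utf-16-be"


def inspect_utf16(data: bytes) -> str:
    """Decode leniently so bad bytes show up as backslash escapes, not exceptions."""
    return data.decode(guess_utf16_codec(data), errors="backslashreplace")


if __name__ == "__main__":
    sample = "Hello".encode("utf-16-be")
    print(guess_utf16_codec(sample), ascii(inspect_utf16(sample)))
```

Decoding with the wrong byte order often does not raise at all; it tends to yield runs of CJK-looking characters instead, which is why checking the result by eye is worthwhile.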

   Neil
