decoding a byte array that is unicode escaped?
Peter Otten
__peter__ at web.de
Fri Nov 6 03:59:24 EST 2009
sam wrote:
> I have a byte stream read over the internet:
>
> responseByteStream = urllib.request.urlopen( httpRequest );
> responseByteArray = responseByteStream.read();
>
> The characters are encoded with unicode escape sequences, for example
> a copyright symbol appears in the stream as the bytes:
>
> 5C 75 30 30 61 39
>
> which translates to:
> \u00a9
>
> which is unicode for the copyright symbol.
>
> I am simply trying to display this copyright symbol on a webpage, so
> how do I encode the byte array to utf-8 given that it is 'escape
> encoded' in the above way? I tried:
>
> responseByteArray.decode('utf-8')
> and responseByteArray.decode('unicode_escape')
> and str(responseByteArray).
>
> I am using Python 3.1.
Decode the bytes to a unicode string first:
>>> u = b"\\u00a9".decode("unicode-escape")
>>> u
'©'
Then encode that string as UTF-8 bytes:
>>> u.encode("utf-8")
b'\xc2\xa9'
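
The two steps can be wrapped in one small helper. This is only a sketch: the function name unescape_to_utf8 and the sample payload are made up for illustration, and it assumes the stream contains nothing but ASCII bytes plus \uXXXX escapes. The unicode_escape codec treats any non-escaped byte as Latin-1, so a stream that already contains raw UTF-8 bytes would come out garbled.

```python
def unescape_to_utf8(raw: bytes) -> bytes:
    """Turn \\uXXXX escape sequences in raw into UTF-8 encoded bytes.

    Hypothetical helper; assumes raw is pure ASCII apart from the escapes.
    """
    # Step 1: bytes -> str, interpreting the backslash escapes.
    text = raw.decode("unicode_escape")
    # Step 2: str -> bytes, in the encoding the web page should use.
    return text.encode("utf-8")


# The six bytes from the question: 5C 75 30 30 61 39
sample = bytes.fromhex("5C7530306139")

# Made-up payload around those bytes, standing in for the real response.
result = unescape_to_utf8(b"Copyright " + sample + b" 2009")
print(result)  # b'Copyright \xc2\xa9 2009'
```

If the page only needs the text and not the encoded bytes, the first step alone is enough, and the encoding can be left to whatever writes the response.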