decoding a byte array that is unicode escaped?

Fri Nov 6 03:59:24 EST 2009

sam wrote:

> I have a byte stream read over the internet:
> 
> responseByteStream = urllib.request.urlopen( httpRequest );
> responseByteArray = responseByteStream.read();
> 
> The characters are encoded with unicode escape sequences, for example
> a copyright symbol appears in the stream as the bytes:
> 
> 5C 75 30 30 61 39
> 
> which translates to:
> \u00a9
> 
> which is unicode for the copyright symbol.
> 
> I am simply trying to display this copyright symbol on a webpage, so
> how do I encode the byte array to utf-8 given that it is 'escape
> encoded' in the above way?  I tried:
> 
> responseByteArray.decode('utf-8')
> and responseByteArray.decode('unicode_escape')
> and str(responseByteArray).
> 
> I am using Python 3.1.

Decode the bytes to a unicode string (str) first:

>>> u = b"\\u00a9".decode("unicode-escape")
>>> u
'©'

Then encode the string to UTF-8 bytes:

>>> u.encode("utf-8")
b'\xc2\xa9'
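
If it helps, the two steps can be wrapped up in one small function. This
is only a sketch: the name unescape_to_utf8 is made up for illustration,
and it assumes the response contains nothing but ASCII plus \uXXXX-style
escapes, since the "unicode_escape" codec treats any other non-ASCII
bytes as Latin-1.

```python
def unescape_to_utf8(raw: bytes) -> bytes:
    """Turn escape-encoded bytes (e.g. b'\\u00a9') into UTF-8 bytes.

    Hypothetical helper. Assumes `raw` is ASCII plus backslash escapes;
    the "unicode_escape" codec reads other non-ASCII bytes as Latin-1.
    """
    # Step 1: bytes -> str, interpreting the backslash escapes.
    text = raw.decode("unicode_escape")
    # Step 2: str -> bytes, using the encoding the web page expects.
    return text.encode("utf-8")


# The six bytes from the question: 5C 75 30 30 61 39
sample = bytes.fromhex("5C 75 30 30 61 39")
print(unescape_to_utf8(sample))
```

If the real response already holds raw UTF-8 text mixed in with the
escapes, this would probably mangle the non-ASCII characters, so check
what the server actually sends before relying on it.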