[Tutor] Encoding question

Wed Sep 9 12:59:41 CEST 2009

On Wed, Sep 9, 2009 at 5:06 AM, Oleg Oltar <oltarasenko at gmail.com> wrote:
> Hi!
>
> One of my tests returned the following text:
>
> The test:
> from django.test.client import Client
> c = Client()
> resp = c.get("/")
> resp.content
>
> In [25]: resp.content
> Out[25]: '\r\n\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
> Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\r\n\r\n<html
> xmlns="http://www.w3.org/1999/xhtml">\r\n  <head>\r\n    <meta
> http-equiv="content-type" content="text/html; charset=utf-8" />\r\n
> \r\n    \n<title>Japanese innovation |
> \xd0\xaf\xd0\xbf\xd0\xbe\xd0\xbd\xd0\xb8\xd1\x8f
> \xd0\xb8\xd0\xbd\xd0\xbd\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x86\xd0\xb8\xd0\xb8</title>\n\r\n
<snip>
> Is there a way I can convert it to normal readable text? (For example, I need
> to find a string of text in this response to check whether my test case passed
> or failed.)

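resp.content.decode('string_escape') will convert it to encoded bytes if
the response really contains literal backslash escapes. What the
interpreter displays above looks like the repr of a byte string, though,
so the content is probably already encoded bytes and that step can be
skipped. Then a decode() with the correct encoding will get you Unicode.
The page's meta tag declares charset=utf-8 and the \xd0\xaf... sequences
are consistent with UTF-8-encoded Cyrillic, so 'utf-8' is the most likely
encoding for that decode(); 'utf_16_le' and 'utf_16_be' are much less
likely.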
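
A minimal sketch of the decoding step, assuming the response body is
UTF-8. The names content, needle and escaped are hypothetical stand-ins
(content is a fragment of the bytes quoted above, not a real Django
response). The 'string_escape' codec exists only in Python 2, so the
second case uses 'unicode_escape' plus a latin-1 round trip, which is
one way to get a similar result on both Python 2 and 3.

```python
# Hypothetical stand-in for resp.content: part of the byte string quoted above.
content = (
    b'<title>Japanese innovation | '
    b'\xd0\xaf\xd0\xbf\xd0\xbe\xd0\xbd\xd0\xb8\xd1\x8f '
    b'\xd0\xb8\xd0\xbd\xd0\xbd\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x86\xd0\xb8\xd0\xb8'
    b'</title>'
)

# Case 1: the bytes are already UTF-8, so a single decode() is enough.
text = content.decode('utf-8')

# The string the test looks for, written with \u escapes so the example
# does not depend on the encoding of the source file itself.
needle = u'\u042f\u043f\u043e\u043d\u0438\u044f'
found = needle in text

# Case 2: the data really holds literal backslash escapes (the characters
# '\', 'x', 'd', '0', ...). Turn the escapes back into raw bytes first,
# then decode those bytes as UTF-8.
escaped = b'\\xd0\\xaf\\xd0\\xbf'
unescaped = escaped.decode('unicode_escape').encode('latin-1')
text2 = unescaped.decode('utf-8')
```

In a test, the check would then be an assertion on the decoded text,
for example assert needle in text.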

Kent