[Tutor] trying to convert pycurl/html to ascii

Cameron Simpson cs at zip.com.au
Mon Mar 30 04:06:44 CEST 2015


On 29Mar2015 21:49, bruce <badouglas at gmail.com> wrote:
>Doing a quick/basic pycurl test on a site and trying to convert the
>returned page to pure ascii.

And what if the page cannot be represented in ASCII?
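Converting to ASCII forces you to decide what happens to those characters. Here is a minimal Python 3 sketch of the standard error handlers that `str.encode` accepts. The sample text is made up and stands in for an already-decoded page:

```python
# Hypothetical decoded page text containing two non-ASCII characters:
# U+00E9 (e acute) and U+00A0 (no-break space).
text = "caf\u00e9\u00a0menu"

# The default "strict" handler refuses anything outside ASCII.
strict_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError:
    strict_failed = True

# The other standard handlers each trade fidelity differently.
dropped = text.encode("ascii", "ignore")               # silently loses characters
replaced = text.encode("ascii", "replace")             # substitutes "?"
escaped = text.encode("ascii", "xmlcharrefreplace")    # HTML-safe numeric entities

print(strict_failed, dropped, replaced, escaped)
```

For HTML, `xmlcharrefreplace` is often the least lossy choice, since a browser renders `&#233;` back as the original character.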

>The page has the encoding line
><meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
>The test uses pycurl, and the StringIO to fetch the page into a str.

Which StringIO? StringIO.StringIO or io.StringIO? In Python 2 the former is 
effectively bytes (a Python 2 str) and the latter is unicode (as it is in 
Python 3).
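To make the bytes/text split concrete, here is a small sketch in Python 3, where the old StringIO module is gone and io.BytesIO is the bytes-oriented buffer. The sample body is made up and assumed to be ISO-8859-1 encoded:

```python
import io

# Made-up raw page body, as a byte-oriented fetcher would deliver it.
raw = b"caf\xe9\xa0menu"

# io.BytesIO stores bytes unchanged; nothing is decoded yet.
byte_buf = io.BytesIO()
byte_buf.write(raw)
body = byte_buf.getvalue()

# io.StringIO accepts only text (str), so handing it bytes fails immediately.
text_buf = io.StringIO()
accepted_bytes = True
try:
    text_buf.write(raw)
except TypeError:
    accepted_bytes = False

print(type(body).__name__, accepted_bytes)  # bytes False
```

Collecting into a bytes buffer and decoding once, with the right charset, keeps the decoding step explicit and in one place.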

>pycurl stuff
>foo=gg.getBuffer()
>-at this point, foo has the page in a str buffer.
>What's happening, is that the test is getting the following kind of error/
>UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
>invalid start byte
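That particular error is consistent with the declared charset. In UTF-8, 0xa0 is a continuation byte and cannot start a character, but in ISO-8859-1 it is simply a no-break space (U+00A0), which is common in HTML. A short Python 3 sketch with a made-up sample:

```python
# Made-up ISO-8859-1 bytes containing 0xa0 (a no-break space in Latin-1).
sample = b"Price:\xa0100"

# Decoding as UTF-8 fails, just as in the reported traceback.
utf8_ok = True
reason = None
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_ok = False
    reason = exc.reason

# Decoding with the charset the page actually declares succeeds.
text = sample.decode("iso-8859-1")

print(utf8_ok, reason)  # False invalid start byte
```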

Please show us more of the code, preferably a complete example, as small as 
possible, that reproduces the exception. We have no idea what "gg" is or how 
it was obtained.

>The test is using python 2.6 on redhat.
>I've tried different decode functions based on different
>sites/articles/stackoverflow but can't quite seem to resolve the issue.

Flailing about on stackoverflow sounds a bit random.

Have you consulted the PycURL documentation, especially this page:

  http://pycurl.sourceforge.net/doc/unicode.html

which looks like it ought to discuss your problem.
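The general idea is that pycurl delivers the response as raw bytes and leaves decoding to you, ideally with the charset the server or page declares. A minimal Python 3 sketch of that idea, with the network fetch left out so it runs on its own. In real use the header lines would be collected through pycurl's HEADERFUNCTION option and the body through WRITEDATA into an io.BytesIO. `charset_from_headers` is a hypothetical helper, and falling back to ISO-8859-1 is an assumption that happens to match the meta tag quoted above:

```python
import re


def charset_from_headers(header_lines, default="iso-8859-1"):
    """Return the charset named in a Content-Type header line, else `default`.

    Hypothetical helper: `header_lines` is assumed to be a list of raw
    header lines (bytes), one per line, as a header callback might collect.
    """
    for raw in header_lines:
        # ISO-8859-1 maps every byte value, so decoding a header never fails.
        line = raw.decode("iso-8859-1")
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "content-type":
            continue
        match = re.search(r'charset="?([\w.:-]+)', value, re.IGNORECASE)
        if match:
            return match.group(1).lower()
    return default


# Made-up header lines and body standing in for a real fetch.
headers = [
    b"HTTP/1.1 200 OK\r\n",
    b"Content-Type: text/html; charset=ISO-8859-1\r\n",
    b"\r\n",
]
body = b"caf\xe9\xa0menu"

text = body.decode(charset_from_headers(headers))
```

If the header carries no charset, the page's own meta tag is the next place to look; this sketch does not handle that case.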

Cheers,
Cameron Simpson <cs at zip.com.au>

