[Tutor] trying to convert pycurl/html to ascii
Cameron Simpson
cs at zip.com.au
Mon Mar 30 04:06:44 CEST 2015
On 29Mar2015 21:49, bruce <badouglas at gmail.com> wrote:
>Doing a quick/basic pycurl test on a site and trying to convert the
>returned page to pure ascii.
And if the page cannot be represented in ASCII?
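To make that question concrete: once the page has been decoded to unicode, encode() needs an error handler for characters that have no ASCII equivalent. Below is a minimal sketch comparing the standard "strict", "replace", "ignore" and "xmlcharrefreplace" handlers. The sample string and the to_ascii helper are hypothetical, for illustration only.

```python
# Hypothetical sample: "cafe" with an e-acute, a no-break space (U+00A0), then "menu".
text = u"caf\u00e9\u00a0menu"

def to_ascii(s, errors="strict"):
    """Encode a unicode string to ASCII bytes using the given error handler."""
    return s.encode("ascii", errors)

replaced = to_ascii(text, "replace")           # b"caf??menu": each unencodable char -> "?"
dropped = to_ascii(text, "ignore")             # b"cafmenu": unencodable chars removed
escaped = to_ascii(text, "xmlcharrefreplace")  # b"caf&#233;&#160;menu": numeric references

try:
    to_ascii(text)  # the default "strict" handler raises instead of guessing
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Which handler is appropriate depends on whether losing characters is acceptable; "xmlcharrefreplace" keeps the information, at the cost of HTML entities in the output.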
>The page has the encoding line
><meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
>The test uses pycurl, and the StringIO to fetch the page into a str.
Which StringIO? StringIO.StringIO or io.StringIO? In Python 2 the former
effectively holds bytes (Python 2 str) and the latter holds unicode (as
io.StringIO also does in Python 3).
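For illustration, here is a minimal sketch that uses only the io module. That module is available since Python 2.6, so the sketch runs on both 2.6 and 3.x. The Python 2-only StringIO.StringIO class is not shown. io.BytesIO stores bytes, while io.StringIO stores text and rejects bytes with a TypeError. The sample bytes are hypothetical.

```python
import io

raw = b"caf\xe9"  # hypothetical ISO-8859-1 encoded bytes for u"caf\u00e9"

# io.BytesIO holds bytes (Python 2 str, Python 3 bytes): a natural target for raw page data.
byte_buf = io.BytesIO()
byte_buf.write(raw)

# io.StringIO holds text (Python 2 unicode, Python 3 str) and refuses raw bytes.
text_buf = io.StringIO()
try:
    text_buf.write(raw)
    accepts_bytes = True
except TypeError:
    accepts_bytes = False

# Decoding first, with the page's declared charset, makes the write legal.
text_buf.write(raw.decode("iso-8859-1"))
```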
>pycurl stuff
>foo=gg.getBuffer()
>-at this point, foo has the page in a str buffer.
>What's happening, is that the test is getting the following kind of error/
>UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
>invalid start byte
Please show us more of the code, preferably a complete example as small as
possible that reproduces the exception. We have no idea what "gg" is or how it
was obtained.
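The error itself is consistent with ISO-8859-1 bytes being decoded as UTF-8. In ISO-8859-1, byte 0xa0 is a no-break space. In UTF-8, however, 0xa0 (binary 10100000) can only be a continuation byte, so a decoder that meets it at the start of a character rejects it as an invalid start byte. Here is a minimal sketch with hypothetical sample bytes:

```python
raw = b"Price:\xa0100"  # hypothetical page bytes; 0xa0 is a no-break space in ISO-8859-1

try:
    raw.decode("utf-8")
    utf8_ok = True
    error_position = None
except UnicodeDecodeError as exc:
    utf8_ok = False
    error_position = exc.start  # index of the offending byte (6 here)

# Decoding with the charset the <meta> tag declares succeeds.
text = raw.decode("iso-8859-1")
```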
>The test is using python 2.6 on redhat.
>I've tried different decode functions based on different
>sites/articles/stackoverflow but can't quite seem to resolve the issue.
Flailing about on stackoverflow sounds a bit random.
Have you consulted the PycURL documentation, especially this page:
http://pycurl.sourceforge.net/doc/unicode.html
which looks like it ought to discuss your problem.
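pycurl delivers the response body as bytes and leaves decoding to the caller. Below is a minimal sketch of collecting the bytes and then decoding them explicitly. It rests on several assumptions:

* The helper names find_charset and fetch_text are hypothetical.
* The charset is taken from the HTTP Content-Type header if present, otherwise from a charset declaration near the top of the page, and otherwise defaults to ISO-8859-1 (the HTTP/1.1 default for text/* in RFC 2616).
* The regular expression is a rough heuristic, not an HTML parser.

```python
import io
import re

# Rough heuristic: matches "charset=..." in a Content-Type value or an HTML <meta> tag.
CHARSET_RE = re.compile(r'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def find_charset(content_type, body, default="iso-8859-1"):
    """Guess the page encoding: HTTP header first, then a <meta> tag, then a default."""
    if content_type:
        match = CHARSET_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # The <meta> declaration is ASCII, so reading the head of the page as latin-1 is safe.
    match = CHARSET_RE.search(body[:2048].decode("iso-8859-1"))
    if match:
        return match.group(1).lower()
    return default

def fetch_text(url):
    """Fetch url with pycurl and return the body decoded to unicode."""
    import pycurl  # imported here so find_charset can be used without pycurl installed

    buf = io.BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect raw bytes
    curl.perform()
    content_type = curl.getinfo(pycurl.CONTENT_TYPE)  # may be None
    curl.close()
    body = buf.getvalue()
    # "replace" turns undecodable bytes into U+FFFD instead of raising.
    return body.decode(find_charset(content_type, body), "replace")
```

Passing "replace" means a wrong guess degrades to U+FFFD characters rather than an exception. Drop it if you would rather fail loudly. Converting the decoded text to ASCII is then a separate encode() step.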
Cheers,
Cameron Simpson <cs at zip.com.au>