requests.Session() how do you set 'replace' on the encoding?
dieter
dieter at handshake.de
Tue Jul 7 01:38:19 EDT 2015
Veek M <vek.m1234 at gmail.com> writes:
> dieter wrote:
>
>> Veek M <vek.m1234 at gmail.com> writes:
>>> UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in
>>> position 8: illegal multibyte sequence
>>
>> You give us very little context.
>
> It's a longish chunk of code: basically, I'm trying to download using the
> 'requests.Session' module, and that should give me Unicode once it's told
> what encoding is being used ('gbk').
>
> def get_page(s, url):
>     print(url)
>     r = s.get(url, headers = {
>         'User-Agent' : user_agent,
>         'Keep-Alive' : '3600',
>         'Connection' : 'keep-alive',
>     })
>     s.encoding='gbk'
It looks strange that you can set "s.encoding" after you have
called "s.get"; but as you apparently get an error related to
the "gbk" encoding, it seems to work.
>     text = r.text
>     return text
>
> # Open output file
> fh=codecs.open('/tmp/out', 'wb')
> fh.write(header)
>
> # Download
> s = requests.Session()
> ------------
>
> If 'text' is NOT proper Unicode because the server introduced some junk,
> then when I do anchor.getparent() on my 'text' it'll traceback,
> ergo the question: how do I set a replacement char within 'requests'?
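For comparison, here is a sketch of the quoted get_page with the decoding done explicitly (Python 3). It assumes "s" behaves like a "requests.Session": "s.get" returns a Response whose "content" attribute holds the undecoded body as bytes. In requests, the "encoding" attribute that "r.text" consults belongs to that Response, not to the Session. The extra "user_agent" and "encoding" parameters are additions for illustration, and FakeSession/FakeResponse are hypothetical stand-ins so the sketch runs without a network.

```python
def get_page(s, url, user_agent, encoding="gbk"):
    """Fetch `url` with session `s` and decode the body ourselves,
    replacing byte sequences that are not valid in `encoding`."""
    print(url)
    r = s.get(url, headers={
        "User-Agent": user_agent,
        "Keep-Alive": "3600",
        "Connection": "keep-alive",
    })
    # r.content is the raw body (bytes); decoding it here lets us choose
    # the error handler instead of depending on how r.text decodes.
    return r.content.decode(encoding, errors="replace")


# Hypothetical stand-ins for requests.Session / Response (network-free demo).
class FakeResponse:
    def __init__(self, content):
        self.content = content


class FakeSession:
    def get(self, url, headers=None):
        # Valid GBK for "中文" followed by a byte that is never valid GBK.
        return FakeResponse("中文".encode("gbk") + b"\xff")


text = get_page(FakeSession(), "http://example.com/", "my-agent/1.0")
```

With a real session you would pass "requests.Session()" instead of FakeSession(); the invalid trailing byte comes back as U+FFFD rather than raising.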
I see the following options for you:
* you look at the code (of "requests.Session"),
determine where "s.encoding" is taken care of, and
check whether it also supports a replacement strategy there.
Then you use this knowledge to set up your replacement.
* you avoid the "unicode" translating functionality of
"requests.Session". If it does not directly support this,
you can trick it by using the "iso-8859-1" encoding (which maps
bytes one-to-one onto the first 256 Unicode code points)
and then do the Unicode handling in your own code, with
facilities you already know of (including replacement).
* you contact the website administrator and ask him why
the delivered pages do not contain valid "gbk" content.
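A minimal sketch of the second option, written for Python 3 with only the standard library so it runs without a network. Decoding as ISO-8859-1 never fails and maps each byte to exactly one code point, so encoding the resulting string back to ISO-8859-1 recovers the original bytes, which can then be decoded as GBK with errors='replace'. With requests, the ISO-8859-1 string would come from "r.text" after setting "r.encoding = 'iso-8859-1'" on the Response returned by "s.get"; here a hard-coded byte string (an assumption for illustration) stands in for the page.

```python
def redecode(latin1_text, encoding="gbk"):
    """Recover the original bytes from an ISO-8859-1 decoded string and
    decode them again with `encoding`, replacing invalid sequences."""
    # ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so this round trip is lossless.
    raw = latin1_text.encode("iso-8859-1")
    # Invalid sequences become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")


# Hypothetical page body: valid GBK for "中文" plus a stray 0xFF byte.
page_bytes = "中文".encode("gbk") + b"\xff"

# Stands in for r.text when r.encoding has been set to 'iso-8859-1'.
latin1_text = page_bytes.decode("iso-8859-1")

result = redecode(latin1_text)
```

Using errors="ignore" instead would silently drop the invalid bytes rather than marking them with U+FFFD.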