requests.Session() how do you set 'replace' on the encoding?
dieter at handshake.de
Fri Jul 3 07:59:24 CEST 2015
Veek M <vek.m1234 at gmail.com> writes:
> I'm getting a Unicode error:
> Traceback (most recent call last):
> File "fooxxx.py", line 56, in <module>
> parent = anchor.getparent()
> UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position
> 8: illegal multibyte sequence
You give us very little context.
Using "getparent" seems to indicate that you are working with
hierarchies, likely some XML processing. In this case,
the XML document likely specified "gbk" as its document encoding
(otherwise, you would get the default "utf-8") -- and that declaration
is wrong (which should not happen).
In general: when you need control over encoding handling because
an encoding causes problems deep inside a framework (as apparently in
your case), you can usually take the plain text first,
fix any encoding problems, and only then pass the fixed text to
the framework.
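As a rough illustration of "fix the text first", here is a minimal
sketch. The helper name "make_encodable" is made up for this example, and
it assumes the target encoding is "gbk" as in your traceback. It round-trips
the text through the codec with errors="replace", so every character the
codec cannot represent becomes "?":

```python
def make_encodable(text, encoding="gbk"):
    """Return a copy of text containing only characters `encoding` can represent.

    Hypothetical helper for illustration: characters the codec cannot
    encode (such as U+00A0 in the traceback above) are replaced by "?".
    """
    # encode(..., errors="replace") substitutes "?" for unencodable characters;
    # decoding the result again gives back an ordinary text string.
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    fixed = make_encodable("price:\xa0100")
    print(fixed)
```

If a "?" is not what you want for the no-break space, you could call
text.replace("\xa0", " ") before the round trip instead.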
> I'm doing:
> s = requests.Session()
> to suck data in, so.. how do i 'replace' chars that fit gbk
It does not seem that the problem occurs inside the "requests" module.
Thus, you have a chance to "intercept" the downloaded text
and fix encoding problems.
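One way to do that interception might look like the sketch below. The
function name "fetch_text" and the "fallback_encoding" parameter are
assumptions of mine; the only things it relies on from "requests" are
"session.get(url)", the raw bytes in "response.content" and the declared
encoding in "response.encoding". It decodes the bytes itself, so that bad
bytes are replaced rather than raising an error:

```python
def fetch_text(session, url, fallback_encoding="utf-8"):
    """Download url and decode the raw bytes ourselves, replacing bad bytes.

    `session` is expected to behave like requests.Session(): session.get(url)
    returns an object with `.content` (bytes) and `.encoding` (str or None).
    """
    response = session.get(url)
    # Use the encoding the server declared, or a fallback if it declared none.
    encoding = response.encoding or fallback_encoding
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return response.content.decode(encoding, errors="replace")
```

With the real library you would call it as
fetch_text(requests.Session(), url) and hand the returned text to your
XML/HTML parser only after you have cleaned it up.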