On Thu, Jan 7, 2016 at 8:20 PM, Emil Stenström wrote:
So the rationale for this idea is:
* http.client doesn't work the way beginners expect for very basic use cases (posting unicode strings)
* Libraries in other languages behave like beginners expect, which magnifies the problem.
* Changing the default latin-1 encoding probably isn't possible, because it would break the spec...
* But catching the exception and trying to encode in utf-8 instead wouldn't break the spec and solves the problem.
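
For illustration only, the fallback described in the last point could be sketched as below. The helper name `encode_body` is hypothetical and is not part of http.client; the sketch only assumes that a str body is first tried as Latin-1, as the list above states.

```python
def encode_body(text):
    """Hypothetical helper (not an http.client API): try Latin-1 first,
    then fall back to UTF-8 if the text cannot be represented."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # Reached only when the text contains characters outside Latin-1.
        return text.encode("utf-8")
```

With this sketch, a body such as "café" goes out as Latin-1 bytes, while a body containing a character outside Latin-1 goes out entirely as UTF-8, so the encoding on the wire depends on the content of the string.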
----
Here's a couple of issues where people expect things to work differently:
https://github.com/kennethreitz/requests/issues/1926
https://github.com/kennethreitz/requests/issues/2838
https://github.com/kennethreitz/requests/issues/1822
----
Does this make sense?
It makes sense, but I disagree with the suggestion. Having "Latin-1 or UTF-8" as the effective default encoding is not a good idea, IMO; sometimes I've *de*coded text using such heuristics (the other order, of course; attempt UTF-8 decode, and if that fails, decode as Latin-1 or possibly CP-1252) as a means of coping with broken systems, but I would much prefer the default to simply be one or the other.

As the 'requests' module is not part of Python's standard library, it would be free to change its own default, regardless of the behaviour of http.client; whether that's a good idea or not is for the requests community to decide (unless there's something specifically binding it to http.client). But whether you're asking for a change in http.client or in requests, I would disagree with the "either-or" approach; change to a UTF-8 default, perhaps, but not to the hybrid.

ChrisA
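
A minimal sketch of the decode-side heuristic mentioned above (attempt UTF-8 first, then fall back to Latin-1). The function name `decode_lenient` is hypothetical, and the choice of Latin-1 rather than CP-1252 as the fallback is an assumption made for the example.

```python
def decode_lenient(data):
    """Hypothetical helper: decode bytes as UTF-8, falling back to Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a character, so this cannot fail.
        return data.decode("latin-1")
```

The order matters here: UTF-8 decoding rejects most byte sequences that are not valid UTF-8, whereas Latin-1 accepts any bytes at all, so trying Latin-1 first would never reach the fallback.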