[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Thu Jan 7 04:49:55 EST 2016

On Thu, Jan 7, 2016 at 8:20 PM, Emil Stenström <em at kth.se> wrote:
>
> So the rationale for this idea is:
>
> * http.client doesn't work the way beginners expect for very basic use cases (posting Unicode strings)
> * Libraries in other languages behave like beginners expect, which magnifies the problem.
> * Changing the default latin-1 encoding probably isn't possible, because it would break the spec...
> * But catching the exception and trying to encode in UTF-8 instead wouldn't break the spec and would solve the problem.
>
> ----
>
> Here are a couple of issues where people expect things to work differently:
>
> https://github.com/kennethreitz/requests/issues/1926
> https://github.com/kennethreitz/requests/issues/2838
> https://github.com/kennethreitz/requests/issues/1822
>
> ----
>
> Does this make sense?

It makes sense, but I disagree with the suggestion. Having "Latin-1 or
UTF-8" as the effective default encoding is not a good idea, IMO;
sometimes I've *de*coded text using such heuristics (the other order,
of course; attempt UTF-8 decode, and if that fails, decode as Latin-1
or possibly CP-1252) as a means of coping with broken systems, but I
would much prefer the default to simply be one or the other.
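
The decoding heuristic mentioned above can be sketched roughly as
follows. This is only an illustration, not code from any library; the
function name decode_with_fallback is hypothetical. The order matters
because Latin-1 assigns a character to every possible byte, so a
Latin-1 decode never fails and has to be the last resort.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 on failure.

    Hypothetical helper for coping with input of unknown encoding.
    """
    try:
        # Strict UTF-8 rejects most byte sequences that are not UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps each byte 0x00-0xFF to one code point, so this
        # always succeeds (though the result may be wrong text).
        return data.decode("latin-1")


print(decode_with_fallback("é".encode("utf-8")))  # valid UTF-8 input
print(decode_with_fallback(b"\xe9"))              # lone 0xE9 is not valid UTF-8
```

Swapping "latin-1" for "cp1252" in the fallback gives the CP-1252
variant, but that codec leaves a few bytes undefined, so that decode
can still raise.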

As the 'requests' module is not part of Python's standard library, it
would be free to change its own default, regardless of the behaviour
of http.client; whether that's a good idea or not is for the requests
community to decide (unless there's something specifically binding it
to http.client). But whether you're asking for a change in http.client
or in requests, I would disagree with the "either-or" approach; change
to a UTF-8 default, perhaps, but not to the hybrid.
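
One way to see the concern with the hybrid is the sketch below. It is
an illustration of the proposed fallback under my own assumptions, not
http.client's actual code, and encode_hybrid is a hypothetical name.
The same character ends up as different bytes depending on what else
is in the string, so the receiver cannot know which encoding was used.

```python
def encode_hybrid(text: str) -> bytes:
    """Encode as Latin-1 if possible, otherwise as UTF-8.

    Hypothetical sketch of the "either-or" behaviour under discussion.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character is outside Latin-1, so the whole
        # string is encoded as UTF-8 instead.
        return text.encode("utf-8")


# 'é' alone fits in Latin-1 and is sent as a single byte.
print(encode_hybrid("é"))
# Adding '€' (not in Latin-1) switches the whole string to UTF-8,
# so the same 'é' is now sent as two bytes.
print(encode_hybrid("é€"))
```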

ChrisA