On Thu, Jan 7, 2016 at 10:50 AM, Emil Stenström <em@kth.se> wrote:
On 2016-01-07 at 17:46, Cory Benfield wrote:
On 7 Jan 2016, at 16:32, Guido van Rossum <guido@python.org>
wrote:

Personally I'm perplexed that Requests, which claims to be "HTTP
for Humans", doesn't take care of this but just lets http/client.py
blow up. (However, IIUC both 2838 and 1822 are about the
body.encode() call in Python 3's http/client.py at _send_request().
1926 seems to originate in Requests itself; it's also Python 2.7.)

The main reason is historical: this was missed in the original
(substantial) rewrite in requests 2.0, and as a result we can’t
change it without a backward compat break, just the same as Python.
We’ll probably fix it in 3.0.

So as things stand:

* The general consensus seems to be that the raised error should be changed to something like: TypeError("Unicode string supplied without an explicit encoding")

* Python would like to change http.client to reject unicode input with an exception, but won't because of backwards compatibility

* Requests would like to do the same but won't because of backwards compatibility

I think it will be very hard to find code that breaks because of a type change in the exception when sending invalid data. On the other hand, it's VERY easy to find people that are affected by the confusing error currently in use everywhere.

When a backward-incompatible change makes life easier for 99.9% of users, and 0.1% of users need to debug a TypeError with a very clear error message (which was probably a bug in their code to begin with), I'm starting to question having a policy that strict.

What policy are you referring to? I don't think anyone objects to making the error message clearer. The objection is to rejecting unicode strings that in the past would have been successfully encoded using Latin-1.
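
To make the Latin-1 point concrete, here is a minimal sketch of the behaviour in question. The helper name encode_body is hypothetical and only approximates the body.encode() step mentioned earlier in the thread; it is not the actual http.client code.

```python
def encode_body(body):
    # Hypothetical helper: a str body is encoded as Latin-1 (ISO-8859-1),
    # and anything that is not a str is passed through untouched.
    if isinstance(body, str):
        return body.encode("iso-8859-1")
    return body


# Every character here fits in Latin-1, so this has always worked.
accepted = encode_body("caf\u00e9")

# The euro sign (U+20AC) is outside Latin-1, so this raises.
try:
    encode_body("price: 10 \u20ac")
    raised = None
except UnicodeEncodeError as exc:
    raised = exc
```

Rejecting every str body outright would also break the first call, which is the backward compatibility concern.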

I'm not sure whether it's a good idea to change the exception type from UnicodeError to TypeError -- the exception is really related to Unicode so keeping UnicodeError but changing the message sounds like the right thing to do. And this can be done independently in both Requests and the stdlib.
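
As a rough sketch of what "same exception type, clearer message" could look like: the function name and the message wording below are assumptions for illustration, not what Requests or the stdlib actually do.

```python
def encode_body_with_hint(body, encoding="iso-8859-1"):
    # Hypothetical helper: behaves like a plain Latin-1 encode, but on
    # failure re-raises the same exception type with a more helpful reason.
    if not isinstance(body, str):
        return body
    try:
        return body.encode(encoding)
    except UnicodeEncodeError as exc:
        # UnicodeEncodeError takes (encoding, object, start, end, reason).
        raise UnicodeEncodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            "str body cannot be encoded as Latin-1; encode it explicitly, "
            "e.g. body.encode('utf-8')",
        ) from None
```

Code that already catches UnicodeError keeps working, and strings that fit in Latin-1 are still accepted as before.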

--
--Guido van Rossum (python.org/~guido)