Thanks especially to Cory for digging into the source and the RFCs here!

Personally I'm perplexed that Requests, which claims to be "HTTP for Humans", doesn't take care of this but just lets http/client.py blow up. (However, IIUC both 2838 and 1822 are about the body.encode() call in _send_request() in Python 3's http/client.py. 1926 seems to originate in Requests itself; it's also on Python 2.7.)

Anyways, if we were to follow the Python 3 philosophy regarding Unicode to the letter we would have to reject the str type altogether here, and insist on bytes. The error message could tell the caller what to do, e.g. "use data.encode('utf-8') if you want the data to be encoded in UTF-8". (Then of course the server might not like it.)
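
A rough sketch of what that strict option might look like. The name require_bytes is made up for illustration; it is not an existing function in Requests or http.client, and the error text is just the wording suggested above.

```python
def require_bytes(data):
    """Hypothetical helper: accept bytes-like bodies only, never str."""
    if isinstance(data, str):
        # Refuse to guess an encoding; tell the caller how to fix it.
        raise TypeError(
            "body must be bytes, not str; use data.encode('utf-8') "
            "if you want the data to be encoded in UTF-8"
        )
    return data
```

The caller then has to pick the encoding explicitly, e.g. require_bytes("héllo".encode("utf-8")).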

An alternative could be to look at the Content-Type header (if one is given) and use the charset from there, or the default from the RFC for that content type.
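
Sketching that idea, under some assumptions: the helper names charset_from_content_type and encode_body are hypothetical, the header is parsed with the stdlib email.message.Message class, and the default argument is only a placeholder, since the right fallback depends on the media type and on what the relevant RFC says.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or the fallback value when no charset parameter is present.
    return msg.get_content_charset(default)


def encode_body(data, content_type=None, default="utf-8"):
    """Hypothetical helper: encode a str body using the header's charset."""
    if isinstance(data, bytes):
        return data  # already encoded, nothing to guess
    charset = default  # placeholder default, an assumption of this sketch
    if content_type:
        charset = charset_from_content_type(content_type, default)
    return data.encode(charset)
```

This still raises if the text cannot be represented in the declared charset, so it does not remove the need for a readable error message.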

But all these are rather painfully backwards incompatible, which is a big concern here.

Maybe the best solution (most backward compatible *and* most likely to stem the flood of bug reports) is to just catch the UnicodeError and replace its message with something more Human-friendly, explaining that the data must be encoded before sending it. Then the user can figure out what encoding to use (though yes, most likely UTF-8 is it, so the message could suggest trying that first).
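
Something along these lines, as a sketch only. The function encode_for_sending is a hypothetical stand-in for the place where the body gets encoded, and the ISO-8859-1 encoding is my assumption about what that call uses. It re-raises the same exception type so existing except clauses keep working; only the reason text changes.

```python
def encode_for_sending(data):
    """Hypothetical stand-in for the spot where a str body is encoded."""
    if not isinstance(data, str):
        return data
    try:
        # Assumed encoding for this sketch.
        return data.encode("iso-8859-1")
    except UnicodeEncodeError as err:
        # Same exception type and position info, friendlier reason.
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "the body must be encoded before sending; "
            "try data.encode('utf-8') first",
        ) from None
```

Text that fits in the assumed encoding goes through unchanged, so existing callers should see no difference unless they were already getting the exception.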



--
--Guido van Rossum (python.org/~guido)