[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client
Emil Stenström
em at kth.se
Thu Jan 7 04:20:35 EST 2016
Hi,
I hope python-ideas is the right place to post this. I'm very new to
this and would appreciate a pointer in the right direction if it is not.
The requests project is getting multiple bug reports about a problem in
the stdlib http.client, so I thought I'd raise an issue about it here.
The bug reports concern people posting HTTP requests with unicode
strings when they should be using utf-8 encoded strings.
Since RFC 2616 says latin-1 is the default encoding, http.client tries
that and fails with a UnicodeEncodeError whenever the string contains
characters outside latin-1.
My idea is NOT to change from latin-1 to something else, since that
would break compliance with the spec, but instead to catch that
exception and try encoding with utf-8. That would avoid breaking
backward compatibility, unless someone specifically relied on that
exception, which I think is very unlikely.
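A minimal sketch of that fallback, for illustration only. The helper name
encode_body is hypothetical and is not part of http.client; it just stands
in for the encoding step that currently raises:

```python
def encode_body(body):
    """Hypothetical helper: encode a str body, preferring latin-1."""
    if isinstance(body, str):
        try:
            # Current behaviour: ISO-8859-1, the RFC 2616 default.
            return body.encode('iso-8859-1')
        except UnicodeEncodeError:
            # Proposed fallback: only reached when latin-1 cannot
            # represent the string, so existing callers are unaffected.
            return body.encode('utf-8')
    # Bytes (and other body types) are passed through untouched.
    return body
```

Strings that fit in latin-1 are encoded exactly as before; only strings
that fail today would take the utf-8 path.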
This is also how other languages' HTTP libraries seem to deal with this;
sending in unicode just works:
In cURL (works fine):
curl http://example.com -d "Celebrate 🎉"
In Ruby with http.rb (works fine):
require 'http'
r = HTTP.post("http://example.com", :body => "Celebrate 🎉")
In Node with request (works fine):
var request = require('request');
request.post({url: 'http://example.com', body: "Celebrate 🎉"}, function (error, response, body) {
    console.log(body)
})
But Python 3 with requests crashes instead:
import requests
r = requests.post("http://localhost:8000/tag", data="Celebrate 🎉")
...with the following stacktrace:
...
File "../lib/python3.4/http/client.py", line 1127, in _send_request
body = body.encode('iso-8859-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position
14-15: ordinal not in range(256)
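The failure can be reproduced without any network access, and encoding the
body by hand avoids it. This is only a sketch: the variable names are
illustrative, and the requests call is left as a comment so the snippet
runs offline.

```python
text = "Celebrate \U0001F389"  # any character outside latin-1 will do

try:
    # The same encoding step http.client applies to a str body.
    text.encode('iso-8859-1')
    failed = False
except UnicodeEncodeError:
    failed = True

# Workaround: pass bytes, so no implicit latin-1 encoding takes place.
payload = text.encode('utf-8')
# e.g. requests.post("http://localhost:8000/tag", data=payload)
```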
----
So the rationale for this idea is:
* http.client doesn't work the way beginners expect for very basic
use cases (posting unicode strings)
* Libraries in other languages behave the way beginners expect, which
magnifies the problem.
* Changing the default latin-1 encoding probably isn't possible, because
it would break the spec...
* But catching the exception and trying to encode in utf-8 instead
wouldn't break the spec, and it solves the problem.
----
Here are a couple of issues where people expect things to work differently:
https://github.com/kennethreitz/requests/issues/1926
https://github.com/kennethreitz/requests/issues/2838
https://github.com/kennethreitz/requests/issues/1822
----
Does this make sense?
/Emil