
On 7 Jan 2017, at 02:18, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
On Sat, 7 Jan 2017 at 03:23 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Maybe we should support unicode for the body as well. We can set the charset in the mime-type and everything so that it will be properly intelligible by the server, which doesn't happen if the user manually encodes like this.
Oh, forgot to comment on this point; in the specific case of JSON, it isn't necessary to specify UTF-8 in Content-Type[1], but for HTML or XML it's a pretty good idea. However, I'm not sure if it's possible to modify Content-Type in a generic fashion to make this sort of thing work; for example, "Content-Type: application/octet-stream; charset=UTF-8" is nonsense. I'll defer to some HTTP experts here ;)
This is really not simple, because many MIME types do not define a charset parameter. In the case of JSON, specifying UTF-8 in Content-Type is not merely unnecessary: the standard explicitly defines no charset parameter for the JSON content type[0]:
Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients.
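To make the problem concrete, here is a minimal sketch of what "only add a charset where one is defined" could look like. The `CHARSET_AWARE_TYPES` allowlist and the `encode_body` helper are hypothetical names invented for illustration, not part of any existing library, and the allowlist is an assumption rather than an authoritative registry:

```python
# Hypothetical allowlist: media types assumed to define a "charset" parameter.
# Illustrative only; a real implementation would need a maintained registry.
CHARSET_AWARE_TYPES = {"text/plain", "text/html", "text/xml", "application/xml"}


def encode_body(body, content_type):
    """Encode a unicode body and return (bytes, Content-Type header value).

    If the caller already supplied a charset parameter, it is honoured.
    Otherwise the body is encoded as UTF-8, and "; charset=utf-8" is appended
    only when the bare media type is in CHARSET_AWARE_TYPES.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Look for an explicit charset among the existing parameters.
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')

    if charset is None:
        charset = "utf-8"
        if media_type in CHARSET_AWARE_TYPES:
            content_type = content_type.rstrip() + "; charset=utf-8"

    return body.encode(charset), content_type
```

With this sketch, `encode_body(u"{}", "application/json")` leaves the header untouched, while `encode_body(u"<p>hi</p>", "text/html")` gains a charset parameter. The hard part is the allowlist itself, which is exactly the piece that has no generic answer.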
Strictly, a fully compliant implementation would not emit charset details for content types that have no charset registration. Such a thing is pretty tricky to do. Knowing that, it’s probably best to YOLO your way through, or forbid unicode in bodies.

Cory

[0]: https://tools.ietf.org/html/rfc7159#section-11