[Web-SIG] bytes / unicode
henry at precheur.org
Wed Jun 23 23:35:38 CEST 2010
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> I don't think you can't claim, though, that Python 3 makes things
> significantly harder for these frameworks. The proof is that many of
> them already give the user unicode strings in Python 2.x. They must
> have somehow got the decoding right.
Well... Frameworks usually 'simplify' the problem by partly ignoring it.
By default they assume the data in the request in UTF-8. You can specify
an alternative encoding in most of them. Django , Werkzeug , and
WebOb  do that.
The problem with this approach is that you still have to deal with weird
requests where one thing is unicode, and another is latin-1. Sometime
you can even have 2 different encodings in a single header like Cookies.
There's no solution to this problem, it has to be solved on a case by
There was a big discussion a while ago on web-sig. I think the consensus
was that WSGI for Python 3 should assume that the data is encoded in
latin-1 since it's the default encoding according to the RFC.
More information about the Web-SIG