[Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0
paul.boddie at ementor.no
Wed Sep 15 12:33:31 CEST 2004
Phillip J. Eby wrote:
> I've reviewed last month's Python-Dev discussion about the future
> 'bytes()' type, and the eventual transition away from Python's current
> 8-bit strings.
> Mainly, the impression I get is that significant change in this area
> really can't happen until Python 3.0, because too many things have to
> change at once for it to work.
I think there was (and perhaps still is) a runtime option to force Python
to treat all strings as Unicode objects.
> So, here's what I propose to do about the open issue in PEP 333. Servers
> and gateways that run under Python implementations where all strings are
> Unicode (e.g. Jython) *may*:
> * accept Unicode statuses and headers, so long as they properly encode
> them for transmission (latin-1 + RFC 2047)
I think I encode all Unicode objects used in this area as US-ASCII in
> * accept Unicode for response body segments, so long as each segment can
> be encoded as latin-1 (i.e. only uses chars 0-255)
It should be possible to be more intelligent about response bodies, but one
can argue that it isn't up to something like WSGI to go through the
necessary gymnastics to make sure that Unicode objects presented to the
response stream are encoded appropriately.
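The two output rules quoted above (latin-1 plus RFC 2047 for statuses and headers, latin-1 only for body segments) can be sketched in modern Python 3, where `str` is already Unicode, much as it is in Jython. This is a minimal illustration under stated assumptions, not WebStack's or any server's actual code: `encode_header_value` and `encode_body_segment` are hypothetical names, and using UTF-8 as the charset inside the RFC 2047 encoded-word is an assumption the proposal does not specify.

```python
from email.header import Header


def encode_header_value(value: str) -> bytes:
    """Hypothetical helper: encode a status or header value for the wire.

    Values that fit in latin-1 are sent as plain latin-1 bytes; anything
    else is wrapped as an RFC 2047 encoded-word (UTF-8 is an assumed
    choice of charset). Long values may be folded onto continuation
    lines by the email package.
    """
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        # Header(...).encode() returns an ASCII-only "=?utf-8?...?=" string.
        return Header(value, "utf-8").encode().encode("ascii")


def encode_body_segment(segment: str) -> bytes:
    """Hypothetical helper: encode one response body segment.

    Raises UnicodeEncodeError if any character lies outside 0-255,
    i.e. if the segment cannot be encoded as latin-1.
    """
    return segment.encode("latin-1")
```

Encoding body segments strictly, rather than with `errors="replace"`, mirrors the proposal's "can be encoded as latin-1" condition: a segment that fails is rejected instead of being silently altered.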
> * produce Unicode input headers and body strings by decoding from
> latin-1, as long as the produced values are considered type 'str' for
> that Python implementation.
I think I've left incoming headers as plain strings, but I suppose a
translation could be performed in WebStack.
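The input rule, and the translation Paul mentions for incoming headers, both amount to decoding raw bytes as latin-1. A minimal Python 3 sketch follows; `decode_input` and `decode_headers` are hypothetical names, and the plain dict of headers is an assumption for illustration rather than how WSGI presents them (WSGI uses CGI-style `environ` keys).

```python
def decode_input(raw: bytes) -> str:
    """Hypothetical helper: decode raw request bytes as latin-1.

    latin-1 maps each byte 0-255 to the code point with the same value,
    so this never fails and is losslessly reversible.
    """
    return raw.decode("latin-1")


def decode_headers(raw_headers):
    """Hypothetical helper: turn (name, value) byte pairs into a str dict."""
    return {decode_input(name): decode_input(value)
            for name, value in raw_headers}
```

Because every byte decodes, the original bytes can always be recovered with `.encode("latin-1")`; the cost is that, say, UTF-8 input appears garbled until the application re-decodes it with the right charset.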