[Web-SIG] WSGI, Python 3 and Unicode
graham.dumpleton at gmail.com
Mon Dec 10 04:56:55 CET 2007
On 09/12/2007, Guido van Rossum <guido at python.org> wrote:
> On Dec 8, 2007 12:37 AM, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
> > On 08/12/2007, Phillip J. Eby <pje at telecommunity.com> wrote:
> > > * When running under Python 3, servers MUST provide a text stream for
> > > wsgi.errors
> > In Python 3, what happens if user code attempts to output to a text
> > stream a byte string? Ie., what would be displayed?
> Nothing. You get a TypeError.
Hmmm, this in itself could be quite a pain for existing code where
people have added debug code to print out details from request headers
(if these are now to be passed as bytes), or part of the request content.
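To make the failure concrete, here is a minimal sketch of what such debug
code would run into. It assumes io.StringIO as a stand-in for wsgi.errors,
and header_value is a made-up example; neither is taken from a real server.

```python
import io

# Stand-in for wsgi.errors under Python 3 (assumption: any text stream
# rejects bytes in the same way).
errors = io.StringIO()

# Hypothetical request header value, passed as bytes.
header_value = b"text/html; charset=utf-8"

try:
    errors.write(header_value)  # text streams only accept str
except TypeError:
    outcome = "TypeError"
else:
    outcome = "written"

# print() calls str() on its arguments, so the bytes object is written
# as its repr rather than raising.
print(header_value, file=errors)
logged = errors.getvalue()
```

So a direct write() of bytes fails, while debug code that goes through
print() still produces output, although in the b'...' form rather than
as the decoded text.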
What is the suggested way of best dumping out bytes for debugging
purposes so one does not have to worry about encoding issues, just use
> > Also, if wsgi.errors is a text stream, presume that if a WSGI adapter
> > has to internally map this to a C char* like API for logging that it
> > would need to apply standard Python encoding to yield usable char*
> > string for output.
> The encoding can/must be specified per text stream.
But what should the encoding associated with the wsgi.errors stream be?
If code which outputs text to wsgi.errors can use any valid Unicode
character and one sets the stream to US-ASCII encoding, then there is a
chance that logging output will fail because some characters are not
valid in that character set. If one instead uses UTF-8, then there are
potentially issues where the byte string coming out the other end of the
text stream is passed to C API functions. Issues might arise here if the
C API is not expecting a variable-width character encoding.
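The trade-off can be sketched as follows. This is only an illustration
under assumptions: log_bytes is a hypothetical helper, and io.BytesIO
stands in for whatever C-level logging sink an adapter would really use.

```python
import io

# 'café ☃': text containing characters outside US-ASCII.
message = "caf\u00e9 \u2603"

def log_bytes(text, encoding, errors="strict"):
    """Write text through a text stream; return the bytes reaching the sink."""
    sink = io.BytesIO()
    stream = io.TextIOWrapper(sink, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return sink.getvalue()

# Strict US-ASCII: logging fails on the first non-ASCII character.
try:
    log_bytes(message, "ascii")
    ascii_result = "ok"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"

# US-ASCII with an error handler: output never fails, characters are escaped.
escaped = log_bytes(message, "ascii", errors="backslashreplace")

# UTF-8: never fails, but one character may occupy several bytes.
utf8 = log_bytes(message, "utf-8")
```

Here the 6-character message becomes 9 bytes in UTF-8. UTF-8 only emits a
NUL byte for U+0000 itself, so a NUL-terminated char* API that treats the
data as opaque bytes should cope; a C API that assumes one byte per
character would not. Whether a given server's logging API falls in the
first group is something each adapter would need to check.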
I'll freely admit I am not across all this Unicode encode/decode stuff
as I don't generally have to deal with foreign languages, but there seem
to be a few missing details in this area which need to be filled out for
a modified WSGI specification.