[Web-SIG] WSGI 2

Bill Janssen janssen at parc.com
Tue Aug 4 19:30:01 CEST 2009


P.J. Eby <pje at telecommunity.com> wrote:

> At 02:28 PM 8/4/2009 +1000, Graham Dumpleton wrote:
> >2009/8/4 P.J. Eby <pje at telecommunity.com>:
> > > I'm not clear on your logic here.  If I request foo/bar/baz (where baz
> > > actually has an accent over the 'a') in latin-1 encoding, and foo/bar
> > > is the script, then the (accented) baz is legitimate for pass-through
> > > to the application, no?
> >
> >Technically, but what I am pointing out is that Apache pretty well
> >says that foo/bar needs to be UTF-8.
> 
> Which doesn't change the fact that you haven't yet proposed what a 
> WSGI server should *do* with such non-UTF8 bytes in PATH_INFO and 
> QUERY_STRING.  Apache can and does pass through such bytes, so the 
> spec needs to say what we do with them.

Particularly QUERY_STRING.  The original thinking around
application/x-www-form-urlencoded was that it was always Latin-1.  You
were supposed to use "multipart/form-data" for non-Latin-1 encodings.
There was a long thread about this on www-talk circa 1994.
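Under the Latin-1 convention for urlencoded data, a query string can be decoded with the standard library's `urllib.parse.parse_qs`, which takes an `encoding` argument. This is a minimal sketch, assuming a hypothetical query string `name=Ren%E9&lang=fr` in which the single escape %E9 stands for "é":

```python
from urllib.parse import parse_qs

# Hypothetical query string: %E9 is "é" encoded as a single Latin-1 byte.
query_string = "name=Ren%E9&lang=fr"

# Reading the escapes as Latin-1 recovers the intended character.
latin1 = parse_qs(query_string, encoding="latin-1")
print(latin1["name"] == ["Ren\u00e9"])   # True ("René")

# parse_qs defaults to encoding="utf-8", errors="replace"; the lone 0xE9
# byte is not valid UTF-8 and becomes U+FFFD REPLACEMENT CHARACTER.
utf8 = parse_qs(query_string)
print(utf8["name"] == ["Ren\ufffd"])     # True
```

Neither call is wrong in isolation; the right answer depends on which encoding the client used, and the query string itself does not record that.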

I think bytes are the safest way to go here.  It would be nice if we
could automagically detect the correct encoding, but there's no
foolproof way of doing that.
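The detection problem can be illustrated with a small Python sketch. It assumes a naive, hypothetical `guess_decode` heuristic (try UTF-8, fall back to Latin-1) that nobody in this thread proposed. The sketch shows that such a guess can handle the accented "baz" from the quoted example, yet silently mis-decode bytes that happen to be valid in both encodings:

```python
from urllib.parse import unquote_to_bytes


def guess_decode(raw: bytes) -> str:
    """Hypothetical heuristic: try UTF-8 first, fall back to Latin-1.

    Latin-1 decoding never fails (each of the 256 byte values maps to one
    code point), so this always returns a string, but it cannot know which
    encoding the client actually used.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Case 1: the accented "baz" from the quoted example, sent by a Latin-1
# client as %E1. 0xE1 followed by 'z' is not valid UTF-8, so the fallback
# kicks in and happens to guess right.
path_info = unquote_to_bytes("/b%E1z")             # b"/b\xe1z"
print(guess_decode(path_info) == "/b\u00e1z")      # True

# Case 2: a Latin-1 client really meant the two characters "Ã©"
# (bytes C3 A9). Those are exactly the UTF-8 bytes for "é", so the
# heuristic returns "é" and is silently wrong.
meant = "\u00c3\u00a9"                             # "Ã©"
raw = meant.encode("latin-1")
print(raw == "\u00e9".encode("utf-8"))             # True
print(guess_decode(raw) == meant)                  # False

# Passing `raw` through unchanged leaves the choice to the application,
# which may know the charset of the page or form that produced it.
```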

Bill

