[Web-SIG] String Types in WSGI [Graham's WSGI for py3]

Ian Bicking ianb at colorstudy.com
Fri Sep 18 19:02:14 CEST 2009


On Fri, Sep 18, 2009 at 2:56 AM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> As others have pointed out, the likes of rack and jack, not sure about
> the new Perl variant, don't seem to have an issue with using unicode.

I looked up Jack and Rack: http://jackjs.org/jsgi-spec.html and
http://rack.rubyforge.org/doc/files/SPEC.html

They don't have an issue with unicode because they don't mention it
and don't specify anything at all.  Basically they punt on the issue.

In Jack's case specifically, most things in Javascript have to be unicode.
The response body iterator must have items that respond to
toByteString, which includes String and Binary.  I'm assuming Strings
always use UTF8 in Javascript, as JSON acts that way.  jsgi.input is
only specified as an "input stream", which is very vague, especially
since jsgi.errors is just an "output stream" even though presumably
one should be binary and the other text.

Ruby's unicode is kind of funny (as I understand it), in a way that
might help them.  Strings are stored as binary with an attached
encoding.  So there's no "unicode", only binary strings with
encodings; so you can change the encoding, or transcoding happens
implicitly when you combine strings from different encodings.  So
basically there's no mention of unicode because they've dodged that
whole bullet.  But it also seems to be unspecified what encoding might
be attached to strings, if any at all.
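
To make the "bytes plus an attached encoding" model concrete, here is a
rough Python analogy.  TaggedString is a hypothetical class invented for
this illustration; it is not Ruby's implementation and not part of any
spec.  Its two methods borrow the names of Ruby's String#force_encoding
and String#encode, and only sketch the difference between relabelling
bytes and transcoding them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical stand-in for a string stored as raw bytes plus an encoding label."""
    data: bytes
    encoding: str

    def force_encoding(self, encoding: str) -> "TaggedString":
        # Relabel only: the bytes are untouched, just interpreted differently.
        return TaggedString(self.data, encoding)

    def encode(self, encoding: str) -> "TaggedString":
        # Transcode: decode using the current label, re-encode into the new one.
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(encoding), encoding)


s = TaggedString("café".encode("utf-8"), "utf-8")
relabelled = s.force_encoding("latin-1")  # same bytes, different label
transcoded = s.encode("latin-1")          # different bytes, same text
```

The point for a gateway spec is that neither object is usable without
knowing which label it carries, which is exactly what goes unstated.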

Another example: neither spec even indicates whether
SCRIPT_NAME/PATH_INFO are url-decoded (or that they aren't).  So, in
summary: I don't see anything we can learn from these specs, and
there's no reason we should feel like we've somehow been leapfrogged;
these other specifications are simply underspecified.  I also think on
Web-SIG we are approaching this with more robust and general
applications in mind than for Jack and Rack -- for instance, I would
like WSGI to be a reasonable basis for an HTTP proxy, where you can't
enforce UTF8-everywhere.  If all we wanted for WSGI was to be a layer
for serving monolithic applications then these issues wouldn't be so
important.
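
A small sketch of why both points matter, using only the standard
library.  The request paths here are made-up examples, not taken from
any of the specs: url-decoding collapses two distinct paths into one,
and a path whose escaped bytes are not UTF-8 cannot be decoded as UTF-8
at all, though a proxy would still have to pass it through unchanged.

```python
from urllib.parse import unquote, unquote_to_bytes

# Two different request paths that become indistinguishable once url-decoded.
raw_a = "/files/a%2Fb"   # one segment named "a/b"
raw_b = "/files/a/b"     # two segments, "a" and "b"
same_after_decoding = unquote(raw_a) == unquote(raw_b)

# A path whose escaped byte is Latin-1 rather than UTF-8.
raw_c = "/caf%E9"
path_bytes = unquote_to_bytes(raw_c)
try:
    path_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps each byte to exactly one code point, so it round-trips losslessly.
round_trip = path_bytes.decode("latin-1").encode("latin-1")
```

Whether a spec chooses bytes, Latin-1 text, or something else is a
separate argument; the sketch only shows that the choice has to be
written down.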

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker

