[Web-SIG] WSGI2: write callable?

Sun Sep 28 21:32:19 CEST 2014

On Sat, Sep 27, 2014 at 5:38 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> I think we're uncovering important assumptions / facts here.

Indeed!

> For clarity: I'm not interested in a nice API for HTTP/2. I want
> HTTP/2 and its full featureset to be *possible*, *efficient* and
> *clear* in a protocol that can replace WSGI - and do so with a fair
> chance of adoption.

Cool.  Then my suggestion would be: don't use WSGI as a basis for
designing that protocol.  Start with something that's a natural fit
for the HTTP/2 model, which -- from what I can tell so far -- is
nothing like WSGI's simple request/response model.

>  Ditto websockets. Neither is possible within WSGI
> today: the base protocol is insufficient, and every implementation of
> either HTTP/2 or Websockets for app writers only works by depending on
> extensions that don't meet the basic design principles - for instance
> exposing the actual server socket as an extension, which mod_wsgi
> cannot do.

Right.  I do think it might be worthwhile creating a spec for how to
create safe "middleware-bypassing" and "rich object" server extensions
within WSGI, to allow limited use of HTTP/2 features.

>  * This almost certainly applies to WSGI as well: WSGI2 -> WSGI1 ->
> WSGI2 will have to downgrade to WSGI1. Some things may be tunnelable
> [and we can try to do that], but the full set of features almost
> certainly cannot.

That depends on what you mean by "WSGI2".  I think an HTTP/2 gateway
API is a different animal than "WSGI2" per se.  I think there may be
room for a request/response WSGI2, distinct from a Python HTTP/2 API,
and (mostly) interoperable with WSGI 1.  That doesn't mean that the
HTTP/2 API might not win over the market and supplant WSGI1/2, I'm
just not convinced that it should be positioned as WSGI's successor.
(At least, not until I've seen it... ;-) )

> From this I drew the proposal to do interop by providing an API [not
> protocol] that provides WSGI1 on the top and 2 on the bottom, and
> another that does the reverse: allowing folk to upgrade individual
> middleware piecemeal, and get the full benefits whenever they have a
> fully upgraded stack. E.g. leave upgrading debug middleware to the
> end. Perhaps this is misguided and implementors will reject such
> assistance?

My suggestion would be to make a good HTTP/2 API without any WSGI
legacy, and then develop a set of middleware-safe server extensions to
provide HTTP/2 features on WSGI 1.  Here's an idea about how you can
safely do that, for trailers, push, and even websockets:

1. Define a server extension that accepts metadata or callbacks, and
returns a string (or array of strings if the extension applies to the
body)
2. To activate the effect, the app puts the string in a header (e.g.
"Content-Type: application/x-wsgi-rich-body; id=sfdfs876654") and
returns it in the body as well (e.g. ['sfdfs876654'])
3. If the header or body string reaches the origin server, apply the
metadata or invoke the callback(s)
4. If it doesn't, use the response the middleware provided instead
5. Discard all registered metadata or callbacks upon completion of the request
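
As a rough sketch of steps 1-5 (not part of any existing server): the
class name, method names and token format below are all hypothetical,
the body is assumed to be a list of bytes, and the sketch takes the
stricter reading that both the header and the body must still carry
the token.

```python
import uuid

# Content type taken from the example in step 2.
RICH_BODY_TYPE = "application/x-wsgi-rich-body"


class RichBodyExtension:
    """Hypothetical per-request server extension; all names are illustrative."""

    def __init__(self):
        self._callbacks = {}

    def register(self, callback):
        # Step 1: accept a callback and hand back an opaque string.
        token = uuid.uuid4().hex
        self._callbacks[token] = callback
        return token

    def resolve(self, headers, body):
        # Steps 3 and 4: return the callback only if the token survived
        # the trip through the middleware; otherwise return None so the
        # server sends the middleware's response as-is.
        token = None
        for name, value in headers:
            if name.lower() != "content-type":
                continue
            media_type, _, params = value.partition(";")
            if media_type.strip() != RICH_BODY_TYPE:
                continue
            for param in params.split(";"):
                key, _, val = param.strip().partition("=")
                if key == "id":
                    token = val
        if token is None or token not in self._callbacks:
            return None
        if b"".join(body) != token.encode("ascii"):
            return None
        return self._callbacks[token]

    def discard(self):
        # Step 5: drop everything once the request is complete.
        self._callbacks.clear()


ext = RichBodyExtension()
token = ext.register(lambda: "rich response would be produced here")
# Step 2: the app puts the token in a header and in the body.
app_headers = [("Content-Type", "%s; id=%s" % (RICH_BODY_TYPE, token))]
app_body = [token.encode("ascii")]
callback = ext.resolve(app_headers, app_body)
```

If middleware replaces the response (say, with an error page),
`resolve` returns None and the server just sends what it was given.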

This model can be used for:

* Websockets - register a callback that receives the websocket, which
will be run in place of the middleware response
* Trailers - register a callback to generate the body and trailers
* Associated content - register metadata to push the content, listed
as header strings

Heck -- you can create a *generalized* escape path to allow an HTTP/2
app API instead of doing one-off protocols like this.  Imagine this
decorator:

    def http2_under_wsgi(http2_app):
        def wrapped(environ, start_response):
            try:
                upgrade = environ['http2.upgrade']
            except KeyError:
                raise RuntimeError("HTTP/2 API not available")
            return upgrade(http2_app, start_response)
        return wrapped  # the decorator must return the WSGI callable

    @http2_under_wsgi
    def my_http2_app(...):
         yield some.thing(...)  ???
         # whatever the super cool HTTP2 API does

Now, you just use @http2_under_wsgi as a wrapper to convert an HTTP/2
app to a WSGI 1 app.  The server's `http2.upgrade` callable just
invokes start_response with a special status and headers, and returns
a special body; these contain tag strings registered to the given
`http2_app`.  In order to actually handle the request, the server
looks up the tag it gets back (since middleware could be running
multiple subrequests) to find the http2_app it's going to run.  It
then runs that app under the HTTP/2 API.
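
The server side of that hook might look roughly like the sketch
below.  Everything in it is an assumption made for illustration: the
`UpgradeRegistry` and `handle_request` names, the `X-WSGI-Upgrade-Tag`
header, and the use of a plain "200 OK" status where a real extension
would presumably define its own special status.

```python
import uuid

UPGRADE_HEADER = "X-WSGI-Upgrade-Tag"  # hypothetical header name


class UpgradeRegistry:
    """Hypothetical per-request registry behind environ['http2.upgrade']."""

    def __init__(self):
        self._apps = {}

    def upgrade(self, http2_app, start_response):
        # Register the app under a fresh tag, then answer in plain WSGI
        # terms with a response that carries only the tag.
        tag = uuid.uuid4().hex
        self._apps[tag] = http2_app
        start_response("200 OK", [(UPGRADE_HEADER, tag)])
        return [tag.encode("ascii")]

    def lookup(self, headers, body):
        # Find the app whose tag made it back through the middleware.
        payload = b"".join(body)
        for name, value in headers:
            if name == UPGRADE_HEADER and value in self._apps:
                if payload == value.encode("ascii"):
                    return self._apps[value]
        return None


def handle_request(wsgi_app, environ):
    """Run the WSGI stack, then decide whether an escape was requested."""
    registry = UpgradeRegistry()
    environ = dict(environ)
    environ["http2.upgrade"] = registry.upgrade
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    body = list(wsgi_app(environ, start_response))
    http2_app = registry.lookup(captured["headers"], body)
    if http2_app is None:
        # No tag came back: send the ordinary WSGI response.
        return ("wsgi", captured["status"], captured["headers"], body)
    # A real server would now run http2_app under its HTTP/2 API.
    return ("http2", http2_app)


def demo_http2_app():
    """Stand-in for an app written against some future HTTP/2 API."""


def escaping_app(environ, start_response):
    return environ["http2.upgrade"](demo_http2_app, start_response)


def plain_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


escaped = handle_request(escaping_app, {})
ordinary = handle_request(plain_app, {})
```

Because the registry is created per request, tags from one request
can never be replayed in another.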

This model lets you run most of your app under plain WSGI, with
escapes as necessary, and even allows WSGI middleware for routing,
authentication, and other pre-processing; you just can't use
response-altering middleware.  In addition, you can write WSGI 1
middleware that still intercepts the HTTP/2 API by replacing the
`http2.upgrade` key and wrapping the apps being passed up to the
server extension.
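
A minimal sketch of such intercepting middleware, assuming the
`http2.upgrade` callable takes `(http2_app, start_response)` as in the
decorator above; `intercept_upgrades` and `wrap_http2_app` are
hypothetical names:

```python
def intercept_upgrades(wsgi_app, wrap_http2_app):
    """WSGI 1 middleware that wraps any app escaping via 'http2.upgrade'."""

    def middleware(environ, start_response):
        server_upgrade = environ.get("http2.upgrade")
        if server_upgrade is None:
            # The server offers no escape hook, so there is nothing to wrap.
            return wsgi_app(environ, start_response)

        def upgrade(http2_app, app_start_response):
            # Wrap the app on its way up to the server extension.
            return server_upgrade(wrap_http2_app(http2_app), app_start_response)

        # Copy environ so the replacement is only seen downstream.
        environ = dict(environ)
        environ["http2.upgrade"] = upgrade
        return wsgi_app(environ, start_response)

    return middleware
```

The wrapper passed as `wrap_http2_app` would have to be written
against whatever HTTP/2 API the server exposes; the middleware itself
never touches the response.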

I hope this helps to explain why I don't think you should try to use
WSGI 1 as a basis for HTTP/2.  You can and should bypass it
altogether, especially since that should be possible in a way that
lets ANY existing WSGI 1 app framework "escape" to full HTTP/2, where
available.  And then, HTTP/2 needn't be burdened by any of the many
compromises and legacy cruft of WSGI and its CGI heritage.
You'll still need an implementation of WSGI *in terms of* the HTTP/2
API, plus the "escape" hook, but I don't see any reason why HTTP/2
*needs* to be even remotely WSGI-like.

>> (FWIW, I never proposed making headers a dict.  That's a bad idea, IMO.)
>
> Could you enlarge on that? There have been lots of [often security
> related] bugs in implementations of HTTP/1.x which were due to
> protocol handlers *not* treating the headers as dicts. Things like
> appending a header that cannot be repeated where in an N-tier deployed
> system the first layer consults the last header and the second layer
> consults the first. HTTP's header model could be modelled as
> {header: [value, ...]} or even more strictly as {header:
> value_or_list_value}. I'm going to guess and say 'a list is necessary,
> a dict isn't, and someone can write middleware to sanitise response
> headers' ?

Actually, the reason is that one of the WSGI design principles is that
it tries to stay as close as possible to the wire protocol it was
based on.  HTTP/1 headers are a series of lines, so WSGI headers are a
series of lines.  If some browser crashes when you put the headers in
the "wrong" order (from its perspective), then WSGI should not create
any obstacles to sending them in the "right" order (i.e., the one that
doesn't make e.g. IE crash).

(I'm not saying that such an issue actually exists/existed, just that
a list was chosen based on the principle that WSGI should give the app
as much control over the output stream as possible.  The stream
blocking and timing requirements exist for the same reason.)
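
A small illustration of what the list form preserves, using made-up
header values: order and repeated lines survive in a list of tuples,
while a plain dict keeps only one value per name.

```python
# WSGI-style response headers: an ordered list of (name, value) pairs.
headers = [
    ("Set-Cookie", "a=1"),
    ("Content-Type", "text/plain"),
    ("Set-Cookie", "b=2"),
]

# The list maps directly onto the header lines sent on the wire.
wire = "".join("%s: %s\r\n" % pair for pair in headers)

# A plain dict cannot represent the repeated Set-Cookie line, and the
# app no longer decides where each line goes relative to the others.
as_dict = dict(headers)
```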

> Has wsgi_lite been picked up by server and middleware authors? Do we
> have any feedback on how well it's working?

Nope, I never got around to promoting it, apart from a blog post or
two introducing the idea a few years ago.  ;-)

> So, there are multiple examples of websockets today, which share much
> in common with HTTP/2. All of them require server support, and tunnel
> through WSGI in ways that are liable to break (e.g. a middleware that
> remotes objects will almost certainly fail to handle the raw socket).

So we should definitely fix that, by defining a safe "rich server API
upgrade" escape for WSGI.  Hm....  maybe your new API should be the
"Rich Server Gateway Interface", or RSGI -- pronounced "risky".  ;-)

Anyway, "upgrade escapes" are a generic concept, and we can define
that independently of *what* API you upgrade to, so that might be a
good idea to work on soon, as it could be used for websockets and the
like today, as a standardized WSGI extension.

>> Does an HTTP/2 server or API for Python even *exist* yet?
>
> Yes. http://nghttp2.org/documentation/package_README.html#python-bindings
>
> The model is of a handler class, and four events - headers, data,
> request fully received, stream closed. It supports push, but in a way
> that prevents implementing a notification server such as
> https://tools.ietf.org/html/draft-thomson-webpush-http2-00 specifies.

This looks like a fairly reasonable approach to an API.  Given that
we'll still have WSGI for simple cases, I don't see an issue with RSGI
having an event-driven model with various APIs going in both
directions.  But I'll probably bow out of most discussions about
defining RSGI unless I see something that relates to "lessons learned"
in WSGI.  I worry a little that an RSGI design is still premature,
given only ONE Python API, but if we have rich escapes in WSGI, then
there will be room for servers to develop experimental HTTP/2 APIs
that can then form a basis for RSGI later.

Yeah, that really looks like the way forward: define a safe way to
escape WSGI from inside of it, so that server developers aren't forced
to dumb down HTTP/2 to WSGI, in order to provide rich HTTP/2 APIs.
What do you think?