[Web-SIG] WSGI2: write callable?

Mon Sep 29 04:09:39 CEST 2014

On 29 September 2014 08:32, PJ Eby <pje at telecommunity.com> wrote:
>  On Sat, Sep 27, 2014 at 5:38 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> I think we're uncovering important assumptions / facts here.
>
> Indeed!
>
>
>> For clarity: I'm not interested in a nice API for HTTP/2. I want
>> HTTP/2 and its full featureset to be *possible*, *efficient* and
>> *clear* in a protocol that can replace WSGI - and do so with a fair
>> chance of adoption.
>
> Cool.  Then my suggestion would be: don't use WSGI as a basis for
> designing that protocol.  Start with something that's a natural fit
> for the HTTP/2 model, which -- from what I can tell so far -- is
> nothing like WSGI's simple request/response model.

That's a fair point. I have not been constrained by WSGI today in
thinking about this - but since this effort is about updating the
standard folk write to, for web server -> gateway -> app plumbing in
Python, WSGI is, for better or worse, the touchstone folk have.

WSGI's simple request/response model has been unable to fully handle
the modern web since HTTP/1.1 came out (AFAIK none of the gateways
have managed to make chunked uploads work right, and trailers are not
supported). That's not a bad thing about WSGI, and putting the union of
requirements into a spec can make it unwieldy (see RFC2616 for a
classic example :)) - but while we have a lot of frameworks that are
a) composed of WSGI adapters and b) WSGI on the top, or WSGI on the
bottom, they're not following WSGI all that precisely, because WSGI is
too restrictive. And the web has moved on with Websockets in 2011 and
HTTP/2 any day now.

I think it's clear from the broad interest we've got that folk are
interested in a new spec.

What's in a name?

We could call it something else. RSGI as you humorously suggest, or
we can call it WSGI.

I think WSGI is the right name, because I don't think we want to aim
for a situation where folk writing new servers write both WSGI and
$NEWTHING support. They should be able to pick one, write to it well,
and have their users choose to downgrade the environment if they have
legacy things that are not yet upgraded.

So - I'm going to keep drafting this as WSGI2, unless there is
consensus here that the name should be different.

> Right.  I do think it might be worthwhile creating a spec for how to
> create safe "middleware-bypassing" and "rich object" server extensions
> within WSGI, to allow limited use of HTTP/2 features.

That might be an interesting thing; I have no real interest in writing
it at this point: my intent is to provide a new thing, which may be
very similar, or may be strikingly different - that's what this SIG
will come up with - which can contain WSGI1 middleware safely via some
adapter. I don't have interest in writing a 'do HTTP/2 features from
within WSGI1' effort, because I think it's a lot of work for
little-if-any-gain: we have to have servers that can speak the new
wire protocols before we can use the new features, and that means the
top of our stack will be $NEWTHING anyway. There are some exceptions,
such as the mod_spdy hack to tunnel awareness of to-push resources,
but it's not clear that that will do the right thing in all
circumstances with oblivious middleware, no matter how it's spelt in
code. [Because, whatever failure mode we choose by default, some
middleware will want the other one - at which point it's not oblivious,
and may as well just be upgraded].

>>  * This almost certainly applies to WSGI as well: WSGI2 -> WSGI1 ->
>> WSGI2 will have to downgrade to WSGI1. Some things may be tunnelable
>> [and we can try to do that], but the full set of features almost
>> certainly cannot.
>
> That depends on what you mean by "WSGI2".  I think an HTTP/2 gateway
> API is a different animal than "WSGI2" per se.  I think there may be
> room for a request/response WSGI2, distinct from a Python HTTP/2 API,
> and (mostly) interoperable with WSGI 1.  That doesn't mean that the
> HTTP/2 API might not win over the market and supplant WSGI1/2, I'm
> just not convinced that it should be positioned as WSGI's successor.
> (At least, not until I've seen it... ;-) )

That's fair enough, but in the absence of a better name - and see
above: letting server and middleware authors care about only one
protocol is a key design point - I think calling it WSGI2 is better
than calling it something new. If it's going to make the discussion
hard, I'm ok calling it e.g. NNGI (no name gateway interface) until
we're done.

>
>> From this I drew the proposal to do interop by providing an API [not
>> protocol] that provides WSGI1 on the top and 2 on the bottom, and
>> another that does the reverse: allowing folk to upgrade individual
>> middleware piecemeal, and get the full benefits whenever they have a
>> fully upgraded stack. E.g. leave upgrading debug middleware to the
>> end. Perhaps this is misguided and implementors will reject such
>> assistance?
>
> My suggestion would be to make a good HTTP/2 API without any WSGI
> legacy, and then develop a set of middleware-safe server extensions to
> provide HTTP 2 features on WSGI 1.  Here's an idea about how you can
> safely do that, for trailers, push, and even websockets:

Your adapter sketch there is a useful escape hatch approach. It may
have some use. However the downside is that it's going to break on a
lot of middleware.

The approach I'm thinking of is more:

def wsgi2_under_wsgi(app):
    # run a WSGI2 app underneath a WSGI1 server or middleware
    def converter(environ, start_response):
        if is_really_wsgi2(environ):
            # fast path to detect things that were wrapped unnecessarily
            return app(environ)
        # <... stuff oh my gosh stuff to convert the protocols,
        # downgrading all features>
    return converter

def wsgi_under_wsgi2(app):
    # export a WSGI2 server as a WSGI1 server
    def start_response(status, headers):
        pass  # try: ... and so on
    def converted_environ(environ):
        # ....
        # include a marker in here to let wsgi2_under_wsgi fast-path it
        return environ
    def wsgi2_to_wsgi(environ):
        return app(converted_environ(environ), start_response)
    return wsgi2_to_wsgi
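
To make the fast-path marker idea concrete, here is a minimal runnable
sketch. It rests on assumptions that nothing in this thread has settled:
a hypothetical WSGI2 app is called as app(environ) and returns a
(status, headers, body) tuple, and the marker is a made-up environ key
holding the original WSGI2 environ. It buffers the body for simplicity;
a real adapter would need to stream.

```python
# Hypothetical environ key used as the fast-path marker (not from any spec).
MARKER = "nngi.original_environ"


def is_really_wsgi2(environ):
    """Return True if this WSGI1 environ was built by wsgi_under_wsgi2."""
    return MARKER in environ


def wsgi_under_wsgi2(app):
    """Expose a WSGI1 app through the assumed WSGI2 calling convention."""
    def wsgi2_to_wsgi(environ):
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)

        wsgi1_environ = dict(environ)
        wsgi1_environ[MARKER] = environ  # marker for the fast path
        result = app(wsgi1_environ, start_response)
        try:
            # Buffer the body so start_response has certainly been called.
            body = list(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return captured["status"], captured["headers"], body
    return wsgi2_to_wsgi


def wsgi2_under_wsgi(app):
    """Expose an assumed-WSGI2 app through the WSGI1 calling convention."""
    def converter(environ, start_response):
        if is_really_wsgi2(environ):
            # Fast path: hand the app the untouched WSGI2 environ.
            status, headers, body = app(environ[MARKER])
        else:
            # Slow path: a real WSGI1 server; feature downgrading is elided.
            status, headers, body = app(dict(environ))
        start_response(status, headers)
        return body
    return converter
```

Note that in this sketch any edits WSGI1 middleware makes to its copy of
the environ are ignored on the fast path, which is one of the failure
modes a real design would have to decide about.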

Really, I think we're agreeing on 95% here, but I'm biasing for having
a majority of WSGI2 eventually, whereas you seem to be biasing for
having a majority of WSGI indefinitely. The reason I want to bias for
the long term, is that it will be with us for a long while. We need to
make incremental deployment easy - and that may well mean tunnelling
some things. The role of the spec here though is to define the
protocol by which folk can write tunnellers *after* we get the thing
working. Perhaps that's exactly what you mean: decouple an
HTTP/2+Websockets+HTTP/1.x protocol from tunnelling new features
through WSGI for legacy deployments. If that's what you mean, then I
agree - and that's what I'm working on :)

>>> (FWIW, I never proposed making headers a dict.  That's a bad idea, IMO.)
>>
>> Could you enlarge on that? There have been lots of [often security
>> related] bugs in implementations of HTTP/1.x which were due to
>> protocol handlers *not* treating the headers as dicts. Things like
>> appending a header that cannot be repeated where in an N-tier deployed
>> system the first layer consults the last header and the second layer
>> consults the first. HTTP's header model could be modelled as
>> {header: [value, ...]} or even more strictly as {header:
>> value_or_list_value}. I'm going to guess and say 'a list is necessary,
>> a dict isn't, and someone can write middleware to sanitise response
>> headers' ?
>
> Actually, the reason is that one of the WSGI design principles is that
> it tries to stay as close as possible to the wire protocol it was
> based on.  HTTP/1 headers are a series of lines, so WSGI headers are a
> series of lines.  If some browser crashes when you put the headers in
> the "wrong" order (from its perspective), then WSGI should not create
> any obstacles to sending them in the "right" order (i.e., the one that
> doesn't make e.g. IE crash).
>
> (I'm not saying that such an issue actually exists/existed, just that
> a list was chosen based on the principle that WSGI should give the app
> as much control over the output stream as possible.  The stream
> blocking and timing requirements exist for the same reason.)

Ok, so implementor experience in the wild has taught us that this is a
bad idea ;). http://tools.ietf.org/html/rfc7230#section-3.2.2 - the
protocol treats the wire order of headers with different names as not
significant; only the relative order of a) list headers and b)
Set-Cookie needs to be preserved.
{headername: [value, ...]} is a superset that would model this but be
a lot harder to get wrong for middleware.
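
A small sketch of what the {headername: [value, ...]} model could look
like in practice. The function names are hypothetical, and lowercasing
the names is my choice here (header names are case-insensitive), not
something any draft mandates.

```python
from collections import OrderedDict


def group_headers(header_list):
    """Group (name, value) pairs into {lowercased name: [values]}.

    Values for the same name keep their relative wire order.
    """
    grouped = OrderedDict()
    for name, value in header_list:
        grouped.setdefault(name.lower(), []).append(value)
    return grouped


def flatten_headers(grouped):
    """Turn the grouped form back into a list of (name, value) pairs."""
    return [(name, value)
            for name, values in grouped.items()
            for value in values]


wire = [
    ("Set-Cookie", "a=1"),
    ("Content-Type", "text/plain"),
    ("Set-Cookie", "b=2"),
]
grouped = group_headers(wire)
# Both Set-Cookie values survive, in their original relative order.
assert grouped["set-cookie"] == ["a=1", "b=2"]
```

With this shape, a layer that wants to reject a repeated
non-repeatable header only has to check the length of one list, e.g.
len(grouped.get("content-length", [])) > 1, instead of every layer
picking "first" or "last" differently.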

The IE crashing scenario is not one I'm worried about because
intermediaries like Squid and Apache have been normalising and
altering headers for a /very/ long time. The websockets RFC explicitly
permits arbitrary orders (after we had a long discussion about HTTP
semantics during the spec process :)).

>> So, there are multiple examples of websockets today, which share much
>> in common with HTTP/2. All of them require server support, and tunnel
>> through WSGI in ways that are liable to break (e.g. a middleware that
>> remotes objects will almost certainly fail to handle the raw socket).
>
> So we should definitely fix that, by defining a safe "rich server API
> upgrade" escape for WSGI.  Hm....  maybe your new API should be the
> "Rich Server Gateway Interface", or RSGI -- pronounced "risky".  ;-)

So that would/might address the breakiness but it wouldn't standardise
the upgraded protocol(s) - the network effect is where the value is:
folk can work around bugs on a case by case basis.

>>> Does an HTTP/2 server or API for Python even *exist* yet?
>>
>> Yes. http://nghttp2.org/documentation/package_README.html#python-bindings
>>
>> The model is of a handler class, and four events - headers, data,
>> request fully received, stream closed. It supports push, but in a way
>> that prevents implementing a notification server such as
>> https://tools.ietf.org/html/draft-thomson-webpush-http2-00 specifies.
>
> This looks like a fairly reasonable approach to an API.  Given that
> we'll still have WSGI for simple cases, I don't see an issue with RSGI
> having an event-driven model with various APIs going in both
> directions.  But I'll probably bow out of most discussions about
> defining RSGI unless I see something that relates to "lessons learned"
> in WSGI.  I worry a little that a RSGI design is still premature,
> given only ONE Python API, but if we have rich escapes in WSGI, then
> there will be room for servers to develop experimental HTTP/2 APIs
> that can then form a basis for RSGI later.
>
> Yeah, that really looks like the way forward: define a safe way to
> escape WSGI from inside of it, so that server developers aren't forced
> to dumb down HTTP/2 to WSGI, in order to provide rich HTTP/2 APIs.
> What do you think?

I worry that that leaves us with a lingua franca which we're expecting
everyone to escape from. That doesn't seem like a great place to aim
for. It would be equivalent to HTTP/2 requiring HTTP/1 on all
connections and then working well after that.

What HTTP/2 has done instead is to define both a no-overhead
direct-to-HTTP/2 handshake, *and* an upgrade handshake. Doesn't matter
which you use - but the direct one (TLS/ALPN) is fewer round trips vs
TLS + HTTP/1 + upgrade and more secure vs HTTP/1. [Encryption isn't a
hard requirement of HTTP/2... but a number of big browser vendors
have said they won't implement the non-encryption codepath. So it is
an effective requirement outside of the plumbing of webapps.]

There have been enough ideas put forward in this thread that I need to
sit down and do some experiments. I want to try out context managers
as a replacement for the close idiom, I want to try pure generator
based APIs a little.
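
For reference, one possible reading of "context managers as a
replacement for the close idiom", not necessarily where the experiments
will end up. In WSGI1 the server must call close() on the response
iterable if it has one; the sketch below contrasts that hand-written
try/finally with contextlib.closing from the standard library. The Body
class and the two send_* functions are hypothetical names for
illustration only.

```python
from contextlib import closing


class Body(object):
    """Response body in the WSGI1 style: an iterable with close()."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.closed = True


def send_with_close_idiom(body, write):
    """What a WSGI1 server does today: call close() if it exists."""
    try:
        for chunk in body:
            write(chunk)
    finally:
        if hasattr(body, "close"):
            body.close()


def send_with_context_manager(body, write):
    """Same guarantee via a context manager; assumes body has close()."""
    with closing(body):
        for chunk in body:
            write(chunk)
```

Both variants close the body even when write raises; the difference is
only in who carries the cleanup logic.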

I'd very much appreciate specific examples of middleware that you
believe are representative of the sorts of issues folk will encounter,
so that I can compare and contrast the implications of different
design decisions on them.

-Rob
-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud