[Web-SIG] WSGI2: write callable?

Sat Sep 27 23:38:11 CEST 2014

I think we're uncovering important assumptions / facts here.

For clarity: I'm not interested in a nice API for HTTP/2. I want
HTTP/2 and its full featureset to be *possible*, *efficient* and
*clear* in a protocol that can replace WSGI - and do so with a fair
chance of adoption. Ditto websockets. Neither is possible within WSGI
today: the base protocol is insufficient, and every implementation of
either HTTP/2 or Websockets for app writers only works by depending on
extensions that don't meet the basic design principles - for instance
exposing the actual server socket as an extension, which mod_wsgi
cannot do.

So, basic axioms I've been working from:
 * HTTP/2 cannot be tunnelled through HTTP/1: it can be downgraded,
but not tunnelled. An HTTP/2->HTTP/1.1->HTTP/2 chain is not capable of
the same results as a straight HTTP/2 connection (or chain).
 * This almost certainly applies to WSGI as well: WSGI2 -> WSGI1 ->
WSGI2 will have to downgrade to WSGI1. Some things may be tunnelable
[and we can try to do that], but the full set of features almost
certainly cannot.

From this I drew the proposal to do interop by providing an API [not
protocol] that provides WSGI1 on the top and 2 on the bottom, and
another that does the reverse: allowing folk to upgrade individual
middleware piecemeal, and get the full benefits whenever they have a
fully upgraded stack. E.g. leave upgrading debug middleware to the
end. Perhaps this is misguided and implementors will reject such
assistance?
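
As a rough illustration of one direction of such an adapter API, here is
a minimal sketch. It assumes a WSGI2-style callable has the shape
`app2(environ) -> (status, headers, body_iterable)`; that shape is only
an assumption for illustration, since nothing has been agreed. The names
`wsgi2_to_wsgi1`, `hello2` and `call_wsgi1` are hypothetical.

```python
def wsgi2_to_wsgi1(app2):
    """Expose a hypothetical WSGI2-style app through the WSGI1 calling convention."""
    def wsgi1_app(environ, start_response):
        # Assumed WSGI2 shape: the response is a return value, not a callback.
        status, headers, body = app2(environ)
        start_response(status, list(headers))
        return body
    return wsgi1_app


def hello2(environ):
    """Example app written against the assumed WSGI2-style signature."""
    body = b"hello\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    return "200 OK", headers, [body]


def call_wsgi1(app, environ):
    """Tiny WSGI1 driver for demonstration only; not a real server."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        data = b"".join(result)
    finally:
        # WSGI1 requires the caller to close() the result if it has close().
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], data


status, headers, data = call_wsgi1(wsgi2_to_wsgi1(hello2),
                                   {"REQUEST_METHOD": "GET"})
```

Anything the assumed WSGI2 shape could express beyond status, headers and
body would be lost at this boundary, which is the downgrade described
above.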

On 28 September 2014 07:55, PJ Eby <pje at telecommunity.com> wrote:
> On Sat, Sep 27, 2014 at 12:20 AM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> We should capture these design principles somewhere FAQ-like, since
>> many of the folk participating in this rework weren't part of the
>> original design.
>
> A lot of it is in the PEP itself, albeit in ways that seem a lot more
> obscure now, 10 years later, than they did at the time of writing.
> It's also spread out among different parts, including the FAQ at the
> end.

:) - I am familiar with the PEP, so yeah, does feel a bit obscure :).
Thank you for chiming in to reinforce them.

> Any feature which is added solely to entice an end-consumer of WSGI
> (vs. a framework or library implementer) is 100% wasted.

I understand that argument, but...
...
> If WSGI 2 adds features that users want, and library/framework
> developers can reasonably add those features to *their* APIs, then
> there is a chance that they will do so.  But if they have to throw out
> their whole existing paradigm to do that, or users have to abandon
> their framework to adopt the WSGI 2 paradigm, then nothing was really
> gained by the effort.

Libraries and frameworks exist for the same users. WSGI's ability to
say 'and this is up to library/framework developers' is contingent on
the protocol being *sufficient* for folk to do that. I suspect a bunch
of our discussions are going to end up being around whether specific
changes are necessary or things libraries can do.

> Basically, going after end users puts you in a "boil the ocean"
> position.  That is, a situation where you must more or less convince
> everybody to change at the same time in order for the standard to
> reach critical mass.

I had hoped not, due to proposing that we provide an API [not
protocol] for adapting between the protocols. That would exist solely
to make implementors have an easier time bringing support in
incrementally. So - I think you're misinterpreting my thrust as being
'after end users' - I'm not: I'm squarely focused on the
implementation problems of server and middleware authors.

> However, if you are *not* trying to boil the ocean by attracting end
> users, then anything that you do to benefit them (at the expense of
> framework, middleware, or server authors) is pure waste, since the
> incremental strategy (that WSGI was based on in the first place)
> doesn't depend on end-users using the raw WSGI protocol.  As the PEP
> itself explains:
>...
> If you replace "WSGI" with "WSGI 2" in the above, the rationale
> remains unchanged.

Sure.

>>> The above API is cute and clean for the app writer, but for a
>>> middleware writer it's a barrel of misery.  *Every* piece of
>>> middleware that even wants to *read* anything from the response (let
>>> alone modify it), now needs to check types of yielded values,
>>> accumulate headers, and maybe buffer content.  And there are many ways
>>> to write that middleware that will be wrong, but *appear* right
>>> because the author didn't think of all the ways that an app could
>>> violate the middleware author's assumptions.
>>
>> Hang on, why would they buffer content? Buffering response content is
>> currently verboten, and I haven't seen any proposal to change that. I
>> don't understand how phrasing the API as I suggested would lead to
>> buffering being permitted or required.
>
> By "content" I was actually talking about the headers or other
> metadata.  Sorry for the confusion.

No worries. Right now buffering of headers is required - the whole
'until the iterator returns a non-empty bytestring' bit - sure, I'd
like to get rid of that.
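
For readers less familiar with that rule, here is a minimal sketch of
what it forces on a WSGI1 server: start_response() only records the
status and headers, and they go out on the wire once the first non-empty
bytestring arrives. The `run_wsgi1` driver and its `wire` event list are
hypothetical, the write() callable is left out, and error handling is
omitted.

```python
def run_wsgi1(app, environ):
    """Return the sequence of 'wire events' a simplified WSGI1 server would emit."""
    wire = []
    pending = []  # status/headers recorded by start_response, not yet sent

    def start_response(status, headers, exc_info=None):
        # Must not transmit anything yet; just remember the latest values.
        pending[:] = [(status, list(headers))]

    result = app(environ, start_response)
    headers_sent = False
    try:
        for chunk in result:
            if not chunk:
                continue  # empty bytestrings do not trigger header transmission
            if not headers_sent:
                wire.append(("headers",) + pending[0])
                headers_sent = True
            wire.append(("body", chunk))
        if not headers_sent:
            # Empty body: headers still have to go out once iteration ends.
            wire.append(("headers",) + pending[0])
    finally:
        if hasattr(result, "close"):
            result.close()
    return wire


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"", b"data"]


wire = run_wsgi1(demo_app, {})
```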

I still don't see a case where the generator based protocol would
force buffering of headers [outside of the context of middleware that
actually wants to buffer headers].

>> If it's a method on the response body, then returning a list or
>> generator no longer works, unless you start poking random attributes
>> onto things. It would also be inconsistent - why would trailers be a
>> method on the response, but headers be a dict in the return value?
>
> (FWIW, I never proposed making headers a dict.  That's a bad idea, IMO.)

Could you enlarge on that? There have been lots of [often security
related] bugs in implementations of HTTP/1.x which were due to
protocol handlers *not* treating the headers as dicts. Things like
appending a header that cannot be repeated where in an N-tier deployed
system the first layer consults the last header and the second layer
consults the first. HTTP's header model could be modelled as
{header: [value, ...]} or even more strictly as {header:
value_or_list_value}. I'm going to guess and say 'a list is necessary,
a dict isn't, and someone can write middleware to sanitise response
headers' ?
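
To make the two models concrete, here is a small sketch that takes the
WSGI-style list of (name, value) tuples, groups it into the
{header: [value, ...]} form, and flags repeats of headers that should
appear only once. The `SINGLETON_HEADERS` set is illustrative, not
exhaustive, and both function names are hypothetical.

```python
# Illustrative subset of headers that must not be repeated.
SINGLETON_HEADERS = {"content-length", "content-type", "host"}


def group_headers(header_list):
    """Group a list of (name, value) tuples into {lowercased name: [values]}."""
    grouped = {}
    for name, value in header_list:
        grouped.setdefault(name.lower(), []).append(value)
    return grouped


def find_conflicts(header_list, singletons=SINGLETON_HEADERS):
    """Return the singleton headers that appear more than once."""
    grouped = group_headers(header_list)
    return {name: values for name, values in grouped.items()
            if name in singletons and len(values) > 1}


headers = [
    ("Content-Length", "5"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),       # legitimately repeatable
    ("content-length", "7"),     # appended duplicate: layers may disagree
]
grouped = group_headers(headers)
conflicts = find_conflicts(headers)
```

A check like this could live in sanitising middleware, which keeps the
list representation in the protocol itself.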

> As for returning a list or generator, I don't see why you can't do e.g.
>
>     return status, headers, trailing_signature(body, ...)
>
> Where trailing_signature is a function that returns an iterator with
> appropriate annotation, wrapping the original iterable.  That works
> whether body is a list or a generator or some other custom iterable.
>
> ("Poking random attributes onto things" isn't a requirement, IOW.)

`yield from` in recent Pythons could make that fairly efficient, ok.
Still leaves the inconsistency between an immediate value for headers
and a late-bound value for trailers, but perhaps that's ok.
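
A sketch of the kind of wrapper being discussed, to show where the late
binding comes from. This is not PJ's actual trailing_signature: the
`SignedBody` class, its `trailers()` hook and the `X-Body-SHA256` name
are all hypothetical, and how a server would discover the hook is left
open.

```python
import hashlib


class SignedBody:
    """Wrap any body iterable and hash the chunks as they pass through."""

    def __init__(self, body):
        self._body = body
        self._digest = hashlib.sha256()
        self._done = False

    def __iter__(self):
        for chunk in self._body:
            self._digest.update(chunk)
            yield chunk
        self._done = True

    def trailers(self):
        # Hypothetical hook: only meaningful once the body is exhausted.
        if not self._done:
            raise RuntimeError("body not fully consumed yet")
        return [("X-Body-SHA256", self._digest.hexdigest())]

    def close(self):
        # Forward close() to the wrapped iterable, as WSGI1 requires.
        close = getattr(self._body, "close", None)
        if close is not None:
            close()


body = SignedBody([b"ab", b"cd"])
chunks = list(body)          # the server streams the body first...
trailers = body.trailers()   # ...and only then can the trailers be known
```

The wrapper works the same whether the original body is a list, a
generator or a custom iterable. A wrapper that does not need to inspect
each chunk could delegate with `yield from` on Python 3.3+ instead of
looping.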

..
> Sure -- the existence of bytes is an obvious win, as is the dropping
> of start_response.  But if you want WSGI 2 to be *interoperable* with
> WSGI 1, or more precisely, if we want to support *tunneling* WSGI 2
> through a WSGI 1 stack, then the design has to be at least somewhat
> constrained by WSGI 1.

Ok, so I don't think we *can* do that, and in fact I think we
shouldn't. I think we *can* do the following:
 - make WSGI2 degrade to WSGI1 via an adapter
 - tunnel WSGI1 through WSGI2

I may be wrong, and if we're clever enough - great. OTOH some of the
changes we're discussing - like getting rid of start_response and
making bidirectional channels possible - are pretty fundamentally
different to WSGI1, and I'd be worried about a protocol that requires
middleware authors to write to *both* WSGI1 and WSGI2 at the same
time. I think that's an unnecessary burden and one that will hinder
adoption.
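
For the second item in the list above, a minimal sketch of hosting an
unmodified WSGI1 app inside a WSGI2-style stack. It assumes the same
hypothetical WSGI2 shape, `app2(environ) -> (status, headers, body)`;
`wsgi1_to_wsgi2` and `legacy_app` are made-up names, and the write()
callable is deliberately unsupported.

```python
def wsgi1_to_wsgi2(app1):
    """Wrap a WSGI1 app as a callable returning (status, headers, body)."""
    def app2(environ):
        captured = []

        def start_response(status, headers, exc_info=None):
            captured[:] = [status, list(headers)]

            def write(data):
                raise NotImplementedError("write() is not supported in this sketch")
            return write

        result = app1(environ, start_response)
        iterator = iter(result)
        first = []
        # Generator apps may call start_response lazily, and WSGI1 lets the
        # headers change until the first non-empty chunk, so pull up to it.
        for chunk in iterator:
            if chunk:
                first.append(chunk)
                break
        status, headers = captured

        def body():
            # Simplification: if this generator is never started, the
            # finally block never runs and result.close() is skipped.
            try:
                yield from first
                yield from iterator
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()

        return status, headers, body()
    return app2


def legacy_app(environ, start_response):
    """A WSGI1 app written as a generator: start_response runs lazily."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b""
    yield b"a"
    yield b"b"


status, headers, body = wsgi1_to_wsgi2(legacy_app)({})
chunks = list(body)
```

Nothing is lost in this direction, because everything WSGI1 can express
fits in the assumed return value.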

> So, I don't see a problem with creating a response object per se.  I
> was just thinking that with middleware, you really want to be able to
> mix and match what features are being returned with the response, so
> unless you use `__getattr__` proxying, or it's required that response
> objects allow arbitrary attributes to be added, then the paradigm "bag
> of related features in a dictionary" better fits the requirement than
> "return an object".

Ok.

>>> So, let's trim the sharp edges for the poor middleware and server
>>> developers, rather than polishing the bits that app writers aren't
>>> going to be using, anyway.  (Since most of them are going to be using
>>> Django, Pyramid, Flask, or whatever the latest hotness is, anyway.)
>>
>> Do you have a hitlist of such sharp edges you'd like to see catered
>> for in this new spec?
>
> The ones described in the wsgi_lite docs:
>
> 1. People forgetting that the environ is volatile
> 2. People forgetting to close()
> 3. The horror that is the stateful nature of the current protocol (all
> the rules on what can be called when)
>
> In wsgi_lite I addressed #1 by providing the binding protocol to map
> desired request data to keyword arguments.  #2, by the "closing"
> extension, and #3 by switching to a functional paradigm rather than an
> imperative one.  (Thus eliminating any rules on what can be called
> when, because the response is a return value, not an invocation of
> something.)
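
To illustrate sharp edge #2 in plain WSGI1 (this is not wsgi_lite's
"closing" extension): middleware that replaces the body iterable has to
forward close() itself. A generator with try/finally looks right, but its
finally block never runs if the server calls close() before iterating, so
this sketch uses an explicit wrapper class. All names here are
hypothetical.

```python
class _UpperBody:
    """Iterator that upper-cases chunks and forwards close() to the original."""

    def __init__(self, result):
        self._result = result
        self._chunks = iter(result)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks).upper()

    def close(self):
        # Forward close() even if iteration never started.
        close = getattr(self._result, "close", None)
        if close is not None:
            close()


def upper_middleware(app):
    """WSGI1 middleware that upper-cases the response body."""
    def wrapped(environ, start_response):
        return _UpperBody(app(environ, start_response))
    return wrapped


class TrackedBody:
    """Demo body that records whether close() was called."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.closed = True


tracked = TrackedBody([b"hi"])


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return tracked


def ignore_start_response(status, headers, exc_info=None):
    return None


result = upper_middleware(demo_app)({}, ignore_start_response)
result.close()  # server gives up before reading any of the body
```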

Has wsgi_lite been picked up by server and middleware authors? Do we
have any feedback on how well it's working?

> All in all, it kind of sounds to me like what you *really* want is to
> make a user-level API for HTTP/2 applications.  And maybe it would be
> a good idea to do that *first*, without reference to tweaking WSGI.

So, my personal driver is that I have multiple use cases, most but not
all of which are end user use cases, that depend on HTTP/2 // will
benefit from HTTP/2. A user level API is certainly a thing that will
need to exist, but all the servers around so far are just degrading
HTTP/2 to WSGI - the lingua franca. One perhaps unintended consequence
of WSGI is that it's become that lingua franca, and many things are
internally structured around middleware stacks :). So the first thing
that needs to be done is a WSGI-like thing and internal code
shuffling. You're right though that more implementor experience would
be good - I'm hoping to be doing that on the basis of drafts and
discussion.
...
> And finally, we could look at that protocol and say, "okay, can we
> encapsulate this protocol in such a way that it can be safely tunneled
> through WSGI 1?"

If it can :).

> Each of these stages has benefit.  If you only get through the first,
> at least it's possible to do HTTP/2 in Python!  If you get through the
> second, well, maybe it's not WSGI, but at least it's a protocol (SSGI?
>  H2GI?).  And so on.

> I guess what I'm saying is, based on what you seem to be trying to do,
> I think trying to update WSGI is *way* premature.  Even WSGI wasn't
> proposed in a vacuum: it was based on looking at the APIs provided by
> existing Python-supporting web servers and required by existing Python
> web frameworks.  So, in the absence of even *one* HTTP/2 framework API
> to drive the requirements, it's probably premature to propose paradigm
> shifts in WSGI itself.

So, there are multiple examples of websockets today, which share much
in common with HTTP/2. All of them require server support, and tunnel
through WSGI in ways that are liable to break (e.g. a middleware that
remotes objects will almost certainly fail to handle the raw socket).

> Does an HTTP/2 server or API for Python even *exist* yet?

Yes. http://nghttp2.org/documentation/package_README.html#python-bindings

The model is of a handler class, and four events - headers, data,
request fully received, stream closed. It supports push, but in a way
that prevents implementing a notification server such as
https://tools.ietf.org/html/draft-thomson-webpush-http2-00 specifies.
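
A self-contained sketch of that four-event model, for readers who
haven't looked at it. It does not use the nghttp2 bindings: the class,
the method names and the `drive` helper are placeholders chosen for
illustration and are not claimed to match the real API.

```python
class RecordingHandler:
    """One handler per request stream; each callback is one of the four events."""

    def __init__(self):
        self.headers = None
        self.chunks = []
        self.events = []

    def on_headers(self, headers):
        # Request headers have arrived.
        self.headers = headers
        self.events.append("headers")

    def on_data(self, data):
        # A chunk of request body has arrived.
        self.chunks.append(data)
        self.events.append("data")

    def on_request_done(self):
        # The request has been fully received.
        self.events.append("request_done")

    def on_close(self):
        # The stream has been closed.
        self.events.append("close")


def drive(handler, headers, chunks):
    """Stand-in for a server delivering one request's events in order."""
    handler.on_headers(headers)
    for chunk in chunks:
        handler.on_data(chunk)
    handler.on_request_done()
    handler.on_close()


handler = RecordingHandler()
drive(handler, [(":method", "POST"), (":path", "/upload")], [b"ab", b"cd"])
```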

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud