[Web-SIG] WSGI2: write callable?

PJ Eby pje at telecommunity.com
Sat Sep 27 20:55:23 CEST 2014


On Sat, Sep 27, 2014 at 12:20 AM, Robert Collins
<robertc at robertcollins.net> wrote:
> We should capture these design principles somewhere FAQ-like, since
> many of the folk participating in this rework weren't part of the
> original design.

A lot of it is in the PEP itself, albeit in ways that seem a lot more
obscure now, 10 years later, than they did at the time of writing.
It's also spread out among different parts, including the FAQ at the
end.


> Right now, anything providing the server profile has to cope with
> exceptions and translate those to 500 errors, so we have the variation
> of 'status and headers may not be provided'. Most middleware can be
> oblivious and delegate this to the server via bubble-up. I suspect the
> same would work for a default of 200 - 99% of middleware would ignore
> it and it would just work. However, I'm not super attached - it was
> just an idea.

In the limit case,

>
>>> So a classic example for Trailers is digitally signing streamed
>>> content. Using the same strawman API as above:
>>>
>>> def app(environ):
>>>    yield {':status': '200'}
>>>    md5sum = md5.new()
>>>    for bytes in block_reader(open('foo', 'rb'), 65536):
>>>        md5sum.update(bytes)
>>>        yield bytes
>>>    digest = md5sum.hexdigest()
>>>    signature = sign_bytes(digest.encode('utf8'))
>>>    yield {'Content-MD5Sum': digest, 'X-Signature': signature}
>>>
>>> Note that this doesn't need to buffer or use a closure.



>> Please bear in mind that another core WSGI design principle is that we
>> don't make apps easier to write by making servers and middleware
>> harder to write.  That kills adoption and growth, because the audience
>> that *needs* to adopt WSGI (or any successor standard) is the audience
>> of people who write servers and middleware.  If a feature is sinfully
>> ugly for the app writer, but a thing of beauty for a middleware
>> author, we *want* that feature.
>
> I get that to a degree - I think there is a balance to be struck.

Actually, no, there *isn't*.  That's the whole point: there is NO
balance to be struck.  WSGI was never intended to be an API for
writing web applications.

In fact, not only was it not intended for that, it was *explicitly* an
*anti-goal*, and I don't think that any of the conditions that made it
an anti-goal have changed.

Any feature which is added solely to entice an end-consumer of WSGI
(vs. a framework or library implementer) is 100% wasted.

Why?  Because only the most trivial apps can be written with raw WSGI.
Real apps need a thousand tiny *other* features (routing, sessions,
authentication, authorization, registration, etc. etc.)...

Which means that even if you make an awesome WSGI API, nobody's going
to use it.  They're going to need libraries or frameworks, no matter
what.  WSGI itself cannot *possibly* compete with these libraries and
frameworks.

If WSGI 2 adds features that users want, and library/framework
developers can reasonably add those features to *their* APIs, then
there is a chance that they will do so.  But if they have to throw out
their whole existing paradigm to do that, or users have to abandon
their framework to adopt the WSGI 2 paradigm, then nothing was really
gained by the effort.

Basically, going after end users puts you in a "boil the ocean"
position.  That is, a situation where you must more or less convince
everybody to change at the same time in order for the standard to
reach critical mass.

However, if you are *not* trying to boil the ocean by attracting end
users, then anything that you do to benefit them (at the expense of
framework, middleware, or server authors) is pure waste, since the
incremental strategy (that WSGI was based on in the first place)
doesn't depend on end-users using the raw WSGI protocol.  As the PEP
itself explains:

"""But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI must be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Thus, simplicity of implementation on both the server and framework
sides of the interface is absolutely critical to the utility of the
WSGI interface, and is therefore the principal criterion for any
design decisions.

***Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author.*** WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects and
cookie handling would just get in the way of existing frameworks'
handling of these issues. Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework."""

If you replace "WSGI" with "WSGI 2" in the above, the rationale
remains unchanged.




>> It's not a fair tradeoff, because only server authors and middleware
>> authors *have to* deal with WSGI directly.  App authors can use
>> libraries to pretty it up, so we don't need to pretty it for them in
>> advance -- especially since we don't know what their *personal* idea
>> of pretty is going to be.  ;-)
>
> Server authors and middleware authors can use libraries too: we can
> write functions to provide common handling for a bunch of stuff: that's
> not to say we should make things bad at the API level - we shouldn't -
> but it doesn't make sense to me to say that folk writing middleware
> cannot use libraries.

If the protocol is such that alternate paths have to be followed (the
"if" conditions I alluded to), then the only way a library can remove
this complexity is to implement a canonical form.  But if it is
*possible* to have a canonical form that doesn't require the alternate
paths, then that means we should make that canonical form the spec in
the first place.  There is no point to creating alternate
possibilities just so we can make a library to take them back out.
;-)

As explained in the PEP, we want the protocol *itself* to provide
simplicity for implementers who are adding support to existing tools.
If libraries are required to implement the protocol, then the people
implementing *those* libraries are the people we want to make things
simple for.  ;-)

Sure, it'd be awesome to provide good middleware facilities in a
library, but we should design the underlying protocol so that it's not
insanely difficult to make those libraries.  ;-)


>> The above API is cute and clean for the app writer, but for a
>> middleware writer it's a barrel of misery.  *Every* piece of
>> middleware that even wants to *read* anything from the response (let
>> alone modify it), now needs to check types of yielded values,
>> accumulate headers, and maybe buffer content.  And there are many ways
>> to write that middleware that will be wrong, but *appear* right
>> because the author didn't think of all the ways that an app could
>> violate the middleware author's assumptions.
>
> Hang on, why would they buffer content? Buffering response content is
> currently verboten, and I haven't seen any proposal to change that. I
> don't understand how phrasing the API as I suggested would lead to
> buffering being permitted or required.

By "content" I was actually talking about the headers or other
metadata.  Sorry for the confusion.
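
A rough sketch of the burden described above, assuming the strawman
convention that dicts carry headers or trailers and bytes carry body
data. The names `log_headers`, `demo_app` and `seen` are hypothetical
and exist only for illustration. Even a middleware that merely wants
to *read* the metadata has to inspect the type of every yielded item:

```python
def log_headers(app, seen):
    """Hypothetical read-only middleware for the strawman generator API.

    `seen` is a list that collects every header/trailer dict observed.
    """
    def wrapper(environ):
        for item in app(environ):
            # Each item must be type-checked: dicts are metadata,
            # bytes are body content, anything else is an error.
            if isinstance(item, dict):
                seen.append(dict(item))
            elif not isinstance(item, bytes):
                raise TypeError("unexpected item: %r" % (item,))
            yield item
    return wrapper


def demo_app(environ):
    # Strawman-style app: status dict, body bytes, then trailers.
    yield {':status': '200'}
    yield b'hello'
    yield {'X-Signature': 'abc'}


seen = []
wrapped = log_headers(demo_app, seen)
body = [x for x in wrapped({}) if isinstance(x, bytes)]
```

Note that this sketch still does not decide which dicts are headers
and which are trailers; a middleware that cares would need to track
that state as well.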


> If it's a method on the response body, then returning a list or
> generator no longer works, unless you start poking random attributes
> onto things. It would also be inconsistent - why would trailers be a
> method on the response, but headers be a dict in the return value?

(FWIW, I never proposed making headers a dict.  That's a bad idea, IMO.)

As for returning a list or generator, I don't see why you can't do e.g.

    return status, headers, trailing_signature(body, ...)

Where trailing_signature is a function that returns an iterator with
appropriate annotation, wrapping the original iterable.  That works
whether body is a list or a generator or some other custom iterable.
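
One possible shape for such a wrapper, as a minimal sketch only. The
`trailers()` method is an assumed form of "annotation" (no draft
defines it), the `sign` callable and the header names are made up, and
SHA-256 is swapped in for the MD5 of the quoted strawman:

```python
import hashlib


class trailing_signature:
    """Hypothetical wrapper: passes `body` through and exposes trailers."""

    def __init__(self, body, sign):
        self._body = body
        self._sign = sign              # assumed callable: bytes -> str
        self._hash = hashlib.sha256()
        self._done = False

    def __iter__(self):
        # Chunks are forwarded unchanged; only the hash is updated.
        for chunk in self._body:
            self._hash.update(chunk)
            yield chunk
        self._done = True

    def trailers(self):
        # Only meaningful once the whole body has been consumed.
        if not self._done:
            raise RuntimeError("body not fully consumed yet")
        digest = self._hash.hexdigest()
        signature = self._sign(digest.encode('utf8'))
        return [('X-Digest', digest), ('X-Signature', signature)]

    def close(self):
        # Forward close() to the wrapped iterable if it has one.
        close = getattr(self._body, 'close', None)
        if close is not None:
            close()


def fake_sign(data):
    # Stand-in for a real signing function.
    return 'sig-' + data.decode('ascii')[:8]


def app(environ):
    body = [b'hello ', b'world']       # could equally be a generator
    headers = [('Content-Type', 'text/plain')]
    return '200 OK', headers, trailing_signature(body, fake_sign)
```

A server that understands the annotation would iterate the body and
then call `trailers()`; middleware that does not care about trailers
just sees an ordinary iterable.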

("Poking random attributes onto things" isn't a requirement, IOW.)



>> Please try to think instead of how you could implement those things in
>> a "make it nice" API for app authors.  WSGI wasn't made ugly on a
>> whim; it's the direct result of some very important design principles.
>> While the need for start_response() is gone, many of the other reasons
>> for its ugliness remain.
>>
>> (In any case, you can still implement a generator-based API for
>> writing WSGI apps, without needing to make WSGI *itself* be
>> implemented that way.)
>
> I don't think WSGI is ugly, but I do think that things have changed
> substantially in the python world since it came to be, and we owe it
> to ourselves to investigate whether we can do better now.

Sure -- the existence of bytes is an obvious win, as is the dropping
of start_response.  But if you want WSGI 2 to be *interoperable* with
WSGI 1, or more precisely, if we want to support *tunneling* WSGI 2
through a WSGI 1 stack, then the design has to be at least somewhat
constrained by WSGI 1.


> Is there some documentation about the other reasons that it needs to
> be ugly - last thing I want to do is waste folks time suggesting
> things that won't work.

There is really only one reason, that manifests itself in a variety of
constraints.  That reason is that the success or failure of the
standard rests in the hands of those who implement tools (servers,
middleware, libraries, and frameworks), not the hands of those who
implement apps.  Those are the people whose support is critical, so
every decision turns in their favor wherever possible.  Even the
existence of the start_response()/write() kludge is there because at
the time, many existing frameworks offered streaming via some sort of
imperative, push-based "writing" API, rather than an iteration-based
pulling one.
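
For readers who have not met it, the two WSGI 1 styles look roughly
like this. The `run` function is a toy collector written for this
illustration, not a compliant server (it ignores headers, `exc_info`
and error handling):

```python
def push_app(environ, start_response):
    # Imperative, push-based style: start_response() returns write().
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'hello ')
    write(b'world')
    return []


def pull_app(environ, start_response):
    # Iteration-based style: the server pulls chunks from the result.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello ', b'world']


def run(app):
    """Toy stand-in for a server: collects all output bytes."""
    out = []

    def start_response(status, headers, exc_info=None):
        return out.append          # serves as the write() callable

    result = app({}, start_response)
    try:
        out.extend(result)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return b''.join(out)
```

Both apps produce the same bytes, but every server and middleware has
to support both paths.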


> The original WSGI spec avoided defining objects on the basis of being
> extremely minimal, to ease adoption - and it's been a wild success. How
> much complexity are we starting to drive, though, as we keep avoiding
> having an object - tuple return types, iterators with extra
> attributes. Would a defined ABC be a burden to implementors these
> days? I presume that it was the C servers like mod_python that we
> would have harmed previously?

Yes.  Or nowadays, mod_wsgi.  As to whether it would be a burden, I
couldn't say.  (Also, bear in mind that other C-based servers and
gateways integrate with WSGI, e.g. nginx IIRC.)

In any case, the burden for *consuming* a response object should be
less than the burden of *creating* a request object.  Defining custom
types in C is more work than just accessing attributes of a returned
object.

So, I don't see a problem with creating a response object per se.  I
was just thinking that with middleware, you really want to be able to
mix and match what features are being returned with the response, so
unless you use `__getattr__` proxying, or it's required that response
objects allow arbitrary attributes to be added, then the paradigm "bag
of related features in a dictionary" better fits the requirement than
"return an object".


>> To put it another way, the common case for WSGI always was -- and
>> mostly still is -- to return an entire HTTP response in one go,
>> without any streaming or buffering or anything of that sort.  And
>> simple things should be simple, with complex things still being
>> possible.
>
> It would be interesting to get stats on that. The WSGI spec goes to
> great pains to require that streaming work and buffering be verboten
> (presumably excusable for middleware like JPEG->PNG transformers that
> simply cannot avoid buffering) - but even then they are required to
> yield a b'' AIUI.

The design rule here is STASCTAP: simple things are simple, complex
things are possible.  Admittedly, the "empty yield" rule is a burden
on middleware, but a necessary one to make streaming *possible*.  The
trade was in favor of framework authors supporting streaming, at the
expense of middleware authors.
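
A minimal sketch of what the "empty yield" rule asks of a WSGI 1
middleware that cannot emit anything until it has seen the whole body.
`reverse_body` is a made-up toy transformation (chosen because it
leaves Content-Length unchanged), and the sketch ignores apps that use
write():

```python
def reverse_body(app):
    """Toy WSGI 1 middleware that needs the full body before output."""
    def middleware(environ, start_response):
        result = app(environ, start_response)

        def gen():
            chunks = []
            try:
                for chunk in result:
                    chunks.append(chunk)
                    # Nothing to send yet, so yield an empty
                    # bytestring each time the app yields.
                    yield b''
                yield b''.join(chunks)[::-1]
            finally:
                # Forward close() to the wrapped iterable.
                if hasattr(result, 'close'):
                    result.close()
        return gen()
    return middleware


def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Length', '6')])
    return [b'abc', b'def']


def ignore_start(status, headers, exc_info=None):
    # Placeholder start_response for the demonstration.
    return None


out = list(reverse_body(demo_app)({}, ignore_start))
```

The empty strings let an asynchronous server regain control between
chunks instead of blocking until the middleware has everything.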

(If I could do it over again, I think I'd prioritize things the other
way.  That is, weigh the interests of middleware authors more highly
than those of framework authors, in the event of a trade-off between
the two.  Middleware combines the requirements of both sides of the
interface, whereas servers and frameworks each have only a one-sided
view of things.  Prioritizing middleware over either side should
produce a better protocol on balance, than trying to directly trade
one end against the other.)


> But your points about simple and complex are interesting. Middleware
> authors need to cater to everything - so making the simple simple
> doesn't make it simple for middleware authors - they don't get to opt
> out. It's only by making everything as simple - uncomplected[1] - as
> possible that we keep things easy for server and middleware authors.

Right.


> I still disagree that middleware and server authors cannot get a nicer
> API through libraries. The difference between middleware or server and
> apps is that apps can choose not to care about things they don't care
> about, whereas middleware and servers have to care - but appropriate
> helper functions can still help them.

But if we know what helper functions we would want to write, then we
can just make the protocol be the result of calling the helpers,
instead of making the protocol require the helpers.  Then, the *app*
would call the helpers, not the middleware, which puts all the
wrapping at the edge of the system instead of ubiquitous unwrapping
and rewrapping.
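
As a sketch of "wrapping at the edge", assume the canonical form is a
(status, headers, iterable-of-bytes) tuple. The helper name `respond`
and its parameters are hypothetical; the point is only that the app
calls it once, and everything downstream sees a single shape:

```python
def respond(body, status='200 OK',
            content_type='text/plain; charset=utf-8'):
    """Hypothetical app-side helper producing the canonical tuple."""
    headers = [('Content-Type', content_type)]
    if isinstance(body, str):
        body = body.encode('utf-8')
    if isinstance(body, bytes):
        # Length is known up front for a single block of bytes.
        headers.append(('Content-Length', str(len(body))))
        body = [body]
    return status, headers, body


def app(environ):
    # The convenience lives here, at the edge of the system.
    return respond('hi')


def count_headers(wrapped_app):
    # Middleware needs no type checks: the form is always the same.
    def middleware(environ):
        status, headers, body = wrapped_app(environ)
        extra = [('X-Header-Count', str(len(headers)))]
        return status, headers + extra, body
    return middleware
```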


>> So, let's trim the sharp edges for the poor middleware and server
>> developers, rather than polishing the bits that app writers aren't
>> going to be using, anyway.  (Since most of them are going to be using
>> Django, Pyramid, Flask, or whatever the latest hotness is, anyway.)
>
> Do you have a hitlist of such sharp edges you'd like to see catered
> for in this new spec?

The ones described in the wsgi_lite docs:

1. People forgetting that the environ is volatile
2. People forgetting to close()
3. The horror that is the stateful nature of the current protocol (all
the rules on what can be called when)

In wsgi_lite I addressed #1 by providing the binding protocol to map
desired request data to keyword arguments.  #2, by the "closing"
extension, and #3 by switching to a functional paradigm rather than an
imperative one.  (Thus eliminating any rules on what can be called
when, because the response is a return value, not an invocation of
something.)
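
A minimal sketch of the functional paradigm, assuming a return-value
convention of (status, headers, body). This is not the wsgi_lite API;
`to_wsgi1` and `add_header` are hypothetical names used only to show
that such an app can still be exposed to a WSGI 1 server:

```python
def hello(environ):
    # The response is simply a return value.
    return '200 OK', [('Content-Type', 'text/plain')], [b'hello']


def add_header(func_app, name, value):
    """Middleware as plain function composition: no call-order rules."""
    def wrapped(environ):
        status, headers, body = func_app(environ)
        return status, headers + [(name, value)], body
    return wrapped


def to_wsgi1(func_app):
    """Adapt a return-value app to the WSGI 1 calling convention."""
    def wsgi1_app(environ, start_response):
        status, headers, body = func_app(environ)
        start_response(status, headers)
        return body
    return wsgi1_app
```

Because nothing is "called" to produce the response, there is no
sequence of states for middleware to get wrong.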


> That seems reasonable, presumably because any code we write will not
> be backported to older standard libraries. I think it would be a
> mistake to not think about the default experience as well though: we
> should specify the protocol, and offer a good default API on top of
> it.

I think maybe you're confused about *whose* default experience is to
be catered to.  ;-)  In my estimation, the framework developers who
want to expose their apps as WSGI 2 will be adding metadata to their
library, not importing a decorator from the stdlib to do it.

All in all, it kind of sounds to me like what you *really* want is to
make a user-level API for HTTP/2 applications.  And maybe it would be
a good idea to do that *first*, without reference to tweaking WSGI.

That is, maybe go out and write a nice API with whatever bells and
whistles you want to provide to apps, and just implement it for one or
two specific front-end servers.

*Then*, we would be able to look at a concrete API implementation and
say, "okay, how can we make a simple protocol that allows this end
user HTTP/2 API to exist, while being minimal for middleware and
servers to support?"

And finally, we could look at that protocol and say, "okay, can we
encapsulate this protocol in such a way that it can be safely tunneled
through WSGI 1?"

Each of these stages has benefit.  If you only get through the first,
at least it's possible to do HTTP/2 in Python!  If you get through the
second, well, maybe it's not WSGI, but at least it's a protocol (SSGI?
H2GI?).  And so on.

I guess what I'm saying is, based on what you seem to be trying to do,
I think trying to update WSGI is *way* premature.  Even WSGI wasn't
proposed in a vacuum: it was based on looking at the APIs provided by
existing Python-supporting web servers and required by existing Python
web frameworks.  So, in the absence of even *one* HTTP/2 framework API
to drive the requirements, it's probably premature to propose paradigm
shifts in WSGI itself.

Does an HTTP/2 server or API for Python even *exist* yet?

