[Web-SIG] WSGI2: write callable?

Sat Sep 27 01:41:59 CEST 2014

On 27 September 2014 10:31, PJ Eby <pje at telecommunity.com> wrote:
> On Fri, Sep 26, 2014 at 5:02 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> But perhaps it would be nicer to say:
>> iterator of headers_dict_or_body_bytes
>> With the first item yielded having to be headers (or an error thrown), and
>> the last item yielded may be a dict to emit trailers.
>>
>> So:
>> def app(environ):
>>     yield {':status': '200'}
>>     yield b'hello world'
>>     yield {'Foo': 'Bar'}
>>
>> is an entirely valid, if trivial, app.
>>
>> What do you think?
>
> I think this would make it harder to write middleware, actually, and
> for the same reason that I dislike folding status into the headers.
> It's a case of "flat is better than nested", I think, in both cases.
> That is, if the status is always required, it's easier to validate its
> presence in a 3-tuple than nested inside another data structure.

I'm intrigued here: validation of the status code is tied into the
details of the headers. For instance, 301/302 need a Location
header to be valid. So I don't understand how it's any easier with
the status split out. I'd be delighted to whip up a few contrasting
middleware samples so we can compare them.
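To make the point concrete, here is a minimal sketch of such a check, assuming the strawman layout where the status lives in the headers dict under a `:status` key. `validate_head` is a hypothetical helper, and the redirect rule is the only cross-field check it performs.

```python
REDIRECT_STATUSES = {"301", "302"}


def validate_head(headers):
    """Return a list of problems with a strawman response head.

    Assumes the status is stored under ':status' in the headers dict.
    Header names are compared case-insensitively, as HTTP requires.
    """
    problems = []
    status = headers.get(":status")
    names = {name.lower() for name in headers}
    if status is None:
        problems.append("missing :status")
    elif not (isinstance(status, str) and len(status) == 3 and status.isdigit()):
        problems.append("malformed status %r" % (status,))
    elif status in REDIRECT_STATUSES and "location" not in names:
        # This check needs the status and the headers together.
        problems.append("%s without Location" % status)
    return problems
```

Whichever layout is chosen, a validator like this has to look at the status and the headers at the same time.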

Note too that folk can still return a bad status code with a different
layout. With (status, headers, body, trailers):

    return None, {}, [], {}

One thing we could do with the status code in the headers dict is to
default it to 200, by far the most common case (in the same way that
raising an error generates a 500). Then the status wouldn't be required
at all for trivial uses. That would make things easier, no?
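A server could apply that default when it unpacks the first item. This is a sketch under assumptions: `split_status` is a hypothetical name and the `:status` key is only the strawman proposal, not settled API. It copies the dict so the app's object is left untouched.

```python
def split_status(head, default="200"):
    """Separate the status from a strawman headers dict.

    Returns (status, headers) where headers no longer contains ':status'.
    If the app omitted ':status', the default (200) is used.
    """
    headers = dict(head)  # copy so the app's dict is not mutated
    status = headers.pop(":status", default)
    return status, headers
```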

> As
> far as trailers go, I'm not sure what those are used for or how they'd
> be used in practice, but my initial thought is that they should be
> attached to the response body, analogous to how FileWrapper works.
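One possible reading of "attached to the response body", sketched under assumptions: the body is an iterable object (in the spirit of FileWrapper) that exposes a `trailers` dict, which the server reads only after iteration finishes. `TrailerBody` and its `trailers` attribute are hypothetical names, and MD5 stands in for whatever digest an app would really compute.

```python
import hashlib


class TrailerBody:
    """Iterable response body that computes its trailers as it streams.

    The server is assumed to iterate the body fully and then read
    `.trailers`; reading it earlier gives an empty dict.
    """

    def __init__(self, chunks):
        self._chunks = chunks
        self.trailers = {}

    def __iter__(self):
        digest = hashlib.md5()
        for chunk in self._chunks:
            digest.update(chunk)
            yield chunk
        # Only known once every chunk has been seen.
        self.trailers = {"Content-MD5Sum": digest.hexdigest()}
```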

A classic example for trailers is digitally signing streamed
content. Using the same strawman API as above:

import hashlib

def app(environ):
    yield {':status': '200'}
    md5sum = hashlib.md5()
    # block_reader() and sign_bytes() are placeholders for app helpers.
    with open('foo', 'rb') as f:
        for chunk in block_reader(f, 65536):
            md5sum.update(chunk)
            yield chunk
    digest = md5sum.hexdigest()
    signature = sign_bytes(digest.encode('utf8'))
    yield {'Content-MD5Sum': digest, 'X-Signature': signature}

Note that this doesn't need to buffer or use a closure.
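A server consuming this style needs only a little bookkeeping. A minimal sketch, assuming the rules proposed earlier in the thread (the first item must be a headers dict, bytes items are body, and a dict may appear only last, as trailers). `consume` is a hypothetical name, and a real server would stream chunks instead of collecting them.

```python
def consume(app, environ):
    """Drive a strawman generator-style app to completion.

    Returns (headers, body_chunks, trailers); trailers is None when the
    app sends none. Raises TypeError when the item order is violated.
    """
    result = app(environ)
    try:
        items = iter(result)
        headers = next(items, None)
        if not isinstance(headers, dict):
            raise TypeError("first item must be a headers dict")
        body, trailers = [], None
        for item in items:
            if trailers is not None:
                raise TypeError("nothing may follow the trailers dict")
            if isinstance(item, dict):
                trailers = item
            elif isinstance(item, bytes):
                body.append(item)
            else:
                raise TypeError("unexpected item %r" % (item,))
        return headers, body, trailers
    finally:
        # Give the app a chance to release resources, as WSGI 1 does.
        close = getattr(result, "close", None)
        if close is not None:
            close()
```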

Writing that with a callback for trailers (which is the only
alternative: it's either a callback or a generator, because the
content of the trailers cannot be determined until the body has been
fully handled):

import hashlib

def app(environ):
    md5sum = hashlib.md5()
    def body():
        with open('foo', 'rb') as f:
            for chunk in block_reader(f, 65536):
                md5sum.update(chunk)
                yield chunk
    def trailers():
        # Only valid once the server has exhausted body().
        digest = md5sum.hexdigest()
        signature = sign_bytes(digest.encode('utf8'))
        return {'Content-MD5Sum': digest, 'X-Signature': signature}
    return '200', {}, body(), trailers
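On the server side, the tuple-plus-callback layout implies an ordering contract: drain the body first, then call the trailers callback. A minimal sketch, assuming the third element of the tuple is an iterable of bytes and the fourth is either `None` or a zero-argument callable returning a dict; `consume_tuple` is a hypothetical name.

```python
def consume_tuple(app, environ):
    """Drive a strawman app returning (status, headers, body, trailers)."""
    status, headers, body, trailers = app(environ)
    chunks = []
    try:
        for chunk in body:
            chunks.append(chunk)
    finally:
        close = getattr(body, "close", None)
        if close is not None:
            close()
    # The trailers callback is only meaningful once the body is exhausted.
    extra = trailers() if trailers is not None else None
    return status, headers, chunks, extra
```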

> The other alternative is to use a dict as the response object
> (analogous to environ as the request object), with named keys for
> status, headers, trailers, body, etc.  It would then be extensible to
> handle things like the "Associated content" concept.
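For comparison, a sketch of the dict-as-response idea, assuming hypothetical key names `status`, `headers`, `body` and `trailers` (none of them agreed). Because the trailers depend on the body, the app still needs a closure that fills them in, so the closure does not go away.

```python
import hashlib


def app(environ):
    response = {"status": "200", "headers": {"Content-Type": "text/plain"}}
    chunks = [b"hello ", b"world"]

    def body():
        digest = hashlib.md5()
        for chunk in chunks:
            digest.update(chunk)
            yield chunk
        # Trailers are filled in once the body is exhausted; the server
        # is assumed to read response["trailers"] only after iterating.
        response["trailers"] = {"Content-MD5Sum": digest.hexdigest()}

    response["body"] = body()
    response["trailers"] = {}
    return response
```

Extra keys (for example one for the "Associated content" concept) could be added later without changing the calling convention.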

That might work, though it will force more closures. One of the things
I like about the generator style is the clarity in code that we can
achieve.

> In this way, middleware that is simply passing things through
> unchanged can do so, while middleware that is creating a new response
> can discard the old object.

That seems to apply either way, right?
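In the generator style, discarding the old object amounts to closing the wrapped generator before yielding a fresh response. A minimal sketch, assuming the strawman protocol above; `not_found_page` and its HTML body are hypothetical.

```python
def not_found_page(app):
    """Replace any 404 from the wrapped app with a custom page."""
    def middleware(environ):
        wrapped = app(environ)
        head = next(wrapped)
        if head.get(":status") != "404":
            # Pass-through: forward the head and everything after it.
            yield head
            yield from wrapped
            return
        # Replacement: discard the old response, letting the wrapped
        # generator run its cleanup, then emit a brand-new one.
        close = getattr(wrapped, "close", None)
        if close is not None:
            close()
        yield {":status": "404", "Content-Type": "text/html"}
        yield b"<h1>Nothing here</h1>"
    return middleware
```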

Here's a body-size logging middleware:

import logging

def logger(app):
    def middleware(environ):
        wrapped = app(environ)
        yield next(wrapped)  # the headers dict always comes first
        body_bytes = 0
        for maybe_body in wrapped:
            if isinstance(maybe_body, bytes):
                body_bytes += len(maybe_body)
            yield maybe_body  # bytes and a final trailers dict pass through
        logging.info("Saw %d bytes for %s", body_bytes, environ['PATH_INFO'])
    return middleware
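One caveat with the middleware above: if a server stops iterating early (say, the client disconnects) and closes the middleware generator, the code after the loop never runs, so nothing is logged. A variant using try/finally, sketched here as a suggestion rather than part of the proposal, logs in both cases and closes the wrapped app. It repeats the middleware so the block runs on its own; the logger name `wsgi2.sizes` is made up.

```python
import logging

log = logging.getLogger("wsgi2.sizes")  # hypothetical logger name


def logger(app):
    def middleware(environ):
        wrapped = app(environ)
        body_bytes = 0
        try:
            yield next(wrapped)  # the headers dict always comes first
            for maybe_body in wrapped:
                if isinstance(maybe_body, bytes):
                    body_bytes += len(maybe_body)
                yield maybe_body
        finally:
            # Runs on normal completion and when the server closes us early.
            log.info("Saw %d bytes for %s", body_bytes, environ["PATH_INFO"])
            close = getattr(wrapped, "close", None)
            if close is not None:
                close()
    return middleware
```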

..
>> We're bumping the WSGI version, will that serve as a sufficient flag?
>
> I mean, flagged on the app end.  For example, wsgi_lite marks apps
> that support wsgi_lite with a true-valued `__wsgi_lite__` attribute.
> In this way, a container invoking the app knows it can be called with
> just an environ (and no start_response).

OK, so we'd use the absence of such a mark to trigger the WSGI 1
adapter automagically? I'm curious whether that will work well enough,
given wsgi_lite and other extensions to WSGI. Perhaps we should
refuse to guess and just supply the adapters and instructions?

> So, I'm saying that an app callable would opt in to this new WSGI
> version, so that servers and middleware don't need to grow new APIs
> for registering apps -- they can auto-detect.  Also, having
> auto-detection means you can write a decorator (e.g. in wsgiref), to
> wrap and convert WSGI 1 apps to WSGI 2, without needing to know if
> you're passing something already wrapped.  It means that a WSGI 2
> server or middleware can just wrap whatever apps it sees, and get back
> a WSGI 2 app, whether the thing it got was WSGI 1 or WSGI 2.
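A sketch of such an auto-detecting adapter, assuming a hypothetical `__wsgi2__` marker (by analogy with `__wsgi_lite__`) and the strawman generator protocol with `:status` in the headers dict. It is deliberately simplified: `exc_info` handling is ignored, and converting the header list to a dict keeps only the last value of a repeated header such as Set-Cookie. Because the result carries the marker, wrapping twice returns the same object.

```python
def to_wsgi2(app):
    """Adapt a WSGI 1 app to the strawman generator-style protocol.

    Apps already carrying the (hypothetical) __wsgi2__ marker are returned
    unchanged, so wrapping twice is harmless.
    """
    if getattr(app, "__wsgi2__", False):
        return app

    def adapted(environ):
        state = {}
        pending = []  # bytes pushed through the legacy write() callable

        def start_response(status, response_headers, exc_info=None):
            state["status"] = status.split(" ", 1)[0]  # "200 OK" -> "200"
            state["headers"] = response_headers
            return pending.append  # serves as the WSGI 1 write() callable

        def head():
            # Note: dict() keeps only the last value of a repeated header.
            headers = dict(state["headers"])
            headers[":status"] = state["status"]
            return headers

        result = app(environ, start_response)
        started = False
        try:
            for chunk in result:
                if not started:
                    yield head()
                    started = True
                while pending:  # write() output precedes the next chunk
                    yield pending.pop(0)
                if chunk:
                    yield chunk
            if not started:
                yield head()
            while pending:
                yield pending.pop(0)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

    adapted.__wsgi2__ = True
    return adapted
```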

That's certainly a desirable property. If we've changed things too much
to infer the version from the basic structure, then we'll need some
metadata for it. Works for me; I'd like to have a decorator for that:

def logger(app):
    @wsgi2
    def middleware(environ):
        ...
    return middleware
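Such a decorator could be as small as setting the opt-in marker. A sketch, assuming the hypothetical attribute name `__wsgi2__` (modelled on wsgi_lite's `__wsgi_lite__`); `is_wsgi2` shows the check a container or adapter might make.

```python
def wsgi2(app):
    """Mark `app` as speaking the strawman WSGI 2 protocol.

    `__wsgi2__` is a hypothetical marker name chosen by analogy with
    wsgi_lite's `__wsgi_lite__` attribute.
    """
    app.__wsgi2__ = True
    return app


def is_wsgi2(app):
    """What a container might check before deciding how to call `app`."""
    return bool(getattr(app, "__wsgi2__", False))


@wsgi2
def hello(environ):
    yield {":status": "200"}
    yield b"hello world"
```

Setting an attribute works on plain functions; a bound method rejects new attributes, so a class-based app would need the marker on its class or instance instead.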

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud