[Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

Graham Dumpleton graham.dumpleton at gmail.com
Tue Jan 5 06:57:57 EST 2016


> On 5 Jan 2016, at 10:26 PM, Cory Benfield <cory at lukasa.co.uk> wrote:
> 
> Forwarding this message from the django-developers list.
> 
> Hi Cory,
> 
> I’m not subscribed to web-sig but I read the discussion there. Feel free to forward my answer to the group if you think it’s useful.
> 
> I have roughly the same convictions as Graham Dumpleton. If you want to support HTTP/2 and WebSockets, don’t start with design decisions anchored in CGI. Figure out what a simple and flexible API for these new protocols would be, specify it, implement it, and make sure it degrades gracefully to HTTP/1. You may be able to channel most of the communication through a single generator, but it’s unclear to me that this will be the most convenient design.
> 
> If you want to improve WSGI, here’s a list of mistakes or shortcomings in PEP 3333 that you can take a stab at. There’s a general theme: for a specification that looks at the future, I believe that making modern PaaS-based deployments secure by default matters more than not implementing anything beyond what’s available in legacy CGI-based deployments.
> 
> 1. WSGI is prone to header injection vulnerabilities by design due to the conversion of HTTP headers to CGI-style environment variables: if the server doesn’t specifically prevent it, X-Foo and X_Foo both become HTTP_X_FOO. I don’t believe it’s a good choice to destructively encode headers, expect applications to undo the damage somehow, and introduce security vulnerabilities in the process. If mimicking CGI is still considered a must-have (1% of current Python web programmers may have heard about it, most of them from PEP 3333), then that burden should be pushed onto the server, not the application.

FWIW, Apache 2.4 will discard headers whose names contain an underscore, or any of a number of other characters. Basically, it probably only accepts alphanumerics and '-' in the original name.

mod_wsgi does the same thing, even under Apache 2.2, where Apache itself didn’t.

So with mod_wsgi at least you are safe, provided you are not still using some ancient mod_wsgi version. (Death to LTS Linux versions and out of date packages) :-)

I believe the nginx server, when used as a front end and populating CGI-like variables to pass to a builtin module such as uWSGI, will also discard headers which don’t match that requirement.

I can’t remember whether gunicorn was updated to do something similar, or whether uWSGI does it when it isn’t used behind nginx via its uwsgi protocol but instead listens publicly via HTTP.
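
To make the collision concrete, here is a minimal sketch of the CGI-style conversion and of the kind of name filtering described above. The function names and the exact set of accepted characters are assumptions for illustration only; this is not the code used by Apache, mod_wsgi or nginx.

```python
import re

# Assumption: only letters, digits and '-' are accepted in a header name,
# roughly mirroring the behaviour described above for Apache 2.4.
_SAFE_NAME = re.compile(r"[A-Za-z0-9-]+")


def cgi_key(name):
    """Convert an HTTP header name to its CGI-style environ key."""
    return "HTTP_" + name.upper().replace("-", "_")


def build_environ(headers, strict=True):
    """Build CGI-style keys from (name, value) pairs.

    With strict=True, names containing anything other than letters,
    digits and '-' are dropped before conversion.
    """
    environ = {}
    for name, value in headers:
        if strict and not _SAFE_NAME.fullmatch(name):
            continue  # discard e.g. 'X_Foo' so it cannot shadow 'X-Foo'
        environ[cgi_key(name)] = value
    return environ


headers = [("X-Foo", "set-by-proxy"), ("X_Foo", "set-by-attacker")]
naive = build_environ(headers, strict=False)  # later header overwrites the earlier one
safe = build_environ(headers, strict=True)    # only 'X-Foo' survives
```

Without the filter, both names land on the same key and the application cannot tell which one it received. With the filter, the underscore variant never reaches the environ.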

> 2. More generally, I fail to see how mixing HTTP headers, server-related inputs, and environment variables in a dict adds value. It prevents iterating on each collection separately. It only makes sense if not offering more features than CGI is a design goal; in that case, this discussion doesn’t serve a purpose anyway. It would be nicer and possibly more secure if the application received separately:
> 
> a. Configuration information, which servers could read from environment variables by default for backwards compatibility, but could also get through more secure channels and restrict to what the application needs in order to better isolate it from the entire OS.

I have always had a bit of a beef with the way that the use of environment variables for configuration was promoted by the 12 factor manifesto. It grew out of how a specific hosting service did things and ignored that various web servers used configuration files instead or did things in other ways. Of course the hosting service made it difficult to impossible to use some of those traditional web servers, so they were safe in their narrow view of things.

Anyway, if environment variables were used where appropriate and with an intermediate mapping layer within Python web applications that would have been fine. The problem was that you started to see direct lookup of environment variables deep in code bases. So people wedded themselves to use of environment variables.

The more sensible thing to do would have been to use an intermediate Python module/package providing an abstraction layer for getting configuration. Code would then use that. The configuration layer could then look up environment variables or use other means to get configuration, such as reading more traditional configuration files or pulling it down from configuration servers.

As far as I know there is no good Python package out there which serves as such an intermediary configuration system, which could be plugged into any application and which doesn’t carry a huge amount of baggage. Would love to hear about one if it exists.
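
As a rough illustration of the kind of intermediate layer meant here, the sketch below chains several configuration sources behind one lookup call. The Config class, load_json_file helper and the setting names are all hypothetical, invented for this example; it is not an existing package.

```python
import json
import os


def load_json_file(path):
    """Return settings from a JSON file, or an empty dict if it is absent."""
    if not os.path.exists(path):
        return {}
    with open(path) as fp:
        return json.load(fp)


class Config:
    """Hypothetical lookup layer: the first source holding a name wins."""

    def __init__(self, *sources):
        # Each source is any mapping: os.environ, a dict loaded from a
        # file, values fetched from a configuration server, and so on.
        self._sources = sources

    def get(self, name, default=None):
        for source in self._sources:
            if name in source:
                return source[name]
        return default


# Application code asks the layer and never touches os.environ directly.
# Plain dicts stand in here for os.environ and load_json_file(...).
config = Config({"DATABASE_URL": "from-environment"},
                {"DATABASE_URL": "from-file", "CACHE_URL": "from-file"})
database_url = config.get("DATABASE_URL")
cache_url = config.get("CACHE_URL")
```

In a real deployment the sources might be Config(os.environ, load_json_file("settings.json")), and swapping in a different source would not require touching any code that calls config.get().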

> b. Server APIs mandated by the spec, per request.
> c. HTTP headers, per request.
> 
> 3. Stop pretending that HTTP is a unicode protocol, or at least stop ignoring reality when doing so. WSGI enforces ISO-8859-1-decoded str objects in the environ, which is just wrong. It’s all the more surprising a choice given that this change was driven by Python 3, that UTF-8 is the correct choice, and that Python 3 defaults to UTF-8. Django has to re-encode and re-decode before doing anything with HTTP headers: https://github.com/django/django/blob/d5b90c8e120687863c1d41cf92a4cdb11413ad7f/django/core/handlers/wsgi.py#L231-L253

WSGI uses ISO-8859-1 with raw bytes because there is no guarantee about what the encoding might be for certain inbound headers due to clients ignoring the ASCII requirement and using what they please. A header value may well be UTF-8 these days, but technically it could also be some obscure Japanese encoding.

Ultimately there was no way the WSGI server could know for sure what the encoding of some header values could be. The only place that knew this for sure was the web application itself.

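This is why it ended up as raw bytes decoded as ISO-8859-1: the application can always get the original bytes back and decode them with whatever encoding it knows to be right.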

There should be quite a lot of discussion about this topic somewhere back in the Web-SIG archives.
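
The round trip being argued about looks roughly like this. It is a minimal sketch with made-up function names, not Django’s or any server’s actual code.

```python
def wsgi_str(raw):
    """Decode raw header bytes as ISO-8859-1, as a WSGI server does on Python 3."""
    return raw.decode("iso-8859-1")


def recode(value, encoding="utf-8"):
    """Recover the original bytes and decode them as the application sees fit."""
    return value.encode("iso-8859-1").decode(encoding, errors="replace")


raw = "/café".encode("utf-8")
environ_value = wsgi_str(raw)   # looks garbled, but no byte has been lost
path = recode(environ_value)    # the application decides it was UTF-8

# The same mechanism works if the client sent some other encoding.
japanese = recode(wsgi_str("日本".encode("shift_jis")), "shift_jis")
```

Because every byte value maps to exactly one ISO-8859-1 character, the first step never fails and never loses information. The cost is that every application has to perform the second step itself.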

> 4. Normalize the way to tell the application about the original protocol, IP address and port. When dev and ops responsibilities are separate, this is clearly an ops responsibility, but due to the lack of standardization devs end up dealing with this problem in custom middleware, when they do it at all. Everyone keeps getting it wrong, which introduces security vulnerabilities. Also it always breaks silently on infrastructure changes.

FWIW, I came to the conclusion that it is the responsibility of the WSGI server to translate any proxy forwarding headers. As you say, if you leave it up to WSGI applications or middleware it becomes a mess and is usually wrong.

If you dig into the mod_wsgi code, you will find it has all these capabilities now: you can tell mod_wsgi which proxy headers from your proxy are trusted, and also which proxy IP addresses or address ranges are trusted. With that, mod_wsgi will update the relevant values that go into the WSGI environ and delete all the headers which weren’t marked as trusted, so that WSGI middleware or applications don’t incorrectly use them.

I talked a bit about this in a blog post:

    http://blog.dscpl.com.au/2015/06/proxying-to-python-web-application.html

Unfortunately I haven’t updated it to support the new ‘Forwarded’ header yet, which I should do.

Coming to a common understanding of what should be done in this space, and having all WSGI servers provide the same functionality, would be nice.
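
For the sake of discussion, the server-side behaviour could be sketched as below. This is not the mod_wsgi implementation; the function name, its parameters and the particular set of X-Forwarded-* headers handled are assumptions made for the example.

```python
import ipaddress

# Assumption: only these three de facto headers are handled in this sketch.
FORWARDING_HEADERS = (
    "HTTP_X_FORWARDED_FOR",
    "HTTP_X_FORWARDED_PROTO",
    "HTTP_X_FORWARDED_HOST",
)


def apply_trusted_proxy(environ, trusted_networks, trusted_headers):
    """Return a copy of environ with proxy headers applied or stripped."""
    environ = dict(environ)
    peer = ipaddress.ip_address(environ["REMOTE_ADDR"])
    from_trusted = any(
        peer in ipaddress.ip_network(net) for net in trusted_networks
    )

    # Delete every forwarding header that is not explicitly trusted, so
    # middleware and applications cannot pick it up by mistake.
    for key in FORWARDING_HEADERS:
        if not (from_trusted and key in trusted_headers):
            environ.pop(key, None)

    proto = environ.get("HTTP_X_FORWARDED_PROTO")
    if proto in ("http", "https"):
        environ["wsgi.url_scheme"] = proto

    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR")
    if forwarded_for:
        # Use the last entry: the one appended by the trusted proxy itself.
        environ["REMOTE_ADDR"] = forwarded_for.split(",")[-1].strip()
    return environ


request = {
    "REMOTE_ADDR": "10.0.0.5",
    "wsgi.url_scheme": "http",
    "HTTP_X_FORWARDED_PROTO": "https",
    "HTTP_X_FORWARDED_FOR": "203.0.113.7",
    "HTTP_X_FORWARDED_HOST": "spoofed.example",
}
trusted = {"HTTP_X_FORWARDED_PROTO", "HTTP_X_FORWARDED_FOR"}
fixed = apply_trusted_proxy(request, ["10.0.0.0/8"], trusted)
```

The point of doing this in the server is that the trust decision is made once, from deployment configuration, and the application only ever sees values it can rely on.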

> 5. Improve request / response length handling and connection closure. Armin and Graham have talked about this in the past and know the topic better than I do. There’s also a rejected PEP by Armin which made sense to me.
> 
> As you can see from these comments, I don’t quite share the design choices that led to WSGI as it currently stands. I think it will be easier to build a new standard than evolve the current one.
> 
> I hope this helps!
> 
> Aymeric
