From chrism at plope.com Thu Sep 16 01:03:20 2010
From: chrism at plope.com (Chris McDonough)
Date: Wed, 15 Sep 2010 19:03:20 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
Message-ID: <1284591800.14651.36.camel@thinko>

A PEP was submitted and accepted today for a WSGI successor protocol
named Web3:

http://python.org/dev/peps/pep-0444/

I'd encourage other folks to suggest improvements to that spec or to
submit a competing spec, so we can get WSGI-on-Python3 settled soon.

- C

From prologic at shortcircuit.net.au Thu Sep 16 01:40:56 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 09:40:56 +1000
Subject: [Web-SIG] wsgiref for web apps
Message-ID:

Hi,

Just curious, but does anyone actually use wsgiref (in the python
stdlib) for real web applications? Or, do most use some other
third-party web framework? Perhaps wsgiref is used for simple / quick
things?

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"

From mdipierro at cs.depaul.edu Thu Sep 16 01:40:23 2010
From: mdipierro at cs.depaul.edu (Massimo Di Pierro)
Date: Wed, 15 Sep 2010 18:40:23 -0500
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID:

I fully support it!

Massimo

On Sep 15, 2010, at 6:03 PM, Chris McDonough wrote:

> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
>
> http://python.org/dev/peps/pep-0444/
>
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.
>
> - C
>
>
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/mdipierro%40cti.depaul.edu

From prologic at shortcircuit.net.au Thu Sep 16 01:51:52 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 09:51:52 +1000
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 9:40 AM, Massimo Di Pierro wrote:
> I fully support it!

I don't entirely. I don't quite agree with the key changes from WSGI
to Web3. I think it's unnecessary.

cheers
james

--
-- James Mills
--
-- "Problems are solved by method"

From pje at telecommunity.com Thu Sep 16 02:05:43 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Sep 2010 20:05:43 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID: <20100916000542.30AD73A403D@sparrow.telecommunity.com>

At 07:03 PM 9/15/2010 -0400, Chris McDonough wrote:
>A PEP was submitted and accepted today for a WSGI successor protocol
>named Web3:
>
>http://python.org/dev/peps/pep-0444/
>
>I'd encourage other folks to suggest improvements to that spec or to
>submit a competing spec, so we can get WSGI-on-Python3 settled soon.

The first thing I notice is that web3.async appears to force all
existing middleware to delete it from the environment if it wishes to
remain compatible, unless it adapts to support receiving callables itself.

On further reading I see you have something about middleware
disabling itself if it doesn't support async execution, but this
doesn't make any sense to me: if it can't support async execution,
why wouldn't it just delete web3.async from the environ, forcing its
wrapped app to be synchronous instead?
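
A minimal sketch of the option raised above, assuming Python 3 and the Web3 calling convention used in this thread (an application takes `environ` and returns `body, status, headers`). The names `force_sync` and `demo_app` are hypothetical, and setting the key to `False` rather than deleting it is just one of the two variants mentioned in the discussion.

```python
def force_sync(app):
    """Wrap a Web3-style app so that everything below it runs synchronously."""
    def middleware(environ):
        # Copy so the caller's environ is left untouched, then switch off
        # the async capability for the wrapped application.
        environ = dict(environ)
        environ['web3.async'] = False
        body, status, headers = app(environ)
        # The wrapped app was told async is unavailable, so the middleware
        # only has to handle a plain iterable of bytes here.
        return body, status, headers
    return middleware


def demo_app(environ):
    # Hypothetical application that reports the flag it was given.
    flag = b'async' if environ.get('web3.async') else b'sync'
    return [flag], b'200 OK', [(b'Content-Type', b'text/plain')]


wrapped = force_sync(demo_app)
body, status, headers = wrapped({'web3.async': True})
print(body)  # prints [b'sync']
```

The cost is one extra line at the top of each middleware, in exchange for not having to handle two kinds of return value.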

I'm also not a fan of the bytes environ, or the new
path_info/script_name variables; note that the spec's sample CGI
implementation does not itself provide the new variables, and that
middleware must be explicitly written to handle the case where there
is duplication.

My main fear with this spec is that people will assume they can just
make a few superficial changes to run WSGI code on it, when in fact
it is deeply incompatible where middleware is concerned. In fact,
AFAICT, it seems like it will be *harder* to write correct web3
middleware than it is to write correct WSGI middleware now.

This seems like a step backward, since the whole idea behind dropping
start_response() was to make correct middleware *easier* to write.

Any time a spec makes something optional or allows More Than One Way
To Do It, it immediately doubles the minimum code required to
implement that portion of the spec in compliant middleware. This
spec has two optionalities: web3.async, and the optional
path_info/script_name, so the return handling of every piece of
middleware is doubled (or else "environ['web3.async'] = False" must
be added at the top), and any code that modifies paths must similarly
ditch the special variables or do double work to update them.

From chrism at plope.com Thu Sep 16 02:15:41 2010
From: chrism at plope.com (Chris McDonough)
Date: Wed, 15 Sep 2010 20:15:41 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <20100916000542.30AD73A403D@sparrow.telecommunity.com>
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com>
Message-ID: <1284596141.14651.57.camel@thinko>

On Wed, 2010-09-15 at 20:05 -0400, P.J. Eby wrote:
> At 07:03 PM 9/15/2010 -0400, Chris McDonough wrote:
> >A PEP was submitted and accepted today for a WSGI successor protocol
> >named Web3:
> >
> >http://python.org/dev/peps/pep-0444/
> >
> >I'd encourage other folks to suggest improvements to that spec or to
> >submit a competing spec, so we can get WSGI-on-Python3 settled soon.
>
> The first thing I notice is that web3.async appears to force all
> existing middleware to delete it from the environment if it wishes to
> remain compatible, unless it adapts to support receiving callables itself.

We can ditch everything concerning web3.async as far as I'm concerned.
Ian has told me that this feature won't be liked by the async people
anyway, as it doesn't have a trigger mechanism.

> On further reading I see you have something about middleware
> disabling itself if it doesn't support async execution, but this
> doesn't make any sense to me: if it can't support async execution,
> why wouldn't it just delete web3.async from the environ, forcing its
> wrapped app to be synchronous instead?
>
> I'm also not a fan of the bytes environ, or the new
> path_info/script_name variables; note that the spec's sample CGI
> implementation does not itself provide the new variables, and that
> middleware must be explicitly written to handle the case where there
> is duplication.

I'm not concerned about which environment variables have it, but I would
definitely like to be able to get at the "original" (non-%2F-decoded)
path info somewhere. I'd be fine if PATH_INFO was just that, and get
rid of web3.path_info. web3.script_name is probably just a mistake
entirely.

> My main fear with this spec is that people will assume they can just
> make a few superficial changes to run WSGI code on it, when in fact
> it is deeply incompatible where middleware is concerned. In fact,
> AFAICT, it seems like it will be *harder* to write correct web3
> middleware than it is to write correct WSGI middleware now.

I'm very willing to drop web3.async entirely. It seems reasonable to do
so. I should have done so before I mailed the spec, as I knew it would
be unpopular.

> This seems like a step backward, since the whole idea behind dropping
> start_response() was to make correct middleware *easier* to write.
>
> Any time a spec makes something optional or allows More Than One Way
> To Do It, it immediately doubles the minimum code required to
> implement that portion of the spec in compliant middleware. This
> spec has two optionalities: web3.async, and the optional
> path_info/script_name, so the return handling of every piece of
> middleware is doubled (or else "environ['web3.async'] = False" must
> be added at the top), and any code that modifies paths must similarly
> ditch the special variables or do double work to update them.

No worries, let's get rid of both, with the caveat that it's pretty
essential (to me anyway) to be able to get at the non-%2F-decoded path
somewhere. The most sensible thing to me would be to put it in
PATH_INFO.

As far as bytes vs. strings, whatever, we have to pick one. Bytes makes
more sense to me. I'll leave it to the native-string and/or unicode
people to create their own spec.

- C

From prologic at shortcircuit.net.au Thu Sep 16 02:34:49 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 10:34:49 +1000
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284596141.14651.57.camel@thinko>
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> <1284596141.14651.57.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 10:15 AM, Chris McDonough wrote:
> We can ditch everything concerning web3.async as far as I'm concerned.
> Ian has told me that this feature won't be liked by the async people
> anyway, as it doesn't have a trigger mechanism.

You and Ian are right about that. I don't see the point of introducing
an "async" property/variable into the environment data.

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"

From prologic at shortcircuit.net.au Thu Sep 16 02:38:20 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 10:38:20 +1000
Subject: [Web-SIG] [Python-Dev] Add PEP 444, Python Web3 Interface.
In-Reply-To: <4C915EA4.5040300@animats.com>
References: <4C915EA4.5040300@animats.com>
Message-ID:

On Thu, Sep 16, 2010 at 10:02 AM, John Nagle wrote:
> On 9/15/2010 4:44 PM, python-dev-request at python.org wrote:
>>
>> ``SERVER_PORT`` must be a bytes instance (not an integer).
>
> What's that supposed to mean? What goes in the "bytes
> instance"? A character string in some format? A long binary
> number? If the latter, with which byte ordering? What
> problem does this solve?

(Posting to web-sig):

I can see value in this (somewhat). There are certain situations
(UNIX sockets) where SERVER_PORT is irrelevant and doesn't make sense.
In my experience setting this to 0 or None is probably okay (when it
used to be an int).

Can't comment on byte ordering, or format, etc...

Perhaps SERVER_PORT should be left as it was in the original PEP 333
specs as an int (or None?)

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"

From armin.ronacher at active-4.com Thu Sep 16 02:43:03 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 16 Sep 2010 02:43:03 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <20100916000542.30AD73A403D@sparrow.telecommunity.com>
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com>
Message-ID: <4C916817.5090003@active-4.com>

Hi,

On 2010-09-16 2:05 AM, P.J. Eby wrote:
> The first thing I notice is that web3.async appears to force all
> existing middleware to delete it from the environment if it wishes to
> remain compatible, unless it adapts to support receiving callables itself.

In terms of backwards compatibility, we have a huge change here anyways,
so existing middlewares are not that much of an issue I suppose.

I know however that web3.async will be a controversial topic. The
reason it's in there is that there is theoretical support for async
frameworks on top of WSGI to have some kind of basic interoperability.
Someone brought up the argument that it relies on polling, but that is
only partially true because you control the incoming web3 environment.
That environment might contain some callbacks that the application can
use that would internally send signals to the server so that it knows
when to call the response callable. The callback was modeled after the
hack that nginx (if I remember correctly) is doing wrt yielding empty
strings until responses are ready.

I would like to bring some people from asynchronous servers onto the
discussion for that particular issue before we decide on the future.
Tornado is currently the most popular Python project on GitHub, so
there is genuine interest in async servers and I am pretty sure enough
people use it in practice. This however also means that Tornado has its
own environment which looks very much like the situation we were in
before WSGI was around.

> On further reading I see you have something about middleware disabling
> itself if it doesn't support async execution, but this doesn't make any
> sense to me: if it can't support async execution, why wouldn't it just
> delete web3.async from the environ, forcing its wrapped app to be
> synchronous instead?

Instead of deleting it, it would set it to False though. Why would it
want to disable itself? For instance because the middleware is actually
depending on an asynchronous specification developed on top of web3 that
is not supported by synchronous servers, which is the main intention of
that async flag.
To be used as the basis for an actual proper async specification written
by people that actually use async servers, unlike me and Chris :)

> My main fear with this spec is that people will assume they can just
> make a few superficial changes to run WSGI code on it, when in fact it
> is deeply incompatible where middleware is concerned. In fact, AFAICT,
> it seems like it will be *harder* to write correct web3 middleware than
> it is to write correct WSGI middleware now.

For just rewriting the environment it's about as complicated, and
making middlewares that modify the response harder to write is, I
think, a good thing. Things middleware should currently do and do not:

- honour content-encoding
- correctly set/unset content-length
- update/remove etags
- not be surprised by HEAD responses
- pass exc_info through
- not swallow the write callable

I am sure there are more; I remember that Graham had some bad
experiences with them in particular.

> This seems like a step backward, since the whole idea behind dropping
> start_response() was to make correct middleware *easier* to write.

Do we really need middlewares that rewrite the response? Even without
web3.async and limited to bytes only, there are so many things that can
go wrong and will go wrong. I would instead suggest a common library
that people could use to develop middlewares on top of web3 that sorts
these things out for you.


Regards,
Armin

From chrism at plope.com Thu Sep 16 03:05:16 2010
From: chrism at plope.com (Chris McDonough)
Date: Wed, 15 Sep 2010 21:05:16 -0400
Subject: [Web-SIG] [Python-Dev] Add PEP 444, Python Web3 Interface.
In-Reply-To: <4C915EA4.5040300@animats.com>
References: <4C915EA4.5040300@animats.com>
Message-ID: <1284599116.14651.88.camel@thinko>

It's, e.g.

b'8080'

.. instead of the integer value 8080.

Apparently the type of this value was not spelled out sufficiently in
the WSGI spec and string values and integer values were used
interchangeably, making it harder to join them with the other values in
the environ (a common thing to want to do). Bytes instances are
attractive, as the rest of the values are also bytes, so they can be
joined together easily.

(I also redirected this to web-sig at the request of PJE).

- C

On Wed, 2010-09-15 at 17:02 -0700, John Nagle wrote:
> On 9/15/2010 4:44 PM, python-dev-request at python.org wrote:
> > ``SERVER_PORT`` must be a bytes instance (not an integer).
>
> What's that supposed to mean? What goes in the "bytes
> instance"? A character string in some format? A long binary
> number? If the latter, with which byte ordering? What
> problem does this solve?
>
> John Nagle
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%40plope.com
>

From prologic at shortcircuit.net.au Thu Sep 16 03:31:31 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 11:31:31 +1000
Subject: [Web-SIG] [Python-Dev] Add PEP 444, Python Web3 Interface.
In-Reply-To: <1284599116.14651.88.camel@thinko>
References: <4C915EA4.5040300@animats.com> <1284599116.14651.88.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 11:05 AM, Chris McDonough wrote:
> It's, e.g.
>
> b'8080'
>
> .. instead of the integer value 8080.
>
> Apparently the type of this value was not spelled out sufficiently in
> the WSGI spec and string values and integer values were used
> interchangeably, making it harder to join them with the other values in
> the environ (a common thing to want to do). Bytes instances are
> attractive, as the rest of the values are also bytes, so they can be
> joined together easily.
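
To make the joining argument concrete, here is a small sketch under Python 3. The `environ` contents and the `host_header` helper are illustrative assumptions and are not taken from the spec.

```python
def host_header(environ):
    """Build b'name:port' from bytes-valued environ entries."""
    # Both values are bytes, so they can be joined without conversion.
    return b':'.join([environ['SERVER_NAME'], environ['SERVER_PORT']])


environ = {'SERVER_NAME': b'example.com', 'SERVER_PORT': b'8080'}
print(host_header(environ))  # prints b'example.com:8080'

# With an integer port the same join fails on Python 3.
try:
    b':'.join([b'example.com', 8080])
except TypeError as exc:
    print('TypeError:', exc)

# Code that needs a number converts explicitly.
port = int(environ['SERVER_PORT'].decode('ascii'))
```

Code that needs the numeric port pays for one explicit conversion; code that only concatenates environ values needs no type checks.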

If this is to be "standard" - that is, SERVER_PORT is specified as
bytes representing the numeric (TCP) port - then I support this.

In the case of UNIX sockets it could be an empty byte string, e.g.: b""

cheers
james

--
-- James Mills
--
-- "Problems are solved by method"

From prologic at shortcircuit.net.au Thu Sep 16 03:33:32 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 11:33:32 +1000
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C916817.5090003@active-4.com>
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> <4C916817.5090003@active-4.com>
Message-ID:

On Thu, Sep 16, 2010 at 10:43 AM, Armin Ronacher wrote:
> I would like to bring some people from asynchronous servers onto the discussion
> for that particular issue before we decide on the future. Tornado is
> currently the most popular Python project on GitHub, so there is genuine
> interest in async servers and I am pretty sure enough people use it in
> practice. This however also means that Tornado has its own environment
> which looks very much like the situation we were in before WSGI was around.

As a developer of an asynchronous framework myself, I'm actually not
really sure what to think of the whole web3.async "thing" yet... My
feeling(s) are that other web frameworks are just going to do their own
thing anyway...

cheers
james

--
-- James Mills
--
-- "Problems are solved by method"

From armin.ronacher at active-4.com Thu Sep 16 03:42:24 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 16 Sep 2010 03:42:24 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> <4C916817.5090003@active-4.com>
Message-ID: <4C917600.2030702@active-4.com>

Hi,

On 2010-09-16 3:33 AM, James Mills wrote:
> As a developer of an asynchronous framework myself, I'm actually not
> really sure what to think of
> the whole web3.async "thing" yet... My feeling(s) are that other web
> frameworks are just going to
> do their own thing anyway...

Any chances of finding some common ground?


Regards,
Armin

From prologic at shortcircuit.net.au Thu Sep 16 04:07:32 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Thu, 16 Sep 2010 12:07:32 +1000
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C917600.2030702@active-4.com>
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> <4C916817.5090003@active-4.com> <4C917600.2030702@active-4.com>
Message-ID:

On Thu, Sep 16, 2010 at 11:42 AM, Armin Ronacher wrote:
> Any chances of finding some common ground?

Well, take Twisted for example. It's not specifically an asynchronous
web server, is it? Whereas Tornado was specifically designed to be so.

I don't see how making WSGI middleware "async aware" (if that's a good
way of looking at it) has any benefit, IMHO.

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"

From roberto at unbit.it Thu Sep 16 05:29:49 2010
From: roberto at unbit.it (Roberto De Ioris)
Date: Thu, 16 Sep 2010 05:29:49 +0200 (CEST)
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID:

> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
>
> http://python.org/dev/peps/pep-0444/
>
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.
>
> - C
>
>

I generally like it.

About the *.file_wrapper removal, I suggest
a PSGI-like approach where 'body' can contain a File Object.

def file_app(environ):
    fd = open('/tmp/pippo.txt', 'r')
    status = b'200 OK'
    headers = [(b'Content-type', b'text/plain')]
    body = fd
    return body, status, headers

or

def file_app(environ):
    fd = open('/tmp/pippo.txt', 'r')
    status = b'200 OK'
    headers = [(b'Content-type', b'text/plain')]
    body = [b'Header', fd, b'Footer']
    return body, status, headers


(and what about returning multiple File objects?)

By the way, congratulations on the big step forward.

--
Roberto De Ioris
http://unbit.it

From gary.poster at gmail.com Thu Sep 16 08:37:01 2010
From: gary.poster at gmail.com (Gary Poster)
Date: Thu, 16 Sep 2010 08:37:01 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> <1284596141.14651.57.camel@thinko>
Message-ID: <8702E6A0-692A-4E74-9E2A-14E4D096D089@gmail.com>

On Sep 16, 2010, at 2:34 AM, James Mills wrote:

> On Thu, Sep 16, 2010 at 10:15 AM, Chris McDonough wrote:

Thank you for the work, Chris.

>> We can ditch everything concerning web3.async as far as I'm concerned.
>> Ian has told me that this feature won't be liked by the async people
>> anyway, as it doesn't have a trigger mechanism.
>
> You and Ian are right about that. I don't see the point of introducing
> an "async" property/variable into the environment data.

I've been hoping for something like web3.async.

When I saw it in the spec, I didn't see it as a way to support
asynchronous applications generally. I suspect that fully async
applications are just not really ultimately interested in a wsgi/web3
world--the threaded model is too different. I'd love to be wrong. (To
be clear, happily some async frameworks *are* interested in being wsgi
servers.)

In any case, I saw it as a way for web3 threaded applications to support
long polls, from JS or some other client. Threaded applications might
authenticate and do X work, and then pass off some work significantly
more appropriate for an async server back to the web3 server. That work
might be proxying a file found elsewhere on an internal network; or
waiting for a response from an asynchronous job in this process
(Twisted) or some other one (RabbitMQ); or other similar tasks.
Meanwhile, the threaded application could go off and handle more
requests, having done what was needed of it.

Periodically polling the callable wasn't what I was thinking of--I had
the Twisted world in mind, so I was thinking more of a Deferred type
model--but polling would be good enough for my needs.

I'd like to see it, or something like it. If not, I suspect I'll be
trying to hack something like this in somehow, because it addresses
concerns we've had at Launchpad in recent planning sessions. I'd *much*
prefer to have a supported, clean approach.

Gary

From masklinn at masklinn.net Thu Sep 16 08:37:49 2010
From: masklinn at masklinn.net (Masklinn)
Date: Thu, 16 Sep 2010 12:07:49 +0530
Subject: [Web-SIG] PEP 444 (aka Web3)
Message-ID:

> I generally like it.
>
> About the *.file_wrapper removal, I suggest
> a PSGI-like approach where 'body' can contain a File Object.
>
> def file_app(environ):
>     fd = open('/tmp/pippo.txt', 'r')
>     status = b'200 OK'
>     headers = [(b'Content-type', b'text/plain')]
>     body = fd
>     return body, status, headers
>
As far as I understand it, `body` is an iterable, so there should not be any problem with sending a file through directly in this manner. Better, the web3 spec specifically mandates that if the `body` iterable has a `close` method it must be called on request completion (second-to-last paragraph in the specification details section [0]). So a File Object as a body is already completely handled by web3.

On the other hand, `body` has to yield bytes, so `fd = open('/tmp/pippo.txt', 'rb')` I think.

> def file_app(environ):
>     fd = open('/tmp/pippo.txt', 'r')
>     status = b'200 OK'
>     headers = [(b'Content-type', b'text/plain')]
>     body = [b'Header', fd, b'Footer']
>     return body, status, headers
>
>
> (and what about returning multiple File objects?)
>
Well, you could just use `itertools.chain([b'Header'], fd, [b'Footer'])` and `itertools.chain(*files)` respectively, though there is the issue that, with non-refcounting GCs (Jython, IronPython, PyPy), these may stay unclosed for quite some time. A good idea would probably be some kind of `closingchain` replacement for `itertools.chain` which would be able to `close()` its sub-iterables if they're closable (or maybe a `contextchain` which calls `__enter__` and `__exit__` on its sub-iterables if those are available).

[0] http://python.org/dev/peps/pep-0444/#specification-details

From roberto at unbit.it Thu Sep 16 08:57:29 2010
From: roberto at unbit.it (Roberto De Ioris)
Date: Thu, 16 Sep 2010 08:57:29 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References:
Message-ID:

On 16 Sep 2010, at 08:37, Masklinn wrote:

>> I generally like it.
>>
>> About the *.file_wrapper removal, I suggest
>> a PSGI-like approach where 'body' can contain a File Object.
>>
>> def file_app(environ):
>>     fd = open('/tmp/pippo.txt', 'r')
>>     status = b'200 OK'
>>     headers = [(b'Content-type', b'text/plain')]
>>     body = fd
>>     return body, status, headers
>>
> As far as I understand it, `body` is an iterable, so there should not be any problem with sending a file through directly in this manner. Better, the web3 spec specifically mandates that if the `body` iterable has a `close` method it must be called on request completion (second-to-last paragraph in the specification details section [0]). So a File Object as a body is already completely handled by web3.
>
> On the other hand, `body` has to yield bytes, so `fd = open('/tmp/pippo.txt', 'rb')` I think.
>

In this case I do not see a need for a wsgi.file_wrapper replacement.
The Web3 gateway/hosting system can manage file-like objects the way it
wants (and transparently for the application).

--
Roberto De Ioris
http://unbit.it
JID: roberto at jabber.unbit.it

From dirkjan at ochtman.nl Thu Sep 16 13:23:11 2010
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 16 Sep 2010 13:23:11 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 01:03, Chris McDonough wrote:
> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
>
> http://python.org/dev/peps/pep-0444/
>
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.

I find the order of the application return arguments really annoying,
could it just be status, headers, body? Mirrors the actual structure
of the request, which is easier to remember IMO.

Also, I would really like it if the header value returned by
applications must be checked for an .items() method so we can return
(o)dicts in addition to tuples.
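
A sketch of what the `.items()` check would look like on the receiving side, assuming Python 3. `normalize_headers` is a hypothetical helper; it is not defined by the PEP or provided by wsgiref.

```python
def normalize_headers(headers):
    """Return headers as a list of (name, value) tuples."""
    # Mapping-like objects (dict, OrderedDict) expose .items().
    if hasattr(headers, 'items'):
        return list(headers.items())
    # Otherwise assume an iterable of (name, value) pairs.
    return [(name, value) for name, value in headers]


print(normalize_headers({b'Content-Type': b'text/plain'}))
print(normalize_headers([(b'Content-Type', b'text/plain')]))
```

Both calls yield the same list of tuples. Any server or middleware that inspects headers would need a step like this before touching them.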

I also keep thinking that some things (for example status) should just
be allowed to be text, but restricted to ASCII.

Cheers,

Dirkjan

From armin.ronacher at active-4.com Thu Sep 16 13:32:37 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 16 Sep 2010 13:32:37 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID: <4C920055.1030608@active-4.com>

Hi,

On 9/16/10 1:23 PM, Dirkjan Ochtman wrote:
> I find the order of the application return arguments really annoying,
> could it just be status, headers, body? Mirrors the actual structure
> of the request, which is easier to remember IMO.

The motivation is that you can pass that to constructors of response
objects already in place.

    response_tuple = response.get_response_tuple()
    response = Response(*response_tuple)

The order "body", "status code", "headers" is what Werkzeug and WebOb
are currently using. Django has (content, mimetype, status) as
constructor, but if they detect a list/dict on the third parameter they
could assume that mimetype refers to the status, so they have a proper
upgrade path.

> Also, I would really like it if the header value returned by
> applications must be checked for an .items() method so we can return
> (o)dicts in addition to tuples.

That would be nice to have, but makes the middleware logic harder
because each middleware would have to check for the type.

> I also keep thinking that some things (for example status) should just
> be allowed to be text, but restricted to ASCII.

Works for 2.x, but on 3.x that would mean each middleware would have to
check the type before each operation and convert to bytes if necessary,
which means a lot of overhead for each middleware in the stack.


Regards,
Armin

From ziade.tarek at gmail.com Thu Sep 16 13:44:15 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 16 Sep 2010 13:44:15 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 1:03 AM, Chris McDonough wrote:
> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
>
> http://python.org/dev/peps/pep-0444/
>
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.

I have a request for the middleware stack.

There should be one obvious way to get back to the original
application, through the stack.

Right now, I have to write crazy things like this depending on the stack:

    original_app = self.app.app.application.app

Because some middleware use "app", some "application" etc..

I propose to write in the PEP that a middleware should provide an
"app" attribute to get the wrapped application or middleware.
It seems to be the most common name used out there.

Thanks
Tarek

--
Tarek Ziadé | http://ziade.org

From dirkjan at ochtman.nl Thu Sep 16 13:49:49 2010
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 16 Sep 2010 13:49:49 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C920055.1030608@active-4.com>
References: <1284591800.14651.36.camel@thinko> <4C920055.1030608@active-4.com>
Message-ID:

On Thu, Sep 16, 2010 at 13:32, Armin Ronacher wrote:
> The motivation is that you can pass that to constructors of response objects
> already in place.
>
>     response_tuple = response.get_response_tuple()
>     response = Response(*response_tuple)
>
> The order "body", "status code", "headers" is what Werkzeug and WebOb are
> currently using.
> Django has (content, mimetype, status) as constructor, but
> if they detect a list/dict on the third parameter they could assume that
> mimetype refers to the status, so they have a proper upgrade path.

Okay, I can see why the order makes sense from a default arguments
point of view, but I'm still not sure why it helps if the Response()
signature looks like the application return signature.

> That would be nice to have, but makes the middleware logic harder because
> each middleware would have to check for the type.
>
> Works for 2.x, but on 3.x that would mean each middleware would have to
> check the type before each operation and convert to bytes if necessary, which
> means a lot of overhead for each middleware in the stack.

Okay, I guess it makes sense. I just thoroughly dislike that we're
making applications harder in a bunch of places to make the life of
middleware easier. Surely we write more applications than middleware?

Can we somehow invert the model to have the gateway act as a
controller for middleware, so that we can canonicalize application
returns before passing them to the middleware? Or provide a function
in wsgiref that allows me to write an application like this:

import wsgiref

def app(environ):
    return wsgiref.canonicalize(200, {'Content-Type': 'text/plain'}, ['foo'])

Maybe it should be an exceedingly light-weight response class (which
could be inherited by the frameworks) instead.

Cheers,

Dirkjan

From armin.ronacher at active-4.com Thu Sep 16 13:57:48 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 16 Sep 2010 13:57:48 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID: <4C92063C.6090303@active-4.com>

Hi,

On 9/16/10 1:44 PM, Tarek Ziadé wrote:
> I propose to write in the PEP that a middleware should provide an
> "app" attribute to get the wrapped application or middleware.
> It seems to be the most common name used out there.
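
As a rough sketch of the quoted proposal, assuming every middleware in the stack is an object that exposes the callable it wraps as `app`. The `Middleware` class, `original_app`, and the `unwrap` helper are all hypothetical and only illustrate the single-child case.

```python
class Middleware:
    """Hypothetical pass-through middleware that exposes .app."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ):
        return self.app(environ)


def original_app(environ):
    return [b'hello'], b'200 OK', [(b'Content-Type', b'text/plain')]


def unwrap(app):
    """Follow .app attributes until the innermost application is reached."""
    seen = set()
    while hasattr(app, 'app') and id(app) not in seen:
        seen.add(id(app))  # guard against accidental cycles
        app = app.app
    return app


stack = Middleware(Middleware(Middleware(original_app)))
assert unwrap(stack) is original_app
```

With one agreed attribute name the walk is a short loop; it only works when each layer wraps exactly one callable.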
What about middlewares that encapsulate more than one application? Regards, Armin From ziade.tarek at gmail.com Thu Sep 16 14:38:51 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 16 Sep 2010 14:38:51 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C92063C.6090303@active-4.com> References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> Message-ID: On Thu, Sep 16, 2010 at 1:57 PM, Armin Ronacher wrote: > Hi, > > On 9/16/10 1:44 PM, Tarek Ziad? wrote: >> >> I propose to write in the PEP that a middleware should provide an >> "app" attribute to get the wrapped application or middleware. >> It seems to be the most common name used out there. > > What about middlewares that encapsulate more than one application? True... I don't know what's the best option here.. I guess we need to provide all children so one may visit the whole graph. Do you have a list of middleware that does this ? Regards Tarek -- Tarek Ziad? | http://ziade.org From armin.ronacher at active-4.com Thu Sep 16 14:40:29 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 16 Sep 2010 14:40:29 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> Message-ID: <4C92103D.5040600@active-4.com> Hi, On 9/16/10 2:38 PM, Tarek Ziad? wrote: > True... I don't know what's the best option here.. I guess we need to > provide all children so one may visit the whole graph. Another gripe I have with WSGI is that if you attempt to combine applications together with a dispatcher middleware, the inner application does not know the URL of the outer one. It's SCRIPT_NAME points to itself and there is no ORIGINAL_SCRIPT_NAME. > Do you have a list of middleware that does this ? I know that Paste has a cascade middleware and I think it also has one that maps applications to specific prefixes. 
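
A minimal sketch of the SCRIPT_NAME point above, assuming a WSGI 1.0 style prefix dispatcher. This is not Paste's implementation; `prefix_dispatcher` and the `ORIGINAL_SCRIPT_NAME` key are hypothetical names used only for illustration, and no spec defines such a key.

```python
def prefix_dispatcher(mounts, default_app):
    """Route each request to the app mounted at the longest matching prefix.

    `mounts` maps a path prefix such as "/blog" to a WSGI application.
    """
    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        script = environ.get("SCRIPT_NAME", "")
        for prefix in sorted(mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                # Hypothetical key: keep the mount point of the outer app,
                # which the inner app otherwise cannot recover.
                environ.setdefault("ORIGINAL_SCRIPT_NAME", script)
                # Standard WSGI path shifting: the consumed prefix moves
                # from PATH_INFO to SCRIPT_NAME.
                environ["SCRIPT_NAME"] = script + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return mounts[prefix](environ, start_response)
        return default_app(environ, start_response)
    return dispatcher
```

Without the extra key, the inner application only ever sees the shifted SCRIPT_NAME, so it cannot build URLs that point back at the outer application.
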
Regards, Armin From ziade.tarek at gmail.com Thu Sep 16 14:48:58 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 16 Sep 2010 14:48:58 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C92103D.5040600@active-4.com> References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> <4C92103D.5040600@active-4.com> Message-ID: On Thu, Sep 16, 2010 at 2:40 PM, Armin Ronacher wrote: > Hi, > > On 9/16/10 2:38 PM, Tarek Ziad? wrote: >> >> True... I don't know what's the best option here.. I guess we need to >> provide all children so one may visit the whole graph. > > Another gripe I have with WSGI is that if you attempt to combine > applications together with a dispatcher middleware, the inner application > does not know the URL of the outer one. ?It's SCRIPT_NAME points to itself > and there is no ORIGINAL_SCRIPT_NAME. > >> Do you have a list of middleware that does this ? > > I know that Paste has a cascade middleware and I think it also has one that > maps applications to specific prefixes. Ah yes, the composite thing IIRC - I didn't know this was a middleware. Should those be middlewares ? ISTM that they should in the front of the stack instead, and that a stack of middleware should be dedicated to a single application -- for the griefs you mentioned and probably other problems. I mean, one call does not visit several application, and this is some kind of dynamic rewriting of the stack.. Another possibility would be to define a "get_application(environ=None)" method so the middleware is able to return the right app at the right moment > > > Regards, > Armin > -- Tarek Ziad? 
| http://ziade.org From masklinn at masklinn.net Thu Sep 16 14:57:16 2010 From: masklinn at masklinn.net (Masklinn) Date: Thu, 16 Sep 2010 18:27:16 +0530 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> Message-ID: <1B2A23B3-B584-4061-92E5-E8C8FA2ADAF3@masklinn.net> On 2010-09-16, at 18:08 , Tarek Ziad? wrote: > On Thu, Sep 16, 2010 at 1:57 PM, Armin Ronacher > wrote: >> Hi, >> >> On 9/16/10 1:44 PM, Tarek Ziad? wrote: >>> I propose to write in the PEP that a middleware should provide an >>> "app" attribute to get the wrapped application or middleware. >>> It seems to be the most common name used out there. >> >> What about middlewares that encapsulate more than one application? > > True... I don't know what's the best option here.. I guess we need to > provide all children so one may visit the whole graph. That would require a hypothetical self.app to always be a list, or at least an iterable, right? From ziade.tarek at gmail.com Thu Sep 16 15:11:30 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 16 Sep 2010 15:11:30 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1B2A23B3-B584-4061-92E5-E8C8FA2ADAF3@masklinn.net> References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> <1B2A23B3-B584-4061-92E5-E8C8FA2ADAF3@masklinn.net> Message-ID: On Thu, Sep 16, 2010 at 2:57 PM, Masklinn wrote: > On 2010-09-16, at 18:08 , Tarek Ziad? wrote: >> On Thu, Sep 16, 2010 at 1:57 PM, Armin Ronacher >> wrote: >>> Hi, >>> >>> On 9/16/10 1:44 PM, Tarek Ziad? wrote: >>>> I propose to write in the PEP that a middleware should provide an >>>> "app" attribute to get the wrapped application or middleware. >>>> It seems to be the most common name used out there. >>> >>> What about middlewares that encapsulate more than one application? >> >> True... I don't know what's the best option here.. 
I guess we need to >> provide all children so one may visit the whole graph. > That would require a hypothetical self.app to always be a list, or at least an iterable, right? I would prefer a get_application(environ=None) iterator that would reach the final application depending on the environment, and return only one app or middleware per level, but I am not sure... -- Tarek Ziad? | http://ziade.org From and-py at doxdesk.com Thu Sep 16 15:25:44 2010 From: and-py at doxdesk.com (And Clover) Date: Thu, 16 Sep 2010 15:25:44 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <20100916000542.30AD73A403D@sparrow.telecommunity.com> References: <1284591800.14651.36.camel@thinko> <20100916000542.30AD73A403D@sparrow.telecommunity.com> Message-ID: <4C921AD8.3000008@doxdesk.com> On 09/16/2010 02:05 AM, P.J. Eby wrote: > note that the spec's sample CGI > implementation does not itself provide the new variables It can't: "This is the original URL-encoded value derived from the request URI. If the server cannot provide this value, it must omit it from the environ". A CGI gateway doesn't have access to the original URL-encoded value. > middleware must be explicitly written to handle the case where there is > duplication. The alternative to duplication would be to allow a gateway to try to 'reconstruct' `path_info` from CGI `PATH_INFO`. If this is done there really needs to be a flag somewhere to say that it has been done, ie. that `/` and non-ASCII characters in the path are unreliable. Otherwise we're just going to end up in the same sorry situation we have today where all sorts of different encodings and corruptions lurk inside PATH_INFO and apps simply cannot rely on it. chrism at plope.com wrote: > The most sensible thing to me would be to put it in PATH_INFO. Please don't have a field with encoded semantics that re-uses the name of a field that has always had decoded semantics. 
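
The point about `/` and non-ASCII characters being unreliable in a decoded path can be seen in a few lines. This is only a sketch using Python 3's `urllib.parse`; the example paths and the `split_segments` helper are made up for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

raw_a = "/files/a%2Fb"   # one segment whose name is "a/b"
raw_b = "/files/a/b"     # two segments, "a" and "b"

# After decoding, the two request paths can no longer be told apart.
assert unquote(raw_a) == unquote(raw_b) == "/files/a/b"


def split_segments(raw_path):
    """Split the still-encoded path on '/' first, then decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")[1:]]


# With the original URL-encoded value the distinction survives.
assert split_segments(raw_a) == ["files", "a/b"]
assert split_segments(raw_b) == ["files", "a", "b"]

# Percent-escapes only carry bytes; the character encoding is not recorded,
# so a decoded path silently bakes in somebody's guess.
assert unquote_to_bytes("/caf%E9") == b"/caf\xe9"
```
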
-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From jek at discorporate.us Thu Sep 16 16:41:14 2010 From: jek at discorporate.us (jason kirtland) Date: Thu, 16 Sep 2010 07:41:14 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> <4C92103D.5040600@active-4.com> Message-ID: On Thu, Sep 16, 2010 at 5:48 AM, Tarek Ziad? wrote: > On Thu, Sep 16, 2010 at 2:40 PM, Armin Ronacher > wrote: >> Hi, >> >> On 9/16/10 2:38 PM, Tarek Ziad? wrote: >>> >>> True... I don't know what's the best option here.. I guess we need to >>> provide all children so one may visit the whole graph. >> >> Another gripe I have with WSGI is that if you attempt to combine >> applications together with a dispatcher middleware, the inner application >> does not know the URL of the outer one. ?It's SCRIPT_NAME points to itself >> and there is no ORIGINAL_SCRIPT_NAME. >> >>> Do you have a list of middleware that does this ? >> >> I know that Paste has a cascade middleware and I think it also has one that >> maps applications to specific prefixes. > > Ah yes, the composite thing IIRC - I didn't know this was a middleware. > > Should those be middlewares ? ISTM that they should in the front of > the stack instead, and that a stack of middleware should be dedicated > to a single application -- for the griefs you mentioned and probably > other problems. > > I mean, one call does not visit several application, and this is some > kind of dynamic rewriting of the stack.. > > Another possibility would be to define a > "get_application(environ=None)" method so the middleware is able to > return the right app at the right moment The 'pegboard' middleware composes a result out of an arbitrary graph of WSGI apps, with one request visiting many applications. 
The graph can be built at runtime in application code, so it would be
very difficult to report all of the '.app's applicable for a given
environ until after the request. Also, it is quite reasonable in
practice to have middleware both in front of such a composer and also
in the stacks of the apps it composes.

A concern with "should have .app" is that a single closure middleware
breaks the chain. For example:

    def unproxy(app):
        def middleware(environ):
            environ['HTTP_HOST'] = environ['HTTP_X_FORWARDED_FOR_HOST']
            return app(environ)
        return middleware

For the use case of "original_app = self.app.app.application.app",
I've had great success with a pattern I first saw in Zine: applying
the middleware internally to the application instance, not wrapping
the instance. It seems fairly robust against closures and middleware
that can't or won't play along with .app. Unlike .app, this isn't
generically traversable, but in cases where I need this kind of
cross-talk between middleware/apps I haven't had any problems getting
the right instances into scope at runtime.

    class MyApp:

        def apply_middleware(self, factory, *args, **kw):
            self.dispatch_wsgi = factory(self.dispatch_wsgi, *args, **kw)

        def dispatch_wsgi(self, environ):
            return [b'hi'], b'200 OK', [(b'Content-type', b'text/plain')]

        def __call__(self, environ):
            return self.dispatch_wsgi(environ)

    app = MyApp()
    app.apply_middleware(unproxy)
    app.apply_middleware(StaticContent, 'static/')

From chris.dent at gmail.com Thu Sep 16 16:50:20 2010
From: chris.dent at gmail.com (chris.dent at gmail.com)
Date: Thu, 16 Sep 2010 15:50:20 +0100 (BST)
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> <4C92103D.5040600@active-4.com>
Message-ID:

On Thu, 16 Sep 2010, jason kirtland wrote:

> The 'pegboard' middleware composes a result out of an arbitrary graph
> of WSGI apps, with one request visiting many applications.
The graph > can be built at runtime in application code, so it would be very > difficult to report all of the '.app's applicable for a given environ > until after the request. Also, it is quite reasonable in practice to > have middleware both in front of such a composer and also in the > stacks of the apps it composes. The general rule we can extract from this is that we don't want the spec to limit what is possible for the sake of making fairly arbitrary things that only some people (think they?) need and can be satisfied using the more fundamental units already present in the design. I can see that applying here, thus we don't want to enforce some kind of "app" method or attribute as that could be costly for assembling flexible groups of apps (in the same app). On the other end of that same principle, I'm not sure I can see much justification in (paraphrase) "let's make the return signature be the same as the signature of some constructors at use out there in the wild". One of the best things about WSGI, that I hope does not get lost in Web3 (thanks for moving things forward, by the way), is that in its most basic use it is almost entirely about (simple) data structure and (simple) data flow and not about methods, objects, magical attributes and other flim flammery. In other words it is good that the units are basic and fundamental. -- Chris Dent http://burningchrome.com/ [...] From rsyring at inteli-com.com Thu Sep 16 17:04:18 2010 From: rsyring at inteli-com.com (Randy Syring) Date: Thu, 16 Sep 2010 11:04:18 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C92063C.6090303@active-4.com> <4C92103D.5040600@active-4.com> Message-ID: <4C9231F2.8090504@inteli-com.com> Thanks to Chris M. and Armin for moving forward with a PEP! 
Armin Ronacher wrote: > Hi, > > On 9/16/10 1:23 PM, Dirkjan Ochtman wrote: >> I find the order of the application return arguments really annoying, >> could it just be status, headers, body? Mirrors the actual structure >> of the request, which is easier to remember IMO. > The motivation is that you can pass that to constructors of response > objects already in place. > > response_tuple = response.get_response_tuple() > response = Response(*response_tuple) chris.dent at gmail.com wrote: > On the other end of that same principle, I'm not sure I can see > much justification in (paraphrase) "let's make the return signature be > the same as the signature of some constructors at use out there > in the wild". FWIW, I am with Dirkjan and Chris on this...the most logical ordering for a response tuple is: status, headers, body Trying to conform the spec to existing frameworks doesn't seem like the best approach in this case. -------------------------------------- Randy Syring Intelicom 502-644-4776 "Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31 chris.dent at gmail.com wrote: > On Thu, 16 Sep 2010, jason kirtland wrote: > >> The 'pegboard' middleware composes a result out of an arbitrary graph >> of WSGI apps, with one request visiting many applications. The graph >> can be built at runtime in application code, so it would be very >> difficult to report all of the '.app's applicable for a given environ >> until after the request. Also, it is quite reasonable in practice to >> have middleware both in front of such a composer and also in the >> stacks of the apps it composes. > > The general rule we can extract from this is that we don't want the spec > to limit what is possible for the sake of making fairly arbitrary things > that only some people (think they?) need and can be satisfied using > the more fundamental units already present in the design. 
> > I can see that applying here, thus we don't want to enforce some kind of > "app" method or attribute as that could be costly for assembling > flexible groups of apps (in the same app). > > On the other end of that same principle, I'm not sure I can see > much justification in (paraphrase) "let's make the return signature be > the same as the signature of some constructors at use out there > in the wild". > > One of the best things about WSGI, that I hope does not get lost in > Web3 (thanks for moving things forward, by the way), is that in its > most basic use it is almost entirely about (simple) data structure > and (simple) data flow and not about methods, objects, magical > attributes and other flim flammery. > > In other words it is good that the units are basic and fundamental. > From fumanchu at aminus.org Thu Sep 16 18:19:35 2010 From: fumanchu at aminus.org (Robert Brewer) Date: Thu, 16 Sep 2010 09:19:35 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284591800.14651.36.camel@thinko> References: <1284591800.14651.36.camel@thinko> Message-ID: Chris McDonough wrote: > A PEP was submitted and accepted today for a WSGI successor protocol > named Web3: > > http://python.org/dev/peps/pep-0444/ > > I'd encourage other folks to suggest improvements to that spec or to > submit a competing spec, so we can get WSGI-on-Python3 settled soon. Thanks Chris, a few comments: 1. Hooray for all-byte output. 2. Hardly anybody implements RFC 2047, and http-bis is phasing it out. In addition, since folded and/or 2047-encoded lines are equivalent to their non-folded-nor-encoded variants, applications have no business emitting folded or encoded versions of these; that decision should be left up to the origin server. So keep the text about control characters, carriage returns and linefeeds, please. 3. +1 on (status, headers, body) in that order. Your own example code composed them in that order, and then re-arranged them for output! 
One of the benefits of a new spec is the opportunity to coerce rewrites in existing codebases that undo their poor design choices and make them more readable. By the way, the "Specification Details" and "Values Returned" sections have this in the (s, h, b) order in your draft. 4. The web3 spec says, "In case a content length header is absent the stream must not return anything on read. It must never request more data than specified from the client." but later it says, "Web3 servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.". I would be sad if web3 did not support streaming uploads via Transfer-Encoding. One way to implement that would be to make the origin server handle read() transparently by returning '' on EOF, regardless of whether a Content-Length or a Transfer-Encoding header was provided. 5. Conversely, streaming output is nice to have and should be explicitly supported in the web3 spec. One way would be to require servers to respect a 'Transfer-Encoding: chunked' header emitted by the application. However, the WSGI and web3 specs specifically deny this approach by saying, "Applications and middleware are forbidden from using HTTP/1.1 "hop-by-hop" features or headers". A workaround would be for the application to signal Transfer-Encoding by omitting any Content-Length header in its response headers (this is what CherryPy currently does). 6. I'd personally like to see it be OK for apps and middleware to emit "Connection: close" too, or have some other way of communicating that desire to the server. 7. "it is presumed that Web3 middleware will be created which can be used "in front" of existing WSGI 1.0 applications, allowing those existing WSGI 1.0 applications to run under a Web3 stack. 
This middleware will require, when under Python 3, an equivalence to be drawn between Python 3 str types and the bytes values represented by the HTTP request and all the attendant encoding- guessing (or configuration) it implies." Just some field experience: that's not hard. CherryPy 3.2 does this now between various WSGI proposals. Robert Brewer fumanchu at aminus.org From armin.ronacher at active-4.com Thu Sep 16 18:41:00 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 16 Sep 2010 18:41:00 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: <4C92489C.2040704@active-4.com> Hi, On 9/16/10 6:19 PM, Robert Brewer wrote: > 1. Hooray for all-byte output. Hooray for agreeing :) > 3. +1 on (status, headers, body) in that order. Your own example code > composed them in that order, and then re-arranged them for output! > One of the benefits of a new spec is the opportunity to coerce > rewrites in existing codebases that undo their poor design choices > and make them more readable. By the way, the "Specification Details" > and "Values Returned" sections have this in the (s, h, b) order in > your draft. I suppose it makes sense to word the spec in that order then, seems like the majority wants it that way round. > 4. The web3 spec says, "In case a content length header is absent the > stream must not return anything on read. It must never request more > data than specified from the client." but later it says, "Web3 > servers must handle any supported inbound "hop-by-hop" headers on > their own, such as by decoding any inbound Transfer-Encoding, > including chunked encoding if applicable.". I would be sad if web3 > did not support streaming uploads via Transfer-Encoding. One way to > implement that would be to make the origin server handle read() > transparently by returning '' on EOF, regardless of whether a > Content-Length or a Transfer-Encoding header was provided. 
I was toying with the idea to have a websocket extension for web3 which would have solved my usecase for requests without a content-length header. The problem with the content length of incoming data is quite complex and that seemed to be the solution that was easiest for everybody involved. > 5. Conversely, streaming output is nice to have and should be > explicitly > supported in the web3 spec. One way would be to require servers > to respect a 'Transfer-Encoding: chunked' header emitted by the > application. However, the WSGI and web3 specs specifically deny > this approach by saying, "Applications and middleware are forbidden > from using HTTP/1.1 "hop-by-hop" features or headers". A workaround > would be for the application to signal Transfer-Encoding by omitting > any Content-Length header in its response headers (this is what > CherryPy currently does). I am fine improving that, but it would require a very good reference implementation with enough comments so that people have an idea of how it's supposed to behave. wsgiref is nice in WSGI already, but it has its faults to which we should try to keep in mind for web3. (Like that it sets multithreaded flag despite being single threaded or that it always appends a Date header breaking some applications). > 6. I'd personally like to see it be OK for apps and middleware to > emit "Connection: close" too, or have some other way of > communicating that desire to the server. I would like to see this feature as well, but you will have to fight for this feature with Phillip and Graham I suppose. > 7. "it is presumed that Web3 middleware will be created which can > be used "in front" of existing WSGI 1.0 applications, allowing > those existing WSGI 1.0 applications to run under a Web3 stack. 
> This middleware will require, when under Python 3, an equivalence > to be drawn between Python 3 str types and the bytes values > represented by the HTTP request and all the attendant encoding- > guessing (or configuration) it implies." Just some field experience: > that's not hard. CherryPy 3.2 does this now between various WSGI > proposals. I suppose we will see some adapters that have some configuration parameters to adapt to different usage patterns. Regards, Armin From ianb at colorstudy.com Thu Sep 16 19:01:04 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 16 Sep 2010 12:01:04 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284591800.14651.36.camel@thinko> References: <1284591800.14651.36.camel@thinko> Message-ID: Well, reiterating some things I've said before: * This is clearly just WSGI slightly reworked, why the new name? * Why byte values in the environ? No one has offered any real reason they are better than native strings. I keep asking people to offer a reason, *and no one ever does*. It's just hyperbole and distraction. Frankly I'm feeling annoyed. So far my experience makes me believe using native strings will make it easier to port and support libraries across 2 and 3. * It makes sense to me that the error stream should accept both bytes and unicode, and should do a best effort to handle either. Getting encoding errors or type errors when logging an error is very distracting. * Instead of focusing on Response(*response_tuple), I'd rather just rely on something like Response.from_wsgi(response_tuple). Body first feels very unnatural. * Regarding long response headers, I think we should ignore the HTTP spec. You can put 4k in a Set-Cookie header, such headers aren't easily or safely folded... I think the line length constraint in the HTTP spec isn't a constraint we need to pay attention to. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From guido at python.org Thu Sep 16 19:35:09 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Sep 2010 10:35:09 -0700
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID:

On Thu, Sep 16, 2010 at 10:01 AM, Ian Bicking wrote:
> Well, reiterating some things I've said before:
>
> * This is clearly just WSGI slightly reworked, why the new name?
> * Why byte values in the environ? No one has offered any real reason they
> are better than native strings. I keep asking people to offer a reason,
> *and no one ever does*. It's just hyperbole and distraction. Frankly I'm
> feeling annoyed. So far my experience makes me believe using native strings
> will make it easier to port and support libraries across 2 and 3.

Hm. IIUC the proposal is to implicitly assume Latin1 when decoding the
bytes to Unicode. I worry that this will just perpetuate mojibake and
other atrocities committed in Python 2.

> * It makes sense to me that the error stream should accept both bytes and
> unicode, and should do a best effort to handle either. Getting encoding
> errors or type errors when logging an error is very distracting.

This I agree on. In logs, mojibake is better than an exception. For me,
hex escapes are probably better than mojibake, but not everyone
agrees.

> * Instead of focusing on Response(*response_tuple), I'd rather just rely on
> something like Response.from_wsgi(response_tuple). Body first feels very
> unnatural.
> * Regarding long response headers, I think we should ignore the HTTP spec.
> You can put 4k in a Set-Cookie header, such headers aren't easily or safely
> folded... I think the line length constraint in the HTTP spec isn't a
> constraint we need to pay attention to.

No comments on the rest except to note that at this point it looks
unlikely that we can make everyone happy (or even get an agreement to
adopt what would be the long-term technically optimal solution --
AFAICT there is no agreement on what that solution would be, if one
weren't to take porting Python 2 code into account). IOW
something/somebody has gotta give.

--
--Guido van Rossum (python.org/~guido)

From tseaver at palladion.com Thu Sep 16 19:45:39 2010
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 16 Sep 2010 13:45:39 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C920055.1030608@active-4.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dirkjan Ochtman wrote:

> Okay, I guess it makes sense. I just thoroughly dislike that we're
> making applications harder in a bunch of places to make the life of
> middleware easier. Surely we write more applications than middleware?

Most application writers won't speak "raw" WSGI anyway: they are going
to use some framework (e.g., Django, Pylons, BFG) or library (e.g.
WebOb, Werkzeug) which mediates those inconveniences. Making the authors
of the frameworks / libraries deal with them, plus the "I don't need no
stinking framework" speed freaks, seems a pretty reasonable tradeoff.

Tres.
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkySV7sACgkQ+gerLs4ltQ43DQCg3MUaUyLHuxwPyM1Z/AvMp2av ixAAoMNOic31GemNeHhc64tlnx/K/7s+ =6Bfk -----END PGP SIGNATURE----- From ianb at colorstudy.com Thu Sep 16 20:00:52 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 16 Sep 2010 13:00:52 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: On Thu, Sep 16, 2010 at 12:35 PM, Guido van Rossum wrote: > On Thu, Sep 16, 2010 at 10:01 AM, Ian Bicking wrote: > > Well, reiterating some things I've said before: > > > > * This is clearly just WSGI slightly reworked, why the new name? > > * Why byte values in the environ? No one has offered any real reason > they > > are better than native strings. I keep asking people to offer a reason, > > *and no one ever does*. It's just hyperbole and distraction. Frankly > I'm > > feeling annoyed. So far my experience makes me believe using native > strings > > will make it easier to port and support libraries across 2 and 3. > > Hm. IIUC the proposal is to implicitly assume Latin1 when decoding the > bytes to Unicode. I worry that this will just perpetuate mojibake and > other atrocities committed in Python 2. > I was reading http://python.org/dev/peps/pep-0444/ -- is there another revision under discussion? This seems to explicitly say all environ values will be bytes. There have been other str-oriented proposals, including mod_wsgi's implementation. There is consensus that request and response bodies should be bytes. So really we're talking about whether headers and status are bytes or native strings. 
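
To make the two options concrete, here is a small sketch. It assumes the native-string variant decodes header bytes as latin-1, as discussed earlier in the thread; the cookie name and value are made up.

```python
# Bytes as they might arrive on the wire: a UTF-8 encoded cookie value.
raw = b"sessionname=caf\xc3\xa9"

# Option 1: the environ holds bytes, so the application sees `raw` unchanged.

# Option 2: the environ holds a native string decoded as latin-1. Every byte
# maps to exactly one code point, so nothing is lost, although the text
# reads as mojibake until it is transcoded.
native = raw.decode("latin-1")
assert native.encode("latin-1") == raw   # lossless round trip

# After parsing, the application can transcode the value it cares about.
value = native.split("=", 1)[1]
fixed = value.encode("latin-1").decode("utf-8")
assert fixed == "caf\u00e9"
```

Either way the application ends up with the same information; the difference is which type it has to parse and who decides on the final encoding.
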
Most HTTP headers can only contain sensible characters in ASCII, and while anyone can submit anything in a header I'm not aware of it being a problem that, e.g., someone submits a Cache-Control header with non-ASCII values. There are a small number of headers that can reasonably contain Latin1 characters. Latin1 is specified in HTTP, and in a few instances RFC2047 encoding is allowed, though I don't believe anyone proposes that servers should try to handle RFC2047 (I believe CherryPy does/did do this, but I believe Robert Brewer who is in charge of that project supports removing that). There are headers that can reasonably contain RFC2047, but this can be decoded at the application level. The Cookie header does frequently contain incorrect encodings, but to handle this you have to decode the header as bytes or latin1 (all the meaningful characters are the same in both cases) and then decode/transcode values after parsing. Latin1 imposes only a small speedbump for a header that already has a bunch of speedbumps. The other case when Latin1 is not appropriate is the URL-decoded path, WSGI 1's SCRIPT_NAME and PATH_INFO. This proposal removes those. The URL-encoded values are ASCII-safe, or at least could be safely normalized to be safe in the server level. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ty at sarna.org Thu Sep 16 19:56:46 2010 From: ty at sarna.org (Ty Sarna) Date: Thu, 16 Sep 2010 13:56:46 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: <2EF43580-CE88-4D04-A796-70D3ED2EAACB@sarna.org> On Sep 16, 2010, at 1:01 PM, Ian Bicking wrote: > Well, reiterating some things I've said before: > > * This is clearly just WSGI slightly reworked, why the new name? Agreed. 
Among many other reasons, it seems poor from a Python 3 marketing perspective to introduce a name change that implies something totally different from WSGI that will require major rewrites to port to. It's also a poor choice as a rebranding even if one were desirable, I think. It's terribly generic, and suggests it's somehow a successor to "Web 2.0". Nor is it very search engine friendly, and there may be trademark issues (http://www.networkedplanet.com/Products/Web3/) Also, ordering the response tuple for the very minor convenience of a couple of frameworks while simultaneously requiring them to make adjustments for the web3.* names seems strange to me. Count me for retaining the WSGI naming, and for (status, headers, body), for what little it's worth. > * It makes sense to me that the error stream should accept both bytes and unicode, and should do a best effort to handle either. Getting encoding errors or type errors when logging an error is very distracting. I think I agree with this too. From pje at telecommunity.com Thu Sep 16 20:04:38 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 16 Sep 2010 14:04:38 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: <20100916180438.E510B3A403D@sparrow.telecommunity.com> At 10:35 AM 9/16/2010 -0700, Guido van Rossum wrote: >No comments on the rest except to note that at this point it looks >unlikely that we can make everyone happy (or even get an agreement to >adopt what would be the long-term technically optimal solution -- >AFAICT there is no agreement on what that solution would be, if one >weren't to take porting Python 2 code into account). IOW >something/sokebody has gotta give. Indeed. This entire discussion has pushed me strongly in favor of doing a super-minimalist update to PEP 333 with the following points: * Clarifying the encoding of environ values (locale+surrogateescape vs. 
latin1, TBD) * Making the streams and all output values byte strings ('str' on 2.x, 'bytes' on 3.x), leaving everything else "native" strings ('str' on both 2.x and 3.x) * Any other minor errata/clarifications that the folks with the requisite experience (e.g. Robert, Ian, Graham -- not an exclusive list, but at least they all have both heavy WSGI implementations under their belts and 3.x experience) think are absolutely necessary to resolve open questions for Python 3.2 WSGI implementations. Something like that has a halfway decent chance of being able to settle and get implemented in the short timeline, and it also doesn't put Graham (mod_wsgi) in the position of coming back from vacation to a huge new spec to unravel. ;-) (To be clear, what I'm suggesting is almost exactly what mod_wsgi does; it's just stricter on outputs than what mod_wsgi accepts, and there may be some minor issues regarding the environ encoding: mod_wsgi is probably using the latin1 approach rather than locale+surrogateescape, and I think we need to talk that one out a bit.) Anyway, web3 is nice, but it doesn't look like it'll really fit the bill for porting applications. i.e., it's like a bike shed full of red herrings for what Python-Dev needs right now. ;-) From armin.ronacher at active-4.com Thu Sep 16 20:20:48 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 16 Sep 2010 20:20:48 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <2EF43580-CE88-4D04-A796-70D3ED2EAACB@sarna.org> References: <1284591800.14651.36.camel@thinko> <2EF43580-CE88-4D04-A796-70D3ED2EAACB@sarna.org> Message-ID: <4C926000.7090707@active-4.com> Hi, On 9/16/10 7:56 PM, Ty Sarna wrote: > Agreed. Among many other reasons, it seems poor from a Python 3 > marketing perspective to introduce a name change that implies > something totally different from WSGI that will require major > rewrites to port to. It's also a poor choice as a rebranding even if > one were desirable, I think. 
It's terribly generic, and suggests it's > somehow a successor to "Web 2.0". Nor is it very search engine > friendly, and there may be trademark issues > (http://www.networkedplanet.com/Products/Web3/) The name is not set in stone. I am very happy to accept WSGI 2 as a name for that, but we did not want to totally bypass the discussions on web-sig here and announce something that clearly says it will be WSGI 2 when only a small set of the people here participated directly in the writing of that PEP. >> * It makes sense to me that the error stream should accept both >> bytes and unicode, and should do a best effort to handle either. >> Getting encoding errors or type errors when logging an error is >> very distracting. > > I think I agree with this too. There are no such stream objects on Python 3 unless I am missing something. Furthermore there are no libraries on Python 3 that would emit string information as text, so I don't see the reason for considering bytes and unicode for that stream. Regards, Armin From chrism at plope.com Thu Sep 16 20:46:32 2010 From: chrism at plope.com (Chris McDonough) Date: Thu, 16 Sep 2010 14:46:32 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: <1284662792.14651.108.camel@thinko> On Thu, 2010-09-16 at 12:01 -0500, Ian Bicking wrote: > Well, reiterating some things I've said before: > > * This is clearly just WSGI slightly reworked, why the new name? The PEP says "Web3 is clearly a WSGI derivative; it only uses a different name than "WSGI" in order to indicate that it is not in any way backwards compatible." I don't really care what the name is. My experience in various communities suggests that naming the new totally-bw-incompat thing the same as the old thing weakens both the new thing and the old thing, but.. whatever. I just don't care much. > * Why byte values in the environ? No one has offered any real reason > they are better than native strings. 
I keep asking people to offer a > reason, *and no one ever does*. It's just hyperbole and distraction. > Frankly I'm feeling annoyed. So far my experience makes me believe > using native strings will make it easier to port and support libraries > across 2 and 3. I'm sorry you're annoyed. I chose bytes here mainly out of ignorance and fear. This is an extremely low level protocol, and I just literally don't know how we can sanely convert environ values to Unicode without some loss of control or potential for incorrect decoding without having server encoding configuration. You say it's easy and straightforward, and that's fine. I just haven't internalized enough specification to know. I'd very much encourage folks who want to use native strings to create another PEP: it's just a lot easier to argue about one "thing" than it is to argue endlessly in snippets on blogs and epic maillist threads. I could care less if this *particular* PEP is selected, to be honest. Let's just get it over within a process where there's at least some chance of resolution. > * It makes sense to me that the error stream should accept both bytes > and unicode, and should do a best effort to handle either. Getting > encoding errors or type errors when logging an error is very > distracting. Sounds good. > * Instead of focusing on Response(*response_tuple), I'd rather just > rely on something like Response.from_wsgi(response_tuple). Body first > feels very unnatural. Others have said same, also good. > * Regarding long response headers, I think we should ignore the HTTP > spec. You can put 4k in a Set-Cookie header, such headers aren't > easily or safely folded... I think the line length constraint in the > HTTP spec isn't a constraint we need to pay attention to. OK. 
- C From chrism at plope.com Thu Sep 16 20:54:54 2010 From: chrism at plope.com (Chris McDonough) Date: Thu, 16 Sep 2010 14:54:54 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <20100916180438.E510B3A403D@sparrow.telecommunity.com> References: <1284591800.14651.36.camel@thinko> <20100916180438.E510B3A403D@sparrow.telecommunity.com> Message-ID: <1284663294.14651.109.camel@thinko> On Thu, 2010-09-16 at 14:04 -0400, P.J. Eby wrote: > At 10:35 AM 9/16/2010 -0700, Guido van Rossum wrote: > >No comments on the rest except to note that at this point it looks > >unlikely that we can make everyone happy (or even get an agreement to > >adopt what would be the long-term technically optimal solution -- > >AFAICT there is no agreement on what that solution would be, if one > >weren't to take porting Python 2 code into account). IOW > >something/sokebody has gotta give. > > Indeed. This entire discussion has pushed me strongly in favor of > doing a super-minimalist update to PEP 333 with the following points: Right on, write it all down! ;-) - C From mdipierro at cs.depaul.edu Thu Sep 16 20:55:23 2010 From: mdipierro at cs.depaul.edu (Massimo Di Pierro) Date: Thu, 16 Sep 2010 13:55:23 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284662792.14651.108.camel@thinko> References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> Message-ID: <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu> > My experience in various > communities suggests that naming the new totally-bw-incompat thing the > same as the old thing weakens both the new thing and the old thing, I share the same experience. 
From ty at sarna.org Thu Sep 16 21:16:43 2010 From: ty at sarna.org (Ty Sarna) Date: Thu, 16 Sep 2010 15:16:43 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu> References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu> Message-ID: On Sep 16, 2010, at 2:55 PM, Massimo Di Pierro wrote: >> My experience in various >> communities suggests that naming the new totally-bw-incompat thing the >> same as the old thing weakens both the new thing and the old thing, > > I share the same experience. Interesting. Do you feel that Python 3.x should have been named something other than Python? I think that would rather have weakened both 3.x and 2.x by suggesting a fork, placing the two in competition, when the goal was to have one supersede the other, as is also the case here. From ianb at colorstudy.com Thu Sep 16 21:17:13 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 16 Sep 2010 14:17:13 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <20100916180438.E510B3A403D@sparrow.telecommunity.com> References: <1284591800.14651.36.camel@thinko> <20100916180438.E510B3A403D@sparrow.telecommunity.com> Message-ID: On Thu, Sep 16, 2010 at 1:04 PM, P.J. Eby wrote: > * Clarifying the encoding of environ values (locale+surrogateescape vs. > latin1, TBD) > locale+surrogateescape would be insanity! CGI will just require some configuration with respect to the environment. Anyway, I suspect CGI only really works because: (a) people using CGI are sticking to ASCII, (b) they've fixed stuff up in their apps, (c) they just produce garbage and no one cares. > * Making the streams and all output values byte strings ('str' on 2.x, > 'bytes' on 3.x), leaving everything else "native" strings ('str' on both 2.x > and 3.x) > > * Any other minor errata/clarifications that the folks with the requisite > experience (e.g.
Robert, Ian, Graham -- not an exclusive list, but at least > they all have both heavy WSGI implementations under their belts and 3.x > experience) think are absolutely necessary to resolve open questions for > Python 3.2 WSGI implementations. > There are some simple errata, most of which I believe web3 covers (in addition to other things it covers). I think everyone is on board with: status, headers, app_iter = app(environ) Web3 proposed a different order, but it seems clear from the thread that people prefer the more natural order, and web3 authors don't particularly object. -- Ian Bicking | http://blog.ianbicking.org From mdipierro at cs.depaul.edu Thu Sep 16 21:32:07 2010 From: mdipierro at cs.depaul.edu (Massimo Di Pierro) Date: Thu, 16 Sep 2010 14:32:07 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu> Message-ID: Not sure this discussion belongs here but since you asked: I think it should have taken three/four more bold steps: 1) address the GIL issue completely by removing reference counting 2) add more support for lightweight threads (like Stackless, Erlang and Go) 3) perhaps allow some mechanism for tainting data and do restricted execution 4) change name to avoid confusion ... and yet stress that it was almost 100% compatible with existing Python code. I think a lot more people would have jumped on it from outside the existing community. The future is in multi-core processors and lightweight threads. Of course I am not a developer and I do realize these things may be hard to accomplish. I also trust Guido's judgement more than my own in this respect so consider mine a wish more than a realistic suggestion.
Massimo On Sep 16, 2010, at 2:16 PM, Ty Sarna wrote: > On Sep 16, 2010, at 2:55 PM, Massimo Di Pierro wrote: > >>> My experience in various >>> communities suggests that naming the new totally-bw-incompat thing >>> the >>> same as the old thing weakens both the new thing and the old thing, >> >> I share the same experience. > > Interesting. Do you feel that Python 3.x should have been named > something other than Python? > > I think that would rather have weakened both 3.x and 2.x by > suggesting a fork, placing the two in competition, when the goal was > to have one supersede the other, as is also the case here. > From pje at telecommunity.com Thu Sep 16 21:39:36 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 16 Sep 2010 15:39:36 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <20100916180438.E510B3A403D@sparrow.telecommunity.com> Message-ID: <20100916193936.A900D3A403D@sparrow.telecommunity.com> At 02:17 PM 9/16/2010 -0500, Ian Bicking wrote: >On Thu, Sep 16, 2010 at 1:04 PM, P.J. Eby ><pje at telecommunity.com> wrote: >* Clarifying the encoding of environ values (locale+surrogateescape >vs. latin1, TBD) > > >locale+surrogateescape would be insanity! CGI will just require some >configuration with respect to the environment. Anyway, I suspect >CGI only really works because: (a) people using CGI are sticking to >ASCII, (b) they've fixed stuff up in their apps, (c) they just >produce garbage and no one cares. Ok. >There are some simple errata, most of which I believe web3 covers >(in addition to other things it covers). > >I think everyone is on board with: > > status, headers, app_iter = app(environ) > >Web3 proposed a different order, but it seems clear from the thread >that people prefer the more natural order, and web3 authors don't >particularly object. My comments were about releasing a WSGI 1.0 update for Python 3, not making changes to web3.
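For readers comparing the two calling conventions mentioned in this exchange, the following is a minimal sketch, not taken from PEP 333 or PEP 444, of how an application returning (status, headers, app_iter) could be wrapped so that a WSGI 1 server can call it. The names tuple_app and as_wsgi1 are made up for illustration, and the choice of native-string status and headers with a bytes body is an assumption that follows the minimalist proposal quoted above rather than the web3 draft.

```python
from wsgiref.util import setup_testing_defaults


def tuple_app(environ):
    """Hypothetical app using the (status, headers, app_iter) return style."""
    body = b"Hello, web-sig\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, [body]


def as_wsgi1(app):
    """Hypothetical decorator: expose a tuple-returning app as a WSGI 1 app."""
    def wsgi_app(environ, start_response):
        status, headers, app_iter = app(environ)
        start_response(status, headers)
        return app_iter
    return wsgi_app


# Drive the wrapped app by hand, the way a WSGI 1 server would.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
seen = {}


def start_response(status, headers, exc_info=None):
    seen["status"] = status
    seen["headers"] = headers


chunks = list(as_wsgi1(tuple_app)(environ, start_response))
```

The wrapper only reorders the hand-off; it does not address write() callables or exc_info handling, which a real adapter would have to consider.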
The current free-for-all (and the 3.2 stdlib need) have convinced me to stop arguing for throwing out WSGI 1 on Python 3. Or, to put it another way: splitting the spec into two 100% incompatible versions is a bad idea for Python 3 adoption. With a WSGI 1 addendum, we should be able to make it possible to put the same apps and middleware on 2 and 3 with just a decorator wrapping them. (I.e., people should be able to write libraries that run on both 2 and 3, which is probably critical to adoption.) I just wish I'd come to these conclusions much sooner... like a year or two ago. :-( From rsyring at inteli-com.com Thu Sep 16 21:40:36 2010 From: rsyring at inteli-com.com (Randy Syring) Date: Thu, 16 Sep 2010 15:40:36 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu> Message-ID: <4C9272B4.3040604@inteli-com.com> Ty Sarna wrote: > On Sep 16, 2010, at 2:55 PM, Massimo Di Pierro wrote: > > >>> My experience in various >>> communities suggests that naming the new totally-bw-incompat thing the >>> same as the old thing weakens both the new thing and the old thing, >>> >> I share the same experience. >> > > Interesting. Do you feel that Python 3.x should have been named something other than Python? > > I think that would rather have weakened both 3.x and 2.x by suggesting a fork, placing the two in competition, when the goal was to have one supersede the other, as is also the case here. FWIW, I agree on this point. WSGI2 seems better than WEB3. IMO, it's OK to put a disclaimer at the top of the spec that states they are different specs and entirely backwards incompatible. If there is consensus to move away from WSGI, then I think a name other than WEB3 is in order. It's just too generic.
-------------------------------------- Randy Syring Intelicom 502-644-4776 "Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Thu Sep 16 21:58:22 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 16 Sep 2010 15:58:22 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284663294.14651.109.camel@thinko> References: <1284591800.14651.36.camel@thinko> <20100916180438.E510B3A403D@sparrow.telecommunity.com> <1284663294.14651.109.camel@thinko> Message-ID: <20100916195822.3C70C3A403D@sparrow.telecommunity.com> At 02:54 PM 9/16/2010 -0400, Chris McDonough wrote: >On Thu, 2010-09-16 at 14:04 -0400, P.J. Eby wrote: > > At 10:35 AM 9/16/2010 -0700, Guido van Rossum wrote: > > >No comments on the rest except to note that at this point it looks > > >unlikely that we can make everyone happy (or even get an agreement to > > >adopt what would be the long-term technically optimal solution -- > > >AFAICT there is no agreement on what that solution would be, if one > > >weren't to take porting Python 2 code into account). IOW > > >something/sokebody has gotta give. > > > > Indeed. This entire discussion has pushed me strongly in favor of > > doing a super-minimalist update to PEP 333 with the following points: > >Right on, write it all down! ;-) I thought I just did. ;-) Okay, I will carve out some cycles. (Btw, it appears that somebody has recently hacked on the code in PEP 333 and inadvertently broken the specification, so I'll be fixing that first.) From guido at python.org Thu Sep 16 22:21:06 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Sep 2010 13:21:06 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu>

Message-ID: Um, talk about a whopper of a topic change. None of that is on the table. Maybe for Python 4. And certainly not in web-sig. On Thu, Sep 16, 2010 at 12:32 PM, Massimo Di Pierro wrote: > Not sure this discussion belongs here but since you asked: > > I think it should have takes three/four more bold steps: > 1) address the GIL issue completely by removing reference counting > 2) add more support for lightweight threads (like stackless, erlang and go) > 3) perhaps allow some mechanism for tainting data and do restricted > execution > 4) change name to avoid confusion > ... and yet stress that it was almost 100% compatible with existing python > code. > > I think a lot more people would have jumped on it from outside the existing > community. > The future is in multi core processors and lightweight threads. > > Of course I am not a developer and I do realize these things may be hard to > accomplish. > I also trust Guido's judgement more than my own in this respect so consider > mine a wish more than a realistic suggestion. > > Massimo > > > On Sep 16, 2010, at 2:16 PM, Ty Sarna wrote: > >> On Sep 16, 2010, at 2:55 PM, Massimo Di Pierro wrote: >> >>>> My experience in various >>>> communities suggests that naming the new totally-bw-incompat thing the >>>> same as the old thing weakens both the new thing and the old thing, >>> >>> I share the same experience. >> >> Interesting. Do you feel that Python 3.x should have been named something >> other than Python? >> >> I think that would rather have weakened both 3.x and 2.x by suggesting a >> fork, placing the two in competition, when the goal was to have one >> supersede the other, as is also the case here. 
>> > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From mdipierro at cs.depaul.edu Thu Sep 16 22:22:59 2010 From: mdipierro at cs.depaul.edu (Massimo Di Pierro) Date: Thu, 16 Sep 2010 15:22:59 -0500 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284662792.14651.108.camel@thinko> <7CA14644-008B-417A-B5BD-2FC2C221E61A@cs.depaul.edu>

Message-ID: <32B20757-7405-403C-8A9A-B2E78C70508D@cs.depaul.edu> sorry. Apologies On Sep 16, 2010, at 3:21 PM, Guido van Rossum wrote: > Um, talk about a whopper of a topic change. None of that is on the > table. Maybe for Python 4. And certainly not in web-sig. > > On Thu, Sep 16, 2010 at 12:32 PM, Massimo Di Pierro > wrote: >> Not sure this discussion belongs here but since you asked: >> >> I think it should have takes three/four more bold steps: >> 1) address the GIL issue completely by removing reference counting >> 2) add more support for lightweight threads (like stackless, erlang >> and go) >> 3) perhaps allow some mechanism for tainting data and do restricted >> execution >> 4) change name to avoid confusion >> ... and yet stress that it was almost 100% compatible with existing >> python >> code. >> >> I think a lot more people would have jumped on it from outside the >> existing >> community. >> The future is in multi core processors and lightweight threads. >> >> Of course I am not a developer and I do realize these things may be >> hard to >> accomplish. >> I also trust Guido's judgement more than my own in this respect so >> consider >> mine a wish more than a realistic suggestion. >> >> Massimo >> >> >> On Sep 16, 2010, at 2:16 PM, Ty Sarna wrote: >> >>> On Sep 16, 2010, at 2:55 PM, Massimo Di Pierro wrote: >>> >>>>> My experience in various >>>>> communities suggests that naming the new totally-bw-incompat >>>>> thing the >>>>> same as the old thing weakens both the new thing and the old >>>>> thing, >>>> >>>> I share the same experience. >>> >>> Interesting. Do you feel that Python 3.x should have been named >>> something >>> other than Python? >>> >>> I think that would rather have weakened both 3.x and 2.x by >>> suggesting a >>> fork, placing the two in competition, when the goal was to have one >>> supersede the other, as is also the case here. 
>>> >> > > -- > --Guido van Rossum (python.org/~guido) From armin.ronacher at active-4.com Thu Sep 16 22:58:39 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 16 Sep 2010 22:58:39 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284591800.14651.36.camel@thinko> References: <1284591800.14651.36.camel@thinko> Message-ID: <4C9284FF.60309@active-4.com> Hi, Here are some of the comments summarized, and how things will change: - The order of the response tuple. The majority of this list wants it to be changed to the standard (status, headers, body) format, and we agree. The original motivation was passing it to the constructor of a common response object, but there is no reason this shouldn't be changed. Will update the PEP and implementation appropriately. - The async part. It was added in the hope that someone would step up and come up with something better as replacement. I asked in the #twisted IRC channel but they did not see any value in supporting a common specification that was shared with the synchronous world and it looks like it will be harder to find someone that does care about this particular issue. The motivation was that Facebook's Tornado framework is currently attracting a lot of users and creating an environment alongside the WSGI one which means that it might be quite hard to share some code between those two worlds. I also remember hearing a lot of backlash when start_response was considered for deletion last time from the nginx mod_wsgi maintainer. If I can't find someone that is willing to provide some input on that I will remove that section. - Bytes values in the environment: HTTP transmits bytes, that's a fact we can't change.
When we go with native strings we will go with unicode on 3.x This has the following implications: - getting the right path info requires a decode + an encode unless you are assuming latin1. - same as above for the script name and cookie header When going with unicode strings on 3.x for environ values, we would have to do the same for outgoing values which makes middlewares a lot harder to write: - header keys and values might then be bytes and unicode strings. Because of this all middlewares would have to convert to either str objects or bytes which might mean a lot of extra encoding and decoding depending on how the middleware is implemented. - We can't change the fact that a large percentage of Python developers is living in an ASCII-only world which would never have to deal with encodings that way and might be encouraged to just assume ASCII as encoding. For implementations not based on the standard library the bytes-only approach seems to be easier in any way as far as I can see. The only real issue appears to be urllib for the moment, and until that is resolved one could easily do an encode/decode around the calls to that particular library. - web3.errors I think Ian raised concern that it's specified to support unicode only. I don't think we should change that to accepting either bytes or unicode is a good idea on Python 3 where there is no stream in the language or standard library that accepts both at the same time. An implementation for 2.x could support both, but I don't know if there is a usecase for that. In general though I have to say that very few people use wsgi.errors currently, so I don't think this is a real issue anyways. - the web3 name If there is any value in this PEP and we find something to decide on, there is no reason this couldn't be WSGI 2. 
But as long as it's just something a small part of the web-sig community worked on directly, a separate name is a good thing I think, because it does not reserve the name "WSGI 2" for something that might actually become WSGI 2 in case this PEP gets rejected. Regards, Armin From prologic at shortcircuit.net.au Thu Sep 16 23:07:18 2010 From: prologic at shortcircuit.net.au (James Mills) Date: Fri, 17 Sep 2010 07:07:18 +1000 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C9284FF.60309@active-4.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> Message-ID: On Fri, Sep 17, 2010 at 6:58 AM, Armin Ronacher wrote: > - The async part. It was added in the hope that someone would step up > and come up with something better as replacement. I asked in the > #twisted IRC channel but they did not see any value in supporting > a common specification that was shared with the synchronous world > and it looks like it will be harder to find someone that does care > about this particular issue. > > The motivation was that Facebook's Tornado framework is currently > attracting a lot of users and creating an environment alongside the > WSGI one which means that it might be quite hard to share some code > between those two worlds. > > I also remember hearing a lot of backlash when start_response was > considered for deletion last time from the nginx mod_wsgi > maintainer. > > If I can't find someone that is willing to provide some input on that > I will remove that section. I'm with the Twisted community on this one in that I see no real "value". Async operations and awareness should be (IMHO) really left up to the server/framework, not the application(s) or middleware. > - the web3 name > > If there is any value in this PEP and we find something to decide on, > there is no reason this couldn't be WSGI 2.
But as long as it's just > something a small part of the web-sig community worked on directly, > a separate name is a good thing I think, because it does not reserve > the name "WSGI 2" for something that might actually become WSGI 2 > in case this PEP gets rejected. I personally still don't see any real benefit to changing the key names from "wsgi" to "web3" (or whatever). I would prefer it remain the same. If you're going to use Python3, you know you're using Python3 (you don't need "web3" key names to know that). (subjective) cheers James -- -- James Mills -- -- "Problems are solved by method" From and-py at doxdesk.com Fri Sep 17 00:48:18 2010 From: and-py at doxdesk.com (And Clover) Date: Fri, 17 Sep 2010 00:48:18 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: <4C929EB2.6090502@doxdesk.com> On 09/16/2010 06:19 PM, Robert Brewer wrote: > 2. Hardly anybody implements RFC 2047, and http-bis is phasing it out. s/Hardly anybody/No-one/. Even if you wanted to, it's impossible to implement in any consistent way. The mention of RFC2047 is nothing more than an error. RFC2047 is not on-topic as the top-level HTTP request/response entity is not defined in RFC822-family terms (HTTP uses its own grammar which is subtly incompatible). In headers that might be able to fit an RFC2047 encoded-word, no browser or server actually supports it, and the one place where RFC2616 actually references RFC2047 is in a quoted-string context, which RFC2047 explicitly states is not a valid place to use it! This is why httpbis wants rid of it, and why Web3 shouldn't mention RFC2047 at all. There is no reliable mechanism today to get non-ASCII characters into an HTTP header; browsers treat non-ASCII header values differently and incompatibly, and all Web3 can hope to do is pass through the bytes unchanged without regard to what encoding they might represent.
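As an illustration of what "pass through the bytes unchanged" can look like when a gateway has to hand out native strings on Python 3, here is a minimal sketch. It is not taken from either PEP or from any server; the helper names are hypothetical, and the UTF-8 cookie is an assumed example of a client sending non-ASCII data. It relies only on the fact that latin-1 maps each of the 256 byte values to exactly one code point, so decoding and re-encoding loses nothing, and the application can transcode after parsing, as suggested earlier in the thread for Cookie.

```python
def header_bytes_to_native(raw):
    """Expose raw header bytes as str without losing information."""
    return raw.decode("latin-1")


def native_to_header_bytes(value):
    """Recover the exact bytes that arrived on the wire."""
    return value.encode("latin-1")


def transcode(value, encoding="utf-8"):
    """Reinterpret a latin-1-decoded value in the encoding the app expects."""
    return value.encode("latin-1").decode(encoding, "replace")


# Every possible byte value survives the round trip (Python 3 semantics).
all_bytes = bytes(range(256))
round_tripped = native_to_header_bytes(header_bytes_to_native(all_bytes))

# Assumed example: a client sent a UTF-8 cookie value.
raw_cookie = "name=caf\u00e9".encode("utf-8")
native = header_bytes_to_native(raw_cookie)

# Parse first (the delimiters are ASCII), then transcode the value only.
key, _, value = native.partition("=")
decoded_value = transcode(value)
```

The gateway never has to guess an encoding here; only the application, which knows what it put in the cookie, decides how to interpret the value.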
> since folded and/or 2047-encoded lines are equivalent > to their non-folded-nor-encoded variants, applications have no > business emitting folded or encoded versions of these Indeed. I'll go further: there is no place for header folding in HTTP, period - neither from the application nor the server/gateway. This is another feature httpbis deprecates. Folding is an RFC822-family trait that doesn't work on the web, due to poor server/UA compatibility and the existence of long, inherently non-foldable headers (eg. try passing a Authorization header containing a Kerberos ticket in 80 columns). -- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From ianb at colorstudy.com Fri Sep 17 03:43:25 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 16 Sep 2010 21:43:25 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C9284FF.60309@active-4.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> Message-ID: On Thu, Sep 16, 2010 at 4:58 PM, Armin Ronacher wrote: > - Bytes values in the environment: > > HTTP transmits bytes, that's a fact we can't change. When we go > with native strings we will go with unicode on 3.x This has the > following implications: > > - getting the right path info requires a decode + an encode > unless you are assuming latin1. > Not if you are working with the URL-encoded paths. > - same as above for the script name and cookie header > Cookie is weird. If that one header could be bytes, that'd be great... but special-casing Cookie/Set-Cookie is too hard/weird. Plus handling Cookie/Set-Cookie as Latin1 is just one more line of code (well, two, one for each header). When going with unicode strings on 3.x for environ values, we would > have to do the same for outgoing values which makes middlewares a lot > harder to write: > All response headers handle encoded URLs (e.g., Location), so SCRIPT_NAME/PATH_INFO issues don't come into play. 
Set-Cookie could be an issue, though only really when someone wants to replicate an external system's weird cookies -- except for legacy issues it's best for application developers to stick to ASCII cookies (URL-encoding cookie values is a popular way of doing this). I don't know of any other header (or the status) that would reasonably cause a problem. And I'm not glossing over corner cases -- I'm generally very aware and concerned with legacy issues, and interacting with legacy systems. There just aren't any here except for the resolvable issues I've listed. - web3.errors > > I think Ian raised concern that it's specified to support unicode > only. I don't think we should change that to accepting either bytes > or unicode is a good idea on Python 3 where there is no stream in > the language or standard library that accepts both at the same time. > An implementation for 2.x could support both, but I don't know if > there is a usecase for that. In general though I have to say that > very few people use wsgi.errors currently, so I don't think this is > a real issue anyways. > It's more of an issue under Python 2, it could probably be ignored with Python 3. Under Python 2 when you have some error condition it's really frustrating to encounter some unicode error with the logging of that error (often covering up the original error). -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From armin.ronacher at active-4.com Fri Sep 17 03:59:44 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Fri, 17 Sep 2010 03:59:44 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> Message-ID: <4C92CB90.9050505@active-4.com> Hi, On 9/17/10 3:43 AM, Ian Bicking wrote: > Not if you are working with the URL-encoded paths. 
SCRIPT_NAME / PATH_INFO will always stay unencoded and the current spec requires the web3.script_name thing to only be provided if the server can safely provide that. So at least for the fallback, we are dealing with (properly latin1 decoded) non-URL encoded things. Can be changed of course. > Cookie is weird. If that one header could be bytes, that'd be great... > but special-casing Cookie/Set-Cookie is too hard/weird. Special casing one header is indeed weird. > I don't know of any other header (or the status) that would reasonably > cause a problem. And I'm not glossing over corner cases -- I'm > generally very aware and concerned with legacy issues, and interacting > with legacy systems. There just aren't any here except for the > resolvable issues I've listed. Technically speaking it would affect etags too, but I doubt anyone is using non-ASCII quoted strings there. A very funny header is btw the Warning header which actually can have any encoding: "The warn-text SHOULD be in a natural language and character set that is most likely to be intelligible to the human user receiving the response. This decision MAY be based on any available knowledge, such as the location of the cache or user, the Accept-Language field in a request, the Content-Language field in a response, etc. The default language is English and the default character set is ISO-8859-1. If a character set other than ISO-8859-1 is used, it MUST be encoded in the warn-text using the method described in RFC 2047 [14]." Doubt anyone is using that header though. > It's more of an issue under Python 2, it could probably be ignored with > Python 3. Under Python 2 when you have some error condition it's really > frustrating to encounter some unicode error with the logging of that > error (often covering up the original error). I guess there it would be fine to have stderr like stream that accepts unicode and bytes. 
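
A minimal sketch of the "stderr-like stream that accepts unicode and
bytes" idea, written for Python 3. The class name TolerantErrorStream
and the choice of UTF-8 with "replace" for incoming bytes are
assumptions made for illustration; nothing like this is defined in the
PEP draft.

```python
import io


class TolerantErrorStream:
    """Hypothetical wrapper: write() accepts both text and bytes."""

    def __init__(self, stream, encoding="utf-8"):
        self._stream = stream      # underlying text stream, e.g. sys.stderr
        self._encoding = encoding  # assumed encoding for incoming bytes

    def write(self, data):
        if isinstance(data, bytes):
            # Never raise while logging: undecodable bytes become U+FFFD.
            data = data.decode(self._encoding, "replace")
        self._stream.write(data)

    def writelines(self, seq):
        for item in seq:
            self.write(item)

    def flush(self):
        self._stream.flush()


buf = io.StringIO()
errors = TolerantErrorStream(buf)
errors.write("text: p\u00f6stal\n")
errors.write(b"bytes: p\xc3\xb6stal\n")
errors.write(b"bad bytes: \xff\n")
```

The point of decoding with "replace" is that a logging call made while
handling an error cannot itself fail with a UnicodeDecodeError and hide
the original problem.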
Regards, Armin From ianb at colorstudy.com Fri Sep 17 04:21:34 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 16 Sep 2010 22:21:34 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C92CB90.9050505@active-4.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> Message-ID: On Thu, Sep 16, 2010 at 9:59 PM, Armin Ronacher wrote: > On 9/17/10 3:43 AM, Ian Bicking wrote: > >> Not if you are working with the URL-encoded paths. >> > > SCRIPT_NAME / PATH_INFO will always stay unencoded and the current spec > requires the web3.script_name thing to only be provided if the server can > safely provide that. So at least for the fallback, we are dealing with > (properly latin1 decoded) non-URL encoded things. Can be changed of course. Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away. For servers without access to the unencoded value, reencoding those values doesn't actually lose any information over what we have now, and avoids any encoding issues. Servers with REQUEST_URI can at least attempt to reconstruct the encoded values. > > Cookie is weird. If that one header could be bytes, that'd be great... >> but special-casing Cookie/Set-Cookie is too hard/weird. >> > Special casing one header is indeed weird. Cookie is also the one header that can't be safely folded. It's just a messed up header, and requires hacky workarounds. > > I don't know of any other header (or the status) that would reasonably >> cause a problem. And I'm not glossing over corner cases -- I'm >> generally very aware and concerned with legacy issues, and interacting >> with legacy systems. There just aren't any here except for the >> resolvable issues I've listed. >> > Technically speaking it would affect etags too, but I doubt anyone is using > non-ASCII quoted strings there. 
> A very funny header is btw the Warning
> header which actually can have any encoding:
>
> "The warn-text SHOULD be in a natural language and character set that is
> most likely to be intelligible to the human user receiving the response.
> This decision MAY be based on any available knowledge, such as the location
> of the cache or user, the Accept-Language field in a request, the
> Content-Language field in a response, etc. The default language is English
> and the default character set is ISO-8859-1.
>
> If a character set other than ISO-8859-1 is used, it MUST be encoded in the
> warn-text using the method described in RFC 2047 [14]."
>
> Doubt anyone is using that header though.
>

The Title header (in Atompub) also suggests 2047, but that's essentially
an ASCII conversion like URL quoting. It looks something like
=?iso-8859-1?q?p=F6stal?=

--
Ian Bicking | http://blog.ianbicking.org

From armin.ronacher at active-4.com Fri Sep 17 04:37:45 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Fri, 17 Sep 2010 04:37:45 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com>
Message-ID: <4C92D479.6070508@active-4.com>

Hi,

On 9/17/10 4:21 AM, Ian Bicking wrote:
> The Title header (in Atompub) also suggests 2047, but that's essentially
> an ASCII conversion like URL quoting. It looks something like
> =?iso-8859-1?q?p=F6stal?=

Yep. That was merely a fun fact I wanted to share. I was not aware of
HTTP specifying a non-latin1 header anywhere. I suppose the authors of
the HTTP specification were aware of encoding issues, just that the
people that made the Cookie specification didn't have non-ASCII payloads
in mind.
Not too surprising, after all it's called Cookie and not arbitrary
data-store :)

Regards,
Armin

From dirkjan at ochtman.nl Fri Sep 17 09:35:18 2010
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Fri, 17 Sep 2010 09:35:18 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <20100916193936.A900D3A403D@sparrow.telecommunity.com>
References: <1284591800.14651.36.camel@thinko> <20100916180438.E510B3A403D@sparrow.telecommunity.com> <20100916193936.A900D3A403D@sparrow.telecommunity.com>
Message-ID:

On Thu, Sep 16, 2010 at 21:39, P.J. Eby wrote:
> Or, to put it another way: splitting the spec into two 100% incompatible
> versions is a bad idea for Python 3 adoption.  With a WSGI 1 addendum, we
> should be able to make it possible to put the same apps and middleware on 2
> and 3 with just a decorator wrapping them.  (i.e., people should be able to
> write libraries that run on both 2 and 3, which is probably critical to
> adoption).
>
> I just wish I'd come to these conclusions much sooner...  like a year or two
> ago.  :-(

Meh, I'd much rather have Web3/WSGI 2 (and I prefer the WSGI name, too)
for Python 3 than the small update you're proposing. IMO there are some
good improvements in Chris & Armin's spec over the original WSGI, and I
would be sad to have to go back to an incremental update that does just
enough to make PEP 333 work on Python 3.

(Also I think there might actually be value in having some
incompatibility to make the distinction clearer.)
Cheers, Dirkjan From g.brandl at gmx.net Fri Sep 17 10:29:31 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 17 Sep 2010 10:29:31 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: Am 16.09.2010 20:00, schrieb Ian Bicking: > On Thu, Sep 16, 2010 at 12:35 PM, Guido van Rossum > > wrote: > > On Thu, Sep 16, 2010 at 10:01 AM, Ian Bicking > > wrote: > > Well, reiterating some things I've said before: > > > > * This is clearly just WSGI slightly reworked, why the new name? > > * Why byte values in the environ? No one has offered any real reason they > > are better than native strings. I keep asking people to offer a reason, > > *and no one ever does*. It's just hyperbole and distraction. Frankly I'm > > feeling annoyed. So far my experience makes me believe using native strings > > will make it easier to port and support libraries across 2 and 3. > > Hm. IIUC the proposal is to implicitly assume Latin1 when decoding the > bytes to Unicode. I worry that this will just perpetuate mojibake and > other atrocities committed in Python 2. > > > I was reading http://python.org/dev/peps/pep-0444/ -- is there another revision > under discussion? This seems to explicitly say all environ values will be > bytes. There have been other str-oriented proposals, including mod_wsgi's > implementation. IIUC Guido was referring to your proposal. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
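
On the Latin-1 concern quoted above, a small Python 3 illustration may
help. It is only a sketch: it assumes a gateway that decodes request
bytes as Latin-1 into native strings, and an application that knows the
bytes were really UTF-8. The variable names are made up for the example.

```python
# Bytes as they might arrive for a UTF-8 path (after URL-unquoting).
raw = "/caf\u00e9".encode("utf-8")

# What a Latin-1-decoding gateway would hand to the application.
# Printed as-is, this is mojibake ("/cafÃ©").
as_native = raw.decode("latin-1")

# Latin-1 maps bytes 0-255 one-to-one onto code points, so the original
# bytes can always be recovered and decoded with the real encoding.
recovered = as_native.encode("latin-1")
path = recovered.decode("utf-8")
```

So the implicit decode loses no information, but any code that uses the
intermediate string directly, without the encode/decode step, sees the
mangled text.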
From g.brandl at gmx.net Fri Sep 17 10:36:20 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 17 Sep 2010 10:36:20 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> Message-ID: Am 16.09.2010 23:07, schrieb James Mills: >> - the web3 name >> >> If there is any value in this PEP and we find something to decide on, >> there is no reason this couldn't be WSGI 2. But until it's just >> something a small part of the web-sig community worked on directly >> a separate name is a good thing I think, because it does not reserve >> the name "WSGI 2" for something that might actually become WSGI 2 >> in case this PEP gets rejected. > > I personally still don't see any real benefit to changing the key names > from "wsgi" to "web3" (or whatever). I would prefer it remain the > same. If you're going to use Python3, you know you're using Python3 > (you don't need "web3" key names to know that). (subjective) That statement shows another weakness of the "web3" name: this spec is not in the least exclusive to Python 3. (Which would be a bit useless, having two incompatible WSGI/web specs on two incompatible Python versions.) The goal would be to first migrate to WSGI2/web3, and *then* have an easy transition going to Python 3. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From bchesneau at gmail.com Fri Sep 17 10:59:39 2010
From: bchesneau at gmail.com (Benoit Chesneau)
Date: Fri, 17 Sep 2010 10:59:39 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com>
Message-ID:

On Fri, Sep 17, 2010 at 10:36 AM, Georg Brandl wrote:
> Am 16.09.2010 23:07, schrieb James Mills:
>>> - the web3 name
>>>
>>>  If there is any value in this PEP and we find something to decide on,
>>>  there is no reason this couldn't be WSGI 2.  But until it's just
>>>  something a small part of the web-sig community worked on directly
>>>  a separate name is a good thing I think, because it does not reserve
>>>  the name "WSGI 2" for something that might actually become WSGI 2
>>>  in case this PEP gets rejected.
>>
>> I personally still don't see any real benefit to changing the key names
>> from "wsgi" to "web3" (or whatever). I would prefer it remain the
>> same. If you're going to use Python3, you know you're using Python3
>> (you don't need "web3" key names to know that). (subjective)
>
> That statement shows another weakness of the "web3" name: this spec is not
> in the least exclusive to Python 3.  (Which would be a bit useless, having
> two incompatible WSGI/web specs on two incompatible Python versions.)
>
> The goal would be to first migrate to WSGI2/web3, and *then* have an easy
> transition going to Python 3.
>
> Georg
>

Also, the WSGI acronym describes the purpose better by itself than
"web3", which means nothing.

- benoit

From bchesneau at gmail.com Fri Sep 17 11:07:01 2010
From: bchesneau at gmail.com (Benoit Chesneau)
Date: Fri, 17 Sep 2010 11:07:01 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C92489C.2040704@active-4.com>
References: <1284591800.14651.36.camel@thinko> <4C92489C.2040704@active-4.com>
Message-ID:

On Thu, Sep 16, 2010 at 6:41 PM, Armin Ronacher wrote:
>>  4. The web3 spec says, "In case a content length header is absent the
>>     stream must not return anything on read. It must never request more
>>     data than specified from the client." but later it says, "Web3
>>     servers must handle any supported inbound "hop-by-hop" headers on
>>     their own, such as by decoding any inbound Transfer-Encoding,
>>     including chunked encoding if applicable.". I would be sad if web3
>>     did not support streaming uploads via Transfer-Encoding. One way to
>>     implement that would be to make the origin server handle read()
>>     transparently by returning '' on EOF, regardless of whether a
>>     Content-Length or a Transfer-Encoding header was provided.
>
> I was toying with the idea to have a websocket extension for web3 which
> would have solved my usecase for requests without a content-length header.
>  The problem with the content length of incoming data is quite complex and
> that seemed to be the solution that was easiest for everybody involved.
>

Uh? Since with Transfer-Encoding: chunked we know when the stream ends,
I would be in favor of returning an EOF at the end too. Also, most
servers know when a stream ends even if there is no Content-Length.
Maybe we could have a capability setting in environ that says whether
the server supports streaming or not, and in all cases return EOF at the
end?

- benoît

From and-py at doxdesk.com Fri Sep 17 11:40:28 2010
From: and-py at doxdesk.com (And Clover)
Date: Fri, 17 Sep 2010 11:40:28 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com>
Message-ID: <4C93378C.3020707@doxdesk.com>

On 09/17/2010 04:21 AM, Ian Bicking wrote:

> Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away. For
> servers without access to the unencoded value, reencoding those values
> doesn't actually lose any information over what we have now, and avoids any
> encoding issues.
It doesn't lose any information, but it also makes script_name/path_info
inherently unreliable. My fear is that if gateways are allowed to create
a reconstructed script_name/path_info without clearly signalling they
have done so, those values will continue to be unreliable at all times
and server authors won't feel the need to get it right since it's broken
everywhere anyway: the unhappy status quo.

This is why I am continuing to plead for a 'script_name/path_info are
authoritative' flag in environ that applications can use to detect
situations where it is safe to go ahead and rely on them. I want to say
"Unicode paths are supported if your server/gateway does", not "Unicode
paths might sometimes work, depending on how you configure your server
and application".

It is not just CGI that is affected here! IIS does not provide the
original undecoded path at all, even through ISAPI.

At the moment I am using a 'fixPathInfo' method in my form-reading layer
to try to compensate as much as possible for the problems of CGI:

- on Python 2 on Windows, re-read the environment variables using
  ctypes if available, to avoid the mangling caused by reading
  os.environ using mbcs. (This didn't used to work, as old versions
  of IIS deliberately mbcs-filtered values before putting them in the
  environment, but it does now.)

- on Python 3 on POSIX, re-read the environment variables using
  environb if available. Otherwise try to reverse the faulty decoding
  of environ using surrogateescapes, where available.

- on Windows, encode the Unicode environment to bytes using ISO-8859-1
  if the server is Apache, or UTF-8 if the server is IIS. (IIS tries to
  decode path bytes using UTF-8, falling back to mbcs where the input
  is not valid UTF-8. Unfortunately there is no way to tell this has
  happened.)

- when server is Microsoft-IIS, remove the erroneously repeated
  SCRIPT_NAME components from the front of PATH_INFO.
  (This is a long-standing bug that can be configured away using the
  allowPathInfo/AllowPathInfoForScriptMappings configs, but no-one
  does as it breaks ASP.)

However, the form layer is not really the right place to be doing these
hacks. It would be better done in the stdlib CGI handler.

> Servers with REQUEST_URI can at least attempt to
> reconstruct the encoded values.

This is slightly unsafe. It's something an application might want to do
(or at least provide as an option), but a gateway probably couldn't get
away with it for the general case because REQUEST_URI doesn't reflect
the redirections done by a RewriteRule or an ErrorDocument.

> Cookie is also the one header that can't be safely folded.

There are others, eg. Authorization. Anyway: folding doesn't happen in
the HTTP world. It can be forgotten about.

--
And Clover
mailto:and at doxdesk.com http://www.doxdesk.com/

From armin.ronacher at active-4.com Fri Sep 17 14:03:50 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Fri, 17 Sep 2010 14:03:50 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C93378C.3020707@doxdesk.com>
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com>
Message-ID: <4C935926.1020904@active-4.com>

Hi,

On 9/17/10 11:40 AM, And Clover wrote:
> This is why I am continuing to plead for a 'script_name/path_info are
> authoritative' flag in environ that applications can use to detect
> situations where it is safe to go ahead and rely on them. I want to say
> "Unicode paths are supported if your server/gateway does", not "Unicode
> paths might sometimes work, depending on how you configure your server
> and application".

In case there is no raw value with the current spec, you can see
SCRIPT_NAME and PATH_INFO as unreliable. In case we change the spec as
Ian mentioned above, I am all for a "wsgi.guessed_encoding" = True flag
or something like that.

> It is not just CGI that is affected here! IIS does not provide the
> original undecoded path at all, even through ISAPI.

Unless I am mistaken, the same is true for CGI scripts running on
Apache2 on Windows.

> - on Python 2 on Windows, re-read the environment variables using
>   ctypes if available, to avoid the mangling caused by reading
>   os.environ using mbcs. (This didn't used to work, as old versions
>   of IIS deliberately mbcs-filtered values before putting them in the
>   environment, but it does now.)

I did some tests a while ago and was pretty sure that Apache2 on Windows
did the same. Might be wrong though.

> However, the form layer is not really the right place to be doing these
> hacks. It would be better done in the stdlib CGI handler.

The correct place for these hacks would be the appropriate WSGI/Web3
handler of the webserver. Certainly not a particular WSGI/Web3
implementation or even the CGI module of the standard library.

Regards,
Armin

From and-py at doxdesk.com Fri Sep 17 15:43:21 2010
From: and-py at doxdesk.com (And Clover)
Date: Fri, 17 Sep 2010 15:43:21 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C935926.1020904@active-4.com>
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com>
Message-ID: <4C937079.3070904@doxdesk.com>

On 09/17/2010 02:03 PM, Armin Ronacher wrote:

> In case we change the spec as Ian mentioned above, I am all for
> a "wsgi.guessed_encoding" = True flag or something like that.

Yes, I'd like to see that. I believe going with *only* a
raw-or-reconstructed path_info, rather than having both path_info and
PATH_INFO, is probably best, for the middleware-duplication reasons PJE
mentioned.

A more in-depth possibility might be:

wsgi.path_accuracy =

  0: script_name/path_info have been crudely reconstructed from
     SCRIPT_NAME/PATH_INFO from an unknown source. Beware!
     If there is to be backwards compatibility with WSGI1, this
     would be seen as the 'default value' given a missing path_accuracy.

  1: script_name/path_info have been reconstructed, but it is known
     that path_info is accurate, other than %2F and non-ASCII issues.
     That is, it's known that the path doesn't come from IIS's broken
     PATH_INFO, or the IIS error has been detected and compensated for.

  2: script_name/path_info have been reconstructed using known-good
     encodings for the env. The only way in which they may differ from
     the original request path is that a slash might originally have
     been a %2F. (This is good enough for the vast majority of
     applications.)

  3: script_name/path_info come directly from the request path
     without any intervening mangling.

> Unless I am mistaken, the same is true for CGI scripts running on
> Apache2 on Windows.

Yes, it's true of *all* CGI scripts, but also for non-CGI scripts on IIS.

> I did some tests a while ago and was pretty sure that Apache2 on Windows
> did the same.

Apache-on-Windows puts the bytes of the decoded path into the
environment variables as one code unit per byte: that is, as if encoded
by ISO-8859-1. You still have to read the environ using ctypes because
mbcs is never ISO-8859-1, but at least the original bytes are
recoverable, which isn't the case with IIS.

> The correct place for these hacks would be the appropriate WSGI/Web3
> handler of the webserver.

The IIS PATH_INFO-prefix hack would be appropriate to put in an
IIS-specific handler; indeed, I believe isapi_wsgi does just that.

But the other hacks are specific to CGI. For CGI, there is no 'handler
of the webserver', there is only the standard CGI-to-WSGI adapter, so
this is the only component it is reasonable to burden with the hacks.
Frameworks and libraries further up the stack cannot reliably do the
fixups, because they don't know whether the WSGI environ they have been
given comes from os.environ or somewhere else, or whether middleware has
played with it.
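
As an illustration of the IIS PATH_INFO-prefix fixup discussed above,
here is a minimal sketch. The function name fix_iis_path_info is
hypothetical, and the code is not taken from isapi_wsgi or from the
'fixPathInfo' method mentioned earlier; it only assumes that IIS
reports a SERVER_SOFTWARE value beginning with "Microsoft-IIS" and that
the bug shows up as SCRIPT_NAME repeated at the front of PATH_INFO.

```python
def fix_iis_path_info(environ):
    """Return a copy of environ with a repeated SCRIPT_NAME prefix
    removed from PATH_INFO. Only applied when the server is IIS."""
    fixed = dict(environ)
    if not environ.get("SERVER_SOFTWARE", "").startswith("Microsoft-IIS"):
        return fixed
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    if script_name and path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a path-segment boundary, so that "/app.pyx/foo"
        # is not mistaken for "/app.py" followed by "x/foo".
        if rest == "" or rest.startswith("/"):
            fixed["PATH_INFO"] = rest
    return fixed


iis = fix_iis_path_info({
    "SERVER_SOFTWARE": "Microsoft-IIS/7.5",
    "SCRIPT_NAME": "/app.py",
    "PATH_INFO": "/app.py/wiki/page",
})
apache = fix_iis_path_info({
    "SERVER_SOFTWARE": "Apache/2.2",
    "SCRIPT_NAME": "/app.py",
    "PATH_INFO": "/app.py/wiki/page",
})
```

A fixup like this cannot tell a genuinely repeated path from the IIS
bug, which is one reason it belongs in a server-specific handler rather
than further up the stack.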
-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From pje at telecommunity.com Fri Sep 17 17:42:42 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 17 Sep 2010 11:42:42 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C937079.3070904@doxdesk.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com> <4C937079.3070904@doxdesk.com> Message-ID: <20100917154244.C43C63A403D@sparrow.telecommunity.com> At 03:43 PM 9/17/2010 +0200, And Clover wrote: >On 09/17/2010 02:03 PM, Armin Ronacher wrote: > >>In case we change the spec as Ian mentioned above, I am all for >>a "wsgi.guessed_encoding" = True flag or something like that. > >Yes, I'd like to see that. I believe going with *only* a >raw-or-reconstructed path_info, rather than having both path_info >and PATH_INFO, is probably best, for the middleware-dupication >reasons PJE mentioned. > >A more in-depth possibility might be: > >wsgi.path_accuracy = > > 0: script_name/path_info have been crudely reconstructed from > SCRIPT_NAME/PATH_INFO from an unknown source. Beware! > If there is to be backwards compatibility with WSGI1, this > would be seen as the 'default value' given a missing path_accuracy. > > 1: script_name/path_info have been reconstructed, but it is known > that path_info is accurate, other than %2F and non-ASCII issues. > That is, it's known that the path doesn't come from IIS's broken > PATH_INFO, or the IIS error has been detected and compensated for. > > 2: script_name/path_info have been reconstructed using known-good > encodings for the env. The only way in which they may differ from > the original request path is that a slash might originally have > been a %2F. (This is good enough for the vast majority of > applications.) > > 3: script_name/path_info come directly from the request path > without any intervening mangling. 
So, do you have an example of what some real-world code is going to *do* with this information? i.e., what's the use case for knowing the precise degree of messed-uppedness of the path? ;-) From and-py at doxdesk.com Fri Sep 17 18:06:38 2010 From: and-py at doxdesk.com (And Clover) Date: Fri, 17 Sep 2010 18:06:38 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <20100917154244.C43C63A403D@sparrow.telecommunity.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com> <4C937079.3070904@doxdesk.com> <20100917154244.C43C63A403D@sparrow.telecommunity.com> Message-ID: <4C93920E.2050909@doxdesk.com> On 09/17/2010 05:42 PM, P.J. Eby wrote: > do you have an example of what some real-world code is going to *do* > with this information? At level 0, the application can't rely on PATH_INFO at all. It can't do routing without deployment help from rewrites. It may choose to generate only links in query-string form instead of routed paths. At level 1, the application can use routing, as long as any strings inserted into the generated links are slugged down to simple ASCII (without %2F or control codes). At level 2, the application can output full Unicode paths, knowing it will be able to retrieve unmolested non-ASCII path segments for matching. This is needed for routing Wikipedia-style Unicode URLs (eg. in IRI form, http://en.wikipedia.org/wiki/??). At level 3, the application can put any byte sequence in a path part and retrieve it without it having been changed. Probably not a good idea for an application to require this, but it's here for completeness. 
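
The four levels above could drive link generation roughly as follows.
This is a sketch only: wsgi.path_accuracy is the key proposed in this
thread and is not part of any published spec, and build_link,
ascii_slug, the "/wiki/" prefix and the "page" query parameter are all
hypothetical names chosen for the example.

```python
import re
from urllib.parse import quote, urlencode


def ascii_slug(text):
    """Reduce a title to a plain ASCII slug (no %2F, no control codes)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug or "untitled"


def build_link(environ, title):
    # A missing key is treated as level 0, as suggested in the thread.
    level = environ.get("wsgi.path_accuracy", 0)
    base = environ.get("SCRIPT_NAME", "")
    if level >= 3 or (level == 2 and "/" not in title):
        # Non-ASCII segments survive; at level 2 a %2F might not.
        return base + "/wiki/" + quote(title, safe="")
    if level == 1:
        # Routing works, but only for simple ASCII segments.
        return base + "/wiki/" + ascii_slug(title)
    # Level 0 (or an unsafe title): fall back to query-string links.
    return base + "/?" + urlencode({"page": title})
```

For example, the title "p\u00f6stal" would be linked as
/wiki/p%C3%B6stal at level 2, as /wiki/p-stal at level 1, and as
/?page=p%C3%B6stal when the key is missing.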
--
And Clover
mailto:and at doxdesk.com http://www.doxdesk.com/

From armin.ronacher at active-4.com Fri Sep 17 18:25:42 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Fri, 17 Sep 2010 18:25:42 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <20100917154244.C43C63A403D@sparrow.telecommunity.com>
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com> <4C937079.3070904@doxdesk.com> <20100917154244.C43C63A403D@sparrow.telecommunity.com>
Message-ID: <4C939686.6080809@active-4.com>

Hi,

On 9/17/10 5:42 PM, P.J. Eby wrote:
> So, do you have an example of what some real-world code is going to *do*
> with this information? i.e., what's the use case for knowing the precise
> degree of messed-uppedness of the path? ;-)

Actually, I can see a couple of use cases. I have a blog that by default
only produces ASCII-safe slugs for the URLs, which means that if you are
a Chinese person you will only get the ID-based fallback there. If I
could safely detect if the setup supports unicode identifiers in URLs in
a way that works, I could give a good default and warn the user if they
change the setting.

Regards,
Armin

From ionel.mc at gmail.com Fri Sep 17 18:47:10 2010
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Fri, 17 Sep 2010 19:47:10 +0300
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID:

I don't like this proposal at all. Besides having to go through the
bytes craziness the design is pretty backwards for middleware and
asynchronous applications. Even the proxy_and_timing_support example in
the PEP is broken for async or streaming apps - it won't return the
proper time (since it doesn't consume the body iterable) and it will
fail most of the time since you can't just add a tuple to an iterable.
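
To make the timing point concrete, here is a sketch of a timing
middleware that measures until the body iterable is exhausted. It is
not the PEP's proxy_and_timing_support example: it uses the WSGI 1
calling convention (environ, start_response) rather than the Web3
draft's, and the names timing_middleware, report and slow_app are made
up for the illustration.

```python
import time


def timing_middleware(app, report):
    """Wrap app so that report(seconds) is called once the response
    body has been fully consumed, not when app() merely returns."""
    def wrapper(environ, start_response):
        started = time.perf_counter()
        body = app(environ, start_response)

        def timed_body():
            try:
                for chunk in body:
                    yield chunk
            finally:
                if hasattr(body, "close"):
                    body.close()
                report(time.perf_counter() - started)

        return timed_body()
    return wrapper


def slow_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])

    def gen():
        for part in (b"a", b"b"):
            time.sleep(0.01)  # the real work happens during iteration
            yield part

    return gen()


timings = []
wrapped = timing_middleware(slow_app, timings.append)
result = wrapped({}, lambda status, headers: None)
before = list(timings)   # nothing has been reported at this point
data = b"".join(result)  # consuming the body triggers the report
```

Stopping the clock as soon as the application callable returns would
measure almost nothing for an app like slow_app, because its work only
happens while the server iterates over the body.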
The missing requirement that middleware must yield at least an empty
string if they need more information from the application iterable also
breaks async gateways that expect oob information from the app (for
example cogen can't be ported to this spec).

The removed requirement "middleware components *must not* block
iteration waiting for multiple values from an application iterable. If
the middleware needs to accumulate more data from the application before
it can produce any output, it *must* yield an empty string." also breaks
async gateways/apps.

I feel this spec puts too much burden on applications - having to
process all those byte strings and even having to add Content-Length
even for naive buffered-body apps.

--ionel

On Thu, Sep 16, 2010 at 02:03, Chris McDonough wrote:
> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
>
> http://python.org/dev/peps/pep-0444/
>
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.
>
> - C
>

From chrism at plope.com Fri Sep 17 19:01:04 2010
From: chrism at plope.com (Chris McDonough)
Date: Fri, 17 Sep 2010 13:01:04 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID: <1284742864.3022.49.camel@thinko>

On Fri, 2010-09-17 at 19:47 +0300, Ionel Maries Cristian wrote:
> I don't like this proposal at all. Besides having to go through the
> bytes craziness the design is pretty backwards for middleware and
> asynchronous applications.
We've acknowledged in other messages to this thread that the web3.async red herring is speculative, and Armin has indicated that if he does not find a champion willing to create a reference implementation for it today that it will be taken out. This doesn't help async people, but it also doesn't harm them (no difference from WSGI really). Personally, I hope nobody steps up and we just rip it out. ;-) I'm not sure why you characterize using bytes as "bytes craziness". We have been using strings as byte sequences in WSGI for over five years. Python itself draws an equivalence between the Python 3 bytes type and Python 2 "str" ("bytes" is aliased to "str" under Python 2). I'm not really sure why we shouldn't take advantage of that equivalence, and why people are so enamored of treating envvar values, headers, and such as text other than the brokenness of the Python 3 stdlib urllib stuff. IMO, WSGI/Web3 isn't really a programming platform (or at least if it is, it is destined to be a pretty crappy one), it's just a connection protocol, so any "its more typing" or "its ugly" argument seems pretty thin to me. I'd personally rather have it be more general and less easy to use than potentially broken in some corner case circumstance. - C From ianb at colorstudy.com Fri Sep 17 19:02:52 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 17 Sep 2010 13:02:52 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <4C937079.3070904@doxdesk.com> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com> <4C937079.3070904@doxdesk.com> Message-ID: On Fri, Sep 17, 2010 at 9:43 AM, And Clover wrote: > On 09/17/2010 02:03 PM, Armin Ronacher wrote: > > In case we change the spec as Ian mentioned above, I am all for >> a "wsgi.guessed_encoding" = True flag or something like that. >> > > Yes, I'd like to see that. 
I believe going with *only* a > raw-or-reconstructed path_info, rather than having both path_info and > PATH_INFO, is probably best, for the middleware-dupication reasons PJE > mentioned. > > A more in-depth possibility might be: > > wsgi.path_accuracy = > > 0: script_name/path_info have been crudely reconstructed from > SCRIPT_NAME/PATH_INFO from an unknown source. Beware! > If there is to be backwards compatibility with WSGI1, this > would be seen as the 'default value' given a missing path_accuracy. > > 1: script_name/path_info have been reconstructed, but it is known > that path_info is accurate, other than %2F and non-ASCII issues. > That is, it's known that the path doesn't come from IIS's broken > PATH_INFO, or the IIS error has been detected and compensated for. > > 2: script_name/path_info have been reconstructed using known-good > encodings for the env. The only way in which they may differ from > the original request path is that a slash might originally have > been a %2F. (This is good enough for the vast majority of > applications.) > > 3: script_name/path_info come directly from the request path > without any intervening mangling. path_accuracy is certainly a better name than encoding; nothing here actually relates to encoding (except insofar as attempts to encode or reencode values corrupts the path). Personally I wouldn't want to split it up this much, I'd rather a simple flag to indicate something was guessed, vs. an accurate request. The only real value I see in it is to help people debug problems. Maybe. I'm not sure it's that realistic to imagine this will be noticed by people deploying software and encountering problems. A helpful application could use it to warn the deployer of potential problems. It seems that it would be possible to create a WSGI application and client library that together can detect and help resolve these issues. 
E.g., the application always returns the values of script_name, path_info, and query_string, and the client fires off a bunch of different requests to see how it gets interpreted. It could suggest corrections until everything passes. I would really like to see concerns over bad gateways not be used to keep valuable information out of the spec. We want people to use well-configured gateways that accurately represent requests. There are limits, e.g., in environments where information is lost. The only really problematic example is losing the distinction between %2f and /, and I think it's reasonable to suggest that applications should avoid making that distinction in the path if they want to be easily deployed in different environments. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianb at colorstudy.com Fri Sep 17 19:24:53 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 17 Sep 2010 13:24:53 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <4C92CB90.9050505@active-4.com> <4C93378C.3020707@doxdesk.com> <4C935926.1020904@active-4.com> <4C937079.3070904@doxdesk.com> Message-ID: On Fri, Sep 17, 2010 at 1:02 PM, Ian Bicking wrote: > I would really like to see concerns over bad gateways not be used to keep > valuable information out of the spec. We want people to use well-configured > gateways that accurately represent requests. There are limits, e.g., in > environments where information is lost. The only really problematic example > is losing the distinction between %2f and /, and I think it's reasonable to > suggest that applications should avoid making that distinction in the path > if they want to be easily deployed in different environments. > Just to expand -- the reason %2f is special is because / has special meaning in URL paths, or at least is treated as such. ? 
has special meaning too, but that's already handled by splitting off QUERY_STRING. Technically ; is supposed to mean something, but no one ever cared, so it doesn't really. In theory you could make any character special, and in doing so want an escape mechanism to determine the difference between, e.g., "," and %2c... but no one does that, so no problem. All the other potential problems are problems of gateway corruption. E.g., where the bytes were decoded with Latin1 and then encoded with sys.getfilesystemencoding(), or some other mismatched combination. I don't believe we should expose gateway corruption to the spec. I *do* believe that we can build tools inside WSGI to help debug and fix those problems, and I don't think any of these changes makes those tools particularly harder to implement. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.dent at gmail.com Fri Sep 17 19:37:59 2010 From: chris.dent at gmail.com (chris.dent at gmail.com) Date: Fri, 17 Sep 2010 18:37:59 +0100 (BST) Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: On Fri, 17 Sep 2010, Ionel Maries Cristian wrote: > I feel this spec puts too much burden on applications - having to process > all those byte strings and even having to add Content-Length even for naive > buffered-body apps. The Content-Length requirement is a big killer for me. I'm usually generating content in apps, rather deep in a stack of middleware-like pieces that may or may not be looking at or modifying that content. I don't want to a) have to unwind my generators at each level b) reset the content-length here there and everywhere. It could be I'm doing it completely wrong, but it works rather nicely. -- Chris Dent http://burningchrome.com/ [...] 
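To make the Content-Length concern concrete, here is a minimal sketch of content-modifying middleware under a web3-style calling convention. It assumes the application is called as app(environ) and returns a (body, status, headers) tuple of bytes values; the names append_footer and demo_app are hypothetical and do not come from the PEP. It is an illustration of the bookkeeping being discussed, not a reference implementation.

```python
def append_footer(app):
    """Hypothetical middleware: appends a footer and fixes Content-Length."""
    def wrapped(environ):
        body, status, headers = app(environ)
        # Buffering the whole body is the "unwinding" cost mentioned above.
        new_body = b"".join(body) + b"\n-- footer"
        # Drop any stale Content-Length so the response never carries two.
        headers = [(k, v) for (k, v) in headers
                   if k.lower() != b"content-length"]
        headers.append(
            (b"Content-Length", str(len(new_body)).encode("ascii")))
        return [new_body], status, headers
    return wrapped


def demo_app(environ):
    """Hypothetical application that sets its own Content-Length."""
    body = [b"hello ", b"world"]
    headers = [(b"Content-Type", b"text/plain"),
               (b"Content-Length", b"11")]
    return body, b"200 OK", headers
```

Every layer that changes the body has to repeat this header rewrite, which is the burden the message above objects to.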
From ianb at colorstudy.com Fri Sep 17 19:43:08 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 17 Sep 2010 13:43:08 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> Message-ID: On Fri, Sep 17, 2010 at 1:37 PM, wrote: > On Fri, 17 Sep 2010, Ionel Maries Cristian wrote: > > I feel this spec puts too much burden on applications - having to process >> all those byte strings and even having to add Content-Length even for >> naive >> buffered-body apps. >> > > The Content-Length requirement is a big killer for me. I'm usually > generating content in apps, rather deep in a stack of middleware-like > pieces that may or may not be looking at or modifying that content. > I don't want to a) have to unwind my generators at each level b) > reset the content-length here there and everywhere. > > It could be I'm doing it completely wrong, but it works rather > nicely. > I'm unclear what exactly you guys are reacting to. This? - The server must not inject an additional Content-Length header by guessing the length from the response iterable. This must be set by the application itself in all situations. I'm also not sure what motivated this particular change, but I don't have any opinion one way or the other. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From armin.ronacher at active-4.com Fri Sep 17 20:06:13 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Fri, 17 Sep 2010 20:06:13 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko>

Message-ID: <4C93AE15.2080106@active-4.com>

Hi,

On 9/17/10 7:43 PM, Ian Bicking wrote:
> I'm also not sure what motivated this particular change, but I don't
> have any opinion one way or the other.

The motivation is that WSGI wants servers to do something like this:

    if len(iterable) == 1 and content_length_header_missing:
        headers.append(('Content-Length', str(len(iterable[0]))))

However, not everybody was doing that, and some applications set a Content-Length header while others did not. If a Content-Length header was not set, some middlewares that changed content worked properly even though they did not check the header. The idea is that with web3 every tool in the chain is supposed to look for that header and update it appropriately.

Even the piglatin middleware from PEP 333 did not check the content length, if I remember correctly.

Regards,
Armin

From ianb at colorstudy.com Fri Sep 17 20:14:29 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 17 Sep 2010 14:14:29 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C93AE15.2080106@active-4.com>
References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> Message-ID: On Fri, Sep 17, 2010 at 2:06 PM, Armin Ronacher wrote: > Hi, > > > On 9/17/10 7:43 PM, Ian Bicking wrote: > >> I'm also not sure what motivated this particular change, but I don't >> have any opinion one way or the other. >> > Motivation is that WSGI wants servers to do something like this: > > if len(iterable) == 1 and content_length_header_missing: > headers.append(('Content-Length', str(len(iterable[0]))) > > However not everybody was doing that and some applications were setting a > content length header or not. If a content length header was not set some > middlewares that changed content worked properly even though they did not > check the header. The idea is that with web3 every tool in the chain is > supposed to look for that header and update it appropriately. > > Even the piglatin middleware from the PEP 333 did not check the content > length if I remember correctly. > OK, so maybe it should just be clarified: * Middleware and servers should not modify or add Content-Length, Date, or other headers unless they have reason to do so, and they must ensure that the response is valid (e.g., there should never be two Content-Length headers). It still seems reasonable that *if* there is no Content-Length, and the server can guess easily enough (mostly it is returned an actual list/tuple that we know can be introspected fast and without side effects), then it's perfectly reasonable to set it -- but certainly the server doesn't "own" that header (or any other, except maybe some connection-related headers?). -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... 
From marc at gsites.de Sat Sep 18 11:03:25 2010
From: marc at gsites.de (Marcel Hellkamp)
Date: Sat, 18 Sep 2010 11:03:25 +0200
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <4C9284FF.60309@active-4.com>
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com>
Message-ID: <1284800605.1919.36.camel@nava>

On Thursday, 16.09.2010 at 22:58 +0200, Armin Ronacher wrote:
> - The async part.
> If I can't find someone that is willing to provide some input on that
> I will remove that section.

I see a problem here: The response tuple must be returned synchronously according to web3. Once returned, the values are final. If an application needs to wait for some background task to finish in order to decide about headers or the status code, it is now forced to block completely.

A common use case for this is a web service that itself queries other web services (e.g. an ajax proxy to work around "same origin policy").

With WSGI it was possible to yield empty strings as long as the application is waiting for data and call start_response once the headers are final. Not perfect, but at least non-blocking. Web3 removes this possibility. The headers must be returned before the body iterable yields its first element, empty or not.

Removing any support for this type of asynchronism would render web3 useless for all but completely synchronous and trivial applications. Even frameworks would have no way to work around this anymore.

I do understand that the start_response callable is inconvenient for middleware to implement, but it totally made sense.
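The pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: slow_backend stands in for a hypothetical background task, and drive is a toy stand-in for a server loop, not any real server's behaviour. The application is a generator, yields empty byte strings while the result is pending, and calls start_response only once the headers can be decided.

```python
def slow_backend():
    """Hypothetical background task: 'not ready' twice, then a result."""
    yield None
    yield None
    yield b"payload from upstream"


def proxy_app(environ, start_response):
    # Generator-based WSGI 1 app: nothing runs until the server iterates.
    for result in slow_backend():
        if result is None:
            # Not ready yet: hand control back with an empty chunk.
            yield b""
            continue
        # Headers are decided only after the upstream result is known.
        start_response("200 OK",
                       [("Content-Length", str(len(result)))])
        yield result


def drive(app):
    """Toy server loop (an assumption for this sketch); records events."""
    seen = {"status": None, "chunks": [], "empty_before_headers": 0}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    for chunk in app({}, start_response):
        if not chunk and seen["status"] is None:
            seen["empty_before_headers"] += 1
        elif chunk:
            seen["chunks"].append(chunk)
    return seen
```

With a web3-style return value the status and headers would have to exist before iteration begins, so the waiting phase shown here has nowhere to happen.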
--
Kind regards
Marcel Hellkamp

From ianb at colorstudy.com Sat Sep 18 11:34:00 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 18 Sep 2010 05:34:00 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284800605.1919.36.camel@nava>
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <1284800605.1919.36.camel@nava>
Message-ID:

On Sat, Sep 18, 2010 at 5:03 AM, Marcel Hellkamp wrote:
> With WSGI it was possible to yield empty strings as long as the
> application is waiting for data and call start_response once the headers
> are final. Not perfect, but at least non-blocking. Web3 removes this
> possibility. The headers must be returned before the body iterable
> yielded its first element, empty or not.
>
> Removing any support for this type of asynchronism would render web3
> useless for all but completely synchronous and trivial applications.
> Even frameworks would have no way to work around this anymore.
>

I'm aware of what a lot of people have done with WSGI, but I'm not aware of anyone doing an async proxy of any sort, or implementing anything in a way where this empty string policy served any function. It's not implausible that it *could* be used, but years of practice have shown it is not used.

--
Ian Bicking | http://blog.ianbicking.org

From ionel.mc at gmail.com Sat Sep 18 13:08:57 2010
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Sat, 18 Sep 2010 14:08:57 +0300
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <1284800605.1919.36.camel@nava>
Message-ID:

There's a framework called cogen and it relies on this policy.
-- ionel On Sat, Sep 18, 2010 at 12:34, Ian Bicking wrote: > On Sat, Sep 18, 2010 at 5:03 AM, Marcel Hellkamp wrote: > >> With WSGI it was possible to yield empty strings as long as the >> application is waiting for data and call start_response once the headers >> are final. Not perfect, but at least non-blocking. Web3 removes this >> possibility. The headers must be returned before the body iterable >> yielded its first element, empty or not. >> >> Removing any support for this type of asynchronism would render web3 >> useless for all but completely synchronous and trivial applications. >> Even frameworks would have no way to work around this anymore. >> > > I'm aware of what a lot of people have done with WSGI, but I'm not aware of > anyone doing an async proxy of any sort, or implementing anything in a way > where this empty string policy served any function. It's not implausible > that it *could* be used, but years of practice have shown it is not used. > > -- > Ian Bicking | http://blog.ianbicking.org > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/ionel.mc%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fumanchu at aminus.org Sat Sep 18 18:01:30 2010 From: fumanchu at aminus.org (Robert Brewer) Date: Sat, 18 Sep 2010 09:01:30 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284800605.1919.36.camel@nava> References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <1284800605.1919.36.camel@nava> Message-ID: Marcel Hellkamp wrote: > Am Donnerstag, den 16.09.2010, 22:58 +0200 schrieb Armin Ronacher: > > - The async part. > > If I can't find someone that is willing to provide some input on that > > I will remove that section. > > I see a problem here: The response tuple must be returned synchronously > according to web3. 
Once returned, the values are final. If an > application needs to wait for some background task to finish in order > to decide about headers or the status code, it is now forced to block > completely. > > A common use case for this is a web service that itself queries other > web services (e.g. an ajax proxy to work around "same origin policy"). > > With WSGI it was possible to yield empty strings as long as the > application is waiting for data and call start_response once the > headers are final. Not perfect, but at least non-blocking. Web3 > removes this possibility. The headers must be returned before the > body iterable yielded its first element, empty or not. > > Removing any support for this type of asynchronism would render web3 > useless for all but completely synchronous and trivial applications. > Even frameworks would have no way to work around this anymore. > > I do understand that the start_response callable is inconvenient for > middleware to implement, but it totally made sense. I don't follow. What is the benefit of yielding empty strings instead of just waiting for the status and headers to be available? Do you then run off and do other things with that server thread? I've run a few businesses now on WSGI without doing what you describe, so I don't see why blocking makes an application 'trivial'. Robert Brewer fumanchu at aminus.org From pje at telecommunity.com Sat Sep 18 18:30:36 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 18 Sep 2010 12:30:36 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <1284800605.1919.36.camel@nava> Message-ID: <20100918163040.68ED53A403D@sparrow.telecommunity.com> At 09:01 AM 9/18/2010 -0700, Robert Brewer wrote: >Marcel Hellkamp wrote: > > > > Removing any support for this type of asynchronism would render web3 > > useless for all but completely synchronous and trivial applications. 
> > Even frameworks would have no way to work around this anymore.
>
>I've run a few businesses now on WSGI without doing what you
>describe, so I don't see why blocking makes an application 'trivial'.

I believe he means: all_but(synchronous_apps + trivial_apps), not all_but(apps(synchronous & trivial)). ;-)

(That being said, for WSGI 2 I still want to get rid of start_response. IMO, async WSGI needs to be a different protocol.)

From chrism at plope.com Sun Sep 19 17:42:36 2010
From: chrism at plope.com (Chris McDonough)
Date: Sun, 19 Sep 2010 11:42:36 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID: <1284910956.3022.139.camel@thinko>

On Thu, 2010-09-16 at 05:29 +0200, Roberto De Ioris wrote:
> About the *.file_wrapper removal, i suggest
> a PSGI-like approach where 'body' can contains a File Object.
>
> def file_app(environ):
>     fd = open('/tmp/pippo.txt', 'r')
>     status = b'200 OK'
>     headers = [(b'Content-type', b'text/plain')]
>     body = fd
>     return body, status, headers

I don't see why this couldn't work as long as middleware didn't convert the body into something not-file-like. But it is really an implementation detail of the origin server (it might specialize when the body is a file), and doesn't really need to be in the spec.

> or
>
> def file_app(environ):
>     fd = open('/tmp/pippo.txt', 'r')
>     status = b'200 OK'
>     headers = [(b'Content-type', b'text/plain')]
>     body = [b'Header', fd, b'Footer']
>     return body, status, headers

This won't work, as the body is required to return an iterable which returns bytes, and cannot be an iterable which returns either bytes or other iterables (it must be a "flat" sequence).
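One way to get the header/file/footer effect while keeping the body flat is to chain the pieces together. The sketch below is an assumption-laden illustration, not something from the PEP: it keeps the (body, status, headers) return order used in the quoted examples, and it substitutes an in-memory io.BytesIO for the real file so that it is self-contained. A real application would open its file in binary mode and would also need to deal with Content-Length, which is omitted here.

```python
import io
from itertools import chain


def file_app(environ):
    # Stand-in for open('/tmp/pippo.txt', 'rb'); the content is made up.
    fd = io.BytesIO(b"line 1\nline 2\n")
    status = b"200 OK"
    headers = [(b"Content-type", b"text/plain")]
    # chain() yields every item of each iterable in turn, so the server
    # sees a flat stream of bytes objects, never a nested file object.
    body = chain([b"Header\n"], fd, [b"Footer\n"])
    return body, status, headers
```

Iterating a binary file object yields it line by line, so the server still receives only bytes chunks; whether it can specialize on a file-like body is lost in this form, which is the trade-off against returning the file object directly.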
- C

From chrism at plope.com Sun Sep 19 17:32:36 2010
From: chrism at plope.com (Chris McDonough)
Date: Sun, 19 Sep 2010 11:32:36 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>
Message-ID: <1284910356.3022.135.camel@thinko>

On Thu, 2010-09-16 at 13:44 +0200, Tarek Ziadé wrote:
> On Thu, Sep 16, 2010 at 1:03 AM, Chris McDonough wrote:
> > A PEP was submitted and accepted today for a WSGI successor protocol
> > named Web3:
> >
> > http://python.org/dev/peps/pep-0444/
> >
> > I'd encourage other folks to suggest improvements to that spec or to
> > submit a competing spec, so we can get WSGI-on-Python3 settled soon.
>
> I have a request for the middleware stack. There should be one obvious
> way to get back to the original application, through the stack
>
> Right now, I have to write crazy things like this depending on the stack:
>
> original_app = self.app.app.application.app
>
> Because some middleware use "app", some "application" etc..
>
> I propose to write in the PEP that a middleware should provide an
> "app" attribute to get the wrapped application or middleware.
> It seems to be the most common name used out there.

We can't really mandate this because middleware is not required to be an instance. It can be a function.

- C

From ianb at colorstudy.com Mon Sep 20 00:41:51 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 19 Sep 2010 18:41:51 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284910356.3022.135.camel@thinko>
References: <1284591800.14651.36.camel@thinko> <1284910356.3022.135.camel@thinko>
Message-ID:

On Sun, Sep 19, 2010 at 11:32 AM, Chris McDonough wrote:
> > I propose to write in the PEP that a middleware should provide an
> > "app" attribute to get the wrapped application or middleware.
> > It seems to be the most common name used out there.
>
> We can't really mandate this because middleware is not required to be an
> instance. It can be a function.
>

We could suggest it, and suggest the attribute name. Composites, lazy loading middleware, or a bunch of other situations can break it... but it's nice for introspection tools to at least be able to attempt to run down the chain. Middleware is almost always a closure if it's a function, I believe, so you could still do:

    def caps(app):
        def replacement_app(environ):
            status, headers, body = app(environ)
            body = [''.join(body).upper()]
            return status, headers, body
        replacement_app.app = app
        return replacement_app

--
Ian Bicking | http://blog.ianbicking.org

From chrism at plope.com Mon Sep 20 03:44:24 2010
From: chrism at plope.com (Chris McDonough)
Date: Sun, 19 Sep 2010 21:44:24 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko> <4C9284FF.60309@active-4.com> <1284800605.1919.36.camel@nava>
Message-ID: <1284947064.3022.157.camel@thinko>

On Sat, 2010-09-18 at 14:08 +0300, Ionel Maries Cristian wrote:
> There's a framework called cogen and it relies on this policy.

I've been told by a number of people (both async and sync people) that WSGI is a poor protocol on top of which to develop async applications, and they usually go on to say that async applications and servers really should communicate over separate (perhaps-WSGI-like) protocol.

I don't really know much about developing async web applications, but frankly I'm loath to keep features in this thing that are only tolerated (spat upon lightly! ;-)) by async folks, but which are also common tripping points for people who never write async applications.

This is an apologetic way of saying "please find more champions for this feature".
- C

> -- ionel
>
> On Sat, Sep 18, 2010 at 12:34, Ian Bicking wrote:
> > On Sat, Sep 18, 2010 at 5:03 AM, Marcel Hellkamp wrote:
> > > With WSGI it was possible to yield empty strings as long as the
> > > application is waiting for data and call start_response once the
> > > headers are final. Not perfect, but at least non-blocking. Web3
> > > removes this possibility. The headers must be returned before the
> > > body iterable yielded its first element, empty or not.
> > >
> > > Removing any support for this type of asynchronism would render
> > > web3 useless for all but completely synchronous and trivial
> > > applications. Even frameworks would have no way to work around
> > > this anymore.
> >
> > I'm aware of what a lot of people have done with WSGI, but I'm
> > not aware of anyone doing an async proxy of any sort, or
> > implementing anything in a way where this empty string policy
> > served any function. It's not implausible that it *could* be
> > used, but years of practice have shown it is not used.
> >
> > --
> > Ian Bicking | http://blog.ianbicking.org

From chrism at plope.com Mon Sep 20 03:52:06 2010
From: chrism at plope.com (Chris McDonough)
Date: Sun, 19 Sep 2010 21:52:06 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To:
References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> Message-ID: <1284947526.3022.163.camel@thinko> On Fri, 2010-09-17 at 14:14 -0400, Ian Bicking wrote: > OK, so maybe it should just be clarified: > > * Middleware and servers should not modify or add Content-Length, > Date, or other headers unless they have reason to do so, and they must > ensure that the response is valid (e.g., there should never be two > Content-Length headers). I tried adding such a statement to a local copy of the specification, but I wasn't able to really "nail" it. If someone here can come up with some unambiguous wording (defining "unless they have reason to do so" and "other headers" above would be a good start), I'd just put it in. > It still seems reasonable that *if* there is no Content-Length, and > the server can guess easily enough (mostly it is returned an actual > list/tuple that we know can be introspected fast and without side > effects), then it's perfectly reasonable to set it -- but certainly > the server doesn't "own" that header (or any other, except maybe some > connection-related headers?). I'm -0 on the server trying to guess the Content-Length header. It just doesn't seem like much of a burden to place on an application and it's easier to specify that an application must do this than it is to specify how a server should behave in the face of a missing Content-Length. I also believe Graham has argued against making the server guess, I presume this causes him some pain somehow (probably underspecification in WSGI). - C From chrism at plope.com Mon Sep 20 04:58:32 2010 From: chrism at plope.com (Chris McDonough) Date: Sun, 19 Sep 2010 22:58:32 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284947526.3022.163.camel@thinko> References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> <1284947526.3022.163.camel@thinko> Message-ID: <1284951512.3022.204.camel@thinko> On Sun, 2010-09-19 at 21:52 -0400, Chris McDonough wrote: > I'm -0 on the server trying to guess the Content-Length header. It just > doesn't seem like much of a burden to place on an application and it's > easier to specify that an application must do this than it is to specify > how a server should behave in the face of a missing Content-Length. I > also believe Graham has argued against making the server guess, I > presume this causes him some pain somehow (probably underspecification > in WSGI). Graham's issues with requiring the server to set Content-Length are detailed here: http://blog.dscpl.com.au/2009/10/wsgi-issues-with-http-head-requests.html From chris.dent at gmail.com Mon Sep 20 11:35:12 2010 From: chris.dent at gmail.com (chris.dent at gmail.com) Date: Mon, 20 Sep 2010 10:35:12 +0100 (BST) Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284910356.3022.135.camel@thinko> Message-ID: On Sun, 19 Sep 2010, Ian Bicking wrote: > On Sun, Sep 19, 2010 at 11:32 AM, Chris McDonough wrote: > >>> I propose to write in the PEP that a middleware should provide an >>> "app" attribute to get the wrapped application or middleware. >>> It seems to be the most common name used out there. >> >> We can't really mandate this because middleware is not required to be an >> instance. It can be a function. >> > > We could suggest it, and suggest the attribute name. Composites, lazy > loading middleware, or a bunch of other situations can break it... but it's > nice for introspection tools to at least be able to attempt to run down the > chain. 
Middleware is almost always a closure if it's a function, I believe, > so you could still do: If the goal here is to write a spec, then I would prefer that spec say what must be done and what must not be done, not what may be done, could be done or is suggested as perhaps a best practice. Those sorts of things belong in communication that is out of band of the spec. -- Chris Dent http://burningchrome.com/ [...] From guido at python.org Mon Sep 20 17:09:44 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Sep 2010 08:09:44 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko> <1284910356.3022.135.camel@thinko> Message-ID: On Mon, Sep 20, 2010 at 2:35 AM, wrote: > If the goal here is to write a spec, then I would prefer that spec > say what must be done and what must not be done, not what may be > done, could be done or is suggested as perhaps a best practice. > Those sorts of things belong in communication that is out of band of > the spec. Actually, many specs (esp. Internet RFCs and language specs, the ones I am most familiar with besides PEPs) carefully define and use verbs of different strength, typically must, should, may, should not, must not. This is needed since almost all specs give the implementers of the spec some leeway in how to conform to the spec (otherwise it wouldn't be a spec but a program :-). Doubly so when there are two sides to a protocol (e.g. client/server, consumer/producer). -- --Guido van Rossum (python.org/~guido) From fumanchu at aminus.org Mon Sep 20 17:19:32 2010 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 20 Sep 2010 08:19:32 -0700 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: <1284951512.3022.204.camel@thinko> References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com><1284947526.3022.163.camel@thinko> <1284951512.3022.204.camel@thinko> Message-ID: > On Sun, 2010-09-19 at 21:52 -0400, Chris McDonough wrote: > > > I'm -0 on the server trying to guess the Content-Length header. It > just > > doesn't seem like much of a burden to place on an application and > it's > > easier to specify that an application must do this than it is to > specify > > how a server should behave in the face of a missing Content-Length. > I > > also believe Graham has argued against making the server guess, I > > presume this causes him some pain somehow (probably > underspecification > > in WSGI). > > Graham's issues with requiring the server to set Content-Length are > detailed here: > > http://blog.dscpl.com.au/2009/10/wsgi-issues-with-http-head- > requests.html Chris, Thanks for that link. I had completely forgotten about that issue. I'd really appreciate it if your web3 spec made some definitive decision on whether applications and middleware are responsible for correctly differentiating HEAD from GET, or whether servers should transform HEAD to GET before invoking the first application callable. I'd personally prefer the former. Robert Brewer fumanchu at aminus.org From matt.goodall at gmail.com Mon Sep 20 18:31:23 2010 From: matt.goodall at gmail.com (Matt Goodall) Date: Mon, 20 Sep 2010 17:31:23 +0100 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> <1284947526.3022.163.camel@thinko> <1284951512.3022.204.camel@thinko> Message-ID: On 20 September 2010 16:19, Robert Brewer wrote: > > On Sun, 2010-09-19 at 21:52 -0400, Chris McDonough wrote: > > > > > I'm -0 on the server trying to guess the Content-Length header. It > > just > > > doesn't seem like much of a burden to place on an application and > > it's > > > easier to specify that an application must do this than it is to > > specify > > > how a server should behave in the face of a missing Content-Length. > > I > > > also believe Graham has argued against making the server guess, I > > > presume this causes him some pain somehow (probably > > underspecification > > > in WSGI). > > > > Graham's issues with requiring the server to set Content-Length are > > detailed here: > > > > http://blog.dscpl.com.au/2009/10/wsgi-issues-with-http-head- > > requests.html > > Chris, > > Thanks for that link. I had completely forgotten about that issue. I'd > really appreciate it if your web3 spec made some definitive decision on > whether applications and middleware are responsible for correctly > differentiating HEAD from GET, or whether servers should transform HEAD > to GET before invoking the first application callable. I'd personally > prefer the former. Servers should definitely not transform a HEAD to a GET. Transforming HEAD to GET and then discarding the body is often not a bad default but an application may well want to handle the HEAD explicitly. For instance, an application's HEAD handler may only need to check an ETag in a database before returning a "304 Not Modified" response (with the correct Content-Length and no body, of course). Similarly, it's almost certainly a bad idea for a WSGI server or middleware to change the Content-Length header in the application's HTTP response because there may be no body to look at. - Matt -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ianb at colorstudy.com Mon Sep 20 18:36:42 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 20 Sep 2010 12:36:42 -0400 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> <1284947526.3022.163.camel@thinko> <1284951512.3022.204.camel@thinko> Message-ID: On Mon, Sep 20, 2010 at 12:31 PM, Matt Goodall wrote: > Servers should definitely not transform a HEAD to a GET. > > Transforming HEAD to GET and then discarding the body is often not a bad > default but an application may well want to handle the HEAD explicitly. For > instance, an application's HEAD handler may only need to check an ETag in a > database before returning a "304 Not Modified" response (with the correct > Content-Length and no body, of course). > > Similarly, it's almost certainly a bad idea for a WSGI server or middleware > to change the Content-Length header in the application's HTTP response > because there may be no body to look at. > If a piece of output-transforming middleware is being picky, it could change HEAD to GET on the incoming request, then throw away the response body on its own. This is not a great strategy, but at least it seems like it will create a generally correct result. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From armin.ronacher at active-4.com Mon Sep 20 18:49:04 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Mon, 20 Sep 2010 18:49:04 +0200 Subject: [Web-SIG] PEP 444 (aka Web3) In-Reply-To: References: <1284591800.14651.36.camel@thinko>

<4C93AE15.2080106@active-4.com> <1284947526.3022.163.camel@thinko> <1284951512.3022.204.camel@thinko>
Message-ID: <4C979080.7010706@active-4.com>

Hi,

On 9/20/10 6:31 PM, Matt Goodall wrote:
> Servers should definitely not transform a HEAD to a GET.

There are some good reasons why it currently has to. I haven't read the link in question but I had a discussion with Graham a few days ago on Skype and he outlined the issue in detail. I will write a summary to the list in a few days, just too busy to do that right now :(

Regards,
Armin

From pje at telecommunity.com Tue Sep 21 18:09:44 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 21 Sep 2010 12:09:44 -0400
Subject: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
Message-ID: <20100921160937.4C2D23A4079@sparrow.telecommunity.com>

While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions.

After all, even if PEP 333 is ultimately replaced by PEP 444, it's probably a good idea to have *some* sort of WSGI 1-ish thing available on Python 3, with bytes/unicode and other matters settled.

In the past, I was waiting for some consensuses (consensi?) on Web-SIG about different approaches to Python 3, looking for some sort of definite, "yes, we all like this" response. However, I can see now that this just means it's my fault we don't have a spec yet. :-(

So, unless any last-minute showstopper rebuttals show up this week, I've decided to go ahead and officially bless nearly all of what Graham Dumpleton (who's not only the mod_wsgi author, but has put huge amounts of work into shepherding WSGI-on-Python3 proposals, WSGI amendments, etc.) has proposed, with a few minor exceptions.

In other words: almost none of the following is my own original work; it's like 90% Graham's.
Any praise for this belongs to him; the only thing that belongs to me is the blame for not doing this sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.)

Anyway, I'm posting this for comment to both Python-Dev and the Web-SIG. If you are commenting on the technical details of the amendments, please reply to the Web-SIG only. If you are commenting on the development agenda for wsgiref or other Python 3 library issues, please reply to Python-Dev only. That way, neither list will see off-topic discussions. Thanks!

The Plan
========

I plan to update the proposal below per comments and feedback during this week, then update PEP 333 itself over the weekend or early next week, followed by a code review of Python 3's wsgiref, and implementation of needed changes (such as recoding os.environ to latin1-captured bytes in the CGI handler).

To complete the changes, it is possible that I may need assistance from one or more developers who have more Python 3 experience. If after reading the proposed changes to the spec, you would like to volunteer to help with updating wsgiref to match, please let me know!

The Proposal
============

Overview
--------

1. The primary purpose of this update is to provide a uniform porting pattern for moving Python 2 WSGI code to Python 3, meaning a pattern of changes that can be mechanically applied to as little code as practical, while still keeping the WSGI spec easy to programmatically validate (e.g. via ``wsgiref.validate``).

   The Python 3 specific changes are to use:

   * ``bytes`` for I/O streams in both directions
   * ``str`` for environ keys and values
   * ``bytes`` for arguments to start_response() and write()
   * text stream for wsgi.errors

   In other words, "strings in, bytes out" for headers, bytes for bodies.

   In general, only changes that don't break Python 2 WSGI implementations are allowed.
   The changes should also not break mod_wsgi on Python 3, but may make some Python 3 wsgi applications non-compliant, despite continuing to function on mod_wsgi.

   This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it forces every piece of middleware to have to be tested with arbitrary combinations of strings and bytes in order to test compliance. If you want your application to output strings rather than bytes, you can always use a decorator to do that. (And a sample one could be provided in wsgiref.)

2. The secondary purpose of the update is to address some long-standing open issues documented here:

   http://www.wsgi.org/wsgi/Amendments_1.0

   As with the Python 3 changes, only changes that don't retroactively invalidate existing implementations are allowed.

3. There is no tertiary purpose. ;-) (By which I mean, all other kinds of changes are out-of-scope for this update.)

4. The section below labeled "A Note On String Types" is proposed for verbatim addition to the "Specification Overview" section in the PEP; the other sections below describe changes to be made inline at the appropriate part of the spec, and changes that were proposed but are rejected for inclusion in this amendment.

A Note On String Types
----------------------

In general, HTTP deals with bytes, which means that this specification is mostly about handling bytes.

However, the content of those bytes often has some kind of textual interpretation, and in Python, strings are the most convenient way to handle text.

But in many Python versions and implementations, strings are Unicode, rather than bytes. This requires a careful balance between a usable API and correct translations between bytes and text in the context of HTTP... especially to support porting code between Python implementations with different ``str`` types.
WSGI therefore defines two kinds of "string":

* "Native" strings (which are always implemented using the type named ``str``)

* "Bytestrings" (which are implemented using the ``bytes`` type in Python 3, and ``str`` elsewhere)

So, even though HTTP is in some sense "really just bytes", there are many API conveniences to be had by using whatever Python's default ``str`` type is.

Do not be confused however: even if Python's ``str`` is actually Unicode under the hood, the *content* of a native string is still restricted to bytes! See the section on `Unicode Issues`_ later in this document.

In short: where you see the word "string" in this document, it refers to a "native" string, i.e., an object of type ``str``, whether it is internally implemented as bytes or unicode. Where you see references to "bytestring", this should be read as "an object of type ``bytes`` under Python 3, or type ``str`` under Python 2".

Clarifications (To be made in-line)
-----------------------------------

The following amendments are clarifications to parts of the existing spec that proved over the years to be ambiguous or insufficiently specified, as well as some attempts to correct practical errors.

(Note: many of these issues cannot be completely fixed in WSGI 1 without breaking existing implementations, and so the text below has notations such as "(MUST in WSGI 2)" to indicate where any replacement spec for WSGI 1 should strengthen them.)

* If an application returns a body iterator, a server (or middleware) MAY stop iterating over it and discard the remainder of the output, as long as it calls any close() method provided by the iterator. Applications returning a generator or other custom iterator SHOULD NOT assume that the entire iterator will be consumed. (This change makes it explicit that caching middleware or HEAD-processing servers can throw away the response body.)
* start_response() SHOULD (MUST in WSGI 2) check for errors in the status or headers at the time it's called, so that an error can be raised as close to the problem as possible

* If start_response() raises an error when called normally (i.e. without exc_info), it SHOULD be an error to call it a second time without passing exc_info

* The SERVER_PORT variable is of type str, just like any other CGI environ variable. (According to the WSGI wiki, "some implementations" expect it to be an integer, even though there is nothing in the WSGI spec that allows a CGI variable to be anything but a str.)

* A server SHOULD (MUST in WSGI 2) support the size hint argument to readline() on its wsgi.input stream.

* A server SHOULD (MUST in WSGI 2) return an empty bytestring from read() on wsgi.input to indicate an end-of-file condition. (In WSGI 2, language should be clarified to allow the input stream length and CONTENT_LENGTH to be out of sync, for reasons explained in Graham's blog post.)

* A server SHOULD (MUST in WSGI 2) allow read() to be called without an argument, and return the entire remaining contents of the stream

* If an application provides a Content-Length header, the server SHOULD NOT (MUST NOT in WSGI 2) send more data to the client than was specified in that header, whether via write(), yielded body bytestrings, or via a wsgi.file_wrapper. (This rule applies to middleware as well.)

* wsgi.errors is a text stream accepting "native strings"

Rejected Amendments
-------------------

* Manlio Perillo's suggestion to allow header specification to be delayed until the response iterator is producing non-empty output. This would've been a possible win for async WSGI, but could require substantial changes to existing servers.
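
As an illustration of the decorator mentioned in the Overview (an application that wants to emit native strings while everything downstream sees only bytes), here is a minimal sketch. The names ``native_strings_out`` and ``body_encoding`` are hypothetical and appear neither in the proposal nor in wsgiref. The sketch assumes status and headers are encoded as latin-1, and it follows the "``bytes`` for arguments to start_response() and write()" rule exactly as proposed above, which a particular server may or may not accept.

```python
import functools


def native_strings_out(app, body_encoding="utf-8"):
    """Hypothetical decorator: the app emits native strings, the server gets bytes."""

    def header_bytes(value):
        # Assumption: status and header text is latin-1, as in HTTP/1.1.
        return value if isinstance(value, bytes) else value.encode("latin-1")

    def body_bytes(value):
        # Assumption: the caller picks a body encoding matching Content-Type.
        return value if isinstance(value, bytes) else value.encode(body_encoding)

    @functools.wraps(app)
    def wrapper(environ, start_response):
        def encoding_start_response(status, headers, exc_info=None):
            encoded = [(header_bytes(k), header_bytes(v)) for k, v in headers]
            if exc_info is None:
                write = start_response(header_bytes(status), encoded)
            else:
                write = start_response(header_bytes(status), encoded, exc_info)
            # Wrap write() so that output sent through it is encoded as well.
            return lambda data: write(body_bytes(data))

        result = app(environ, encoding_start_response)
        try:
            for chunk in result:
                yield body_bytes(chunk)
        finally:
            # Pass close() through, so a server that stops iterating early
            # still reaches the wrapped application's iterator.
            close = getattr(result, "close", None)
            if close is not None:
                close()

    return wrapper


@native_strings_out
def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return ["Hello, ", "world\n"]
```

Because the wrapper is a generator, the wrapped application is only called once the server begins iterating, and the ``finally`` clause is what keeps the close() clarification above intact when the server discards the rest of the body.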
From chrism at plope.com Tue Sep 21 18:40:10 2010
From: chrism at plope.com (Chris McDonough)
Date: Tue, 21 Sep 2010 12:40:10 -0400
Subject: [Web-SIG] PEP 444 (aka Web3)
In-Reply-To: <1284591800.14651.36.camel@thinko>
References: <1284591800.14651.36.camel@thinko>
Message-ID: <1285087210.2130.90.camel@thinko>

I have some pending changes to the PEP 444 spec (the working copy is at http://github.com/mcdonc/web3/blob/master/pep-0444.rst but please don't consider that canonical in any sense, it will change before an official republication of the proposal). The modifications fold in most of what we've talked about on the list, or at least acknowledge the issues; a change log is contained near the top.

However, I'm currently trying to work through what to do about offering up quoted PATH_INFO and SCRIPT_NAME values (quoted in the sense that, at least on platforms that support it, these would be the original values before being run through urllib.unquote).

The current published proposal on Python.org indicates that these would go into "web3.path_info" and "web3.script_name" but nobody seems to much like that because it would make things like "path_info_pop" hard (the code would need to keep two data structures in sync, and would need to be pretty magical in the face of %2F markers).

The pending, unpublished proposal turns SCRIPT_NAME and PATH_INFO into *quoted* values, and adds a ``web3.path_requoted`` flag for debugging purposes, which will be True if the SCRIPT_NAME and/or PATH_INFO needed to be recomposed and requoted (e.g. on CGI platforms). But private conversations lead me to believe that not many folks will like this either, because it commandeers CGI names that are well-understood to be unquoted.

The only sensible way to break the deadlock seems to be to not use any "CGI names" in the specification at all, so as not to break people's expectations.
I know that when I change it to not use any CGI names, it will be received poorly, but I can't think of a better idea. - C On Wed, 2010-09-15 at 19:03 -0400, Chris McDonough wrote: > A PEP was submitted and accepted today for a WSGI successor protocol > named Web3: > > http://python.org/dev/peps/pep-0444/ > > I'd encourage other folks to suggest improvements to that spec or to > submit a competing spec, so we can get WSGI-on-Python3 settled soon. > > - C > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com > From chrism at plope.com Tue Sep 21 18:47:00 2010 From: chrism at plope.com (Chris McDonough) Date: Tue, 21 Sep 2010 12:47:00 -0400 Subject: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 In-Reply-To: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> Message-ID: <1285087620.2130.104.camel@thinko> On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote: > While the Web-SIG is trying to hash out PEP 444, I thought it would > be a good idea to have a backup plan that would allow the Python 3 > stdlib to move forward, without needing a major new spec to settle > out implementation questions. If a WSGI-1-compatible protocol seems more sensible to folks, I'm personally happy to defer discussion on PEP 444 or any other backwards-incompatible proposal. 
- C

From ianb at colorstudy.com Tue Sep 21 18:55:15 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 21 Sep 2010 12:55:15 -0400
Subject: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
In-Reply-To: <1285087620.2130.104.camel@thinko>
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <1285087620.2130.104.camel@thinko>
Message-ID:

On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough wrote:

> On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote:
> > While the Web-SIG is trying to hash out PEP 444, I thought it would
> > be a good idea to have a backup plan that would allow the Python 3
> > stdlib to move forward, without needing a major new spec to settle
> > out implementation questions.
>
> If a WSGI-1-compatible protocol seems more sensible to folks, I'm
> personally happy to defer discussion on PEP 444 or any other
> backwards-incompatible proposal.

I think both make sense; making WSGI 1 sensible for Python 3 (as well as other small errata like the size hint) doesn't detract from PEP 444 at all, IMHO.

--
Ian Bicking | http://blog.ianbicking.org

From ianb at colorstudy.com Tue Sep 21 18:57:44 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 21 Sep 2010 12:57:44 -0400
Subject: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
In-Reply-To: <20100921160937.4C2D23A4079@sparrow.telecommunity.com>
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com>
Message-ID:

On Tue, Sep 21, 2010 at 12:09 PM, P.J.
Eby wrote:

> The Python 3 specific changes are to use:
>
> * ``bytes`` for I/O streams in both directions
> * ``str`` for environ keys and values
> * ``bytes`` for arguments to start_response() and write()

This is the only thing that seems odd to me -- it seems like the response should be symmetric with the request, and the request in this case uses str for headers (status being header-like), and bytes for the body.

Otherwise this seems good to me, the only other major errata I can think of are all listed in the links you included.

> * text stream for wsgi.errors
>
> In other words, "strings in, bytes out" for headers, bytes for bodies.
>
> In general, only changes that don't break Python 2 WSGI implementations are
> allowed. The changes should also not break mod_wsgi on Python 3, but may
> make some Python 3 wsgi applications non-compliant, despite continuing to
> function on mod_wsgi.
>
> This is because mod_wsgi allows applications to output string headers and
> bodies, but I am ruling that option out because it forces every piece of
> middleware to have to be tested with arbitrary combinations of strings and
> bytes in order to test compliance. If you want your application to output
> strings rather than bytes, you can always use a decorator to do that. (And
> a sample one could be provided in wsgiref.)

I agree allowing both is not ideal.

--
Ian Bicking | http://blog.ianbicking.org

From jdhardy at gmail.com Tue Sep 21 19:06:36 2010
From: jdhardy at gmail.com (Jeff Hardy)
Date: Tue, 21 Sep 2010 11:06:36 -0600
Subject: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
In-Reply-To: <1285087620.2130.104.camel@thinko>
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <1285087620.2130.104.camel@thinko>
Message-ID:

On Tue, Sep 21, 2010 at 10:47 AM, Chris McDonough wrote:
> On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote:
> If a WSGI-1-compatible protocol seems more sensible to folks, I'm
> personally happy to defer discussion on PEP 444 or any other
> backwards-incompatible proposal.

I think both make sense. PEP 444 can continue to be worked out (and it should be!); the changes here are pretty much uncontroversial.

It also helps clarify how WSGI should work on IronPython, which has the same str/unicode issues as Python 3 - the fact that it's basically how I've implemented it for IronPython is nice as well.

- Jeff

From jdhardy at gmail.com Tue Sep 21 19:07:17 2010
From: jdhardy at gmail.com (Jeff Hardy)
Date: Tue, 21 Sep 2010 11:07:17 -0600
Subject: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
In-Reply-To:
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com>
Message-ID:

On Tue, Sep 21, 2010 at 10:57 AM, Ian Bicking wrote:
> On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby wrote:
>>
>> The Python 3 specific changes are to use:
>>
>> * ``bytes`` for I/O streams in both directions
>> * ``str`` for environ keys and values
>> * ``bytes`` for arguments to start_response() and write()
>
> This is the only thing that seems odd to me -- it seems like the response
> should be symmetric with the request, and the request in this case uses str
> for headers (status being header-like), and bytes for the body.

FWIW I agree with Ian about the symmetry breaking being odd. For IronPython, most .NET webservers expect the status and headers as strings, which in .NET are unicode, but that would just be an implementation convenience for me.

- Jeff

From pje at telecommunity.com Tue Sep 21 19:10:10 2010
From: pje at telecommunity.com (P.J.
Eby) Date: Tue, 21 Sep 2010 13:10:10 -0400 Subject: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <1285087620.2130.104.camel@thinko> Message-ID: <20100921171005.0064F3A4079@sparrow.telecommunity.com> At 12:55 PM 9/21/2010 -0400, Ian Bicking wrote: >On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough ><chrism at plope.com> wrote: >On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote: > > While the Web-SIG is trying to hash out PEP 444, I thought it would > > be a good idea to have a backup plan that would allow the Python 3 > > stdlib to move forward, without needing a major new spec to settle > > out implementation questions. > >If a WSGI-1-compatible protocol seems more sensible to folks, I'm >personally happy to defer discussion on PEP 444 or any other >backwards-incompatible proposal. > > >I think both make sense, making WSGI 1 sensible for Python 3 (as >well as other small errata like the size hint) doesn't detract from >PEP 444 at all, IMHO. Yep. I agree. I do, however, want to get these amendments settled and make sure they get carried over to whatever spec is the successor to PEP 333. I've had a lot of trouble following exactly what was changed in 444, and I'm a tad worried that several new ambiguities may be being introduced. So, solidifying 333 a bit might be helpful if it gives a good baseline against which to diff 444 (or whatever). From pje at telecommunity.com Tue Sep 21 19:11:50 2010 From: pje at telecommunity.com (P.J. 
Eby) Date: Tue, 21 Sep 2010 13:11:50 -0400 Subject: [Web-SIG] [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 In-Reply-To: <20100921185254.10becbcc@pitrou.net> References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100921185254.10becbcc@pitrou.net> Message-ID: <20100921171143.1A8663A4079@sparrow.telecommunity.com> At 06:52 PM 9/21/2010 +0200, Antoine Pitrou wrote: >On Tue, 21 Sep 2010 12:09:44 -0400 >"P.J. Eby" wrote: > > While the Web-SIG is trying to hash out PEP 444, I thought it would > > be a good idea to have a backup plan that would allow the Python 3 > > stdlib to move forward, without needing a major new spec to settle > > out implementation questions. > >If this allows the Web situation in Python 3 to be improved faster >and with less hassle then all the better. >There's something strange in your proposal: it mentions WSGI 2 at >several places while there's no guarantee about what WSGI 2 will be (is >there?). Sorry - "WSGI 2" should be read as shorthand for, "whatever new spec succeeds PEP 333", whether that's PEP 444 or something else. It just means that any new spec that doesn't have to be backward-compatible can (and should) more thoroughly address the issue in question. From pje at telecommunity.com Tue Sep 21 19:17:55 2010 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 21 Sep 2010 13:17:55 -0400 Subject: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> Message-ID: <20100921171748.79E053A4079@sparrow.telecommunity.com> [trimming reply headers to just web-sig] At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote: >On Tue, Sep 21, 2010 at 12:09 PM, P.J. 
Eby ><pje at telecommunity.com> wrote: >The Python 3 specific changes are to use: > >* ``bytes`` for I/O streams in both directions >* ``str`` for environ keys and values >* ``bytes`` for arguments to start_response() and write() > > >This is the only thing that seems odd to me -- it seems like the >response should be symmetric with the request, and the request in >this case uses str for headers (status being header-like), and bytes >for the body. Are you suggesting a "``str`` for headers, ``bytes`` for bodies" approach instead? I suppose that could work; I was going for "str in, bytes out". My assumption, though, was that headers are relatively easy to address at a choke point from a framework's output. But I guess that iterator output is equally chokable. I'm open to discussion on this point, so long as every value produced or consumed by a WSGI application is of a specified single type(). >Otherwise this seems good to me, the only other major errata I can >think of are all listed in the links you included. Um, if by "links" you mean, "included textually in the proposal", then sure. If it's not in the proposal, it's not going in the PEP, even if it's on the WSGI Amendments page or Graham's blog. From ianb at colorstudy.com Tue Sep 21 23:31:40 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 21 Sep 2010 17:31:40 -0400 Subject: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 In-Reply-To: <20100921171748.79E053A4079@sparrow.telecommunity.com> References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100921171748.79E053A4079@sparrow.telecommunity.com> Message-ID: On Tue, Sep 21, 2010 at 1:17 PM, P.J. Eby wrote: > [trimming reply headers to just web-sig] > > At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote: > > On Tue, Sep 21, 2010 at 12:09 PM, P.J. 
Eby <pje at telecommunity.com> wrote:
>> The Python 3 specific changes are to use:
>>
>> * ``bytes`` for I/O streams in both directions
>> * ``str`` for environ keys and values
>> * ``bytes`` for arguments to start_response() and write()
>>
>> This is the only thing that seems odd to me -- it seems like the response
>> should be symmetric with the request, and the request in this case uses str
>> for headers (status being header-like), and bytes for the body.
>
> Are you suggesting a "``str`` for headers, ``bytes`` for bodies" approach
> instead?

Yes.

> I suppose that could work; I was going for "str in, bytes out". My
> assumption, though, was that headers are relatively easy to address at a
> choke point from a framework's output. But I guess that iterator output is
> equally chokable.

The request body would still be bytes in either model (at least, I assumed that).

> I'm open to discussion on this point, so long as every value produced or
> consumed by a WSGI application is of a specified single type().
>
>> Otherwise this seems good to me, the only other major errata I can think
>> of are all listed in the links you included.
>
> Um, if by "links" you mean, "included textually in the proposal", then
> sure. If it's not in the proposal, it's not going in the PEP, even if it's
> on the WSGI Amendments page or Graham's blog.

Well, at a minimum there is the size hint on wsgi.input. Things like CONTENT_LENGTH are probably more involved than is necessary for this revision.

--
Ian Bicking | http://blog.ianbicking.org

From marc at gsites.de Wed Sep 22 14:46:57 2010
From: marc at gsites.de (Marcel Hellkamp)
Date: Wed, 22 Sep 2010 14:46:57 +0200
Subject: [Web-SIG] Most WSGI servers close connections to early.
Message-ID: <1285159617.4962.93.camel@nava>

I just discovered a problem that affects most WSGI server implementations and most current web browsers (tested with wsgiref, paste, firefox, chrome, wget and curl):

If the server closes the connection while the client is still uploading data via POST or PUT, the browser displays an error message ('Connection closed') and does not display the response sent by the server.

The error occurs if an application chooses not to process a form submission before returning to the WSGI server. This is quite rare in real-world scenarios, but hard to debug because the server logs the request as successfully sent to the client.

To reproduce the problem, run the following script, visit http://localhost:8080/ and upload a big file::

    from wsgiref.simple_server import make_server

    def application(environ, start_response):
        # Deliberately never reads environ['wsgi.input'].
        start_response('200 OK', [('Content-Type', 'text/html')])
        # NOTE: the archive stripped the original form markup; this minimal
        # upload form is a reconstruction, not the author's exact HTML.
        return [b"""
        <form method="POST" enctype="multipart/form-data">
          <input type="file" name="upload" />
          <input type="submit" />
        </form>
        """]

    server = make_server('localhost', 8080, application)
    server.serve_forever()

I would like to add a warning to the WSGI/web3 specification to address this issue:

"An application should read all available data from `environ['wsgi.input']` on POST or PUT requests, even if it does not process that data. Otherwise, the client might fail to complete the request and not display the response."

--
Mit freundlichen Grüßen
Marcel Hellkamp

From fumanchu at aminus.org Wed Sep 22 17:34:10 2010
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 22 Sep 2010 08:34:10 -0700
Subject: [Web-SIG] Most WSGI servers close connections to early.
In-Reply-To: <1285159617.4962.93.camel@nava> References: <1285159617.4962.93.camel@nava> Message-ID: Marcel Hellkamp wrote: > I just discovered a problem that affects most WSGI server > implementations and most current web-browsers (tested with wsgiref, > paste, firefox, chrome, wget and curl): > > If the server closes the connection while the client is still uploading > data via POST or PUT, the browser displays an error message > ('Connection > closed') and does not display the response sent by the server. > > The error occurs if an application chooses to not process a form > submissions before returning to the WSGI server. This is quite rare in > real world scenarios, but hard to debug because the server logs the > request as successfully sent to the client. > > To reproduce the problem, run the following script, visit > http://localhost:8080/ and upload a big file:: > > > > from wsgiref.simple_server import make_server > > def application(environ, start_response): > start_response('200 OK', [('Content-Type', 'text/html')]) > return [""" > > """] > > server = make_server('localhost', 8080, application) > server.serve_forever() > > > > > I would like to add a warning to the WSGI/web3 specification to address > this issue: > > "An application should read all available data from > `environ['wsgi.input']` on POST or PUT requests, even if it does not > process that data. Otherwise, the client might fail to complete the > request and not display the response." Indeed. CherryPy has protected against this for some time. But it shouldn't be the burden of *applications* to do this; the WSGI "origin" server can do so quite easily. However, the caveat requires a caveat: servers must still be able to protect themselves from malicious clients. In practice, that means allowing servers to close the connection without reading the entire request body if a certain number of bytes is exceeded. 
Robert Brewer
fumanchu at aminus.org

From bchesneau at gmail.com Wed Sep 22 18:53:59 2010
From: bchesneau at gmail.com (Benoit Chesneau)
Date: Wed, 22 Sep 2010 18:53:59 +0200
Subject: [Web-SIG] Most WSGI servers close connections to early.
In-Reply-To: <1285159617.4962.93.camel@nava>
References: <1285159617.4962.93.camel@nava>
Message-ID:

On Wed, Sep 22, 2010 at 2:46 PM, Marcel Hellkamp wrote:
> I just discovered a problem that affects most WSGI server
> implementations and most current web-browsers (tested with wsgiref,
> paste, firefox, chrome, wget and curl):
>
> If the server closes the connection while the client is still uploading
> data via POST or PUT, the browser displays an error message ('Connection
> closed') and does not display the response sent by the server.
>
> The error occurs if an application chooses to not process a form
> submissions before returning to the WSGI server. This is quite rare in
> real world scenarios, but hard to debug because the server logs the
> request as successfully sent to the client.
>
> To reproduce the problem, run the following script, visit
> http://localhost:8080/ and upload a big file::
>
> from wsgiref.simple_server import make_server
>
> def application(environ, start_response):
>     start_response('200 OK', [('Content-Type', 'text/html')])
>     return ["""
>
>     """]
>
> server = make_server('localhost', 8080, application)
> server.serve_forever()
>
> I would like to add a warning to the WSGI/web3 specification to address
> this issue:
>
> "An application should read all available data from
> `environ['wsgi.input']` on POST or PUT requests, even if it does not
> process that data. Otherwise, the client might fail to complete the
> request and not display the response."
>
> --
> Mit freundlichen Grüßen
> Marcel Hellkamp

Your application and client should be aware of the Expect: 100-Continue header:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

- benoît

(resent, because web-sig doesn't set the default reply-to properly)

From bchesneau at gmail.com Wed Sep 22 18:56:39 2010
From: bchesneau at gmail.com (Benoit Chesneau)
Date: Wed, 22 Sep 2010 18:56:39 +0200
Subject: [Web-SIG] Most WSGI servers close connections to early.
In-Reply-To:
References: <1285159617.4962.93.camel@nava>
Message-ID:

On Wed, Sep 22, 2010 at 5:34 PM, Robert Brewer wrote:
> However, the caveat requires a caveat: servers must still be able to protect themselves from malicious clients. In practice, that means allowing servers to close the connection without reading the entire request body if a certain number of bytes is exceeded.

I don't see how it could be the responsibility of the server. Can you elaborate a little? The server shouldn't interfere in the HTTP request imo.

- benoît

From pje at telecommunity.com Wed Sep 22 19:00:13 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 22 Sep 2010 13:00:13 -0400
Subject: [Web-SIG] Most WSGI servers close connections to early.
In-Reply-To:
References: <1285159617.4962.93.camel@nava>
Message-ID: <20100922170029.96AE43A4079@sparrow.telecommunity.com>

At 08:34 AM 9/22/2010 -0700, Robert Brewer wrote:
>Marcel Hellkamp wrote:
> > I would like to add a warning to the WSGI/web3 specification to address
> > this issue:
> >
> > "An application should read all available data from
> > `environ['wsgi.input']` on POST or PUT requests, even if it does not
> > process that data. Otherwise, the client might fail to complete the
> > request and not display the response."
>
>Indeed. CherryPy has protected against this for some time. But it
>shouldn't be the burden of *applications* to do this; the WSGI
>"origin" server can do so quite easily.
> >However, the caveat requires a caveat: servers must still be able to >protect themselves from malicious clients. In practice, that means >allowing servers to close the connection without reading the entire >request body if a certain number of bytes is exceeded. We can certainly add warnings, although these are both more of a "best practices" advisory rather than a part of the spec per se. From fumanchu at aminus.org Wed Sep 22 21:25:05 2010 From: fumanchu at aminus.org (Robert Brewer) Date: Wed, 22 Sep 2010 12:25:05 -0700 Subject: [Web-SIG] Most WSGI servers close connections to early. In-Reply-To: References: <1285159617.4962.93.camel@nava> Message-ID: Benoit Chesneau wrote: > On Wed, Sep 22, 2010 at 5:34 PM, Robert Brewer > wrote: > > However, the caveat requires a caveat: servers must still be able to > protect themselves from malicious clients. In practice, that means > allowing servers to close the connection without reading the entire > request body if a certain number of bytes is exceeded. > > I don't see how it could be the responsability of the server. Can you > develop a little ? The server shouldn't interfere in the HTTP request > imo. Well since the "origin server" is the only component in the architecture that's *actually* having an HTTP conversation with the client, calling it "interference" seems a bit skewed. ;) RFC 2616 8.2.3 says: "If an origin server receives a request that does not include an Expect request-header field with the "100-continue" expectation, the request includes a request body, and the server responds with a final status code before reading the entire request body from the transport connection, then the server SHOULD NOT close the transport connection until it has read the entire request, or until the client closes the connection. Otherwise, the client might not reliably receive the response message. 
However, this requirement is not be construed as preventing a server from defending itself against denial-of-service attacks, or from badly broken client implementations." The way CherryPy implements this is to wrap the socket file before handing it to wsgi.input. That wrapper understands Content-Length (and another understands Transfer-Encoding), and won't allow any component that calls wsgi.input.read(n) to read past the Content-Length limit. [This also allows components to call read() without a size argument yet not timeout on the socket, as specified in recent proposals.] The server can be configured to have a maximum number of bytes it will allow to be read--if Content-Length exceeds that number, the server immediately responds with 413 Request Entity Too Large. It doesn't read the rest of the request entity, because it's too big and could cause a DoS. If clients can't read the response because they're still blocked sending a request that's too big, there's not really any way to get around that if the client didn't send an Expect request header. If the Content-Length is not too large, and the application returns (normally or exceptionally), and the wrapper has not recorded that the bytes read equals the Content-Length, then the server will consume the remaining bytes and throw them away before sending the response headers. I just noticed it doesn't do that if it's going to close the conn. Not sure why. Maybe it should. Robert Brewer fumanchu at aminus.org From pje at telecommunity.com Thu Sep 23 18:06:47 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 23 Sep 2010 12:06:47 -0400 Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> Message-ID: <20100923160645.2B5E33A4079@sparrow.telecommunity.com> At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote: >On Tue, Sep 21, 2010 at 12:09 PM, P.J. 
Eby ><pje at telecommunity.com> wrote: >The Python 3 specific changes are to use: > >* ``bytes`` for I/O streams in both directions >* ``str`` for environ keys and values >* ``bytes`` for arguments to start_response() and write() > > >This is the only thing that seems odd to me -- it seems like the >response should be symmetric with the request, and the request in >this case uses str for headers (status being header-like), and bytes >for the body. So, I've given some thought to your suggestion, and, while it's true that most of the output headers are far less prone to ending up with unintended unicode content, there are at least two output headers that can include some sort of application content (and can therefore have random failures): Location and Set-Cookie. If these headers accidentally contain non-Latin1 characters, the error isn't detectable until the header reaches the origin server doing the transmission encoding, and it'll likely be a dynamic (and therefore hard-to-debug) error. However, if the output is always bytes (and this can be relatively-statically verified), then any error can't occur except *inside* the application, where the app's developer can find it more easily. So I guess the question boils down to: would we rather make sure that coding errors happen *inside* applications, or would we rather make porting WSGI apps trivial (or nearly so)? 
But I think that it's possible here to have one's cake and eat it too: if we require bytes for all outputs, but provide a pair of decorators in wsgiref.util like the following:

    def encode_body(codec='utf8'):
        """Allow a WSGI app to output its response body as strings w/specified encoding"""
        def decorate(app):
            def encode(response):
                try:
                    for data in response:
                        yield data.encode(codec)
                finally:
                    if hasattr(response, 'close'):
                        response.close()
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    _write = start_response(status, response_headers, exc_info)
                    def write(data):
                        return _write(data.encode(codec))
                    return write
                return encode(app(environ, start))
            return decorated_app
        return decorate

    def encode_headers(codec='latin1'):
        """Allow a WSGI app to output its headers as strings, w/specified encoding"""
        def decorate(app):
            def decorated_app(environ, start_response):
                def start(status, response_headers, exc_info=None):
                    status = status.encode(codec)
                    response_headers = [
                        (k.encode(codec), v.encode(codec)) for k,v in response_headers
                    ]
                    return start_response(status, response_headers, exc_info)
                return app(environ, start)
            return decorated_app
        return decorate

So, this seems like a win-win to me: relatively-static verification, errors stay in the app (or at least in the decorator), and the API is clean-and-easy. Indeed, it seems likely that at least some apps that don't read wsgi.input themselves could be ported *just* by adding the appropriate decorator(s). And, if your app is using unicode on 2.x, you can even use the same decorators there, for the benefit of 2to3. (Assuming I release an updated standalone wsgiref version with the decorators, of course.)

So, unless somebody has some additional arguments on this one, I think I'm going to stick with bytes output.

From ianb at colorstudy.com Thu Sep 23 18:17:32 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 23 Sep 2010 11:17:32 -0500
Subject: [Web-SIG] Output header encodings?
(was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: <20100923160645.2B5E33A4079@sparrow.telecommunity.com> References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> Message-ID: On Thu, Sep 23, 2010 at 11:06 AM, P.J. Eby wrote: > At 12:57 PM 9/21/2010 -0400, Ian Bicking wrote: > >> On Tue, Sep 21, 2010 at 12:09 PM, P.J. Eby <> >pje at telecommunity.com> wrote: >> The Python 3 specific changes are to use: >> >> * ``bytes`` for I/O streams in both directions >> * ``str`` for environ keys and values >> * ``bytes`` for arguments to start_response() and write() >> >> >> This is the only thing that seems odd to me -- it seems like the response >> should be symmetric with the request, and the request in this case uses str >> for headers (status being header-like), and bytes for the body. >> > > So, I've given some thought to your suggestion, and, while it's true that > most of the output headers are far less prone to ending up with unintended > unicode content, there are at least two output headers that can include some > sort of application content (and can therefore have random failures): > Location and Set-Cookie. > > If these headers accidentally contain non-Latin1 characters, the error > isn't detectable until the header reaches the origin server doing the > transmission encoding, and it'll likely be a dynamic (and therefore > hard-to-debug) error. > I don't see any reason why Location shouldn't be ASCII. Any header could have any character put in it, of course, there's just no valid case where Location shouldn't be a URL, and URLs are ASCII. Cookie can contain weirdness, yes. I would expect any library that abstracts cookies to handle this (it's certainly doable)... otherwise, this seems like one among many ways a person can do the wrong thing. 
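To make the "URLs are ASCII" point concrete, here is a minimal sketch of how an application might force a Location value into ASCII before handing it to start_response(). The helper name ascii_location and the choice of UTF-8 percent-encoding are assumptions made for illustration, not anything proposed in this thread; the host part is passed through untouched, so internationalized domain names are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left alone: URL delimiters, plus "%" so existing escapes survive.
_SAFE = "/:@!$&'()*+,;=?%-._~"

def ascii_location(url):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper for illustration only; the scheme and host are
    returned unchanged.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=_SAFE),      # quote() encodes str as UTF-8 by default
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))

location = ascii_location('http://example.com/caf\u00e9?q=\u00fcber')
location.encode('ascii')  # would raise UnicodeEncodeError if anything slipped through
```

With something like this in the application or framework, a Location header cannot trip a Latin-1 or ASCII encoding step further down the stack, whichever type the spec ends up requiring.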
This can also be detected with the validator, which doesn't avoid runtime errors, but bytes allow runtime errors too -- they will just happen somewhere else (e.g., when a value is converted to bytes in an application or library). If servers print the invalid value on error (instead of just some generic error) I don't think it would be that hard to track down problems. This requires some explicit effort on the part of the server (most servers handle app_iter==None ungracefully, which is a similar problem). However, if the output is always bytes (and this can be > relatively-statically verified), then any error can't occur except *inside* > the application, where the app's developer can find it more easily. > > So I guess the question boils down to: would we rather make sure that > coding errors happen *inside* applications, or would we rather make porting > WSGI apps trivial (or nearly so)? > > But I think that it's possible here to have one's cake and eat it too: if > we require bytes for all outputs, but provide a pair of decorators in > wsgiref.util like the following: > > def encode_body(codec='utf8'): > """Allow a WSGI app to output its response body as strings > w/specified encoding""" > def decorate(app): > def encode(response): > try: > for data in response: > yield data.encode(codec) > finally: > if hasattr(response, 'close'): > response.close() > def decorated_app(environ, start_response): > def start(status, response_headers, exc_info=None): > _write = start_response(status, response_headers, > exc_info) > def write(data): > return _write(data.encode(codec)) > return write > return encode(app(environ, start)) > return decorated_app > return decorate > > def encode_headers(codec='latin1'): > """Allow a WSGI app to output its headers as strings, w/specified > encoding""" > def decorate(app): > def decorated_app(environ, start_response): > def start(status, response_headers, exc_info=None): > status = status.encode(codec) > response_headers = [ > 
(k.encode(codec), v.encode(codec)) for k,v in > response_headers > ] > return start_response(status, response_headers, > exc_info) > return app(environ, start) > return decorated_app > return decorate > > So, this seems like a win-win to me: relatively-static verification, errors > stay in the app (or at least in the decorator), and the API is > clean-and-easy. Indeed, it seems likely that at least some apps that don't > read wsgi.input themselves could be ported *just* by adding the appropriate > decorator(s). And, if your app is using unicode on 2.x, you can even use > the same decorators there, for the benefit of 2to3. (Assuming I release an > updated standalone wsgiref version with the decorators, of course.) > This doesn't seem that different than the validator, except that the decorator uses a different interface internally and externally (the internal interface using text, the external one bytes). -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Thu Sep 23 18:32:32 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 23 Sep 2010 12:32:32 -0400 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications Message-ID: <20100923163229.915DC3A4079@sparrow.telecommunity.com> Just a reminder: I'm planning to actually update PEP 333 over the weekend and start working on wsgiref updates, so if you have any last-minute comments on the proposal, now's the time to post them, however unpolished they may be! From jdhardy at gmail.com Thu Sep 23 19:11:16 2010 From: jdhardy at gmail.com (Jeff Hardy) Date: Thu, 23 Sep 2010 11:11:16 -0600 Subject: [Web-SIG] Output header encodings? 
(was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: <20100923160645.2B5E33A4079@sparrow.telecommunity.com> References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> Message-ID: On Thu, Sep 23, 2010 at 10:06 AM, P.J. Eby wrote: > So, unless somebody has some additional arguments on this one, I think I'm > going to stick with bytes output. I don't have a strong opinion on whether it should be bytes or strings -- I'll leave that discussion for people who know more about the details than I do. I do think input and output should be symmetric, though. If response headers are going to be bytes, then the request headers should be as well, or vice versa. The same arguments apply to both, after all. - Jeff From pje at telecommunity.com Thu Sep 23 19:52:20 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 23 Sep 2010 13:52:20 -0400 Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> Message-ID: <20100923175218.AD71B3A4079@sparrow.telecommunity.com> At 11:11 AM 9/23/2010 -0600, Jeff Hardy wrote: >On Thu, Sep 23, 2010 at 10:06 AM, P.J. Eby wrote: > > So, unless somebody has some additional arguments on this one, I think I'm > > going to stick with bytes output. > >I don't have a strong opinion on whether it should be bytes or strings >-- I'll leave that discussion for people who know more about the >details than I do. > >I do think input and output should be symmetric, though. If response >headers are going to be bytes, then the request headers should be as >well, or vice versa. The same arguments apply to both, after all. Actually, they don't. There are more apps than servers, so more code to get right, by more people. 
Servers also don't generally *create* any of the bytes or text involved, they're just ferrying it from one place to the next. So the API conditions are not symmetrical.

From jdhardy at gmail.com Thu Sep 23 20:33:26 2010
From: jdhardy at gmail.com (Jeff Hardy)
Date: Thu, 23 Sep 2010 12:33:26 -0600
Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
In-Reply-To: <20100923175218.AD71B3A4079@sparrow.telecommunity.com>
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> <20100923175218.AD71B3A4079@sparrow.telecommunity.com>
Message-ID:

On Thu, Sep 23, 2010 at 11:52 AM, P.J. Eby wrote:
>> I do think input and output should be symmetric, though. If response
>> headers are going to be bytes, then the request headers should be as
>> well, or vice versa. The same arguments apply to both, after all.
>
> Actually, they don't. There are more apps than servers, so more code to get
> right, by more people. Servers also don't generally *create* any of the
> bytes or text involved, they're just ferrying it from one place to the next.
> So the API conditions are not symmetrical.

How so? If I'm writing an application, I would need to deal with strings in environ but remember to send bytes to start_response. Conversions can happen on the application side either way. I just don't see how having strings in->bytes out is more error-prone than bytes-in->bytes-out or strings in->strings out, from an application or a server perspective.

Also, IronPython/.NET falls outside of "generally". Every .NET server I've seen deals with headers exclusively as strings (like Python 3, .NET strings are Unicode), so NWSGI would be decoding the response headers to strings, but passing the request headers through unchanged.
- Jeff From ianb at colorstudy.com Thu Sep 23 20:46:27 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 23 Sep 2010 13:46:27 -0500 Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> Message-ID: On Thu, Sep 23, 2010 at 11:17 AM, Ian Bicking wrote: > If these headers accidentally contain non-Latin1 characters, the error >> isn't detectable until the header reaches the origin server doing the >> transmission encoding, and it'll likely be a dynamic (and therefore >> hard-to-debug) error. >> > > I don't see any reason why Location shouldn't be ASCII. Any header could > have any character put in it, of course, there's just no valid case where > Location shouldn't be a URL, and URLs are ASCII. Cookie can contain > weirdness, yes. I would expect any library that abstracts cookies to handle > this (it's certainly doable)... otherwise, this seems like one among many > ways a person can do the wrong thing. > Minor correction, Set-Cookie, not Cookie. Good practice is to stick to ASCII even there (all other techniques have a high risk of mojibake), so we're really considering legacy integration. Note that a similar problem is using [('Content-length', len(body))] -- which also results in a sometimes confusing error message well away from the application itself. Generally without validation any data errors occur away from the application. A type error is not any different than an encoding error. Using bytes removes a possible encoding error, but IMHO has a greater chance of type errors (as bytes are not as natural as text in most cases). Validation can check all aspects, including encoding (simply by doing a test encoding). 
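As a rough sketch of what such validation could look like, assuming the "headers are text" convention and a Latin-1 wire encoding: the function name find_header_problems is hypothetical and this is not how wsgiref.validate is implemented. It catches both the integer Content-length mistake and an unencodable value, and reports the offending value rather than a generic error.

```python
def find_header_problems(status, response_headers, codec="latin-1"):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper: checks that the status and every header name and
    value are str, then does a test encoding with the given codec.
    """
    problems = []
    candidates = [("status line", status)]
    for name, value in response_headers:
        candidates.append(("name of header %r" % (name,), name))
        candidates.append(("value of header %r" % (name,), value))
    for label, text in candidates:
        if not isinstance(text, str):
            # Type error, e.g. ('Content-length', len(body))
            problems.append("%s is %s, not str: %r"
                            % (label, type(text).__name__, text))
            continue
        try:
            text.encode(codec)  # the "test encoding"
        except UnicodeEncodeError:
            problems.append("%s cannot be encoded as %s: %r"
                            % (label, codec, text))
    return problems
```

A middleware or server could call this from its start_response() wrapper and raise with the collected messages, which keeps the failure close to the application call that caused it.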
Consider this hello world:

    def app(environ, start_response):
        body = b'Hello World'
        start_response(b'200 OK', [(b'Content-Length', str(len(body)).encode('ascii'))])
        return [body]

str(len(body)).encode('ascii')?!? Yuck. Also no 2to3 fixup can help there. bytes(len(body)) does something weird (it builds that many zero bytes instead of the digits).

--
Ian Bicking | http://blog.ianbicking.org

From tseaver at palladion.com Thu Sep 23 20:51:46 2010
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 23 Sep 2010 14:51:46 -0400
Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications
In-Reply-To: <20100923163229.915DC3A4079@sparrow.telecommunity.com>
References: <20100923163229.915DC3A4079@sparrow.telecommunity.com>
Message-ID:

P.J. Eby wrote:

> Just a reminder: I'm planning to actually update PEP 333 over the
> weekend and start working on wsgiref updates, so if you have any
> last-minute comments on the proposal, now's the time to post them,
> however unpolished they may be!

I'm fine with the substance of the changes you proposed, but puzzled about the process: in what case does it work to update an already-approved-and-implemented PEP, instead of replacing it with a newer PEP (e.g., PEPs 241 -> 314 -> 345)?

Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com

From pje at telecommunity.com Thu Sep 23 21:44:23 2010
From: pje at telecommunity.com (P.J.
Eby)
Date: Thu, 23 Sep 2010 15:44:23 -0400
Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications
In-Reply-To:
References: <20100923163229.915DC3A4079@sparrow.telecommunity.com>
Message-ID: <20100923194425.2E3C33A4079@sparrow.telecommunity.com>

At 02:51 PM 9/23/2010 -0400, Tres Seaver wrote:
>P.J. Eby wrote:
>
> > Just a reminder: I'm planning to actually update PEP 333 over the
> > weekend and start working on wsgiref updates, so if you have any
> > last-minute comments on the proposal, now's the time to post them,
> > however unpolished they may be!
>
>I'm fine with the substance of the changes you proposed, but puzzled
>about the process: in what case does it work to update an
>already-approved-and-implemented PEP, instead of
>replacing it with a newer PEP (e.g., PEPs 241 -> 314 -> 345).

In the case where one is clarifying ambiguities/questions in the original spec. ;-)

(None of the changes invalidate existing implementations, but simply provide additional guidance/best practice suggestions. Even the Python 3 changes won't invalidate at least mod_wsgi's Python 3 implementation.)

From pje at telecommunity.com Thu Sep 23 22:23:02 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 23 Sep 2010 16:23:02 -0400
Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
In-Reply-To:
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com>
Message-ID: <20100923202259.6BE883A4079@sparrow.telecommunity.com>

At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote:
>I don't see any reason why Location shouldn't be ASCII. Any header
>could have any character put in it, of course, there's just no valid
>case where Location shouldn't be a URL, and URLs are ASCII. Cookie
>can contain weirdness, yes.
>I would expect any library that
>abstracts cookies to handle this (it's certainly doable)...
>otherwise, this seems like one among many ways a person can do the wrong thing.
>
>This can also be detected with the validator, which doesn't avoid
>runtime errors, but bytes allow runtime errors too -- they will just
>happen somewhere else (e.g., when a value is converted to bytes in
>an application or library).

Right: somewhere much closer to the *actual* error, where the developer can know the problem is, "I have garbage data or have not selected an appropriate codec", rather than "this WSGI stuff is giving me errors some place".

>If servers print the invalid value on error (instead of just some
>generic error) I don't think it would be that hard to track down
>problems. This requires some explicit effort on the part of the
>server (most servers handle app_iter==None ungracefully, which is a
>similar problem).

The difference is that if a server rejects non-bytes, you'll know *right away* that your app isn't compliant, instead of having to wait until some non-latin1 data shows up.

AFAICT, there are only two advantages to using text for output headers:

1. Text is easier to work with, and
2. It's symmetric with using text for input headers.

Both of which can still be had, by using the @encode_headers decorator.

I'm a little bit on the fence on this one, because 1) it does seem a little pointless (if harmless) to shuffle headers around in bytes form, and 2) Location and Set-Cookie are very likely the only headers where any kind of damage could ever happen.

But, since it *can* happen, and because it is also really easy to fix the API issue with a decorator, I'm still leaning in favor of "output is bytes" over "headers are text, bodies are bytes", unless somebody can come up with either some actually-bad consequence of using bytes, or some extra-good consequence of using text (that isn't addressed by just using the decorator).
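For illustration, a sketch of what using the decorator might look like under the "output is bytes" proposal. The encode_headers definition is the one posted earlier in this thread; the hello-world app and the fake_start_response stand-in for a bytes-only server are assumptions added here, not part of wsgiref.

```python
def encode_headers(codec='latin1'):
    """Allow a WSGI app to output its headers as strings, w/specified encoding"""
    def decorate(app):
        def decorated_app(environ, start_response):
            def start(status, response_headers, exc_info=None):
                status = status.encode(codec)
                response_headers = [
                    (k.encode(codec), v.encode(codec)) for k, v in response_headers
                ]
                return start_response(status, response_headers, exc_info)
            return app(environ, start)
        return decorated_app
    return decorate

@encode_headers()
def app(environ, start_response):
    # The app itself keeps working with text headers.
    body = b'Hello World'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

# Hypothetical stand-in for a server that would accept only bytes.
seen = {}
def fake_start_response(status, response_headers, exc_info=None):
    seen['status'] = status
    seen['headers'] = response_headers
    return lambda data: None  # dummy write() callable

result = app({}, fake_start_response)
```

The app body never touches .encode() for its headers; a non-Latin-1 value would raise inside the decorator's start(), that is, during the app's own start_response() call rather than later in the server.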
(Note, by the way, that WSGI design has always leaned in the direction of "any convenience that can be handled by a library should be", if it keeps the spec simpler and more verifiable. So, this seems like a good use of that principle.)

From ianb at colorstudy.com Thu Sep 23 22:48:51 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 23 Sep 2010 15:48:51 -0500
Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3)
In-Reply-To: <20100923202259.6BE883A4079@sparrow.telecommunity.com>
References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> <20100923202259.6BE883A4079@sparrow.telecommunity.com>
Message-ID:

On Thu, Sep 23, 2010 at 3:23 PM, P.J. Eby wrote:
> At 11:17 AM 9/23/2010 -0500, Ian Bicking wrote:
>
>> I don't see any reason why Location shouldn't be ASCII. Any header could
>> have any character put in it, of course, there's just no valid case where
>> Location shouldn't be a URL, and URLs are ASCII. Cookie can contain
>> weirdness, yes. I would expect any library that abstracts cookies to
>> handle this (it's certainly doable)... otherwise, this seems like one among
>> many ways a person can do the wrong thing.
>>
>> This can also be detected with the validator, which doesn't avoid runtime
>> errors, but bytes allow runtime errors too -- they will just happen
>> somewhere else (e.g., when a value is converted to bytes in an application
>> or library).
>>
>
> Right: somewhere much closer to the *actual* error, where the developer can
> know the problem is, "I have garbage data or have not selected an
> appropriate codec", rather than "this WSGI stuff is giving me errors some
> place".
>
>> If servers print the invalid value on error (instead of just some generic
>> error) I don't think it would be that hard to track down problems.
This >> requires some explicit effort on the part of the server (most servers handle >> app_iter==None ungracefully, which is a similar problem). >> > > The difference is that if a server rejects non-bytes, you'll know *right > away* that your app isn't compliant, instead of having to wait until some > non-latin1 data shows up. > No, you've only pushed the encoding elsewhere, and the error elsewhere. Somewhere someone is probably doing text_value.encode('ascii') (or latin1 or whatever), and if they haven't tested with non-ascii or non-latin1 input then they might encounter an error. It will be in their code, not in the WSGI server, but the error will be present in all the same situations. I don't think it will be much harder to fix if it occurs in the WSGI server, so long as the error message is at least a little bit helpful. > AFAICT, there are only two advantages to using text for output headers: > > 1. Text is easier to work with, and > 2. It's symmetric with using text for input headers. > > Both of which can still be had, by using the @encode_headers decorator. > Sure, anything can be fixed in a library. But @encode_headers is just another library. And it also can't magically appear with 2to3, instead it requires yet more patches and weird workarounds. Also, what you are proposing hasn't been considered for PEP 444, though other combinations of bytes and text have (all symmetric). So it doesn't seem to have any clean way to translate into the next version of the specification. > I'm a little bit on the fence on this one, because 1) it does seem a little > pointless (if harmless) to shuffle headers around in bytes form, and 2) > Location and Set-Cookie are very likely the only headers where any kind of > damage could ever happen. > Set-Cookie only, Location is clean. The entirety of hand-wringing over bytes is all just about freakin' cookies. Or the theory of cookies, I don't know that anyone has yet encountered any concrete and vexing problems. 
But, since it *can* happen, and because it is also really easy to fix the > API issue with a decorator, I'm still leaning in favor of "output is bytes" > over "headers are text, bodies are bytes", unless somebody can come up with > either some actually-bad consequence of using bytes, or some extra-good > consequence of using text (that isn't addressed by just using the > decorator). > > (Note, by the way, that WSGI design has always leaned in the direction of > "any convenience that can be handled by a library should be", if it keeps > the spec simpler and more verifiable. So, this seems like a good use of > that principle.) > It only fixes the one case of non-Latin1 characters, there are still many other values you can put into a header (a newline or control character for instance), and innumerable header-specific issues. It seems to be adding complexity for one of the least problematic cases. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From renesd at gmail.com Fri Sep 24 13:22:44 2010 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Fri, 24 Sep 2010 13:22:44 +0200 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications In-Reply-To: <20100923163229.915DC3A4079@sparrow.telecommunity.com> References: <20100923163229.915DC3A4079@sparrow.telecommunity.com> Message-ID: Hi, Have all the changes been tested with real world implementations? cheers, On Thu, Sep 23, 2010 at 6:32 PM, P.J. Eby wrote: > Just a reminder: I'm planning to actually update PEP 333 over the weekend > and start working on wsgiref updates, so if you have any last-minute > comments on the proposal, now's the time to post them, however unpolished > they may be! 
> > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/renesd%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From manlio_perillo at libero.it Fri Sep 24 13:45:37 2010 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 24 Sep 2010 13:45:37 +0200 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications In-Reply-To: <20100923163229.915DC3A4079@sparrow.telecommunity.com> References: <20100923163229.915DC3A4079@sparrow.telecommunity.com> Message-ID: <4C9C8F61.3010200@libero.it> Il 23/09/2010 18:32, P.J. Eby ha scritto: > Just a reminder: I'm planning to actually update PEP 333 over the > weekend and start working on wsgiref updates, so if you have any > last-minute comments on the proposal, now's the time to post them, > however unpolished they may be! > Where can I find a draft of the update? Thanks Manlio From pje at telecommunity.com Fri Sep 24 17:06:02 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 24 Sep 2010 11:06:02 -0400 Subject: [Web-SIG] Output header encodings? (was Re: Backup plan: WSGI 1 Addenda and wsgiref update for Py3) In-Reply-To: References: <20100921160937.4C2D23A4079@sparrow.telecommunity.com> <20100923160645.2B5E33A4079@sparrow.telecommunity.com> <20100923202259.6BE883A4079@sparrow.telecommunity.com> Message-ID: <20100924150601.057343A4079@sparrow.telecommunity.com> At 03:48 PM 9/23/2010 -0500, Ian Bicking wrote: >It only fixes the one case of non-Latin1 characters, there are still >many other values you can put into a header (a newline or control >character for instance), and innumerable header-specific >issues.? It seems to be adding complexity for one of the least >problematic cases. Ok, you found one that convinces me. ;-) "Headers are text, bodies are bytes" shall be the rule. 
I'll rewrite the "note about string types" and change the way I'm updating the spec accordingly. From pje at telecommunity.com Fri Sep 24 17:07:51 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 24 Sep 2010 11:07:51 -0400 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications In-Reply-To: <4C9C8F61.3010200@libero.it> References: <20100923163229.915DC3A4079@sparrow.telecommunity.com> <4C9C8F61.3010200@libero.it> Message-ID: <20100924150750.777A63A4079@sparrow.telecommunity.com> At 01:45 PM 9/24/2010 +0200, Manlio Perillo wrote: >Il 23/09/2010 18:32, P.J. Eby ha scritto: > > Just a reminder: I'm planning to actually update PEP 333 over the > > weekend and start working on wsgiref updates, so if you have any > > last-minute comments on the proposal, now's the time to post them, > > however unpolished they may be! > > > >Where can I find a draft of the update? See http://mail.python.org/pipermail/web-sig/2010-September/004655.html for the notes; I have not updated the PEP yet, but am about to. One change since that post: Ian has convinced me to make headers text and bodies bytes, where before I proposed to only have input headers be text, and output headers be bytes. From jdhardy at gmail.com Fri Sep 24 17:52:15 2010 From: jdhardy at gmail.com (Jeff Hardy) Date: Fri, 24 Sep 2010 09:52:15 -0600 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications In-Reply-To: <20100923163229.915DC3A4079@sparrow.telecommunity.com> References: <20100923163229.915DC3A4079@sparrow.telecommunity.com> Message-ID: On Thu, Sep 23, 2010 at 10:32 AM, P.J. Eby wrote: > Just a reminder: I'm planning to actually update PEP 333 over the weekend > and start working on wsgiref updates, so if you have any last-minute > comments on the proposal, now's the time to post them, however unpolished > they may be! Will you bump the version number to 1.1, or will it stay at 1.0? Does anyone actually check the version number? 
- Jeff From pje at telecommunity.com Fri Sep 24 20:32:47 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 24 Sep 2010 14:32:47 -0400 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications In-Reply-To: References: <20100923163229.915DC3A4079@sparrow.telecommunity.com> Message-ID: <20100924183251.9CF7E3A4079@sparrow.telecommunity.com> At 09:52 AM 9/24/2010 -0600, Jeff Hardy wrote: >On Thu, Sep 23, 2010 at 10:32 AM, P.J. Eby wrote: > > Just a reminder: I'm planning to actually update PEP 333 over the weekend > > and start working on wsgiref updates, so if you have any last-minute > > comments on the proposal, now's the time to post them, however unpolished > > they may be! > >Will you bump the version number to 1.1, or will it stay at 1.0? Does >anyone actually check the version number? Since these are just clarifications to the existing spec, and no previously-compliant implementations are invalidated by the changes, there will be no changes to the version number. >- Jeff From pje at telecommunity.com Fri Sep 24 20:33:25 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 24 Sep 2010 14:33:25 -0400 Subject: [Web-SIG] Last call for WSGI 1.0 errata/clarifications Message-ID: <20100924183324.55C413A4079@sparrow.telecommunity.com> At 01:22 PM 9/24/2010 +0200, René Dudfield wrote: >Hi, > >Have all the changes been tested with real world implementations? mod_wsgi under Python 3 is compliant with the changes, and I believe it has all the general addenda/clarifications implemented under Python 2 as well (and for some years now, in fact). From pje at telecommunity.com Sat Sep 25 21:56:59 2010 From: pje at telecommunity.com (P.J.
Eby) Date: Sat, 25 Sep 2010 15:56:59 -0400 Subject: [Web-SIG] WSGI is now Python 3-friendly Message-ID: <20100925195711.60FB13A4079@sparrow.telecommunity.com> I have only done the Python 3-specific changes at this point; the diff is here if anybody wants to review, nitpick or otherwise comment: http://svn.python.org/view/peps/trunk/pep-0333.txt?r1=85014&r2=85013&pathrev=85014 For that matter, if anybody wants to take a crack at updating Python 3's wsgiref based on the above, feel free. ;-) I'll be happy to answer any questions I can that come up in the process. (Please note: I went with Ian Bicking's "headers are strings, bodies are bytes" proposal, rather than my original "bodies and outputs are bytes" one, as there were not only some good arguments in its favor, but because it also resulted in fewer changes to the PEP, especially in the code samples.) I will continue to work on adding the other addenda/errata mentioned here: http://mail.python.org/pipermail/web-sig/2010-September/004655.html But because these are "shoulds" rather than musts, and apply to both Python 2 and 3, they are not as high priority for immediate implementation in wsgiref and do not necessarily need to hold up the 3.2 release. (Nonetheless, if anybody is willing to implement them in the Python 3 version, I will happily review the changes for backport into the Python 2 standalone version of wsgiref, and issue an updated release to include them.) Thanks! From guido at python.org Sat Sep 25 23:07:05 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Sep 2010 14:07:05 -0700 Subject: [Web-SIG] [Python-Dev] WSGI is now Python 3-friendly In-Reply-To: <20100925195711.60FB13A4079@sparrow.telecommunity.com> References: <20100925195711.60FB13A4079@sparrow.telecommunity.com> Message-ID: This is a very laudable initiative and I approve of the changes -- but I really think it ought to be a separate PEP rather than pretending it is just a set of textual corrections on the existing PEP 333. 
--Guido On Sat, Sep 25, 2010 at 12:56 PM, P.J. Eby