From brett at python.org Sat Mar 1 05:06:27 2008 From: brett at python.org (Brett Cannon) Date: Fri, 29 Feb 2008 20:06:27 -0800 Subject: [Web-SIG] [stdlib-sig] Choosing one of two options for url* in the stdlib reorg In-Reply-To: <47C89987.4000401@egenix.com> References: <47C7DE17.4050606@egenix.com> <47C86270.9070605@egenix.com> <47C89987.4000401@egenix.com> Message-ID: On Fri, Feb 29, 2008 at 3:47 PM, M.-A. Lemburg wrote: > Brett Cannon wrote: > > On Fri, Feb 29, 2008 at 11:52 AM, M.-A. Lemburg wrote: > >> On 2008-02-29 20:20, Brett Cannon wrote: > >> >> So, I'd be +1 on the second approach, provided that those > >> >> two classes make the transition into url.request as > >> >> well. Otherwise, I'm +1 on the first approach and -1 > >> >> on the second. > >> >> > >> > > >> > Just to make sure I got this straight, as long as the two classes > >> > without the urllib._urlopener support from urlopen() are moved forward > >> > you are happy with this? > >> > >> Yes. > >> > >> > >> > What about making urllib an external library people can download and > >> > install using distutils? > >> > >> Well, removing urllib from the std lib doesn't mean that the module > >> is gone, but that's true for most modules in the std lib, right ? > >> > >> The key argument for doing the std lib reorg - as I understand it - > >> was to be able to have a 2to3.py take care of changing the imports > >> in a script to make it work on 3.x. > >> > > > > Yes, as well as to clean out the cruft in the stdlib. > > Depends on what you call cruft :-) > > Just calling a module urllib2 doesn't make it better or less > crufty. urllib and urllib2 use a different approach to more or > less the same thing. > > The complexity of using both is roughly the same. Both have > their ups and downs. Both come with a urlopen() that does the > trick most of the time. Neither implements the full stack of what > you'd need for a web crawler. > > It's apples and oranges more than anything else. Some like > apples, others oranges. > But see, I don't want to manage both an apple tree and orange tree. At that point people will want cranberries, pineapples. And god forbid we get into supporting various nuts! =) Seriously, I just don't want to support two different approaches to the same problem. > > >> If you're now suggesting to move modules out of the way with no > >> option to automatically port them to 3.x, then you're going far > >> beyond that original intent. > >> > > > > No, the modules are already ported to 3.0. This has been done all > > along in order for the test suite in Python to continue to work. > > Making them external just means that if you want to use it in Python > > 3.0 you need to download the source and run the included setup.py to > > install the module. Then 2to3 doesn't have to change anything as the > > original urllib imports can just import the copy downloaded and > > installed into site-packages. The only thing people would have to > > contend with is a Py3KWarning stating that the module has been removed > > from the stdlib but available for download from PyPI. > > Again, that applies to most modules in the stdlib. > > It's not really an argument for dropping the more used module in > favor of a different module without any real benefit. > Benefit to old users, no. Benefit to the developers, definitely. Benefit to new users, yes as there will be less to deal with. > > > >> My main argument for keeping urllib logic in place is current > >> use of that module. 
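[For reference, the simple fetch case discussed throughout this exchange is spelled almost identically in the two modules. A minimal Python 2 sketch; the URL is just a placeholder:

    import urllib
    import urllib2

    # The plain urlopen() call that accounts for most real-world uses
    # of either module.
    data_a = urllib.urlopen("http://www.python.org/").read()
    data_b = urllib2.urlopen("http://www.python.org/").read()
]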
> > > > Right, which is why I asked if people thought the current usage was > > small enough in terms of using things other than urlopen() that it > > would be okay to pull it out. And giving people the option to download > > and install a 3.0-compliant version would fill in the gap for anyone > > else who still wants the code. > > > >> If you look at Google code search (which only > >> scans OSS software and not even all of it), you get: > >> > >> import urllib -urllib.py -test_urllib -urllib2 > >> 28,800 matches > >> > >> + urllib.URLopener -FancyURLopener > >> 300 matches > >> > >> + urllib.FancyURLopener > >> 700 matches > >> > >> compared to: > >> > >> import urllib2 -urllib2.py -test_urllib2 > >> 10,700 matches > >> > >> + urllib2.Opener > >> 300 matches > >> > >> If you compare those results to searches for other modules > >> in the std lib, you'll find that those figures are high up > >> on the scale. > > > > Right, but how many of those urlopen() calls are just urlopen("some > > url"), which has a practical drop-in replacement from urllib2. > > That's the wrong question. urlopen() is provided by both > and also the most used API in those modules. > > You have to ask yourself whether > it's ok to ask the maintainers of those ~1000 code modules > using urllib for subclassing from the two main classes > URLopener and FancyURLopener to download an external dependency > from PyPI or ship the module with their code. Well, I obviously think it is. Question is what do other people think. Can other people weigh in on this? Separate urllib or just toss in the two classes and continue to support the two separate approaches? -Brett From brett at python.org Sat Mar 1 21:13:56 2008 From: brett at python.org (Brett Cannon) Date: Sat, 1 Mar 2008 12:13:56 -0800 Subject: [Web-SIG] [stdlib-sig] Choosing one of two options for url* in the stdlib reorg In-Reply-To: <47C94D6A.9030208@egenix.com> References: <47C7DE17.4050606@egenix.com> <47C86270.9070605@egenix.com> <47C89987.4000401@egenix.com> <47C94D6A.9030208@egenix.com> Message-ID: On Sat, Mar 1, 2008 at 4:34 AM, M.-A. Lemburg wrote: > On 2008-03-01 05:06, Brett Cannon wrote: > > Seriously, I just don't want to support two different approaches to > > the same problem. > > Then what makes you believe that the urllib2 approach is the > better one ? > > Why not move urllib2 to PyPI and keep urllib ? > Well, I have personal experience where urllib2 was much easier to use for some custom fetching than urllib. But I get your point. If it comes down to preference then your argument is to choose the one the is used more widely. > > >> It's not really an argument for dropping the more used module in > >> favor of a different module without any real benefit. > > > > Benefit to old users, no. Benefit to the developers, definitely. > > Benefit to new users, yes as there will be less to deal with. > > Same question as above. > > > >> You have to ask yourself whether > >> it's ok to ask the maintainers of those ~1000 code modules > >> using urllib for subclassing from the two main classes > >> URLopener and FancyURLopener to download an external dependency > >> from PyPI or ship the module with their code. > > > > Well, I obviously think it is. > > Please explain. I have yet to see a single comment explaining why > urllib2 would be the better choice - if there's really a need to > decide (which I don't think there really is). > > If you can put up some sound arguments for why urllib2 is better > than urllib, we could move the discussion forward. 
If not, then > I don't really see any benefit in having the discussion at all. Well, look at the docs for urllib. There is a list of restrictions (e.g., does not support the use of proxies which require authentication). From what I can tell, those items on the list that are an actual restriction do not carry over to urllib2. Another thing, how do you add a custom line to the header for the request in urllib (e.g., Referer)? The docs for URLOpener don't seem to provide a way. urllib2, on the other hand, has a very specific way to add headers. But as I said in my last email, I am happy to include URLOpener if some other people are willing to back the idea up. -Brett From brett at python.org Sun Mar 2 21:11:52 2008 From: brett at python.org (Brett Cannon) Date: Sun, 2 Mar 2008 12:11:52 -0800 Subject: [Web-SIG] [stdlib-sig] Choosing one of two options for url* in the stdlib reorg In-Reply-To: <47CAB031.5080002@egenix.com> References: <47C7DE17.4050606@egenix.com> <47C86270.9070605@egenix.com> <47C89987.4000401@egenix.com> <47C94D6A.9030208@egenix.com> <47CAB031.5080002@egenix.com> Message-ID: On Sun, Mar 2, 2008 at 5:48 AM, M.-A. Lemburg wrote: > On 2008-03-01 21:13, Brett Cannon wrote: > > On Sat, Mar 1, 2008 at 4:34 AM, M.-A. Lemburg wrote: > >> On 2008-03-01 05:06, Brett Cannon wrote: > >> > Seriously, I just don't want to support two different approaches to > >> > the same problem. > >> > >> Then what makes you believe that the urllib2 approach is the > >> better one ? > >> > >> Why not move urllib2 to PyPI and keep urllib ? > >> > > > > Well, I have personal experience where urllib2 was much easier to use > > for some custom fetching than urllib. > > > > But I get your point. If it comes down to preference then your > > argument is to choose the one the is used more widely. > > Right. > > I also believe that having a choice is more useful than trying > to invent the One Right Way. This may exist for simple problems, > but as soon as things get more complicated limiting yourself to > just one path on the search tree is bound to cause problems. > > > >> >> It's not really an argument for dropping the more used module in > >> >> favor of a different module without any real benefit. > >> > > >> > Benefit to old users, no. Benefit to the developers, definitely. > >> > Benefit to new users, yes as there will be less to deal with. > >> > >> Same question as above. > >> > >> > >> >> You have to ask yourself whether > >> >> it's ok to ask the maintainers of those ~1000 code modules > >> >> using urllib for subclassing from the two main classes > >> >> URLopener and FancyURLopener to download an external dependency > >> >> from PyPI or ship the module with their code. > >> > > >> > Well, I obviously think it is. > >> > >> Please explain. I have yet to see a single comment explaining why > >> urllib2 would be the better choice - if there's really a need to > >> decide (which I don't think there really is). > >> > >> If you can put up some sound arguments for why urllib2 is better > >> than urllib, we could move the discussion forward. If not, then > >> I don't really see any benefit in having the discussion at all. > > > > Well, look at the docs for urllib. There is a list of restrictions > > (e.g., does not support the use of proxies which require > > authentication). From what I can tell, those items on the list that > > are an actual restriction do not carry over to urllib2. > > I'm not sure I follow you: urllib *does* support proxies that > require authentication (see the .open_http() method). 
> According to http://docs.python.org/dev/library/urllib.html#module-urllib: "This module does not support the use of proxies which require authentication". > > > Another thing, > > how do you add a custom line to the header for the request in urllib > > (e.g., Referer)? The docs for URLOpener don't seem to provide a way. > > urllib2, on the other hand, has a very specific way to add headers. > > That's easy: > > class URLReader(urllib.URLopener): > > # Crawler name > agentname = 'mxHTMLTools-Crawler' > > def __init__(*args): > > """ Add a user-agent header to the HTTP requests. > """ > self = args[0] > apply(urllib.URLopener.__init__, args) > # Override the default settings for self.addheaders: > assert len(self.addheaders) == 1 > self.addheaders = [ > ('user-agent', '%s/%s' % (self.agentname, HTMLTools.__version__)), > ] > ... > But none of that is documented. So if the classes do stay then they really need to have their documentation flushed out (along with making sure they have the proper unit tests for those exposed APIs, of course). > > > But as I said in my last email, I am happy to include URLOpener if > > some other people are willing to back the idea up. > > Fair enough. I will send a separate email to the SIG since people have probably stopped following most of this thread. =) -Brett From brett at python.org Sun Mar 2 21:15:42 2008 From: brett at python.org (Brett Cannon) Date: Sun, 2 Mar 2008 12:15:42 -0800 Subject: [Web-SIG] Two options for handling urllib Message-ID: MAL and I have obviously been going back and forth about two options on how to handle urllib. One option is to move URLOpener and FancyURLOpener over to what is currently urllib2 and leave it at that. MAL's argument is that this is easier on people who use urllib's more advanced features. It also means users don't need to ship an extra module with their code. The second option is to not move the code over but provide urllib as a downloadable module from PyPI in a 3.0-compatible version. My argument for this is that we should have just a single approach for URLs and be done with it. Providing urllib externally allows people to let their code to continue to work, albeit with one third-party module. What do other people think? Either solution is acceptable to me, so I would really appreciate feedback. -Brett From manlio_perillo at libero.it Mon Mar 3 20:35:00 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 03 Mar 2008 20:35:00 +0100 Subject: [Web-SIG] [ANN] WSGI module 0.0.6 Message-ID: <47CC52E4.3080206@libero.it> I'm pleased to announce the release of the WSGI module for nginx, version 0.0.6. The WSGI module is an implementation of the Python Web Server Gateway Interface v1.0 (http://python.org/dev/peps/pep-0333/) for the Nginx server. The module is available only via a Mercurial repository at: http://hg.mperillo.ath.cx/nginx/mod_wsgi/ Here is a changelog: - Bug fix: configuration problem on Mac OS X. - Added support for nginx 0.5.34, and dropped support for older versions. Note: older version are only supported via patches. - Added wsgi_optimize_send_headers directive. WSGI spec requires that the headers must be sent when the first not empty string is yielded, however sending headers early can optimize content generation. - Added experimental support for interpreter finalization. This will ensure that sys.exitfunc is called. - Variables declared with wsgi_var directive can override HTTP_ variables. 
It can be useful, as an example, to override HTTP_COOKIE with: wsgi_var HTTP_COOKIE $http_cookie; since the $http_cookie variable combines multiple Cookie headers. - Renamed the wsgi_params directive to wsgi_var. wsgi_param was a poor choice, since parameters are a concept used by FastCGI. This revision breaks compatibility. - Added the wsgi_allow_ranges directive, for integrated support to partial HTTP requests. - Added the ngx_wsgitest.py script, that executes a WSGI application in a testing environment. - Added the ngx_wsgiref.py script, that runs a WSGI application using the builtin wsgiref server, with a command line compatibile with ngx_wsgi.py. - Added the ngx_wsgi.py script, for rapid deployment of WSGI applications using nginx and mod_wsgi. - Added wsgi_middleware directive. This directive enables middleware stacking in mod_wsgi. - Removed the callable_object and the wsgi_alias directives, and added a new wsgi_pass directive. The new directive is more consistent with other nginx modules like mod_fastcgi and mod_proxy. This revision breaks compatibility. Manlio Perillo From Graham.Dumpleton at gmail.com Wed Mar 5 00:05:56 2008 From: Graham.Dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 4 Mar 2008 15:05:56 -0800 (PST) Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> Message-ID: Jose Galvez wrote: > this is an interesting issue, because I would suspect that all our pylons > applications will have to be converted as well as the pylons base code. I > know that there is going to be a program which will automate the > translation, but not having used it I don't know what issues the translation > will cause. The other big question is will eggs will they be able to tell > the difference between python 2.x and 3.x since the code will be different > Jose > > On Tue, Mar 4, 2008 at 3:17 AM, Leo wrote: > > > > > Subj. > > Is Python 3000 migration planned? There is more to it than just that. One problem is that the WSGI 1.0 specification is incompatible with Python 3.0. There were some preliminary discussions about how the specification would need to be changed, but no real final outcome. The discussions also probably didn't cover everything that would need to be changed in the specification. For example, wsgi.file_wrapper and how it would have to be changed wasn't discussed. The main issues were captured in: http://www.wsgi.org/wsgi/Amendments_1.0 Note though that that page is merely a collection of points discussed and is itself not any sort of official set of amendments to the WSGI specification. Personally I believe that WSGI 1.0 should die along with Python 2.X. I believe that WSGI 2.0 should be developed to replace it and the introduction of Python 3.0 would be a great time to do that given that people are going to have to change their code anyway and that code isn't then likely to be backward compatible with Python 2.X. Graham From faassen at startifact.com Wed Mar 5 00:17:40 2008 From: faassen at startifact.com (Martijn Faassen) Date: Wed, 5 Mar 2008 00:17:40 +0100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> Message-ID: <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> Hey, On Wed, Mar 5, 2008 at 12:05 AM, Graham Dumpleton wrote: [snip] > Personally I believe that WSGI 1.0 should die along with Python 2.X. 
I > believe that WSGI 2.0 should be developed to replace it and the > introduction of Python 3.0 would be a great time to do that given that > people are going to have to change their code anyway and that code > isn't then likely to be backward compatible with Python 2.X. I think lots of Python projects reason this way: Python 3 transition is the right time to break backwards compatibility in our library/framework. It's understandable. Unfortunately this means that for people adjusting their code, they won't just have to deal with the large Python 3 transition, but also with lots of their frameworks and libraries making backwards-incompatible changes. That's unfortunate, as that means any automatic conversion strategy using the py2to3 script won't be possible, and there won't be any way to keep libraries in transition working in both Python 2 and 3 for a while (which is Guido's plan), as their dependencies don't support it. Regards, Martijn From graham.dumpleton at gmail.com Wed Mar 5 01:48:44 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Wed, 5 Mar 2008 11:48:44 +1100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> Message-ID: <88e286470803041648l58a4a871oca926db2d50480ae@mail.gmail.com> On 05/03/2008, Martijn Faassen wrote: > Hey, > > On Wed, Mar 5, 2008 at 12:05 AM, Graham Dumpleton > wrote: > [snip] > > > Personally I believe that WSGI 1.0 should die along with Python 2.X. I > > believe that WSGI 2.0 should be developed to replace it and the > > introduction of Python 3.0 would be a great time to do that given that > > people are going to have to change their code anyway and that code > > isn't then likely to be backward compatible with Python 2.X. > > I think lots of Python projects reason this way: Python 3 transition > is the right time to break backwards compatibility in our > library/framework. It's understandable. > > Unfortunately this means that for people adjusting their code, they > won't just have to deal with the large Python 3 transition, but also > with lots of their frameworks and libraries making > backwards-incompatible changes. That's unfortunate, as that means any > automatic conversion strategy using the py2to3 script won't be > possible, and there won't be any way to keep libraries in transition > working in both Python 2 and 3 for a while (which is Guido's plan), as > their dependencies don't support it. In the case of code which directly talks to the interface defined by WSGI specification I very much doubt the py2to3 script will help. This is because for WSGI to work with Python 3.0 there needs to be a change from use of string type objects to byte string type objects. I would suspect that py2to3 is only get help in any sort of automated way with the fact that a string object becomes unicode aware, not where with WSGI the code would have to change to use and deal with a different type of object completely. The implications of this change to a byte string type object are going to be much more complicated. 
What I fear is that if Python 3.0 isn't used as a trigger to push out WSGI 2.0, we will end up being stuck with WSGI 1.0 forever and there will never ever be any momentum to updating it even though a range of deficiencies and shortcomings have been identified in the specification as far as the way it is drafted, with the functionality it provides and how that functionality is described as needing to be implemented. I'd rather not see another XML-RPC where in practice it was a good first attempt, but with a little bit of tweaking would have made it so much better, but still keep its simplicity. And no I don't mean SOAP, that went too far. Problem with XML-RPC from what I saw at the time is that the original author had a lot invested in software that used the original XML-RPC and he wasn't going to budge as he didn't want to have to change his own systems based on it. With Python 3.0 people are going to have to change their code anyway and so it is an ideal time to push to a new version of WSGI specification which fixes its warts and eliminates the oddities it had to support certain legacy systems, something which is now not seen as necessary. Also, for most systems that use WSGI it would be quite minimal impact, as they often use it merely as a bridge to some existing web server interface. Thus changes would be very localised. Even something like Paste/Pylons hides a lot of what is WSGI behind its own veneer, for example WebOb and its predecessor and so higher layers may not even be affected much. As much as I'd like to see everything move to a better WSGI 2.0, if there are components which people don't want to update, then a WSGI 2.0 to 1.0 bridging middleware can be used to adapt them. Graham From faassen at startifact.com Wed Mar 5 03:13:19 2008 From: faassen at startifact.com (Martijn Faassen) Date: Wed, 5 Mar 2008 03:13:19 +0100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: <88e286470803041648l58a4a871oca926db2d50480ae@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> <88e286470803041648l58a4a871oca926db2d50480ae@mail.gmail.com> Message-ID: <8928d4e90803041813q6a75b122nd47bd35506895b0a@mail.gmail.com> Hey, On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton wrote: [snip] > In the case of code which directly talks to the interface defined by > WSGI specification I very much doubt the py2to3 script will help. This > is because for WSGI to work with Python 3.0 there needs to be a change > from use of string type objects to byte string type objects. I would > suspect that py2to3 is only get help in any sort of automated way with > the fact that a string object becomes unicode aware, not where with > WSGI the code would have to change to use and deal with a different > type of object completely. The implications of this change to a byte > string type object are going to be much more complicated. I have no idea what the capabilities of this script are. I would *imagine* it would convert classic strings into the bytes types, and unicode strings into the new string type. 
> What I fear is that if Python 3.0 isn't used as a trigger to push out > WSGI 2.0, we will end up being stuck with WSGI 1.0 forever and there > will never ever be any momentum to updating it even though a range of > deficiencies and shortcomings have been identified in the > specification as far as the way it is drafted, with the functionality > it provides and how that functionality is described as needing to be > implemented. [snip XML-RPC example] That argument doesn't work for me. You're implying that if Python 3.0 did not exist, there would be no way to come out with a new version of the specification to fix shortcomings? We can't fix APIs unless we have the momentum given by a language change? You better never have any ideas on WSGI 3.0 then, as it's unlikely you'll have another such opportunity. > With Python 3.0 people are going to have to change their code anyway > and so it is an ideal time to push to a new version of WSGI > specification which fixes its warts and eliminates the oddities it had > to support certain legacy systems, something which is now not seen as > necessary. "With Python 3.0 people are going to have their change their code anyway as the language changes, so we're going to make it harder for them by breaking their libraries too" Having one thing change is hard enough on people. It's then nice to be able to run your tests and have some indication it works. It's also nice to be able to continue releasing for Python 2.x for a while, and release the converted code using the conversion script. I'm not making up this plan, that's the official plan. Changing libraries will break this plan. [WSGI is hidden, so it will be a low-impact change] This may be true. I still don't see a reason to connect it to the language change. Anyway, I'll stop on this now. I just think it's a worrying trend. > As much as I'd like to see everything move to a better WSGI 2.0, if > there are components which people don't want to update, then a WSGI > 2.0 to 1.0 bridging middleware can be used to adapt them. Yes, that would help people using Python 2.x, but would WSGI 1.0 even be available in Python 3.0 given your plan? Regards, Martijn From guido at python.org Wed Mar 5 03:25:21 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Mar 2008 18:25:21 -0800 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: <8928d4e90803041813q6a75b122nd47bd35506895b0a@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> <88e286470803041648l58a4a871oca926db2d50480ae@mail.gmail.com> <8928d4e90803041813q6a75b122nd47bd35506895b0a@mail.gmail.com> Message-ID: On Tue, Mar 4, 2008 at 6:13 PM, Martijn Faassen wrote: > Hey, > > On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton > wrote: > [snip] > > > In the case of code which directly talks to the interface defined by > > WSGI specification I very much doubt the py2to3 script will help. This > > is because for WSGI to work with Python 3.0 there needs to be a change > > from use of string type objects to byte string type objects. I would > > suspect that py2to3 is only get help in any sort of automated way with > > the fact that a string object becomes unicode aware, not where with > > WSGI the code would have to change to use and deal with a different > > type of object completely. The implications of this change to a byte > > string type object are going to be much more complicated. 
> > I have no idea what the capabilities of this script are. I would > *imagine* it would convert classic strings into the bytes types, and > unicode strings into the new string type. It does nothing of the kind. It leaves 'xxx' literals alone and translates u'xxx' to 'xxx'. That's because (in many apps) both are used primarily for text. BTW I suggest that you play with it at least a little bit (run it on its own example.py file) before diving into this discussion... > > What I fear is that if Python 3.0 isn't used as a trigger to push out > > WSGI 2.0, we will end up being stuck with WSGI 1.0 forever and there > > will never ever be any momentum to updating it even though a range of > > deficiencies and shortcomings have been identified in the > > specification as far as the way it is drafted, with the functionality > > it provides and how that functionality is described as needing to be > > implemented. > [snip XML-RPC example] > > That argument doesn't work for me. You're implying that if Python 3.0 > did not exist, there would be no way to > come out with a new version of the specification to fix shortcomings? > We can't fix APIs unless we have the momentum given by a language > change? You better never have any ideas on WSGI 3.0 then, as it's > unlikely you'll have another such opportunity. > > > > With Python 3.0 people are going to have to change their code anyway > > and so it is an ideal time to push to a new version of WSGI > > specification which fixes its warts and eliminates the oddities it had > > to support certain legacy systems, something which is now not seen as > > necessary. > > "With Python 3.0 people are going to have their change their code > anyway as the language changes, so we're going to make it harder for > them by breaking their libraries too" > > Having one thing change is hard enough on people. It's then nice to be > able to run your tests and have some indication it works. It's also > nice to be able to continue releasing for Python 2.x for a while, and > release the converted code using the conversion script. I'm not making > up this plan, that's the official plan. Changing libraries will break > this plan. > > [WSGI is hidden, so it will be a low-impact change] > > This may be true. I still don't see a reason to connect it to the > language change. Anyway, I'll stop on this now. I just think it's a > worrying trend. > > > > As much as I'd like to see everything move to a better WSGI 2.0, if > > there are components which people don't want to update, then a WSGI > > 2.0 to 1.0 bridging middleware can be used to adapt them. > > Yes, that would help people using Python 2.x, but would WSGI 1.0 even > be available in Python 3.0 given your plan? > > Regards, > > Martijn > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ianb at colorstudy.com Wed Mar 5 04:53:49 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 04 Mar 2008 21:53:49 -0600 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> Message-ID: <47CE194D.5080605@colorstudy.com> Graham Dumpleton wrote: > Personally I believe that WSGI 1.0 should die along with Python 2.X. 
I > believe that WSGI 2.0 should be developed to replace it and the > introduction of Python 3.0 would be a great time to do that given that > people are going to have to change their code anyway and that code > isn't then likely to be backward compatible with Python 2.X. I don't believe it should just *die*. But I agree that this is a good time to revisit the specification. Especially since I have no idea how the change to unicode text would effect the WSGI environment. Having the environment hold bytes seems weird, but having it hold unicode is a substantial change. I don't think it will be as bad as Martijn thinks, because the libraries people use will probably have relatively few interface changes. Pylons and WebOb for instance should maintain largely the same interface (and they already expose unicode when possible). None of the changes proposed for WSGI 2 would change this. If I'm maintaining two versions of a library (one for Python 2, one for Python 3), then at least I'd like to get a little benefit out of it, and a revised WSGI would give some benefit. I think we might still need some kind of WSGI 1.1 to clarify what WSGI 1 (-like semantics) means in a Python 3.0 environment. Creating adapters from WSGI 1 to WSGI 2 should be easy enough that we could still offer some support for minimally-translated WSGI code. Ian From brett at python.org Wed Mar 5 07:49:28 2008 From: brett at python.org (Brett Cannon) Date: Tue, 4 Mar 2008 22:49:28 -0800 Subject: [Web-SIG] The solution for urllib and the url package Message-ID: OK, here is the stuff that is not disputed: urllib2 -> url.request urlparse -> url.parse urllib quoting-related functions -> url.parse The thing up for debate is the rest of urllib. I have decided MAL can have what he wants and have URLOpener and FancyURLOpener put in url.request if the documentation for both classes is completely updated to expose the proper API for both classes for 2.6. May 1st is the deadline to get this to happen. If this does not occur then the non-quoting code gets made external for people download and use if they want but is not kept in the stdlib. -Brett From manlio_perillo at libero.it Wed Mar 5 09:39:26 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Wed, 05 Mar 2008 09:39:26 +0100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> Message-ID: <47CE5C3E.5030609@libero.it> Graham Dumpleton ha scritto: > [...] > > Personally I believe that WSGI 1.0 should die along with Python 2.X. I > believe that WSGI 2.0 should be developed to replace it and the > introduction of Python 3.0 would be a great time to do that given that > people are going to have to change their code anyway and that code > isn't then likely to be backward compatible with Python 2.X. > Fine with me but there is a *big* problem. WSGI 2.0 "breaks" support for asynchronous applications (since you can no more send headers in the app iter). I have finally implemented an extension for the Nginx's WSGI module that give support to asynchronos applications. I *need* it because in a application I'm developing I have to talk with a web service on the Internet, and not using an asynchronous http client (I'm using pycurl) is a suicide. 
> Graham > _______________________________________________ Manlio Perillo From faassen at startifact.com Wed Mar 5 09:40:23 2008 From: faassen at startifact.com (Martijn Faassen) Date: Wed, 5 Mar 2008 09:40:23 +0100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? In-Reply-To: References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <8928d4e90803041517r425b8c47x5c95274fa9c6f0c9@mail.gmail.com> <88e286470803041648l58a4a871oca926db2d50480ae@mail.gmail.com> <8928d4e90803041813q6a75b122nd47bd35506895b0a@mail.gmail.com> Message-ID: <8928d4e90803050040u1bb9571j4fef823fa802d7c1@mail.gmail.com> Hey, On Wed, Mar 5, 2008 at 3:25 AM, Guido van Rossum wrote: > On Tue, Mar 4, 2008 at 6:13 PM, Martijn Faassen wrote: > > Hey, > > > > On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton > > wrote: > > [snip] > > > > > In the case of code which directly talks to the interface defined by > > > WSGI specification I very much doubt the py2to3 script will help. This > > > is because for WSGI to work with Python 3.0 there needs to be a change > > > from use of string type objects to byte string type objects. I would > > > suspect that py2to3 is only get help in any sort of automated way with > > > the fact that a string object becomes unicode aware, not where with > > > WSGI the code would have to change to use and deal with a different > > > type of object completely. The implications of this change to a byte > > > string type object are going to be much more complicated. > > > > I have no idea what the capabilities of this script are. I would > > *imagine* it would convert classic strings into the bytes types, and > > unicode strings into the new string type. > > It does nothing of the kind. It leaves 'xxx' literals alone and > translates u'xxx' to 'xxx'. That's because (in many apps) both are > used primarily for text. > BTW I suggest that you play with it at least a little bit (run it on > its own example.py file) before diving into this discussion... I accurately described my lack of knowledge of the script, then. :) Sure, I need to play with the script. I guess the best route would be to introduce bytes in your code in Python 2.x and have the script leave that alone. If WSGI 2.0 then makes it into Python 2.x as well, then there's no problem with API breakage. Playing with the script will happen sometime, but I think it's quite clear the script will be of no help if important library APIs also break down because people take their chances during transition (and the script doesn't take care of it, which it can't for third party APIs). WSGI is probably not the best example given the string issue and its inclusion in the Python core, though: as Graham expressed, it's probably going to have problems no matter what. I also think any new version could be developed on Python 2.6 first, as this will support the bytes type as far as I understand. And yes, I need to try the Python 2.6 alpha interpreter first too. :) Regards, Martijn From faassen at startifact.com Wed Mar 5 09:43:41 2008 From: faassen at startifact.com (Martijn Faassen) Date: Wed, 5 Mar 2008 09:43:41 +0100 Subject: [Web-SIG] Are you going to convert Pylons code into Python 3000? 
In-Reply-To: <47CE194D.5080605@colorstudy.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE194D.5080605@colorstudy.com> Message-ID: <8928d4e90803050043q5aa10bd5j67e8ba38b870fed0@mail.gmail.com> Hey, On Wed, Mar 5, 2008 at 4:53 AM, Ian Bicking wrote: > Graham Dumpleton wrote: > > Personally I believe that WSGI 1.0 should die along with Python 2.X. I > > believe that WSGI 2.0 should be developed to replace it and the > > introduction of Python 3.0 would be a great time to do that given that > > people are going to have to change their code anyway and that code > > isn't then likely to be backward compatible with Python 2.X. > > I don't believe it should just *die*. But I agree that this is a good > time to revisit the specification. Especially since I have no idea how > the change to unicode text would effect the WSGI environment. Having > the environment hold bytes seems weird, but having it hold unicode is a > substantial change. > I don't think it will be as bad as Martijn thinks, because the libraries > people use will probably have relatively few interface changes. Pylons > and WebOb for instance should maintain largely the same interface (and > they already expose unicode when possible). None of the changes > proposed for WSGI 2 would change this. That's probably true. WSGI is likely not the best example for this case, just the trigger murmur that caused me to speak out. The WSGI spec is not the only place where people will take the opportunity to break APIs. Unfortunately as with WSGI, API breakage may in many cases be unavoidable.. I would like to encourage the adoption of any new such standard in the Python 2.6 environment already, if at all possible. This way it's not an extra step for people to be burdened with when they move to Python 3, but something they can prepare for gradually. Regards, Martijn From manlio_perillo at libero.it Wed Mar 5 17:37:48 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Wed, 05 Mar 2008 17:37:48 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <002501c87ec4$4b68ac60$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> Message-ID: <47CECC5C.1010505@libero.it> Brian Smith ha scritto: > Manlio Perillo wrote: >> Fine with me but there is a *big* problem. >> >> WSGI 2.0 "breaks" support for asynchronous applications >> (since you can no more send headers in the app iter). > > WSGI 1.0 doesn't guarentee that all asynchronous applications will work > either, because it allows the WSGI gateway to wait for and buffer all > the input from the client before even calling the application callable. > And, it doesn't provide a way to read an indefinite stream of input from > the client, which is also problematic. > > Anyway, please post a small example of a program that fails to work > because of these proposed changes for WSGI 2.0. > > Thanks, > Brian > Attached there are two working examples (I have not committed it yet, because I'm still testing - there are some problems that I need to solve). The `curl_client` module is an high level interface to pycurl. The `nginx-poll-proxy.py` script is an asyncronous WSGI application that implements an HTTP proxy. 
The `nginx-poll-sleep.py` script is a simple asynchronous WSGI application that get the content of an HTTP resource using poll just to "sleep" (suspend execution) for a fixed amount of time. NOTE: I have also added a `ngx.sleep` extension, but I'm going to remove it since the same behaviour can be obtained with ngx.poll. An explanation of the interfaces -------------------------------- The ngx.poll extension is based on the Python stdlib select.poll interface. There are two constants: `ngx.WSGI_POLLIN` and `ngx.WSGI_POLLOUT`. These are defined in the WSGI environment, but their value is "know" (`0x01` and `0x04`) and can be used for bit masking. The `ngx.connection_wrapper(fd)` function takes as input a file descriptor (as integer) and returns a Connection wrapper object, to be used for later operations. The Connection wrapper object has the following methods: - fileno(): return the associated socket descriptor - register(flags): register the connection with the server "reactor"; flags is a bit mask of ngx.WSGI_POLLIN and ngx.WSGI_POLLOUT - deregister(flags=None): deregister the connection from the server "reactor" - close: close the connection object, deregisterering it from the server "reactor" if still active. XXX it also can close the socket, but this should be done by the client The last function is `ngx.poll(timeout)`. When called, the user *should* yield an empty string (yielding a non empty string will result in an "undefined behaviour"). The WSGI application iteration will be suspended until a connection is ready for reading or writing, or the timeout expires. The `ngx.poll` function returns a callable that, when called, returns a tuple with the connection object "ready" (or None if timedout) and a flag indicating if the connection is ready for reading or writing. NOTE: due to the internal architecture of the Nginx event module (it have to support several different event systems), mod_wsgi for Nginx will only return ngx.WSGI_POLLIN or ngx.WSGI_POLLPUT, *never* ngx.WSGI_POLLIN | ngx.WSGI_POLLPUT. Also, no error status is reported. That's all. An asynchronous application is simply impossible to develope with the current draft of WSGI 2.0, since I need to send the headers after some steps in the application iterator. So, please, don't "ruin" the WSGI specification just to make it more easy to implement and to use. For me asynchronous support is very important. P.S: I have chosen to implement this interface, instead of `wsgi.pause_output`, because IMHO it is very easy to implement for "normal" servers. Moreover it is also more simple to use, with a very "natural" interface, and it avoids the use of callbacks and a more strict interaction with the server "reactor". Regards Manlio Perillo -------------- next part -------------- A non-text attachment was scrubbed... Name: curl_client.py Type: text/x-python Size: 4582 bytes Desc: not available Url : http://mail.python.org/pipermail/web-sig/attachments/20080305/311b4450/attachment.py -------------- next part -------------- A non-text attachment was scrubbed... Name: nginx-poll-proxy.py Type: text/x-python Size: 1410 bytes Desc: not available Url : http://mail.python.org/pipermail/web-sig/attachments/20080305/311b4450/attachment-0001.py -------------- next part -------------- A non-text attachment was scrubbed... 
Name: nginx-poll-sleep.py Type: text/x-python Size: 1628 bytes Desc: not available Url : http://mail.python.org/pipermail/web-sig/attachments/20080305/311b4450/attachment-0002.py From graham.dumpleton at gmail.com Wed Mar 5 23:37:45 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Thu, 6 Mar 2008 09:37:45 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CECC5C.1010505@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> Message-ID: <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> Let me get this right. You are complaining that the WSGI 2.0 would break your non standard extension which was never a part of the WSGI 1.0 specification to begin with. I also find it interesting that in the very early days you were pushing very very hard for WSGI 2.0 to be specified and you had no intention of even supporting WSGI 1.0 style interface. Now things seem to be the complete opposite. Anyway, your complaint seems to resolve around: """An asynchronous application is simply impossible to develope with the current draft of WSGI 2.0, since I need to send the headers after some steps in the application iterator.""" You probably need to explain the second half of that sentence a bit better. From memory the WSGI 1.0 specification says that for an iterable, the headers should be sent upon the generation of the first non empty string being yielded. How does what you are doing relate to that, are you not doing that? Why would WSGI 2.0 necessarily be any different and cause a problem? Graham On 06/03/2008, Manlio Perillo wrote: > Brian Smith ha scritto: > > Manlio Perillo wrote: > >> Fine with me but there is a *big* problem. > >> > >> WSGI 2.0 "breaks" support for asynchronous applications > >> (since you can no more send headers in the app iter). > > > > WSGI 1.0 doesn't guarentee that all asynchronous applications will work > > either, because it allows the WSGI gateway to wait for and buffer all > > the input from the client before even calling the application callable. > > And, it doesn't provide a way to read an indefinite stream of input from > > the client, which is also problematic. > > > > Anyway, please post a small example of a program that fails to work > > because of these proposed changes for WSGI 2.0. > > > > Thanks, > > Brian > > > > > Attached there are two working examples (I have not committed it yet, > because I'm still testing - there are some problems that I need to solve). > > > The `curl_client` module is an high level interface to pycurl. > > The `nginx-poll-proxy.py` script is an asyncronous WSGI application that > implements an HTTP proxy. > > The `nginx-poll-sleep.py` script is a simple asynchronous WSGI > application that get the content of an HTTP resource using poll just to > "sleep" (suspend execution) for a fixed amount of time. > > > NOTE: I have also added a `ngx.sleep` extension, but I'm going to remove > it since the same behaviour can be obtained with ngx.poll. > > > An explanation of the interfaces > -------------------------------- > > The ngx.poll extension is based on the Python stdlib select.poll interface. > > There are two constants: `ngx.WSGI_POLLIN` and `ngx.WSGI_POLLOUT`. > These are defined in the WSGI environment, but their value is "know" > (`0x01` and `0x04`) and can be used for bit masking. 
> > The `ngx.connection_wrapper(fd)` function takes as input a file > descriptor (as integer) and returns a Connection wrapper object, to be > used for later operations. > > > The Connection wrapper object has the following methods: > - fileno(): > return the associated socket descriptor > - register(flags): > register the connection with the server "reactor"; > flags is a bit mask of ngx.WSGI_POLLIN and ngx.WSGI_POLLOUT > - deregister(flags=None): > deregister the connection from the server "reactor" > - close: > close the connection object, deregisterering it from the server > "reactor" if still active. > XXX it also can close the socket, but this should be done by the > client > > The last function is `ngx.poll(timeout)`. > When called, the user *should* yield an empty string (yielding a non > empty string will result in an "undefined behaviour"). > > The WSGI application iteration will be suspended until a connection is > ready for reading or writing, or the timeout expires. > > The `ngx.poll` function returns a callable that, when called, returns a > tuple with the connection object "ready" (or None if timedout) and a > flag indicating if the connection is ready for reading or writing. > > NOTE: due to the internal architecture of the Nginx event module (it > have to support several different event systems), mod_wsgi for > Nginx will only return ngx.WSGI_POLLIN or ngx.WSGI_POLLPUT, > *never* ngx.WSGI_POLLIN | ngx.WSGI_POLLPUT. > > Also, no error status is reported. > > > > That's all. > > An asynchronous application is simply impossible to develope with the > current draft of WSGI 2.0, since I need to send the headers after some > steps in the application iterator. > > > So, please, don't "ruin" the WSGI specification just to make it more > easy to implement and to use. > For me asynchronous support is very important. > > > P.S: I have chosen to implement this interface, instead of > `wsgi.pause_output`, because IMHO it is very easy to implement for > "normal" servers. > > Moreover it is also more simple to use, with a very "natural" > interface, and it avoids the use of callbacks and a more strict > interaction with the server "reactor". > > > Regards Manlio Perillo > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > > > From pje at telecommunity.com Thu Mar 6 03:05:38 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 05 Mar 2008 21:05:38 -0500 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.co m> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> Message-ID: <20080306020523.2802E3A40D8@sparrow.telecommunity.com> At 09:37 AM 3/6/2008 +1100, Graham Dumpleton wrote: >You probably need to explain the second half of that sentence a bit >better. From memory the WSGI 1.0 specification says that for an >iterable, the headers should be sent upon the generation of the first >non empty string being yielded. How does what you are doing relate to >that, are you not doing that? Why would WSGI 2.0 necessarily be any >different and cause a problem? 
Because (in concept anyway) WSGI 2.0 is synchronous with respect to headers -- you don't get to yield empty strings and *then* return the headers. Personally, I see truly-async web apps as a niche, because in order to write a useful async app, you need *other* async APIs besides your incoming HTTP one. Which means you're going to have to write to Twisted or some other library's API, or else roll your own. At which point, connecting your app to a web server is the least of your concerns. (Since it has to be a web server that's compatible with the API you're using, which means you might as well use its native API.) That having been said, I don't see a problem with having a Web Server Asynchronous Interface (WSAI?) for folks who want that sort of thing. Ideally, such a thing would be the CPS (continuation-passing style) mirror of WSGI 2.0. Where in WSGI 2.0 you return a 3-tuple, in WSAI you'd essentially use start_response() and write(). In essence, you might say that WSGI 1.0 is a broken-down version of a hideous crossbreeding of pure WSGI and pure WSAI. It would probably be better to split them and have bridges. A truly-async system like Twisted has to (effectively) do WSAI-WSGI bridging right now, but if we had a WSAI standard, then there could perhaps be third-party bridges. Even so, it's quite a niche: Twisted, nginx, and...? I know there are a handful of async frameworks, and how many of those have web servers included? From manlio_perillo at libero.it Thu Mar 6 11:12:49 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 11:12:49 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> Message-ID: <47CFC3A1.4000003@libero.it> Graham Dumpleton ha scritto: > Let me get this right. You are complaining that the WSGI 2.0 would > break your non standard extension which was never a part of the WSGI > 1.0 specification to begin with. > No, you are wrong. WSGI *allows* an implementation to develope extensions. I'm complaining that WSGI 2.0 will break support for truly-async web apps. > I also find it interesting that in the very early days you were > pushing very very hard for WSGI 2.0 to be specified and you had no > intention of even supporting WSGI 1.0 style interface. Now things seem > to be the complete opposite. > First of all, in the early days I had very little experience with WSGI and Nginx internals. Moreover, as I can remember, I have never said that I was not going to support WSGI 1.0. I have started with an implementation of WSGI 2.0 because it was more "easy" to implement and it allowed me (with little experience at that time) to have a working implementation as soon as possible. > Anyway, your complaint seems to resolve around: > > """An asynchronous application is simply impossible to develope with the > current draft of WSGI 2.0, since I need to send the headers after some > steps in the application iterator.""" > Right. > You probably need to explain the second half of that sentence a bit > better. From memory the WSGI 1.0 specification says that for an > iterable, the headers should be sent upon the generation of the first > non empty string being yielded. 
How does what you are doing relate to > that, are you not doing that? Why would WSGI 2.0 necessarily be any > different and cause a problem? > See the response from Phillip J. Eby. > Graham > Manlio Perillo From manlio_perillo at libero.it Thu Mar 6 11:34:54 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 11:34:54 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <20080306020523.2802E3A40D8@sparrow.telecommunity.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <20080306020523.2802E3A40D8@sparrow.telecommunity.com> Message-ID: <47CFC8CE.8040208@libero.it> Phillip J. Eby ha scritto: > At 09:37 AM 3/6/2008 +1100, Graham Dumpleton wrote: >> You probably need to explain the second half of that sentence a bit >> better. From memory the WSGI 1.0 specification says that for an >> iterable, the headers should be sent upon the generation of the first >> non empty string being yielded. How does what you are doing relate to >> that, are you not doing that? Why would WSGI 2.0 necessarily be any >> different and cause a problem? > > Because (in concept anyway) WSGI 2.0 is synchronous with respect to > headers -- you don't get to yield empty strings and *then* return the > headers. > > Personally, I see truly-async web apps as a niche, because in order to > write a useful async app, you need *other* async APIs besides your > incoming HTTP one. Yes, this is true. But I have to say that: 1) the asynchronous model is the "right" model to use to develope robust and scalable applications (expecially in Python). The fact that it is a niche does not means that it should not be supported and promoted. > Which means you're going to have to write to Twisted > or some other library's API, or else roll your own. This is true, but there are already some working(?) asynchronous clients: pycurl and psycopg2. You don't need to use the web server "private" API. An HTTP client and an a database client is usually all you need in a web application (well, you usually need also an SMTP client, but since a server probably has a local SMTP daemon running, this should not be a problem) > At which point, > connecting your app to a web server is the least of your concerns. This is not always true. > (Since it has to be a web server that's compatible with the API you're > using, which means you might as well use its native API.) > No, this is not correct. The ngx.poll extension should be easy to implement in a "standard" server (I would like to write a reference implementation for wsgiref). Moreover it is not impossible to write a pure async WSGI implementation in Twisted Web, and then having it support the poll extension. Then, a portable application can just use pycurl or psycopg2 + the poll extension and should be portable. Of course many WSGI implementations will not implements an "optimized" version of the poll extension, but isn't the same true for wsgi.file_wrapper? > That having been said, I don't see a problem with having a Web Server > Asynchronous Interface (WSAI?) for folks who want that sort of thing. > Ideally, such a thing would be the CPS (continuation-passing style) > mirror of WSGI 2.0. Where in WSGI 2.0 you return a 3-tuple, in WSAI > you'd essentially use start_response() and write(). > Why write? It's only a problem. 
An asynchronous application should just use a generator. This solves some problems, like the consumer producer problem. Moreover it is also more convienent to use (IMHO). > In essence, you might say that WSGI 1.0 is a broken-down version of a > hideous crossbreeding of pure WSGI and pure WSAI. It would probably be > better to split them and have bridges. A truly-async system like > Twisted has to (effectively) do WSAI-WSGI bridging right now, but if we > had a WSAI standard, then there could perhaps be third-party bridges. > > Even so, it's quite a niche: Twisted, nginx, and...? I know there are a > handful of async frameworks, and how many of those have web servers > included? > Yes, this is a problem But what makes WSGI 1.0 great, is that it is able to support this niche. Thanks Manlio Perillo From l.oluyede at gmail.com Thu Mar 6 12:03:48 2008 From: l.oluyede at gmail.com (Lawrence Oluyede) Date: Thu, 6 Mar 2008 12:03:48 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CFC3A1.4000003@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <47CFC3A1.4000003@libero.it> Message-ID: <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> > No, you are wrong. > WSGI *allows* an implementation to develope extensions. > > I'm complaining that WSGI 2.0 will break support for truly-async web apps. Correct me if I'm wrong. WSGI is great on paper and almost great in daily use. One of this peculiarities in the "middleware extension pattern", which has to foster reuse and spread of middleware doing (I hope) one thing and doing right. AFAIK most of the middleware out there are not written thinking about async at all. I don't see Twisted developers crying out loud begging people to write async middlewares and never block. Don't take it the wrong way but what's the point in fighting so hard for WSGI when there's plenty of ways to just ignore it? I know that my statement will upset someone but I think the idea of two separate web standard is great. It's too late to force in async in the WSGI world and you, with your twisted expertise, should now that writing async is hard and asking everyone to not block is even harder (that's why even Twisted Matrix has callInThread and something like that). From manlio_perillo at libero.it Thu Mar 6 12:44:30 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 12:44:30 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <47CFC3A1.4000003@libero.it> <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> Message-ID: <47CFD91E.7080309@libero.it> Lawrence Oluyede ha scritto: >> No, you are wrong. >> WSGI *allows* an implementation to develope extensions. >> >> I'm complaining that WSGI 2.0 will break support for truly-async web apps. > > Correct me if I'm wrong. WSGI is great on paper and almost great in > daily use. 
One of this peculiarities in the "middleware extension > pattern", which has to foster reuse and spread of middleware doing (I > hope) one thing and doing right. AFAIK most of the middleware out > there are not written thinking about async at all. I don't see Twisted > developers crying out loud begging people to write async middlewares > and never block. > > Don't take it the wrong way but what's the point in fighting so hard > for WSGI when there's plenty of ways to just ignore it? > Because I don't care if people implements WSGI in the wrong way :). If there is a good middleware, but it is not async friendly, I simply will not use it and I will try to rewrite it, if it is feasible. I'm fighting so hard because I think that it is wrong to try to simplify the WSGI spec so much to make it not usable for writing pute async applications. > I know that my statement will upset someone but I think the idea of > two separate web standard is great. It's too late to force in async in > the WSGI world and you, with your twisted expertise, should now that > writing async is hard and asking everyone to not block is even harder > (that's why even Twisted Matrix has callInThread and something like > that). > I'm not asking everyone to not block! This is not pratical. And yes, a sync application is very different from a async application. As an example of HTTP client: def application(environ, start_response): c = Connection(...) r = c.request(...) for block in r: yield block data = r.get_response() VS def application(environ, start_response): c = Connection(...) r = c.request(...) data = r.get_response() I'm not sure that having two standards is the best solution, since it will complicate the implementation of a WSGI middleware. Right now, the WSGI module for Nginx can serve both sync and async applications without problems. Manlio Perillo From manlio_perillo at libero.it Thu Mar 6 13:11:14 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 13:11:14 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CFD91E.7080309@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <47CFC3A1.4000003@libero.it> <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> <47CFD91E.7080309@libero.it> Message-ID: <47CFDF62.80100@libero.it> Manlio Perillo ha scritto: > [...] > > I'm not sure that having two standards is the best solution, since it > will complicate the implementation of a WSGI middleware. A correction: it should be WSGI gateway and not WSGI middleware. Manlio Perillo From brian at briansmith.org Thu Mar 6 16:08:18 2008 From: brian at briansmith.org (Brian Smith) Date: Thu, 6 Mar 2008 07:08:18 -0800 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CECC5C.1010505@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> Message-ID: <003601c87f9b$eb132ce0$6401a8c0@T60> Manlio Perillo wrote: > Brian Smith ha scritto: > > Manlio Perillo wrote: > >> Fine with me but there is a *big* problem. > >> > >> WSGI 2.0 "breaks" support for asynchronous applications (since you > >> can no more send headers in the app iter). 
> > > > WSGI 1.0 doesn't guarentee that all asynchronous applications will > > work either, because it allows the WSGI gateway to wait for > and buffer > > all the input from the client before even calling the > application callable. > > And, it doesn't provide a way to read an indefinite stream of input > > from the client, which is also problematic. > > > > Anyway, please post a small example of a program that fails to work > > because of these proposed changes for WSGI 2.0. > > > > Thanks, > > Brian > > > > > Attached there are two working examples (I have not committed > it yet, because I'm still testing - there are some problems > that I need to solve). I looked at your examples and now I understand better what you are trying to do. I think what you are trying to do is reasonable but it isn't something that is supported even by WSGI 1.0. It happens to work efficiently for your particular gateway, but that isn't what WSGI is about. In fact, any WSGI application that doesn't run correctly with an arbitrary WSGI gateway (assuming no bugs in any gateway) isn't a WSGI application at all. It seems that the problem with your examples is not that they won't work with WSGI 2.0. Rather, the problem is that the applications block too long. The application will still work correctly, but will not be efficient when run in nginx's mod_wsgi. However, that isn't a problem with the specification or with the application; it is a problem with nginx's mod_wsgi. I hate reading about the "Pythonic way" of doing things, but writing a WSGI application so that it doesn't block too much or too long is simply not Pythonic. The WSGI gateway needs to abstract away those concerns so that they aren't an issue. Otherwise, the gateway will only be useful for specialized applications designed to run well on that particular gateway. Such specialized applications might as well use specialized (gateway-specific) APIs, if they have to be designed specifically for a particular gateway anyway. Further, it is impossible to write a good HTTP proxy with WSGI. The control over threading, blocking, I/O, and buffer management is just not there in WSGI. In order to support efficient implementations of such things, WSGI would have to become so low-level that it would become pointless--it would be exposing an interface that is so low-level that it wouldn't even be cross-platform. It wouldn't abstract away anything. At the same time, the current WSGI 2.0 proposal abstracts too much. It is good for applications that are written directly on top of the gateway, and for simple middleware. But, it is not appropriate for a serious framework to be built on. It is wrong to think that the same interface is suitable for frameworks, middleware developers, and application developers. I would rather see WSGI 2.0 become a much lower-level framework that works at the buffer level (not strings), with the ability to do non-blocking reads from wsgi.input, and the ability to let the WSGI gateway do buffering in a sane and efficient manner (there's no reason for the application to do a bunch of string joins when the gateway could just send all the pieces in a single writev()). Some control over blocking, HTTP chunked encoding, etc. could be included as well. The current suggestions for WSGI 2.0 would then just be a sample framework layered on top of this low-level interface, for developers that don't want to use a big framework like DJango or Pylons. But, the big frameworks and middleware would use the low-level interface to run efficiently. 
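To make the buffer-level idea concrete, here is a minimal gateway-side sketch; the helper name and calling convention are invented for illustration and are not part of any WSGI draft. The application hands over a sequence of byte strings and the gateway decides how they reach the wire, using a single scatter-gather write where the platform provides one:

import os

def flush_buffers(sock, buffers):
    # Gateway-side helper: write a list of byte strings to a connected
    # socket with a single writev() where the platform provides one,
    # instead of making the application join them into one big string.
    buffers = [b for b in buffers if b]
    if not buffers:
        return 0
    if hasattr(os, "writev"):
        # A real gateway would loop here to handle partial writes.
        return os.writev(sock.fileno(), buffers)
    data = "".join(buffers)
    sock.sendall(data)
    return len(data)

Whether the pieces are joined or written out in one writev() is then purely gateway policy; the application never concatenates strings itself.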
- Brian From pje at telecommunity.com Thu Mar 6 17:59:57 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 06 Mar 2008 11:59:57 -0500 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CFDF62.80100@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <47CFC3A1.4000003@libero.it> <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> <47CFD91E.7080309@libero.it> <47CFDF62.80100@libero.it> Message-ID: <20080306165938.D066F3A40AC@sparrow.telecommunity.com> At 01:11 PM 3/6/2008 +0100, Manlio Perillo wrote: >Manlio Perillo ha scritto: > > [...] > > > > I'm not sure that having two standards is the best solution, since it > > will complicate the implementation of a WSGI middleware. > >A correction: it should be WSGI gateway and not WSGI middleware. On the contrary, it will simplify gateway implementation, if bridges are available. Async gateways would implement WSAI, synchronous gateways would implement WSGI. The wsgiref library could include a standard bridge or two to go in each direction (WSGI->WSAI and WSAI->WSGI), and the gateway would provide some support for spawning, pooling, or queueing of threads, where threads are needed to make the conversion from WSAI to WSGI (since in the other direction, you can simply block waiting for a callback). The APIs could be provided through some standardized environ keys defined in the WSAI spec. From graham.dumpleton at gmail.com Thu Mar 6 18:16:23 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 7 Mar 2008 04:16:23 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <003601c87f9b$eb132ce0$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> Message-ID: <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> On 07/03/2008, Brian Smith wrote: > Manlio Perillo wrote: > > Brian Smith ha scritto: > > > Manlio Perillo wrote: > > >> Fine with me but there is a *big* problem. > > >> > > >> WSGI 2.0 "breaks" support for asynchronous applications (since you > > >> can no more send headers in the app iter). > > > > > > WSGI 1.0 doesn't guarentee that all asynchronous applications will > > > work either, because it allows the WSGI gateway to wait for > > and buffer > > > all the input from the client before even calling the > > application callable. > > > And, it doesn't provide a way to read an indefinite stream of input > > > from the client, which is also problematic. > > > > > > Anyway, please post a small example of a program that fails to work > > > because of these proposed changes for WSGI 2.0. > > > > > > Thanks, > > > Brian > > > > > > > > > Attached there are two working examples (I have not committed > > it yet, because I'm still testing - there are some problems > > that I need to solve). > > > I looked at your examples and now I understand better what you are > trying to do. I think what you are trying to do is reasonable but it > isn't something that is supported even by WSGI 1.0. It happens to work > efficiently for your particular gateway, but that isn't what WSGI is > about. 
In fact, any WSGI application that doesn't run correctly with an > arbitrary WSGI gateway (assuming no bugs in any gateway) isn't a WSGI > application at all. > > It seems that the problem with your examples is not that they won't work > with WSGI 2.0. Rather, the problem is that the applications block too > long. The application will still work correctly, but will not be > efficient when run in nginx's mod_wsgi. However, that isn't a problem > with the specification or with the application; it is a problem with > nginx's mod_wsgi. I hate reading about the "Pythonic way" of doing > things, but writing a WSGI application so that it doesn't block too much > or too long is simply not Pythonic. The WSGI gateway needs to abstract > away those concerns so that they aren't an issue. Otherwise, the gateway > will only be useful for specialized applications designed to run well on > that particular gateway. Such specialized applications might as well use > specialized (gateway-specific) APIs, if they have to be designed > specifically for a particular gateway anyway. > > Further, it is impossible to write a good HTTP proxy with WSGI. The > control over threading, blocking, I/O, and buffer management is just not > there in WSGI. In order to support efficient implementations of such > things, WSGI would have to become so low-level that it would become > pointless--it would be exposing an interface that is so low-level that > it wouldn't even be cross-platform. It wouldn't abstract away anything. > > At the same time, the current WSGI 2.0 proposal abstracts too much. It > is good for applications that are written directly on top of the > gateway, and for simple middleware. But, it is not appropriate for a > serious framework to be built on. It is wrong to think that the same > interface is suitable for frameworks, middleware developers, and > application developers. I would rather see WSGI 2.0 become a much > lower-level framework that works at the buffer level (not strings), with > the ability to do non-blocking reads from wsgi.input, and the ability to > let the WSGI gateway do buffering in a sane and efficient manner > (there's no reason for the application to do a bunch of string joins > when the gateway could just send all the pieces in a single writev()). > Some control over blocking, HTTP chunked encoding, etc. could be > included as well. The current suggestions for WSGI 2.0 would then just > be a sample framework layered on top of this low-level interface, for > developers that don't want to use a big framework like DJango or Pylons. > But, the big frameworks and middleware would use the low-level interface > to run efficiently. In part adding to what Brian is saying, you (Manlio) speak as if WSGI 2.0 is already somehow set in stone and because you can't do what you want, then it is no good and we should keep the WSGI 1.0 way of doing things. Like Brian is starting to think about what else WSGI 2.0 could be so as to allow other ways of doing things, why don't you try the same thing and think about how you could do what you want in a similar style to WSGI 2.0, but adapting the WSGI 2.0 interface in some way. If the changes make sense and don't deviate too far from where we have been going, maybe people might accept it. This following idea may not make much sense, but baby keeping me up, its 4am and I am probably not going to get back to sleep until I get this idea out of my head now. Anyway, WSGI 2.0 currently talks about returning a single tuple containing status, headers and iterable. 
What if it actually optionally allowed the response to itself be an iterable, such that you could do: yield ('102 Processing', [], None) ... yield ('102 Processing', [], None) ... yield ('200 OK', [...], [...]) I'll admit that I am not totally across what the HTTP 102 status code is meant to be used for and am sort of presuming that this might make sense. Am sure though that Brian who understands this sort of level better than me will set me straight. That said, could the return of 102 like this allow the same result as what you are currently doing with yielding empty strings prior to setting up headers? Going a bit further with this, would it make sense for an application to also be able to return a 100 to force server layer to tell client to start sending data if 100-continue expect header sent. Could it also be used in some way to allow better control over output chunking by allowing: yield ('200 OK', [...], [...]) ... yield (None, None, [...]) In other words the application could effectively yield up multiple iterables related to the actual response content. Not that all HTTP servers support it, could this be a way of allowing an application when using output chunking to specify trailer headers for after last response content chunk. yield ('200 OK', [...], [...]) ... yield (None, None, [...]) ... yield (None, [...], None) Important thing though is I am not suggesting this be the default way of doing responses, but that be an optionally available lower level layer for doing it. An application could still just return a single tuple as per WSGI 2.0 now. A good server adapter may optionally also allow this more low level interface which allows some better measure of control. Support of this low level interface could be optional, with WSGI environment used to indicate if server supports it or not. Now, this doesn't deal with request content and an alternative to current wsgi.input so that one could do the non blocking read to get back just what was available, ie. next chunk, but surely we can come up with solutions for that as well. Thus I don't see it as impossible to also handle input chunked content as well. We just need to stop thinking that what has been proposed for WSGI 2.0 so far is the full and complete interface. Okay, I feel I can go back to sleep now. You can all start laughing now if this insomnia driven idea is plain stupid. :-) Graham From manlio_perillo at libero.it Thu Mar 6 20:20:19 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 20:20:19 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <20080306165938.D066F3A40AC@sparrow.telecommunity.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <47CFC3A1.4000003@libero.it> <9eebf5740803060303pc473ea9m83022832f57912ba@mail.gmail.com> <47CFD91E.7080309@libero.it> <47CFDF62.80100@libero.it> <20080306165938.D066F3A40AC@sparrow.telecommunity.com> Message-ID: <47D043F3.4040903@libero.it> Phillip J. Eby ha scritto: > At 01:11 PM 3/6/2008 +0100, Manlio Perillo wrote: >> Manlio Perillo ha scritto: >> > [...] >> > >> > I'm not sure that having two standards is the best solution, since it >> > will complicate the implementation of a WSGI middleware. >> >> A correction: it should be WSGI gateway and not WSGI middleware. 
> > On the contrary, it will simplify gateway implementation, if bridges are > available. I can confirm that implementing WSGI 2.0 is far more simple, however: 1) This is not an issue, since we already have many implementations of WSGI 1.0: wsgiref, Twisted, Apache, Nginx, flup, ... 2) If you need to implement some extensions (like file_wrapper), then the implementation is going to become more complex anyway. > Async gateways would implement WSAI, synchronous gateways > would implement WSGI. > Ok. But I see no need to "invent" a new term (WSAI): the current specification of WSGI is already good for async gateways/applications. Is it really the best solution to split WSGI 1.0 into two separate specifications? > The wsgiref library could include a standard bridge or two to go in each > direction (WSGI->WSAI and WSAI->WSGI), and the gateway would provide > some support for spawning, pooling, or queueing of threads, where > threads are needed to make the conversion from WSAI to WSGI (since in > the other direction, you can simply block waiting for a callback). If a specification explicitly requires the use of threads, then there is something bad in it :). Simply speaking: I want to avoid to use threads in Nginx. They are not supported by the server. > The > APIs could be provided through some standardized environ keys defined in > the WSAI spec. > Can you make an example? Thanks. I'm not sure to understand you architecture. Manlio Perillo From brian at briansmith.org Thu Mar 6 20:42:43 2008 From: brian at briansmith.org (Brian Smith) Date: Thu, 6 Mar 2008 11:42:43 -0800 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> Message-ID: <000c01c87fc2$4144a5a0$6401a8c0@T60> Graham Dumpleton wrote: > This following idea may not make much sense, but baby keeping > me up, its 4am and I am probably not going to get back to > sleep until I get this idea out of my head now. :) I think you need to have a serious discussion with the baby. Maybe if she got a job she wouldn't sleep all day, and she would sleep through the night. I had such a talk with my roommate a few years ago, and we got along much better after that. > Anyway, WSGI 2.0 currently talks about returning a single > tuple containing status, headers and iterable. What if it > actually optionally allowed the response to itself be an > iterable, such that you could do: > > yield ('102 Processing', [], None) > ... > yield ('102 Processing', [], None) > ... > yield ('200 OK', [...], [...]) > > I'll admit that I am not totally across what the HTTP 102 > status code is meant to be used for and am sort of presuming > that this might make sense. Am sure though that Brian who > understands this sort of level better than me will set me straight. The application should definitely be able to send as many 1xx status lines as it wants. However, I expect any yielded status line to be sent to the client, and there should be no need to include other headers or a body. I will write more about this below. That idea doesn't really benefit Manlio's programs. Manlio's program is trying to say "use my thread for some other processing until some (external) event happens." 
We already have standard mechanisms for doing something similar in WSGI: multi-threaded and multi-process WSGI gateways that let applications block indefinitely while letting other applications run. A polling interface like Manlio proposes does help for applications that are doing I/O via TCP(-like) protocols. But, it doesn't provide a way to wait for a database query to finish, or for any other kind of IPC to complete, unless everything is rebuilt around that polling mechanism. It isn't a general enough interface to become a part of WSGI. I think it is safe to say that multi-threaded or multi-process execution is something that is virtually required for WSGI. > Going a bit further with this, would it make sense for an > application to also be able to return a 100 to force server > layer to tell client to start sending data if 100-continue > expect header sent. The handling of 1xx status codes is inhibited by the current state of CGI and FastCGI gateways. In particular, most CGI and FastCGI modules do not provide useful support for "Expect: 100-continue"; they always send the "100 Continue" even when you don't want them to. As long as CGI and FastCGI have to be supported as gateways, the design of WSGI will not be able to change substantially from WSGI 1.0 (the current proposed changes for WSGI 2.0 are really just cosmetic except for the removal of start_response.write()). Consequently, support for 1xx status lines must be optional, so it might as well be done as a WSGI 1.0-compatible extension like this: def application(environ, start_response): def ignore(x): pass send_provisional_response = environ.get( "wsgi.send_provisional_response", ignore) ... send_provisional_response("102 Processing") ... send_provisional_response("102 Processing") > Could it also be used in some way to allow better control > over output chunking by allowing: > > yield ('200 OK', [...], [...]) > ... > yield (None, None, [...]) > > In other words the application could effectively yield up > multiple iterables related to the actual response content. Again, I like the simplification that WSGI 2.0 applications are always functions or function-like callables, and never iterables. It would be easy to create a WSGI-1.0-compatible interface for efficient batching of output strings, which could also then support buffer objects instead of just (byte)strings: def application(environ, start_response): def join_buffers(buffers): return "".join([str(b) for b in buffers]) vectorize = environ.get("wsgi.vectorize", join_buffers) return vectorize(buffers) > Not that all HTTP servers support it, could this be a way of > allowing an application when using output chunking to specify > trailer headers for after last response content chunk. The trailers feature is something I haven't thought a lot about. Again, that is something that CGI doesn't support (I don't think FastCGI supports it either). So, that is something that has to also be done in a way similar to the above: def application(environ, start_response): headers = [...] trailers = environ.get("wsgi.trailers") if trailers is None: # inefficiently calculate the trailer fields # in advance headers.append("Header-A", ...) headers.append("Header-B", ...) ... start_response("200 OK", headers) ... while ...: if trailers is not None: # calculate trailer fields as we yield # output yield output trailers.append("Header-A", ...) trailers.append("Header-B", ...) 
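On the gateway side, the corresponding setup for such a "wsgi.trailers" extension might look roughly like the sketch below; the key name comes from the example above, while the exact condition and the way the trailer block is finally emitted are assumptions that match the rule spelled out just below:

def make_environ(base_environ, server_supports_trailers):
    # Expose the extension only when trailers can actually be sent:
    # the request must be HTTP/1.1 and the server must be willing to
    # send the response with chunked transfer-encoding.
    environ = dict(base_environ)
    if server_supports_trailers and environ.get("SERVER_PROTOCOL") == "HTTP/1.1":
        environ["wsgi.trailers"] = []
    return environ

After the application finishes iterating, the gateway would write whatever ended up in environ["wsgi.trailers"] after the final zero-length chunk.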
It would be nice of the specification for the trailers extension specified that the trailers list is included in the WSGI environment if and only if (1) we are talking HTTP/1.1, and (2) the gateway and web server support trailers. > Important thing though is I am not suggesting this be the > default way of doing responses, but that be an optionally > available lower level layer for doing it. An application > could still just return a single tuple as per WSGI 2.0 now. A > good server adapter may optionally also allow this more low > level interface which allows some better measure of control. > Support of this low level interface could be optional, with > WSGI environment used to indicate if server supports it or not. Right, but if these features are all optional, then they can be spec'd to work with WSGI 1.0. > Now, this doesn't deal with request content and an > alternative to current wsgi.input so that one could do the > non blocking read to get back just what was available, ie. > next chunk, but surely we can come up with solutions for that > as well. Thus I don't see it as impossible to also handle > input chunked content as well. We just need to stop thinking > that what has been proposed for WSGI 2.0 so far is the full > and complete interface. We can just say that WSGI-2.0-style applications must support chunked request bodies, but gateways are not required to support them. WSGi-2.0-style applications would have to check for CONTENT_LENGTH, and if that is missing, check to see if environ['HTTP_TRANSFER_ENCODING'] includes the "chunked" token. wsgi_input.read() would have to stop at the end of the request; applications would not restricted from attempting to read more than CONTENT_LENGTH bytes. WSGI gateways would have to support an additional (keyword?) argument to wsgi.input.read() that controls whether it is blocking or non-blocking. It seems pretty simple. Notice that all of this can be done even with WSGI 1.0, if these additional features were broken out into their own PEP(s). - Brian From manlio_perillo at libero.it Thu Mar 6 22:06:29 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Thu, 06 Mar 2008 22:06:29 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <000c01c87fc2$4144a5a0$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> Message-ID: <47D05CD5.4000502@libero.it> Brian Smith ha scritto: > > [...] > > That idea doesn't really benefit Manlio's programs. Manlio's program is > trying to say "use my thread for some other processing until some > (external) event happens." Right. > We already have standard mechanisms for doing > something similar in WSGI: multi-threaded and multi-process WSGI > gateways that let applications block indefinitely while letting other > applications run. Ok, but this is not the best solution to the problem! > A polling interface like Manlio proposes does help for > applications that are doing I/O via TCP(-like) protocols. This is true only on Windows. > But, it > doesn't provide a way to wait for a database query to finish, or for any > other kind of IPC to complete, unless everything is rebuilt around that > polling mechanism. This is not generally true. 
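For example, a driver that exposes its socket and a non-blocking execution mode can be driven by the same readiness loop as any other descriptor. A minimal sketch, assuming the poll()-based asynchronous interface that later psycopg2 releases document, with plain select() standing in for the gateway's own readiness notification (the connection string and query are placeholders):

import select
import psycopg2
import psycopg2.extensions

def wait(conn):
    # Drive the connection until the pending operation completes,
    # sleeping in select() on the connection's own file descriptor.
    while True:
        state = conn.poll()
        if state == psycopg2.extensions.POLL_OK:
            return
        elif state == psycopg2.extensions.POLL_READ:
            select.select([conn.fileno()], [], [])
        elif state == psycopg2.extensions.POLL_WRITE:
            select.select([], [conn.fileno()], [])
        else:
            raise psycopg2.OperationalError("poll() returned %s" % state)

conn = psycopg2.connect("dbname=test", async=1)
wait(conn)                 # connection setup is asynchronous as well
cur = conn.cursor()
cur.execute("SELECT 42")
wait(conn)                 # the query runs without blocking the process
print cur.fetchall()

Inside an asynchronous gateway the select() calls would be replaced by whatever callback the server's poll extension provides; the shape of the loop stays the same.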
> It isn't a general enough interface to become a part > of WSGI. I'm not proposing it to become part of WSGI (since it should not be stay here), but part of the wsgiorg "namespace", or an officially asynchronous extensions interface. > I think it is safe to say that multi-threaded or multi-process > execution is something that is virtually required for WSGI. > but only if the application is synchronous and heavy I/O bound. Note that Nginx is multi-process, but it only executes a fixed number of worker processes, so if an I/O request can block for a significative amount of time, you can not afford to let it block. Moreover with an asynchronous gateway it is possible to implement a "middleware" that can execute an application inside a thread. This is possible by creating a pipe, starting a new thread, having the main thread polling the pipe, and having the thread write some data in the pipe to "wake" the main thread when finish its job. I'm going to write a sample implementation when I find some time. Yes, we need to use a thread, but this can be done in pure Python code only (altought I'm not sure if this can have side effects on Nginx). > [...] > > Again, I like the simplification that WSGI 2.0 applications are always > functions or function-like callables, and never iterables. Where is the simplification? > > The trailers feature is something I haven't thought a lot about. Again, > that is something that CGI doesn't support (I don't think FastCGI > supports it either). So, that is something that has to also be done in a > way similar to the above: > > def application(environ, start_response): > headers = [...] > trailers = environ.get("wsgi.trailers") > if trailers is None: > # inefficiently calculate the trailer fields > # in advance > headers.append("Header-A", ...) > headers.append("Header-B", ...) > ... > start_response("200 OK", headers) > ... > while ...: > if trailers is not None: > # calculate trailer fields as we yield > # output > yield output > > trailers.append("Header-A", ...) > trailers.append("Header-B", ...) > > It would be nice of the specification for the trailers extension > specified that the trailers list is included in the WSGI environment if > and only if (1) we are talking HTTP/1.1, and (2) the gateway and web > server support trailers. > This is an interesting idea. Unfortunately right now Nginx does not supports trailing headers, and I don't know if common browsers support them. > [...] >> Now, this doesn't deal with request content and an >> alternative to current wsgi.input so that one could do the >> non blocking read to get back just what was available, ie. >> next chunk, but surely we can come up with solutions for that >> as well. Thus I don't see it as impossible to also handle >> input chunked content as well. We just need to stop thinking >> that what has been proposed for WSGI 2.0 so far is the full >> and complete interface. > > We can just say that WSGI-2.0-style applications must support chunked > request bodies, but gateways are not required to support them. > WSGi-2.0-style applications would have to check for CONTENT_LENGTH, and > if that is missing, check to see if environ['HTTP_TRANSFER_ENCODING'] > includes the "chunked" token. wsgi_input.read() would have to stop at > the end of the request; applications would not restricted from > attempting to read more than CONTENT_LENGTH bytes. > > WSGI gateways would have to support an additional (keyword?) argument to > wsgi.input.read() that controls whether it is blocking or non-blocking. 
> It seems pretty simple. > How should be written an application to use this feature? > Notice that all of this can be done even with WSGI 1.0, if these > additional features were broken out into their own PEP(s). > > - Brian > Manlio Perillo From graham.dumpleton at gmail.com Thu Mar 6 23:46:12 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 7 Mar 2008 09:46:12 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <000c01c87fc2$4144a5a0$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> Message-ID: <88e286470803061446t6f67a9c1u1c1269416beddb01@mail.gmail.com> On 07/03/2008, Brian Smith wrote: > Graham Dumpleton wrote: > > Anyway, WSGI 2.0 currently talks about returning a single > > tuple containing status, headers and iterable. What if it > > actually optionally allowed the response to itself be an > > iterable, such that you could do: > > > > yield ('102 Processing', [], None) > > ... > > yield ('102 Processing', [], None) > > ... > > yield ('200 OK', [...], [...]) > > > > I'll admit that I am not totally across what the HTTP 102 > > status code is meant to be used for and am sort of presuming > > that this might make sense. Am sure though that Brian who > > understands this sort of level better than me will set me straight. > > That idea doesn't really benefit Manlio's programs. Manlio's program is > trying to say "use my thread for some other processing until some > (external) event happens. Okay, like some of our protocol discussions before, you possibly don't see what I see in how what I suggested could be used. ;-) Anyway, I am not the one battling for this, so will not try and explain it further. Or I'll leave it to my sleep deprived hours in the middle of the night. :-) Graham From graham.dumpleton at gmail.com Fri Mar 7 00:13:46 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 7 Mar 2008 10:13:46 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47CFC8CE.8040208@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <20080306020523.2802E3A40D8@sparrow.telecommunity.com> <47CFC8CE.8040208@libero.it> Message-ID: <88e286470803061513g61771453s37ebb32c0ebb6275@mail.gmail.com> On 06/03/2008, Manlio Perillo wrote: > But I have to say that: > > 1) the asynchronous model is the "right" model to use to develope > robust and scalable applications (expecially in Python). No it isn't. It is one model, it is not necessarily the 'right' model. The asynchronous model actually has worse drawbacks than the GIL problem when multithreading is used and you have a multi core or multi process system. This is because in an asynchronous system with only a single thread, it is theoretically impossible to use more than one processor at a time. 
Even with the Python GIL as a contention point, threads in C extension modules can at least release the GIL and perform work in parallel and so theoretically the process can consume the resources of more than one core or processor at a time. The whole nature of web applications where requests perform small amounts of work and then complete actually simplifies the use of multithreading. This is because unlike complex applications where there are long running activities occurring in different threads there is no real need for the threads handling different requests to communicate with each other. Thus the main problem is merely protecting concurrent access to shared resources. Even that is not so bad as each request handler is mostly operating on data specific to the context of that request rather than shared data. Thus, whether one uses multithreading or an event driven system, one can't but avoid use of multiple processes to build a really good scalable system. This is where nginx limits itself a bit as the number of worker processes is fixed, whereas with Apache it can create additional worker processes if demand requires and reap them when no longer required. You can therefore with Apache factor in some slack to cope with bursts in demand and it will scale up the number of processes as necessary. With nginx you have to have a really good idea in advance of what sort of maximum load you will need to handle as you need to fix the number of worker processes. For static file serving the use of an event driven system may make this easier, but factor in a Python web application where each request has a much greater overhead and possibility of blocking and it becomes a much tricker proposition to plan how many worker processes you may need. No matter what technology one uses there will be such trade offs and they will vary depending on what you are doing. Thus it is going to be very rare that one technology is always the "right" technology. Also, as much as people like to focus on raw performance of the web server for hosting Python web applications, in general the actual performance matters very little in the greater scheme of things (unless your stupid enough to use CGI). This is because that isn't where the bottlenecks are generally going to be. Thus, that one hosting solution may for a hello world program be three times faster than another, means absolutely nothing if that ends up translating to less than 1 percent throughput when someone loads and runs their mega Python application. This is especially the case when the volume of traffic the application receives never goes any where near fully utilising the actual resources available. For large systems, you would never even depend on one machine anyway and load balance across a cluster. Thus the focus by many on raw speed in many cases is just plain ridiculous as there is a lot more to it than that. Graham From graham.dumpleton at gmail.com Fri Mar 7 00:25:08 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 7 Mar 2008 10:25:08 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) 
In-Reply-To: <47D05CD5.4000502@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> <47D05CD5.4000502@libero.it> Message-ID: <88e286470803061525p31e5a1a9p18d18ffe93319403@mail.gmail.com> On 07/03/2008, Manlio Perillo wrote: > Moreover with an asynchronous gateway it is possible to implement a > "middleware" that can execute an application inside a thread. > > This is possible by creating a pipe, starting a new thread, having the > main thread polling the pipe, and having the thread write some data in > the pipe to "wake" the main thread when finish its job. > > I'm going to write a sample implementation when I find some time. > > Yes, we need to use a thread, but this can be done in pure Python code > only (altought I'm not sure if this can have side effects on Nginx). So you do understand this technique of using a socketpair() pipe as a way of communicating between code which is thread safe and other code which is potentially non thread safe. This makes moot your prior point that they (threads) are not supported by the server and thus you want to avoid using them. In other words, as I have pointed out previously, in practice it would be possible to implement a thread pool mechanism on top of nginx such that you could avoid this whole problem of the asynchronous model at the WSGI level. I still don't understand why you are so resistant to going this path given that for Python web applications, the event driven model doesn't necessarily provide any benefits when one looks at the bigger picture and perhaps just makes it harder to implement application code. If you want to pursue an even driven model because you find it an interesting area to work in then fine, but you shouldn't expect everyone else to try and accommodate that way of thinking when people are happy with the alternative. Graham From brian at briansmith.org Fri Mar 7 01:29:07 2008 From: brian at briansmith.org (Brian Smith) Date: Thu, 6 Mar 2008 16:29:07 -0800 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47D05CD5.4000502@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> <47D05CD5.4000502@libero.it> Message-ID: <004001c87fea$42340780$6401a8c0@T60> Manlio Perillo wrote: > Brian Smith ha scritto: > > We already have standard mechanisms for doing something > > similar in WSGI: multi-threaded and multi-process WSGI > > gateways that let applications block indefinitely while > > letting other applications run. > > Ok, but this is not the best solution to the problem! Why not? > > I think it is safe to say that multi-threaded or multi-process > > execution is something that is virtually required for WSGI. > > but only if the application is synchronous and heavy I/O bound. Isn't that almost every WSGI application? > Note that Nginx is multi-process, but it only executes a > fixed number of worker processes, so if an I/O request can > block for a significative amount of time, you can not afford > to let it block. 
Can't you just increase the number of processes? > Moreover with an asynchronous gateway it is possible to > implement a "middleware" that can execute an application > inside a thread. > > This is possible by creating a pipe, starting a new thread, > having the main thread polling the pipe, and having the > thread write some data in the pipe to "wake" the main thread > when finish its job. Right. This is exactly what I was saying. By using multiprocessing/multithreading, each application can block as much as it wants. > > Again, I like the simplification that WSGI 2.0 applications > > are always functions or function-like callables, and never > > iterables. > > Where is the simplification? My understanding is that the application callable never returns an iterator (it never yields, it only returns). This is simpler to explain to people who are new to WSGI. It also simplifies the language in the specification. The difference is basically immaterial to WSGI gateway implementers, but that is because the WSGI specification is biased towards making gateways simple to implement. > Unfortunately right now Nginx does not supports trailing > headers, and I don't know if common browsers support them. Right, trailers are not really that useful right now. Too many applications expect to get all header fields first, and most people don't even know about trailers in the first place. > > We can just say that WSGI-2.0-style applications must > > support chunked request bodies, but gateways are not > > required to support them. > > WSGi-2.0-style applications would have to check for > > CONTENT_LENGTH, and if that is missing, check to see if > > environ['HTTP_TRANSFER_ENCODING'] includes the "chunked" > > token. wsgi_input.read() would have to stop at the end > > of the request; applications would not restricted from > > attempting to read more than CONTENT_LENGTH bytes. > > > > WSGI gateways would have to support an additional > > (keyword?) argument to wsgi.input.read() that > > controls whether it is blocking or non-blocking. > > It seems pretty simple. > > How should be written an application to use this feature? For chunked request bodies: instead of reading until exactly CONTENT_LENGTH bytes have been read, keep reading until environ["wsgi.input"].read(chunk_size) returns "". For "non-blocking reads", given environ["wsgi.input"].read(64000, min=8000):

1. If more than 64000 bytes are available without blocking, 64000 bytes are returned.
2. If fewer than 8000 bytes are available without blocking, then the gateway blocks until at least 8000 bytes are available.
3. When 8000-63999 bytes are available, then all those bytes are returned.

The non-blocking behavior is useful when the application can process arbitrary chunks of input without having all the input available. For example, if you are transcoding a POSTed video, you probably can transcode the video with arbitrarily-sized chunks of input. If you already have 32K of input available, you don't really need to wait around for 32K more input before you start processing. But, if you have 64K of input ready to process, then you might as well process all of it at once. My understanding is that nginx completely buffers all input, so that all reads from wsgi.input are basically non-blocking. - Brian From manlio_perillo at libero.it Fri Mar 7 10:16:39 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 07 Mar 2008 10:16:39 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?)
In-Reply-To: <88e286470803061513g61771453s37ebb32c0ebb6275@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <20080306020523.2802E3A40D8@sparrow.telecommunity.com> <47CFC8CE.8040208@libero.it> <88e286470803061513g61771453s37ebb32c0ebb6275@mail.gmail.com> Message-ID: <47D107F7.4020102@libero.it> Graham Dumpleton ha scritto: > On 06/03/2008, Manlio Perillo wrote: >> But I have to say that: >> >> 1) the asynchronous model is the "right" model to use to develope >> robust and scalable applications (expecially in Python). > > No it isn't. It is one model, it is not necessarily the 'right' model. > Ok. > The asynchronous model actually has worse drawbacks than the GIL > problem when multithreading is used and you have a multi core or multi > process system. This is because in an asynchronous system with only a > single thread, it is theoretically impossible to use more than one > processor at a time. This is the reason why I'm using Nginx instead of Twisted. > Even with the Python GIL as a contention point, > threads in C extension modules can at least release the GIL and > perform work in parallel and so theoretically the process can consume > the resources of more than one core or processor at a time. > > The whole nature of web applications where requests perform small > amounts of work and then complete actually simplifies the use of > multithreading. Yes, this is true most of the time. But the reason I have finally added the poll extension in my WSGI implementation for Nginx is that I have some requests that *do not* take small amounts of work to be served. Database queries, as an example, are not a problem if executed synchronously, since Nginx has multiple worker processes, and the environment is "controlled" (that is, I can optimize the query/database, the connection is on the localhost or on a LAN, and so on). > This is because unlike complex applications where > there are long running activities occurring in different threads there > is no real need for the threads handling different requests to > communicate with each other. Thus the main problem is merely > protecting concurrent access to shared resources. Even that is not so > bad as each request handler is mostly operating on data specific to > the context of that request rather than shared data. > Again, this is true. However the problem is that multithreaded servers usually does not scales well as asynchronous one. http://blog.emmettshear.com/post/2008/03/03/Dont-use-Pound-for-load-balancing Of course this is special case, a server that is mostly I/O bound. > Thus, whether one uses multithreading or an event driven system, one > can't but avoid use of multiple processes to build a really good > scalable system. This is where nginx limits itself a bit as the number > of worker processes is fixed, whereas with Apache it can create > additional worker processes if demand requires and reap them when no > longer required. Right. But this is a subject that needs more discussion (and I suspect that we are going off topic). Is it true that Apache can spawn additional processes, but (again, when the request is mainly I/O bound) each process does very little work *but* using not little amount of system resources. Nginx instead use a fixed (and small) number of processes, but each process is used at 100%. 
Apache model is great when you need to run generic embedded applications. I think that Nginx is great for serving static content, proxing, and serving embedded application that are written with the asynchronous nature of Nginx in mind. > You can therefore with Apache factor in some slack to > cope with bursts in demand and it will scale up the number of > processes as necessary. With nginx you have to have a really good idea > in advance of what sort of maximum load you will need to handle as you > need to fix the number of worker processes. Right. > For static file serving > the use of an event driven system may make this easier, By the way, I know there is an event based worker in Apache. Have you exterience with it? > but factor in > a Python web application where each request has a much greater > overhead and possibility of blocking and it becomes a much tricker > proposition to plan how many worker processes you may need. > Right. > No matter what technology one uses there will be such trade offs and > they will vary depending on what you are doing. Thus it is going to be > very rare that one technology is always the "right" technology. Also, > as much as people like to focus on raw performance of the web server > for hosting Python web applications, in general the actual performance > matters very little in the greater scheme of things (unless your > stupid enough to use CGI). This is because that isn't where the > bottlenecks are generally going to be. Thus, that one hosting solution > may for a hello world program be three times faster than another, > means absolutely nothing if that ends up translating to less than 1 > percent throughput when someone loads and runs their mega Python > application. This is especially the case when the volume of traffic > the application receives never goes any where near fully utilising the > actual resources available. For large systems, you would never even > depend on one machine anyway and load balance across a cluster. Thus > the focus by many on raw speed in many cases is just plain ridiculous > as there is a lot more to it than that. > There is not only the problem on raw speed. There is also a problem of server resources usage. As an example, an Italian hosting company poses strict limits on resource usage for each client. They do not use Apache, since they fear that serving embedded applications limits their control (but, if I'm not wrong, you have implemented a solution for this problem in mod_wsgi). Using Nginx + the wsgi module has the benefit to require less system resources than flup (as an example) and, probabily, Apache. > Graham > Manlio Perillo From manlio_perillo at libero.it Fri Mar 7 10:48:27 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 07 Mar 2008 10:48:27 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <003601c87f9b$eb132ce0$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> Message-ID: <47D10F6B.8000003@libero.it> Brian Smith ha scritto: > Manlio Perillo wrote: >> Brian Smith ha scritto: >>> Manlio Perillo wrote: >>>> Fine with me but there is a *big* problem. >>>> >>>> WSGI 2.0 "breaks" support for asynchronous applications (since you >>>> can no more send headers in the app iter). 
>>> WSGI 1.0 doesn't guarentee that all asynchronous applications will >>> work either, because it allows the WSGI gateway to wait for >> and buffer >>> all the input from the client before even calling the >> application callable. >>> And, it doesn't provide a way to read an indefinite stream of input >>> from the client, which is also problematic. >>> >>> Anyway, please post a small example of a program that fails to work >>> because of these proposed changes for WSGI 2.0. >>> >>> Thanks, >>> Brian >>> >> >> Attached there are two working examples (I have not committed >> it yet, because I'm still testing - there are some problems >> that I need to solve). > > I looked at your examples and now I understand better what you are > trying to do. I think what you are trying to do is reasonable but it > isn't something that is supported even by WSGI 1.0. It happens to work > efficiently for your particular gateway, but that isn't what WSGI is > about. In fact, any WSGI application that doesn't run correctly with an > arbitrary WSGI gateway (assuming no bugs in any gateway) isn't a WSGI > application at all. > No, this is not true. First of all, this extension should be easy to implement for any WSGI implementation (maybe even with a middleware? I have to check). Lastly, truly portability is a complex topic. The WSGI spec allows the implementation of extensions. Of course if an application uses an extension it is no more portable; maybe it should check the presence of the extension and execute an alternative code if it is not available. This is possible in the example I have posted. I like to think about WSGI the same way as OpenGL or SQL. There are well established standards, but an application *should* be allowed to use specialized extension. > It seems that the problem with your examples is not that they won't work > with WSGI 2.0. Rather, the problem is that the applications block too > long. The application will still work correctly, but will not be > efficient when run in nginx's mod_wsgi. However, that isn't a problem > with the specification or with the application; it is a problem with > nginx's mod_wsgi. No. It's a problem for every server, even for Apache. Apache, as an example, can spawn additional processes; but there is a limit. What happens if 500+ concurrent requests run the blocking code? If you do not set a limit of child processes in Apache, the system will very probably "die". If you set a limit, than some requests will have to wait. Writing an asynchronous client is really the most sensate solution of this problem. Of course, again, the application should work with any WSGI implementation. But the solution *is not* to write "generic" code. The solution is to write specialized code, and to write a version of the code for each of the possible server architecture (multithread/multiprocess/CGI, asynchronous). This is where it is important to standardize an interface for asynchronous extensions. If in future a new asynchronous WSGI implementation will be developed, I would like to use the same interface, so I will not have to write yet another specialized version of my code. > I hate reading about the "Pythonic way" of doing > things, but writing a WSGI application so that it doesn't block too much > or too long is simply not Pythonic. Sorry, but this is absurd ;-). I need to talk with a web service on Internet: I have *no* control on this. The solution is to not have to use a web service, but this, again, is not under my control. In general, however, I agree. 
A web application should be written in the most efficient way; this is the reason why I try to avoid to use object relationals mappers, as an example. > The WSGI gateway needs to abstract > away those concerns so that they aren't an issue. What concerns? > Otherwise, the gateway > will only be useful for specialized applications designed to run well on > that particular gateway. Such specialized applications might as well use > specialized (gateway-specific) APIs, if they have to be designed > specifically for a particular gateway anyway. > > Further, it is impossible to write a good HTTP proxy with WSGI. The > control over threading, blocking, I/O, and buffer management is just not > there in WSGI. No. With WSGI this is possible. > In order to support efficient implementations of such > things, WSGI would have to become so low-level that it would become > pointless--it would be exposing an interface that is so low-level that > it wouldn't even be cross-platform. It wouldn't abstract away anything. > > At the same time, the current WSGI 2.0 proposal abstracts too much. It > is good for applications that are written directly on top of the > gateway, and for simple middleware. But, it is not appropriate for a > serious framework to be built on. It is wrong to think that the same > interface is suitable for frameworks, middleware developers, and > application developers. I would rather see WSGI 2.0 become a much > lower-level framework that works at the buffer level (not strings), with > the ability to do non-blocking reads from wsgi.input, and the ability to > let the WSGI gateway do buffering in a sane and efficient manner > (there's no reason for the application to do a bunch of string joins > when the gateway could just send all the pieces in a single writev()). I agree. The WSGI 1.0 spec disallows a WSGI implementation to do buffering, but I think that it should allow it. The WSGI implementation for Nginx already do this, when enabling an option (disabled as default). Nginx will use writev. > Some control over blocking, HTTP chunked encoding, etc. could be > included as well. The current suggestions for WSGI 2.0 would then just > be a sample framework layered on top of this low-level interface, for > developers that don't want to use a big framework like DJango or Pylons. +1. This is what I would like to have. A WSGI 1.1 spec, based on WSGI 1.0 with some corrections. And a simplified interface for people who want to use it. > But, the big frameworks and middleware would use the low-level interface > to run efficiently. > > - Brian > Manlio Perillo From manlio_perillo at libero.it Fri Mar 7 10:54:48 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 07 Mar 2008 10:54:48 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> Message-ID: <47D110E8.1020804@libero.it> Graham Dumpleton ha scritto: > [...] > > In part adding to what Brian is saying, you (Manlio) speak as if WSGI > 2.0 is already somehow set in stone Well, Philip J. Eby explicitly said that WSGI 2.0 exists only for removing the use of start_response... So I assume that it is already set in stone. 
> and because you can't do what you > want, then it is no good and we should keep the WSGI 1.0 way of doing > things. > > Like Brian is starting to think about what else WSGI 2.0 could be so > as to allow other ways of doing things, why don't you try the same > thing and think about how you could do what you want in a similar > style to WSGI 2.0, but adapting the WSGI 2.0 interface in some way. If > the changes make sense and don't deviate too far from where we have > been going, maybe people might accept it. > I have tried to figure out how to implement an asynchronous application with WSGI 2.0, but the results are not good: def application(environ, start_response): def app_iter() c = Connection(...) r = c.request(...) for block in r: yield block data = r.get_response() environ['start_response']( '200 OK', [('Content-Type', ('text/plain')]) yield data return '', [], app_iter > [...] Manlio Perillo From manlio_perillo at libero.it Fri Mar 7 11:11:01 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 07 Mar 2008 11:11:01 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <88e286470803061525p31e5a1a9p18d18ffe93319403@mail.gmail.com> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> <47D05CD5.4000502@libero.it> <88e286470803061525p31e5a1a9p18d18ffe93319403@mail.gmail.com> Message-ID: <47D114B5.9050603@libero.it> Graham Dumpleton ha scritto: > On 07/03/2008, Manlio Perillo wrote: >> Moreover with an asynchronous gateway it is possible to implement a >> "middleware" that can execute an application inside a thread. >> >> This is possible by creating a pipe, starting a new thread, having the >> main thread polling the pipe, and having the thread write some data in >> the pipe to "wake" the main thread when finish its job. >> >> I'm going to write a sample implementation when I find some time. >> >> Yes, we need to use a thread, but this can be done in pure Python code >> only (altought I'm not sure if this can have side effects on Nginx). > > So you do understand this technique of using a socketpair() pipe as a > way of communicating between code which is thread safe and other code > which is potentially non thread safe. Right. > This makes moot your prior point > that they (threads) are not supported by the server and thus you want > to avoid using them. > Not really true ;-). Threads are still not supported by Nginx. This means that using threads in an application embedded in Nginx can cause who knows what problems (ok, probabily it will *not* cause any problems). Moreover, I'm not sure that such a *middleware* will be a full WSGI 1.0 conforming middleware. > In other words, as I have pointed out previously, in practice it would > be possible to implement a thread pool mechanism on top of nginx such > that you could avoid this whole problem of the asynchronous model at > the WSGI level. > No, this does not solves the problem. The number of threads I can create is limited, so I can serve only a limited number of concurrent requests. The asynchronous solution is more optimized. 
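For anyone who has not seen the socketpair() technique being referred to here, a minimal stand-alone sketch (standard library only, Unix, Python 2; not tied to nginx or to any WSGI gateway): a worker thread runs the blocking job and writes a byte to one end of the pair when it finishes, while the main loop polls the other end and stays free to handle other events in the meantime.

    import socket
    import select
    import threading
    import time

    def blocking_job(wakeup):
        # Stands in for a slow call to an external web service.
        time.sleep(2)
        wakeup.sendall('x')   # any byte will do; it only wakes the poller

    main_end, worker_end = socket.socketpair()
    threading.Thread(target=blocking_job, args=(worker_end,)).start()

    # The "main loop": poll the pipe without blocking on the slow job.
    while True:
        readable, _, _ = select.select([main_end], [], [], 0.1)
        if readable:
            main_end.recv(1)
            print 'worker finished; resume the suspended request'
            break
        # ... other event handling would go here ...
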
> I still don't understand why you are so resistant to going this path > given that for Python web applications, the event driven model doesn't > necessarily provide any benefits when one looks at the bigger picture > and perhaps just makes it harder to implement application code. > The event drive model *does* provide benefits for my problem. And I'm not looking at the bigger picture here. I'm looking at a HTTP resource that needs to execute an HTTP request to an external web application. > If you want to pursue an even driven model because you find it an > interesting area to work in then fine, but you shouldn't expect > everyone else to try and accommodate that way of thinking when people > are happy with the alternative. > The problem here is that I'm just pointing out that the *current* WSGI 1.0 *supports* asynchronous applications. Until now nobody else have implemented asynchronous applications on top of WSGI, and so Philip J. Eby have decided to getting rid of the asynchronous support, for the sake of having a simplified implementation. I'm only saying: "hey, wait. Actually WSGI 1.0 *can* really be used for writing asynchronous applications, here is a *working* and not pure academic example". > Graham > Manlio Perillo From graham.dumpleton at gmail.com Fri Mar 7 11:15:59 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 7 Mar 2008 21:15:59 +1100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <47D107F7.4020102@libero.it> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <88e286470803051437w42e7715aof2e9853ef34d2ebb@mail.gmail.com> <20080306020523.2802E3A40D8@sparrow.telecommunity.com> <47CFC8CE.8040208@libero.it> <88e286470803061513g61771453s37ebb32c0ebb6275@mail.gmail.com> <47D107F7.4020102@libero.it> Message-ID: <88e286470803070215n79857e6cn8fdcce1eff6965be@mail.gmail.com> On 07/03/2008, Manlio Perillo wrote: > Is it true that Apache can spawn additional processes, Yes, for prefork and worker MPM, but not winnt on Windows. See for example details for worker MPM in: http://httpd.apache.org/docs/2.2/mod/worker.html > By the way, I know there is an event based worker in Apache. > Have you exterience with it? No, haven't used it. It isn't an event driven system like you know it. It still uses threads like worker MPM. The difference as I understand it is that it dedicates a single thread to managing client socket connections maintained due to keep alive, rather than a whole thread being tied up for each such connection. So, it is just an improvement over worker and does not implement a full event driven system. > > No matter what technology one uses there will be such trade offs and > > they will vary depending on what you are doing. Thus it is going to be > > very rare that one technology is always the "right" technology. Also, > > as much as people like to focus on raw performance of the web server > > for hosting Python web applications, in general the actual performance > > matters very little in the greater scheme of things (unless your > > stupid enough to use CGI). This is because that isn't where the > > bottlenecks are generally going to be. 
Thus, that one hosting solution > > may for a hello world program be three times faster than another, > > means absolutely nothing if that ends up translating to less than 1 > > percent throughput when someone loads and runs their mega Python > > application. This is especially the case when the volume of traffic > > the application receives never goes any where near fully utilising the > > actual resources available. For large systems, you would never even > > depend on one machine anyway and load balance across a cluster. Thus > > the focus by many on raw speed in many cases is just plain ridiculous > > as there is a lot more to it than that. > > There is not only the problem on raw speed. > There is also a problem of server resources usage. > > As an example, an Italian hosting company poses strict limits on > resource usage for each client. As would any sane web hosting company. > They do not use Apache, since they fear that serving embedded > applications limits their control If they believe that embedded solutions like mod_python are the only things available for Apache, then I can understand that. There are other solutions though such as fastcgi and mod_wsgi daemon mode, so it isn't as necessarily as unmanageable as they may believe. They perhaps just don't know what options are available, don't understand the technology well or how to manage it. I do admit though it would be harder when it isn't your own application and you are hosting stuff written by a third party. > Using Nginx + the wsgi module has the benefit to require less system > resources than flup (as an example) and, probabily, Apache. Memory usage is also relative, just like network performance. Configure Apache correctly and don't load modules you don't need and the base overhead of Apache can be reduced quite a lot. For a big system heavy on media using a separate media server such as nginx or lighttpd can be sensible. One can then turn off keep alive on Apache for the dynamic Python web application since keep alive doesn't necessarily help there and will cause the sorts of issues the event MPM attempts to solve. So, its manageable and there are known steps one can take. The real memory usage comes when someone loads up a Python web application which requires 80-100MB per process at the outset before much has even happened. Just because you are using another web hosting solution, be it nginx or even a Python based web server, this will not change that the Python web application is chewing up so much memory. The one area where memory usage can be a problem with Python web applications and which is not necessarily understood well by most people, is the risk of concurrent requests causing a sudden burst in memory usage. Imagine a specific URL which needs a large amount of transient memory, for example something which is generating PDFs using reportlab and PIL. All is okay if the URL only gets hit by one request at a time, but if multiple requests hit at the same time, then your memory blows out considerably as each request needs the large amount of transient memory at the same time and once allocated it will be retained by the process. So, if one was using worker MPM to keep down the number of overall processes and memory usage, you run the risk of this sort of problem occurring. 
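To put some purely illustrative numbers on that: a single worker process running 10 threads, where the PDF-generating view transiently needs about 50 MB, can spike to roughly 500 MB if a burst hits every thread at once, and because the process retains what it has allocated it keeps that high-water mark afterwards.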
One could stop it occurring by implementing throttling in the application, that is put locking on specific URLs which consumed lots of transient memory to restrict number of concurrent requests, but frankly I have never actually ever heard of anyone actually doing it. The alternative is to use prefork MPM, or similar model, such that there can only be one active request in the process at a time. But then you need more processes to handle the same number of requests, so overall memory usage is high again. For large sites however, which can afford lots of memory, using prefork would be the better way to go as it will at least limit the possibilities of individual processes spiking memory usage unexpectedly, with memory usage being more predictable. That all said, just because you aren't using threads and are handling concurrency using an event driven system approach will not necessarily isolate you from this specific problem. All in all it can be a tough problem. If your web application demands are relatively simple then it may never be an issue, but people are trying to do more and more within the web application itself, rather than delegating it to separate back end systems or programs. At the same time they want to use cheap memory constrained VPS systems. So, lots of fun. :-) Graham From manlio_perillo at libero.it Fri Mar 7 11:30:29 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 07 Mar 2008 11:30:29 +0100 Subject: [Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?) In-Reply-To: <004001c87fea$42340780$6401a8c0@T60> References: <1fed3a88-f7cb-4779-953d-c18803cb393c@e60g2000hsh.googlegroups.com> <47CE5C3E.5030609@libero.it> <002501c87ec4$4b68ac60$6401a8c0@T60> <47CECC5C.1010505@libero.it> <003601c87f9b$eb132ce0$6401a8c0@T60> <88e286470803060916y328ccfa8n2e8b5d0ab3830a2f@mail.gmail.com> <000c01c87fc2$4144a5a0$6401a8c0@T60> <47D05CD5.4000502@libero.it> <004001c87fea$42340780$6401a8c0@T60> Message-ID: <47D11945.6080203@libero.it> Brian Smith ha scritto: > Manlio Perillo wrote: >> Brian Smith ha scritto: >>> We already have standard mechanisms for doing something >>> similar in WSGI: multi-threaded and multi-process WSGI >>> gateways that let applications block indefinitely while >>> letting other applications run. >> Ok, but this is not the best solution to the problem! > > Why not? > >>> I think it is safe to say that multi-threaded or multi-process >>> execution is something that is virtually required for WSGI. >> but only if the application is synchronous and heavy I/O bound. > > Isn't that almost every WSGI application? > I'm not sure that a generic application that uses a database can be considered *heavy* I/O bound. Compare, as an example, a query to a database that can take up to 0.2 seconds with an HTTP request to a web service that can take up to 2 seconds. >> Note that Nginx is multi-process, but it only executes a >> fixed number of worker processes, so if an I/O request can >> block for a significative amount of time, you can not afford >> to let it block. > > Can't you just increase the number of processes? > Yes, but you should agree withe me that the asynchronous solution is more optimized. Moreover my application needs to run in a shared hosting, where there is a limit on the mumber of processes an user can execute. I can not run too many worker processes. >> Moreover with an asynchronous gateway it is possible to >> implement a "middleware" that can execute an application >> inside a thread. 
>> >> This is possible by creating a pipe, starting a new thread, >> having the main thread polling the pipe, and having the >> thread write some data in the pipe to "wake" the main thread >> when finish its job. > > Right. This is exactly what I was saying. By using > multiprocessing/multithreading, each application can block as much as it > wants. > Ok, but the middleware *needs* the poll extension :). So the best solution, IMHO, is to implement the WSGI 1.0 spec for Nginx, and then implement a pure Python middleware/adapter that will execute a WSGI 2.0 application in a thread. However if some corrections are going to be implemented in WSGI 2.0, I would like to have them "backported" to WSGI 1.1, as an example. >>> Again, I like the simplification that WSGI 2.0 applications >>> are always functions or function-like callables, and never >>> iterables. >> Where is the simplification? > > My understanding is that the application callable never returns an > interator (it never yields, it only returns). This is simpler to explain > to people that are new to WSGI. This is indeed true. I too found some problems when I first read the WSGI specification. *However* now it seems to me the most natural API. It only needs some practice. > It also simplifies the language in the > specification. The difference is basically immaterial to WSGI gateway > implementers, but that is because the WSGI specification is biased > towards making gateways simple to implement. > No, it also make it simpler to implement. > [...] >>> We can just say that WSGI-2.0-style applications must >>> support chunked request bodies, but gateways are not >>> required to support them. >>> WSGi-2.0-style applications would have to check for >>> CONTENT_LENGTH, and if that is missing, check to see if >>> environ['HTTP_TRANSFER_ENCODING'] includes the "chunked" >>> token. wsgi_input.read() would have to stop at the end >>> of the request; applications would not restricted from >>> attempting to read more than CONTENT_LENGTH bytes. >>> >>> WSGI gateways would have to support an additional >>> (keyword?) argument to wsgi.input.read() that >>> controls whether it is blocking or non-blocking. >>> It seems pretty simple. >> How should be written an application to use this feature? > > For chunked request bodies: instead of reading until exactly > CONTENT_LENGTH bytes have been read, keep reading until > environ["wsgi.input"].read(chunk_size) returns "". > > For "non-blocking reads", given environ["wsgi.input"].read(64000, > min=8000): > > 1. If more than 64000 bytes are available without blocking, 8192 bytes > are returned. > 2. If less than 8000 bytes are available without blocking, then the > gateway blocks until at least 1024 bytes are available. > 3. When 8000-63999 bytes are available, then all those bytes are > returned. > Ok. > [...] > > My understanding is that nginx completely buffers all input, so that all > reads from wsgi.input are basically non-blocking. > Right. This makes my life easier, since I can just use a cStringIO of File object :). However in future the Nginx author is planning to add support for input filters and chunked request bodies. At that time, I will implement an extension that will allow a non blocking (asynchronous) reading from wsgi.input. > - Brian > Manlio Perillo From fumanchu at aminus.org Sat Mar 8 21:58:12 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Sat, 8 Mar 2008 12:58:12 -0800 Subject: [Web-SIG] Web Dev Pad Message-ID: Hello, all you Python web tool developers! 
Like last year, Chad Whitacre (author of Aspen) and I (lead dev of CherryPy) are going to get a suite and run the Web Dev Pad again. Come on by any evening from Thursday night to Sunday night--we'll be up late serving the three M's (mudslides, margaritas, and martinis) and plenty of camaraderie in our living room at the Marriott Renaissance [1]. It's a few blocks from the conference hotel but well worth the trip; try to call if it's before 8pm to make sure we're not both out to dinner. We're open as late as you like to anyone who works on web libraries, servers, or frameworks (or their friends; don't let domain boundaries stop you ;). Robert Brewer fumanchu at aminus.org 619 374 1117 [1] http://www.marriott.com/hotels/travel/chibr-renaissance-chicago-ohare-su ites-hotel/ From pywebsig at xhaus.com Mon Mar 10 13:21:57 2008 From: pywebsig at xhaus.com (Alan Kennedy) Date: Mon, 10 Mar 2008 12:21:57 +0000 Subject: [Web-SIG] Time a for JSON parser in the standard library? Message-ID: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Dear all, Given that 1. Python comes with "batteries included" 2. There is a standard library re-org happening because of Py3K 3. JSON is now a very commonly used format on the web Is it time there was a JSON codec included in the python standard library? (If XML is already supported, I see no reason why JSON shouldn't be) Or is it best to make users who want to use JSON go and research all of the different options available to them? Choosing a Python JSON Translator http://blog.hill-street.net/?p=7 Just a thought. Regards, Alan. From mark.mchristensen at gmail.com Mon Mar 10 14:37:35 2008 From: mark.mchristensen at gmail.com (Mark Ramm) Date: Mon, 10 Mar 2008 09:37:35 -0400 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: > Is it time there was a JSON codec included in the python standard library? I would definitely support the incusion of a JSON library in the standard lib. And, I think that it should be simplejson which is used by TurboGears, Pylons, and bundled with Django. > Choosing a Python JSON Translator > http://blog.hill-street.net/?p=7 This blog is a year old, and isn't quite accurate anymore. in particular simplejson sprouted some optional C code, and is now a lot faster. --Mark Ramm From jodok at lovelysystems.com Mon Mar 10 15:26:05 2008 From: jodok at lovelysystems.com (Jodok Batlogg) Date: Mon, 10 Mar 2008 15:26:05 +0100 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: On 10.03.2008, at 14:37, Mark Ramm wrote: >> Is it time there was a JSON codec included in the python standard >> library? > > I would definitely support the incusion of a JSON library in the > standard lib. And, I think that it should be simplejson which is > used by TurboGears, Pylons, and bundled with Django. +1 for simplejson > > >> Choosing a Python JSON Translator >> http://blog.hill-street.net/?p=7 > > This blog is a year old, and isn't quite accurate anymore. in > particular simplejson sprouted some optional C code, and is now a lot > faster. yes. i second that as well. 
jodok batlogg > > > --Mark Ramm > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/jodok%40lovelysystems.com -- "Beautiful is better than ugly." -- The Zen of Python, by Tim Peters Jodok Batlogg, Lovely Systems GmbH Schmelzh?tterstra?e 26a, 6850 Dornbirn, Austria mobile: +43 676 5683591, phone: +43 5572 908060 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2454 bytes Desc: not available Url : http://mail.python.org/pipermail/web-sig/attachments/20080310/a463c2fc/attachment.bin From jonathan at carnageblender.com Mon Mar 10 17:08:59 2008 From: jonathan at carnageblender.com (Jonathan Ellis) Date: Mon, 10 Mar 2008 09:08:59 -0700 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: <1205165339.12593.1241570907@webmail.messagingengine.com> On Mon, 10 Mar 2008 09:37:35 -0400, "Mark Ramm" said: > > Is it time there was a JSON codec included in the python standard library? > > I would definitely support the incusion of a JSON library in the > standard lib. And, I think that it should be simplejson which is > used by TurboGears, Pylons, and bundled with Django. +1 From janssen at parc.com Mon Mar 10 18:02:01 2008 From: janssen at parc.com (Bill Janssen) Date: Mon, 10 Mar 2008 10:02:01 PDT Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: <08Mar10.100208pdt."58696"@synergy1.parc.xerox.com> > Is it time there was a JSON codec included in the python standard library? Great idea. In fact, I'd support including a whole ECMAscript interpreter module, much as we have XML parsers. Bill From guido at python.org Mon Mar 10 21:59:13 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Mar 2008 13:59:13 -0700 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <1205165339.12593.1241570907@webmail.messagingengine.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <1205165339.12593.1241570907@webmail.messagingengine.com> Message-ID: On Mon, Mar 10, 2008 at 9:08 AM, Jonathan Ellis wrote: > On Mon, 10 Mar 2008 09:37:35 -0400, "Mark Ramm" > said: > > > Is it time there was a JSON codec included in the python standard library? > > > > I would definitely support the incusion of a JSON library in the > > standard lib. And, I think that it should be simplejson which is > > used by TurboGears, Pylons, and bundled with Django. > > +1 +1 here too. Brett should probably figure out where to put it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mdipierro at cs.depaul.edu Tue Mar 11 03:04:03 2008 From: mdipierro at cs.depaul.edu (Massimo Di Pierro) Date: Mon, 10 Mar 2008 21:04:03 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: <6A302A46-EFBA-4A97-894C-4B31946E768C@cs.depaul.edu> I agree. simplejson is used by web2py as well. Massimo On Mar 10, 2008, at 8:37 AM, Mark Ramm wrote: >> Is it time there was a JSON codec included in the python standard >> library? 
> > I would definitely support the incusion of a JSON library in the > standard lib. And, I think that it should be simplejson which is > used by TurboGears, Pylons, and bundled with Django. > >> Choosing a Python JSON Translator >> http://blog.hill-street.net/?p=7 > > This blog is a year old, and isn't quite accurate anymore. in > particular simplejson sprouted some optional C code, and is now a lot > faster. > > --Mark Ramm > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/ > mdipierro%40cti.depaul.edu From ubernostrum at gmail.com Tue Mar 11 03:23:47 2008 From: ubernostrum at gmail.com (James Bennett) Date: Mon, 10 Mar 2008 21:23:47 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> Message-ID: <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> On Mon, Mar 10, 2008 at 8:37 AM, Mark Ramm wrote: > I would definitely support the incusion of a JSON library in the > standard lib. And, I think that it should be simplejson which is > used by TurboGears, Pylons, and bundled with Django. I'd tentatively agree, though I recall seeing a post not long ago (which I am currently unable to find) from the author of jsonlib lamenting the fact that most of the other JSON modules for Python had various significant inconsistencies with the RFC. While authors of competing tools should be taken with a grain of salt, I do think compliance with the spec is an important factor for any particular module that might be blessed with stdlib membership, and so should play a bigger role in any such decision than mere benchmark speed. -- "Bureaucrat Conrad, you are technically correct -- the best kind of correct." From guido at python.org Tue Mar 11 03:32:38 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Mar 2008 19:32:38 -0700 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> Message-ID: On Mon, Mar 10, 2008 at 7:23 PM, James Bennett wrote: > On Mon, Mar 10, 2008 at 8:37 AM, Mark Ramm wrote: > > I would definitely support the incusion of a JSON library in the > > standard lib. And, I think that it should be simplejson which is > > used by TurboGears, Pylons, and bundled with Django. > > I'd tentatively agree, though I recall seeing a post not long ago > (which I am currently unable to find) from the author of jsonlib > lamenting the fact that most of the other JSON modules for Python had > various significant inconsistencies with the RFC. While authors of > competing tools should be taken with a grain of salt, I do think > compliance with the spec is an important factor for any particular > module that might be blessed with stdlib membership, and so should > play a bigger role in any such decision than mere benchmark speed. Well, so fix this. How hard can it be? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From graham.dumpleton at gmail.com Tue Mar 11 03:38:39 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 11 Mar 2008 13:38:39 +1100 Subject: [Web-SIG] Time a for JSON parser in the standard library? 
In-Reply-To: <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> Message-ID: <88e286470803101938q30d37d44t5fde6b1e85d1bf9d@mail.gmail.com> On 11/03/2008, James Bennett wrote: > On Mon, Mar 10, 2008 at 8:37 AM, Mark Ramm wrote: > > I would definitely support the incusion of a JSON library in the > > standard lib. And, I think that it should be simplejson which is > > used by TurboGears, Pylons, and bundled with Django. > > > I'd tentatively agree, though I recall seeing a post not long ago > (which I am currently unable to find) from the author of jsonlib > lamenting the fact that most of the other JSON modules for Python had > various significant inconsistencies with the RFC. While authors of > competing tools should be taken with a grain of salt, I do think > compliance with the spec is an important factor for any particular > module that might be blessed with stdlib membership, and so should > play a bigger role in any such decision than mere benchmark speed. The problem with the JSON 1.0 specification was that it wasn't always as clear as could have been. As a result different server side implementations interpreted it differently, as did the JavaScript clients. I'll admit that it has been a while since I looked at it and maybe things have improved, but certainly it used to be the case that finding a JavaScript library that talked to a specific server side implementation wasn't always easy. End result was that the JavaScript library would often only work with the specific web framework it was originally designed for and nothing else. The problem areas were, different interpretations of what could be supplied in an error response. Whether an integer, string or arbitrary object could be supplied as the id attribute in a request. Finally, some JavaScript clients would only work with a server side implementation which provided introspection methods as they would dynamically create a JavaScript proxy object based on a call of the introspection methods. Unfortunately the JSON 1.1 draft specification didn't necessarily make things better. Rather than creating a proper layered specification which separate lower level transport and encoding concerns from higher level application concepts such as introspection they bundle it all together. Thus they try to enforce that a server must support introspection even though doing so may be totally impractical depending on what the JSON server adapter is hooking in to. They also introduced all this muck about having to support both positional and named parameters at the same time. As well as all that, I also recollect seeing some complaints about servers handing character encoding wrongly. This may be the thing you are talking about. Thus my question is, what version of the JSON specification are you intending to support. Graham From ubernostrum at gmail.com Tue Mar 11 03:50:35 2008 From: ubernostrum at gmail.com (James Bennett) Date: Mon, 10 Mar 2008 21:50:35 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> Message-ID: <21787a9f0803101950s4f6fb41eubdd7ee56462a6587@mail.gmail.com> On Mon, Mar 10, 2008 at 9:32 PM, Guido van Rossum wrote: > Well, so fix this. How hard can it be? 
A bit of poking around turned up the post I was looking for: http://jmillikin.blogspot.com/2008/02/python-json-catastrophe.html Seems like his beef with simplejson is mostly Unicode/encoding handling; the floating-point stuff is a bit more debatable wrt to the spec, because rfc4627 doesn't say anything about how to handle these aside from saying that a "number" in JSON is allowed to contain a decimal point followed by more digits. Since the post is only a couple weeks old, I'm assuming that the Unicode stuff is current, so if the consensus is in favor of simplejson I suppose that'd be the area to concentrate on. -- "Bureaucrat Conrad, you are technically correct -- the best kind of correct." From mark.mchristensen at gmail.com Tue Mar 11 03:51:06 2008 From: mark.mchristensen at gmail.com (Mark Ramm) Date: Mon, 10 Mar 2008 22:51:06 -0400 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> Message-ID: > Well, so fix this. How hard can it be? A google search for "jsonlib rfc" turns up this article: http://jmillikin.blogspot.com/2008/02/python-json-catastrophe.html And it looks like the two issues with simplejson are: 1) decoding JSON with unicode code-points outside the Basic Multilingual Plane 2) Decoding the json 1.1 as a python float rather than a decimal I think the author could very well be incorrect in believing that the spec requires using a decimal, so that just leaves the Unicode issue, which isn't very clearly defined in the article, but ought to be reasonably easy to fix. The spec isn't as clear as it could be about a lot of issues, but I've had no problems with simplejson and interoperability with the various javascript libraries. I've also used simplejson with one or two of the ruby json libraries, flex's json lib, and a couple of libraries for other languages, with no particular issues. That's not to say there are no bugs, but I don't think there are too many issues buried in there. --Mark From mdipierro at cs.depaul.edu Tue Mar 11 05:55:04 2008 From: mdipierro at cs.depaul.edu (Massimo Di Pierro) Date: Mon, 10 Mar 2008 23:55:04 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> Message-ID: <2A6BD2D2-0DBB-453C-8994-114A5AD32391@cs.depaul.edu> It would also be nice to have a common interface to all modules that do serialization. For example pickle, cPickle, marshall has dumps, so json should also have dumps. Massimo On Mar 10, 2008, at 9:51 PM, Mark Ramm wrote: >> Well, so fix this. How hard can it be? > > A google search for "jsonlib rfc" turns up this article: > > http://jmillikin.blogspot.com/2008/02/python-json-catastrophe.html > > And it looks like the two issues with simplejson are: > > 1) decoding JSON with unicode code-points outside the Basic > Multilingual Plane > 2) Decoding the json 1.1 as a python float rather than a decimal > > I think the author could very well be incorrect in believing that the > spec requires using a decimal, so that just leaves the Unicode issue, > which isn't very clearly defined in the article, but ought to be > reasonably easy to fix. > > The spec isn't as clear as it could be about a lot of issues, but I've > had no problems with simplejson and interoperability with the various > javascript libraries. 
I've also used simplejson with one or two of > the ruby json libraries, flex's json lib, and a couple of libraries > for other languages, with no particular issues. That's not to say > there are no bugs, but I don't think there are too many issues buried > in there. > > --Mark > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/ > mdipierro%40cti.depaul.edu From titus at caltech.edu Tue Mar 11 05:57:09 2008 From: titus at caltech.edu (Titus Brown) Date: Mon, 10 Mar 2008 21:57:09 -0700 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <2A6BD2D2-0DBB-453C-8994-114A5AD32391@cs.depaul.edu> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> <2A6BD2D2-0DBB-453C-8994-114A5AD32391@cs.depaul.edu> Message-ID: <20080311045709.GA15552@caltech.edu> On Mon, Mar 10, 2008 at 11:55:04PM -0500, Massimo Di Pierro wrote: -> It would also be nice to have a common interface to all modules that -> do serialization. For example pickle, cPickle, marshall has dumps, so -> json should also have dumps. Doesn't it? Or did you want something additional? http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html --titus From pywebsig at xhaus.com Tue Mar 11 10:47:33 2008 From: pywebsig at xhaus.com (Alan Kennedy) Date: Tue, 11 Mar 2008 09:47:33 +0000 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <88e286470803101938q30d37d44t5fde6b1e85d1bf9d@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> <88e286470803101938q30d37d44t5fde6b1e85d1bf9d@mail.gmail.com> Message-ID: <4a951aa00803110247j4ee4cfa8l84bac6050eb5d2a4@mail.gmail.com> [Graham] > The problem areas were, different interpretations of what could be > supplied in an error response. Whether an integer, string or arbitrary > object could be supplied as the id attribute in a request. Finally, > some JavaScript clients would only work with a server side > implementation which provided introspection methods as they would > dynamically create a JavaScript proxy object based on a call of the > introspection methods. These are JSON-RPC concerns, and nothing to do with JSON text de/serialization. I do believe we're only discussing JSON<->python objects transformation, in this thread at least. > Unfortunately the JSON 1.1 draft specification didn't necessarily make > things better. There is no JSON 1.1 spec; but there is a JSON-RPC 1.1 spec. http://json-rpc.org/wiki/specification > Thus my question is, what version of the JSON specification are you > intending to support. The one specified in RFC 4627 http://www.ietf.org/rfc/rfc4627.txt Regards, Alan. From graham.dumpleton at gmail.com Tue Mar 11 10:56:10 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 11 Mar 2008 20:56:10 +1100 Subject: [Web-SIG] Time a for JSON parser in the standard library? 
In-Reply-To: <4a951aa00803110247j4ee4cfa8l84bac6050eb5d2a4@mail.gmail.com> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> <88e286470803101938q30d37d44t5fde6b1e85d1bf9d@mail.gmail.com> <4a951aa00803110247j4ee4cfa8l84bac6050eb5d2a4@mail.gmail.com> Message-ID: <88e286470803110256y3a520066p92c49a2d83f8e311@mail.gmail.com> On 11/03/2008, Alan Kennedy wrote: > [Graham] > > > The problem areas were, different interpretations of what could be > > supplied in an error response. Whether an integer, string or arbitrary > > object could be supplied as the id attribute in a request. Finally, > > some JavaScript clients would only work with a server side > > implementation which provided introspection methods as they would > > dynamically create a JavaScript proxy object based on a call of the > > introspection methods. > > > These are JSON-RPC concerns, and nothing to do with JSON text de/serialization. > > I do believe we're only discussing JSON<->python objects > transformation, in this thread at least. Okay. No problem then. :-) Graham > > Unfortunately the JSON 1.1 draft specification didn't necessarily make > > things better. > > > There is no JSON 1.1 spec; but there is a JSON-RPC 1.1 spec. > > http://json-rpc.org/wiki/specification > > > > Thus my question is, what version of the JSON specification are you > > intending to support. > > > The one specified in RFC 4627 > > http://www.ietf.org/rfc/rfc4627.txt > > Regards, > > > Alan. > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > From pywebsig at xhaus.com Tue Mar 11 11:30:30 2008 From: pywebsig at xhaus.com (Alan Kennedy) Date: Tue, 11 Mar 2008 10:30:30 +0000 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <2A6BD2D2-0DBB-453C-8994-114A5AD32391@cs.depaul.edu> References: <4a951aa00803100521y282d5b5bw756d7fc4652c2afc@mail.gmail.com> <21787a9f0803101923k5dc75bd0w456f13407f5ff8e4@mail.gmail.com> <2A6BD2D2-0DBB-453C-8994-114A5AD32391@cs.depaul.edu> Message-ID: <4a951aa00803110330p5a18ff0ey5214861a4970e504@mail.gmail.com> [Massimo] > It would also be nice to have a common interface to all modules that > do serialization. For example pickle, cPickle, marshall has dumps, so > json should also have dumps. Indeed, this is my primary concern also. The reason is that I have a pure-java JSON codec for jython, that I will either publish separately or contribute to jython itself. If we're going to have the facility in both cpython and jython (and probably ironpython, etc), then it would be optimal to have a compatible API so that we have full interoperability. And given that we in jython land are always left implementing cpython APIs (which are not necessarily always the optimal design for jython) it would be nice if we could agree on APIs, etc, *before* stuff goes into the standard library. The API for my codec is slightly different from simplejson, although it could be made the same with a little work, including exception signatures, etc. But there are some things about my own design that I like. For example, simplejson allows override of the JSON output representing certain objects, by the use of subclasses of JSONEncoder. 
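As a concrete illustration of that hook (the DateEncoder name and the ISO-8601 output are only examples, not anything simplejson prescribes):

    import datetime
    import simplejson

    class DateEncoder(simplejson.JSONEncoder):
        def default(self, obj):
            # Called for objects the encoder does not otherwise know how
            # to serialize.
            if isinstance(obj, (datetime.date, datetime.datetime)):
                return obj.isoformat()
            return simplejson.JSONEncoder.default(self, obj)

    print DateEncoder().encode({'posted': datetime.date(2008, 3, 17)})
    # -> {"posted": "2008-03-17"}
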
My design does it differently; it simply looks for a "__json__()" callable on every object being serialised, and if found, calls it and uses its return value to represent the object. I have no equivalent of simplejson's decoding extensions. Another difference is the set of options. Simplejson has options to control parsing and generation, and so does mine. But the sets of options are different, e.g. simplejson has no option to permit/reject dangling commas (e.g. "[1,2,3,]")*, whereas mine has no support for accepting NaN, infinity, etc, etc. On the encoding side, I simply make the assumption that all character transcoding has happened before the JSON text reaches the JSON parser. (I think this is a reasonable assumption, given that byte streams are always associated with file storage, network transmission, etc, and only the programmer has access to the relevant encoding information). But given that RFC 4627 specifies how to guess encoding of JSON byte streams, I'll probably change that policy. Lastly, another area of potential cooperation is testing: I have over 100 unit-tests, with fairly extensive coverage. I think that test coverage is very important in the case of JSON; you can never have too many tests. So, what is the best way to go about agreeing on the best API? 1. Discussion on web-sig? 2. Discussion on stdlib-sig? 3. Collaborative authoring/discussion on a WIKI page? 4. ???? Regards, Alan. * Which can mean different things to different software. Some javascript interpreters interpret it as a 4 element list (inferring the last object between the comma and the closing square bracket as a null) , others as a 3 element list. Python obviously interprets it as a 3-element list. So the general internet maxim "be liberal in what you accept and strict in what produce" applies. My API gives control of this strictness/relaxedness to the user. From petite.abeille at gmail.com Tue Mar 11 21:07:14 2008 From: petite.abeille at gmail.com (Petite Abeille) Date: Tue, 11 Mar 2008 21:07:14 +0100 Subject: [Web-SIG] [OT] HTTP headers status diagram Message-ID: <46EA39C8-F7BD-4BAB-852A-5601687BA09B@gmail> To paraphrase Woody Allen: "Everything you always wanted to know about HTTP headers status, but were afraid to ask" Alternatively: "An activity diagram to describe the resolution of HTTP response status codes, given various headers." http://thoughtpad.net/alan-dean/http-headers-status.html http://thoughtpad.net/alan-dean/http-headers-status.gif Cheers, -- PA http://alt.textdrive.com/nanoki/ From deron.meranda at gmail.com Thu Mar 13 21:09:16 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Thu, 13 Mar 2008 15:09:16 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? Message-ID: <5c06fa770803131309o1118a677w5fe604ed735ae010@mail.gmail.com> (I just joined this list, so this reponse may not be threaded properly) With regards to JSON libraries, most of them I've looked at are very far away from implementing RFC 4627. For many uses this might not be a big deal, but for anything going into the Python standard library it is much more important. I've just released a new version of my own library (demjson) which is on PyPI, or at http://deron.meranda.us/python/demjson/ I wrote this one to attempt to be as strictly conforming to the RFC as I could make it. My latest version also comes with a lint-like script which can validate JSON data. It should handle all of the Unicode issues too. 
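The non-BMP question raised earlier in the thread can be pinned down with a short round-trip check along these lines (a sketch, not taken from any of the libraries under discussion; U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane, so a narrow/UCS-2 build stores it as a surrogate pair, and RFC 4627 section 2.5 uses exactly this character as its escaped-pair example, "\uD834\uDD1E"):

    import simplejson

    clef = u'\U0001d11e'      # G clef, U+1D11E, outside the BMP

    # Encoding and then decoding should round-trip the character.
    assert simplejson.loads(simplejson.dumps(clef)) == clef

    # Decoding the escaped surrogate pair from RFC 4627 should give the
    # same character back.
    assert simplejson.loads('"\\uD834\\uDD1E"') == clef
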
I've also made it very fast (for a pure Python implementation), and comes close some of the C implementation; while still being RFC strict. Right now its under the GPL 3; but if anybody wanted to consider it for inclusion (or a derivative of it), I would be agreeable to a more Pythonic license change. The big issue I've not yet addressed at this time is Python 3000 support. Mainly because the semantics of dealing with JSON data necessarily should change to reflect the new bytes type. -- Deron Meranda From fumanchu at aminus.org Fri Mar 14 04:28:52 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Thu, 13 Mar 2008 20:28:52 -0700 Subject: [Web-SIG] Web Dev Pad References: Message-ID: Just an update and reminder. We're in room 1021 of the Marriott Renaissance through Sunday night. Looks like we have a good-sized crew already and it's only Thursday. :) Call anytime! Robert Brewer fumanchu at aminus.org ________________________________ From: web-sig-bounces+fumanchu=aminus.org at python.org on behalf of Robert Brewer Sent: Sat 3/8/2008 12:58 PM To: web-sig at python.org Subject: [Web-SIG] Web Dev Pad Hello, all you Python web tool developers! Like last year, Chad Whitacre (author of Aspen) and I (lead dev of CherryPy) are going to get a suite and run the Web Dev Pad again. Come on by any evening from Thursday night to Sunday night--we'll be up late serving the three M's (mudslides, margaritas, and martinis) and plenty of camaraderie in our living room at the Marriott Renaissance [1]. It's a few blocks from the conference hotel but well worth the trip; try to call if it's before 8pm to make sure we're not both out to dinner. We're open as late as you like to anyone who works on web libraries, servers, or frameworks (or their friends; don't let domain boundaries stop you ;). Robert Brewer fumanchu at aminus.org 619 374 1117 [1] http://www.marriott.com/hotels/travel/chibr-renaissance-chicago-ohare-su ites-hotel/ _______________________________________________ Web-SIG mailing list Web-SIG at python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/fumanchu%40aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/web-sig/attachments/20080313/e17bff73/attachment.htm From bob at redivi.com Sun Mar 16 19:47:14 2008 From: bob at redivi.com (Bob Ippolito) Date: Sun, 16 Mar 2008 13:47:14 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? Message-ID: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> I wasn't subscribed to the list at the time this came up, but I'm all for getting simplejson into the stdlib. We use it a lot here at Mochi Media and we're willing to support it in or out of the stdlib. I'm willing to sign over copyright to the PSF and/or relicense and help with whatever else needs to happen, just let me know. As far as the reading of non-BMP unicode goes, give me a test and we'll fix it. So far nobody has sent any bug reports related to unicode, so I'm willing to wager that nobody actually cares, but I'd still like to do it correctly. I haven't tried very hard because I use a UCS2 build of Python on my workstation and I don't know enough about unicode edge cases to craft a proper suite. Practically speaking I've tried using decimal instead of float for JSON and it's generally The Wrong Thing To Do. 
The spec doesn't say what to do about numbers, but for proper JavaScript interaction you want to do things that approximate what JS is going to do: 64-bit floating point. Encoding floats with repr might be the wrong thing to do because the representation is long, but ideally repr in Python would use a better algorithm[1]. Changing from repr to str is trivial and beneficial so I've already put that on the trunk. [1] http://www.cs.indiana.edu/~burger/fp/index.html -bob From pywebsig at xhaus.com Sun Mar 16 23:35:26 2008 From: pywebsig at xhaus.com (Alan Kennedy) Date: Sun, 16 Mar 2008 22:35:26 +0000 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> Message-ID: <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> [Deron] > (I just joined this list, so this reponse may not be threaded properly) [Bob] > I wasn't subscribed to the list at the time this came up, but I'm all > for getting simplejson into the stdlib. Well, it appears we have a quorum of JSON<->python codec writers, since I've written a jython module that I'd like to interoperate with cpython codecs. I think it's appropriate for any discussions of JSON to take place on the web-sig. I've been thinking about how to take this forward. I see two ways Formal approach ============ Introduce a "Standards Track" Library PEP, which is designed for the purpose of bringing a new module through a full peer-review process and into the python standard library. (Which means we in jython and ironpython land should also then provide it). This would have the following outcomes - Result in a single JSON implementation going into the cpython standard library, possibly in Python 3000 - Expose the new module to full community review/bug-tracking/modification - Opportunity to thrash out all of the finer points of JSON<->python transcoding, including but not limited to - NaN, Infinity, etc - What is the most appropriate number/integer/float/double/decimal representation - Structural strictness, e.g. junk after document body, dangling commas, etc. - BMP support - Byte encoding detection - Python 3000 support - Standardise the interface, de facto However, this option is somewhat complicated by the fact that we seem to have TWO quality cpython implementations competing for a place in the cpython standard library. Also, I think the PEP process might be a little cumbersome for this topic, given that the PEP process involves commit rights to the cpython source tree (since the proposal for a new module should be accompanied by the source code of the proposed implementation). Informal approach ============= Develop and document a standard interface, and ensure that all of our modules support it. This interface would define method, class and exception names. Standard methods would probably "load" and "dump" objects, possibly creating "JSONEncoder"s and "JSONDecoder"s to do the job: "JSONException" and subclasses thereof would signify errors. Perhaps a standard mechanism to retrieve the location of errors, e.g. line and column, would be appropriate? Perhaps a standard set of feature/option names could be agreed, e.g. "accept_NaN", etc. User code written to this standard could move reasonably easily between implementations, or indeed between platforms. This approach has the benefits that - Authors are free to interpret edge cases as they see fit, and provide options. 
- Competing implementations can continue to improve in the field - Changing implementations could be as simple as using a different egg (Although an exhaustive set of test cases covering the required behaviour is recommended) We could call it PAJ, Python Api for Json, or some such. I feel the informal option is more appropriate. It could be effectively managed on a wiki page. Or perhaps a ticketing system (e.g. TRAC) would be good for tracking detailed discussions of JSON's many edge cases, etc. I would be willing to start a wiki page with details about a putative module interface. Finally, at this stage I think speed is less of a concern; correctness is more important for now. As Aahz is fond of quoting, "It is easier to optimize correct code than to correct optimized code". Thoughts? Alan. From bob at redivi.com Sun Mar 16 23:52:26 2008 From: bob at redivi.com (Bob Ippolito) Date: Sun, 16 Mar 2008 17:52:26 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> Message-ID: <6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com> On Sun, Mar 16, 2008 at 5:35 PM, Alan Kennedy wrote: > [Deron] > > (I just joined this list, so this reponse may not be threaded properly) > > [Bob] > > > I wasn't subscribed to the list at the time this came up, but I'm all > > for getting simplejson into the stdlib. > > Well, it appears we have a quorum of JSON<->python codec writers, > since I've written a jython module that I'd like to interoperate with > cpython codecs. I think it's appropriate for any discussions of JSON > to take place on the web-sig. > > I've been thinking about how to take this forward. I see two ways Are there *really* competing implementations? I mean, it seems that pretty much everyone uses simplejson, at least in the web framework world. If someone writes a test for BMP, we'll fix it. As far as byte encoding detection I think that's beyond the scope of a JSON implementation and I think it's unnecessary in the first place. As far as Jython support goes, I suppose that's probably fixable without too much effort. I would imagine that the problems are just in the decoder, because of the sre_* module (ab)use. Was there some other reason for writing a Jython-specific codec? -bob From guido at python.org Mon Mar 17 00:18:37 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 16 Mar 2008 18:18:37 -0500 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> <6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com> Message-ID: On Sun, Mar 16, 2008 at 5:52 PM, Bob Ippolito wrote: > On Sun, Mar 16, 2008 at 5:35 PM, Alan Kennedy wrote: > > [Deron] > > > (I just joined this list, so this reponse may not be threaded properly) > > > > [Bob] > > > > > I wasn't subscribed to the list at the time this came up, but I'm all > > > for getting simplejson into the stdlib. > > > > Well, it appears we have a quorum of JSON<->python codec writers, > > since I've written a jython module that I'd like to interoperate with > > cpython codecs. I think it's appropriate for any discussions of JSON > > to take place on the web-sig. 
> >
> > I've been thinking about how to take this forward. I see two ways
>
> Are there *really* competing implementations? I mean, it seems that
> pretty much everyone uses simplejson, at least in the web framework
> world. If someone writes a test for BMP, we'll fix it. As far as byte
> encoding detection I think that's beyond the scope of a JSON
> implementation and I think it's unnecessary in the first place.

I'm reading Alan's post as saying that he has a competing implementation.

But I believe that both approaches Alan offers are overkill: he
proposes something like the db-API, where there are multiple
implementations of the same API (except in the case of the db-API they
aren't necessarily 100% interchangeable).

I'm sorry for Alan, but I'd much rather pick one implementation
(simplejson), fix 1-2 minor issues with it, and put it into the
standard library. Anything else is just a lot of opportunity for
endless debate without much benefit.

> As far as Jython support goes, I suppose that's probably fixable
> without too much effort. I would imagine that the problems are just in
> the decoder, because of the sre_* module (ab)use. Was there some other
> reason for writing a Jython-specific codec?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From bob at redivi.com Mon Mar 17 15:58:31 2008
From: bob at redivi.com (Bob Ippolito)
Date: Mon, 17 Mar 2008 07:58:31 -0700
Subject: [Web-SIG] JSON object names (was Re: Time a for JSON parser in the standard library?)
In-Reply-To: <47DE855D.3050504@pollenation.net>
References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com>
	<47DE855D.3050504@pollenation.net>
Message-ID: <6a36e7290803170758v362a1adexa8c1e3730d588843@mail.gmail.com>

On Mon, Mar 17, 2008 at 7:51 AM, Matt Goodall wrote:
> Hi,
>
> One thing I keep meaning to mention, prompted by the possibility of
> simplejson being sucked into the std lib, is the handling of JSON object
> names.
>
> "An object structure is represented as a pair of curly brackets
> surrounding zero or more name/value pairs (or members). ***A name is
> a string.*** A single colon comes after each name, separating the
> name from the value." (My emphasis added.)
>
> I noticed simplejson (and others, I suspect) allow more types than just
> a string to be given as a name, although they're always deserialised to
> unicode instances:
>
> >>> loads(dumps({'s': None}))
> {u's': None}
> >>> loads(dumps({1: None}))
> {u'1': None}
> >>> loads(dumps({None: None}))
> {u'null': None}
> >>> loads(dumps({True: None}))
> {u'True': None}
> >>>
>
> Am I reading the spec correctly? If so, is it worth explicitly
> disallowing anything other than a string when serializing dict keys
> before anything gets added to the std lib?
>
>
> I guess the realy question is, has this been a problem to those who use
> JSON a lot to make it worth changing?

I chose to make simplejson behave the way it does because it mirrored
what happens in JavaScript, and it was practical to do so. It's often
useful to (at least) have a number->object mapping without having to
do dict((str(k), v) for k,v in d.iteritems()). Having it do this for
True, False, and None makes somewhat less sense but I haven't had
anyone complain.

-bob

From matt at pollenation.net Mon Mar 17 15:51:09 2008
From: matt at pollenation.net (Matt Goodall)
Date: Mon, 17 Mar 2008 14:51:09 +0000
Subject: [Web-SIG] JSON object names (was Re: Time a for JSON parser in the standard library?)
In-Reply-To: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> Message-ID: <47DE855D.3050504@pollenation.net> Hi, One thing I keep meaning to mention, prompted by the possibility of simplejson being sucked into the std lib, is the handling of JSON object names. "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). ***A name is a string.*** A single colon comes after each name, separating the name from the value." (My emphasis added.) I noticed simplejson (and others, I suspect) allow more types than just a string to be given as a name, although they're always deserialised to unicode instances: >>> loads(dumps({'s': None})) {u's': None} >>> loads(dumps({1: None})) {u'1': None} >>> loads(dumps({None: None})) {u'null': None} >>> loads(dumps({True: None})) {u'True': None} >>> Am I reading the spec correctly? If so, is it worth explicitly disallowing anything other than a string when serializing dict keys before anything gets added to the std lib? I guess the realy question is, has this been a problem to those who use JSON a lot to make it worth changing? - Matt From benji at benjiyork.com Mon Mar 17 16:42:22 2008 From: benji at benjiyork.com (Benji York) Date: Mon, 17 Mar 2008 11:42:22 -0400 Subject: [Web-SIG] JSON object names (was Re: Time a for JSON parser in the standard library?) In-Reply-To: <47DE855D.3050504@pollenation.net> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> <47DE855D.3050504@pollenation.net> Message-ID: <47DE915E.6020702@benjiyork.com> Matt Goodall wrote: > I noticed simplejson (and others, I suspect) allow more types than just > a string to be given as a name, although they're always deserialised to > unicode instances: I suspect the intent is to mirror JavaScript's propensity for coercion. For example, evaluating {1: 2}['1'] results in the integer 2. -- Benji York http://benjiyork.com From jmillikin at gmail.com Wed Mar 19 16:26:00 2008 From: jmillikin at gmail.com (John Millikin) Date: Wed, 19 Mar 2008 07:26:00 -0800 Subject: [Web-SIG] Time a for JSON parser in the standard library? Message-ID: <3283f7fe0803190826u18356161h4d5ff53cf0336db2@mail.gmail.com> I am the author of jsonlib. Apologies for not replying to this thread earlier; I had not realized that JSON was considered a web-side technology, as I use it primarily for serialization of simple data sets between Python, Java, and C. First, I would like to state that my purpose for writing jsonlib was not to become the maintainer of yet another (de)serializer, but merely to solve my immediate problem of no JSON libraries that handle Unicode correctly. Any solution that results in a standard JSON library I fully support. At the worst, if Python decides to go down the route of the super-forgiving parsers I will simply maintain jsonlib as a separate library. > I think the author could very well be incorrect in believing that the > spec requires using a decimal > The spec does not require a decimal, but I dislike losing information in the parsing stage. Any implementation in the standard library should, in my opinion, at least offer a parameter for lossless parsing of number values. > On the encoding side, I simply make the assumption that all character > transcoding has happened before the JSON text reaches the JSON parser. 
> (I think this is a reasonable assumption, given that byte streams are
> always associated with file storage, network transmission, etc, and
> only the programmer has access to the relevant encoding information).
> But given that RFC 4627 specifies how to guess encoding of JSON byte
> streams, I'll probably change that policy.
>
jsonlib has an encoding autodetection feature. It's not particularly
hard, except that Python 2.5 (the version I use) does not have a codec
for UTF-32, so I had to hand-roll one using the struct module.

Finally, as regards APIs, I very much like the simplejson API and have
attempted to model the options of jsonlib's write() function after
simplejson's dumps. My reason for using read() and write() is due to
python-json, which is the first JSON library I used.

From deron.meranda at gmail.com Thu Mar 20 21:37:24 2008
From: deron.meranda at gmail.com (Deron Meranda)
Date: Thu, 20 Mar 2008 16:37:24 -0400
Subject: [Web-SIG] Time a for JSON parser in the standard library?
In-Reply-To:
References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com>
	<4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com>
	<6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com>
Message-ID: <5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com>

On Sun, Mar 16, 2008 at 7:18 PM, Guido van Rossum wrote:
> I'm reading Alan's post as saying that he has a competing implementation.

Yes, there are several JSON implementations now, some
better than others.

I finally sat down and put the five or so top JSON libraries
to the test so we can all see what's what. I've put everything
in a report here:

http://deron.meranda.us/python/comparing_json_modules/

I have tried to be very rigorous. There are probably mistakes
in there, so let me know if anybody finds any. Also, if any
of you module authors update your code and want me to
re-do the tests against a newer version, let me know.

I do have to say I'm glad to see that many of the implementations
have been getting much much better since I first checked out
the scene a year ago. Yes, my module is among those, but I
don't particularly care whose we use (or derive from) for
inclusion in Python, as long as we can clean up any warts or
issues with non-conformance.

Seeing all these compared in one place might give all of us
ideas for a better approach before anything becomes an
official Python component. I've certainly learned a few things
I could do better with mine by looking at everybody else's.

I do think, though, that if this is targeted for Python 3,
none of the modules really works well. We should really
design an interface that uses the bytes type rather
than str for pushing around encoded JSON data.

> But I believe that both approaches Alan offers are overkill: he
> proposes something like the db-API, where there are multiple
> implementations of the same API (except in the case of the db-API they
> aren't necessarily 100% interchangeable).

I agree, let's not go down the db-API path! It's full of mud.
-- 
Deron Meranda

From bob at redivi.com Thu Mar 20 23:03:37 2008
From: bob at redivi.com (Bob Ippolito)
Date: Thu, 20 Mar 2008 15:03:37 -0700
Subject: [Web-SIG] Time a for JSON parser in the standard library?
In-Reply-To: <5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> <6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com> <5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com> Message-ID: <6a36e7290803201503k3b75ca2bqe3d8c69f0d065c4a@mail.gmail.com> On Thu, Mar 20, 2008 at 1:37 PM, Deron Meranda wrote: > On Sun, Mar 16, 2008 at 7:18 PM, Guido van Rossum wrote: > > I'm reading Alan's post as saying that he has a competing implementation. > > Yes, there are several JSON implementation now, some > better than others. > > I finally sat down and put the five or so top JSON libraries > to the test so we can all see what's what. I've put everything > in a report here: > > http://deron.meranda.us/python/comparing_json_modules/ > > I have tried to be very rigorous. There's probably mistakes > in there, so let me know if anybody finds any. Also if any > of you module authors update your code and want me to > re-do the tests against a newer version let me know. This is very cool, but it's not reproducible. If you published the code that you used then it would help everyone out considerably. For example, with simplejson 1.7.5 I'm unable to reproduce the problems you're experiencing with http://deron.meranda.us/python/comparing_json_modules/strings in Table 1 (other than UserString of course). I've tried this both on 32-bit UCS-2 python (OS X) as well as 64-bit UCS4 python (x86-64 linux) and I get correct output. I'm still looking through the rest of the tests, so that might not be the only supposed problem that I'm unable to reproduce. > I do think though that if this is targeted for Python 3, that > none of the modules really works well. We should really > design an interface that uses the bytes type rather > than str for pushing around encoded JSON data. I can certainly agree with that. The scope of the changes wouldn't be that big though, at least for simplejson's default behavior (escape everything to ASCII). -bob From deron.meranda at gmail.com Thu Mar 20 23:38:28 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Thu, 20 Mar 2008 18:38:28 -0400 Subject: [Web-SIG] Time a for JSON parser in the standard library? In-Reply-To: <6a36e7290803201503k3b75ca2bqe3d8c69f0d065c4a@mail.gmail.com> References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com> <4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com> <6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com> <5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com> <6a36e7290803201503k3b75ca2bqe3d8c69f0d065c4a@mail.gmail.com> Message-ID: <5c06fa770803201538j69cd38a6rd7a7838b0212f4dc@mail.gmail.com> On Thu, Mar 20, 2008 at 6:03 PM, Bob Ippolito wrote: > On Thu, Mar 20, 2008 at 1:37 PM, Deron Meranda wrote: > > I finally sat down and put the five or so top JSON libraries > > to the test so we can all see what's what. I've put everything > > in a report here: > > > > http://deron.meranda.us/python/comparing_json_modules/ > This is very cool, but it's not reproducible. If you published the > code that you used then it would help everyone out considerably. Yes, I will try to do that. I just need to clean it up a bit so it's a bit less manual. > For example, with simplejson 1.7.5 I'm unable to reproduce the problems > you're experiencing with > http://deron.meranda.us/python/comparing_json_modules/strings in Table > 1 (other than UserString of course). 
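(For reference: RFC 4627 requires every code point below U+0020 inside
a string to be escaped, so ["\x1a"] should serialize as ["\u001a"],
never as a raw control character. The snippet below is only a minimal
sketch of that rule; escape_control() is a hypothetical helper written
for illustration, not code taken from simplejson or any of the other
modules under test.)

    # Hypothetical helper, for illustration only: RFC 4627 says all
    # code points below U+0020 inside a string must be \u-escaped.
    def escape_control(text):
        return u''.join(u'\\u%04x' % ord(ch) if ord(ch) < 0x20 else ch
                        for ch in text)

    assert escape_control(u'["\x1a"]') == u'["\\u001a"]'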
Do you mean tests 1-12 and 1-13 where simplejson is seemingly not \u
escaping U+001A through U+001F?

I think my table might be confusing. In those sets of tests I'm
looking at the JSON after it's been converted into UTF-8 (because
while it is sitting in memory as a "unicode" string, it's not
technically JSON). I talk about that on the next page (Unicode), but
that's not the same order you read things in... my fault; I'll try to
rewrite that a bit so it's clearer.

Anyway, to get JSON in UTF-8, I'm calling it like this:

   simplejson.dumps( ["\x1a"], ensure_ascii=False ).encode('utf8')

which on my system outputs this:

    '["\x1a"]'

rather than this:

    '["\\u001a"]'

If I change the ensure_ascii to its default of True, then I do get
the correct results.

Interestingly, it works correctly either way for all characters at or
below U+0019.

So is it just the way I'm calling it (perhaps not as intended), or do
we still see a difference between your system and mine?

I'll try to get some test framework code up within the next day.
-- 
Deron Meranda

From jmillikin at gmail.com Thu Mar 20 23:48:08 2008
From: jmillikin at gmail.com (John Millikin)
Date: Thu, 20 Mar 2008 14:48:08 -0800
Subject: [Web-SIG] Time a for JSON parser in the standard library?
In-Reply-To: <5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com>
References: <6a36e7290803161147m60a26eeajd514faca98b3e436@mail.gmail.com>
	<4a951aa00803161535i154d2fc2v7b7e443cede89d4b@mail.gmail.com>
	<6a36e7290803161552q27315924s24b1f0980d368b21@mail.gmail.com>
	<5c06fa770803201337k7bdf54b3o1fd333a648261657@mail.gmail.com>
Message-ID: <3283f7fe0803201548l7d986513w2189791665b0c3d8@mail.gmail.com>

On Thu, Mar 20, 2008 at 12:37 PM, Deron Meranda wrote:
> I finally sat down and put the five or so top JSON libraries
> to the test so we can all see what's what. I've put everything
> in a report here:
>
> http://deron.meranda.us/python/comparing_json_modules/
>

This is fantastic. My knowledge of other JSON modules was based mainly
on the comparison page from json.org, and yours is much more complete
and informative.

You could try adding a section to the numbers area about
Arabic/Chinese/whatever numbers, such as U+0661. These are not allowed
in JSON, but are accepted by parsers that use \d regex patterns with
the re.UNICODE flag set.

For strings, I would like to suggest that escaping "/" to "\\/" be
considered the norm, with deviations from this marked on the table.
This is to protect against foolish website authors including JSON
directly using a