From mi-mal at o2.pl Fri Jan 9 08:25:50 2004 From: mi-mal at o2.pl (Mimal) Date: Fri Jan 9 08:30:30 2004 Subject: [Web-SIG] CGI module problem: duplicated output Message-ID: Hello, I started to learn how to use python with CGI. I went through some tutorials, but then I found one problem, that seems to be something stupid. I tried to find answer using google, but I couldn't. This is my simple CGI script: #!/usr/bin/python import cgi print "Content-type: text/html\n" print "Hello, world!" After I run it under Apache I got (HTML source code): Hello, world!Content-type: text/html Hello, world! I tried to run it under bash console. I got this: Hello, world! Content-type: text/html Hello, world! That's very strange for me. I'm using Mandrake 9.2 + Apache 2 + Python 2.3, but the same problem occurs under WinNT + Python 2.1 and WinXP + Zope + Python 2.3. Thanks in advance for help! -- Mimal From jjl at pobox.com Fri Jan 9 08:57:35 2004 From: jjl at pobox.com (John J Lee) Date: Fri Jan 9 08:57:43 2004 Subject: [Web-SIG] CGI module problem: duplicated output In-Reply-To: References: Message-ID: On Fri, 9 Jan 2004, Mimal wrote: > I started to learn how to use python with CGI. I went through some > tutorials, but then I found one problem, that seems to be something > stupid. I tried to find answer using google, but I couldn't. Hi. This is the wrong list for requests for help with programming. The web-sig list is (was?) for discussion of additions to the Python standard library that have to do with the WWW. Try comp.lang.python (gatewayed to python-list@python.org). John From jeremy at alum.mit.edu Fri Jan 9 15:33:37 2004 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri Jan 9 18:30:27 2004 Subject: [Web-SIG] web-sig sprint at PyCon? Message-ID: <1073680417.6341.296.camel@localhost.localdomain> I notice that activity on this basically stopped in early December. 
It looks like a decent sent of goals has been outlined, but there's no one to manage the process or get started on the work. If there is any interest, it would be great to have a sprint at PyCon. There is space and wireless available for the four days before the main conference for people to work on Python projects. To make a sprint work, we'd need to find a sprint coach who can lead the effort -- provide some vision for what to do, help round up people to attend, etc. Any takers? Jeremy From janssen at parc.com Fri Jan 9 18:56:24 2004 From: janssen at parc.com (Bill Janssen) Date: Fri Jan 9 18:56:58 2004 Subject: [Web-SIG] web-sig sprint at PyCon? In-Reply-To: Your message of "Fri, 09 Jan 2004 12:33:37 PST." <1073680417.6341.296.camel@localhost.localdomain> Message-ID: <04Jan9.155634pst."58611"@synergy1.parc.xerox.com> > I notice that activity on this basically stopped in early December. Well, I think we just slowed down for the holidays. I'm planning to get working on it again after the 15th, when a couple of paper submission deadlines expire. Bill From jjl at pobox.com Fri Jan 9 20:27:39 2004 From: jjl at pobox.com (John J Lee) Date: Fri Jan 9 20:27:48 2004 Subject: [Web-SIG] web-sig sprint at PyCon? In-Reply-To: <1073680417.6341.296.camel@localhost.localdomain> References: <1073680417.6341.296.camel@localhost.localdomain> Message-ID: On Fri, 9 Jan 2004, Jeremy Hylton wrote: > I notice that activity on this basically stopped in early December. It > looks like a decent sent of goals has been outlined, but there's no one > to manage the process or get started on the work. Just a note that I've got an alpha version of a UserAgent class that was discussed here. I'm distributing it as part of this package http://wwwsearch.sf.net/mechanize/ It's a subclass of urllib2.OpenerDirector, and the base class of mechanize.Browser, and could be added to urllib2. 
Perhaps I should have finished urllib2.OpenerFactory instead of deriving my class from OpenerDirector, since my class has to have an ugly ._replace_handler() method. OTOH, having it be a subclass seems conceptually simpler: no new "layer" of code to add more things for the user to know about (and more code to write, perhaps). Also, my class mechanize.Browser seems to work nicely as a subclass of mechanize.UserAgent. There's also an untested http_get function in there (in _useragent.py, commented out), for doing an HTTP GET with a Range header. John From moof at metamoof.net Sat Jan 10 08:47:49 2004 From: moof at metamoof.net (Giles A. Radford) Date: Sat Jan 10 08:47:59 2004 Subject: [Web-SIG] web-sig sprint at PyCon? In-Reply-To: References: <1073680417.6341.296.camel@localhost.localdomain> Message-ID: <20040110134749.GA21078@www.abyss-uk.com> On Sat, Jan 10, 2004 at 01:27:39AM +0000, John J Lee wrote: > Just a note that I've got an alpha version of a UserAgent class that was > discussed here. I'm distributing it as part of this package > > http://wwwsearch.sf.net/mechanize/ Thanks John, I've actually been using mechanize.Browser all this past week (going through the various iterations thereof). It seems to be a quite nice solution, certainly to what I was after, and I'm happily automating my web access away now. The only major bug I came across was fixed before I got round to reporting it (the HTTP Referer thing), but I haven't been doing in depth testing, as it were. I suppose I'm writing this mail to say "Thanks. I'm using it, and it works great. Keep up the great work!", which is not said often enough to Open Source developers. Are you wanting any help in particular other than testing? Moof From jjl at pobox.com Sat Jan 10 08:56:29 2004 From: jjl at pobox.com (John J Lee) Date: Sat Jan 10 08:57:34 2004 Subject: [Web-SIG] web-sig sprint at PyCon? 
In-Reply-To: <20040110134749.GA21078@www.abyss-uk.com> References: <1073680417.6341.296.camel@localhost.localdomain> <20040110134749.GA21078@www.abyss-uk.com> Message-ID: On Sat, 10 Jan 2004, Giles A. Radford wrote: [...] > I suppose I'm writing this mail to say "Thanks. I'm using it, and it > works great. Keep up the great work!", which is not said often enough to > Open Source developers. Thanks! > Are you wanting any help in particular other than testing? I don't expect any major additions, so probably not (apart from integrating and developing DOMForm, but I need to do that myself). Actually, there is one thing, but I'll post that in a separate message... John From jjl at pobox.com Sat Jan 10 10:23:51 2004 From: jjl at pobox.com (John J Lee) Date: Sat Jan 10 10:24:14 2004 Subject: [Web-SIG] web-sig sprint at PyCon? In-Reply-To: References: <1073680417.6341.296.camel@localhost.localdomain> <20040110134749.GA21078@www.abyss-uk.com> Message-ID: On Sat, 10 Jan 2004, John J Lee wrote: [...] > Actually, there is one thing, but I'll post that in a separate message... No I won't. It was about HTMLParser being less liberal than sgmllib/htmllib, but it turns out the HTML in question is so ugly that it fits into the "leave it to tidylib" category. John From quentel.pierre at wanadoo.fr Sun Jan 11 16:05:17 2004 From: quentel.pierre at wanadoo.fr (Pierre Quentel) Date: Sun Jan 11 16:05:23 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? Message-ID: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL> Hello everybody, I am new to this web-sig mailing list, I was lead to it by Wilk after posting the suggestion below on comp.lang.python. If you have heard of Karrigell, one of the web frameworks in Python, I'm the one to blame for it... Well, here is my suggestion : -------------- Python standard library provides two modules for asynchronous socket programming : asyncore and asynchat. 
Several web servers have been built upon these modules (medusa being the best-known, I suppose) and are famous for their performance. Unfortunately, no example of use is provided in the standard library (whereas the more "classic" SocketServer is illustrated by BaseHTTPServer, SimpleHTTPServer, etc). I think it would be useful if Python came with a simple HTTP server written with these modules, to help beginners understand how to use them. I've written one, which handles GET and POST requests. It's inspired by (and partly copied from) the http subset of medusa, only reduced to fewer than 200 lines. It's called SimpleAsyncHTTPServer and is published on the ActiveState Python Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259148 Any thoughts? Pierre --------------- I've quickly read the archive of this mailing list; my proposal matches one of the items on Bill Janssen's page: "A standard server framework on the order of Medusa. This should support a standalone Python web server, with the ability to serve files, and the ability to add new handlers. Not sure it has to support CGI invocation. -- Bill Janssen " Perhaps this SimpleAsyncHTTPServer is a step in this direction? Regards, Pierre From janssen at parc.com Mon Jan 12 21:06:22 2004 From: janssen at parc.com (Bill Janssen) Date: Mon Jan 12 21:06:46 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: Your message of "Sun, 11 Jan 2004 13:05:17 PST." <002f01c3d886$9dce0dd0$c022fea9@QUENTEL> Message-ID: <04Jan12.180627pst."58611"@synergy1.parc.xerox.com> Yes, I think it's a good idea. I'd like to see something a bit more substantial, though, on the order of Medusa. Perhaps we could talk about what parts of Medusa could be skipped/re-written?
Bill From amk at amk.ca Tue Jan 13 07:35:07 2004 From: amk at amk.ca (A.M. Kuchling) Date: Tue Jan 13 07:36:26 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: <04Jan12.180627pst."58611"@synergy1.parc.xerox.com> References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL> <04Jan12.180627pst."58611"@synergy1.parc.xerox.com> Message-ID: <20040113123507.GA7812@rogue.amk.ca> On Mon, Jan 12, 2004 at 06:06:22PM -0800, Bill Janssen wrote: > Yes, I think it's a good idea. I'd like to see something a bit more > substantial, though, on the order of Medusa. Perhaps we could talk > about what parts of Medusa could be skipped/re-written? Graham Fawcett just made a subset of Medusa for inclusion in Quixote; it came out to four files (http_server.py, producers.py, logger.py and http_date.py). Possibly logger.py can be discarded, and if desired we could probably reduce the number of modules further by merging them. The Medusa HTTP server supports 1.1's keepalive, but not pipelining. --amk From jeremy at alum.mit.edu Tue Jan 13 09:46:03 2004 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Jan 13 09:51:07 2004 Subject: [Web-SIG] web-sig sprint at PyCon? In-Reply-To: <04Jan9.155634pst."58611"@synergy1.parc.xerox.com> References: <04Jan9.155634pst."58611"@synergy1.parc.xerox.com> Message-ID: <1074005161.6341.1911.camel@localhost.localdomain> On Fri, 2004-01-09 at 18:56, Bill Janssen wrote: > > I notice that activity on this basically stopped in early December. > > Well, I think we just slowed down for the holidays. I'm planning to > get working on it again after the 15th, when a couple of paper > submission deadlines expire. Fair enough. It sounds, also, like there isn't anyone interested in a sprint at PyCon. Jeremy From neel at mediapulse.com Tue Jan 13 10:36:19 2004 From: neel at mediapulse.com (Michael C. Neel) Date: Tue Jan 13 10:36:31 2004 Subject: [Web-SIG] web-sig sprint at PyCon? 
Message-ID: I'm up for it, just no time off and no one to pay for it =) Mike > -----Original Message----- > From: Jeremy Hylton [mailto:jeremy@alum.mit.edu] > Sent: Tuesday, January 13, 2004 9:46 AM > To: Bill Janssen > Cc: web-sig@python.org > Subject: Re: [Web-SIG] web-sig sprint at PyCon? > > > On Fri, 2004-01-09 at 18:56, Bill Janssen wrote: > > > I notice that activity on this basically stopped in early > December. > > > > Well, I think we just slowed down for the holidays. I'm planning to > > get working on it again after the 15th, when a couple of paper > > submission deadlines expire. > > Fair enough. > > It sounds, also, like there isn't anyone interested in a sprint at > PyCon. > > Jeremy > > > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-> sig/neel%40mediapulse.com > From quentel.pierre at wanadoo.fr Tue Jan 13 17:02:07 2004 From: quentel.pierre at wanadoo.fr (Pierre Quentel) Date: Tue Jan 13 17:02:14 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> Message-ID: <002101c3da20$e3728e10$c022fea9@QUENTEL> > Graham Fawcett just made a subset of Medusa for inclusion in Quixote; it > came out to four files (http_server.py, producers.py, logger.py and > http_date.py). Possibly logger.py can be discarded, and if desired we could > probably reduce the number of modules further by merging them. I saw the discussion on the Quixote mailing list, concluding in favour of this subset of medusa instead of SimpleAsyncHTTPServer. It was certainly the best choice for Quixote, but I don't know for the standard library. Where can we find the 4 files ? 
Thanks, Pierre From fawcett at teksavvy.com Tue Jan 13 21:45:41 2004 From: fawcett at teksavvy.com (Graham Fawcett) Date: Tue Jan 13 21:45:48 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: <002101c3da20$e3728e10$c022fea9@QUENTEL> References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> Message-ID: <4004AD55.30303@teksavvy.com> Pierre Quentel wrote: >>Graham Fawcett just made a subset of Medusa for inclusion in Quixote; it >>came out to four files (http_server.py, producers.py, logger.py and >>http_date.py). Possibly logger.py can be discarded, and if desired we >>could probably reduce the number of modules further by merging them. > >I saw the discussion on the Quixote mailing list, concluding in favour of >this subset of medusa instead of SimpleAsyncHTTPServer. It was certainly the >best choice for Quixote, but I don't know for the standard library. Where >can we find the 4 files ? >Thanks, >Pierre At the end of the day, it turned out to be a few more than 4 files; there were some "imports" halfway through the source file that I missed when I made the four-files assertion. Ah well. My goal was to strip out files that were non-essential to the HTTP server. There's still a lot of unused code in the remaining modules that ought to be cleaned out (whether for Quixote's purposes or others). The pared-down Medusa is available at http://fawcett.medialab.uwindsor.ca/quixote/medusa_patch.tar.gz. -- G From quentel.pierre at wanadoo.fr Wed Jan 14 16:31:37 2004 From: quentel.pierre at wanadoo.fr (Pierre Quentel) Date: Wed Jan 14 16:31:46 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ?
References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> Message-ID: <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> I downloaded and unzipped this archive and I get this error : C:\Telechargements\medusa_patch\server\medusa>python http_server.py localhost 8080 Traceback (most recent call last): File "http_server.py", line 718, in ? import monitor ImportError: No module named monitor I don't have medusa installed on my PC Have I forgotten something ? - Pierre From fawcett at teksavvy.com Thu Jan 15 10:02:02 2004 From: fawcett at teksavvy.com (Graham Fawcett) Date: Thu Jan 15 10:02:16 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> Message-ID: <4006AB6A.4000001@teksavvy.com> Pierre Quentel wrote: >I downloaded and unzipped this archive and I get this error : > >C:\Telechargements\medusa_patch\server\medusa>python http_server.py >localhost >8080 >Traceback (most recent call last): > File "http_server.py", line 718, in ? > import monitor >ImportError: No module named monitor > >I don't have medusa installed on my PC > >Have I forgotten something ? > > No -- although it could be argued that I did. I didn't include the modules required to run http_server's '__main__' code. What's left is just protocol support; you need to provide your own logic. Keep in mind that I was only trying to provide a Medusa library sufficient to get HTTP running (for Quixote, in my case), and I haven't bothered to strip out excess code (such as the __main__ section that you called). 
You'll find one file in the tarball (server.py) that's not of Medusan origin; that's the Quixote driver. It should provide a good starting point for anyone who wishes to, say, write a PyWCI connector for Medusa. (In the spirit of embracing emerging standards, I'll buy a beer for whoever makes a PyWCI 1.0 Container out of this, or out of the original Medusa code!) Andrew Kuchling maintains a maintenance release of Medusa at http://www.amk.ca/python/code/medusa . Stripping down Medusa is certainly an easy task; just start with http_server and follow the import dependencies. À bientôt, -- G From jjl at pobox.com Thu Jan 15 10:17:03 2004 From: jjl at pobox.com (John J Lee) Date: Thu Jan 15 10:17:10 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: <4006AB6A.4000001@teksavvy.com> References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> <4006AB6A.4000001@teksavvy.com> Message-ID: On Thu, 15 Jan 2004, Graham Fawcett wrote: [...] > (In the spirit of embracing emerging standards, I'll buy a beer for > whoever makes a PyWCI 1.0 Container out of this, or out of the original > Medusa code!) [...] Now, who says Python can't compete with Java on funding? John From lucid at escex.com Thu Jan 15 12:40:33 2004 From: lucid at escex.com (Lucid Drake) Date: Thu Jan 15 12:39:31 2004 Subject: [Web-SIG] Web Form Handling Techniques Message-ID: The following is a reprint of a message I sent to the tutor list a long time ago, that I haven't gotten around to discussing with anyone else and failed to hear a reply on the tutor list. Hoping someone here may want to have some dialog. ------ I'm learning to write unit tests and am trying to write them for a web application I'm working on.
I'm currently writing a test that is to purposefully fail by passing invalid data to a function. However this brought me to thinking about how to handle errors.

Let me set up a hypothetical scenario. I have a web form with name and age as its two fields. When a person enters his/her information and submits the form, I take the two fields and create a dict that contains {'name' : 'Foo', 'age' : '82'}.

I pass this information to a function called, insertData which looks something like this:

    def insertData(data):

        # test data for validity
        if not data['name'].isalpha():
            raise InvalidDataError(data['name'], "Name contains non-alpha characters")
        if not data['age'].isdigit():
            raise InvalidDataError(data['age'], "Age contains non-digit characters")

        sql = """
        INSERT INTO people (name, age) VALUES ('%s', '%s')
        """ % (data['name'], data['age'])

        executeSQL(sql)

I should first check to see if the data is valid, meaning that the name contains only alpha characters and the age only containing numeric characters.

If I raise an exception, how would one handle the reprinting of the web form with a red * next to the field that contains invalid data? If one field is in error, I can see that upon receiving an exception you can have your code just reprint the web form with the red star, but what about when both fields are in error, do you have the exception create a list which then your code checks to see if it exists and then loops through it to know what fields are in error?

And then again, perhaps I'm completely wrong and am going about this in an amateur manner. I realize this is more of a style question, but that's what I'm interested in discussing.
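A minimal sketch of the collect-then-report idea asked about above, written in present-day Python 3 rather than the Python 2 of the post. The (data, errors) shape given to InvalidDataError and the helper names validate_person and render_form are assumptions made for illustration, and the plain-text rendering only stands in for a real HTML form.

```python
class InvalidDataError(Exception):
    """Raised once, carrying every field error found (hypothetical shape)."""

    def __init__(self, data, errors):
        super().__init__("invalid form data: %s" % ", ".join(sorted(errors)))
        self.data = data
        self.errors = errors  # maps field name -> human-readable message


def validate_person(data):
    """Check all fields before raising, so one error never hides another."""
    errors = {}
    if not data.get("name", "").isalpha():
        errors["name"] = "Name contains non-alpha characters"
    if not data.get("age", "").isdigit():
        errors["age"] = "Age contains non-digit characters"
    if errors:
        raise InvalidDataError(data, errors)
    return data


def render_form(data, errors):
    """Re-render the form as plain text, starring each invalid field."""
    lines = []
    for field in ("name", "age"):
        marker = " *" if field in errors else ""
        lines.append("%s: %s%s" % (field, data.get(field, ""), marker))
    return "\n".join(lines)


try:
    validate_person({"name": "Foo42", "age": "eighty"})
except InvalidDataError as exc:
    # Both fields are invalid here, so both lines get a star.
    print(render_form(exc.data, exc.errors))
```

Because the exception carries a dictionary keyed by field name, the code that redraws the form can treat one bad field and several bad fields the same way.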
Thanks, --Sean From jjl at pobox.com Fri Jan 16 09:53:31 2004 From: jjl at pobox.com (John J Lee) Date: Fri Jan 16 09:53:38 2004 Subject: [Web-SIG] Web Form Handling Techniques In-Reply-To: References: Message-ID: On Thu, 15 Jan 2004, Lucid Drake wrote: > The following is a reprint of a message I sent to the tutor list a long > time ago, that I haven't gotten around to discussing with anyone else > and failed to hear a reply on the tutor list. Hoping someone here may > want to have some dialog. [...] This list is for discussion of new modules for the standard library. Try comp.lang.python (gatewayed to python-list@python.org). John From quentel.pierre at wanadoo.fr Sat Jan 17 04:48:25 2004 From: quentel.pierre at wanadoo.fr (Pierre Quentel) Date: Sat Jan 17 04:48:32 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> <4006AB6A.4000001@teksavvy.com> Message-ID: <000601c3dcdf$0dd00460$c022fea9@QUENTEL> If it's planned to propose an asynchronous HTTP server in the standard distribution, we have two options so far : 1 - strip medusa to the minimal modules required to have a functional HTTP server 2 - add medusa-like functions to SimpleAsyncHTTPServer Before that, the first step would be to agree on a set of requirements ; I guess it should include at least serving GET and POST request, running Python scripts with access to HTTP environment (headers, form fields), but what else ? Session management ? Basic HTTP authentication ? Options 1 seems the best way to have a robust solution with the required functionalities. Could someone try to package it ? Graham perhaps ? 
My fear is that it could lead either to something too big to have a chance of being included in the standard Python library, or to a parallel version of medusa which someone would have to maintain besides the "standard" product. I'm ready to work on option 2 if a list of requirements can be agreed upon. Cordialement, Pierre From janssen at parc.com Mon Jan 19 18:32:04 2004 From: janssen at parc.com (Bill Janssen) Date: Mon Jan 19 18:32:29 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: Your message of "Sat, 17 Jan 2004 01:48:25 PST." <000601c3dcdf$0dd00460$c022fea9@QUENTEL> Message-ID: <04Jan19.153206pst."58611"@synergy1.parc.xerox.com> > Options 1 [use existing Medusa, possibly stripped-down a bit] seems > the best way to have a robust solution with the required > functionalities. I agree. I don't think we should strip too much, either. Bill From ianb at colorstudy.com Thu Jan 22 15:29:53 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Jan 22 15:30:42 2004 Subject: [Web-SIG] Web Form Handling Techniques In-Reply-To: References: Message-ID: <401032C1.5040900@colorstudy.com> Lucid Drake wrote:

> I pass this information to a function called, insertData which
> looks something like this:
>
> def insertData(data):
>
>     # test data for validity
>     if not data['name'].isalpha():
>         raise InvalidDataError(data['name'], "Name contains non-alpha characters")
>     if not data['age'].isdigit():
>         raise InvalidDataError(data['age'], "Age contains non-digit characters")
>
>     sql = """
>     INSERT INTO people (name, age) VALUES ('%s', '%s')
>     """ % (data['name'], data['age'])
>
>     executeSQL(sql)
>
> I should first check to see if the data is valid, meaning that the
> name contains only alpha characters and the age only containing
> numeric characters.

Try something more like:

    def insertData(data):
        errors = {}
        if not data['name'].isalpha():
            errors['name'] = 'Name contains non-alpha characters'
        ....
        if errors:
            raise InvalidDataError(data, errors)
        sql = ...

Then look at the errors dictionary. If you were putting each validation (and exception) in a separate function, then you might loop over the fields and do a try:except InvalidDataError: and then accumulate the results into a dictionary in the same fashion. Ian From ianb at colorstudy.com Fri Jan 23 12:26:57 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 23 12:27:32 2004 Subject: [Web-SIG] Web Container Interface Message-ID: <40115961.7020906@colorstudy.com> I'm afraid I hadn't kept up with Web-SIG back when this came up, but maybe that's okay since the topic seems to need revisiting anyway, since the topic has been pretty quiet for a while. So... what's the status of Phillip's web container proposal? The biggest issue I see with it is that the container needs to be passed as one of the arguments to runCGI. This allows for greater integration of the application and the container. Blech... I don't like those terms, because the container's primary responsibility isn't containment, and the application's function is probably more general than a web application's. I'd rather see the container called something like the HTTP Driver, or otherwise indicating that it is the bridge between this simplified CGI interface, and the full HTTP interface. It may provide an HTTP interface, like with Twisted, or it may simply be a bridge to Apache or another server. Container means nothing to me. But I digress... The interface is sparse, and doesn't allow for things like leaving headers parsed (which is something several containers would prefer). That's okay, but it would be nice -- even if only in an ad hoc manner -- if applications could query the container about what it is and how it works. To do that we need to pass some sort of reference to the container, even if the nature of that object remains undefined. The other (related) issue I see is the reliance on configuration.
One of the goals, presumably, is that frameworks that encompass both container and application (as Webware does) partition themselves more clearly. That's no big deal -- Webware already has a function very like runCGI (Webware.WebKit.Application.dispatchRawRequest), and translating the method signatures is trivial. Unfortunately, the real effort is in allowing it all to be "configured" to work with different backends, or the backend with different applications. This reliance on configuration seems very Java. I hate configuration. A lot. It makes programmers into system administrators, to the detriment of both programmers and system administrators. If I was going to provide configuration for the Webware AppServer to be used with another application, or the Webware Application to be used with another container... well, I haven't figured out what I'd do. I would probably define a richer interface for both container and application, and then possibly provide configuration to provide that interface when the container or application doesn't natively provide it (and the AppServer and Application would both provide that interface, so that Webware's internal integration would remain configuration free). Perhaps this richer interface is the sort of thing we should be developing on an as-needed basis, like I would do for Webware, and hopefully as others did the same we'd informally start to agree on what that might look like. And frankly, until this interface *is* developed, I don't think runCGI alone is a useful interface. But it's a place to start. Anyway, those are my thoughts. From pje at telecommunity.com Fri Jan 23 14:11:07 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Jan 23 14:11:16 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <40115961.7020906@colorstudy.com> Message-ID: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> At 11:26 AM 1/23/04 -0600, Ian Bicking wrote: >I'm afraid I hadn't kept up with Web-SIG back when this came up, but maybe >that's okay since the topic seems to need revisiting anyway, since the >topic has been pretty quiet for a while. So... what's the status of >Phillip's web container proposal? It's waiting for me to have enough spare cycles to write another draft. I've been pretty bogged down with "real work" lately. >The biggest issue I see with it is that the container needs to be passed >as one of the arguments to runCGI. This allows for greater integration of >the application and the container. But there isn't any functionality the proposal requires of the container beyond this. For example, what would a plain CGI container provide? A FastCGI container? For a lowest-common-denominator interface, what "integration" is possible or needed? > Blech... I don't like those terms, because the container's primary > responsibility isn't containment, and the application's function is > probably more general than a web application's. I'd rather see the > container called something like the HTTP Driver, or otherwise indicating > that it is the bridge between this simplified CGI interface, and the full > HTTP interface. It may provide an HTTP interface, like with Twisted, or > it may simply be a bridge to Apache or another server. Container means > nothing to me. But I digress... Okay. I used container to imply that the application "runs within" the container, ala servlet containers and bean containers in Java. But I didn't want to call the application a "servlet", since that implies long-runningness. OTOH, maybe we *should* call them servlets, since only a plain CGI container won't be long-running. 
The Java servlet API has a total of 5 methods: init(), service(), destroy(), getServletConfig(), and getServletInfo(). service() is essentially runCGI() with request and response objects. From my POV, none of the other four are of much use for simple containers, and it was specifically intended that the proposal not try to go into the highly controversial request/response interface area. >The interface is sparse, and doesn't allow for things like leaving headers >parsed (which is something several containers would prefer). I'm not sure I'm following you. As it sits, the proposal *requires* parsed headers on output. It's unparsed headers that aren't supported. > That's okay, but it would be nice -- even if only in an ad hoc manner -- > if applications could query the container about what it is and how it > works. To do that we need to pass some sort of reference to the > container, even if the nature of that object remains undefined. If you could present a concrete use case, it'd be a lot easier to understand what a solution might be. >The other (related) issue I see is the reliance on configuration. One of >the goals, presumably, is that frameworks that encompass both container >and application (as Webware does) partition themselves more >clearly. That's no big deal -- Webware already has a function very like >runCGI (Webware.WebKit.Application.dispatchRawRequest), and translating >the method signatures is trivial. > >Unfortunately, the real effort is in allowing it all to be "configured" to >work with different backends, or the backend with different >applications. This reliance on configuration seems very Java. I hate >configuration. A lot. It makes programmers into system administrators, >to the detriment of both programmers and system administrators. I'm not sure I'm following this either. 
If you prefer to use code to connect things, what's wrong with:

    from some_container import Container
    from some_framework import App

    c = Container(app=App(someAppArg=42))
    c.run()  # or whatever some_container says you do to start the Container

...for arbitrary values of Container and App? If you need to set up container-specific settings, we're only talking about adding a few more arguments to Container.__init__. And if you need app-specific configuration, the same thing is true for App. So, I'm not getting what problem you're trying to solve here. Presumably, a given framework and/or container developer would simply create wrapper classes corresponding to Container and App above, using options that are relevant to that particular container or framework. >Perhaps this richer interface is the sort of thing we should be developing >on an as-needed basis, like I would do for Webware, and hopefully as >others did the same we'd informally start to agree on what that might look >like. And frankly, until this interface *is* developed, I don't think >runCGI alone is a useful interface. But it's a place to start. And I don't think that you could get anywhere near a common interface until you first had something for people to build stuff on. :) When we all get tired of writing our 10-line scripts to pass keyword arguments to a container and an app (because we're doing the same things over and over again), we can always add some shortcuts into a WCI 2.0 API. In the meantime, we'll have had the advantage of being able to connect things up at all. From ianb at colorstudy.com Fri Jan 23 16:31:10 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 23 16:31:47 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> Message-ID: <4011929E.7080100@colorstudy.com> Phillip J.
Eby wrote: > It's waiting for me to have enough spare cycles to write another draft. > I've been pretty bogged down with "real work" lately. Real work is no fun. It's the damn material needs and desires that are to blame. >> The biggest issue I see with it is that the container needs to be >> passed as one of the arguments to runCGI. This allows for greater >> integration of the application and the container. > > > But there isn't any functionality the proposal requires of the container > beyond this. For example, what would a plain CGI container provide? A > FastCGI container? For a lowest-common-denominator interface, what > "integration" is possible or needed? Well, understanding the threading properties of the container or application for one. We might not be able to define it -- but at some point we probably want to be able to communicate the situation between the two. Or stuff about the persistence of the process. Or about the concurrency expectations of the application. And that's just the obvious stuff in one situation. There's logging, shared (container-wide) configuration, authentication, and it goes on and on. Unless you have a different concept of container and application than I'm thinking of. >> Blech... I don't like those terms, because the container's primary >> responsibility isn't containment, and the application's function is >> probably more general than a web application's. I'd rather see the >> container called something like the HTTP Driver, or otherwise >> indicating that it is the bridge between this simplified CGI >> interface, and the full HTTP interface. It may provide an HTTP >> interface, like with Twisted, or it may simply be a bridge to Apache >> or another server. Container means nothing to me. But I digress... > > > Okay. I used container to imply that the application "runs within" the > container, ala servlet containers and bean containers in Java. 
But I > didn't want to call the application a "servlet", since that implies > long-runningness. OTOH, maybe we *should* call them servlets, since > only a plain CGI container won't be long-running. I wonder about the psychological implications of using "servlet", but otherwise okay ;) > The Java servlet API has a total of 5 methods: init(), service(), > destroy(), getServletConfig(), and getServletInfo(). service() is > essentially runCGI() with request and response objects. From my POV, > none of the other four are of much use for simple containers, and it was > specifically intended that the proposal not try to go into the highly > controversial request/response interface area. So... it's just confusing to reuse the term "servlet" when it's not quite the same thing. >> The interface is sparse, and doesn't allow for things like leaving >> headers parsed (which is something several containers would prefer). > > I'm not sure I'm following you. As it sits, the proposal *requires* > parsed headers on output. It's unparsed headers that aren't supported. Well, I was thinking of the "parsed" representation of headers being a dictionary, or a list of two-tuples. Systems that speak HTTP directly will want access to these, so they can do all the CGI-ish stuff necessary (translate the Status header to the response code, look at Location). They can re-parse the headers on the way out, or they could be provided with the headers directly. Seeing as most frameworks collect the headers directly, it might be nice to save the trouble of serializing the headers and then parsing them, at least when possible. >> That's okay, but it would be nice -- even if only in an ad hoc manner >> -- if applications could query the container about what it is and how >> it works. To do that we need to pass some sort of reference to the >> container, even if the nature of that object remains undefined. 
> > If you could present a concrete use case, it'd be a lot easier to > understand what a solution might be. Well, the application may want to share configuration with other applications (even if it's where-do-I-find-my-configuration-file configuration). This could be container-wide, so the application could query the container for that information. Or, the application may want to know something about the URL layout. Mmm... which makes me think that PATH_INFO and SCRIPT_NAME need to be well defined for runCGI, or alternate variables need to be considered. Does SCRIPT_NAME (and several associated CGI variables, PATH_INFO included) point to the application, or the container, or what? The application may want to know if there are any shared services it can use, like a persistent storage for sessions, or a cron-type service. Scheduling tasks particularly comes to mind -- it has to be done very differently depending on the environment. In a long-running threaded environment you could probably do it yourself. In an async environment there's a specific callback that's part of the control loop. In a process-based environment you may have to talk to some parent process. In a CGI environment... well, you just don't get one there. >> The other (related) issue I see is the reliance on configuration. One >> of the goals, presumably, is that frameworks that encompass both >> container and application (as Webware does) partition themselves more >> clearly. That's no big deal -- Webware already has a function very >> like runCGI (Webware.WebKit.Application.dispatchRawRequest), and >> translating the method signatures is trivial. >> >> Unfortunately, the real effort is in allowing it all to be >> "configured" to work with different backends, or the backend with >> different applications. This reliance on configuration seems very >> Java. I hate configuration. A lot. 
It makes programmers into system >> administrators, to the detriment of both programmers and system >> administrators. > > > I'm not sure I'm following this either. If you prefer to use code to > connect things, what's wrong with: > > from some_container import Container > from some_framework import App > > c = Container( > app=App(someAppArg=42) > ) > > c.run() # or whatever some_container says you do to start the Container > > ...for arbitrary values of Container and App? If you need to set up > container-specific settings, we're only talking about adding a few more > arguments to Container.__init__. And if you need app-specific > configuration, the same thing is true for App. > > So, I'm not getting what problem you're trying to solve here. Maybe I'm not entirely sure either. Something's bothering me, though. I'm having a hard time picturing the use of this proposal to do things we can't do now, i.e., hooking up things that weren't built with each other specifically in mind. > Presumably, a given framework and/or container developer would simply > create wrapper classes corresponding to Container and App above, using > options that are relevant to that particular container or framework. > >> Perhaps this richer interface is the sort of thing we should be >> developing on an as-needed basis, like I would do for Webware, and >> hopefully as others did the same we'd informally start to agree on >> what that might look like. And frankly, until this interface *is* >> developed, I don't think runCGI alone is a useful interface. But it's >> a place to start. > > And I don't think that you could get anywhere near a common interface > until you first had something for people to build stuff on. :) When we > all get tired of writing our 10-line scripts to pass keyword arguments > to a container and an app (because we're doing the same things over and > over again), we can always add some shortcuts into a WCI 2.0 API. 
In
> the meantime, we'll have had the advantage of being able to connect
> things up at all.

That's what I was conceding. We aren't ready to define what those interfaces will be, but we need to keep in mind that there *will* be other interfaces, and that the growth of those interfaces is important to the utility of the WCI. Simply passing in the container as an argument to runCGI is probably sufficient.

Ian

From pje at telecommunity.com Fri Jan 23 19:36:52 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Jan 23 19:36:58 2004
Subject: [Web-SIG] Web Container Interface
In-Reply-To: <4011929E.7080100@colorstudy.com>
References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com>
Message-ID: <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com>

At 03:31 PM 1/23/04 -0600, Ian Bicking wrote:
>Well, I was thinking of the "parsed" representation of headers being a
>dictionary, or a list of two-tuples. Systems that speak HTTP directly
>will want access to these, so they can do all the CGI-ish stuff necessary
>(translate the Status header to the response code, look at
>Location). They can re-parse the headers on the way out, or they could be
>provided with the headers directly. Seeing as most frameworks collect the
>headers directly, it might be nice to save the trouble of serializing the
>headers and then parsing them, at least when possible.

Maybe it would help if I rephrase one of my goals for WCI 1.0: *no existing app may be left behind*. There are *plenty* of existing apps and frameworks that expect to send headers over the output stream for external parsing. Making the interface use pre-parsed headers or no-parse headers would strand those applications in the pre-WCI world, or force everybody to write header parsers. In my view, that simply voids the point of having the interface in the first place.
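For anyone who wants the "external parsing" step spelled out, here is a minimal sketch of what a container might do with the bytes an app wrote to its output stream. It is not part of the proposal: the function name split_cgi_output is hypothetical, it uses modern Python 3 syntax rather than the Python 2.3 of this thread, and it only handles the Status header (a Location header without Status, which CGI treats as a redirect, is left to the caller).

```python
def split_cgi_output(raw):
    """Split CGI-style output bytes into (status, headers, body).

    Hypothetical helper: 'headers' is a list of (name, value) two-tuples,
    the "parsed" form discussed in the thread.
    """
    # The header block ends at the first blank line; accept CRLF or bare LF.
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if hits:
        pos, sep = min(hits)  # whichever separator occurs first
        head, body = raw[:pos], raw[pos + len(sep):]
    else:
        head, body = raw, b""

    status = "200 OK"  # CGI default when the app sends no Status header
    headers = []
    for line in head.decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            status = value  # becomes the HTTP response code
        else:
            headers.append((name, value))
    return status, headers, body


status, headers, body = split_cgi_output(
    b"Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nnope")
print(status, headers, body)
```

An app that already keeps its headers as a list of two-tuples would skip this step entirely, which is the saving Ian is describing; an old app that just prints to stdout needs something like it somewhere.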
>Well, the application may want to share configuration with other >applications (even if it's where-do-I-find-my-configuration-file >configuration). This could be container-wide, so the application could >query the container for that information. That seems to me like straying into general component framework territory. PEAK and Zope X3, for example, (and no doubt many more systems) have their own notions of how to share that kind of information. WCI should be agnostic about that. >Or, the application may want to know something about the URL layout. >Mmm... which makes me think that PATH_INFO and SCRIPT_NAME need to be well >defined for runCGI, or alternate variables need to be considered. Does >SCRIPT_NAME (and several associated CGI variables, PATH_INFO included) >point to the application, or the container, or what? An excellent point. That *should* be added to the spec. I simply assumed they point to the object that is receiving the runCGI() call. It should be made explicit, though. Thanks! >The application may want to know if there are any shared services it can >use, like a persistent storage for sessions, or a cron-type service. >Scheduling tasks particularly comes to mind -- it has to be done very >differently depending on the environment. In a long-running threaded >environment you could probably do it yourself. In an async environment >there's a specific callback that's part of the control loop. In a >process-based environment you may have to talk to some parent process. In >a CGI environment... well, you just don't get one there. Again, this is totally the job of an application framework, and will result in an instant religious war to try and put it in what ought to be a nice narrowly defined interface. It should be more like a power outlet, and less like an ethernet jack. :) >Maybe I'm not entirely sure either. Something's bothering me, though. 
I'm >having a hard time picturing the use of this proposal to do things we >can't do now, i.e., hooking up things that weren't built with each other >specifically in mind. I have applications that run on ancient versions of ZPublisher that run as happily under my WCI containers as they do under straight CGI or in ZServer. I imagine I could run them under a Webware container if there was one, or a mod_python container, or whatever else. If that's not "things that weren't built with each other specifically in mind", I don't know what is. Granted, in each case a few lines of glue code are required, to create the container and create the app. And if a library doesn't already have a WCI container or app wrapper available, you have to write those too. But that's a *lot* better than having to figure out from scratch how to connect two things today, if you can connect them at all. And, if somebody writes a WCI router, you should be able to use multiple apps in the same process. Indeed, people can write many different WCI routers, allowing them to use different mechanisms to find apps. >That's what I was conceding. We aren't ready to define what those >interfaces will be, but we need to keep in mind that there *will* be other >interfaces, and that the growth of those interfaces is important to the >utility of the WCI. Simply passing in the container as an argument to >runCGI is probably sufficient. Actually, I think it's more likely that a WCI 2.0 would grow another method, similar to the Java servlet 'init()' method, to tell the app that it's being "used by" a particular container. However, at that point we'll also want to know something about what people actually want to get from a container, if anything. 
From ianb at colorstudy.com Sat Jan 24 14:05:30 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Jan 24 14:05:40 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> Message-ID: <47425DD6-4EA0-11D8-8D75-000393C2D67E@colorstudy.com> On Jan 23, 2004, at 6:36 PM, Phillip J. Eby wrote: > At 03:31 PM 1/23/04 -0600, Ian Bicking wrote: >> Well, I was thinking of the "parsed" representation of headers being >> a dictionary, or a list of two-tuples. Systems that speak HTTP >> directly will want access to these, so they can do all the CGI-ish >> stuff necessary (translate the Status header to the response code, >> look at Location). They can re-parse the headers on the way out, or >> they could be provided with the headers directly. Seeing as most >> frameworks collect the headers directly, it might be nice to save the >> trouble of serializing the headers and then parsing them, at least >> when possible. > > Maybe it would help if I rephrase one of my goals for WCI 1.0: *no > existing app may be left behind*. There are *plenty* of existing apps > and frameworks that expect to send headers over the output stream for > external parsing. Making the interface used pre-parsed headers or > no-parse headers would strand those applications in the pre-WCI world, > or force everybody to write header parsers. In my view, that simply > voids the point of having the interface in the first place. I'm not proposing that applications be required to implement a richer interface than what WCI 1.0 currently requires. Rather, that applications be given the opportunity to implement a richer interface, even if we don't give any indication (in the PEP) of what that interface would be. 
>> Well, the application may want to share configuration with other >> applications (even if it's where-do-I-find-my-configuration-file >> configuration). This could be container-wide, so the application >> could query the container for that information. > > That seems to me like straying into general component framework > territory. PEAK and Zope X3, for example, (and no doubt many more > systems) have their own notions of how to share that kind of > information. WCI should be agnostic about that. These are use cases, not things I think need to be in WCI 1.0. We should work towards making these things possible -- which means we should leave room for WCI implementations to define their own interfaces that work in this direction. >> Or, the application may want to know something about the URL layout. >> Mmm... which makes me think that PATH_INFO and SCRIPT_NAME need to be >> well defined for runCGI, or alternate variables need to be >> considered. Does SCRIPT_NAME (and several associated CGI variables, >> PATH_INFO included) point to the application, or the container, or >> what? > > An excellent point. That *should* be added to the spec. I simply > assumed they point to the object that is receiving the runCGI() call. > It should be made explicit, though. Thanks! Hmm... that might not be backward compatible. In both Webware and I believe Zope (and probably several others) these point to the container root, not the resource root. I would propose that these variables remain undefined, and that additional variables be added with better defined meaning. >> The application may want to know if there are any shared services it >> can use, like a persistent storage for sessions, or a cron-type >> service. Scheduling tasks particularly comes to mind -- it has to be >> done very differently depending on the environment. In a >> long-running threaded environment you could probably do it yourself. 
>> In an async environment there's a specific callback that's part of >> the control loop. In a process-based environment you may have to >> talk to some parent process. In a CGI environment... well, you just >> don't get one there. > > Again, this is totally the job of an application framework, and will > result in an instant religious war to try and put it in what ought to > be a nice narrowly defined interface. It should be more like a power > outlet, and less like an ethernet jack. :) Again, I'm not proposing we set out a standard that covers these, but that we leave room for implementations to extend the standard to cover these cases. >> Maybe I'm not entirely sure either. Something's bothering me, >> though. I'm having a hard time picturing the use of this proposal to >> do things we can't do now, i.e., hooking up things that weren't built >> with each other specifically in mind. > > I have applications that run on ancient versions of ZPublisher that > run as happily under my WCI containers as they do under straight CGI > or in ZServer. I imagine I could run them under a Webware container > if there was one, or a mod_python container, or whatever else. > > If that's not "things that weren't built with each other specifically > in mind", I don't know what is. > > Granted, in each case a few lines of glue code are required, to create > the container and create the app. And if a library doesn't already > have a WCI container or app wrapper available, you have to write those > too. I remain skeptical. I can't really see how you'll run a Zope application under plain CGI, or how both a Zope application and a Twisted application could successfully run with the same interface. I still can see that this is a step in the right direction -- not a big step, but it's still the right direction. > But that's a *lot* better than having to figure out from scratch how > to connect two things today, if you can connect them at all. 
And, if
> somebody writes a WCI router, you should be able to use multiple apps
> in the same process. Indeed, people can write many different WCI
> routers, allowing them to use different mechanisms to find apps.
>
>
>> That's what I was conceding. We aren't ready to define what those
>> interfaces will be, but we need to keep in mind that there *will* be
>> other interfaces, and that the growth of those interfaces is
>> important to the utility of the WCI. Simply passing in the container
>> as an argument to runCGI is probably sufficient.
>
> Actually, I think it's more likely that a WCI 2.0 would grow another
> method, similar to the Java servlet 'init()' method, to tell the app
> that it's being "used by" a particular container. However, at that
> point we'll also want to know something about what people actually
> want to get from a container, if anything.

I think WCI 1.0 needs an init(), or something equivalent (like simply passing the container in to runCGI()). I think it is useful to provide that hook, even if we give implementors no specification beyond that.

A portable application might even go as far as doing:

    if container.__class__.__name__ == 'AppServer':
        # Probably Webware, we'll do X
    elif container.__class__.__name__ == 'ZPublisher':
        # Zope, we do Y
    ...

If that's what it takes for the author of the application to get things to work, so be it. Portable applications in Python typically require some special cases as issues arise -- it's better to enable developers than to enforce agnosticism. This is the distinction I would make between Python's approach to portability and Java's -- and I think Python is more successful in its approach.

Hopefully experience with these special cases can be used to specify container interfaces for WCI 2.0.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com Sat Jan 24 23:03:10 2004
From: pje at telecommunity.com (Phillip J.
Eby) Date: Sat Jan 24 22:59:26 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <47425DD6-4EA0-11D8-8D75-000393C2D67E@colorstudy.com> References: <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> Message-ID: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> At 01:05 PM 1/24/04 -0600, Ian Bicking wrote: >>>Or, the application may want to know something about the URL layout. >>>Mmm... which makes me think that PATH_INFO and SCRIPT_NAME need to be >>>well defined for runCGI, or alternate variables need to be considered. >>>Does SCRIPT_NAME (and several associated CGI variables, PATH_INFO >>>included) point to the application, or the container, or what? >> >>An excellent point. That *should* be added to the spec. I simply >>assumed they point to the object that is receiving the runCGI() call. >>It should be made explicit, though. Thanks! > >Hmm... that might not be backward compatible. In both Webware and I >believe Zope (and probably several others) these point to the container >root, not the resource root. Um, that's a container issue, not an app issue. AFAIK, all existing apps expect SCRIPT_NAME to point to the *app* root. Also AFAIK, no generic WCI containers exist yet except the ones in PEAK, and they all just pass SCRIPT_NAME as pointing to the app they're running. This works just fine with Zope 2 and 3, as well as any properly written CGI. However, if you were to create a container that routed requests, then it would need to remove any parts from PATH_INFO that it consumes, and add them to SCRIPT_NAME, so that the application being run can use HTTP_HOST+SCRIPT_NAME to form a URL to the application. I am, however, beginning to see the awkwardness of "container" and "app". 
I'm wondering if maybe "gateway" and "service" would be better terms, and rename the whole thing the Python Web Service Gateway Interface. That is, the WSGI, perhaps to be pronounced "whisky". :) Anyway, it would then make sense that the values supplied by a gateway to a service should be such that SCRIPT_NAME is the path of the service, relative to HTTP_HOST, with PATH_INFO being the remainder of the URL. Since a service can also act as a gateway to nested WSGI services, it should of course deliver different values to them, so that they know their correct base URLs as well. >Again, I'm not proposing we set out a standard that covers these, but that >we leave room for implementations to extend the standard to cover these cases. Well, for Twisted, Zope, and PEAK at least, each framework has ways of specifying interface metadata about an object, including adapters. So, gateways provided by those frameworks would be free to try to introspect a service for a fancier interface. For example, PEAK has a 'suggestParentComponent()' API that can be used to sniff an object for PEAK component-ness (using PyProtocols), and then tell it what context it's being used in. Zope 3 has getAdapter(), and Twisted has ISomething(ob). None of these mechanisms require anything special in the WSGI spec to support them. And even if you don't have those, there's still good old fashioned 'hasattr'. You'll notice that I keep emphasizing the gateway inspecting and invoking the service, *not* the other way around. There should not *be* anything different from one gateway to the next, if at all possible, since that runs counter to the whole point of the exercise, which is allowing the same service to run in different gateways, *especially* ones it wasn't written for. 
That's why I want the spec to be mind-numbingly precise about what is and isn't required, and don't want to throw a vague parameter into it that has no meaning at all, existing solely to introduce *differences*, which are the mortal enemy of the goal of sameness. :) >I remain skeptical. I can't really see how you'll run a Zope application >under plain CGI, or how both a Zope application and a Twisted application >could successfully run with the same interface. They can't. But a Twisted-specific application can run in the same process as a Twisted gateway that runs non-Twisted web services. And Zope 2 and Zope 3 services should be runnable by any WSGI gateway: certainly I've been successful with them in my own gateways. (I have an ancient Zope 2 ZPublisher-based app that runs via a runCGI call right now as we speak, serving millions of dynamic hits per month.) And, if somebody wrote a WSGI gateway for Twisted, we could finally marry Zope 3 and Twisted, as people have been flirting with doing for quite some time now. It would have to use the Twisted threadpool, and a pool of service instances (Zope 3 "Publication" objects, presumably), but it certainly could be done. Again, this presumes that you could configure the Twisted gateway in a way that would let it manage an app pool properly. But let's say that it couldn't, and you had to give Twisted a single service instance that had to be shared for the entire web server. Well, you'd write a WSGI service that simply pulled a Zope top-level instance from a pool, and invoked it as a subservice (leaving the 'environ' alone). And now, your "pooling" WSGI service could be used with a pool of *any* WSGI service objects. The cool thing about having a spec is, it enables this sort of bridging and connecting just by its very existence. Somebody has a specific problem to solve, like reversing the gender of a cable, or creating a splitter or splicer... instead of trying to invent wire, plugs, and sockets. 
:)

>I still can see that this is a step in the right direction -- not a big
>step, but it's still the right direction.

I think it's a bigger step, but we won't know until we try, will we? If it turns out we need more, there's nothing stopping us from doing a WSGI 1.1 after a few months.

>>Actually, I think it's more likely that a WCI 2.0 would grow another
>>method, similar to the Java servlet 'init()' method, to tell the app that
>>it's being "used by" a particular container. However, at that point
>>we'll also want to know something about what people actually want to get
>>from a container, if anything.
>
>I think WCI 1.0 needs an init(), or something equivalent (like simply
>passing the container in to runCGI()). I think it is useful to provide
>that hook, even if we give implementors no specification beyond that.
>
>A portable application might even go as far as doing:
>
>if container.__class__.__name__ == 'AppServer':
>    # Probably Webware, we'll do X
>elif container.__class__.__name__ == 'ZPublisher':
>    # Zope, we do Y
>..

Hm. ZPublisher isn't a gateway in this spec. It's a service. A gateway would call runCGI to *get to* ZPublisher. If you want a ZPublisher app, you just write a ZPublisher app. Then, to run it, you configure a service wrapper around ZPublisher, and pass it to a gateway.

But if you wrote a ZPublisher app, you're *way* below the level of this interface. The idea of the WCI/WSGI is to provide an HTTP gateway interface, to let apps built with any framework run via any sort of HTTP gateway. That's why I keep saying that access to other services is an app-framework job.

>Portable applications in Python typically require some special cases as
>issues arise -- it's better to enable developers than to enforce agnosticism.

Even so, I think it's better to have that glue controlled by the application's integrator, rather than buried inside.
If the service has options X and Y, let the integrator choose what behavior is appropriate for the gateway they're installing it in. Otherwise, you're forcing the app developer to know about all possible gateways, which again runs counter to the point. I think I see now why we seem slightly at odds in this discussion. Your example suggests that you are thinking of "app" as "what I write to do useful work", and that you will write apps that directly export a service for use by a gateway. While it's definitely possible to do this, you will more likely use a service wrapper that's pre-defined by your app framework such as Zope or Webware. This is a way to take a Zope or Webware application and run it in a different server environment, not to make the application itself portable between Zope and Webware. However, once the interface exists, it becomes possible to make a "router" service that would let you run both say, Zope and Webware applications within the same server environment. But at that point, if you want to actually integrate the apps themselves (as opposed to simply running them side by side), again I think the mechanism of choice would be to write a short startup script that passes references to each app's internal services to the other. At least, that's my take on "practicality beats purity". Why create an abstract spec for passing things around, when the integrator "on the spot" can just connect what they need? 
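The routing rule described earlier in this message (a gateway that routes requests removes what it consumes from PATH_INFO and adds it to SCRIPT_NAME) can be sketched roughly as below. This is only an illustration under stated assumptions: the thread names runCGI() and 'environ' but never gives the full signature, so runCGI(stdin, stdout, stderr, environ) is assumed here, and PrefixRouter and EchoService are hypothetical names. The code is written for Python 3.

```python
import io


class PrefixRouter:
    """Hypothetical routing service: dispatch on the first PATH_INFO segment."""

    def __init__(self, services):
        self.services = services  # maps a path segment to a service object

    # Assumed signature; the thread only names runCGI() and 'environ'.
    def runCGI(self, stdin, stdout, stderr, environ):
        path = environ.get("PATH_INFO", "")
        name, sep, rest = path.lstrip("/").partition("/")
        service = self.services.get(name)
        if service is None:
            stdout.write(b"Status: 404 Not Found\r\n"
                         b"Content-Type: text/plain\r\n\r\nNot found\n")
            return
        child = dict(environ)  # leave the caller's environ untouched
        # Move the consumed segment from PATH_INFO onto SCRIPT_NAME so the
        # nested service can build URLs from HTTP_HOST + SCRIPT_NAME.
        child["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + name
        child["PATH_INFO"] = sep + rest
        service.runCGI(stdin, stdout, stderr, child)


class EchoService:
    """Toy service that reports the SCRIPT_NAME and PATH_INFO it was given."""

    def runCGI(self, stdin, stdout, stderr, environ):
        line = "%s|%s" % (environ["SCRIPT_NAME"], environ["PATH_INFO"])
        stdout.write(line.encode("latin-1"))


out = io.BytesIO()
router = PrefixRouter({"wiki": EchoService()})
router.runCGI(io.BytesIO(), out, io.StringIO(),
              {"SCRIPT_NAME": "/site.cgi", "PATH_INFO": "/wiki/FrontPage"})
print(out.getvalue())
```

Because the router is itself just an object with runCGI(), it can be handed to any gateway, and routers can be nested; each level passes its children a copy of 'environ' with their own base URL.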
From ianb at colorstudy.com Mon Jan 26 23:28:57 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Jan 26 23:28:59 2004
Subject: [Web-SIG] Web Container Interface
In-Reply-To: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com>
References: <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com>
Message-ID: <52744918-5081-11D8-89F2-000393C2D67E@colorstudy.com>

On Jan 24, 2004, at 10:03 PM, Phillip J. Eby wrote:
[snip]
> I am, however, beginning to see the awkwardness of "container" and
> "app". I'm wondering if maybe "gateway" and "service" would be better
> terms, and rename the whole thing the Python Web Service Gateway
> Interface. That is, the WSGI, perhaps to be pronounced "whisky". :)

I like gateway. Service is very vague -- maybe the gateway is providing an HTTP service. There are all sorts of services. But I'm coming up blank on an alternative.

[snipsnip]
>> Again, I'm not proposing we set out a standard that covers these, but
>> that we leave room for implementations to extend the standard to
>> cover these cases.
>
> Well, for Twisted, Zope, and PEAK at least, each framework has ways of
> specifying interface metadata about an object, including adapters.
> So, gateways provided by those frameworks would be free to try to
> introspect a service for a fancier interface.

That's true, but I think the services should be able to query the gateway as well. In part because the "inside" of the application, where you start to battle portability issues, is likely to occur in the services.

> For example, PEAK has a 'suggestParentComponent()' API that can be
> used to sniff an object for PEAK component-ness (using PyProtocols),
> and then tell it what context it's being used in. Zope 3 has
> getAdapter(), and Twisted has ISomething(ob).
Well, if we're going that way, then we don't need this interface specification at all. I don't know -- it's all overlapped at that point. If we're relying on adapters, we could just use adapters only. The only specification is that you give a service object to the gateway, and let the gateway figure out the rest. I don't know, once you bring those mechanisms into it, it confuses everything.

> None of these mechanisms require anything special in the WSGI spec to
> support them. And even if you don't have those, there's still good
> old fashioned 'hasattr'. You'll notice that I keep emphasizing the
> gateway inspecting and invoking the service, *not* the other way
> around. There should not *be* anything different from one gateway to
> the next, if at all possible, since that runs counter to the whole
> point of the exercise, which is allowing the same service to run in
> different gateways, *especially* ones it wasn't written for. That's
> why I want the spec to be mind-numbingly precise about what is and
> isn't required, and don't want to throw a vague parameter into it that
> has no meaning at all, existing solely to introduce *differences*,
> which are the mortal enemy of the goal of sameness. :)

Gateways aren't all the same. They just aren't. Gateways can't and won't be uniform. They can't attempt to be uniform except by taking on an ever-expanding set of responsibilities.

[snipping a bunch]
>> Portable applications in Python typically require some special cases
>> as issues arise -- it's better to enable developers than to enforce
>> agnosticism.
>
> Even so, I think it's better to have that glue controlled by the
> application's integrator, rather than buried inside. If the service
> has options X and Y, let the integrator choose what behavior is
> appropriate for the gateway they're installing it in. Otherwise,
> you're forcing the app developer to know about all possible gateways,
> which again runs counter to the point.
> > I think I see now why we seem slightly at odds in this discussion. > Your example suggests that you are thinking of "app" as "what I write > to do useful work", and that you will write apps that directly export > a service for use by a gateway. While it's definitely possible to do > this, you will more likely use a service wrapper that's pre-defined by > your app framework such as Zope or Webware. This is a way to take a > Zope or Webware application and run it in a different server > environment, not to make the application itself portable between Zope > and Webware. No, I'm thinking of the application/service as a framework as well. But I just don't see how a framework can fit into such a minimal interface and still be portable. For instance, how do I implement sessions? I don't see a portable way to do that in the service given this interface. I don't even see an unportable way to do it. Unless you propose that the code to add the service to the gateway should include all sorts of options to cover all the particular details... but I don't see how that will help any. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From pje at telecommunity.com Tue Jan 27 11:03:46 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Jan 27 10:59:57 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <52744918-5081-11D8-89F2-000393C2D67E@colorstudy.com> References: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> At 10:28 PM 1/26/04 -0600, Ian Bicking wrote: >That's true, but I think the services should be able to query the gateway >as well. 
I part because the "inside" of the application, where you start >to battle portability issues, is likely to occur in the services. I'll address that below. >Gateways aren't all the same. They just aren't. Gateways can't and won't >be uniform. They can't attempt to be uniform except by taking on an >every-expanding set of responsibilities. Whaaa? I now have three gateways I've written. One is based on BaseHTTPServer, one uses CGI, and one uses FastCGI. They're uniform. All they do is call aService.runCGI(). Sure, they vary in how they get the data, what kind of file-like objects they pass to runCGI, and so on. But none of that affects the services that run under them, which rely only on WSGI-guaranteed properties of those objects. Perhaps we aren't in agreement on what "same" and "uniform" mean. Or perhaps it's what "gateway" means that we're not agreeing on. What I mean by "gateway" is, "source of HTTP requests and target of responses", and that is *all*. If we ignore gateways that are also services (like request routers), then it's likely that there will be *very* few gateway implementations to begin with: typically one per HTTP server type, plus a few for gateway protocols like CGI, FastCGI, SCGI, etc. Given that many frameworks implement protocol gateways internally, we may initially see many competing protocol-based gateways, as framework developers are unlikely to want to tell their users to go find a gateway somewhere else. The other type of gateway one may see is request routers that mangle 'environ' and select a service to run, or pre/post-processors that replace stdin or stdout and . As long as these conform to the interface, this is still "uniform" in my view. >No, I'm thinking of the application/service as a framework as well. >But I just don't see how a framework can fit into such a minimal interface >and still be portable. Because nearly every single framework that exists right now is able to run under a CGI-like protocol.
And because *every* one of them is based on HTTP, and CGI is a reasonably straightforward and reversible mapping from HTTP. > For instance, how do I implement sessions? I don't see a portable way > to do that in the service given this interface. I don't even see an > unportable way to do it. If you were writing a plain CGI, how would you implement sessions? That's precisely how you'd do it here. And any existing framework that does sessions for HTTP is mappable to this. For example, my 6-years-old ZPublisher app that's cranking out 4 million dynamic pages or so per month under this interface, uses cookies. It uses a Zope HTTPResponse object, calling 'setCookie()', and it reads the cookie from an HTTPRequest object. Now, the top level service object passed to the gateway knows nothing of these details. It simply calls Zope's 'publish()' function with the 'stdin', 'stdout', 'stderr', and 'environ' provided by the gateway. Everything below that is a plain ZPublisher application, that has been run via a variety of mechanisms over the years. Neither the service nor the app care about each other's details, and the service doesn't care what kind of gateway is running it. I've run it under a variety of gateways, and as long as the gateway conforms to the interface, everything below there is just fine. > Unless you propose that the code to add the service to the gateway > should include all sorts of options to cover all the particular > details... but I don't see how that will help any. It's a way to get from HTTP to an *existing* web app framework. Not a new framework. From pje at telecommunity.com Tue Jan 27 11:21:35 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Jan 27 11:17:47 2004 Subject: [Web-SIG] Sample WSGI server Message-ID: <5.1.0.14.0.20040127110511.03e1ac90@mail.telecommunity.com> FYI, you can find a sample WSGI gateway based on BaseHTTPServer at: http://cvs.eby-sarna.com/PEAK/src/peak/util/WSGIServer.py?rev=HEAD&content-type=text/vnd.viewcvs-markup Although it's in the 'peak.util' package, it can be used as a standalone module without installing or using any portion of PEAK whatsoever. This example should give a rough idea of the complexity of implementing WSGI in a web server that does not have CGI or a CGI-like internal interface already in existence. Implementing a gateway over an existing CGI or CGI-like protocol (e.g. FastCGI, SCGI, etc.) is much simpler as it requires only ensuring that the file-like objects support the right methods and that 'isinstance(environ,dict)', and that 'environ' may be safely modified by the called service. In the most trivial case, this short snippet:

    #!python
    from someservice import myservice
    import sys,os
    aService = myservice(someparam="something")
    aService.runCGI(sys.stdin,sys.stdout,sys.stderr,os.environ.copy())

is sufficient to run a WSGI service under CGI. From gward at python.net Tue Jan 27 21:35:44 2004 From: gward at python.net (Greg Ward) Date: Tue Jan 27 21:35:48 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> References: <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> Message-ID: <20040128023544.GA821@cthulhu.gerg.ca> On 23 January 2004, Phillip J. Eby said: > But there isn't any functionality the proposal requires of the container > beyond this. For example, what would a plain CGI container provide? A > FastCGI container? For a lowest-common-denominator interface, what > "integration" is possible or needed? [...stepping into this discussion a few days late...]
I spend a lot of time with the Java Servlet API at work these days, and I think I have a pretty good handle on what's good about it and what's not good. Obviously, I think we should rip off the good ideas, and leave the bad ones behind. First of all, the absolute #1 best thing about the Java Servlet API is that it provides a complete but simple object-oriented wrapper for HTTP request-processing in the form of the HttpServletRequest and HttpServletResponse classes. (I can't say offhand if the wrapping is 100% perfectly complete, but I can say that it provides clean, simple access to every feature of HTTP I need in my day-to-day work.) (I'm fairly agnostic on the issue of whether these should be one class or two. The fact that no one has proposed a good name for the combined request+response object is enough to put me in the "two objects" camp. Also, I think I would prefer "response.set_header(...)" to "thingy.set_response_header(...)". But don't conflate my slight preference here for my appreciation of Java's request/response classes!) OTOH, the worst thing about the Java Servlet API is the notion of a servlet. There are two problems here: * premature overgeneralization; it looks like the servlet API was designed to allow people to someday write servlets for FTP servers or other as-yet-unknown protocols. This is stupid; web applications use HTTP. Period. * the level of granularity is wrong: most Java web applications consist of multiple servlets, and if the code I work on in my day job is any indication, there's a lot of overlapping code among the servlets in a given application. Thus, the point of entry between a web application container and a collection of web applications should be... the web application. (The Java community has figured this out; when you administer a modern servlet container like Tomcat, you generally work at the level of web apps, rather than individual servlets or the whole container. 
The existence of "servlets" as a separate entity complicates both administering a servlet container and writing web applications. It's a nasty design flaw that we should strenuously avoid.) The other thing that bugs me about the Java world is that their web application containers -- Tomcat in particular, since that's the one I use every day -- are enormously complex, bloated beasts. They're hard to understand, hard to set up, and hard to administer. They keep thousands of people employed at banging their heads against confusing, arcane XML config files. (Come to think of it, the same could be said of Java web development frameworks.) My gut feeling is that a barebones web container -- say, one that enables Quixote applications to run as FastCGI scripts, say -- should fit into 10 lines of Python code. A super-duper, whiz-bang, all-singing, all-dancing container -- enable applications written under N different frameworks to execute using M different models -- should fit in roughly 1000 lines of Python. One big challenge I can foresee: the Python community will never allow a standard web container interface to mandate a particular execution model, as the Java Servlet API does. Writing a single API that handles both Twisted/Medusa-style (event-driven I/O) and Java-style (threaded I/O) will be difficult; it might be impossible. (Hmmm, maybe there is a third model: traditional Unix-style (multiprocess I/O).) I would rather see two (three?) related APIs than one really complicated API that tries to cover all the bases. Finally, in response to a later remark by Phillip (I think): I definitely like calling the things that web developers write "web applications". "Web service" implies to me a special case of web application that does not have a human user interface. And I'm perfectly comfortable calling the software that runs web applications an "application container". "Application engine" and "application server" also make sense to me.
Whatever terminology we pick, it should be carefully defined in that PEP! Greg -- Greg Ward http://www.gerg.ca/ Heisenberg may have slept here. From pje at telecommunity.com Tue Jan 27 23:39:22 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Jan 27 23:35:35 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040128023544.GA821@cthulhu.gerg.ca> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> Message-ID: <5.1.0.14.0.20040127231953.021ac780@mail.telecommunity.com> At 09:35 PM 1/27/04 -0500, Greg Ward wrote: >First of all, the absolute #1 best thing about the Java Servlet API is >that it provides a complete but simple object-oriented wrapper for HTTP >request-processing in the form of the HttpServletRequest and >HttpServletResponse classes. (I can't say offhand if the wrapping is >100% perfectly complete, but I can say that it provides clean, simple >access to every feature of HTTP I need in my day-to-day work.) Unfortunately, this is also 100% out of scope for the interface, because every framework out there already has its own request and response types. If Python had this from the mythical "day one", we'd have had a chance, but alas it's far too late for that. >OTOH, the worst thing about the Java Servlet API is the notion of a >servlet. There are two problems here: > > * premature overgeneralization; it looks like the servlet API was > designed to allow people to someday write servlets for FTP > servers or other as-yet-unknown protocols. This is stupid; > web applications use HTTP. Period. I'll take that as a +1 for the HTTP-specificity of the existing proposal. :) > * the level of granularity is wrong: most Java web applications > consist of multiple servlets, and if the code I work on in my > day job is any indication, there's a lot of overlapping code > among the servlets in a given application. 
Thus, the point > of entry between a web application container and a collection > of web applications should be... the web application. > > (The Java community has figured this out; when you administer a > modern servlet container like Tomcat, you generally work at the > level of web apps, rather than individual servlets or the whole > container. The existence of "servlets" as a separate entity > complicates both administering a servlet container and writing web > applications. It's a nasty design flaw that we should strenuously > avoid.) I'll take this, in conjunction with some of your later comments below, as a vote in favor of retaining "application" as the name for the thing that a gateway calls 'runCGI' on. :) >The other thing that bugs me about the Java world is that their web >application containers -- Tomcat in particular, since that's the one I >use everyday -- are enormously complex, bloated beasts. They're hard to >understand, hard to setup, and hard to administer. They keep thousands >of people employed at banging their heads against confusing, arcane XML >config files. (Come to think of it, the same could be said of Java web >development frameworks.) > >My gut feeling is that a barebones web container -- say, one that >enables Quixote applications to run as FastCGI scripts, say -- should >fit into 10 lines of Python code. A super-duper, whiz-bang, >all-singing, all-dancing container -- enable applications written under >N different frameworks to execute using M different models -- should fit >in roughly 1000 lines of Python. All the containers I've written so far weigh in at a lot less than 100 lines; even the BaseHTTPServer one was only maybe 200. I've only tested for N=3 and M=3 so far, though. (Three frameworks: Zope 2, Zope 3, plain CGI; Three models: plain CGI, FastCGI, and BaseHTTPServer.)
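[Editorial illustration, not part of the original message.] To make the "gateways only call runCGI()" point concrete, here is a minimal sketch assuming the four-argument runCGI(stdin, stdout, stderr, environ) call shown earlier in this thread. HelloService, run_under_cgi, and run_in_memory are hypothetical names invented for this example; none of them come from PEAK, Zope, or the sample WSGIServer module.

```python
import io
import os
import sys


class HelloService:
    """Hypothetical service: all a gateway knows about it is runCGI()."""

    def runCGI(self, stdin, stdout, stderr, environ):
        # Read the request from environ, write a CGI-style response.
        path = environ.get("PATH_INFO", "/")
        stdout.write("Content-Type: text/plain\r\n\r\n")
        stdout.write("Hello from %s\n" % path)


def run_under_cgi(service):
    # Plain CGI gateway: the web server has already set up the process
    # streams and environment, so hand them over (environ as a copy,
    # since the service is allowed to modify it).
    service.runCGI(sys.stdin, sys.stdout, sys.stderr, os.environ.copy())


def run_in_memory(service, environ):
    # Stand-in for a long-running gateway: it builds the streams and
    # the environ dict itself for each request.
    stdout, stderr = io.StringIO(), io.StringIO()
    service.runCGI(io.StringIO(""), stdout, stderr, dict(environ))
    return stdout.getvalue()
```

The service body is identical under both gateways; only the code that obtains the streams and the environ differs. A CGI script would end with run_under_cgi(HelloService()), while run_in_memory() is mainly handy for testing.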
>One big challenge I can foresee: the Python community will never allow a >standard web container interface to mandate a particular execution >model, as the Java Servlet API does. Writing a single API that handles >both Twisted/Medusa-style (event-driven I/O) and Java-style (threaded >I/O) will be difficult; it might be impossible. The standard way in both Zope and Twisted to deal with this is to run blocking applications in a thread, allocated from a thread pool, while the event dispatch loop runs in the "main" thread. So, both frameworks already offer ready-made APIs for this sort of thing. In other words, it's not impossible, and though it might be difficult, the work has already been done in some major frameworks that have event-driven I/O loops. > (Hmmm, maybe there is a >third model: traditional Unix-style (multiprocess I/O).) I would rather >see two (three?) related APIs than one really complicated API that tries >to cover all the bases. Hm, maybe I actually should bump the number of models I listed above. :) My "millions of pages/month" app uses a preforking process model of serving FastCGI. It wraps my existing FastCGI container that uses -- you guessed it -- runCGI(). Oh, and it uses event-driven I/O loops to communicate between the parent and the subprocesses, as well as to monitor the FastCGI socket... I'm saying all this not to brag about my "mad skillz", but to point out that I wrote the 'runCGI' proposal to cover *actual* container implementations that I had already used in a variety of process models (and at least two protocols) in production environments. It is not a theoretical proposal, but a report on actual use experience. >Finally, in reponse to a later remark by Philip (I think): I definitely >like calling the things that web developers write "web applications". >"Web service" implies to me a special case of web application that does >not have a human user interface. 
And I'm perfectly comfortable calling >the software that runs web applications an "application container". >"Application engine" and "application server" also make sense to me. >Whatever terminology we pick, it should be carefully defined in that >PEP! Ian's comments made it appear to me that "application" was too vague and potentially prone to misunderstandings. "Service" seemed to eliminate some of those. "Servlet" is another possibility, but of course it would carry some inaccurate connotations from Java. Certainly, I'm open to other suggestions, but I'd prefer something that starts with an 'S' now that I've gone and written a 'WSGIServer' module... ;) From sholden at holdenweb.com Wed Jan 28 08:59:15 2004 From: sholden at holdenweb.com (Steve Holden) Date: Wed Jan 28 09:06:08 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040128023544.GA821@cthulhu.gerg.ca> Message-ID: [...] > > My gut feeling is that a barebones web container -- say, one that > enables Quixote applications to run as FastCGI scripts, say -- should > fit into 10 lines of Python code. A super-duper, whiz-bang, > all-singing, all-dancing container -- enable applications > written under > N different frameworks to execute using M different models -- > should fit > in roughly 1000 lines of Python. > sprint! > One big challenge I can foresee: the Python community will > never allow a > standard web container interface to mandate a particular execution > model, as the Java Servlet API does. Writing a single API > that handles > both Twisted/Medusa-style (event-driven I/O) and Java-style (threaded > I/O) will be difficult; it might be impossible. (Hmmm, maybe > there is a > third model: traditional Unix-style (multiprocess I/O).) I > would rather > see two (three?) related APIs than one really complicated API > that tries > to cover all the bases. 
> That would be suitably Pythonic > Finally, in reponse to a later remark by Philip (I think): I > definitely > like calling the things that web developers write "web applications". > "Web service" implies to me a special case of web application > that does > not have a human user interface. And I'm perfectly > comfortable calling > the software that runs web applications an "application container". > "Application engine" and "application server" also make sense to me. > Whatever terminology we pick, it should be carefully defined in that > PEP! > This all sounds reasonable. regards Steve From pje at telecommunity.com Wed Jan 28 10:08:45 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Jan 28 10:04:56 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.0.14.0.20040127231953.021ac780@mail.telecommunity.com> References: <20040128023544.GA821@cthulhu.gerg.ca> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> Message-ID: <5.1.0.14.0.20040128100615.020f3720@mail.telecommunity.com> At 11:39 PM 1/27/04 -0500, Phillip J. Eby wrote: >At 09:35 PM 1/27/04 -0500, Greg Ward wrote: >>My gut feeling is that a barebones web container -- say, one that >>enables Quixote applications to run as FastCGI scripts, say -- should >>fit into 10 lines of Python code. A super-duper, whiz-bang, >>all-singing, all-dancing container -- enable applications written under >>N different frameworks to execute using M different models -- should fit >>in roughly 1000 lines of Python. > >All the containers I've written so far weigh in at a lot less than 100 >lines; even the BaseHTTPServer one was only maybe 200. I've only tested >for N=3 and and M=3 so far, Oops, that was supposed to be "a lot less than 1000 lines". From amk at amk.ca Wed Jan 28 10:27:39 2004 From: amk at amk.ca (A.M. 
Kuchling) Date: Wed Jan 28 10:28:24 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.0.14.0.20040127231953.021ac780@mail.telecommunity.com> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.0.14.0.20040127231953.021ac780@mail.telecommunity.com> Message-ID: <20040128152739.GA25947@rogue.amk.ca> On Tue, Jan 27, 2004 at 11:39:22PM -0500, Phillip J. Eby wrote: > Unfortunately, this is also 100% out of scope for the interface, because > every framework out there already has its own request and response > types. If Python had this from the mythical "day one", we'd have had a > chance, but alas it's far too late for that. Well, I'm pessimistic about managing to get request/response classes that everyone is happy with, but it's also potentially low-hanging fruit that would be a significant step toward drawing the Python/web community together, as would Phillip's WSGI proposal. We should at least take a stab at it. --amk From ianb at colorstudy.com Wed Jan 28 13:57:35 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Jan 28 13:58:27 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> References: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> Message-ID: <4018061F.4090407@colorstudy.com> Phillip J. Eby wrote: >> For instance, how do I implement sessions? I don't see a portable >> way to do that in the service given this interface. I don't even see >> an unportable way to do it.
> > If you were writing a plain CGI, how would you implement sessions? > That's precisely how you'd do it here. And any existing framework that > does sessions for HTTP is mappable to this. For example, my 6-years-old > ZPublisher app that's cranking out 4 million dynamic pages or so per > month under this interface, uses cookies. It uses a Zope HTTPResponse > object, calling 'setCookie()', and it reads the cookie from an > HTTPRequest object. Well, let's use sessions as an example, since it's really the example that gives me the most concern. If we have a goal to move existing frameworks into an application/service model, connecting to multiple gateways, we have to support the behavior that already exists in those frameworks. Cookies alone aren't sufficient. Currently Webware's primary session mechanism keeps the session in memory until the session reaches a certain age (likely because the user has gone away), at which time the session is pickled and moved to disk, in case the user comes back. This is done with a scheduling service that is part of Webware, as is the ultimate expiration (where the file is removed). Locking is handled with thread locks, because Webware expects a single-process model. Also, when the AppServer is shut down or restarted (which can happen very often during development), all sessions are pickled and written to disk. So, there's an existing session mechanism. The exact details of the implementation don't have to be maintained, but the external interface and semantics should be. That involves: * Sessions that persist over multiple requests. * Sessions persist over server restarts. * Objects put into the session do not need to be stored in client-side cookies. * Some concurrency protection (applications still need to consider their own concurrency requirements). * Sessions are expired in a consistent, scheduled manner. Now, most of these can be implemented for CGI.
The last one would probably be slightly different, in that there may or may not be a scheduling service -- cron job or otherwise -- so an ad hoc scheduler that runs whenever a session is fetched may be necessary. But the *implementation* would be significantly different depending on the context. Webware's current implementation wouldn't work in CGI, and to determine an optimal implementation it has to know something about the environment it's being run in. And, to make it a little harder, we've often had requests to implement memory-only sessions, to put unpickleable objects into the session. Usually we just tell people to keep these values in module globals. But module globals are also unportable across environments. Probably the most compelling example of putting an unpickleable object into a session is to use database transactions that span multiple requests. Of course, outside of a threaded server this just isn't possible at all. (Well, maybe some clever use of a Pyro threaded server that serves up only database connections...) Something that isn't an issue is getting the session ID from different locations -- a cookie, a URL variable, or a portion of the URL path. The application can handle this on its own -- but that's only a small part of the picture. Pooling and caching systems also have to deal with many of the same issues. Ian From ianb at colorstudy.com Wed Jan 28 14:16:53 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Jan 28 14:17:49 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040128023544.GA821@cthulhu.gerg.ca> References: <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <20040128023544.GA821@cthulhu.gerg.ca> Message-ID: <40180AA5.4070007@colorstudy.com> Greg Ward wrote: > On 23 January 2004, Phillip J. Eby said: > >>But there isn't any functionality the proposal requires of the container >>beyond this. For example, what would a plain CGI container provide? A >>FastCGI container?
For a lowest-common-denominator interface, what >>"integration" is possible or needed? > > [...stepping into this discussion a few days late...] > > I spend a lot of time with the Java Servlet API at work these days, and > I think I have a pretty good handle on what's good about it and what's > not good. Obviously, I think we should rip off the good ideas, and > leave the bad ones behind. > > First of all, the absolute #1 best thing about the Java Servlet API is > that it provides a complete but simple object-oriented wrapper for HTTP > request-processing in the form of the HttpServletRequest and > HttpServletResponse classes. (I can't say offhand if the wrapping is > 100% perfectly complete, but I can say that it provides clean, simple > access to every feature of HTTP I need in my day-to-day work.) I agree that a common request/response object(s) would be very useful, though stdout/stdin/environ encompasses all the same information, just in a crufty sort of way. The advantage is really when you want to make portable libraries, like a form processor that needs access to the variables, but may occasionally want to look at the User-Agent header too (for dealing with DHTML compatibility issues). I think people are being too pessimistic about it -- I don't see why it should be that hard to agree on a simple request/response object, with the expectation that current frameworks will wrap the object with different interfaces as necessary (or provide wrappers to make their objects look like the standard). I don't think anything about WSGI (or whatever acronym we're going by ;) would preclude this. Given such an interface, it would even be fairly simple to create a wrapper that creates the object given stdin/stdout/environ, then passes the call onto another object. It would be a reasonable way to layer the two standards, without introducing any real inefficiency (well... depending on the way the applications are layered).
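[Editorial illustration, not part of the original message.] The layering described above could look roughly like the sketch below: an adapter exposes runCGI() to the gateway, builds request/response objects from stdin/stdout/environ, and passes the call on to another object. Request, Response, ObjectAdapter, and show_agent are hypothetical names made up for this example; no framework in this thread defines them, and the header lookup simply assumes the usual CGI convention of exposing "User-Agent" as HTTP_USER_AGENT.

```python
import io


class Request:
    """Hypothetical read-only view over a CGI-style environ and stdin."""

    def __init__(self, environ, stdin):
        self.environ = environ
        self.stdin = stdin

    def header(self, name, default=None):
        # CGI convention: "User-Agent" is exposed as HTTP_USER_AGENT.
        key = "HTTP_" + name.upper().replace("-", "_")
        return self.environ.get(key, default)


class Response:
    """Hypothetical response that serializes itself onto stdout."""

    def __init__(self, stdout):
        self.stdout = stdout
        self.status = "200 OK"
        self.headers = [("Content-Type", "text/plain")]

    def send(self, body):
        lines = ["Status: %s" % self.status]
        lines += ["%s: %s" % (n, v) for n, v in self.headers]
        self.stdout.write("\r\n".join(lines) + "\r\n\r\n" + body)


class ObjectAdapter:
    """Looks like a runCGI() service to the gateway, but hands
    request/response objects to the wrapped handler."""

    def __init__(self, handler):
        self.handler = handler

    def runCGI(self, stdin, stdout, stderr, environ):
        request = Request(environ, stdin)
        response = Response(stdout)
        response.send(self.handler(request, response))


def show_agent(request, response):
    # Example handler: a portable library peeking at User-Agent.
    return "agent=%s" % request.header("User-Agent", "unknown")
```

A gateway would be given ObjectAdapter(show_agent) and would call its runCGI() like that of any other service; the handler itself never touches stdin, stdout, or environ directly.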
> * premature overgeneralization; it looks like the servlet API was > designed to allow people to someday write servlets for FTP > servers or other as-yet-unknown protocols. This is stupid; > web applications use HTTP. Period. And it seems easier to translate FTP requests to HTTP requests than deal with the premature generalization. Well, easy to translate it into objects -- I think it would be somewhat annoying to translate an FTP request into stdin/stdout/environ ;) > * the level of granularity is wrong: most Java web applications > consist of multiple servlets, and if the code I work on in my > day job is any indication, there's a lot of overlapping code > among the servlets in a given application. Thus, the point > of entry between a web application container and a collection > of web applications should be... the web application. > > (The Java community has figured this out; when you administer a > modern servlet container like Tomcat, you generally work at the > level of web apps, rather than individual servlets or the whole > container. The existence of "servlets" as a separate entity > complicates both administering a servlet container and writing web > applications. It's a nasty design flaw that we should strenuously > avoid.) Can you expand? Do you mean that servlets as an exposed resource are unnecessary, and that the only exposed resource should be the application as a whole? Which may in turn be factored into servlets, but that's up to the application to determine...? Ian From pje at telecommunity.com Wed Jan 28 14:41:34 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Jan 28 14:42:22 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <4018061F.4090407@colorstudy.com> References: <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> At 12:57 PM 1/28/04 -0600, Ian Bicking wrote: >Currently Webware's primary session mechanism keeps the session in memory >until the session reaches a certain age (likely because the user has gone >away), at which time the session is pickled and moved to disk, in case the >user comes back. This is done with a scheduling service that is part of >Webware, as is the ultimate expiration (where the file is >removed). Locking is handled with thread locks, because Webware expects a >single-process model. Also, when the AppServer is shut down or restarted >(which can happen very often during development), all sessions are pickled >and written to disk. So, it sounds like Webware would say that its services were only runnable under single-process gateways. That sounds like a from-the-ground-up architectural decision. It's not going to be runnable under CGI or mod_python, certainly. But it *would* run in a single-process Python web server, or using FastCGI either as a dynamic app with maxclass=1, or as an "external" FastCGI app. >So, there's an existing session mechanism. The exact details of the >implementation don't have to be maintained, but the external interface and >semantics should be. That involves: > >* Sessions that persist over multiple requests. >* Sessions persist over server restarts. 
>* Objects put into the session do not need to be stored in client-side >cookies. >* Some concurrency protection (applications still need to consider their >own concurrency requirements). >* Sessions are expired in a consistent, scheduled manner. > >Now, most of these can be implemented for CGI. The last one would >probably be slightly different, in that there may or may not be a >scheduling service -- cron job or otherwise -- so an ad hoc scheduler that >runs whenever a session is fetched may be necessary. But the >*implementation* would be significantly different depending on the >context. Webware's currently implementation wouldn't work in CGI, and to >determine an optimal implementation it has to know something about the >environment it's being run in. I understand where you're coming from, but the proposal isn't intended to make fish into birds or vice versa. I'm pretty sure it was discussed previously that applications that assume a particular process model are only going to run in gateways that can provide that process model. That's *still* more gateways than they can run in now! >And, to make it a little harder, we've often had requests to implement >memory-only sessions, to put unpickleable objects into the session. >Usually we just tell people to keep these values in module globals. But >module globals are also unportable across environments. But if your framework only supports a "long running, single-process" architecture, module globals would work just fine with any gateway that supports that. Frankly, "multi-process only" and "short running" gateways are going to be in the minority anyway. The only gateway I know of that's likely to *require* multiple processes is mod_python, and the only gateway that's likely to be "short running" is plain CGI. So, it's not like requiring an "LR/SP" gateway is going to dramatically limit the choice of gateways for Webware. 
Obviously, the PEP needs to have examples of these process models added, and to clarify the nature of the restrictions. Who knows, if we talk about this long enough maybe we'll be able to clarify the process models well enough to define a variable that services can expose to indicate their compatibility with various process models. At that point, we almost might as well go ahead and make the API have all five Java servlet methods, call the objects servlets, and be done with it. :) From ianb at colorstudy.com Wed Jan 28 15:11:54 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Jan 28 15:12:44 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> References: <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> Message-ID: <4018178A.6070605@colorstudy.com> Phillip J. Eby wrote: > At 12:57 PM 1/28/04 -0600, Ian Bicking wrote: > >> Currently Webware's primary session mechanism keeps the session in >> memory until the session reaches a certain age (likely because the >> user has gone away), at which time the session is pickled and moved to >> disk, in case the user comes back. This is done with a scheduling >> service that is part of Webware, as is the ultimate expiration (where >> the file is removed). Locking is handled with thread locks, because >> Webware expects a single-process model. Also, when the AppServer is >> shut down or restarted (which can happen very often during >> development), all sessions are pickled and written to disk.
> > > So, it sounds like Webware would say that its services were only > runnable under single-process gateways. That sounds like a > from-the-ground-up architectural decision. It's not going to be > runnable under CGI or mod_python, certainly. But it *would* run in a > single-process Python web server, or using FastCGI either as a dynamic > app with maxclass=1, or as an "external" FastCGI app. Specific Webware applications need a single-process model. The framework as a whole could be used in a multi-process or short running model (and a short-running implementation exists), but it would be changed to write sessions out to disk immediately in that case, and use some sort of disk-based locking instead of thread locks. But there's no reason to implement sessions that way when it's not necessary. Portable Real Web Applications could also adapt their behavior depending on how they were being run, for instance creating an abstraction that cached data in module globals if available (and unlike sessions, that can work in multi-process models), or wrote them to disk otherwise. In a long-running model, the framework also needs to know when the environment is shutting down (so it can write data out to disk). Maybe atexit would be sufficient, I'm not sure (it's not what we use now). >> So, there's an existing session mechanism. The exact details of the >> implementation don't have to be maintained, but the external interface >> and semantics should be. That involves: >> >> * Sessions that persist over multiple requests. >> * Sessions persist over server restarts. >> * Objects put into the session do not need to be stored in client-side >> cookies. >> * Some concurrency protection (applications still need to consider >> their own concurrency requirements). >> * Sessions are expired in a consistent, scheduled manner. >> >> Now, most of these can be implemented for CGI. 
The last one would >> probably be slightly different, in that there may or may not be a >> scheduling service -- cron job or otherwise -- so an ad hoc scheduler >> that runs whenever a session is fetched may be necessary. But the >> *implementation* would be significantly different depending on the >> context. Webware's current implementation wouldn't work in CGI, and >> to determine an optimal implementation it has to know something about >> the environment it's being run in. > > > I understand where you're coming from, but the proposal isn't intended > to make fish into birds or vice versa. I'm pretty sure it was discussed > previously that applications that assume a particular process model are > only going to run in gateways that can provide that process model. > That's *still* more gateways than they can run in now! But why not provide that one little hook (a link to the gateway/container) that would allow systems to develop greater portability? (Hmm... when I think in terms of execution model, container seems a much more appropriate term than gateway.) Ultimately, I see a significant goal to be the ability to run actual applications in multiple environments. So Mailman (for instance) could run in its own space (Twisted), in Apache (mod_python), CGI, or something else entirely. We shouldn't entirely ignore the applications. >> And, to make it a little harder, we've often had requests to implement >> memory-only sessions, to put unpickleable objects into the session. >> Usually we just tell people to keep these values in module globals. >> But module globals are also unportable across environments. > > But if your framework only supports a "long running, single-process" > architecture, module globals would work just fine with any gateway that > supports that. > > Frankly, "multi-process only" and "short running" gateways are going to > be in the minority anyway.
The only gateway I know of that's likely to > *require* multiple processes is mod_python, and the only gateway that's > likely to be "short running" is plain CGI. So, it's not like requiring > an "LR/SP" gateway is going to dramatically limit the choice of gateways > for Webware. SkunkWeb also uses multiple processes, run in a separate space from Apache. > Obviously, the PEP needs to have examples of these process models added, > and clarify the nature of the restrictions. Who knows, maybe if we talk > about this long enough maybe we'll be able to clarify the process models > well enough to define a variable that services can expose to indicate > their compatibility with various process models. I think that would be very useful. Here's my list: Single process per request: * process ends with request (CGI, Webware OneShot) * process reused (mod_python, SkunkWeb) Multiple requests per process: * Asynchronous (implied to be single-threaded) (Twisted, Medusa, CherryPy, BaseHTTPServer) * Threaded (Zope, Webware, CherryPy with different settings) Most asynch environments can be turned into threaded systems after runCGI. Webware, at least, is threaded at the point runCGI is called (maybe to its detriment), but many systems are not (including Zope, I think, and probably CherryPy). I believe most other frameworks are built on mod_python, CGI, or FastCGI so they are covered under these categories. I think there might be a separate threaded model for quixote, but I don't know if that portion has its own name. I'm not sure I understand FastCGI well enough to classify it. > At that point, we almost might as well go ahead and make the API have > all five Java servlet methods, call the objects servlets, and be done > with it. :) From amk at amk.ca Wed Jan 28 15:23:34 2004 From: amk at amk.ca (A.M. 
Kuchling) Date: Wed Jan 28 15:24:18 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <4018178A.6070605@colorstudy.com> References: <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <4018178A.6070605@colorstudy.com> Message-ID: <20040128202334.GA468@rogue.amk.ca> On Wed, Jan 28, 2004 at 02:11:54PM -0600, Ian Bicking wrote: > so they are covered under these categories. I think there might be a > separate threaded model for quixote, but I don't know if that portion > has its own name. SCGI is multiprocess, so it's like mod_python (just not embedded). Titus Brown's PyWX, which embeds Python inside AOLserver, can use Quixote in a threaded mode, but nothing in Quixote supports that specially. --amk From pje at telecommunity.com Wed Jan 28 16:56:53 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Jan 28 16:57:48 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <4018178A.6070605@colorstudy.com> References: <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> Message-ID: <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> At 02:11 PM 1/28/04 -0600, Ian Bicking wrote: >Portable Real Web Applications could also adapt their behavior depending >on how they were being run, for instance creating an abstraction that >cached data in module globals if available (and unlike sessions, that can >work in multi-process models), or wrote them to disk otherwise. And those applications don't provide a way to configure that? Honestly, my experience supporting applications that "adapt their behavior" without the user's input is rather unpleasant. Honestly, I'm -1 on providing ways for developers to make their applications decide stuff based on what they *think* is going on, in the absence of narrowly and precisely defined options. >In a long-running model, the framework also needs to know when the >environment is shutting down (so it can write data out to disk). Maybe >atexit would be sufficient, I'm not sure (it's not what we use now). And what do you do now when somebody does a "kill -9" on the process, or the machine reboots? Python doesn't even guarantee that all objects in a process will be finalized during a *normal* exit, so how can any Python container guarantee a finalization notice? 
I'd rather we didn't promise what isn't deliverable, or anything that starts blurring responsibilities between container and service. >But why not provide that one little hook (a link to the gateway/container) >that would allow systems to develop greater portability? If you define "portability" as "ability to run under new and never-before-seen containers", that hook would not only provide zero portability improvement, but it would also be an "attractive nuisance" encouraging people to write *non*-portable code that specifically looks for given containers. Configuration and options should be explicit, not implicit. "In the face of ambiguity, refuse the temptation to guess". If the app or framework can choose its behaviors, let it make those options explicit, as part of its configuration. >>>And, to make it a little harder, we've often had requests to implement >>>memory-only sessions, to put unpickleable objects into the session. >>>Usually we just tell people to keep these values in module globals. >>>But module globals are also unportable across environments. >>But if your framework only supports a "long running, single-process" >>architecture, module globals would work just fine with any gateway that >>supports that. >>Frankly, "multi-process only" and "short running" gateways are going to >>be in the minority anyway. The only gateway I know of that's likely to >>*require* multiple processes is mod_python, and the only gateway that's >>likely to be "short running" is plain CGI. So, it's not like requiring >>an "LR/SP" gateway is going to dramatically limit the choice of gateways >>for Webware. > >SkunkWeb also uses multiple processes, run in a separate space from Apache. How does it communicate with Apache? Does it *require* multiple processes, or *allow* them? >>Obviously, the PEP needs to have examples of these process models added, >>and clarify the nature of the restrictions.
Who knows, maybe if we talk >>about this long enough maybe we'll be able to clarify the process models >>well enough to define a variable that services can expose to indicate >>their compatibility with various process models. > >I think that would be very useful. Here's my list: > >Single process per request: >* process ends with request (CGI, Webware OneShot) >* process reused (mod_python, SkunkWeb) I think you mean single request per process here. >Multiple requests per process: >* Asynchronous (implied to be single-threaded) (Twisted, Medusa, CherryPy, >BaseHTTPServer) >* Threaded (Zope, Webware, CherryPy with different settings) Actually, both Twisted and Zope's ZServer use a "async dispatcher in the main thread, requests can be processed in worker threads" model. BaseHTTPServer isn't asynchronous, either, and with mixins can be threaded or forking. Regarding most of the others you mention, I'm not knowledgeable enough to comment. >Most asynch environments can be turned into threaded systems after >runCGI. Webware, at least, is threaded at the point runCGI is called >(maybe to its detriment), but many systems are not (including Zope, I >think, and probably CherryPy). I actually don't understand what you mean here. But I'll try and tackle definitions for this in a subsequent PEP draft. >I believe most other frameworks are built on mod_python, CGI, or FastCGI >so they are covered under these categories. I think there might be a >separate threaded model for quixote, but I don't know if that portion has >its own name. I'm not sure I understand FastCGI well enough to classify it. A quick attempt to clarify what concepts we're dealing with... * A "web server" is something that accepts HTTP connections * A "gateway protocol" connects a "web server" to a "gateway" * A "gateway protocol" may be in-process (e.g. if the server is written in Python or embeds Python) or use some kind of inter-process communication (pipes for CGI, sockets for FastCGI, etc.) 
* If the gateway protocol is in-process, then the process model for the app is limited by the process model of the web server. * If the gateway protocol is interprocess, then the process model for the app is determined by the process model of the gateway implementation. * The basic process models for a server or gateway are: - preforking, serially reused processes (e.g. mod_python, PEAK's multiprocess FastCGI runner, etc.) - "long running single process" (LRSP) (e.g. Twisted, ZServer, WSGIServer, any FastCGI runner under Apache if Apache is configured with maxClassProcesses=1, maybe AOLServer too?) + with threads + without threads - fork-on-demand, die-after-one-request (CGI) Notice that the server's process model need not be the same as the gateway/container's process model, if the gateway protocol is interprocess. Indeed, with Apache as the server, you can use any of the process models simply by selecting an appropriate gateway and gateway protocol. Anyway, I think I've covered everything possible, except for maybe the idea of using multiple threads in multiple processes, which makes my head hurt. :) So, for short, I guess I'd call the process models "prefork", LRSP-single, LRSP-multi, and fork-and-die (FAD? SRMP?). Those are just working terms for discussion, the PEP should of course use their full names/descriptions. The most complicated one from a configuration point of view (IMO) is LRSP-multi. I don't have much experience with developing in that environment, so it would be helpful if those who have could offer some thoughts. The main options I'm aware of are: * Gateway gets factory, instantiates service instance per worker thread, or on demand within configured parameters. (Here, the gateway drives how many service instances there are.) * Gateway gets a single service, that it calls from many threads. Service handles everything on that side of the fence. So here, the app side controls its threading. 
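To make the two options above concrete, here is a minimal sketch. It assumes a simplified one-argument runCGI(environ) rather than whatever signature the PEP ends up with, and EchoService, PerThreadGateway, and SharedServiceGateway are hypothetical names invented for illustration, not part of any existing gateway.

```python
import threading


class EchoService:
    """Hypothetical service; only the runCGI name comes from this thread."""

    def runCGI(self, environ):
        return "handled %s" % environ["PATH_INFO"]


class PerThreadGateway:
    """Option 1: the gateway gets a factory and keeps one service per worker thread."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()  # separate attribute namespace per thread

    def handle(self, environ):
        service = getattr(self._local, "service", None)
        if service is None:
            # First request seen by this thread: build its private service.
            service = self._local.service = self._factory()
        return service.runCGI(environ)


class SharedServiceGateway:
    """Option 2: the gateway gets one service and calls it from every thread."""

    def __init__(self, service):
        self._service = service

    def handle(self, environ):
        # Any locking or pooling is the service's own business here.
        return self._service.runCGI(environ)
```

In the first class the gateway decides how many service instances exist; in the second it only ever tracks one object, which is the configuration simplification being argued for.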
I think that Twisted and ZServer may currently lean slightly towards the first model, but the second model seems more "portable" to me, in terms of being doable for multiple frameworks. In theory, one could perhaps even run an "LRSP-single" app in an "LRSP-multi" gateway simply by having one's 'runCGI()' acquire and release a global lock at entry and exit. It also simplifies things from a container-configuration point of view, as there is only one service object to keep track of. From pje at telecommunity.com Wed Jan 28 16:59:20 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Jan 28 17:00:05 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040128202334.GA468@rogue.amk.ca> References: <4018178A.6070605@colorstudy.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <4018178A.6070605@colorstudy.com> Message-ID: <5.1.1.6.0.20040128165709.020518e0@telecommunity.com> At 03:23 PM 1/28/04 -0500, A.M. Kuchling wrote: >On Wed, Jan 28, 2004 at 02:11:54PM -0600, Ian Bicking wrote: > > so they are covered under these categories. I think there might be a > > separate threaded model for quixote, but I don't know if that portion > > has its own name. > >SCGI is multiprocess, so it's like mod_python (just not embedded). Correct me if I'm wrong, but isn't SCGI itself another interprocess gateway protocol like FastCGI? If so, wouldn't that then imply that it does not, of itself, impose a process model? Of course, the current container implementation(s) might have process model restrictions. 
From ianb at colorstudy.com Wed Jan 28 17:39:22 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Jan 28 17:40:14 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> References: <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> Message-ID: <40183A1A.9020106@colorstudy.com> Phillip J. Eby wrote: > At 02:11 PM 1/28/04 -0600, Ian Bicking wrote: > >> Portable Real Web Applications could also adapt their behavior >> depending on how they were being run, for instance creating an >> abstraction that cached data in module globals if available (and >> unlike sessions, that can work in multi-process models), or wrote them >> to disk otherwise. > > > And those applications don't provide a way to configure that? > > Honestly, my experience supporting applications that "adapt their > behavior" without the user's input is rather unpleasant. Honestly, I'm > -1 on providing ways for developers to make their applications decide > stuff based on what they *think* is going on, in the absence of narrowly > and precisely defined options. > > >> In a long-running model, the framework also needs to know when the >> environment is shutting down (so it can write data out to disk). >> Maybe atexit would be sufficient, I'm not sure (it's not what we use >> now). > > > And what do you do now when somebody does a "kill -9" on the process, or > the machine reboots? 
Python doesn't even guarantee that all objects in > a process will be finalized during a *normal* exit, so how can any > Python container guarantee a finalization notice? I'd rather we didn't > promise what isn't deliverable, or anything that starts blurring > responsibilities between container and service. We can't guarantee anything, but at least we can give the application a fighting chance. >> But why not provide that one little hook (a link to the >> gateway/container) that would allow systems to develop greater >> portability? > > > If you define "portability" as "ability to run under new and > never-before-seen containers", that hook would not only provide zero > portability improvement, but it would also be an "attractive nuisance" > encouraging people to write *non*-portable code that specifically looks > for given containers. Configuration and options should be explicit, not > implicit. "In the face of ambiguity, refuse the temptation to guess". Truisms, I say! Anyway, it's not about guessing. It's about hard-coding behavior based on the environment, when it's called for to solve demonstrable problems. You don't get OS-independent programs by hiding the operating system from the language (though people have tried). And I don't think you get gateway-independent applications by hiding the gateway. > If the app or framework can choose its behaviors, let it make those > options explicit, as part of its configuration. Configuration sucks! If the application is not behaving properly in its environment, it's a bug. This is open source (at least, every implementation I care about will be); if you can't get the upstream to fix the bug, you can always fix it yourself. That may not be true with horribly bloated or closed-source software, but I don't think we should use bad experiences with that sort of software to color our vision here.
>>>> And, to make it a little harder, we've often had requests to >>>> implement memory-only sessions, to put unpickleable objects into the >>>> session. Usually we just tell people to keep these values in module >>>> globals. >>>> But module globals are also unportable across environments. >>> >>> But if your framework only supports a "long running, single-process" >>> architecture, module globals would work just fine with any gateway >>> that supports that. >>> Frankly, "multi-process only" and "short running" gateways are going >>> to be in the minority anyway. The only gateway I know of that's >>> likely to *require* multiple processes is mod_python, and the only >>> gateway that's likely to be "short running" is plain CGI. So, it's >>> not like requiring an "LR/SP" gateway is going to dramatically limit >>> the choice of gateways for Webware. >> >> SkunkWeb also uses multiple processes, run in a separate space from >> Apache. > > > How does it communicate with Apache? Does it *require* multiple > processes, or *allow* them? It's like Apache, preforking worker processes. It communicates with mod_skunkweb, which is equivalent to FastCGI, PCGI, SCGI, mod_webware, etc. It requires a single request per process (or a single process per request) as it uses globals in several places, including print. I don't believe it offers any significant configuration of its behavior (number of processes and such, but there's no threaded option or anything like that). >>> Obviously, the PEP needs to have examples of these process models >>> added, and clarify the nature of the restrictions. Who knows, maybe >>> if we talk about this long enough maybe we'll be able to clarify the >>> process models well enough to define a variable that services can >>> expose to indicate their compatibility with various process models. >> >> >> I think that would be very useful. 
Here's my list: >> >> Single process per request: >> * process ends with request (CGI, Webware OneShot) >> * process reused (mod_python, SkunkWeb) > > > I think you mean single request per process here. > > >> Multiple requests per process: >> * Asynchronous (implied to be single-threaded) (Twisted, Medusa, >> CherryPy, BaseHTTPServer) >> * Threaded (Zope, Webware, CherryPy with different settings) > > > Actually, both Twisted and Zope's ZServer use a "async dispatcher in the > main thread, requests can be processed in worker threads" model. > > BaseHTTPServer isn't asynchronous, either, and with mixins can be > threaded or forking. > > Regarding most of the others you mention, I'm not knowledgeable enough > to comment. > > >> Most asynch environments can be turned into threaded systems after >> runCGI. Webware, at least, is threaded at the point runCGI is called >> (maybe to its detriment), but many systems are not (including Zope, I >> think, and probably CherryPy). > > > I actually don't understand what you mean here. But I'll try and tackle > definitions for this in a subsequent PEP draft. Well, what you were referring to just above. In an LRSP-single process you can always just spawn a thread, and turn it into an LRSP-multi environment. So the distinction is a little vague. Applications that are typically threaded, like Zope, may not be threaded until after the gateway. >> I believe most other frameworks are built on mod_python, CGI, or >> FastCGI so they are covered under these categories. I think there >> might be a separate threaded model for quixote, but I don't know if >> that portion has its own name. I'm not sure I understand FastCGI well >> enough to classify it. > > > A quick attempt to clarify what concepts we're dealing with... > > * A "web server" is something that accepts HTTP connections > > * A "gateway protocol" connects a "web server" to a "gateway" > > * A "gateway protocol" may be in-process (e.g. 
if the server is written > in Python or embeds Python) or use some kind of inter-process > communication (pipes for CGI, sockets for FastCGI, etc.) > > * If the gateway protocol is in-process, then the process model for the > app is limited by the process model of the web server. > > * If the gateway protocol is interprocess, then the process model for > the app is determined by the process model of the gateway implementation. > > * The basic process models for a server or gateway are: > > - preforking, serially reused processes (e.g. mod_python, PEAK's > multiprocess FastCGI runner, etc.) > > - "long running single process" (LRSP) (e.g. Twisted, ZServer, > WSGIServer, any FastCGI runner under Apache if Apache is configured with > maxClassProcesses=1, maybe AOLServer too?) > > + with threads > > + without threads > > - fork-on-demand, die-after-one-request (CGI) > > Notice that the server's process model need not be the same as the > gateway/container's process model, if the gateway protocol is > interprocess. Indeed, with Apache as the server, you can use any of the > process models simply by selecting an appropriate gateway and gateway > protocol. > > Anyway, I think I've covered everything possible, except for maybe the > idea of using multiple threads in multiple processes, which makes my > head hurt. :) I think any circumstance where you have more processes/threads than you have requests doesn't need to be taken into account. > So, for short, I guess I'd call the process models "prefork", > LRSP-single, LRSP-multi, and fork-and-die (FAD? SRMP?). Those are just > working terms for discussion, the PEP should of course use their full > names/descriptions. I assume LRSP-single is async, and LRSP-multi is threaded? > The most complicated one from a configuration point of view (IMO) is > LRSP-multi. I don't have much experience with developing in that > environment, so it would be helpful if those who have could offer some > thoughts. 
The main options I'm aware of are: > > * Gateway gets factory, instantiates service instance per worker thread, > or on demand within configured parameters. (Here, the gateway drives > how many service instances there are.) The reusability of the service also comes into play here -- i.e., services may not be threadsafe, but are reusable. This avoids much of the performance cost of recreating objects, but doesn't require threadsafety. > * Gateway gets a single service, that it calls from many threads. > Service handles everything on that side of the fence. So here, the app > side controls its threading. This option seems more likely at the service level (but maybe not the resource level, which we aren't touching in this proposal). > I think that Twisted and ZServer may currently lean slightly towards the > first model, but the second model seems more "portable" to me, in terms > of being doable for multiple frameworks. In theory, one could perhaps > even run an "LRSP-single" app in an "LRSP-multi" gateway simply by > having one's 'runCGI()' acquire and release a global lock at entry and > exit. It also simplifies things from a container-configuration point of > view, as there is only one service object to keep track of. Yes, I think nested services make the most sense here, where a single service is called from multiple threads, then dispatches from there. It can query its sub-service in whatever ad hoc way is necessary to determine whether an object needs to be instantiated or can be reused. And, perhaps gateways should be encouraged to implement only LRSP-single, and again allow for a threaded service that spawns threads and calls a subservice. While the LRSP-single app could run in LRSP-multi with a lock, this seems unlikely to work well...? Or would it be okay, because it's naturally short running...?
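For reference, the global-lock idea could look roughly like the sketch below. It again assumes a simplified runCGI(environ) signature; LockedService and SingleThreadedService are hypothetical names, and the sleep merely stands in for request work.

```python
import threading
import time


class SingleThreadedService:
    """Hypothetical "LRSP-single" service: its counters are not thread-safe."""

    def __init__(self):
        self.active = 0      # requests currently inside runCGI
        self.max_active = 0  # highest concurrency ever observed

    def runCGI(self, environ):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)     # stand-in for real request handling
        self.active -= 1
        return "ok"


class LockedService:
    """Wrapper that acquires one lock at entry to runCGI and releases it at exit."""

    def __init__(self, service):
        self._service = service
        self._lock = threading.Lock()

    def runCGI(self, environ):
        with self._lock:     # requests are handled strictly one at a time
            return self._service.runCGI(environ)
```

The cost is visible in the wrapper: every request waits for the one before it, so this only seems reasonable when each call is short, which is the open question raised above.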
I suppose only the preforking model wouldn't work in the LRSP, since it's likely to be both blocking and not safe for concurrent use in a single process (at least typically). I haven't done async much, so that seems more confusing to me. At some point you need to return control, most likely before you have completed the request, and I'm not clear if there are well-defined protocols for this. But I don't really know much about it. With all this talk of services that are also gateways, it makes me wonder if we should make that idea more explicit, of various levels of delegation. But then, while it seems like an elegant way to implement a system (chaining components), it would be a total pain to configure such nested systems. So... either all the more reason to avoid configuration, or these ideas should be collapsed to make them easier to understand for end users (i.e., the system administrator-like people who set up the software). Ian From pje at telecommunity.com Wed Jan 28 18:45:26 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Jan 28 18:46:19 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <40183A1A.9020106@colorstudy.com> References: <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> Message-ID: <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> At 04:39 PM 1/28/04 -0600, Ian Bicking wrote: >>>In a long-running model, the framework also needs to know when the >>>environment is shutting down (so it can write data out to disk). >>>Maybe atexit would be sufficient, I'm not sure (it's not what we use now). >> >>And what do you do now when somebody does a "kill -9" on the process, or >>the machine reboots? Python doesn't even guarantee that all objects in a >>process will be finalized during a *normal* exit, so how can any Python >>container guarantee a finalization notice? I'd rather we didn't promise >>what isn't deliverable, or anything that starts blurring responsibilities >>between container and service. > >We can't guarantee anything, but at least we can give the application a >fighting chance. -1. Keeping its data safe is the application's responsibility. Blurring the responsibility over into the container doesn't help, it *hurts*. This is a difference between defining an interface, or language-level issue, and a framework. 
If this were a framework, I'd certainly want to provide services for something like this, should the issue be within scope for the framework. But WSGI isn't a framework, it's an interface. And interfaces should 1) clearly delineate responsibilities between the sides of the interface, and 2) not be vague about what you can or can't express. "We'll try" is something to be avoided in interfaces, because it doesn't *mean* anything. Instead, it encourages app writers to assume that it's taken care of, burdens gateway authors with additional functionality that they'll boilerplate in, and generally makes a mess of portability. >Truisms, I say! Anyway, it's not about guessing. It's about hard-coding >behavior based on the environment, when it's called for to solve >demonstrable problems. You don't get OS-independent programs by hiding >the operating system from the language (though people have tried). And I >don't think you get gateway-independent applications by hiding the gateway. That's what configuration is for. The deployer/integrator should be allowed to control the app's behavior. >>If the app or framework can choose its behaviors, let it make those >>options explicit, as part of its configuration. > >Configuration sucks! If the application is not behaving properly in its >environment, it's a bug. This is open source (at least, every >implementation I care about will be), if you can't get the upstream to fix >the bug, you can always fix it yourself. > >That may not be true with horribly bloated or close-source software, but I >don't think we should use bad experiences with that sort of software to >color our vision here. I guess we'll have to agree to disagree on this. In my view, an application that does not permit explicit configuration for compatibility with a custom environment is unsuitable for deployment in an enterprise production system. 
Also, with respect to open source, please keep in mind that while Python itself is open source, there are lots of Python users who develop closed-source applications with it. >>>Most asynch environments can be turned into threaded systems after >>>runCGI. Webware, at least, is threaded at the point runCGI is called >>>(maybe to its detriment), but many systems are not (including Zope, I >>>think, and probably CherryPy). >> >>I actually don't understand what you mean here. But I'll try and tackle >>definitions for this in a subsequent PEP draft. > >Well, what you were referring to just above. In an LRSP-single process >you can always just spawn a thread, and turn it into an LRSP-multi >environment. So the distinction is a little vague. Nope. An LRSP-single process by definition handles only one request at any moment in time, so spawning a thread doesn't change that. >I assume LRSP-single is async, and LRSP-multi is threaded? LRSP-single means only one thread. LRSP-multi means multi-threaded. Asynchronousness actually implies LRSP-multi, because if you're doing an asynchronous event loop the only way you can afford to call a blocking 'runCGI()' is to do it in a thread. Twisted and ZServer are asynchronous LRSP-multi. By contrast, synchronous servers can be LRSP-single or LRSP-multi. For example, BaseHTTPServer is synchronous, and can be single or multi-threaded. >Yes, I think nested services make the most sense here, where a single >service is called from multiple threads, then dispatches from there. It >can query its sub-service in whatever adhoc way that is necessary, to >determine whether it an object needs to be instantiated, or can be reused. I'm really baffled at why things need to be so complicated, but as long as the complexity is kept well away from the interface, I'm happy. :) >And, perhaps gateways should be encouraged to implement only LRSP-single, >and again allow for a threaded service that spawns threads and calls a >subservice. 
That would be LRSP-multi, because 'runCGI()' is synchronous. It returns only when the request is finished. Thus, the *only* way to do multiple requests at once is for the *container* to be threaded. That's an intentional feature of the design. >While the LRSP-single app could run in LRSP-multi with a lock, this seems >unlikely to work well...? Or would it be okay, because it's naturally >short running...? I suppose only the preforking model wouldn't work in >the LRSP, since it's likely to be both blocking and not safe for >concurrent use in a single process (at least typically). I'm not sure that we should describe applications themselves in terms of the process model as such. That is, I think we might refer to a "threadable" application as one that may be run in LRSP-threaded, and "multiprocess safe" as an application that does not require a single process. These dimensions don't quite map onto the four process models, but rather prescribe what models are *not* usable with that app. That is, an application that *isn't* threadable can't run in LRSP-multi, and an application that *isn't* multiprocess-safe can't run in prefork or fork-and-die. (i.e., it can only run in one of the LRSP models.) >I haven't done async much, so that seems more confusing to me. At some >point you need to return control, most likely before you have completed >the request, and I'm not clear if there are well-defined protocols for >this. But I don't really know much about it. WSGI is intentionally synchronous; you need LRSP-multi (i.e. threads) to run it in an asynchronous web server. I don't know if this is much of a problem in practice, but I know that both Twisted and ZServer support LRSP-multi in their asynchronous servers. >With all this talk of services that are also gateways, it makes me wonder >if we should make that idea more explicit, of various levels of delegation. Yeah, there's still not even a good consensus as to what to call either side of the interface. 
One thing that amazes me about this whole discussion is how something so incredibly simple can become so complicated as soon as you have to explain it to somebody else. :) >But then, while it seems like an elegant way to implement a system >(chaining components), it would be a total pain to configure such nested >systems. So... either all the more reason to avoid configuration, or >these ideas should be collapsed to make them easier to understand for end >users (i.e., the system administrator-like people who set up the software). I think that most chaining will take place on the "application" side, not the "container" side. By that I mean I would expect an application to be packaged as a single "service", even if internally it's composed of routers and adapters and who knows what. Of course, if you then want to integrate that app with others to be deployed within the same container, then you as the application integrator are bundling them together into a new, higher-level "application". In other words, I don't expect this to be much of a problem in practice, because whoever's dealing with a given integration level is unlikely to deal with any components "below" the level they're integrating them at. Does that make sense? 
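As a rough illustration of the two dimensions described above ("threadable" and "multiprocess safe"), the sketch below works out which of the four process models remain usable for a given application, and wraps a non-threadable application in a global lock, as suggested earlier in the thread. The function names and the four-argument run_cgi(input, output, errors, environ) signature are assumptions made for illustration; they are not taken from the PEP draft.

```python
import threading

# Working names for the four process models used in this thread.
PROCESS_MODELS = ("prefork", "LRSP-single", "LRSP-multi", "fork-and-die")


def usable_models(threadable, multiprocess_safe):
    """Return the process models an application can run under."""
    models = []
    for model in PROCESS_MODELS:
        if model == "LRSP-multi" and not threadable:
            continue  # non-threadable apps cannot be called from many threads
        if model in ("prefork", "fork-and-die") and not multiprocess_safe:
            continue  # apps that need a single process cannot be forked
        models.append(model)
    return models


def serialized(run_cgi):
    """Wrap a non-threadable run_cgi so only one call runs at a time.

    Hypothetical helper: takes a global lock on entry and releases it on
    exit, so an LRSP-multi gateway may call it from several threads.
    """
    lock = threading.Lock()

    def locked_run_cgi(input, output, errors, environ):
        with lock:
            return run_cgi(input, output, errors, environ)

    return locked_run_cgi
```

Whether the lock is acceptable in practice depends on how long each request holds it; the sketch only shows that the wrapping itself is simple.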
From smulloni at smullyan.org Wed Jan 28 22:15:23 2004 From: smulloni at smullyan.org (Jacob Smullyan) Date: Wed Jan 28 22:15:35 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <40183A1A.9020106@colorstudy.com> References: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> Message-ID: <20040129031523.GA17649@smullyan.org> On Wed, Jan 28, 2004 at 04:39:22PM -0600, Ian Bicking wrote: > Phillip J. Eby wrote: > >At 02:11 PM 1/28/04 -0600, Ian Bicking wrote: > >>SkunkWeb also uses multiple processes, run in a separate space from > >>Apache. > > > > > >How does it communicate with Apache? Does it *require* multiple > >processes, or *allow* them? > > It's like Apache, preforking worker processes. It communicates with > mod_skunkweb, which is equivalent to FastCGI, PCGI, SCGI, mod_webware, etc. > > It requires a single request per process (or a single process per > request) as it uses globals in several places, including print. I don't > believe it offers any significant configuration of its behavior (number > of processes and such, but there's no threaded option or anything like > that). SkunkWeb currently assumes and requires a multiple-process model, but there is a plan to refactor it for 4.0 so that most of the code could be used independently of that model, at least in threaded contexts, and provide a choice of containers. One part of the PEP (which I may be looking at in an older version) that attracted my attention: 3. 
Since ``output`` and ``errors`` may not be rewound, a container is free to forward write operations immediately, without buffering. In this case, the ``flush()`` method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that ``flush()`` is a no-op. They must call ``flush()`` if they need to ensure that output has in fact been written. Luckily, the use of ``output.flush()`` is only an issue for applications performing "server push" operations, since closing ``output`` will also flush it. Applications writing logs or other output to ``errors``, however, may wish to perform a flush after each complete item is output, to minimize intermingling of data from multiple processes writing to the same log. One useful feature of SkunkWeb is that it makes it easy to have work performed after a response is sent, rather than before -- refreshing caches, rolling back database connections, etc. This is a considerable benefit in http, since requests are by their nature sporadic, and the crunch times are during request processing. The container interface proposed offers only one hook for a web application (or whatever you want to call it) to do everything it is going to do. Therefore, the only way to ensure that work is performed after the response is sent is to flush or close output before undertaking that work. I'd be happier, therefore, if the specification mandated more explicitly that the container must actually respect the semantics of flush() -- that flush() may be a no-op if output is unbuffered, but it may not be a no-op if it is not. This means that output, for instance, could not be a StringIO object the contents of which the container blits back to the client after runCGI() returns. (It is debatable whether the error stream should be subject to this limitation; it might even be preferable for some containers to buffer error until after runCGI() has been completed.) 
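The concern can be made concrete with a small sketch. ForwardingOutput below is a hypothetical container-side output object whose flush() and close() really hand data to the transport, so an application can close its output and then do its housekeeping after the response has gone out. The class, the 'send' callable, and the run_cgi signature are illustrative assumptions, not part of the proposed interface.

```python
class ForwardingOutput:
    """Hypothetical output stream that honours flush() and close()."""

    def __init__(self, send):
        self._send = send      # callable that transmits bytes to the client
        self._pending = []     # chunks written but not yet transmitted
        self.closed = False

    def write(self, data):
        if self.closed:
            raise ValueError("write to closed output")
        self._pending.append(data)

    def flush(self):
        # Not a no-op: anything buffered is actually forwarded.
        if self._pending:
            self._send(b"".join(self._pending))
            self._pending = []

    def close(self):
        if not self.closed:
            self.flush()
            self.closed = True


events = []  # records the order in which things happen


def run_cgi(input, output, errors, environ):
    """Example application: respond first, clean up afterwards."""
    output.write(b"Content-type: text/plain\n\n")
    output.write(b"Hello, world!\n")
    output.close()            # response is handed to the container here
    events.append("cleanup")  # e.g. refresh caches, roll back a connection


out = ForwardingOutput(lambda data: events.append("sent"))
run_cgi(None, out, None, {})
print(events)  # ['sent', 'cleanup']
```

With a container that hands the application a StringIO and transmits its contents only after run_cgi() returns, the order would be reversed: the cleanup would run before anything reached the client.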
Cheers, Jacob Smullyan From ianb at colorstudy.com Wed Jan 28 23:19:22 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Jan 28 23:19:23 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040129031523.GA17649@smullyan.org> References: <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> <20040129031523.GA17649@smullyan.org> Message-ID: <50C3615E-5212-11D8-A3A0-000393C2D67E@colorstudy.com> On Jan 28, 2004, at 9:15 PM, Jacob Smullyan wrote: > One part of the PEP (which I may be looking at in an older version) > that attracted my attention: > > 3. Since ``output`` and ``errors`` may not be rewound, a container is > free to forward write operations immediately, without buffering. > In this case, the ``flush()`` method may be a no-op. Portable > applications, however, cannot assume that output is unbuffered > or that ``flush()`` is a no-op. They must call ``flush()`` if > they need to ensure that output has in fact been written. > > Luckily, the use of ``output.flush()`` is only an issue for > applications performing "server push" operations, since closing > ``output`` will also flush it.
Applications writing logs or other > output to ``errors``, however, may wish to perform a flush after > each complete item is output, to minimize intermingling of data > from multiple processes writing to the same log. > > One useful feature of SkunkWeb is that it makes it easy to have work > performed after a response is sent, rather than before -- refreshing > caches, rolling back database connections, etc. This is a > considerable benefit in http, since requests are by their nature > sporadic, and the crunch times are during request processing. The > container interface proposed offers only one hook for a web > application (or whatever you want to call it) to do everything it is > going to do. Therefore, the only way to ensure that work is performed > after the response is sent is to flush or close output before > undertaking that work. I'd be happier, therefore, if the > specification mandated more explicitly that the container must > actually respect the semantics of flush() -- that flush() may be a > no-op if output is unbuffered, but it may not be a no-op if it is not. > This means that output, for instance, could not be a StringIO object > the contents of which the container blits back to the client after > runCGI() returns. (It is debatable whether the error stream should be > subject to this limitation; it might even be preferable for some > containers to buffer error until after runCGI() has been completed.) Wouldn't it be sufficient to close the stdout stream? This could be done before runCGI returns, and would (I presume) signal that execution had completed. Though that should be explicit. Obviously some containers (ack, too many alternative terminologies at this point) will not be able to finish the request until after control has been returned from runCGI, but I don't see how that can be helped. In general, fewer gateways would have to buffer output until after control was returned, if headers weren't included in the output stream. 
You could parse the headers at the soonest moment, and then connect the application to the actual client in a more direct fashion after that time. I suppose that only requires looking for \n\n (or a chunk that ends in \n, and another that starts in \n), but it's still annoying. Anyway, if a container sends the complete request when stdout.close() was called, control would at least temporarily be passed to the container, while the application would still have a chance to do some processing after stdout.close returns, and before runCGI returns. Maybe those semantics -- or even a lack of required semantics -- should be included in the PEP. Actually ensuring that flush sends data to the client is hard, as there can be many levels of buffering. But I don't see why it would be necessary. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From fawcett at teksavvy.com Wed Jan 21 12:30:33 2004 From: fawcett at teksavvy.com (Graham Fawcett) Date: Wed Jan 28 23:38:03 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: References: <002f01c3d886$9dce0dd0$c022fea9@QUENTEL><04Jan12.180627pst."58611"@synergy1.parc.xerox.com> <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> <000f01c3dae5$cb8876f0$c022fea9@QUENTEL> <4006AB6A.4000001@teksavvy.com> Message-ID: <400EB739.8030507@teksavvy.com> John J Lee wrote: >On Thu, 15 Jan 2004, Graham Fawcett wrote: >[...] > > >>(In the spirit of embracing emerging standards, I'll buy a beer for >>whoever makes a PyWCI 1.0 Container out of this, or out of the original >>Medusa code!) >> >> >[...] > >Now, who says Python can't compete with Java on funding? > > > Did I mention it would be Canadian beer? None of that watery stuff: this offer is a deal at any price! Regarding Pierre's question about adding async HTTP to the standard library: I think my vote would go to "adding Medusa-like functions to SimpleAsyncHTTPServer". 
Medusa's great, it works just fine for me; I'm just not sure I see it being added to the standard library. It's not part of the *HTTPServer family, for one thing; why introduce an orphan? -- Graham From ianb at colorstudy.com Thu Jan 29 00:13:20 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Jan 29 00:13:19 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> References: <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> Message-ID: On Jan 28, 2004, at 5:45 PM, Phillip J. Eby wrote: > At 04:39 PM 1/28/04 -0600, Ian Bicking wrote: >>>> In a long-running model, the framework also needs to know when the >>>> environment is shutting down (so it can write data out to disk). >>>> Maybe atexit would be sufficient, I'm not sure (it's not what we >>>> use now). >>> >>> And what do you do now when somebody does a "kill -9" on the >>> process, or the machine reboots? Python doesn't even guarantee that >>> all objects in a process will be finalized during a *normal* exit, >>> so how can any Python container guarantee a finalization notice? >>> I'd rather we didn't promise what isn't deliverable, or anything >>> that starts blurring responsibilities between container and service. 
>> >> We can't guarantee anything, but at least we can give the application >> a fighting chance. > > -1. Keeping its data safe is the application's responsibility. > Blurring the responsibility over into the container doesn't help, it > *hurts*. > > This is a difference between defining an interface, or language-level > issue, and a framework. If this were a framework, I'd certainly want > to provide services for something like this, should the issue be > within scope for the framework. But WSGI isn't a framework, it's an > interface. And interfaces should 1) clearly delineate > responsibilities between the sides of the interface, and 2) not be > vague about what you can or can't express. "We'll try" is something > to be avoided in interfaces, because it doesn't *mean* anything. > Instead, it encourages app writers to assume that it's taken care of, > burdens gateway authors with additional functionality that they'll > boilerplate in, and generally makes a mess of portability. So what alternative do you propose for handling a shutdown? The application *needs* to know about this. I don't think I trust atexit (though I'm open to it if it really would work). Also, if the application spawns threads that will simply block shutdown unless they can be told to stop. I don't care about blurring of responsibilities nearly as much as utility. You want to convert other frameworks, well then you have to convert their functionality. Spawned threads in the application exist. Resources (threads included) that have to be explicitly cleaned up on shutdown exist. If the gateway is the master process, then it has to handle the application's requirements. If you have another way to deal with these, fine, bring it forth -- if not, then this proposal won't work. I'm not trying to be difficult, I'm just trying to envision how I would adapt Webware's gateway and application (AppServer and Application) to this interface. 
I don't think of Webware as being particularly featureful, so I'm surprised other people haven't seen these problems either, unless there are solutions that I'm missing. The CGI protocol already passes through gateway information, in SERVER_SOFTWARE. The client passes through information in User-Agent. User-Agent is already used heavily, and Webware does use SERVER_SOFTWARE for a couple of things (when IIS acts differently from Apache). It might not be clean, but it gets the job done. I know we use os.name a lot. This is information applications need, and it is against convention to hide that information. >> Truisms, I say! Anyway, it's not about guessing. It's about >> hard-coding behavior based on the environment, when it's called for >> to solve demonstrable problems. You don't get OS-independent >> programs by hiding the operating system from the language (though >> people have tried). And I don't think you get gateway-independent >> applications by hiding the gateway. > > That's what configuration is for. The deployer/integrator should be > allowed to control the app's behavior. The best documentation is when no documentation is needed. That's my truism ;) When we figure something out, I'd rather put that knowledge into code, instead of documentation. And every piece of configuration requires documentation (and it doesn't even save you any code). >>> If the app or framework can choose its behaviors, let it make those >>> options explicit, as part of its configuration. >> >> Configuration sucks! If the application is not behaving properly in >> its environment, it's a bug. This is open source (at least, every >> implementation I care about will be), if you can't get the upstream >> to fix the bug, you can always fix it yourself. >> >> That may not be true with horribly bloated or close-source software, >> but I don't think we should use bad experiences with that sort of >> software to color our vision here. > > I guess we'll have to agree to disagree on this. 
In my view, an > application that does not permit explicit configuration for > compatibility with a custom environment is unsuitable for deployment > in an enterprise production system. Also, with respect to open > source, please keep in mind that while Python itself is open source, > there are lots of Python users who develop closed-source applications > with it. I don't expect glue to be closed source, so I don't think these problems should be part of closed source components. The real applications people build, which may be proprietary, should be insulated from most of these issues. (There do exist one or two proprietary-source Python web frameworks, but they seem rather obscure, and I doubt they are closed-source) >>>> Most asynch environments can be turned into threaded systems after >>>> runCGI. Webware, at least, is threaded at the point runCGI is >>>> called (maybe to its detriment), but many systems are not >>>> (including Zope, I think, and probably CherryPy). >>> >>> I actually don't understand what you mean here. But I'll try and >>> tackle definitions for this in a subsequent PEP draft. >> >> Well, what you were referring to just above. In an LRSP-single >> process you can always just spawn a thread, and turn it into an >> LRSP-multi environment. So the distinction is a little vague. > > Nope. An LRSP-single process by definition handles only one request > at any moment in time, so spawning a thread doesn't change that. Now I'm confused. If it's a single process, and handles only one request, isn't that just broken? I don't know of any example of such a server, since it wouldn't be able to handle concurrent requests. >> I assume LRSP-single is async, and LRSP-multi is threaded? > > LRSP-single means only one thread. LRSP-multi means multi-threaded. > Asynchronousness actually implies LRSP-multi, because if you're doing > an asynchronous event loop the only way you can afford to call a > blocking 'runCGI()' is to do it in a thread. 
Twisted and ZServer are > asynchronous LRSP-multi. Okay. This seems to mean that Twisted wouldn't use this interface internally, since they don't want to unnecessarily spawn a thread, and the interface doesn't seem to allow for a non-blocking API. OTOH, to support async applications we'd have to standardize the model, I suppose, and there's several models of passing around control, right? (Deferred, callbacks, etc) So maybe that's too hard. > By contrast, synchronous servers can be LRSP-single or LRSP-multi. > For example, BaseHTTPServer is synchronous, and can be single or > multi-threaded. But BaseHTTPServer without threads is kind of a silly thing, right? You can play with it, but not really use it for anything real. >> While the LRSP-single app could run in LRSP-multi with a lock, this >> seems unlikely to work well...? Or would it be okay, because it's >> naturally short running...? I suppose only the preforking model >> wouldn't work in the LRSP, since it's likely to be both blocking and >> not safe for concurrent use in a single process (at least typically). > > I'm not sure that we should describe applications themselves in terms > of the process model as such. That is, I think we might refer to a > "threadable" application as one that may be run in LRSP-threaded, and > "multiprocess safe" as an application that does not require a single > process. These dimensions don't quite map onto the four process > models, but rather prescribe what models are *not* usable with that > app. That is, an application that *isn't* threadable can't run in > LRSP-multi, and an application that *isn't* multiprocess-safe can't > run in prefork or fork-and-die. (i.e., it can only run in one of the > LRSP models.) > > > >> I haven't done async much, so that seems more confusing to me. At >> some point you need to return control, most likely before you have >> completed the request, and I'm not clear if there are well-defined >> protocols for this. 
But I don't really know much about it. > > WSGI is intentionally synchronous; you need LRSP-multi (i.e. threads) > to run it in an asynchronous web server. I don't know if this is much > of a problem in practice, but I know that both Twisted and ZServer > support LRSP-multi in their asynchronous servers. > > >> With all this talk of services that are also gateways, it makes me >> wonder if we should make that idea more explicit, of various levels >> of delegation. > > Yeah, there's still not even a good consensus as to what to call > either side of the interface. > > One thing that amazes me about this whole discussion is how something > so incredibly simple can become so complicated as soon as you have to > explain it to somebody else. :) > > >> But then, while it seems like an elegant way to implement a system >> (chaining components), it would be a total pain to configure such >> nested systems. So... either all the more reason to avoid >> configuration, or these ideas should be collapsed to make them easier >> to understand for end users (i.e., the system administrator-like >> people who set up the software). > > I think that most chaining will take place on the "application" side, > not the "container" side. By that I mean I would expect an > application to be packaged as a single "service", even if internally > it's composed of routers and adapters and who knows what. > > Of course, if you then want to integrate that app with others to be > deployed within the same container, then you as the application > integrator are bundling them together into a new, higher-level > "application". I was thinking about this, as in my head I went down a what-if-everything-was-a-filter approach, where each feature was a step in a chain of these runCGI calls. Anyway, it works to a point, but the opaqueness of stdout/stdin/environ made me realize it would fall down long before you could get to any specific code. 
There are a lot of useful transformations that couldn't be easily determined from stdin/environ, and a lot of output transformations that would be difficult to apply to stdout. Well, they all could be implemented, but that involves constant construction and deconstruction of various pieces of the request, and the construction of faux-stdouts that can be pulled apart and also reconstructed. > In other words, I don't expect this to be much of a problem in > practice, because whoever's dealing with a given integration level is > unlikely to deal with any components "below" the level they're > integrating them at. Does that make sense? Yes -- the application (in whatever form), comes as one object, and that object may reference others, or it may not. As long as you don't have to instantiate different chains of gateways depending on the combination of terminal gateways and applications (like if the gateways adapted the semantics slightly to make the two compatible). But that should probably be avoided, which is to say chained gateways should be avoided if possible. It's really more of a clever way to implement things, than a useful way to distribute reusable components. 
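To show where the everything-is-a-filter approach gets awkward, here is a minimal sketch of one output filter wrapped around an application. Because headers and body arrive as one opaque stream, the filter has to capture everything in a faux-stdout, pull it apart, transform the body, and put it back together. upper_filter, hello_app, and the run_cgi-style (input, output, errors, environ) signature are assumptions for illustration only.

```python
import io


def hello_app(input, output, errors, environ):
    """Trivial application writing headers and body to one stream."""
    output.write(b"Content-type: text/plain\n\nhello, world\n")


def upper_filter(app):
    """Wrap app so that the response body is upper-cased."""

    def filtered(input, output, errors, environ):
        captured = io.BytesIO()                   # faux-stdout for the inner app
        app(input, captured, errors, environ)
        # Deconstruct the captured stream into headers and body...
        headers, sep, body = captured.getvalue().partition(b"\n\n")
        # ...and reconstruct it for whoever is next in the chain.
        output.write(headers + sep + body.upper())

    return filtered


out = io.BytesIO()
upper_filter(upper_filter(hello_app))(None, out, None, {})
print(out.getvalue())  # b'Content-type: text/plain\n\nHELLO, WORLD\n'
```

Each extra filter in the chain repeats the capture, split, and rebuild cycle, and nothing reaches the outer output until the inner application has finished.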
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From smulloni at smullyan.org Thu Jan 29 00:27:11 2004 From: smulloni at smullyan.org (Jacob Smullyan) Date: Thu Jan 29 00:27:25 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <50C3615E-5212-11D8-A3A0-000393C2D67E@colorstudy.com> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> <20040129031523.GA17649@smullyan.org> <50C3615E-5212-11D8-A3A0-000393C2D67E@colorstudy.com> Message-ID: <20040129052711.GA19546@smullyan.org> On Wed, Jan 28, 2004 at 10:19:22PM -0600, Ian Bicking wrote: > Wouldn't it be sufficient to close the stdout stream? This could be > done before runCGI returns, and would (I presume) signal that execution > had completed. Though that should be explicit. It would be sufficient, provided that closing stdout really meant closing stdout and transmitting the http response. Yes, what I said about flush() should probably really apply (if it should at all) to close(), not flush(). > Obviously some > containers (ack, too many alternative terminologies at this point) will > not be able to finish the request until after control has been returned > from runCGI, but I don't see how that can be helped. It may be true that it can't be helped -- but it isn't obvious to me, yet. Is there anything about the assumptions of some frameworks that would prevent a file-like object they are writing to from doing some networking when they close it? I would think that is the business of the container, not the containee. 
Or do you mean that the container may be in turn nested in a preexisting environment, not optimized specifically for this sort of containment, to which it delegates its networking responsibilities -- the "run Zope inside WebWare" scenario? That kind of scenario shouldn't be a sticking point, in my view, because it will always be suboptimal. If you want coexistence of two different application environments, I'd expect to do better nesting them both in one shared container which has limited, specialized functionality than arranging them serially (Zope running inside WebWare running inside SkunkWeb running inside...). A lower level of compliance, for such "convenience serial containers", would be forgiveable. > In general, fewer gateways would have to buffer output until after > control was returned, if headers weren't included in the output stream. > You could parse the headers at the soonest moment, and then connect > the application to the actual client in a more direct fashion after > that time. I suppose that only requires looking for \n\n (or a chunk > that ends in \n, and another that starts in \n), but it's still > annoying. Anyway, if a container sends the complete request when > stdout.close() was called, control would at least temporarily be passed > to the container, while the application would still have a chance to do > some processing after stdout.close returns, and before runCGI returns. > Maybe those semantics -- or even a lack of required semantics -- should > be included in the PEP. Yes, that is really what I'd like to see clarified. At the very least, a container should announce whether its output object really outputs when it says it does. My focussing on flush() rather than close() was the wrong emphasis. 
The reason I made that mistake was that I assumed that the container was not in the business of worrying about http, but that that was the responsibility of an adapter sitting between the application proper and the container (that the container was in no-parsed-header mode, in cgi terms); in that case, the container has no reason not to flush when the client toggles the handle, rather than waiting for him to get off the pot. Upon reflection, I think the container *should* parse headers by default -- what a bore for every adapter to have to do this. However, it would be nice if in runCGI you could set an attribute of output that would tell it whether to parse headers or not; it would seem reasonable to at least aspire to the range of functionality already implemented by CGI :). But I don't know if Mr. Eby wants his nice abstract output object mucked up with pesky attributes! (This suggests a T-shirt: "Get your attributes off my object!" -- "attributes" being more polite, if less general, than "members".) Cheers, Jacob Smullyan > > -- > Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/web-sig/attachments/20040129/0608e763/attachment.bin From ianb at colorstudy.com Thu Jan 29 01:58:17 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Jan 29 01:58:17 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040129052711.GA19546@smullyan.org> References: <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> <20040129031523.GA17649@smullyan.org> <50C3615E-5212-11D8-A3A0-000393C2D67E@colorstudy.com> <20040129052711.GA19546@smullyan.org> Message-ID: <840184C7-5228-11D8-A3A0-000393C2D67E@colorstudy.com> On Jan 28, 2004, at 11:27 PM, Jacob Smullyan wrote: >> Obviously some >> containers (ack, too many alternative terminologies at this point) >> will >> not be able to finish the request until after control has been >> returned >> from runCGI, but I don't see how that can be helped. > > It may be true that it can't be helped -- but it isn't obvious to me, > yet. Is there anything about the assumptions of some frameworks that > would prevent a file-like object they are writing to from doing some > networking when they close it? I would think that is the business of > the container, not the containee. Or do you mean that the container > may be in turn nested in a preexisting environment, not optimized > specifically for this sort of containment, to which it delegates its > networking responsibilities -- the "run Zope inside WebWare" scenario? > That kind of scenario shouldn't be a sticking point, in my view, > because it will always be suboptimal. 
If you want coexistence of two > different application environments, I'd expect to do better nesting > them both in one shared container which has limited, specialized > functionality than arranging them serially (Zope running inside > WebWare running inside SkunkWeb running inside...). A lower level of > compliance, for such "convenience serial containers", would be > forgiveable. I almost took it back, that all containers should be able to do a network response when you close. But I'm not sure about plain CGI -- the only way to finish a CGI request may be to end the process. Everything else should be able to finish the response when .close() is called. But Zope in Webware, or Webware in Zope, or what-have-you, it should be possible. If it's a nested container, it should be able to pass the .close() on up, until you've reached the top. >> In general, fewer gateways would have to buffer output until after >> control was returned, if headers weren't included in the output >> stream. >> You could parse the headers at the soonest moment, and then connect >> the application to the actual client in a more direct fashion after >> that time. I suppose that only requires looking for \n\n (or a chunk >> that ends in \n, and another that starts in \n), but it's still >> annoying. Anyway, if a container sends the complete request when >> stdout.close() was called, control would at least temporarily be >> passed >> to the container, while the application would still have a chance to >> do >> some processing after stdout.close returns, and before runCGI returns. >> Maybe those semantics -- or even a lack of required semantics -- >> should >> be included in the PEP. > > Yes, that is really what I'd like to see clarified. At the very > least, a container should announce whether its output object really > outputs when it says it does. Flushing can't be guaranteed, at least not in general. There are too many places where buffering can happen, and they aren't all easily accessible.
I know Apache does some buffering which I haven't been able to get around. In most cases a little extra buffering doesn't hurt. Heck, I sometimes wonder if the browser buffers a bit... > My focussing on flush() rather than close() was the wrong emphasis. > The reason I made that mistake was that I assumed that the container > was not in the business of worrying about http, but that that was the > responsibility of an adapter sitting between the application proper > and the container (that the container was in no-parsed-header mode, in > cgi terms); in that case, the container has no reason not to flush > when the client toggles the handle, rather than waiting for him to get > off the pot. Upon reflection, I think the container *should* parse > headers by default -- what a bore for every adapter to have to do > this. I think it's implied that the container will parse the headers, since that's what we're all used to from CGI. This should consist mostly of the Status header, and maybe the Location header. I've never been clear on the Location header, though -- its semantics are very unclear to me in the absence of a Status header. I think with Apache you get an internal redirect if you give a path without a host name, and an external temporary redirect otherwise. But in some environments I think it always becomes an external redirect, and relative paths are resolved to absolute paths before being sent to the client. That might be nice to define. Or to strongly encourage applications to only use fully qualified redirects, and to always give a Status header unless doing a 200 response. > However, it would be nice if in runCGI you could set an > attribute of output that would tell it whether to parse headers or > not; it would seem reasonable to at least aspire to the range of > functionality already implemented by CGI :). But I don't know if > Mr. Eby wants his nice abstract output object mucked up with pesky > attributes!
(This suggests a T-shirt: "Get your attributes off my > object!" -- "attributes" being more polite, if less general, than > "members".) Yes, I suspect he won't like that addition ;) I think it would be common for the container to add Keep-Alive headers, and maybe some others. And in fact, the application should not add those headers, it's really something the container needs to abstract. So the response doesn't belong to the application alone. It would be nice to better specify what kind of parsing will occur, what headers might be added, which are off limits (to application or to container). CGI is poorly specified. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From amk at amk.ca Thu Jan 29 08:32:33 2004 From: amk at amk.ca (A.M. Kuchling) Date: Thu Jan 29 08:33:22 2004 Subject: [Web-SIG] Asynchronous HTTP server in standard library ? In-Reply-To: <4004AD55.30303@teksavvy.com> References: <20040113123507.GA7812@rogue.amk.ca> <002101c3da20$e3728e10$c022fea9@QUENTEL> <4004AD55.30303@teksavvy.com> Message-ID: <20040129133233.GA26937@rogue.amk.ca> On Tue, Jan 13, 2004 at 09:45:41PM -0500, Graham Fawcett wrote: > The pared-down Medusa is available at > http://fawcett.medialab.uwindsor.ca/quixote/medusa_patch.tar.gz. I've now added the patch to the experimental Arch repository for Quixote. In a subsequent patch I stripped it down yet further. Note that I left medusa_http.py in quixote.server; in Graham's original patch the module is in the quixote.server.medusa package. See http://www.quixote.ca/qx/ArchRepository for instructions on getting the current state of the repository. --amk From pje at telecommunity.com Thu Jan 29 12:51:05 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Jan 29 12:52:19 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: References: <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> Message-ID: <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> At 11:13 PM 1/28/04 -0600, Ian Bicking wrote: >So what alternative do you propose for handling a shutdown? The >application *needs* to know about this. I don't think I trust atexit >(though I'm open to it if it really would work). Also, if the application >spawns threads that will simply block shutdown unless they can be told to stop. What do you propose to use instead of atexit? Suppose we add a 'shutdown()' method, analogous to Java servlets' 'destroy()'. How is the *container* going to guarantee it'll get called? If we define it as "best effort", then the application writer who wants a guarantee is *still* going to have to use atexit, or something else. Or else we're going to force the container to use atexit, whether the service needs a shutdown message or not, and bloat both the container and the number of atexit functions registered, while duplicating this functionality in every container! I haven't used a framework or written an application that needed an explicit shutdown in order to operate properly.
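For an application that does want a shutdown hook, the atexit route mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: BackgroundService is a hypothetical name, the worker loop is a stand-in for real work, and the thread is made daemonic on the assumption that the application wants to stop it itself rather than have the interpreter wait on it.

```python
import atexit
import threading


class BackgroundService:
    """Hypothetical service that owns a worker thread and cleans up at exit."""

    def __init__(self):
        self._stop = threading.Event()
        self.closed = False
        # Daemon thread: the interpreter will not wait for it on its own,
        # so the atexit hook registered below is what stops it cleanly.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        # The service registers its own cleanup; no container involvement.
        atexit.register(self.shutdown)

    def _run(self):
        # Stand-in for periodic work; wakes every 50 ms to check the stop flag.
        while not self._stop.wait(0.05):
            pass

    def shutdown(self):
        # Idempotent, so a container that chooses to call shutdown() early
        # and the atexit hook that fires later do not conflict.
        if self.closed:
            return
        self._stop.set()
        self._worker.join()
        self.closed = True
```

Because shutdown() is safe to call twice, a container-side 'shutdown()' call could remain optional: the application keeps its own guarantee whether or not the container ever calls it.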
However, if one is needed, that's what 'atexit' is for, and it has one of the stronger cleanup guarantees of anything in Python that I know of! So, here's what I would suggest... if we want to allow containers to start and shutdown servlets at runtime, we can add a 'shutdown()' method. BUT, I don't want to *require* the container to call it. If the servlet wants a guaranteed shutdown, it *must* use atexit or some other finalization strategy. By the way, in reference to shutdown being blocked by threads, AFAIK your statement only applies to use of the 'threading' module with "non-daemonic" threads. (And that blocking is done with an atexit function.) >I don't care about blurring of responsibilities nearly as much as >utility. You want to convert other frameworks, well then you have to >convert their functionality. Spawned threads in the application >exist. Resources (threads included) that have to be explicitly cleaned up >on shutdown exist. It might be helpful to read some of Guido and Tim Peters' comments about these things on Python-Dev. They've tended to be very much of the opinion that Python doesn't guarantee resource finalization, period, and that it's the OS's job to reclaim resources on process termination. I found some of the recent discussion of the Python 2.3.2 finalization GC bugs to be quite enlightening on just how *hard* it is to guarantee finalization of anything. Anyway, as I said, if an app creates a non-daemonic thread, presumably it *wants* for shutdown to wait for it, and if it wants to know that shutdown is happening, there's atexit. IOW, there are perfectly good mechanisms in the stdlib for dealing with these things, and I don't see a reason to either reinvent them, or force container authors to do the application authors' job. >I'm not trying to be difficult, I'm just trying to envision how I would >adapt Webware's gateway and application (AppServer and Application) to >this interface. 
I don't think of Webware as being particularly >featureful, so I'm surprised other people haven't seen these problems >either, unless there are solutions that I'm missing. Well, I'm so far having trouble understanding the specific things you're trying to do, that aren't addressed by stdlib features. I am still open to addressing whatever they might be, I just want concrete use cases and narrow solutions. IOW, I'm YAGNI on widening the interface, and saying "show me the use cases". >The CGI protocol already passes through gateway information, in >SERVER_SOFTWARE. The client passes through information in User-Agent. >User-Agent is already used heavily, and Webware does use SERVER_SOFTWARE >for a couple of things (when IIS acts differently from Apache). It might >not be clean, but it gets the job done. I know we use os.name a >lot. This is information applications need, and it is against convention >to hide that information. Those pieces of information have established standards and conventions to guide their use. However, even those very same items are subject to rampant abuse, such as by sites that refuse to let you use them unless you pretend to be MSIE. Thus, in the absence of a specific use case for having the information, I'd like to avoid its presence. In the presence of a specific use case, I'll want to find the change to the spec that makes the least possible increase in container-to-app guarantees. >>>Truisms, I say! Anyway, it's not about guessing. It's about >>>hard-coding behavior based on the environment, when it's called for to >>>solve demonstrable problems. You don't get OS-independent programs by >>>hiding the operating system from the language (though people have >>>tried). And I don't think you get gateway-independent applications by >>>hiding the gateway. >> >>That's what configuration is for. The deployer/integrator should be >>allowed to control the app's behavior. > >The best documentation is when no documentation is needed. 
That's my >truism ;) When we figure something out, I'd rather put that knowledge >into code, instead of documentation. And every piece of configuration >requires documentation (and it doesn't even save you any code). Clearly, we disagree on this issue. You want a wide interface, I want a narrow one. One reason is that I want to encourage proliferation of containers. We already have a huge proliferation of apps and frameworks, and very few choices for how to run and deploy them. My practical observation has been that when identification of a host environment is permitted -- as opposed to introspection of host *properties* -- it rapidly leads to nonportable code, where portable is defined as "will run correctly in a *new* environment without reprogramming". I have no objection to defining properties like "container is LRSP-multi" or whatever, if there is a meaningful use case. What I object to is simply throwing random spoor for the servlet to sniff at and guess its prey. So please, let's focus on what specific properties you'd like to know about a container, if any. >Now I'm confused. If it's a single process, and handles only one request, >isn't that just broken? I don't know of any example of such a server, >since it wouldn't be able to handle concurrent requests. Do you need concurrent requests for a single-user "webtop" application that runs on your desktop? The fact that the model has a limited usage profile doesn't make it broken. Perhaps somebody will also speak up in favor of "multiprocess+multithread" model that I previously mentioned as making my head hurt. >>LRSP-single means only one thread. LRSP-multi means multi-threaded. >>Asynchronousness actually implies LRSP-multi, because if you're doing an >>asynchronous event loop the only way you can afford to call a blocking >>'runCGI()' is to do it in a thread. Twisted and ZServer are asynchronous >>LRSP-multi. > >Okay. 
This seems to mean that Twisted wouldn't use this interface >internally, since they don't want to unnecessarily spawn a thread, and the >interface doesn't seem to allow for a non-blocking API. Twisted has a builtin "thread pool" mechanism for this. A WSGI implementation for Twisted would simply call (IIRC): reactor.callInThread(service.runCGI, stdinWrapper, stdoutWrapper, stderrWrapper, environ) And threads would only be spawned up to the configured pool size. If there are more concurrent requests than allocated threads, the runCGI call will be queued until an existing thread finishes. ZServer has a similar mechanism, although I believe it's more "internal" and less available for a third party to do. Twisted is flexible enough that a third party could roll their own Twisted-based WSGI gateway, if the core developers aren't interested or want no part of it. :) >But BaseHTTPServer without threads is kind of a silly thing, right? >You can play with it, but not really use it for anything real. Sure you can: desktop web apps, and most especially, desktop testing and development of an app to be later deployed in a "real" container. I specifically wrote WSGIServer for this purpose (and to serve as an example of how to make a simple WSGI container in a web server). From pje at telecommunity.com Thu Jan 29 13:00:50 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Jan 29 13:01:53 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <20040129031523.GA17649@smullyan.org> References: <40183A1A.9020106@colorstudy.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> Message-ID: <5.1.1.6.0.20040129125353.00a7f730@telecommunity.com> At 10:15 PM 1/28/04 -0500, Jacob Smullyan wrote: >I'd be happier, therefore, if the >specification mandated more explicitly that the container must >actually respect the semantics of flush() -- that flush() may be a >no-op if output is unbuffered, but it may not be a no-op if it is not. >This means that output, for instance, could not be a StringIO object >the contents of which the container blits back to the client after >runCGI() returns. This does mean that gateways that perform header parsing (i.e. gateways built into webservers) will need a StringIO subclass. But that doesn't seem like too terrible of a burden. I think we need to specify that flush()ed output is not guaranteed to be sent to the client unless the service has sent all of its headers already. Then, header-parsing gateways can force output to be buffered until they have seen the headers. From pje at telecommunity.com Thu Jan 29 13:16:41 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Jan 29 13:17:46 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <840184C7-5228-11D8-A3A0-000393C2D67E@colorstudy.com> References: <20040129052711.GA19546@smullyan.org> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <40183A1A.9020106@colorstudy.com> <20040129031523.GA17649@smullyan.org> <50C3615E-5212-11D8-A3A0-000393C2D67E@colorstudy.com> <20040129052711.GA19546@smullyan.org> Message-ID: <5.1.1.6.0.20040129130320.02c8aec0@telecommunity.com> At 12:58 AM 1/29/04 -0600, Ian Bicking wrote: >On Jan 28, 2004, at 11:27 PM, Jacob Smullyan wrote: >>>Obviously some >>>containers (ack, too many alternative terminologies at this point) will >>>not be able to finish the request until after control has been returned >>>from runCGI, but I don't see how that can be helped. >> >>It may be true that it can't be helped -- but it isn't obvious to me, >>yet. Is there anything about the assumptions of some frameworks that >>would prevent a file-like object they are writing to from doing some >>networking when they close it? I would think that is the business of >>the container, not the containee. Or do you mean that the container >>may be in turn nested in a preexisting environment, not optimized >>specifically for this sort of containment, to which it delegates its >>networking responsibilities -- the "run Zope inside WebWare" scenario? >>That kind of scenario shouldn't be a sticking point, in my view, >>because it will always be suboptimal. 
If you want coexistence of two >>different application environments, I'd expect to do better nesting >>them both in one shared container which has limited, specialized >>functionality than arranging them serially (Zope running inside >>WebWare running inside SkunkWeb running inside...). A lower level of >>compliance, for such "convenience serial containers", would be >>forgiveable. > >I almost took it back, that all containers should be able to do a network >response when you close. But I'm not sure about plain CGI -- the only way >to finish a CGI request may be to end the process. It is, at least with Apache on the Linux platform I tried it on once. :) >>My focussing on flush() rather than close() was the wrong emphasis. >>The reason I made that mistake was that I assumed that the container >>was not in the business of worrying about http, but that that was the >>responsibility of an adapter sitting between the application proper >>and the container (that the container was in no-parsed-header mode, in >>cgi terms); in that case, the container has no reason not to flush >>when the client toggles the handle, rather than waiting for him to get >>off the pot. Upon reflection, I think the container *should* parse >>headers by default -- what a bore for every adapter to have to do >>this. > >I think it's implied that the container will parse the headers, since >that's what we're all used to from CGI. Well, it's definitely implied that the headers will be parsed, but not by whom. CGI and FastCGI gateways have the luxury of allowing the upstream webserver to do the parsing. :) > This should consist mostly of the Status header, and maybe the Location > header. I've never been clear on the Location header, though -- its > semantics are very unclear to me in the absence of a Status header. IIRC, it's supposed to imply a Status: 302 header, which is how I've implemented it in WSGIServer.
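As a rough illustration of that rule (not the actual WSGIServer code), a header-parsing gateway might derive the status like this. The function name split_cgi_response, the "302 Found" reason phrase, and the assumption of bare "\n" line endings are all choices made for this sketch.

```python
def split_cgi_response(raw):
    """Split CGI-style output into (status, headers, body).

    Hypothetical helper: an explicit Status header wins; otherwise a
    Location header implies a 302, and anything else defaults to 200.
    """
    head, _, body = raw.partition("\n\n")

    status = None
    headers = []
    for line in head.split("\n"):
        if not line.strip():
            continue
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            status = value  # consumed by the gateway, not forwarded as a header
        else:
            headers.append((name, value))

    if status is None:
        has_location = any(n.lower() == "location" for n, _ in headers)
        status = "302 Found" if has_location else "200 OK"

    return status, headers, body
```

For example, split_cgi_response("Location: http://example.com/\n\n") yields a 302 status with the Location header preserved, while output that carries its own Status line is passed through with that status unchanged.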
> I think with Apache you get an internal redirect if you give a path > without a host name, and an external temporary redirect otherwise. But > in some environments I think it always becomes an external redirect, and > relative paths are resolved to absolute paths before being sent to the > client. That might be nice to define. Or to strongly encourage > applications to only use fully qualified redirects, and to always give a > Status header unless doing a 200 response. I think that the latter recommendation would likely be best. I was really hoping to avoid turning the PEP into Yet Another Attempt To Formalize CGI, though. :( >>However, it would be nice if in runCGI you could set an >>attribute of output that would tell it whether to parse headers or >>not; it would seem reasonable to at least aspire to the range of >>functionality already implemented by CGI :). But I don't know if >>Mr. Eby wants his nice abstract output object mucked up with pesky >>attributes! (This suggests a T-shirt: "Get your attributes off my >>object!" -- "attributes" being more polite, if less general, than >>"members".) > >Yes, I suspect he won't like that addition ;) Not because of the attribute per se, but because gateways couldn't guarantee honoring it. CGI and FastCGI gateways would be powerless to *stop* header parsing, for example. If we had to have something like this, it'd need to be done in such a way that a container that couldn't do 'nph' style services would simply refuse to boot the service up in the first place, or else make this property known to the app at startup. In general, though, I'm still loath to add properties that have to be checked by the service. For example, I consider the DBAPI spec's inclusion of the "paramstyle" property to be broken because it means I can't write portable DBAPI code without bypassing the use of parameters! 
Thus, this is an area where I'd rather see the app say, "I can't live without unparsed headers, so don't even bother running me if you can't handle it." OTOH, I don't believe the unparsed header style is common for most apps/frameworks that aren't built into webservers to start with, so I'd like to dodge the entire issue if possible. :) >It would be nice to better specify what kind of parsing will occur, what >headers might be added, which are off limits (to application or to >container). CGI is poorly specified. Indeed. From ianb at colorstudy.com Fri Jan 30 16:14:05 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 30 16:15:01 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> References: <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> Message-ID: <401AC91D.1080100@colorstudy.com> Phillip J. Eby wrote: > At 11:13 PM 1/28/04 -0600, Ian Bicking wrote: > >> So what alternative do you propose for handling a shutdown? The >> application *needs* to know about this. I don't think I trust atexit >> (though I'm open to it if it really would work). 
Also, if the >> application spawns threads that will simply block shutdown unless they >> can be told to stop. > > What do you propose to use instead of atexit? After some experimentation, atexit works better than I had thought. Now I don't know we bother with explicit shutdown methods. So I remove the suggestion for a shutdown method. >> I'm not trying to be difficult, I'm just trying to envision how I >> would adapt Webware's gateway and application (AppServer and >> Application) to this interface. I don't think of Webware as being >> particularly featureful, so I'm surprised other people haven't seen >> these problems either, unless there are solutions that I'm missing. > > > Well, I'm so far having trouble understanding the specific things you're > trying to do, that aren't addressed by stdlib features. I am still open > to addressing whatever they might be, I just want concrete use cases and > narrow solutions. IOW, I'm YAGNI on widening the interface, and saying > "show me the use cases". Sessions remain the use case I consider to be unresolved. I think sessions should be possible to implement portable, without requiring configuration that specifies the gateway's process/concurrency model. (That doesn't rule out configuration, just that in the absence we should be able to pick the right implementation) Probably all this requires is a way to indicate the process model. >> Now I'm confused. If it's a single process, and handles only one >> request, isn't that just broken? I don't know of any example of such >> a server, since it wouldn't be able to handle concurrent requests. > > Do you need concurrent requests for a single-user "webtop" application > that runs on your desktop? The fact that the model has a limited usage > profile doesn't make it broken. Perhaps somebody will also speak up in > favor of "multiprocess+multithread" model that I previously mentioned as > making my head hurt. 
Okay, I suppose it can exist, though it's a rather obscure implementation. I suppose there are debugging environments that require this form of execution.

>>> LRSP-single means only one thread. LRSP-multi means multi-threaded.
>>> Asynchronousness actually implies LRSP-multi, because if you're doing
>>> an asynchronous event loop the only way you can afford to call a
>>> blocking 'runCGI()' is to do it in a thread. Twisted and ZServer are
>>> asynchronous LRSP-multi.
>>
>>
>> Okay. This seems to mean that Twisted wouldn't use this interface
>> internally, since they don't want to unnecessarily spawn a thread, and
>> the interface doesn't seem to allow for a non-blocking API.
>
>
> Twisted has a builtin "thread pool" mechanism for this. A WSGI
> implementation for Twisted would simply call (IIRC):
>
>     reactor.callInThread(service.runCGI, stdinWrapper, stdoutWrapper,
>                          stderrWrapper, environ)
>
> And threads would only be spawned up to the configured pool size. If
> there are more concurrent requests than allocated threads, the runCGI
> call will be queued until an existing thread finishes.
>
> ZServer has a similar mechanism, although I believe it's more "internal"
> and less available for a third party to do. Twisted is flexible enough
> that a third party could roll their own Twisted-based WSGI gateway, if
> the core developers aren't interested or want no part of it. :)

Yes, my point was just that Twisted .rpy scripts and the like wouldn't fit into the model. Or, they could, but it would be kind of silly, since you'd lose all the advantages of async, even if you ran under a Twisted gateway. In contrast, many other systems could be composed into gateway and application without any real downside.

Ian

From pje at telecommunity.com Fri Jan 30 16:31:35 2004
From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri Jan 30 16:50:29 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <401AC91D.1080100@colorstudy.com> References: <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> Message-ID: <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> At 03:14 PM 1/30/04 -0600, Ian Bicking wrote: >Sessions remain the use case I consider to be unresolved. I think >sessions should be possible to implement portable, without requiring >configuration that specifies the gateway's process/concurrency model. >(That doesn't rule out configuration, just that in the absence we should >be able to pick the right implementation) > >Probably all this requires is a way to indicate the process model. How about by indicating whether the application can be 1. Run in multiple processes 2. Run by multiple threads as information exposed by the app? Or do you think that we should have a "setup" method called by the container to tell the app what model it's going to be run under? >Yes, my point was just that Twisted .rpy scripts and the like wouldn't fit >into the model. 
Or, they could, but it would be kind of silly, since >you'd loose all the advantages of async, even if you ran under a Twisted >gateway. In contrast, many other systems could be composed into gateway >and application without any real downside. Right. The goal for WSGI in the case of Twisted is to let people migrate apps *to* Twisted, not *from* it. :) I don't think that migrating from Twisted to anything other than another event-driven framework (e.g. peak.events) is even possible, and there aren't any mature alternatives with comparable capabilities to Twisted at the moment, if i understand correctly. From pje at telecommunity.com Fri Jan 30 17:10:55 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Jan 30 17:11:03 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <401ACF1B.2070605@colorstudy.com> References: <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> Message-ID: <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> At 03:39 PM 1/30/04 -0600, Ian Bicking wrote: >Phillip J. 
Eby wrote:
>>At 03:14 PM 1/30/04 -0600, Ian Bicking wrote:
>>
>>>Sessions remain the use case I consider to be unresolved. I think
>>>sessions should be possible to implement portably, without requiring
>>>configuration that specifies the gateway's process/concurrency model.
>>>(That doesn't rule out configuration, just that in the absence we should
>>>be able to pick the right implementation)
>>>
>>>Probably all this requires is a way to indicate the process model.
>>
>>How about by indicating whether the application can be
>>1. Run in multiple processes
>>2. Run by multiple threads
>>as information exposed by the app? Or do you think that we should have a
>>"setup" method called by the container to tell the app what model it's
>>going to be run under?
>
>Yes, I think we need a setup call. I want the app to be able to respond
>to the environment, not merely indicate compatibility. Compatibility can
>then be asserted by throwing an exception if you use the application in an
>inappropriate container.

So far, it seems to me that this could be done with the two pieces of information mentioned above. That is, whether the app will (potentially) be run in multiple processes, and whether it will (potentially) be run in multiple threads. If the conditions are unacceptable to the service, it should raise an error immediately upon receipt of this information.

Note that this means that in the case of very primitive containers, an application integrator can themselves invoke the service's setup method.

For "forward compatibility", I'm guessing the setup method should look something like:

    def setup(self, multiprocess=True, multithread=False, **futureOptions):

...and the gateway *must* use keywords to supply all arguments, now and in the future. In the trivial case, a service could simply implement:

    def setup(self, **dontCareAboutAnyofThis):
        pass

and be done with it.
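[Editorial sketch, not part of the original message.] One way a service might use the two flags to pick a session implementation, which is the use case raised earlier in the thread. Only `setup()` and its `multiprocess`/`multithread` keywords come from the proposal; `SessionService`, `MemorySessionStore`, `FileSessionStore`, `start_gateway` and the `/tmp/sessions` path are names invented here for illustration, and the file-backed store is only a placeholder.

```python
import threading


class MemorySessionStore:
    """Per-process dict; only suitable when one process serves all requests."""

    def __init__(self, use_lock):
        self._data = {}
        # A lock is only needed if several threads may call in at once.
        self._lock = threading.Lock() if use_lock else None

    def set(self, sid, value):
        if self._lock is None:
            self._data[sid] = value
        else:
            with self._lock:
                self._data[sid] = value

    def get(self, sid):
        return self._data.get(sid)


class FileSessionStore:
    """Placeholder for a store shared between processes (hypothetical)."""

    def __init__(self, directory):
        self.directory = directory


class SessionService:
    def setup(self, multiprocess=True, multithread=False, **futureOptions):
        # Respond to the environment instead of merely declaring compatibility.
        if multiprocess:
            self.sessions = FileSessionStore("/tmp/sessions")
        else:
            self.sessions = MemorySessionStore(use_lock=multithread)


def start_gateway(service, multiprocess, multithread):
    # The gateway supplies every argument by keyword, as the proposal requires.
    service.setup(multiprocess=multiprocess, multithread=multithread)
    return service
```

With this shape the application needs no configuration naming the gateway's concurrency model; the gateway's two keyword arguments are enough to choose a store.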
Most of my apps will probably just look like:

    def setup(self, multiprocess=True, multithread=False, **futureOptions):
        if multithread:
            raise NotImplementedError("Threads are evil!")

:)

>Either way we have to specify clearly what the models are.

Nobody's yet pointed out anything besides multiprocessness and multithreadedness as aspects of the models, so I guess we're done. We've also managed to rule out some of the hairier multithreading models by requiring all locking and object pooling to be on the service side, and all thread pooling to occur on the gateway side. That is, the gateway is responsible for having threads to call runCGI() in. The runCGI() method is responsible for either having separate objects per thread, or locking the objects that it operates on. (Assuming, of course, that it hasn't vetoed operation in a multithreaded environment.)

Also, we should clarify that the 'multithread' flag means that the service's 'runCGI()' method may be called from more than one thread at the same time. If a gateway uses multiple threads but can guarantee that only one will ever call that object's 'runCGI()' method at a given point, it should be free to tell the service that 'multithread=False'. (I'm assuming here that there may be multi-app/multi-service containers at some point, that can be configured to run some apps serially, and others in parallel, according to the needs of the application.)

>>>Yes, my point was just that Twisted .rpy scripts and the like wouldn't
>>>fit into the model. Or, they could, but it would be kind of silly,
>>>since you'd lose all the advantages of async, even if you ran under a
>>>Twisted gateway. In contrast, many other systems could be composed into
>>>gateway and application without any real downside.
>>
>>Right. The goal for WSGI in the case of Twisted is to let people migrate
>>apps *to* Twisted, not *from* it. :) I don't think that migrating from
>>Twisted to anything other than another event-driven framework (e.g.
>>peak.events) is even possible, and there aren't any mature alternatives >>with comparable capabilities to Twisted at the moment, if i understand >>correctly. > >In the scope of web applications Medusa is very similar, isn't it? Fairly >significant applications are written with Medusa and without any threads >(PyDS, for instance). But that's an aside. I was referring to the full scope of functionality that Twisted encompasses, such as GUI apps, distributed objects, etc. Not merely the multi-protocol network services part of Twisted. From ianb at colorstudy.com Fri Jan 30 16:39:39 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 30 17:27:45 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> References: <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> Message-ID: <401ACF1B.2070605@colorstudy.com> Phillip J. Eby wrote: > At 03:14 PM 1/30/04 -0600, Ian Bicking wrote: > >> Sessions remain the use case I consider to be unresolved. 
I think >> sessions should be possible to implement portable, without requiring >> configuration that specifies the gateway's process/concurrency model. >> (That doesn't rule out configuration, just that in the absence we >> should be able to pick the right implementation) >> >> Probably all this requires is a way to indicate the process model. > > > How about by indicating whether the application can be > > 1. Run in multiple processes > 2. Run by multiple threads > > as information exposed by the app? Or do you think that we should have > a "setup" method called by the container to tell the app what model it's > going to be run under? Yes, I think we need a setup call. I want the app to be able to respond to the environment, not merely indicate compatibility. Compatibility can then be asserted by throwing an exception if you use the application in an inappropriate container. Maybe it's even sufficient to pass this information in an environmental variable, though that would imply that the information may only apply to a single request. But I don't know how you could change the situation between requests, except to spawn threads in a LRSP-single environment. Either way we have to specify clearly what the models are. >> Yes, my point was just that Twisted .rpy scripts and the like wouldn't >> fit into the model. Or, they could, but it would be kind of silly, >> since you'd loose all the advantages of async, even if you ran under a >> Twisted gateway. In contrast, many other systems could be composed >> into gateway and application without any real downside. > > > Right. The goal for WSGI in the case of Twisted is to let people > migrate apps *to* Twisted, not *from* it. :) I don't think that > migrating from Twisted to anything other than another event-driven > framework (e.g. peak.events) is even possible, and there aren't any > mature alternatives with comparable capabilities to Twisted at the > moment, if i understand correctly. 
In the scope of web applications Medusa is very similar, isn't it? Fairly significant applications are written with Medusa and without any threads (PyDS, for instance). But that's an aside. Ian From ianb at colorstudy.com Fri Jan 30 17:29:17 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 30 17:30:35 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> References: <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> Message-ID: <401ADABD.6050003@colorstudy.com> Phillip J. Eby wrote: > So far, it seems to me that this could be done with the two pieces of > information mentioned above. That is, whether the app will > (potentially) be run in multiple processes, and whether it will > (potentially) be run in multiple threads. If the conditions are > unacceptable to the service, it should raise an error immediately upon > receipt of this information. We should also indicate if the server is long-running (the only non-long-running instance being CGI). 
Of course, a long-running server may still only serve one request and be shut down, but there's lots of optimizations and caching you might want to do in a long-running process that aren't useful for a single-request process, and some applications may be useless in a short-running environment (e.g., the multi-request database transaction example). Otherwise I suppose I'm okay with the rest (even if I would like a real reference to the gateway). I can implement Webware using just this interface. The open keyword arguments allow me to extend this as well, e.g., the Webware Application displays AppServer configuration parameters in the web control panel, so I really would like a reference, even if it's not required. So actually I'm A-OK with the proposal. Ian From pje at telecommunity.com Fri Jan 30 18:46:23 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Jan 30 18:46:30 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <401ADABD.6050003@colorstudy.com> References: <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> 
Message-ID: <5.1.1.6.0.20040130183447.02ba08b0@telecommunity.com> At 04:29 PM 1/30/04 -0600, Ian Bicking wrote: >Phillip J. Eby wrote: >>So far, it seems to me that this could be done with the two pieces of >>information mentioned above. That is, whether the app will (potentially) >>be run in multiple processes, and whether it will (potentially) be run in >>multiple threads. If the conditions are unacceptable to the service, it >>should raise an error immediately upon receipt of this information. > >We should also indicate if the server is long-running (the only >non-long-running instance being CGI). Of course, a long-running server >may still only serve one request and be shut down, but there's lots of >optimizations and caching you might want to do in a long-running process >that aren't useful for a single-request process, and some applications may >be useless in a short-running environment (e.g., the multi-request >database transaction example). > >Otherwise I suppose I'm okay with the rest (even if I would like a real >reference to the gateway). I can implement Webware using just this >interface. The open keyword arguments allow me to extend this as well, >e.g., the Webware Application displays AppServer configuration parameters >in the web control panel, so I really would like a reference, even if it's >not required. So actually I'm A-OK with the proposal. My intention was that additional keywords be reserved for future WSGI versions, *not* become a way to sneak unspecified information into the interface. Private communications are always possible through framework-specific mechanisms, such as sniffing for a 'giveMeWebwareInfo()' method that the container can call, or implementation of a particular interface, subclassing from a particular base, etc. etc. Thing is, your Application won't be able to display that AppServer stuff *unless* it's being run under an AppServer. 
And, there's no point in the AppServer telling non-Webware applications they're being run under it. So, it should introspect you using Webware-specific techniques, and call Webware-specific methods in that event. In other words, a typesafe mechanism agreed upon by both sides of the (Webware-specific) protocol. If it's not defined in the standard, I don't want it allowed by the standard. Let it be some other standard, like the "Webware service metadata extension to WSGI", that defines how you can make your service receive extra data when it's run in a Webware container. There's nothing wrong with having such protocol extensions, but they should not be part of the base protocol's methods. And I don't want to build "bags of data" into the base protocol with "extension info" that you then have to sift through to find things, when we could easily just have different methods that are part of other protocols. (Indeed, I'm beginning to wonder if WSGI methods should all have some sort of prefix or suffix on them to help thoroughly disambiguate them from extended protocols. E.g. 'runWSGI()', 'setupWSGI()' etc.) 
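
[Editorial sketch, not part of the original message.] The "sniffing" approach described above might look roughly like this. The method name `giveMeWebwareInfo` is the example name used in the message; `PlainService`, `WebwareAwareService`, `AppServerGateway` and the config dict are assumptions made up for illustration and are not Webware's actual API.

```python
class PlainService:
    """Implements only the base interface and knows nothing about Webware."""

    def setup(self, multiprocess=True, multithread=False, **futureOptions):
        self.configured = True


class WebwareAwareService(PlainService):
    """Also opts in to a framework-specific side protocol."""

    def giveMeWebwareInfo(self, app_server_config):
        # Keep a copy so the service cannot mutate the gateway's settings.
        self.app_server_config = dict(app_server_config)


class AppServerGateway:
    def __init__(self, config):
        self.config = config

    def install(self, service):
        # Base protocol: only the keywords the spec defines.
        service.setup(multiprocess=False, multithread=True)
        # Side protocol: offered only to services that expose the hook.
        hook = getattr(service, "giveMeWebwareInfo", None)
        if callable(hook):
            hook(self.config)
        return service
```

A service that does not define the hook is never told anything framework-specific, so the base `setup()` signature stays free of extra keywords.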
From ianb at colorstudy.com Fri Jan 30 19:37:11 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Jan 30 19:37:09 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <5.1.1.6.0.20040130183447.02ba08b0@telecommunity.com> References: <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> <5.1.1.6.0.20040130183447.02ba08b0@telecommunity.com> Message-ID: <9B74435A-5385-11D8-9074-000393C2D67E@colorstudy.com> On Jan 30, 2004, at 5:46 PM, Phillip J. Eby wrote: >> Otherwise I suppose I'm okay with the rest (even if I would like a >> real reference to the gateway). I can implement Webware using just >> this interface. The open keyword arguments allow me to extend this >> as well, e.g., the Webware Application displays AppServer >> configuration parameters in the web control panel, so I really would >> like a reference, even if it's not required. So actually I'm A-OK >> with the proposal. 
> > My intention was that additional keywords be reserved for future WSGI > versions, *not* become a way to sneak unspecified information into the > interface. Private communications are always possible through > framework-specific mechanisms, such as sniffing for a > 'giveMeWebwareInfo()' method that the container can call, or > implementation of a particular interface, subclassing from a > particular base, etc. etc. I had a feeling you'd react this way ;) But I don't really see why it would be so bad. AppServer calls: application.setup(..., webKitAppServer=self) Application has: def setup(..., webKitAppServer=None, **kw): It's highly transparent what's happening. Since the spec leaves room for future extension, I assume applications are required to accept keywords they don't understand, and that they will ignore those arguments, and that applications must provide default arguments for values so that they support containers that don't provide those arguments. > Thing is, your Application won't be able to display that AppServer > stuff *unless* it's being run under an AppServer. And, there's no > point in the AppServer telling non-Webware applications they're being > run under it. So, it should introspect you using Webware-specific > techniques, and call Webware-specific methods in that event. In other > words, a typesafe mechanism agreed upon by both sides of the > (Webware-specific) protocol. Obviously Application would have to accept None in place of the AppServer, and react accordingly. That's my problem to deal with, not the spec's. > If it's not defined in the standard, I don't want it allowed by the > standard. Let it be some other standard, like the "Webware service > metadata extension to WSGI", that defines how you can make your > service receive extra data when it's run in a Webware container. > There's nothing wrong with having such protocol extensions, but they > should not be part of the base protocol's methods. 
And I don't want > to build "bags of data" into the base protocol with "extension info" > that you then have to sift through to find things, when we could > easily just have different methods that are part of other protocols. The one keyword is the only addition I'd make. Everything else would be between Application and AppServer. Again, I really don't see the problem. I don't see where this would cascade into some unmaintainable mess. > (Indeed, I'm beginning to wonder if WSGI methods should all have some > sort of prefix or suffix on them to help thoroughly disambiguate them > from extended protocols. E.g. 'runWSGI()', 'setupWSGI()' etc.) I think setup() is rather generic, though it could be setupCGI and it would be fine. I think you are overreacting. And hey, I just said what someone else was eventually going to think of anyway. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From pje at telecommunity.com Fri Jan 30 20:38:45 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Jan 30 20:38:52 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <9B74435A-5385-11D8-9074-000393C2D67E@colorstudy.com> References: <5.1.1.6.0.20040130183447.02ba08b0@telecommunity.com> <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <5.1.1.6.0.20040123191620.024c20c0@telecommunity.com> <5.1.0.14.0.20040124220658.021b5680@mail.telecommunity.com> <5.1.0.14.0.20040127103955.0243ddb0@mail.telecommunity.com> <5.1.1.6.0.20040128142134.0200cec0@telecommunity.com> <5.1.1.6.0.20040128161939.020683c0@telecommunity.com> <5.1.1.6.0.20040128175011.01ef14c0@telecommunity.com> <5.1.1.6.0.20040129120407.02da4ab0@telecommunity.com> <5.1.1.6.0.20040130162209.020c4380@telecommunity.com> <5.1.1.6.0.20040130165705.01eac1e0@telecommunity.com> <5.1.1.6.0.20040130183447.02ba08b0@telecommunity.com> Message-ID: <5.1.1.6.0.20040130202631.02034290@telecommunity.com> At 06:37 PM 1/30/04 -0600, Ian Bicking wrote: >It's highly transparent what's happening. Since the spec leaves room for >future extension, I assume applications are required to accept keywords >they don't understand, and that they will ignore those arguments, and that >applications must provide default arguments for values so that they >support containers that don't provide those arguments. YAGNI. Framework-specific communication can easily -- and therefore should -- occur via framework-specific interfaces. There is absolutely no need for it in the interface. Ergo, it shouldn't be there. Entities should not be multiplied beyond necessity. 
>The one keyword is the only addition I'd make. Everything else would be >between Application and AppServer. Again, I really don't see the >problem. I don't see where this would cascade into some unmaintainable mess. Unmaintainable messes grow from small hacks. You fertilize them with neglect, water them with tears... sorry, Babylon 5 reference. :) >>(Indeed, I'm beginning to wonder if WSGI methods should all have some >>sort of prefix or suffix on them to help thoroughly disambiguate them >>from extended protocols. E.g. 'runWSGI()', 'setupWSGI()' etc.) > >I think setup() is rather generic, though it could be setupCGI and it >would be fine. I think you are overreacting. And hey, I just said what >someone else was eventually going to think of anyway. I'm sure they will. And they'll be in violation of the spec. Providing a backchannel won't improve the spec - it'll just lead to balkanized practices and adhoc extension. But, well-defined complementary protocols are a plus: if e.g. the Webware "side protocol" becomes popular, other containers might offer the info, and it might eventually make its way into a new WSGI version. But I absolutely do not want even the slightest confusion before then that the side protocol is in any way a part of WSGI. Many people develop systems by copy-paste-modify from existing examples. That means that people will copy code with the nonstandard keywords, along with the processing code for them, thinking it's some part of WSGI that's not covered in the spec, but hey, the code works, so it must be right... Anyway, as you can see, this is a social issue, not a technical one. Interfaces that go between different developers are *always* a social issue, and therefore deserve appropriate care in their social engineering. 
From gward at python.net Fri Jan 30 22:14:30 2004 From: gward at python.net (Greg Ward) Date: Fri Jan 30 22:14:39 2004 Subject: [Web-SIG] Web Container Interface In-Reply-To: <40180AA5.4070007@colorstudy.com> References: <40115961.7020906@colorstudy.com> <5.1.1.6.0.20040123134806.01f9c020@telecommunity.com> <20040128023544.GA821@cthulhu.gerg.ca> <40180AA5.4070007@colorstudy.com> Message-ID: <20040131031430.GA5694@cthulhu.gerg.ca> [me, commenting on what the Java folks got wrong] > * the level of granularity is wrong: most Java web applications > consist of multiple servlets, and if the code I work on in my > day job is any indication, there's a lot of overlapping code > among the servlets in a given application. Thus, the point > of entry between a web application container and a collection > of web applications should be... the web application. > > (The Java community has figured this out; when you administer a > modern servlet container like Tomcat, you generally work at the > level of web apps, rather than individual servlets or the whole > container. The existence of "servlets" as a separate entity > complicates both administering a servlet container and writing web > applications. It's a nasty design flaw that we should strenuously > avoid.) [Ian asks for clarification] > Can you expand? Do you mean that servlets as an exposed resource are > unnecessary, and that the only exposed resource should be the > application as a whole? That pretty much nails it. There should be exactly one programmatic point of interface between the container and the application, and it should almost certainly be an instance of a class named something like WebApplication. (This is where the mythical runCGI() method should exist. Although that is clearly the wrong name; I'd prefer handle_request() or something like that. Or even just run() or handle(), what the heck.) 
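
[Editorial sketch, not part of the original message.] A minimal illustration of "exactly one programmatic point of interface": the container sees a single object with one entry method, and URL dispatch stays inside the application. The class name `WebApplication` and the method name `handle_request()` follow the suggestions in the paragraph above; `route()`, `build_app()`, the `(status, body)` return shape and the dict-based `environ` are assumptions made for this example.

```python
class WebApplication:
    """The only object the container needs to know about."""

    def __init__(self):
        # Which handler serves which URL is an application-internal detail.
        self._handlers = {}

    def route(self, path, handler):
        self._handlers[path] = handler

    def handle_request(self, path, environ):
        """Single entry point: return a (status, body) pair for one request."""
        handler = self._handlers.get(path)
        if handler is None:
            return 404, "Not Found"
        return 200, handler(environ)


def build_app():
    app = WebApplication()
    app.route(
        "/hello",
        lambda environ: "Hello, %s!" % environ.get("REMOTE_USER", "world"),
    )
    return app
```

Here a sysadmin would only deploy the object returned by `build_app()`; nothing outside the application needs a list of which class handles which URL.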
At work, I'm in the (unpleasant) position of maintaining a legacy web application written as a collection of similar servlets that just write HTML to stdout -- the Java equivalent of a bunch of CGI scripts, really. However, we're running it under a modern servlet container -- Tomcat -- that implements a recent version of the Java servlet specification, which in particular specifies just what a web application is, how it's executed, how to configure it (ie. describe it to the servlet container), etc. (You can download all 330 eye-glazing pages of the servlet specification from http://jcp.org/aboutJava/communityprocess/final/jsr154/index.html; the struggle to get that far is alone indicative of how vastly more pleasant working with Python is. It's a continual source of amazement to me that Sun has not succeeded in killing Java with sheer bureaucracy; I guess lots of programmers just love those curly braces and static types.) Anyways, the annoying thing about writing web apps in Java is that as a *programmer*, I primarily work on servlets, but to actually get those servlets to run -- ie. if I pretend to be a sysadmin for a few minutes -- then I have to step away from the code and start mucking around in complicated XML config files that describe my web application(s). And how do you describe a web application? By listing the servlets that implement it, of course! So the sysadmin has to be pretty intimately aware of how the application is constructed -- eg. which classes handle which URLs -- and the programmer does not work at the most natural level. That said, I think modern Java web development has moved beyond writing individual servlets -- they're still there, but whatever framework you use takes care of providing the actual servlet class; application developers just write handlers that the main servlet calls. (At least that's a very vague, hand-wavey overview of how Struts, which is a popular framework over in Java-land, works. 
I know little about Struts and nothing about any other Java framework. They're all pretty pathetic compared to Quixote, as far as I'm concerned. ;-) > Which may in turn be factored into servlets, > but that's up to the application to determine...? I think if the notion of "servlet" had never existed, no one would have bothered to invent it. As far as I can tell, it was just some silly idea cooked up by some pinhead at Sun that caught on because, well, lots of programmers like curly braces and static types, and Java's the best curly-braces-and-static-types programming language out there by a long shot. They got the HttpServletRequest and HttpServletResponse classes dead-on; I'll give them that much credit. And the session management seems pretty sound. But the rest of it... I'm not so sure. Greg -- Greg Ward http://www.gerg.ca/ Budget's in the red? Let's tax religion! -- Dead Kennedys