From pje at telecommunity.com Sat Oct 2 01:07:04 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Oct 2 01:08:40 2004 Subject: [Web-SIG] Latest WSGI revision posted; finalization soon? Message-ID: <5.1.1.6.0.20041001185733.02147810@mail.telecommunity.com> FYI, I've posted a new revision, mainly it's just the changes Mark proposed. At this point, the only real open issue is what to do about the async API, and drafting a section on sync/async/threading, to replace the currently very short section on threading. It's been a little over a week since I proposed an alternative way to structure an optional asynchronous API, but I haven't seen any comments on that API. I'd really like to get some sort of async API finalized, just so that there is some "standard" way of offering the feature. But, since I personally don't need it, I'd like some guidance from the community as to what approach is more desirable. The other option is to merely present some ideas and alternatives in the PEP, and leave it to the community to try different things. Whichever way we go, I'd ideally like to see the PEP able to move to a "Final" status this month, such that we don't make any further semantic changes to 1.0. From ianb at colorstudy.com Sun Oct 3 06:18:44 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sun Oct 3 06:18:49 2004 Subject: [Web-SIG] WSGI Webware progress Message-ID: <415F7DA4.3090805@colorstudy.com> I've made quite a bit of progress with the WSGI port of Webware, running two real applications I've written under it, without any significant changes to the applications (except for an import statement or two). The applications weren't written with WSGI in mind, so they didn't limit themselves to things that seemed simple under WSGI. OTOH, I wrote both of them, and I only use a subset of the Webware API. The Webware portion of this remains fairly minimal, mostly some simple classes that translate the WSGI environment and general system to the Webware API. 
In the process, I've made some reusable middleware that is Webware-neutral, but implements some of Webware's functionality (and I layer them in roughly this order):

* httpexceptions; catches particular exceptions and turns them into HTTP responses (e.g., HTTPMovedPermanently, HTTPNotFound, etc). In a way I wish this was standard. However, the other middleware doesn't use this (though some of my Webware code does).

* recursive; allows applications to forward to other URLs and to make recursive calls to include other URLs. These URLs have to be under the location where recursive is used.

* session; implements sessions. The persistence is simple and doesn't take concurrency into account (yet), but the basic structure seems correct to me.

* urlparser; this takes a URL and finds an application based on it. Currently it looks in a single directory, parses out the next part, and finds the application associated. Subdirectories turn into other urlparser instances. Finds .py modules, and looks for "application" (which is a ready-made application), or module.module_name, where the object must be called before it is ready to act as an application (in Webware's case, this is a class, instances of which are WSGI applications). Also serves up static files, like .css, .html, etc.

* wsgilib; a number of generic functions for use with WSGI. Right now this includes:
    * Cookie parser: get_cookies
    * Something to add a finalizing function to an iterator: add_close
    * A way to run a request in a fake environment, for interactive debugging and testing: interactive
    * An error response creator (for 404 messages, etc): error_response
    * An application-builder for on-disk files: send_file

I still need to do more testing, and write some unit tests for these middleware. But progress has gone well, and implementing a real-world framework on WSGI seems very doable.
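
For readers unfamiliar with the pattern, the httpexceptions idea described above can be sketched roughly as follows. This is an illustration only, not the code in the svn repository: the exception names HTTPNotFound and HTTPMovedPermanently come from the message, but the HTTPException base class, its attributes, the catch_http_exceptions wrapper and demo_app are all assumptions made for the example. It is written for a current Python 3 interpreter (PEP 3333 byte strings) so that it can be run today.

```python
import sys


class HTTPException(Exception):
    """Hypothetical base class carrying a status line and extra headers."""
    status = "500 Internal Server Error"

    def __init__(self, message="", headers=None):
        Exception.__init__(self, message)
        self.message = message
        self.headers = list(headers or [])


class HTTPNotFound(HTTPException):
    status = "404 Not Found"


class HTTPMovedPermanently(HTTPException):
    status = "301 Moved Permanently"


def catch_http_exceptions(application):
    """Middleware: turn an HTTPException raised by `application` into a response.

    Only exceptions raised while the application callable runs are caught;
    errors raised later, while the server iterates over the result, are not.
    """
    def wrapper(environ, start_response):
        try:
            return application(environ, start_response)
        except HTTPException as exc:
            body = (exc.message or exc.status).encode("utf-8")
            headers = [("Content-Type", "text/plain; charset=utf-8"),
                       ("Content-Length", str(len(body)))] + exc.headers
            # exc_info lets the server replace headers it has not sent yet.
            start_response(exc.status, headers, sys.exc_info())
            return [body]
    return wrapper


def demo_app(environ, start_response):
    """Toy application that signals "not found" by raising an exception."""
    if environ.get("PATH_INFO") != "/":
        raise HTTPNotFound("no such page")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

Wrapping is then a single call, e.g. `app = catch_http_exceptions(demo_app)`, and further middleware can be layered around `app` in the same way.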
This is a more aggressive use of WSGI than many framework ports may make; a simpler porting technique would be to take the whole framework and find a single entry point, letting the framework keep all its URL parsing and other code. I'm doing this refactoring in part because I think it's the right direction for Webware, moreso than it's the best or easiest way to port a framework. Comments and suggestions welcome. The code is located at svn://colorstudy.com/trunk/WSGI -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From floydophone at gmail.com Sun Oct 3 16:42:13 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sun Oct 3 16:42:15 2004 Subject: [Web-SIG] WSGI Webware progress Message-ID: <6654eac40410030742163cd370@mail.gmail.com> Looking good! I see we've written a lot of similar code; perhaps we could merge our two separate efforts into "wsgilib"? From pje at telecommunity.com Sun Oct 3 18:02:09 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Oct 3 18:02:02 2004 Subject: [Web-SIG] WSGI Webware progress In-Reply-To: <6654eac40410030742163cd370@mail.gmail.com> Message-ID: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> At 10:42 AM 10/3/04 -0400, Peter Hunt wrote: >Looking good! I see we've written a lot of similar code; perhaps we >could merge our two separate efforts into "wsgilib"? Heh. I've also started work on a "wsgilib", mainly to provide common base classes and utility functions for servers and gateways. Maybe we need to co-ordinate in some fashion. :) From py-web-sig at xhaus.com Sun Oct 3 20:15:55 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Sun Oct 3 20:17:26 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. Message-ID: <416041DB.30107@xhaus.com> Dear all, I am somewhat pleased to announce the release of version 0.20.0 of modjy, a WSGI-compliant gateway for Jython 2.1 and J2EE. Modjy is released under the Apache 2.0 License. 
You can download this release, including all source and documentation, from the following address http://www.xhaus.com/modjy There are still a number of areas to be cleaned up, including the import mechanism. Also exception handling needs to be improved, especially with the introduction of the "exc_info" parameter to the start_response_callable. Also, I have a fair amount of tests to develop. But I don't have a lot of time to spare right now, due to extensive work commitments, so it may be a while before those tests are developed. The reason why I decided to release without a full test suite is because there seems to be several members of the WEB-SIG who are developing WSGI test suites right now, so I'm hoping that I will be able to at least partially reuse those test suites. Still, I'm hoping that you will find modjy a useful test bed for WSGI development. Kind regards, Alan. From pje at telecommunity.com Sun Oct 3 21:47:12 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Oct 3 21:47:05 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. In-Reply-To: <416041DB.30107@xhaus.com> Message-ID: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> At 07:15 PM 10/3/04 +0100, Alan Kennedy wrote: >Dear all, > >I am somewhat pleased to announce the release of version 0.20.0 of modjy, >a WSGI-compliant gateway for Jython 2.1 and J2EE. Modjy is released under >the Apache 2.0 License. > >You can download this release, including all source and documentation, >from the following address > >http://www.xhaus.com/modjy > >There are still a number of areas to be cleaned up, including the import >mechanism. Also exception handling needs to be improved, especially with >the introduction of the "exc_info" parameter to the start_response_callable. Looks pretty good. FYI, as far as I can tell, your 'j2ee.*' extensions aren't compliant, because they can bypass middleware modifications to the environment. 
(I wonder if perhaps the current mechanism to prevent middleware bypassing is too heavyweight?) I haven't had time to read all of the source code yet, so I'm not sure if that's the only compliance issue, but that's the only one I've seen in your documentation. By the way, if you do implement pooling of application objects to bypass their single-threadedness, I think the only really safe way to do that is by having a separate Jython interpreter for each one. A single-threaded application is going to assume it can use module-level globals without conflicts, so just creating duplicate application objects isn't going to resolve that issue. From floydophone at gmail.com Mon Oct 4 00:23:55 2004 From: floydophone at gmail.com (Peter Hunt) Date: Mon Oct 4 00:23:58 2004 Subject: [Web-SIG] WSGI Webware progress In-Reply-To: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> References: <6654eac40410030742163cd370@mail.gmail.com> <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> Message-ID: <6654eac4041003152327b03280@mail.gmail.com> I think we could use a SVN repository for all of this stuff. Most of my code is uploaded on http://st0rm.hopto.org/wsgi/, except I've been working on a Twisted.web resource for running WSGI apps. On Sun, 03 Oct 2004 12:02:09 -0400, Phillip J. Eby wrote: > > > At 10:42 AM 10/3/04 -0400, Peter Hunt wrote: > >Looking good! I see we've written a lot of similar code; perhaps we > >could merge our two separate efforts into "wsgilib"? > > Heh. I've also started work on a "wsgilib", mainly to provide common base > classes and utility functions for servers and gateways. Maybe we need to > co-ordinate in some fashion. :) > > From py-web-sig at xhaus.com Mon Oct 4 01:00:20 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Oct 4 01:01:20 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. 
In-Reply-To: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> References: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> Message-ID: <41608484.8010101@xhaus.com> [Phillip J. Eby] > Looks pretty good. FYI, as far as I can tell, your 'j2ee.*' extensions > aren't compliant, because they can bypass middleware modifications to > the environment. Indeed, I was aware of that. I meant to add something to the documentation which said "these modjy-specific extensions are not compliant with the strict wording of the spec, which forbids access to HTTP request and response data in a way that bypasses WSGI mechanisms". However ..... > (I wonder if perhaps the current mechanism to prevent middleware > bypassing is too heavyweight?) I'm sort of thinking that it is a little heavyweight. I think that anyone who wants to bypass the middleware will probably have a good reason for doing so. Also, they would probably be very aware that their application would no longer be portable. Also, I would have to add a fair amount of extra code, just to ensure that the extension APIs present the same information as the standard WSGI interface. Which seems unnecessary, given that the WSGI information is already there. More importantly, that extra code would then make it impossible for the application/framework author to get at the original request, which they might conceivably really, really, need ..... But I am concerned about the statement in the PEP which says "it is very important that these "safe extension" rules be followed by both server/gateway and middleware developers, in order to avoid a future in which middleware developers are forced to delete any and all extension APIs from environ to ensure that their mediation isn't being bypassed by applications using those extensions!" I definitely don't want to bring such a future about .... 
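
The bypass problem under discussion can be made concrete with a small sketch. Everything here is hypothetical: "example.raw_path" stands in for a server-specific extension such as a raw request object, and strip_prefix and path_echo_app are names invented for the illustration (Python 3, PEP 3333 style). The middleware rewrites PATH_INFO, but an application that prefers the extension never sees the rewrite.

```python
def strip_prefix(application, prefix):
    """Middleware: hide a mount-point prefix from the wrapped application."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefix):
            environ = dict(environ)  # work on a copy, leave the caller's dict alone
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
        return application(environ, start_response)
    return wrapper


def path_echo_app(environ, start_response):
    """Application that uses a (hypothetical) server extension when present."""
    raw = environ.get("example.raw_path")  # hypothetical bypass extension
    path = raw if raw is not None else environ["PATH_INFO"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [path.encode("utf-8")]
```

With the extension present the application still reports the original path, so the middleware's change is silently ignored; without it, the same application behaves as the middleware author expects.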
> I haven't had time to read all of the source code yet, so I'm not sure > if that's the only compliance issue, but that's the only one I've seen > in your documentation. I am reasonably sure that there are other minor nits in the code, which I will incrementally fix in the coming weeks. The reason for "releasing early, releasing often" is that I want to demonstrate that I am serious about publishing a production-quality J2EE->WSGI gateway for jython. > By the way, if you do implement pooling of application objects to bypass > their single-threadedness, I think the only really safe way to do that > is by having a separate Jython interpreter for each one. A > single-threaded application is going to assume it can use module-level > globals without conflicts, so just creating duplicate application > objects isn't going to resolve that issue. That's true, and would indeed be quite messy to implement. I'll leave that one on the back burner for now. Regards, Alan. From pje at telecommunity.com Mon Oct 4 04:38:47 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Oct 4 04:38:40 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. In-Reply-To: <41608484.8010101@xhaus.com> References: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041003222240.036b41c0@mail.telecommunity.com> At 12:00 AM 10/4/04 +0100, Alan Kennedy wrote: >[Phillip J. Eby] >>(I wonder if perhaps the current mechanism to prevent middleware >>bypassing is too heavyweight?) > >I'm sort of thinking that it is a little heavyweight. I think that anyone >who wants to bypass the middleware will probably have a good reason for >doing so. Also, they would probably be very aware that their application >would no longer be portable. Well, the problem isn't portability, nor is it *intending* to bypass middleware. 
The problem is that you can write a portable program that uses bypass APIs for performance when they're available, but then mysteriously breaks when you add middleware to the mix, because it's bypassing the middleware. >Also, I would have to add a fair amount of extra code, just to ensure that >the extension APIs present the same information as the standard WSGI >interface. Which seems unnecessary, given that the WSGI information is >already there. Right. I think it's a natural first thought to say, "Oh, I'll add an extension API so you can get at the original server request", but given the purpose of WSGI, at second thought it seems rather pointless. If the app author wanted something non-portable, he'd have written to the server's API to begin with. If it's *extra* information you're providing, just add it to environ, as long as it's not information *derived* from other data in environ. If it's derived, offer a function to derive it, rather than data. If you're providing a special input feature, attach it to the input stream, so that if middleware replaces the input stream, it disables the feature automatically. If it's a special output feature, supply an iterator-wrapper that can be returned by the application for special treatment by the server, or make it an attribute of start_response. Maybe the above guidelines should be added to the spec. >More importantly, that extra code would then make it impossible for the >application/framework author to get at the original request, which they >might conceivably really, really, need ..... For...? >But I am concerned about the statement in the PEP which says "it is very >important that these "safe extension" rules be followed by both >server/gateway and middleware developers, in order to avoid a future in >which middleware developers are forced to delete any and all extension >APIs from environ to ensure that their mediation isn't being bypassed by >applications using those extensions!" 
> >I definitely don't want to bring such a future about .... That is the big issue, yes. When an app behaves mysteriously when middleware is added, the middleware author will get the blame, even though the application developer did everything right, and the server author is the real culprit. So, the middleware author will gripe and grumble and add code to delete the server's extensions... in which case there was no point in the server author putting them there. From ianb at colorstudy.com Mon Oct 4 04:54:13 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Oct 4 04:54:17 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. In-Reply-To: <5.1.1.6.0.20041003222240.036b41c0@mail.telecommunity.com> References: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> <5.1.1.6.0.20041003222240.036b41c0@mail.telecommunity.com> Message-ID: <4160BB55.9050902@colorstudy.com> Phillip J. Eby wrote: >> Also, I would have to add a fair amount of extra code, just to ensure >> that the extension APIs present the same information as the standard >> WSGI interface. Which seems unnecessary, given that the WSGI >> information is already there. > > > Right. I think it's a natural first thought to say, "Oh, I'll add an > extension API so you can get at the original server request", but given > the purpose of WSGI, at second thought it seems rather pointless. If > the app author wanted something non-portable, he'd have written to the > server's API to begin with. If it's *extra* information you're > providing, just add it to environ, as long as it's not information > *derived* from other data in environ. If it's derived, offer a function > to derive it, rather than data. 
I think Alan might be considering a situation in which there's some information which he isn't aware of that's missing, and rather than have the application author curse him for neutering his environment, he gives the author a way to get around it all. Then, ideally, the author makes a note of this and the information shows up in the next version of modjy. Or, the author who uses that information just has to be careful about data integrity him or herself. Maybe it would be sufficient not to provide the request or response immediately in the dictionary, but require the author to do something like j2ee_req = environ['modjy.request'](environ); then when they get this, you could emit a warning, or if they get the request and you detect that there's something weird about the environ, you return None, raise an exception, log a warning, or something along those lines. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From py-web-sig at xhaus.com Mon Oct 4 13:40:17 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Oct 4 13:41:25 2004 Subject: [Web-SIG] ANN: Release 0.20.0 of modjy: a WSGI gateway for jython 2.1 and J2EE. In-Reply-To: <4160BB55.9050902@colorstudy.com> References: <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> <5.1.1.6.0.20041003154110.0369c890@mail.telecommunity.com> <5.1.1.6.0.20041003222240.036b41c0@mail.telecommunity.com> <4160BB55.9050902@colorstudy.com> Message-ID: <416136A1.3050600@xhaus.com> [Alan Kennedy] >>> Also, I would have to add a fair amount of extra code, just to ensure >>> that the extension APIs present the same information as the standard >>> WSGI interface. Which seems unnecessary, given that the WSGI >>> information is already there. [Phillip J. Eby] >> Right. I think it's a natural first thought to say, "Oh, I'll add an >> extension API so you can get at the original server request", but >> given the purpose of WSGI, at second thought it seems rather >> pointless. 
>> If the app author wanted something non-portable, he'd have
>> written to the server's API to begin with.

Or the author may want to reuse some existing WSGI code, and minimally tweak it to use a server-specific API. And could explicitly check for relevant server-specific extensions in different servers/gateways, e.g.

    if environ.has_key('j2ee.request'):
        # Do J2EE specific processing
        pass
    elif environ.has_key('mod_python.request'):
        # Do mod_python specific processing
        pass
    else:
        raise UnableToProvideError()

That said, I cannot currently think of a situation where such might be necessary.

[Phillip J. Eby]
>> If it's *extra*
>> information you're providing, just add it to environ, as long as it's
>> not information *derived* from other data in environ. If it's
>> derived, offer a function to derive it, rather than data.

[Ian Bicking]
> I think Alan might be considering a situation in which there's some
> information which he isn't aware of that's missing, and rather than have
> the application author curse him for neutering his environment, he gives
> the author a way to get around it all.

I couldn't have said it better myself, Ian.

[Ian Bicking]
> Maybe it would be sufficient not to provide the request or response
> immediately in the dictionary, but require the author to do something
> like j2ee_req = environ['modjy.request'](environ); then when they get
> this, you could emit a warning, or if they get the request and you
> detect that there's something weird about the environ, you return None,
> raise an exception, log a warning, or something along those lines.

I'll do whatever is necessary to comply with the spec. If bypassing middleware is judged to be out-of-the-question, then I will either eliminate the extensions or wrap them so that they are compliant.

Regards,

Alan.

From mnot at mnot.net Mon Oct 4 19:18:47 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Mon Oct 4 19:43:52 2004
Subject: [Web-SIG] Latest WSGI revision posted; finalization soon?
In-Reply-To: <5.1.1.6.0.20041001185733.02147810@mail.telecommunity.com>
References: <5.1.1.6.0.20041001185733.02147810@mail.telecommunity.com>
Message-ID: <736F563A-1629-11D9-88DC-000A95BD86C0@mnot.net>

Big +1!

On Oct 1, 2004, at 4:07 PM, Phillip J. Eby wrote:

> Whichever way we go, I'd ideally like to see the PEP able to move to a
> "Final" status this month, such that we don't make any further
> semantic changes to 1.0.

--
Mark Nottingham http://www.mnot.net/

From james at pythonweb.org Mon Oct 4 20:46:40 2004
From: james at pythonweb.org (James Gardner)
Date: Mon Oct 4 20:46:49 2004
Subject: [Web-SIG] Python Web Modules - Version 0.4.1
Message-ID: <41619A90.8060504@pythonweb.org>

Hello,

I'd like to announce the release of the Python Web Modules 0.4.1. This is the first time the modules have been publicly announced.

http://www.pythonweb.org/ Feel free to download and have a play

Back in March before the WSGI discussions there was some talk about releasing better standard modules in Python for developing web applications. This is my attempt to achieve that. These modules are designed to be easily accessible to beginners or developers currently using PHP or Perl whilst also offering lower level APIs for experts to create powerful dynamic websites.

Key features include:

* web.auth - Identity and identification handling. Users may have multiple access levels to multiple applications. Sign in and password reminder handling is built in.

* web.session - Persistence using cookie or URL based session IDs allowing any object which can be pickled to be stored using a dictionary-like interface. Can be used with file or database drivers.

* web.form - HTML Form generation and user input handling. Field objects available for HTML fields and the main Python types including date and time objects. Values returned as Python objects.

* web.database - Database abstraction layer supporting MySQL, SQLite, ODBC and Gadfly for cross-database programming. Types are converted.
    - Multiple return formats including dict, tuple and object.
    - Object-relational mapper similar to SQLObject allowing transparent database manipulation using dictionary-like objects in Python code. One and many to many mappings and automatic HTML form generation for editing records are supported.

* web.error - Enhanced error handling based on the principles of the cgitb module. Plain text or HTML output to a file or browser. Custom extension mechanism for email notifications and more.

* web.template - Support for Cheetah, XYAPUT and Dreamweaver MX templates.

* web.mail - Quickly send plain text or HTML emails.

* web.image - Generate 2D pie, bar and scatter graphs in a variety of image formats. Requires PIL.

* datetime - Python 2.3 date handling compatibility module for Python 2.2

There is probably nothing too ground-breaking here (apart from perhaps the HTML form interface being combined with a database ORM) but I have tried to make it all as complete and intuitive as possible which is why I feel it stands out from other modules. A sample webserver is included to test the examples.

The full module reference and examples are available at:

http://www.pythonweb.org/doc/0.4.1/

One feature which should make this package more attractive to certain developers over Zope or Webware is that no superuser rights are needed to use the modules since there is no application server to be run. They can be uploaded to a shared Apache-based web server and run without compilation or installation (although certain features are only available if you have external software).

The project plan for the next stage includes continuing work on useful applications such as user management and contact forms (which most websites use), writing code to support the WSGI PEP, and further improving the documentation.

http://www.pythonweb.org/project/plan.html

Any thoughts or comments would be really appreciated.
Best wishes,

James

--
James Gardner
james 'at' pythonweb.org
http://www.pythonweb.org

From ianb at colorstudy.com Mon Oct 4 22:55:46 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Oct 4 22:56:48 2004
Subject: [Web-SIG] Python Web Modules - Version 0.4.1
In-Reply-To: <41619A90.8060504@pythonweb.org>
References: <41619A90.8060504@pythonweb.org>
Message-ID: <4161B8D2.1020902@colorstudy.com>

James Gardner wrote:
> Hello,
>
> I'd like to announce the release of the Python Web Modules 0.4.1.
> This is the first time the modules have been publicly announced.
>
> http://www.pythonweb.org/ Feel free to download and have a play
> Back in March before the WSGI discussions there was some talk about
> releasing better standard modules in Python for developing web
> applications. This is my attempt to achieve that. These modules are
> designed to be easily accessible to beginners or developers currently
> using PHP or Perl whilst also offering lower level APIs for experts
> to create powerful dynamic websites.

Now with WSGI, have you thought about refactoring some of these with that in mind? Some of these are really WSGI-neutral libraries, but others aren't.

The obvious place to start would be a WSGI backend. It doesn't seem like the Python Web Modules model will work well in a non-CGI environment. Not only does it seem to put everything in the global space (e.g., web.cgi), making it difficult to run in threaded environments, but all the examples run the request at the top level of the module, so that you have to reload the module to serve a second request. This will paint you into a corner, as the API will be resistant to any other environments.

There are some other parts that might be good as middleware. A deep stack of middleware starts to bring up issues of configuration and providing hooks... but that's another issue. Anyway...

> Key features include:
>
> * web.auth - Identity and identification handling.
Users may have > multiple access levels to multiple applications. Sign in and > password reminder handling is built in. This could be middleware, though obviously it requires a lot of user configuration. If it's middleware you could share a single authentication system with different WSGI applications. Some standardization in this case would be good -- starting with things as simple as environ['auth.username'] holding the string username. But for now there's no standard, so you should use a custom prefix. > * web.session - Persistence using cookie or URL based session IDs > allowing any object which can be pickled to be stored using a > dictionary- like interface. Can be used with file or database > drivers. This would be good as a WSGI middleware. I have such a middleware at svn://colorstudy.com/trunk/WSGI/session.py , but the actual persistence and configuration is minimal. But it might be helpful for thinking about how it might look as middleware. > * web.error - Enhanced error handling based on the principles of > the cgitb module. Plain text or HTML output to a file or browser. > Custom extension mechanism for email notifications and more. This could also be a piece of middleware. I feel like it's one of the more complicated kinds of middleware, but useful. It could also be a bit of library code that applications can use, but I'd prefer it as middleware because you could configure it for multiple applications. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From james at pythonweb.org Tue Oct 5 01:07:04 2004 From: james at pythonweb.org (James Gardner) Date: Tue Oct 5 01:07:12 2004 Subject: [Web-SIG] Python Web Modules - Version 0.4.1 Message-ID: <4161D798.6030609@pythonweb.org> Thanks for the comments, much appreciated. I'm afraid I've got some more questions though :-) > Now with WSGI, have you thought about refactoring some of these with > that in mind? Some of these are really WSGI-neutral libraries, but > others aren't. 
>
> The obvious place to start would be a WSGI backend. It doesn't seem
> like Python Web Modules model will work well in a non-CGI environment.
> Not only does it seem to put everything in the global space (e.g.,
> web.cgi), making it difficult to run in threaded environments, but all
> the examples run the request at the top level of the module, so that
> you have to reload the module to serve a second request. This will
> paint you into a corner, as the API will be resistant to any other
> environments.

Agreed, the modules are fairly CGI-orientated.. and none of the examples show anything clever going on.. but I am keen to refactor them and think they could be easily modified.. even though I might need some advice! I also think the web modules and the WSGI might make a good fit and there would be no harm in writing the necessary glue so that they could be used in both environments. I am also going to look into how hard it would be getting them working with jython.

I'm just trying to get my head around the best way of doing things.. My understanding is this: the server is constantly running and calls both the application and any encompassing middleware every time a request is made. This means that for each request the middleware and the application are executed for the request. Consequently there is no speed advantage in moving code from the application to the middleware. The only advantage is that it makes certain bits of code more reusable for other applications.

I can see how the web.database structure or cursor can be moved to the server and passed as environ['web.database.cursor'] and environ['web.database.structure'] objects. (btw it is legal to put objects in the environ dictionary isn't it or are the values expected to be strings?)
but surely there would be no advantage to moving things like the web.cgi object away from the application global namespace because it would have to be reloaded on each request anyway so it might as well exist in the application's global space mightn't it? I guess what I'm asking is: for items that have to be refreshed every request is there a lot to be gained by moving them away from the application's global namespace?

Could you possibly be more specific about which areas of the modules you think wouldn't work well with threading and why they wouldn't? I don't expect you've studied the modules too closely but I'm not sure I understand where the difficulties might lie?

>> Key features include:
>>
>> * web.auth - Identity and identification handling. Users may have
>> multiple access levels to multiple applications. Sign in and
>> password reminder handling is built in.
>
> This could be middleware, though obviously it requires a lot of user
> configuration. If it's middleware you could share a single
> authentication system with different WSGI applications. Some
> standardization in this case would be good -- starting with things as
> simple as environ['auth.username'] holding the string username. But
> for now there's no standard, so you should use a custom prefix.

Yes, I guess the auth and session modules could be middleware and I am writing an application to handle the sign in and sign out so that wouldn't need to be included in the middleware, just the current auth status of the user and the access levels making the middleware thinner.

>> * web.session - Persistence using cookie or URL based session IDs
>> allowing any object which can be pickled to be stored using a
>> dictionary-like interface. Can be used with file or database drivers.
>
> This would be good as a WSGI middleware. I have such a middleware at
> svn://colorstudy.com/trunk/WSGI/session.py , but the actual
> persistence and configuration is minimal.
But it might be helpful for > thinking about how it might look as middleware. I downloaded and ran your code earlier today and had a look.. certainly helpful.. thank you. >> * web.error - Enhanced error handling based on the principles of >> the cgitb module. Plain text or HTML output to a file or browser. >> Custom extension mechanism for email notifications and more. > > > This could also be a piece of middleware. I feel like it's one of the > more complicated kinds of middleware, but useful. It could also be a > bit of library code that applications can use, but I'd prefer it as > middleware because you could configure it for multiple applications. I quite like this as middleware too, but again it could go in the server.. how do you decide? I also find that all pages have similar regions like title, breadcrumbs, navigation bar, content.. I was planning on having some sort of templating middleware so that applications didn't have to worry so much about the broad page structure allowing easy theming of sites. At the moment I'm think of refactoring as follows: WSGI - Server: web.database web.database.object any global config options - Middleware: web.auth web.session web.error theming engine - Application: Sign in, sign out, change password, password reminder, change access levels etc - Library: web.mail web.image.graph web.template Does this sound like a sensible architecture to go with? Again any thoughts would be appreciated. Cheers then, James -- James Gardner james 'at' pythonweb.org http://www.pythonweb.org From floydophone at gmail.com Tue Oct 5 04:10:33 2004 From: floydophone at gmail.com (Peter Hunt) Date: Tue Oct 5 04:10:36 2004 Subject: [Web-SIG] Updated my WSGI examples Message-ID: <6654eac40410041910182deb55@mail.gmail.com> http://st0rm.hopto.org/wsgi/ - test_applications.py - contains a bunch of fun little test WSGI applications which demonstrate various capabilities. 
It also contains a unit test which will test all of these applications when given a URL. WE SHOULD EXPAND ON THIS; to ensure WSGI compatibility, we should expand this test case to be as conclusive as possible and require framework authors to pass it. - middleware.py - added generic encoding middleware which defaults to rot-13. - twisted_wsgi.py - Twisted.web Resource which will export a WSGI application. Example server is included in this file which sets up a server which can then be tested by test_applications.py. You can run them in async mode, which executes the WSGI app assuming it does not block, or in sync mode, which simply executes it in a thread. Looking forward to the bugs you will find :) I'm still not quite sure if I'm handling errors the correct way (twisted_wsgi)... From ianb at colorstudy.com Tue Oct 5 06:09:17 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Oct 5 06:09:22 2004 Subject: [Web-SIG] Python Web Modules - Version 0.4.1 In-Reply-To: <4161D798.6030609@pythonweb.org> References: <4161D798.6030609@pythonweb.org> Message-ID: <41621E6D.1040208@colorstudy.com> James Gardner wrote: > Agreed, the modules are fairly CGI-orientated.. and none of the examples > show anything clever going on.. but I am keen to refactor them and think > they could be easily modified.. even thought I might need some advice! I > also think the web modules and the WSGI might make a good fit and there > would be no harm in writing the necessary glue so that they could be > used in both environments. I am also going to look into how hard it > would be getting them working with jython. > > I'm just trying to get my head around the best way of doing things.. My > understanding is this: the server is constantly running and calls both > the application and any encompassing middleware every time a request is > made. This means that for each request the middleware and the > application are executed for the request. 
Consequently there is no speed > advantage in moving code from the application to the middleware. The > only advantage is that it makes certain bits of code more reusable for > other applications. Correct. > I can see how the web.database structure or cursor can be moved to the > server and passed as environ['web.database.cursor'] and > environ['web.database.structure'] objects. I'm not sure what the benefit would be? I'd expect those modules to stay as libraries for the application to use, just like they are now. > (btw it is legal to put > objects in the environ dictionary isn't it or are the values expected to > be strings?) Yes, it is legal. > but surely there would be no advantage to moving things > like the web.cgi object away from the application global namespace > because it would have to be reloaded on each request anyway so it might > as well exist in the application's global space mightn't it? I guess > what I'm asking is: for items that have to be refreshed every request is > there a lot to be gained by moving them away from the application's > global namespcae? Well, they *have* to be moved away from the global namespace. There is no global request object in WSGI -- the request is represented with the environ dictionary, and it has to be passed around. If it's global, then only one request can be processed at a time. This would make it incompatible with threaded environments. > Could you possibly be more specific about which areas of the modules you > think wouldn't work well with threading and why they wouldn't? I don't > expect you've studied the modules too closely but I'm not sure I > understand where the difficulties might lie? To be threadsafe, you have to move anything request-related out of global variables. You don't *have* to be threadsafe; you could simply not support threaded environments. That still leaves a number of other environments -- CGI, mod_python, and some others -- but I don't think it's a good idea to build in that limitation. 
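The point about keeping request data out of globals can be illustrated with a small sketch. It is written for present-day Python 3 (byte-string bodies), not the Python of this thread, and the `call` helper and the sample environ values are made up for the illustration. Everything request-specific arrives through the `environ` argument, so the same application object can serve overlapping requests in different threads.

```python
def application(environ, start_response):
    # All per-request state comes from the environ argument;
    # nothing is read from or stored in module-level variables.
    user = environ.get("REMOTE_USER", "anonymous")
    path = environ.get("PATH_INFO", "/")
    body = ("%s requested %s" % (user, path)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, path, user):
    """Hypothetical test helper: invoke a WSGI app with a minimal environ."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path, "REMOTE_USER": user}
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# Two "requests" against the same application object do not interfere,
# because each call carries its own environ.
first = call(application, "/a", "alice")
second = call(application, "/b", "bob")
```

A script that builds a module-level request object at import time cannot be called this way a second time, which is the limitation described above.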
The other issue with your modules is that applications shouldn't be scripts. They should be objects of some sort (possibly including functions). The problem with scripts is that they are awkward to work with in Python, as you can't import them. Because if you import them, then the script runs, and if you import it a second time, the script *won't* run. And you *must* support an application being run more than one time in the same process. You could get around this, by creating an application object that reruns the script everytime it is called, but I think this is unnecessarily difficult, and there are other downsides to using scripts in this style. >>> Key features include: >>> >>> * web.auth - Identity and identification handling. Users may have >>> multiple access levels to multiple applications. Sign in and >>> password reminder handling is built in. >> >> >> >> >> This could be middleware, though obviously it requires a lot of user >> configuration. If it's middleware you could share a single >> authentication system with different WSGI applications. Some >> standardization in this case would be good -- starting with things as >> simple as environ['auth.username'] holding the string username. But >> for now there's no standard, so you should use a custom prefix. > > > > Yes, I guess the auth and session modules could be middleware and I am > writing an application to handle the sign in and sign out so that > wouldn't need to be included in the middleware, just the current auth > status of the user and the access levels making the middleware thinner. Yes, I think that's about right. More generally, you might just include whatever object represents the user, and depend on the application to handle its own permission levels. >>> * web.error - Enhanced error handling based on the principles of >>> the cgitb module. Plain text or HTML output to a file or browser. >>> Custom extension mechanism for email notifications and more. 
>>
>> This could also be a piece of middleware. I feel like it's one of the
>> more complicated kinds of middleware, but useful. It could also be a
>> bit of library code that applications can use, but I'd prefer it as
>> middleware because you could configure it for multiple applications.
>
> I quite like this as middleware too, but again it could go in the
> server.. how do you decide?

I'd be inclined to limit the server to the most basic issues, like supporting HTTP or interfacing with a web server, and the basic concurrency issues of responding to multiple requests. I'd rather leave other parts out, unless it's really natural to include them. For example, you might include URL resolution in a server based on mod_python, because Apache already has URL resolution.

> I also find that all pages have similar
> regions like title, breadcrumbs, navigation bar, content.. I was
> planning on having some sort of templating middleware so that
> applications didn't have to worry so much about the broad page structure
> allowing easy theming of sites.

A filtering middleware could make sense here. Otherwise, it might just make sense to think of this as configuration -- you indicate what the standard template is, and expect the application to select and fill the template appropriately.

> At the moment I'm think of refactoring as follows:
> WSGI - Server: web.database
>                web.database.object

What would you gain from putting this in the server, instead of a library?

>                any global config options
>      - Middleware: web.auth
>                    web.session
>                    web.error
>                    theming engine

What's your thinking here? Would the theming engine work for other kinds of WSGI applications, e.g., a Webware application? If not, then I don't think there's any need to put this in the server/middleware.
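As a rough illustration of what a "filtering middleware" for theming might look like, here is a minimal sketch in present-day Python 3. The class name `ThemeMiddleware` and its `header`/`footer` parameters are hypothetical, it buffers the whole response in memory, and it ignores `exc_info` and the `write()` callable, so it is a sketch of the idea and not a complete implementation.

```python
class ThemeMiddleware:
    """Hypothetical filter: wrap text/html responses in a shared page frame."""

    def __init__(self, app, header=b"<html><body>", footer=b"</body></html>"):
        self.app = app
        self.header = header
        self.footer = footer

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body is known.
            captured["status"] = status
            captured["headers"] = list(headers)

        result = self.app(environ, capture)
        try:
            body = b"".join(result)  # buffers the entire response
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if content_type.startswith("text/html"):
            body = self.header + body + self.footer

        # The body length may have changed, so recompute Content-Length.
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def page_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>hi</p>"]


sent = {}
themed = ThemeMiddleware(page_app)(
    {}, lambda status, headers, exc_info=None: sent.update(status=status, headers=headers))
```

Because the filter only looks at the response, it would work in front of any WSGI application that emits HTML, which is the test suggested above for whether something belongs in middleware.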
> - Application: Sign in, sign out, change password, > password reminder, change access levels etc Yes, definitely application, though there's a configuration aspect -- you'd probably configure the authentication middleware to know where some of these things were located. > - Library: web.mail > web.image.graph > web.template From pje at telecommunity.com Tue Oct 5 06:15:07 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Oct 5 06:14:58 2004 Subject: [Web-SIG] Updated my WSGI examples In-Reply-To: <6654eac40410041910182deb55@mail.gmail.com> Message-ID: <5.1.1.6.0.20041005001007.02bf5800@mail.telecommunity.com> At 10:10 PM 10/4/04 -0400, Peter Hunt wrote: >Looking forward to the bugs you will find :) Good, then I won't feel so bad about telling you that the 'wsgi.' prefix is reserved for WSGI-defined features, so "wsgi.field_storage" and friends are right out. ;) Technically, I think this was only implied in the spec, not explicitly stated, so I'll have to fix that. Anyway, the idea of the prefix is to avoid name collisions between different developers, so you need to pick your *own* prefix that isn't the same as anybody else's. From ianb at colorstudy.com Tue Oct 5 06:38:53 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Oct 5 06:38:58 2004 Subject: [Web-SIG] Updated my WSGI examples In-Reply-To: <5.1.1.6.0.20041005001007.02bf5800@mail.telecommunity.com> References: <5.1.1.6.0.20041005001007.02bf5800@mail.telecommunity.com> Message-ID: <4162255D.9010607@colorstudy.com> Phillip J. Eby wrote: > Good, then I won't feel so bad about telling you that the 'wsgi.' prefix > is reserved for WSGI-defined features, so "wsgi.field_storage" and > friends are right out. ;) > > Technically, I think this was only implied in the spec, not explicitly > stated, so I'll have to fix that. Anyway, the idea of the prefix is to > avoid name collisions between different developers, so you need to pick > your *own* prefix that isn't the same as anybody else's. 
That raises a question of convention that I was thinking about. I ended up giving each of my modules its own namespace. Which probably isn't the right way to go. But then, I also wasn't trying to think of them as a unified package. Also, some of the extensions are meant to be opaque to the rest of the application; for instance, a cookie parser stores data in the environment to cache the parse, but that data shouldn't be manipulated by other applications. Maybe I should have used a leading underscore. Also, there's already things I'm starting to think of in terms of extensions, where we'd agree on the meaning of a second namespace. For instance, I'd like a flag to indicate to applications that they should let their unexpected exceptions be raised. This would be nice for something like a debugging server that can be run in a console and falls into pdb when there's an error. Once this flag was set, middleware further up shouldn't catch unexpected errors; and if this flag isn't set, then applications should avoid letting errors escape. Sessions and configuration might be other places where standardization is called for, just to think of some things I've encountered so far. But then, this should probably be part of a second standard, which follows from WSGI. Maybe WAI, Web Application Interface, to make up an acronym. Or maybe "webapp" would be better. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From foom at fuhm.net Tue Oct 5 06:52:54 2004 From: foom at fuhm.net (James Y Knight) Date: Tue Oct 5 06:53:02 2004 Subject: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> Message-ID: <6B8CDF7C-168A-11D9-B112-000A95A50FB2@fuhm.net> A bit late with the response...but better late than never I hope. ;) On Sep 22, 2004, at 9:56 PM, Phillip J. 
Eby wrote: > On the positive side of the iterator approach, it could make it easier > for asynchronous applications to pause waiting for input, and it could > in principle support "chunked" transfer encoding of the input stream. > > Anyway, the long and short of it is that CGI and chunked encoding are > quite simply incompatible, which means that relying on its > availability would be nonportable in a WSGI application anyway. I do not find that a good reason to copy the mistake (not supporting chunking) to a new API. However! I don't think that the file-like-object API even has a problem with chunked incoming data. As long as WSGI does not make CONTENT_LENGTH a required header, and as long as the result of read looks different for "more data still to come" and "data finished" (it does, blocking for more data to occur vs. returning ''), I think it should be fine (for non-async apps). Am I missing something here? > [...] That means that if we switch from an input stream to an > iterator, a lot of people are going to be trying to make sensible > wrappers to convert the iterator back to an input stream, and that's > just getting ridiculous, [...] Iterable input stream does seems like it may be a loser for the common case. > So, I'm thinking we should shift the burden to an async-specific API. > But, in this case, "burden" means that we get to give asynchronous > apps an API much more suited to their use cases. > [...] > The idea is that this would create an iterator that the server/gateway > could recognize as "special", similar to the file-wrapper trick. 
But,
> the object returned would provide an extra API for use by the
> asynchronous application, maybe something like:
>
>     put(data) -- queue data for retrieval when the controller is
>     iterated over
>
>     finish() -- mark the iterator finished, so it raises StopIteration
>
>     on_get(length,callback) -- call 'callback(data)' when 'length'
>     bytes are available on 'wsgi.input' (but return immediately from the
>     'on_get()' call)
>
> While this API is an optional extension, it seems it would be closer
> to what some async fans wanted, and less of a kludge. It won't do
> away with the possibility that middleware might block waiting for
> input, of course, but when no middleware is present or the middleware
> isn't transforming the input stream, it should work out quite well.

That sounds okay. I'd specify that the on_get "length" bit is a hint, and may or may not be honored. put/finish is the right API for output (although I'd call it write/finish myself), and on_get seems like a fairly usable API for input. It doesn't let you pause the incoming data, so if you're passing it on to a slow downstream you'll potentially need to buffer a lot, but maybe that's too much to ask for. I assume callback('') is used to indicate end of incoming data: that should be specified.

However, interaction with middleware seems quite tricky here:

- For input modifying middleware: I guess on_get would have to just raise an exception if wsgi.input has been replaced. If the input stream was iterable, an on_get callback could just be considered notice that you can iterate the input stream once without blocking, assuming the block boundary requirements were also in effect here. Then it would work right even if the input stream was replaced. However, I think it might be the case that middleware that wants to modify the input stream is so rare, it doesn't really matter.

- Output.
The block boundary section implies that middleware that follows the guidelines, and doesn't do any blocking operations of its own should work without worrying about the server and application being async or sync. If this is to work, the server cannot expect to actually receive an asyncwrapper iterable as the return value, even if the app is using it, because the middleware might be consuming that iterable and returning one of its own. This means the .put/.next methods should communicate out-of-band, effectively calling pause/resume functions in the server so it knows when it's safe to iterate the vanilla iterator the middleware returned without the middleware blocking when calling the asyncwrapper-iterator. > But if this is the overall right approach, I'd like to drop the > current proposals to make 'wsgi.input' an iterator and add optional > 'pause'/'resume' APIs, since they were rather kludgy compared to > giving async apps their own mini-API for nonblocking I/O. Perhaps Peter Hunt could try to implement it in his twisted wsgi gateway and see if it works out. :) James From pje at telecommunity.com Tue Oct 5 08:37:18 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Oct 5 08:37:10 2004 Subject: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: <6B8CDF7C-168A-11D9-B112-000A95A50FB2@fuhm.net> References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> At 12:52 AM 10/5/04 -0400, James Y Knight wrote: >A bit late with the response...but better late than never I hope. ;) > >On Sep 22, 2004, at 9:56 PM, Phillip J. Eby wrote: >>On the positive side of the iterator approach, it could make it easier >>for asynchronous applications to pause waiting for input, and it could in >>principle support "chunked" transfer encoding of the input stream. 
>> >>Anyway, the long and short of it is that CGI and chunked encoding are >>quite simply incompatible, which means that relying on its availability >>would be nonportable in a WSGI application anyway. > >I do not find that a good reason to copy the mistake (not supporting >chunking) to a new API. Perhaps not, but there are also lots of other reasons not to support chunked input, mainly that a Google search for "chunked encoding CGI" turns up reams of vulnerabilities that suggest existing HTTP implementations may leave a bit to be desired with respect to accepting a POST of chunked input. :) >However! I don't think that the file-like-object API even has a problem >with chunked incoming data. As long as WSGI does not make CONTENT_LENGTH a >required header, and as long as the result of read looks different for >"more data still to come" and "data finished" (it does, blocking for more >data to occur vs. returning ''), I think it should be fine (for non-async >apps). Am I missing something here? I don't think so. Although you probably want something more like a pipe error if the input times out or the connection is broken. >>So, I'm thinking we should shift the burden to an async-specific API. >>But, in this case, "burden" means that we get to give asynchronous apps >>an API much more suited to their use cases. >>[...] >>The idea is that this would create an iterator that the server/gateway >>could recognize as "special", similar to the file-wrapper trick. 
But, >>the object returned would provide an extra API for use by the >>asynchronous application, maybe something like: >> >> put(data) -- queue data for retrieval when the controller is >> iterated over >> >> finish() -- mark the iterator finished, so it raises StopIteration >> >> on_get(length,callback) -- call 'callback(data)' when 'length' bytes >> are available on 'wsgi.input' (but return immediately from the 'on_get()' call) >> >>While this API is an optional extension, it seems it would be closer to >>what some async fans wanted, and less of a kludge. It won't do away with >>the possibility that middleware might block waiting for input, of course, >>but when no middleware is present or the middleware isn't transforming >>the input stream, it should work out quite well. > >That sounds okay. I'd specify that the on_get "length" bit is a hint, and >may or may not be honored. put/finish is the right API for output >(although I'd call it write/finish myself), The reason for not using 'write' is to avoid confusion with the existing "write" callable, both in terms of knowing which one we're talking about, and in terms of not confusing the semantics, which may differ subtly between the two. > and on_get seems like the a fairly usable API for input. It doesn't let > you pause the incoming data, Actually it does; it's supposed to be a one-shot. You have to call it again if you want to get called back again. > so if you're passing it on to a slow downstream you'll potentially need > to buffer a lot, but maybe that's too much to ask for. I assume > callback('') is used to indicate end of incoming data: that should be > specified. I missed that entirely, but it sounds like a good idea. >However, interaction with middleware seems quite tricky here: >- For input modifying middleware: I guess on_get would have to just raise >an exception if wsgi.input has been replaced. Yep. 
Although it might be that the wrapper would just refuse to instantiate in the first place in that circumstance. > If the input stream was iterable, an on_get callback could just be > considered notice that you can iterate the input stream once without > blocking, assuming the block boundary requirements were also in effect here. Yes, but this'd only work if the input were an iterator. input.read() returning an empty string would mean EOF, so the boundary stuff doesn't work in that case. >- Output. The block boundary section implies that middleware that follows >the guidelines, and doesn't do any blocking operations of its own should >work without worrying about the server and application being async or >sync. If this is to work, the server cannot expect to actually receive an >asyncwrapper iterable as the return value, even if the app is using it, >because the middleware might be consuming that iterable and returning one >of its own. Correct. > This means the .put/.next methods should communicate out-of-band, > effectively calling pause/resume functions in the server so it knows when > it's safe to iterate the vanilla iterator the middleware returned without > the middleware blocking when calling the asyncwrapper-iterator. It could do that, certainly. But, the truth is it's *always* safe to iterate. Note that the application can just use the on_get callback to set a flag that it's ready to continue, and just keep yielding empty strings till then. More to the point, the iterator-wrapper can simply yield empty strings when its internal queue is empty, and a sensible async server should back off its iterator.next() retry attempts when an application yields empty strings. This is pretty much always safe and sensible. However, the out-of-band communication you describe can also take place, since it provides better communication in the case where the extension is available. 
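The "yield empty strings when the queue is empty" behaviour described above can be sketched roughly as follows, in present-day Python 3. The class name `AsyncOutput` is hypothetical, only the proposed `put()` and `finish()` methods are modelled (`on_get()` is left out), and none of this is part of the WSGI spec.

```python
from collections import deque


class AsyncOutput:
    """Hypothetical sketch of the proposed put()/finish() iterator wrapper."""

    def __init__(self):
        self._queue = deque()
        self._finished = False

    def put(self, data):
        # Queue data for retrieval when the wrapper is iterated over.
        if self._finished:
            raise ValueError("put() called after finish()")
        self._queue.append(data)

    def finish(self):
        # Mark the iterator finished; it stops once the queue drains.
        self._finished = True

    def __iter__(self):
        return self

    def __next__(self):
        if self._queue:
            return self._queue.popleft()
        if self._finished:
            raise StopIteration
        # Nothing is ready yet: hand back an empty string so the
        # server can back off and retry instead of blocking.
        return b""


out = AsyncOutput()
first = next(out)       # nothing queued yet
out.put(b"hello")
second = next(out)      # the queued chunk
out.finish()
rest = list(out)        # queue drained and finished
```

Because iteration never blocks, middleware that simply passes chunks through keeps working whether or not the server knows about the wrapper; a server that does recognise it could add the out-of-band pause/resume signalling on top.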
From tsarna at sarna.org Tue Oct 5 17:24:15 2004 From: tsarna at sarna.org (Ty Sarna) Date: Tue Oct 5 17:20:15 2004 Subject: [Web-SIG] Python Web Modules - Version 0.4.1 In-Reply-To: Message from ianb at colorstudy.com (Ian Bicking) of "Mon, 04 Oct 2004 22:56:48." <4161B8D2.1020902@colorstudy.com> Message-ID: <20041005152415.98A3EBB980@kopernik.sarna.org> > > * web.auth - Identity and identification handling. Users may have > > multiple access levels to multiple applications. Sign in and > > password reminder handling is built in. > > This could be middleware, though obviously it requires a lot of user > configuration. If it's middleware you could share a single > authentication system with different WSGI applications. Some > standardization in this case would be good -- starting with things as > simple as environ['auth.username'] holding the string username. But for > now there's no standard, so you should use a custom prefix. I think this should be environ['REMOTE_USER'], per the CGI spec, so that same app could take auth either from the server (apache mod_auth_whatever or equivalent in other servers) or from middleware. From pje at telecommunity.com Tue Oct 5 17:29:57 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Oct 5 17:29:46 2004 Subject: [Web-SIG] Python Web Modules - Version 0.4.1 In-Reply-To: <20041005152415.98A3EBB980@kopernik.sarna.org> References: Message-ID: <5.1.1.6.0.20041005112930.02c06ec0@mail.telecommunity.com> At 11:24 AM 10/5/04 -0400, Ty Sarna wrote: > > > * web.auth - Identity and identification handling. Users may have > > > multiple access levels to multiple applications. Sign in and > > > password reminder handling is built in. > > > > This could be middleware, though obviously it requires a lot of user > > configuration. If it's middleware you could share a single > > authentication system with different WSGI applications. 
Some > > standardization in this case would be good -- starting with things as > > simple as environ['auth.username'] holding the string username. But for > > now there's no standard, so you should use a custom prefix. > >I think this should be environ['REMOTE_USER'], per the CGI spec, so that >same app could take auth either from the server (apache >mod_auth_whatever or equivalent in other servers) or from middleware. +1. From ianb at colorstudy.com Tue Oct 5 20:12:37 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Oct 5 20:14:09 2004 Subject: [Web-SIG] WSGI Webware progress In-Reply-To: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> References: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> Message-ID: <4162E415.5040904@colorstudy.com> Phillip J. Eby wrote: > At 10:42 AM 10/3/04 -0400, Peter Hunt wrote: > >> Looking good! I see we've written a lot of similar code; perhaps we >> could merge our two separate efforts into "wsgilib"? > > > Heh. I've also started work on a "wsgilib", mainly to provide common > base classes and utility functions for servers and gateways. Maybe we > need to co-ordinate in some fashion. :) Should we put some of this code in a common repository? I guess there's actually some benefit to working separately, since this is a standard not an implementation. But then we at least need to agree on module names and it would be convenient to agree on some of these simple, common functions. 
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Tue Oct 5 20:19:37 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Oct 5 20:20:59 2004 Subject: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: <6B8CDF7C-168A-11D9-B112-000A95A50FB2@fuhm.net> References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <6B8CDF7C-168A-11D9-B112-000A95A50FB2@fuhm.net> Message-ID: <4162E5B9.7080502@colorstudy.com> James Y Knight wrote: > However, interaction with middleware seems quite tricky here: > - For input modifying middleware: I guess on_get would have to just > raise an exception if wsgi.input has been replaced. If the input stream > was iterable, an on_get callback could just be considered notice that > you can iterate the input stream once without blocking, assuming the > block boundary requirements were also in effect here. Then it would work > right even if the input stream was replaced. However, I think it might > be the case that middleware that wants to modify the input stream is so > rare, it doesn't really matter. I think middleware would have to modify the input stream if it wanted to parse POST variables. In that case, you might parse the input stream, while also constructing a replacement input stream for when the application tries to re-read the stream. In effect the middleware wants to peek at the input stream. I can't think of any other useful reasons to modify the input stream, but this one seems fairly reasonable. For instance, a piece of middleware might try to detect a login attempt by looking for particular field names in the request. 
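A minimal sketch of that "peek" pattern, in present-day Python 3: the middleware reads the POST body, inspects it, and puts a fresh stream in its place so the application can read it again. The class name `LoginDetector`, the `example.login_attempt` key and the `username` field are all hypothetical, and the sketch assumes a small URL-encoded body that is safe to hold in memory.

```python
import io
from urllib.parse import parse_qs


class LoginDetector:
    """Hypothetical middleware that peeks at POST fields without consuming them."""

    def __init__(self, app, field="username"):
        self.app = app
        self.field = field

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        if environ.get("REQUEST_METHOD") == "POST" and length > 0:
            raw = environ["wsgi.input"].read(length)
            fields = parse_qs(raw.decode("latin-1"))
            # Custom prefix, since 'wsgi.' is reserved for the spec itself.
            environ["example.login_attempt"] = self.field in fields
            # Replace the consumed stream so the application can re-read it.
            environ["wsgi.input"] = io.BytesIO(raw)
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.input"].read()]


payload = b"username=ian&password=secret"
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": str(len(payload)),
    "wsgi.input": io.BytesIO(payload),
}
result = LoginDetector(echo_app)(environ, lambda status, headers, exc_info=None: None)
```

Note that replacing `wsgi.input` like this is exactly the case in which the proposed `on_get()` would have to raise an exception or refuse to work.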
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From floydophone at gmail.com Tue Oct 5 20:29:02 2004 From: floydophone at gmail.com (Peter Hunt) Date: Tue Oct 5 20:29:06 2004 Subject: [Web-SIG] WSGI Webware progress In-Reply-To: <4162E415.5040904@colorstudy.com> References: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> <4162E415.5040904@colorstudy.com> Message-ID: <6654eac4041005112933ceb412@mail.gmail.com> I was actually thinking of putting all of the wsgilib candidate code in a SVN repository. That way you can fix all of the bugs that I write in my code without waiting for me :) On Tue, 05 Oct 2004 13:12:37 -0500, Ian Bicking wrote: > > > Phillip J. Eby wrote: > > At 10:42 AM 10/3/04 -0400, Peter Hunt wrote: > > > >> Looking good! I see we've written a lot of similar code; perhaps we > >> could merge our two separate efforts into "wsgilib"? > > > > > > Heh. I've also started work on a "wsgilib", mainly to provide common > > base classes and utility functions for servers and gateways. Maybe we > > need to co-ordinate in some fashion. :) > > Should we put some of this code in a common repository? I guess there's > actually some benefit to working separately, since this is a standard > not an implementation. But then we at least need to agree on module > names and it would be convenient to agree on some of these simple, > common functions. > > -- > Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org > From ianb at colorstudy.com Tue Oct 5 20:40:27 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Oct 5 20:41:32 2004 Subject: [Web-SIG] WSGI Webware progress In-Reply-To: <6654eac4041005112933ceb412@mail.gmail.com> References: <5.1.1.6.0.20041003120107.03bf15c0@mail.telecommunity.com> <4162E415.5040904@colorstudy.com> <6654eac4041005112933ceb412@mail.gmail.com> Message-ID: <4162EA9B.70001@colorstudy.com> Peter Hunt wrote: > I was actually thinking of putting all of the wsgilib candidate code > in a SVN repository. 
That way you can fix all of the bugs that I write
> in my code without waiting for me :)

Sure... or something like that ;) I can offer up repository space on colorstudy.com or on webwareforpython.org.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From pje at telecommunity.com Wed Oct 6 01:26:46 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Oct 6 01:26:35 2004
Subject: [Web-SIG] An implementation error I just found in PEP 333
Message-ID: <5.1.1.6.0.20041005191932.03540e80@mail.telecommunity.com>

Just a quick heads-up... there's an error in the PEP's CGI implementation, so if you are basing a server/gateway implementation on it, you may be copying this error into your own code. Specifically, 'start_response' contains this code:

    elif headers_sent:
        raise AssertionError("Headers already sent!")

It *should* read:

    elif headers_set:
        raise AssertionError("Headers already set!")

This is apparently a typo; it leads to noncompliant behavior (allowing start_response() to be called multiple times without error even if exc_info isn't supplied). I discovered it while working on the WSGI reference library (wsgiref).

FYI, the ViewCVS for wsgiref is: http://cvs.eby-sarna.com/wsgiref/ And you can also get it via anonymous CVS; see http://peak.telecommunity.com/Meta/AnonymousCVSAccess.html for instructions, replacing 'co PEAK' with 'co wsgiref'.

At the moment, wsgiref just contains a header manipulation class, a FileWrapper class, and a bunch of environment manipulation functions, all with extensive automated tests. I'm in the middle of working on a base class that can be used to implement pretty much any kind of WSGI server or gateway, and I noticed that I had managed to copy the above error into my new base class. So I thought I should mention it to everybody so they can verify that they didn't make the same mistake.

Sorry about the mixup, I'll get it fixed in the next PEP revision.
In the meantime, you can now feel good about the fact that even *my* PEP 333 implementation had a compliance bug... ;) From pje at telecommunity.com Wed Oct 6 08:39:42 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Oct 6 08:39:31 2004 Subject: [Web-SIG] Draft of server/gateway base class now available Message-ID: <5.1.1.6.0.20041006021659.02270150@mail.telecommunity.com> I've just checked in a set of server/gateway base classes into the wsgiref library. The main class, BaseHandler, implements the structural flow of a WSGI application invocation, with stub methods for creating the various streams, variables, and so on, including some optional extensions like 'wsgi.file_wrapper'. Server/gateway implementations can subclass BaseHandler to fill in these stubs with appropriate implementations for their particular architecture. Two other classes, BaseCGIHandler and CGIHandler, are usable as-is (more or less) for CGI and CGI-like environments. BaseCGIHandler instances can be passed the streams and environ mapping to use, while CGIHandler takes them direct from the 'sys' and 'os' modules, while using different defaults for e.g. wsgi.multiprocess and wsgi.run_once. The main things missing at the moment from BaseHandler are: * sensible default error handling * automatic addition of missing headers (e.g. Content-Length) * Any HTTP/1.1 support whatsoever :) * a more comprehensive test suite (there is a simple test suite now, but it doesn't cover all code paths) The wsgiref package comes with a small set of automated tests; they can be run automatically via 'python setup.py -q test'. It also includes utility routines like 'setup_testing_defaults()' to populate a basic 'environ' for testing purposes, HTTP header manipulation support, and various other useful things for server and application implementors. I've tried to write the package to work with Python 2.1 (e.g. 
Jython), though I may have missed a few idioms; if you're working with an older version of Python and experience any difficulties, please let me know. Most everything in the package has moderately verbose docstrings, so using pydoc or 'help()' in the interpreter should help you get going. For a quick start, you can run a WSGI application under CGI with: from wsgiref.handlers import CGIHandler CGIHandler().run(application) FYI, the ViewCVS for wsgiref is: http://cvs.eby-sarna.com/wsgiref/ And you can also get it via anonymous CVS; see http://peak.telecommunity.com/Meta/AnonymousCVSAccess.html for instructions, replacing 'co PEAK' with 'co wsgiref'. From pje at telecommunity.com Wed Oct 6 08:42:21 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Oct 6 08:42:11 2004 Subject: [Web-SIG] *Another* implementation error In-Reply-To: <5.1.1.6.0.20041005191932.03540e80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041006024000.0232e140@mail.telecommunity.com> At 07:26 PM 10/5/04 -0400, Phillip J. Eby wrote: >Just a quick heads-up... there's an error in the PEP's CGI >implementation, so if you are basing a server/gateway implementation on >it, you may be copying this error into your own code. This time, the culprit is: environ['wsgi.last_call'] = True Which I apparently never updated when the name of the variable became 'wsgi.run_once'. Please check to make sure you didn't copy this error into your implementations. Sorry for the inconvenience. I've just checked in an update of the PEP to fix this and the other coding errors I found today. From py-web-sig at xhaus.com Wed Oct 6 16:13:14 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Oct 6 16:13:49 2004 Subject: [Web-SIG] Multipart/multiple stream file uploads. Message-ID: <4163FD7A.5040200@xhaus.com> [Ian Bicking] > I think middleware would have to modify the input stream if it wanted > to parse POST variables. 
In that case, you might parse the input > stream, while also constructing a replacement input stream for when > the application tries to re-read the stream. In effect the middleware > wants to peek at the input stream. Reading this put me in mind of a potential use case that any WSGI input API will have to cover: that of multiple streamed file uploads. So for example that the user is uploading a set of form variables, *and* multiple files, each in a MIME multipart sub-message. Say further that on the server-side we want to stream each of the files into disk without buffering them to memory, as well as access the form variables from the first MIME multipart. In this case, the file stream for the second file stream (i.e. the third multipart) cannot be made available to the WSGI application until the first file has been processed/saved to disk. How could an asynchronous API support such multiple file uploads? As well as process/present form data from the first part? Would it have to register callbacks for "a new multipart has arrived" events? I don't have a proposed solution, I just thought it was worth raising the use case, for discussion purposes. Regards, Alan. From py-web-sig at xhaus.com Wed Oct 6 16:22:59 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Oct 6 16:23:26 2004 Subject: [Web-SIG] Modjy and external packages. Message-ID: <4163FFC3.4050800@xhaus.com> Dear All, I've just had an email from a modjy user who was delighted to get lucene (the excellent java text indexing engine[1]) up and running under WSGI/modjy. Cool B-) But there is one little trick that one needs to know to make such things work. This is a trick that most jythonistas know, but if one doesn't know it, finding why the relevant imports don't work can be infuriating. When referencing external jars in modjy applications, it is not sufficient to place the jar on the classpath, or to place it in the WEB-INF/lib directory. 
You *also* have to inform jython about the existence of the package. This
is very simple to do, by adding a simple declaration to your modules, like
so

#######
import sys
sys.add_package('org.apache.lucene')
#######

And that's it.

I will be releasing a micro revision to modjy at the weekend which supports
doing this through a configuration parameter, rather than the mildly ugly
approach outlined above.

Happy modjy'ing!

Regards,

Alan.

[1] "Jakarta Lucene is a high-performance, full-featured text search engine
library written entirely in Java."
http://jakarta.apache.org/lucene

From paul.boddie at ementor.no Wed Oct 6 16:46:32 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Wed Oct 6 16:46:36 2004
Subject: [Web-SIG] Modjy and external packages.
Message-ID: <0F4BD34E02639E428B4654DCBAB4502D109266@100NOOSLMSG004.common.alpharoot.net>

Alan Kennedy wrote:
>
> When referencing external jars in modjy applications, it is not
> sufficient to place the jar on the classpath, or to place it in the
> WEB-INF/lib directory.

Really? I know that Java Servlet deployment issues are infuriating enough
as it is, but all I've ever needed to do with JythonServlet (which forms
the basis of WebStack's Java/Jython support) is to make sure that relevant
libraries reside in the WEB-INF/lib directory. At least, the basic servlet
libraries have to reside there, despite them also residing in lots of other
places within Apache Tomcat (which is what I'm testing on).

Perhaps recent Tomcat developments (I'm using 4.1.27) have messed around
with the security model, but I've never seen any need for what you
suggest...

> import sys
> sys.add_package('org.apache.lucene')

modjy looks interesting, though.

Paul

From py-web-sig at xhaus.com Wed Oct 6 16:56:30 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Oct 6 16:57:16 2004
Subject: [Web-SIG] Modjy and external packages.
In-Reply-To: <0F4BD34E02639E428B4654DCBAB4502D109266@100NOOSLMSG004.common.alpharoot.net>
References: <0F4BD34E02639E428B4654DCBAB4502D109266@100NOOSLMSG004.common.alpharoot.net>
Message-ID: <4164079E.3070904@xhaus.com>

[Alan Kennedy]
>>When referencing external jars in modjy applications, it is not
>>sufficient to place the jar on the classpath, or to place it in the
>>WEB-INF/lib directory.

[Paul Boddie]
> Really? I know that Java Servlet deployment issues are infuriating
> enough as it is, but all I've ever needed to do with JythonServlet
> (which forms the basis of WebStack's Java/Jython support) is to make
> sure that relevant libraries reside in the WEB-INF/lib directory.

That only works because the current org.python.util.PyServlet class
already adds the relevant packages for you behind the scenes. Take a look
at the source for the PyServlet.java file

http://cvs.sourceforge.net/viewcvs.py/jython/jython/org/python/util/PyServlet.java?rev=1.16&view=markup

The following are the relevant lines

/* ------------------------------------- */
public class PyServlet extends HttpServlet {

    /* ....... */

    public void init() {
        /* ....... */
        PySystemState sys = Py.getSystemState();
        sys.add_package("javax.servlet");
        sys.add_package("javax.servlet.http");
        sys.add_package("javax.servlet.jsp");
        sys.add_package("javax.servlet.jsp.tagext");
        sys.add_classdir(rootPath + "WEB-INF" + File.separator + "classes");
        sys.add_extdir(rootPath + "WEB-INF" + File.separator + "lib", true);
        /* ....... */
    }
}
/* ------------------------------------- */

Regards,

Alan.

From paul.boddie at ementor.no Wed Oct 6 17:06:14 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Wed Oct 6 17:06:17 2004
Subject: [Web-SIG] Modjy and external packages.
Message-ID: <0F4BD34E02639E428B4654DCBAB4502D10926A@100NOOSLMSG004.common.alpharoot.net> Alan Kennedy wrote: > [Servlet libraries in WEB-INF/lib] > That only works because the current org.python.util.PyServlet class > already adds the relevant packages for you behind the scenes. Take a > look at the source for the PyServlet.java file [...] I stand corrected! Having made various changes to PyServlet, one would have thought I might have remembered this. You've quite possibly saved me some time in the near future, Alan! Paul From foom at fuhm.net Thu Oct 7 06:59:47 2004 From: foom at fuhm.net (James Y Knight) Date: Thu Oct 7 07:04:23 2004 Subject: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> Message-ID: On Oct 5, 2004, at 2:37 AM, Phillip J. Eby wrote: > Although you probably want something more like a pipe error if the > input times out or the connection is broken. You normally only get pipe errors on writes, read just sees EOF. But that does bring up a good point: How does the server notify the application that the client has gone away, and any further work is useless? - For non-async apps that use the iterator model: I think the server is allowed to just call iterable.close() and never iterate again. - For async applications, with the proposed API, that may not be an option, because the iterable returned is the special wrapper, not a user-created class. Although, actually, I guess the app can return its own iterable whose __iter__ calls through and returns the wrapper's __iter__. - What about for non-async applications that use the write callable? Should write be allowed to raise an exception? Or should it just become a no-op when the client is disconnected? 
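The first bullet (the iterator model) can be sketched as follows. This is only an illustration, assuming the server detects the disconnect by itself; `ResponseBody`, `serve` and `client_connected` are hypothetical names, and the code uses present-day Python 3 syntax.

```python
class ResponseBody:
    """Application-side iterable whose close() lets it release resources early."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks)

    def close(self):
        # Called by the server whether iteration finished or was abandoned.
        self.closed = True


def serve(body, send, client_connected):
    """Hypothetical server loop: stop iterating once the client has gone away."""
    try:
        for chunk in body:
            if not client_connected():
                break  # further work is useless, so never iterate again
            send(chunk)
    finally:
        if hasattr(body, "close"):
            body.close()
```

The application learns about the disconnect only through close(), which is why the same trick does not obviously carry over to the write() callable in the third bullet.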
>> and on_get seems like the a fairly usable API for input. It doesn't >> let you pause the incoming data, > > Actually it does; it's supposed to be a one-shot. You have to call it > again if you want to get called back again. Ah, didn't see that it was one-shot. Yeah, in that case, the server can stop reading if there is no registered data callback and some predetermined buffer size is filled. Nice. >> If the input stream was iterable, an on_get callback could just be >> considered notice that you can iterate the input stream once without >> blocking, assuming the block boundary requirements were also in >> effect here. > > Yes, but this'd only work if the input were an iterator. input.read() > returning an empty string would mean EOF, so the boundary stuff > doesn't work in that case. Right -- just pointing out one plus to the iterator model. :) >> This means the .put/.next methods should communicate out-of-band, >> effectively calling pause/resume functions in the server so it knows >> when it's safe to iterate the vanilla iterator the middleware >> returned without the middleware blocking when calling the >> asyncwrapper-iterator. > > It could do that, certainly. But, the truth is it's *always* safe to > iterate. Note that the application can just use the on_get callback > to set a flag that it's ready to continue, and just keep yielding > empty strings till then. > > More to the point, the iterator-wrapper can simply yield empty strings > when its internal queue is empty, and a sensible async server should > back off its iterator.next() retry attempts when an application yields > empty strings. This is pretty much always safe and sensible. > > However, the out-of-band communication you describe can also take > place, since it provides better communication in the case where the > extension is available. Hmm, yes. I totally missed the option of just yielding ''. 
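The empty-string idea from the quoted text can be sketched like this. It is an illustration under assumed names (`waiting_app_iter`, `drive`), in Python 3 syntax with bytes; the doubling back-off is one arbitrary policy, not something the draft specifies, and a real async server would reschedule the iterator instead of sleeping.

```python
import time


def waiting_app_iter(ready, chunks):
    """Yield b'' until ready() reports that data is available, then the chunks."""
    while not ready():
        yield b""  # "nothing yet, ask me again later"
    for chunk in chunks:
        yield chunk


def drive(app_iter, send, sleep=time.sleep, first_delay=0.01, max_delay=1.0):
    """Hypothetical server loop that waits longer after each consecutive b''."""
    delay = first_delay
    for chunk in app_iter:
        if chunk:
            send(chunk)
            delay = first_delay  # progress was made, so reset the back-off
        else:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # delay longer and longer
```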
Of course it's a very bad idea to repeatedly yield '' to a server if you don't know the server can properly handle it (by e.g. delaying longer and longer), but, in this case, since the server itself is providing the special iterable, that should be fine. It seems like it should be possible to make a generic class that implements this async API for use with sync servers that do not support it natively. That would allow async apps to run on a sync server without modification, which is potentially useful. To do that, though, I think the it'd have to spawn an extra thread per request that is waiting to read data, for the read() call to block on. Unless, of course, the app never needs to yield outgoing data while waiting for incoming data. The one remaining issue I have is the required thread-safeness of various APIs. The spec doesn't mention much of anything about threadsafeness: is it ok to call wsgi methods from a different thread than the one the server originally called the request on? Especially interesting for implementing the above sync->async adapter: environ['wsgi.input'].read(x) would be called from a second thread. What thread (if there's a choice) does the on_get callback get called on. Etc. I haven't really thought about these thready questions much either, so maybe the answers are obvious, but in my experience, that's usually not the case when it comes to threads. That's why async apps are nice. ;) James From pje at telecommunity.com Thu Oct 7 07:28:42 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Oct 7 07:28:28 2004 Subject: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: References: <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041007010942.02d33c90@mail.telecommunity.com> At 12:59 AM 10/7/04 -0400, James Y Knight wrote: >On Oct 5, 2004, at 2:37 AM, Phillip J. Eby wrote: >>Although you probably want something more like a pipe error if the input >>times out or the connection is broken. > >You normally only get pipe errors on writes, read just sees EOF. > >But that does bring up a good point: How does the server notify the >application that the client has gone away, and any further work is useless? >- For non-async apps that use the iterator model: I think the server is >allowed to just call iterable.close() and never iterate again. Yes. >- For async applications, with the proposed API, that may not be an >option, because the iterable returned is the special wrapper, not a >user-created class. Although, actually, I guess the app can return its own >iterable whose __iter__ calls through and returns the wrapper's __iter__. Not if the server wants to be able to handle that iterable specially. But anyway, it seems that the wrapper's constructor should take a close method, or have a way to set one. >- What about for non-async applications that use the write callable? >Should write be allowed to raise an exception? Or should it just become a >no-op when the client is disconnected? It's allowed to raise an exception, though this was never explicitly put in the spec; I'll have to fix that. 
The actual process for that scenario looks something like this: * app calls write() * write() raises error * app catches error (maybe) and calls start_response() with exc_info * start_response() reraises the error, because it has already sent headers to the client and can't restart the response * application error handler bombs out and returns to server/gateway * server/gateway logs the exception (maybe) and gets on with life in the big 'net >Hmm, yes. I totally missed the option of just yielding ''. Of course it's >a very bad idea to repeatedly yield '' to a server if you don't know the >server can properly handle it (by e.g. delaying longer and longer), but, >in this case, since the server itself is providing the special iterable, >that should be fine. Yes. Also, when we finally settle on an async API, I do want to cover the issue of backing off iteration when empty strings are yielded. I'm actually inclined to suggest that an async application should take responsibility for doing the delaying if it's called repeatedly, and the async API isn't available. >It seems like it should be possible to make a generic class that >implements this async API for use with sync servers that do not support it >natively. That would allow async apps to run on a sync server without >modification, which is potentially useful. To do that, though, I think the >it'd have to spawn an extra thread per request that is waiting to read >data, for the read() call to block on. Unless, of course, the app never >needs to yield outgoing data while waiting for incoming data. Well, with Twisted you could deferToThread the read() operations, though it's hard for me to think straight about that scenario because I keep finding it hard to imagine an async web app that isn't just written to the Twisted API to start with... ;) >The one remaining issue I have is the required thread-safeness of various >APIs. 
>
>The spec doesn't mention much of anything about threadsafeness: is it ok
>to call wsgi methods from a different thread than the one the server
>originally called the request on? Especially interesting for implementing
>the above sync->async adapter: environ['wsgi.input'].read(x) would be
>called from a second thread.

Excellent question; I should add the answer to the spec, as soon as I
decide precisely what it is. :)

One point: the spec should absolutely forbid servers from using thread
identity to identify the application/caller. The "what can you call while
what else is executing" part of the question is a bit trickier.

>What thread (if there's a choice) does the on_get callback get called on. Etc.

My inclination is to make threading issues symmetrical. That is, the
application doesn't get any thread-identity guarantees either.

> I haven't really thought about these thready questions much either, so
> maybe the answers are obvious, but in my experience, that's usually not
> the case when it comes to threads.

Yep. :) However, the more I think about it, the more it seems to me that
WSGI should emulate single-threadedness with respect to any
function/method/iterator invocations associated with a given application
invocation. However, it is *not* guaranteed that all such invocations will
occur from the same thread.

Basically, it means "no multitasking with the other guy's objects", and
puts the locking burdens on whoever's trying to mix multitasking into the
works.

>That's why async apps are nice. ;)

Not to mention fork(). :)

By the way, after all this discussion... do you think it would be better
to:

1) Push towards a full async API, nailing down all these loose ends

2) Use the simple-but-kludgy "pause iteration" API idea

3) Don't make an "official" async API, and just leave it open to server
   authors to create their own extensions, and maybe cherry pick the best
   ideas for WSGI 2.0, or

4) Do something else altogether?
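
The write() failure sequence listed earlier in this message (write() raises, the application calls start_response() with exc_info, start_response() re-raises, the server logs the error) can be traced with a small sketch. Everything here is illustrative: `Gateway`, `BrokenPipe` and `app` are made-up names, the disconnect is simulated with a counter, and the syntax is present-day Python 3.

```python
import sys


class BrokenPipe(Exception):
    """Stand-in for whatever error a real gateway raises on a dead client."""


class Gateway:
    def __init__(self, writes_before_disconnect):
        self.remaining = writes_before_disconnect
        self.headers_sent = False
        self.log = []

    def write(self, data):
        if self.remaining <= 0:
            raise BrokenPipe("client went away")
        self.remaining -= 1
        self.headers_sent = True  # headers go out with the first write

    def start_response(self, status, headers, exc_info=None):
        if exc_info:
            try:
                if self.headers_sent:
                    # Cannot restart the response, so re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None
        return self.write

    def run(self, application):
        try:
            for chunk in application({}, self.start_response):
                self.write(chunk)
        except Exception as exc:
            self.log.append(type(exc).__name__)  # log it and carry on


def app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    try:
        write(b"first")
        write(b"second")
    except Exception:
        # Try to switch to an error page; if headers are out, this re-raises.
        start_response("500 Internal Server Error", [], sys.exc_info())
    return []
```

Running `Gateway(1).run(app)` follows the failure path and leaves one log entry, while `Gateway(5).run(app)` completes without logging anything.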
From carribeiro at gmail.com Thu Oct 7 16:55:13 2004
From: carribeiro at gmail.com (Carlos Ribeiro)
Date: Thu Oct 7 16:55:33 2004
Subject: [Web-SIG] Philosophical question: publishing classes vs instances
Message-ID: <864d3709041007075558ecfac2@mail.gmail.com>

Hello all,

I've been following the Web SIG, although I only signed up for the list
today. I'm working out some concepts related to object-oriented web
application design. I'm sure I'm not the first to do it :-) and I would
like not to reinvent the wheel -- at least, not the _same_ wheel.

The "natural way" to implement Python web apps seems to be through some
type of object publisher -- a system that finds the correct object, that is
'published' in some part of the site, and activates this object upon
request. I've checked a few systems, and although I can't claim extensive
experience with them, most seem to operate based on publishing object
*instances*.

I'm now working on the high-level design for an application of mine, and I
thought that the correct way to do it would be to publish object *classes*,
and let the web framework instantiate the class and then activate it upon
request. Most of the time, I can't preserve information on the server side
anyway. And even if I use some of the advanced techniques (for example, the
persistent Javascript trick that apps such as GMail use), an object
instance seems to be a better fit, although it would need a more complex
management model.

I would like to know what you think of it, and whether there are any good
resources that I can study to understand all the issues. Maybe I'm missing
something; I don't believe that performance alone justifies such
preference, and it's something that I would like to understand.
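
The two options can be made concrete with a small sketch. It is only an illustration: the `Publisher` class and its registration methods are hypothetical and not taken from any existing framework, and the code is in present-day Python 3.

```python
class Publisher:
    """Toy object publisher: maps URL paths to instances or to classes."""

    def __init__(self):
        self._instances = {}
        self._classes = {}

    def publish_instance(self, path, obj):
        self._instances[path] = obj  # one shared object serves every request

    def publish_class(self, path, cls):
        self._classes[path] = cls  # a fresh object is created per request

    def handle(self, path, request):
        if path in self._instances:
            return self._instances[path].render(request)
        if path in self._classes:
            return self._classes[path]().render(request)
        return "404 Not Found"


class Counter:
    """Keeps state on self, so the publishing style changes its behaviour."""

    def __init__(self):
        self.hits = 0

    def render(self, request):
        self.hits += 1
        return "hits=%d" % self.hits
```

Published as an instance, `Counter` accumulates hits across requests (and would need care under concurrency); published as a class, every request starts from a clean object and nothing survives on the server side.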
Best regards,

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com

From pje at telecommunity.com Fri Oct 8 01:21:32 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Oct 8 01:21:18 2004
Subject: [Web-SIG] PEAK now provides various WSGI gateway and server options
Message-ID: <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com>

The CVS version of PEAK now offers three options for running WSGI
applications: CGI, FastCGI, and SimpleHTTPServer. For example, this
command:

    peak launch WSGI import:my_app.application

will do this:

  1. Import 'application' from 'my_app', treating it as a WSGI application
  2. Start a SimpleHTTPServer listening to an arbitrary port on 'localhost'
  3. Launch a browser window pointing to that local server

So, it's a pretty easy way to test and play with WSGI applications without
needing to configure a web server or mess with CGI.

PEAK also includes a CGI/FastCGI gateway that auto-detects whether it's
running under CGI or FastCGI; the equivalent command is:

    peak CGI WSGI import:my_app.application

But you would normally turn this into a shell script, e.g.:

    #!/bin/sh
    peak CGI WSGI import:my_app.application

that would then be used as the CGI or FastCGI application executable.

Finally, PEAK also offers an advanced FastCGI "supervisor" that's a
compelling replacement for mod_fastcgi's process manager when running
high-volume and slow-starting applications. It handles its own forking and
killing off of child processes when they become too idle, and it has better
"knowledge" of when new processes should or shouldn't be started.

All of these containers are fairly stable, with some of them having been
used in production for over a year now.
(Until now, of course, the interface they used was a predecessor of the current WSGI spec, and they now use a simple adapter (courtesy of the wsgiref library) to wrap WSGI-compliant objects such that they implement that older, more CGI-like interface.) In addition to these server and gateway implementations, all of PEAK's web-based tools including the peak.web application framework, the 'DDT' (Document-Driven Testing) toolkit, and various example applications, are now all WSGI applications, and should in principle be able to run under other WSGI-compliant servers and gateways, once you write an appropriate startup script to instantiate them. Information about PEAK can be found at http://peak.telecommunity.com/. PEAK's server and gateway implementations are based on the 'wsgiref' library, which is distributed bundled with PEAK, as well as in a separate distribution. From ianb at colorstudy.com Wed Oct 13 22:05:42 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 13 22:07:09 2004 Subject: [Web-SIG] PEAK now provides various WSGI gateway and server options In-Reply-To: <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> References: <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> Message-ID: <416D8A96.7000100@colorstudy.com> Phillip J. Eby wrote: > The CVS version of PEAK now offers three options for running WSGI > applications: CGI, FastCGI, and SimpleHTTPServer. For example, this > command: > > peak launch WSGI import:my_app.application > > will do this: > > 1. Import 'application' from 'my_app', treating it as a WSGI application > 2. Start a SimpleHTTPServer listening to an arbitrary port on 'localhost' > 3. launch a browser window pointing to that local server I'm noticing that peak serve WSGI import:... does the same thing, but without launching a web browser. Is there any way to start the server up on a known port and interface? 
When I do "launch" it opens itself up in "localhost.my.hostname", and I'm not sure where localhost.my.hostname is coming from. Since my computer has several interfaces, I'm not sure which one it's starting on, so I haven't been able to figure it out even when I try different addresses. I was able to get "peak CGI WSGI import:..." working successfully, so the basic system is all installed and working. I tried FastCGI a little, but I got stuck on installing mod_fastcgi for the moment. I'm assuming that if I create a script like: #!/bin/sh peak FastCGI WSGI import:... In a .fcgi, executable script, with "AddHandler fastcgi-script .fcgi" in my httpd.conf, it'll just work...? I'm also not sure what the concurrency is for these. Multithreaded, multiple processes, single process? Configurable? Does the supervisor start on its own, or does that have to be configured? -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 13 22:21:15 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 13 22:22:41 2004 Subject: [Web-SIG] PEAK now provides various WSGI gateway and server options In-Reply-To: <416D8A96.7000100@colorstudy.com> References: <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> <416D8A96.7000100@colorstudy.com> Message-ID: <416D8E3B.7000900@colorstudy.com> Ian Bicking wrote: > I was able to get "peak CGI WSGI import:..." working successfully, so > the basic system is all installed and working. I tried FastCGI a > little, but I got stuck on installing mod_fastcgi for the moment. BTW, does anyone know of a CGI gateway to FastCGI? Lots of FastCGI-alike protocols have these: wkcgi in Webware, scgi-cgi for SCGI, Zope/PCGI's Zope.cgi, etc. Typically these are just little C CGI programs. I couldn't find one for FastCGI, but then the search terms are woefully ambiguous (too many "cgi"s). 
--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From fumanchu at amor.org Wed Oct 13 23:06:12 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Wed Oct 13 23:06:56 2004
Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022F79@exchange.hqamor.amorhq.net>

In order to test my application's WSGI interface, I wrote a quick
mod_python server interface for WSGI. It's not bulletproof, but the parts I
use work. Sorry, Phillip, I didn't subclass wsgiref.handlers.BaseHandler
yet. ;(


class ModPythonInputWrapper(object):

    def __init__(self, req):
        self.req = req

    def read(self, size=-1):
        return self.req.read(size)

    def readline(self):
        return self.req.readline()

    def readlines(self, hint=-1):
        return self.req.readlines(hint)

    def __iter__(self):
        return iter(self.req.readlines())


class ModPythonErrorWrapper(object):

    def __init__(self, req):
        self.req = req

    def flush(self):
        pass

    def write(self, content):
        self.req.log_error(content)

    def writelines(self, seq):
        for content in seq:
            self.req.log_error(content)


def wrap_mod_python(application, req):
    """WSGI wrapper for mod_python 3.1 (Apache 2).

    Write your own short handler function, obtain your application,
    and pass it and the apache Request object to this function.
    """
    from mod_python import apache

    req.add_common_vars()
    environ = dict(req.subprocess_env.items())
    environ['wsgi.input'] = ModPythonInputWrapper(req)
    environ['wsgi.errors'] = ModPythonErrorWrapper(req)
    environ['wsgi.version'] = (1, 0)
    environ['wsgi.multithread'] = True
    environ['wsgi.multiprocess'] = False
    if req.protocol.count(u'HTTPS') > 0:
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'

    nested_status = [apache.OK]

    def start_response(status, headers):
        if status:
            if status == "200 OK":
                nested_status[0] = apache.OK
            else:
                nested_status[0] = int(status[:3])
        for key, val in headers:
            req.headers_out[key] = val
        return req.write

    result = application(environ, start_response)
    try:
        for data in result:
            req.write(data)
    finally:
        if hasattr(result,'close'):
            result.close()
    return nested_status[0]

-----------

Example handler (for Junct, my wiki, built on Cation, my app framework):

from cation.html import uiwsgi
import junct

def handler(req):
    ui = uiwsgi.UserInterfaceWSGI(junct.junctapp)
    ui.sandbox = junct.arena.new_sandbox()
    app = ui.request
    result = uiwsgi.wrap_mod_python(app, req)
    ui.sandbox.flush_all()
    return result


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org

From pje at telecommunity.com Wed Oct 13 23:31:25 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Oct 13 23:32:19 2004
Subject: [Web-SIG] PEAK now provides various WSGI gateway and server options
In-Reply-To: <416D8E3B.7000900@colorstudy.com>
References: <416D8A96.7000100@colorstudy.com> <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> <416D8A96.7000100@colorstudy.com>
Message-ID: <5.1.1.6.0.20041013173029.0315bdb0@mail.telecommunity.com>

At 03:21 PM 10/13/04 -0500, Ian Bicking wrote:
>Ian Bicking wrote:
>>I was able to get "peak CGI WSGI import:..." working successfully, so the
>>basic system is all installed and working. I tried FastCGI a little, but
>>I got stuck on installing mod_fastcgi for the moment.
>
>BTW, does anyone know of a CGI gateway to FastCGI?
Lots of FastCGI-alike >protocols have these: wkcgi in Webware, scgi-cgi for SCGI, Zope/PCGI's >Zope.cgi, etc. Typically these are just little C CGI programs. I >couldn't find one for FastCGI, but then the search terms are woefully >ambiguous (too many "cgi"s). It's called 'cgi-fcgi', and it's part of the FastCGI developer's kit: http://www.fastcgi.com/devkit/doc/fcgi-devel-kit.htm#S4.2 From pje at telecommunity.com Thu Oct 14 00:25:48 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Oct 14 00:26:44 2004 Subject: [Web-SIG] PEAK now provides various WSGI gateway and server options In-Reply-To: <416D8A96.7000100@colorstudy.com> References: <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> <5.1.1.6.0.20041007190441.02386300@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041013172910.02b5e5e0@mail.telecommunity.com> At 03:05 PM 10/13/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>The CVS version of PEAK now offers three options for running WSGI >>applications: CGI, FastCGI, and SimpleHTTPServer. For example, this command: >> peak launch WSGI import:my_app.application >>will do this: >> 1. Import 'application' from 'my_app', treating it as a WSGI application >> 2. Start a SimpleHTTPServer listening to an arbitrary port on 'localhost' >> 3. launch a browser window pointing to that local server > >I'm noticing that peak serve WSGI import:... does the same thing, but >without launching a web browser. Yes, but it's less convenient to use since you have to set up a configuration file to specify the port and hostname and such. "peak launch" selects an available port and tells your web browser about it. However, if you want to use 'peak serve', you can put something like this: [peak.tools.server] url = "tcp://fqdn.goes.here:8000" in a configuration file, and then point to it with PEAK_CONFIG. 
E.g.:

    PEAK_CONFIG=myserver.conf peak serve WSGI import:my_app.application

Or, if you want to just make the whole thing an easy-to-run application:

    #!invoke peak runIni

    [peak.running]
    app = commands.Alias(command=['serve','WSGI','import:my_app.application'])

    [peak.tools.server]
    url = "tcp://fqdn.goes.here:8000"

And then make the file executable, so you can run it directly. Now, you've got a ready-made setup to run a specific application. You can also use 'launch' instead of 'serve'; it will start the web browser on the 'http' version of the given URL.

>Is there any way to start the server up on a known port and interface?
>When I do "launch" it opens itself up in "localhost.my.hostname", and I'm
>not sure where localhost.my.hostname is coming from.

From 'socket.getfqdn(serversocket.getsockname())'. Specifically, the default address is 'localhost:0', which translates to any available port on 'localhost'. Apparently, your local resolver considers your FQDN to be 'localhost.my.hostname', so I'd check /etc/resolv.conf or some such if you're on a Unix-like machine. If you're on a Windows or OS/X machine, I have no idea what to do.

Your problem does suggest that maybe I should change local_server to consider its address to be whatever it was configured to be, and not ask for the "official" socket address. That way, it won't rely on a properly configured resolver just to set up a localhost server.

>I was able to get "peak CGI WSGI import:..." working successfully, so the
>basic system is all installed and working. I tried FastCGI a little, but
>I got stuck on installing mod_fastcgi for the moment. I'm assuming that
>if I create a script like:
>
>#!/bin/sh
>peak FastCGI WSGI import:...
>
>In a .fcgi, executable script, with "AddHandler fastcgi-script .fcgi" in
>my httpd.conf, it'll just work...?

Something like that, yes.
It's been a while since I used that approach; I've mainly used stuff that's more like: SetHandler fastcgi-script For the most part, mod_fastcgi is a bitch to set up for non-trivial applications, even *with* the PEAK supervisor tool, as many of its options are either poorly documented, or buggy, depending on whether you consider the code or documentation to be the thing that's wrong. :) >I'm also not sure what the concurrency is for these. Multithreaded, >multiple processes, single process? Configurable? CGI/FastCGI are both single thread, multi-process. The supervisor is also multi-process, but forking. If your application module wants to set up caches, import lots of modules, etc., this will be done in the parent process, so that child processes will already have the work done. > Does the supervisor start on its own, or does that have to be configured? mod_fastcgi starts its own process manager as needed. Based on the settings in httpd.conf, it will start multiple processes for you, up to the maximum you specify. It will also kill them off by signalling them when they become idle. (The PEAK FastCGI implementations detect this and shut down gracefully.) If you are using PEAK's process supervisor tool (peak.tools.supervisor) to manage an application, then you should configure mod_fastcgi to start one and only one process for that application. Or, you can have the application start independently, listening on a known socket (e.g /tmp/myapp.sock), and configure mod_fastcgi not to manage the start/stop of processes. The process supervisor will take care of the rest for you. If you start a supervised application that's already running, the new copy will get ready to run, and then signal the old copy to terminate gracefully, allowing currently-running requests to finish. This is intended to make it easy to do a "warm restart" of your application to e.g. upgrade the code of a production application. 
Alternately, you can simply issue a soft kill signal to the running parent process, and mod_fastcgi will take care of restarting it, if you've used the "start exactly one" approach. To run an app under the PEAK "supervisor" tool, you need to create a configuration file, at minimum, something like: #!invoke peak supervise Command FastCGI fd.socket:stdin WSGI import:my_app.application PidFile /var/run/my_app.pid This assumes that your OS supports using PATH to interpret "#!" lines; if not, you'll need an absolute path to 'invoke'. ('invoke' is a C program that comes with PEAK in the 'scripts' directory, that you can install to more easily use PEAK tools as interpreters.) The 'FastCGI fd.socket:stdin' means to use standard input as the connect socket for FastCGI; if you are using the "standalone" configuration, you'll want to replace that with a 'unix:/path/to/a_socket' or 'tcp://localhost:1234' URL, as appropriate. (For more detailed info on PEAK socket URL's, see the 'peak.net.sockets' module.) The 'PidFile' spec is required; it's how the supervisor ensures that there's only one "master" process for the application at a given time, and it also makes it easy to shut down the application. (There are also some other files used, whose names default to variations on the PidFile's filename, such as the "startup lock" file and the "pid lock" file; see the "Supervisor.xml" file in the 'peak.tools.supervisor' package directory for detailed info on these and all other configuration options for the "supervise" tool.) 
Anyway, the configuration file can contain other options, like: MinProcesses 1 # Always have one request-handling process MaxProcesses 4 # and up to 4 if needed StartInterval 15s # Don't start children more often than 1 per 15 seconds Import some.module # force module to be imported in parent, that child might need Note that 'Import' directives do not do anything with the contents of the named module; they just ensure the module is imported before the supervisor considers itself "started". This is useful if your application's initial import doesn't load all the modules it's going to use, and you don't want to slow down the startup of new child processes by making them import the module. Whew. Anyway, so, the minimum to use PEAK's supervise tool in place of the mod_fastcgi process supervisor is to make a configuration file specifying the command and pidfile, and it should be run using 'peak supervise'. Ideally, you can do that with a '#!' line as shown above, but you can also do it with a shell script, e.g.: #!/bin/sh peak supervise config_file_for_my_app Note that you can probably get by for a while without PEAK's supervise tool; it's fairly "industrial strength" and exists mainly to work around performance flaws in mod_fastcgi's process manager that affect slow-starting applications that need multiple processes in order to handle the server's request volume, and to make it easier to control a running application (e.g. easy warm restart). If you don't have an application that costs measurable amounts of money for every second of delayed response, you may not need "peak supervise". Finally, note that there's a very nice tutorial at http://peak.telecommunity.com/DevCenter/IntroToPeak that covers lots of basic "how to set up configuration files and make them executable" stuff for PEAK. 
There's also some useful information in INSTALL.txt, under "SCRIPTS, BATCH FILES, and #!": http://peak.telecommunity.com/doc/INSTALL.txt.html

From pje at telecommunity.com Thu Oct 14 01:01:31 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Oct 14 01:02:38 2004
Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022F79@exchange.hqamor.amorhq.net>
Message-ID: <5.1.1.6.0.20041013182934.02473c70@mail.telecommunity.com>

At 02:06 PM 10/13/04 -0700, Robert Brewer wrote:
>In order to test my application's WSGI interface, I wrote a quick
>mod_python server interface for WSGI. It's not bulletproof, but the
>parts I use work. Sorry, Phillip, I didn't subclass
>wsgiref.handlers.BaseHandler yet. ;(

That's okay; you've given me several of the pieces I would need to do it myself. :) Although, I still would want a better way to find out what to set the multithread/multiprocess flags to, as some Apache builds are multithreaded and some are not, and some are multi-process and some are not.

There are, however, numerous other issues in your code, from a WSGI-compliance perspective. For example, your start_response() doesn't support WSGI error handling.
Anyway, a mod_python handler would probably look something like: from wsgiref.handlers import BaseCGIHandler class ModPyHandler(BaseCGIHandler): def __init__(self,req): req.add_common_vars() BaseCGIHandler.__init__(self, stdin = ModPythonInputWrapper(req), stdout = None, stderr = ModPythonErrorWrapper(req), environ = dict(req.subprocess_env.items()), multiprocess = True, # XXX multithread = True, # XXX ) self.request = req self._write = req.write def _flush(self): pass def send_headers(self): self.cleanup_headers() self.headers_sent = True self.request.status = int(self.status[:3]) for key, val in self.headers.items(): self.request.headers_out[key] = val def wsgi_handler(req): handler = ModPyHandler(req) options = req.get_options() appmod,appname = options['application'].split('::') d = {} exec ("from %(appmod)s import %(appname) as application" % locals()) in d handler.run(d[application]) from mod_python import apache return apache.OK But note that this is just a draft off the top of my head, and may be deficient with respect to how it uses the mod_python API (especially since I've never used mod_python even once). Anyway, to use it, one would configure something like: PythonHandler somewhere::wsgi_handler PythonOption application myapp::wsgi_app_func In other words, it uses a PythonOption called "application" to indicate the application to be run, thus simplifying the launch configuration. Let me know if this code works for you, and if so I'll add it to the wsgiref library. From pje at telecommunity.com Thu Oct 14 01:09:21 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Oct 14 01:10:18 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt In-Reply-To: <5.1.1.6.0.20041013182934.02473c70@mail.telecommunity.com> References: <3A81C87DC164034AA4E2DDFE11D258E3022F79@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20041013190831.023e1ca0@mail.telecommunity.com> At 07:01 PM 10/13/04 -0400, Phillip J. 
Eby wrote: > exec ("from %(appmod)s import %(appname) as application" % > locals()) in d > handler.run(d[application]) Oops, typos. There should be an 's' after '%(appname)', and that should be "d['application']". Those are probably not the only mistakes I made in that code, but they're the first I've seen so far. :) I'm almost tempted to go build mod_python so I can see what the rest of the errors are. :) From fumanchu at amor.org Thu Oct 14 07:13:21 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Oct 14 07:14:06 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022F82@exchange.hqamor.amorhq.net> Phillip J. Eby wrote: > > At 02:06 PM 10/13/04 -0700, Robert Brewer wrote: > >In order to test my application's WSGI interface, I wrote a quick > >mod_python server interface for WSGI. It's not bulletproof, but the > >parts I use work. Sorry, Phillip, I didn't subclass > >wsgiref.handlers.BaseHandler yet. ;( > > That's okay; you've given me several of the pieces I would > need to do it myself. :) I was hoping someone would say that. :) > Anyway, a mod_python handler would probably look something like: > > from wsgiref.handlers import BaseCGIHandler > > class ModPyHandler(BaseCGIHandler): > > def __init__(self,req): > req.add_common_vars() > BaseCGIHandler.__init__(self, > stdin = ModPythonInputWrapper(req), > stdout = None, > stderr = ModPythonErrorWrapper(req), > environ = dict(req.subprocess_env.items()), > multiprocess = True, # XXX > multithread = True, # XXX > ) 1. I found apache.build_cgi_env(req) tonight, which does the add_common_vars() and dict() shoving for you. Unfortunately, it's got a bug. So I just stole code from it. 2. I think apache.mpm_query() is what we want for multithreading/process. But it was introduced in version 3.1, so that needs to be trapped. 
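The version trap described in point 2 can be sketched as plain feature detection. This is only an illustration: `detect_mpm_flags` is a hypothetical helper name, and the two `SimpleNamespace` objects are stand-ins for the `mod_python.apache` module (with made-up constant values) so that the snippet runs without Apache. It is written for current Python rather than the Python 2.3 used in this thread.

```python
from types import SimpleNamespace


def detect_mpm_flags(apache, threaded=None, forked=None):
    """Return (multithread, multiprocess) for a mod_python-like module.

    Hypothetical helper: 'apache' is expected to look like mod_python's
    apache module. If it has no mpm_query (older releases, per the thread),
    the caller must supply both flags explicitly.
    """
    query = getattr(apache, "mpm_query", None)
    if query is None:
        if threaded is None or forked is None:
            raise ValueError(
                "'threaded' and 'forked' are required when mpm_query "
                "is unavailable"
            )
        return bool(threaded), bool(forked)
    return (
        bool(query(apache.AP_MPMQ_IS_THREADED)),
        bool(query(apache.AP_MPMQ_IS_FORKED)),
    )


# Stand-ins for the real module; the constant values here are arbitrary.
old_apache = SimpleNamespace()
new_apache = SimpleNamespace(
    AP_MPMQ_IS_THREADED=2,
    AP_MPMQ_IS_FORKED=3,
    mpm_query=lambda code: 1 if code == 2 else 0,  # threaded, not forked
)

flags_new = detect_mpm_flags(new_apache)
flags_old = detect_mpm_flags(old_apache, threaded=True, forked=False)
```

Keeping the detection in one function means the handler's constructor only has to pass the two resulting booleans on as `multithread` and `multiprocess`.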
I went with optional arguments to ModPyHandler.__init__ > def wsgi_handler(req): > handler = ModPyHandler(req) > options = req.get_options() > appmod,appname = options['application'].split('::') > d = {} > exec ("from %(appmod)s import %(appname) as application" % > locals()) in d > handler.run(d[application]) > from mod_python import apache > return apache.OK Eeew. exec. Smelly. :) I'll stick with plain Python code over PythonOption, thanks, and make my app developers do a few lines of extra work *once* instead of every deployer on every install. To each his own... :P > Let me know if this code works for you, and if so I'll add it to the > wsgiref library. Here's the revised version. I haven't tested everything; for example, reading straight from wsgi.input or writing to .errors. I'll wait for the bug reports. :) class ModPythonInputWrapper(object): def __init__(self, req): self.req = req def read(self, size=-1): return self.req.read(size) def readline(self): return self.req.readline() def readlines(self, hint=-1): return self.req.readlines(hint) def __iter__(self): return iter(self.req.readlines()) class ModPythonErrorWrapper(object): def __init__(self, req): self.req = req def flush(self): pass def write(self, content): self.req.log_error(content) def writelines(self, seq): for content in seq: self.req.log_error(content) from wsgiref.handlers import BaseCGIHandler class ModPyHandler(BaseCGIHandler): def __init__(self, req, threaded=None, forked=None): from mod_python import apache try: q = apache.mpm_query except AttributeError: if (threaded is None) or (forked is None): m = ("You must provide 'threaded' and 'forked' args to " "ModPyHandler when running mod_python < 3.1") raise ValueError(m) else: threaded = apache.mpm_query(apache.AP_MPMQ_IS_THREADED) forked = apache.mpm_query(apache.AP_MPMQ_IS_FORKED) req.add_common_vars() env = req.subprocess_env.copy() if req.path_info: env["SCRIPT_NAME"] = req.uri[:-len(req.path_info)] else: env["SCRIPT_NAME"] = req.uri 
env["GATEWAY_INTERFACE"] = "Python-CGI/1.1" # you may want to comment this out for better security if req.headers_in.has_key("authorization"): env["HTTP_AUTHORIZATION"] = req.headers_in["authorization"] BaseCGIHandler.__init__(self, stdin=ModPythonInputWrapper(req), stdout=None, stderr=ModPythonErrorWrapper(req), environ=env, multiprocess=forked, multithread=threaded ) self.request = req self._write = req.write def _flush(self): pass def send_headers(self): self.cleanup_headers() self.headers_sent = True self.request.status = int(self.status[:3]) for key, val in self.headers.items(): self.request.headers_out[key] = val Robert Brewer MIS Amor Ministries fumanchu@amor.org From fumanchu at amor.org Thu Oct 14 08:20:22 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Oct 14 08:21:07 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022F83@exchange.hqamor.amorhq.net> I wrote: > Here's the revised version. I haven't tested everything; for example, > reading straight from wsgi.input or writing to .errors. I'll wait for > the bug reports. :) Okay. I just did file uploads (with cgi.FieldStorage) and naturally encountered errors ;) which were printed to the apache2 error.log. However, this did *not* happen in BaseCGIHandler.log_exception because the headers had already been set, which raised another error, which also got printed to error.log. Not sure what to do about that, if anything. :/ I also confirmed that multiprocess and multithreaded were set correctly, at least for mpm_winnt. Robert Brewer MIS Amor Ministries fumanchu@amor.org From pje at telecommunity.com Thu Oct 14 19:24:57 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Oct 14 19:24:32 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022F82@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20041014131210.03468640@mail.telecommunity.com> At 10:13 PM 10/13/04 -0700, Robert Brewer wrote: >Phillip J. Eby wrote: > > def wsgi_handler(req): > > handler = ModPyHandler(req) > > options = req.get_options() > > appmod,appname = options['application'].split('::') > > d = {} > > exec ("from %(appmod)s import %(appname) as application" % > > locals()) in d > > handler.run(d[application]) > > from mod_python import apache > > return apache.OK > >Eeew. exec. Smelly. :) The "correct" way to do it would be to swipe whatever code mod_python itself uses for that, although I wouldn't be surprised if it uses exec also. :) More likely, it uses '__import__', but for the prototype version, why bother? >I'll stick with plain Python code over >PythonOption, thanks, and make my app developers do a few lines of extra >work *once* instead of every deployer on every install. I'm confused. One of the main points of WSGI is to "write once, run anywhere". Assuming most WSGI apps end up as a callable that can be imported from somewhere, then the path of least resistance for a deployer is to be able to pop an extra line or two in an .htaccess or httpd.conf. They're going to have to touch that file anyway, even to set up a wrapper script. Why should they have to edit the configuration *and* write a script? Especially if they're just deploying the app. That makes no sense to me at all. Likewise, it makes no sense to have the application developer have to write a mod_python wrapper for their WSGI applications, since they might not have or care about mod_python specifically. Perhaps I'm misunderstanding what you're saying, because I don't "get it". Or maybe you misunderstood the intent of my code. 
I was assuming that the 'wsgi_handler' function would be bundled with the *gateway*, not added to every application. So, you would always have, e.g.:

    PythonHandler wsgiref.handlers::wsgi_handler

as part of the handler setup for a WSGI application. Thus, deploying a WSGI app on mod_python should be as simple as having wsgiref and the application itself on the server's PYTHONPATH, and then setting a couple of configuration options.

> if req.path_info:
>     env["SCRIPT_NAME"] = req.uri[:-len(req.path_info)]
> else:
>     env["SCRIPT_NAME"] = req.uri

Does the 'req.uri' attribute include a query string?

> # you may want to comment this out for better security

No, you don't want to. :) If you don't trust the WSGI app, you shouldn't run it. It would be trivial for it to inspect Python stack frames until it finds the request object and pull out the authorization on its own. So, while it might give someone a warm fuzzy feeling to take it out, it won't really help anything. :)

From pje at telecommunity.com Thu Oct 14 19:39:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Oct 14 19:39:01 2004
Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022F83@exchange.hqamor.amorhq.net>
Message-ID: <5.1.1.6.0.20041014132635.03467e50@mail.telecommunity.com>

At 11:20 PM 10/13/04 -0700, Robert Brewer wrote:
>I wrote:
> > Here's the revised version. I haven't tested everything; for example,
> > reading straight from wsgi.input or writing to .errors. I'll wait for
> > the bug reports. :)
>
>Okay. I just did file uploads (with cgi.FieldStorage) and naturally
>encountered errors ;) which were printed to the apache2 error.log.
>However, this did *not* happen in BaseCGIHandler.log_exception because
>the headers had already been set, which raised another error, which also
>got printed to error.log. Not sure what to do about that, if anything.

Send me both tracebacks.
:) One quick question: what is 'sys.stderr' for Python under mod_python? If it prints to the error log, there's no reason (at least from a compliance POV) not to simply use it as the handler's stderr. From fumanchu at amor.org Thu Oct 14 23:20:07 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Oct 14 23:20:53 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022F86@exchange.hqamor.amorhq.net> Phillip J. Eby wrote: > At 11:20 PM 10/13/04 -0700, Robert Brewer wrote: > >I wrote: > > > Here's the revised version. I haven't tested everything; > for example, > > > reading straight from wsgi.input or writing to .errors. > I'll wait for > > > the bug reports. :) > > > >Okay. I just did file uploads (with cgi.FieldStorage) and naturally > >encountered errors ;) which were printed to the apache2 error.log. > >However, this did *not* happen in > BaseCGIHandler.log_exception because > >the headers had already been set, which raised another > error, which also > >got printed to error.log. Not sure what to do about that, if > anything. > > Send me both tracebacks. :) Traceback (most recent call last): File "C:\Python23\lib\site-packages\mod_python\apache.py", line 299, in HandlerDispatch result = object(req) File "C:\Python23\lib\site-packages\cation\html\uiwsgi.py", line 144, in run_app self.run(self.app) File "C:\Python23\lib\site-packages\wsgiref\handlers.py", line 96, in run self.handle_error() File "C:\Python23\lib\site-packages\wsgiref\handlers.py", line 307, in handle_error self.result = self.error_output(self.environ, self.start_response) File "C:\Python23\lib\site-packages\wsgiref\handlers.py", line 325, in error_output start_response(self.error_status, self.error_headers[:]) File "C:\Python23\lib\site-packages\wsgiref\handlers.py", line 176, in start_response raise AssertionError("Headers already set!") AssertionError: Headers already set! 
> One quick question: what is 'sys.stderr' for Python under > mod_python? If > it prints to the error log, there's no reason (at least from > a compliance > POV) not to simply use it as the handler's stderr. sys.stderr -> apache's error log. See http://www.modpython.org/FAQ/faqw.py?req=show&file=faq02.003.htp Robert Brewer MIS Amor Ministries fumanchu@amor.org From pje at telecommunity.com Thu Oct 14 23:33:52 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Oct 14 23:33:27 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022F86@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20041014172738.022c1b40@mail.telecommunity.com> At 02:20 PM 10/14/04 -0700, Robert Brewer wrote: > File "C:\Python23\lib\site-packages\wsgiref\handlers.py", line 325, in >error_output > start_response(self.error_status, self.error_headers[:]) Aha. That's a wsgiref bug; it should be passing 'sys.exc_info()' as the third argument here. As a result, it doesn't work if start_response has already been called. I've fixed this in CVS now. Apparently the tests don't yet cover a scenario of "call start_response(), then raise an exception before the headers are actually sent." > > One quick question: what is 'sys.stderr' for Python under > > mod_python? If > > it prints to the error log, there's no reason (at least from > > a compliance > > POV) not to simply use it as the handler's stderr. > >sys.stderr -> apache's error log. See >http://www.modpython.org/FAQ/faqw.py?req=show&file=faq02.003.htp Ah. So it should suffice to use sys.stderr, as long as the output is flushed from time to time. I've changed wsgiref to flush stderr after writing exception output, since it really should be doing that for other platforms as well. 
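For readers following the "Headers already set!" exchange, here is a minimal sketch of why the third argument matters. It is not wsgiref's actual code: `MiniHandler` is a toy stand-in written for current Python, following the start_response() error-handling rules described in the WSGI PEP. A second call is only legal when `exc_info` is passed, and it may replace the headers as long as they have not been sent yet.

```python
import sys


class MiniHandler:
    """Toy gateway object (hypothetical), just enough to show exc_info."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.headers_sent = False

    def start_response(self, status, headers, exc_info=None):
        if exc_info:
            try:
                if self.headers_sent:
                    # Too late to change the response: re-raise the
                    # application's original exception.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a traceback reference cycle
        elif self.headers is not None:
            # A second call without exc_info is an application error.
            raise AssertionError("Headers already set!")
        self.status = status
        self.headers = list(headers)
        return lambda data: None  # placeholder for the write() callable


handler = MiniHandler()
handler.start_response("200 OK", [("Content-Type", "text/plain")])

try:
    raise RuntimeError("application failed before any output was sent")
except RuntimeError:
    # Passing sys.exc_info() lets the error page replace the pending headers.
    handler.start_response(
        "500 Internal Server Error",
        [("Content-Type", "text/plain")],
        sys.exc_info(),
    )
```

Calling `start_response` a second time without the `exc_info` argument reproduces the assertion from the traceback above.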
From fumanchu at amor.org Thu Oct 14 23:49:29 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Oct 14 23:50:15 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022F87@exchange.hqamor.amorhq.net> Phillip J. Eby wrote: > At 10:13 PM 10/13/04 -0700, Robert Brewer wrote: > >Phillip J. Eby wrote: > > > def wsgi_handler(req): > > > handler = ModPyHandler(req) > > > options = req.get_options() > > > appmod,appname = options['application'].split('::') > > > d = {} > > > exec ("from %(appmod)s import %(appname) as > application" % > > > locals()) in d > > > handler.run(d[application]) > > > from mod_python import apache > > > return apache.OK > > > >Eeew. exec. Smelly. :) > > The "correct" way to do it would be to swipe whatever code mod_python > itself uses for that, although I wouldn't be surprised if it > uses exec also. :) > > More likely, it uses '__import__', but for the prototype > version, why bother? Because it's easy: def wsgi_handler(req): from mod_python import apache handler = ModPyHandler(req) options = req.get_options() modname, objname = options['application'].split('::') module = apache.import_module(modname, autoreload=False, log=debug) app = apache.resolve_object(module, objname, arg=None, silent=False) handler.run(app) return apache.OK > >I'll stick with plain Python code over > >PythonOption, thanks, and make my app developers do a few > lines of extra > >work *once* instead of every deployer on every install. > > I'm confused. One of the main points of WSGI is to "write once, run > anywhere". Assuming most WSGI apps end up as a callable that can be > imported from somewhere, then the path of least resistance > for a deployer > is to be able to pop an extra line or two in an .htaccess or > httpd.conf. They're going to have to touch that file anyway, > even to set > up a wrapper script. Why should they have to edit the > configuration *and* > write a script? 
Especially if they're just deploying the > app. That makes > no sense to me at all. Likewise, it makes no sense to have > the application > developer have to write a mod_python wrapper for their WSGI > applications, > since they might not have or care about mod_python specifically. You're right. It was just one more level of indirection and my brain was on overload with all the callbacks, etc. Turns out I can take what would have been a handler: def handler(req): ...and change it to: def get_wsgi_app(environ, start_response): ...and make that my "application" callable. > > if req.path_info: > > env["SCRIPT_NAME"] = req.uri[:-len(req.path_info)] > > else: > > env["SCRIPT_NAME"] = req.uri > > Does the 'req.uri' attribute include a query string? The docs say "uri: The path portion of the URI." Helpful. I'd guess req.uri does not include query string, since path_info comes before query args. I copied the above 4 lines from mod_python/apache.py > > > # you may want to comment this out for better security > > No, you don't want to. :) If you don't trust the WSGI app, > you shouldn't run it. It would be trivial for it to inspect > Python stack frames until it finds the request object and pull > out the authorization on its own. So, it might give someone a > warm fuzzy feeling to take it out, it won't really help anything. That comment was also copied from mod_python. I realized you don't need a separate handler function called wsgi_handler; mod_python is smart enough to notice when your handler is an unbound class method, and automatically forms an instance of your class (passing the request object), and then calling the bound method (again, passing the request). So I folded the handler code directly into ModPyHandler. 
Here's the latest version: class ModPythonInputWrapper(object): def __init__(self, req): self.req = req def read(self, size=-1): return self.req.read(size) def readline(self): return self.req.readline() def readlines(self, hint=-1): return self.req.readlines(hint) def __iter__(self): return iter(self.req.readlines()) import sys from wsgiref.handlers import BaseCGIHandler class ModPyHandler(BaseCGIHandler): def __init__(self, req): from mod_python import apache options = req.get_options() try: q = apache.mpm_query except AttributeError: # Threading and forking threaded = options.get('multithread', '') forked = options.get('multiprocess', '') if not (threaded and forked): raise ValueError("You must provide 'multithread' and " "'multiprocess' PythonOptions when " "running mod_python < 3.1") threaded = threaded.lower() in ('on', 't', 'true', '1') forked = forked.lower() in ('on', 't', 'true', '1') else: threaded = q(apache.AP_MPMQ_IS_THREADED) forked = q(apache.AP_MPMQ_IS_FORKED) req.add_common_vars() env = req.subprocess_env.copy() if req.path_info: env["SCRIPT_NAME"] = req.uri[:-len(req.path_info)] else: env["SCRIPT_NAME"] = req.uri env["GATEWAY_INTERFACE"] = "Python-CGI/1.1" if req.headers_in.has_key("authorization"): env["HTTP_AUTHORIZATION"] = req.headers_in["authorization"] BaseCGIHandler.__init__(self, stdin=ModPythonInputWrapper(req), stdout=None, stderr=sys.stderr, environ=env, multiprocess=forked, multithread=threaded ) self.request = req self._write = req.write config = req.get_config() debug = int(config.get("PythonDebug", 0)) modname, objname = options['application'].split('::') module = apache.import_module(modname, autoreload=False, log=debug) self.app = apache.resolve_object(module, objname, arg=None, silent=False) def run_app(self, req): self.run(self.app) return 0 # = apache.OK def _flush(self): pass def send_headers(self): self.cleanup_headers() self.headers_sent = True self.request.status = int(self.status[:3]) for key, val in self.headers.items(): 
self.request.headers_out[key] = val ------------ and a sample .conf: PythonHandler wsgiref.handlers::ModPyHandler.run_app PythonOption application myproggie.startup::get_wsgi_app # These options are required if you're using a version of mod_python < 3.1 # multithread = On # multiprocess = Off Robert Brewer MIS Amor Ministries fumanchu@amor.org From floydophone at gmail.com Fri Oct 15 02:59:45 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Oct 15 02:59:47 2004 Subject: [Web-SIG] WSGI async API Message-ID: <6654eac4041014175977291ff4@mail.gmail.com> Can someone briefly outline how the WSGI async API works? Sorry to reiterate, but I don't know the agreement we finally reached. From pje at telecommunity.com Fri Oct 15 07:46:42 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 07:46:18 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <6654eac4041014175977291ff4@mail.gmail.com> Message-ID: <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> At 08:59 PM 10/14/04 -0400, Peter Hunt wrote: >Can someone briefly outline how the WSGI async API works? Sorry to >reiterate, but I don't know the agreement we finally reached. That's because no agreement was reached. There are two (moderately vague) proposals still at large: 1) Have server extension APIs to pause iteration until further notice, or until input is available 2) Have a server extension API that returns an iterable that the application then returns after registering callbacks with it. This object would provide a more continuation-like API. Another alternative is not to bless an official async API at this time, and leave it open for server developers to innovate. Then, come back later and extend the PEP once there's more user/developer experience with the various innovations out there. 
Both of the above approaches could be implemented in various ways, according to developer interest, but would be considered server-specific extensions until/unless there was consensus to formalize them as optional extensions to the current spec. Given that none of the proposals appear to require making any further changes to the base API, and that traffic discussing the existing proposals has been slim, this latter alternative is beginning to look pretty attractive to me. From pje at telecommunity.com Fri Oct 15 07:47:22 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 07:46:57 2004 Subject: [Web-SIG] [WSGI] mod_python wrapper: minimal first attempt In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022F87@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20041015013257.02c1aec0@mail.telecommunity.com> At 02:49 PM 10/14/04 -0700, Robert Brewer wrote: >Phillip J. Eby wrote: > > The "correct" way to do it would be to swipe whatever code mod_python > > itself uses for that, although I wouldn't be surprised if it > > uses exec also. :) > > > > More likely, it uses '__import__', but for the prototype > > version, why bother? > >Because it's easy: Not if you haven't installed or even downloaded mod_python. ;) >I realized you don't need a separate handler function called >wsgi_handler; mod_python is smart enough to notice when your handler is >an unbound class method, and automatically forms an instance of your >class (passing the request object), and then calling the bound method >(again, passing the request). So I folded the handler code directly into >ModPyHandler. Here's the latest version: I think I'll stick with the version where they're separate. It's easier to implement unit tests on the handler class if its __init__ method doesn't run the application. Still, this looks like it's in pretty good shape to pop into wsgiref. Thanks for your help in fleshing it out. 
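As a footnote to the exec versus '__import__' exchange: a gateway-neutral way to resolve a "module::name" option string can be sketched without exec and without mod_python's own import helpers. This is an illustration only. `resolve_app` is a hypothetical name, and it uses `importlib` from current Python, which was not available when this thread was written.

```python
import importlib


def resolve_app(spec):
    """Turn 'package.module::attr' into the object it names (sketch)."""
    modname, sep, objname = spec.partition("::")
    if not sep or not modname or not objname:
        raise ValueError("expected 'module::name', got %r" % (spec,))
    module = importlib.import_module(modname)
    obj = module
    # Allow dotted attribute paths such as 'SomeClass.method'.
    for part in objname.split("."):
        obj = getattr(obj, part)
    return obj


# Standard-library objects stand in for a real WSGI application here.
dumps = resolve_app("json::dumps")
basename = resolve_app("posixpath::basename")
```

Because the function only imports and looks up attributes, it can be unit tested without a running server, which fits the preference above for keeping application lookup out of the handler's `__init__`.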
From floydophone at gmail.com Fri Oct 15 12:57:57 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Oct 15 12:58:01 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> References: <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> Message-ID: <6654eac404101503573c8cfa7a@mail.gmail.com> So if I'm implementing a Twisted gateway, where should request.finish() go? This has been puzzling me for some time... On Fri, 15 Oct 2004 01:46:42 -0400, Phillip J. Eby wrote: > At 08:59 PM 10/14/04 -0400, Peter Hunt wrote: > > > >Can someone briefly outline how the WSGI async API works? Sorry to > >reiterate, but I don't know the agreement we finally reached. > > That's because no agreement was reached. There are two (moderately vague) > proposals still at large: > > 1) Have server extension APIs to pause iteration until further notice, or > until input is available > > 2) Have a server extension API that returns an iterable that the > application then returns after registering callbacks with it. This object > would provide a more continuation-like API. > > Another alternative is not to bless an official async API at this time, and > leave it open for server developers to innovate. Then, come back later and > extend the PEP once there's more user/developer experience with the various > innovations out there. Both of the above approaches could be implemented > in various ways, according to developer interest, but would be considered > server-specific extensions until/unless there was consensus to formalize > them as optional extensions to the current spec. > > Given that none of the proposals appear to require making any further > changes to the base API, and that traffic discussing the existing proposals > has been slim, this latter alternative is beginning to look pretty > attractive to me. 
> > From irmen at xs4all.nl Fri Oct 15 13:55:16 2004 From: irmen at xs4all.nl (Irmen de Jong) Date: Fri Oct 15 13:55:21 2004 Subject: [Web-SIG] http content-location header, and different browsers Message-ID: <416FBAA4.6060502@xs4all.nl> Hello all, I was just trying some new code I was writing for Snakelets with different browsers, and stumbled across something weird. It has to do with the HTTP Content-Location header. What I used to do was adding a Content-Location header in the reply, when the page was internally redirected in Snakelets. (I thought this was a good idea, based on what I knew about the meaning of that header). Everything worked fine. Until I opened my website with Opera, instead of Firefox or IE....: a few of my links had totally wrong URLs in Opera! After a bit of searching I now know that at least Opera implements the HTTP specification, which says in http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14 that "The value of Content-Location also defines the base URI for the entity." So Opera was -rightfully so- using the value of the content-location header as the new base URI, and the other browsers I tried *do not do that*. Firefox has a WONTFIX-bug on this (bugzilla #109553) because they feel that it would break a lot of websites that supply faulty content-location headers. In the end, I decided to just not generate this header anymore. And my site started working in Opera too ;-) What do you think of this? --Irmen de Jong. 
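[Editorial sketch] The effect described above can be reproduced with the standard library's relative-URL resolution. The URLs below are made up for illustration; the only assumption is that the page contains a relative link and that the Content-Location value points at a different directory than the URL the browser requested.

```python
from urllib.parse import urljoin

# Hypothetical URLs, chosen only to show the mechanism.
request_url = "http://example.org/shop/index.html"
content_location = "http://example.org/internal/pages/index.html"
relative_link = "images/logo.png"

# A browser that ignores Content-Location resolves against the request URL.
ignoring_header = urljoin(request_url, relative_link)

# A browser that follows RFC 2616 section 14.14 resolves against the
# Content-Location value instead.
honoring_header = urljoin(content_location, relative_link)

print(ignoring_header)  # http://example.org/shop/images/logo.png
print(honoring_header)  # http://example.org/internal/pages/images/logo.png
```

The two results differ whenever the internal location lives under another path, which matches the "totally wrong URLs" symptom. The later HTTP revision (RFC 7231) dropped the base-URI role of Content-Location, citing poor implementation support.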
From foom at fuhm.net Fri Oct 15 17:20:34 2004 From: foom at fuhm.net (James Y Knight) Date: Fri Oct 15 17:20:40 2004 Subject: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: <5.1.1.6.0.20041007010942.02d33c90@mail.telecommunity.com> References: <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> <5.1.1.6.0.20041007010942.02d33c90@mail.telecommunity.com> Message-ID: On Oct 7, 2004, at 1:28 AM, Phillip J. Eby wrote: >> - For async applications, with the proposed API, that may not be an >> option, because the iterable returned is the special wrapper, not a >> user-created class. Although, actually, I guess the app can return >> its own iterable whose __iter__ calls through and returns the >> wrapper's __iter__. > > Not if the server wants to be able to handle that iterable specially. > But anyway, it seems that the wrapper's constructor should take a > close method, or have a way to set one. As already discussed, the server cannot really expect to actually get the iterable back anyhow. But yes, I'd say either the init should take a close argument, or else the use of something like "wrapper.close = myCloseFunction" should be part of the API. >> Hmm, yes. I totally missed the option of just yielding ''. Of course >> it's a very bad idea to repeatedly yield '' to a server if you don't >> know the server can properly handle it (by e.g. delaying longer and >> longer), but, in this case, since the server itself is providing the >> special iterable, that should be fine. > > Yes. Also, when we finally settle on an async API, I do want to cover > the issue of backing off iteration when empty strings are yielded. 
> I'm actually inclined to suggest that an async application should take > responsibility for doing the delaying if it's called repeatedly, and > the async API isn't available. If the async API isn't available, and I'm an async application, I would assume I'm running on a synch server, and thus am allowed to block the request thread indefinitely, and do so, waiting for a wakeup notification from the reactor loop. It doesn't seem to me that any iterator back-off behavior is needed, or desirable. I can fabricate an async wrapper that uses threads >> It seems like it should be possible to make a generic class that >> implements this async API for use with sync servers that do not >> support it natively. That would allow async apps to run on a sync >> server without modification, which is potentially useful. To do that, >> though, I think the it'd have to spawn an extra thread per request >> that is waiting to read data, for the read() call to block on. >> Unless, of course, the app never needs to yield outgoing data while >> waiting for incoming data. > > Well, with Twisted you could deferToThread the read() operations, > though it's hard for me to think straight about that scenario because > I keep finding it hard to imagine an async web app that isn't just > written to the Twisted API to start with... ;) Right -- but deferToThread'ing a read() operation is essentially the same as spawning an extra thread per request to read the data, just with nicer thread management. > [thread stuff] > >> I haven't really thought about these thready questions much either, >> so maybe the answers are obvious, but in my experience, that's >> usually not the case when it comes to threads. > > Yep. :) However, the more I think about it, the more it seems to me > that WSGI should emulate single-threadedness with respect to any > function/method/iterator invocations associated with a given > application invocation. 
However, it is *not* guaranteed that all such > invocations will occur from the same thread. > > Basically, it means "no multitasking with the other guy's objects", > and puts the locking burdens on whoever's trying to mix multitasking > into the works. That does sound good. No multitasking means it's impossible to write a response while already waiting for incoming data. But actually I think it's probably fine for an async app running on a sync server to not be able to simultaneously read data and write data, so I take back anything about needing to call wsgi server methods from more than one thread. In the compat wrapper, calling on_get can just block writing until the read has occurred; in that case, all wsgi methods can be called from the server's request thread. > By the way, after all this discussion... do you think it would be > better to: > > 1) Push towards a full async API, nailing down all these loose ends > > 2) Use the simple-but-klugdy "pause iteration" API idea > > 3) Don't make an "official" async API, and just leave it open to > server authors to create their own extensions, and maybe cherry pick > the best ideas for WSGI 2.0, or > > 4) Do something else altogether? I think the API you've outlined sounds good. I can imagine ways to implement it both for an async server like twisted, and as a compatibility layer for an async-requiring application on a sync server. I think it's easier to make the compatibility layer with this API than with the pause/resume API. However, I would be quite wary of including it in the final spec without it being implemented first. Another question is: what is the current use for it? Does anyone want to write untwisted async web applications? My current interest in WSGI is basically on the "plug twisted web into another webserver as an application" side of things. I wouldn't want to write an application to WSGI (without a framework on top)... 
If everyone else feels that way, an async API may not be actually useful until there is some other Async-WSGI web server that you could plug twisted framework stuff on top of, or some other async framework you can plug on top of the twisted server. As for postponing until WSGI 2.0, I would hope there doesn't need to be a WSGI 2.0, though, since the interface is so darn simple. ;) But it could be in a separate WSGI async addons. James From pje at telecommunity.com Fri Oct 15 17:31:56 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 17:31:31 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <6654eac404101503573c8cfa7a@mail.gmail.com> References: <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041015113034.02c7a7e0@mail.telecommunity.com> At 06:57 AM 10/15/04 -0400, Peter Hunt wrote: >So if I'm implementing a Twisted gateway, where should >request.finish() go? This has been puzzling me for some time... What's request.finish()? I've never done anything with Twisted at a higher level than the raw reactor interface, and a bit with Deferreds. So I'm not sure what you're talking about here. From pje at telecommunity.com Fri Oct 15 17:52:58 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Oct 15 17:52:36 2004 Subject: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI In-Reply-To: References: <5.1.1.6.0.20041007010942.02d33c90@mail.telecommunity.com> <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20041005022421.02fae470@mail.telecommunity.com> <5.1.1.6.0.20041007010942.02d33c90@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041015113209.02155ba0@mail.telecommunity.com> At 11:20 AM 10/15/04 -0400, James Y Knight wrote: >On Oct 7, 2004, at 1:28 AM, Phillip J. Eby wrote: >>By the way, after all this discussion... do you think it would be better to: >> >>1) Push towards a full async API, nailing down all these loose ends >> >>2) Use the simple-but-klugdy "pause iteration" API idea >> >>3) Don't make an "official" async API, and just leave it open to server >>authors to create their own extensions, and maybe cherry pick the best >>ideas for WSGI 2.0, or >> >>4) Do something else altogether? > >I think the API you've outlined sounds good. I can imagine ways to >implement it both for an async server like twisted, and as a compatibility >layer for an async-requiring application on a sync server. I think it's >easier to make the compatibility layer with this API than with the >pause/resume API. However, I would be quite wary of including it in the >final spec without it being implemented first. Right, this is one reason I'm thinking that #3 might be a good idea, although it'd probably be more like 1.1 than 2.0. Or really, it would just be an optional extension available under 1.0. Even if we finalize the 1.0 spec, nothing stops us from adding optional extensions that don't alter the existing required semantics. >Another question is: what is the current use for it? Does anyone want to >write untwisted async web applications? Right. 
That's the really big issue, and another reason why saying, "let's wait for implementations" might be a good idea. That is, if people implement something, there's clearly a market for it. If they don't, maybe we don't need it. >My current interest in WSGI is basically on the "plug twisted web into >another webserver as an application" side of things. I wouldn't want to >write an application to WSGI (without a framework on top)... If everyone >else feels that way, an async API may not be actually useful until there >is some other Async-WSGI web server that you could plug twisted framework >stuff on top of, or some other async framework you can plug on top of the >twisted server. Yep, that's the issue alright. It seems that the common use case for an async web app is going to boil down to: "do you want to proxy your Twisted app from some other web server?" Because let's face it, Twisted's process model isn't really a match for, say, the Apache prefork model, or CGI. ISTM, then, that the useful thing to write would be a synchronous WSGI->HTTP "application" object. That would allow Twisted or any other async server (or really any HTTP server at all) to be treated as a WSGI application, thus letting async apps join the WSGI party without forcing them to give up any asyncness or to have to do other really horrid things to fit. With a little more sophistication, such an application component could perhaps actually spawn the async server if it's not running, by checking a pid file or some such. Or that could be middleware; you have a "server starter" middleware that just ensures the server is running before it passes the request down to the proxy middleware. >As for postponing until WSGI 2.0, I would hope there doesn't need to be a >WSGI 2.0, though, since the interface is so darn simple. ;) But it could >be in a separate WSGI async addons.
Technically, I don't think finalizing the base specification would prevent us from amending the PEP to add optional features even to 1.0. From floydophone at gmail.com Fri Oct 15 19:24:19 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Oct 15 19:26:47 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <5.1.1.6.0.20041015113034.02c7a7e0@mail.telecommunity.com> References: <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> <5.1.1.6.0.20041015113034.02c7a7e0@mail.telecommunity.com> Message-ID: <6654eac404101510246ffd970@mail.gmail.com> Essentially, Twisted.Web gives you something like this: class MyResource(resource.Resource): def render(self, request): return "content here" # you could also do request.write("content here") If you do an async call, you have to use request.write() to write the data, return server.NOT_DONE_YET from the render() method, and call request.finish() to finish the request. On Fri, 15 Oct 2004 11:31:56 -0400, Phillip J. Eby wrote: > At 06:57 AM 10/15/04 -0400, Peter Hunt wrote: > >So if I'm implementing a Twisted gateway, where should > >request.finish() go? This has been puzzling me for some time... > > What's request.finish()? I've never done anything with Twisted at a higher > level than the raw reactor interface, and a bit with Deferreds. So I'm not > sure what you're talking about here. 
> > From carribeiro at gmail.com Fri Oct 15 19:51:54 2004 From: carribeiro at gmail.com (Carlos Ribeiro) Date: Fri Oct 15 19:52:33 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <6654eac404101510246ffd970@mail.gmail.com> References: <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> <5.1.1.6.0.20041015113034.02c7a7e0@mail.telecommunity.com> <6654eac404101510246ffd970@mail.gmail.com> Message-ID: <864d3709041015105131652057@mail.gmail.com> On Fri, 15 Oct 2004 13:24:19 -0400, Peter Hunt wrote: > Essentially, Twisted.Web gives you something like this: > > class MyResource(resource.Resource): > def render(self, request): > return "content here" # you could also do request.write("content here") > > If you do an async call, you have to use request.write() to write the > data, return server.NOT_DONE_YET from the render() method, and call > request.finish() to finish the request. Just curious, so forgive me from jumping into the middle of the discussion. Isn't this one of the scenarios where output generators are most useful? Assuming that Twisted supported it, you could yield lines until there were nothing else to write. Did I get it right? 
-- Carlos Ribeiro Consultoria em Projetos blog: http://rascunhosrotos.blogspot.com blog: http://pythonnotes.blogspot.com mail: carribeiro@gmail.com mail: carribeiro@yahoo.com From exarkun at divmod.com Fri Oct 15 20:07:46 2004 From: exarkun at divmod.com (exarkun@divmod.com) Date: Fri Oct 15 20:07:48 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <864d3709041015105131652057@mail.gmail.com> Message-ID: <20041015180746.4730.745244974.divmod.quotient.126@ohm> On Fri, 15 Oct 2004 14:51:54 -0300, Carlos Ribeiro wrote: >On Fri, 15 Oct 2004 13:24:19 -0400, Peter Hunt wrote: > > Essentially, Twisted.Web gives you something like this: > > > > class MyResource(resource.Resource): > > def render(self, request): > > return "content here" # you could also do request.write("content here") > > > > If you do an async call, you have to use request.write() to write the > > data, return server.NOT_DONE_YET from the render() method, and call > > request.finish() to finish the request. > > Just curious, so forgive me from jumping into the middle of the > discussion. Isn't this one of the scenarios where output generators > are most useful? Assuming that Twisted supported it, you could yield > lines until there were nothing else to write. Did I get it right? > Only if you can also signal to the code which is iterating the generator that it should stop iterating it for a while, otherwise user code might be called upon for bytes before they are available. If I have understand the conversation on the matter then this caveat is a main stumbling block for the async wsgi api. 
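[Editorial sketch] The caveat can be made concrete. A generator has no standard way to tell its caller "stop iterating for a while", so about all it can do when no bytes are ready is yield an empty string, and the caller can then slow down, as suggested earlier in the thread. The following is a minimal sketch of that back-off idea only; drain, slow_app_iter, first_delay and max_delay are hypothetical names, and nothing here is part of WSGI or Twisted.

```python
import time


def drain(app_iter, write, sleep=time.sleep, first_delay=0.01, max_delay=1.0):
    """Iterate a WSGI-style result, backing off while it yields empty strings."""
    delay = first_delay
    try:
        for chunk in app_iter:
            if chunk:
                write(chunk)
                delay = first_delay  # data arrived, so reset the back-off
            else:
                sleep(delay)  # nothing ready yet: wait before asking again
                delay = min(delay * 2, max_delay)
    finally:
        # Call close() on the result if it has one.
        close = getattr(app_iter, "close", None)
        if close is not None:
            close()


def slow_app_iter(ready_after):
    """Hypothetical app body: yields b'' until its data is 'ready'."""
    for _ in range(ready_after):
        yield b""
    yield b"data"


if __name__ == "__main__":
    received = []
    drain(slow_app_iter(3), received.append)
    print(received)
```

This still polls, which is exactly the wasted work an explicit pause/resume or wakeup extension would avoid. In a blocking server the sleep only ties up the request thread; an event-driven server would schedule the next iteration with a timer instead of sleeping.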
Jp From carribeiro at gmail.com Fri Oct 15 20:17:02 2004 From: carribeiro at gmail.com (Carlos Ribeiro) Date: Fri Oct 15 20:17:52 2004 Subject: [Web-SIG] http content-location header, and different browsers In-Reply-To: <416FBAA4.6060502@xs4all.nl> References: <416FBAA4.6060502@xs4all.nl> Message-ID: <864d370904101511173de0ac@mail.gmail.com> On Fri, 15 Oct 2004 13:55:16 +0200, Irmen de Jong wrote: > Hello all, > I was just trying some new code I was writing for Snakelets with > different browsers, and stumbled across something weird. > It has to do with the HTTP Content-Location header. > > What I used to do was adding a Content-Location header in the > reply, when the page was internally redirected in Snakelets. > (I thought this was a good idea, based on what I knew about > the meaning of that header). Everything worked fine. Until I > opened my website with Opera, instead of Firefox or IE....: > a few of my links had totally wrong URLs in Opera! > > After a bit of searching I now know that at least Opera implements > the HTTP specification, which says in > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14 > that "The value of Content-Location also defines the base URI > for the entity." So Opera was -rightfully so- using the value > of the content-location header as the new base URI, and the > other browsers I tried *do not do that*. Firefox has a WONTFIX-bug > on this (bugzilla #109553) because they feel that it would break > a lot of websites that supply faulty content-location headers. > > In the end, I decided to just not generate this header anymore. > And my site started working in Opera too ;-) > > What do you think of this? I have limited experience with this. But if Firefox guys decided it wasnt worth fixing, they're probably correct. God knows how much email (and bug tickets) they get when something they do works differently from IE or other 'mainstream' browsers. BTW... did you try it in Opera using their IE-emulation mode? 
-- Carlos Ribeiro Consultoria em Projetos blog: http://rascunhosrotos.blogspot.com blog: http://pythonnotes.blogspot.com mail: carribeiro@gmail.com mail: carribeiro@yahoo.com From foom at fuhm.net Fri Oct 15 20:19:51 2004 From: foom at fuhm.net (James Y Knight) Date: Fri Oct 15 20:19:55 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <6654eac404101503573c8cfa7a@mail.gmail.com> References: <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> Message-ID: On Oct 15, 2004, at 6:57 AM, Peter Hunt wrote: > So if I'm implementing a Twisted gateway, where should > request.finish() go? This has been puzzling me for some time... You'd call finish when the iterator from the iterable returned by the WSGI app is exhausted and raises StopIteration, I think? James From pje at telecommunity.com Fri Oct 15 20:21:40 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 20:21:16 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <20041015180746.4730.745244974.divmod.quotient.126@ohm> References: <864d3709041015105131652057@mail.gmail.com> Message-ID: <5.1.1.6.0.20041015141622.027125c0@mail.telecommunity.com> At 06:07 PM 10/15/04 +0000, exarkun@divmod.com wrote: >On Fri, 15 Oct 2004 14:51:54 -0300, Carlos Ribeiro >wrote: > >On Fri, 15 Oct 2004 13:24:19 -0400, Peter Hunt > wrote: > > > Essentially, Twisted.Web gives you something like this: > > > > > > class MyResource(resource.Resource): > > > def render(self, request): > > > return "content here" # you could also do > request.write("content here") > > > > > > If you do an async call, you have to use request.write() to write the > > > data, return server.NOT_DONE_YET from the render() method, and call > > > request.finish() to finish the request. > > > > Just curious, so forgive me from jumping into the middle of the > > discussion. Isn't this one of the scenarios where output generators > > are most useful? 
Assuming that Twisted supported it, you could yield > > lines until there were nothing else to write. Did I get it right? > > > > Only if you can also signal to the code which is iterating the > generator that it should stop iterating it for a while, otherwise user > code might be called upon for bytes before they are available. > > If I have understand the conversation on the matter then this caveat is > a main stumbling block for the async wsgi api. Yes. Essentially, in order to write a Twisted WSGI gateway (for running WSGI apps under Twisted), you *must* use threads (e.g. deferToThread) for invoking the WSGI application and iterating over its result, because a synchronous WSGI app might block during either operation. However, note that an asynchronous server/gateway is free to delay requesting another iteration, if the application yields an empty string. So, the minimum "asynchronous API" is simply backing off the iteration rate when the application yields empty strings. But, a more sophisticated API would of course only iterate when there was data available to be iterated. From pje at telecommunity.com Fri Oct 15 20:28:25 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 20:28:00 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: References: <6654eac404101503573c8cfa7a@mail.gmail.com> <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> Message-ID: <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> At 02:19 PM 10/15/04 -0400, James Y Knight wrote: >On Oct 15, 2004, at 6:57 AM, Peter Hunt wrote: >>So if I'm implementing a Twisted gateway, where should >>request.finish() go? This has been puzzling me for some time... > >You'd call finish when the iterator from the iterable returned by the WSGI >app is exhausted and raises StopIteration, I think? Yes. 
A Twisted gateway, to avoid blocking, would need to deferToThread() the initial invocation of the WSGI app, and immediately return server.NOT_DONE_YET. A callback on the deferred would then deferToThread an iteration on the return iterable, which would in turn defer to the next iteration, and so on. When you get an errback() of StopIteration instead of a callback, you could finish(). But all invocations of the application or any method of any object provided by the application *has* to be in a non-reactor thread, so as not to block the reactor. For example, there's no guarantee that simply calling 'iter(result)' on the result returned by the application, won't e.g. open a database connection or something. From exarkun at divmod.com Fri Oct 15 20:32:39 2004 From: exarkun at divmod.com (exarkun@divmod.com) Date: Fri Oct 15 20:32:41 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> Message-ID: <20041015183239.4730.1265718043.divmod.quotient.138@ohm> On Fri, 15 Oct 2004 14:28:25 -0400, "Phillip J. Eby" wrote: >At 02:19 PM 10/15/04 -0400, James Y Knight wrote: > >On Oct 15, 2004, at 6:57 AM, Peter Hunt wrote: > >>So if I'm implementing a Twisted gateway, where should > >>request.finish() go? This has been puzzling me for some time... > > > >You'd call finish when the iterator from the iterable returned by the WSGI > >app is exhausted and raises StopIteration, I think? > > Yes. A Twisted gateway, to avoid blocking, would need to deferToThread() > the initial invocation of the WSGI app, and immediately return > server.NOT_DONE_YET. A callback on the deferred would then deferToThread > an iteration on the return iterable, which would in turn defer to the next > iteration, and so on. When you get an errback() of StopIteration instead > of a callback, you could finish(). 
> > But all invocations of the application or any method of any object > provided by the application *has* to be in a non-reactor thread, so as not > to block the reactor. For example, there's no guarantee that simply > calling 'iter(result)' on the result returned by the application, won't > e.g. open a database connection or something. > Does WSGI enforce any requirements about which thread the function is first invoked in, and which thread(s) it is iterated in? The scenario you described above would lead to an arbitrary thread being used for each iteration. I could see this being a problem for WSGI applications which attempted to use thread local storage, assuming that they would always be run in the same non-IO thread. Jp From pje at telecommunity.com Fri Oct 15 20:59:16 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Oct 15 20:58:52 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <20041015183239.4730.1265718043.divmod.quotient.138@ohm> References: <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041015145625.02b28210@mail.telecommunity.com> At 06:32 PM 10/15/04 +0000, exarkun@divmod.com wrote: >On Fri, 15 Oct 2004 14:28:25 -0400, "Phillip J. Eby" > wrote: > >At 02:19 PM 10/15/04 -0400, James Y Knight wrote: > > >On Oct 15, 2004, at 6:57 AM, Peter Hunt wrote: > > >>So if I'm implementing a Twisted gateway, where should > > >>request.finish() go? This has been puzzling me for some time... > > > > > >You'd call finish when the iterator from the iterable returned by the > WSGI > > >app is exhausted and raises StopIteration, I think? > > > > Yes. A Twisted gateway, to avoid blocking, would need to deferToThread() > > the initial invocation of the WSGI app, and immediately return > > server.NOT_DONE_YET. A callback on the deferred would then deferToThread > > an iteration on the return iterable, which would in turn defer to the next > > iteration, and so on. 
When you get an errback() of StopIteration instead > > of a callback, you could finish(). > > > > But all invocations of the application or any method of any object > > provided by the application *has* to be in a non-reactor thread, so as not > > to block the reactor. For example, there's no guarantee that simply > > calling 'iter(result)' on the result returned by the application, won't > > e.g. open a database connection or something. > > > > Does WSGI enforce any requirements about which thread the function is > first invoked in, and which thread(s) it is iterated in? Not currently. > The scenario you described above would lead to an arbitrary thread > being used for each iteration. I could see this being a problem for WSGI > applications which attempted to use thread local storage, assuming that > they would always be run in the same non-IO thread. The discussion so far has been that the spec should prohibit applications and servers from depending on what thread a callable is invoked from, the result is iterated over, etc., as long as only one thread at a time does these things. In other words, servers and applications may not use thread-local storage to determine invocation context, but they do not have to do any locking (except for the 'wsgi.multithread' case). From floydophone at gmail.com Fri Oct 15 21:06:37 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Oct 15 21:07:23 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> References: <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> Message-ID: <6654eac4041015120665163b25@mail.gmail.com> Okay. How will the gateway know to go to the next iteration of the application? Constantly iterating over a bunch of empty strings while waiting for output seems like a waste of cycles to me.
Perhaps, for async apps, there can be an environ["async.wakeup"]() method which will tell the gateway to iterate until the next empty string? On Fri, 15 Oct 2004 14:28:25 -0400, Phillip J. Eby wrote: > At 02:19 PM 10/15/04 -0400, James Y Knight wrote: > > > >On Oct 15, 2004, at 6:57 AM, Peter Hunt wrote: > >>So if I'm implementing a Twisted gateway, where should > >>request.finish() go? This has been puzzling me for some time... > > > >You'd call finish when the iterator from the iterable returned by the WSGI > >app is exhausted and raises StopIteration, I think? > > Yes. A Twisted gateway, to avoid blocking, would need to deferToThread() > the initial invocation of the WSGI app, and immediately return > server.NOT_DONE_YET. A callback on the deferred would then deferToThread > an iteration on the return iterable, which would in turn defer to the next > iteration, and so on. When you get an errback() of StopIteration instead > of a callback, you could finish(). > > But all invocations of the application or any method of any object > provided by the application *has* to be in a non-reactor thread, so as not > to block the reactor. For example, there's no guarantee that simply > calling 'iter(result)' on the result returned by the application, won't > e.g. open a database connection or something. > > From pje at telecommunity.com Sat Oct 16 00:07:05 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Oct 16 00:06:42 2004 Subject: [Web-SIG] WSGI async API In-Reply-To: <6654eac4041015120665163b25@mail.gmail.com> References: <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> <6654eac4041014175977291ff4@mail.gmail.com> <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com> <6654eac404101503573c8cfa7a@mail.gmail.com> <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20041015175958.023acd60@mail.telecommunity.com> At 03:06 PM 10/15/04 -0400, Peter Hunt wrote: >Okay. 
How will the gateway know to go to the next iteration of the
>application? Constantly iterating over a bunch of empty strings while
>waiting for output seems like a waste of cycles to me. Perhaps, for
>async apps, there can be an environ["async.wakeup"]() method which
>will tell the gateway to iterate until the next empty string?

That's close to the first outstanding proposal for an async API, which went
something like:

    resume = environ["wsgi.pause_iteration"]()

Which would pause subsequent iteration until 'resume()' was called.

By the way, if you're trying to implement async applications under WSGI,
I'd really like to know more about what you have in mind, what your goals
are, etc. One of the problems in formulating a good WSGI API for async
applications is that it's hard to envision use cases where somebody wants
to write an async web application, and yet doesn't want to run it in a
dedicated process. So anything you could add to enlighten me on this point
would make it easier for me to finalize an async API. I've been leaving it
up to the SIG so far, because I don't have as strong a vision of the use
cases for async apps as I do for async servers.

From irmen at xs4all.nl  Sat Oct 16 01:59:04 2004
From: irmen at xs4all.nl (Irmen de Jong)
Date: Sat Oct 16 01:59:06 2004
Subject: [Web-SIG] http content-location header, and different browsers
In-Reply-To: <864d370904101511173de0ac@mail.gmail.com>
References: <416FBAA4.6060502@xs4all.nl>
    <864d370904101511173de0ac@mail.gmail.com>
Message-ID: <41706448.3020506@xs4all.nl>

Carlos Ribeiro wrote:
[....about Content-Location header...]
> I have limited experience with this. But if Firefox guys decided it
> wasn't worth fixing, they're probably correct. God knows how much email
> (and bug tickets) they get when something they do works differently
> from IE or other 'mainstream' browsers.

It was news for me too. I always thought that Mozilla(/firefox)
followed the RFCs to the letter.
But this was the first one that
I encountered that they deliberately chose *not* to implement.
Because if they did, it would break a lot of sites (apparently)
and people would start to blame Mozilla.

I wonder what Opera users do with this. Because Opera
will break those sites...

> BTW... did you try it in Opera using their IE-emulation mode?

Ehm, isn't it just a change in the User-Agent string?
That wouldn't make any difference...

--Irmen

From floydophone at gmail.com  Sat Oct 16 02:05:08 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sat Oct 16 02:05:10 2004
Subject: [Web-SIG] WSGI async API
In-Reply-To: <5.1.1.6.0.20041015175958.023acd60@mail.telecommunity.com>
References: <6654eac4041014175977291ff4@mail.gmail.com>
    <5.1.1.6.0.20041015014108.02c1ec00@mail.telecommunity.com>
    <6654eac404101503573c8cfa7a@mail.gmail.com>
    <5.1.1.6.0.20041015142243.022869f0@mail.telecommunity.com>
    <6654eac4041015120665163b25@mail.gmail.com>
    <5.1.1.6.0.20041015175958.023acd60@mail.telecommunity.com>
Message-ID: <6654eac404101517055b4d4758@mail.gmail.com>

Here's what I was thinking. To install an application on the Twisted
library, you'd provide the application callable and an optional boolean
parameter that says whether to run it async or sync (defaults to sync).
If it runs sync, the gateway launches the WSGI app in a new thread and
does business as usual. If it runs async, the app *cannot* block; there
needs to be a way around it.

How about I write a simple demo implementation of "wakeup" and you
guys can try it out?

On Fri, 15 Oct 2004 18:07:05 -0400, Phillip J. Eby wrote:
> At 03:06 PM 10/15/04 -0400, Peter Hunt wrote:
> >Okay. How will the gateway know to go to the next iteration of the
> >application? Constantly iterating over a bunch of empty strings while
> >waiting for output seems like a waste of cycles to me. Perhaps, for
> >async apps, there can be an environ["async.wakeup"]() method which
> >will tell the gateway to iterate until the next empty string?
>
> That's close to the first outstanding proposal for an async API, which went
> something like:
>
>     resume = environ["wsgi.pause_iteration"]()
>
> Which would pause subsequent iteration until 'resume()' was called.
>
> By the way, if you're trying to implement async applications under WSGI,
> I'd really like to know more about what you have in mind, what your goals
> are, etc. One of the problems in formulating a good WSGI API for async
> applications is that it's hard to envision use cases where somebody wants
> to write an async web application, and yet doesn't want to run it in a
> dedicated process. So anything you could add to enlighten me on this point
> would make it easier for me to finalize an async API. I've been leaving it
> up to the SIG so far, because I don't have as strong a vision of the use
> cases for async apps as I do for async servers.
>
>

From foom at fuhm.net  Sat Oct 16 02:37:00 2004
From: foom at fuhm.net (James Y Knight)
Date: Sat Oct 16 02:37:07 2004
Subject: [Web-SIG] http content-location header, and different browsers
In-Reply-To: <41706448.3020506@xs4all.nl>
References: <416FBAA4.6060502@xs4all.nl>
    <864d370904101511173de0ac@mail.gmail.com>
    <41706448.3020506@xs4all.nl>
Message-ID: <7E0632B6-1F0B-11D9-AAA6-000A95A50FB2@fuhm.net>

On Oct 15, 2004, at 7:59 PM, Irmen de Jong wrote:
> Carlos Ribeiro wrote:
> [....about Content-Location header...]
>> I have limited experience with this. But if Firefox guys decided it
>> wasn't worth fixing, they're probably correct. God knows how much email
>> (and bug tickets) they get when something they do works differently
>> from IE or other 'mainstream' browsers.
>
> It was news for me too. I always thought that Mozilla(/firefox)
> followed the RFCs to the letter. But this was the first one that
> I encountered that they deliberately chose *not* to implement.
> Because if they did, it would break a lot of sites (apparently)
> and people start to blame Mozilla.
>
> I wonder what Opera users do with this.
Because Opera
> will break those sites...

Actually, Opera also doesn't follow the RFC, as I recall. It only
listens to Content-Location if the host and port match those of the
real URL. This fixes the problems with IIS servers, which are most of
the broken sites.

James

From carribeiro at gmail.com  Sat Oct 16 02:51:40 2004
From: carribeiro at gmail.com (Carlos Ribeiro)
Date: Sat Oct 16 02:51:42 2004
Subject: [Web-SIG] http content-location header, and different browsers
In-Reply-To: <41706448.3020506@xs4all.nl>
References: <416FBAA4.6060502@xs4all.nl>
    <864d370904101511173de0ac@mail.gmail.com>
    <41706448.3020506@xs4all.nl>
Message-ID: <864d37090410151751316a19ab@mail.gmail.com>

On Sat, 16 Oct 2004 01:59:04 +0200, Irmen de Jong wrote:
> > BTW... did you try it in Opera using their IE-emulation mode?
>
> Ehm, isn't it just a change in the User-Agent string?
> That wouldn't make any difference...

I'm really not sure if it's only a User-Agent 'hack' or if it affects
other aspects of the browser. I've read a *lot* of CSS-specific tricks
over the past few days, and it's amazing how many things a modern
browser has to do to properly render real-world web sites. In the end
I was under the impression that Opera did more than just mimic the IE
User-Agent string (which it does just to fool JavaScript code) - it
actually has to use other 'hints' to be able to establish how it is
supposed to behave under certain situations. But I'm not an Opera
user, and it's possible that I just got it wrong.
--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com

From floydophone at gmail.com  Sat Oct 16 04:16:11 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sat Oct 16 04:16:13 2004
Subject: [Web-SIG] Async API - example of my implementation
Message-ID: <6654eac40410151916c0434dd@mail.gmail.com>

I'm working on getting subversion running again, but for now, take a
look at how I write my Twisted WSGI async apps.

import time
from twisted.internet import defer, reactor

def blocking_call():
    # stands in for a slow operation: the Deferred fires after 2 seconds
    d = defer.Deferred()
    reactor.callLater(2, d.callback, None)
    return d

def phase2(result, environ):
    environ["thetime"] = time.time()
    environ["twisted.wsgi.resume"]()

def blocking_async_app(environ, start_response):
    write = start_response("200 OK", [("Content-type","text/plain")])
    yield "the time right now is " + repr(time.time()) + "\n"
    blocking_call().addCallback(phase2, environ)
    yield ""
    yield "the time now is " + repr(environ["thetime"])

Is this acceptable?

Basically, when in the special async mode, the gateway iterates over
the application iterator until it hits a "". It then lets the app do
its thing until environ["twisted.wsgi.resume"]() is called, at which
point it repeats this process until StopIteration.

What do you think?

From pje at telecommunity.com  Sat Oct 16 07:47:19 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Oct 16 07:46:59 2004
Subject: [Web-SIG] Async API - example of my implementation
In-Reply-To: <6654eac40410151916c0434dd@mail.gmail.com>
Message-ID: <5.1.1.6.0.20041016014138.03381ec0@mail.telecommunity.com>

At 10:16 PM 10/15/04 -0400, Peter Hunt wrote:
>I'm working on getting subversion running again, but for now, take a
>look at how I write my Twisted WSGI async apps.
>
>def blocking_call():
>    d = defer.Deferred()
>    reactor.callLater(2, d.callback, None)
>    return d
>
>def phase2(result, environ):
>    environ["thetime"] = time.time()
>    environ["twisted.wsgi.resume"]()
>
>def blocking_async_app(environ, start_response):
>    write = start_response("200 OK", [("Content-type","text/plain")])
>    yield "the time right now is " + repr(time.time()) + "\n"
>    blocking_call().addCallback(phase2, environ)
>    yield ""
>    yield "the time now is " + repr(environ["thetime"])
>
>Is this acceptable?
>
>Basically, when in the special async mode, the gateway iterates over
>the application iterator until it hits a "". It then lets the app do
>its thing until environ["twisted.wsgi.resume"]() is called, at which
>point it repeats this process until StopIteration.
>
>What do you think?

I think an explicit pause operation is better, e.g.:

    def blocking_async_app(environ,start_response):
        start_response("200 OK", [("Content-type","text/plain")])
        yield "doing something"
        resume = environ['wsgi.pause_iteration']()

        def phase2(result):
            environ["thetime"] = time.time()
            resume()

        blocking_call().addCallback(phase2)
        yield ""
        # Won't get here till 'resume' is called
        yield "the time now is " + repr(environ["thetime"])

This is basically the first of the two alternative API proposals that are
currently outstanding.

One issue that is not addressed either in your example or in the previous
proposal is error handling/timeouts. Suppose resume() is never called?
How do we define what "never" is? This is just one open issue with the
current async API proposals.

From pje at telecommunity.com  Sun Oct 17 15:47:03 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Oct 17 15:46:54 2004
Subject: [Web-SIG] FYI: Changes to PEP 333 and wsgiref
Message-ID: <5.1.1.6.0.20041017093759.02428ec0@mail.telecommunity.com>

I noticed today that the "URL Reconstruction" algorithm in the PEP (which I
also copied into wsgiref) is incorrect.
HTTP_HOST (aka the 'Host:' header) can
contain a port, if it's not the default port for the corresponding
protocol. So, SERVER_PORT should not be appended to it in that case. I've
fixed the PEP and wsgiref. (The PEP update should be visible on python.org
within an hour or two.)

In addition, I've also updated the PEP to make SERVER_PROTOCOL a required
environ variable. It's impossible to comply with the HTTP RFCs if you
don't know what HTTP version the client is using. (Despite its name,
SERVER_PROTOCOL is actually the *client* protocol: "the name and revision
of the information protocol with which the request arrived", according to
the CGI spec.)

Finally, while making the updates, I also added a notation to the effect
that 'wsgi.errors' is intended to be a "text mode" file. This was always
the intent, but the fact wasn't documented.

From floydophone at gmail.com  Mon Oct 18 03:46:01 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Mon Oct 18 03:46:03 2004
Subject: [Web-SIG] Exciting new developments :)
Message-ID: <6654eac404101718463d56cc3c@mail.gmail.com>

- My Twisted WSGI implementation is now fully-functional and tested
synchronously. The async API is broken. It's also now built upon
Phillip's wsgiref library.

- I've written a WSGI object publisher, similar to Zope's ZPublisher.
It's extremely simple, but rather nice I'd say:

def publisher_application(root):
    """
    I'm a ZPublisher-like application, except I run everything as a WSGI
    application or coerce it to a string. If you don't want something
    published, start it with an underscore.

    Possible TODOs:
    security, list of what attributes are accessible via the web.
    insert base href optionally
    should str() objects Content-type be text/plain or text/html?
    """
    def app(environ, start_response):
        o = root # start at the root
        # iterate through every item in the path coming after this script
        for elem in environ.get("PATH_INFO","").split("/"):
            # if the element isn't blank and doesn't begin with an underscore...
            if len(elem) > 0 and elem[0] != "_":
                try:
                    o = getattr(o, elem) # try to get the next part of the path
                except AttributeError: # if it's not found...
                    # return a 404
                    start_response("404 Not Found", [("Content-type","text/plain")])
                    return ["Resource not found."] # and a nice little message
        if callable(o): # if the final object is callable...
            return o(environ, start_response) # call it as a WSGI application
        else:
            # otherwise, assume it's just a string.
            start_response("200 OK", [("Content-type","text/html")])
            return [str(o)]
    return app

- If you've heard of FlowScript, I've implemented something very
similar for WSGI on Stackless. It lets you write applications without
worrying about writing FSM's. Once I get a good example, I'll post it.

- I fixed up Ian Bicking's session middleware a bit to be more
browser, OS, and machine friendly. I also removed all of its external
dependencies and integrated it with my cookie middleware

- My cookie middleware is now stable

- I've started putting together a WSGI unit tests library...would
anyone like to contribute?

Since I have no hosting as of right now, I can't post any of this cool
stuff. However, once it's back up, I'll send a message to the list.

From pje at telecommunity.com  Wed Oct 20 18:50:08 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Oct 20 18:49:40 2004
Subject: [Web-SIG] Re: PEP 333 / unittest
In-Reply-To:
Message-ID: <5.1.1.6.0.20041020124548.022a59a0@mail.telecommunity.com>

At 11:58 AM 10/20/04 -0400, Greg Wilson wrote:
>Hi Phillip. Hope you don't mind mail out of the blue, but I was wondering
>if anyone had already done work to integrate WSGI and the unit test
>framework, i.e. built a mock-WSGI that could be dropped directly into
>unittest?
Check the SIG list archives; there are people who have talked about various
tests they've done. I don't know if any of their work qualifies as what
you're talking about.

'wsgiref' also has some simple unit tests that run simple WSGI applications
under a "server" to test the server handlers, but I didn't really make any
effort for them to be generic beyond the scope of server implementations
based on wsgiref. And the wsgiref handlers have lots of 'assert'
statements in them designed to cause a crash if you run a broken
application under a wsgiref-based server. That's about all I've done in
the area of testing.

I seem to recall Ian Bicking created a few WSGI test programs, including a
'lint' middleware to run between a server and an application, testing both
for compliance, and an 'echo' application to be used by an external test
script verifying a server's compliance.

At this point, I would say that all of these various tests are preliminary,
and there has been little or no interoperability testing to verify that the
tests themselves are correct.

From ianb at colorstudy.com  Wed Oct 20 19:13:40 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Oct 20 19:14:07 2004
Subject: [Web-SIG] Re: PEP 333 / unittest
In-Reply-To: <5.1.1.6.0.20041020124548.022a59a0@mail.telecommunity.com>
References: <5.1.1.6.0.20041020124548.022a59a0@mail.telecommunity.com>
Message-ID: <41769CC4.5080102@colorstudy.com>

Phillip J. Eby wrote:
> At 11:58 AM 10/20/04 -0400, Greg Wilson wrote:
>
>> Hi Phillip. Hope you don't mind mail out of the blue, but I was
>> wondering
>> if anyone had already done work to integrate WSGI and the unit test
>> framework, i.e. built a mock-WSGI that could be dropped directly into
>> unittest?

Kind of, depending on which part of WSGI is "mock". The echo
application is intended to be, essentially, a mock application. There
are unittests that work against that application, implicitly testing the
WSGI server.
lint checks for more compliance issues, mostly trying to
determine that errors don't quietly pass through (e.g., not supplying a
content type, a problem which many servers and browsers will cover up).

wsgilib.interactive is something I've created for inspecting
applications. I think wsgiref has something similar -- just creating a
mock request, and providing a response.

It would be nice to make a response object that was appropriate for
testing -- that might mean easy methods to test for a string in the
response, check for general success (e.g., 200 status code, no
application-generated errors, etc), maybe check what shows up in the
error log. From there, you could make a unittest.TestCase subclass that
automated this a bit further, so you could quickly write
acceptance/functional tests.

But, a lot of the acceptance test work could be done through HTTP
directly, and wouldn't be much more difficult to implement. The
advantage to using WSGI instead of HTTP would be in saving some work
doing configuration. That's very possibly worth it, since configuring a
test environment is annoying (since you'll never actively use it). But
since HTTP and WSGI are so close, it might be nice to allow either to be
tested using the same framework.

My code is in svn://colorstudy.com/trunk/WSGI ; Peter Hunt has also done
some stuff similar to the echo tests, in
svn://colorstudy.com/trunk/WSGI/phunt/test_applications.py

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From floydophone at gmail.com  Sun Oct 24 20:39:56 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sun Oct 24 20:39:58 2004
Subject: [Web-SIG] I put up my WSGI code again
Message-ID: <6654eac404102411392020722f@mail.gmail.com>

http://st0rm.hopto.org:8080/wsgi/

Apache died on me...so I put up a Zope3 server for the time being.
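
The "response object appropriate for testing" idea from Ian Bicking's message in this thread might look roughly like the sketch below. It is an illustration only: `CapturedResponse`, `run_app`, and `hello_app` are hypothetical names that appear nowhere in wsgiref, wsgilib, or PEP 333, and `setup_testing_defaults` is taken from the wsgiref package shipped with present-day Python 3 (where response bodies are bytes), not from the 2004 code discussed here.

```python
# Sketch only: CapturedResponse, run_app and hello_app are hypothetical
# helpers for illustration, not part of wsgiref, wsgilib, or PEP 333.
from wsgiref.util import setup_testing_defaults


class CapturedResponse:
    """Holds what a WSGI application produced for one mock request."""

    def __init__(self, status, headers, body):
        self.status = status      # e.g. "200 OK"
        self.headers = headers    # list of (name, value) tuples
        self.body = body          # bytes

    def ok(self):
        # "General success" is taken to mean a 200 status line.
        return self.status.startswith("200")

    def contains(self, text):
        # Easy check for a string in the response body.
        return text.encode("utf-8") in self.body


def run_app(app, path="/"):
    """Call a WSGI app in-process with a mock environ; no HTTP involved."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, ...
    environ["PATH_INFO"] = path
    captured = {}
    chunks = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)
        return chunks.append  # the write() callable collects data too

    result = app(environ, start_response)
    try:
        chunks.extend(result)  # iterate the response to exhaustion
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return CapturedResponse(
        captured["status"], captured["headers"], b"".join(chunks)
    )


def hello_app(environ, start_response):
    """Tiny demo application used to exercise run_app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from ", environ["PATH_INFO"].encode("utf-8")]
```

A unittest.TestCase subclass could then wrap `run_app` and assert on `ok()` and `contains()`, which is the step the message suggests for writing acceptance tests quickly. Checking the error log would need a custom `wsgi.errors` stream, which this sketch leaves out.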
From titus at caltech.edu  Sun Oct 24 20:49:40 2004
From: titus at caltech.edu (Titus Brown)
Date: Sun Oct 24 20:49:43 2004
Subject: [Web-SIG] I put up my WSGI code again
In-Reply-To: <6654eac404102411392020722f@mail.gmail.com>
References: <6654eac404102411392020722f@mail.gmail.com>
Message-ID: <20041024184940.GB21864@caltech.edu>

-> http://st0rm.hopto.org:8080/wsgi/
->
-> Apache died on me...so I put up a Zope3 server for the time being.

Hi all,

I've noticed that a few people seem to lack stable Web hosting setups.
I have a co-located server that is nowhere near capacity; I'd be happy
to set up individual WebDAV access for people posting Python+WWW code.
I can also give you virtual domains etc., either under idyll.org or
whatever domain(s) you own.

Just drop me a private line & I'll set you up...

cheers,
--titus

From neel at mediapulse.com  Thu Oct 28 22:41:31 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Thu Oct 28 22:35:56 2004
Subject: [Web-SIG] [ANNOUNCE] SnakeSkin 1.0
Message-ID: <1098996090.3838.142.camel@mike.mediapulse.com>

We are proud to announce the release of version 1.0 of SnakeSkin, a
python application toolkit released under an Open Source BSD-Style
license, available at http://snakeskin.pseudocode.net/

Along with this release, we have updated the CGI Demo to be easier to
install and follow. Both of these releases can be found at (along with
more information including change logs):
http://sourceforge.net/project/showfiles.php?group_id=118346

Support for SnakeSkin is handled through SourceForge.Net. The project
information page is at http://www.sourceforge.net/projects/snakeskin-tools
There you will find the bug tracking system, a feature request system,
and the main method of support, the SnakeSkin mailing list.

About SnakeSkin

In SnakeSkin, developers can customize the framework to the application,
unlike in traditional frameworks, such as PHP. For example, adding
custom tags to the templating system is quick and easy.
The goal of the project is to have a framework that scales down as well
as up--a "Zope-lite" framework. SnakeSkin can scale down to be useful in
a simple form-to-email or just to apply a clean-cut design skin. The
toolkit can just as easily scale up to handle complex content management
systems, B2B extranets, and full-fledged e-commerce engines. We do it
all the time.

SnakeSkin, based upon the existing Albatross project maintained by
Object Craft, runs under several webservers, including CGI based,
Apache, FastCGI, and its own included webserver (used mainly for
development).

SnakeSkin has several built-in capabilities:

* Dynamic Macro Features (think server-side includes on steroids)
* SQL support in both the application and the template
* Support for Apache 2.0 Filters

... and includes Albatross features ...

* Clean separation of logic and design
* A simple-yet-robust templating system that is Web Designer-friendly
  (Plays nice with Dreamweaver)
* Secure Session Management in hidden fields, server-side data-stores,
  or through a session server

The SnakeSkin team.
http://snakeskin.pseudocode.net/