From cs1spw at bath.ac.uk Fri Oct 17 11:56:21 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 11:54:24 2003
Subject: [Web-SIG] Useful ideas from PHP
Message-ID: <3F901125.2010300@bath.ac.uk>

I've been working with PHP for several years, but have recently started to make the switch to Python for web development. There follow some thoughts on PHP's web development capabilities compared to Python's. PHP has a number of tricks that are worth borrowing for the Python standard library - although in my opinion the ability to embed code in HTML is not one of them.

Things PHP does better than Python
==================================

* $_GET, $_POST, $_COOKIE, $_FILES, $_REQUEST, $_SERVER, $_ENV
  http://www.php.net/manual/en/language.variables.predefined.php

These global dictionaries provide immediate access to information sent from the client. The first three provide information from the query string, posted forms and cookies respectively. $_FILES handles uploaded files, $_REQUEST allows access to data regardless of where it came from (like Python's cgi.FieldStorage class does at the moment), and $_SERVER and $_ENV are server and environment variables. This is an improvement on Python because these arrays are consistent. Everything is available in a straightforward dictionary (no fields['name'].value oddness), there's no need to explicitly parse cookies from their environment variable, and it's possible to tell the difference between POST and GET data while retaining the convenience of just being able to get the data without caring about the method used to send it.

* header(), setcookie()
  http://www.php.net/manual/en/features.cookies.php

These functions allow a user to manipulate the headers being sent back to the user and provide an easy method for setting cookies. In Python CGIs you have to manually ensure you send the headers before any HTML by being careful with your print statements. Some kind of abstraction for headers is a good idea.
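The header/cookie abstraction described above might look something like the following minimal sketch (modern Python; the class and method names are purely illustrative, not an existing module):

```python
# Minimal sketch of a PHP header()/setcookie()-style abstraction: headers
# and cookies are buffered on an object and emitted in one place, instead
# of relying on carefully ordered print statements. Names are hypothetical.
class Response:
    def __init__(self):
        self.headers = {"Content-Type": "text/html"}
        self._cookies = []

    def set_header(self, name, value):
        self.headers[name] = value

    def set_cookie(self, name, value, path="/"):
        self._cookies.append("%s=%s; Path=%s" % (name, value, path))

    def render_headers(self):
        # Emit all headers, then Set-Cookie lines, then the blank line
        # that separates headers from the body.
        lines = ["%s: %s" % (k, v) for k, v in sorted(self.headers.items())]
        lines += ["Set-Cookie: %s" % c for c in self._cookies]
        return "\r\n".join(lines) + "\r\n\r\n"

resp = Response()
resp.set_header("X-Test", "1")
resp.set_cookie("session", "abc123")
print(resp.render_headers())
```

The point is simply that nothing is written to the client until the whole header block is assembled, so "headers already sent" ordering bugs cannot happen.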
* Native session support with session_register and $_SESSION
  http://www.php.net/manual/en/ref.session.php

This is a pretty useful feature in PHP, which could be easily replicated in Python. It would probably be better as a separate session module rather than adding it straight into the CGI module.

Things Python does better than PHP
==================================

Pretty much everything else. Python's syntax and semantics are cleaner, the language is more powerful and expressive, and the standard library for the most part is outstanding. Python's database access is cleaner as well. If Python only had a cleaner CGI API and a more widely available Apache module it could make serious inroads into PHP's market share.

Things PHP has that Python doesn't need
=======================================

A big fuss is always made of PHP's ability to embed code straight into HTML, but in practice most experienced PHP developers tend to avoid this feature and use some kind of templating system instead, preferring to separate their application logic and presentation logic. Python is already well served by a number of excellent template libraries such as Cheetah.

Cheers,

Simon Willison
http://simon.incutio.com/

From ianb at colorstudy.com Fri Oct 17 14:48:13 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Oct 17 14:48:18 2003
Subject: [Web-SIG] Useful ideas from PHP
In-Reply-To: <3F901125.2010300@bath.ac.uk>
Message-ID: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>

On Friday, October 17, 2003, at 10:56 AM, Simon Willison wrote:

> I've been working with PHP for several years, but have recently
> started to make the switch to Python for web development. There follow
> some thoughts on PHP's web development capabilities compared to
> Python's. PHP has a number of tricks that are worth borrowing for the
> Python standard library - although in my opinion the ability to embed
> code in HTML is not one of them.
>
> Things PHP does better than Python
> ==================================
>
> * $_GET, $_POST, $_COOKIE, $_FILES, $_REQUEST, $_SERVER, $_ENV
> http://www.php.net/manual/en/language.variables.predefined.php

This is really PHP vs. the Python cgi module. Other Python web frameworks do most of these things (some don't differentiate between GET and POST variables, most use a different way of indicating files, and there are some other features that are sometimes included in frameworks and sometimes not, like access to the raw POST data or streaming output).

So really this is a matter of getting Python's stdlib to include some of the functionality that has been widely implemented elsewhere. Or at least, that's one possible goal.

> * header(), setcookie()
> http://www.php.net/manual/en/features.cookies.php

AFAIK, all frameworks (besides cgi) handle this.

> These functions allow a user to manipulate the headers being sent back
> to the user and provide an easy method for setting cookies. In Python
> CGIs you have to manually ensure you send the headers before any HTML
> by being careful with your print statements. Some kind of abstraction
> for headers is a good idea.
>
> * Native session support with session_register and $_SESSION
> http://www.php.net/manual/en/ref.session.php

And most handle this as well.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Fri Oct 17 14:59:51 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 14:58:00 2003
Subject: [Web-SIG] Useful ideas from PHP
In-Reply-To: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>
References: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>
Message-ID: <3F903C27.1090407@bath.ac.uk>

Ian Bicking wrote:

> This is really PHP vs. the Python cgi module.
> Other Python web frameworks do most of these things (some don't
> differentiate between GET and POST variables, most use a different way
> of indicating files, and there are some other features that are
> sometimes included in frameworks and sometimes not, like access to the
> raw POST data or streaming output).
>
> So really this is a matter of getting Python's stdlib to include some of
> the functionality that has been widely implemented elsewhere. Or at
> least, that's one possible goal.

This ties in with the mail I sent to Meta-SIG a few days ago. I would like to see the CGI module (or its replacement in the standard library) define a solid interface for common web tasks and then lead by example, encouraging other web frameworks to implement that same interface (or provide a wrapper to it). This would make it far easier to move from one framework to another, which in turn would make the process of choosing a framework far less intimidating (if the chosen framework doesn't work out, moving to another becomes an easier option).

I know of at least one precedent for this already: mod_python provides an interface to form variables that is modelled on cgi.FieldStorage(). Unfortunately, in my opinion FieldStorage isn't quite as capable as it needs to be (see my email comparing it to PHP).

Since PHP is almost certainly Python's biggest competitor in the web development arena, it makes sense to look hard at the things PHP does well that Python's standard library (and associated software) doesn't.

Best regards,

Simon Willison

From cs1spw at bath.ac.uk Fri Oct 17 15:02:21 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 15:00:57 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
Message-ID: <3F903CBD.6030508@bath.ac.uk>

The following is part of an email I sent to Meta-SIG discussing possible targets for the Web-SIG.
An acknowledged problem with Python for web programming at the moment is the sheer abundance of web development frameworks currently available - newcomers to Python web programming literally have their work cut out just evaluating the options available to them.

mod_python (the framework with which I have had the most experience) provides an emulation of the cgi module's FieldStorage interface as part of the mod_python package. Other frameworks may do this as well. I think this provides an interesting example of how the multiple framework problem could be partially resolved. If the Python standard library included a well defined interface for common web programming tasks (such as accessing data from forms and cookies, and sending cookies and HTTP headers), existing web frameworks could be encouraged to either support this interface natively or provide some kind of wrapper from that interface to the internals of their framework. This would make selecting a web framework a far less daunting process, as code written for one framework would be much easier to port to another.

An interesting example of this kind of process (albeit on a much larger scale) is Java's Servlet API specification. This defines the interfaces a Java servlet container must implement, but leaves the implementation details up to the team implementing the spec. This means commercial and open source vendors can create competing servlet engines, and developers have great flexibility in selecting a servlet container and switching to a different one should they run into problems.

I'd like to see the Web SIG define a strong standard interface for common web tasks, which could then be supported by Python web framework authors.
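A minimal version of such a standard interface might be sketched as follows (all names here are hypothetical illustrations, not a proposed spec; modern Python):

```python
# Hypothetical sketch of a framework-neutral interface for common web
# tasks. A framework would subclass or wrap this; names are illustrative.
class BaseRequest:
    """Access to form fields, cookies and request metadata."""

    def get_field(self, name, default=None):
        raise NotImplementedError

    def get_cookie(self, name, default=None):
        raise NotImplementedError

    @property
    def method(self):
        raise NotImplementedError


class DictRequest(BaseRequest):
    """Trivial implementation backed by plain dictionaries (e.g. for tests)."""

    def __init__(self, fields=None, cookies=None, method="GET"):
        self._fields = dict(fields or {})
        self._cookies = dict(cookies or {})
        self._method = method

    def get_field(self, name, default=None):
        return self._fields.get(name, default)

    def get_cookie(self, name, default=None):
        return self._cookies.get(name, default)

    @property
    def method(self):
        return self._method


req = DictRequest({"q": "python"}, {"session": "abc"}, method="POST")
```

The appeal of the servlet-style approach is that a framework only needs a thin adapter mapping its own request object onto these few methods, which is what makes porting code between frameworks cheap.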
Best regards,

Simon Willison
http://simon.incutio.com/

From anthony at interlink.com.au Sun Oct 19 03:03:50 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Oct 19 03:06:27 2003
Subject: [Web-SIG] HTTP digest support
Message-ID: <200310190703.h9J73p1A008870@localhost.localdomain>

I'm currently working on fixing HTTP DIGEST auth support in the stdlib. The current support in urllib2 is utterly broken. There's a patch on SF which fixes it for the simple case (www.python.org/sf/823328).

I'm also working on the server side of it - see the python CVS, nondist/sandbox/digestauth. Right now I have a simple server framework that handles straight MD5 digest auth - I have a chunk of MD5-sess done, and should get the rest finished in the next week or so.

Stuff still to be added:
 - server side checking of client nonce
 - storing away nonces and nonce-counts to prevent replay attacks
 - client side checking of Authentication-info headers
 - integrating the DIGEST and BASIC auth into a single chunk of code
 - other stuff I've forgotten right now

I'd _like_ for the basic HTTP handling stuff in the stdlib to have full digest auth support "out of the box" for Python 2.4.

Anthony

From gstein at lyra.org Wed Oct 22 19:52:17 2003
From: gstein at lyra.org (Greg Stein)
Date: Wed Oct 22 22:38:02 2003
Subject: [Web-SIG] client-side support: PEP 268
Message-ID: <20031022165217.I11797@lyra.org>

I just wanted to send a reminder that I had started a PEP a while back to pull together a bunch of disparate HTTP client-side activities under a coherent model. Part of the problem was how to build an HTTP Connection object which optionally had SSL support, or additional DAV facilities, and/or proxy support. A bit orthogonal to that was how to provide an extension system to enable arbitrary sets of authentication systems. The default set would be Basic and Digest, but something like client certificates would also be "in scope" given the SSL capabilities of the module.
http://www.python.org/peps/pep-0268.html

I'd suggest taking a look at that PEP and using it as the end-point for the discussion of client-side changes. IOW, I'm not holding any particular "mine mine mine" on it, so it seems like a valid way to produce a final proposal. I also happen to think that it was heading in the Right Direction(tm) :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From janssen at parc.com Fri Oct 17 16:51:50 2003
From: janssen at parc.com (Bill Janssen)
Date: Wed Oct 22 22:38:16 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: Your message of "Fri, 17 Oct 2003 12:02:21 PDT." <3F903CBD.6030508@bath.ac.uk>
Message-ID: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>

Simon,

It seems to me that there are three basic modules which should be in the stdlib for server-side Python programming:

1) A good CGI module. This should allow clear access to the various values passed in the environment, as Simon points out. I think the current "cgi" module isn't bad at this, but I'm sure we can find shortcomings.

2) A standard Apache plug-in. Does mod_python fill this role? (Should this really be part of the stdlib?) It would be useful if the APIs used here were similar to those used in the API support.

3) A standard stand-alone solution, but better than the three standard servers already in the stdlib. I've been using Medusa lately, and rather like its approach to things.

There are other pan-server things that need to be done as well, such as server-side SSL support in the socket module.

Bill

From janssen at parc.com Wed Oct 22 22:47:43 2003
From: janssen at parc.com (Bill Janssen)
Date: Wed Oct 22 22:48:13 2003
Subject: [Web-SIG] client-side support: PEP 268
In-Reply-To: Your message of "Wed, 22 Oct 2003 16:52:17 PDT." <20031022165217.I11797@lyra.org>
Message-ID: <03Oct22.194750pdt."58611"@synergy1.parc.xerox.com>

Great! Thanks, Greg.
Bill

From ianb at colorstudy.com Wed Oct 22 22:58:03 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Oct 22 22:58:07 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID:

On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:

> 1) A good CGI module. This should allow clear access to the various
> values passed in the environment, as Simon points out. I think the
> current "cgi" module isn't bad at this, but I'm sure we can find
> shortcomings.

There are a bunch of shortcomings -- some of which aren't that big a deal in the CGI environment (like adding headers) but make cgi-based programs difficult to port to other systems.

> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
> this really be part of the stdlib?) It would be useful if the APIs
> used here were similar to those used in the API support.

mod_python pretty much fits this. I don't see any reason to develop anything else (at least in terms of Apache integration). I don't think it would make sense as part of the stdlib -- it depends on Apache just as much as Python, and people install Apache in all sorts of different ways.

> 3) A standard stand-alone solution, but better than the three standard
> servers already in the stdlib. I've been using Medusa lately, and rather
> like its approach to things.

Twisted makes as much sense as anything. My impression is that Medusa is similar, but Twisted is more actively developed. OTOH, Twisted is moving out into other things -- some well defined portion of Twisted could be included, but certainly not everything that is distributed with Twisted currently. There are also some Twistedisms, like Deferred, which are generic but not currently used by much of anyone outside Twisted.

Medusa is nice because it has a limited scope. But that's good and bad.
Twisted would work great if the Twisted people wanted to make a small defined core, and it wouldn't work well otherwise.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com Wed Oct 22 23:12:42 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Oct 22 23:12:46 2003
Subject: [Web-SIG] Request/Response features
Message-ID:

I'm very interested in getting some sort of sane request/response object into the Python standard library, to form the basis of an informal standard on how those objects should look (even if wrappers or adaptation are required for most frameworks). Technically I suppose cgi.FieldStorage is a request object, but it's not a very good one, and it's very incomplete outside of CGI (e.g., output goes to sys.stdout, headers come from os.environ), and unusable in a threaded environment.

A useful starting point might be to summarize the features that various request/response implementations already have. Thoughts? Wild enthusiasm from anyone to take on the project? If no one else is interested I could probably take this on.

(PS: should this SIG be announced to python-announce?)

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Wed Oct 22 23:42:51 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Wed Oct 22 23:42:55 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
References:
Message-ID: <3F974E3B.9010109@bath.ac.uk>

Ian Bicking wrote:

> I'm very interested in getting some sort of sane request/response
> object into the Python standard library, to form the basis of an
> informal standard on how those objects should look (even if wrappers or
> adaptation are required for most frameworks). Technically I suppose
> cgi.FieldStorage is a request object, but it's not a very good one, and
> it's very incomplete outside of CGI (e.g., output goes to sys.stdout,
> headers come from os.environ), and unusable in a threaded
> environment.
I think that's an absolutely fantastic idea. Request/response objects are the one thing that almost all Python web frameworks deal with in some way, and a standardised interface for them in the standard library could do a lot for improving cross-framework compatibility.

> A useful starting point might be to summarize the features that various
> request/response implementations already have. Thoughts? Wild
> enthusiasm from anyone to take on the project? If no one else is
> interested I could probably take this on.

I have plenty of enthusiasm, but it's coupled with youthful ignorance (I've been using Python for web development for just over a month). That said, I'm happy to contribute serious time and effort to this.

In a previous post I outlined the things that I liked about PHP's web interface features, which, while not exactly modelled on a request/response object, do cover the same ground. I think the most valuable thing PHP's treatment of this brings to the table is the concept of GET, POST and COOKIE dictionaries for looking up data sent by the client (also the REQUEST dictionary, which combines the three).

One other thing I've been thinking about recently is that HTTP requests and HTTP responses both consist of a set of headers and a body, in a very similar way to MIME email messages, which are already well catered for by the standard library. I think any standard for request/response objects should aim to closely match the way MIME-style messages are handled by other parts of the standard library (in particular the email module).
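The parallel with MIME messages can be demonstrated with the standard library as it exists today: HTTP headers use the same RFC 822 syntax as mail headers, so the email parser can read an HTTP-style message directly (shown purely as an illustration of the shared structure, not as a proposed API):

```python
# The stdlib email parser handles an HTTP-style header block directly,
# since HTTP headers share RFC 822 syntax with mail messages.
from email.parser import Parser

raw = (
    "Content-Type: text/html; charset=utf-8\r\n"
    "Set-Cookie: session=abc123\r\n"
    "\r\n"
    "<html>body here</html>"
)
msg = Parser().parsestr(raw)
print(msg["Content-Type"])  # dict-style header access
print(msg.get_payload())    # the body, as in an HTTP response
```

The real differences - the request/status line, and bodies that arrive in pieces rather than all at once - are exactly where a request/response object would have to diverge from the email module.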
Cheers,

Simon Willison
http://simon.incutio.com/

From ianb at colorstudy.com Thu Oct 23 00:03:40 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Oct 23 00:03:44 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
Message-ID:

On Wednesday, October 22, 2003, at 10:12 PM, Ian Bicking wrote:

> I'm very interested in getting some sort of sane request/response
> object into the Python standard library, to form the basis of an
> informal standard on how those objects should look (even if wrappers
> or adaptation are required for most frameworks). Technically I
> suppose cgi.FieldStorage is a request object, but it's not a very good
> one, and it's very incomplete outside of CGI (e.g., output goes to
> sys.stdout, headers come from os.environ), and unusable in a threaded
> environment.

I should append to this: on the pyweb list (archives at http://www.amk.ca/pipermail/pyweb/ ) I proposed a request/response interface, as a discussion starter if nothing else. Well, constructive discussion didn't actually ensue, but the spec still exists. In retrospect I shouldn't have tried to include anything outside of the request and response, as it's distracting and more controversial. The interface I wrote is at:

http://colorstudy.com/~ianb/IHTTP_01.py

If rewriting it, I'd probably just put the response in the request instead of having a transaction object -- the interface would consist purely of HTTPRequest and HTTPResponse. I might remove some other methods as well, to make adapting/wrapping from other frameworks easier. Anyway, it could serve as a starting point for other interface specifications.
--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From davidf at sjsoft.com Thu Oct 23 02:19:56 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 02:20:36 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To:
References:
Message-ID: <3F97730C.70107@sjsoft.com>

Ian Bicking wrote:

> On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:
>
>> 1) A good CGI module. This should allow clear access to the various
>> values passed in the environment, as Simon points out. I think the
>> current "cgi" module isn't bad at this, but I'm sure we can find
>> shortcomings.
>
> There are a bunch of shortcomings -- some of which aren't that big a
> deal in the CGI environment (like adding headers) but make cgi-based
> programs difficult to port to other systems.
>
>> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
>> this really be part of the stdlib?) It would be useful if the APIs
>> used here were similar to those used in the API support.
>
> mod_python pretty much fits this. I don't see any reason to develop
> anything else (at least in terms of Apache integration). I don't
> think it would make sense as part of the stdlib -- it depends on
> Apache just as much as Python, and people install Apache in all sorts
> of different ways.

Yes, in Apache, mod_python is pretty much it. As far as the API goes, I think mod_python is an important one to look at at the design stage, rather than trying to fit an API to it later, since Apache is fairly standard and mod_python is used by lots of different people. You don't want mod_python to have to be rewritten to comply with the API later.

>> 3) A standard stand-alone solution, but better than the three standard
>> servers already in the stdlib. I've been using Medusa lately, and rather
>> like its approach to things.
>
> Twisted makes as much sense as anything. My impression is that Medusa
> is similar, but Twisted is more actively developed.
> OTOH, Twisted is moving out into other things -- some well defined
> portion of Twisted could be included, but certainly not everything
> that is distributed with Twisted currently. There are also some
> Twistedisms, like Deferred, which are generic but not currently used
> by much of anyone outside Twisted.
>
> Medusa is nice because it has a limited scope. But that's good and
> bad. Twisted would work great if the Twisted people wanted to make a
> small defined core, and it wouldn't work well otherwise.

I haven't used Medusa, but I have used Twisted and the standard Python libraries. Some notes:

1) Twisted is definitely too complex to include. The question is, would it be possible to rip out a simple web server from Twisted, or would it require a whole lot of extras that don't fit in the standard libraries? This may amount to a re-write.

2) Actually, having something really simple with limited functionality is great, particularly if it uses a standard API that more complex servers support. This would allow people to develop/test/install with just the basic Python libraries. I actually think the standard servers would be fine if they were cleaned up and extended a bit.

3) It's important to define what basic functionality will be required by the API, and what extra functionality will be defined by it. I would suggest the following:
 - url handling
 - get/post argument support, in standard dictionaries
 - cookie support, in standard dictionaries
 - flexible request/response support

David

From aquarius-lists at kryogenix.org Thu Oct 23 04:45:52 2003
From: aquarius-lists at kryogenix.org (Stuart Langridge)
Date: Thu Oct 23 04:45:04 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
References: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID:

Bill Janssen spoo'd forth:

> Simon,
>
> It seems to me that there are three basic modules which should be in
> stdlib for server-side Python programming:
>
> 1) A good CGI module.
> This should allow clear access to the various
> values passed in the environment, as Simon points out. I think the
> current "cgi" module isn't bad at this, but I'm sure we can find
> shortcomings.

Not too many, though, I wouldn't say. I think that the cgi module shouldn't be used much by people; it's a building block, some infrastructure. Like, say, SocketServer -- you can use it if you want low-level access, but most people use something constructed upon it.

> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
> this really be part of the stdlib?) It would be useful if the APIs
> used here were similar to those used in the API support.

Like you say, mod_python is pretty much the only option, but I wouldn't have thought that it should be co-opted into the stdlib; how would it be set up? I can imagine modules that *use* mod_python if you have it (or does the stdlib have to be closed?) but not mod_python itself.

> 3) A standard stand-alone solution, but better than the three standard
> servers already in the stdlib. I've been using Medusa lately, and rather
> like its approach to things.

This is a bit of a holy war sort of question, though, isn't it? Some people will like Medusa, some will like Twisted...

sil
--
Writing software is, in fact, like dancing to frozen music. -- mewse

From davidf at sjsoft.com Thu Oct 23 04:53:17 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 04:53:27 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To:
References: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID: <3F9796FD.6050003@sjsoft.com>

Stuart Langridge wrote:

> Bill Janssen spoo'd forth:
>
>> 3) A standard stand-alone solution, but better than the three standard
>> servers already in the stdlib. I've been using Medusa lately, and rather
>> like its approach to things.
>
> This is a bit of a holy war sort of question, though, isn't it? Some
> people will like Medusa, some will like Twisted...
Maybe for this reason we should stick to the existing HTTP server in the stdlib, but fix it up, improve it, and change it to match the new API.

David

From davidf at sjsoft.com Thu Oct 23 06:05:52 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 06:06:02 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
References:
Message-ID: <3F97A800.70106@sjsoft.com>

Ian Bicking wrote:

> On Wednesday, October 22, 2003, at 10:12 PM, Ian Bicking wrote:
>
>> I'm very interested in getting some sort of sane request/response
>> object into the Python standard library, to form the basis of an
>> informal standard on how those objects should look (even if wrappers
>> or adaptation are required for most frameworks). Technically I
>> suppose cgi.FieldStorage is a request object, but it's not a very
>> good one, and it's very incomplete outside of CGI (e.g., output goes
>> to sys.stdout, headers come from os.environ), and unusable in a
>> threaded environment.
>
> I should append to this: on the pyweb list (archives at
> http://www.amk.ca/pipermail/pyweb/ ) I proposed a request/response
> interface, as a discussion starter if nothing else. Well,
> constructive discussion didn't actually ensue, but the spec still
> exists. In retrospect I shouldn't have tried to include anything
> outside of the request and response, as it's distracting and more
> controversial. The interface I wrote is at:
>
> http://colorstudy.com/~ianb/IHTTP_01.py
>
> If rewriting it, I'd probably just put the response in the request
> instead of having a transaction object -- the interface would consist
> purely of HTTPRequest and HTTPResponse. I might remove some other
> methods as well, to make adapting/wrapping from other frameworks
> easier. Anyway, it could serve as a starting point for other
> interface specifications.

Hi

Had a look at this, it's nice for a start. However I agree with you that the transaction interface is confusing...
for example, what does "setTransaction" mean/do?

Some other comments:

pathInfo/requestURI
It would be good to have some consistency between these names.

getFields
I don't think the ordering is generally important to people, so why not ignore it? If people want to preserve it, they can always write some code to do that, but it's hardly needed as default functionality.

getFieldDict
It would be great if the user could set the behaviour they want for multiple keys. I know I *always* want to discard any extra values. Including an option to do this rather than return a list would prevent lots of people doing post-processing.

A general comment here: there are quite a few different methods to handle getting/setting get/post fields. Perhaps this would be made simpler by using a standard dictionary interface. That would also clear up confusion about what parameters to pass to setFieldDict etc. Another question is whether people really need get and post arguments to be processed differently.

Also, is it necessary for all attributes to be accessed by methods? Particularly (no pun intended) things like "method" and "time" would seem to make more sense as attributes. If anyone really needs to run some code to access them, they could always use properties.

The input method seems strange. Perhaps this should be called read? In general, there needs to be a clear separation between low-level accessing of the request stream, and higher-level accessing of processed get/post fields. Perhaps a way to do this would be to analyse how the most popular existing servers do things, then define a set of low-level methods which would cover their functionality. If this was done well, the higher-level methods could be written so that they always fall back to using the underlying low-level methods if they aren't overridden, so at least people only have to implement basic functionality to match the API.

Anyway, those are just a few thoughts.
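The dictionary-style field access suggested above might look like this sketch (modern Python; the Fields class and its behaviour are hypothetical, only the stdlib query-string parser is real):

```python
# Hypothetical sketch: single values by default (extras discarded), with
# an explicit method for callers who do want every value of a repeated key.
from urllib.parse import parse_qs

class Fields:
    def __init__(self, query_string):
        # parse_qs maps each key to a list of values
        self._data = parse_qs(query_string, keep_blank_values=True)

    def __getitem__(self, key):
        return self._data[key][0]  # first value only; extras discarded

    def get(self, key, default=None):
        return self._data.get(key, [default])[0]

    def get_all(self, key):
        return list(self._data.get(key, []))

f = Fields("name=Simon&tag=web&tag=python")
print(f["name"])         # 'Simon'
print(f.get_all("tag"))  # ['web', 'python']
```

This keeps the common case (one value per key) free of post-processing, while repeated keys remain available on request.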
David

From davidf at sjsoft.com Thu Oct 23 06:08:36 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 06:08:42 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To: <3F974E3B.9010109@bath.ac.uk>
References: <3F974E3B.9010109@bath.ac.uk>
Message-ID: <3F97A8A4.9020703@sjsoft.com>

Simon Willison wrote:

> Ian Bicking wrote:
>
>> I'm very interested in getting some sort of sane request/response
>> object into the Python standard library, to form the basis of an
>> informal standard on how those objects should look (even if wrappers
>> or adaptation are required for most frameworks). Technically I
>> suppose cgi.FieldStorage is a request object, but it's not a very
>> good one, and it's very incomplete outside of CGI (e.g., output goes
>> to sys.stdout, headers come from os.environ), and unusable in a
>> threaded environment.
>
> I think that's an absolutely fantastic idea. Request/response
> objects are the one thing that almost all Python web frameworks deal
> with in some way, and a standardised interface for them in the
> standard library could do a lot for improving cross-framework
> compatibility.

Absolutely. I am busy constructing a toolkit (jtoolkit.sourceforge.net) for web applications, and have been looking at making it compatible with more than one web framework (so far mod_python, and I have started work on a standalone HTTP server). Having a standard interface to requests/responses would make life a lot easier. In particular, it would help to have a standard HTTP server included with Python that supports these interfaces, so applications can be tested without any other software.

>> A useful starting point might be to summarize the features that
>> various request/response implementations already have. Thoughts?
>> Wild enthusiasm from anyone to take on the project? If no one else
>> is interested I could probably take this on.
> I have plenty of enthusiasm, but it's coupled with youthful ignorance
> (I've been using Python for web development for just over a month).
> That said, I'm happy to contribute serious time and effort to this.
>
> In a previous post I outlined the things that I liked about PHP's web
> interface features, which, while not exactly modelled on a
> request/response object, do cover the same ground. I think the most
> valuable thing PHP's treatment of this brings to the table is the
> concept of GET, POST and COOKIE dictionaries for looking up data sent
> by the client (also the REQUEST dictionary, which combines the three).

I think it's important that while we can take things from PHP, they need to be rethought to apply to a Python context... but the dictionaries sound great.

> One other thing I've been thinking about recently is that HTTP
> requests and HTTP responses both consist of a set of headers and a
> body, in a very similar way to MIME email messages, which are already
> well catered for by the standard library. I think any standard for
> request/response objects should aim to closely match the way MIME-style
> messages are handled by other parts of the standard library (in
> particular the email module).

Good point. The main difference is that in parsing email messages you often have the whole message available, whereas in HTTP you need to be able to handle parts at a time (for example, when uploading files).

David

From amk at amk.ca Thu Oct 23 07:13:31 2003
From: amk at amk.ca (amk@amk.ca)
Date: Thu Oct 23 07:13:36 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID: <20031023111331.GA7516@rogue.amk.ca>

On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote:

> 1) A good CGI module.
This should allow clear access to the various > values passed in the environment, as Simon points out. I think the > current "cgi" module isn't bad at this, but I'm sure we can find > shortcomings. * Too much cruft. We could either deprecate stuff in cgi.py with a vengeance, or think up some new package organization. > 2) A standard Apache plug-in. Does mod_python fill this role? (Should > this really be part of the stdlib?) Too much work for the stdlib. Apache support suffers from the split between Apache versions 1.3 and 2.0; the API changed a *lot* between the two versions, but both versions are still pretty common. Leave it to mod_python. > 3) A standard stand-alone solution, but better than the three standard > servers already in the stdlib. I've been using Medusa lately, and rather > like its approach to things. The problem is that the code in the Medusa package is written really unconventionally -- classes have lowercase names, it's still 1.5 (and often 1.4!) compatible -- and there's a lot of cruft here, too; it's often not clear which modules are intended for actual use and which ones are half-baked experiments. This could be cleaned up if it's deemed worth the effort; I initially didn't want to embark on a big class renaming because I thought Twisted would quickly and completely replace Medusa, but that seems unlikely to happen. --amk From aquarius-lists at kryogenix.org Thu Oct 23 07:51:18 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 23 07:51:08 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> Message-ID: amk@amk.ca spoo'd forth: > On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote: >> 1) A good CGI module. This should allow clear access to the various >> values passed in the environment, as Simon points out.
I think the >> current "cgi" module isn't bad at this, but I'm sure we can find >> shortcomings. > > * Too much cruft. We could either deprecate stuff in cgi.py with a > vengeance, or think up some new package organization. It would be useful to know if anyone is still *using* any of the old backwards-compatible stuff. The cgi.py API hasn't changed very much since 1.5, has it? (I might be hideously wrong here.) If not, then deprecating all the previous ways it used to work would probably be a good idea; there can't be that many people still using code that old? (I admit that this sort of assertion has a habit of coming back and biting you, mind.) sil -- "Last week, I arrived in Sunnydale. Or perhaps it was the week before, I don't know." -- Buffy, as written by Albert Camus Certic, <1004299876.18870.0.nnrp-12.9e98b74c@news.demon.co.uk> From sholden at holdenweb.com Thu Oct 23 08:38:39 2003 From: sholden at holdenweb.com (Steve Holden) Date: Thu Oct 23 08:43:16 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F97730C.70107@sjsoft.com> Message-ID: [David Fraser] > > Ian Bicking wrote: > > > On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote: > > [...] > > > >> 2) A standard Apache plug-in. Does mod_python fill this > role? (Should > >> this really be part of the stdlib?) It would be useful if the APIs > >> used here were similar to those used in the API support. > > > > mod_python pretty much fits this. I don't see any reason > to develop > > anything else (at least in terms of Apache integration). I don't > > think it would make sense as part of the stdlib -- it depends on > > Apache just as much as Python, and people install Apache in > all sorts > > of different ways. > > Yes, in Apache, mod_python is pretty much it. 
As far as the > API goes, I > think mod_python is an important one to look at at the design stage > rather than trying to fit an API to it later, since Apache is fairly > standard and mod_python is used by lots of different people. > You don't > want mod_python to have to be rewritten to comply with the API later. > I'm not sure that we should be arguing to include something that depends on a specific environment like Apache in the standard library. We should certainly be trying to promote a standard of some sort, however, which seems to conflict. I see the parallel more as being with the DB API - there are Oracle modules and ODBC modules (which are cross-engine) and SQL Server modules and so on. What we need is something to provide closely similar interfaces to different web server engines - whether those engines are in pure Python or external components. The one problem I see with mod_python is its defaulting behavior - you can get the same content several different ways. Specifically, the following URLs http://server/ http://server/index.py http://server/index.py.index all refer to the same content, and this makes it rather difficult to come up with a scheme for producing sensible relative URLs -- the browsers don't always interpret the path the same way the server does -- which in turn can make it difficult to produce easily portable web content. While this is probably not an issue for the standard library I'd like to know whether anyone has actually addressed the problem. My own solution is to canonicalise everything to be explicit, but if there's an easier way I'd love to hear it. 
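[Editorial note: Steve's "canonicalise everything to be explicit" approach can be sketched with the standard library's URL functions (the urlparse module in 2003; urllib.parse in modern Python). The rule that a trailing index.py is the directory's default document is an assumption for illustration, matching the mod_python example URLs above:]

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def canonical(base, link):
    """Resolve a relative link against a base URL and normalise the result.

    A sketch of canonicalising everything to be explicit: resolve the
    link absolutely so it no longer depends on which of the equivalent
    paths (/, /index.py, ...) the browser happened to request.
    """
    absolute = urljoin(base, link)
    scheme, netloc, path, query, fragment = urlsplit(absolute)
    # Assumption: treat a directory's default document as the directory.
    if path.endswith('/index.py'):
        path = path[:-len('index.py')]
    return urlunsplit((scheme, netloc.lower(), path or '/', query, fragment))
```

[With this, canonical('http://server/', 'index.py') and canonical('http://server/index.py', '') collapse to the same URL, so relative links can be generated from one canonical form.]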
regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From sholden at holdenweb.com Thu Oct 23 08:49:32 2003 From: sholden at holdenweb.com (Steve Holden) Date: Thu Oct 23 08:54:17 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <20031023111331.GA7516@rogue.amk.ca> Message-ID: [amk] > > On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote: > > 1) A good CGI module. This should allow clear access to > the various > > values passed in the environment, as Simon points out. I think the > > current "cgi" module isn't bad at this, but I'm sure we can find > > shortcomings. > > * Too much cruft. We could either deprecate stuff in cgi.py with a > vengeance, or think up some new package organization. > My own preference would be for a new package altogether. The existing module would be difficult to engineer onwards into something clean. Like Topsy, it "just growed". > > 2) A standard Apache plug-in. Does mod_python fill this > role? (Should > > this really be part of the stdlib?) > > Too much work for the stdlib. Apache support suffers from > the split between > Apache versions 1.3 and 2.0; the API changed a *lot* between the two > versions, but both versions are still pretty common. Leave > it to mod_python. > Agreed. > > 3) A standard stand-alone solution, but better than the > three standard > > servers already in the stdlib. I been using Medusa lately, > and rather > > like its approach to things. > > The problem is that the code in the Medusa package is written really > unconventionally -- classes have lowercase names, it's still > 1.5 (and often > 1.4!) compatible -- and there's a lot of cruft here, too; > it's often not > clear which modules are intended for actual use and which ones are > half-baked experiments. 
This could be cleaned up if it's > deemed worth the > effort; I think it would be worth the effort. I don't think Medusa has had the concerted support that other environments have, and that's a pity because it appears to strike an excellent balance between complexity, efficiency and capability. I'd be prepared to help in such an effort (once PyCon is back on track). > I initially didn't want to embark on a big class > renaming because I > thought Twisted would quickly and completely replace Medusa, > but that seems > unlikely to happen. > Well, if those Twisted guys would stop implementing neat ideas and do some serious work explaining the structure of the framework they would probably find their code was more widely used. I suspect it will take Twisted a long time to mature because the developers are who and what they are. Their enthusiasm is admirable, but sometimes I get a bit annoyed by the hand waving :-) My experience is that people who've been walked through the Twisted code one-to-one by a Twisted developer "get it", but that just reading the docs or listening to conference presentations doesn't cut the mustard. Or maybe that's just me... regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From amk at amk.ca Thu Oct 23 09:58:57 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 23 09:59:07 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? Message-ID: <20031023135857.GA8007@rogue.amk.ca> What's the scope of improving client-side HTTP support? I suggest aiming for something you could write a web browser or web scraper on top of. That means storing and returning cookies from the server, writing them to a file, and a page cache that handles HTTP's cache expiration rules. 
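[Editorial note: storing cookies, returning them to the server, and writing them to a file -- the first items on amk's list -- are the kind of thing that later landed in the stdlib as http.cookiejar with urllib integration. A minimal sketch using that modern stdlib; the cookies.txt path is hypothetical and no request is actually sent:]

```python
import urllib.request
from http.cookiejar import LWPCookieJar

# A cookie jar that can persist itself to disk ("cookies.txt" is a
# hypothetical path; nothing is written until save() is called).
jar = LWPCookieJar('cookies.txt')

# An opener whose handler stores cookies from each response and
# replays them on subsequent requests -- the storing-and-returning
# behaviour described above.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))

# A real session would then do:
#     opener.open('http://example.com/')   # cookies captured into jar
#     jar.save(ignore_discard=True)        # written out to cookies.txt
```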
HTML formatting is out of scope, but a specialized parser for extracting a list of form elements or for picking apart a table might not be. Does anyone want to produce a feature list and proposed design? --amk From cs1spw at bath.ac.uk Thu Oct 23 10:25:07 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Thu Oct 23 10:25:18 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> Message-ID: <3F97E4C3.4070704@bath.ac.uk> Stuart Langridge wrote: > It would be useful to know if anyone is still *using* any of the old > backwards-compatible stuff. The cgi.py API hasn't changed very much > since 1.5, has it? (I might be hideously wrong here.) If not, then > deprecating all the previous ways it used to work would probably be a > good idea; there can't be that many people still using code that old? > (I admit that this sort of assertion has a habit of coming back and > biting you, mind.) Actually, I wrote an application using the cgi module this week - it's just been deployed as the system to manage http://coupons.lawrence.com/ :) I like the idea that has been suggested before of creating a new 'web' package, similar to the email one. The current cgi module could be left as it is (marked as deprecated but kept in the library), and the new interface could live at web.cgi.
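[Editorial note: to make the web.cgi proposal concrete, here is a rough sketch of what a request class with the PHP-style GET/POST/COOKIE dictionaries discussed earlier in the thread might look like. The class name and constructor arguments are hypothetical, and the parsing uses the modern stdlib:]

```python
from urllib.parse import parse_qs
from http.cookies import SimpleCookie

class HTTPRequest:
    """Hypothetical web.cgi-style request with separate GET/POST/COOKIE dicts."""

    def __init__(self, query_string='', post_body='', cookie_header=''):
        # Keeping GET and POST separate lets callers tell the methods
        # apart; REQUEST combines them for convenience, as in PHP.
        self.GET = {k: v[0] for k, v in parse_qs(query_string).items()}
        self.POST = {k: v[0] for k, v in parse_qs(post_body).items()}
        self.COOKIE = {k: m.value for k, m in SimpleCookie(cookie_header).items()}
        self.REQUEST = {**self.COOKIE, **self.GET, **self.POST}

req = HTTPRequest(query_string='page=2',
                  post_body='name=Simon',
                  cookie_header='session=abc123')
```

[A real constructor would of course read the CGI environment and stdin instead of taking strings.]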
Cheers, Simon Willison From aquarius-lists at kryogenix.org Thu Oct 23 10:42:15 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 23 10:41:37 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> <3F97E4C3.4070704@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > Stuart Langridge wrote: >> It would be useful to know if anyone is still *using* any of the old >> backwards-compatible stuff. The cgi.py API hasn't changed very much >> since 1.5, has it? (I might be hideously wrong here.) If not, then >> deprecating all the previous ways it used to work would probably be a >> good idea; there can't be that many people still using code that old? >> (I admit that this sort of assertion has a habit of coming back and >> biting you, mind.) > > Actually, I wrote an application using the cgi module this week - it's > just been deployed as the system to manage http://coupons.lawrence.com/ :) Oh, blimey, I didn't mean deprecate the whole module, practically everything I write uses it :) It seems to have a lot of very backwards-compatible stuff in it, like everything that came before FieldStorage, which is what I was talking about removing. I don't think that there's anything significant that you can't *do* with it, is there? Just that it's not all that convenient to do anything. So a "web" module, analogous to "email", as Simon suggested, seems like a great idea to me; a higher-level abstraction layer over the cgi module. sil -- A man, a plan, a canoe, pasta, heros, rajahs, a coloratura, maps, snipe, percale, macaroni, a gag, a banana bag, a tan, a tag, a banana bag again (or a camel), a crepe, pins, Spam, a rut, a Rolo, cash, a jar, sore hats, a peon, a canal -- Panama! From neel at mediapulse.com Thu Oct 23 10:59:20 2003 From: neel at mediapulse.com (Michael C.
Neel) Date: Thu Oct 23 10:59:26 2003 Subject: [Web-SIG] Python and the web Message-ID: Hi all, First a short introduction of myself. I work for a web development company and we have been focused on using Python for projects now for about one year. I tend to lurk in Python mailing lists I have no business being in (such as the mod_python dev list =) ). My views here are to represent the end user of the Python web-related tools, i.e. the web programmer. I have personally launched about 10 sites now that are Python-driven; many run on an Apache+mod_python+Albatross+MySQL stack. The first question I have for everyone is the scope of the group. There are some tasks common to the web programmer, but they might be off topic from what I'm seeing here. Two examples are templating and parsing SGML-based documents (HTML, XML, etc). It would be nice if Python included a basic templating module, but I wouldn't expect it to be very powerful. When heavy firepower is needed, projects like Albatross and PSP (being integrated into mod_python) are a better solution. However, sometimes a simple system is all that is needed, lightweight and fast. The module by Greg Stein, ezt.py, is a good example of what I think would be handy in the stdlib. Parsing files might be too much for this focus, as it's a very large task. Still, more and more the web developer is faced with reading in XML and applying a style sheet to it or otherwise formatting the data. Python is billed as batteries included, and granted this is a car battery of a module, but it would still be nice to help out the web developer here with the stdlib more. Too often I find myself developing a custom parsing engine for reading in some HTML or XML files. Isn't the point of a standard format so we can use standard tools with it? Yes, I'm quite aware that, while XML is a standard, the term is applied loosely =p.
This is a problem larger than Python, as all languages seem to be wrestling with this; how cool would it be for Python to be the first to have a really powerful, yet simple solution? For the CGI module, I can't comment - I've never used it. Our decision that Python was ready for the big time here was based mostly on mod_python's ability. CGI is dead to us as a viable option; it simply does not scale. While you can use tools to string it along, like FastCGI and co., working closely with the server API is going to be the best gain for effort in the performance area. For this same reason we also skipped over mod_python's publisher handler (which is where the relative URL complaints come in - it's worth noting that this applies not to mod_python as a whole but just to publisher). For client-side HTTP in Python, I've been impressed with how clean and simple it is. Getting a file across HTTP is no harder (in fact easier imho) than a local disk file. Now dealing with the file is a different story, see the above on parsing. For an HTTP server module, this is not a great need for myself but it would still be good to have. My idea would be a server class that you derive a server from, overriding the phases of the request you need to work with, a la the way Apache works. Something like:

    class MyServer(HTTPServer):
        def authhandler(self, req):
            if self.validate(req.user, req.password):
                return True
            else:
                return False

        def handler(self, req):
            page = req.uri.filename
            try:
                req.send(open(page, 'r').read())
                return True
            except:
                return False

That's basic, but if you've worked with the Apache API in mod_python, mod_perl, or C you get the idea. Also it would be nice if the default handlers provided a working server, if some options were set like a DocumentRoot:

    class MyServer(HTTPServer):
        documentroot = "/var/www/html"

I would say that Apache's 1.3 API would be a better goal, leaving out the new features in the 2.0 API.
First is the KISS principle; next is that we shouldn't be trying to replace Apache, but rather to provide a reasonably useful web server in the stdlib. Also, if someone needs a feature of the 2.0-style API, they can always add that in the derived class. The last thing to point out is that using a request object is important, as others mentioned here. With a standard request object, other tools, like Albatross, can easily tie into this new server. I look forward to comments and where this goes! Mike From grisha at modpython.org Thu Oct 23 11:17:26 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 23 11:17:30 2003 Subject: [Web-SIG] some thoughts Message-ID: <20031023105912.V10747@onyx.ispol.com> Hello all - The first point of this message is to make it known that I am on this list (partially wearing my mod_python hat) and listening attentively. Second, I do agree with those who said mod_python does not belong in stdlib. (Not unless Python becomes an ASF project or Apache becomes a PSF project...). Mod_python is a lot more about Apache than it is about Python, and it is far more complex than it would seem at first sight. What I would really like to see come out of this SIG is an agreement to work towards developing a set of standards, rather than a bunch of code. The following things could be standardized: 1. "Publishing" a la mod_python's publisher or Zope's ZPublisher (Bobo) 2. Request/Response interface 3. Python Server Pages (Right now mod_python and webware have a similar syntax, but not the same). Mod_python's flex-based PSP might actually be more appropriately placed into stdlib, rather than be part of mp. 4. PSTL, i.e. XML-compliant tag-based server pages. AFAIK nothing mature of this sort exists.
Grisha From ianb at colorstudy.com Thu Oct 23 11:21:17 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 11:21:20 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <3F97A800.70106@sjsoft.com> Message-ID: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> On Thursday, October 23, 2003, at 05:05 AM, David Fraser wrote: >> The interface I wrote is at: >> >> http://colorstudy.com/~ianb/IHTTP_01.py > > Had a look at this, it's nice for a start. However I agree with you > that the transaction interface is confusing... for example, what does > "setTransaction" mean/do? Some of the methods were for setting up the request, or modifying the request so it can be forwarded internally. It might be fine to leave the request/response setup undefined -- it would be defined by the context, e.g., cgi would set it up one way, mod_python another, etc. For forwarding I think it might be better to simply create a new object that would be reinjected into the framework. > Some other comments: > pathInfo/requestURI > would be good to have some consistency between these names They are mostly based off their CGI environment equivalents. > getFields > I don't think the ordering is generally important to people, so why > not ignore it, because if people want to preserve it, they can always > write some code to do that, but it's hardly needed as default > functionality. I agree, it should go. > getFieldDict > It would be great if the user could set the behaviour they want for > multiple keys. > I know I *always* want to discard any extra values. Including an > option to do this rather than return a list would prevent lots of > people doing post-processing That seems too difficult to define. I don't think there should be customizations, because that makes it too difficult to work in a heterogeneous environment. If you turn that setting on and some application you are using needs it off, then you get a configuration mess. Wrappers could provide more friendly interfaces.
> General comment here: there are quite a few different methods to > handle getting/setting get/post fields. Perhaps this would be made > simpler by using a standard dictionary interface. That would also > clear up confusion about what parameters to pass to setFieldDict etc. > Another question is whether people really need get and post arguments > to be processed differently. People do need to access them separately, as that's a common feature request. Usually they'd be accessing some combined version of those, but the option should be there. > Also, is it necessary for all attributes to be accessed by methods? > Particularly (no pun intended) things like "method", "time" would seem > to make more sense as attributes. If anyone really needs to run some > code to access them, I wrote the interface with wrappers in mind, and I thought purely using methods would be easier and more explicit. > The input method seems strange. Perhaps this should be called read? In > general, there needs to be a clear separation between low-level > accessing of the request stream, and higher-level accessing of > processed get/post fields. Perhaps a way to do this would be to > analyse how the most popular existing servers do things, then define a > set of low-level methods which would cover their functionality. If > this was done well, the higher-level methods could be written so that > they always fall back to use the underlying low-level methods if they > aren't overridden, so at least people only have to implement basic > functionality to match the API. I guess there are two ways you could go with that -- if a method is derivative of other methods, then just leave it out and let a wrapper implement it. But that doesn't work particularly well if we want to use the request/response as part of the standard library (without any wrapper in the library). So an abstract base class might be a good idea, with subclasses implementing the actual construction and some of the basic methods.
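[Editorial note: the abstract-base-class idea in the last paragraph might look roughly like this. Method names loosely echo the getFields/getFieldDict names from the draft interface; the concrete subclass is purely illustrative:]

```python
class BaseRequest:
    """Abstract base; subclasses supply the environment-specific parts."""

    def get_method(self):
        raise NotImplementedError   # e.g. CGI reads os.environ['REQUEST_METHOD']

    def get_fields(self):
        raise NotImplementedError   # returns a list of (name, value) pairs

    def get_field_dict(self):
        # A derived method implemented once on top of the primitives,
        # so subclasses only implement the low-level interface.
        fields = {}
        for name, value in self.get_fields():
            fields.setdefault(name, []).append(value)
        return fields


class StubCGIRequest(BaseRequest):
    """Illustrative concrete subclass; a real one would parse the CGI env."""

    def __init__(self, method, fields):
        self._method, self._fields = method, list(fields)

    def get_method(self):
        return self._method

    def get_fields(self):
        return list(self._fields)


req = StubCGIRequest('GET', [('name', 'a'), ('name', 'b'), ('page', '1')])
```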
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From barry at python.org Thu Oct 23 11:37:13 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 23 11:37:18 2003 Subject: [Web-SIG] some thoughts In-Reply-To: <20031023105912.V10747@onyx.ispol.com> References: <20031023105912.V10747@onyx.ispol.com> Message-ID: <1066923432.11634.132.camel@anthem> On Thu, 2003-10-23 at 11:17, Gregory (Grisha) Trubetskoy wrote: > What I would really like to see come out of this SIG is an agreement to > work towards developing a set of standards, rather than a bunch of code. /Some/ code wouldn't hurt, but I definitely agree that the early focus of the SIG should be on standards, much like the db-sig came up with DB-API 1.0 and 2.0. E.g. I'd really like for my CGI based scripts to be written against a CGI-API that would Just Work in mod_python, Twisted, Zope, CGIHTTPServer, etc, etc. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/web-sig/attachments/20031023/3027e0db/attachment.bin From ianb at colorstudy.com Thu Oct 23 12:20:31 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 12:20:42 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <20031023135857.GA8007@rogue.amk.ca> Message-ID: On Thursday, October 23, 2003, at 08:58 AM, amk@amk.ca wrote: > What's the scope of improving client-side HTTP support? > > I suggest aiming for something you could write a web browser or web > scraper > on top of. That means storing and returning cookies from the server, > writing > them to a file, and a page cache that handles HTTP's cache expiration > rules. > HTML formatting is out of scope, but a specialized parser for > extracting a > list of form elements or for picking apart a table might not be. 
> > Does anyone want to produce a feature list and proposed design? ClientCookie and ClientForm (http://wwwsearch.sourceforge.net) seem like a possible starting point. I haven't used them much, but they seem like they resist being a framework (which is a good thing) and just do their one job. I don't think you could build a browser on top of them (though that doesn't even apply to ClientForm, which is more of a browser alternative). But if you added caching and authentication into ClientCookie that would probably be a reasonable basis (maybe authentication is already there, I don't know). Looking at ClientCookie just a little more, it could even be integrated directly into urllib2 (it mostly matches that API already). Really, all of this could go into urllib2... -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Thu Oct 23 12:27:25 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 12:27:31 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Message-ID: On Thursday, October 23, 2003, at 07:38 AM, Steve Holden wrote: > I'm not sure that we should be arguing to include something that > depends > on a specific environment like Apache in the standard library. We > should > certainly be trying to promote a standard of some sort, however, which > seems to conflict. > > I see the parallel more as being with the DB API - there are Oracle > modules and ODBC modules (which are cross-engine) and SQL Server > modules > and so on. What we need is something to provide closely similar > interfaces to different web server engines - whether those engines are > in pure Python or external components. This was my idea of what the request/response stdlib classes could accomplish -- if not a formal specification, at least a reference implementation which other people could use as a model. 
> The one problem I see with mod_python is its defaulting behavior - you > can get the same content several different ways. Specifically, the > following URLs > > http://server/ > http://server/index.py > http://server/index.py.index > > all refer to the same content, and this makes it rather difficult to > come up with a scheme for producing sensible relative URLs -- the > browsers don't always interpret the path the same way the server does > -- > which in turn can make it difficult to produce easily portable web > content. In general I would note that URL introspection is (as far as I've seen) poorly handled by nearly everyone. In part because it's hard -- you can have multiple layers of things going on, with proxies, various virtual host configurations, aliases and location-specific handlers, mod_rewrite to mix everything up beyond hope, all before Python even becomes involved in the process. Then there's a wide variety of ways the URL can continue to be mapped even after that. Portably figuring out where an application exists, what its base is, and how it should best refer to other pages is difficult. Then add things like non-cookie session IDs, :action GET variables, and other things that break out of the model. It's challenging to make a good system to map URLs to resources, but people haven't really tried to meet the challenge of mapping resources back into URLs (and maybe it shouldn't even be attempted in a general way, but rather accomplished through some sort of configuration). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From janssen at parc.com Thu Oct 23 16:03:31 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:03:55 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Wed, 22 Oct 2003 23:19:56 PDT." 
<3F97730C.70107@sjsoft.com> Message-ID: <03Oct23.130335pdt."58611"@synergy1.parc.xerox.com> > I haven't used Medusa, but I have used Twisted and the standard Python > libraries. > Some notes: > 1) Twisted is definitely too complex to include. The question is, would > it be possible to rip out a simple web server from Twisted or would it > require a whole lot of extras that don't fit in the standard libraries? > This may amount to a re-write. I've become quite fond of Medusa, myself. It's small, uses standard Python (pure Python), is *not* under active development (a benefit, if you think about it). I've written a number of services using its framework. I haven't actually tried Twisted, because it seems overly complex for the various tasks I want to perform. I'm not interested in an all-singing all-dancing Apache clone -- I'd just use Apache for that. Bill From janssen at parc.com Thu Oct 23 16:04:08 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:04:34 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 01:53:17 PDT." <3F9796FD.6050003@sjsoft.com> Message-ID: <03Oct23.130412pdt."58611"@synergy1.parc.xerox.com> Yes, I like this idea. David Fraser writes: > Maybe for this reason we should stick to the existing HTTP server in > stdlib, but fix it up and improve it and change it to match the new API Bill From janssen at parc.com Thu Oct 23 16:06:28 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:08:27 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 05:49:32 PDT." Message-ID: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> > My experience is that people who've been walked through the Twisted code > one-to-one by a Twisted developer "get it", but that just reading the > docs or listening to conference presentations doesn't cut the mustard. > Or maybe that's just me... No, it's not just you. 
Bill From janssen at parc.com Thu Oct 23 16:12:14 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:12:49 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Thu, 23 Oct 2003 06:58:57 PDT." <20031023135857.GA8007@rogue.amk.ca> Message-ID: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> amk writes: > What's the scope of improving client-side HTTP support? > > I suggest aiming for something you could write a web browser or web scraper > on top of. That means storing and returning cookies from the server, writing > them to a file, and a page cache that handles HTTP's cache expiration rules. > HTML formatting is out of scope, but a specialized parser for extracting a > list of form elements or for picking apart a table might not be. My original idea was to look at something like cURL (http://curl.haxx.se/), and make sure anything you could do with that tool, you could do with Python. Might be a bit ambitious; here's the lead paragraph from the cURL web page: Curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and a busload of other useful tricks. Currently, for example, there's no way in the Python standard libraries to do a file upload (a POST with multipart/form-data). Then there are issues about handling the Web-centric formats you get back. There's no CSS parser, for instance. It's hard to understand a modern Web page without one. A Javascript interpreter? Bill From janssen at parc.com Thu Oct 23 16:12:50 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:13:19 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 07:25:07 PDT." 
<3F97E4C3.4070704@bath.ac.uk> Message-ID: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> > Actually, I wrote an application using the cgi module this week - it's > just been deployed as the system to manage http://coupons.lawrence.com/ :) Sure, I write them all the time. But what's missing? What do you have to work around? Bill From janssen at parc.com Thu Oct 23 16:14:52 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:16:32 2003 Subject: [Web-SIG] Python and the web In-Reply-To: Your message of "Thu, 23 Oct 2003 07:59:20 PDT." Message-ID: <03Oct23.131453pdt."58611"@synergy1.parc.xerox.com> Great comments! Thanks, Mike. Bill From ianb at colorstudy.com Thu Oct 23 16:15:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 16:16:34 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.130335pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Thursday, October 23, 2003, at 03:03 PM, Bill Janssen wrote: > I've become quite fond of Medusa, myself. It's small, uses standard > Python (pure Python), is *not* under active development (a benefit, if > you think about it). I've written a number of services using its > framework. Those are all good arguments for Medusa being appropriate for the standard library. Or, if not Medusa, something similar (it sounds like Medusa might just need a little love and care to modernize it). 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From barry at python.org Thu Oct 23 16:19:59 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 23 16:20:07 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> Message-ID: <1066940399.11634.290.camel@anthem>

On Thu, 2003-10-23 at 16:06, Bill Janssen wrote:
> > My experience is that people who've been walked through the Twisted code
> > one-to-one by a Twisted developer "get it", but that just reading the
> > docs or listening to conference presentations doesn't cut the mustard.
> > Or maybe that's just me...
>
> No, it's not just you.

Agreed. I've been using Twisted as the framework for my Mailman 3 experiments and I didn't get it until I spent an evening on irc with Moshe and Itamar. That was incredibly helpful, and I recommend that same approach for everyone.

-Barry

From cs1spw at bath.ac.uk Thu Oct 23 17:56:09 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Thu Oct 23 17:56:24 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F984E79.8080501@bath.ac.uk>

Bill Janssen wrote:
>>Actually, I wrote an application using the cgi module this week - it's
>>just been deployed as the system to manage http://coupons.lawrence.com/ :)
>
> Sure, I write them all the time. But what's missing? What do you
> have to work around?

The biggest thing for me is distinguishing between GET and POST data.
Sending HTTP headers (including cookies) is also highly inconvenient as with the cgi module they have to be manually constructed as HTTP name:value pairs and sent before the rest of the text.

This is where the request/response object model becomes very attractive - maybe something like the following:

    import web.cgi

    req = web.cgi.HTTPRequest()  # Auto-populates with data from environment
    if req.POST:
        # Form has been posted
        body = 'Hi there, %s' % req.POST['name']
    else:
        body = '<form>...</form>'

    res = web.cgi.HTTPResponse()
    res.content_type = 'text/html'
    res.set_cookie('name', 'Simon')
    res['X-Additional-Header'] = 'Another header'
    res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)
    print res

Output:

    Content-Type: text/html
    Set-Cookie: name=Simon
    X-Additional-Header: Another header
    Content-Length: 30

    <html>
    <head><title>Hi there</title></head>
    <form>...</form>
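A rough sketch of the GET/POST separation being asked for, layered over a plain CGI-style environment dictionary. split_request_data is a hypothetical helper (not an existing module), and parse_qs is spelled with its modern urllib.parse name:

```python
from urllib.parse import parse_qs

def split_request_data(environ, body=''):
    """Return (get_data, post_data) for a CGI-style request.

    GET data always comes from QUERY_STRING; POST data is taken from
    the request body only when REQUEST_METHOD is POST, so the two
    sources are never conflated.
    """
    get_data = parse_qs(environ.get('QUERY_STRING', ''))
    post_data = {}
    if environ.get('REQUEST_METHOD', 'GET').upper() == 'POST':
        post_data = parse_qs(body)
    return get_data, post_data
```

This keeps the convenience of one call while still letting the application ask specifically where a value came from.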
Cheers, Simon Willison http://simon.incutio.com/

From janssen at parc.com Thu Oct 23 19:45:53 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 19:46:16 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 14:56:09 PDT." <3F984E79.8080501@bath.ac.uk> Message-ID: <03Oct23.164558pdt."58611"@synergy1.parc.xerox.com>

I usually use a simple "response" object with a few standard methods:

    class response:

        def open (self, content_type = "text/html"):
            """Returns a file object open for write"""

        def redirect (self, url):
            """Sends a redirect message to the specified URL"""

        def error (self, code, message):
            """Sends back error CODE (a valid HTTP code) with MESSAGE"""

        def reply (self, message):
            """Sends back reply string MESSAGE"""
            return self.error(200, message)

        def return_file (self, typ, path):
            """returns the file of MIME type TYP from PATH"""

        def add_cookie (self, name, value):
            """Add the cookie to the reply"""

But that's probably too simple.

Bill

From jjl at pobox.com Thu Oct 23 20:46:46 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 20:47:05 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Thu, 23 Oct 2003, Bill Janssen wrote:
> amk writes:
> > What's the scope of improving client-side HTTP support?
> >
> > I suggest aiming for something you could write a web browser or web scraper
> > on top of. That means storing and returning cookies from the server, writing
> > them to a file, and a page cache that handles HTTP's cache expiration rules.
> > HTML formatting is out of scope, but a specialized parser for extracting a
> > list of form elements or for picking apart a table might not be.

I've been working on that kind of stuff.
http://wwwsearch.sourceforge.net/ I certainly think automatic cookie handling would be appropriate for the std lib. I've written code to do that (based on a port from libwww-perl, but substantially changed since then), which is already integrated into urllib2 (albeit ATM including a lot of junk for backwards-compatibility and some cut-n-pasting necessary because it's not (yet) actually part of the Python standard library). The only problem is that it's rather large. I claim this is (mostly) not my fault ;-) because the cookie standards are a royal mess. For a number of reasons, it will be significantly smaller in the form I hope will get into the Python standard lib., but it'll still be biggish. Still, you *could* quite easily write a much less anal implementation that worked most of the time. One risk of that is that you'd have to put up with a constant stream of bugs from people finding that website x breaks your simple implementation. At least, Ronald Tschalar (author of one of two Java libraries both named HTTPClient) tells me that was his experience. The fundamental problem is that the cookie 'standard' is really just Mozilla and MSIE's behaviour. For a brief summary of the sad tale, see: http://wwwsearch.sourceforge.net/ClientCookie/doc.html#standards OTOH, my code goes to some effort to enforce as many restrictions as possible to prevent cookies getting set and returned when they shouldn't. That could be cut without losing functionality (but obviously, losing security, for those who care about that). That seems pointless to me now that the code is pretty stable, though. One thing about my implementation that might seem like it should be cut out is RFC 2965 support.
It seems fairly safe to say that RFC 2965 is all but officially dead as an internet standard (and the same goes for RFC 2109, though I'm told a few servers implement it in some form -- *clients* have taken bits and pieces from the standard, but very few of those could be called RFC 2109 implementations: I regard those bits of the RFC 2109 standard as simply parts of the current state of the de-facto Netscape protocol). The one guy who was driving forward errata for RFC 2965 on the http-state mailing list seems to have succumbed to cookie-fatigue. I guess it's still useful on intranets. Half of the reason it's still in my code is simply that the Netscape cookie protocol is a messy de facto standard, and it seems far easier and more secure to specify it by the ways it differs from the RFC standard than to have it stand on its own feet. It also allows you to easily tighten up the Netscape rules if you feel like it (assuming that doesn't break the particular site you're using). The remaining 25% of the reason it's there is that I don't have the heart to rip it out ;-) So, that's my pitch for justifying the inclusion of ClientCookie (in a somewhat reduced form) in the standard library. Jeremy Hylton seemed to like the idea of having it in the std lib, but I don't know if he looked at the code :-) A related issue is urllib2's 'handler' system, which I've discovered isn't quite flexible enough to implement a number of useful features (including automatic cookie handling). I think it's possible to fix this without breaking anybody's code. Full details here: http://www.python.org/sf/759792 Jeremy said a few months back that he'd look at it, but I've heard nothing from him since... As for forms, originally I thought the forms code I wrote (ClientForm -- again, based on a port of Gisle Aas' libwww-perl, and again quite substantially changed since then) might be nice in the std lib, but I changed my mind a long while ago for a number of reasons.
But if anybody wants to talk about HTML form parsers, of course, feel free to start a thread. Same goes for HTML table parsing -- I'm not convinced the standard library is the place for this. I certainly think a function for doing file uploads would be great, though. Steve Purcell has some code for that in his old webunit module (there seems to be a new Python module called webunit here http://mechanicalcat.net/tech/webunit but the code download link is broken), and so do I in ClientForm. My code depends on a modified version of MimeWriter. I think it would be nice to fix MimeWriter so it could do this job. I think that's possible without breaking old code, though I know almost nothing about MIME. > My original idea was to look at something like cURL > (http://curl.haxx.se/), and make sure anything you could do with that > tool, you could do with Python. Might be a bit ambitious; here's the > lead paragraph from the cURL web page: > > Curl is a command line tool for transferring files with URL syntax, > supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and > LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP > uploading, kerberos, HTTP form based upload, proxies, cookies, > user+password authentication, file transfer resume, http proxy > tunneling and a busload of other useful tricks. I don't think it's a good idea to start on some new grand library, certainly not in the std lib. Gradual evolution seems more appropriate. Most of the stuff you list is either already there, or would fit it quite neatly into the current framework without any major upheavals. > Then there are issues about handling the Web-centric formats you get > back. There's no CSS parser, for instance. It's hard to understand a > modern Web page without one. What uses do you have in mind for that? > A Javascript interpreter? Whaaat?? You want a JS interpreter included with the Python distribution? You're kidding, right? 
:-) John From ianb at colorstudy.com Thu Oct 23 21:10:26 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 21:11:09 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Message-ID: On Thursday, October 23, 2003, at 07:46 PM, John J Lee wrote: > I've been working on that kind of stuff. > > http://wwwsearch.sourceforge.net/ > > I certainly think automatic cookie handling would be appropriate for > the > std lib. I've written code to do that (based on a port from > libwww-perl, > but substantially changed since then), which is already integrated into > urllib2 (albeit ATM including a lot of junk for backwards-compatibility > and some cut-n-pasting necessary because it's not (yet) actually part > of > the Python standard library). The only problem is that it's rather > large. > I claim this is (mostly) not my fault ;-) because the cookie standards > are > a royal mess. For a number of reasons, it will be significantly > smaller > in the form I hope will get into the Python standard lib., but it'll > still > be bigish. How big can it really be? I don't see how that would be a problem. Cookies suck, they act all funny and always seem unpredictable. If your library can hide that, great! It's certainly not worth simplifying the code if it means making the library less robust. I'm all for hiding crufty stuff behind simpler interfaces. If the cruft leaks out that might be an issue, but you probably have a more informed opinion about whether you have been able to keep it in or not. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Thu Oct 23 21:13:54 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 21:14:07 2003 Subject: [Web-SIG] client-side support: PEP 268 In-Reply-To: <20031022165217.I11797@lyra.org> References: <20031022165217.I11797@lyra.org> Message-ID: Greg (or anybody else, for that matter), would you mind looking at these doc bugs? 
http://www.python.org/sf/793553 http://www.python.org/sf/798244 John

From jjl at pobox.com Thu Oct 23 21:22:43 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 21:23:53 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: References: Message-ID: On Thu, 23 Oct 2003, Ian Bicking wrote:
> On Thursday, October 23, 2003, at 07:46 PM, John J Lee wrote:
[...]
> How big can it really be? I don't see how that would be a problem.

Well, much bigger than it should be for the job that cookies do. And there's a big difference in size between a module that handles cookies, and one that knows about all the endless nonsense involved in doing the Right Thing.
[...]
> I'm all for hiding crufty stuff behind simpler interfaces. If the
> cruft leaks out that might be an issue, but you probably have a more
> informed opinion about whether you have been able to keep it in or not.

That's not an issue. For most people, it doesn't even *have* an interface -- you'd just do urllib2.urlopen as usual (ATM, you do ClientCookie.urlopen, of course). Well, possibly you'd have to call build_opener / install_opener, too, to explicitly request cookie handling...

John

From janssen at parc.com Thu Oct 23 22:08:29 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 22:09:45 2003 Subject: [Web-SIG] file uploads In-Reply-To: Your message of "Thu, 23 Oct 2003 17:46:46 PDT." Message-ID: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com>

> I certainly think a function for doing file uploads would be great,
> though.

It's not difficult. I adapted this code from a version in the Python cookbook, by Wade Leftwich:

    import httplib, mimetypes, os   # os is needed for os.path.basename below

    def https_post_multipart(host, port, selector, fields, files):
        """
        Post fields and files to an http host as multipart/form-data.
        FIELDS is a sequence of (name, value) elements for regular form
        fields. FILES is a sequence of (name, filename [, value]) elements
        for data to be uploaded as files. Return the server's response page.
        """
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTPS(host, port)
        h.putrequest('POST', selector)
        h.putheader('Content-Type', content_type)
        h.putheader('Content-Length', str(len(body)))
        h.endheaders()
        h.send(body)
        errcode, errmsg, headers = h.getreply()
        return errcode, errmsg, headers, h.file.read()

    def http_post_multipart(host, port, password, selector, fields, files):
        """
        Post fields and files to an http host as multipart/form-data.
        FIELDS is a sequence of (name, value) elements for regular form
        fields. FILES is a sequence of (name, filename [, value]) elements
        for data to be uploaded as files. Return the server's response page.
        """
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTP(host, port)
        h.putrequest('POST', selector)
        if password:
            h.putheader('Password', password)
        h.putheader('Content-Type', content_type)
        h.putheader('Content-Length', str(len(body)))
        h.endheaders()
        h.send(body)
        errcode, errmsg, headers = h.getreply()
        return errcode, errmsg, headers, h.file.read()

    def encode_multipart_formdata(fields, files):
        """
        fields is a sequence of (name, value) elements for regular form
        fields. files is a sequence of (name, filename, value) elements
        for data to be uploaded as files. Return (content_type, body)
        ready for httplib.HTTP instance.
        """
        BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
        CRLF = '\r\n'
        L = []
        for (key, value) in fields:
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"' % key)
            L.append('')
            L.append(value)
        for file in files:
            key = file[0]
            filename = file[1]
            if len(file) > 2:
                value = file[2]
            else:
                value = None
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"; filename="%s"'
                     % (key, os.path.basename(filename)))
            L.append('Content-Type: %s' % get_content_type(filename))
            if value:
                L.append('')
                L.append(value)
            else:
                L.append('Content-Transfer-Encoding: binary')
                L.append('')
                fp = open(filename, 'rb')   # binary mode, or uploads get corrupted
                L.append(fp.read())
                fp.close()
        L.append('--' + BOUNDARY + '--')
        L.append('')
        body = CRLF.join(L)
        content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
        return content_type, body

    def get_content_type(filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

From janssen at parc.com Thu Oct 23 22:11:56 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 22:12:24 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 18:22:43 PDT." Message-ID: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com>

Another possibility would be to mimic the Java 1.4.1 libraries for the Web. For instance, we could have the "URL" object, which has a method called "open()", which when called gives you a "Connection", which can be of subtype "HTTPConnection", "FTPConnection", etc. Call the "create_request()" method on that "Connection" to get a new Request instance, use "set_header()", "set_cookie()", "set_body()", etc., then call the "send()" method, getting back a ReplyPromise instance, which can then be interrogated periodically to get a Reply instance, etc.
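Roughly, that layering could look like the following in Python. Every class and method name here is illustrative only (none of this is an existing library), and send() returns a canned reply instead of touching the network:

```python
class Reply:
    def __init__(self, status, body):
        self.status = status
        self.body = body

class ReplyPromise:
    # Stand-in promise: filled synchronously here; a real one would be
    # interrogated periodically while a network operation completes.
    def __init__(self, reply):
        self._reply = reply
    def poll(self):
        return self._reply

class Request:
    def __init__(self, connection, path):
        self.connection = connection
        self.path = path
        self.headers = {}
        self.body = ''
    def set_header(self, name, value):
        self.headers[name] = value
    def set_cookie(self, name, value):
        self.set_header('Cookie', '%s=%s' % (name, value))
    def set_body(self, body):
        self.body = body
    def send(self):
        # A real send() would write the request over the connection;
        # here we just echo the body back so the control flow is visible.
        return ReplyPromise(Reply(200, 'echo: ' + self.body))

class HTTPConnection:
    def __init__(self, host):
        self.host = host
    def create_request(self, path='/'):
        return Request(self, path)

class URL:
    def __init__(self, url):
        self.url = url
    def open(self):
        # Only the http scheme is sketched here.
        return HTTPConnection(self.url.split('/')[2])
```

Even this toy version takes five classes to make a single request, which is the verbosity trade-off the Java style brings with it.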
Bill From cs1spw at bath.ac.uk Fri Oct 24 01:40:29 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 01:40:38 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F98BB4D.2020100@bath.ac.uk> Bill Janssen wrote: > Another possibility would be to mimic the Java 1.4.1 libraries for the > Web. For instance, we could have the "URL" object, which has a method > called "open()", which when called gives you a "Connection", which can > be of subtype "HTTPConnection", "FTPConnection", etc. Call the > "create_request()" method on that "Connection" to get a new Request > instance, use "set_header()", "set_cookie()", "set_body()", etc., then > call the "send()" method, getting back a ReplyPromise instance, which > can then be interrogated periodically to get a Reply instance, etc. Ugh. One of the things I love about Python is that unlike Java it doesn't force you to have horribly verbose interfaces with dozens of different classes. A URL is a string, file-like-objects are file-like-objects and most of the modules in the standard library only make you deal with one or two classes and a few useful utility methods. I'm all for replicating the capabilities of Java libraries (if they have a good bunch of features) but replicating the exact APIs seems to me like a lost opportunity to take advantage of Python's more expressive syntax. Cheers, Simon Willison http://simon.incutio.com/ From thijs at fngtps.com Fri Oct 24 02:11:11 2003 From: thijs at fngtps.com (Thijs van der Vossen) Date: Fri Oct 24 02:11:16 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? Message-ID: <200310240811.13760.thijs@fngtps.com> ---------- Forwarded Message ---------- Subject: Re: [Web-SIG] Client-side support: what are we aiming for? 
Date: Friday 24 October 2003 08:08 From: Thijs van der Vossen To: web-sig@python.org

On Friday 24 October 2003 02:46, John J Lee wrote:
> So, that's my pitch for justifying the inclusion of ClientCookie (in a
> somewhat reduced form) in the standard library. Jeremy Hylton seemed to
> like the idea of having it in the std lib, but I don't know if he looked
> at the code :-)

I would really like to have client cookie support in the standard library too.

> As for forms, originally I thought the forms code I wrote (ClientForm --
> again, based on a port of Gisle Aas' libwww-perl, and again quite
> substantially changed since then) might be nice in the std lib, but I
> changed my mind a long while ago for a number of reasons. But if anybody
> wants to talk about HTML form parsers, of course, feel free to start a
> thread. Same goes for HTML table parsing -- I'm not convinced the
> standard library is the place for this.

I tend to agree with this. HTML form and/or table parsing is almost only used for stuff like screen-scraping, but I don't think this is so common it should be included in the standard library. Retrieving data from the web will be done more and more through web service interfaces like XML-RPC and SOAP or with REST-style interfaces.

Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540

From davidf at sjsoft.com Fri Oct 24 03:33:27 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:33:33 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: Message-ID: <3F98D5C7.8030802@sjsoft.com>

Steve Holden wrote:
> [David Fraser]
>> Ian Bicking wrote:
>>> On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:
> [...]
>>>> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
>>>> this really be part of the stdlib?) It would be useful if the APIs
>>>> used here were similar to those used in the API support.
>>> mod_python pretty much fits this. I don't see any reason to develop
>>> anything else (at least in terms of Apache integration). I don't
>>> think it would make sense as part of the stdlib -- it depends on
>>> Apache just as much as Python, and people install Apache in all sorts
>>> of different ways.
>> Yes, in Apache, mod_python is pretty much it. As far as the API goes, I
>> think mod_python is an important one to look at at the design stage
>> rather than trying to fit an API to it later, since Apache is fairly
>> standard and mod_python is used by lots of different people. You don't
>> want mod_python to have to be rewritten to comply with the API later.
>
> I'm not sure that we should be arguing to include something that depends
> on a specific environment like Apache in the standard library. We should
> certainly be trying to promote a standard of some sort, however, which
> seems to conflict.
>
> I see the parallel more as being with the DB API - there are Oracle
> modules and ODBC modules (which are cross-engine) and SQL Server modules
> and so on. What we need is something to provide closely similar
> interfaces to different web server engines - whether those engines are
> in pure Python or external components.

Agreed. What I'm saying isn't that mod_python should be put in the standard library, but that the design of the web server API should be carefully done so that it doesn't require major changes to mod_python etc.

> The one problem I see with mod_python is its defaulting behavior - you
> can get the same content several different ways. Specifically, the
> following URLs
>
>     http://server/
>     http://server/index.py
>     http://server/index.py.index
>
> all refer to the same content, and this makes it rather difficult to
> come up with a scheme for producing sensible relative URLs -- the
> browsers don't always interpret the path the same way the server does --
> which in turn can make it difficult to produce easily portable web
> content.

Hmmm ... looks like you are using AddHandler for .py files. I generally find that placing the Python files outside of the web directory, in libraries, works better. Then you can use SetHandler to get mod_python to handle everything, or AddHandler for specific file types to get it to handle some URLs. It makes more sense to me to have a URL of index.htm rather than index.py (why should the user care what I'm using to produce the file?)

Hope that is relevant and/or helpful

David

From davidf at sjsoft.com Fri Oct 24 03:35:43 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:35:49 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: Message-ID: <3F98D64F.5050905@sjsoft.com>

Steve Holden wrote:
>> I initially didn't want to embark on a big class renaming because I
>> thought Twisted would quickly and completely replace Medusa, but that
>> seems unlikely to happen.
>
> Well, if those Twisted guys would stop implementing neat ideas and do
> some serious work explaining the structure of the framework they would
> probably find their code was more widely used. I suspect it will take
> Twisted a long time to mature because the developers are who and what
> they are. Their enthusiasm is admirable, but sometimes I get a bit
> annoyed by the hand waving :-)
>
> My experience is that people who've been walked through the Twisted code
> one-to-one by a Twisted developer "get it", but that just reading the
> docs or listening to conference presentations doesn't cut the mustard.
> Or maybe that's just me...
> regards

I think Twisted is inappropriate for a basic standalone web server to be included in the standard library. It's very fancy and hand-wavy, but that's great - we need something like that around. Eventually their ideas will find their way into other systems as well. But the standard library one should be simple, nice, clean, and extendable.

David

From davidf at sjsoft.com Fri Oct 24 03:38:49 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:38:58 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F984E79.8080501@bath.ac.uk> References: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> <3F984E79.8080501@bath.ac.uk> Message-ID: <3F98D709.9070806@sjsoft.com>

Simon Willison wrote:
> Bill Janssen wrote:
>>> Actually, I wrote an application using the cgi module this week -
>>> it's just been deployed as the system to manage
>>> http://coupons.lawrence.com/ :)
>>
>> Sure, I write them all the time. But what's missing? What do you
>> have to work around?
>
> The biggest thing for me is distinguishing between GET and POST data.
> Sending HTTP headers (including cookies) is also highly inconvenient
> as with the cgi module they have to be manually constructed as HTTP
> name:value pairs and sent before the rest of the text.
>
> This is where the request/response object model becomes very
> attractive - maybe something like the following:
>
>     import web.cgi
>
>     req = web.cgi.HTTPRequest()  # Auto-populates with data from environment
>     if req.POST:
>         # Form has been posted
>         body = 'Hi there, %s' % req.POST['name']
>     else:
>         body = '<form>...</form>'
>
>     res = web.cgi.HTTPResponse()
>     res.content_type = 'text/html'
>     res.set_cookie('name', 'Simon')
>     res['X-Additional-Header'] = 'Another header'
>     res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)
>     print res

For CGI, it would seem to make sense that you do something like the following:

    res = web.cgi.HTTPResponse(sys.stdout)
    res.content_type = 'text/html'
    res.set_cookie('name', 'Simon')
    res['X-Additional-Header'] = 'Another header'
    res.send_headers()
    res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)

Then if you end up writing multiple parts, they can be output to stdout as they are written, rather than having to generate the entire response object first

David

From davidf at sjsoft.com Fri Oct 24 03:48:36 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:48:42 2003 Subject: [Web-SIG] Python and the web In-Reply-To: References: Message-ID: <3F98D954.7060603@sjsoft.com>

Michael C. Neel wrote:
> For a http server module, this is not a great need for myself but it
> would still be good to have. My idea would be a server class that you
> derived a server from, overriding the phases of the request you needed
> to work with, a la the way Apache works. Something like:
>
>     class MyServer(HTTPServer):
>
>         def authhandler(self, req):
>             if self.validate(req.user, req.password):
>                 return True
>             else:
>                 return False
>
>         def handler(self, req):
>             page = req.uri.filename
>             try:
>                 req.send(open(page, 'r').read())
>                 return True
>             except:
>                 return False
>
> That's basic, but if you've worked with the apache API in mod_python,
> mod_perl, or C you get the idea. Also it would be nice if the default
> handlers provided a working server, if some options were set like a
> DocumentRoot:
>
>     class MyServer(HTTPServer):
>         documentroot = "/var/www/html"

I think this architecture is great... but obviously people coming from a non-Apache background may have other ideas. However, the key is defining the request/response system well, and then multiple different server structures could be built on that.

> I would say that Apache's 1.3 API should be a better goal, and leave
> out the new features in the 2.0 API. First is the KISS principle; next
> is we should not be trying to replace Apache but rather provide a
> reasonably useful web server in the stdlib. Also, if someone needs a
> feature of the 2.0 style API, they can always add that in the derived
> class.
> > On the other hand, the 2.0 API is an improvement of the 1.3 API, and allows things like Filters etc which would be great to include. David From davidf at sjsoft.com Fri Oct 24 03:50:30 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:50:34 2003 Subject: [Web-SIG] some thoughts In-Reply-To: <1066923432.11634.132.camel@anthem> References: <20031023105912.V10747@onyx.ispol.com> <1066923432.11634.132.camel@anthem> Message-ID: <3F98D9C6.4040704@sjsoft.com> Barry Warsaw wrote: >On Thu, 2003-10-23 at 11:17, Gregory (Grisha) Trubetskoy wrote: > > > >>What I would really like to see come out of this SIG is an agreement to >>work towards developing a set of standards, rather than a bunch of code. >> >> > >/Some/ code wouldn't hurt, but I definitely agree that the early focus >of the SIG should be on standards, much like the db-sig came up with >DB-API 1.0 and 2.0. E.g. I'd really like for my CGI based scripts to be >written against a CGI-API that would Just Work in mod_python, Twisted, >Zope, CGIHTTPServer, etc, etc. > > Maybe a better way round to look at it is that your Python Web-API based scripts could also run in a CGI server. I think if the API is based too much on CGI we will lose the benefits of web/application servers, whereas the conversion the other way round should be simpler David From davidf at sjsoft.com Fri Oct 24 03:51:22 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:51:26 2003 Subject: [Web-SIG] file uploads In-Reply-To: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F98D9FA.7000603@sjsoft.com> Bill Janssen wrote: >>I certainly think a function for doing file uploads would be great, >>though. >> >> > >It's not difficult. > > But it should be in the standard library... 
something based on the code you included would be great David From davidf at sjsoft.com Fri Oct 24 03:53:53 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:53:58 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <3F98BB4D.2020100@bath.ac.uk> References: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> <3F98BB4D.2020100@bath.ac.uk> Message-ID: <3F98DA91.80908@sjsoft.com> Simon Willison wrote: > Bill Janssen wrote: > >> Another possibility would be to mimic the Java 1.4.1 libraries for the >> Web. For instance, we could have the "URL" object, which has a method >> called "open()", which when called gives you a "Connection", which can >> be of subtype "HTTPConnection", "FTPConnection", etc. Call the >> "create_request()" method on that "Connection" to get a new Request >> instance, use "set_header()", "set_cookie()", "set_body()", etc., then >> call the "send()" method, getting back a ReplyPromise instance, which >> can then be interrogated periodically to get a Reply instance, etc. > > > Ugh. One of the things I love about Python is that unlike Java it > doesn't force you to have horribly verbose interfaces with dozens of > different classes. A URL is a string, file-like-objects are > file-like-objects and most of the modules in the standard library only > make you deal with one or two classes and a few useful utility methods. > > I'm all for replicating the capabilities of Java libraries (if they > have a good bunch of features) but replicating the exact APIs seems to > me like a lost opportunity to take advantage of Python's more > expressive syntax. Absolutely. We need to be using attributes rather than method accesses, and dictionary/list-derived classes where sensible and possible. 
(We shouldn't imitate dictionaries by mimicking methods, and we should use the new-style classes if we need to extend them) David From davidf at sjsoft.com Fri Oct 24 03:55:46 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:55:55 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <200310240811.13760.thijs@fngtps.com> References: <200310240811.13760.thijs@fngtps.com> Message-ID: <3F98DB02.3010407@sjsoft.com> Thijs van der Vossen wrote: >>As for forms, originally I thought the forms code I wrote (ClientForm -- >>again, based on a port of Gisle Aas' libwww-perl, and again quite >>substantially changed since then) might be nice in the std lib, but I >>changed my mind a long while ago for a number of reasons. But if anybody >>wants to talk about HTML form parsers, of course, feel free to start a >>thread. Same goes for HTML table parsing -- I'm not convinced the >>standard library is the place for this. >> >> >I tend to agree with this. HTML form and/or table parsing is almost only used >for stuff like screen-scraping, but I don't think this is so common it should >be included in the standard library. Retrieving data from the web will be >done more and more through web service interfaces like XML-RPC and SOAP or >with REST-style interfaces. > > Actually HTML parsing would be fantastic for testing web applications, so maybe that could be related to the Web API. The parsing doesn't have to be very intelligent or do validation, HTML syntax is fairly simple. I think that does belong in the standard library.
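The kind of undemanding HTML parsing David describes is easy to demonstrate with the standard library; here is a minimal sketch (shown with the modern html.parser module; the collector class and the test page are purely illustrative, not a proposed API) that pulls form fields out of a generated page so a test suite can assert against them:

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of <input> elements from a page,
    e.g. to check a generated form in a test without full validation."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                self.fields[attrs["name"]] = attrs.get("value", "")

page = '<form><input name="title" value="Hello"><input name="author" value="Moof"></form>'
parser = FormFieldCollector()
parser.feed(page)
print(parser.fields)  # {'title': 'Hello', 'author': 'Moof'}
```

The parser is deliberately forgiving: it never validates, it just walks the tags, which matches the "doesn't have to be very intelligent" requirement above.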
David From davidf at sjsoft.com Fri Oct 24 03:58:03 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:58:26 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <200310240946.37859.t.vandervossen@fngtps.com> References: <3F98D5C7.8030802@sjsoft.com> <200310240946.37859.t.vandervossen@fngtps.com> Message-ID: <3F98DB8B.1000605@sjsoft.com> Thijs van der Vossen wrote: >On Friday 24 October 2003 09:33, David Fraser wrote: > > >>>I'm not sure that we should be arguing to include something that depends >>>on a specific environment like Apache in the standard library. We should >>>certainly be trying to promote a standard of some sort, however, which >>>seems to conflict. >>> >>>I see the parallel more as being with the DB API - there are Oracle >>>modules and ODBC modules (which are cross-engine) and SQL Server modules >>>and so on. What we need is something to provide closely similar >>>interfaces to different web server engines - whether those engines are >>>in pure Python or external components. >>> >>> >>Agreed. What I'm saying isn't that mod_python should be put in the >>standard library, but that the design of the web server API should be >>carefully done so that it doesn't require major changes to mod_python etc. >> >> >Mod_python is probably _not_ a good starting point for a generic web server >API because its purpose is to directly expose the Apache API. It makes no >sense to model a generic interface on a mostly direct mapping to the >internals of _one_ specific server. > > I'm not saying that the interface should be modelled on mod_python. But that mod_python is an important thing to consider when designing the interface. Apache is the most popular web server on the web.
What this means is, if a Python Web API is designed that requires lots of unintuitive code for a mod_python implementation, it's badly designed David From davidf at sjsoft.com Fri Oct 24 04:04:57 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 04:05:07 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> Message-ID: <3F98DD29.30706@sjsoft.com> Ian Bicking wrote: > On Thursday, October 23, 2003, at 05:05 AM, David Fraser wrote: > >>> The interface I wrote is at: >>> >>> http://colorstudy.com/~ianb/IHTTP_01.py >> >> Had a look at this, it's nice for a start. However I agree with you >> that the transaction interface is confusing... for example, what does >> "setTransaction" mean/do? > > Some of the methods were for setting up the request, or modifying the > request so it can be forwarded internally. It might be fine to leave > the request/response setup undefined -- it would be defined by the > context, e.g., cgi would set it up one way, mod_python another, etc. > For forwarding I think it might be better to simply create a new object > that would be reinjected into the framework. > >> Some other comments: >> pathInfo/requestURI >> would be good to have some consistency between these names > > They are mostly based off their CGI environment equivalent. OK, now I understand, but is this a good way to name things for the future? >> getFieldDict >> It would be great if the user could set the behaviour they want for >> multiple keys. >> I know I *always* want to discard any extra values. Including an >> option to do this rather than return a list would prevent lots of >> people doing post-processing > > That seems too difficult to define. I don't think there should be > customizations, because that makes it too difficult to work in a > heterogeneous environment.
If you turn that setting on and some > application you are using needs it off, then you get a configuration > mess. Wrappers could provide more friendly interfaces. If you defined a setField method as you said above, then people could override it to throw away duplicate values. Maybe this is the way to go >> General comment here: there are quite a few different methods to >> handle getting/setting get/post fields. Perhaps this would be made >> simpler by using a standard dictionary interface. That would also >> clear up confusion about what parameters to pass to setFieldDict etc. >> Another question is whether people really need get and post arguments >> to be processed differently. > > People do need to access them separately, as that's a common feature > request. Usually they'd be accessing some combined version of those, > but the option should be there. Fine. So we need a clever way of providing them in either form. I think using dictionaries for this is essential - even if it means defining a single dictionary that remembers which fields are get and which are post, and provides different wrappers to see the different elements. >> Also, is it necessary for all attributes to be accessed by methods? >> Particularly (no pun intended) things like "method", "time" would >> seem to make more sense as attributes. If anyone really needs to run >> some code to access them, > > I wrote the interface with wrappers in mind, and I thought purely > using methods would be easier and more explicit. It is more explicit, but I don't think it's easier. It may require a few extra lines of code for some servers to provide attributes, but it makes the user side much easier. The server side gets written once, many users use it, so it makes sense to make things as easy as possible for the user. For example, the DB-API has an attribute called rowcount.
In order to implement that, I had to create a getrowcount() method, then put in a __getattr__ method that called getrowcount() to read the rowcount attribute. A few lines of code. But it makes all the client-side code much more logical and easier to read. Obviously this is a tradeoff and some things should be methods, some should be attributes. >> The input method seems strange. Perhaps this should be called read? >> In general, there needs to be a clear separation between low-level >> accessing of the request stream, and higher-level accessing of >> processed get/post fields. Perhaps a way to do this would be to >> analyse how the most popular existing servers do things, then define >> a set of low-level methods which would cover their functionality. If >> this was done well, the higher-level methods could be written so that >> they always fall back to use the underlying low-level methods if they >> aren't overridden, so at least people only have to implement basic >> functionality to match the API. > > I guess there's two ways you could go with that -- if a method is > derivative of other methods, then just leave it out and let a wrapper > implement it. But that doesn't work particularly well if we want to > use the request/response as part of the standard library (without any > wrapper in the library). So an abstract base class might be a good > idea, with subclasses implementing the actual construction and some of > the basic methods. Right, abstract base class + simple implementation is the way to go. 
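The abstract-base-class-plus-simple-implementation approach endorsed above can be sketched as follows; every name here (BaseRequest, read_body, get_env, DummyRequest) is hypothetical, not part of any agreed interface, and it uses the modern stdlib rather than the 2003-era modules discussed in the thread:

```python
import abc
from urllib.parse import parse_qs

class BaseRequest(abc.ABC):
    """Hypothetical abstract request: backends implement only the
    low-level primitives; the high-level helpers are derived from them."""

    @abc.abstractmethod
    def read_body(self):
        """Low-level: return the raw request body as bytes."""

    @abc.abstractmethod
    def get_env(self, name):
        """Low-level: return a CGI-style variable ('' if absent)."""

    # High-level helper that falls back to the low-level primitives,
    # so a backend gets field parsing for free once the primitives work.
    def fields(self):
        return parse_qs(self.read_body().decode("utf-8"))

    # Attribute-style access (rather than a getMethod() call),
    # implemented once in the base class as a property.
    @property
    def method(self):
        return self.get_env("REQUEST_METHOD") or "GET"

class DummyRequest(BaseRequest):
    """Minimal concrete implementation, e.g. for tests."""
    def __init__(self, body, env):
        self._body, self._env = body, env
    def read_body(self):
        return self._body
    def get_env(self, name):
        return self._env.get(name, "")

req = DummyRequest(b"a=1&a=2&b=3", {"REQUEST_METHOD": "POST"})
print(req.method)    # POST
print(req.fields())  # {'a': ['1', '2'], 'b': ['3']}
```

Each server binding would subclass BaseRequest and implement only the two primitives; the property also illustrates the rowcount-style point that attribute access costs the implementer a few lines but simplifies every caller.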
David From t.vandervossen at fngtps.com Fri Oct 24 04:09:38 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Fri Oct 24 04:10:33 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98DB8B.1000605@sjsoft.com> References: <200310240946.37859.t.vandervossen@fngtps.com> <3F98DB8B.1000605@sjsoft.com> Message-ID: <200310241009.39834.t.vandervossen@fngtps.com> On Friday 24 October 2003 09:58, David Fraser wrote: > Thijs van der Vossen wrote: > >Mod_python is probably _not_ a good starting point for a generic web > > server API because its purpose is to directly expose the Apache API. It > > makes no sense to model a generic interface on a mostly direct mapping to > > the internals of _one_ specific server. > > I'm not saying that the interface should be modelled on mod_python. Ok. That's clear then. > But that mod_python is an important thing to consider when designing the > interface. Apache is the most popular web server on the web. > What this means is, if a Python Web API is designed that requires lots > of unintuitive code for a mod_python implementation, it's badly designed If a Python Web API is designed that requires lots of unintuitive code for _any_ server implementation, it's badly designed. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From moof at metamoof.net Fri Oct 24 10:50:36 2003 From: moof at metamoof.net (Moof) Date: Fri Oct 24 10:51:12 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID: <3F993C3C.5080402@metamoof.net> David Fraser wrote: > Actually HTML parsing would be fantastic for testing web applications, > so maybe that could be related to the Web API. Actually, that is a very important point. Many Python programmers are fans of Test-driven development.
I'm currently developing an app with Webware and Cheetah, and find it very difficult to write tests for a lot of the stuff I do. This is mostly due to a huge amount of background work I need to do to set up an emulation environment first (make sure my request and session objects work correctly as far as I need them to for my testing, replacing the Page write and writeln methods, and so on) and even then, verifying a whole generated page is a pain. So a standard HTML parser would be nice, as well as keeping TDD in mind when we design request and response (and possibly session) objects. > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. > I think that does belong in the standard library. Speaking of validation, a sort of standard form validation library would be nice: something to say "I'm expecting this value to be an int between 1-31" or "I'm expecting this to be a string with the following legal characters" and so on. It's not that difficult to write yourself, but I seem to find myself reinventing the wheel every time I do. A standard "best practice" way of doing this would be wonderful. Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From cs1spw at bath.ac.uk Fri Oct 24 11:25:29 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 11:25:36 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F98DD29.30706@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> Message-ID: <3F994469.20304@bath.ac.uk> David Fraser wrote: > Fine. So we need a clever way of providing them in either form. I think > using dictionaries for this is essential - even if it means defining a > single dictionary that remembers which fields are get and which are > post, and provides different wrappers to see the different elements. 
I am convinced that the neatest way of handling this is to replicate PHP's form field dictionaries, in particular these three:

GET - the form data that came in via GET
POST - the form data that came in via POST
REQUEST - the above two dictionaries combined (POST over-riding GET)

However, there is one special case that needs considering: multiple form fields of the same name. For example, the following URL: script.py?a=1&a=2 What should GET['a'] return? There are three possibilities: return a list (or tuple) containing 1 and 2, or return 1 (the first value) or return 2 (the second value). The first has a huge disadvantage in that all of a sudden accessing the GET dictionary could return a list or a string - code will then have to start checking the type of the returned data before doing anything with it. The second and third have the disadvantage that some form data gets "lost" by the dictionary. PHP has an interesting way of dealing with this, based on special syntax used for the names of form elements. If you have two query string arguments of the same name, PHP over-writes the first with the second. However, if the form field names end in [] PHP creates an array of them instead. For example:

script.py?a=1&a=2 GET['a'] == 2
script.py?a[]=1&a[]=2 GET['a'] == [1, 2]

In fact, PHP extends this to allow for dictionary style data structures to be passed in from forms as well:

script.py?a[first]=1&a[second]=2 GET['a'] == {'first': 1, 'second': 2}

This is a pretty neat solution, but carries the slight disadvantage that information about the way an application is internally structured (i.e. that it processes form input as a list or dictionary) is exposed in the application HTML. That said, from previous experience with PHP it is an extremely powerful technique. For example, check out this example form for editing a blog entry:
[HTML form markup lost in archiving: three inputs labelled Title, Author and Entry, named with the entry[...] convention]
Submitting this form in PHP results in a dictionary-style data structure called 'entry' being made available to the script, neatly encapsulating the data about the entry sent from the form. I'm sure there's an elegant solution to all of this, but I'm not sure what it is :) -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Fri Oct 24 11:35:32 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:35:38 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <3F994469.20304@bath.ac.uk> Message-ID: On Friday, October 24, 2003, at 10:25 AM, Simon Willison wrote: > David Fraser wrote: >> Fine. So we need a clever way of providing them in either form. I >> think using dictionaries for this is essential -- even if it means >> defining a single dictionary that remembers which fields are get and >> which are post, and provides different wrappers to see the different >> elements. > > I am convinced that the neatest way of handling this is to replicate > PHP's form field dictionaries, in particular these three: > > GET - the form data that came in via GET > POST - the form data that came in via POST > REQUEST - the above two dictionaries combined (POST over-riding GET) > > However, there is one special case that needs considering: multiple > form fields of the same name. For example, the following URL: > > script.py?a=1&a=2 > > What should GET['a'] return? There are three possibilities: return a > list (or tuple) containing 1 and 2, or return 1 (the first value) or > return 2 (the second value). The first has a huge disadvantage in that > all of a sudden accessing the GET dictionary could return a list or a > string - code will then have to start checking the type of the > returned data before doing anything with it. The second and third have > the disadvantage that some form data gets "lost" by the dictionary.
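Simon's PHP bracket convention (a[]=1&a[]=2 giving a list, a[first]=1 giving a dictionary) can be prototyped in a few lines on top of the standard query parser; this sketch is illustrative only, uses string values throughout, and ignores PHP's handling of nested brackets:

```python
from urllib.parse import parse_qsl

def php_style_fields(query):
    """Group query parameters the way PHP does: plain names keep the
    last value, 'name[]' accumulates a list, 'name[key]' builds a dict."""
    fields = {}
    for name, value in parse_qsl(query):
        if name.endswith("[]"):
            fields.setdefault(name[:-2], []).append(value)
        elif "[" in name and name.endswith("]"):
            base, key = name[:-1].split("[", 1)
            fields.setdefault(base, {})[key] = value
        else:
            fields[name] = value  # later values overwrite earlier ones
    return fields

print(php_style_fields("a=1&a=2"))                 # {'a': '2'}
print(php_style_fields("a[]=1&a[]=2"))             # {'a': ['1', '2']}
print(php_style_fields("a[first]=1&a[second]=2"))  # {'a': {'first': '1', 'second': '2'}}
```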
I think this is already really decided -- if (and only if) there are multiple values, then a list should appear in the output. I.e., {'a': ['1', '2']}. This is how cgi works, and how almost all Python request objects work. When there's near-consensus in previous implementations, I think we should keep the conventional behavior. Plus, it means less things to decide, which should make the design faster to create. There are more elegant ways to deal with this, but they are also more complex, and there's no One Right Way. The conventional way throws nothing away, can be adapted to any previously existing URL scheme, and does not require any trust in the user agent to submit correct input. It's also easy to go from the conventional input to another style of input, but difficult to go the other way. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From barry at python.org Fri Oct 24 11:44:49 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 11:44:55 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <1067010289.11634.378.camel@anthem> On Fri, 2003-10-24 at 11:35, Ian Bicking wrote: > I think this is already really decided -- if (and only if) there are > multiple values, then a list should appear in the output. I.e., {'a': > ['1', '2']}. This is how cgi works, and how almost all Python request > objects work. When there's near-consensus in previous implementations, > I think we should keep the conventional behavior. Plus, it means less > things to decide, which should make the design faster to create. I agree that it's basically decided, but I want to be clear in any standard that we develop, exactly what the return types are in that case, and/or how to test for one or the other. E.g. you can't use len() because both lists and strings are sequences. If the way to type test the value is going to be "isinstance(val, list)", let's set that in stone. 
Here's another alternative, if Python 2.2 is the minimal requirement (and I think it should be, if not Python 2.3). Return string and list subclasses, which will act perfectly string-like and list-like in those contexts, but which support extended protocols. See attached example. >>> show(s) single value: hello >>> show(l) multi value: hello, world -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: websig.py Type: text/x-python Size: 307 bytes Desc: Url : http://mail.python.org/pipermail/web-sig/attachments/20031024/0ba07080/websig.py From grisha at modpython.org Fri Oct 24 11:47:19 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 11:47:24 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <200310241009.39834.t.vandervossen@fngtps.com> References: <200310240946.37859.t.vandervossen@fngtps.com> <3F98DB8B.1000605@sjsoft.com> <200310241009.39834.t.vandervossen@fngtps.com> Message-ID: <20031024113028.P26153@onyx.ispol.com> On Fri, 24 Oct 2003, Thijs van der Vossen wrote: > On Friday 24 October 2003 09:58, David Fraser wrote: > > Thijs van der Vossen wrote: > > >Mod_python is probably _not_ a good starting point for a generic web > > > server API because it's purpose is to directly expose the Apache API. It > > > makes no sense to model a generic interface on a mostly direct mapping to > > > the internals of _one_ specific server. > > > > I'm not saying that the interface should be modelled on mod_python. > > Ok. That's clear then. I don't know how useful the mod_python interface would be since, as Thijs pointed out, it exposes the Apache API, with only a slight effort to make it user-friendly. All of the "cool" parts of mod_python (publisher, psp) exist as a layer on top of the core API. However it would be nice if whatever API we come up with, it would be *implementable* within mod_python. 
Of particular concern would be the multi-process nature of httpd, which implies that one cannot simply assume that the memory space is global to all requests and there needs to be an inter-process communication/locking mechanism if state is to be maintained on the server side (easier said than done). As a sidenote, a multi-process server is a feature, not a limitation, because it works around the Python GIL bottleneck, allowing you to take advantage of multiprocessor machines, which is a very important consideration in high-end applications. Grisha From ianb at colorstudy.com Fri Oct 24 11:48:56 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:49:01 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> Message-ID: <939C9B28-0639-11D8-A49B-000393C2D67E@colorstudy.com> On Thursday, October 23, 2003, at 09:11 PM, Bill Janssen wrote: > Another possibility would be to mimic the Java 1.4.1 libraries for the > Web. For instance, we could have the "URL" object, which has a method > called "open()", While I wouldn't necessarily endorse copying a Java library -- because there's good stuff in Python already -- there were some ideas about unifying filesystem and URL access with the path module as a model: http://www.jorendorff.com/articles/python/path/ http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Fri Oct 24 11:54:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:54:07 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98D709.9070806@sjsoft.com> Message-ID: <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> Some minor nits...
On Friday, October 24, 2003, at 02:38 AM, David Fraser wrote: > For CGI, it would seem to make sense that you do something like the > following: > res = web.cgi.HTTPResponse(sys.stdout) req = web.cgi.HTTPRequest() res = req.response > res.content_type = 'text/html' res.setHeader('content-type', 'text/html') # I don't really see a reason that this header needs special attention > res.set_cookie('name', 'Simon') > res['X-Additional-Header'] = 'Another header' res.setHeader('X-additional-header', 'Another header') # It's not clear what dictionary access to the response object would mean. # res.headers['X-additional-header'] = 'Another header' might be okay # but it makes it difficult to add multiple headers by the same name -- but # I don't know if HTTP ever really calls for that anyway. > res.send_headers() > res.write('
<html><body>Hi there</body></html>\n%s' % body) # This, but also: res.write('<html><body>Hi there</body></html>
\n%s' % body) res.setHeader('X-Yet-Another-Header', 'Yet another value') res.commit() # res.flush()? Sends headers *and* any body, can be called multiple times res.setHeader('Content-type', 'text/plain') # raises exception res.write('') # does not raise exception -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Fri Oct 24 11:56:40 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 11:58:18 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F994469.20304@bath.ac.uk> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> Message-ID: <20031024114945.M26153@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > script.py?a=1&a=2 > > What should GET['a'] return? I think this is adequately addressed in the FieldStorage starting with Python 2.2 with getfirst() and getlist(): http://www.python.org/doc/current/lib/node404.html Grisha From ianb at colorstudy.com Fri Oct 24 12:01:51 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 12:03:27 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <3F98DD29.30706@sjsoft.com> Message-ID: <613CA59C-063B-11D8-A49B-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 03:04 AM, David Fraser wrote: >>> Some other comments: >>> pathInfo/requestURI >>> would be good to have some consistency between these names >> >> They are mostly based off their CGI environment equivalent. > > OK, now I understand, but is this a good way to name things for the > future? Pluses: CGI request variable names are being used by nearly every framework. They are translatable from other languages, where they are also often used. They are familiar and documented. If we don't use CGI names, then we can't use the names at all, as it would be a bad false cognate to use (for instance) requestURI and extraPath. 
Minuses: CGI variables don't have well standardized (or implemented) semantics. IIS and Apache (at least) send slightly different things. But we can probably paper over those differences as they arise. >>> getFieldDict >>> It would be great if the user could set the behaviour they want for >>> multiple keys. >>> I know I *always* want to discard any extra values. Including an >>> option to do this rather than return a list would prevent lots of >>> people doing post-processing >> >> That seems too difficult to define. I don't think there should be >> customizations, because that makes it too difficult to work in a >> heterogeneous environment. If you turn that setting on and some >> application you are using needs it off, then you get a configuration >> mess. Wrappers could provide more friendly interfaces. > > If you defined a setField method as you said above, then people could > override it to throw away duplicate values. Maybe this is the way to > go No, I retract my suggestion for setField ;) This could be handled by a wrapper, like:

def getField(self, key):
    try:
        value = HTTPRequest.getField(self, key)
    except KeyError:
        value = HTTPRequest.getField(self, key + '[]')
        if not isinstance(value, list):
            return [value]
        return value
    if isinstance(value, list):
        return value[0]
    return value

This can be adapted to whatever equivalent of getField we use. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From neel at mediapulse.com Fri Oct 24 12:02:18 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 24 12:03:41 2003 Subject: [Web-SIG] Re: Form field dictionaries Message-ID: > I agree that it's basically decided, but I want to be clear in any > standard that we develop, exactly what the return types are in that > case, and/or how to test for one or the other. E.g. you > can't use len() > because both lists and strings are sequences. If the way to type test > the value is going to be "isinstance(val, list)", let's set that in > stone.
> I've always used if type(val) == type([]), because I can never remember type names =p Mike From greg-keyword-python.0eae23 at subrosa.ca Fri Oct 24 12:23:49 2003 From: greg-keyword-python.0eae23 at subrosa.ca (Gregory Collins) Date: Fri Oct 24 12:17:22 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024114945.M26153@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> Message-ID: <87he1yhaei.fsf@genghis.subrosa.ca> "Gregory (Grisha) Trubetskoy" writes: > On Fri, 24 Oct 2003, Simon Willison wrote: > > > script.py?a=1&a=2 > > > > What should GET['a'] return? > > I think this is adequately addressed in the FieldStorage starting with > Python 2.2 with getfirst() and getlist(): I agree, I think this is the appropriate solution; I'd rather see all the typechecking pushed down into the library function rather than being exposed to the programmer. If the argument I'm looking for doesn't make sense as a list then I wouldn't care if it was given twice; if I'm expecting something to be a list then I'd want it to be a list even if it were empty or singleton. Gregory D. Collins From davidf at sjsoft.com Fri Oct 24 12:28:28 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 12:28:33 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024114945.M26153@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> Message-ID: <3F99532C.9020308@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, Simon Willison wrote: > > > >>script.py?a=1&a=2 >> >>What should GET['a'] return? 
>> >> > >I think this is adequately addressed in the FieldStorage starting with >Python 2.2 with getfirst() and getlist(): > >http://www.python.org/doc/current/lib/node404.html > >Grisha > > > That's fine, but I think it's important that these methods are available as an addition to a standard dictionary interface. I think the key point is, if somebody wants a list of values, they probably know that they want a list. It's very difficult to write code by accident that would handle a list of values as well as a string. So if somebody knows they want a list in certain circumstances, they could call getlist() But I think the default dictionary return value should be the same as getfirst(). That saves endless checks for lists for those who don't need them. David From cs1spw at bath.ac.uk Fri Oct 24 12:31:13 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 12:31:34 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F9953D1.5050604@bath.ac.uk> Ian Bicking wrote: > I think this is already really decided -- if (and only if) there are > multiple values, then a list should appear in the output. I.e., {'a': > ['1', '2']}. This is how cgi works, and how almost all Python request > objects work. I don't have enough practical Python web development experience to back this up, but it seems to me that this could lead to an awful lot of unhandled exceptions. For example: username = GET['username'].lower() This would work fine provided no one fed two username values to the script, at which point it would die with an exception: Traceback (most recent call last): File "", line 1, in -toplevel- GET['username'].lower() AttributeError: 'list' object has no attribute 'lower' Adding exception handling to every piece of code that accesses string values from a form field dictionary would be a pretty tall order. One alternative might be some kind of enhanced form field access object that adds a layer of validation. 
For example:

form = web.cgi.ValidatingForm()
try:
    username = form.get_string('username')
    id = form.get_int('id')
    permissions = form.get_list('permissions')
except ValidationError:
    print 'Invalid form data'
    redisplayform()

Form validation like this though is really a whole other topic. -- Simon Willison Web development weblog: http://simon.incutio.com/ From davidf at sjsoft.com Fri Oct 24 12:31:13 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 12:31:36 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F9953D1.6050407@sjsoft.com> Michael C. Neel wrote: >>I agree that it's basically decided, but I want to be clear in any >>standard that we develop, exactly what the return types are in that >>case, and/or how to test for one or the other. E.g. you >>can't use len() >>because both lists and strings are sequences. If the way to type test >>the value is going to be "isinstance(val, list)", let's set that in >>stone. >> >> >I've always used if type(val) == type([]), because I can never remember >type names =p > >Mike > > If the list is actually a class derived from a list, then that won't catch it. That's why isinstance is used here David From barry at python.org Fri Oct 24 12:36:52 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 12:36:57 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <3F9953D1.6050407@sjsoft.com> References: <3F9953D1.6050407@sjsoft.com> Message-ID: <1067013412.10257.9.camel@anthem> On Fri, 2003-10-24 at 12:31, David Fraser wrote: > If the list is actually a class derived from a list, then that won't > catch it. That's why isinstance is used here Right, and remember since Python 2.2, the type names (well for strings and lists) are what used to be built-in coercion functions in earlier Pythons. But I'd love to see a solution that didn't require type tests.
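One shape a type-test-free solution could take is the string/list subclass idea from Barry's earlier (scrubbed) websig.py attachment; this sketch is a guess at that idea, not the original code, and the class and method names are invented:

```python
class SingleValue(str):
    """A string that also answers the list-style protocol."""
    def getfirst(self):
        return str(self)
    def getlist(self):
        return [str(self)]

class MultiValue(list):
    """A list of strings with the same extended protocol."""
    def getfirst(self):
        return self[0]
    def getlist(self):
        return list(self)

def show(value):
    # Either type can be handled without an isinstance() check,
    # because both support getlist()/getfirst().
    print("values:", ", ".join(value.getlist()))

show(SingleValue("hello"))            # values: hello
show(MultiValue(["hello", "world"]))  # values: hello, world
```

Because SingleValue is still a real str (and MultiValue a real list), existing code like value.lower() or len(value) keeps working; only code that cares about multiplicity needs the extended methods.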
-Barry From ianb at colorstudy.com Fri Oct 24 12:36:59 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 12:37:04 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> Message-ID: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: > That's fine, but I think it's important that these methods are > available as an addition to a standard dictionary interface. > I think the key point is, if somebody wants a list of values, they > probably know that they want a list. > It's very difficult to write code by accident that would handle a list > of values as well as a string. > So if somebody knows they want a list in certain circumstances, they > could call getlist() > But I think the default dictionary return value should be the same as > getfirst(). > That saves endless checks for lists for those who don't need them. Every time I have encountered an unexpected list it has been because of a bug somewhere else in my code. I might use a getone() method that threw some exception when a list was encountered, but I'd *never* want to use getfirst(). getfirst() is sloppy programming. 
(getlist() is perfectly fine though)

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Fri Oct 24 12:37:06 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 12:37:16 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <87he1yhaei.fsf@genghis.subrosa.ca> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> Message-ID: <3F995532.9040309@bath.ac.uk>

Gregory Collins wrote:

>> I think this is adequately addressed in the FieldStorage starting with
>> Python 2.2 with getfirst() and getlist():
>
> I agree, I think this is the appropriate solution; I'd rather see all
> the typechecking pushed down into the library function rather than
> being exposed to the programmer. If the argument I'm looking for
> doesn't make sense as a list then I wouldn't care if it was given
> twice; if I'm expecting something to be a list then I'd want it to be
> a list even if it were empty or singleton.

The vast majority of data sent from forms comes in as simple name/value pairs, which are crying out to be accessed from a dictionary. This is my problem with the current FieldStorage() class - it forces you to write code like this:

    username = form.getfirst("username", "")

when code like this is far more intuitive:

    username = form['username']

The extended syntax is there to deal with the very rare case of multiple data arriving for the same key. Is it really worth doubling the length of the code needed to access the form variables for the sake of a very rare edge case? This is why I'd prefer to find an alternative solution.
-- Simon Willison Web development weblog: http://simon.incutio.com/

From barry at python.org Fri Oct 24 12:40:10 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 12:40:14 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <3F99532C.9020308@sjsoft.com> Message-ID: <1067013609.10257.12.camel@anthem>

BTW, I'll note, interestingly enough, that in some recent cgi-ish applications I've written, I've always wanted __getitem__() to return a list. If there's one form variable by that name, I coerce the singleton string to a list of one element. For various reasons, it's been quite handy to treat everything uniformly in this manner. Maybe that's another option for the library.

-Barry

From greg-keyword-python.0eae23 at subrosa.ca Fri Oct 24 12:50:01 2003 From: greg-keyword-python.0eae23 at subrosa.ca (Gregory Collins) Date: Fri Oct 24 12:43:06 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F995532.9040309@bath.ac.uk> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: <87d6cmh96u.fsf@genghis.subrosa.ca>

Simon Willison writes:

> The extended syntax is there to deal with the very rare case of
> multiple data arriving for the same key. Is it really worth doubling
> the length of the code needed to access the form variables for the
> sake of a very rare edge case? This is why I'd prefer to find an
> alternative solution.

So if you access the object as a dictionary, should it behave as getfirst() or not? I'd argue for the former; in the rare instances you'd want a list I don't think it's onerous to have to type out obj.getlist("foo").

Gregory D.
Collins

From cs1spw at bath.ac.uk Fri Oct 24 13:01:49 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 13:02:34 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <3F99532C.9020308@sjsoft.com> Message-ID: <3F995AFD.1010607@bath.ac.uk>

David Fraser wrote:

> It's very difficult to write code by accident that would handle a list
> of values as well as a string.
> So if somebody knows they want a list in certain circumstances, they
> could call getlist()
> But I think the default dictionary return value should be the same as
> getfirst().
> That saves endless checks for lists for those who don't need them.

+1 - that sounds like a nice compromise

Simon

From ianb at colorstudy.com Fri Oct 24 13:02:23 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 13:02:41 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067013609.10257.12.camel@anthem> Message-ID:

On Friday, October 24, 2003, at 11:40 AM, Barry Warsaw wrote:

> BTW, I'll note interestingly enough that in some recent cgi-ish
> applications I've written, I've always wanted __getitem__() to return a
> list. If there's one form variable by that name, I coerce the singleton
> string to a list of one element. For various reasons, it's been quite
> handy to treat everything uniformly in this manner.
>
> Maybe that's another option for the library.

No, please no options! You already could get this through some getlist() method, or just make a wrapper, or just fiddle with the request in place:

    req.lfields = {}
    for name, value in req.fields():  # or whatever
        if not isinstance(value, list):
            value = [value]
        req.lfields[name] = value

Any of these will leave the request object usable by other code that expects normal behavior.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From sholden at holdenweb.com Fri Oct 24 13:17:49 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 13:22:51 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98D5C7.8030802@sjsoft.com> Message-ID:

> > The one problem I see with mod_python is its defaulting behavior - you
> > can get the same content several different ways. Specifically, the
> > following URLs
> >
> >     http://server/
> >     http://server/index.py
> >     http://server/index.py.index
> >
> > all refer to the same content, and this makes it rather difficult to
> > come up with a scheme for producing sensible relative URLs -- the
> > browsers don't always interpret the path the same way the server does --
> > which in turn can make it difficult to produce easily portable web
> > content.
>
> Hmmm ... looks like you are using AddHandler for .py files. I generally
> find that placing the Python files outside of the web directory, in
> libraries, works better. Then you can use SetHandler to get mod_python
> to handle everything, or AddHandler for specific file types to get it to
> handle some URLs. It makes more sense to me to have a URL of index.htm
> rather than index.py (why should the user care what I'm using to produce
> the file?)
>
> Hope that is relevant and/or helpful

Both, thanks very much. I only recently started using mod_python - it's already been pointed out that my complaint is specific to the publisher subsystem, and now I have what looks like a much better idea. Thanks a lot. The real problem is I just did what many new users do, and followed along from the documentation. Which, by the way, is rather better than for many other pieces of open source software, but there's always room for improvement. Thanks again!
regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From sholden at holdenweb.com Fri Oct 24 13:27:08 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 13:32:03 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <1067010289.11634.378.camel@anthem> Message-ID:

> -----Original Message-----
> From: web-sig-bounces+sholden=holdenweb.com@python.org
> [mailto:web-sig-bounces+sholden=holdenweb.com@python.org] On Behalf Of
> Barry Warsaw
> Sent: Friday, October 24, 2003 11:45 AM
> To: Ian Bicking
> Cc: web-sig@python.org
> Subject: Re: [Web-SIG] Re: Form field dictionaries
>
> On Fri, 2003-10-24 at 11:35, Ian Bicking wrote:
>
> > I think this is already really decided -- if (and only if) there are
> > multiple values, then a list should appear in the output. I.e., {'a':
> > ['1', '2']}. This is how cgi works, and how almost all Python request
> > objects work. When there's near-consensus in previous implementations,
> > I think we should keep the conventional behavior. Plus, it means less
> > things to decide, which should make the design faster to create.
>
> I agree that it's basically decided, but I want to be clear in any
> standard that we develop, exactly what the return types are in that
> case, and/or how to test for one or the other. E.g. you can't use len()
> because both lists and strings are sequences. If the way to type test
> the value is going to be "isinstance(val, list)", let's set that in
> stone.
>
> Here's another alternative, if Python 2.2 is the minimal requirement
> (and I think it should be, if not Python 2.3). Return string and list
> subclasses, which will act perfectly string-like and list-like in those
> contexts, but which support extended protocols. See attached example.
>
> >>> show(s)
> single value: hello
> >>> show(l)
> multi value: hello, world

I've argued in the past that the correct approach is to determine in advance which fields can take multiple values, and reject multiple values for other fields as an error early in the form processing. The reason I say this is that it's annoying and inefficient to distinguish between a possibly-multi-valued field with only one value and a possibly-multi-valued field with multiple values.

Here's an excerpt from a message I sent to ReportLab colleagues, and although it refers to a specific framework the intent should be obvious. The bottom line is that if a field *could* have multiple values I *always* want to see it as a list, even if the list only has a single member. And, of course, I *know* I'm right about this :-)

> def getCGIParams(*names):
>     "returns dictionary of parameters found in the cgi script"
>     dictionary = {}
>     import cgi
>     form = cgi.FieldStorage()
>     for name in form.keys():
>         value = form.getvalue(name)
>         if isinstance(value, list):
>             if name not in names:
>                 raise IllegalMultiValue
>             dictionary[name] = [quoteValue(v) for v in value]
>         else:
>             if name in names:
>                 dictionary[name] = [quoteValue(value)]
>             else:
>                 dictionary[name] = quoteValue(value)
>     return dictionary
>
> This has the advantages that a) possibly-multi-valued arguments
> are always represented as lists, and client code can use
> len(arglist) to determine iteration count and subscripting
> to select items, and b) we trap much earlier the case
> where we see multiple values of arguments that are only
> supposed to occur once.
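[Steve's approach -- declare up front which fields may be multi-valued, and reject anything else early -- can be restated against a raw query string. This is an illustrative rewrite, not Steve's actual code: the function and exception names are made up, and in current Python the parse_qs parser lives in urllib.parse rather than the cgi module.]

```python
from urllib.parse import parse_qs  # modern home of the old cgi.parse_qs

class IllegalMultiValue(Exception):
    """Raised when a single-valued field arrives with multiple values."""

def get_params(query, multi_ok=()):
    # parse_qs always maps each name to a list of values.
    params = {}
    for name, values in parse_qs(query).items():
        if name in multi_ok:
            # Declared multi-valued: always a list, even with one member.
            params[name] = values
        elif len(values) > 1:
            # Trap the bad request early, before application code runs.
            raise IllegalMultiValue(name)
        else:
            params[name] = values[0]
    return params
```

[With this shape, get_params("a=1&b=2&b=3", multi_ok=("b",)) yields {'a': '1', 'b': ['2', '3']}, while an undeclared duplicate such as "a=1&a=2" raises IllegalMultiValue before any application code sees it.]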
regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From ianb at colorstudy.com Fri Oct 24 13:39:02 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 13:39:06 2003 Subject: [Web-SIG] Exceptions (was: Form field dictionaries) In-Reply-To: Message-ID:

On Friday, October 24, 2003, at 12:27 PM, Steve Holden wrote:

> I've argued in the past that the correct approach is to determine in
> advance which fields can take multiple values, and reject multiple
> values for other fields as an error early in the form processing.

This brings up error handling. If you encounter a bad request (e.g., multiple fields where they're not expected), what do you do? An internal exception isn't good, because it's not really an internal error -- you get Internal Server Error, log messages imply your code is broken, etc. It would be nice instead to be able to throw an exception that would be translated into the proper response (in the CGI environment this could use a process like cgitb; in other environments the hook is a bit easier to put in). Of course, once you have a bad request exception, redirect, forbidden, authentication required, and other responses all make sense as exceptions too...
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From davidf at sjsoft.com Fri Oct 24 13:53:14 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 13:53:25 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F99670A.8050901@sjsoft.com>

Steve Holden wrote:

> > On Fri, 2003-10-24 at 11:35, Ian Bicking wrote:
> >
> > > I think this is already really decided -- if (and only if) there are
> > > multiple values, then a list should appear in the output. I.e., {'a':
> > > ['1', '2']}. This is how cgi works, and how almost all Python request
> > > objects work. When there's near-consensus in previous implementations,
> > > I think we should keep the conventional behavior. Plus, it means less
> > > things to decide, which should make the design faster to create.
> >
> > I agree that it's basically decided, but I want to be clear in any
> > standard that we develop, exactly what the return types are in that
> > case, and/or how to test for one or the other. E.g. you can't use len()
> > because both lists and strings are sequences. If the way to type test
> > the value is going to be "isinstance(val, list)", let's set that in
> > stone.
> >
> > Here's another alternative, if Python 2.2 is the minimal requirement
> > (and I think it should be, if not Python 2.3). Return string and list
> > subclasses, which will act perfectly string-like and list-like in those
> > contexts, but which support extended protocols. See attached example.
> >
> > >>> show(s)
> > single value: hello
> > >>> show(l)
> > multi value: hello, world
>
> I've argued in the past that the correct approach is to determine in
> advance which fields can take multiple values, and reject multiple
> values for other fields as an error early in the form processing.
>
> The reason I say this is that it's annoying and inefficient to
> distinguish between a possibly-multi-valued field with only one value
> and a possibly-multi-valued field with multiple values.
>
> Here's an excerpt from a message I sent to ReportLab colleagues, and
> although it refers to a specific framework the intent should be obvious.
> The bottom line is that if a field *could* have multiple values I
> *always* want to see it as a list, even if the list only has a single
> member. And, of course, I *know* I'm right about this :-)

Agreed, you should get a list, and only a list, if you want a list. You should otherwise get a single value regardless.

> def getCGIParams(*names):
>     "returns dictionary of parameters found in the cgi script"
>     dictionary = {}
>     import cgi
>     form = cgi.FieldStorage()
>     for name in form.keys():
>         value = form.getvalue(name)
>         if isinstance(value, list):
>             if name not in names:
>                 raise IllegalMultiValue
>             dictionary[name] = [quoteValue(v) for v in value]
>         else:
>             if name in names:
>                 dictionary[name] = [quoteValue(value)]
>             else:
>                 dictionary[name] = quoteValue(value)
>     return dictionary
>
> This has the advantages that a) possibly-multi-valued arguments
> are always represented as lists, and client code can use
> len(arglist) to determine iteration count and subscripting
> to select items, and b) we trap much earlier the case
> where we see multiple values of arguments that are only
> supposed to occur once.
This kind of thing could be fairly simply implemented on top of the API, so the question is, is it common and important enough to be included, and what should the syntax be?

David

From davidf at sjsoft.com Fri Oct 24 14:00:30 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 14:00:53 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> Message-ID: <3F9968BE.1010009@sjsoft.com>

Ian Bicking wrote:

> On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote:
>
>> That's fine, but I think it's important that these methods are
>> available as an addition to a standard dictionary interface.
>> I think the key point is, if somebody wants a list of values, they
>> probably know that they want a list.
>> It's very difficult to write code by accident that would handle a
>> list of values as well as a string.
>> So if somebody knows they want a list in certain circumstances, they
>> could call getlist()
>> But I think the default dictionary return value should be the same as
>> getfirst().
>> That saves endless checks for lists for those who don't need them.
>
> Every time I have encountered an unexpected list it has been because
> of a bug somewhere else in my code. I might use a getone() method
> that threw some exception when a list was encountered, but I'd *never*
> want to use getfirst(). getfirst() is sloppy programming. (getlist()
> is perfectly fine though)

There seems to be a lot of agreement on this... So let's take it that the interface will be a dictionary, with an extra method defined, getlist, which will return multiple items if multiple items were defined, or a list containing a single item otherwise. The next question is, how do we handle the Get/Post/Both situation?
One way would be to have methods on the request object that return the desired dictionary. Somebody also suggested including Cookies, as is done in PHP - I'm not sure this is a good idea.

David

From grisha at modpython.org Fri Oct 24 14:55:07 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 14:55:13 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> <3F9968BE.1010009@sjsoft.com> Message-ID: <20031024141819.R70244@onyx.ispol.com>

On Fri, 24 Oct 2003, David Fraser wrote:

> The next question is, how do we handle the Get/Post/Both situation?

Just to clarify nomenclature - POST /blah/blah.py?foo=bar is a valid request. The part after ? is called "query information"; this is defined in RFC 1808 and RFC 1738. CGI (which has no formal RFC, but there is Ken Coar's excellent draft) introduces something called "path-info", but its meaning is rather vague outside of CGI since it relies on the notion of a script, which isn't very meaningful in most non-CGI environments. The data submitted in the body of the POST request is called "form data" and I believe is described in RFC 1867. I think that query information and form data can be combined in a single mapping object, because if you want just query data, you can always parse the url directly via urlparse, and if you want only form data, you can read and parse it directly as a mime object. Path-info I think should be left where it belongs - in the cgi-specific module.
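[Grisha's suggested combination -- one mapping that folds query information and form data together -- might look something like the sketch below. The function name and the merge policy (query-string pairs first, then body pairs for the same name) are assumptions, not anything the thread settled on; urlsplit and parse_qs are standard urllib.parse functions.]

```python
from urllib.parse import urlsplit, parse_qs

def combined_fields(url, post_body=""):
    """Merge query information and form data into one name -> [values] map.

    Query-string pairs come first, then POST-body pairs for the same
    name are appended, so neither source of data is thrown away.
    """
    fields = parse_qs(urlsplit(url).query)
    for name, values in parse_qs(post_body).items():
        fields.setdefault(name, []).extend(values)
    return fields
```

[So a POST to /blah/blah.py?foo=bar with body foo=baz&x=1 yields {'foo': ['bar', 'baz'], 'x': ['1']} -- both the query value and the form value for foo survive in order.]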
Grisha From sholden at holdenweb.com Fri Oct 24 14:52:20 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 14:57:15 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> Message-ID: > -----Original Message----- > From: web-sig-bounces+sholden=holdenweb.com@python.org > [mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of > David Fraser > Sent: Friday, October 24, 2003 2:01 PM > To: Ian Bicking > Cc: web-sig@python.org > Subject: Re: [Web-SIG] Form field dictionaries > > > Ian Bicking wrote: > > > On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: > > > >> That's fine, but I think it's important that these methods are > >> available as an addition to a standard dictionary interface. > >> I think the key point is, if somebody wants a list of values, they > >> probably know that they want a list. > >> It's very difficult to write code by accident that would handle a > >> list of values as well as a string. > >> So if somebody knows they want a list in certain > circumstances, they > >> could call getlist() > >> But I think the default dictionary return value should be > the same as > >> getfirst(). > >> That saves endless checks for lists for those who don't need them. > > > > > > Every time I have encountered an unexpected list it has > been because > > of a bug somewhere else in my code. I might use a getone() method > > that threw some exception when a list was encountered, but > I'd *never* > > want to use getfirst(). getfirst() is sloppy programming. > (getlist() > > is perfectly fine though) > > There seems to be a lot of agreement on this... > So let's take it that the interface will be a dictionary, > with an extra > method defined, getlist, which will return multiple items if multiple > items were defined, or a list containing a single item otherwise. > The next question is, how do we handle the Get/Post/Both situation? 
> One way would be to have methods on the request object that return the
> desired dictionary. Somebody also suggested including Cookies, as is
> done in PHP - I'm not sure this is a good idea.

The only nit I would pick is to have getlist() return a list even when the response contained a single value.

regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From ianb at colorstudy.com Fri Oct 24 15:11:54 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 15:12:02 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> Message-ID:

On Friday, October 24, 2003, at 01:00 PM, David Fraser wrote:

> Ian Bicking wrote:
>
>> Every time I have encountered an unexpected list it has been because
>> of a bug somewhere else in my code. I might use a getone() method
>> that threw some exception when a list was encountered, but I'd
>> *never* want to use getfirst(). getfirst() is sloppy programming.
>> (getlist() is perfectly fine though)
>
> There seems to be a lot of agreement on this...
> So let's take it that the interface will be a dictionary, with an
> extra method defined, getlist, which will return multiple items if
> multiple items were defined, or a list containing a single item
> otherwise.

Additionally, getlist should return the empty list if the key isn't found, as this follows naturally (but a KeyError for normal access when a value isn't found). I also think cgi's default of throwing away empty fields should not be supported, even optionally. But I haven't really heard reaction to the idea that you get a BadRequest or other exception if you try to get a key that has multiple values. Throwing information away is bad, and unPythonic (though very PHPish). I don't think we should copy PHP here.
I have *never* encountered a situation where throwing away extra values found in the query is the correct solution. Either the form that is doing the submission has a bug, or else the script needs to figure out some (explicit!) way to handle the ambiguity.

We also need a way to get at the raw values. I suppose you could do:

    fields = {}
    for key in req.fields:
        v = req.getlist(key)
        if len(v) == 1:
            fields[key] = v[0]
        else:
            fields[key] = v

But that's kind of annoying, since the request object probably contains this kind of dictionary already. This will be required for backward compatibility, if we want this request to be wrapped to support existing request interfaces.

As long as we're thinking about type information, there's also file uploads. cgi makes them look like normal fields, but at considerable expense to the overall API (always using .value). Everyone else puts the file-like objects into the variable, so you might end up testing:

    val = req['somefield']
    try:
        val = val.read()
    except AttributeError:
        pass

Most of the time this isn't required, as you will seldom get a file upload from a source where you don't expect it. But though less common, it's the same basic issue as the list issue.

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From jjl at pobox.com Fri Oct 24 15:12:26 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 24 15:12:31 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID:

On Fri, 24 Oct 2003, David Fraser wrote:

> Thijs van der Vossen wrote:

[...]

> Actually HTML parsing would be fantastic for testing web applications,
> so maybe that could be related to the Web API.

There's already HTML parsing in the std lib, of course. Do you mean a DOM-like API of some kind? What in particular? I'm not certain there would be agreement about what is needed.
> The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. [...] Hmm, well, it's simple when it's valid, and especially when it doesn't miss out optional tags, etc. Could you specify more closely what you have in mind? John From jjl at pobox.com Fri Oct 24 15:31:03 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 24 15:31:27 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F993C3C.5080402@metamoof.net> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> Message-ID: On Fri, 24 Oct 2003, Moof wrote: > David Fraser wrote: > > > > Actually HTML parsing would be fantastic for testing web > applications, > so maybe that could be related to the Web API. > > > Actually, that is a very important point. Many python programmers are > fans of Test-driven development. I'm currently developing an app with > Webware and Cheetah, and find it very difficult to write tests for a lot > of the stuff I do. This is mostly due to a huge amount of background > work I need to do to set up an emulation environment first (make sure my > request and session objects work correctly as far as I need them to for > my testing, replacing the Page write and writeln methods, and so on) and > even then, verifying a whole generated page is a pain. Isn't it the fault of the framework you're using if it doesn't make unit testing easy? Still, I guess it's true that HTML parsing is a necessary part of some unit tests (not only functional tests). > So a standard HTML parser would be nice, as well as keeping TDD in mind > when we design request and response (and possibly session) objects. We already have an HTML parser (two, in fact). > > The parsing doesn't have to be very intelligent or do validation, > HTML > syntax is fairly simple. > > I think that does belong in the standard library. 
> Speaking of validation, a sort of standard form validation library would
> be nice: something to say "I'm expecting this value to be an int between
> 1-31" or "I'm expecting this to be a string with the following legal
> characters" and so on. It's not that difficult to write yourself, but I
> seem to find myself reinventing the wheel every time I do. A standard
> "best practice" way of doing this would be wonderful.

I guess that would look similar to ClientForm? If not, what? I'm not enthusiastic about putting something like that in the std lib. One reason is that, unless you build it on top of a DOM-like API, you end up with a library that gets you 'so far and no further' -- as soon as you need to know what that URL is (the one underneath the third table from the top), you're stuck, because the parser that built the forms object model didn't record that. So it makes a lot of sense to build this kind of forms- and tables-parsing code on top of a DOM-like API that represents the whole document, not just the forms and/or the tables. And if you're going to be DOM-*like*, it makes sense to do it on top of the HTML DOM *itself*, so you can support embedded scripting. But HTML DOM 'as deployed' in browsers is not pretty, and doesn't really belong in the standard library.

Well, that was the train of thought that led to DOMForm, anyway. I can see that embedded scripting might be of little interest to many people, so maybe there's a place for a Pythonic HTML DOM-like API in the std lib. Does anybody else care about interoperability with the HTML DOM proper, or is it just me?
John From gstein at lyra.org Fri Oct 24 16:20:28 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:26:13 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: <20031024132028.C15765@lyra.org> In the most recent incarnation of a webapp of mine (subwiki), I almost went with a request/response object paradigm and even started a bit of refactoring along those lines. However, I ended up throwing out that dual-object concept. When you stop and think about it: *every* request object will have a matching response object. Why have two objects if they come in pairs? You will never see one without the other, and they are intrinsically tied to each other. So why separate them? I set up the subwiki core to instantiate a "handler" each time a request comes in. That Handler instance provides access to the request info, and is the conduit for generating the response. The app dispatches to the appropriate command function, passing the handler. The Handler is actually set up as a base class, with two subclasses so far: cgi, and cmdline. This lets me do some testing from the command line, along with the standard cgi model of access. At some point, I'll implement a mod_python subclass to do the request/response handling. (as a side note, I'll also point out that Apache operates this way, too; everything is based around the request_rec structure; it holds all the input data, output headers, the input and output filter chains, etc) In any kind of server-side framework design, I would give a big +1 to keeping it simple with a single "handler" type of object rather than a dual-object design. 
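[Greg's single-object design could be sketched roughly as below. Everything here is hypothetical (subwiki's actual Handler is not shown in the thread); the point is just that one object carries both the request info and the response conduit, and each environment -- cgi, cmdline, mod_python -- subclasses it:]

```python
class Handler:
    """One object per request: holds request info, conduit for the response."""

    def __init__(self):
        self.headers = []           # outgoing (name, value) pairs
        self.body = []              # accumulated response body chunks

    def set_header(self, name, value):
        self.headers.append((name, value))

    def write(self, text):
        self.body.append(text)

class CmdlineHandler(Handler):
    """Command-line subclass: render the response as plain text for testing."""

    def render(self):
        head = "\n".join("%s: %s" % pair for pair in self.headers)
        return head + "\n\n" + "".join(self.body)

def hello_command(handler):
    # Application code is dispatched with just the handler -- it never
    # sees separate request and response objects.
    handler.set_header("Content-Type", "text/plain")
    handler.write("hello")
```

[A CGI subclass would instead read os.environ and print to stdout in its own conduit methods; the command dispatch code stays identical, which is what makes command-line testing cheap.]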
Cheers,
-g

-- Greg Stein, http://www.lyra.org/

From gstein at lyra.org Fri Oct 24 16:22:03 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:27:06 2003 Subject: [Web-SIG] urllib docs (was: client-side support: PEP 268) In-Reply-To: ; from jjl@pobox.com on Fri, Oct 24, 2003 at 02:13:54AM +0100 References: <20031022165217.I11797@lyra.org> Message-ID: <20031024132203.D15765@lyra.org>

On Fri, Oct 24, 2003 at 02:13:54AM +0100, John J Lee wrote:

> Greg (or anybody else, for that matter), would you mind looking at these
> doc bugs?
>
> http://www.python.org/sf/793553
> http://www.python.org/sf/798244

I avoid the urllib libraries in my client code, and tend to stick to just the httplib connections. I only barely get near those, so I don't have any particular knowledge to fix those doc issues. Sorry :-(

Cheers,
-g

-- Greg Stein, http://www.lyra.org/

From gstein at lyra.org Fri Oct 24 16:24:40 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:29:37 2003 Subject: [Web-SIG] file uploads In-Reply-To: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com>; from janssen@parc.com on Thu, Oct 23, 2003 at 07:08:29PM -0700 References: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031024132440.E15765@lyra.org>

On Thu, Oct 23, 2003 at 07:08:29PM -0700, Bill Janssen wrote:

> > I certainly think a function for doing file uploads would be great,
> > though.
> ...
> def https_post_multipart(host, port, selector, fields, files):
>     """
>     Post fields and files to an http host as multipart/form-data.
>     FIELDS is a sequence of (name, value) elements for regular form fields.
>     FILES is a sequence of (name, filename [, value]) elements for data to
>     be uploaded as files.
>     Return the server's response page.
>     """
>     content_type, body = encode_multipart_formdata(fields, files)
>     h = httplib.HTTPS(host, port)

Note that that class is deprecated. In any "new" code which is developed [by this SIG], please stick with the HTTP(S)Connection objects.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 16:45:30 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:50:57 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com>; from davidf@sjsoft.com on Fri, Oct 24, 2003 at 09:55:46AM +0200 References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID: <20031024134530.F15765@lyra.org> On Fri, Oct 24, 2003 at 09:55:46AM +0200, David Fraser wrote: >... > Actually HTML parsing would be fantastic for testing web applications, > so maybe that could be related to the Web API. > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. > I think that does belong in the standard library. There has been an HTML parser in the standard library for *YEARS*. I don't think there is an action item here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 16:52:36 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:57:35 2003 Subject: [Web-SIG] validation (was: Form field dictionaries) In-Reply-To: ; from sholden@holdenweb.com on Fri, Oct 24, 2003 at 01:27:08PM -0400 References: <1067010289.11634.378.camel@anthem> Message-ID: <20031024135236.G15765@lyra.org> On Fri, Oct 24, 2003 at 01:27:08PM -0400, Steve Holden wrote: >... > I've argued in the past that the correct approach is to determine in > advance which fields can take multiple values, and reject multiple > values for other fields as an error early in the form processing. Actually, I would upgrade this *way* past what you're thinking here. I think that every input/form field should have a definition and associated validation for it. Simple reason: cross-site scripting attacks. CSS attacks are a very real worry, and I think any core, form-handling on the server should provide easy mechanisms for dealing with it. Within ViewCVS, I process all incoming parameters. 
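[Editor's sketch] The whitelist-style parameter validation Greg describes might be sketched like this. The parameter names and patterns below are invented for illustration; they are not ViewCVS's actual table:

```python
import re

# Every legal parameter gets a definition: name -> compiled format pattern.
_legal_params = {
    'rev':  re.compile(r'^\d+$'),      # numeric revision
    'view': re.compile(r'^[a-z]+$'),   # simple keyword
}

class ParamError(Exception):
    pass

def validate_params(params):
    """Reject any parameter that is unrecognized or malformed."""
    for name, value in params.items():
        pattern = _legal_params.get(name)
        if pattern is None:
            raise ParamError('unrecognized parameter: %s' % name)
        if not pattern.match(value):
            raise ParamError('malformed value for %s: %r' % (name, value))
```

Because nothing unrecognized or malformed survives this gate, templates downstream never see attacker-shaped input.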
If the param is not recognized, an error is thrown. If the param does not match a specific format (e.g. numeric or matching ), then an error is thrown. ViewCVS doesn't have multi-valued parameters, but the validation concept could easily test a mismatch between single/multi values. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 17:06:44 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 17:12:29 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Fri, Oct 24, 2003 at 10:54:01AM -0500 References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> Message-ID: <20031024140644.H15765@lyra.org> On Fri, Oct 24, 2003 at 10:54:01AM -0500, Ian Bicking wrote: >... > > res.content_type = 'text/html' > > res.setHeader('content-type', 'text/html') > # I don't really see a reason that this header needs special attention Because it is a header that should almost always be set. Might even be required (dunno off the top of my head; RFC 2616 would say). Note that a character set should be set in there, too. Omitting the character set can cause problems, although I forget the exact nature of those. A few years ago, Apache httpd went and did a lot of work to add character sets into the Content-Type header; providing defaults and directives to make it easier and whatnot. It was done for some security reason, if I recall correctly. > > res.set_cookie('name', 'Simon') > > res['X-Additional-Header'] = 'Another header' > > res.setHeader('X-additional-header', 'Another header') > # It's not clear what dictionary access to the response object would > mean. > # res.headers['X-additional-header'] = 'Another header' might be okay > # but it makes it difficult to add multiple headers by the same name -- > but > # I don't know if HTTP ever really calls for that anyway. 
HTTP specifically discusses what happens when you see two headers with the same name:

    Some-Header: foo
    Some-Header: bar

is equivalent to:

    Some-Header: foo, bar

i.e. concatenate with a comma. While it is allowed, there is *generally* no reason for the API to enable writing separate headers, nor a reason to expose same-named headers as separate (i.e. just concatenate them internally). Note that I say "generally" because I've seen a client that could not deal properly with a long header value. By separating the tokens in the header across multiple instances, the client worked. IOW, a single line couldn't be longer than about 64 characters, but its internal value-concatenation worked just fine for long logical values.

> > res.send_headers()
> > res.write('<html>Hi there</html>\n%s' % body)
>
> # This, but also:
> res.write('<html>Hi there</html>\n%s' % body)
> res.setHeader('X-Yet-Another-Header', 'Yet another value')
> res.commit()
> # res.flush()? Sends headers *and* any body, can be called multiple times
> res.setHeader('Content-type', 'text/plain')
> # raises exception

That shouldn't raise an exception *until* you use a method which writes body-content. If you're talking about a method to send and *end* the header block, then it needs a better name. i.e. send_headers() (and yes, after calling that method, further headers would raise an exception) Both the client and server libraries should also respect the "Expect" header around the header/body transition point. See RFC 2616 for more info about that. Essentially, the client can send the headers, wait for the server to say "go ahead" (or throw errors back to the client), and *then* upload that 5 gigabyte body. It provides a way for the server to resolve authz/authn (or other) problems before you get into the business of uploading huge bodies. Cheers, -g -- Greg Stein, http://www.lyra.org/ From janssen at parc.com Fri Oct 24 17:55:26 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 17:55:58 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 22:40:29 PDT." <3F98BB4D.2020100@bath.ac.uk> Message-ID: <03Oct24.145534pdt."58611"@synergy1.parc.xerox.com> > I'm all for replicating the capabilities of Java libraries (if they have > a good bunch of features) but replicating the exact APIs seems to me > like a lost opportunity to take advantage of Python's more expressive > syntax. Sure, I agree with that completely. I was thinking more of defining a few classes from which to export these API's we've been discussing. Bill From janssen at parc.com Fri Oct 24 17:59:11 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 17:59:39 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Fri, 24 Oct 2003 00:38:49 PDT."
<3F98D709.9070806@sjsoft.com> Message-ID: <03Oct24.145919pdt."58611"@synergy1.parc.xerox.com>

> res = web.cgi.HTTPResponse(sys.stdout)
> res.content_type = 'text/html'
> res.set_cookie('name', 'Simon')
> res['X-Additional-Header'] = 'Another header'
> res.send_headers()
> res.write('<html>Hi there</html>\n%s' % body)

How about something like:

result = web.cgi.HTTPResponse(request)
result.set_content_type('text/html')
result.set_cookie('name', 'Simon')
result.set_header('X-Additional-Header', 'Another header')
result.write('<html>Hi there</html>\n%s' % body)
...

I would assume that the 'result' instance would be auto-buffered and flushed when necessary (or when the flush() method is called, just as with file objects). Bill From janssen at parc.com Fri Oct 24 18:20:49 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:21:59 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 22:40:29 PDT." <3F98BB4D.2020100@bath.ac.uk> Message-ID: <03Oct24.152055pdt."58611"@synergy1.parc.xerox.com> > Ugh. One of the things I love about Python is that unlike Java it > doesn't force you to have horribly verbose interfaces with dozens of > different classes. A URL is a string, file-like-objects are I agree that the Java standard library is horrible, and mainly because of all the different variations on different classes (plus the horrible lack of (1) multiple inheritance, and (2) operator overloading). However, classes are a useful way to partition an API -- I'm not ready to give up on them yet. Bill From janssen at parc.com Fri Oct 24 18:23:54 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:24:13 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Fri, 24 Oct 2003 00:55:46 PDT." <3F98DB02.3010407@sjsoft.com> Message-ID: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. Successfully parsing HTML is incredibly complex, because of the variations in the various standards. > I think that does belong in the standard library. I agree, the ability should be there. My sense is that the existing XML packages do pretty well in handling both XHTML and HTML; the missing pieces are the ancillary standards, like CSS and Javascript.
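[Editor's sketch] For context, the event-driven stdlib parser under discussion is used by subclassing it and overriding handler methods. In 2003 this shape lived in the HTMLParser and htmllib modules; the sketch below uses the modern html.parser module, but the model is the same, and it illustrates Bill's point that the parser is a syntax framework only — any knowledge of block elements, IDs, etc. has to be added by the subclass:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while feeding in HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

p = LinkCollector()
p.feed('<html><body><a href="/one">one</a> <a href="/two">two</a></body></html>')
```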
Bill From janssen at parc.com Fri Oct 24 18:27:59 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:28:23 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 08:25:29 PDT." <3F994469.20304@bath.ac.uk> Message-ID: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> First of all, let me say that I find FieldStorage so distasteful that the first thing I do is wrap it in a dictionary. Secondly, I don't think there's a need for separate GET and POST dictionaries -- there's only one kind of request at any one time, all you need is a REQUEST dictionary. Thirdly, the case where the same parameter is used more than once is so rare (and well-known to the implementor of the server script) that providing the value as a tuple in that case makes more sense than anything else. Bill From janssen at parc.com Fri Oct 24 18:29:54 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:30:25 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 08:44:49 PDT." <1067010289.11634.378.camel@anthem> Message-ID: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> Cute idea, Barry. It ties in with what I was thinking about for URLs, which would also be "string" subclasses, but support methods (OK, attributes) such as "scheme", "host", "port", etc. Bill From janssen at parc.com Fri Oct 24 18:36:18 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:36:50 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Fri, 24 Oct 2003 13:20:28 PDT." <20031024132028.C15765@lyra.org> Message-ID: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? You > will never see one without the other, and they are intrinsically tied to > each other. So why separate them? 
> Mainly because they are two separate concepts. For instance, in my code, I always pass two arguments; one is the response, which the user manipulates to send back something to the caller, and the other is the request, which is basically a dictionary of all parameter values, plus a few extra special ones like 'path'. Bill From janssen at parc.com Fri Oct 24 18:41:27 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:41:50 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Fri, 24 Oct 2003 13:45:30 PDT." <20031024134530.F15765@lyra.org> Message-ID: <03Oct24.154130pdt."58611"@synergy1.parc.xerox.com> > There has been an HTML parser in the standard library for *YEARS*. I don't > think there is an action item here. It's not a particularly *good* HTML parser, though. It's just a simple syntax framework. It doesn't know about things like block elements, which elements take IDs and which don't, etc. When I was working on the Plucker distiller (a web crawler and HTML parser), I had to add oodles of code to it. Looking at the documentation for 2.3, I see "class HTMLParser: This is the basic HTML parser class. It supports all entity names required by the HTML 2.0 specification (RFC 1866). It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements." We can do better than that. 4.01, at least. Bill From barry at python.org Fri Oct 24 19:09:47 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 19:09:55 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> Message-ID: <1067036987.10257.69.camel@anthem> On Fri, 2003-10-24 at 18:29, Bill Janssen wrote: > Cute idea, Barry. It ties in with what I was thinking about for URLs, > which would also be "string" subclasses, but support methods (OK, > attributes) such as "scheme", "host", "port", etc. +1!
I'm using objects of this style in several places in my Mailman3 experiments, and it's really really cool. -Barry From cs1spw at bath.ac.uk Fri Oct 24 19:12:45 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 19:12:51 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99B1ED.1090802@bath.ac.uk> Bill Janssen wrote: > Secondly, I don't think there's a need for separate GET and POST > dictionaries -- there's only one kind of request at any one time, all > you need is a REQUEST dictionary. I'm a huge fan of being able to distinguish between data from a query string (GET data) and data that has been POSTed. I posted my reasons for caring about this to the Quixote mailing list a few days ago, but I'll repeat them here through the magic of copy and paste:

1. By differentiating between the two the same 'key' can be used twice. For example, a form submitting to a page called 'forms?id=1' can itself include an id attribute in the POST data without overriding the id in the URL

2. My rule of thumb is "only modify data on a POST" - that way there's no chance of someone bookmarking a URL that updates a database (for example).

3. It is useful to be able to detect if a form has been submitted or not. In PHP, I frequently check for POSTed data and display a form if none is available, and assume the form has been submitted if there is.

4. Security. While ensuring data has come from POST rather than GET provides absolutely no security against a serious intruder, it does discourage amateurs from "hacking the URL" to see if they can cause any damage. Security through obscurity admittedly, but it adds a bit of extra peace of mind.
( From http://mail.mems-exchange.org/pipermail/quixote-users/2003-October/002013.html ) The 2nd point above is supported by this quote from the HTTP spec: """ In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe" """ http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1 If you don't know which bit of data came from GET and which came from POST you have no way of ensuring that only POSTed data changes the "state" of data on the server. I accept that there is a great deal of convenience in only having to look in one place for data from both POST and GET, which is why I advocate a third dictionary (or dictionary like object) called something like REQUEST which combines the data from the other two. -- Simon Willison Web development weblog: http://simon.incutio.com/ From grisha at modpython.org Fri Oct 24 19:36:01 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 19:36:10 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031024192925.R71890@onyx.ispol.com> For what it's worth, I never liked the request/response separation either. I like a single object from which you can read() and to which you can write(), just like a file. Imagine if for file IO you had to have an object to read and another one to write? (I would agree that perhaps "request" is a misnomer, but I can't think of anything better) On Fri, 24 Oct 2003, Bill Janssen wrote: > > When you stop and think about it: *every* request object will have a > > matching response object. Why have two objects if they come in pairs? You > > will never see one without the other, and they are intrinsically tied to > > each other. So why separate them? 
> > > > Mainly because they are two separate concepts. For instance, in my > code, I always pass two arguments; one is the response, which the user > manipulates to send back something to the caller, and the other is the > request, which is basically a dictionary of all parameter values, plus > a few extra special ones like 'path'. > > Bill > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/grisha%40modpython.org > From janssen at parc.com Fri Oct 24 20:51:41 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 20:52:11 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 16:12:45 PDT." <3F99B1ED.1090802@bath.ac.uk> Message-ID: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> > I'm a huge fan of being able to distinguish between that data from a > query string (GET data) and data that has been POSTed. I posted my > reasons for caring about this to the Quixote mailing list a few days > ago, but I'll repeat them here through the magic of copy and paste: > [...list of reasons you want to know the HTTP command omitted...] The way to differentiate them (if you care) is to look at the "command" attribute of the request object, IMO. That would tell you whether you were looking at data from GET, or POST, or HEAD, or whatever. I see no reason to pass the data differently, though. Parameters are parameters. (Of course, you can use query data with POST as well as with GET.) 
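[Editor's sketch] One way to reconcile the two positions — Simon's separate GET and POST dictionaries plus a combined REQUEST, with Bill's "command" attribute on the request object — might look like this. All of the names here are hypothetical:

```python
from urllib.parse import parse_qs

def _flatten(parsed):
    """parse_qs gives lists; keep a list only when a key repeats."""
    return {k: v if len(v) > 1 else v[0] for k, v in parsed.items()}

class Request:
    def __init__(self, method, query_string, post_body=''):
        self.command = method  # 'GET', 'POST', 'HEAD', ...
        self.GET = _flatten(parse_qs(query_string))
        self.POST = _flatten(parse_qs(post_body)) if method == 'POST' else {}
        # Combined view for convenience; POST wins on a name collision,
        # but the query-string value is still reachable through req.GET.
        self.REQUEST = {**self.GET, **self.POST}

# A form POSTed to 'forms?id=1' whose body also carries an id:
req = Request('POST', 'id=1', 'id=2&name=Simon')
```

With this shape, code that only cares about "the parameters" reads req.REQUEST, while code enforcing "only modify data on a POST" checks req.command and req.POST.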
Bill From grisha at modpython.org Fri Oct 24 21:59:46 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 21:59:50 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99B1ED.1090802@bath.ac.uk> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> Message-ID: <20031024215125.A1810@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > The 2nd point above is supported by this quote from the HTTP spec: > > """ > In particular, the convention has been established that the GET and HEAD > methods SHOULD NOT have the significance of taking an action other than > retrieval. These methods ought to be considered "safe" > """ > > http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1 For everyone's amusement, here are the last two of the three paragraphs of this section: In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested. Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them. At first I thought this was completely wacky and didn't belong in an RFC at all. But having read it a couple of times, I'm thinking that they are referring here to *browser implementations*, not web apps, so I don't think it's relevant to our discussion.
Grisha From grisha at modpython.org Fri Oct 24 22:10:04 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 22:10:08 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99B1ED.1090802@bath.ac.uk> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> Message-ID: <20031024220036.K1810@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > 2. My rule of thumb is "only modify data on a POST" - that way there's > no chance of someone bookmarking a URL that updates a database (for > example). I get upset at web pages that refuse to cooperate when I submit things via query strings. I think a reliable way to avoid accidental updates is to rely on a session mechanism; only modifying on POST only results in mild user annoyance IMHO. > 3. It is useful to be able to detect if a form has been submitted or > not. In PHP, I frequently check for POSTed data and display a form if > none is available, assume the form has been submitted if there is. I don't like doing things like this because they rely on protocol internals to drive application logic... > 4. Security. While ensuring data has come from POST rather than GET > provides absolutely no security against a serious intruder, it does > discourage amateurs from "hacking the URL" to see if they can cause any > damage. Security through obscurity admitedly, but it adds a bit of extra > peace of mind. Again, I don't agree; hackable URL's are a good thing! :-) And it is, indeed, security by obscurity. If you have good data validation, there should be no need for any obscurity. Grisha From ianb at colorstudy.com Fri Oct 24 22:06:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 22:11:07 2003 Subject: [Web-SIG] More prior art, less experimentation Message-ID: I'm feeling a little uncomfortable with the way some of these suggestions are moving. 
I feel like people are trying to make another framework, which I don't think is the appropriate goal for web-sig or for the standard library. Python doesn't need another framework, and I don't think it's reasonable or particularly polite to try to trump the work that a lot of people have done over the years, using some sort of back door to perceived authority that web-sig might provide. Nothing we're talking about is anything that hasn't been discussed before in the context of other projects. Nothing we are considering implementing (at least server-side) is something that hasn't been implemented before. We *do* have the opportunity to create something that can unify the Python web experience and provide the basis for more adoption of Python for web programming. To do that we will have to repeat the work done many times before. We should aspire to quality, but I think we need to hold ourselves back from aesthetic experimentation, and respect convention above our own preferences. We can still indulge our own fancies outside of the standard library, and building on the standard library -- nothing we do should preclude your individual preferences toward web programming, but it should not preclude other people's preference either. But most of all it should provide the foundation upon which the mature, *existing* frameworks can build. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From cs1spw at bath.ac.uk Fri Oct 24 22:40:58 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 22:41:03 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99E2BA.9030701@bath.ac.uk> Bill Janssen wrote: >>I'm a huge fan of being able to distinguish between that data from a >>query string (GET data) and data that has been POSTed. 
I posted my >>reasons for caring about this to the Quixote mailing list a few days >>ago, but I'll repeat them here through the magic of copy and paste: >>[...list of reasons you want to know the HTTP command omitted...] > > The way to differentiate them (if you care) is to look at the > "command" attribute of the request object, IMO. That would tell you > whether you were looking at data from GET, or POST, or HEAD, or > whatever. I see no reason to pass the data differently, though. > Parameters are parameters. That doesn't work. The following is a perfectly valid form that sends data via GET and POST at the same time:
<form action="forms?id=1" method="post">
Name: <input type="text" name="name"><br>
Email: <input type="text" name="email"><br>
<input type="submit" value="Submit">
</form>
I really don't understand why people are opposed to being able to tell the difference between GET and POST data. To me it seems like a basic requirement of any web library - but I'm obviously almost alone in thinking that. -- Simon Willison Web development weblog: http://simon.incutio.com/ From cs1spw at bath.ac.uk Fri Oct 24 22:44:57 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 22:45:02 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024215125.A1810@onyx.ispol.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> <20031024215125.A1810@onyx.ispol.com> Message-ID: <3F99E3A9.2010801@bath.ac.uk> Gregory (Grisha) Trubetskoy wrote: > For everyone's amusement, here is last two out of the three paragraphs of > this section: > > In particular, the convention has been established that the GET and > HEAD methods SHOULD NOT have the significance of taking an action > other than retrieval. These methods ought to be considered "safe". > This allows user agents to represent other methods, such as POST, PUT > and DELETE, in a special way, so that the user is made aware of the > fact that a possibly unsafe action is being requested. > > Naturally, it is not possible to ensure that the server does not > generate side-effects as a result of performing a GET request; in > fact, some dynamic resources consider that a feature. The important > distinction here is that the user did not request the side-effects, > so therefore cannot be held accountable for them. > > At first I thought this was completely wacky and didn't belong in an RFC > at all. But having read it a couple of times, I'm thinking that they are > referring here to *browser implementations*, not web apps, so I don't > think it's relevant to our discussion. I understand it to be a recommendation to developers of server side applications. 
It's saying "don't write apps that do something other than just blindly serve up content on a GET or HEAD" - in other words, only modify data stored on the server (the classic example being altering data in a database) in a POST or PUT request. Obviously this doesn't mean you shouldn't do anything dynamic on GETs, it just means that a user GETing a resource shouldn't result in a permanent change to the state maintained by the server. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Fri Oct 24 23:01:41 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 23:02:13 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024215125.A1810@onyx.ispol.com> Message-ID: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 08:59 PM, Gregory (Grisha) Trubetskoy wrote: [GET vs. POST semantics...] > At first I thought this was completely wacky and didn't belong in an > RFC > at all. But having read it a couple of times, I'm thinking that they > are > referring here to *browser implementations*, not web apps, so I don't > think it's relevant to our discussion. It's very relevant to web applications, but not to the environment in which those applications are written. It's not relevant to our work here. In reference to the rest of the discussion -- I think it's enough to say that some people want to distinguish (sometimes) between these two types of variables. Simon is not the only one. It should be an option, because it's not hard to do. We're not telling people how to write their applications, we're giving them the tools to write their applications as they choose, and this is a valid way to write an application. 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Fri Oct 24 23:20:19 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 23:20:22 2003 Subject: [Web-SIG] Moving forward Message-ID: <20031024230301.B51905@onyx.ispol.com> I get the feeling that so far most of what has been posted to this list has been water under the bridge. I would be delighted to see this SIG succeed, and to that end here is my proposal of a first baby step. Problem: The scope of this SIG seems so broad that I doubt any single member of this list has a good grasp on more than half of what the SIG covers. Frankly, it's not clear what the scope is. Proposed Resolution: Begin defining the scope in detail. Once we agree on general categories, we can start playing with it by looking at what presently exists in stdlib in each category, what exists, but not in stdlib, and what does not exist. Once we have a clear scope, we can talk about a PEP. I think we can start with at least the following general categories:

o HTML Parsing and Generation
o XML Parsing and Generation
o An HTTP Server
o An HTTP Client
o SSL
o A general interface to an HTTP server for web apps
o PSP Standards and Implementation

Grisha From janssen at parc.com Fri Oct 24 23:21:52 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 23:22:26 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 20:01:41 PDT." <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: <03Oct24.202156pdt."58611"@synergy1.parc.xerox.com> > In reference to the rest of the discussion -- I think it's enough to > say that some people want to distinguish (sometimes) between these two > types of variables. Simon is not the only one. It should be an > option, because it's not hard to do.
We're not telling people how to > write their applications, we're giving them the tools to write their > applications as they choose, and this is a valid way to write an > application. +1. Bill From janssen at parc.com Fri Oct 24 23:32:53 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 23:33:21 2003 Subject: [Web-SIG] So what's missing? Message-ID: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Apropos Ian's comments today, I'd like to suggest that at this stage we focus on what's missing, rather than on how to fix/change things. What have you needed that isn't in the standard libraries? Here's my list:

Client-side:

* CSS parser. I can't really do visual interpretation of Web pages without understanding their layout.

* post-multipart (both http and https).

* Asynchronous fetch. When working over the Plucker distiller, which is a web crawler of sorts, I really wanted a higher-powered client side HTTP library. In particular, I wanted to be able to start a fetch, go on to other things, and come back to the fetch periodically, checking to see whether there was data available.

* Connection caching. Again, when pulling lots of pages from lots of sites, I want to be able to save the open connection to a host/port combo and re-use it, if the server doesn't kill the connection. There should be a pool of connections, with a user-settable limit, so that we don't run out of sockets/file-descriptors.

* Anything else I can do with cURL to an HTTP or HTTPS URL.

Server-side:

* Server-side SSL support in the socket module, and some interface to management of certificates/identities for SSL. I want to build HTTPS servers with Python.

* Some kind of response object usable in CGI scripts. This would make a few simple actions simple: write a response as a file (instead of using sys.stdout), return an error with a message, redirect to another URL, return a file.

* A standard server framework on the order of Medusa.
This should support a standalone Python web server, with the ability to serve files, and the ability to add new handlers. Not sure it has to support CGI invocation. What else are we missing? Bill From cs1spw at bath.ac.uk Fri Oct 24 23:39:28 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 23:39:32 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99F070.4080202@bath.ac.uk> Bill Janssen wrote: > What else are we missing? An improved standard request object providing an interface to data sent from the client. This should include an interface that is designed for re-implementation by major web frameworks. This will essentially be an improved version of the CGI module. -- Simon Willison Web development weblog: http://simon.incutio.com/ From cs1spw at bath.ac.uk Fri Oct 24 23:59:11 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 23:59:15 2003 Subject: [Web-SIG] CSS parsers: prior art Message-ID: <3F99F50F.8020400@bath.ac.uk> I just had a look around for prior art: libraries that handle CSS parsing for languages other than Python. Perl has a number of interesting modules for handling CSS in CPAN: CSS-Tiny http://search.cpan.org/~adamk/CSS-Tiny-1.02/lib/CSS/Tiny.pm This module provides a small, lightweight, object-oriented API for reading and writing CSS files "with as little code as possible". It seems like it would map nicely to a simple Pythonic module that makes use of operator overloading. CSS 1.05 http://search.cpan.org/~iamcal/CSS-1.05/CSS.pm This is a large, object-oriented library that appears to provide access to a variety of alternative parsers and formatters. The basic principle involves converting CSS declarations into an object tree.
CSS-SAC http://search.cpan.org/~rberjon/CSS-SAC-0.05/SAC.pm This is an event-based CSS parser modelled on the W3C's Simple API for CSS: http://www.w3.org/TR/SAC/ Also of interest is the W3C's DOM specification for styling: http://www.w3.org/TR/DOM-Level-2-Style/ It seems we are spoilt for choice when it comes to picking an API to base a Python CSS module on. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Sat Oct 25 01:03:20 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 01:03:26 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 10:32 PM, Bill Janssen wrote: > Client-side: To client-side, I would add that authentication is too hard in urllib2, and only works for HTTP (for trivial reasons). I think urllib2's subclasses are unnecessarily complicated -- authentication handling could be put directly in the HTTP/HTTPS, both basic and digest. Goes together with post/multipart, and I think these shouldn't be too hard to add. There is also some talk about putting urllib2 and urlparse together, i.e., have a URL object. The distinction between the urllib, urllib2, and urlparse libraries is not very good, e.g., urllib.quote (and friends) are more related to urlparse than urllib. A URL object could unify all these. Cookie handling also fits into this, but from the opposite direction from a URL object, since we are creating something of a user agent. You'd almost want to do:

ua = UserAgent()
url = web.URL('http://whatever.com')
content = ua.get(url)

Or something like that. I think an explicit agent is called for, separate from the URLs that it may retrieve. But only when you start considering cookies and caching. If you want to take it a little further, WebDAV URLs support a bunch of other features. Nice to at least keep the door open for that.
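A minimal sketch of that explicit-agent idea, expressed in present-day stdlib terms (urllib.request and http.cookiejar, the descendants of 2003's urllib2); the UserAgent class is invented here for illustration, not an existing API:

```python
import urllib.request
from http.cookiejar import CookieJar

class UserAgent:
    """Hypothetical explicit agent: owns the cookies (and, later,
    perhaps a cache), separate from the URLs it fetches."""

    def __init__(self):
        self.cookies = CookieJar()
        # The opener carries the agent's state into every request.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))

    def get(self, url):
        # Network I/O happens here; cookies set by the response
        # accumulate on the agent for subsequent requests.
        with self._opener.open(url) as resp:
            return resp.read()
```

Cookies accumulate on the agent across get() calls, which is exactly the kind of state that does not belong on a URL object.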
> Server-side: > > * Server-side SSL support in the socket module, and some interface to > management of certificates/identities for SSL. I want to build > HTTPS servers with Python. > > * Some kind of response object usable in CGI scripts. This would make > a few simple actions simple: write a response as a file (instead of > using sys.stdout), return an error with a message, redirect to > another URL, return a file. I'd still really like to get a response and request object, first implemented for CGI but possible to target to other environments. It's because I really want this that I didn't want us to get too experimental -- just a request and response object are very doable, and would be a real accomplishment. But we can get off track with this. Good (standard) Python libraries aren't frameworks, they are straight-forward, well-documented interfaces, which is all I'm looking for here. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From moof at metamoof.net Sat Oct 25 06:00:22 2003 From: moof at metamoof.net (Moof) Date: Sat Oct 25 06:02:10 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> Message-ID: <3F9A49B6.4050709@metamoof.net> John J Lee wrote: > On Fri, 24 Oct 2003, Moof wrote: >> [moaning about unit testing with webware and cheetah] > Isn't it the fault of the framework you're using if it doesn't make unit > testing easy? Quite so. However I'm appealing to say that whatever we develop here be reasonable to use when we're trying to simulate requests and responses for unit tests... > Still, I guess it's true that HTML parsing is a necessary > part of some unit tests (not only functional tests). > >> So a standard HTML parser would be nice, as well as keeping TDD in mind >> when we design request and response (and possibly session) objects. 
> > > > We already have an HTML parser (two, in fact). Bill seems to be talking about this elsewhere. >> > The parsing doesn't have to be very intelligent or do validation, >> HTML > syntax is fairly simple. >> > I think that does belong in the standard library. >> >> >> Speaking of validation, a sort of standard form validation library would >> be nice: something to say "I'm expecting this value to be an int between >> 1-31" or "I'm expecting this to be a string with the following legal >> characters" and so on. It's not that difficult to write yourself, but I >> seem to find myself reinventing the wheel every time I do. A standard >> "best practice" way of doing this would be wonderful. > > > > I guess that would look similar to ClientForm? If not, what? Nonono, though that could be useful for some sort of automated testing, the thought hadn't even occurred. When somebody fills in a form, the browser will return a string either as a GET or POST method with all the values filled in, which we seem to have decided elsewhere should be returned as a dictionary to the programmer. I want a nice standard way of saying field 'fromDay' in this dictionary should be an integer between 1 and 31. That 'fromMonth' should be an integer between 1 and 12. That 'username' should contain only ascii letters and numbers and this small amount of punctuation, and should be no shorter than x characters, and no longer than y characters which is what I set the length limit to on the form. That sort of thing. I should be able to get a list of errors that I can associate with the various fields, process them into intelligible sentences I can throw back at the user as errors on the page. Yes, you can do this from javascript on the same page, but this is for people who either have javascript turned off, or aren't necessarily using my page as input as it's been hijacked from elsewhere, or someone's trying to be malicious.
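The wish list above maps naturally onto a small declarative checker. A rough sketch, with all names invented here rather than taken from any real library:

```python
def int_in_range(name, lo, hi):
    """Build a checker: value must parse as an int between lo and hi."""
    def check(value, errors):
        try:
            n = int(value)
        except (TypeError, ValueError):
            errors.append("%s must be a whole number" % name)
            return None
        if not lo <= n <= hi:
            errors.append("%s must be between %d and %d" % (name, lo, hi))
            return None
        return n
    return check

def validate(form, schema):
    """Run every field's checker; return cleaned values plus a list of
    intelligible error sentences to throw back at the user."""
    errors = []
    clean = {}
    for field, check in schema.items():
        clean[field] = check(form.get(field), errors)
    return clean, errors

schema = {
    'fromDay': int_in_range('fromDay', 1, 31),
    'fromMonth': int_in_range('fromMonth', 1, 12),
}
clean, errors = validate({'fromDay': '14', 'fromMonth': '19'}, schema)
# clean['fromDay'] == 14; errors == ['fromMonth must be between 1 and 12']
```

The same error list can be rendered next to each field server-side, independent of whatever JavaScript does or does not run in the browser.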
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From jjl at pobox.com Sat Oct 25 08:16:35 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:16:41 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Fri, 24 Oct 2003, Bill Janssen wrote: [...HTML parsing...] > > I think that does belong in the standard library. > > I agree, the ability should be there. My sense is that the existing > XML packages do pretty well in handling both XHTML and HTML; the > missing pieces are the ancillary standards, like CSS and Javascript. Again: what do CSS and JavaScript have to do with the standard library? John From jjl at pobox.com Sat Oct 25 08:16:52 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:16:59 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <20031024140644.H15765@lyra.org> References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> <20031024140644.H15765@lyra.org> Message-ID: On Fri, 24 Oct 2003, Greg Stein wrote: > On Fri, Oct 24, 2003 at 10:54:01AM -0500, Ian Bicking wrote: [...] > > # res.headers['X-additional-header'] = 'Another header' might be okay > > # but it makes it difficult to add multiple headers by the same name -- > > but > > # I don't know if HTTP ever really calls for that anyway. > > HTTP specifically discusses what happens when you see two headers with the > same name: > > Some-Header: foo > Some-Header: bar > > is equivalent to: > > Some-Header: foo, bar > > i.e. concatenate with a comma. While it is allowed, there is *generally* > no reason for the API to enable writing separate headers, nor a reason to > expose same-named headers as separate (i.e. just concatenate them > internally). 
> > Note that I say "generally" because I've seen a client that could not deal > properly with a long header value. By separating the tokens in the header [...] Another thing that breaks this is the Cookie header: cookie values may contain commas (and they do!). Of course, this may not be relevant here, since Python programmers aren't going to be so silly as to put commas in their cookie values :-) John From jjl at pobox.com Sat Oct 25 08:38:55 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:39:00 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> References: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Friday, October 24, 2003, at 10:32 PM, Bill Janssen wrote: > > Client-side: > > To client-side, I would add that authentication is too hard in urllib2, > and only works for HTTP (for trivially reasons). I think think > urllib2's subclasses are unnecessarily complicated -- authentication > handling could be put directly in the HTTP/HTTPS, both basic and > digest. It's a minor issue, but it seems nicer to me to have authentication separate if it can easily be separate -- that fits in with the general philosophy of urllib2 that you pick 'n mix the features you want. What are the trivial reasons for it breaking on non-HTTP auth? > Goes together with post/multipart, and I think these shouldn't > be too hard to add. How does this go together with post/multipart? Do you just mean that you're likely to post the multipart data using urllib2.urlopen? > There is also some talk about putting urllib2 and urlparse together, > i.e., have a URL object. The distinction between the urllib, urllib2, > and urlparse libraries is not very good, e.g., urllib.quote (and > friends) are more related to urlparse than urllib. A URL object could > unify all these. 
It's an appealing idea, especially given the cuteness of string subclassing ;-) > Cookie handling also fits into this, but from the opposite direction > from a URL object, since we are creating something of a user agent. > You'd almost want to do: > > ua = UserAgent() > url = web.URL('http://whatever.com') > content = ua.get(url) > > Or something like that. I think an explicit agent is called for, > separate from the URLs that it may retrieve. But only when you start > considering cookies and caching. [...] Are you suggesting replacing urllib2, building on top of it, or extending it? urllib2's handlers already get a lot of the 'user-agent' job done. What requirements does caching impose that urllib2 doesn't meet? There's already a CacheFTPHandler. John From jjl at pobox.com Sat Oct 25 08:51:05 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:52:01 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Fri, 24 Oct 2003, Bill Janssen wrote: [...] > * CSS parser. I can't really do visual interpretation of Web pages > without understanding their layout. Does anybody other than Bill want this? > * post-multipart (both http and https). Everybody is agreed this is needed. > * Asynchronous fetch. When working over the Plucker distiller, which [...] Nice, but not easy. Would it not introduce a lot of new code?
John From t.vandervossen at fngtps.com Sat Oct 25 08:54:20 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Sat Oct 25 08:54:24 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Message-ID: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> On vrijdag, okt 24, 2003, at 21:12 Europe/Amsterdam, John J Lee wrote: > On Fri, 24 Oct 2003, David Fraser wrote: >> Thijs van der Vossen wrote: > [...] >> Actually HTML parsing would be fantastic for testing web applications, >> so maybe that could be related to the Web API. Sorry, but _I_ never said _that_. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From jjl at pobox.com Sat Oct 25 09:00:03 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 09:00:10 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> References: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> Message-ID: On Sat, 25 Oct 2003, Thijs van der Vossen wrote: > On vrijdag, okt 24, 2003, at 21:12 Europe/Amsterdam, John J Lee wrote: > > On Fri, 24 Oct 2003, David Fraser wrote: > >> Thijs van der Vossen wrote: > > [...] > >> Actually HTML parsing would be fantastic for testing web applications, > >> so maybe that could be related to the Web API. > > Sorry, but _I_ never said _that_. No, David did, as my quoting indicates. I should have removed the reference to you, though -- sorry. John From t.vandervossen at fngtps.com Sat Oct 25 09:10:02 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Sat Oct 25 09:10:47 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> On zaterdag, okt 25, 2003, at 14:51 Europe/Amsterdam, John J Lee wrote: >> * Asynchronous fetch. When working over the Plucker distiller, which > [...] > > Nice, but not easy. Would it not introduce a lot of new code? 
There > used > to be asynchttp and asyncurl libraries, I think, built on top of > asyncore. > First (obviously) somebody would need to actually put the work in here. > Second, would it be possible to do this without a lot of code > duplication > between the current urllib{2,} / httplib libraries and the new stuff? > Is > it worth it, when you can use threads instead? This is already trivial with the asyncore libraries. If I remember correctly there is a nice example of this in Steve's 'Python Web Programming', but you might also want to take a look at http://python.org/doc/current/lib/asyncore-example.html Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From ianb at colorstudy.com Sat Oct 25 14:25:25 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 14:25:30 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: >> To client-side, I would add that authentication is too hard in >> urllib2, >> and only works for HTTP (for trivially reasons). I think think >> urllib2's subclasses are unnecessarily complicated -- authentication >> handling could be put directly in the HTTP/HTTPS, both basic and >> digest. > > It's a minor issue, but it seems nicer to me to have authentication > separate if it can easily be separate -- that fits in with the general > philosophy of urllib2 that you pick 'n mix the features you want. What > are the trivial reasons for it breaking on non-HTTP auth? There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and though the two concepts are orthogonal they are still tied into each other. Another option would be to take HTTPS out of the class hierarchy, and make SSL a feature of HTTPHandler (and maybe the other handlers too, FTP/SSL does exist after all). 
The AuthHandlers are a little annoying too, you can't just give them a username/password. You have to give them some manager object that can be queried for a password for a username/realm/URL. This is a nice option to have, but in most cases you don't need that kind of generality, and it makes it a lot harder to understand what you need to do. username=x, password=y are very easy to understand. >> Goes together with post/multipart, and I think these shouldn't >> be too hard to add. > > How does this go together with post/multipart? Do you just mean that > you're likely to post the multipart data using urllib2.urlopen? Yes, that's what I mean -- same code involved. > >> There is also some talk about putting urllib2 and urlparse together, >> i.e., have a URL object. The distinction between the urllib, urllib2, >> and urlparse libraries is not very good, e.g., urllib.quote (and >> friends) are more related to urlparse than urllib. A URL object could >> unify all these. > > It's an appealing idea, especially given the cuteness of string > subclassing ;-) > > >> Cookie handling also fits into this, but from the opposite direction >> from a URL object, since we are creating something of a user agent. >> You'd almost want to do: >> >> ua = UserAgent() >> url = web.URL('http://whatever.com') >> content = ua.get(url) >> >> Or something like that. I think an explicit agent is called for, >> separate from the URLs that it may retrieve. But only when you start >> considering cookies and caching. > [...] > > Are you suggesting replacing urllib2, building on top of it, or > extending > it? urllib2's handlers already get a lot of the > 'user-agent' job > done. > What requirements does caching impose that urllib2 doesn't meet? > There's > already a CacheFTPHandler. I think a URL class would probably build on top of urllib2, but would also need some more features. And obviously urllib2 can't go anywhere, so we might as well use it.
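A URL-as-string sketch of the kind being discussed, shown with today's urllib.parse names (these helpers lived in urlparse/urllib in 2003); the class and its properties are invented for illustration:

```python
from urllib.parse import urljoin, urlsplit

class URL(str):
    """The 'cuteness of string subclassing': a str that can take
    itself apart, gathering urlparse-style helpers in one place."""

    @property
    def scheme(self):
        return urlsplit(self).scheme

    @property
    def host(self):
        return urlsplit(self).hostname

    @property
    def path(self):
        return urlsplit(self).path

    def join(self, relative):
        # Relative-reference resolution, as a browser would do it.
        return URL(urljoin(self, relative))

u = URL('http://whatever.com/docs/index.html')
api = u.join('api.html')  # URL('http://whatever.com/docs/api.html')
```

Because it *is* a string, such an object passes unchanged through every API that expects a URL string today, which is what makes the idea backward-compatible with urllib2.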
The caching in CacheFTPHandler is connection caching, not result caching. HTTP has a wide array of ways to indicate caching, check for updates, etc. Enough that it becomes kind of complicated, which is why I don't think that fits well into the idea of a URL object (which should be quite simple, at least from the outside). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sat Oct 25 16:23:29 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 16:23:55 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> References: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> Message-ID: On Sat, 25 Oct 2003, Thijs van der Vossen wrote: [...] > > > * Asynchronous fetch. When working over the Plucker distiller, [...] > > Second, would it be possible to do this without a lot of code > > duplication between the current urllib{2,} / httplib libraries and the > > new stuff? Is it worth it, when you can use threads instead? > > This is already trivial with the asyncore libraries. If I remember [...] So what is this for? http://asynchttp.sourceforge.net/ 28k of Python code isn't exactly 'trivial', is it? John From jjl at pobox.com Sat Oct 25 16:54:06 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 16:54:33 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> References: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: [...] > > It's a minor issue, but it seems nicer to me to have authentication > > separate if it can easily be separate -- that fits in with the general > > philosophy of urllib2 that you pick 'n mix the features you want. What > > are the trivial reasons for it breaking on non-HTTP auth?
> > There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and > though the two concepts are orthogonal they are still tied into each > other. Another option would be to take HTTPS out of the class > hierarchy, and make SSL a feature of HTTPHandler (and maybe the other Well, that would break code. And adding an HTTPSBasicAuthHandler is only five lines or so (even less if you want a class that handles both HTTP and HTTPS). [...] > The AuthHandlers are a little annoying too, you can't just give them a > username/password. You have to give them some manager object that can > be queried for a password for a username/realm/URL. This is a nice > option to have, but in most cases you don't need that kind of > generality, and it makes it a lot harder to understand what you need to > do. username=x, password=y are very easy to understand. That's just a documentation issue, I think -- and possibly adding some convenience method. I wrote some docs for this, and I keep asking for people who seem to be actually using these features to check this documentation bug, but nobody has yet: http://www.python.org/sf/798244 You don't have to provide a password manager object in fact: just let the HTTPBasicAuthHandler create one for you, and use the add_password method (which admittedly does require realm and uri as well as username / password -- perhaps None should act as a wildcard there?). > >> Cookie handling also fits into this, but from the opposite direction > >> from a URL object, since we are creating something of a user agent. > >> You'd almost want to do: > >> > >> ua = UserAgent() > >> url = web.URL('http://whatever.com') > >> content = ua.get(url) > >> > >> Or something like that. I think an explicit agent is called for, > >> separate from the URLs that it may retrieve. But only when you start > >> considering cookies and caching. > > [...] > > > > Are you suggesting replacing urllib2, building on top of it, or > > extending it? 
urllib2's handlers already get a lot of the > > 'user-agent' job done. What requirements does caching impose that > > urllib2 doesn't meet? There's > > already a CacheFTPHandler. > > I think a URL class would probably build on top of urllib2, but > would also need some more features. And obviously urllib2 can't go > anywhere, so we might as well use it. OK. Does this URL class proposal fit with that path module PEP, do you think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) before, but I've forgotten everything about it :-) > The caching in CacheFTPHandler is connection caching, not result OK. > caching. HTTP has a wide array of ways to indicate caching, check for > updates, etc. Enough that it becomes kind of complicated, which is why > I don't think that fits well into the idea of a URL object (which > should be quite simple, at least from the outside). That doesn't answer my question. To repeat: What requirements does caching impose that *urllib2* doesn't meet? And why do we need a new UserAgent class when we already have urllib2 and its handlers? John From jjl at pobox.com Sat Oct 25 17:05:18 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 17:07:05 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F9A49B6.4050709@metamoof.net> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> <3F9A49B6.4050709@metamoof.net> Message-ID: On Sat, 25 Oct 2003, Moof wrote: > John J Lee wrote: [...] > >> Speaking of validation, a sort of standard form validation library would [...] > > I guess that would look similar to ClientForm? If not, what? > > > Nonono, though that could be useful for some sort of automated testing, [...] > programmer. I want a nice standard way of saying field 'fromDay' in this > dictionary should be an integer between 1 and 31. That 'fromMonth' [...] Oh, I see -- I was fooled by the close proximity of the discussion of HTML parsing.
I agree that would be useful. John From ianb at colorstudy.com Sat Oct 25 17:41:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 17:42:11 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> On Saturday, October 25, 2003, at 03:54 PM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: > >> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: > [...] >>> It's a minor issue, but it seems nicer to me to have authentication >>> separate if it can easily be separate -- that fits in with the >>> general >>> philosophy of urllib2 that you pick 'n mix the features you want. >>> What >>> are the trivial reasons for it breaking on non-HTTP auth? >> >> There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and >> though the two concepts are orthogonal they are still tied into each >> other. Another option would be to take HTTPS out of the class >> hierarchy, and make SSL a feature of HTTPHandler (and maybe the other > > Well, that would break code. And adding an HTTPSBasicAuthHandler is > only > five lines or so (even less if you want a class that handles both HTTP > and > HTTPS). All the handlers start getting in the way. If we added authentication support to HTTPHandler, the other classes could still be left in there. Authentication is part of HTTP, after all -- and the distinction between basic and digest auth doesn't seem necessary (implemented differently, but you shouldn't need to know which one you're going to need). It seems like HTTPHandler could do what HTTPBasicAuthHandler (and DigestAuthHandler) do if it is given a password manager. And that it could even create a password manager if it was given a username and password, or not, but then the password manager should accept a username and password in __init__ so that you don't have to do multiple sets to set that up.
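The convenience being asked for fits in one small class. A hypothetical sketch against today's urllib.request (HTTPPasswordMgr is the real stdlib class; SimplePasswordMgr is invented here):

```python
import urllib.request

class SimplePasswordMgr(urllib.request.HTTPPasswordMgr):
    """Hypothetical convenience manager: takes username/password in
    __init__ and answers every realm/URI query with that one pair."""

    def __init__(self, username, password):
        urllib.request.HTTPPasswordMgr.__init__(self)
        self._cred = (username, password)

    def find_user_password(self, realm, authuri):
        # No realm/URI bookkeeping: always hand back the one credential.
        return self._cred

# An auth handler then needs no separate add_password() calls:
# urllib.request.HTTPBasicAuthHandler(SimplePasswordMgr('x', 'y'))
```

This keeps the general manager-object protocol intact while giving the common one-off-script case the username=x, password=y simplicity described above.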
In general, I just don't feel like there needs to be quite so many handlers in urllib2. One featureful HTTP implementation would be easier to work with (and, I think, easier to extend). > [...] >> The AuthHandlers are a little annoying too, you can't just give them a >> username/password. You have to give them some manager object that can >> be queried for a password for a username/realm/URL. This is a nice >> option to have, but in most cases you don't need that kind of >> generality, and it makes it a lot harder to understand what you need >> to >> do. username=x, password=y are very easy to understand. > > That's just a documentation issue, I think -- and possibly adding some > convenience method. I wrote some docs for this, and I keep asking for > people who seem to be actually using these features to check this > documentation bug, but nobody has yet: > > http://www.python.org/sf/798244 > > > You don't have to provide a password manager object in fact: just let > the > HTTPBasicAuthHandler create one for you, and use the add_password > method > (which admittedly does require realm and uri as well as username / > password -- perhaps None should act as a wildcard there?). Yes, a wildcard could definitely be good. This is particularly important with scripts, i.e., one-off programs where you just want to grab something from a URL. >>>> Cookie handling also fits into this, but from the opposite direction >>>> from a URL object, since we are creating something of a user agent. >>>> You'd almost want to do: >>>> >>>> ua = UserAgent() >>>> url = web.URL('http://whatever.com') >>>> content = ua.get(url) >>>> >>>> Or something like that. I think an explicit agent is called for, >>>> separate from the URLs that it may retrieve. But only when you >>>> start >>>> considering cookies and caching. >>> [...] >>> >>> Are you suggesting replacing urllib2, building on top of it, or >>> extending it? urllib2's handlers already get a lot of the >>> 'user-agent' job done.
What requirements does caching impose that >>> urllib2 doesn't meet? There's >>> already a CacheFTPHandler. >> >> I think a URL class would probably build on top of urllib2, but >> would also need some more features. And obviously urllib2 can't go >> anywhere, so we might as well use it. > > OK. Does this URL class proposal fit with that path module PEP, do you > think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) > before, > but I've forgotten everything about it :-) No, there's no PEP, for this or for a filesystem path object. These were the links from the other email: http://www.jorendorff.com/articles/python/path/ http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org >> The caching in CacheFTPHandler is connection caching, not result > > OK. > > >> caching. HTTP has a wide array of ways to indicate caching, check for >> updates, etc. Enough that it becomes kind of complicated, which is >> why >> I don't think that fits well into the idea of a URL object (which >> should be quite simple, at least from the outside). > > That doesn't answer my question. To repeat: What requirements does > caching impose that *urllib2* doesn't meet? And why do we need a new > UserAgent class when we already have urllib2 and its handlers? All the normal HTTP caching, like If-Modified-Since and ETags. If you handle this, you have to store the retrieved results, handle the metadata for those results, and provide control (where to put the cache, when and how to expire it, what items are in the cache, flush the cache, maybe a memory cache, etc). That could be done in a handler, but it feels like a separate object to me (an object which might still go in urllib2). But looking back on what Bill was asking for, I think he was thinking more along the lines of connection caching, like CacheFTPHandler, and that would probably go in a handler.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sat Oct 25 20:12:09 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 20:12:33 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> References: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: [...] > In general, I just don't feel like there needs to be quite so many > handlers in urllib2. One featureful HTTP implementation would be > easier to work with (and, I think, easier to extend). Well, that was a large part of the purpose of urllib2 -- to let you choose what 'clever' stuff it does. If you don't want something, you just don't use that handler. More importantly, if you want to do something slightly differently, you supply your own handler. If you shift stuff from an auth handler into the HTTP{S,}Handler, anybody out there who's written their own auth handler will have their auth code suddenly stop being invoked by urllib2. Whatever special authorization they were doing (maybe just reading from a database, maybe fixing a bug, real or imagined, in urllib2) will stop happening, and their code will probably break. Anyway, it may or may not be the perfect system, but I'm not convinced it needs changing. Can you give a specific example of where having lots of handlers becomes oppressive? [...about inconvenience of having to provide realm and URI for auth...] > Yes, a wildcard could definitely be good. This is particularly > important with scripts, i.e., one-off programs where you just want to > grab something from a URL. OK. Do we have a document where we're recording these proposals? Is there a wiki? [...] > > OK. Does this URL class proposal fit with that path module PEP, do you > > think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) 
> > before, but I've forgotten everything about it :-)
>
> No, there's no PEP, for this or for a filesystem path object. These
> were the links from the other email:
>
> http://www.jorendorff.com/articles/python/path/
>
> http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org

Thanks. Again, is there somewhere to record this URL class idea and
the fact that this path module is related?

[...]
> > That doesn't answer my question. To repeat: What requirements does
> > caching impose that *urllib2* doesn't meet? And why do we need a
> > new UserAgent class when we already have urllib2 and its handlers?
>
> All the normal HTTP caching, like If-Modified-Since and ETags. If you
> handle this, you have to store the retrieved results, handle the
> metadata for those results, and provide control (where to put the
> cache, when and how to expire it, what items are in the cache, flush
> the cache, maybe a memory cache, etc). That could be done in a
> handler, but it feels like a separate object to me (an object which
> might still go in urllib2).

So, merely because you think "it feels like a new object", you're
proposing to create a whole new layer of complexity for users to
learn? Why should people have to learn a new API just to get caching?
If somebody had implemented HTTP caching and found the handler
mechanism lacking, or had a specific argument that showed it to be so,
a new layer *might* be justified. Otherwise, I think it's a bad idea.

> But looking back on what Bill was asking for, I think he was thinking
> more along the lines of connection caching, like CacheFTPHandler, and
> that would probably go in a handler.

Yep.

John

From jjl at pobox.com Sat Oct 25 20:18:23 2003
From: jjl at pobox.com (John J Lee)
Date: Sat Oct 25 20:18:44 2003
Subject: [Web-SIG] Threading and client-side support
Message-ID: 

First, I should state that I'm almost entirely ignorant of all things
threads. Be gentle with me.
What is the current state of thread-safety in the Python standard library client-side web code (ie. httplib, urllib, urllib2)? I ask because my cookies code is currently entirely thread-ignorant, and I'm wondering if it should have appropriate thread synchronization -- and if so, what problems I'm supposed to be preventing, and how to prevent them. John From ianb at colorstudy.com Sat Oct 25 21:00:39 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 21:01:24 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: > [...] >> In general, I just don't feel like there needs to be quite so many >> handlers in urllib2. One featureful HTTP implementation would be >> easier to work with (and, I think, easier to extend). > > Well, that was a large part of the purpose of urllib2 -- to let you > choose > what 'clever' stuff it does. If you don't want something, you just > don't > use that handler. More importantly, if you want to do something > slightly > differently, you supply your own handler. > > If you shift stuff from an auth handler into the HTTP{S,}Handler, > anybody > out there who's written their own auth handler will have their auth > code > suddenly stop being invoked by urllib2. Whatever special authorization > they were doing (maybe just reading from a database, maybe fixing a > bug, > real or imagined, in urllib2) will stop happening, and their code will > probably break. a) There's not a lot of different ways to deal with a 401 response. Is there something that's not covered by basic and digest authentication? b) Accessing a database should happen in the password manager, not the handler. The handler handles the protocol, the database is not tied to the protocol. 
I'm not proposing that the password manager go away (though it would
be nice if it was hidden for simple usage)

c) This doesn't have to affect backward compatibility anyway. We can
leave HTTPBasicAuthHandler in there (deprecated), but also fold its
functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require
that HTTPHandler *not* handle authentication.

> Anyway, it may or may not be the perfect system, but I'm not
> convinced it needs changing. Can you give a specific example of where
> having lots of handlers becomes oppressive?

The documentation is certainly a problem (e.g., the
HTTPBasicAuthHandler page), though it could be organized differently
without changing the code. It's definitely ravioli code
(http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO
it's hard to document ravioli code well. (It's not so important how
things are structured internally, but currently urllib2 also exposes
that complex class structure)

Also urlopen is not really extensible. You can't tell urlopen to use
authentication information (and it doesn't obey browser URL
conventions, like http://user:password@domain/). And we want to add
structured POST data to that method (but also allow non-structured
data), and cookies, and it might be nice to set the user-agent, and
maybe other things that I haven't thought of. If urlopen doesn't
support these extra features then programmers have to learn a new API
as their program becomes more complex. Yet none of these features
would be all that difficult to add via urlopen or perhaps other simple
functions (instead of via classes). I don't think there's any need for
classes in the external API -- fetching URLs is about doing things,
not representing things, and functions are easier to understand for
doing.

> [...about inconvenience of having to provide realm and URI for
> auth...]
>> Yes, a wildcard could definitely be good.
This is particularly >> important with scripts, i.e., one-off programs where you just want to >> grab something from a URL. > > OK. Do we have a document where we're recording these proposals? Is > there a wiki? No, we don't have anything. Should we use the main Python Wiki? Something else? Opinions? [...] >>> That doesn't answer my question. To repeat: What requirements does >>> caching impose that *urllib2* doesn't meet? And why do we need a new >>> UserAgent class when we already have urllib2 and its handlers? >> >> All the normal HTTP caching, like If-Modified-Since and E-Tags. If >> you >> handle this, you have to store the retrieved results, handle the >> metadata for those results, and provide control (where to put the >> cache, when and how to expire it, what items are in the cache, flush >> the cache, maybe a memory cache, etc). That could be done in a >> handler, but it feels like a separate object to me (an object which >> might still go in urllib2). > > So, merely because you think "it feels like a new object", you're > proposing to create a whole new layer of complexity for users to learn? > Why should people have to learn a new API just to get caching? If > somebody had implemented HTTP caching and found the handler mechanism > lacking, or had a specific argument that showed it to be so, a new > layer > *might* be justified. Otherwise, I think it's a bad idea. I think fetching and caching are two separate things. The caching requires a context. The fetching doesn't. I think fetching things should be simplified, with an API that's not very object-oriented. Since a cache is persistent it has to have a persistent representation, so it needs to be some sort of object. I also don't see how caching would fit very well into the handler structure. Maybe there'd be a HTTPCachingHandler, and you'd instantiate it with your caching policy? 
(where it stores files, how many files, etc) Also a HTTPBasicAuthCachingHandler, HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This caching is orthogonal -- not just to things like authentication, but even to HTTP (to some degree). The handler structure doesn't allow orthogonal features. Except through mixins, but don't get me started on mixins... Using a separate class, not related to Handlers, isn't more complex. Either way we have to provide the same features and the same options, and document all of those. No matter which way you cut it, it's new stuff, it's another layer. Implementing it in a new class is just calling it what it is. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sun Oct 26 08:24:39 2003 From: jjl at pobox.com (John J Lee) Date: Sun Oct 26 08:24:47 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: References: Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote: [...] > a) There's not a lot of different ways to deal with a 401 response. Is > there something that's not covered by basic and digest authentication? You may have a point. > b) Accessing a database should happen in the password manager, not the > handler. The handler handles the protocol, the database is not tied to > the protocol. I'm not proposing that the password manager go away > (though it would be nice if it was hidden for simple usage) OK, and another one. :-) > c) This doesn't have to effect backward compatibility anyway. We can > leave HTTPBasicAuthHandler in there (deprecated), but also fold it's > functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require > that HTTPHandler *not* handle authentication. Well, it does if you do something important in your auth handler that never gets called because HTTPHandler has decided it knows best when it comes to 40x. 
But like you say, there's probably not much important that you could
do since password management is already abstracted out. I *still*
don't see why you're complaining about the current state of affairs,
though.

> > Anyway, it may or may not be the perfect system, but I'm not
> > convinced it needs changing. Can you give a specific example of
> > where having lots of handlers becomes oppressive?
>
> The documentation is certainly a problem (e.g., the
> HTTPBasicAuthHandler page), though it could be organized differently
> without changing the code. It's definitely ravioli code
> (http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO
> it's hard to document ravioli code well. (It's not so important how
> things are structured internally, but currently urllib2 also exposes
> that complex class structure)

It's pretty simple conceptually: OpenerDirector asks all the handlers
if they want to handle, not handle, or abort a response. It does the
same for errors. Most of the handlers' functions are self-explanatory
from their class names (OK, I guessed CacheFTPHandler wrong, but it
was 50-50 :-). I wouldn't call that ravioli.

I'm still waiting for that example.

> Also urlopen is not really extensible. You can't tell urlopen to use

Not directly, no. You have to do it via build_opener, or via
OpenerDirector itself (or another class). That's probably not ideal:
what did you have in mind instead?

> authentication information (and it doesn't obey browser URL
> conventions, like http://user:password@domain/).

What is that convention? Is it standardised in an RFC? I see
ProxyHandler knows about that syntax. Obviously it's not an intrinsic
limitation of the handler system.

> And we want to add
> structured POST data to that method (but also allow non-structured

We do? Why not just have a function (to make file upload data,
assuming that's what you're thinking of)?
> data), and cookies, and it might be nice to set the user-agent, and
> maybe other things that I haven't thought of. If urlopen doesn't
> support these extra features then programmers have to learn a new API
> as their program becomes more complex.

Well, I can do those things already (cookies, set user-agent) using
urllib2. User-Agent is a bit ugly, I'll grant you, but I don't lose
sleep over it. I did find an extension (backwards-compatible, I hope &
believe) that made things much cleaner -- see the RFE I mentioned
earlier. But no need for a whole new layer.

Mind you, if your idea can do the same job as my RFE, then it should
certainly be considered alongside that.

> Yet none of these features
> would be all that difficult to add via urlopen or perhaps other
> simple functions (instead of via classes). I don't think there's any
> need for classes in the external API -- fetching URLs is about doing
> things, not representing things, and functions are easier to
> understand for doing.

Details? The only example you've given so far involved a UserAgent
class.

[...]
> > So, merely because you think "it feels like a new object", you're
> > proposing to create a whole new layer of complexity for users to
> > learn? Why should people have to learn a new API just to get
> > caching? If somebody had implemented HTTP caching and found the
> > handler mechanism lacking, or had a specific argument that showed
> > it to be so, a new layer *might* be justified. Otherwise, I think
> > it's a bad idea.
>
> I think fetching and caching are two separate things. The caching
> requires a context. The fetching doesn't. I think fetching things

The context is provided by the handler.

[...]
> I also don't see how caching would fit very well into the handler
> structure. Maybe there'd be a HTTPCachingHandler, and you'd
> instantiate it with your caching policy?
> (where it stores files, how many files, etc) Also a
> HTTPBasicAuthCachingHandler, HTTPDigestAuthCachingHandler,
> HTTPSCachingHandler, and so on? This caching is orthogonal -- not
> just to things like authentication, but

My assumption was that it wasn't orthogonal, since RFC 2616 seems to
have rather a lot to say on the subject.

If it *is* (or part of it is) orthogonal, three options come to mind.
Let's say you have a cache class.

1. All the normal handlers know about the cache class, but have
   caching off by default.

2. Write a CacheHandler with a default_open. If there's a cache hit,
   return it, otherwise return None (let somebody else try to handle
   it).

3. Subclass (or replace without bothering to subclass) OpenerDirector.
   I guess open is probably what you'd want to change, but I don't
   know about HTTP and other protocols' caching rules.

I haven't thought it through so I certainly don't claim to know how
any of these will turn out (though I'd guess 2. would do the job of
any caching that's orthogonal to the various protocol schemes). If you
want to justify a new layer, though, it's up to you to show caching
*doesn't* fit urllib2 as-is. YAGNI.

> even to HTTP (to some degree). The handler structure doesn't allow
> orthogonal features. Except through mixins, but don't get me started
> on mixins...

I don't think that's true -- see above.

Again, my 'processors' patch is relevant here (see that RFE). But no
point in re-iterating here the long discussion I posted on the SF bug
tracker.

> Using a separate class, not related to Handlers, isn't more complex.
> Either way we have to provide the same features and the same options,
> and document all of those.

I think it would be fruitless to comment on this until you put forward
some details.

> No matter which way you cut it, it's new
> stuff, it's another layer. Implementing it in a new class is just
> calling it what it is.

Well, um, no. Having a new layer is different to not having a new
layer.
Otherwise, what was this little discussion of ours all about??

Another thing I think we shouldn't forget is that nobody has actually
said they're going to write any caching code yet! Are you? Do you have
any other requirements driving the need for this new layer, or is it
all down to caching?

John

From t.vandervossen at fngtps.com Sun Oct 26 14:50:11 2003
From: t.vandervossen at fngtps.com (Thijs van der Vossen)
Date: Sun Oct 26 14:50:41 2003
Subject: [Web-SIG] So what's missing?
In-Reply-To: 
References: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com>
Message-ID: <3F9C2573.8070207@fngtps.com>

John J Lee wrote:
> On Sat, 25 Oct 2003, Thijs van der Vossen wrote:
> [...]
>>>> * Asynchronous fetch. When working over the Plucker distiller,
> [...]
>>> Second, would it be possible to do this without a lot of code
>>> duplication between the current urllib{2,} / httplib libraries and
>>> the new stuff? Is it worth it, when you can use threads instead?
>>
>> This is already trivial with the asyncore libraries. If I remember
> [...]
>
> So what is this for?
>
> http://asynchttp.sourceforge.net/

From this page: "Our goal is to provide the functionality of the
excellent 'httplib' module without using blocking sockets."

> 28k of Python code isn't exactly 'trivial', is it?

Nope, but it's relatively trivial to use the asyncore libraries to
asynchronously get multiple pages (once again, there is a nice example
in Steve's book). Providing exactly the same functionality as httplib
will obviously be more work.

Regards,
Thijs
> >>>Second, would it be possible to do this without a lot of code > >>>duplication between the current urllib{2,} / httplib libraries and the > >>>new stuff? Is it worth it, when you can use threads instead? > >> > >>This is already trivial with the asyncore libraries. If I remember [...] > > So what is this for? > > > > http://asynchttp.sourceforge.net/ [...] > > 28k of Python code isn't exactly 'trivial', is it? > > Nope, but it's relatively trivial to use the asyncore libraries to > asynchronous get multiple pages (once again, there is a nice example in > Steve's book). Providing exactly the same functionality as httplib will > obviously be more work. Bill said he wanted a 'higher-powered HTTP client library', by which I assume he meant something more than sub-httplib. John From ianb at colorstudy.com Sun Oct 26 17:39:45 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sun Oct 26 17:40:03 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: >> c) This doesn't have to effect backward compatibility anyway. We can >> leave HTTPBasicAuthHandler in there (deprecated), but also fold it's >> functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require >> that HTTPHandler *not* handle authentication. > > Well, it does if you do something important in your auth handler that > never gets called because HTTPHandler has decided it knows best when it > comes to 40x. But like you say, there's probably not much important > that > you could do since password management is already abstracted out. Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into HTTPHandler. You could still override it, and HTTPBasicAuthHandler would still override it (and somewhat differently, because HTTPHandler.http_error_401 should handle both basic and digest auth). It's a pretty small change, really. 
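[Editor's sketch: whether http_error_401 lives in HTTPHandler or in a
separate auth handler, the work done on the retry is the same. The
helper below is hypothetical, not urllib2 API; it only builds the
credential header a basic-auth 401 handler would attach. The test
values are the canonical example from RFC 2617. Digest auth would
build a different, challenge-dependent value here.]

```python
import base64

def basic_auth_header(user, password):
    """The Authorization value a 401 handler attaches before retrying
    the request (hypothetical helper, not part of urllib2)."""
    creds = "%s:%s" % (user, password)
    encoded = base64.b64encode(creds.encode("latin-1")).decode("ascii")
    return "Basic " + encoded

# the canonical RFC 2617 example credentials
header = basic_auth_header("Aladdin", "open sesame")
```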
>>> Anyway, it may or may not be the perfect system, but I'm not >>> convinced >>> it needs changing. Can you give a specific example of where having >>> lots >>> of handlers becomes oppressive? >> >> The documentation is certainly a problem (e.g., the >> HTTPBasicAuthHandler page), though it could be organized differently >> without changing the code. It's definitely ravioli code >> (http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO >> it's hard to document ravioli code well. (It's not so important how >> things are structured internally, but currently urllib2 also exposes >> that complex class structure) > > It's pretty simple conceptually: OpenerDirector asks all the handlers > if > they want to handle, not handle, or abort a response. It does the same > for errors. Most of the handlers' functions are self-explanatory from > their class names (OK, I guessed CacheFTPHandler wrong, but it was > 50-50 > :-). I wouldn't call that ravioli. It might work conceptually internally, and probably big internal changes aren't necessary. But it doesn't work conceptually for the programmer that has a task-oriented desire. The programmer starting to use urllib2 doesn't want to understand a framework of handlers, they want to get something off the net. urlopen() is the only easy way to do that in urllib2, everything else requires a lot more thinking. And urlopen() isn't very featureful. > I'm still waiting for that example. I thought I gave examples: documentation, proliferation of classes, non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal to authentication). > >> Also urlopen is not really extensible. You can't tell urlopen to use > > Not directly, no. You have to do it via build_opener, or via > OpenerDirector itself (or another class. That's probably not ideal: > what > did you have in mind instead? Maybe keyword arguments that get passed to the handlers. 
E.g.:

urlopen('http://whatever.com',
        username='bob', password='secret',
        postFields={...},
        postFiles={'image': ('test.jpg', '... image body ...')},
        addHeaders={'User-Agent': 'superbot 3000'})

It could get a little out of hand with all the protocols and all the
features, but I can't think of a better way to do it. And I think the
features would still be easier to document even when urlopen() took
all sorts of funny options, than they are when there's separate
handlers. But maybe urllib2 just needs better documentation with
useful examples; that signature is pretty hairy. But it's still easier
to read and write than any OO-based system. I'm concerned about the
external ease of use, not the internal conceptual integrity.

>> authentication information (and it doesn't obey browser URL
>> conventions, like http://user:password@domain/).
>
> What is that convention? Is it standardised in an RFC?

It's a URL convention that's been around a very long time, I don't
know if it is in an RFC.

> I see
> ProxyHandler knows about that syntax. Obviously it's not an intrinsic
> limitation of the handler system.

I don't really know how a handler is chosen -- can it figure out
whether it should use HTTPHandler, HTTPBasicAuthHandler, or
HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest
can't be determined until you try to fetch the object.

>> And we want to add
>> structured POST data to that method (but also allow non-structured
>
> We do? Why not just have a function (to make file upload data,
> assuming that's what you're thinking of)?

That would work too.
> I did find an extension (backwards-compatible, I hope & believe)
> made things much cleaner -- see the RFE I mentioned earlier. But no
> need for a whole new layer.
>
> Mind you, if your idea can do the same job as my RFE, then it should
> certainly be considered alongside that.

Hmm... I just looked at the RFE now, so I'm still not sure what it
would mean to this.

>> Yet none of these features
>> would be all that difficult to add via urlopen or perhaps other
>> simple functions (instead of via classes). I don't think there's
>> any need for classes in the external API -- fetching URLs is about
>> doing things, not representing things, and functions are easier to
>> understand for doing.
>
> Details? The only example you've given so far involved a UserAgent
> class.

Details about what? You're asking for details and examples, but I've
provided some already and I don't know what you're looking for.
Example of what? I don't have an implementation, or any set
implementation in mind, and I haven't suggested that.

> [...]
>>> So, merely because you think "it feels like a new object", you're
>>> proposing to create a whole new layer of complexity for users to
>>> learn? Why should people have to learn a new API just to get
>>> caching? If somebody had implemented HTTP caching and found the
>>> handler mechanism lacking, or had a specific argument that showed
>>> it to be so, a new layer *might* be justified. Otherwise, I think
>>> it's a bad idea.
>>
>> I think fetching and caching are two separate things. The caching
>> requires a context. The fetching doesn't. I think fetching things
>
> The context is provided by the handler.

But we're fetching URLs, not handlers. The URL is context-less,
intrinsically. The handler isn't context-less, but that's part of what
I don't like about urllib2's handler-oriented perspective.

> [...]
>> I also don't see how caching would fit very well into the handler
>> structure.
Maybe there'd be a HTTPCachingHandler, and you'd >> instantiate it with your caching policy? (where it stores files, how >> many files, etc) Also a HTTPBasicAuthCachingHandler, >> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This >> caching is orthogonal -- not just to things like authentication, but > > My assumption was that it wasn't orthogonal, since RFC 2616 seems to > have > rather a lot to say on the subject. Well, if they aren't orthogonal, then they should all be implemented in a single class. Implementing features in subclasses means that they can't be easily used in combination. Why not have just one good HTTP handler class? It's all one protocol (and HTTPS is exactly the same protocol). Many parts of the caching mechanics aren't part of RFC 2616 -- specifically persistence, metadata storage and querying, and cache control. These aren't part of HTTP at all. > If it *is* (or part of it is) orthogonal, three options come to mind. > Let's say you have a cache class. > > 1. All the normal handlers know about the cache class, but have caching > off by default. > > 2. Write a CacheHandler with a default_open. If there's a cache hit, > return it, otherwise return None (let somebody else try to handle > it). > > 3. Subclass (or replace without bothering to subclassing) > OpenerDirector. > I guess open is probably what you'd want to change, but I don't know > about HTTP and other protocols' caching rules. > > I haven't thought it through so I certainly don't claim to know how > any of > these will turn out (though I'd guess 2. would do the job of any > caching > that's orthogonal to the various protocol schemes). If you want to > justify a new layer, though, it's up to you to show caching *doesn't* > fit > urllib2 as-is. YAGNI. 1 seems like a lot of trouble. 2 won't work, since CacheHandler can't return None and let someone else do the work, because it has to know about what the result is so that it can cache the result. 
It would have to be 3, since it's really about intercepting handler
calls. I would imagine that it should wrap OpenerDirector, and perhaps
subclass it as well. Then protocols can be added to the caching and
non-caching directors at the same time. But it seems like there can be
only one OpenerDirector... that messes things up. Multiple caches with
different policies should be possible. Which leads us back to a
separate class that handles caching.

>> even to HTTP (to some degree). The handler structure doesn't allow
>> orthogonal features. Except through mixins, but don't get me started
>> on mixins...
>
> I don't think that's true -- see above.
>
> Again, my 'processors' patch is relevant here (see that RFE). But no
> point in re-iterating here the long discussion I posted on the SF bug
> tracker.

I missed that when you posted it. That might handle some of these
features. It seems a little too global to me. For instance, how would
you handle two distinct user agents with respect to the referer
header? Seems like it would also make sense as an OpenerDirector
subclass/wrapper. At least portions of it are similar to doing caching
(like cookies and referers), which is to say a request that is made in
a specific context.

One example of an application that would require separate contexts
would be when testing concurrency in a web application -- you want to
simulate multiple users logging in and performing actions
concurrently. You can't do this if the context is stored globally.

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com Sun Oct 26 17:51:45 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Oct 26 17:52:12 2003
Subject: [Web-SIG] Threading and client-side support
In-Reply-To: 
Message-ID: 

On Saturday, October 25, 2003, at 07:18 PM, John J Lee wrote:
> First, I should state that I'm almost entirely ignorant of all things
> threads. Be gentle with me.
> What is the current state of thread-safety in the Python standard
> library client-side web code (ie. httplib, urllib, urllib2)?

As far as I know they are threadsafe.

> I ask because my cookies code is currently entirely thread-ignorant,
> and I'm wondering if it should have appropriate thread
> synchronization -- and if so, what problems I'm supposed to be
> preventing, and how to prevent them.

It's all about concurrent access. For instance, looking at
ClientCookie, the question would be what would happen when
ClientCookie.urlopen was called while another ClientCookie.urlopen was
running. For instance, in ClientCookie._urllib2_support.urlopen,
build_opener() can be called twice. If this is a problem then the code
isn't threadsafe (i.e., if build_opener() isn't threadsafe then
urlopen isn't threadsafe). urlopen() can protect build_opener() with a
lock, like:

urlopen_lock = threading.Lock()

def urlopen(url, data=None):
    global _opener
    if _opener is None:
        urlopen_lock.acquire()
        try:
            if _opener is None:
                # it might not be None, because we might have called
                # build_opener() sometime between the first if and
                # acquiring the lock...
                _opener = build_opener()
        finally:
            urlopen_lock.release()
    return _opener.open(url, data)

There's a little more complexity there so that you don't have to
acquire the lock every time you call urlopen(). _opener.open() still
has to be threadsafe at this point (and you'll definitely want it to
be threadsafe, so requests don't have to be done serially). Where you
have to do this sort of thing depends on what parts of the system are
exposed so that they can be used concurrently.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Mon Oct 27 00:30:27 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 00:31:15 2003 Subject: [Web-SIG] http headers (was: Defining a standard interface for common web tasks) In-Reply-To: ; from jjl@pobox.com on Sat, Oct 25, 2003 at 01:16:52PM +0100 References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> <20031024140644.H15765@lyra.org> Message-ID: <20031026213027.A24764@lyra.org> On Sat, Oct 25, 2003 at 01:16:52PM +0100, John J Lee wrote: >... > > i.e. concatenate with a comma. While it is allowed, there is *generally* > > no reason for the API to enable writing separate headers, nor a reason to > > expose same-named headers as separate (i.e. just concatenate them > > internally). > > > > Note that I say "generally" because I've seen a client that could not deal > > properly with a long header value. By separating the tokens in the header > [...] > > Another thing that breaks this is the Cookie header: cookie values may > contain commas (and they do!). Of course, this may not be relevant here, > since Python programmers aren't going to be so silly as to put commas in > their cookie values :-) Yup. Good point. The WWW-Authenticate header has ambiguity here, too, although most of those issues have been sorted. With a bit of work, you can usually tease apart the header into the various challenges the server is offering up. 
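[Editor's sketch: Greg's folding rule as a small helper, hypothetical
and not from any stdlib module -- duplicate fields are comma-joined as
RFC 2616 section 4.2 permits, except cookie headers, which are kept as
separate entries because their values may themselves contain commas.]

```python
NO_FOLD = {"Cookie", "Set-Cookie"}

def fold_headers(pairs):
    """Collapse repeated header fields into one comma-joined value,
    except cookie headers, where comma-joining would be ambiguous."""
    folded = {}
    for name, value in pairs:
        key = name.title()  # normalize case: 'accept' -> 'Accept'
        if key in NO_FOLD:
            # keep cookie values apart rather than comma-joining
            folded.setdefault(key, []).append(value)
        elif key in folded:
            folded[key] += ", " + value
        else:
            folded[key] = value
    return folded

folded = fold_headers([
    ("Accept", "text/html"),
    ("accept", "text/plain"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2, c=3"),
])
```

WWW-Authenticate would need the same list treatment, since its
challenge syntax also embeds commas.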
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Oct 27 00:47:41 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 00:47:46 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com>; from janssen@parc.com on Fri, Oct 24, 2003 at 05:51:41PM -0700 References: <3F99B1ED.1090802@bath.ac.uk> <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031026214741.B24764@lyra.org> On Fri, Oct 24, 2003 at 05:51:41PM -0700, Bill Janssen wrote: > > I'm a huge fan of being able to distinguish between that data from a > > query string (GET data) and data that has been POSTed. I posted my > > reasons for caring about this to the Quixote mailing list a few days > > ago, but I'll repeat them here through the magic of copy and paste: > > [...list of reasons you want to know the HTTP command omitted...] > > The way to differentiate them (if you care) is to look at the > "command" attribute of the request object, IMO. That would tell you Actually, it is called the "method" rather than "command". See section 9 of RFC 2616. Cheers, -g -- Greg Stein, http://www.lyra.org/ From davidf at sjsoft.com Mon Oct 27 02:41:16 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:41:35 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <3F9CCC1C.7010400@sjsoft.com> Steve Holden wrote: >>-----Original Message----- >>From: web-sig-bounces+sholden=holdenweb.com@python.org >>[mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of >>David Fraser >>Sent: Friday, October 24, 2003 2:01 PM >>To: Ian Bicking >>Cc: web-sig@python.org >>Subject: Re: [Web-SIG] Form field dictionaries >> >> >>Ian Bicking wrote: >> >> >> >>>On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: >>> >>> >>> >>>>That's fine, but I think it's important that these methods are >>>>available as an addition to a standard dictionary interface. 
>>>>I think the key point is, if somebody wants a list of values, they >>>>probably know that they want a list. >>>>It's very difficult to write code by accident that would handle a >>>>list of values as well as a string. >>>>So if somebody knows they want a list in certain >>>> >>>> >>circumstances, they >> >> >>>>could call getlist() >>>>But I think the default dictionary return value should be >>>> >>>> >>the same as >> >> >>>>getfirst(). >>>>That saves endless checks for lists for those who don't need them. >>>> >>>> >>>Every time I have encountered an unexpected list it has >>> >>> >>been because >> >> >>>of a bug somewhere else in my code. I might use a getone() method >>>that threw some exception when a list was encountered, but >>> >>> >>I'd *never* >> >> >>>want to use getfirst(). getfirst() is sloppy programming. >>> >>> >>(getlist() >> >> >>>is perfectly fine though) >>> >>> >>There seems to be a lot of agreement on this... >>So let's take it that the interface will be a dictionary, >>with an extra >>method defined, getlist, which will return multiple items if multiple >>items were defined, or a list containing a single item otherwise. >>The next question is, how do we handle the Get/Post/Both situation? >>One way would be to have methods on the request object that >>return the >>desired dictionary >>Somebody also suggested including Cookies, as is done in PHP >>- I'm not >>sure this is a good idea >> >> >> >The only nit I would pick is to have getlist() return a list even when >the response contained a single value. 
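A minimal sketch of that interface (class name and internal storage are illustrative only, not a settled API): plain indexing returns the single value, getlist() always returns a list, empty if the key is absent:

```python
class FieldMapping(dict):
    # Illustrative sketch: values are stored internally as lists of
    # strings, one entry per occurrence of the field in the request.
    def __getitem__(self, key):
        values = dict.__getitem__(self, key)  # KeyError if key is absent
        return values[0]

    def getlist(self, key):
        # Always a list: possibly empty, possibly of length one.
        return dict.get(self, key, [])

# e.g. a form submitted with one "name" field and two "tag" fields:
fields = FieldMapping({"name": ["simon"], "tag": ["a", "b"]})
```

Whether plain access should instead raise when a field has multiple values is a separate question.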
> > Sure, that's what I meant above, sorry if it wasn't clear David From davidf at sjsoft.com Mon Oct 27 02:42:34 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:42:39 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024141819.R70244@onyx.ispol.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> <3F9968BE.1010009@sjsoft.com> <20031024141819.R70244@onyx.ispol.com> Message-ID: <3F9CCC6A.7020503@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, David Fraser wrote: > > > >>The next question is, how do we handle the Get/Post/Both situation? >> >> > >Just to clarify nomenclature - > >POST /blah/blah.py?foo=bar > >is a valid request. The part after ? is called "query information", this >is defined in RFC 1808 and RFC 1738. > >CGI (which has no formal RFC, but there is Ken Coar's excellent draft) >introduces something called "path-info", but its meaning is rather vague >outside of cgi since it relies on a notion of a script, which isn't very >meaningful in most non-CGI environments. > >The data submitted in the body of the POST request is called "form data" >and I believe is described in RFC 1867. > >I think that query information and form data can be combined in a single >mapping object, because if you want just query data, you can always parse >the url directly via urlparse, and if you want only form data, you can >read and parse it directly as a mime object. > > Great. No need to complicate things unnecessarily! >Path-info I think should be left where it belongs - in the cgi-specific >module.
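The query-information side of that split can be sketched like this (using today's module names; in the Python of the day this was `urlparse.urlparse` plus `cgi.parse_qs`):

```python
from urllib.parse import urlparse, parse_qs  # urlparse / cgi.parse_qs in 2.x

def query_info(request_url):
    # Parse only the "query information" part of a request URL,
    # independently of any form data carried in a POST body.
    return parse_qs(urlparse(request_url).query)

# POST /blah/blah.py?foo=bar&foo=baz -- the query info is available
# regardless of the method used:
info = query_info("/blah/blah.py?foo=bar&foo=baz")
```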
> > Yes, we shouldn't integrate cgi-specific things into a general API David From davidf at sjsoft.com Mon Oct 27 02:45:11 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:45:48 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <3F9CCD07.4070502@sjsoft.com> Ian Bicking wrote: > On Friday, October 24, 2003, at 01:00 PM, David Fraser wrote: > >> Ian Bicking wrote: >> >>> Every time I have encountered an unexpected list it has been because >>> of a bug somewhere else in my code. I might use a getone() method >>> that threw some exception when a list was encountered, but I'd >>> *never* want to use getfirst(). getfirst() is sloppy programming. >>> (getlist() is perfectly fine though) >> >> >> There seems to be a lot of agreement on this... >> So let's take it that the interface will be a dictionary, with an >> extra method defined, getlist, which will return multiple items if >> multiple items were defined, or a list containing a single item >> otherwise. > > > Additionally, getlist should return the empty list if the key isn't > found, as this follows naturally (but a KeyError for normal access > when a value isn't found). I also think cgi's default of throwing > away empty fields should not be supported, even optionally. > > But I haven't really heard reaction to the idea that you get a > BadRequest or other exception if you try to get a key that has multiple > values. Throwing information away is bad, and unPythonic (though very > PHPish). I don't think we should copy PHP here. I have *never* > encountered a situation where throwing away extra values found in the > query is the correct solution. Either the form that is doing the > submission has a bug, or else the script needs to figure out some > (explicit!) way to handle the ambiguity. What about comparing multiple values to see if they're the same? I don't see throwing values away as such a bad problem... > We also need a way to get at the raw values.
I suppose you could do:
>
> fields = {}
> for key in req.fields.keys():
>     v = req.getlist(key)
>     if len(v) == 1: fields[key] = v[0]
>     else: fields[key] = v
>
> But that's kind of annoying, since the request object probably > contains this kind of dictionary already. This will be required for > backward compatibility, if we want this request to be wrapped to > support existing request interfaces. I think the correct solution is to be explicit about the keys you want lists for ... as this would have to be coded explicitly somewhere in the code anyway. David From davidf at sjsoft.com Mon Oct 27 02:47:12 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:47:26 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024192925.R71890@onyx.ispol.com> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> Message-ID: <3F9CCD80.10502@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >For what it's worth, I never liked the request/response separation either. >I like a single object from which you can read() and to which you can >write(), just like a file. Imagine if for file IO you had to have an >object to read and another one to write? > >(I would agree that perhaps "request" is a misnomer, but I can't think of >anything better) > > Connection? I think someone suggested "Transaction" for this, but it sounds out of place here... David >On Fri, 24 Oct 2003, Bill Janssen wrote: > > >>>When you stop and think about it: *every* request object will have a >>>matching response object. Why have two objects if they come in pairs? You >>>will never see one without the other, and they are intrinsically tied to >>>each other. So why separate them? >>> >>> >>Mainly because they are two separate concepts.
For instance, in my >>code, I always pass two arguments; one is the response, which the user >>manipulates to send back something to the caller, and the other is the >>request, which is basically a dictionary of all parameter values, plus >>a few extra special ones like 'path'. >> >>Bill >> >> From davidf at sjsoft.com Mon Oct 27 02:49:00 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:49:05 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> References: <20031024132028.C15765@lyra.org> Message-ID: <3F9CCDEC.30303@sjsoft.com> Greg Stein wrote: >In the most recent incarnation of a webapp of mine (subwiki), I almost >went with a request/response object paradigm and even started a bit of >refactoring along those lines. However, I ended up throwing out that >dual-object concept. > >When you stop and think about it: *every* request object will have a >matching response object. Why have two objects if they come in pairs? You >will never see one without the other, and they are intrinsically tied to >each other. So why separate them? > >I set up the subwiki core to instantiate a "handler" each time a request >comes in. That Handler instance provides access to the request info, and >is the conduit for generating the response. The app dispatches to the >appropriate command function, passing the handler. > >The Handler is actually set up as a base class, with two subclasses so >far: cgi, and cmdline. This lets me do some testing from the command line, >along with the standard cgi model of access. At some point, I'll implement >a mod_python subclass to do the request/response handling. 
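That handler shape might be sketched like this (all names invented for illustration; subwiki's real classes will differ):

```python
class Handler:
    # One instance per incoming request: provides access to the request
    # info, and is the conduit for generating the response.
    def get_field(self, name):
        raise NotImplementedError

    def write(self, data):
        raise NotImplementedError

class CommandLineHandler(Handler):
    # Lets the same application code be driven from the command line for
    # testing; a cgi (or later mod_python) subclass would read from the
    # environment and write to stdout instead.
    def __init__(self, fields):
        self.fields = fields
        self.output = []

    def get_field(self, name):
        return self.fields.get(name)

    def write(self, data):
        self.output.append(data)

def run_app(handler):
    # Application code sees only the Handler interface, never the
    # CGI/mod_python details.
    handler.write("Hello, %s" % handler.get_field("name"))

handler = CommandLineHandler({"name": "web-sig"})
run_app(handler)
```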
> >(as a side note, I'll also point out that Apache operates this way, too; everything is based around the request_rec structure; it holds all the input data, output headers, the input and output filter chains, etc) > > >In any kind of server-side framework design, I would give a big +1 to >keeping it simple with a single "handler" type of object rather than a >dual-object design. > >Cheers, >-g > +1 from me too. We should also think about things that may/may not be supported by the API, such as filters in Apache 2. These seem to me to be a very Pythonic concept that could easily be layered on top of any underlying API. If the request-response object is well designed, filters could fit snugly on top of it. David From davidf at sjsoft.com Mon Oct 27 02:52:59 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:53:06 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024220036.K1810@onyx.ispol.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> <20031024220036.K1810@onyx.ispol.com> Message-ID: <3F9CCEDB.6090506@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, Simon Willison wrote: > > >>2. My rule of thumb is "only modify data on a POST" - that way there's >>no chance of someone bookmarking a URL that updates a database (for >>example). >> >> >I get upset at web pages that refuse to cooperate when I submit things via >query strings. > >I think a reliable way to avoid accidental updates is to rely on a session >mechanism; modifying only on POST results in mild user annoyance >IMHO. > > >>3. It is useful to be able to detect if a form has been submitted or >>not. In PHP, I frequently check for POSTed data and display a form if >>none is available, assume the form has been submitted if there is. >> >> >I don't like doing things like this because they rely on protocol >internals to drive application logic... > > >>4. Security.
While ensuring data has come from POST rather than GET >>provides absolutely no security against a serious intruder, it does >>discourage amateurs from "hacking the URL" to see if they can cause any >>damage. Security through obscurity admittedly, but it adds a bit of extra >>peace of mind. >> >> >Again, I don't agree; hackable URLs are a good thing! :-) > >And it is, indeed, security by obscurity. If you have good data >validation, there should be no need for any obscurity. > > Absolutely. And I really like the bookmarklet for Mozilla that lets you transform all POST forms into GET forms so you can hack the URLs :-) http://www.squarefree.com/bookmarklets/forms.html David From davidf at sjsoft.com Mon Oct 27 02:58:23 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:58:27 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F9CD01F.4070204@sjsoft.com> Bill Janssen wrote: >Apropos Ian's comments today, I'd like to suggest that at this stage >we focus on what's missing, rather than on how to fix/change things. >What have you needed that isn't in the standard libraries? Here's my >list: > > I think the key reason for discussing how to fix/change things is that a general web application API is needed that will allow code to run on top of any framework that's supported. This requires a lot of in-depth discussion about the subtleties of how things work... Anyway we seem to have cleared up a fair amount of that, so I would see the definition of this API as my primary interest David From jjl at pobox.com Mon Oct 27 09:45:16 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 09:45:56 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: Message-ID: On Sun, 26 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:18 PM, John J Lee wrote: [...]
> > What is the current state of thread-safety in the Python standard > > library client-side web code (ie. httplib, urllib, urllib2)? > > As far as I know they are threadsafe. I suppose I should ask on python-dev if there's a policy / tradition here. [...] > urlopen_lock = threading.Lock() > def urlopen(url, data=None): [...] OK, thanks, that's basically as my vague understanding had it, but I had the impression that there were all kinds of flavours of thread-safety, guaranteeing various subtly different things? I guess I've got some reading to do... Some thinking out loud in case anybody cares to help clear up my current confusion: Hmm, urllib2 doesn't do what your example does, but I suppose OpenerDirectors don't currently have any state that could get lost in a race condition in that particular case. That would change with cookie handling. Am I going to have a hard time spotting all the places where I need locks? I can't see any other place where I'd need locks other than in CookieJar. I suppose I need to lock all access to all CookieJar methods, so that neither reading or writing state can happen whenever CookieJar state is changing? I suppose I'd also need to just label the .cookies attribute as non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it -- yuck). Can I justify saying that some of this is the application's problem? For example, perhaps the .filename and attribute of CookieJar could mess things up if altered by one thread while another thread was reading it in order to open a file? Is it the application's own stupid fault if it fails to lock access to that attribute in cases where that might happen, or is it CookieJar's problem? John From jjl at pobox.com Mon Oct 27 10:00:20 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 10:01:26 2003 Subject: [Web-SIG] So what's missing? 
In-Reply-To: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> References: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> Message-ID: On Sun, 26 Oct 2003, Ian Bicking wrote: > On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: [...] > Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into > HTTPHandler. You could still override it, and HTTPBasicAuthHandler > would still override it (and somewhat differently, because > HTTPHandler.http_error_401 should handle both basic and digest auth). > It's a pretty small change, really. So is the benefit. It's

    a = HTTPBasicAuthHandler()
    a.add_password(user="joe", password="joe")
    o = build_opener(a)

vs.

    o = build_opener(HTTPHandler(user="joe", password="joe"))

(assuming defaults for realm and uri -- BTW, there seems to be an HTTPPasswordMgrWithDefaultRealm already, which I guess is some way to what you want) If we're still using build_opener, and HTTPBasicAuthHandler were to override HTTPHandler, it would have to be derived from it. Not that a build_opener work-alike couldn't be devised, of course. [...] > > I'm still waiting for that example. > > I thought I gave examples: documentation, proliferation of classes, > non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal to > authentication). Lack of documentation doesn't justify changes to the code. There is not any harmful proliferation of classes, I think: the function of the handlers is pretty obvious in most cases (though obviously the docs could be better). I don't recognize the orthogonality problem you're referring to. [...]

> urlopen('http://whatever.com',
>         username='bob',
>         password='secret',
>         postFields={...},
>         postFiles={'image': ('test.jpg', '... image body ...')},
>         addHeaders={'User-Agent': 'superbot 3000'})

[...] > write than any OO-based system. I'm concerned about the external ease > of use, not the internal conceptual integrity.
OK, maybe I'm overconcerned about this layer -- if it's a simple convenience thing like this, fine (as long as it actually is useful and simple, of course). My biggest concern was that you seemed to be advocating a new UserAgent class, which would presumably more-or-less duplicate OpenerDirector (you probably want to skip to the end of this post at this point, because I think you may have missed a crucial point about that class). OpenerDirector is not such a great name, actually: maybe UserAgent or URLOpener would have been better... > >> authentication information (and it doesn't obey browser URL > >> conventions, like http://user:password@domain/). > > > > What is that convention? Is it standardised in an RFC? > > It's a URL convention that's been around a very long time, I don't know > if it is in an RFC. > > > I see > > ProxyHandler knows about that syntax. Obviously it's not an intrinsic > > limitation of the handler system. > > I don't really know how a handler is chosen -- can it figure out > whether it should use HTTPHandler, HTTPBasicAuthHandler, or > HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest > can't be determined until you try to fetch the object. The user and password here are for the proxy, not the server (there's some code duplication here actually, but that's just a bug). Dunno if that's standard use of that syntax. [...] > > Mind you, if your idea can do the same job as my RFE, then it should > > certainly be considered alongside that. > > Hmm... I just looked at the RFE now, so I'm still not sure what it > would mean to this. Sorry, I don't understand 'what it would mean to this'. What's 'this'? > >> Yet none of these features > >> would be all that difficult to add via urlopen or perhaps other simple > >> functions, (instead of via classes). 
I don't think there's any need >> for classes in the external API -- fetching URLs is about doing >> things, >> not representing things, and functions are easier to understand for >> doing. > > Details? The only example you've given so far involved a UserAgent > class. Details about what? You're asking for details and examples, but I've provided some already and I don't know what you're looking for. You provided some examples of features you think would require some kind of layer on top of urllib2. I thought you were originally suggesting a new UserAgent class or similar (that was you, wasn't it?). I don't think that's necessary. But in the post I'm replying to here, you gave an example of adding args to urlopen. I do agree that something like that could be useful. I think the docs should be changed here to make it clear that urlopen is just a convenience function that uses a global OpenerDirector. [...] > >> I think fetching and caching are two separate things. The caching > >> requires a context. The fetching doesn't. I think fetching things > > > > The context is provided by the handler. > > But we're fetching URLs, not handlers. The URL is context-less, > intrinsically. The handler isn't context-less, but that's part of what > I don't like about urllib2's handler-oriented perspective. I don't understand what you just said, but I think we're agreed something that doesn't require calling build_opener or OpenerDirector.add_handler could be convenient. > > [...] > >> I also don't see how caching would fit very well into the handler > >> structure. Maybe there'd be a HTTPCachingHandler, and you'd > >> instantiate it with your caching policy? (where it stores files, how > >> many files, etc) Also a HTTPBasicAuthCachingHandler, > >> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on?
This > >> caching is orthogonal -- not just to things like authentication, but > > My assumption was that it wasn't orthogonal, since RFC 2616 seems to > have > rather a lot to say on the subject. > Well, if they aren't orthogonal, then they should all be implemented in > a single class. Yes. Off the top of my head, I'd say something like (taking note of your point below about needing to actually cache responses as well as return cached data!):

    class AbstractHTTPCacheHandler:
        def cached_open(self, request):
            # return cached response, or None if no cache hit
        def cache(self, response):
            # cache response if appropriate

    class HTTPCacheHandler(AbstractHTTPCacheHandler):
        http_open = cached_open
        http_response = cache

or, if you want a class that does both HTTP and HTTPS:

    class HTTPXCacheHandler(AbstractHTTPCacheHandler):
        https_open = http_open = cached_open
        https_response = http_response = cache

[...] > Why not have just one good HTTP handler class? Why would you want one when you can easily do whatever you want with a convenience function or two, and / or a class derived from OpenerDirector, or something that works like build_opener, etc.? Not so easy to go in the other direction, and separate out the various features of a big, all-singing all-dancing HTTP handler. That was a big part of the motivation for urllib2 in the first place: inflexibility of urllib. > Many parts of the caching mechanics aren't part of RFC 2616 -- > specifically persistence, metadata storage and querying, and cache > control. These aren't part of HTTP at all. I'll take your word for that, but I admit I don't see where that causes problems for urllib2. > > If it *is* (or part of it is) orthogonal, three options come to mind. > > Let's say you have a cache class. > > > > 1. All the normal handlers know about the cache class, but have caching > > off by default. > > > > 2. Write a CacheHandler with a default_open.
If there's a cache hit, > > return it, otherwise return None (let somebody else try to handle > > it). > > > > 3. Subclass (or replace without bothering to subclassing) > > OpenerDirector. > > I guess open is probably what you'd want to change, but I don't know > > about HTTP and other protocols' caching rules. > > > > I haven't thought it through so I certainly don't claim to know how > > any of > > these will turn out (though I'd guess 2. would do the job of any > > caching > > that's orthogonal to the various protocol schemes). If you want to > > justify a new layer, though, it's up to you to show caching *doesn't* > > fit > > urllib2 as-is. YAGNI. > > 1 seems like a lot of trouble. Doesn't appeal to me either. > 2 won't work, since CacheHandler can't > return None and let someone else do the work, because it has to know > about what the result is so that it can cache the result. At last, a real problem! Actually, I think this is a problem already solved by my 'processors' idea, though perhaps not quite in its current form -- that should be easy to fix, though (ATM, IIRC, they're separate from handlers: you can't have an object that is both a handler and a processor -- and they don't currently have default_request and default_response methods, either). > It would > have to be 3, since it's really about intercepting handler calls. I > would imagine that it should wrap OpenerDirector, and perhaps subclass > it as well. Then protocols can be added to the caching and non-caching > directors at the same time. > > But it seems like there can be only one OpenDirector... that messes Nope. You can have as many as you like, with as many different implementations as you like. There is only the inconvenience of having to cut-n-paste build_opener (certainly build_opener isn't ideal as it is, but I guess people agree with me that that's a pretty small issue, since nobody has bothered to finish OpenerFactory). > things up. 
Multiple caches with different policies should be possible. > Which leads us back to a separate class that handles caching. > > >> even to HTTP (to some degree). The handler structure doesn't allow > >> orthogonal features. Except through mixins, but don't get me started > >> on mixins... > > > > I don't think that's true -- see above. > > > > Again, my 'processors' patch is relevant here (see that RFE). But no > > point in re-iterating here the long discussion I posted on the SF bug > > tracker. > > I missed that when you posted it. That might handle some of these > features. It seems a little too global to me. For instance, how would > you handle two distinct user agents with respect to the referer header? Two OpenerDirectors! new_opener = build_opener() new_opener.addheaders = [("User-agent", "Mozilla/5.0")] old_opener = build_opener() old_opener.addheaders = [("User-agent", "Mozilla/4.0")] new_opener.open("http://www.a.com/") old_opener.open("http://www.b.com/") > Seems like it would also make sense as a OpenerDirectory > subclass/wrapper. IIRC, there are issues with redirection that prevent that. > At least portions of it are similar to doing caching > (like cookies and referers), which is to say a request that is made in > a specific context. One example of an application that would require > separate contexts would be when testing concurrency in a web > application -- you want to simulate multiple users logging in and > performing actions concurrently. You can't do this if the context is > stored globally. Perhaps this is all you're missing? Nothing is global until you use install_opener. 
    o = build_opener()   # build OpenerDirector
    o.open(url)          # nothing global here, urlopen doesn't know about our opener
    install_opener(o)    # install OpenerDirector globally, for use by urlopen
    urlopen(url)

John From amk at amk.ca Mon Oct 27 10:07:09 2003 From: amk at amk.ca (amk@amk.ca) Date: Mon Oct 27 10:07:14 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: Message-ID: <20031027150709.GA29045@rogue.amk.ca> On Mon, Oct 27, 2003 at 02:45:16PM +0000, John J Lee wrote: > I suppose I should ask on python-dev if there's a policy / tradition here. The rough tradition would be: Thread-safety is good, and library modules shouldn't be non-threadsafe unless there's a very good reason. > changing? I suppose I'd also need to just label the .cookies attribute as > non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it > -- yuck). Assuming .cookies is a Python dictionary (I haven't looked at the CookieJar code), there's no locking needed. Locking is necessary when a data structure is temporarily inconsistent, or some invariant is temporarily broken. For example, let's say you had two dictionaries, .cookies which maps name -> Cookie object, and .durations which maps name -> an integer giving the duration of the cookie, and it's stated that every entry in .cookies always has a corresponding entry in .durations. In this case you need locking, because when you add an entry like this:

    self.cookies[name] = Cookie()  # danger point
    self.durations[name] = value

If a thread switch occurs at the danger point, another thread might loop over cookies.items(), see the missing duration, and die with a KeyError, so you need to have a lock around the two statements, and make read accesses use the lock. (You could also set the value in .durations first and avoid locking, but that's not possible in general.)
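In code, that locking discipline looks roughly like this (hypothetical class and method names, not actual CookieJar code; acquire/release is used rather than a with-statement, matching the Python of the day):

```python
import threading

class CookieStore:
    # Hypothetical sketch: .cookies and .durations must stay in step,
    # so writes and compound reads all take the same lock.
    def __init__(self):
        self._lock = threading.Lock()
        self.cookies = {}
        self.durations = {}

    def set(self, name, cookie, duration):
        self._lock.acquire()
        try:
            self.cookies[name] = cookie     # danger point without the lock
            self.durations[name] = duration
        finally:
            self._lock.release()

    def items_with_durations(self):
        # A compound read: looks at both dictionaries, so it must hold
        # the lock to see a consistent snapshot.
        self._lock.acquire()
        try:
            return [(n, c, self.durations[n])
                    for n, c in self.cookies.items()]
        finally:
            self._lock.release()

store = CookieStore()
store.set("session", "abc", 3600)
```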
But if you're assigning to a single attribute (self.filename = 'foo'), there's no point in time where the attribute is inconsistent, a mix of the old and new names; instead it's first the old value, and then it's set to 'foo'. So no lock is needed. --amk From ianb at colorstudy.com Mon Oct 27 13:47:49 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Oct 27 13:48:01 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: Message-ID: <0FD31AC9-08AE-11D8-A3EF-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 08:45 AM, John J Lee wrote: > [...] >> urlopen_lock = threading.Lock() >> def urlopen(url, data=None): > [...] > > OK, thanks, that's basically as my vague understanding had it, but I > had > the impression that there were all kinds of flavours of thread-safety, > guaranteeing various subtly different things? I guess I've got some > reading to do... Different parts of the system may be threadsafe, while others are not. For instance DB-API has threadsafety "levels", which is just a way of indicating which parts of the system are threadsafe, e.g., level 0 means nothing is threadsafe, level 1 means connections aren't threadsafe so you have to use one connection for each thread, and higher levels mean that objects deeper in the system become threadsafe. The analog of level 0 is bad, because you have to serialize all operations for the entire process. Level 1 isn't so bad (it's what most DB-API drivers have), it just means you have to create a new handler/connection/whatever object for each thread (but you have to be very explicit about that requirement). Or if object creation is expensive you have to do pooling, which is an incentive to make object creation cheap. > Some thinking out loud in case anybody cares to help clear up my > current > confusion: > > Hmm, urllib2 doesn't do what your example does, but I suppose > OpenerDirectors don't currently have any state that could get lost in a > race condition in that particular case. 
That would change with cookie > handling. I'm not sure about urllib2 in particular, but anything you initialize at the module level doesn't have to be protected. So in ClientCookie if you didn't lazily create the opener, it wouldn't be a problem. Or, if it's no big deal if you recreate the object twice then it's not a problem -- just unnecessarily recreating an object because of a very specific race condition isn't a problem. But if that meant that one of the objects created got lost, but maybe someone would still have a reference to that object (so it wasn't *completely* lost), then that would be a problem (and probably a very hard to debug problem if you encounter it). > Am I going to have a hard time spotting all the places where I need > locks? > I can't see any other place where I'd need locks other than in > CookieJar. > I suppose I need to lock all access to all CookieJar methods, so that > neither reading or writing state can happen whenever CookieJar state is > changing? I suppose I'd also need to just label the .cookies > attribute as > non-threadsafe (or get rid of it, or add a __getattr__ to allow > locking it > -- yuck). Can I justify saying that some of this is the application's > problem? For example, perhaps the .filename and attribute of CookieJar > could mess things up if altered by one thread while another thread was > reading it in order to open a file? Is it the application's own stupid > fault if it fails to lock access to that attribute in cases where that > might happen, or is it CookieJar's problem? You can't be sure of what concurrency expectations the application has. But in general reads don't have to be protected, unless someone is reading multiple things and expecting consistency between those reads. If it's a problem that you read value A, then someone changes the related value B in another thread, then you read B and it doesn't fit with A, then there's a threading issue for a read. 
Andrew pointed out a possible example of this with cookies and expiration. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Mon Oct 27 14:26:48 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 14:26:59 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3F9CCD80.10502@sjsoft.com>; from davidf@sjsoft.com on Mon, Oct 27, 2003 at 09:47:12AM +0200 References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> Message-ID: <20031027112648.A27607@lyra.org> On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > Gregory (Grisha) Trubetskoy wrote: > > >For what it's worth, I never liked the request/response separation either. > >I like a single object from which you can read() and to which you can > >write(), just like a file. Imagine if for file IO you had to have an > >object to read and another one to write? Woah. Nice analogy. Thanks. > >(I would agree that perhaps "request" is a misnomer, but I can't think of > >anything better) > > Connection? I think someone suggested "Transaction" for this, but it > sounds out of place here... Nope. A number of request/response pairs occur on a given connection. The two are rather independent concepts. That was one of the basic tenets to my redesign of httplib. The old HTTP(S) classes are individual requests, which sucks for performance. With the new HTTP(S)Connection, you can open a connection, and then issue multiple requests over it. The name for the thing can be one of two things, I believe, depending on where you focus: - focus on the transaction itself - focus on the thing handling the transaction Per my original note here, SubWiki tends towards the latter. Each incoming request instantiates a Handler which deals with both reading/writing at a basic level (although there are still external entities which treat the Handler instance like in the first focus type). 
"Transaction" does sound out of place since that has connotations of a database transaction. I don't have any better suggestions (as I've never had to ponder a name for it since I didn't choose that focus :-) In the first model, the transaction is a passive entity, dealt with by some other code which does the processing. In the second model, the transaction and processing are bundled into the same object -- this is where you'd instantiate some thing and call a "run" method on it, which Does The Right Thing. I tend to disfavor that model because the conflation of data and request processing gets to be very cumbersome and tangled. Instantiating objects, custom to the request (type), to do the processing is all well and fine, but pass along a (relatively) passive data object to it (IMO). Cheers, -g -- Greg Stein, http://www.lyra.org/ From jjl at pobox.com Mon Oct 27 16:47:31 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 16:48:20 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031027112648.A27607@lyra.org> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> <20031027112648.A27607@lyra.org> Message-ID: On Mon, 27 Oct 2003, Greg Stein wrote: > On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: [...] > > Connection? I think someone suggested "Transaction" for this, but it > > sounds out of place here... > > Nope. A number of request/response pairs occur on a given connection. The [...] > "Transaction" does sound out of place since that has connotations of a > database transaction. I don't have any better suggestions (as I've never [...] Exchange? John From neel at mediapulse.com Mon Oct 27 17:06:29 2003 From: neel at mediapulse.com (Michael C. 
Neel) Date: Mon Oct 27 17:06:33 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: > On Mon, 27 Oct 2003, Greg Stein wrote: > > On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > [...] > > > Connection? I think someone suggested "Transaction" for > this, but it > > > sounds out of place here... > > > > Nope. A number of request/response pairs occur on a given > connection. The > [...] > > "Transaction" does sound out of place since that has > connotations of a > > database transaction. I don't have any better suggestions > (as I've never > [...] > > Exchange? > Yea, that word isn't loaded with connotations =) (anyone whose office is on MS Exchange for email knows what I mean) mike From janssen at parc.com Mon Oct 27 17:57:30 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Oct 27 17:57:50 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Sun, 26 Oct 2003 21:47:41 PST." <20031026214741.B24764@lyra.org> Message-ID: <03Oct27.145733pst."58611"@synergy1.parc.xerox.com> > Actually, it is called the "method" rather than "command". See section 9 > of RFC 2616. Sure. I was slipping into a Medusa-ism. Bill From janssen at parc.com Mon Oct 27 18:21:07 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Oct 27 18:21:30 2003 Subject: [Web-SIG] A list is available (http://www.parc.com/janssen/web-sig/needed.html) Message-ID: <03Oct27.152114pst."58611"@synergy1.parc.xerox.com> I'll try to act as a scribe and gather various individual suggestions together. Please feel free to send mail to correct any malscription you spot. http://www.parc.com/janssen/web-sig/needed.html Bill From jjl at pobox.com Tue Oct 28 05:31:05 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 05:31:13 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: References: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> Message-ID: On Mon, 27 Oct 2003, John J Lee wrote: [...]
> class AbstractHTTPCacheHandler:
>     def cached_open(self, request):
>         # return cached response, or None if no cache hit
>     def cache(self, response):
>         # cache response if appropriate
[...]

I should have said:

    def cache(self, request, response):

John From jjl at pobox.com Tue Oct 28 05:35:33 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 05:35:40 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: <20031027150709.GA29045@rogue.amk.ca> References: <20031027150709.GA29045@rogue.amk.ca> Message-ID: On Mon, 27 Oct 2003 amk@amk.ca wrote: > On Mon, Oct 27, 2003 at 02:45:16PM +0000, John J Lee wrote: > > I suppose I should ask on python-dev if there's a policy / tradition here. > > The rough tradition would be: Thread-safety is good, and library modules > shouldn't be non-threadsafe unless there's a very good reason. Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe (except for problems noted in the source: FTP connection caching in urllib2, FTP content caching in urllib)? > > changing? I suppose I'd also need to just label the .cookies attribute as > > non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it > > -- yuck). > > Assuming .cookies is a Python dictionary (I haven't looked at the CookieJar > code), there's no locking needed. Locking is necessary when a data > structure is temporarily inconsistent, or some invariant is temporarily > broken. Yes, I realise that. .cookies is a nested dict (currently documented as publicly readable, though FWLIW will probably have to cease to be soon, for non-thread related reasons):

    self.cookies[domain][path][name]

So my set_cookie method certainly needs locking, because there are tests like this:

    c = self.cookies
    if not c.has_key(cookie.domain):
        c[cookie.domain] = {}

I guess what I was really worrying about, though (without fully realizing it), was higher-level integrity issues over and above mere thread-safety.
For example, if one thread is iterating over cookies and reading their values, and halfway through, another thread calls extract_cookies to extract the cookies from an HTTP response, causing some cookies to be added and/or removed, that might cause trouble, but isn't a thread-safety issue (and is the application's problem, not mine). I guess the methods I have for loading / saving to a file also fall into this category, but I'm still a little confused. Since the relevant level of granularity is the bytecode instruction (right?), am I right in assuming you may have to start thinking about what your code looks like in bytecode form? I guess you play with the compiler module until you get to know which operations are single bytecode instructions and which are not? [...] > But if you're assigning to a single attribute (self.filename = 'foo'), > there's no point in time where the attribute is inconsistent, a mix of the > old and new names; instead it's first the old value, and then it's set to > 'foo'. So no lock is needed. OK. I wasn't sure whether that was a single bytecode or not, but I suppose that makes sense given Python's semantics. I saw masses of 'synchronize's on strings in a Java implementation of cookie handling (jCookie), and I'm far from sure what they're all there for... John From amk at amk.ca Tue Oct 28 07:46:46 2003 From: amk at amk.ca (amk@amk.ca) Date: Tue Oct 28 07:46:51 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: <20031027150709.GA29045@rogue.amk.ca> Message-ID: <20031028124646.GB1095@rogue.amk.ca> On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: > Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? No idea; reading the code would be needed to figure that out. > So my set_cookie method certainly needs locking, because there are tests > like this: Correct; that case would need locking. 
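A rough sketch of what that locking might look like. This is not the real ClientCookie code: the class below is reduced to just the nested dict, the `set_cookie` signature is simplified to plain strings for illustration, and the original's `has_key` test is spelled with the modern `in` operator:

```python
import threading

class CookieJar:
    """Illustrative fragment only, not the real ClientCookie class.
    Cookies live in a nested mapping: cookies[domain][path][name]."""

    def __init__(self):
        self._lock = threading.RLock()
        self.cookies = {}

    def set_cookie(self, domain, path, name, value):
        # The check-then-insert sequence is not atomic, so the whole
        # update happens while holding the lock; otherwise two threads
        # could both see a missing domain key and one would clobber
        # the dict the other just created.
        with self._lock:
            c = self.cookies
            if domain not in c:
                c[domain] = {}
            if path not in c[domain]:
                c[domain][path] = {}
            c[domain][path][name] = value
```

An RLock (rather than a plain Lock) lets other locked methods call `set_cookie` without deadlocking on their own lock.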
--amk From davidf at sjsoft.com Tue Oct 28 07:52:24 2003 From: davidf at sjsoft.com (David Fraser) Date: Tue Oct 28 07:52:52 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031027112648.A27607@lyra.org> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> <20031027112648.A27607@lyra.org> Message-ID: <3F9E6688.9030805@sjsoft.com> Greg Stein wrote: >On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > > >>Gregory (Grisha) Trubetskoy wrote: >> >> >> >>>For what it's worth, I never liked the request/response separation either. >>>I like a single object from which you can read() and to which you can >>>write(), just like a file. Imagine if for file IO you had to have an >>>object to read and another one to write? >>> >>> > >Woah. Nice analogy. Thanks. > > > >>>(I would agree that perhaps "request" is a misnomer, but I can't think of >>>anything better) >>> >>> >>Connection? I think someone suggested "Transaction" for this, but it >>sounds out of place here... >> >> > >Nope. A number of request/response pairs occur on a given connection. The >two are rather independent concepts. That was one of the basic tenets to >my redesign of httplib. The old HTTP(S) classes are individual requests, >which sucks for performance. With the new HTTP(S)Connection, you can open >a connection, and then issue multiple requests over it. > > OK. Now in this case, you clearly can't handle more than one request/response on a single connection at a time. So would it be feasible (I'm not suggesting it's necessarily a good idea) to use a Connection object, which changes state to reflect the request-responses? Or should a Connection object create separate request-response objects for each event? The reason I'm asking is, surely the response write method will simply flow through to the underlying Connection.
Though this may be an implementation detail, it may say something about how the API should work. >The name for the thing can be one of two things, I believe, depending on >where you focus: > > - focus on the transaction itself > - focus on the thing handling the transaction > >Per my original note here, SubWiki tends towards the latter. Each incoming >request instantiates a Handler which deals with both reading/writing at a >basic level (although there are still external entities which treat the >Handler instance like in the first focus type). > >"Transaction" does sound out of place since that has connotations of a >database transaction. I don't have any better suggestions (as I've never >had to ponder a name for it since I didn't choose that focus :-) > >In the first model, the transaction is a passive entity, dealt with by >some other code which does the processing. In the second model, the >transaction and processing are bundled into the same object -- this is >where you'd instantiate some thing and call a "run" method on it, which >Does The Right Thing. I tend to disfavor that model because the conflation >of data and request processing gets to be very cumbersome and tangled. >Instantiating objects, custom to the request (type), to do the processing >is all well and fine, but pass along a (relatively) passive data object to >it (IMO). > >Cheers, >-g > > > Looking at it from an API point of view, the difference is between creating a request-response object structure which any of the various implementors can handle in their code, and creating a handler object structure which the various implementors have to conform to, either by changing the existing implementation or by writing wrappers around their existing code. I think the request-response object idea is clearly simpler from this point of view... 
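The two designs being compared can be caricatured in a few lines of Python (all names below are hypothetical illustrations, not an actual API from the thread):

```python
# Model 1: a passive transaction object, processed by external code.
class Transaction:
    def __init__(self, method, path):
        self.method = method
        self.path = path
        self.out = []          # accumulated response body

    def write(self, data):
        self.out.append(data)

def process(txn):
    # External code does the work, writing through the passive object.
    txn.write("handled %s %s" % (txn.method, txn.path))

# Model 2: data and processing bundled into one Handler with a run().
class Handler:
    def __init__(self, method, path):
        self.method = method
        self.path = path
        self.out = []

    def write(self, data):
        self.out.append(data)

    def run(self):
        # The object processes itself -- Does The Right Thing.
        self.write("handled %s %s" % (self.method, self.path))
```

The passive-object model lets any number of independent processing functions operate on the same conforming object, which is the implementation-neutrality David's point is driving at.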
David From jjl at pobox.com Tue Oct 28 12:25:54 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 12:27:19 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: <20031028124646.GB1095@rogue.amk.ca> References: <20031027150709.GA29045@rogue.amk.ca> <20031028124646.GB1095@rogue.amk.ca> Message-ID: [background for python-dev-ers: In the process of making my client-side cookie module a suitable candidate for inclusion in the standard library, I'm trying to make it thread-safe] On Tue, 28 Oct 2003 amk@amk.ca wrote: > On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: > > Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? > > No idea; reading the code would be needed to figure that out. That might not be helpful if the person reading it (me) has zero threading experience ;-) I certainly plan to gain that experience, but surely *somebody* already knows whether they're thread-safe? I presume they are, broadly, since a couple of violations of thread safety are commented in urllib2 and urllib. Right? John From jbauer at rubic.com Tue Oct 28 15:58:54 2003 From: jbauer at rubic.com (Jeff Bauer) Date: Tue Oct 28 16:00:39 2003 Subject: [Web-SIG] A list is available Message-ID: <3F9ED88E.577AEB9@rubic.com> I haven't had a need for this (well, not since old-Zope PCGI days), but a way to monitor HTTP/S traffic with stdlib tools might represent a valid use case, if not an actual web component. I was thinking of something based around a simple asynchronous proxy server. Anyway, I thought I'd bring it up since Bill Janssen is compiling a list. 
Jeff Bauer Rubicon Research From moof at metamoof.net Tue Oct 28 17:49:45 2003 From: moof at metamoof.net (Moof) Date: Tue Oct 28 17:53:01 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> References: <20031024132028.C15765@lyra.org> Message-ID: <3F9EF289.8040500@metamoof.net> Greg Stein wrote: > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? You > will never see one without the other, and they are intrinsically tied to > each other. So why separate them? An example where a separate response object is useful, though this could well be due to lazy programming, or could be circumvented other ways: I'm currently writing an app in WebKit, and amongst other things, I find myself writing parts of the page, followed by doing some calculations, followed by writing other parts of the page. Alternatively, I find myself validating user input and doing calculations, and then writing the whole page as a result. Either way, if there's an error that occurs somewhere along the line, due to faulty input, I tell the page to forward the request to another servlet that can handle the errors (normally right back to the servlet that generated the form that inputted the faulty data). It's a bit of a poor man's exception, because Page.forward() doesn't *actually* break out of the current context, so I need to break out manually, either with a break statement or more normally by continuing til an uncaught exception is thrown. The forward directive will be taken into account as soon as the page ends, and will just delete the current response object and call the forwarded servlet with a new response object which will buffer and eventually send out the data that the servlet eventually generates. Then again, it could just be lazy programming on my part. 
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From moof at metamoof.net Tue Oct 28 17:51:06 2003 From: moof at metamoof.net (Moof) Date: Tue Oct 28 17:54:20 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> References: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: <3F9EF2DA.3060604@metamoof.net> Ian Bicking wrote: > In reference to the rest of the discussion -- I think it's enough to say that some people want to distinguish (sometimes) between these two types of variables. Simon is not the only one. It should be an option, because it's not hard to do. We're not telling people how to write their applications, we're giving them the tools to write their applications as they choose, and this is a valid way to write an application. +1 for the reasons stated. It's good to be able to distinguish. One appeasement approach would be to do a webkit-like thing. Currently in webkit you can choose to get stuff out of the submitted data (GET and POST are scrunched together) or out of the cookies, or you can just ask for request.value() (which is aliased also to request.__getattr__) which will look in both places and returns the first thing it comes to. So how about a request.postvalues dict, a request.getvalues dict, and a request.values dict (or pseudo-dict) which will return the value from whichever. The main downside I can see with this is a long ensuing argument about whether GET should take precedence over POST or vice-versa.
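The request.values idea can be sketched as a small pseudo-dict. The names come from Moof's proposal but the class itself is hypothetical, and the POST-over-GET precedence shown is just one of the two debated policies (swap the order for the other):

```python
class RequestValues:
    """Hypothetical read-only view over GET and POST data that
    consults POST first, then falls back to GET."""

    def __init__(self, getvalues, postvalues):
        self.getvalues = getvalues    # parsed query-string fields
        self.postvalues = postvalues  # parsed form-body fields

    def __getitem__(self, name):
        # Precedence policy lives in exactly one place.
        if name in self.postvalues:
            return self.postvalues[name]
        return self.getvalues[name]

    def get(self, name, default=None):
        try:
            return self[name]
        except KeyError:
            return default
```

A request object would then expose the three views side by side, so code that cares about the source uses `getvalues`/`postvalues` and code that doesn't uses the combined view.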
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From gtalvola at nameconnector.com Tue Oct 28 18:08:52 2003 From: gtalvola at nameconnector.com (Geoffrey Talvola) Date: Tue Oct 28 18:09:03 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: <61957B071FF421419E567A28A45C7FE59AF763@mailbox.nameconnector.com> Moof wrote: > An example where a separate response object is useful, though > this could > well be due to lazy programming, or could be circumvented other ways: > > I'm currently writing an app in WebKit, and amongst other > things, I find > myself writing parts of the page, followed by doing some calculations, > followed by writing other parts of the page. Alternatively, I find > myself validating user input and doing calculations, and then writing > the whole page as a result. Either way, if there's an error > that occurs > somewhere along the line, due to faulty input, I tell the page to > forward the request to another servlet that can handle the errors > (normally right back to the servlet that generated the form that > inputted the faulty data). > > It's a bit of a poor man's exception, because Page.forward() doesn't > *actually* break out of the current context, so I need to break out > manually, either with a break statement or more normally by continuing > til an uncaught exception is thrown. Actually, in Webware CVS Page.forward() _does_ break out of the current context by raising an EndResponse exception that gets caught in the framework. You are probably using a released version of Webware which doesn't work this way, but instead substitutes a "dummy" response object for the real response object to swallow up any output from the original servlet. I agree with your point, which I take to be this: it's nice to be able to throw away any response that may have accumulated so far and re-process the request. And that seems to argue for separate request and response objects. 
> > The forward directive will be taken into account as soon as the page > ends, and will just delete the current response object and call the > forwarded servlet with a new response object which will buffer and > eventually send out the data that the servlet eventually generates. I don't think this is how WebKit ever worked. I'm pretty sure that both in current Webware CVS and in previous releases, the forwarded-to servlet processes the request immediately, not when the page ends. > > Then again, it could just be lazy programming on my part. - Geoff From cs1spw at bath.ac.uk Tue Oct 28 18:24:37 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Tue Oct 28 18:24:44 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9EF2DA.3060604@metamoof.net> References: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> <3F9EF2DA.3060604@metamoof.net> Message-ID: <3F9EFAB5.2090800@bath.ac.uk> Moof wrote: > One appeasement approach would be to do a webkit-like thing. Currently in > webkit you can choose to get stuff out of the submitted data (GET and > POST are scrunched together) or out of the cookies, or you can just ask > for request.value() (which is aliased also to request.__getattr__) which > will look in both places and returns the first thing it comes to. > > So how about a request.postvalues dict, a request.getvalues dict, and a > request.values dict (or pseudo-dict) which will return the value from > whichever. The main downside I can see with this is a long ensuing > argument about whether GET should take precedence over POST or vice-versa. I'm quite fond of request.GET and request.POST personally, but that's my PHP background speaking. I'm not sure that upper case dictionary names are particularly pythonic. request.getvalues and request.postvalues seem a bit verbose for my liking. Is there really a long ensuing argument about precedence of GET over POST?
I had always assumed that the standard way of tackling this was for POST data to over-write GET data since POST was the actual HTTP action used in a combined request. -- Simon Willison Web development weblog: http://simon.incutio.com/ From davidf at sjsoft.com Wed Oct 29 00:02:55 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 00:03:03 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3F9EF289.8040500@metamoof.net> References: <20031024132028.C15765@lyra.org> <3F9EF289.8040500@metamoof.net> Message-ID: <3F9F49FF.2000201@sjsoft.com> Moof wrote: > Greg Stein wrote: > > > When you stop and think about it: *every* request object will have a > > matching response object. Why have two objects if they come in > pairs? You > > will never see one without the other, and they are intrinsically > tied to > > each other. So why separate them? > > > An example where a separate response object is useful, though this > could well be due to lazy programming, or could be circumvented other > ways: > > I'm currently writing an app in WebKit, and amongst other things, I > find myself writing parts of the page, followed by doing some > calculations, followed by writing other parts of the page. > Alternatively, I find myself validating user input and doing > calculations, and then writing the whole page as a result. Either way, > if there's an error that occurs somewhere along the line, due to > faulty input, I tell the page to forward the request to another > servlet that can handle the errors (normally right back to the servlet > that generated the form that inputted the faulty data). > > It's a bit of a poor man's exception, because Page.forward() doesn't > *actually* break out of the current context, so I need to break out > manually, either with a break statement or more normally by continuing > til an uncaught exception is thrown. 
> > The forward directive will be taken into account as soon as the page > ends, and will just delete the current response object and call the > forwarded servlet with a new response object which will buffer and > eventually send out the data that the servlet eventually generates. > > Then again, it could just be lazy programming on my part. > > Moof There's no requirement that, just because the API defines a response object that is the same as the request object, you have to use that object to build up your response. The response side of the request object would mainly be used to *write* the response back to the client. Since once you have started writing, you can't throw it away, it seems to me your situation would be entirely the same (you would have your own "response" which you would only write back when you wanted to) David From aquarius-lists at kryogenix.org Wed Oct 29 01:21:14 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Wed Oct 29 01:19:25 2003 Subject: [Web-SIG] Form field dictionaries References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > Gregory Collins wrote: >>>I think this is adequately addressed in the FieldStorage starting with >>>Python 2.2 with getfirst() and getlist(): >> >> I agree, I think this is the appropriate solution; I'd rather see all >> the typechecking pushed down into the library function rather than >> being exposed to the programmer. If the argument I'm looking for >> doesn't make sense as a list then I wouldn't care if it was given >> twice; if I'm expecting something to be a list then I'd want it to be >> a list even if it were empty or singleton. > > The vast majority of data sent from forms comes in as simple name/value > pairs, which are crying out to be accessed from a dictionary.
This is my > problem with the current FieldStorage() class - it forces you to write > code like this: > > username = form.getfirst("username", "") > > When code like this is far more intuitive: > > username = form['username'] Would it be worth having form['fieldname'] default to doing a getfirst()? That way, if you're *expecting* a list, you can look for one by doing form.getlist("username") and if not you just get one entry (getfirst should possibly be getlast, but that's a different issue). This is a bit non-discoverable, though... sil -- "Computer games don't affect kids. I mean if Pacman had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music." -- Kristian Wilson, Nintendo From ianb at colorstudy.com Wed Oct 29 01:46:22 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 01:46:42 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Message-ID: <9C0B687E-09DB-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 12:21 AM, Stuart Langridge wrote: > Would it be worth having form['fieldname'] default to doing a > getfirst()? That way, if you're *expecting* a list, you can look for > one by doing form.getlist("username") and if not you just get one entry > (getfirst should possibly be getlast, but that's a different issue). > This is a bit non-discoverable, though... getfirst, getlast? Why ever would you choose one over the other? (Why ever would you choose either?) Explicit is better than implicit. In the face of ambiguity, refuse the temptation to guess. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 02:17:57 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:18:38 2003 Subject: [Web-SIG] So what's missing? 
In-Reply-To: Message-ID: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 09:00 AM, John J Lee wrote: > On Sun, 26 Oct 2003, Ian Bicking wrote: >> On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: > [...] >> Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into >> HTTPHandler. You could still override it, and HTTPBasicAuthHandler >> would still override it (and somewhat differently, because >> HTTPHandler.http_error_401 should handle both basic and digest auth). >> It's a pretty small change, really. > > So is the benefit. It's > > a = HTTPBasicAuthHandler() > a.add_password(user="joe", password="joe") > o = build_opener(a) > > vs. > > o = build_opener(HTTPHandler(user="joe", password="joe")) > > > (assuming defaults for realm and uri -- BTW, there seems to be an > HTTPPasswordMgrWithDefaultRealm already, which I guess is some way to > what > you want) Yes, I just recently noticed that too. Why it is implemented in a separate class I cannot fathom. > If we're still using build_opener, and HTTPBasicAuthHandler were to > override HTTPHandler, it would have to be derived from it. Not that a > build_opener work-alike couldn't be devised, of course. > > [...] >>> I'm still waiting for that example. >> >> I thought I gave examples: documentation, proliferation of classes, >> non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal >> to >> authentication). > > Lack of documentation doesn't justify changes to the code. There is > not > any harmful proliferation of classes, I think: the function of the > handlers is pretty obvious in most cases (though obviously the docs > could > be better). I don't recognize the orthogonality problem you're > referring > to. I'm not as concerned with the internals, but rather the exposed interface. This isn't a concern purely about lack of documentation either, but about the thoroughness and conciseness of that documentation. 
A good interface lends itself to good documentation. I don't think this interface can result in good documentation -- it will either be incomplete, difficult to navigate, or verbose (or all), as a reflection of the way in which internal implementation is exposed. > [...] >> urlopen('http://whatever.com', >> username='bob', >> password='secret', >> postFields={...}, >> postFiles={'image': ('test.jpg', '... image body ...')}, >> addHeaders={'User-Agent': 'superbot 3000'}) > [...] >> write than any OO-based system. I'm concerned about the external ease >> of use, not the internal conceptual integrity. > > OK, maybe I'm overconcerned about this layer -- if it's a simple > convenience thing like this, fine (as long as it actually is useful > and simple, of course). > > My biggest concern was that you seemed to be advocating a new UserAgent > class, which would presumably more-or-less duplicate OpenerDirector > (you > probably want to skip to the end of this post at this point, because I > think you may have missed a crucial point about that class). > OpenerDirector is not such a great name, actually: maybe UserAgent or > URLOpener would have been better... > >>>> authentication information (and it doesn't obey browser URL >>>> conventions, like http://user:password@domain/). >>> >>> What is that convention? Is it standardised in an RFC? >> >> It's a URL convention that's been around a very long time, I don't >> know >> if it is in an RFC. >> >>> I see >>> ProxyHandler knows about that syntax. Obviously it's not an >>> intrinsic >>> limitation of the handler system. >> >> I don't really know how a handler is chosen -- can it figure out >> whether it should use HTTPHandler, HTTPBasicAuthHandler, or >> HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest >> can't be determined until you try to fetch the object. > > The user and password here are for the proxy, not the server (there's > some > code duplication here actually, but that's just a bug). 
Dunno if > that's > standard use of that syntax. > > > [...] >>> Mind you, if your idea can do the same job as my RFE, then it should >>> certainly be considered alongside that. >> >> Hmm... I just looked at the RFE now, so I'm still not sure what it >> would mean to this. > > Sorry, I don't understand 'what it would mean to this'. What's 'this'? This discussion. >>>> Yet none of these features >>>> would be all that difficult to add via urlopen or perhaps other >>>> simple >>>> functions, (instead of via classes). I don't think there's any need >>>> for classes in the external API -- fetching URLs is about doing >>>> things, >>>> not representing things, and functions are easier to understand for >>>> doing. >>> >>> Details? The only example you've given so far involved a UserAgent >>> class. >> >> Details about what? You're asking for details and examples, but I've >> provided some already and I don't know what you're looking for. > > You provided some examples of features you think would require some > kind > of layer on top of urllib2. I thought you were originally suggesting a > new UserAgent class or similar (that was you, wasn't it?). I don't > think > that's necessary. In the context of stateful HTTP requests, yes, I still think some object along the lines of a UserAgent is the best interface. > But in the post I'm replying to here, you gave an example of adding > args > to urlopen. I do agree that something like that could be useful. I > think > the docs should be changed here to make it clear that urlopen is just a > convenience function that uses a global OpenerDirector. > > [...] >>>> I think fetching and caching are two separate things. The caching >>>> requires a context. The fetching doesn't. I think fetching things >>> >>> The context is provided by the handler. >> >> But we're fetching URLs, not handlers. The URL is context-less, >> intrinsically.
The handler isn't context-less, but that's part of >> what >> I don't like about urllib2's handler-oriented perspective. > I don't understand what you just said, but I think we're agreed > something > that doesn't require calling build_opener or OpenerDirector.add_handler > could be convenient. Okay, good. That my statement was nonsensical was part of my point, but that's probably not a helpful way to make a point ;) >>> [...] >>>> I also don't see how caching would fit very well into the handler >>>> structure. Maybe there'd be a HTTPCachingHandler, and you'd >>>> instantiate it with your caching policy? (where it stores files, how >>>> many files, etc) Also a HTTPBasicAuthCachingHandler, >>>> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This >>>> caching is orthogonal -- not just to things like authentication, but >>> >>> My assumption was that it wasn't orthogonal, since RFC 2616 seems to >>> have >>> rather a lot to say on the subject. >> >> Well, if they aren't orthogonal, then they should all be implemented >> in >> a single class. > > Yes. Off the top of my head, I'd say something like (taking note of > your > point below about needing to actually cache responses as well as return > cached data!):

> class AbstractHTTPCacheHandler:
>     def cached_open(self, request):
>         # return cached response, or None if no cache hit
>     def cache(self, response):
>         # cache response if appropriate
>
> class HTTPCacheHandler(AbstractHTTPCacheHandler):
>     http_open = cached_open
>     http_response = cache
>
> or, if you want a class that does both HTTP and HTTPS:
>
> class HTTPXCacheHandler(AbstractHTTPCacheHandler):
>     https_open = http_open = cached_open
>     https_response = http_response = cache

[...] >> Why not have just one good HTTP handler class? > > Why would you want one when you can easily do whatever you want with a > convenience function or two, and / or a class derived from > OpenerDirector, > or something that works like build_opener, etc.?
Not so easy to go in > the > other direction, and separate out the various features of a big, > all-singing all-dancing HTTP handler. That was a big part of the > motivation for urllib2 in the first place: inflexibility of urllib. Why would I want two pieces if I could have one that can do both their jobs? And why fold different ideas together into one notion of handler? HTTP and HTTPS are almost exactly the same. Basic and digest auth are almost exactly the same. Using a cache and not using a cache are almost exactly the same. All these details can be combined reliably in many ways, but the structure of handlers seems to get in the way. But maybe this comes down to a disagreement about coding aesthetics. I don't like inheritance, especially when it gets clever. But if that's just an implementation detail, then eh... I can live. It's when it gets exposed through the public interface (as it is in urllib2) that it bothers me. [...] >> 2 won't work, since CacheHandler can't >> return None and let someone else do the work, because it has to know >> about what the result is so that it can cache the result. > > At last, a real problem! Actually, I think this is a problem already > solved by my 'processors' idea, though perhaps not quite in its current > form -- that should be easy to fix, though (ATM, IIRC, they're separate > from handlers: you can't have an object that is both a handler and a > processor -- and they don't currently have default_request and > default_response methods, either). The processors really sound like wrappers to me. >> I missed that when you posted it. That might handle some of these >> features. It seems a little too global to me. For instance, how >> would >> you handle two distinct user agents with respect to the referer >> header? > > Two OpenerDirectors! 
>
> new_opener = build_opener()
> new_opener.addheaders = [("User-agent", "Mozilla/5.0")]
>
> old_opener = build_opener()
> old_opener.addheaders = [("User-agent", "Mozilla/4.0")]
>
> new_opener.open("http://www.a.com/")
> old_opener.open("http://www.b.com/")

Okay, I didn't realize that. That makes it much better, though the name OpenerDirector distracts. >> Seems like it would also make sense as an OpenerDirector >> subclass/wrapper. > > IIRC, there are issues with redirection that prevent that. How so? For instance, with referer, don't you essentially just want to do something like:

class RefererDirector(OpenerDirector):
    def __init__(self):
        OpenerDirector.__init__(self)
        self.last_url = ''
    def open(self, fullurl, data=None):
        if isinstance(fullurl, str):
            fullurl = Request(fullurl)
        if self.last_url:
            fullurl.add_header('Referer', self.last_url)
        result = OpenerDirector.open(self, fullurl, data=data)
        self.last_url = result.geturl()
        return result

This is essentially how a browser works, isn't it? Does a header get lost somewhere? If so, then that seems like a bug in the handler. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 01:51:20 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:18:41 2003 Subject: [Web-SIG] A list is available (http://www.parc.com/janssen/web-sig/needed.html) In-Reply-To: <03Oct27.152114pst."58611"@synergy1.parc.xerox.com> Message-ID: <4D8D751E-09DC-11D8-ABB3-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 05:21 PM, Bill Janssen wrote: > I'll try to act as a scribe and gather various individual suggestions > together. Please feel free to send mail to correct any malscription > you spot. > > http://www.parc.com/janssen/web-sig/needed.html Should this list go on the Wiki?
A list was already started at http://www.python.org/cgi-bin/moinmoin/WebSIGTasks -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 02:32:23 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:32:30 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> Message-ID: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 03:20 PM, Greg Stein wrote: > In the most recent incarnation of a webapp of mine (subwiki), I almost > went with a request/response object paradigm and even started a bit of > refactoring along those lines. However, I ended up throwing out that > dual-object concept. > > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? > You > will never see one without the other, and they are intrinsically tied > to > each other. So why separate them? The biggest justification for me is: because that's what everyone does. SkunkWeb doesn't separate them, but I can't think of any others in Python. The request/response distinction is ubiquitous throughout web programming. I guess it's natural to people. But it doesn't even matter why: it is the way it is. Another justification is that the request is essentially static. It is created and complete, then it is processed. When the request is complete, the response has just barely begun existence. The request object could very well be immutable at this point. (Unfortunately that probably would make compatibility with previous code too difficult, but that's an aside) You can very reasonably pass around the request with the expectation that the response will not be touched, or even vice versa (though that is less common -- which is a bit backwards if you follow a convention that the response belongs to the request). The request and response aren't particularly interwoven either. 
Request cookies have nothing to do with response cookies (and any attempt to combine their semantics would be futile). Request variables follow arcane paths through all kinds of representations when you trace them back to their source. And then there's simply the naming issue: request and response are pretty clear names. Everyone knows what they are. Everyone can guess at their interface, and certainly can read their interface. There's no compelling alternative name for the combined object -- "handler" implies almost nothing, "transaction" implies the incorrect thing, "connection" implies a low-level interface... The difficulty of writing, say, request.response.write(something) vs. handler.write(something) doesn't seem like a big deal to me. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Wed Oct 29 06:56:59 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 06:57:17 2003 Subject: urllib2.UserAgent [was: Re: [Web-SIG] So what's missing?] In-Reply-To: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> References: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: OK, having slept on it, I just had a tiny epiphany. I haven't been listening to my own arguments. Since OpenerDirector is *already* a UserAgent-type thing, but not a very friendly one, we should just create a new OpenerDirector, and name it UserAgent. I don't see that as a wrapper, so my delicate sensibilities aren't offended by it ;-) So, I'm persuaded: sorry it took me so long...

Problems to be solved:

- awkward to dynamically change behaviour of user-agent -- you have to build an OpenerDirector every time you want to change things
- unhelpful separation by default of HTTP and HTTPS
- unhelpful separation by default of various server authentication schemes
- no ability to do partial fetches
- no ability to do HEAD and PUT

...any more?

The last two need changes in the rest of urllib2, of course.
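[A minimal sketch of the kind of friendlier UserAgent object John describes, addressing the first bullet (reconfiguring behaviour without rebuilding the opener). It is written against urllib.request, the current home of OpenerDirector (urllib2 at the time of this thread); the UserAgent name comes from the discussion, but set_user_agent and the chosen handler set are this sketch's assumptions, not an agreed API.]

```python
import urllib.request  # in 2003, OpenerDirector and its handlers lived in urllib2


class UserAgent(urllib.request.OpenerDirector):
    """Hypothetical friendlier OpenerDirector: one long-lived object
    whose behaviour can be changed on the fly, instead of building a
    fresh opener every time something changes."""

    def __init__(self):
        urllib.request.OpenerDirector.__init__(self)
        # Install a basic default handler set, much as build_opener() would.
        for handler_class in (urllib.request.HTTPHandler,
                              urllib.request.HTTPRedirectHandler,
                              urllib.request.UnknownHandler):
            self.add_handler(handler_class())

    def set_user_agent(self, agent_string):
        # Swap the User-Agent header in place, keeping any other headers.
        kept = [(k, v) for k, v in self.addheaders
                if k.lower() != "user-agent"]
        self.addheaders = [("User-agent", agent_string)] + kept
```

With something like this, the "two distinct user agents" example earlier in the thread becomes two UserAgent instances, or a single instance reconfigured between fetches.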
I'll have a look at some of the Perl & Java UserAgent-type classes for ideas, and probably write a class to base discussions on. On Wed, 29 Oct 2003, Ian Bicking wrote: [...about my processors patch to urllib2...] > The processors really sound like wrappers to me. [...] No, they work rather like handlers, and are definitely internal to urllib2. > >> Seems like it would also make sense as an OpenerDirector > >> subclass/wrapper. > > > > IIRC, there are issues with redirection that prevent that. > > How so? For instance, with referer, don't you essentially just want to > do something like: I've forgotten the details, but I'm pretty confident they're not very interesting :-) They're in my bug tracker item, I think. John From david at sundayta.com Wed Oct 29 08:21:55 2003 From: david at sundayta.com (david) Date: Wed Oct 29 08:22:00 2003 Subject: [Web-SIG] Request and Response objects Message-ID: <3F9FBEF3.5090307@sundayta.com> Hi, New to the list, but I have read the archive. There was a discussion about whether there should be a single object for both request and response. I would like to suggest that the best approach is the one pinched from the solution used by Turbine http://jakarta.apache.org/turbine/turbine-2.3/apidocs/org/apache/turbine/util/RunData.html This is a single object that contains the request and response as well as anything else useful to pass around the server. Would that be a way to move forward? By the way, I like names that include context for this, e.g. a RunContext that contains Request and Response. Just a small 2c. Dave -- David Warnock: http://davew.typepad.com/42 | Sundayta Ltd: http://www.sundayta.com iDocSys for Document Management. VisibleResults for Fundraising. Development and Hosting of Web Applications and Sites.
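[For concreteness, a minimal sketch of the Turbine-style container suggested above. Every name here -- RunContext, extras, and the toy Request/Response stand-ins -- is a hypothetical illustration for this discussion, not a proposed API.]

```python
class Request:
    """Illustrative stand-in: created complete by the server, then only read."""
    def __init__(self, fields, headers=None):
        self.fields = dict(fields)
        self.headers = dict(headers or {})


class Response:
    """Illustrative stand-in: starts empty, accumulates output."""
    def __init__(self):
        self.headers = {}
        self.body = []

    def write(self, text):
        self.body.append(text)


class RunContext:
    """One object passed around the server, a la Turbine's RunData:
    it carries the request/response pair plus anything else useful."""
    def __init__(self, request):
        self.request = request
        self.response = Response()
        self.extras = {}  # session, user, config, ...
```

Code that prefers the two-object view can still unpack it: `req, resp = ctx.request, ctx.response`, as Bill notes later in the thread.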
From grisha at modpython.org Wed Oct 29 09:26:08 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 09:26:43 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: <20031029092503.E78438@onyx.ispol.com> On Wed, 29 Oct 2003, Stuart Langridge wrote: > Would it be worth having form['fieldname'] default to doing a > getfirst()? +1 on this From grisha at modpython.org Wed Oct 29 09:29:34 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 09:29:37 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031029092503.E78438@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> Message-ID: <20031029092715.M78438@onyx.ispol.com> On Wed, 29 Oct 2003, Gregory (Grisha) Trubetskoy wrote: > > > On Wed, 29 Oct 2003, Stuart Langridge wrote: > > > Would it be worth having form['fieldname'] default to doing a > > getfirst()? > > +1 on this > Sorry I take it back - I misread it (not finished my coffee yet) - I am in favor of form['fieldname'] to act the same as getfirst() only if there is a single element, otherwise it should return a list. 
Grisha From barry at python.org Wed Oct 29 09:30:50 2003 From: barry at python.org (Barry Warsaw) Date: Wed Oct 29 09:30:55 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031029092715.M78438@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> <20031029092715.M78438@onyx.ispol.com> Message-ID: <1067437849.4918.12.camel@anthem> On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > I am in favor of form['fieldname'] to act the same as getfirst() only if > there is a single element, otherwise it should return a list. +1 -Barry From davidf at sjsoft.com Wed Oct 29 09:40:42 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 09:41:51 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067437849.4918.12.camel@anthem> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> <20031029092715.M78438@onyx.ispol.com> <1067437849.4918.12.camel@anthem> Message-ID: <3F9FD16A.404@sjsoft.com> Barry Warsaw wrote: >On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > > > >>I am in favor of form['fieldname'] to act the same as getfirst() only if >>there is a single element, otherwise it should return a list. >> >> > >+1 > > > -2! (That's two factorial :-) I want form['fieldname'] to always return a single element. You should always know when you're wanting a list. Returning a list otherwise requires lots of exceptional-case-checking code that is unnecessary. But then I'm just repeating myself...
David From sholden at holdenweb.com Wed Oct 29 10:11:33 2003 From: sholden at holdenweb.com (Steve Holden) Date: Wed Oct 29 10:16:35 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9FD16A.404@sjsoft.com> Message-ID: [David Fraser] > Barry Warsaw wrote: > > >On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > > > > > > > >>I am in favor of form['fieldname'] to act the same as > getfirst() only if > >>there is a single element, otherwise it should return a list. > >> > >> > > > >+1 > > > > > > > -2! > (That's two factorial :-) > I want form['fieldname'] to always return a single element. > You should always know when you're wanting a list. > Returning a list otherwise requires lots of exceptional-case-checking > code that is unneccessary. > But then I'm just repeating myself... > In which case, let _me_ repeat _myself_ :-) If an argument has multiple values, this should only be handled if the processing element (page code) has indicated that multiple values are acceptable for that argument. When an argument is possibly multi-valued, form['fieldname'] should *always* be a list, even if it has only one element [and I don't see why it shouldn't be legal to see an empty list if the argument doesn't appear in the URL or POST input at all]. If no indication has been given that multiple occurrences are acceptable then an exception should be raised which, if not trapped by the web app, should eventually result in (say) a 422 (unprocessable entity) or a 406 (not acceptable) server response. When an argument is *not* allowed to be multi-valued then form['fieldname'] should return a string, and an error should be raised if the argument has multiple occurrences. If the argument doesn't appear in the URL or POST data at all then it's arguable that a KeyError should be raised, again resulting in a server error if untrapped. 
I'd be prepared to allow a "sloppy" option to have form['fieldname'] return an empty string under those circumstances, a la ASP, and to return the first of multiple occurrences. But I *do* think that "sloppy" would be an apposite name for such tactics. If we're building a new API then for heaven's sake let's not repeat the mistakes of earlier generations. And let's try not to have each of these discussions more than two or three times a month :-) regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From neel at mediapulse.com Wed Oct 29 10:21:00 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Oct 29 10:21:06 2003 Subject: [Web-SIG] Request and Response objects Message-ID: > There was a discussion about whether there should be a single > object for > both request and response. At first I thought that having a separate request and response object didn't offer any advantages. This is most likely because I've worked with Apache for so long, which only has one object to handle both. Upon more thought though, I'm starting to think having them as separate objects might be better. Separate, a project could focus only on the side of the process it is interested in. An example would be an XSLT engine. So in theory it could take in any request object -- from cgi, mod_python, or Python's stdlib server -- prepare an XML response based on the request, then pass this XML data to the XSLT response object to do the skinning. Since the developers of this XSLT response object have no reason to care about the request side, it seems better that they don't need to even be aware of it.
On a related note, for all those out there like mod_python that have in place a request or request and response objects now, I think the best solution would be for them to include a conversion function in their objects to convert their formats to whatever the SIG comes up with as the standards. I *do not* want mod_python to match the SIG's standard, I want it to match the Apache API; but being able to convert between the two at the cost of a few cpu cycles would be great. On a less related note, I don't know if an XSLT parser made the list yet, but if it could be added it's something I would really like to see. Mike From barry at python.org Wed Oct 29 12:12:21 2003 From: barry at python.org (Barry Warsaw) Date: Wed Oct 29 11:12:18 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <1067447541.13656.3.camel@geddy> On Wed, 2003-10-29 at 10:11, Steve Holden wrote: > If an argument has multiple values, this should only be handled if the > processing element (page code) has indicated that multiple values are > acceptable for that argument. When an argument is possibly multi-valued, > form['fieldname'] should *always* be a list, even if it has only one > element [and I don't see why it shouldn't be legal to see an empty list > if the argument doesn't appear in the URL or POST input at all]. If no > indication has been given that multiple occurrences are acceptable then > an exception should be raised which, if not trapped by the web app, > should eventually result in (say) a 422 (unprocessable entity) or a 406 > (not acceptable) server response. I tend to agree with Steve here, but maybe we can have our cake and eat it too. Dumb-ass suggestion of the day: what if the field values were represented by a dict subclass, and we had several different subclasses, each of which specified the exact behavior for __getitem__(). E.g.
David could have his "__getitem__ is getfirst" behavior, Steve could have his verified-multiples behavior, and I could have my "always return a list" behavior. We'd then be reduced to choosing a default and a few interfaces and everyone would be happy. -Barry From mailinglists at qinternet.com Wed Oct 29 11:17:39 2003 From: mailinglists at qinternet.com (Brian Olsen - Lists) Date: Wed Oct 29 11:17:41 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <6A4B74EA-0A2B-11D8-9FB2-000502B9AE42@qinternet.com> On Wednesday, October 29, 2003, at 02:32 AM, Ian Bicking wrote: > On Friday, October 24, 2003, at 03:20 PM, Greg Stein wrote: >> In the most recent incarnation of a webapp of mine (subwiki), I almost >> went with a request/response object paradigm and even started a bit of >> refactoring along those lines. However, I ended up throwing out that >> dual-object concept. >> >> When you stop and think about it: *every* request object will have a >> matching response object. Why have two objects if they come in pairs? >> You >> will never see one without the other, and they are intrinsically tied >> to >> each other. So why separate them? > > The biggest justification for me is: because that's what everyone > does. SkunkWeb doesn't separate them, but I can't think of any others > in Python. The request/response distinction is ubiquitous throughout > web programming. I guess it's natural to people. But it doesn't even > matter why: it is the way it is. > Another justification is that the request is essentially static. It > is created and complete, then it is processed. When the request is > complete, the response has just barely begun existence. The request > object could very well be immutable at this point.
(Unfortunately > that probably would make compatibility with previous code too > difficult, but that's an aside) You can very reasonably pass around > the request with the expectation that the response will not be > touched, or even vice versa (though that is less common -- which is a > bit backwards if you follow a convention that the response belongs to > the request). > The request and response aren't particularly interwoven either. > Request cookies have nothing to do with response cookies (and any > attempt to combine their semantics would be futile). Request > variables follow arcane paths through all kinds of representations > when you trace them back to their source. > > And then there's simply the naming issue: request and response are > pretty clear names. Everyone knows what they are. Everyone can guess > at their interface, and certainly can read their interface. There's > no compelling alternative name for the combined object -- "handler" > implies almost nothing, "transaction" implies the incorrect thing, > "connection" implies a low-level interface... > > The difficulty of writing, say, request.response.write(something) vs. > handler.write(something) doesn't seem like a big deal to me. Reading this thread, it sounds more like an aesthetic choice than anything. I like single objects, but this is also aesthetic. (Maybe you can call the single object HTTPConnection? That's what it is, no?) But if I am going to fight something for no particular reason, it will be against dual-objects, just to be against the dual-object status quo. :-) Fight for the single-object!! 
Brian From ianb at colorstudy.com Wed Oct 29 11:17:53 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 11:17:57 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067447541.13656.3.camel@geddy> Message-ID: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 11:12 AM, Barry Warsaw wrote: > Dumb-ass suggestion of the day: what if the field values were > represented by a dict subclass, and we had several different > subclasses, > each of which specified the exact behavior for __getitem__(). E.g. > David could have his "__getitem__ is getfirst" behavior, Steve could > have > his verified-multiples behavior, and I could have my "always return a > list" behavior. We'd then be reduced to choosing a default and a few > interfaces and everyone would be happy. That would make me unhappy... next thing you know, you'll be introducing a magic quoting dict subclass... -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 11:31:59 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 11:39:56 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Message-ID: <6B07DB2A-0A2D-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 09:11 AM, Steve Holden wrote: > In which case, let _me_ repeat _myself_ :-) > > If an argument has multiple values, this should only be handled if the > processing element (page code) has indicated that multiple values are > acceptable for that argument. When an argument is possibly > multi-valued, > form['fieldname'] should *always* be a list, even if it has only one > element [and I don't see why it shouldn't be legal to see an empty list > if the argument doesn't appear in the URL or POST input at all].
If no > indication has been given that multiple occurrences are acceptable then > an exception should be raised which, if not trapped by the web app, > should eventually result in (say) a 422 (unprocessable entity) or a 406 > (not acceptable) server response. > > When an argument is *not* allowed to be multi-valued then > form['fieldname'] should return a string, and an error should be raised > if the argument has multiple occurrences. If the argument doesn't > appear > in the URL or POST data at all then it's arguable that a KeyError > should > be raised, again resulting in a server error if untrapped. We can also handle this through the particulars of the method calls we use. E.g.:

def getone(self, field, default=NoDefault):
    try:
        value = self._rawfields[field]
        if isinstance(value, list):
            raise BadRequestError, "Multiple values were not expected for the field %s" % field
        return value
    except KeyError:
        if default is NoDefault:
            raise
        return default

This doesn't require any declaration, and follows the typical implicit type checking that's usually done in Python code. Of course, that BadRequestError is another (important) point of discussion. > I'd be prepared to allow a "sloppy" option to have form['fieldname'] > return an empty string under those circumstances, a la ASP, and to > return the first of multiple occurrences. But I *do* think that > "sloppy" > would be an apposite name for such tactics. > > If we're building a new API then for heaven's sake let's not repeat the > mistakes of earlier generations. And let's try not to have each of > these > discussions more than two or three times a month :-) I'm not sure if people will ever really be happy with one decision. Which makes me feel like we should just expose the status quo -- you get a dictionary that might contain lists -- and let people process that as they want. If we provide multiple options, then we do so explicitly and without strong bias.
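[The getone idea sketched above, rendered as a small self-contained example. FormFields, NoDefault, and BadRequestError are stand-in names for this illustration -- the thread leaves the real error type open -- and the modern raise syntax is used here.]

```python
class NoDefault:
    """Sentinel class: distinguishes 'no default supplied' from default=None."""


class BadRequestError(Exception):
    """Stand-in for whatever error type the API would actually raise."""


class FormFields:
    def __init__(self, rawfields):
        # rawfields maps names to a string, or to a list for repeated fields,
        # mirroring "a dictionary that might contain lists".
        self._rawfields = dict(rawfields)

    def getone(self, field, default=NoDefault):
        try:
            value = self._rawfields[field]
            if isinstance(value, list):
                raise BadRequestError(
                    "Multiple values were not expected for the field %s" % field)
            return value
        except KeyError:
            if default is NoDefault:
                raise
            return default
```

Steve's stricter scheme would replace the isinstance check with a lookup in a declared set of multi-valued field names; the method-call shape stays the same.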
http://cvs.sourceforge.net/viewcvs.py/*checkout*/webware-sandbox/Sandbox/ianbicking/FormEncode/DictCall.txt?content-type=text%2Fplain&rev=1.1 http://cvs.sourceforge.net/viewcvs.py/*checkout*/webware-sandbox/Sandbox/ianbicking/FormEncode/DictCall.py?content-type=text%2Fplain&rev=1.1 This uses a method signature to handle list conversion and a bunch of other conversions as well, like ints, ordered lists, and dictionaries. But it's not complete, and I doubt it could be made into something complete (and I've already moved on). OTOH, something that was complete would be burdensome in some situations. And other possible features, like Zope's :action or Webware's _action_ fields, are naturally tied to a specific environment. List vs. string fields are just the tip of an iceberg of general validation, and validation is not something we can tackle right now (at least not for the standard library). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Wed Oct 29 16:12:35 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 16:12:41 2003 Subject: [Web-SIG] htmlgen Message-ID: <20031029161141.G82536@onyx.ispol.com> Should HTML-generating capability a la HTMLgen go on the missing list as well? Grisha From jjl at pobox.com Wed Oct 29 16:20:50 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 16:21:23 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029161141.G82536@onyx.ispol.com> References: <20031029161141.G82536@onyx.ispol.com> Message-ID: On Wed, 29 Oct 2003, Gregory (Grisha) Trubetskoy wrote: > Should HTML-generating capability a la HTMLgen go on the missing list as > well? The trouble is, there's no "one obvious way" to do this, so I'd think it's not a great candidate for the standard library.
John From ianb at colorstudy.com Wed Oct 29 16:29:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 16:29:56 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029161141.G82536@onyx.ispol.com> Message-ID: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 03:12 PM, Gregory (Grisha) Trubetskoy wrote: > Should HTML-generating capability a la HTMLgen go on the missing list > as > well? It seems like a good candidate -- it's been around a long time (in one form or another), its scope is very defined, and it's something people often look for. HTMLgen has some quirkiness to it, though. It's not as tight as a simple HTML generator could be. Would it make sense to use a more minimal XML generator, that could also do XHTML generation (maybe with a little validation thrown in)? Is there any library like this already included in Python xml packages? I do like generating HTML with a Python syntax (when in-code HTML generation is called for). Quixote's PTL has some stuff related to this as well (at least related to quoting), but I don't remember much about it. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From randyp at cycla.com Wed Oct 29 16:30:38 2003 From: randyp at cycla.com (Randy Pearson) Date: Wed Oct 29 16:31:07 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: Message-ID: <21294262503873@dserver.cycla.com> The question seems to be whether to have two closely-coordinated request/response objects versus a single object. I can see a few points in favor of the former: 1. Independence for extensibility. By having separate classes for each object, they are free to grow independently. So, if Jane develops an interesting extension to the Request class (by way of subclassing), and Bob does the same for the Response class, it becomes much easier to combine these in a best-of-breed approach. If all one class, this would be difficult or impossible. 2. Multiplicity.
If a single object is used, there is an implicit assumption of a 1:1 relation between requests and responses. But is that always the case? Consider two cases. Case 1: Your "response" to a request includes both the standard response _and_ a generated email of content-type text/html. Case 2: You have a mixed-mode site that includes both static and dynamic content, and in some instances you update some of the static (published) content in response to an incoming request. In both of these cases, you are producing more than one "response", and if your response class encapsulates the ability to produce both, you might easily want to operate on multiple response objects in parallel. 3. Timing. If processing a request may cause a time-out, you may prefer to queue the request and provide the usual auto-refresh type of HTML response, polling for completion status. In this case, you have new needs: the ability to queue a request and the ability to store and resurrect a response. It's hard to see a single combined object dealing with all of this. Perhaps some form of a mediator or facade could be created to provide an interface between these objects, but in any event, they don't strike me as deriving from a single class. -- Randy From jjl at pobox.com Wed Oct 29 16:51:24 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 16:51:34 2003 Subject: urllib2.UserAgent [was: Re: [Web-SIG] So what's missing?] In-Reply-To: References: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: On Wed, 29 Oct 2003, John J Lee wrote: [...]
> Problems to be solved:
>
> - awkward to dynamically change behaviour of user-agent -- you have
>   to build an OpenerDirector every time you want to change things
> - unhelpful separation by default of HTTP and HTTPS
> - unhelpful separation by default of various server authentication
>   schemes
> - no ability to do partial fetches
> - no ability to do HEAD and PUT
>
> ...any more?
> > The last two need changes in the rest of urllib2, of course. [...] A few other things this class should handle (eventually) in a friendly fashion. Some of them require work on httplib / urllib / urllib2.

- timeouts
- connection caching
- robots.txt observance (using the existing std. lib. module)
- caching
- convenient debugging (showing redirections, response bodies, etc.)
- cookies
- HTML HEAD section http-equiv handling
- Refresh handling
- seekability of responses (required for doing http-equiv)
- control of From and User-Agent headers; maybe just leave this as-is: i.e. the addheaders attribute

John From cs1spw at bath.ac.uk Wed Oct 29 17:26:43 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 18:13:54 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> References: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <3FA03EA3.9000907@bath.ac.uk> Ian Bicking wrote: > HTMLgen has some quirkiness to it, though. It's not as tight as a > simple HTML generator could be. Would it make sense to use a more > minimal XML generator, that could also do XHTML generation (maybe with a > little validation thrown in)? Is there any library like this already > included in Python xml packages? I do like generating HTML with a > Python syntax (when in-code HTML generation is called for).
It uses an extremely simple stack based push/pop model: http://www.xml.com/pub/a/2003/04/09/py-xml.html -- Simon Willison Web development weblog: http://simon.incutio.com/ From janssen at parc.com Wed Oct 29 18:48:07 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 18:48:32 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Tue, 28 Oct 2003 23:32:23 PST." <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> I think you can separate them and combine them at the same time, without much trouble. For instance, Ian used the example "request.response.write()", implying that the response object is accessible from the request object, which makes sense to me. So in one view, there's just one object, the request, and the response object is just a part of that. But for those who prefer it, it's easy to assign response = request.response and deal with the two different variables. Bill From cs1spw at bath.ac.uk Wed Oct 29 19:26:34 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 19:26:40 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> References: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> Message-ID: <3FA05ABA.5050909@bath.ac.uk> Bill Janssen wrote: > I think you can separate them and combine them at the same time, > without much trouble. For instance, Ian used the example > "request.response.write()", implying that the response object is > accessible from the request object, which makes sense to me. So in > one view, there's just one object, the request, and the response > object is just a part of that. But for those who prefer it, it's easy > to assign > > response = request.response > > and deal with the two different variables. I have to admit I prefer keeping the two completely separate, as is done by the Java servlet specification. 
As mentioned by someone else, the big difference between the two is that request should be read only while response can have its state altered. An advantage of this is that you can potentially do interesting things with the two objects - like pickling the request object and logging it somewhere, or pickling the response object and caching it once it has been populated to speed up future requests for the same data. I can see the POV of people who prefer a single object or nested objects as well though. This is going to be a tricky issue to resolve. If there are no utterly convincing arguments for one approach or the other we could take it to a vote? -- Simon Willison Web development weblog: http://simon.incutio.com/ From janssen at parc.com Wed Oct 29 20:04:37 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 20:05:03 2003 Subject: [Web-SIG] htmlgen In-Reply-To: Your message of "Wed, 29 Oct 2003 13:12:35 PST." <20031029161141.G82536@onyx.ispol.com> Message-ID: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Can you describe what HTMLgen does? Bill > > Should HTML-generating capability a la HTMLgen go on the missing list as > well? > > Grisha From janssen at parc.com Wed Oct 29 20:09:34 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 20:11:20 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Wed, 29 Oct 2003 16:26:34 PST." <3FA05ABA.5050909@bath.ac.uk> Message-ID: <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> > I can see the POV of people who prefer a single object or nested objects > as well though. This is going to be a tricky issue to resolve. If there > are no utterly convincing arguments for one approach or the other we > could take it to a vote? I tend to prefer protracted formal discussion till the pros and cons force a choice, a la Rittel/Webber "wicked problems". See http://www.poppendieck.com/wicked.htm. 
Bill From grisha at modpython.org Wed Oct 29 20:14:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 20:14:37 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <21294262503873@dserver.cycla.com> References: <21294262503873@dserver.cycla.com> Message-ID: <20031029195318.C82536@onyx.ispol.com> Let me argue the single request point with some specifics. IMO dual objects create a semantics mess, here are a couple of examples:

o The point that I already brought up that reading from one object and writing to another is unintuitive and misleading.

o Where does the connection information such as remote host, the raw socket, etc information belong, request or response?

o Mod_python (or httpd rather) allows for cleanups to be registered, to run after the request is finished being processed. Again - where would a clean up fit in, at the end of a _request_ or at the end of a _response_? (and when _does_ a request really end?)

o What about server information (document root, etc)?

o If there exists such a thing as a subrequest or internal redirect, then in httpd's single object framework you can access the previous and next request objects via req.prev or req.next. With two objects, it would be something like response.subreq and response.subreq.resp, and to dig one level deeper (req.next.next in single object model), it would be response.subreq.resp.subreq.resp Or if I am within a subrequest, how can I get at the parent (req.prev)? - you see my point, I hope.

6. When processing is aborted, which could happen while the request is being read or while the response is being written - the logic should not be duplicated in two different objects.

These are a few problems that I can think of with the dual object model, yet so far I haven't seen anything seriously convincing in advocacy of the dual object model :-) Grisha From amk at amk.ca Wed Oct 29 21:48:07 2003 From: amk at amk.ca (A.M.
Kuchling) Date: Wed Oct 29 21:47:16 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <7DDEF574-0A83-11D8-B82C-0003931BF218@amk.ca> On Wednesday, October 29, 2003, at 04:29 PM, Ian Bicking wrote: > Quixote's PTL has some stuff related to this as well (at least related > to quoting), but I don't remember much about it. http://www.mems-exchange.org/software/quixote/doc/PTL.html is the relevant documentation. Basically, the 'htmltext' data type behaves like a string. In operations involving both htmltext and regular strings, the regular string is coerced to htmltext; coercing a string to htmltext involves quoting HTML/XML special characters. For example:

>>> from quixote import html
>>> html.htmltext('<b>abc</b>')
<htmltext '<b>abc</b>'>
>>> h = html.htmltext
>>> h('<b>%s</b>') % 'Magic chars: <, >, &'
<htmltext '<b>Magic chars: &lt;, &gt;, &amp;</b>'>
>>> h('<b>abc</b>') + '&'
<htmltext '<b>abc</b>&amp;'>

If a templating package uses htmltext for portions of the template that were known to be trusted, then you don't have to remember to pass untrusted data from the browser through cgi.escape() or some equivalent; the coercion handles it for you, thus closing one source of security holes. Quixote's PTL then layers some compiler magic on top of this so you don't have htmltext() constructors all over the place, but you don't need to buy into PTL to use htmltext. Adding it to the stdlib might not be a bad idea. --amk From grisha at modpython.org Wed Oct 29 22:46:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 22:46:37 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> References: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031029224104.O82536@onyx.ispol.com> On Wed, 29 Oct 2003, Bill Janssen wrote: > > Can you describe what HTMLgen does?

>>> from HTMLgen import *
>>> ul = UL(["blah", "blah"])
>>> ul.append(H(1, "bleh"))
>>> print ul
<UL>
<LI>blah
<LI>blah
<H1>bleh</H1>
</UL>
>>> From cs1spw at bath.ac.uk Wed Oct 29 23:01:51 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 23:02:00 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029224104.O82536@onyx.ispol.com> References: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> <20031029224104.O82536@onyx.ispol.com> Message-ID: <3FA08D2F.6040701@bath.ac.uk> Gregory (Grisha) Trubetskoy wrote: >>Can you describe what HTMLgen does? > >>>>from HTMLgen import * >>>>ul = UL(["blah", "blah"]) >>>>ul.append(H(1, "bleh")) >>>>print ul > >
><UL>
><LI>blah
><LI>blah
><H1>bleh</H1>
></UL>
A big problem here is one of style. I prefer my HTML to be lower case with explicit end tags (even when optional), and often work in XHTML where end tags are required. I also like my lists to have their <li>s indented with 2 spaces. The point I'm trying to make is that different people have different preferences for HTML, and there is no one correct way of writing it. This is why I'm opposed to HTML generation tools in the standard library - there are simply too many styles. HTML generation tools already exist outside the standard library in abundance and I see no pressing need for the default Python install to ship with one that has been chosen over all of the others. If there's an obvious demand from Python's user base for an HTML generation system in the standard library then by all means there should be one, but I don't see any reason to include one without good reason when there is no obviously "correct" way of going about it. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Wed Oct 29 22:52:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 23:03:16 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Message-ID: <892DED96-0A8C-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 07:04 PM, Bill Janssen wrote: > Can you describe what HTMLgen does? http://starship.python.net/crew/friedrich/HTMLgen/html/main.html But the core portion is really about creating HTML, something along the lines of:

HTML(HEAD(TITLE('my page')),
     BODY(H1('my page'),
          IMG(src="/mypicture.jpg", width=100, height=100),
          ...))

With the output -- either directly or through str() -- being corresponding HTML. There are several similar systems, with slight differences. I used a class with magic attributes, like html.br(), for one system. Someone else did something like BODY(bgcolor="#aaaaaa")[H1('title'), P()['some content']], and some other variations exist. HTMLgen also includes some aspects that are more like templating, where you define the structure for an entire page. But in its more basic form it's often useful for creating valid HTML snippets inside Python code.
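The attribute-magic style Ian mentions (html.br(), html.h1('x'), etc.) can be roughed out in a few lines. This is a sketch under assumed names -- `Tag` and `_Builder` are not from any system discussed in the thread -- using modern Python's html.escape for the quoting:

```python
from html import escape  # escapes <, >, & (and quotes)

class Tag:
    """One element; str() renders it and its children as markup."""
    def __init__(self, name, *children, **attrs):
        self.name, self.children, self.attrs = name, list(children), attrs

    def __call__(self, *children):
        # Allows the html.a(href=...)('text') calling style.
        self.children.extend(children)
        return self

    def __str__(self):
        # 'class_' -> 'class', since 'class' is a Python keyword
        attrs = ''.join(' %s="%s"' % (k.rstrip('_'), escape(str(v), quote=True))
                        for k, v in self.attrs.items())
        body = ''.join(str(c) if isinstance(c, Tag) else escape(str(c))
                       for c in self.children)
        return '<%s%s>%s</%s>' % (self.name, attrs, body, self.name)

class _Builder:
    """Each attribute access manufactures a Tag factory on the fly."""
    def __getattr__(self, name):
        return lambda *children, **attrs: Tag(name, *children, **attrs)

html = _Builder()

print(html.ul(html.li('blah'), html.li('blah')))
# -> <ul><li>blah</li><li>blah</li></ul>
```

Plain-string children are escaped automatically, which gives the same safety property amk describes for htmltext: untrusted text cannot inject markup unless it is wrapped in a Tag.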
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From davidf at sjsoft.com Wed Oct 29 23:31:08 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 23:31:14 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <20031029195318.C82536@onyx.ispol.com> References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> Message-ID: <3FA0940C.2080301@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >Let me argue the single request point with some specifics. > >IMO dual objects create a semantics mess, here is a couple of examples: > >o The point that I already brought up that reading from one object and >writing to another is unintuitive and misleading. > >o Where does the connection information such as remote host, the raw >socket, etc information belong, request or response? > >o Mod_python (or httpd rather) allows for cleanups to be registered, to >run after the request is finished being processed. Again - where would a >clean up fit in, at the end of a _request_ or at the end of a _response_? >(and when _does_ a request really end?) > >o What about server information (document root, etc)? > >o If there exists such a thing as a subrequest or internal redirect, then >in httpd's single object framework you can access the previous and next >request objects via req.prev or req.next. With two objects, it would be >something like response.subreq and response.subreq.resp, and to dig one >level deeper (req.next.next in single object model), it would be >response.subreq.resp.subreq.resp > >Or if I am within a subrequest, how can I get at the parent (req.prev)? > - you see my point, I hope. > >6. When processing is aborted, which could happen while the request is >being read or while the response is being written - the logic should not >be duplicated in two different objects. 
> >These are a few problems that I can think of with the dual object model, >yet so far I haven't seen anything seriously convincing in advocacy of the >dual object model :-) > >Grisha > > Great explanation, Grisha. A lot of the arguments for the dual object model are about what you can do with a separate object. But these seem to me to miss the point .... you can create your own "response"-type class that holds the *value* of a response, and as many instances of it per request as you want to. But the actual Web API response object is for *writing* the response back to the client. You can only write one response back per request, so it makes sense for them to be the same object. (The only extension would be to filter what is being written, but this is a separate issue). David From ianb at colorstudy.com Thu Oct 30 01:00:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 01:00:44 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA08D2F.6040701@bath.ac.uk> Message-ID: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 10:01 PM, Simon Willison wrote: > A big problem here is one of style. I prefer my HTML to be lower case > with explicit end tags (even when optional), and often work in XHTML > where end tags are required. I also like my lists to have their
<li>s > indented with 2 spaces. HTMLgen is kind of old and predates XHTML. Any newer system would create XHTML and use lower-case tags. As far as indentation, well, the HTML isn't intended to be terribly readable from these systems. The point is to make the source readable. (And you actually could make the HTML well indented using these systems, but it's usually not that important) > The point I'm trying to make is that different people have different > preferences for HTML, and there is no one correct way of writing it. > This is why I'm opposed to HTML generation tools in the standard > library - there are simply too many styles. HTML generation tools > already exist outside the standard library in abundance and I see no > pressing need for the default Python install to ship with one that has > been chosen over all of the others. I think you probably have more opinion about HTML than many Python programmers. > If there's an obvious demand from Python's user base for an HTML > generation system in the standard library then by all means there > should be one, but I don't see any reason to include one without good > reason when there is no obviously "correct" way of going about it. If we were talking about a templating system, then yes, way too much personal preference there, but this isn't really a templating system. While not everyone will want to use this, the actual variations (despite frequent reimplementation) are not that great. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From aquarius-lists at kryogenix.org Thu Oct 30 02:46:21 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 30 02:44:22 2003 Subject: [Web-SIG] htmlgen References: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: Ian Bicking spoo'd forth: > On Wednesday, October 29, 2003, at 10:01 PM, Simon Willison wrote: >> A big problem here is one of style.
I prefer my HTML to be lower case >> with explicit end tags (even when optional), and often work in XHTML >> where end tags are required. I also like my lists to have their
<li>s >> indented with 2 spaces. > > HTMLgen is kind of old and predates XHTML. Any newer system would > create XHTML and use lower-case tags. Without wishing to make life more complex for everything, it should be able to do HTML 4.01 as well; there are still problems with XHTML (by which I mean which content-type it's served as -- serving it as xml doesn't work in all browsers and serving it as html means that browsers treat it as tag soup), so I'm still using 4.01 Strict for most projects. sil -- Soon -- as it measured time -- it would have work to do once again. Thousands upon thousands of worlds. -- "Fallen Star", Simon Clay From grisha at modpython.org Thu Oct 30 11:33:18 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 11:33:22 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> References: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030112919.J97494@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > > should be one, but I don't see any reason to include one without good > > reason when there is no obviously "correct" way of going about it. > > If we were talking about a templating system, then yes, way too much > personal preference there, but this isn't really a templating system. HTMLgen has a DocumentTemplate thing which is a bare bones templating system allowing for substitution in a text file. I think something primitive of this sort and perhaps implemented based on this: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 (which can probably be even further optimized) would be nice to have in stdlib.
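Something this primitive can be done with one regex pass. A sketch (this is not the recipe's actual code, and `fill` is an assumed name); unknown placeholders are passed through untouched rather than raising KeyError, which is the behavior asked for later in the thread:

```python
import re

_PLACEHOLDER = re.compile(r'%\((\w+)\)s')

def fill(template, values):
    """Like template % values, but unknown keys pass through unchanged."""
    def repl(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return _PLACEHOLDER.sub(repl, template)

print(fill('Hello %(title)s %(name)s', {'name': 'Smith'}))
# -> Hello %(title)s Smith
```

A second pass with an empty dict (or a variant whose fallback is '') would blank out whatever is left over.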
Grisha From ianb at colorstudy.com Thu Oct 30 11:44:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 11:44:59 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030112919.J97494@onyx.ispol.com> Message-ID: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy wrote: > HTMLgen has a DocumentTemplate thing which is a bare bones templating > system allowing for substitution in a text file. I think something > primitive of this sort and perhaps implemented based on this: > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 > > (which can probably be even further optimized) > > would be nice to have in stdlib. A templating system in its most naive form is just a kind of string substitution. If that's the kind of thing we're looking for, then perhaps -- but it has to be usefully better than %. (Though % would be more useful if it had other formatting options, like %h does HTML quoting, or %u does URL quoting... but where would it stop?) There's a PEP out there for $ string substitution, but it's static substitution (i.e., it always fills from locals()). Guido just mentioned recently on python-dev that he didn't want to improve % (specifically a request that "%{var}" be equivalent to "%(var)s") because he wanted to leave room for a better solution. What better solution? I don't know... I think it has to be something both elegant and useful, minimal and flexible. 
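The %h / %u conversions Ian muses about do not exist in Python's % operator; a sketch of what they might look like, with an assumed helper name `fmt` and today's stdlib quoting functions:

```python
import re
from html import escape
from urllib.parse import quote_plus

def fmt(template, *args):
    """%s plain, %h HTML-quoted, %u URL-quoted -- hypothetical conversions."""
    args = list(args)
    def repl(match):
        value = str(args.pop(0))  # consume positional args left to right
        kind = match.group(1)
        if kind == 'h':
            return escape(value, quote=True)
        if kind == 'u':
            return quote_plus(value)
        return value
    return re.sub(r'%([shu])', repl, template)

print(fmt('<a href="/q?t=%u">%h</a>', 'a&b', '<tag>'))
# -> <a href="/q?t=a%26b">&lt;tag&gt;</a>
```

The open question Ian raises ("where would it stop?") is exactly why a fixed set of conversions like this never made it into %.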
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From amk at amk.ca Thu Oct 30 11:53:16 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 30 11:53:24 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> References: <3FA05ABA.5050909@bath.ac.uk> <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031030165316.GA12422@rogue.amk.ca> On Wed, Oct 29, 2003 at 05:09:34PM -0800, Bill Janssen wrote: > I tend to prefer protracted formal discussion till the pros and cons > force a choice, a la Rittel/Webber "wicked problems". See > http://www.poppendieck.com/wicked.htm. I doubt this is such a problem, though; it doesn't really *matter* if there's one object or two, and neither side has any overwhelming arguments on the point, so ultimately it'll come down to taste. --amk From randyp at cycla.com Thu Oct 30 12:02:10 2003 From: randyp at cycla.com (Randy Pearson) Date: Thu Oct 30 12:02:43 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3FA05ABA.5050909@bath.ac.uk> Message-ID: <17011139006433@dserver.cycla.com> > ... the big > difference between the two is that request should be read only while > response can have its state altered.... If request is read-only, how would you create unit tests for other components? A testing harness would need the ability to instantiate and alter pseudo requests outside of the HTTP server context. I do agree that, from the response's point-of-view, the request is immutable. 
-- Randy From grisha at modpython.org Thu Oct 30 12:05:38 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 12:06:03 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030120232.T98038@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > substitution (i.e., it always fills from locals()). Guido just > mentioned recently on python-dev that he didn't want to improve % > (specifically a request that "%{var}" be equivalent to "%(var)s") > because he wanted to leave room for a better solution. What better > solution? I don't know... I think it has to be something both elegant > and useful, minimal and flexible. This is along the lines of what I think. Another thing with %() is that if the dictionary doesn't have a corresponding value you get key error as opposed to leaving it as is or defaulting to nothing. I might actually take the time to put something together, then we can ponder on whether it's worth including. Grisha From davidf at sjsoft.com Thu Oct 30 12:20:37 2003 From: davidf at sjsoft.com (David Fraser) Date: Thu Oct 30 12:21:14 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030120232.T98038@onyx.ispol.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030120232.T98038@onyx.ispol.com> Message-ID: <3FA14865.7090902@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Thu, 30 Oct 2003, Ian Bicking wrote: > > > >>substitution (i.e., it always fills from locals()). Guido just >>mentioned recently on python-dev that he didn't want to improve % >>(specifically a request that "%{var}" be equivalent to "%(var)s") >>because he wanted to leave room for a better solution. What better >>solution? I don't know... I think it has to be something both elegant >>and useful, minimal and flexible. >> >> > >This is along the lines of what I think. 
Another thing with %() is that if >the dictionary doesn't have a corresponding value you get key error as >opposed to leaving it as is or defaulting to nothing. > >I might actually take the time to put something together, then we can >ponder on whether it's worth including. > >Grisha > > I'm not sure about how useful this kind of variable substitution would be for html ... any examples? David From ianb at colorstudy.com Thu Oct 30 12:37:27 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 12:37:33 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <17011139006433@dserver.cycla.com> Message-ID: On Thursday, October 30, 2003, at 11:02 AM, Randy Pearson wrote: >> ... the big >> difference between the two is that request should be read only while >> response can have its state altered.... > > If request is read-only, how would you create unit tests for other > components? A testing harness would need the ability to instantiate and > alter pseudo requests outside of the HTTP server context. You'd be able to create artificial requests, and copy requests with changes. Immutable objects usually have to have better support for these sorts of things for just this reason. So maybe you'd have something like: # Ignoring some details here... vars = request.vars vars.update({'action': 'delete'}) forward(request.clone(path='/target/delete', variables = vars)) Or: req = HTTPRequest(variables={}, method='GET', ...) While perhaps with CGI you'd use: req = HTTPRequest.fromEnvironment() (.fromCGI()? just .cgi()?) Anyway, I think there's compatibility problems with this, but if I was doing it from scratch I might do this. 
(Immutability would be a little soft, though -- you could, for instance, set the response for the request after the object was created, but you couldn't change the response once it had been set) -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Thu Oct 30 12:46:05 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 12:46:12 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA14865.7090902@sjsoft.com> Message-ID: On Thursday, October 30, 2003, at 11:20 AM, David Fraser wrote: > I'm not sure about how useful this kind of variable substitution would > be for html ... any examples?

defaults = {'username': req.cookie('username', '')}
defaults.update(req.fields)
if request.fields.get('username'):
    defaults['message'] = "<b>Login incorrect</b><br>"
defaults['action'] = '/loginform'
form = '''<form action="%(action)s" method="POST">
%(message)s
Username: <input type="text" name="username" value="%(username)s"><br>
Password: <input type="password" name="password"><br>
<input type="submit">
</form>'''.substitute(defaults)

## Using something HTMLgen-ish:

defaults = {'username': req.cookie('username', '')}
defaults.update(req.fields)
if request.fields.get('username'):
    defaults['message'] = html.b("Login incorrect") + html.br()
defaults['action'] = '/loginform'
form = html.form(action=defaults['action'], method="POST")(
    defaults.get('message'),
    'Username: ',
    html.input(type="text", name="username", value=defaults.get('username')),
    html.br(),
    'Password: ',
    html.input(type="password", name="password", value=defaults.get('password')),
    html.input(type="submit"))

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Thu Oct 30 12:54:09 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 12:54:13 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA14865.7090902@sjsoft.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030120232.T98038@onyx.ispol.com> <3FA14865.7090902@sjsoft.com> Message-ID: <20031030124343.Q98038@onyx.ispol.com> On Thu, 30 Oct 2003, David Fraser wrote: > Gregory (Grisha) Trubetskoy wrote: > > >This is along the lines of what I think. Another thing with %() is that if > >the dictionary doesn't have a corresponding value you get key error as > >opposed to leaving it as is or defaulting to nothing. > > > I'm not sure about how useful this kind of variable substitution would > be for html ... any examples? It comes in handy in various HTML formatting, e.g. let's say we have a menu, and you want one item highlighted:

HTML = """
<a href="/home" %(home)s>Home</a><br>
<a href="/products" %(prod)s>Products</a><br>
<a href="/about" %(about)s>About</a><br>
    """ To highlight home you'd have to do something like: HTML % {'home' : 'class="highlighted"', 'prod':'', 'about':''} But it's nice to not have to list every menu option (less typing, and more importantly, you can change the template without having to fix the code), something functionally equivalent to: HTML % {'home' : 'class="highlighted"'} (this would raise key error) Grisha From ianb at colorstudy.com Thu Oct 30 13:11:08 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 13:11:14 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030124343.Q98038@onyx.ispol.com> Message-ID: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> On Thursday, October 30, 2003, at 11:54 AM, Gregory (Grisha) Trubetskoy wrote: > It comes in handy in various HTML formatting, e.g. let's say we have a > menu, and you want one item highlighted: > > HTML = """ > Home
    > Products
    > About
    > """ > > To highlight home you'd have to do something like: > > HTML % {'home' : 'class="highlighted"', 'prod':'', 'about':''} > > But it's nice to not have to list every menu option (less typing, and > more importantly, you can change the template without having to fix the > code), something functionally equivalent to: > > HTML % {'home' : 'class="highlighted"'} > > (this would raise key error) This would solve this particular problem: class EmptyStringDict(dict): def __getitem__(self, item): try: return dict.__getitem__(self, item) except KeyError: return '' You might add a test for None as well, and replace None with '' (which is what I always want in these sorts of situations). A more structured description can work even better, though. Something like: classes = {'home': 'highlighted'} html( html.a(href="home", class_=classes.get('home'))('Home'), html.br(), html.a(href="products", class_=classes.get('products'))('Produccts'), html.br(), html.a(href="about", class_=classes.get('about'))('About'), html.br(), ) In this example, any attribute with a value None will simply be excluded. (Perhaps there should also be a way to indicate an attribute with no value, like "checked" -- I've used None for that and a special object for exclude before, or that could be reversed) -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Thu Oct 30 14:09:29 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 14:09:33 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> References: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030140302.M98038@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > This would solve this particular problem: > > class EmptyStringDict(dict): > def __getitem__(self, item): > try: > return dict.__getitem__(self, item) > except KeyError: > return '' Neat trick! 
Here is an even more generic version:

class DefaultDict(dict):
    def __init__(self, init={}, default=""):
        self.default = default
        dict.__init__(self, init)
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            return self.default

Now I can do:

>>> "Hello %(title)s %(name)s, how are you?" % DefaultDict({'title' : 'Mr.', 'name' : 'Smith'})
'Hello Mr. Smith, how are you?'
>>>
>>> "Hello %(title)s %(name)s, how are you?" % DefaultDict({'name' : 'Smith'})
'Hello Smith, how are you?'
>>>

Grisha From amk at amk.ca Thu Oct 30 14:27:18 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 30 14:27:28 2003 Subject: [Web-SIG] HTML parsing: anyone use formatter? Message-ID: <20031030192718.GA13220@rogue.amk.ca> [Crossposted to python-dev, web-sig, and xml-sig. Followups to web-sig@python.org, please.] I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for all the missing elements. I've currently been adding just empty methods to the HTMLParser class, but the existing methods actually help render the HTML by calling methods on a Formatter object. For example, the definitions for the H1 element look like this:

def start_h1(self, attrs):
    self.formatter.end_paragraph(1)
    self.formatter.push_font(('h1', 0, 1, 0))

def end_h1(self):
    self.formatter.end_paragraph(1)
    self.formatter.pop_font()

Question: should I continue supporting this in new methods? This can only go so far; a tag such as <b> or <i> is easy for me to handle, but handling
a tag like <table> or <form> would require greatly expanding the Formatter class's repertoire. I suppose the more general question is, does anyone use Python's formatter module? Do we want to keep it around, or should htmllib be pushed toward doing just HTML parsing? formatter.py is a long way from being able to handle modern web pages and it would be a lot of work to build a decent renderer. --amk From barry at python.org Thu Oct 30 16:01:00 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 30 16:01:08 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030140302.M98038@onyx.ispol.com> References: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> <20031030140302.M98038@onyx.ispol.com> Message-ID: <1067547658.5295.165.camel@anthem> On Thu, 2003-10-30 at 14:09, Gregory (Grisha) Trubetskoy wrote: > Neat trick! Here is an even more generic version: http://mail.python.org/pipermail/python-dev/2003-October/039369.html :) -Barry From gward at python.net Thu Oct 30 21:51:17 2003 From: gward at python.net (Greg Ward) Date: Thu Oct 30 21:51:20 2003 Subject: [Web-SIG] Random thoughts Message-ID: <20031031025116.GA7401@cthulhu.gerg.ca> I'm just catching up on the archive for this list. Some random thoughts: * a new package, 'web', is definitely in order. "from web import cookies", "from web import http" just sounds right. (That contradicts Greg Stein's proposal in PEP 267, but I assume he's not strongly wedded to that.) * I'm all for stealing good ideas from other sources (eg. PHP, the Java servlet API), but I'm not keen on the exact semantics Simon has mentioned from PHP. In particular, I hope no one is seriously considering global dictionaries called COOKIES or GET. Clearly, the Right Way is:

request.get_cookie("session_id")
request.get_form_var("name")

(spelling and terminology yet to be decided; eg. I could live with it getcookie() and getformvar() ;-) An aside: in the query string ?name=Greg&colour=blue&age=31 what exactly are 'name', 'colour', and 'age'?
Are they form variables? query variables? parameters? fields? Is this specified anywhere? (In Quixote's HTTPRequest class, they're called "form variables" -- hence get_form_var() -- but I've never been terribly thrilled with that terminology. At the moment, I like "query variables".)

* on the fields-with-multiple-values issue: I'm with Steve Holden and David Fraser. (I.e., the programmer should know which query variables expect multiple values, and the request object should always return a list for those variables.) cgi.py is Dead Wrong here; the type of an object should be predictable from the code, not dependent on the HTTP client! (But I disagree with Steve on handling multiple values for a query variable that expects a single value: in that case, IMHO sloppy should be the default, and you should get the first value. I don't want to guard every get_form_var() call with an "except KeyError" to avoid broken/malicious clients crashing the script!) (I've mentally toyed with funky types like Barry suggested, but I think that sort of context-sensitive trickery is unPythonic. Just because you can do something doesn't mean you should.) Perhaps something like Quixote's form framework belongs in the standard library -- the Widget classes solve a lot of problems with handling HTML forms. There's some out-of-date documentation here: http://www.mems-exchange.org/software/quixote/doc/widgets.html

* the "PATHINFO" variable is not CGI-specific. Zope and Quixote are both utterly dependent on PATHINFO, and they're not tied to CGI. (There are strong connections, but you can run a Quixote app with mod_python, Medusa, or Twisted -- no CGI there!) Also, the Java servlet API has a getPathInfo() method, and Java servlets are most certainly not CGI scripts. "pathinfo" is just the part of the URL that the HTTP server doesn't look at.
;-)

* I oppose Simon Willison's practice of using the same variable in the "GET" and "POST" part of a request, but I will defend to the death his right to do so. (But not in Quixote, where a narrower definition of what is Right, Good, and Truthful prevails.)

Enough for now. I still have lots of archive to read. ;-( Greg -- Greg Ward http://www.gerg.ca/ And now for something completely different. From janssen at parc.com Thu Oct 30 22:46:21 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 30 22:46:51 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Thu, 30 Oct 2003 18:51:17 PST." <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com>

> * I oppose Simon Willison's practice of using the same variable
> in the "GET" and "POST" part of a request, but I will defend to the
> death his right to do so. (But not in Quixote, where a narrower
> definition of what is Right, Good, and Truthful prevails.)

I don't get it. Any particular request only has one method, not two: "GET" and "POST". Are you talking about for some reason special-casing these two methods in the Request class? I think it makes more sense to do things generically:

    request.path (e.g., '/foo/bar')
    request.method (e.g., "GET")
    request.part (e.g., "#bletch", perhaps without the #)
    request.headers
    request.parameters (either the query parms, or the multipart/form-data values)
    request.response() => returns a Response object tied to this request

    response.error(code, message)    Sends back an error
    response.reply(htmltext)         Sends back a message
    response.open(ContentType="text/html", code=200) => file object to write to
    fp.write(...)
    fp.close()                       Sends back the response
    response.redirect(URL)           Sends back redirect to the URL

Bill

From gstein at lyra.org Fri Oct 31 01:26:20 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 01:26:42 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca>; from gward@python.net on Thu, Oct 30, 2003 at 09:51:17PM -0500 References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031030222620.B1901@lyra.org> On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > I'm just catching up on the archive for this list. Some random > thoughts: > > * a new package, 'web', is definitely in order. > "from web import cookies", "from web import http" just sounds right. > (That contradicts Greg Stein's proposal in PEP 267, but I assume > he's not strongly wedded to that.) Correct. The name isn't the important part of the PEP. That said, "web" is a big misnomer for [package containing] an http client library, but that's a bikeshed of an entirely different color :-) I'm more interested in a way of constructing a connection to a server, where that connection has some various combination of features:

* SSL
* Basic/Digest/??? authentication
* WebDAV
* Proxy
* Proxy auth

The current model for the client side uses two, distinct classes to deal with the SSL feature. I have an entirely separate module for the WebDAV stuff. And authentication isn't even handled in the core http classes, but over in urllib(2). Same for proxy support. PEP 267 is about a refactoring to bring these features under one cover, and to move some features from urllib down into the basic connection classes to be used by any http client.
(and yah, urllib would still expose some concepts since ftp still needs some authn, but it could just defer to the "new" httplib authn facilities) Cheers, -g -- Greg Stein, http://www.lyra.org/ From ianb at colorstudy.com Fri Oct 31 01:28:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 01:28:32 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <607181AE-0B6B-11D8-88D0-000393C2D67E@colorstudy.com> On Oct 30, 2003, at 9:46 PM, Bill Janssen wrote:

>> * I oppose Simon Willison's practice of using the same variable
>> in the "GET" and "POST" part of a request, but I will defend to the
>> death his right to do so. (But not in Quixote, where a narrower
>> definition of what is Right, Good, and Truthful prevails.)
>
> I don't get it. Any particular request only has one method, not two:
> "GET" and "POST". Are you talking about for some reason
> special-casing these two methods in the Request class? I think it
> makes more sense to do things generically:
>
> request.path (e.g., '/foo/bar')
> request.method (e.g., "GET")
> request.part (e.g., "#bletch", perhaps without the #)

No real way to access this.

> request.headers
> request.parameters (either the query parms, or the multipart/form-data values)

I think fields is a better name -- common, and a bit shorter (since it's the most used part of the request)

> request.response() => returns a Response object tied to this request
> response.error(code, message)    Sends back an error

Message, like response.error(404, "Not Found"), or response.error(403, "Administrator permission is required to access this resource")

> response.reply(htmltext)    Sends back a message

or setBody perhaps -- reply implies that the text will be immediately (irrevocably?) sent. Maybe that's good, or maybe a separate commit/close is better.
> response.open(ContentType="text/html", code=200) => file object to
> write to

I'm not sure I understand the purpose of the keyword arguments.

> fp.write(...)
> fp.close()                       Sends back the response
> response.redirect(URL)           Sends back redirect to the URL

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From anthony at interlink.com.au Fri Oct 31 01:28:34 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Oct 31 01:31:34 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031030222620.B1901@lyra.org> Message-ID: <200310310628.h9V6SYdw023795@localhost.localdomain> >>> Greg Stein wrote > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > I'm just catching up on the archive for this list. Some random > > thoughts: > > > > * a new package, 'web', is definitely in order. > > "from web import cookies", "from web import http" just sounds right. > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > he's not strongly wedded to that.) > > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) Wouldn't it be better to have something more like:

    web/
        client.py
        cgi.py
        server.py

.. and the like? web.http seems so very redundant, web.client seems more meaningful. -- Anthony Baxter It's never too late to have a happy childhood.
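[To make the request/response interface discussed above concrete, here is one possible shape for it in Python. This is purely a sketch: every class and method name here is hypothetical and belongs to no real library mentioned in the thread; the first-value-wins field lookup follows Greg Ward's earlier suggestion.]

```python
# Hypothetical sketch of a generic request/response API.
# None of these names come from an actual library.

class Response:
    def __init__(self):
        self.code = 200
        self.headers = {"Content-Type": "text/html"}
        self.body = ""

    def error(self, code, message):
        # Send back an error status and message.
        self.code = code
        self.body = message

    def redirect(self, url):
        # Send back a redirect to the given URL.
        self.code = 302
        self.headers["Location"] = url


class Request:
    def __init__(self, path, method="GET", fields=None, headers=None):
        self.path = path          # e.g. '/foo/bar'
        self.method = method      # e.g. 'GET'
        self.headers = headers or {}
        # Every field value is stored as a list, so the type is
        # predictable regardless of what the client sent.
        self.fields = fields or {}

    def get_field(self, name, default=None):
        # Sloppy by default: return the first value rather than
        # raising KeyError on broken or malicious input.
        values = self.fields.get(name)
        if values:
            return values[0]
        return default

    def response(self):
        # Return a Response object for answering this request.
        return Response()
```

[With that sketch, a field that arrives twice -- fields={'age': ['31', '32']} -- still quietly yields its first value from get_field('age'), the sloppy-by-default behaviour argued for earlier in the thread.]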
From gstein at lyra.org Fri Oct 31 01:33:29 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 01:33:49 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Thu, Oct 30, 2003 at 10:44:52AM -0600 References: <20031030112919.J97494@onyx.ispol.com> <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030223329.C1901@lyra.org> On Thu, Oct 30, 2003 at 10:44:52AM -0600, Ian Bicking wrote: > On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy > wrote: > > HTMLgen has a DocumentTemplate thing which is a bare bones templating > > system allowing for substitution in a text file. I think something > > primitive of this sort and perhaps implemented based on this: > > > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 > > > > (which can probably be even further optimized) > > > > would be nice to have in stdlib. > > A templating system in its most naive form is just a kind of string > substitution. If that's the kind of thing we're looking for, then > perhaps -- but it has to be usefully better than %. (Though % would be

Right. Simple interpolation is rarely enough. The features that I found to be useful in a templating system:

* interpolation
* conditionals
* iteration
* structured objects (i.e. something like: foo.bar)
* including sub-templates

I've also found that *restricting* the functionality to just this limited set helps to provide clarity and avoid complex abuses of templates. I look at the task simply as "rendering data" and prefer a simple syntax and functionality to match that. Cheers, -g p.s.
yah yah, this is an implicit pimping of my ezt module :-) http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py -- Greg Stein, http://www.lyra.org/ From amk at amk.ca Fri Oct 31 06:41:09 2003 From: amk at amk.ca (amk@amk.ca) Date: Fri Oct 31 06:41:32 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031031114109.GA16773@rogue.amk.ca> On Thu, Oct 30, 2003 at 07:46:21PM -0800, Bill Janssen wrote: > I don't get it. Any particular request only has one method, not two: > "GET" and "POST". Are you talking about for some reason > special-casing these two methods in the Request class? I think it Simon wants to differentiate between where a variable comes from; http://example/?password=foo is treated differently than when the 'password' variable is specified in the body of a POST. --amk From neel at mediapulse.com Fri Oct 31 09:43:58 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 31 09:44:06 2003 Subject: [Web-SIG] htmlgen Message-ID: > -----Original Message----- > From: Greg Stein [mailto:gstein@lyra.org] > Sent: Friday, October 31, 2003 1:33 AM > To: web-sig@python.org > Subject: Re: [Web-SIG] htmlgen > > p.s. yah yah, this is an implicit pimping of my ezt module :-) > http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py > I can +1 ezt having used it before; it's the exact type of lightweight template system that should be part of stdlib. It covers the basics and you can extend it from there if you need more. It's also worth noting that it's not in any way tied to HTML (I use it for email templates mostly). I'd recommend to all here to take a few moments and play with it, then give feedback on any changes you think should be made. No need to solve this from scratch if we don't have to.
Mike From amk at amk.ca Fri Oct 31 10:09:22 2003 From: amk at amk.ca (amk@amk.ca) Date: Fri Oct 31 10:09:45 2003 Subject: [Web-SIG] htmlgen In-Reply-To: References: Message-ID: <20031031150922.GA17539@rogue.amk.ca> On Fri, Oct 31, 2003 at 09:43:58AM -0500, Michael C. Neel wrote: > I'd recommend to all here to take a few moments and play with it, then > give feedback on any changes you think should be made. No need to solve > this from scratch if we don't have to. ... well, except for the other 12 templating solutions that already exist. ezt looks very cute, but it's clear that no one has the same requirements for templating. Let's just walk away from trying to choose one. --amk From barry at python.org Fri Oct 31 10:16:02 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 31 10:16:08 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca> References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <1067613362.5173.8.camel@anthem> On Thu, 2003-10-30 at 21:51, Greg Ward wrote: > (But I disagree with Steve on handling multiple values for a query > variable that expects a single value: in that case, IMHO sloppy > should be the default, and you should get the first value. I don't > want to guard every get_form_var() call with a "except KeyError" to > avoid broken/malicious clients crashing the script!) Agreed! I'd much rather test for None-ness or provide my own default. > (I've mentally toyed with funky types like Barry suggested, but I > think that sort of context-sensitive trickery is unPythonic. Just > because you can do something doesn't mean you should.) Greg's been reading my Oblique Strategies again. 
:) -Barry From davidf at sjsoft.com Fri Oct 31 10:42:52 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 31 10:43:02 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030223329.C1901@lyra.org> References: <20031030112919.J97494@onyx.ispol.com> <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030223329.C1901@lyra.org> Message-ID: <3FA282FC.8060406@sjsoft.com> Greg Stein wrote:

>On Thu, Oct 30, 2003 at 10:44:52AM -0600, Ian Bicking wrote:
>>On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy
>>wrote:
>>>HTMLgen has a DocumentTemplate thing which is a bare bones templating
>>>system allowing for substitution in a text file. I think something
>>>primitive of this sort and perhaps implemented based on this:
>>>
>>>http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330
>>>
>>>(which can probably be even further optimized)
>>>
>>>would be nice to have in stdlib.
>>
>>A templating system in its most naive form is just a kind of string
>>substitution. If that's the kind of thing we're looking for, then
>>perhaps -- but it has to be usefully better than %. (Though % would be
>
>Right. Simple interpolation is rarely enough. The features that I found to
>be useful in a templating system:
>
>* interpolation
>* conditionals
>* iteration
>* structured objects (i.e. something like: foo.bar)
>* including sub-templates
>
>I've also found that *restricting* the functionality to just this limited
>set helps to provide clarity and avoid complex abuses of templates. I look
>at the task simply as "rendering data" and prefer a simple syntax and
>functionality to match that.
>
>Cheers,
>-g
>
>p.s.
yah yah, this is an implicit pimping of my ezt module :-)
> http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py

What I've found really helpful in my jtoolkit framework is to allow anything to go inside a tag object (in between the start and end tags), including a string, another tag object, or a list of any of the above. The toolkit then expands any of the required items.

    pagelinks = []
    for pagelinknum in range(1, len(pages)+1):
        pagelinktext = "Page %d" % pagelinknum
        if pagelinknum == currentpagenum:
            pagelinktext += " (current)"
        pagelinklink = '?page=%d' % pagelinknum
        pagelinks.append(widgets.Link(pagelinklink, pagelinktext))
        pagelinks.append(' ')

e.g. widgets.Page(title, contents=[widgets.Paragraph(pagelinks), restofcontents])

David

From grisha at modpython.org Fri Oct 31 10:54:46 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 31 10:54:50 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca> References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031031094609.T12375@onyx.ispol.com> On Thu, 30 Oct 2003, Greg Ward wrote:

> An aside: in the query string
>
> ?name=Greg&colour=blue&age=31
>
> what exactly are 'name', 'colour', and 'age'?

Short answer: "field names"

Long answer: I cannot claim to be an absolute expert on the matter, but here is my best understanding: In ?name=Greg&colour=blue&age=31, "name=Greg&colour=blue&age=31" is called "searchpart", "query information" or simply "query" from RFC 1808 sec 2.1 "URL Syntactic Components":

    <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
    - [snip] -
    "?" query ::= query information, as per Section 3.3 of RFC 1738 [2].

Then if we look at RFC 1738, it describes an HTTP URL specifically: An HTTP URL takes the form:

    http://<host>:<port>/<path>?<searchpart>

Now, RFC 1866 (HTML) introduces the concept of a "form". Forms have a METHOD attribute which lets you specify how the form is to be submitted. When method is 'GET', the form will be submitted as "query information", described above.
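[The GET-submission behaviour described here can be checked with the Python standard library. The snippet below uses the modern urllib.parse spelling; in 2003 the same helpers were spread across the urllib and cgi modules. Everything shown is standard-library behaviour, not an API proposed in this thread.]

```python
from urllib.parse import urlencode, parse_qs

# Field names and values are joined as name=value pairs separated
# by '&', with characters unsafe in a URL encoded on the way out.
query = urlencode({"name": "Greg Ward", "colour": "blue", "age": "31"})

# Decoding recovers the field names and values; each value comes
# back as a list, since a field name may legally repeat.
fields = parse_qs("name=Greg%20Ward&colour=blue&age=31")
print(fields["name"])  # ['Greg Ward']
```

[Note that urlencode prefers '+' for spaces while '%20' is equally valid on the wire; parse_qs accepts both forms.]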
Since there are limits to what is allowed in a URL, the data has to be "url encoded", as described in 8.2.1 of RFC 1866:

    2. The fields are listed in the order they appear in the document
       with the name separated from the value by `=' and the pairs
       separated from each other by `&'.

[Note BTW that the order is specified] Therefore, 'name', 'colour', and 'age' are "field names", and 'Greg', 'blue', '31' are "field values". A more clever example would be:

    ?name=Greg%20Ward&colour=blue&age=31

Here, "Greg Ward" is a form field value, while "Greg%20Ward" is a random chunk of a URL query with no particular meaning, just as "0Ward&col". Here is the interesting part (RFC 1866 8.2.3):

    To process a form whose action URL is an HTTP URL and whose method
    is `POST', the user agent conducts an HTTP POST transaction using
    the action URI, and a message body of type
    `application/x-www-form-urlencoded' format as above.

Note that it doesn't say that the action URI cannot contain a query, so based on this, I can have a form like this: References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031031105512.G12375@onyx.ispol.com> On Thu, 30 Oct 2003, Greg Ward wrote: > * the "PATHINFO" variable is not CGI-specific. "the PATHINFO variable" only has meaning in a particular context, here is the CGI definition: http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt Section 6.1.6:

    The PATH_INFO metavariable specifies a path to be interpreted by the
    CGI script. It identifies the resource or sub-resource to be returned
    by the CGI script, and it is derived from the portion of the URI path
    following the script name but preceding any query data.

> Also, the Java servlet API has a getPathInfo() method

Yes, it does: http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletRequest.html#getPathInfo() "any extra path information associated with the URL the client sent when it made this request.
The extra path information follows the servlet path but precedes the query string and will start with a "/" character". So this definition relies on the notion of a "servlet", which is OK since this is part of J2EE. Then they go on to say that it is "Same as the value of the CGI variable PATH_INFO", but it really isn't, "similar" would be a better word. I think I can live with a pathinfo that is "implementation specific", or if we were to define a "Python Enterprise Architecture" with our own definition of a servlet (or whatever), but for the Python standard library to try to define it *outside of any context* would be a mistake I think. Grisha From jjl at pobox.com Fri Oct 31 11:34:02 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 11:34:10 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <20031030222620.B1901@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: On Thu, 30 Oct 2003, Greg Stein wrote: > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > I'm just catching up on the archive for this list. Some random > > thoughts: > > > > * a new package, 'web', is definitely in order. > > "from web import cookies", "from web import http" just sounds right. > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > he's not strongly wedded to that.) > > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) He was talking about the server side! > I'm more interested in a way of constructing a connection to a server, > where that connection has some various combination of features: > > * SSL That's already down at the httplib level (and the socket level, of course). > * Basic/Digest/??? authentication That's naturally done at the urllib / urllib2 level, given the way it works. > * WebDAV I plead ignorance. 
> * Proxy > * Proxy auth Somebody has submitted a patch (515003) to shift this to a lower level than urllib2. I have no opinion as yet. > The current model for the client side uses two, distinct classes to deal > with the SSL feature. Sorry, which classes are they? > I have an entirely separate module for the WebDAV > stuff. How should it be integrated (if at all), in your opinion (assuming you want it in the standard library)? > And authentication isn't even handled in the core http classes, but > over in urllib(2). Same for proxy support. See above. > PEP 267 is about a refactoring to bring these features under one cover, Er, "Optimized Access to Module Namespaces"? Which PEP *did* you mean? I haven't seen it. John From jjl at pobox.com Fri Oct 31 11:35:24 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 11:36:26 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200310310628.h9V6SYdw023795@localhost.localdomain> References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: On Fri, 31 Oct 2003, Anthony Baxter wrote: [...] > Wouldn't it be better to have something more like: > > web/ > client.py > cgi.py > server.py > > .. and the like? web.http seems so very redundant, web.client seems more > meaningful. Nobody has yet explained to me why we need a new module for client-side code. John From ianb at colorstudy.com Fri Oct 31 11:45:04 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 11:45:57 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> On Oct 31, 2003, at 10:34 AM, John J Lee wrote: >> * WebDAV > > I plead ignorance. I don't think urllib2 and WebDAV will work very well together, though maybe... in the end, a WebDAV interface has to be a lot more complex than a URL-fetching interface. 
So even if WebDAV was built on urllib2, it would end up looking a lot different in the end. Though thinking about it... for the most part a WebDAV client could *use* urllib2. The most important things are just using different methods (PROPFIND, PUT, etc), and setting the body of the request -- these are probably already easy to do with urllib2. Dealing with multiple error responses, and some of the other error responses that WebDAV defines, may be more challenging for urllib2 (or not, I don't know) -- you can do compound operations with WebDAV, and so there may be an error message associated with a specific subrequest. There's some sort of "multiple response" response code, but the actual responses are in the body of the response. urllib2 could just do nothing and pass all the information on to the WebDAV client and let it reinterpret the results. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Fri Oct 31 11:46:22 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 11:46:31 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: On Oct 31, 2003, at 10:35 AM, John J Lee wrote: > On Fri, 31 Oct 2003, Anthony Baxter wrote: > [...] >> Wouldn't it be better to have something more like: >> >> web/ >> client.py >> cgi.py >> server.py >> >> .. and the like? web.http seems so very redundant, web.client seems >> more >> meaningful. > > Nobody has yet explained to me why we need a new module for client-side > code. Nobody likes the name urllib2? That "2" is pretty icky... That's probably not a good enough justification, though.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From cs1spw at bath.ac.uk Fri Oct 31 11:49:16 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 31 11:49:21 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031031150922.GA17539@rogue.amk.ca> References: <20031031150922.GA17539@rogue.amk.ca> Message-ID: <3FA2928C.4080004@bath.ac.uk> amk@amk.ca wrote: >>I'd recommend to all here to take a few moments and play with it, then >>give feedback on any changes you think should be made. No need to solve >>this from scratch if we don't have to. > > ... well, except for the other 12 templating solutions that already exist. > > ezt looks very cute, but it's clear that no one has the same requirements > for templating. Let's just walk away from trying to choose one. +1. Everyone's templating style is different. At work, we just spent a couple of days implementing our own having looked at over a dozen existing systems because none of them quite matched our requirements. Templating is the kind of problem to which there is no straightforward solution, and I see no benefit in including it in the standard library when so many template systems are already available that cover so many different styles. -- Simon Willison Web development weblog: http://simon.incutio.com/ From jjl at pobox.com Fri Oct 31 12:52:54 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 12:53:00 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: On Fri, 31 Oct 2003, Ian Bicking wrote: > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > >> * WebDAV > > > > I plead ignorance. > [...info about WebDAV from Ian...]
Sounds (I'm saying this with virtually no knowledge of the protocol, of course) like it would be best built on top of urllib2 rather than integrated with it. Do you agree, Greg S.? John From jjl at pobox.com Fri Oct 31 12:55:55 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 12:56:06 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: On Fri, 31 Oct 2003, John J Lee wrote: [...] > best built on top of urllib2 rather than integrated with it. [...] Or entirely separate from it, of course... John From neel at mediapulse.com Fri Oct 31 13:22:03 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 31 13:22:07 2003 Subject: [Web-SIG] htmlgen Message-ID: > ... well, except for the other 12 templating solutions that > already exist. > > ezt looks very cute, but it's clear that no one has the same > requirements > for templating. Let's just walk away from trying to choose one. > Now that's a scary thought. A problem that is common to several domains is not addressed because there may be more than one way to address it? Yes there are several template options out there, but how many can be considered for inclusion? Albatross (one I personally find to be extremely useful) doesn't have much if any scope outside of HTML templates, so it wouldn't be a good candidate. PSP (Python server pages) also has a limited scope. Going through the list to see which systems are good candidates, i.e. those that really just provide a good alternative to %()s, should produce a manageably sized list to consider. Also inclusion of a template system doesn't preclude the use of any other template systems, so I don't see the harm. Python is billed as "batteries included" so we should be making choosing Python for the web more than just a syntax preference.
Mike From t.vandervossen at fngtps.com Fri Oct 31 15:48:58 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Fri Oct 31 16:32:32 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA2928C.4080004@bath.ac.uk> References: <20031031150922.GA17539@rogue.amk.ca> <3FA2928C.4080004@bath.ac.uk> Message-ID: <3FA2CABA.6010205@fngtps.com> Simon Willison wrote: > amk@amk.ca wrote: >>> I'd recommend to all here to take a few moments and play with it, then >>> give feedback on any changes you think should be made. No need to solve >>> this from scratch if we don't have to. >> >> ... well, except for the other 12 templating solutions that already >> exist. >> >> ezt looks very cute, but it's clear that no one has the same requirements >> for templating. Let's just walk away from trying to choose one. > > +1. Everyone's templating style is different. At work, we just spent a > couple of days implementing our own having looked at over a dozen > existing systems because none of them quite matched our requirements. > Templating is the kind of problem to which there is nostraight forward > solution, and I see no benefit of including it in the standard library > when so many template systems are already available that cover so many > different styles. +1. My company came to the same conclusion and also developed our own matching our requirements. Let's please drop the issue of templating, we will never find a solution fitting everyone's needs. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From janssen at parc.com Fri Oct 31 18:42:16 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 31 18:42:44 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Fri, 31 Oct 2003 03:41:09 PST." 
<20031031114109.GA16773@rogue.amk.ca> Message-ID: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> > Simon wants to differentiate between where a variable comes from; > http://example/?password=foo is treated differently than when > the 'password' variable is specified in the body of a POST. > > --amk That makes more sense, but I don't see the connection to GET and POST. Thanks. Bill From ianb at colorstudy.com Fri Oct 31 18:50:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 18:50:42 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> References: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> Message-ID: <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> On Oct 31, 2003, at 5:42 PM, Bill Janssen wrote: >> Simon wants to differentiate between where a variable comes from; >> http://example/?password=foo is treated differently than when >> the 'password' variable is specified in the body of a POST. >> >> --amk > > That makes more sense, but I don't see the connection to GET and POST. > Thanks. A more accurate description would be "URL parameters" or "query parameters" instead of GET. Though POST variables really are POST variables (request body parameters, maybe, but that's kind of confusing). And if you have POST, it's a natural tendency to consider the "opposite" of POST as GET and call them GET variables. 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Fri Oct 31 19:18:02 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:18:29 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: ; from jjl@pobox.com on Fri, Oct 31, 2003 at 04:34:02PM +0000 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <20031031161802.C3462@lyra.org> On Fri, Oct 31, 2003 at 04:34:02PM +0000, John J Lee wrote: > On Thu, 30 Oct 2003, Greg Stein wrote: > > > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > > I'm just catching up on the archive for this list. Some random > > > thoughts: > > > > > > * a new package, 'web', is definitely in order. > > > "from web import cookies", "from web import http" just sounds right. > > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > > he's not strongly wedded to that.) NOTE: typo here. Greg Ward meant to say "PEP 268" (http://www.python.org/peps/pep-0268.html) > > Correct. The name isn't the important part of the PEP. That said, "web" is > > a big misnomer for [package containing] an http client library, but that's > > a bikeshed of an entirely different color :-) > > He was talking about the server side! No, Greg Ward was talking about an http client. Otherwise, he would not have mentioned PEP 268. > > I'm more interested in a way of constructing a connection to a server, > > where that connection has some various combination of features: > > > > * SSL > > That's already down at the httplib level (and the socket level, of > course). I know that (given that I wrote the current httplib :-). However, I maintain that the implementation uses an improper design. > > * Basic/Digest/??? authentication > > That's naturally done at the urllib / urllib2 level, given the way it > works. There is nothing "natural" about it. 
That is where it resides, but authentication is part of the HTTP specification and should be able to be used by anything attempting to interact at the HTTP level. HTTP is far more than "fetch the contents of this URL." My list was specifically intended to say: each of these items belongs in the core HTTP (client) service layer. Not urllib. > > * WebDAV > > I plead ignorance. RFC 2518 and RFC 3253. Essentially, WebDAV provides a way to write to your web server. It also provides for versioning support. And a lot of other stuff. WebDAV provides a lot of interesting features, layered on top of HTTP. Thus, any HTTP layer should also be able to provide DAV facilities. > > * Proxy > > * Proxy auth > > Somebody has submitted a patch (515003) to shift this to a lower level > than urllib2. I have no opinion as yet. Oh, geez. Again with the improper design model. Following in this lead, we'll end up with a combinatoric explosion of every feature combination ending up with its own class. /me goes to comment on that patch > > The current model for the client side uses two, distinct classes to deal > > with the SSL feature. > > Sorry, which classes are they? HTTPConnection and HTTPSConnection. (or HTTP and HTTPS for the backwards compat stuff). See above about combinatorics using this design model. > > I have an entirely separate module for the WebDAV stuff. > > How should it be integrated (if at all), in your opinion (assuming you > want it in the standard library)? See PEP 268. > > And authentication isn't even handled in the core http classes, but > > over in urllib(2). Same for proxy support. > > See above. See PEP 268 :-) > > PEP 267 is about a refactoring to bring these features under one cover, > > Er, "Optimized Access to Module Namespaces"? Which PEP *did* you mean? > I haven't seen it. Sorry, I just blindly repeated the number from Greg Ward's post. It really should be 268. 
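Greg Stein's combinatorics complaint can be made concrete with a toy sketch: instead of one subclass per feature combination (HTTPConnection, HTTPSConnection, proxied variants, authenticated variants, ...), a single connection class composes features as options. All names below are invented for illustration and are not a proposal for the actual API:

```python
class Connection:
    """One connection class; features are options, not subclasses.

    With n independent features, the subclass-per-combination design
    needs up to 2**n classes; composing them as options needs one.
    """

    def __init__(self, host, use_ssl=False, auth=None, proxy=None):
        self.host = host
        self.use_ssl = use_ssl
        self.auth = auth    # e.g. ("basic", "user", "secret"); hypothetical shape
        self.proxy = proxy  # e.g. ("proxyhost", 3128); hypothetical shape

    def describe(self):
        """List which optional features this connection composes."""
        features = []
        if self.use_ssl:
            features.append("ssl")
        if self.auth:
            features.append("auth")
        if self.proxy:
            features.append("proxy")
        return features
```

Any combination — SSL with digest auth through a proxy, say — is then a matter of constructor arguments rather than a new class.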
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 31 19:28:53 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:29:17 2003 Subject: [Web-SIG] Re: client-side In-Reply-To: ; from jjl@pobox.com on Fri, Oct 31, 2003 at 05:52:54PM +0000 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: <20031031162853.D3462@lyra.org> On Fri, Oct 31, 2003 at 05:52:54PM +0000, John J Lee wrote: > On Fri, 31 Oct 2003, Ian Bicking wrote: > > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > > >> * WebDAV > > > > > > I plead ignorance. > > > [...info about WebDAV from Ian...] > > Sounds (I'm saying this with virtually no knowledge of the protocol, of > course) like it would be best built on top of urllib2 rather than > integrated with it. Do you agree, Greg S.? WebDAV belongs on top of httplib, not urllib. And... hey, what do you know! ... that is exactly how I implemented davlib.py many years ago. In fact, creating davlib.py was the impetus for rebuilding httplib into a connection-based client model rather than the old request-based model. urllib is about fetching content. That's about it. WebDAV was designed specifically for writing-to/managing your server remotely. Not to mention that "V" in its name, for versioning. Cheers, -g p.s. 
http://www.lyra.org/greg/python/ for info on davlib.py -- Greg Stein, http://www.lyra.org/ From grisha at modpython.org Fri Oct 31 19:30:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 31 19:32:07 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> References: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> Message-ID: <20031031192847.N16489@onyx.ispol.com> On Fri, 31 Oct 2003, Ian Bicking wrote: > On Oct 31, 2003, at 5:42 PM, Bill Janssen wrote: > >> Simon wants to differentiate between where a variable comes from; > >> http://example/?password=foo is treated differently than when > >> the 'password' variable is specified in the body of a POST. > >> > >> --amk > > > > That makes more sense, but I don't see the connection to GET and POST. > > Thanks. > > A more accurate description would be "URL parameters" or "query > parameters" instead of GET. Though POST variables really are POST > variables (request body parameters, maybe, but that's kind of > confusing). And if you have POST, it's a natural tendency to consider > the "opposite" of POST as GET and call them GET variables. And what's even more fun is when a GET variable is submitted via POST :-) Grisha From gstein at lyra.org Fri Oct 31 19:30:19 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:32:09 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Fri, Oct 31, 2003 at 10:45:04AM -0600 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: <20031031163019.E3462@lyra.org> Simple answer: to see what a DAV client would look like, see davlib.py. http://www.lyra.org/greg/python/ It really *wouldn't* use urllib, which is all about fetching. 
On Fri, Oct 31, 2003 at 10:45:04AM -0600, Ian Bicking wrote: > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > >> * WebDAV > > > > I plead ignorance. > > I don't think urllib2 and WebDAV will work very well together, though > maybe... in the end, a WebDAV interface has to be a lot more complex > than a URL-fetching interface. So even if WebDAV was built on urllib2, > it would end up looking a lot different in the end. > > Though thinking about it... for the most part a WebDAV client could > *use* urllib2. The most important things are just using different > methods (PROPFIND, PUT, etc), and setting the body of the request -- > these are probably already easy to do with urllib2. Dealing with > multiple error responses, and some of the other error responses that > WebDAV defines, may be more challenging for urllib2 (or not, I don't know) > -- you can do compound operations with WebDAV, and so there may be an > error message associated with a specific subrequest. There's some sort > of "multiple response" response code, but the actual responses are in > the body of the response. urllib2 could just do nothing and pass all > the information on to the WebDAV client and let it reinterpret the > results. 
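To make the layering concrete: a DAV helper sits directly on the HTTP request layer and issues WebDAV methods such as PROPFIND. The sketch below is not davlib.py's actual API — it only formats the raw request text such a call would put on the wire, with header choices that are this example's assumptions:

```python
# A minimal property-listing request body (allprop = "all properties").
PROPFIND_BODY = (
    '<?xml version="1.0"?>'
    '<propfind xmlns="DAV:"><allprop/></propfind>'
)

def build_propfind(host, path, depth=0):
    """Return the raw request text a PROPFIND on `path` would send.

    Depth: 0 = the resource itself, 1 = it plus immediate children.
    Illustrative only; a real client would send this over a
    connection object rather than building a string.
    """
    lines = [
        "PROPFIND %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Depth: %d" % depth,
        "Content-Type: text/xml",
        "Content-Length: %d" % len(PROPFIND_BODY),
        "",
        PROPFIND_BODY,
    ]
    return "\r\n".join(lines)
```

The point of the sketch is that nothing here is "fetch the contents of this URL": the method, the Depth header, and the XML body all live at the HTTP level, which is why a DAV library naturally sits on httplib rather than urllib.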
> > -- > Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org -- Greg Stein, http://www.lyra.org/ From gward at python.net Fri Oct 31 21:23:05 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:23:08 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> References: <20031024132028.C15765@lyra.org> <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <20031101022305.GA5781@cthulhu.gerg.ca> On 29 October 2003, Ian Bicking said: > The difficulty of writing, say, request.response.write(something) vs. > handler.write(something) doesn't seem like a big deal to me. FWIW, this is how Quixote works. We started out with completely separate HTTPRequest and HTTPResponse objects (borrowed from Zope, and drastically stripped down). Then somewhere along the line, someone (Neil S. I think) noted, like Greg S., that you can't have a response without a request, and vice-versa. So now the HTTPResponse object is accessible as request.response. It's convenient and simple, and I agree that the request and response are indeed distinct concepts. But Greg S.'s "handler" idea has an appeal too. One thing that bugs me about Quixote's request.response is that the request is "special" because it's what's passed around, and the response is subordinate to it. That's wrong; although the request comes first chronologically, the two are equally important in a typical web app. So right now I think I'm 51% in favour of a single object. But I'm not sure if "handler" is the right name, though: in English, I would call it an "HTTP request/response cycle", but that's a bit of a mouthful for a classname. (Except in Java, but LetsNotGoThere.) 
Maybe HTTPTransaction -- tack on the "HTTP" and it's pretty clear we're not talking about databases. Greg -- Greg Ward http://www.gerg.ca/ I hope something GOOD came in the mail today so I have a REASON to live!! From gward at python.net Fri Oct 31 21:27:18 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:27:20 2003 Subject: [Web-SIG] More prior art, less experimentation In-Reply-To: References: Message-ID: <20031101022718.GB5781@cthulhu.gerg.ca> On 24 October 2003, Ian Bicking said: > We *do* have the opportunity to create something that can unify the > Python web experience and provide the basis for more adoption of Python > for web programming. To do that we will have to repeat the work done > many times before. We should aspire to quality, but I think we need to > hold ourselves back from aesthetic experimentation, and respect > convention above our own preferences. We can still indulge our own > fancies outside of the standard library, and building on the standard > library -- nothing we do should preclude your individual preferences > toward web programming, but it should not preclude other people's > preference either. But most of all it should provide the foundation > upon which the mature, *existing* frameworks can build. +1000. Hence my statement about disagreeing with the practice of overlapping GET and POST variables, but supporting that practice *in the standard library*. And, simultaneously, *not* supporting that practice in Quixote, where a slightly different aesthetic prevails. Whatever we come up with here must be agnostic with respect to many choices, eg. how to map URLs to code (or data) or how to generate HTML (or XML, or whatever) pages from code. (Those two decisions are, IMHO, at the heart of most web frameworks, and the most prone to religious discussions -- ie. they have no place in the stdlib.) Greg -- Greg Ward http://www.gerg.ca/ Save energy: be apathetic. 
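The single-object idea discussed above — one transaction holding both halves of the cycle, with neither subordinate to the other — might look something like this toy sketch (all names invented; this is neither Quixote's API nor anyone's actual proposal):

```python
class Request:
    """The incoming half of one HTTP exchange."""
    def __init__(self, method, path):
        self.method = method
        self.path = path

class Response:
    """The outgoing half: status plus a buffered body."""
    def __init__(self):
        self.status = 200
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def body(self):
        return "".join(self.chunks)

class HTTPTransaction:
    """Pairs the two halves of one request/response cycle, so code
    passes around the transaction rather than privileging the request."""
    def __init__(self, method, path):
        self.request = Request(method, path)
        self.response = Response()
```

Handlers would then take the transaction, reaching for `t.request` or `t.response` as needed, instead of the request carrying the response as an attribute.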
From gward at python.net Fri Oct 31 21:48:13 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:48:19 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031030222620.B1901@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <20031101024813.GA9101@cthulhu.gerg.ca> On 30 October 2003, Greg Stein said: > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) Really? I know "world-wide web" (capitalized or not) is a ridiculously over-used, over-broad term, but what's the alternative? If an HTTP client library isn't about "the web", then what the heck is it about? (BTW, whoever said that "web.client" and "web.server" are better names than "web.http" is right. I think. So far I've agreed with every idea I've seen on this sig, including the mutually contradicting ones. ;-) Greg -- Greg Ward http://www.gerg.ca/ Never put off till tomorrow what you can put off till the day after tomorrow. From gward at python.net Fri Oct 31 21:54:58 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:55:01 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031101025458.GA9131@cthulhu.gerg.ca> [me] > * I oppose Simon Willison's practice of using the same variable > in the "GET" and "POST" part of a request, but I will defend to the > death his right to do so. (But not in Quixote, where a narrower > definition of what is Right, Good, and Truthful prevails.) [Bill Janssen] > I don't get it. Any particular request only has one method, not two: > "GET" and "POST". Are you talking about for some reason > special-casing these two methods in the Request class? 
I think it > makes more sense to do things generically: Sorry, lame/fuzzy terminology on my part. AMK cleared it up nicely. Greg -- Greg Ward http://www.gerg.ca/ If you can read this, thank a programmer. From richardjones at optushome.com.au Sat Oct 25 00:08:13 2003 From: richardjones at optushome.com.au (Richard Jones) Date: Mon Nov 3 14:52:29 2003 Subject: [Web-SIG] Client-side support - webunit is back :) Message-ID: <200310251408.13911.richardjones@optushome.com.au> [sorry, I'm not subscribed to this list - I simply don't have the spare cycles] I noticed some archive messages saying webunit code was off the air. I've been migrating my website, and the code's back now. See webunit's PyPI page for info: http://www.python.org/pypi?:action=display&name=webunit&version=1.3.3 and the code is at: http://mechanicalcat.net/tech/webunit/ Richard ps. from the discussion, it sounds like my code does pretty much everything that has been asked of client-side code. It's not pretty but is used in Real Life. From aahz at pythoncraft.com Tue Oct 28 13:10:10 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 14:52:34 2003 Subject: [Python-Dev] Re: [Web-SIG] Threading and client-side support In-Reply-To: References: <20031027150709.GA29045@rogue.amk.ca> <20031028124646.GB1095@rogue.amk.ca> Message-ID: <20031028181009.GA20129@panix.com> On Tue, Oct 28, 2003, John J Lee wrote: > On Tue, 28 Oct 2003 amk@amk.ca wrote: >> On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: >>> >>> Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? >> >> No idea; reading the code would be needed to figure that out. 
> > That might not be helpful if the person reading it (me) has zero > threading experience ;-) > > I certainly plan to gain that experience, but surely *somebody* > already knows whether they're thread-safe? I presume they are, > broadly, since a couple of violations of thread safety are commented > in urllib2 and urllib. Right? Generally speaking, any code that does not rely on global objects is thread-safe in Python. For more information, let's take this to python-list. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From thijs at vandervossen.net Thu Oct 30 02:55:24 2003 From: thijs at vandervossen.net (Thijs van der Vossen) Date: Mon Nov 3 14:52:39 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> References: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <200310300855.24441.thijs@vandervossen.net> On Wednesday 29 October 2003 17:17, Ian Bicking wrote: > On Wednesday, October 29, 2003, at 11:12 AM, Barry Warsaw wrote: > > Dumb-ass suggestion of the day: what if the field values were > > represented by a dict subclass, and we had several different > > subclasses, > > each of which specified the exact behavior for __getitem__(). E.g. > > David could have his "_getitem__ is getfirst" behavior, Steve could > > have > > his verified-multiples behavior, and I could have my "always return a > > list" behavior. We'd then be reduced to choosing a default and a few > > interfaces and everyone would be happy . > > That would make me unhappy... next thing you know, you'll be > introducing a magic quoting dict subclass... Aargh! Maybe it's time to move to Ruby for web development without magic? 
;-) Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From fincher.8 at osu.edu Thu Oct 30 16:03:15 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Mon Nov 3 14:52:46 2003 Subject: [Web-SIG] Re: [Python-Dev] HTML parsing: anyone use formatter? In-Reply-To: <20031030192718.GA13220@rogue.amk.ca> References: <20031030192718.GA13220@rogue.amk.ca> Message-ID: <200310301603.15437.fincher.8@osu.edu> On Thursday 30 October 2003 02:27 pm, amk@amk.ca wrote: > I suppose the more general question is, does anyone use Python's formatter > module? Do we want to keep it around, or should htmllib be pushed toward > doing just HTML parsing? formatter.py is a long way from being able to > handle modern web pages and it would be a lot of work to build a decent > renderer. I've never used it myself, though I'll admit that some software I've used (for searching the IMDB) does use it. Jeremy
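Aahz's rule of thumb from the threading discussion above — code that avoids shared global state is generally thread-safe, and what little must be shared should be guarded — can be illustrated with a small sketch (names invented; this is not any stdlib pattern in particular):

```python
import threading

results = []
results_lock = threading.Lock()

def worker(n):
    # Per-thread state: each worker builds its own local list,
    # so no other thread can see or mutate it. This is the safe part.
    local = [i * n for i in range(5)]
    # Shared state: the one append to the common list is guarded.
    with results_lock:
        results.append(sum(local))

threads = [threading.Thread(target=worker, args=(k,)) for k in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

By the same logic, an HTTP client object used by exactly one thread needs no locking at all; it is module-level caches and shared handlers (the kind of thing commented on in urllib and urllib2) that raise the thread-safety question.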