From jjl at pobox.com Sat Nov 1 07:12:49 2003 From: jjl at pobox.com (John J Lee) Date: Sat Nov 1 07:12:55 2003 Subject: [Web-SIG] Re: client-side In-Reply-To: <20031031162853.D3462@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> <20031031162853.D3462@lyra.org> Message-ID: On Fri, 31 Oct 2003, Greg Stein wrote: > On Fri, Oct 31, 2003 at 05:52:54PM +0000, John J Lee wrote: > > On Fri, 31 Oct 2003, Ian Bicking wrote: > > > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > > > >> * WebDAV > > > > > > > > I plead ignorance. > > > > > [...info about WebDAV from Ian...] > > > > Sounds (I'm saying this with virtually no knowledge of the protocol, of > > course) like it would be best built on top of urllib2 rather than > > integrated with it. Do you agree, Greg S.? > > WebDAV belongs on top of httplib, not urllib. And... hey, what do you If you read my other post (posted about ten seconds after the one you reply to here :-) you'll see I didn't mean to say it *should* be on top of urllib{2,}, just that it looked like it *shouldn't* be part of urllib{2,}. > know! ... that is exactly how I implemented davlib.py many years ago. In [...] All right, all right, we believe you! :-) John From jjl at pobox.com Sat Nov 1 07:50:25 2003 From: jjl at pobox.com (John J Lee) Date: Sat Nov 1 07:50:37 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <20031031161802.C3462@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <20031031161802.C3462@lyra.org> Message-ID: On Fri, 31 Oct 2003, Greg Stein wrote: > On Fri, Oct 31, 2003 at 04:34:02PM +0000, John J Lee wrote: [...] > > He was talking about the server side! > > No, Greg Ward was talking about an http client. Otherwise, he would not > have mentioned PEP 268. OK. 
Personally, I don't think the client-side stuff is particularly well
bundled in with the server-side stuff, but I guess I'm not really
bothered.  I don't think any of the stuff we've discussed so far is
particularly helpfully described as "an http client", though.

[...]
> > > * SSL
> >
> > That's already down at the httplib level (and the socket level, of
> > course).
>
> I know that (given that I wrote the current httplib :-). However, I
> maintain that the implementation uses an improper design.

Fine.

> > > * Basic/Digest/??? authentication
> >
> > That's naturally done at the urllib / urllib2 level, given the way it
> > works.
>
> There is nothing "natural" about it. That is where it resides, but
> authentication is part of the HTTP specification and should be able to be
> used by anything attempting to interact at the HTTP level. HTTP is far

OK.

> more than "fetch the contents of this URL."

Yes.  And simultaneously far less, depending on how you look at it...

[note to self: stop getting philosophical in Python threads]

> My list was specifically intended to say: each of these items belongs in
> the core HTTP (client) service layer. Not urllib.

Indeed.  That's why I was commenting on that question.

[...]
> > > The current model for the client side uses two, distinct classes to deal
> > > with the SSL feature.
> >
> > Sorry, which classes are they?
>
> HTTPConnection and HTTPSConnection. (or HTTP and HTTPS for the backwards
> compat stuff). See above about combinatorics using this design model.

If it causes problems for somebody (at the httplib / httpx level), I guess
it should be fixed.

> > > I have an entirely separate module for the WebDAV stuff.
> >
> > How should it be integrated (if at all), in your opinion (assuming you
> > want it in the standard library)?
>
> See PEP 268.
[...]

OK, I see it won't be part of httplib, but a separate module httpx.
As long as you can send one HTTP request, and get back one HTTP response,
and know that httplib isn't going to "swallow" any HTTP responses, no
problem.  But I see your PEP doesn't break that simple way of working (in
fact I don't know why I was worried about that in the first place...).
Is there any reason not to arrange your stuff in such a way as to allow
reimplementing a maximal part of urllib2's proxy / auth handlers in terms
of your code?

But perhaps I'm too concerned about the simplicity of the non-swallowing
nature of the current httplib, in these particular cases of authentication
and proxying -- maybe in practice that can be given up without ill-effects
(not by changing httplib, but by deprecating the auth/proxy code in
urllib2 (and urllib too I guess)).  Not sure.

Where is this proxy / auth code?  davlib?


John


From gstein at lyra.org  Sat Nov  1 15:44:40 2003
From: gstein at lyra.org (Greg Stein)
Date: Sat Nov  1 15:45:14 2003
Subject: [Web-SIG] Re: client-side
In-Reply-To: ; from jjl@pobox.com on Sat, Nov 01, 2003 at 12:12:49PM +0000
References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> <20031031162853.D3462@lyra.org>
Message-ID: <20031101124440.B4434@lyra.org>

On Sat, Nov 01, 2003 at 12:12:49PM +0000, John J Lee wrote:
> On Fri, 31 Oct 2003, Greg Stein wrote:
>...
> > WebDAV belongs on top of httplib, not urllib. And... hey, what do you
>
> If you read my other post (posted about ten seconds after the one you
> reply to here :-) you'll see I didn't mean to say it *should* be on top of
> urllib{2,}, just that it looked like it *shouldn't* be part of urllib{2,}.

Ah! Right!
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein at lyra.org  Sat Nov  1 16:19:25 2003
From: gstein at lyra.org (Greg Stein)
Date: Sat Nov  1 16:19:54 2003
Subject: [Web-SIG] Re: client-side http
In-Reply-To: ; from jjl@pobox.com on Sat, Nov 01, 2003 at 12:50:25PM +0000
References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <20031031161802.C3462@lyra.org>
Message-ID: <20031101131925.C4434@lyra.org>

On Sat, Nov 01, 2003 at 12:50:25PM +0000, John J Lee wrote:
> On Fri, 31 Oct 2003, Greg Stein wrote:
> >
> > On Fri, Oct 31, 2003 at 04:34:02PM +0000, John J Lee wrote:
> [...]
> > > He was talking about the server side!
> >
> > No, Greg Ward was talking about an http client. Otherwise, he would not
> > have mentioned PEP 268.
>
> OK. Personally, I don't think the client-side stuff particularly well
> bundled in with the server-side stuff, but I guess I'm not really

Agreed. Separation between client and server should be clear in the
packaging.

>...
> > more than "fetch the contents of this URL."
>
> Yes. And simultaneously far less, depending on how you look at it...

Yup. "Move this data from here to there. Thanks." Not a lot to it in some
ways. :-)

>...
> > > > I have an entirely separate module for the WebDAV stuff.
> > >
> > > How should it be integrated (if at all), in your opinion (assuming you
> > > want it in the standard library)?
> >
> > See PEP 268.
> [...]
>
> OK, I see it won't be part of httplib, but a separate module httpx. As
> long as you can send one HTTP request, and get back one HTTP response, and
> know that httplib isn't going to "swallow" any HTTP responses, no problem.

Agreed; sending in a request and getting nothing back would be... weird.

That said, the http code *could* automatically resend a request with new
credentials. It already attempts to resend when it sees a broken pipe (the
server closed its connection; usually from a timeout).
> But I see your PEP doesn't break that simple way of working (in fact I > don't know why I was worried about that in the first place...). Is there > any reason not to arrange your stuff in such a a way as to allow > reimplementing a maximal part of urllib2's proxy / auth handlers in terms > of your code? The hope was to refactor all the authentication, proxy handling, and proxy-auth handling down into "httpx" ("httplib extensions"). And then, yes, urllib(2) would be reimplemented in terms of httpx. > But perhaps I'm too concerned about the simplicity of the > non-swallowing nature of the current httplib, in these particular cases of > authentication and proxying -- maybe in practice that can be given up > without ill-effects (not by changing httplib, but by deprecating the > auth/proxy code in urllib2 (and urllib too I guess)). Not sure. Yup. I wanted to leave httplib alone, or with just minimal changes, and then to layer on new HTTP handling code via httpx. By refactoring that stuff out of urllib, then it could be used for any HTTP code. For example, if you wanted to write some GUI that interacted with a server via WebDAV, then you'd be able to use cool auth, proxies, etc for those interactions. > Where is this proxy / auth code? davlib? I started on the work within the Python repository's sandbox. See /nondist/sandbox/Lib/httpx.py. And holy crap... I don't remember writing that much code. Woah! It is mostly auth stuff right now. I never got to the proxy work, nor to the DAV integration. A copy of my davlib.py is sitting in that sandbox directory, too. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jjl at pobox.com Sat Nov 1 17:57:37 2003 From: jjl at pobox.com (John J Lee) Date: Sat Nov 1 17:57:45 2003 Subject: [Web-SIG] Re: client-side http In-Reply-To: <20031101131925.C4434@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <20031031161802.C3462@lyra.org> <20031101131925.C4434@lyra.org> Message-ID: On Sat, 1 Nov 2003, Greg Stein wrote: > On Sat, Nov 01, 2003 at 12:50:25PM +0000, John J Lee wrote: [...] > That said, the http code *could* automatically resend a request with new > credentials. It already attempts to resend when it sees a broken pipe (the > server closed its connection; usually from a timeout). Yeah, but that does seem much more transport-y and low-level than authentication and proxying. > > But I see your PEP doesn't break that simple way of working (in fact I > > don't know why I was worried about that in the first place...). Is there > > any reason not to arrange your stuff in such a a way as to allow > > reimplementing a maximal part of urllib2's proxy / auth handlers in terms > > of your code? > > The hope was to refactor all the authentication, proxy handling, and > proxy-auth handling down into "httpx" ("httplib extensions"). And then, > yes, urllib(2) would be reimplemented in terms of httpx. Well, the choice I was discussing was that between reimplementing the stuff like urllib2.HTTPBasicAuthHandler using httpx (which I guess would require some refactoring of httpx), OR just reimplementing urllib2.AbstractHTTPHandler using httpx (ie. replacing HTTP / HTTPS with whatever class(es) from httplib / httpx). You're suggesting the latter, obviously. There's no equivalent issue for urllib, I guess. 
> > But perhaps I'm too concerned about the simplicity of the
> > non-swallowing nature of the current httplib, in these particular cases of
> > authentication and proxying -- maybe in practice that can be given up
> > without ill-effects (not by changing httplib, but by deprecating the
> > auth/proxy code in urllib2 (and urllib too I guess)). Not sure.
>
> Yup. I wanted to leave httplib alone, or with just minimal changes, and
> then to layer on new HTTP handling code via httpx. By refactoring that
> stuff out of urllib, then it could be used for any HTTP code. For example,
> if you wanted to write some GUI that interacted with a server via WebDAV,
> then you'd be able to use cool auth, proxies, etc for those interactions.
[...]


John


From jjl at pobox.com  Sun Nov  2 09:34:11 2003
From: jjl at pobox.com (John J Lee)
Date: Sun Nov  2 09:34:19 2003
Subject: urllib2.UserAgent [was: Re: [Web-SIG] So what's missing?]
In-Reply-To: 
References: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com>
Message-ID: 

How about something like this (unfinished, untested!)?  Sorry for the
1.5.2-isms.


class UserAgent(OpenerDirector):
    """Convenient user-agent class.

    Do not modify the addheaders attribute directly.

    """
    # XXX
    # AbstractHTTPHandler should be updated to use HTTP{S,}Connection.
    # Either AbstractHTTPHandler or auth (/proxy?) classes need to use
    #  httpx and this interface adjusted as appropriate.
    # Conditional fetches??

    # XXX should this be public?
    handler_classes = {
        "http": HTTPHandler,
        "https": HTTPSHandler,
        "ftp": CacheFTPHandler,
        # XXX etc.
# XXX
        # rest of auth
        # proxies
        "_authen": HTTPBasicAuthHandler,
        "_cookies": HTTPCookieProcessor,
        "_robots": RobotRulesProcessor,
        "_refresh": HTTPRefreshProcessor,
        "_equiv": HTTPEquivProcessor,
        "_seek": SeekableProcessor,
        "_debug_redirect": HTTPRedirectDebugProcessor,
        "_debug_response_body": HTTPResponseDebugProcessor,
        }
    default_schemes = ["http", "https", "ftp"]
    default_handlers = ["_authen"]

    def __init__(self):
        OpenerDirector.__init__(self)
        self._handlers = {}
        for scheme in self.default_schemes + self.default_handlers:
            klass = self.handler_classes[scheme]
            self._handlers[scheme] = klass()

    # XXX
##     def set_timeout(self, timeout):
##         self._timeout = timeout
##     def set_connection_cache(self, conn_cache):
##         self._conn_cache = conn_cache
##     def set_cache(self, cache):
##         self._cache = cache

    def set_handled_schemes(self, schemes):
        """Set sequence of protocol scheme strings."""
        schemesd = {}
        for scheme in schemes:
            if scheme.startswith("_"):
                raise ValueError("invalid scheme '%s'" % scheme)
            schemesd[scheme] = None
        # get rid of scheme handlers we don't want
        for scheme, oldhandler in self._handlers.items():
            if scheme.startswith("_"):
                continue  # not a scheme handler
            if not schemesd.has_key(scheme):
                self._replace_handler(oldhandler, None)
                del self._handlers[scheme]
            else:
                del schemesd[scheme]
        # add the scheme handlers that are missing
        for scheme in schemesd.keys():
            handler = self.handler_classes[scheme]()
            self.add_handler(handler)
            self._handlers[scheme] = handler

    def set_persistent_headers(self, headers):
        """Set sequence of header name, value pairs.

        These headers are sent with every request, as long as they are not
        overridden in the Request.

        >>> ua = UserAgent()
        >>> ua.set_persistent_headers(
        ...     [("User-agent", "Mozilla/5.0 (compatible)"),
...      ("From", "responsible.person@example.com")])

        """
        # XXX tie in with robots stuff
        d = {}
        for name, value in headers:
            d[name.capitalize()] = value
        self.addheaders = d.items()

    def _set_handler(self, key, obj=None):
        oldhandler = self._handlers.get(key)
        handler_class = self.handler_classes[key]
        if obj is not None:
            newhandler = handler_class(obj)
        else:
            newhandler = handler_class()
        self._replace_handler(oldhandler, newhandler)
        self._handlers[key] = newhandler

    def set_cookiejar(self, cookiejar):
        """Set a ClientCookie.CookieJar, or None."""
        self._set_handler("_cookies", cookiejar)

    def set_robotfileparser(self, rfp):
        """Set a robots.RobotFileParser, or None."""
        self._set_handler("_robots", rfp)

    def set_credentials(self, credentials):
        """Set a urllib2.HTTPPasswordMgr, or None."""
        # XXX httpx?
        self._set_handler("_authen", credentials)

    # these methods all take a boolean parameter
    def set_handle_refresh(self, handle):
        """Set whether to handle HTTP Refresh headers."""
        self._set_handler("_refresh")

    def set_handle_equiv(self, handle):
        """Set whether to treat HTML http-equiv headers like HTTP headers.

        Implies seekable responses.

        """
        self.set_seekable_responses(True)
        self._set_handler("_equiv")

    def set_seekable_responses(self, handle):
        """Make response objects .seek()able."""
        self._set_handler("_seek")

    # XXX haven't thought through debugging...
    def set_debug_redirects(self, handle):
        """Print information about HTTP redirects."""
        self._set_handler("_debug_redirect")

    def set_debug_responses(self, handle):
        """Print HTTP response bodies."""
        self._set_handler("_debug_response_body")

    def http_get(self, fullurl, ranges=None, conditions=None):
        """HTTP GET.

        ranges: sequence of pairs of byte ranges (start, end) to fetch;
        Ranges follow the usual Python rules (the start byte is included,
        the end byte is not; negative numbers count back from the end of
        the entity; start None means start of entity; end None means end
        of entity).
There are restrictions, though: end must not be negative, and if
        start is negative, end must be None.

        >>> ua.http_get("http://www.example.com/big.dat",
        ...             [(0, 10), (-10, None)])  # first and last 10 bytes
        >>> ua.http_get("http://www.example.com/big.dat",
        ...             [(50000, None)])  # from byte 50000 to the end

        """
        req = self._request(fullurl, None)
        assert req.get_type() == "http", "http_get for non-HTTP URI"
        rs = []
        for start, end in ranges:
            if start < 0:
                assert end is None, "invalid range"
                # suffix range: last -start bytes
                start, end = "", -start
            elif end is None:
                end = ""
            else:
                assert 0 <= start <= end, "invalid range"
                if start == end:
                    continue
                end = end - 1
            rs.append("%s-%s" % (start, end))
        req.add_header("Range", "bytes=%s" % string.join(rs, ", "))
        return self.open(req)

    # XXX how to support these methods using Request class?
##     def http_head(self, fullurl):
##     def http_put(self, fullurl, data=None):
##         # XXX what about 30x handling?

    def _replace_handler(self, handler, newhandler=None):
        # first, if handler was previously added, remove it
        if handler is not None:
            for dict_ in [self.handlers, self.handle_open, self.handle_error,
                          self.process_request, self.process_response]:
                for handlers in dict_.values():
                    for i in range(len(handlers)):
                        if handlers[i] is handler:
                            del handlers[i]
                            break
        # then add the replacement, if any
        if newhandler is not None:
            self.add_handler(newhandler)


From grisha at modpython.org  Sun Nov  2 16:37:21 2003
From: grisha at modpython.org (Gregory (Grisha) Trubetskoy)
Date: Sun Nov  2 16:37:27 2003
Subject: [Web-SIG] Random thoughts
In-Reply-To: <20031101024813.GA9101@cthulhu.gerg.ca>
References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <20031101024813.GA9101@cthulhu.gerg.ca>
Message-ID: <20031102163243.K47569@onyx.ispol.com>

On Fri, 31 Oct 2003, Greg Ward wrote:

> (BTW, whoever said that "web.client" and "web.server" are better names
> than "web.http" is right.  I think.  So far I've agreed with every idea
> I've seen on this sig, including the mutually contradicting ones.
> ;-)

A sidenote on this - lower case common words do not make good module names
because they are too easily overshadowed (mistakenly) by local variables.

E.g.:

from web import cookie

[ blah blah ]

    cookie = request.get_cookie()

    [oops, the cookie module just got inaccessible, must use kookie
    or something ugly like this]

I think the best solution is to use upper case for module names like
"Cookie", "HTTP", "Client", etc.

Grisha


From sholden at holdenweb.com  Sun Nov  2 19:05:46 2003
From: sholden at holdenweb.com (Steve Holden)
Date: Sun Nov  2 19:11:09 2003
Subject: [Web-SIG] Random thoughts
In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com>
Message-ID: 

> -----Original Message-----
> From: web-sig-bounces+sholden=holdenweb.com@python.org
> [mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of
> Bill Janssen
> Sent: Thursday, October 30, 2003 10:46 PM
> To: Greg Ward
> Cc: web-sig@python.org
> Subject: Re: [Web-SIG] Random thoughts
>
> > * I oppose Simon Willison's practice of using the same variable
> > in the "GET" and "POST" part of a request, but I will defend to the
> > death his right to do so.  (But not in Quixote, where a narrower
> > definition of what is Right, Good, and Truthful prevails.)
>
> I don't get it.  Any particular request only has one method, not two:
> "GET" and "POST".  Are you talking about for some reason
> special-casing these two methods in the Request class?  I think it
> makes more sense to do things generically:
>
> request.path        (e.g., '/foo/bar')
> request.method      (e.g., "GET")
> request.part        (e.g., "#bletch", perhaps without the #)
> request.headers
> request.parameters  (either the query parms, or the
>                      multipart/form-data values)
>
> request.response()  =>  returns a Response object tied to this request
>
> response.error(code, message)     Sends back an error
> response.reply(htmltext)          Sends back a message
> response.open(ContentType="text/html", code=200)  =>  file object to write to
>     fp.write(...)
>     fp.close()                Sends back the response
> response.redirect(URL)        Sends back redirect to the URL
>

The question is how to integrate data from a POST request (i.e. from the
standard input of the POST transaction) and any parameters that may be
included in the form action's URI (e.g. if the form has an action
attribute such as "form.py?extra1=val1&extra2=val2")

regards
--
Steve Holden          +1 703 278 8281        http://www.holdenweb.com/
Improve the Internet        http://vancouver-webpages.com/CacheNow/
Python Web Programming      http://pydish.holdenweb.com/pwp/
Interview with GvR August 14, 2003       http://www.onlamp.com/python/


From t.vandervossen at fngtps.com  Mon Nov  3 03:19:31 2003
From: t.vandervossen at fngtps.com (Thijs van der Vossen)
Date: Mon Nov  3 03:19:40 2003
Subject: [Web-SIG] Random thoughts
In-Reply-To: 
References: 
Message-ID: <200311030919.32650.t.vandervossen@fngtps.com>

On Monday 03 November 2003 01:05, Steve Holden wrote:
> > > * I oppose Simon Willison's practice of using the same variable
> > > in the "GET" and "POST" part of a request, but I will defend to the
> > > death his right to do so.  (But not in Quixote, where a narrower
> > > definition of what is Right, Good, and Truthful prevails.)
> >
> > I don't get it.  Any particular request only has one method, not two:
> > "GET" and "POST".  Are you talking about for some reason
> > special-casing these two methods in the Request class?  I think it
> > makes more sense to do things generically:
> >
> > ...
>
> The question is how to integrate data from a POST request (i.e. from the
> standard input of the POST transaction) and any parameters that may be
> included in the form action's URI (e.g. if the form has an action
> attribute such as "form.py?extra1=val1&extra2=val2")

Before asking _how_ we might first want to decide if we _should_.  What's
wrong with having both a 'parameters' (query params) and a 'form-data'
(posted form data) dictionary as attributes of the request object?
In your code you could merge these two with a single line of code:

merged = request.parameters.update(request.form-data)

Where the posted form data will overwrite the query parameters with the
same name.  You can also do this the other way round of course.

Regards,
Thijs

--
Fingertips __ www.fngtps.com __ +31.(0)20.4896540


From davidf at sjsoft.com  Mon Nov  3 03:51:37 2003
From: davidf at sjsoft.com (David Fraser)
Date: Mon Nov  3 03:51:42 2003
Subject: [Web-SIG] Random thoughts
In-Reply-To: <20031102163243.K47569@onyx.ispol.com>
References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <20031101024813.GA9101@cthulhu.gerg.ca> <20031102163243.K47569@onyx.ispol.com>
Message-ID: <3FA61719.5040603@sjsoft.com>

Gregory (Grisha) Trubetskoy wrote:

>On Fri, 31 Oct 2003, Greg Ward wrote:
>
>>(BTW, whoever said that "web.client" and "web.server" are better names
>>than "web.http" is right. I think. So far I've agreed with every idea
>>I've seen on this sig, including the mutually contradicting ones. ;-)
>>
>A sidenote on this - lower case common words do not make good module names
>because they are too easily overshadowed (mistakenly) by local variables.
>
>E.g.:
>
>from web import cookie
>
>[ blah blah ]
>
>    cookie = request.get_cookie()
>
>    [oops, the cookie module just got inaccessible, must use kookie
>    or something ugly like this]
>
>I think the best solution is to use upper case for module names like
>"Cookie", "HTTP", "Client", etc.
>
>Grisha
>
But lower case names for packages ("web.Cookie")?
David From anthony at interlink.com.au Mon Nov 3 04:21:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Nov 3 04:25:30 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031102163243.K47569@onyx.ispol.com> Message-ID: <200311030921.hA39LbjA009699@localhost.localdomain> >>> "Gregory (Grisha) Trubetskoy" wrote > A sidenote on this - lower case common words do not make good module names > because they are too easily overshadowed (mistakenly) by local variables. > > E.g.: > > from web import cookie > > [ blah blah ] > > cookie = request.get_cookie() > > [oops, the cookie module just got inaccessible, must use kookie > or something ungly like this] > > I think the best solution is to use upper case for module names like > "Cookie", "HTTP", "Client", etc. I disagree - first off, case sensitivity differs on Unix and non-Unix platforms. Secondly, this is a new package, so it's not like we have to worry about existing codebases. If a user does this, they'll figure it out pretty quickly. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From sholden at holdenweb.com Mon Nov 3 07:31:04 2003 From: sholden at holdenweb.com (Steve Holden) Date: Mon Nov 3 07:37:01 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200311030919.32650.t.vandervossen@fngtps.com> Message-ID: > -----Original Message----- > From: Thijs van der Vossen [mailto:t.vandervossen@fngtps.com] > Sent: Monday, November 03, 2003 3:20 AM > To: sholden@holdenweb.com > Cc: web-sig@python.org > Subject: Re: [Web-SIG] Random thoughts > > > On Monday 03 November 2003 01:05, Steve Holden wrote: > > > > * I oppose Simon Willison's practice of using the > same variable > > > > in the "GET" and "POST" part of a request, but I > will defend to the > > > > death his right to do so. (But not in Quixote, > where a narrower > > > > definition of what is Right, Good, and Truthfull prevails.) > > > > > > I don't get it. 
> > > Any particular request only has one method, not two:
> > > "GET" and "POST".  Are you talking about for some reason
> > > special-casing these two methods in the Request class?  I think it
> > > makes more sense to do things generically:
> > >
> > > ...
> >
> > The question is how to integrate data from a POST request (i.e. from the
> > standard input of the POST transaction) and any parameters that may be
> > included in the form action's URI (e.g. if the form has an action
> > attribute such as "form.py?extra1=val1&extra2=val2")
>
> Before asking _how_ we might first want to decide if we _should_.  What's
> wrong with having both a 'parameters' (query params) and a 'form-data'
> (posted form data) dictionary as attributes of the request object?
>
> In your code you could merge these two with a single line of code:
>
> merged = request.parameters.update(request.form-data)
>
> Where the posted form data will overwrite the query parameters with the
> same name.  You can also do this the other way round of course.
>

Of course, you *could*.  But you will find that whatever you choose to do,
there will be users whose use cases don't fit your choices.  And, of
course, if you choose to offer all possible alternatives then some people
will complain that it's too complicated.

On balance, your suggestion seems the most practical except for the
naming (form-data is not a valid name).  Maybe formargs and queryargs, or
formdata and querydata, or some such, leaving the user to merge the two
sets as (and only when) required.
regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From jjl at pobox.com Mon Nov 3 07:39:12 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 07:39:19 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200311030921.hA39LbjA009699@localhost.localdomain> References: <200311030921.hA39LbjA009699@localhost.localdomain> Message-ID: On Mon, 3 Nov 2003, Anthony Baxter wrote: [...] > > from web import cookie > > > > [ blah blah ] > > > > cookie = request.get_cookie() > > > > [oops, the cookie module just got inaccessible, must use kookie > > or something ungly like this] > > > > I think the best solution is to use upper case for module names like > > "Cookie", "HTTP", "Client", etc. > > I disagree - first off, case sensitivity differs on Unix and non-Unix > platforms. And...? Python module names *once they're imported* are case-sensitive. > Secondly, this is a new package, so it's not like we have to > worry about existing codebases. And...? > If a user does this, they'll figure it > out pretty quickly. Grisha was arguing that it's a shame to force people to pick ugly names, not (necessarily) that it would cause bugs. John From jjl at pobox.com Mon Nov 3 07:45:52 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 07:45:59 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: Message-ID: On Mon, 3 Nov 2003, Steve Holden wrote: [...] > On balance, your suggestion seems the most practical except for the > naming (form-data is not a valid name). Maybe formargs and queryargs, or > formdata and querydata, or some such, leaving the user to merge the two > sets as (and only when) required. [...] 'form' seems bad because form data can be in the URL-encoded data, not only in the POST data. How about postdata and querydata? 
That doesn't imply that querydata are absent from POSTs (as getdata would), but doesn't imply that querydata is necessarily non-form data either. John From t.vandervossen at fngtps.com Mon Nov 3 07:50:43 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Mon Nov 3 07:50:49 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: Message-ID: <200311031350.44270.t.vandervossen@fngtps.com> On Monday 03 November 2003 13:31, Steve Holden wrote: > > Before asking _how_ we might first want to decide if we > > _should_. What's wrong > > with having both a 'parameters' (query params) and a > > 'form-data' (posted form > > data) dictionary as attributes of the request object? > > > > In your code you could merge these two with a single line of code: > > > > merged = request.parameters.update(request.form-data) > > > > Where the posted form data will overwrite the query > > parameters with the same > > name. You can also do this the other way round ofcourse. > > Of course, you *could*. But you will find that whatever you choose to > do, there will be users whose use cases don't fit your choices. And, of > course, if you choose to offer all possible alternatives then some > people will complain that it's too complicated. I agree. The best choice in cases like these IMHO is indeed not to add all possible alternatives to an api, but to write solid and comprehensive documentation explaining the easiest way to get to those alternatives in the calling code. Adding an additional attribute to the request object containing the merged data is useless if you can do this yourself in one line of code. > On balance, your suggestion seems the most practical except for the > naming (form-data is not a valid name). Maybe formargs and queryargs, or > formdata and querydata, or some such, leaving the user to merge the two > sets as (and only when) required. Naming was just an example following the earlier posts. 
I suggest 'post' (the only way to post form data is using a HTTP POST request) and either 'arguments', 'parameters' 'queryargs' or 'queryparams' for the query parameters. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From sholden at holdenweb.com Mon Nov 3 08:38:18 2003 From: sholden at holdenweb.com (Steve Holden) Date: Mon Nov 3 08:44:15 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200311031350.44270.t.vandervossen@fngtps.com> Message-ID: [Thijs] > On Monday 03 November 2003 13:31, Steve Holden wrote: > > > Before asking _how_ we might first want to decide if we > > > _should_. What's wrong > > > with having both a 'parameters' (query params) and a > > > 'form-data' (posted form > > > data) dictionary as attributes of the request object? > > > > > > In your code you could merge these two with a single line of code: > > > > > > merged = request.parameters.update(request.form-data) > > > > > > Where the posted form data will overwrite the query > > > parameters with the same > > > name. You can also do this the other way round ofcourse. > > > > Of course, you *could*. But you will find that whatever you > choose to > > do, there will be users whose use cases don't fit your > choices. And, of > > course, if you choose to offer all possible alternatives then some > > people will complain that it's too complicated. > > I agree. The best choice in cases like these IMHO is indeed > not to add all > possible alternatives to an api, but to write solid and comprehensive > documentation explaining the easiest way to get to those > alternatives in the > calling code. > > Adding an additional attribute to the request object > containing the merged > data is useless if you can do this yourself in one line of code. > > > On balance, your suggestion seems the most practical except for the > > naming (form-data is not a valid name). 
Maybe formargs and > queryargs, or > > formdata and querydata, or some such, leaving the user to > merge the two > > sets as (and only when) required. > > Naming was just an example following the earlier posts. I > suggest 'post' (the > only way to post form data is using a HTTP POST request) and either > 'arguments', 'parameters' 'queryargs' or 'queryparams' for the query > parameters. > And since it seems natural for us all to call it "data", and since input is all data anyway, I'd suggest going with John J Lee's "postdata" vs "querydata". Or should that be postData vs queryData? Actually, although I was being fatuous, one thing we *can* do to help users is adopt a well-policed consistent case convention in this new module. I'm not even that bothered which convention we adopt, as long as someone will police it. regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From grisha at modpython.org Mon Nov 3 09:54:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 09:54:35 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200311030919.32650.t.vandervossen@fngtps.com> References: <200311030919.32650.t.vandervossen@fngtps.com> Message-ID: <20031103094844.V58482@onyx.ispol.com> On Mon, 3 Nov 2003, Thijs van der Vossen wrote: > Before asking _how_ we might first want to decide if we _should_. What's wrong > with having both a 'parameters' (query params) and a 'form-data' (posted form > data) dictionary as attributes of the request object? > > In your code you could merge these two with a single line of code: > > merged = request.parameters.update(request.form-data) The problem here is that this would work great for someone who believes in separation, but someone who wants these things combined would need to have this line in every bit of code. 
I think the way to do it would be to somehow provide either behaviour without additional code, e.g. as an argument to some __init__. I also think that the combined should be default, with the query string overriding posted data. Grisha From grisha at modpython.org Mon Nov 3 09:58:06 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 09:58:13 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <200311030921.hA39LbjA009699@localhost.localdomain> Message-ID: <20031103095549.P58482@onyx.ispol.com> On Mon, 3 Nov 2003, John J Lee wrote: > Grisha was arguing that it's a shame to force people to pick ugly names, > not (necessarily) that it would cause bugs. Exactly! Grisha From grisha at modpython.org Mon Nov 3 10:07:17 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 10:07:21 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: Message-ID: <20031103095935.A58482@onyx.ispol.com> On Mon, 3 Nov 2003, John J Lee wrote: > On Mon, 3 Nov 2003, Steve Holden wrote: > [...] > > On balance, your suggestion seems the most practical except for the > > naming (form-data is not a valid name). Maybe formargs and queryargs, or > > formdata and querydata, or some such, leaving the user to merge the two > > sets as (and only when) required. > [...] > > 'form' seems bad because form data can be in the URL-encoded data, not > only in the POST data. How about postdata and querydata? I like "post" and "query" as a qualifier to "form data", after reading all this RFC stuff, this seems most standard compliant. request.form(query_overrides=1) <-- returns both request.form.postdata() request.form.querydata() I think would be clear and intuitive. 
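One possible concrete shape for what Grisha proposes here — combined data by default, with the precedence chosen up front rather than in every bit of calling code — is a small mapping class. All names below are illustrative, not an agreed interface:

```python
class FormData(dict):
    """Combined view of posted form data and query parameters.

    Following the suggestion above, the query string overrides posted
    data by default; pass query_overrides=False to flip that.
    """

    def __init__(self, postdata, querydata, query_overrides=True):
        super().__init__()
        if query_overrides:
            first, second = postdata, querydata
        else:
            first, second = querydata, postdata
        self.update(first)
        self.update(second)          # the later update wins on duplicate names
        self.postdata = dict(postdata)
        self.querydata = dict(querydata)
```

Callers who believe in separation can still reach `form.postdata` and `form.querydata`; everyone else just indexes the combined object.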
Grisha From davidf at sjsoft.com Mon Nov 3 10:14:54 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Nov 3 10:14:59 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103095935.A58482@onyx.ispol.com> References: <20031103095935.A58482@onyx.ispol.com> Message-ID: <3FA670EE.6030607@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Mon, 3 Nov 2003, John J Lee wrote: > > > >>On Mon, 3 Nov 2003, Steve Holden wrote: >>[...] >> >> >>>On balance, your suggestion seems the most practical except for the >>>naming (form-data is not a valid name). Maybe formargs and queryargs, or >>>formdata and querydata, or some such, leaving the user to merge the two >>>sets as (and only when) required. >>> >>> >>[...] >> >>'form' seems bad because form data can be in the URL-encoded data, not >>only in the POST data. How about postdata and querydata? >> >> > >I like "post" and "query" as a qualifier to "form data", after reading all >this RFC stuff, this seems most standard compliant. > >request.form(query_overrides=1) <-- returns both >request.form.postdata() >request.form.querydata() > >I think would be clear and intuitive. > >Grisha > > I agree, post and query are great, but I don't think "form" should be the general term (what about a query parameter in a hyperlink?) I prefer parameters: request.parameters(query_overrides=1) request.parameters.postdata() request.parameters.querydata() Anyway parameters might be too cumbersome, but I think it's a better term than form. We should also allow for file uploads - for this we may need to take into account that the entire post may not be read in by the time the handler gets here, so some of the postdata() objects may be file-like objects that can read the uploaded file. 
David From davidf at sjsoft.com Mon Nov 3 10:50:27 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Nov 3 10:50:34 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <3FA67446.8080808@sundayta.com> References: <20031103095935.A58482@onyx.ispol.com> <3FA670EE.6030607@sjsoft.com> <3FA67446.8080808@sundayta.com> Message-ID: <3FA67943.7080802@sjsoft.com> david wrote: > David > > > >> I agree, post and query are great, but I don't think "form" should be >> the general term (what about a query parameter in a hyperlink?) >> I prefer parameters: >> request.parameters(query_overrides=1) >> request.parameters.postdata() >> request.parameters.querydata() > > > +1 > >> We should also allow for file uploads - for this we may need to take >> into account that the entire post may not be read in by the time the >> handler gets here, so some of the postdata() objects may be file-like >> objects that can read the uploaded file. > > > Also with REST it is becoming more common to need to access the body > of the request because you POST an XML body instead of parameters. > > Dave Thanks David, presumed you meant to Cc this to the list too... David From ianb at colorstudy.com Mon Nov 3 11:19:28 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 11:19:35 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103095935.A58482@onyx.ispol.com> References: <20031103095935.A58482@onyx.ispol.com> Message-ID: <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> On Nov 3, 2003, at 9:07 AM, Gregory (Grisha) Trubetskoy wrote: > On Mon, 3 Nov 2003, John J Lee wrote: > >> On Mon, 3 Nov 2003, Steve Holden wrote: >> [...] >>> On balance, your suggestion seems the most practical except for the >>> naming (form-data is not a valid name). Maybe formargs and >>> queryargs, or >>> formdata and querydata, or some such, leaving the user to merge the >>> two >>> sets as (and only when) required. >> [...] 
>> >> 'form' seems bad because form data can be in the URL-encoded data, not >> only in the POST data. How about postdata and querydata? > > I like "post" and "query" as a qualifier to "form data", after reading > all > this RFC stuff, this seems most standard compliant. > > request.form(query_overrides=1) <-- returns both > request.form.postdata() > request.form.querydata() Seems a little long-winded. How about request.formdata, .postdata, .querydata, where .formdata is postdata+querydata? (In practice most people use the combined version) They could be proper dictionary-like objects. Though if they aren't real dictionaries, I suppose we would have: request.fields, request.fields.post, request.fields.query, each of which implements a dictionary interface. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Mon Nov 3 11:19:49 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 11:19:54 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <3FA670EE.6030607@sjsoft.com> References: <20031103095935.A58482@onyx.ispol.com> <3FA670EE.6030607@sjsoft.com> Message-ID: <20031103111056.K58482@onyx.ispol.com> On Mon, 3 Nov 2003, David Fraser wrote: > I agree, post and query are great, but I don't think "form" should be > the general term (what about a query parameter in a hyperlink?) This gets kind of interesting. The concept of a "form" is described in the HTML standard. The concept of a query is described in the URL RFC. To the best of my understanding, query data has no standard for storing fields and values. All it is is "stuff after the question mark" So when you see: blah/blah?name=john&age=55 The query data is "name=john&age=55". If you take that query data *and* treat it as url-encoded form data, then you can say that "name" is "john", and "age" is "55". But if you don't treat it as form data, then there is no standard that says you can pass parameters separated by ampersands, etc. 
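Grisha's distinction is easy to see in code: the raw query data is just the string after the question mark, and fields and values only appear once you choose to interpret it as URL-encoded form data. In today's standard library that interpretation step is `urllib.parse.parse_qs` (the `cgi` module played this role in 2003):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://example.com/blah/blah?name=john&age=55"
query = urlsplit(url).query   # the raw "stuff after the question mark"
fields = parse_qs(query)      # the same data *treated as* url-encoded form data

# query  == "name=john&age=55"
# fields == {'name': ['john'], 'age': ['55']}
```

A query string that is not in `key=value&key=value` form still has a well-defined `query`, but `parse_qs` will find few or no fields in it, which is exactly the ambiguity being debated.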
So based on that I think "form" is actually appropriate. If you want to get the query string, that should be available as request.query. Grisha From cs1spw at bath.ac.uk Mon Nov 3 12:23:06 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Mon Nov 3 12:23:14 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> Message-ID: <3FA68EFA.8090201@bath.ac.uk> Ian Bicking wrote: > Seems a little long-winded. How about request.formdata, .postdata, > .querydata, where .formdata is postdata+querydata? (In practice most > people use the combined version) That gets my vote. postdata and querydata seem like good pythonic names as well (I'm not a huge fan of camelCase for variable names in Python code). -- Simon Willison Web development weblog: http://simon.incutio.com/ From jjl at pobox.com Mon Nov 3 13:14:42 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 13:14:52 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103111056.K58482@onyx.ispol.com> References: <20031103095935.A58482@onyx.ispol.com> <3FA670EE.6030607@sjsoft.com> <20031103111056.K58482@onyx.ispol.com> Message-ID: On Mon, 3 Nov 2003, Gregory (Grisha) Trubetskoy wrote: [...] > To the best of my understanding, query data has no standard for storing > fields and values. All it is is "stuff after the question mark" [...] > and "age" is "55". But if you don't treat it as form data, then there is > no standard that says you can pass parameters separated by ampresands, > etc. > > So based on that I think "form" is actually appropriate. 'querydata' (note the 'data' there) doesn't imply to me the ability to parse nonstandard query strings (though obviously it should have some well-defined value when the query string isn't in standard form). 'formdata', though, does imply to me that it contains all data relating to forms, which would be misleading. 
> If you want to get the query string, that should be available as > request.query. I suspect that's rare enough that we could expect people to just parse the request URL themselves, rather than adding yet another attribute. John From jjl at pobox.com Mon Nov 3 13:23:51 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 13:24:01 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> Message-ID: On Mon, 3 Nov 2003, Ian Bicking wrote: [...] > Seems a little long-winded. How about request.formdata, .postdata, > .querydata, where .formdata is postdata+querydata? (In practice most > people use the combined version) +1 > They could be proper dictionary-like objects. Though if they aren't > real dictionaries, I suppose we would have: > > request.fields, request.fields.post, request.fields.query, each of > which implements a dictionary interface. [...] Why does the dictionary-ness of these objects force moving them from attributes of request into attributes of a fields object? I don't like __getitem__, but I like breaking the "law of demeter" less. If people prefer to avoid __getitem__, they could just be methods: .formdata(), .querydata(), .postdata() (or use the new descriptor stuff? I know nothing about that...). John From ianb at colorstudy.com Mon Nov 3 13:26:18 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 13:26:20 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <20031103095935.A58482@onyx.ispol.com> <3FA670EE.6030607@sjsoft.com> <20031103111056.K58482@onyx.ispol.com> Message-ID: <37783324-0E2B-11D8-92EF-000393C2D67E@colorstudy.com> On Nov 3, 2003, at 12:14 PM, John J Lee wrote: >> If you want to get the query string, that should be available as >> request.query. 
> > I suspect that's rare enough that we could expect people to just parse > the > request URL themselves, rather than adding yet another attribute. If we give standard CGI variables somewhere (which it seems like we should) then QUERY_STRING will also provide this data. But of course REQUEST_URI.split('?', 1)[1] should do it easily enough as well. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Mon Nov 3 13:39:59 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 13:40:02 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> Message-ID: <2102E1C2-0E2D-11D8-92EF-000393C2D67E@colorstudy.com> On Nov 3, 2003, at 12:23 PM, John J Lee wrote: >> They could be proper dictionary-like objects. Though if they aren't >> real dictionaries, I suppose we would have: >> >> request.fields, request.fields.post, request.fields.query, each of >> which implements a dictionary interface. > [...] > > Why does the dictionary-ness of these objects force moving them from > attributes of request into attributes of a fields object? I don't like > __getitem__, but I like breaking the "law of demeter" less. If people > prefer to avoid __getitem__, they could just be methods: .formdata(), > .querydata(), .postdata() (or use the new descriptor stuff? I know > nothing about that...). Sorry, I mixed that up a bit, I should have said: if form values are dictionaries (*not* just dictionary-like) then the request object has to have three objects for the three options (query, post, and mixed). If we use a dict subclass (or just implement __getitem__ and fields) then "fields" (or "formdata" or whatever) could have the .post and .query attributes (which obviously aren't attributes of a normal dictionary). If we want to add .getlist() and other options as well (which I think we do), we've already given up strict dictionariness. 
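Ian's idea — a dictionary-like fields object carrying `.post` and `.query` attributes plus a `getlist` method — might be sketched as a dict subclass like this. The names and the post-overrides-query precedence are illustrative only; both are exactly what the thread is still debating:

```python
class Fields(dict):
    """Dictionary of combined form fields with .post and .query views."""

    def __init__(self, query, post):
        self.query = dict(query)
        self.post = dict(post)
        combined = dict(query)
        combined.update(post)    # post wins on duplicate names in this sketch
        super().__init__(combined)

    def getlist(self, name):
        # Collect every value for `name` from both sources, flattening
        # any values that are already lists.
        values = []
        for source in (self.query, self.post):
            if name in source:
                v = source[name]
                values.extend(v if isinstance(v, list) else [v])
        return values
```

With this shape, `request.fields['name']` gives the combined view, while `request.fields.query` and `request.fields.post` remain ordinary dicts for anyone who wants the sources kept separate.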
Personally I like fields.post and fields.query more than having three separate attributes of request. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Mon Nov 3 14:41:10 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 14:41:25 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <2102E1C2-0E2D-11D8-92EF-000393C2D67E@colorstudy.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <2102E1C2-0E2D-11D8-92EF-000393C2D67E@colorstudy.com> Message-ID: On Mon, 3 Nov 2003, Ian Bicking wrote: > On Nov 3, 2003, at 12:23 PM, John J Lee wrote: [...] > > Why does the dictionary-ness of these objects force moving them from > > attributes of request into attributes of a fields object? I don't like > > __getitem__, but I like breaking the "law of demeter" less. If people > > prefer to avoid __getitem__, they could just be methods: .formdata(), > > .querydata(), .postdata() (or use the new descriptor stuff? I know > > nothing about that...). > > Sorry, I mixed that up a bit, I should have said: if form values are [...] I understood what you were saying, I think. > we do), we've already given up strict dictionariness. Personally I > like fields.post and fields.query more than having three separate > attributes of request. That was what I was complaining about: I don't like having to use multiple dots (for the reason I gave above: LoD): I want request.postdata, rather than request.fields.post. Of course, request.fields.post can still exist if you like, but I don't think it should be part of the public interface. John From janssen at parc.com Mon Nov 3 14:45:57 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Nov 3 14:46:23 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Mon, 03 Nov 2003 06:54:31 PST." 
<20031103094844.V58482@onyx.ispol.com> Message-ID: <03Nov3.114604pst."58611"@synergy1.parc.xerox.com> > The problem here is that this would work great for someone who believes in > separation, but someone who wants these things compbined would need to > have this line in every bit of code. > > I think the way to do it would be to somehow provide either behaviour > without additional code, e.g. as an argument to some __init__. > > I also think that the combined should be default, with the query string > overriding posted data. +1. Bill From janssen at parc.com Mon Nov 3 14:48:45 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Nov 3 14:49:06 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Mon, 03 Nov 2003 08:19:49 PST." <20031103111056.K58482@onyx.ispol.com> Message-ID: <03Nov3.114849pst."58611"@synergy1.parc.xerox.com> Note that GET requests can also have bodies, and these can contain data in multipart/form-data format. I think it's clearer to speak of the two forms as body data and query data, rather than form data and query data. Bill From grisha at modpython.org Mon Nov 3 15:29:25 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 15:29:34 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> Message-ID: <20031103150625.H60270@onyx.ispol.com> On Mon, 3 Nov 2003, Ian Bicking wrote: > On Nov 3, 2003, at 9:07 AM, Gregory (Grisha) Trubetskoy wrote: > > > > request.form(query_overrides=1) <-- returns both > > request.form.postdata() > > request.form.querydata() > > Seems a little long-winded. How about request.formdata, .postdata, > .querydata, where .formdata is postdata+querydata? 
(In practice most > people use the combined version) I'd say -1 on postdata and querydata, because request.postdata sounds to me like the body of the POST and querydata sounds like the "stuff after question mark". The word "form" has to be in there. > They could be proper dictionary-like objects. Of course. To elaborate on what I had in mind (the above, btw, has an error - form() cannot be both a method and an object): request.form() would return a mapping object (aka dictionary-like) containing both post and query data. [At this point I'd like to backtrack on my prior statement - query data should not override post data or vice versa, they should probably be combined, just like they would if there were multiple form inputs by the same name - so perhaps we don't even need to override option at all] If you want only post data, you'd need to call request.form().postdata() So most everyone's code would look like: myform = request.form() While some people may use: myform = request.form().postdata() Also, form() *has* to be a method for this reason: In case of POST, availability of form data implies that something consumes (reads) the request. Some people would prefer to read it themselves. A (first) call to form() would trigger this action, after which there wouldn't be anything to read. Otherwise you could read it with request.read(). Grisha From ianb at colorstudy.com Mon Nov 3 15:36:43 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 15:36:47 2003 Subject: [Web-SIG] Client-side support - webunit is back :) In-Reply-To: <200310251408.13911.richardjones@optushome.com.au> References: <200310251408.13911.richardjones@optushome.com.au> Message-ID: <6F8B1CB2-0E3D-11D8-BD47-000393C2D67E@colorstudy.com> On Oct 24, 2003, at 11:08 PM, Richard Jones wrote: > [sorry, I'm not subscribed to this list - I simply don't have the spare > cycles] > > I noticed some archive messages saying webunit code was off the air. 
> I've been > migrating my website, and the code's back now. See webunit's PyPI page > for > info: > > > http://www.python.org/pypi?:action=display&name=webunit&version=1.3.3 > > and the code is at: > > http://mechanicalcat.net/tech/webunit/ Is this the same code line as webunit.sf.net? I think that might have been what people were referring to. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Mon Nov 3 15:53:00 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 15:53:07 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103150625.H60270@onyx.ispol.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> Message-ID: On Nov 3, 2003, at 2:29 PM, Gregory (Grisha) Trubetskoy wrote: > On Mon, 3 Nov 2003, Ian Bicking wrote: > >> On Nov 3, 2003, at 9:07 AM, Gregory (Grisha) Trubetskoy wrote: >>> >>> request.form(query_overrides=1) <-- returns both >>> request.form.postdata() >>> request.form.querydata() >> >> Seems a little long-winded. How about request.formdata, .postdata, >> .querydata, where .formdata is postdata+querydata? (In practice most >> people use the combined version) > > I'd say -1 on postdata and querydata, because request.postdata sounds > to > me like the body of the POST and querydata sounds like the "stuff after > question mark". The word "form" has to be in there. Sure, I don't actually like the *data names that much. Another name might be "field" -- I think in some contexts field is clearer, because "form" has some other concepts associated with it (method and action being the most obvious). >> They could be proper dictionary-like objects. > > Of course. To elaborate on what I had in mind (the above, btw, has an > error - form() cannot be both a method and an object): > > request.form() would return a mapping object (aka dictionary-like) > containing both post and query data. 
> > [At this point I'd like to backtrack on my prior statement - query data > should not override post data or vice versa, they should probably be > combined, just like they would if there were multiple form inputs by > the > same name - so perhaps we don't even need to override option at all] I generally agree, but the implementation would be easier if one overrides the other, like:

    def __getitem__(self, name):
        try:
            return self.query[name]
        except KeyError:
            return self.post[name]

But otherwise:

    def __getitem__(self, name):
        if self.query.has_key(name):
            value = self.query[name]
            if self.post.has_key(name):
                postvalue = self.post[name]
                if isinstance(value, list):
                    if isinstance(postvalue, list):
                        return value + postvalue
                    else:
                        return value + [postvalue]
                else:
                    if isinstance(postvalue, list):
                        return [value] + postvalue
                    else:
                        return [value, postvalue]
            else:
                return value
        else:
            return self.post[name]

It's not *that* bad, but it's a little annoying. At least it's complete (i.e., information neither hidden nor lost). So I think the less-than-elegant implementation isn't so bad. After all, it could be the simpler:

    def __getitem__(self, name):
        value = self.query.getlist(name) + self.post.getlist(name)
        if not value:
            raise KeyError, "..."
        elif len(value) == 1:
            return value[0]
        else:
            return value

> In case of POST, availability of form data implies that something > consumes > (reads) the request. Some people would prefer to read it themselves. A > (first) call to form() would trigger this action, after which there > wouldn't be anything to read. Otherwise you could read it with > request.read(). I forget how exactly cgi works right now, Webware has tried to get this right but I'm not sure if it has. I think it consumes the request body if it's valid data that can be parsed (maybe even in spite of the content-type of the request), but otherwise leaves it intact and sets no fields. 
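The consume-once behaviour under discussion — the first parse reads the request body, after which there is nothing left on the connection — can be modelled with a cached method. This is a toy sketch: a real request would read from a socket, and `parse_qs` stands in for whatever body parser applies:

```python
from urllib.parse import parse_qs

class Request:
    """Toy request demonstrating consume-once body parsing."""

    def __init__(self, body):
        self._body = body
        self._form = None

    def read(self):
        # Reading the body by hand also consumes it.
        body, self._body = self._body, ''
        return body

    def form(self):
        # The first call consumes and parses the body; later calls
        # return the cached result without touching the connection.
        if self._form is None:
            self._form = parse_qs(self.read())
        return self._form
```

This is why `form()` arguably has to be a method rather than a plain attribute: people who want the raw body call `read()` themselves and simply never call `form()`.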
If it is valid data that can be parsed into fields, maybe it's not so bad if the body is lost, because all the information remains. If you have an option to keep the data, I'd just include it in the constructor -- parsing it lazily (and thus throwing away the body lazily) seems error-prone. If it can't be parsed into fields, then certainly it should be available in some other form. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Mon Nov 3 15:56:10 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 15:56:21 2003 Subject: [Web-SIG] Client-side support - webunit is back :) In-Reply-To: <6F8B1CB2-0E3D-11D8-BD47-000393C2D67E@colorstudy.com> References: <200310251408.13911.richardjones@optushome.com.au> <6F8B1CB2-0E3D-11D8-BD47-000393C2D67E@colorstudy.com> Message-ID: On Mon, 3 Nov 2003, Ian Bicking wrote: > On Oct 24, 2003, at 11:08 PM, Richard Jones wrote: [...] > > and the code is at: > > > > http://mechanicalcat.net/tech/webunit/ > > Is this the same code line as webunit.sf.net? [...] No. John From jjl at pobox.com Mon Nov 3 16:00:21 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 16:00:32 2003 Subject: [Web-SIG] Client-side support - webunit is back :) In-Reply-To: <200310251408.13911.richardjones@optushome.com.au> References: <200310251408.13911.richardjones@optushome.com.au> Message-ID: On Sat, 25 Oct 2003, Richard Jones wrote: > [sorry, I'm not subscribed to this list - I simply don't have the spare > cycles] [...] > http://mechanicalcat.net/tech/webunit/ [...] Richard, it's a shame there are two Python webunits now. Couldn't you change the name to make it a bit less confusing? 
John From ianb at colorstudy.com Mon Nov 3 16:14:29 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Nov 3 16:14:36 2003 Subject: [Web-SIG] Client-side support - webunit is back :) In-Reply-To: References: <200310251408.13911.richardjones@optushome.com.au> Message-ID: On Nov 3, 2003, at 3:00 PM, John J Lee wrote: > On Sat, 25 Oct 2003, Richard Jones wrote: >> [sorry, I'm not subscribed to this list - I simply don't have the >> spare >> cycles] > [...] >> http://mechanicalcat.net/tech/webunit/ > [...] > > Richard, it's a shame there are two Python webunits now. Couldn't you > change the name to make it a bit less confusing? Or, alternatively, figure out something with the author of the original webunit. Maybe he's fine with someone taking over what is otherwise a perfectly good name -- if there was a note/link from webunit.sf.net that would probably clear up any ambiguity. The other webunit isn't that fully developed, and has been superseded by other projects, so I don't think anything more is going to become of it. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Mon Nov 3 16:17:21 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 3 16:17:48 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103150625.H60270@onyx.ispol.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> Message-ID: On Mon, 3 Nov 2003, Gregory (Grisha) Trubetskoy wrote: [...] > I'd say -1 on postdata and querydata, because request.postdata sounds to > me like the body of the POST and querydata sounds like the "stuff after > question mark". The word "form" has to be in there. But the query mapping doesn't necessarily *contain* any form data! Maybe querymap and postmap? queryfields and postfields? queryinfo and postinfo? Surely we can agree on *something*! [...] 
> Also, form() *has* to be a method for this reason: True (well, not *essential* in Python, but the sanest option). > In case of POST, availability of form data implies that something consumes > (reads) the request. Some people would prefer to read it themselves. A [...] So why not have .formdata(), .querydata() and .postdata() methods (however you want to spell them) on request, rather than methods on whatever object .formdata() returns? John From grisha at modpython.org Mon Nov 3 17:25:42 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Mon Nov 3 17:25:46 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> Message-ID: <20031103172504.J64608@onyx.ispol.com> On Mon, 3 Nov 2003, John J Lee wrote: > On Mon, 3 Nov 2003, Gregory (Grisha) Trubetskoy wrote: > [...] > > I'd say -1 on postdata and querydata, because request.postdata sounds to > > me like the body of the POST and querydata sounds like the "stuff after > > question mark". The word "form" has to be in there. > > But the query mapping doesn't necessarily *contain* any form data! Yes, which is why there should also be request.query Grisha From davidf at sjsoft.com Tue Nov 4 07:14:23 2003 From: davidf at sjsoft.com (David Fraser) Date: Tue Nov 4 07:14:29 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> Message-ID: <3FA7981F.1030103@sjsoft.com> Ian Bicking wrote: > On Nov 3, 2003, at 2:29 PM, Gregory (Grisha) Trubetskoy wrote: > >> In case of POST, availability of form data implies that something >> consumes >> (reads) the request. Some people would prefer to read it themselves. A >> (first) call to form() would trigger this action, after which there >> wouldn't be anything to read. 
Otherwise you could read it with >> request.read(). > > > I forget how exactly cgi works right now, Webware has tried to get > this right but I'm not sure if it has. I think it consumes the > request body if it's valid data that can be parsed (maybe even in > spite of the content-type of the request), but otherwise leaves it > intact and sets no fields. > > If it is valid data that can be parsed into fields, maybe it's not so > bad if the body is lost, because all the information remains. If you > have an option to keep the data, I'd just include it in the > constructor -- parsing it lazily (and thus throwing away the body > lazily) seems error-prone. If it can't be parsed into fields, then > certainly it should be available in some other form. However, we should deal with uploaded files differently here - they could be huge! You don't want them read in automatically. David From janssen at parc.com Tue Nov 4 14:52:44 2003 From: janssen at parc.com (Bill Janssen) Date: Tue Nov 4 14:53:07 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Tue, 04 Nov 2003 04:14:23 PST." <3FA7981F.1030103@sjsoft.com> Message-ID: <03Nov4.115246pst."58611"@synergy1.parc.xerox.com> > However, we should deal with uploaded files differently here - they > could be huge! You don't want them read in automatically. Well, uploaded files appear as data in-line in the request body. You have to read them from the connection to clear the request (and possibly to get to other data which follows the uploaded file). The question is whether you want to pass them around as 200-MB strings, or whether you want to write them to a temp file and pass the file around. 
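One middle ground between Bill's two options — 200-MB strings versus temp files — is a spooled buffer that keeps small uploads in memory and rolls large ones over to disk. The modern standard library's `tempfile.SpooledTemporaryFile` does exactly this; the sketch below is an illustration of the idea, not what any 2003 framework did:

```python
import tempfile

def receive_upload(chunks, memory_limit=256 * 1024):
    """Drain an upload from the connection into a spooled temp file.

    `chunks` stands in for successive reads off the request body. The
    buffer stays in memory below `memory_limit` bytes and transparently
    spills to a real temp file above it.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=memory_limit)
    for chunk in chunks:
        spool.write(chunk)
    spool.seek(0)   # hand the caller a rewound file-like object
    return spool
```

Either way the connection is cleared, and what gets passed around is always a file-like object rather than a giant string.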
Bill From t.vandervossen at fngtps.com Tue Nov 4 17:00:50 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Tue Nov 4 17:01:09 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Nov4.115246pst."58611"@synergy1.parc.xerox.com> References: <03Nov4.115246pst."58611"@synergy1.parc.xerox.com> Message-ID: <3FA82192.3040708@fngtps.com> Bill Janssen wrote: >>However, we should deal with uploaded files differently here - they >>could be huge! You don't want them read in automatically. > > Well, uploaded files appear as data in-line in the request body. You > have to read them from the connection to clear the request (and > possibly to get to other data which follows the uploaded file). The > question is whether you want to pass them around as 200-MB strings, or > whether you want to write them to a temp file and pass the file > around. Writing to temp files is probably what you want most of the time, but for some applications it can be useful to directly read the posted file from the connection. Examples of this include feeding an event-based parser with a large XML file or being able to display an upload indicator for (multiple) file uploads. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From jjl at pobox.com Tue Nov 4 18:48:57 2003 From: jjl at pobox.com (John J Lee) Date: Tue Nov 4 18:49:07 2003 Subject: [Web-SIG] Client-side support - webunit is back :) In-Reply-To: <200311050945.52750.richardjones@optushome.com.au> References: <200310251408.13911.richardjones@optushome.com.au> <200311050945.52750.richardjones@optushome.com.au> Message-ID: On Wed, 5 Nov 2003, Richard Jones wrote: [...] > Heh. It looks like both codebases have been around for about the same time too > :) > > I'll look into changing the name of my codebase. 
You're obviously more mature than I am -- I'd end up squabbling over who should keep the cute name ;-) John From stuart at stuartbishop.net Tue Nov 4 20:01:54 2003 From: stuart at stuartbishop.net (Stuart Bishop) Date: Tue Nov 4 20:02:10 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <3FA0940C.2080301@sjsoft.com> References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 30/10/2003, at 3:31 PM, David Fraser wrote: > A lot of the arguments for the dual object model are about what you > can do with a separate object. > But these seem to me to miss the point .... you can create your own > "response"-type class that holds the *value* of a response, and as > many instances of it per request as you want to. But the actual Web > API response object is for *writing* the response back to the client. > You can only write one response back per request, so it makes sense > for them to be the same object. The "response"-type class is the interesting bit - the API for setting status codes, headers, cookies etc. And you do want multiples, of which only one is sent to the client. In particular, if you catch an exception and are preparing an error message you will want a clean response to work with rather than, for example, accidentally sending your error message with the wrong content-type. The alternative would be a reset() method on the response buffer, although this isn't as flexible. def handler(request): try: response = Response(request) filename = response.request.getFirst('filename') response.headers['Content-Type'] = 'image/jpeg' response.cookies['latest'] = filename response.write(open(filename,'rb').read()) # A filelike object except IOError: response = Response(request) response.status = 404 print >> response, 'File not found' response.close() # No more data - compute content-length header response.send() # Send to client.
In this example, response is a file-like object that buffers the document's content and is explicitly sent to the client. The use of a close method before the send would allow you to use response.send(somedata) or combinations of response.write(somedata) and response.send() to stream unbuffered content to the client. As others have suggested, the send method could just as easily be a method of the request object (as is the case in Zope), although I personally prefer response.send() rather than request.send(response) or 'request.response = response; request.send()' - -- Stuart Bishop http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/qEwHAfqZj7rGN0oRAgYYAKCV2Qpbm1M28pdgKWBTBPY+scYc5wCeLkiE cWXlE+pjB+RNHUFbleLkQpk= =gzez -----END PGP SIGNATURE----- From gward at python.net Tue Nov 4 21:01:15 2003 From: gward at python.net (Greg Ward) Date: Tue Nov 4 21:01:19 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <3FA7981F.1030103@sjsoft.com> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> <3FA7981F.1030103@sjsoft.com> Message-ID: <20031105020115.GA28244@cthulhu.gerg.ca> On 04 November 2003, David Fraser said: > However, we should deal with uploaded files differently here - they > could be huge! You don't want them read in automatically. I'm pretty happy with the solution I came up with for Quixote 0.5.1: a subclass of HTTPRequest, HTTPUploadRequest, specialized to handle "multipart/form-data" requests (which are mainly used for uploads, hence the name of the class). From upload.py in the Quixote distribution: class HTTPUploadRequest (HTTPRequest): """ Represents a single HTTP request with Content-Type "multipart/form-data", which is used for HTTP uploads. (It's actually possible for any HTML form to specify an encoding type of "multipart/form-data", even if there are no file uploads in that form.
In that case, you'll still get an HTTPUploadRequest object -- but since this is a subclass of HTTPRequest, that shouldn't cause you any problems.) When processing the upload request, any uploaded files are stored under a temporary filename in the directory specified by the 'upload_dir' instance attribute (which is normally set, by Publisher, from the UPLOAD_DIR configuration variable). HTTPUploadRequest then creates an Upload object which contains the various filenames for this upload. Other form variables are stored as usual in the 'form' dictionary, to be fetched later with get_form_var(). Uploaded files can also be accessed via get_form_var(), which returns the Upload object created at upload-time, rather than a string. Eg. if your upload form contains this: <input type="file" name="upload"> then, when processing the form, you might do this: upload = request.get_form_var("upload") after which you could open the uploaded file immediately: file = open(upload.tmp_filename) or move it to a more permanent home before doing anything with it: permanent_name = os.path.join(permanent_upload_dir, upload.base_filename) os.rename(upload.tmp_filename, permanent_name) """ Even though this design was fairly strongly motivated by backwards compatibility concerns, it turns out to be pretty neat and elegant. The request body isn't read until Quixote's Publisher calls request.process_inputs(), which means that Quixote can still return certain types of error response (mainly 404 "not found" or 403 "access denied") before it reads that potentially huge upload. And the uploaded file is written to disk with a secure temporary name, so the application can rename it, read it, or whatever without worrying about sudden leaps in memory consumption. Greg -- Greg Ward http://www.gerg.ca/ Hand me a pair of leather pants and a CASIO keyboard -- I'm living for today!
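[The write-under-a-secure-temporary-name-then-rename pattern Greg describes can be sketched roughly as below. This is a sketch for the discussion, not Quixote's actual code; the helper name is invented:]

```python
import os
import tempfile

def store_upload(fileobj, upload_dir, final_name):
    # Spool the uploaded data to a securely named temp file in
    # upload_dir, then move it to its permanent home. mkstemp gives
    # an unpredictable name and opens the file atomically.
    fd, tmp_path = tempfile.mkstemp(dir=upload_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = fileobj.read(8192)
                if not chunk:
                    break
                tmp.write(chunk)
        permanent = os.path.join(upload_dir, final_name)
        os.rename(tmp_path, permanent)  # atomic on the same filesystem
        return permanent
    except Exception:
        os.unlink(tmp_path)  # don't leave half-written temp files behind
        raise
```

Because the rename stays within one directory (and hence one filesystem), the file never needs to be copied, no matter how large it is.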
From gward at python.net Tue Nov 4 21:03:32 2003 From: gward at python.net (Greg Ward) Date: Tue Nov 4 21:03:35 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031103094844.V58482@onyx.ispol.com> References: <200311030919.32650.t.vandervossen@fngtps.com> <20031103094844.V58482@onyx.ispol.com> Message-ID: <20031105020332.GB28244@cthulhu.gerg.ca> On 03 November 2003, Gregory (Grisha) Trubetskoy said: > The problem here is that this would work great for someone who believes in > separation, but someone who wants these things combined would need to > have this line in every bit of code. > > I think the way to do it would be to somehow provide either behaviour > without additional code, e.g. as an argument to some __init__. > > I also think that the combined should be default, +1 on the above > with the query string > overriding posted data. Hang on a sec. I thought everyone agreed that POST data should override query data. Am I badly misremembering, or did you have a brainfart? Greg -- Greg Ward http://www.gerg.ca/ I appoint you ambassador to Fantasy Island!!! From grisha at modpython.org Tue Nov 4 22:09:10 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Tue Nov 4 22:09:14 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031105020332.GB28244@cthulhu.gerg.ca> References: <200311030919.32650.t.vandervossen@fngtps.com> <20031103094844.V58482@onyx.ispol.com> <20031105020332.GB28244@cthulhu.gerg.ca> Message-ID: <20031104215717.K95365@onyx.ispol.com> On Tue, 4 Nov 2003, Greg Ward wrote: > On 03 November 2003, Gregory (Grisha) Trubetskoy said: > > > with the query string > > overriding posted data. > > Hang on a sec. I thought everyone agreed that POST data should override > query data. Am I badly misremembering, or did you have a brainfart? I've since changed my mind on it (it's in some later post). :-) My latest position is: [as opposed to overriding] they get combined in a list [same as multiple inputs with same name].
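[The combine-into-a-list behaviour can be sketched on top of the standard query-string parser, which already returns a list of values per field name. The helper name is invented, and modern urllib.parse stands in for the 2003-era cgi.parse_qs:]

```python
from urllib.parse import parse_qs

def combined_fields(query_string, post_body):
    # Merge query-string and POST fields into one dict of lists,
    # the same way repeated inputs with the same name combine.
    # No promise is made about the relative order of query-string
    # vs. POST values for the same name.
    fields = parse_qs(query_string, keep_blank_values=True)
    for name, values in parse_qs(post_body, keep_blank_values=True).items():
        fields.setdefault(name, []).extend(values)
    return fields
```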
I also think that the order of items in the combined list should be documented as "undefined", i.e. "don't rely on it, dear developer". :-) Grisha From davidf at sjsoft.com Wed Nov 5 03:49:59 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Nov 5 03:50:14 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031105020115.GA28244@cthulhu.gerg.ca> References: <20031103095935.A58482@onyx.ispol.com> <7FD7E854-0E19-11D8-92EF-000393C2D67E@colorstudy.com> <20031103150625.H60270@onyx.ispol.com> <3FA7981F.1030103@sjsoft.com> <20031105020115.GA28244@cthulhu.gerg.ca> Message-ID: <3FA8B9B7.8000705@sjsoft.com> Greg Ward wrote: >On 04 November 2003, David Fraser said: > > >>However, we should deal with uploaded files differently here - they >>could be huge! You don't want them read in automatically. >> >> > >I'm pretty happy with the solution I came up with for Quixote 0.5.1: a >subclass of HTTPRequest, HTTPUploadRequest, specialized to handle >"multipart/form-data" requests (which are mainly used for uploads, hence >the name of the class). From upload.py in the Quixote distribution: > >class HTTPUploadRequest (HTTPRequest): > """ > Represents a single HTTP request with Content-Type > "multipart/form-data", which is used for HTTP uploads. (It's > actually possible for any HTML form to specify an encoding type of > "multipart/form-data", even if there are no file uploads in that > form. In that case, you'll still get an HTTPUploadRequest object -- > but since this is a subclass of HTTPRequest, that shouldn't cause > you any problems.) > > When processing the upload request, any uploaded files are stored > under a temporary filename in the directory specified by the > 'upload_dir' instance attribute (which is normally set, by > Publisher, from the UPLOAD_DIR configuration variable). > HTTPUploadRequest then creates an Upload object which contains the > various filenames for this upload.
> > Other form variables are stored as usual in the 'form' dictionary, > to be fetched later with get_form_var(). Uploaded files can also be > accessed via get_form_var(), which returns the Upload object created > at upload-time, rather than a string. > > Eg. if your upload form contains this: > > > then, when processing the form, you might do this: > upload = request.get_form_var("upload") > > after which you could open the uploaded file immediately: > file = open(upload.tmp_filename) > > or move it to a more permanent home before doing anything with it: > permanent_name = os.path.join(permanent_upload_dir, > upload.base_filename) > os.rename(upload.tmp_filename, permanent_name) > """ > >Even though this design was fairly strongly motivated by backwards >compatibility concerns, it turns out to be pretty neat and elegant. The >request body isn't read until Quixote's Publisher calls >request.process_inputs(), which means that Quixote can still return >certain types of error response (mainly 404 "not found" or 403 "access >denied") before it reads that potentially huge upload. And the uploaded >file is written to disk with a secure temporary name, so the application >can rename it, read it, or whatever without worrying about sudden leaps >in memory consumption. > > Greg > > This sounds great. Since the requirement here is just to define an API, we wouldn't need to define the actual mechanism (such as using temporary files) as long as we define a clear method to access those files. 
But this would be a great basis for that. David From grisha at modpython.org Wed Nov 5 11:34:05 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Nov 5 11:34:16 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> Message-ID: <20031105104509.Q2712@onyx.ispol.com> On Wed, 5 Nov 2003, Stuart Bishop wrote: > The "response"-type class is the interesting bit - the API for setting > status codes, headers, cookies etc. And you do want multiples, of > which only one is sent to the client. In particular, if you catch an > exception and are preparing an error message you will want a clean > response to work with rather than, for example, accidentally sending > your error message with the wrong content-type. The alternative would > be a reset() method on the response buffer, although this isn't as > flexible. > > def handler(request): > try: > response = Response(request) > filename = response.request.getFirst('filename') > response.headers['Content-Type'] = 'image/jpeg' > response.cookies['latest'] = filename > response.write(open(filename,'rb').read()) # A filelike object > except IOError: > response = Response(request) > response.status = 404 > print >> response, 'File not found' > response.close() # No more data - compute content-length header > response.send() # Send to client. The functional equivalent of the above would look like this in mod_python. def handler(req): req.content_type = 'image/jpeg' try: req.sendfile(req.filename) except IOError: return apache.HTTP_NOT_FOUND return apache.OK 1. This is a pretty good example of the fact that dual objects don't do much other than introduce extra typing. 2. This is too low level of an example: The specifics of how an HTTP error is handled are going to vary from server to server - e.g. Apache httpd will furnish its own error text.
(BTW, HTTP errors shouldn't happen if your application is written well.) Response.close() and response.send() also assume too much control over the server. Unless we abstract completely by providing our own buffering (which would do little other than introduce inefficiency), the buffering is handled by the server, and whether and when content-length is set depends on encoding used (chunked doesn't need content-length), which is something also best left for the server to decide. Whatever we come up with needs to be at a higher level. Having flush() would be appropriate I think. sendfile() is another nice thing to have - if the environment has a native implementation (e.g. mod_python), then it could be used, otherwise it'd just be req.write(open(file).read()) Grisha From ianb at colorstudy.com Wed Nov 5 12:08:44 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Nov 5 12:08:50 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <20031105104509.Q2712@onyx.ispol.com> References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> <20031105104509.Q2712@onyx.ispol.com> Message-ID: On Nov 5, 2003, at 10:34 AM, Gregory (Grisha) Trubetskoy wrote: > On Wed, 5 Nov 2003, Stuart Bishop wrote: >> The "response"-type class is the interesting bit - the API for setting >> status codes, headers, cookies etc. And you do want multiples, of >> which only one is sent to the client. In particular, if you catch an >> exception and are preparing an error message you will want a clean >> response to work with rather than, for example, accidentally sending >> your error message with the wrong content-type. The alternative would >> be a reset() method on the response buffer, although this isn't as >> flexible.
>> >> def handler(request): >> try: >> response = Response(request) >> filename = response.request.getFirst('filename') >> response.headers['Content-Type'] = 'image/jpeg' >> response.cookies['latest'] = filename >> response.write(open(filename,'rb').read()) # A filelike object >> except IOError: >> response = Response(request) >> response.status = 404 >> print >> response, 'File not found' >> response.close() # No more data - compute content-length header >> response.send() # Send to client. > > > The functional equivalent of the above would look like this in > mod_python. > > def handler(req): > > req.content_type = 'image/jpeg' > > try: > req.sendfile(req.filename) > except IOError: > return apache.HTTP_NOT_FOUND > > return apache.OK > > > 1. This is a pretty good example of the fact that dual objects don't do > much other than introduce extra typing. Dual objects avoid something like "req.content_type = 'image/jpeg'", which is not just a misnomer, but confusing and ambiguous, because both request and response have a content type. > 2. This is too low level of an example: > > The specifics of how an HTTP error is handled are going to vary from > server to server - e.g. Apache httpd will furnish its own error text. > (BTW, HTTP errors shouldn't happen if your application is written > well.) Of course applications should return errors. 404 is common, 401 and 403 are entirely reasonable, and 400 is a reasonable way to respond to unexpected input; 30x errors are obviously okay, and fit into an overall framework of exceptions. And WebDAV servers have to set the error response very specifically, including the body of the response. A boilerplate message is fine when nothing else is specified, but there should exist the possibility of setting the message in your application. > Response.close() and response.send() also assume too much control over > the > server.
Unless we abstract completely by providing our own buffering > (which would do little other than introduce inefficiency), the > buffering > is handled by the server, and whether and when content-length is set > depends on encoding used (chunked doesn't need content-length), which > is > something also best left for the server to decide. Some sort of buffering is probably necessary if we want to be able to add headers after some of the body has been created. This is a common practice. Raising an exception in the middle of creating the body should also be handled gracefully. I think it's okay to make an exception when someone explicitly says they want to stream the response, but for most pages it doesn't matter. > Whatever we come up with needs to be at a higher level. > > Having flush() would be appropriate I think. > > sendfile() is another nice thing to have - if the environment has a > native implementation (e.g. mod_python), then it could be used, > otherwise > it'd just be req.write(open(file).read()) Yes, that's a good idea to have. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Wed Nov 5 12:10:57 2003 From: jjl at pobox.com (John J Lee) Date: Wed Nov 5 12:11:06 2003 Subject: [Web-SIG] [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: <200310310628.h9V6SYdw023795@localhost.localdomain> References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: First, Greg (Stein): do you intend to do the necessary work (whatever that is -- ?) to get httpx into Python 2.4? There seem to be a whole bunch of functions around which all parse HTTP headers like foo=bar; spam="e g\"g;s,", another=header missing=semicolon, lonetoken; missingvalue= Concatenated header values are separated by commas. Optional semicolons separate key/value pairs. Values are optionally quoted. Quoted values may contain commas, semicolons and (\-quoted) quotes. Values may be empty (foo=) or missing (foo).
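[A liberal tokenizer for headers of the shape just described might look like the following. It is a sketch written for this discussion, not any of the implementations mentioned, and it is deliberately forgiving about RFC 2616 edge cases:]

```python
def parse_header_list(header):
    # Split a header like  foo=bar; spam="e g\"g;s,", another=header
    # into a list (one entry per comma-separated value) of lists of
    # (key, value) pairs.  value is None for a lone token ("foo") and
    # '' for an explicit empty value ("foo=").
    values, pairs = [], []
    key, buf, in_quotes = None, '', False
    i, n = 0, len(header)

    def flush(end_value=False):
        nonlocal key, buf, pairs
        token = buf.strip()
        if key is not None:
            pairs.append((key, token))
        elif token:
            pairs.append((token, None))
        key, buf = None, ''
        if end_value and pairs:
            values.append(pairs)
            pairs = []

    while i < n:
        c = header[i]
        if in_quotes:
            if c == '\\' and i + 1 < n:
                buf += header[i + 1]  # backslash-quoted character
                i += 2
                continue
            if c == '"':
                in_quotes = False
            else:
                buf += c
        elif c == '"':
            in_quotes = True
        elif c == '=' and key is None:
            key = buf.strip()
            buf = ''
        elif c == ';':
            flush()                 # end of one key/value pair
        elif c == ',':
            flush(end_value=True)   # end of one whole value
        else:
            buf += c
        i += 1
    flush(end_value=True)
    return values
```

A second '=' falls through into the value, and whitespace is stripped liberally, which keeps the function usable for the sloppy headers real servers and clients emit.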
Not sure if this is defined formally somewhere in RFC 2616. I may have the details wrong. Something like this is needed by both digest auth and cookie handling. urllib2 defines (broken) functions to do this, and I think Anthony, Greg S. and I all have functions to do this job. I don't see why they can't all be merged. For cookies, a restricted version of the RFC 2616 format will do (I don't need to worry about missing semicolons). [I also need a weird function to handle Netscape cookies, but that's another story.] Do either of you have a function that you think will do most of this? I have a rather liberal, not very beautiful, implementation that uses regexps, ported from Perl. I don't care much what gets used, as long as we don't end up with four or five separate functions! John From jjl at pobox.com Wed Nov 5 12:16:34 2003 From: jjl at pobox.com (John J Lee) Date: Wed Nov 5 12:16:51 2003 Subject: Server-side too? [was: Re: [Web-SIG] [client side] RFC 2616 ...] In-Reply-To: References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: On Wed, 5 Nov 2003, John J Lee wrote: [...] > urllib2 defines (broken) functions to do this, and I think Anthony, Greg > S. and I all have functions to do this job. I don't see why they can't > all be merged. [...] Are there server-side functions that could be merged here, too? John From sholden at holdenweb.com Wed Nov 5 12:21:04 2003 From: sholden at holdenweb.com (Steve Holden) Date: Wed Nov 5 12:26:19 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: Message-ID: [Ian Bicking] > On Nov 5, 2003, at 10:34 AM, Gregory (Grisha) Trubetskoy wrote: > > On Wed, 5 Nov 2003, Stuart Bishop wrote: > >> The "response"-type class is the interesting bit - the API > for setting > >> status codes, headers, cookies etc. And you do want multiples, of > >> which only one is sent to the client.
In particular, if > you catch an > >> exception and are preparing an error message you will want a clean > >> response to work with rather than, for example, accidentally sending > >> your error message with the wrong content-type. The > alternative would > >> be a reset() method on the response buffer, although this isn't as > >> flexible. > >> > >> def handler(request): > >> try: > >> response = Response(request) > >> filename = response.request.getFirst('filename') > >> response.headers['Content-Type'] = 'image/jpeg' > >> response.cookies['latest'] = filename > >> response.write(open(filename,'rb').read()) # A > filelike object > >> except IOError: > >> response = Response(request) > >> response.status = 404 > >> print >> response, 'File not found' > >> response.close() # No more data - compute content-length header > >> response.send() # Send to client. > > > > > > The functional equivalent of the above would look like this in > > mod_python. > > > > def handler(req): > > > > req.content_type = 'image/jpeg' > > > > try: > > req.sendfile(req.filename) > > except IOError: > > return apache.HTTP_NOT_FOUND > > > > return apache.OK > > > > > > 1. This is a pretty good example of the fact that dual > objects don't do > > much other than introduce extra typing. > > Dual objects avoid something like "req.content_type = 'image/jpeg'", > which is not just a misnomer, but confusing and ambiguous, > because both > request and response have a content type. > But in mod_python a request has both headers_in and headers_out, so the confusion is less. There's also err_headers_out, used only when an error occurs. For some reason Apache special-cases the Content-Type header. It's a bit perverse to suggest we might achieve anything sensible by changing the _request_ Content-Type, though, or am I missing the point? > > 2. This is too low level of an example: > > > > The specifics of how an HTTP error is handled are going to vary from > > server to server - e.g.
Apache httpd will furnish its own > error text. > > (BTW, HTTP errors shouldn't happen if your application is written > > well.) > > Of course applications should return errors. 404 is common, 401 and > 403 are entirely reasonable, and 400 is a reasonable way to > respond to > unexpected input; 30x errors are obviously okay, and fit into an > overall framework of exceptions. And WebDAV servers have to set the > error response very specifically, including the body of the response. > > A boilerplate message is fine when nothing else is specified, > but there > should exist the possibility of setting the message in your > application. > I agree it's desirable to have control over error handling, to the extent of providing a tailor-made page with the application's look-and-feel where appropriate. Errors will always occur, but the fact that it's a 404 is irrelevant to many users, and meaningless to even more. Sensible error messages would be better. [...] > > > Whatever we come up with needs to be at a higher level. > > > Having flush() would be appropriate I think. > > > > sendfile() is another nice thing to have - if the environment has a > > native implementation (e.g. mod_python), then it could be used, > > otherwise > > it'd just be req.write(open(file).read()) > > Yes, that's a good idea to have. > Clearly a good idea to optimize the static content case! regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From neel at mediapulse.com Wed Nov 5 12:30:04 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Nov 5 12:30:14 2003 Subject: [Web-SIG] Request and Response objects Message-ID: > 1. This is a pretty good example of the fact that dual > objects don't do > much other than introduce extra typing.
I think there is a lot of gain in dual objects, but I don't think anyone is seeing the full idea yet - too blinded by low-level details and how it's always been done. > 2. This is too low level of an example: Agreed. I'd want to see something more like the following: # if we have a ?type=pdf argument, send the response as a pdf # otherwise just send the html def handler(request, response): if request.args.get('type') == 'pdf': response.body = html_to_pdf(request.filename) response.headers["Content-type"] = 'application/pdf' else: response.body = open(request.filename).read() Notice I didn't do anything about sending. IMHO I shouldn't have to; the "server" has the request object and the response object, it should be able to send everything on its own. If I need to send for some odd reason, I can override that method for the server (borrowing again the apache concept of phases to handle a request). Now the html_to_pdf is a function I would have to develop, but it would be nice if there was a response object that did this already. So that might look something like: def handler(request, response): if request.args.get('type') == 'pdf': response = PDFResponse(response) response.convert(request.filename) else: response.body = open(request.filename).read() The cool part is this html-to-pdf response object will work in medusa, mod_python, cgi, etc (after everyone conforms to the standard of course =D ). Here I changed response objects mid stream, but I also see an option where I tell the server which classes to use by default.
# This time I have an XML data store I want to put online # I have written an XSLT stylesheet that will be applied to convert the data to html # The response class XSLTResponse does the needed dirty work for me class MyHTTPServer(HTTPServer): # init is called only once per server instance def init(self, request, response): response.load_xslt('my_style_sheet.xslt') # handler is the content handler, called once per request def handler(self, request, response): response.parse(request.filename) server = MyHTTPServer(response_class=XSLTResponse) server.run() Server configuration will of course differ from server to server, mod_python would probably take the class change through an httpd.conf directive. But the parts I like here are again that I can take the response class to any server and also developers can focus only on the side of the coin they are concerned with. All I have to do to make a legit response class is have my class be able to return a complete RFC-compliant HTTP response - I am free from having to worry about connection details and parsing the request. This will make it much easier I think to take existing code, say the ezt template system, and wrap it to make it an HTTP response. It would also be pretty easy to replace the request object with one that does, say, webdav for an example. I think if we explore this, other gains will be realised. Someone already mentioned pickling the different objects for debugging. We could have a method in the server called error_handler you could override that was called whenever another handler raised an error and place the pickle code there. (btw I think errors should be raised, not returned - it's more Python-like that way, and less typing). This could also solve some other issues on the list, such as how do you want your form data parsed - just use the request class that parses the data in a way you like.
That may not be 100% feasible though, since there should be a standard way of passing a response object the request's parsed data, if it's needed. Mike From grisha at modpython.org Wed Nov 5 12:49:05 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Nov 5 12:49:11 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: Message-ID: <20031105124134.T5123@onyx.ispol.com> On Wed, 5 Nov 2003, Steve Holden wrote: > But in mod_python a request has both headers_in and headers_out, so the > confusion is less. Yes, content_type is a special case in mod_python (see below) > There's also err_headers_out, used only when an error occurs. Actually, err_headers_out is *always* sent, headers_out is sent *only when there is no error*. > For some reason Apache special-cases the Content-Type header. This is because Apache has AddOutputFiltersByType, which activates certain filters based on type. Because of this the content type can't just be set in headers, but must be set via a function call, which will add filters accordingly. For our purposes req.headers_out["content-type"] = "blah/blah" should be fine. Grisha From cs1spw at bath.ac.uk Wed Nov 5 12:54:17 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Nov 5 12:54:09 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: Message-ID: <3FA93949.3020508@bath.ac.uk> Michael C. Neel wrote: > This could also solve some other issues on the list, such as how do you > want your form data parsed - just use the request class that parses the > data in a way you like. That may not be 100% feasible though, since > there should be a standard way of passing a response object the > request's parsed data, if it's needed. That's a really neat idea, if we can flesh out the details. It would at least provide a way of changing things like whether GET overrides POST or vice-versa.
It also leads on to opportunities like input/output filters implemented as custom request/response objects - a custom request object that tries to filter out bad input for example, or a custom response object that applies an XSLT stylesheet. Of course, stuff like that is almost certainly better handled by the application logic in the middle but it's still an interesting idea. -- Simon Willison Web development weblog: http://simon.incutio.com/ From grisha at modpython.org Wed Nov 5 12:59:06 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Nov 5 12:59:09 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> <20031105104509.Q2712@onyx.ispol.com> Message-ID: <20031105124913.M5123@onyx.ispol.com> On Wed, 5 Nov 2003, Ian Bicking wrote: > > (BTW, HTTP errors shouldn't happen if your application is written > > well.) > > Of course applications should return errors. 404 is common, 401 and > 403 are entirely reasonable, and 400 is a reasonable way to respond to > unexpected input; 30x errors are obviously okay, and fit into an > overall framework of exceptions. And WebDAV servers have to set the > error response very specifically, including the body of the response. I guess this depends on what we mean by "application". If I log in to my online banking, click on "confirm payment" and get 404 HTTP_NOT_FOUND, I will probably get on the phone with their customer service right away. On the other hand if I am trying to implement WebDAV, of course, an HTTP error is business as usual. 
Grisha From casey at zope.com Wed Nov 5 13:15:17 2003 From: casey at zope.com (Casey Duncan) Date: Wed Nov 5 13:17:56 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <20031105124913.M5123@onyx.ispol.com> References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> <20031105104509.Q2712@onyx.ispol.com> <20031105124913.M5123@onyx.ispol.com> Message-ID: <20031105131517.7ed6de56.casey@zope.com> On Wed, 5 Nov 2003 12:59:06 -0500 (EST) "Gregory (Grisha) Trubetskoy" wrote: > > > On Wed, 5 Nov 2003, Ian Bicking wrote: > > > > (BTW, HTTP errors shouldn't happen if your application is written > > > well.) > > > > Of course applications should return errors. 404 is common, 401 and > > 403 are entirely reasonable, and 400 is a reasonable way to respond to > > unexpected input; 30x errors are obviously okay, and fit into an > > overall framework of exceptions. And WebDAV servers have to set the > > error response very specifically, including the body of the response. > > I guess this depends on what we mean by "application". > > If I log in to my online banking, click on "confirm payment" and get 404 > HTTP_NOT_FOUND, I will probably get on the phone with their customer > service right away. Setting status 4xx/5xx does not preclude the server from returning a human-readable response entity body. In fact the HTTP spec says that servers should return one and that user-agents should display it. Of course, in some cases (like 401 unauthorized) a user-agent will choose to do something other than display the response body to the user. 
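[Casey's point -- that an error status and a human-readable entity body go together -- in a minimal stdlib sketch; the handler class and the message text are invented for illustration:]

```python
from http.server import BaseHTTPRequestHandler

class NotFoundHandler(BaseHTTPRequestHandler):
    # A 404 status and a friendly body are not mutually exclusive:
    # the handler sets the error status *and* returns an entity body
    # for the user-agent to display, as the HTTP spec suggests.

    def do_GET(self):
        body = (b"<html><body><h1>Sorry, we can't find that page.</h1>"
                b"<p>Try the site map instead.</p></body></html>")
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet
```

Run with `HTTPServer(("", 8000), NotFoundHandler).serve_forever()` (port number arbitrary); a browser hitting any path gets the 404 status together with the friendly body rather than a bare server boilerplate page.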
-Casey From davidf at sjsoft.com Wed Nov 5 16:23:28 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Nov 5 16:23:35 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> <3FA0940C.2080301@sjsoft.com> <20031105104509.Q2712@onyx.ispol.com> Message-ID: <3FA96A50.40007@sjsoft.com> Ian Bicking wrote: > On Nov 5, 2003, at 10:34 AM, Gregory (Grisha) Trubetskoy wrote: > >> On Wed, 5 Nov 2003, Stuart Bishop wrote: >> >>> The "response"-type class is the interesting bit - the API for setting >>> status codes, headers, cookies etc. And you do want multiples, of >>> which only one is sent to the client. In particular, if you catch an >>> exception and are preparing an error message you will want a clean >>> response to work with rather than, for example, accidentally sending >>> your error message with the wrong content-type. The alternative would >>> be a reset() method on the response buffer, although this isn't as >>> flexible.
>>>
>>> def handler(request):
>>>     try:
>>>         response = Response(request)
>>>         filename = response.request.getFirst('filename')
>>>         response.headers['Content-Type'] = 'image/jpeg'
>>>         response.cookies['latest'] = filename
>>>         response.write(open(filename,'rb').read()) # A filelike object
>>>     except IOError:
>>>         response = Response(request)
>>>         response.status = 404
>>>         print >> response, 'File not found'
>>>     response.close() # No more data - compute content-length header
>>>     response.send() # Send to client.
>>
>> The functional equivalent of the above would look like this in >> mod_python.
>>
>> def handler(req):
>>     req.content_type = 'image/jpeg'
>>     try:
>>         req.sendfile(req.filename)
>>     except IOError:
>>         return apache.HTTP_NOT_FOUND
>>     return apache.OK
>>
>> 1. This is a pretty good example of the fact that dual objects don't do >> much other than introduce extra typing.
> > > Dual objects avoid something like "req.content_type = 'image/jpeg'", > which is not just a misnomer, but confusing and ambiguous, because > both request and response have a content type. > >> 2. This is too low level of an example: >> >> The specifics of how an HTTP error is handled are going to vary from >> server to server - e.g. Apache httpd will furnish it's own error text. >> (BTW, HTTP errors shouldn't happen if your application is written well.) > > > Of course applications should return errors. 404 is common, 401 and > 403 are entirely reasonable, and 400 is a reasonable way to respond to > unexpected input; 30x errors are obviously okay, and fit into an > overall framework of exceptions. And WebDAV servers have to set the > error response very specifically, including the body of the response. > > A boilerplate message is fine when nothing else is specified, but > there should exist the possibility of setting the message in your > application. > >> Response.close() and response.send() also assume too much control >> over the >> server. Unless we abstract completely by providing our own buffering >> (which would do little other than introduce inefficiency), the buffering >> is handled by the server, and whether and when content-length is set >> depends on encoding used (chunked doesn't need content-length), which is >> something also best left for the server to decide. > > > Some sort of buffering is probably necessary if we want to be able to > add headers after some of the body has been created. This is a common > practice. Raising an exception in the middle of creating the body > should also be handled gracefully. I think it's okay to make an > exception when someone explicitly says they want to stream the > response, but for most pages it doesn't matter. This sounds more application-level specifics to me. We're designing an API that will have to work with multiple servers > >> Whatever we come up with needs to be at a higher level. 
>> >> having flush() would be appropriate I think. >> >> sendfile() is another nice thing to have - if the environment has a >> native implementation (e.g. mod_python), then it could be used, >> otherwise >> it'd just be req.write(open(file).read()) > > > Yes, that's a good idea to have. From janssen at parc.com Wed Nov 5 16:54:42 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Nov 5 16:55:10 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Tue, 04 Nov 2003 19:09:10 PST." <20031104215717.K95365@onyx.ispol.com> Message-ID: <03Nov5.135443pst."58611"@synergy1.parc.xerox.com> > I also think that the order of items in the combined list should be > documented as "undefined", i.e. "don't rely on it, dear developer". :-) Yes, I think that's a good idea. Bill From anthony at interlink.com.au Wed Nov 5 22:17:24 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Nov 5 22:26:31 2003 Subject: [Web-SIG] Re: [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: Message-ID: <200311060317.hA63HPsV000585@localhost.localdomain> >>> John J Lee wrote > I have a rather liberal, not very beautiful, implementation that uses > regexps, ported from Perl. There's something similar in the urllib2 code for this. I'd prefer something not based on regexps, as I think that they often lead to hard-to-read code. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From t.vandervossen at fngtps.com Thu Nov 6 02:26:33 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Thu Nov 6 02:26:41 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: References: Message-ID: <200311060826.35643.t.vandervossen@fngtps.com> On Wednesday 05 November 2003 18:21, Steve Holden wrote: > I agree it's desirable to have control over error handling, to the > extent of providing a tailor-made page with the application's > look-and-feel where appropriate.
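The sendfile() helper discussed just above can be sketched as a function that prefers a native implementation and otherwise copies in blocks — blockwise rather than `open(file).read()`, to avoid slurping large files into memory; the request API here is hypothetical:

```python
def send_file(req, filename, blocksize=64 * 1024):
    """Use the environment's native sendfile if present, else fall back
    to copying the file through req.write() one block at a time."""
    native = getattr(req, "sendfile", None)
    if native is not None:
        return native(filename)
    f = open(filename, "rb")
    try:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            req.write(block)
    finally:
        f.close()
```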
Errors will always occur, but the fact > that it's a 404 is irrelevant to many users, and meaningless to even > more. Sensible error messages would be better. Yeah, but some of your users might not be human and they need the response code in the HTTP header to make sense of it all. ;-) Examples of this include Google, which will happily index pages where you don't send a 404 in the header, and web service interfaces and aggregators that depend on the response code to do the right thing (see: http://diveintomark.org/archives/2003/07/21/atom_aggregator_behavior_http_level). And even something as simple as your web browser uses the 301 Permanent redirect to update your bookmarks. Any server-side web application API should surely have an easy way to set the response code yourself. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From jjl at pobox.com Thu Nov 6 20:14:26 2003 From: jjl at pobox.com (John J Lee) Date: Thu Nov 6 20:14:45 2003 Subject: [Web-SIG] Re: [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: <200311060317.hA63HPsV000585@localhost.localdomain> References: <200311060317.hA63HPsV000585@localhost.localdomain> Message-ID: On Thu, 6 Nov 2003, Anthony Baxter wrote: [...] > There's something similar in the urllib2 code for this. [...] Yeah, though, as I said in my post: http://www.python.org/sf/735248 And the combination parse_http_list / parse_keqv_list isn't quite general enough for my (cookie) purposes. But, I see your Digest fixes have been checked in, and Greg's show no sign of moving ATM, and aren't usable for non-auth purposes anyway. I'll replace my function with a new one that uses parse_http_list. (And, presumably, Greg S.'s auth code will eventually obsolete the current urllib2 auth code.)
John From jjl at pobox.com Sun Nov 9 19:13:37 2003 From: jjl at pobox.com (John J Lee) Date: Sun Nov 9 19:13:46 2003 Subject: [Web-SIG] Re: [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: References: <200311060317.hA63HPsV000585@localhost.localdomain> Message-ID: On Fri, 7 Nov 2003, John J Lee wrote: > On Thu, 6 Nov 2003, Anthony Baxter wrote: > [...] > > There's something similar in the urllib2 code for this. > [...] > > Yeah, though, as I said in my post: > > http://www.python.org/sf/735248 It's worse than that: parse_http_list doesn't do \-quoting, either, which is broken for both Digest auth (its current use) and cookies. Not easy to fix, I think. [...] > non-auth purposes anyway. I'll replace my function with a new one that > uses parse_http_list. I'll have to stick with my old function. > (And, presumably, Greg S.'s auth code will eventually obsolete the current > urllib2 auth code.) So I guess there's no moral imperative on Anthony to fix parse_http_list. John From anthony at interlink.com.au Sun Nov 9 19:28:58 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Nov 9 19:29:22 2003 Subject: [Web-SIG] Re: [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: Message-ID: <200311100028.hAA0SwfP028333@localhost.localdomain> >>> John J Lee wrote > > (And, presumably, Greg S.'s auth code will eventually obsolete the current > > urllib2 auth code.) > > So I guess there's no moral imperative on Anthony to fix > parse_http_list. I've not seen Greg's auth code, but I'm working on breaking out the digest auth code into a separate module. There's a bunch of non-HTTP protocols that can use digest auth (IMAP and SIP, to name two) so it makes sense. Is there a thorough and exhaustive list of the various magic variants that need to be supported by this code? Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
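For reference, the urllib2 helpers under discussion behave like this — the import path is hedged because later Pythons moved them to urllib.request:

```python
try:
    # Location in the Pythons of this era
    from urllib2 import parse_http_list, parse_keqv_list
except ImportError:
    # Later Pythons moved them here
    from urllib.request import parse_http_list, parse_keqv_list

# Split a comma-separated header value; commas inside quoted
# strings must not split, which a plain str.split(',') gets wrong.
challenge = 'realm="test", nonce="abc, def", qop=auth'
items = parse_http_list(challenge)
params = parse_keqv_list(items)  # strips the surrounding quotes
```

This handles quoted commas, but as John notes it is not a complete RFC 2616 parser, so cookie parsing still needs its own code.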
From jtauber at jtauber.com Mon Nov 10 13:22:35 2003 From: jtauber at jtauber.com (James Tauber) Date: Mon Nov 10 13:22:41 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030120232.T98038@onyx.ispol.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030120232.T98038@onyx.ispol.com> Message-ID: <20031110182235.A6D4F40649@server1.messagingengine.com> Here is something I wrote up a little while ago which builds up an approach to templating based on %(var)s

## 1. Simple string substitution can be achieved with the % operator:

name = "Guido"
print "Hello %s!" % name

## 2. ...which can be used with a dictionary:

dict = {"name": "Guido"}          ###
print "Hello %(name)s!" % dict    ###

## 3. The template can be a class, where the dictionary is passed into
## the constructor and __str__ is overridden to make the substitution:

class Template:                               ###
    def __init__(self, dict):                 ###
        self.dict = dict                      ###
    def __str__(self):                        ###
        return "Hello %(name)s!" % self.dict  ###

print Template({"name": "Guido"})

## 4. __getitem__ can be overridden to perform additional processing
## on values:

class Template:
    def __init__(self, dict):
        self.dict = dict
    def __str__(self):
        return "Hello %(name)s!" % self  ###
    def __getitem__(self, key):          ###
        return self.dict[key].upper()    ###

print Template({"name": "Guido"})

## 5. Processing can even be driven from the template itself, with
## %(...)s referencing a function to apply to the value:

class Template:
    def __init__(self, dict):
        self.dict = dict
    def __str__(self):
        return "Hello %(name)s. Hello %(name|upper)s!" % self     ###
    def __getitem__(self, key):
        l = key.split("|")                                        ###
        if len(l) == 1:                                           ###
            return self.dict[key]                                 ###
        else:                                                     ###
            return apply(getattr(self, l[1]), [self.dict[l[0]]])  ###
    def upper(self, s):                                           ###
        return s.upper()                                          ###

print Template({"name": "Guido"})

## 6. Values in the dictionary can even be lists whose items are
## processed individually:

class Template:
    def __init__(self, dict):
        self.dict = dict
    def __str__(self):
        return "<ul>\n%(list|li)s</ul>" % self            ###
    def __getitem__(self, key):
        l = key.split("|")
        if len(l) == 1:
            return self.dict[key]
        else:
            return apply(getattr(self, l[1]), [self.dict[l[0]]])
    def li(self, l):                                      ###
        return "".join(["<li>%s</li>\n" % x for x in l])  ###

print Template({"list": ["foo", "bar", "baz"]})

## 7. The template can be moved into a class attribute:

class Template:
    __template = """<ul>\n%(list|li)s</ul>"""  ###
    def __init__(self, dict):
        self.dict = dict
    def __str__(self):
        return Template.__template % self      ###
    def __getitem__(self, key):
        l = key.split("|")
        if len(l) == 1:
            return self.dict[key]
        else:
            return apply(getattr(self, l[1]), [self.dict[l[0]]])
    def li(self, l):
        return "".join(["<li>%s</li>\n" % x for x in l])

print Template({"list": ["foo", "bar", "baz"]})

## 8. In some cases, you may want a value to come from a method rather
## than the dictionary:

class Template:
    def __template(self):                                           ###
        return """<ul>\n%(lst|li)s</ul>"""                          ###
    def __init__(self, dict={}):
        self.dict = dict
    def __str__(self):
        return self.__template() % self
    def __getitem__(self, key):
        return self.__process(key.split("|"))                       ###
    def __process(self, l):                                         ###
        arg = l[0]                                                  ###
        if len(l) == 1:                                             ###
            if arg in self.dict:                                    ###
                return self.dict[arg]                               ###
            elif hasattr(self, arg) and callable(getattr(self, arg)):  ###
                return apply(getattr(self, arg), [])                ###
            else:                                                   ###
                raise "can't retrieve %s" % arg                     ###
        else:                                                       ###
            func = l[1]                                             ###
            return apply(getattr(self, func), [self.__process([arg])])  ###
    def lst(self):                                                  ###
        return ["foo", "bar", "baz"]                                ###
    def li(self, l):
        return "".join(["<li>%s</li>\n" % x for x in l])

print Template()

## 9. Now let's define a base template class and try multiple
## instances where we delegate formatting of the items to a
## different template than the overall list itself:

# the base template taken from previous example
class DictionaryTemplate:
    def __init__(self, dict={}):
        self.dict = dict
    def __str__(self):
        return self._template() % self
    def __getitem__(self, key):
        return self.__process(key.split("|"))
    def __process(self, l):
        arg = l[0]
        if len(l) == 1:
            if arg in self.dict:
                return self.dict[arg]
            elif hasattr(self, arg) and callable(getattr(self, arg)):
                return apply(getattr(self, arg), [])
            else:
                raise "can't retrieve %s" % arg
        else:
            func = l[1]
            return apply(getattr(self, func), [self.__process([arg])])

# template for individual items
class LI_Template:                                                  ###
    __template = """<li>%s</li>\n"""                                ###
    def __init__(self, input_list=[]):                              ###
        self.input_list = input_list                                ###
    def __str__(self):                                              ###
        return "".join(                                             ###
            [LI_Template.__template % x for x in self.input_list])  ###

# template for overall list
class UL_Template(DictionaryTemplate):      ###
    def _template(self):                    ###
        return """<ul>\n%(lst|li)s</ul>"""  ###
    def lst(self):                          ###
        return ["foo", "bar", "baz"]        ###
    def li(self, input_list):               ###
        return LI_Template(input_list)      ###

print UL_Template()

## 10. Much of the LI_Template can be refactored into a base class
## that does for lists what DictionaryTemplate does for dictionaries:

# assume class DictionaryTemplate exactly as before
class ListTemplate:                                           ###
    def __init__(self, input_list=[]):                        ###
        self.input_list = input_list                          ###
    def __str__(self):                                        ###
        return "".join(                                       ###
            [self._template() % x for x in self.input_list])  ###

class LI_Template(ListTemplate):        ###
    def _template(self):                ###
        return """<li>%s</li>\n"""      ###

class UL_Template(DictionaryTemplate):
    def _template(self):
        return """<ul>\n%(lst|li)s</ul>"""
    def li(self, input_list):
        return LI_Template(input_list)

print UL_Template({"lst": ["foo", "bar"]})

## 11. We can make at least two more improvements to
## DictionaryTemplate. One is to allow keyword args to the
## constructor. The other is to change __process to support
## references to functions that are passed in (rather than being
## defined as methods):

class DictionaryTemplate:
    def __init__(self, dict={}, **keywords):         ###
        self.dict = dict
        self.dict.update(keywords)                   ###
    def __str__(self):
        return self._template() % self
    def __getitem__(self, key):
        return self.__process(key.split("|"))
    def __process(self, l):
        arg = l[0]
        if len(l) == 1:
            if arg in self.dict:
                return self.dict[arg]
            elif hasattr(self, arg) and callable(getattr(self, arg)):
                return apply(getattr(self, arg), [])
            else:
                raise "can't retrieve %s" % arg
        else:
            func_name = l[1]                         ###
            if func_name in self.dict:               ###
                func = self.dict[func_name]          ###
            else:                                    ###
                func = getattr(self, func_name)      ###
            return apply(func, [self.__process([arg])])  ###

# assume ListTemplate as before
class LI_Template(ListTemplate):
    def _template(self):
        return """<li>%s</li>\n"""

class UL_Template(DictionaryTemplate):
    def _template(self):
        return """<ul>\n%(lst|li)s</ul>"""

print UL_Template(lst=["foo", "bar", "baz", "biz"], li=LI_Template)

## 12. Here is an example which starts to show a slightly more
## involved template.

# a list with no wrapper elements
class NakedList_Template(DictionaryTemplate):
    def _template(self):
        return """%(lst|li)s"""

# a template for an article
class Article_Template(ListTemplate):
    def _template(self):
        return """<h2>%(heading)s</h2>
<p>%(date)s</p>
<p>%(abstract)s</p>
<p><a href="%(link)s">Link</a></p>
"""

# the actual data
articles = [
    {"heading": "Article 1", "date": "2003-02-10",
     "abstract": "This is the first article.",
     "link": "http://example.com/article/1"},
    {"heading": "Article 2", "date": "2003-02-13",
     "abstract": "This is the second article.",
     "link": "http://example.com/article/2"}]

print NakedList_Template(lst=articles, li=Article_Template)

James -- James Tauber http://jtauber.com/ From jjl at pobox.com Mon Nov 10 17:06:16 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 10 17:07:26 2003 Subject: [Web-SIG] cgi module doesn't process query string values with POST, old bug report (fwd) Message-ID: Forwarded message from c.l.py: > From: Irmen de Jong > Subject: cgi module doesn't process query string values with POST, old bug > report > Newsgroups: comp.lang.python > Date: Mon, 10 Nov 2003 22:50:30 +0100 > > Hello > > I've got a nuisance with the cgi module. (Python 2.3.2) > When processing a HTTP POST request, it ignores the > query string parameters that may also be present. > I.e. only the parameters from the POST body are processed. > > I've looked at a rather old bug report on SF; > http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=411612 > > but that bug is closed. The last comment is from Steve Holden, > and it says "...My approach will be to have the new functionality > depend on the provision of additional keyword arguments..." > > Can somebody comment on this? (Steve?) I can't seem to find any > of this logic in the current (2.3.2) cgi.py module. > > Is it in there somewhere or has this bug been forgotten? > > > I have now added some code myself after creating a FieldStorage > object, to parse any additional query args using cgi.parse_qsl. > This way any query args are added to my form fields, possibly > overwriting the fields that were sent in the POST body. > > But Steve's comment in the old bug report made me wonder > why the standard cgi module doesn't include this possibility.
> > > --Irmen de Jong John From jjl at pobox.com Tue Nov 11 07:14:05 2003 From: jjl at pobox.com (John J Lee) Date: Tue Nov 11 07:14:11 2003 Subject: [Web-SIG] Re: [client side] RFC 2616 HTTP header parsing, Digest auth, and cookies In-Reply-To: <200311100028.hAA0SwfP028333@localhost.localdomain> References: <200311100028.hAA0SwfP028333@localhost.localdomain> Message-ID: On Mon, 10 Nov 2003, Anthony Baxter wrote: > >>> John J Lee wrote > > > (And, presumably, Greg S.'s auth code will eventually obsolete the current > > > urllib2 auth code.) > > > > So I guess there's no moral imperative on Anthony to fix > > parse_http_list. > > I've not seen Greg's auth code, but I'm working on breaking out the digest > auth code into a separate module. There's a bunch of non-HTTP protocols > that can use digest auth (IMAP and SIP, to name two) so it makes sense. Great, that would be perfect. > Is there a thorough and exhaustive list of the various magic variants that > need to be supported by this code? Since you're working on Digest again, Greg looks like he's made a good job of Digest parsing in httpx, so it would be better to use that instead of the urllib2 stuff (and maybe more of httpx too). http://cvs.sourceforge.net/viewcvs.py/python/python/nondist/sandbox/Lib/httpx.py Greg's _parse_challenges function does a proper job, and as a result is more specialised than the urllib2.parse_*_list functions. That means I can't use it for cookies, but that's not code duplication, so it's fine. 
John From sholden at holdenweb.com Tue Nov 11 08:57:46 2003 From: sholden at holdenweb.com (Steve Holden) Date: Tue Nov 11 09:01:25 2003 Subject: [Web-SIG] cgi module doesn't process query string values with POST, old bug report (fwd) In-Reply-To: Message-ID: [John J Lee] > > From: Irmen de Jong > > Subject: cgi module doesn't process query string values > with POST, old bug > > report > > Newsgroups: comp.lang.python > > Date: Mon, 10 Nov 2003 22:50:30 +0100 > > > > Hello > > > > I've got a nuisance with the cgi module. (Python 2.3.2) > > When processing a HTTP POST request, it ignores the > > query string parameters that may also be present. > > I.e. only the parameters from the POST body are processed. > > > > I've looked at a rather old bug report on SF; > > > http://sourceforge.net/tracker/?group_id=5470&atid=105470&func > =detail&aid=411612 > > > > but that bug is closed. The last comment is from Steve Holden, > > and it says "...My approach will be to have the new functionality > > depend on the provision of additional keyword arguments..." > > > > Can somebody comment on this? (Steve?) I can't seem to find any > > of this logic in the current (2.3.2) cgi.py module. > > > > Is it in there somewhere or has this bug been forgotten? > > > > > > I have now added some code myself after creating a FieldStorage > > object, to parse any additional query args using cgi.parse_qsl. > > This way any query args are added to my form fields, possibly > > overwriting the fields that were sent in the POST body. > > > > But Steve's comment in the old bug report made me wonder > > why the standard cgi module doesn't include this possibility. > > > > > > --Irmen de Jong > Maybe I need to update the comment: I discussed this with Guido and pointed out that the code seemed rather fragile. He agreed, and said that one of the problems with making any changes at all to the CGI module is the layer-upon-layer of fixing that has been done to what started out as fairly simple code. 
So I decided not to proceed with a patch, but never annotated that fact. Does this list collectively feel it's worth trying to get something together that a) works and b) doesn't cause any incompatibility problems or failed tests? regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From janssen at parc.com Tue Nov 11 18:03:57 2003 From: janssen at parc.com (Bill Janssen) Date: Tue Nov 11 18:04:21 2003 Subject: [Web-SIG] cgi module doesn't process query string values with POST, old bug report (fwd) In-Reply-To: Your message of "Tue, 11 Nov 2003 05:57:46 PST." Message-ID: <03Nov11.150401pst."58611"@synergy1.parc.xerox.com> There's an outstanding issue about how any new work would integrate with cgi, httplib, urllib, urllib2, etc. The current idea would be to create a new package, web, and various modules under it (client, server, tools, etc.), which would mainly be implemented by importing functionality from the existing web support modules. We could then replace the functionality in the new modules with new code in an incremental fashion, as necessary. Bill > Maybe I need to update the comment: I discussed this with Guido and > pointed out that the code seemed rather fragile. He agreed, and said > that one of the problems with making any changes at all to the CGI > module is the layer-upon-layer of fixing that has been done to what > started out as fairly simple code.
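The incremental approach Bill describes could start as thin re-export modules. A sketch of what a hypothetical web/client.py might contain — the package and module names are assumptions from his proposal, not an agreed API:

```python
# web/client.py (hypothetical): re-export the existing client API so
# callers can switch imports now while the implementation behind the
# facade is replaced incrementally later.
try:
    from urllib2 import urlopen, Request, build_opener
except ImportError:  # the module moved in later Pythons
    from urllib.request import urlopen, Request, build_opener
```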
From jjl at pobox.com Wed Nov 12 15:18:39 2003 From: jjl at pobox.com (John J Lee) Date: Wed Nov 12 15:18:56 2003 Subject: [Web-SIG] cgi module doesn't process query string values with POST, old bug report (fwd) In-Reply-To: <03Nov11.150401pst."58611"@synergy1.parc.xerox.com> References: <03Nov11.150401pst."58611"@synergy1.parc.xerox.com> Message-ID: On Tue, 11 Nov 2003, Bill Janssen wrote: > There's an outstanding issue about how any new work would integrate > with cgi, httplib, urllib, urllib2, etc. The current idea would be to > create a new package, web, and various modules under it (client, > server, tools, etc.), which would mainly be implemented by importing > functionality from the existing web support modules. We could then > replace the functionality in the new modules with new code in an > incremental fashion, as necessary. [...] For the client-side stuff, who is going to do that? I don't see any obvious need for it, nor people lining up to do it. Greg S. has a new module, but apart from that... John From tinuviel at sparcs.kaist.ac.kr Fri Nov 14 12:27:55 2003 From: tinuviel at sparcs.kaist.ac.kr (Seo Sanghyeon) Date: Fri Nov 14 12:28:14 2003 Subject: [Web-SIG] Grail resurrection Message-ID: <20031114172755.GA17424@sparcs.kaist.ac.kr> . -------------- next part -------------- Hello, I'm very interested in Python web client programming. What do you think about resurrecting Grail? Undoubtedly a web browser is a web client... I checked out Grail CVS, and tried to run it with Python 2.2. (Grail uses rexec, which is disabled in Python 2.3...) It doesn't work out of the box, but with a small patch below it seems to be functional.
I think the effort's results will be helpful to Python web client programming community in general. *** cvs/grail/src/protocols/httpAPI.py 13 Nov 2003 10:07:28 -0000 1.1.1.1 --- cvs/grail/src/protocols/httpAPI.py 13 Nov 2003 10:51:15 -0000 1.2 *************** *** 52,57 **** --- 52,63 ---- class MyHTTP(httplib.HTTP): + # Compatibility hack + def __init__(self, host): + httplib.HTTP.__init__(self, host) + self.connect(host) + self.sock = self._conn.sock + def putrequest(self, request, selector): self.selector = selector httplib.HTTP.putrequest(self, request, selector) From jjl at pobox.com Fri Nov 14 13:43:52 2003 From: jjl at pobox.com (John J Lee) Date: Fri Nov 14 13:44:04 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: <20031114172755.GA17424@sparcs.kaist.ac.kr> References: <20031114172755.GA17424@sparcs.kaist.ac.kr> Message-ID: On Sat, 15 Nov 2003, Seo Sanghyeon wrote: > . > [Seo, FYI: your email client stuck your message in an attachment, so it's a pain to reply to.] > Hello, I'm very interested in Python web client programming. > What do you think about resurrecting Grail? Undoubtedly a web [...] What uses did you have in mind? > Is anyone here interested in resurrecting/modernizing Grail? Not personally. > I think the effort's results will be helpful to Python web client > programming community in general. I looked at it before, and thought possibly the sort of stuff in Context.py might be useful (it would be nice to have a browser object written in Python, with support for history, frames, clicking on buttons etc. -- especially for something like http://wwwsearch.sf.net/DOMForm/). There's also caching, though that's of debatable usefulness: I'm sure there must be free caching proxies out there that you can install locally (maybe even embeddable ones?). Apart from that, I can't see anything useful, but I might well have missed things. As for the applet stuff, I have no idea how up-to-date it is with current standards (de-facto or otherwise). 
I would assume that the rexec stuff is not actually important now, since nobody writes internet web pages with Python applets or script in them (apart from in Jython, but that's a different thing for these purposes). In fact, I would have thought that the frequency of people wanting to run even Java applets from Python is very low, so even that probably isn't widely useful (and you can already do it using Jython and httpunit). John From jjl at pobox.com Fri Nov 14 17:11:45 2003 From: jjl at pobox.com (John J Lee) Date: Fri Nov 14 17:11:53 2003 Subject: [Web-SIG] [client side] urllib2.UserAgent class and a few functions Message-ID: Only changed slightly since my posting it here, but I've stuck it up on the web, comments appreciated (post comments on this list): http://wwwsearch.sourceforge.net/bits/ua.py John From janssen at parc.com Fri Nov 14 19:13:19 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 14 19:15:04 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: Your message of "Fri, 14 Nov 2003 09:27:55 PST." <20031114172755.GA17424@sparcs.kaist.ac.kr> Message-ID: <03Nov14.161325pst."58611"@synergy1.parc.xerox.com> > What do you think about resurrecting Grail? Doesn't make a lot of sense to me. One of Python's weak points is the lack of a standard portable UI like Swing -- I write all my UI projects in Java for only that reason. Browsers are all about UI. (Yes, I'm aware that this can also be considered a strength -- lack of a standard UI allows us to have many non-standard UIs, like wxPython or PyGTK. If we standardized, it would have a "chilling effect" on the other efforts.) However, I think some of the technology used in Grail was interesting, particularly the security classes Bastion and rexec. Both of these are disabled in 2.3, because of "known and not readily fixable security holes". 
Bill From barry at python.org Fri Nov 14 20:39:31 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 14 20:39:43 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: <03Nov14.161325pst."58611"@synergy1.parc.xerox.com> References: <03Nov14.161325pst."58611"@synergy1.parc.xerox.com> Message-ID: <1068860371.990.97.camel@anthem> On Fri, 2003-11-14 at 19:13, Bill Janssen wrote: > However, I think some of the technology used in Grail was interesting, > particularly the security classes Bastion and rexec. Both of these > are disabled in 2.3, because of "known and not readily fixable > security holes". I thought it was more the /unknown/ that has us worried. And until somebody steps forward and owns restricted execution (or whatever it will look like), that's not likely to change. -Barry From gward at python.net Fri Nov 14 22:41:24 2003 From: gward at python.net (Greg Ward) Date: Fri Nov 14 22:41:29 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: References: <20031114172755.GA17424@sparcs.kaist.ac.kr> Message-ID: <20031115034124.GA1092@cthulhu.gerg.ca> On 14 November 2003, John J Lee said: > In fact, I would have thought that > the frequency of people wanting to run even Java applets from Python is > very low, so even that probably isn't widely useful (and you can already > do it using Jython and httpunit). As near as I can tell, Java applets are dead dead dead -- and good riddance. Java's a decent programming language, but the idea of embedding it in a web browser has got to be one of the all-time clunkers. (In case you haven't been following: Microsoft ditched Java support from IE in the initial version of Windows XP; Sun sued to get it put back in, but MS still plans to remove it for good by late 2004. For once, I think MS is actually doing the right thing. Obviously, this will mean the death knell of Java applets, a good five years since it was painfully obvious that they are a dead-end technology. Now if only they could kill off Flash...) 
Python applets will never be anything more than an interesting academic exercise, and I think the exercise is long-since complete. Greg -- Greg Ward http://www.gerg.ca/ OUR PLAN HAS FAILED STOP JOHN DENVER IS NOT TRULY DEAD STOP HE LIVES ON IN HIS MUSIC STOP PLEASE ADVISE FULL STOP From janssen at parc.com Fri Nov 14 23:21:18 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 14 23:21:51 2003 Subject: [Web-SIG] Next steps... Message-ID: <03Nov14.202126pst."58611"@synergy1.parc.xerox.com> Discussion seems to have died down (though I didn't notice any consensus about auth toolkits...). I suggest we review the items on the list. The next logical step would be to elaborate what each list item means, in terms of both design and the work estimate. I know that some folks have working code for some of these items. I'm going to be mainly offline next week, but I'll try to catch up the following week. My list is posted at http://www.parc.com/janssen/web-sig/needed.html Please remind me of things I've missed or gotten wrong. Thanks! Bill From hendry at cs.helsinki.fi Sun Nov 16 15:22:40 2003 From: hendry at cs.helsinki.fi (Kai Hendry) Date: Sun Nov 16 15:23:16 2003 Subject: [Web-SIG] naive comments Message-ID: <20031116202239.GE1064@cs.helsinki.fi> Just read through most of the posts. So far so good. I'm esp. glad you dropped templating. I just wrote a CGI program this weekend. source: http://db.cs.helsinki.fi/~hendry/python/fog.py http://db.cs.helsinki.fi/~hendry/python/count.py does not seem to work on my uni shell: http://db.cs.helsinki.fi/~hendry/python/fog.cgi so try the demo from my home machine: http://tap.homelinux.org/~hendry/words/fog.cgi I must say it wasn't that easy. Maybe my style/design was just too inexperienced. Or maybe I should be looking at a "framework" to write this. Comments? Does it do its job? +1 Client "nice" Form validation would be great. As I'm probably going to sleep wondering how you can horribly break it. The rest I am not so keen on.
Server post-multipart. Doesn't libcurl do this? HTML parser. HTML should be XHTML, hence XML? CSS parser. How can a machine interpret style? :) Another worry of mine, that may not be relevant here. This could just be me failing to understand the locale module. But I am wondering how cgi negotiates the encoding. For example ??? does not work in my CGI and I am not sure how to do it. http://tinyurl.com/v872 W3C seems to think it's latin-1, when it isn't. Is it the "Content-Type: text/html"? Unless I missed it, I did not see people chat about RDF. How does the "future-of-the-web" happen here? Kind regards, -Kai Hendry From tinuviel at sparcs.kaist.ac.kr Sun Nov 16 15:47:11 2003 From: tinuviel at sparcs.kaist.ac.kr (Seo Sanghyeon) Date: Sun Nov 16 15:47:18 2003 Subject: [Web-SIG] SelectORacle Message-ID: <20031116204711.GA28573@sparcs.kaist.ac.kr> http://gallery.theopalgroup.com/selectoracle/ This seems to be an almost complete CSS3 parser. According to their own words, it is "soon to be released in open-source form", and "implemented in pure Python code". My only question is when will be this "soon". From amk at amk.ca Sun Nov 16 16:57:18 2003 From: amk at amk.ca (A.M. Kuchling) Date: Sun Nov 16 16:57:33 2003 Subject: [Web-SIG] HTML 4.01 patch Message-ID: <20031116215718.GA22194@rogue.amk.ca> SF patch #836088 adds methods to the htmllib module for the elements in HTML 4.01. Mostly it just adds methods with 'pass' as the only statement, so it's pretty trivial. If someone wants to look over it and sanity-check the patch, it would be much appreciated. However, the way forward is HTMLParser.py, which supports XML-style empty elements and discards various SGML-specific features that aren't relevant for HTML. There's really nothing to do for HTML 4.01 support in HTMLParser because it doesn't provide any default handlers for elements. So I think there isn't really anything to do for HTML 4.01 support in Python, beyond the above patch.
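amk's observation is easy to see in code: HTMLParser routes every element through the same generic callbacks rather than per-element methods, so elements new in HTML 4.01 need no library changes at all. A minimal sketch (the module is shown at its modern location, html.parser; in the Python of this thread it was the top-level HTMLParser module):

```python
# Minimal sketch: HTMLParser has no per-element handler methods, so elements
# new in HTML 4.01 flow through the same generic callbacks as any other tag.
# (Modern module location shown; 2003-era Python used a top-level HTMLParser.)
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag seen, whether HTML 4.01 knows it or not."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # one generic handler covers every element name

p = TagCollector()
p.feed('<fieldset><legend>Hi</legend><col/></fieldset>')
```

This is the contrast with htmllib, where each element needs its own (possibly empty) method, hence the patch above.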
--amk From davidf at sjsoft.com Mon Nov 17 02:19:29 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Nov 17 02:19:36 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: <20031116204711.GA28573@sparcs.kaist.ac.kr> References: <20031116204711.GA28573@sparcs.kaist.ac.kr> Message-ID: <3FB87681.60203@sjsoft.com> Seo Sanghyeon wrote: >http://gallery.theopalgroup.com/selectoracle/ > >This seems to be an almost complete CSS3 parser. According to their >own words, it is "soon to be released in open-source form", and >"implemented in pure Python code". > >My only question is when will be this "soon". > > Why not email them and ask? David From jjl at pobox.com Mon Nov 17 08:40:21 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 17 08:40:34 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: <3FB87681.60203@sjsoft.com> References: <20031116204711.GA28573@sparcs.kaist.ac.kr> <3FB87681.60203@sjsoft.com> Message-ID: On Mon, 17 Nov 2003, David Fraser wrote: > Seo Sanghyeon wrote: > > >http://gallery.theopalgroup.com/selectoracle/ > > > >This seems to be an almost complete CSS3 parser. According to their [...] > >My only question is when will be this "soon". > > > > > Why not email them and ask? Because we don't need a CSS parser in the standard library? Where do people use CSS parsers outside of graphical web browsers? John From sholden at holdenweb.com Mon Nov 17 10:33:05 2003 From: sholden at holdenweb.com (Steve Holden) Date: Mon Nov 17 10:37:09 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: Message-ID: [John J Lee] > On Mon, 17 Nov 2003, David Fraser wrote: > > > Seo Sanghyeon wrote: > > > > >http://gallery.theopalgroup.com/selectoracle/ > > > > > >This seems to be an almost complete CSS3 parser. According to their > [...] > > >My only question is when will be this "soon". > > > > > > > > Why not email them and ask? > > Because we don't need a CSS parser in the standard library? > > Where do people use CSS parsers outside of graphical web browsers? 
> Well, CSS is intended to accommodate all media, so I imagine eventually people will want to use it in aural web browsers as well. Plus CSS will be an important way to indicate presentation style, so it might conceivably be useful even for documents only intended for printed delivery. However, I don't really see it quickly becoming a standard library component. But I do think the ramifications of CSS will eventually reach further than you apparently anticipate. regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From neel at mediapulse.com Mon Nov 17 10:45:56 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Mon Nov 17 10:46:00 2003 Subject: [Web-SIG] SelectORacle Message-ID: > Because we don't need a CSS parser in the standard library? > > Where do people use CSS parsers outside of graphical web browsers? Because to properly read an HTML document, you need to understand the data in the CSS. Off the top of my head I can think of two uses I would have had for this in the past; one is converting HTML documents to PDF on-the-fly, and another was converting HTML pages for display on a small screen device such as a PDA or a cell phone. Also, in this case, it sounds like it is already written, or very near completion. So unless there is something wrong with the code (makes heavy use of outdated modules for example), I'm not seeing the harm. Batteries included, right? Mike From jjl at pobox.com Mon Nov 17 11:02:40 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 17 11:03:11 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: References: Message-ID: On Mon, 17 Nov 2003, Steve Holden wrote: > [John J Lee] [...] > > Because we don't need a CSS parser in the standard library? > > > > Where do people use CSS parsers outside of graphical web browsers?
> > > Well, CSS is intended to accommodate all media, so I imaging eventually > people will want to use it in aural web browsers as well. Plus CSS will > be an important way to indicate presentation style, so it might > conceivably be useful even for documents only intended for printed > delivery. > > However, I don't really see it quickly becoming a standard library > component. But I do think the ramifications of CSS will eventually reach > further than you apparently anticipate. I didn't intend to imply any prediction about where people will use CSS in the future, or where CSS parsers might be "conceivably useful". I just wondered where people use them ATM. Presumably the people asking for it in the standard library are already using CSS in some way. If so, I guess they should explain what they're using it for. John From cs1spw at bath.ac.uk Mon Nov 17 11:08:01 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Mon Nov 17 11:10:30 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: References: <20031116204711.GA28573@sparcs.kaist.ac.kr> <3FB87681.60203@sjsoft.com> Message-ID: <3FB8F261.2080301@bath.ac.uk> John J Lee wrote: > On Mon, 17 Nov 2003, David Fraser wrote: > Where do people use CSS parsers outside of graphical web browsers? For implementing search engine spiders. A common trick used by search engine optimisation tricksters is to include a bunch of keywords in the body of the document that are invisible to the naked eye. There are a number of ways of doing this using CSS - you can set the text to the same or a very similar colour to the background, or you can hide the text using a CSS command such as display: none, visibility: hidden or even text-indent: -1000em. 
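The tricks Simon lists can be flagged without anything close to a full CSS parser; a hypothetical sketch (the function name and pattern list are illustrative, not an existing library):

```python
# Hypothetical sketch: flag inline style declarations that commonly hide
# text from the naked eye, per the spammer tricks described above.
import re

HIDING_PATTERNS = [
    re.compile(r'display\s*:\s*none', re.I),
    re.compile(r'visibility\s*:\s*hidden', re.I),
    re.compile(r'text-indent\s*:\s*-\d{3,}', re.I),  # huge negative indent
]

def looks_hidden(style):
    """Return True if an inline style matches a known text-hiding trick."""
    return any(pat.search(style) for pat in HIDING_PATTERNS)
```

The trick this cannot catch is text coloured the same as its background: that needs the cascade resolved so both colours can be computed and compared, which is exactly where a real CSS parser would earn its keep.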
-- Simon Willison Web development weblog: http://simon.incutio.com/ From jjl at pobox.com Mon Nov 17 13:41:07 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 17 13:41:38 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: <3FB8F261.2080301@bath.ac.uk> References: <20031116204711.GA28573@sparcs.kaist.ac.kr> <3FB87681.60203@sjsoft.com> <3FB8F261.2080301@bath.ac.uk> Message-ID: On Mon, 17 Nov 2003, Simon Willison wrote: > John J Lee wrote: > > > On Mon, 17 Nov 2003, David Fraser wrote: > > Where do people use CSS parsers outside of graphical web browsers? > > For implementing search engine spiders. A common trick used by search [...] *Are* there any internet search engine spiders written in Python, other than Google's? Independent of the answer to that, though, how many people write internet search engine spiders? Not enough to justify a CSS parser in the standard library! John From neel at mediapulse.com Mon Nov 17 14:03:34 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Mon Nov 17 14:03:43 2003 Subject: [Web-SIG] SelectORacle Message-ID: > *Are* there any internet search engine spiders written in > Python, other > than Google's? Independent of the answer to that, though, > how many people > write internet search engine spiders? Not enough to justify > a CSS parser > in the standard library! And what exactly is the criteria for a module to be included? From jjl at pobox.com Mon Nov 17 17:17:10 2003 From: jjl at pobox.com (John J Lee) Date: Mon Nov 17 17:17:42 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: References: Message-ID: On Mon, 17 Nov 2003, Michael C. Neel wrote: > > *Are* there any internet search engine spiders written in Python, > > other than Google's? Independent of the answer to that, though, how > > many people write internet search engine spiders? Not enough to > > justify a CSS parser in the standard library! > > And what exactly is the criteria for a module to be included? 
Dunno, but at least one person having used it is probably a good start . John From hancock at anansispaceworks.com Tue Nov 18 10:49:59 2003 From: hancock at anansispaceworks.com (Terry Hancock) Date: Tue Nov 18 10:44:21 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: <20031115034124.GA1092@cthulhu.gerg.ca> References: <20031114172755.GA17424@sparcs.kaist.ac.kr> <20031115034124.GA1092@cthulhu.gerg.ca> Message-ID: On Friday 14 November 2003 09:41 pm, Greg Ward wrote: > As near as I can tell, Java applets are dead dead dead -- and good > riddance. Java's a decent programming language, but the idea of > embedding it in a web browser has got to be one of the all-time > clunkers. Why? Because of problems with Java or with the applet concept? The former I might agree with, but not the latter. > Python applets will never be anything more than an interesting academic > exercise, and I think the exercise is long-since complete. By this, you mean "applets compiled into Java from Python using Jython" I presume, and not "applets written in Python"? Also that apparently you don't have uses for it. Java applets are not really dead, although their use has become somewhat narrow. Flash, OTOH, is taking over the web. It is equally annoying to me because, like Java, it is proprietary. I really believe that we need an open-source plugin that does what flash does and does it better. Python seems like a smart possibility to me, particularly when combined with something like PyGame. I've brought this up and/or discussed it on c.l.p and the pygame list and there were a few people interested. I've never written a plugin for Netscape, but my understanding is that that's the thing to do -- since Mozilla and IE both have some way to load them. I've been wondering since this list was established whether a Python applet browser plugin is *on-topic* for web-sig or not? (Perhaps such an effort should be a separate project?). 
In any case, this is the bit that interests me the most, as I'm already using Zope to solve most of the other problems being addressed here (not sure what that will mean in the future). If you all think such discussion belongs here, I will be happy to discuss it with anyone who's interested. ;-D I have been hesitant to charge forward with this, because at the moment I don't have the time to learn the plugin API or write code. But it's still something I hope to make time for in the future. But if somebody has a close enough idea to this, and wants to work on it, maybe I can help? I certainly have some ideas about design issues. And I'm disturbed to hear the comment about security problems with the rexec modules, as I had been thinking those were pretty solid (but then again, I haven't tried them outside of using Zope's Python Script objects). Cheers, Terry -- Terry Hancock ( hancock at anansispaceworks.com ) Anansi Spaceworks http://www.anansispaceworks.com From amk at amk.ca Tue Nov 18 11:31:05 2003 From: amk at amk.ca (A.M. Kuchling) Date: Tue Nov 18 11:31:28 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: References: <20031114172755.GA17424@sparcs.kaist.ac.kr> <20031115034124.GA1092@cthulhu.gerg.ca> Message-ID: <20031118163105.GA26839@rogue.amk.ca> On Tue, Nov 18, 2003 at 09:49:59AM -0600, Terry Hancock wrote: > annoying to me because, like Java, it is proprietary. I really believe > that we need an open-source plugin that does what flash does and > does it better. What you want is a decent SVG plug-in with JavaScript support and animation that actually works. SVG is a W3C standard, so you don't need to either expend effort duplicating Flash's SWF format or invent a new format and convince people to use it. KDE's upcoming 3.2 release is probably the first chance at getting such a thing, because both Mozilla's SVG support and Adobe's SVG plugin (at least on non-Windows platforms) seem to be dead. 
--amk From hancock at anansispaceworks.com Tue Nov 18 19:26:26 2003 From: hancock at anansispaceworks.com (Terry Hancock) Date: Tue Nov 18 19:20:34 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: <20031118163105.GA26839@rogue.amk.ca> References: <20031114172755.GA17424@sparcs.kaist.ac.kr> <20031118163105.GA26839@rogue.amk.ca> Message-ID: On Tuesday 18 November 2003 10:31 am, A.M. Kuchling wrote: > On Tue, Nov 18, 2003 at 09:49:59AM -0600, Terry Hancock wrote: > > annoying to me because, like Java, it is proprietary. I really believe > > that we need an open-source plugin that does what flash does and > > does it better. > > What you want is a decent SVG plug-in with JavaScript support and animation > that actually works. SVG is a W3C standard, so you don't need to either > expend effort duplicating Flash's SWF format or invent a new format and > convince people to use it. "What flash does" doesn't mean "interpret SWF format". "What flash does" means "make it easy for web authors to introduce powerful animated presentation content to their websites". Python certainly doesn't have anything to do this, but I think it might be persuaded to by leveraging existing efforts. Whether this is accomplished using SVG or an entirely new format is immaterial as far as I and the web authors (probably) are concerned. If you think it's good, great. But bear in mind we need a *real* *implemented* standard, not something the committee thought was a good idea at the time, but so far, no one really cares to actually make work. SVG is very complicated to implement. It might be more desirable to define a clear subset of it. Also, I don't think Javascript + SVG will solve the problem of interactive input very well -- wouldn't that mostly just solve the animation part of the problem? (Admittedly, this may be all that Flash does, but if we want to offer an alternative to both Flash and Java, we need this -- and anyway, the applications I'm interested in need it).
If you want people to *use* the new standard instead of just smile and nod, it has to actually give them what they want: an easy authoring environment and output that looks really cool. Nevermind all that w3c nonsense about "standards-compliance", "content" versus "presentation", and whether a "text-to-speech browser for the blind will know what to do with the pages". You don't legislate web authors, you woo them. And the proprietary world is showing us that it can do this very well. If we want the web back, we're going to have to provide something sexy, not just something sensible. IMHO, of course. Now. Having said that, I'm not sure whether SVG + Javascript can (or can't) do "sexy". I'm pretty sure it won't do "easy to author" very well, but I could be wrong about that. I personally find Python easier to deal with than Javascript, but that may be (well -- is) a biased opinion. It's also not clear to me that "there can be only one!". I'm not sure the world can't live with more than one presentation plugin. Surely there are enough of us programming in Python, that the goal of programming in our preferred language is adequate motivation? Also, SVG *is still* a new format until people start using it, which they aren't doing (much), AFAICT. Cheers, Terry -- Terry Hancock ( hancock at anansispaceworks.com ) Anansi Spaceworks http://www.anansispaceworks.com From aquarius-lists at kryogenix.org Wed Nov 19 07:10:20 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Wed Nov 19 07:09:35 2003 Subject: [Web-SIG] SelectORacle References: <20031116204711.GA28573@sparcs.kaist.ac.kr> <3FB87681.60203@sjsoft.com> <3FB8F261.2080301@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > John J Lee wrote: > >> On Mon, 17 Nov 2003, David Fraser wrote: >> Where do people use CSS parsers outside of graphical web browsers? > > For implementing search engine spiders. 
A common trick used by search > engine optimisation tricksters is to include a bunch of keywords in the > body of the document that are invisible to the naked eye. There are a > number of ways of doing this using CSS - you can set the text to the > same or a very similar colour to the background, or you can hide the > text using a CSS command such as display: none, visibility: hidden or > even text-indent: -1000em. *wince* It strikes me as being really difficult to tell the difference between someone using this sort of technique to hide spam text and someone using it as part of an image replacement technique like FIR. Besides, a search engine spider isn't so common a thing that it needs magic support in the stdlib, surely? If someone puts something like this together I can see it eventually being incorporated into the stdlib if lots of people use it (like, say, the xmlrpclib stuff from PythonLabs) but not as something this SIG decides should be part of the new "web" section from the outset? sil -- Hov ghajbe'bogh ram rur pegh ghajbe'bogh jaj (A day without secrets is like a night without stars) -- Klingon proverb From amk at amk.ca Wed Nov 19 08:27:35 2003 From: amk at amk.ca (A.M. Kuchling) Date: Wed Nov 19 08:28:00 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: References: <20031114172755.GA17424@sparcs.kaist.ac.kr> <20031118163105.GA26839@rogue.amk.ca> Message-ID: <20031119132735.GA28526@rogue.amk.ca> On Tue, Nov 18, 2003 at 06:26:26PM -0600, Terry Hancock wrote: > It's also not clear to me that "there can be only one!". I'm > not sure the world can't live with more than one presentation > plugin. Surely there are enough of us programming in Python, > that the goal of programming in our preferred language > is adequate motivation? The critical factor is content authors, not programmers. Assuming someone defined and implemented a Python-based animation/UI plug-in, there's basically zero chance that anyone would start providing content using it.
> Also, SVG *is still* a new format until people start using it, > which they aren't doing (much), AFAICT. There's lots of SVG-supporting software out there: Adobe FrameMaker, Sketch, Sodipodi, ksvg, Apache's Batik, etc. (W3C has a list.) At work we use SVG exclusively for icons, translating to PNG with ksvgtopng. I expect that in a year or two graphing libraries will give up having multiple backends and just output SVG, leaving PNG/PostScript/PDF production to the SVG renderer. But note that these are all backend applications. Where this all falls down is on the browser side. At my previous job, it would have been *great* to be able to count on users having SVG support; we could have drawn really nice diagrams of what their designs looked like. But no browsers support SVG by default, and plug-ins are hard to find and install, given the zillion browser and platform variations. --amk From neel at mediapulse.com Wed Nov 19 10:09:48 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Nov 19 10:09:51 2003 Subject: [Web-SIG] Grail resurrection Message-ID: > If you all think such discussion belongs here, I will be happy to > discuss it with anyone who's interested. ;-D I have been hesitant > to charge forward with this, because at the moment I don't > have the time to learn the plugin API or write code. But it's still > something I hope to make time for in the future. But if somebody > has a close enough idea to this, and wants to work on it, maybe > I can help? I certainly have some ideas about design issues. And > I'm disturbed to hear the comment about security problems > with the rexec modules, as I had been thinking those were pretty > solid (but then again, I haven't tried them outside of using Zope's > Python Script objects). I'm not sure if this is something Web-SIG tackles or not; this has to be the broadest SIG there is! This is something that has crossed my mind, esp. considering python already handles the cross platform issues well.
Right now it looks like flash is set up to get a hold on the "applet". Java has some strikes against it from its past (most people still assume it has performance issues, doubts on security) and is still closer to C++ than scripting languages in terms of development cycles (meaning it takes longer to develop in Java than Python/Perl/PHP). Java is also still controlled by Sun, and last time I checked they weren't moving toward any type of ANSI standard, and Microsoft is trying to drop Java from Windows. If only Sun had opened up Java when they first started with it, things would be very different today (so I'd like to think =). Flash is closed, but that has to be its only drawback. It's cross platform, has good performance, and integrates well with other web technologies. Flash MX's XML abilities are really powerful, and flash can also handle complex UI for user input (a place where HTML does extremely poorly). Bringing python to this ring will be tough, and take some time I think. In the long run, however, I think it will be worth it, because as flash becomes the only choice it will tend to offer less and less (the normal result of a monopoly). Having some competition from python will challenge both options to get better, and we the developers win out no matter which one we use. I'm interested to hear others weigh in on this... Mike From neel at mediapulse.com Wed Nov 19 10:23:38 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Nov 19 10:23:45 2003 Subject: [Web-SIG] Call for managment =) Message-ID: Now that we have the SIG, I think before anything is going to happen we need some management around to help get things going. Right now everything is just being debated to death, and it feels like a perl conference =) So (I think) we need a "guy in charge" to take the reins, corral the different ideas, find out who is going to work on them, etc.
Bill Janssen has done a good job of getting a list of things mentioned at http://www2.parc.com/istl/members/janssen/web-sig/needed.html - whether this means he's up for leading the effort is his call =) Any volunteers? I've got no problem doing this, but it should be who the list wants, and I'm not exactly a python legend (though I have read the needed peps! ). Mike From gward at python.net Sat Nov 22 11:19:30 2003 From: gward at python.net (Greg Ward) Date: Sat Nov 22 11:19:36 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: References: <20031114172755.GA17424@sparcs.kaist.ac.kr> <20031115034124.GA1092@cthulhu.gerg.ca> Message-ID: <20031122161930.GA1071@cthulhu.gerg.ca> On 18 November 2003, Terry Hancock said: > On Friday 14 November 2003 09:41 pm, Greg Ward wrote: > > As near as I can tell, Java applets are dead dead dead -- and good > > riddance. Java's a decent programming language, but the idea of > > embedding it in a web browser has got to be one of the all-time > > clunkers. > > Why? Because of problems with Java or with the applet concept? > The former I might agree with, but not the latter. Sorry, but I'm deliberately not going to answer this here. It's off-topic for this sig, which is supposed to be about improving the support for common web programming tasks (both server- and client-side) in the Python standard library. (Maybe someday I'll write up a rant about why applets in particular, and fancy-shmancy dynamic web content in general, suck. Not today, though.) > I've been wondering since this list was established whether a > Python applet browser plugin is *on-topic* for web-sig or not? If the above paragraph is an accurate assessment of the web-sig's charter, then I would say not. Greg -- Greg Ward http://www.gerg.ca/ I went to buy some camouflage trousers the other day, but I couldn't find any!
From janssen at parc.com Tue Nov 25 22:18:56 2003 From: janssen at parc.com (Bill Janssen) Date: Tue Nov 25 22:19:27 2003 Subject: [Web-SIG] Call for managment =) In-Reply-To: Your message of "Wed, 19 Nov 2003 07:23:38 PST." Message-ID: <03Nov25.191903pst."58611"@synergy1.parc.xerox.com> > Now that we have the SIG, I think before anything is going to happen we > need some management around to help get things going. Right now > everything is just being debated to death, and it feels like a perl > conference =) Mike, I agree with you about the management part, but I think the discussion is quite useful. It tends to open up new avenues for discussion, so that we can get a fuller idea of what the work is. I think we're close to being there, though, and perhaps it's time to figure out more exactly what should be done for each particular item. To my mind, this means finding someone for each of our "projects" who believes in it strongly enough to put together a PEP on it. "Someone", of course, can be a group of people working together, too. I think the next step for this group is to start writing and reviewing these PEPs. So, if people want to move forward, pick something, send a note to the list so that others can help out if they want, and start working on a PEP! > So (I think) we need a "guy in charge" to take the reins, corral the > different ideas, find out who is going to work on them, etc. Bill > Janssen has done a good job of getting a list of things mentioned at > http://www2.parc.com/istl/members/janssen/web-sig/needed.html - whether > this means he's up for leading the effort is his call =) I'd be happy to do this, or not do it :-). I've got plenty of other things on my plate, but I put this effort together because I thought it was important enough to work on. However, remember that this is all volunteer. "Reins" don't work -- more like herding cats :-). I'd be happy to work on an umbrella PEP that ties together the various sub-PEPs for the various projects.
I personally will also be happy to pick one of these sub-projects and work on it. I've got some experience in implementing SSL, so I could perhaps contribute to the SSL server-side support, for instance. > Any volunteers? I've got no problem doing this, but it should be who > the list wants, and I'm not exactly a python legend (though I have read > the needed peps! ). Bill From janssen at parc.com Tue Nov 25 22:20:31 2003 From: janssen at parc.com (Bill Janssen) Date: Tue Nov 25 22:20:53 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: Your message of "Sat, 22 Nov 2003 08:19:30 PST." <20031122161930.GA1071@cthulhu.gerg.ca> Message-ID: <03Nov25.192038pst."58611"@synergy1.parc.xerox.com> > > I've been wondering since this list was established whether a > > Python applet browser plugin is *on-topic* for web-sig or not? > > If the above paragraph is an accurate assessment of the web-sig's > charter, then I would say not. I'd agree with Greg, here. Java applets are the only ones that have ever become well-established. An interpreter for Javascript would make more sense to me. Can Javascript be interpreted using the Python VM? Bill From janssen at parc.com Tue Nov 25 22:25:48 2003 From: janssen at parc.com (Bill Janssen) Date: Tue Nov 25 22:26:33 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: Your message of "Mon, 17 Nov 2003 10:41:07 PST." Message-ID: <03Nov25.192553pst."58611"@synergy1.parc.xerox.com> John, I'm aware of at least 4 spiders written in the last year by various research groups at PARC, alone! Usually, it's part of something called "focussed crawling", which is examining sites for some particular purpose. It's wrong to ask whether there are spiders written in Python, I think. A more interesting question is, how many times was Python rejected as a language in which to write crawlers because some other language had a better library? I know of at least one such case here in the last year.
Finally, spiders are not the only reason for CSS (in fact, I'd guess they aren't the main reason.) The issue is understanding an HTML/XML page, regardless of where it comes from. It may be a book in OEBPS format, for instance, which uses CSS heavily. CSS parsing is important for understanding these formats now, and will become increasingly important as HTML fades out in favor of XHTML and other XML formats. > *Are* there any internet search engine spiders written in Python, other > than Google's? Independent of the answer to that, though, how many people > write internet search engine spiders? Not enough to justify a CSS parser > in the standard library! Bill From kmarks at mac.com Wed Nov 26 01:40:50 2003 From: kmarks at mac.com (Kevin Marks) Date: Wed Nov 26 01:41:01 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: <03Nov25.192553pst.58611@synergy1.parc.xerox.com> Message-ID: <79D6A51F-1FDB-11D8-8204-000A957FD3FE@mac.com> On Tuesday, November 25, 2003, at 07:25 PM, Bill Janssen wrote: > >> *Are* there any internet search engine spiders written in Python, >> other >> than Google's? Independent of the answer to that, though, how many >> people >> write internet search engine spiders? Not enough to justify a CSS >> parser >> in the standard library! I'm writing a spider in Python at the moment, though CSS is not something that is bothering me particularly. From kmarks at mac.com Wed Nov 26 01:46:23 2003 From: kmarks at mac.com (Kevin Marks) Date: Wed Nov 26 01:46:32 2003 Subject: [Web-SIG] Grail resurrection In-Reply-To: Message-ID: <40411955-1FDC-11D8-8204-000A957FD3FE@mac.com> On Wednesday, November 19, 2003, at 07:09 AM, Michael C. Neel wrote: > This is something that has crossed my mind, esp. considering python > already handles the cross platform issues well. > > Right now it looks like flash is setup to get a hold on the "applet". 
> Java has some strikes against it from its past (most people still > assume it has performance issues, doubts on security) and is still > closer to C++ than scripting languages in terms of development cycles > (meaning it takes longer to develop in Java than Python/Perl/PHP). > Java > is also still controlled by Sun, and last time I checked they weren't > moving toward any type of ANSI standard, and Microsoft is trying to > drop > Java from Windows. If only Sun had opened up Java when they > first started with it, things would be very different today (so I'd like > to think =). > > Flash is closed, but that has to be its only drawback. It's cross > platform, has good performance, and integrates well with other web > technologies. Flash MX's XML abilities are really powerful, and flash > can also handle complex UI for user input (a place where HTML does > extremely poorly). > > Bringing python to this ring will be tough, and take some time I think. > In the long run, however, I think it will be worth it, because as flash > becomes the only choice it will tend to offer less and less (the normal > result of a monopoly). Having some competition from python will > challenge both options to get better, and we the developers win out no > matter which one we use. The chances of getting people to install a plug-in are pretty low. I speak from experience. I worked on QuickTime for 5 years, and although we have over 150 million copies installed, it is still seen as insufficient by many developers. Those within corporate environments can never install anything that does not come with the OS, and Flash's 95% plus penetration is very hard to tackle. Instead I'd suggest improving the Ming library that generates Flash from Python, and concentrating on the backend. In addition, Python used for standalone apps seems a much more promising idea to me.
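For the backend-generation route Kevin suggests, SVG (amk's earlier candidate) is a particularly cheap target: it is plain XML, so a server can emit it with the standard library alone, with no Ming-style wrapper around a C library. A minimal sketch (the chart function is an illustrative assumption, not an established API):

```python
# Minimal sketch of server-side vector-graphics generation: SVG is plain XML,
# so no third-party library is needed to produce it on the backend.
from xml.sax.saxutils import quoteattr

def svg_bar_chart(values, bar_width=20, height=100):
    """Return a tiny standalone SVG bar chart as a string."""
    peak = max(values)
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width=%s height="%d">'
             % (quoteattr(str(bar_width * len(values))), height)]
    for i, v in enumerate(values):
        bar = int(height * v / peak)
        parts.append('<rect x="%d" y="%d" width="%d" height="%d"/>'
                     % (i * bar_width, height - bar, bar_width - 2, bar))
    parts.append('</svg>')
    return ''.join(parts)

doc = svg_bar_chart([3, 1, 4, 1, 5])
```

The browser-side penetration problem Kevin describes still applies to viewing the result, of course; the point is only that producing SVG is cheap where producing SWF is not.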
From gstein at lyra.org Wed Nov 26 04:48:31 2003 From: gstein at lyra.org (Greg Stein) Date: Wed Nov 26 04:50:33 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: ; from jjl@pobox.com on Mon, Nov 17, 2003 at 10:17:10PM +0000 References: Message-ID: <20031126014831.B12924@lyra.org> On Mon, Nov 17, 2003 at 10:17:10PM +0000, John J Lee wrote: > On Mon, 17 Nov 2003, Michael C. Neel wrote: > > > *Are* there any internet search engine spiders written in Python, > > > other than Google's? Independent of the answer to that, though, how > > > many people write internet search engine spiders? Not enough to > > > justify a CSS parser in the standard library! > > > > And what exactly are the criteria for a module to be included? > > Dunno, but at least one person having used it is probably a good start. Agreed. -- Greg Stein, http://www.lyra.org/ From jjl at pobox.com Wed Nov 26 07:09:57 2003 From: jjl at pobox.com (John J Lee) Date: Wed Nov 26 07:10:09 2003 Subject: [Web-SIG] SelectORacle In-Reply-To: <03Nov25.192553pst."58611"@synergy1.parc.xerox.com> References: <03Nov25.192553pst."58611"@synergy1.parc.xerox.com> Message-ID: On Tue, 25 Nov 2003, Bill Janssen wrote: [...] > It's wrong to ask whether there are spiders written in Python, I think. A > more interesting question is, how many times was Python rejected as a > language in which to write crawlers because some other language had a > better library? I know of at least one such case here in the last year. [...] So, before anybody even considers a CSS parser in the standard library, somebody has to write a Python CSS parser (or wrap one), maintain it, and see how many people use it. John From neel at mediapulse.com Wed Nov 26 11:00:23 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Nov 26 11:00:27 2003 Subject: [Web-SIG] Call for managment =) Message-ID: > Mike, I agree with you about the management part, but I think the > discussion is quite useful.
It tends to open up new avenues for > discussion, so that we can get a fuller idea of what the work is. I > think we're close to being there, though, and perhaps it's time to > figure out more exactly what should be done for each particular item. That's my feeling. It seems we've hit on just about every possible topic, so now it's a matter of making things real. > To my mind, this means finding someone for each of our "projects" who > believes in it strongly enough to put together a PEP on it. > "Someone", of course, can be a group of people working together, too. > I think the next step for this group is to start writing and reviewing > these PEPs. > > So, if people want to move forward, pick something, send a note to the > list so that others can help out if they want, and start working on a > PEP! > That's the best way I can think of for things to move forward. If you want to start with the PEP, or start with some code and then get help with the PEP later, go for it. We probably do need an umbrella PEP, though, to define what is and is not part of web-SIG so it's a little clearer. I also think it should be quite open. If someone wants to do a CSS parser or a templating system, go for it. If you feel it would end life as we know it for that to be in the stdlib, remember you don't have to work on it, nor do you have to import it. In the end Guido and crew are making the call anyway, and I'd rather we send him a few modules they reject for the stdlib than send nothing at all. I like to think this is the grand plan for Python, especially given the opening to PEP 2 (which everyone here should read; it covers adding modules to the stdlib): The Python Standard Library contributes significantly to Python's success. The language comes with "batteries included", so it is easy for people to become productive with just the standard library alone. It is therefore important that this library grows with the language, and that such growth is supported and encouraged.
Personally I have an interest in templating and a web server, but right now I'm trying to overhaul another project of mine so I'll be ready to dive in sometime in late Dec. Mike From cs1spw at bath.ac.uk Wed Nov 26 11:42:44 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Nov 26 11:42:51 2003 Subject: [Web-SIG] Python version of WWW::Mechanize Message-ID: <3FC4D804.70201@bath.ac.uk> Perl's WWW::Mechanize module is awesome: http://www.perl.com/lpt/a/2003/01/22/mechanize.html my $agent = WWW::Mechanize->new(); $agent->get("http://www.radiotimes.beeb.com/"); $agent->follow("My Diary"); $agent->form(2); $agent->field("email", $email); $agent->click(); Would something like this be a worthwhile consideration for the Python web modules in the standard library, or is it specialised to the point that it works better as a separately maintained module? -- Simon Willison Web development weblog: http://simon.incutio.com/ From stuart at stuartbishop.net Thu Nov 27 03:44:33 2003 From: stuart at stuartbishop.net (Stuart Bishop) Date: Thu Nov 27 03:46:00 2003 Subject: [Web-SIG] Next steps... In-Reply-To: <03Nov14.202126pst."58611"@synergy1.parc.xerox.com> References: <03Nov14.202126pst."58611"@synergy1.parc.xerox.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 15/11/2003, at 3:21 PM, Bill Janssen wrote: > Discussion seems to have died down (though I didn't notice any > consensus about auth toolkits...). I suggest we review the items on > the list. The next logical step would be to elaborate what each list > item means, in terms of both design and the work estimate. I know > that some folks have working code for some of these items. > > I'm going to be mainly offline next week, but I'll try to catch up > the following week. My list is posted at > > http://www.parc.com/janssen/web-sig/needed.html > > Please remind me of things I've missed or gotten wrong. Thanks! RFC3490 and RFC3492 support everywhere (Unicode domain names).
httplib already does apparently, but urllib doesn't. encodings.idna already exists and does all the real work. Unicode URIs everywhere ( http://www.w3.org/International/O-URL-code.html ). We should be able to do the equivalent of the following: urllib.urlretrieve(u'http://my.2\N{CENT SIGN}.net/\N{COPYRIGHT SIGN}') Digest Auth as a standalone library, as it is more widely adopted *outside* of the HTTP world. - -- Stuart Bishop http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/xblxAfqZj7rGN0oRAoecAJ9cTMdt7sWVSvd/POOyr054FVM9rACeOImv Gsi40HS66ymJ75XUXdf2LL4= =cyTd -----END PGP SIGNATURE----- From jjl at pobox.com Thu Nov 27 08:32:25 2003 From: jjl at pobox.com (John J Lee) Date: Thu Nov 27 08:32:33 2003 Subject: [Web-SIG] Next steps... In-Reply-To: References: <03Nov14.202126pst."58611"@synergy1.parc.xerox.com> Message-ID: On Thu, 27 Nov 2003, Stuart Bishop wrote: [...] > Unicode URIs everywhere ( > http://www.w3.org/International/O-URL-code.html ). > We should be able to do the equivalent of the following: > urllib.urlretrieve(u'http://my.2\N{CENT SIGN}.net/\N{COPYRIGHT SIGN}') [...] Do you know what standards apply to this? The page you reference talks about using UTF-8 for %-encoding, but RFC 2396 allows any encoding, I think (which makes it impossible to know what character string a %-encoded URI represents).
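For what it's worth, the two layers in Stuart's example are separable, and the stdlib already covers most of the work: the idna codec handles the hostname, and the path can be UTF-8 percent-encoded by hand. A rough sketch follows; encode_unicode_url() is a made-up helper, not a stdlib or proposed API, and it simply assumes the UTF-8 convention from the W3C page:

```python
# Sketch only: encode_unicode_url() is a hypothetical helper.  It
# assumes the convention from the W3C page above -- IDNA for the
# hostname, UTF-8 percent-encoding for the path.

def encode_unicode_url(scheme, host, path):
    # Hostname: the stdlib "idna" codec implements RFC 3490 ToASCII,
    # so non-ASCII labels come out as "xn--..." ACE labels.
    ascii_host = host.encode('idna').decode('ascii')
    # Path: UTF-8 bytes, percent-escaped outside a conservative safe set.
    safe = (b"abcdefghijklmnopqrstuvwxyz"
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            b"0123456789-._~/")
    quoted = ''.join(chr(b) if b in safe else '%%%02X' % b
                     for b in path.encode('utf-8'))
    return '%s://%s%s' % (scheme, ascii_host, quoted)

url = encode_unicode_url('http', u'my.2\N{CENT SIGN}.net',
                         u'/\N{COPYRIGHT SIGN}')
# The cent-sign label becomes an ASCII "xn--" label, while the
# copyright sign becomes %C2%A9 -- two different encodings in one URL.
```

Whether urlretrieve() should do this transparently, given that RFC 2396 doesn't pin down the %-encoding's character set, is exactly the open question.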
John From jjl at pobox.com Thu Nov 27 08:37:21 2003 From: jjl at pobox.com (John J Lee) Date: Thu Nov 27 08:37:29 2003 Subject: [Web-SIG] Python version of WWW::Mechanize In-Reply-To: <3FC4D804.70201@bath.ac.uk> References: <3FC4D804.70201@bath.ac.uk> Message-ID: On Wed, 26 Nov 2003, Simon Willison wrote: > Perl's WWW::Mechanize module is awesome: > > http://www.perl.com/lpt/a/2003/01/22/mechanize.html > > my $agent = WWW::Mechanize->new(); > $agent->get("http://www.radiotimes.beeb.com/"); > $agent->follow("My Diary"); > $agent->form(2); > $agent->field("email", $email); > $agent->click(); Something like this would certainly be useful. I've not done it because I think it would be nice to have it know about standard browser objects like frames, windows &c. OTOH, it would be quick & easy to write something like the above, and maybe provide a lot of the benefit -- especially the follow_link and forward / back methods, plus handling of the Referer header. What interface did you have in mind for forms? Something like this would be useful as a base class (which uses the code in ClientCookie -- ie. HTTPCookieProcessor &c): http://wwwsearch.sf.net/bits/ua.py BTW, there's another Perl module like this, WWW::Automate. > Would something like this be a worthwhile consideration for the Python > web modules in the standard library, or is it specialised to the point > that it works better as a separately maintained module? I think, like any new stuff added to the library, it should be written and in use before this question really arises. 
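To make the interface question concrete, here is a deliberately tiny sketch of the kind of class being discussed. The names (Browser, follow_link, back) mimic WWW::Mechanize, but this is hypothetical and not ClientCookie's or anyone's actual API; the opener is pluggable (any callable mapping a URL to page text) rather than the global urlopen, and link extraction is a crude regex where a real version would use a proper HTML parser:

```python
# Hypothetical WWW::Mechanize-style sketch -- illustrative names only.

import re

class Browser:
    def __init__(self, opener):
        self._opener = opener   # callable: url -> page text
        self._history = []      # visited URLs, for back()
        self.url = None
        self.page = None

    def get(self, url):
        if self.url is not None:
            self._history.append(self.url)
        self.url = url
        self.page = self._opener(url)
        return self.page

    def links(self):
        # (href, link text) pairs in document order; a crude regex
        # stands in for real HTML parsing.
        return re.findall(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
                          self.page, re.I | re.S)

    def follow_link(self, text, nr=0):
        # Follow the nr-th link whose text contains `text`
        # (the nr argument is the WWW::Mechanize convention).
        hrefs = [href for href, label in self.links() if text in label]
        return self.get(hrefs[nr])

    def back(self):
        # Re-fetches the previous page rather than caching it.
        self.url = self._history.pop()
        self.page = self._opener(self.url)
        return self.page
```

Because the opener is just a callable, a dict's __getitem__ serves as a stub for offline testing; wiring in an OpenerDirector subclass (cookies, Referer, forms, multipart encoding) is where the real work lies.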
John From cs1spw at bath.ac.uk Thu Nov 27 10:24:06 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Thu Nov 27 10:24:31 2003 Subject: [Web-SIG] Python version of WWW::Mechanize In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> Message-ID: <3FC61716.90909@bath.ac.uk> John J Lee wrote: >> Would something like this be a worthwhile consideration for the Python >> web modules in the standard library, or is it specialised to the point >> that it works better as a separately maintained module? > > I think, like any new stuff added to the library, it should be written and > in use before this question really arises. That makes a lot of sense, especially if we're going to see any progress on producing something tangible. From aquarius-lists at kryogenix.org Sun Nov 30 04:59:22 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Sun Nov 30 04:57:34 2003 Subject: [Web-SIG] Python version of WWW::Mechanize References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: >> I think, like any new stuff added to the library, it should be written and >> in use before this question really arises. > > That makes a lot of sense, especially if we're going to see any progress > on producing something tangible. I've put together a first cut of something that works like WWW::Mechanize at http://www.kryogenix.org/days/2003/11/30/pybrowser. Obviously it'll need a little more work on it, but it seems to work OK initially. Do let me know if it doesn't seem to work! sil -- If God didn't want us to eat meat, then he wouldn't have invented the hamburger, would he? Logic. -- Bevier, afe From deelan at interplanet.it Sun Nov 30 06:28:11 2003 From: deelan at interplanet.it (deelan) Date: Sun Nov 30 06:27:53 2003 Subject: [Web-SIG] Coding conventions Message-ID: <3FC9D44B.2090405@interplanet.it> hi there, since some code for the "web package" is starting to appear (WWW::Mechanize!) 
I was wondering if you guys have planned a set of coding conventions for naming classes, methods, getters/setters and so on. At least in the "web package" it would be cool to have some consistency between modules. Just for a start, I think that: http://www.kryogenix.org/code/pybrowser/browser.py gives some nice hints for this task: * Package/module names in lowercase. * Class names like Java, e.g. SomeCoolClass * Method and function names? Seems that the trend is to name them like some_cool_method() * Parameters and local vars, same as methods, plus one or two underscores where appropriate. * Explicit get and set methods, so when to use property(...)? Maybe an "implicit" get prefix to reduce typing, a la Webware? I mean, the title() method returns a string for the title property, and set_title() should set it. Thanks to all. From jjl at pobox.com Sun Nov 30 10:17:41 2003 From: jjl at pobox.com (John J Lee) Date: Sun Nov 30 10:17:57 2003 Subject: [Web-SIG] Python version of WWW::Mechanize In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: On Sun, 30 Nov 2003, Stuart Langridge wrote: [...] > I've put together a first cut of something that works like > WWW::Mechanize at http://www.kryogenix.org/days/2003/11/30/pybrowser. > Obviously it'll need a little more work on it, but it seems to work OK > initially. Do let me know if it doesn't seem to work! Good, some code! Some comments: Is this aimed at the standard library? xml.dom.ext.reader.HtmlLib? Unless I'm confused about it (quite likely actually, thanks to PyXML insisting on fiddling with the xml package instead of creating its own), that's not part of the standard library. Is PyXML going to be in by 2.4, perhaps? Even then, would 4DOM go in? The original maintainers have dropped it, it's slow, and it's not up-to-date with the DOM level 2 spec.
Personally, if I were going to depend on DOM outside the standard library, I'd want a forms interface that was higher level -- but I've already done that in DOMForm (though no browser class yet), and I guess it's a matter of taste whether you like a higher-level forms interface. What do other people think? Why isn't it a subclass of urllib2.OpenerDirector (or, better, of something like my (untested sketch of a) UserAgent in http://wwwsearch.sf.net/bits/ua.py)? Certainly the interface of OpenerDirector needs to be exposed by Browser (appropriately overridden). I see no reason why it shouldn't be a subclass, in fact: composition seems like needless complication. WWW::Mechanize is a subclass of LWP::UserAgent, and the author doesn't seem to have run into any problems. And why is the method analogous to OpenerDirector.open() named .get(), when the request might be a POST, or even use some completely different scheme (ftp:, file:...)? It uses urlopen, which means Browser state (e.g. cookies) is global. This problem goes away if you subclass from OpenerDirector. No multipart/form-data encoding? I think there has to be some way of (optionally) linking up any browser class to tidylib. Any tests? No .forward() / .backward() methods? I think it's useful to have a separate nr argument for follow_link so you can do (as in WWW::Mechanize): browser.follow_link("download", nr=3) John From cs1spw at bath.ac.uk Sun Nov 30 13:54:59 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Sun Nov 30 13:55:07 2003 Subject: [Web-SIG] Coding conventions In-Reply-To: <3FC9D44B.2090405@interplanet.it> References: <3FC9D44B.2090405@interplanet.it> Message-ID: <3FCA3D03.7080103@bath.ac.uk> deelan wrote: > at least in the "web package" it would be cool to have some > consistency between modules.
PEP 008 is an excellent Style Guide for Python code, which applies to everything in the standard library and should apply to code in the web package as well: http://www.python.org/peps/pep-0008.html From aquarius-lists at kryogenix.org Sun Nov 30 15:26:24 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Sun Nov 30 15:24:28 2003 Subject: [Web-SIG] Python version of WWW::Mechanize References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: John J Lee spoo'd forth: > On Sun, 30 Nov 2003, Stuart Langridge wrote: > [...] >> I've put together a first cut of something that works like >> WWW::Mechanize at http://www.kryogenix.org/days/2003/11/30/pybrowser. >> Obviously it'll need a little more work on it, but it seems to work OK >> initially. Do let me know if it doesn't seem to work! > > Good, some code! > > Some comments: > > Is this aimed at the standard library? xml.dom.ext.reader.HtmlLib? > Unless I'm confused about it (quite likely actually, thanks to PyXML > insisting on fiddling with the xml package instead of creating its own), > that's not part of the standard library. Is PyXML going to be by 2.4, > perhaps? Even then, would 4DOM go in? The original maintainers have > dropped it, it's slow, and it's not up-to-date with the DOM level 2 spec. > Personally, if I were going to depend on DOM outside the standard library, > I'd want a forms interface that was higher level -- but I've already done > that in DOMForm (though no browser class yet), and I guess it's a matter > of taste whether you like a higher-level forms interface. What do other > people think? Um. What I was looking for was something that could parse HTML (including invalid HTML) and give me a DOM tree. I tried Twisted's microdom, but settled on HtmlLib. 
Unfortunately, my selection criterion was the intersection of "what do I have installed on my machine" and "what comes up in a Google search for 'python html dom'" :-) I think that a DOM parser for HTML is pretty important, even if that parser *actually* just does "convert broken HTML to valid XHTML and then feed it to minidom" or something similar. Are there any others? > Why isn't it a subclass of urllib.OpenerDirector (or, better, from > something like my (untested sketch of a) UserAgent in > http://wwwsearch.sf.net/bits/ua.py)? Certainly the interface of > OpenerDirector needs to be exposed by Browser (appropriately overridden). > I see no reason why it shouldn't be a subclass, in fact: composition seems > like needless complication. WWW::Mechanize is a subclass of > LWP::UserAgent, and the author doesn't seem to have run into any problems. > And why is the method analogous to OpenerDirector.open() named .get(), > when the URL might be POST, or even some completely different scheme > (ftp:, file:...)? > > It uses urlopen, which means Browser state (eg. cookies) is global. This > problem goes away if you subclass from OpenerDirector. Because I didn't know about it. This is because "urllib.urlopen" is hardwired into my fingers, and then I just overrode it with ClientCookie when I needed cookie handling. I'm entirely happy to have it work totally differently; this was really a proof-of-concept to get the ball rolling rather than a submission for direct inclusion. > No multipart/form-data encoding? Oops. > I think there has to be some way of (optionally) linking up any browser > class to tidylib. I agree; tidylib is nice. AFAIK, though (and I probably am wrong) the only interface to Tidy is mxTidy, and I can never get it to install... > Any tests? Um, um, unit testing, I'm sure it says that on a post it note somewhere around here... > No .forward() / .backward() methods? Didn't think of them until after I sent the message out. 
They'd be pretty trivial to implement, though, although I don't know what you'd do about the "This page contains POSTDATA" issue that browsers get. > I think it's useful to have a separate nr argument for follow_link so you > can do (as in WWW::Mechanize): > > browser.follow_link("download", nr=3) Ha! Yes, that would be clever. I'd also like to be able to pass a compiled regex to follow_link() and form(), as well as a string. sil -- Don't panic (even if your terminals start printing "all your dialup accounts are belong to us" repeatedly) -- bambam From jjl at pobox.com Sun Nov 30 16:13:54 2003 From: jjl at pobox.com (John J Lee) Date: Sun Nov 30 16:14:08 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike [was: Re: Python version...] In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: On Sun, 30 Nov 2003, Stuart Langridge wrote: > John J Lee spoo'd forth: [...] > > Is this aimed at the standard library? xml.dom.ext.reader.HtmlLib? [...] > Um. What I was looking for was something that could parse HTML > (including invalid HTML) and give me a DOM tree. I tried Twisted's Fine, but what we're talking about here is what should go into Python's standard library. [...] > I think > that a DOM parser for HTML is pretty important, even if that parser > *actually* just does "convert broken HTML to valid XHTML and then feed > it to minidom" or something similar. Are there any others? There are lots of XML DOM implementations for Python (only one HTML DOM implementation, though: 4DOM -- and that's out of date), including the one that's already in the standard library. Parsing arbitrary HTML is hard, though (xml.dom.ext.reader.HtmlLib doesn't even manage to generate an HTML DOM from arbitrary *correct* HTML, and correct HTML is not often seen in the wild ;-). tidylib is the only sane way I know of. See below. > > Why isn't it a subclass of urllib.OpenerDirector (or, better, from [...] > Because I didn't know about it. 
This is because "urllib.urlopen" is > hardwired into my fingers, and then I just overrode it with > ClientCookie when I needed cookie handling. I'm entirely happy to have > it work totally differently; this was really a proof-of-concept to get > the ball rolling rather than a submission for direct inclusion. Sure (you don't mean proof-of-concept, but I know what you mean). I am rolling that ball a bit :-) [...] > > I think there has to be some way of (optionally) linking up any browser > > class to tidylib. > > I agree; tidylib is nice. AFAIK, though (and I probably am wrong) the > only interface to Tidy is mxTidy, and I can never get it to install... mxTidy is not an interface to tidylib. mxTidy hacks the old HTMLTidy source to make it into a shared library, and wraps it. tidylib is a new version, that basically does the same shared library-ization as Marc-Andre did. The difference is, it's actively maintained. There's a Python wrapper: http://utidylib.sf.net/ which depends on ctypes. Should tidylib be in the standard library? On one hand, I lean towards "no", because HTML is (in theory) on the way out. OTOH, if it's going to take another thirty years for HTML to completely go away, that may be a silly attitude to take! Opinions? If it were to be in the std. lib., I guess somebody would need to write a non-ctypes wrapper. [ctypes itself would obviously be great to have in the standard library, but that's up to Thomas Heller, and it's still under development. More importantly, it only works on Linux, Windows and MacOS X (and any other platforms that libffi is ported to).] [...] > > No .forward() / .backward() methods? > > Didn't think of them until after I sent the message out. They'd be > pretty trivial to implement, though, although I don't know what you'd > do about the "This page contains POSTDATA" issue that browsers get. [...] You're allowed to do whatever you like, really (RFC 2616 section 13.13). 
John From aquarius-lists at kryogenix.org Sun Nov 30 17:18:56 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Sun Nov 30 17:17:00 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike [was: Re: Python version...] References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: John J Lee spoo'd forth: > On Sun, 30 Nov 2003, Stuart Langridge wrote: >> > Is this aimed at the standard library? xml.dom.ext.reader.HtmlLib? >> Um. What I was looking for was something that could parse HTML >> (including invalid HTML) and give me a DOM tree. I tried Twisted's > > Fine, but what we're talking about here is what should go into Python's > standard library. True enough. I fear, though, that without *something* that can cope with invalid HTML, a WWW::Mechanize-style thing is going to be pretty darn hard... > [...] >> I think >> that a DOM parser for HTML is pretty important, even if that parser >> *actually* just does "convert broken HTML to valid XHTML and then feed >> it to minidom" or something similar. Are there any others? > > There are lots of XML DOM implementations for Python (only one HTML DOM > implementation, though: 4DOM -- and that's out of date), including the one > that's already in the standard library. Parsing arbitrary HTML is hard, > though (xml.dom.ext.reader.HtmlLib doesn't even manage to generate an HTML > DOM from arbitrary *correct* HTML, and correct HTML is not often seen in > the wild ;-). tidylib is the only sane way I know of. See below. *nod* Your notes on tidylib are useful -- I didn't know about it. That said, though, without it in the stdlib, it's no better than HtmlLib (well, it's maintained, true, but it's still not available to the stdlib). >> > Why isn't it a subclass of urllib.OpenerDirector (or, better, from > [...] >> Because I didn't know about it. 
This is because "urllib.urlopen" is >> hardwired into my fingers, and then I just overrode it with >> ClientCookie when I needed cookie handling. I'm entirely happy to have >> it work totally differently; this was really a proof-of-concept to get >> the ball rolling rather than a submission for direct inclusion. > > Sure (you don't mean proof-of-concept, but I know what you mean). Very true, yes, and thanks :) > Should tidylib be in the standard library? On one hand, I lean towards > "no", because HTML is (in theory) on the way out. OTOH, if it's going to > take another thirty years for HTML to completely go away, that may be a > silly attitude to take! Opinions? If it were to be in the std. lib., I > guess somebody would need to write a non-ctypes wrapper. I really think that HTML is not going away any time soon. Moreover, there are still issues with XHTML (like which content-type to serve it as). It's certainly reasonable to make tools only *produce* newer variants, but you have to be able to consume all kinds of invalid rubbish or you'll never be able to look at the web at all :) > [...] >> > No .forward() / .backward() methods? >> >> Didn't think of them until after I sent the message out. They'd be >> pretty trivial to implement, though, although I don't know what you'd >> do about the "This page contains POSTDATA" issue that browsers get. > [...] > > You're allowed to do whatever you like, really (RFC 2616 section 13.13). Re-posting and not doing so are both iffy, though, hence the choice. Admittedly, you could have backward() and forward() take a repostData parameter, but you'd have to know beforehand whether you'd want to do it, since use isn't interactive. Hm. sil -- Medio tutissimus ibis. (You will travel safest in a middle course) -- family motto