From cs1spw at bath.ac.uk Fri Oct 17 11:56:21 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 11:54:24 2003
Subject: [Web-SIG] Useful ideas from PHP
Message-ID: <3F901125.2010300@bath.ac.uk>

I've been working with PHP for several years, but have recently started to make the switch to Python for web development. There follow some thoughts on PHP's web development capabilities compared to Python's. PHP has a number of tricks that are worth borrowing for the Python standard library - although in my opinion the ability to embed code in HTML is not one of them.

Things PHP does better than Python
==================================

* $_GET, $_POST, $_COOKIE, $_FILES, $_REQUEST, $_SERVER, $_ENV
  http://www.php.net/manual/en/language.variables.predefined.php

These global dictionaries provide immediate access to information sent from the client. The first three provide information from the query string, posted forms and cookies respectively. $_FILES handles uploaded files, $_REQUEST allows access to data regardless of where it came from (like Python's cgi.FieldStorage class does at the moment), and $_SERVER and $_ENV are server and environment variables. This is an improvement on Python because these arrays are consistent. Everything is available in a straightforward dictionary (no fields['name'].value oddness), there's no need to explicitly parse cookies from their environment variable, and it's possible to tell the difference between POST and GET data while retaining the convenience of just being able to get the data without caring about the method used to send it.

* header(), setcookie()
  http://www.php.net/manual/en/features.cookies.php

These functions allow a user to manipulate the headers being sent back to the user and provide an easy method for setting cookies. In Python CGIs you have to manually ensure you send the headers before any HTML by being careful with your print statements. Some kind of abstraction for headers is a good idea.
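The header/cookie abstraction described above might look something like the following minimal sketch (modern Python; the class and method names are purely illustrative, not an existing module):

```python
# Minimal sketch of a PHP header()/setcookie()-style abstraction: headers
# and cookies are buffered on an object and emitted in one place, instead
# of relying on carefully ordered print statements. Names are hypothetical.
class Response:
    def __init__(self):
        self.headers = {"Content-Type": "text/html"}
        self._cookies = []

    def set_header(self, name, value):
        self.headers[name] = value

    def set_cookie(self, name, value, path="/"):
        self._cookies.append("%s=%s; Path=%s" % (name, value, path))

    def render_headers(self):
        # Emit all headers, then Set-Cookie lines, then the blank line
        # that separates headers from the body.
        lines = ["%s: %s" % (k, v) for k, v in sorted(self.headers.items())]
        lines += ["Set-Cookie: %s" % c for c in self._cookies]
        return "\r\n".join(lines) + "\r\n\r\n"

resp = Response()
resp.set_header("X-Test", "1")
resp.set_cookie("session", "abc123")
print(resp.render_headers())
```

The point is simply that nothing is written to the client until the whole header block is assembled, so "headers already sent" ordering bugs cannot happen.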
* Native session support with session_register and $_SESSION
  http://www.php.net/manual/en/ref.session.php

This is a pretty useful feature in PHP, which could be easily replicated in Python. It would probably be better as a separate session module rather than adding it straight into the CGI module.

Things Python does better than PHP
==================================

Pretty much everything else. Python's syntax and semantics are cleaner, the language is more powerful and expressive, and the standard library for the most part is outstanding. Python's database access is cleaner as well. If Python only had a cleaner CGI API and a more widely available Apache module it could make serious inroads into PHP's market share.

Things PHP has that Python doesn't need
=======================================

A big fuss is always made of PHP's ability to embed code straight into HTML, but in practice most experienced PHP developers tend to avoid this feature and use some kind of templating system instead, preferring to separate their application logic and presentation logic. Python is already well served by a number of excellent template libraries such as Cheetah.

Cheers,

Simon Willison
http://simon.incutio.com/

From ianb at colorstudy.com Fri Oct 17 14:48:13 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Oct 17 14:48:18 2003
Subject: [Web-SIG] Useful ideas from PHP
In-Reply-To: <3F901125.2010300@bath.ac.uk>
Message-ID: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>

On Friday, October 17, 2003, at 10:56 AM, Simon Willison wrote:

> I've been working with PHP for several years, but have recently
> started to make the switch to Python for web development. There follow
> some thoughts on PHP's web development capabilities compared to
> Python's. PHP has a number of tricks that are worth borrowing for the
> Python standard library - although in my opinion the ability to embed
> code in HTML is not one of them.
>
> Things PHP does better than Python
> ==================================
>
> * $_GET, $_POST, $_COOKIE, $_FILES, $_REQUEST, $_SERVER, $_ENV
> http://www.php.net/manual/en/language.variables.predefined.php

This is really PHP vs. the Python cgi module. Other Python web frameworks do most of these things (some don't differentiate between GET and POST variables, most use a different way of indicating files, and there are some other features that are sometimes included in frameworks and sometimes not, like access to the raw POST data or streaming output).

So really this is a matter of getting Python's stdlib to include some of the functionality that has been widely implemented elsewhere. Or at least, that's one possible goal.

> * header(), setcookie()
> http://www.php.net/manual/en/features.cookies.php

AFAIK, all frameworks (besides cgi) handle this.

> These functions allow a user to manipulate the headers being sent back
> to the user and provide an easy method for setting cookies. In Python
> CGIs you have to manually ensure you send the headers before any HTML
> by being careful with your print statements. Some kind of abstraction
> for headers is a good idea.
>
> * Native session support with session_register and $_SESSION
> http://www.php.net/manual/en/ref.session.php

And most handle this as well.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Fri Oct 17 14:59:51 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 14:58:00 2003
Subject: [Web-SIG] Useful ideas from PHP
In-Reply-To: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>
References: <76816A14-00D2-11D8-AD6F-000393C2D67E@colorstudy.com>
Message-ID: <3F903C27.1090407@bath.ac.uk>

Ian Bicking wrote:

> This is really PHP vs. the Python cgi module.
> Other Python web frameworks do most of these things (some don't
> differentiate between GET and POST variables, most use a different way
> of indicating files, and there are some other features that are
> sometimes included in frameworks and sometimes not, like access to the
> raw POST data or streaming output).
>
> So really this is a matter of getting Python's stdlib to include some of
> the functionality that has been widely implemented elsewhere. Or at
> least, that's one possible goal.

This ties in with the mail I sent to Meta-SIG a few days ago. I would like to see the CGI module (or its replacement in the standard library) define a solid interface for common web tasks and then lead by example, encouraging other web frameworks to implement that same interface (or provide a wrapper to it). This would make it far easier to move from one framework to another, which in turn would make the process of choosing a framework far less intimidating (if the chosen framework doesn't work out, moving to another becomes an easier option).

I know of at least one precedent for this already: mod_python provides an interface to form variables that is modelled on cgi.FieldStorage(). Unfortunately, in my opinion FieldStorage isn't quite as capable as it needs to be (see my email comparing it to PHP).

Since PHP is almost certainly Python's biggest competitor in the web development arena, it makes sense to look hard at the things PHP does well that Python's standard library (and associated software) doesn't.

Best regards,

Simon Willison

From cs1spw at bath.ac.uk Fri Oct 17 15:02:21 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Fri Oct 17 15:00:57 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
Message-ID: <3F903CBD.6030508@bath.ac.uk>

The following is part of an email I sent to Meta-SIG discussing possible targets for the Web-SIG.
An acknowledged problem with Python for web programming at the moment is the sheer abundance of web development frameworks currently available - newcomers to Python web programming literally have their work cut out just evaluating the options available to them.

mod_python (the framework with which I have had the most experience) provides an emulation of the cgi module's FieldStorage interface as part of the mod_python package. Other frameworks may do this as well. I think this provides an interesting example of how the multiple framework problem could be partially resolved. If the Python standard library included a well defined interface for common web programming tasks (such as accessing data from forms and cookies, and sending cookies and HTTP headers), existing web frameworks could be encouraged to either support this interface natively or provide some kind of wrapper from that interface to the internals of their framework. This would make selecting a web framework a far less daunting process, as code written for one framework would be much easier to port to another.

An interesting example of this kind of process (albeit on a much larger scale) is Java's Servlet API specification. This defines the interfaces a Java servlet container must implement, but leaves the implementation details up to the team implementing the spec. This means commercial and open source vendors can create competing servlet engines, and developers have great flexibility in selecting a servlet container and switching to a different one should they run into problems.

I'd like to see the Web SIG define a strong standard interface for common web tasks, which could then be supported by Python web framework authors.
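A minimal version of such a standard interface might be sketched as follows (all names here are hypothetical illustrations, not a proposed spec; modern Python):

```python
# Hypothetical sketch of a framework-neutral interface for common web
# tasks. A framework would subclass or wrap this; names are illustrative.
class BaseRequest:
    """Access to form fields, cookies and request metadata."""

    def get_field(self, name, default=None):
        raise NotImplementedError

    def get_cookie(self, name, default=None):
        raise NotImplementedError

    @property
    def method(self):
        raise NotImplementedError


class DictRequest(BaseRequest):
    """Trivial implementation backed by plain dictionaries (e.g. for tests)."""

    def __init__(self, fields=None, cookies=None, method="GET"):
        self._fields = dict(fields or {})
        self._cookies = dict(cookies or {})
        self._method = method

    def get_field(self, name, default=None):
        return self._fields.get(name, default)

    def get_cookie(self, name, default=None):
        return self._cookies.get(name, default)

    @property
    def method(self):
        return self._method


req = DictRequest({"q": "python"}, {"session": "abc"}, method="POST")
```

The appeal of the servlet-style approach is that a framework only needs a thin adapter mapping its own request object onto these few methods, which is what makes porting code between frameworks cheap.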
Best regards,

Simon Willison
http://simon.incutio.com/

From anthony at interlink.com.au Sun Oct 19 03:03:50 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Oct 19 03:06:27 2003
Subject: [Web-SIG] HTTP digest support
Message-ID: <200310190703.h9J73p1A008870@localhost.localdomain>

I'm currently working on fixing HTTP DIGEST auth support in the stdlib. The current support in urllib2 is utterly broken. There's a patch on SF which fixes it for the simple case (www.python.org/sf/823328).

I'm also working on the server side of it - see the python CVS, nondist/sandbox/digestauth. Right now I have a simple server framework that handles straight MD5 digest auth - I have a chunk of MD5-sess done, and should get the rest finished in the next week or so.

Stuff still to be added:
 - server side checking of client nonce
 - storing away nonces and nonce-counts to prevent replay attacks
 - client side checking of Authentication-info headers
 - integrating the DIGEST and BASIC auth into a single chunk of code
 - other stuff I've forgotten right now

I'd _like_ for the basic HTTP handling stuff in the stdlib to have full digest auth support "out of the box" for Python 2.4.

Anthony

From gstein at lyra.org Wed Oct 22 19:52:17 2003
From: gstein at lyra.org (Greg Stein)
Date: Wed Oct 22 22:38:02 2003
Subject: [Web-SIG] client-side support: PEP 268
Message-ID: <20031022165217.I11797@lyra.org>

I just wanted to send a reminder that I had started a PEP a while back to pull together a bunch of disparate HTTP client-side activities under a coherent model. Part of the problem was how to build an HTTP Connection object which optionally had SSL support, or additional DAV facilities, and/or proxy support. A bit orthogonal to that was how to provide an extension system to enable arbitrary sets of authentication systems. The default set would be Basic and Digest, but something like client certificates would also be "in scope" given the SSL capabilities of the module.
http://www.python.org/peps/pep-0268.html

I'd suggest taking a look at that PEP and using it as the end-point for the discussion of client-side changes. IOW, I'm not holding any particular "mine mine mine" on it, so it seems like a valid way to produce a final proposal. I also happen to think that it was heading in the Right Direction(tm) :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From janssen at parc.com Fri Oct 17 16:51:50 2003
From: janssen at parc.com (Bill Janssen)
Date: Wed Oct 22 22:38:16 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: Your message of "Fri, 17 Oct 2003 12:02:21 PDT." <3F903CBD.6030508@bath.ac.uk>
Message-ID: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>

Simon,

It seems to me that there are three basic modules which should be in the stdlib for server-side Python programming:

1) A good CGI module. This should allow clear access to the various values passed in the environment, as Simon points out. I think the current "cgi" module isn't bad at this, but I'm sure we can find shortcomings.

2) A standard Apache plug-in. Does mod_python fill this role? (Should this really be part of the stdlib?) It would be useful if the APIs used here were similar to those used in the API support.

3) A standard stand-alone solution, but better than the three standard servers already in the stdlib. I've been using Medusa lately, and rather like its approach to things.

There are other pan-server things that need to be done as well, such as server-side SSL support in the socket module.

Bill

From janssen at parc.com Wed Oct 22 22:47:43 2003
From: janssen at parc.com (Bill Janssen)
Date: Wed Oct 22 22:48:13 2003
Subject: [Web-SIG] client-side support: PEP 268
In-Reply-To: Your message of "Wed, 22 Oct 2003 16:52:17 PDT." <20031022165217.I11797@lyra.org>
Message-ID: <03Oct22.194750pdt."58611"@synergy1.parc.xerox.com>

Great! Thanks, Greg.
Bill

From ianb at colorstudy.com Wed Oct 22 22:58:03 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Oct 22 22:58:07 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID:

On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:

> 1) A good CGI module. This should allow clear access to the various
> values passed in the environment, as Simon points out. I think the
> current "cgi" module isn't bad at this, but I'm sure we can find
> shortcomings.

There are a bunch of shortcomings -- some of which aren't that big a deal in the CGI environment (like adding headers) but make cgi-based programs difficult to port to other systems.

> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
> this really be part of the stdlib?) It would be useful if the APIs
> used here were similar to those used in the API support.

mod_python pretty much fits this. I don't see any reason to develop anything else (at least in terms of Apache integration). I don't think it would make sense as part of the stdlib -- it depends on Apache just as much as Python, and people install Apache in all sorts of different ways.

> 3) A standard stand-alone solution, but better than the three standard
> servers already in the stdlib. I've been using Medusa lately, and rather
> like its approach to things.

Twisted makes as much sense as anything. My impression is that Medusa is similar, but Twisted is more actively developed. OTOH, Twisted is moving out into other things -- some well defined portion of Twisted could be included, but certainly not everything that is distributed with Twisted currently. There are also some Twistedisms, like Deferred, which are generic but not currently used by much of anyone outside Twisted.

Medusa is nice because it has a limited scope. But that's good and bad.
Twisted would work great if the Twisted people wanted to make a small defined core, and it wouldn't work well otherwise.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com Wed Oct 22 23:12:42 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Oct 22 23:12:46 2003
Subject: [Web-SIG] Request/Response features
Message-ID:

I'm very interested in getting some sort of sane request/response object into the Python standard library, to form the basis of an informal standard on how those objects should look (even if wrappers or adaptation are required for most frameworks). Technically I suppose cgi.FieldStorage is a request object, but it's not a very good one, and it's very incomplete outside of CGI (e.g., output goes to sys.stdout, headers come from os.environ), and unusable in a threaded environment.

A useful starting point might be to summarize the features that various request/response implementations already have. Thoughts? Wild enthusiasm from anyone to take on the project? If no one else is interested I could probably take this on.

(PS: should this SIG be announced to python-announce?)

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Wed Oct 22 23:42:51 2003
From: cs1spw at bath.ac.uk (Simon Willison)
Date: Wed Oct 22 23:42:55 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
References:
Message-ID: <3F974E3B.9010109@bath.ac.uk>

Ian Bicking wrote:

> I'm very interested in getting some sort of sane request/response
> object into the Python standard library, to form the basis of an
> informal standard on how those objects should look (even if wrappers or
> adaptation are required for most frameworks). Technically I suppose
> cgi.FieldStorage is a request object, but it's not a very good one, and
> it's very incomplete outside of CGI (e.g., output goes to sys.stdout,
> headers come from os.environ), and unusable in a threaded
> environment.
I think that's an absolutely fantastic idea. Request/response objects are the one thing that almost all Python web frameworks deal with in some way, and a standardised interface for them in the standard library could do a lot for improving cross-framework compatibility.

> A useful starting point might be to summarize the features that various
> request/response implementations already have. Thoughts? Wild
> enthusiasm from anyone to take on the project? If no one else is
> interested I could probably take this on.

I have plenty of enthusiasm, but it's coupled with youthful ignorance (I've been using Python for web development for just over a month). That said, I'm happy to contribute serious time and effort to this.

In a previous post I outlined the things that I liked about PHP's web interface features, which, while not exactly modelled on a request/response object, do cover the same ground. I think the most valuable thing PHP's treatment of this brings to the table is the concept of GET, POST and COOKIE dictionaries for looking up data sent by the client (also the REQUEST dictionary, which combines the three).

One other thing I've been thinking about recently is that HTTP requests and HTTP responses both consist of a set of headers and a body, in a very similar way to MIME email messages, which are already well catered for by the standard library. I think any standard for request/response objects should aim to closely match the way MIME-style messages are handled by other parts of the standard library (in particular the email module).
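The parallel with MIME messages can be demonstrated with the standard library as it exists today: HTTP headers use the same RFC 822 syntax as mail headers, so the email parser can read an HTTP-style message directly (shown purely as an illustration of the shared structure, not as a proposed API):

```python
# The stdlib email parser handles an HTTP-style header block directly,
# since HTTP headers share RFC 822 syntax with mail messages.
from email.parser import Parser

raw = (
    "Content-Type: text/html; charset=utf-8\r\n"
    "Set-Cookie: session=abc123\r\n"
    "\r\n"
    "<html>body here</html>"
)
msg = Parser().parsestr(raw)
print(msg["Content-Type"])  # dict-style header access
print(msg.get_payload())    # the body, as in an HTTP response
```

The real differences - the request/status line, and bodies that arrive in pieces rather than all at once - are exactly where a request/response object would have to diverge from the email module.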
Cheers,

Simon Willison
http://simon.incutio.com/

From ianb at colorstudy.com Thu Oct 23 00:03:40 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Oct 23 00:03:44 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
Message-ID:

On Wednesday, October 22, 2003, at 10:12 PM, Ian Bicking wrote:

> I'm very interested in getting some sort of sane request/response
> object into the Python standard library, to form the basis of an
> informal standard on how those objects should look (even if wrappers
> or adaptation are required for most frameworks). Technically I
> suppose cgi.FieldStorage is a request object, but it's not a very good
> one, and it's very incomplete outside of CGI (e.g., output goes to
> sys.stdout, headers come from os.environ), and unusable in a threaded
> environment.

I should append to this: on the pyweb list (archives at http://www.amk.ca/pipermail/pyweb/ ) I proposed a request/response interface, as a discussion starter if nothing else. Well, constructive discussion didn't actually ensue, but the spec still exists. In retrospect I shouldn't have tried to include anything outside of the request and response, as it's distracting and more controversial. The interface I wrote is at:

http://colorstudy.com/~ianb/IHTTP_01.py

If rewriting it, I'd probably just put the response in the request instead of having a transaction object -- the interface would consist purely of HTTPRequest and HTTPResponse. I might remove some other methods as well, to make adapting/wrapping from other frameworks easier. Anyway, it could serve as a starting point for other interface specifications.
--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From davidf at sjsoft.com Thu Oct 23 02:19:56 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 02:20:36 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To:
References:
Message-ID: <3F97730C.70107@sjsoft.com>

Ian Bicking wrote:

> On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:
>
>> 1) A good CGI module. This should allow clear access to the various
>> values passed in the environment, as Simon points out. I think the
>> current "cgi" module isn't bad at this, but I'm sure we can find
>> shortcomings.
>
> There are a bunch of shortcomings -- some of which aren't that big a
> deal in the CGI environment (like adding headers) but make cgi-based
> programs difficult to port to other systems.
>
>> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
>> this really be part of the stdlib?) It would be useful if the APIs
>> used here were similar to those used in the API support.
>
> mod_python pretty much fits this. I don't see any reason to develop
> anything else (at least in terms of Apache integration). I don't
> think it would make sense as part of the stdlib -- it depends on
> Apache just as much as Python, and people install Apache in all sorts
> of different ways.

Yes, in Apache, mod_python is pretty much it. As far as the API goes, I think mod_python is an important one to look at at the design stage, rather than trying to fit an API to it later, since Apache is fairly standard and mod_python is used by lots of different people. You don't want mod_python to have to be rewritten to comply with the API later.

>> 3) A standard stand-alone solution, but better than the three standard
>> servers already in the stdlib. I've been using Medusa lately, and rather
>> like its approach to things.
>
> Twisted makes as much sense as anything. My impression is that Medusa
> is similar, but Twisted is more actively developed.
> OTOH, Twisted is moving out into other things -- some well defined
> portion of Twisted could be included, but certainly not everything
> that is distributed with Twisted currently. There are also some
> Twistedisms, like Deferred, which are generic but not currently used
> by much of anyone outside Twisted.
>
> Medusa is nice because it has a limited scope. But that's good and
> bad. Twisted would work great if the Twisted people wanted to make a
> small defined core, and it wouldn't work well otherwise.

I haven't used Medusa, but I have used Twisted and the standard Python libraries. Some notes:

1) Twisted is definitely too complex to include. The question is, would it be possible to rip out a simple web server from Twisted, or would it require a whole lot of extras that don't fit in the standard libraries? This may amount to a re-write.

2) Actually, having something really simple with limited functionality is great, particularly if it uses a standard API that more complex servers support. This would allow people to develop/test/install with just the basic Python libraries. I actually think the standard servers would be fine if they were cleaned up and extended a bit.

3) It's important to define what basic functionality will be required by the API, and what extra functionality will be defined by it. I would suggest the following:
 - url handling
 - get/post argument support, in standard dictionaries
 - cookie support, in standard dictionaries
 - flexible request/response support

David

From aquarius-lists at kryogenix.org Thu Oct 23 04:45:52 2003
From: aquarius-lists at kryogenix.org (Stuart Langridge)
Date: Thu Oct 23 04:45:04 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
References: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID:

Bill Janssen spoo'd forth:

> Simon,
>
> It seems to me that there are three basic modules which should be in
> stdlib for server-side Python programming:
>
> 1) A good CGI module.
> This should allow clear access to the various
> values passed in the environment, as Simon points out. I think the
> current "cgi" module isn't bad at this, but I'm sure we can find
> shortcomings.

Not too many, though, I wouldn't say. I think that the cgi module shouldn't be used much by people; it's a building block, some infrastructure. Like, say, SocketServer -- you can use it if you want low-level access, but most people use something constructed upon it.

> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
> this really be part of the stdlib?) It would be useful if the APIs
> used here were similar to those used in the API support.

Like you say, mod_python is pretty much the only option, but I wouldn't have thought that it should be co-opted into the stdlib; how would it be set up? I can imagine modules that *use* mod_python if you have it (or does the stdlib have to be closed?) but not mod_python itself.

> 3) A standard stand-alone solution, but better than the three standard
> servers already in the stdlib. I've been using Medusa lately, and rather
> like its approach to things.

This is a bit of a holy war sort of question, though, isn't it? Some people will like Medusa, some will like Twisted...

sil
--
Writing software is, in fact, like dancing to frozen music. -- mewse

From davidf at sjsoft.com Thu Oct 23 04:53:17 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 04:53:27 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To:
References: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID: <3F9796FD.6050003@sjsoft.com>

Stuart Langridge wrote:

> Bill Janssen spoo'd forth:
>
>> 3) A standard stand-alone solution, but better than the three standard
>> servers already in the stdlib. I've been using Medusa lately, and rather
>> like its approach to things.
>
> This is a bit of a holy war sort of question, though, isn't it? Some
> people will like Medusa, some will like Twisted...
Maybe for this reason we should stick to the existing HTTP server in the stdlib, but fix it up, improve it, and change it to match the new API.

David

From davidf at sjsoft.com Thu Oct 23 06:05:52 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 06:06:02 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To:
References:
Message-ID: <3F97A800.70106@sjsoft.com>

Ian Bicking wrote:

> On Wednesday, October 22, 2003, at 10:12 PM, Ian Bicking wrote:
>
>> I'm very interested in getting some sort of sane request/response
>> object into the Python standard library, to form the basis of an
>> informal standard on how those objects should look (even if wrappers
>> or adaptation are required for most frameworks). Technically I
>> suppose cgi.FieldStorage is a request object, but it's not a very
>> good one, and it's very incomplete outside of CGI (e.g., output goes
>> to sys.stdout, headers come from os.environ), and unusable in a
>> threaded environment.
>
> I should append to this: on the pyweb list (archives at
> http://www.amk.ca/pipermail/pyweb/ ) I proposed a request/response
> interface, as a discussion starter if nothing else. Well,
> constructive discussion didn't actually ensue, but the spec still
> exists. In retrospect I shouldn't have tried to include anything
> outside of the request and response, as it's distracting and more
> controversial. The interface I wrote is at:
>
> http://colorstudy.com/~ianb/IHTTP_01.py
>
> If rewriting it, I'd probably just put the response in the request
> instead of having a transaction object -- the interface would consist
> purely of HTTPRequest and HTTPResponse. I might remove some other
> methods as well, to make adapting/wrapping from other frameworks
> easier. Anyway, it could serve as a starting point for other
> interface specifications.

Hi

Had a look at this, it's nice for a start. However I agree with you that the transaction interface is confusing...
for example, what does "setTransaction" mean/do?

Some other comments:

pathInfo/requestURI
It would be good to have some consistency between these names.

getFields
I don't think the ordering is generally important to people, so why not ignore it? If people want to preserve it, they can always write some code to do that, but it's hardly needed as default functionality.

getFieldDict
It would be great if the user could set the behaviour they want for multiple keys. I know I *always* want to discard any extra values. Including an option to do this rather than return a list would prevent lots of people doing post-processing.

A general comment here: there are quite a few different methods to handle getting/setting get/post fields. Perhaps this would be made simpler by using a standard dictionary interface. That would also clear up confusion about what parameters to pass to setFieldDict etc. Another question is whether people really need get and post arguments to be processed differently.

Also, is it necessary for all attributes to be accessed by methods? Particularly (no pun intended) things like "method" and "time" would seem to make more sense as attributes. If anyone really needs to run some code to access them, they could always use properties.

The input method seems strange. Perhaps this should be called read? In general, there needs to be a clear separation between low-level accessing of the request stream, and higher-level accessing of processed get/post fields. Perhaps a way to do this would be to analyse how the most popular existing servers do things, then define a set of low-level methods which would cover their functionality. If this was done well, the higher-level methods could be written so that they always fall back to using the underlying low-level methods if they aren't overridden, so at least people only have to implement basic functionality to match the API.

Anyway, those are just a few thoughts.
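The dictionary-style field access suggested above might look like this sketch (modern Python; the Fields class and its behaviour are hypothetical, only the stdlib query-string parser is real):

```python
# Hypothetical sketch: single values by default (extras discarded), with
# an explicit method for callers who do want every value of a repeated key.
from urllib.parse import parse_qs

class Fields:
    def __init__(self, query_string):
        # parse_qs maps each key to a list of values
        self._data = parse_qs(query_string, keep_blank_values=True)

    def __getitem__(self, key):
        return self._data[key][0]  # first value only; extras discarded

    def get(self, key, default=None):
        return self._data.get(key, [default])[0]

    def get_all(self, key):
        return list(self._data.get(key, []))

f = Fields("name=Simon&tag=web&tag=python")
print(f["name"])         # 'Simon'
print(f.get_all("tag"))  # ['web', 'python']
```

This keeps the common case (one value per key) free of post-processing, while repeated keys remain available on request.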
David

From davidf at sjsoft.com Thu Oct 23 06:08:36 2003
From: davidf at sjsoft.com (David Fraser)
Date: Thu Oct 23 06:08:42 2003
Subject: [Web-SIG] Request/Response features
In-Reply-To: <3F974E3B.9010109@bath.ac.uk>
References: <3F974E3B.9010109@bath.ac.uk>
Message-ID: <3F97A8A4.9020703@sjsoft.com>

Simon Willison wrote:

> Ian Bicking wrote:
>
>> I'm very interested in getting some sort of sane request/response
>> object into the Python standard library, to form the basis of an
>> informal standard on how those objects should look (even if wrappers
>> or adaptation are required for most frameworks). Technically I
>> suppose cgi.FieldStorage is a request object, but it's not a very
>> good one, and it's very incomplete outside of CGI (e.g., output goes
>> to sys.stdout, headers come from os.environ), and unusable in a
>> threaded environment.
>
> I think that's an absolutely fantastic idea. Request/response
> objects are the one thing that almost all Python web frameworks deal
> with in some way, and a standardised interface for them in the
> standard library could do a lot for improving cross-framework
> compatibility.

Absolutely. I am busy constructing a toolkit (jtoolkit.sourceforge.net) for web applications, and have been looking at making it compatible with more than one web framework (so far mod_python, and I have started work on a standalone HTTP server). Having a standard interface to requests/responses would make life a lot easier. In particular, it would help to have a standard HTTP server included with Python that supports these interfaces, so applications can be tested without any other software.

>> A useful starting point might be to summarize the features that
>> various request/response implementations already have. Thoughts?
>> Wild enthusiasm from anyone to take on the project? If no one else
>> is interested I could probably take this on.
> I have plenty of enthusiasm, but it's coupled with youthful ignorance
> (I've been using Python for web development for just over a month).
> That said, I'm happy to contribute serious time and effort to this.
>
> In a previous post I outlined the things that I liked about PHP's web
> interface features, which, while not exactly modelled on a
> request/response object, do cover the same ground. I think the most
> valuable thing PHP's treatment of this brings to the table is the
> concept of GET, POST and COOKIE dictionaries for looking up data sent
> by the client (also the REQUEST dictionary, which combines the three).

I think it's important that while we can take things from PHP, they need to be rethought to apply to a Python context... but the dictionaries sound great.

> One other thing I've been thinking about recently is that HTTP
> requests and HTTP responses both consist of a set of headers and a
> body, in a very similar way to MIME email messages, which are already
> well catered for by the standard library. I think any standard for
> request/response objects should aim to closely match the way MIME-style
> messages are handled by other parts of the standard library (in
> particular the email module).

Good point. The main difference is that in parsing email messages you often have the whole message available, whereas in HTTP you need to be able to handle parts at a time (for example, when uploading files).

David

From amk at amk.ca Thu Oct 23 07:13:31 2003
From: amk at amk.ca (amk@amk.ca)
Date: Thu Oct 23 07:13:36 2003
Subject: [Web-SIG] Defining a standard interface for common web tasks
In-Reply-To: <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com>
Message-ID: <20031023111331.GA7516@rogue.amk.ca>

On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote:

> 1) A good CGI module.
This should allow clear access to the various > values passed in the environment, as Simon points out. I think the > current "cgi" module isn't bad at this, but I'm sure we can find > shortcomings. * Too much cruft. We could either deprecate stuff in cgi.py with a vengeance, or think up some new package organization. > 2) A standard Apache plug-in. Does mod_python fill this role? (Should > this really be part of the stdlib?) Too much work for the stdlib. Apache support suffers from the split between Apache versions 1.3 and 2.0; the API changed a *lot* between the two versions, but both versions are still pretty common. Leave it to mod_python. > 3) A standard stand-alone solution, but better than the three standard > servers already in the stdlib. I've been using Medusa lately, and rather > like its approach to things. The problem is that the code in the Medusa package is written really unconventionally -- classes have lowercase names, it's still 1.5 (and often 1.4!) compatible -- and there's a lot of cruft here, too; it's often not clear which modules are intended for actual use and which ones are half-baked experiments. This could be cleaned up if it's deemed worth the effort; I initially didn't want to embark on a big class renaming because I thought Twisted would quickly and completely replace Medusa, but that seems unlikely to happen. --amk From aquarius-lists at kryogenix.org Thu Oct 23 07:51:18 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 23 07:51:08 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> Message-ID: amk@amk.ca spoo'd forth: > On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote: >> 1) A good CGI module. This should allow clear access to the various >> values passed in the environment, as Simon points out.
I think the >> current "cgi" module isn't bad at this, but I'm sure we can find >> shortcomings. > > * Too much cruft. We could either deprecate stuff in cgi.py with a > vengeance, or think up some new package organization. It would be useful to know if anyone is still *using* any of the old backwards-compatible stuff. The cgi.py API hasn't changed very much since 1.5, has it? (I might be hideously wrong here.) If not, then deprecating all the previous ways it used to work would probably be a good idea; there can't be that many people still using code that old? (I admit that this sort of assertion has a habit of coming back and biting you, mind.) sil -- "Last week, I arrived in Sunnydale. Or perhaps it was the week before, I don't know." -- Buffy, as written by Albert Camus Certic, <1004299876.18870.0.nnrp-12.9e98b74c@news.demon.co.uk> From sholden at holdenweb.com Thu Oct 23 08:38:39 2003 From: sholden at holdenweb.com (Steve Holden) Date: Thu Oct 23 08:43:16 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F97730C.70107@sjsoft.com> Message-ID: [David Fraser] > > Ian Bicking wrote: > > > On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote: > > [...] > > > >> 2) A standard Apache plug-in. Does mod_python fill this > role? (Should > >> this really be part of the stdlib?) It would be useful if the APIs > >> used here were similar to those used in the API support. > > > > mod_python pretty much fits this. I don't see any reason > to develop > > anything else (at least in terms of Apache integration). I don't > > think it would make sense as part of the stdlib -- it depends on > > Apache just as much as Python, and people install Apache in > all sorts > > of different ways. > > Yes, in Apache, mod_python is pretty much it. 
As far as the > API goes, I > think mod_python is an important one to look at at the design stage > rather than trying to fit an API to it later, since Apache is fairly > standard and mod_python is used by lots of different people. > You don't > want mod_python to have to be rewritten to comply with the API later. > I'm not sure that we should be arguing to include something that depends on a specific environment like Apache in the standard library. We should certainly be trying to promote a standard of some sort, however, which seems to conflict. I see the parallel more as being with the DB API - there are Oracle modules and ODBC modules (which are cross-engine) and SQL Server modules and so on. What we need is something to provide closely similar interfaces to different web server engines - whether those engines are in pure Python or external components. The one problem I see with mod_python is its defaulting behavior - you can get the same content several different ways. Specifically, the following URLs http://server/ http://server/index.py http://server/index.py.index all refer to the same content, and this makes it rather difficult to come up with a scheme for producing sensible relative URLs -- the browsers don't always interpret the path the same way the server does -- which in turn can make it difficult to produce easily portable web content. While this is probably not an issue for the standard library I'd like to know whether anyone has actually addressed the problem. My own solution is to canonicalise everything to be explicit, but if there's an easier way I'd love to hear it. 
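[Editorial note: Steve's "canonicalise everything to be explicit" approach can be sketched with the standard library's URL functions (the urlparse module in 2003; urllib.parse in modern Python). The rule that a trailing index.py is the directory's default document is an assumption for illustration, matching the mod_python example URLs above:]

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def canonical(base, link):
    """Resolve a relative link against a base URL and normalise the result.

    A sketch of canonicalising everything to be explicit: resolve the
    link absolutely so it no longer depends on which of the equivalent
    paths (/, /index.py, ...) the browser happened to request.
    """
    absolute = urljoin(base, link)
    scheme, netloc, path, query, fragment = urlsplit(absolute)
    # Assumption: treat a directory's default document as the directory.
    if path.endswith('/index.py'):
        path = path[:-len('index.py')]
    return urlunsplit((scheme, netloc.lower(), path or '/', query, fragment))
```

[With this, canonical('http://server/', 'index.py') and canonical('http://server/index.py', '') collapse to the same URL, so relative links can be generated from one canonical form.]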
regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From sholden at holdenweb.com Thu Oct 23 08:49:32 2003 From: sholden at holdenweb.com (Steve Holden) Date: Thu Oct 23 08:54:17 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <20031023111331.GA7516@rogue.amk.ca> Message-ID: [amk] > > On Fri, Oct 17, 2003 at 01:51:50PM -0700, Bill Janssen wrote: > > 1) A good CGI module. This should allow clear access to > the various > > values passed in the environment, as Simon points out. I think the > > current "cgi" module isn't bad at this, but I'm sure we can find > > shortcomings. > > * Too much cruft. We could either deprecate stuff in cgi.py with a > vengeance, or think up some new package organization. > My own preference would be for a new package altogether. The existing module would be difficult to engineer onwards into something clean. Like Topsy, it "just growed". > > 2) A standard Apache plug-in. Does mod_python fill this > role? (Should > > this really be part of the stdlib?) > > Too much work for the stdlib. Apache support suffers from > the split between > Apache versions 1.3 and 2.0; the API changed a *lot* between the two > versions, but both versions are still pretty common. Leave > it to mod_python. > Agreed. > > 3) A standard stand-alone solution, but better than the > three standard > > servers already in the stdlib. I been using Medusa lately, > and rather > > like its approach to things. > > The problem is that the code in the Medusa package is written really > unconventionally -- classes have lowercase names, it's still > 1.5 (and often > 1.4!) compatible -- and there's a lot of cruft here, too; > it's often not > clear which modules are intended for actual use and which ones are > half-baked experiments. 
This could be cleaned up if it's > deemed worth the > effort; I think it would be worth the effort. I don't think Medusa has had the concerted support that other environments have, and that's a pity because it appears to strike an excellent balance between complexity, efficiency and capability. I'd be prepared to help in such an effort (once PyCon is back on track). > I initially didn't want to embark on a big class > renaming because I > thought Twisted would quickly and completely replace Medusa, > but that seems > unlikely to happen. > Well, if those Twisted guys would stop implementing neat ideas and do some serious work explaining the structure of the framework they would probably find their code was more widely used. I suspect it will take Twisted a long time to mature because the developers are who and what they are. Their enthusiasm is admirable, but sometimes I get a bit annoyed by the hand waving :-) My experience is that people who've been walked through the Twisted code one-to-one by a Twisted developer "get it", but that just reading the docs or listening to conference presentations doesn't cut the mustard. Or maybe that's just me... regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From amk at amk.ca Thu Oct 23 09:58:57 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 23 09:59:07 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? Message-ID: <20031023135857.GA8007@rogue.amk.ca> What's the scope of improving client-side HTTP support? I suggest aiming for something you could write a web browser or web scraper on top of. That means storing and returning cookies from the server, writing them to a file, and a page cache that handles HTTP's cache expiration rules. 
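[Editorial note: storing cookies, returning them to the server, and writing them to a file -- the first items on amk's list -- are the kind of thing that later landed in the stdlib as http.cookiejar with urllib integration. A minimal sketch using that modern stdlib; the cookies.txt path is hypothetical and no request is actually sent:]

```python
import urllib.request
from http.cookiejar import LWPCookieJar

# A cookie jar that can persist itself to disk ("cookies.txt" is a
# hypothetical path; nothing is written until save() is called).
jar = LWPCookieJar('cookies.txt')

# An opener whose handler stores cookies from each response and
# replays them on subsequent requests -- the storing-and-returning
# behaviour described above.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))

# A real session would then do:
#     opener.open('http://example.com/')   # cookies captured into jar
#     jar.save(ignore_discard=True)        # written out to cookies.txt
```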
HTML formatting is out of scope, but a specialized parser for extracting a list of form elements or for picking apart a table might not be. Does anyone want to produce a feature list and proposed design? --amk From cs1spw at bath.ac.uk Thu Oct 23 10:25:07 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Thu Oct 23 10:25:18 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> Message-ID: <3F97E4C3.4070704@bath.ac.uk> Stuart Langridge wrote: > It would be useful to know if anyone is still *using* any of the old > backwards-compatible stuff. The cgi.py API hasn't changed very much > since 1.5, has it? (I might be hideously wrong here.) If not, then > deprecating all the previous ways it used to work would probably be a > good idea; there can't be that many people still using code that old? > (I admit that this sort of assertion has a habit of coming back and > biting you, mind.) Actually, I wrote an application using the cgi module this week - it's just been deployed as the system to manage http://coupons.lawrence.com/ :) I like the idea that has been suggested before of creating a new 'web' package, similar to the email one. The current cgi module could be left as it is (marked as deprecated but kept in the library), and the new interface could live at web.cgi.
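[Editorial note: to make the web.cgi proposal concrete, here is a rough sketch of what a request class with the PHP-style GET/POST/COOKIE dictionaries discussed earlier in the thread might look like. The class name and constructor arguments are hypothetical, and the parsing uses the modern stdlib:]

```python
from urllib.parse import parse_qs
from http.cookies import SimpleCookie

class HTTPRequest:
    """Hypothetical web.cgi-style request with separate GET/POST/COOKIE dicts."""

    def __init__(self, query_string='', post_body='', cookie_header=''):
        # Keeping GET and POST separate lets callers tell the methods
        # apart; REQUEST combines them for convenience, as in PHP.
        self.GET = {k: v[0] for k, v in parse_qs(query_string).items()}
        self.POST = {k: v[0] for k, v in parse_qs(post_body).items()}
        self.COOKIE = {k: m.value for k, m in SimpleCookie(cookie_header).items()}
        self.REQUEST = {**self.COOKIE, **self.GET, **self.POST}

req = HTTPRequest(query_string='page=2',
                  post_body='name=Simon',
                  cookie_header='session=abc123')
```

[A real constructor would of course read the CGI environment and stdin instead of taking strings.]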
Cheers, Simon Willison From aquarius-lists at kryogenix.org Thu Oct 23 10:42:15 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 23 10:41:37 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks References: <3F903CBD.6030508@bath.ac.uk> <03Oct17.135159pdt."58611"@synergy1.parc.xerox.com> <20031023111331.GA7516@rogue.amk.ca> <3F97E4C3.4070704@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > Stuart Langridge wrote: >> It would be useful to know if anyone is still *using* any of the old >> backwards-compatible stuff. The cgi.py API hasn't changed very much >> since 1.5, has it? (I might be hideously wrong here.) If not, then >> deprecating all the previous ways it used to work would probably be a >> good idea; there can't be that many people still using code that old? >> (I admit that this sort of assertion has a habit of coming back and >> biting you, mind.) > > Actually, I wrote an application using the cgi module this week - it's > just been deployed as the system to manage http://coupons.lawrence.com/ :) Oh, blimey, I didn't mean deprecate the whole module, practically everything I write uses it :) It seems to have a lot of very backwards-compatible stuff in it, like everything that came before FieldStorage, which is what I was talking about removing. I don't think that there's anything significant that you can't *do* with it, is there? Just that it's not all that convenient to do anything. So a "web" module, analogous to "email", as Simon suggested, seems like a great idea to me; a higher-level abstraction layer over the cgi module. sil -- A man, a plan, a canoe, pasta, heros, rajahs, a coloratura, maps, snipe, percale, macaroni, a gag, a banana bag, a tan, a tag, a banana bag again (or a camel), a crepe, pins, Spam, a rut, a Rolo, cash, a jar, sore hats, a peon, a canal -- Panama! From neel at mediapulse.com Thu Oct 23 10:59:20 2003 From: neel at mediapulse.com (Michael C.
Neel) Date: Thu Oct 23 10:59:26 2003 Subject: [Web-SIG] Python and the web Message-ID: Hi all, First a short introduction of myself. I work for a web development company and we have been focused on using Python for projects now for about one year. I tend to lurk in Python mailing lists I have no business being in (such as the mod_python dev list =) ). My views here are to represent the end user of the Python web-related tools, i.e. the web programmer. I have personally launched about 10 sites now that are Python-driven; many run on an Apache+mod_python+Albatross+MySQL stack. The first question I have for everyone is the scope of the group. There are some tasks common to the web programmer, but they might be off topic from what I'm seeing here. Two examples are templating and parsing SGML-based documents (HTML, XML, etc). It would be nice if Python included a basic templating module, but I wouldn't expect it to be very powerful. When heavy firepower is needed, projects like Albatross and PSP (being integrated into mod_python) are a better solution. However, sometimes a simple system is all that is needed, lightweight and fast. The module by Greg Stein, ezt.py, is a good example of what I think would be handy in the stdlib. Parsing files might be too much for this focus, as it's a very large task. Still, more and more the web developer is faced with reading in XML and applying a style sheet to it or otherwise formatting the data. Python is billed as batteries included, and granted this is a car battery of a module, but it would still be nice to help out the web developer here with the stdlib more. Too often I find myself developing a custom parsing engine for reading in some HTML or XML files. Isn't the point of a standard format so we can use standard tools with it? Yes, I'm quite aware that, while XML is a standard, the term is applied loosely =p.
This is a problem larger than Python, as all languages seem to be wrestling with this; how cool would it be for Python to be the first to have a really powerful, yet simple solution? For the CGI module, I can't comment - I've never used it. Our decision that Python was ready for the big time here was based mostly on mod_python's ability. CGI is dead to us as a viable option; it simply does not scale. While you can use tools to string it along, like FastCGI and co., working closely with the server API is going to be the best gain for effort in the performance area. For this same reason we also skipped over mod_python's publisher handler (which is where the relative URL complaints come in - it's worth noting that this applies not to mod_python as a whole but just to publisher). For client-side HTTP in Python, I've been impressed with how clean and simple it is. Getting a file across HTTP is no harder (in fact easier imho) than a local disk file. Now dealing with the file is a different story, see the above on parsing. For an HTTP server module, this is not a great need for myself but it would still be good to have. My idea would be a server class that you derive a server from, overriding the phases of the request you need to work with, a la the way Apache works. Something like:

    class MyServer(HTTPServer):
        def authhandler(self, req):
            if self.validate(req.user, req.password):
                return True
            else:
                return False

        def handler(self, req):
            page = req.uri.filename
            try:
                req.send(open(page, 'r').read())
                return True
            except:
                return False

That's basic, but if you've worked with the Apache API in mod_python, mod_perl, or C you get the idea. Also it would be nice if the default handlers provided a working server, if some options were set like a DocumentRoot:

    class MyServer(HTTPServer):
        documentroot = "/var/www/html"

I would say that Apache's 1.3 API would be a better goal, leaving out the new features in the 2.0 API.
First is the KISS principle; next is that we shouldn't be trying to replace Apache, but rather to provide a reasonably useful web server in the stdlib. Also, if someone needs a feature of the 2.0-style API, they can always add that in the derived class. The last thing to point out is that using a request object is important, as others mentioned here. With a standard request object, other tools, like Albatross, can easily tie into this new server. I look forward to comments and where this goes! Mike From grisha at modpython.org Thu Oct 23 11:17:26 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 23 11:17:30 2003 Subject: [Web-SIG] some thoughts Message-ID: <20031023105912.V10747@onyx.ispol.com> Hello all - The first point of this message is to make it known that I am on this list (partially wearing my mod_python hat) and listening attentively. Second, I do agree with those who said mod_python does not belong in stdlib. (Not unless Python becomes an ASF project or Apache becomes a PSF project...). Mod_python is a lot more about Apache than it is about Python, and it is far more complex than it would seem at first sight. What I would really like to see come out of this SIG is an agreement to work towards developing a set of standards, rather than a bunch of code. The following things could be standardized: 1. "Publishing" a la mod_python's publisher or Zope's ZPublisher (Bobo) 2. Request/Response interface 3. Python Server Pages (Right now mod_python and webware have a similar syntax, but not the same). Mod_python's flex-based PSP might actually be more appropriately placed into stdlib, rather than be part of mp. 4. PSTL, i.e. XML-compliant tag-based server pages. AFAIK nothing mature of this sort exists.
Grisha From ianb at colorstudy.com Thu Oct 23 11:21:17 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 11:21:20 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <3F97A800.70106@sjsoft.com> Message-ID: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> On Thursday, October 23, 2003, at 05:05 AM, David Fraser wrote: >> The interface I wrote is at: >> >> http://colorstudy.com/~ianb/IHTTP_01.py > > Had a look at this, it's nice for a start. However I agree with you > that the transaction interface is confusing... for example, what does > "setTransaction" mean/do? Some of the methods were for setting up the request, or modifying the request so it can be forwarded internally. It might be fine to leave the request/response setup undefined -- it would be defined by the context, e.g., cgi would set it up one way, mod_python another, etc. For forwarding I think it might be better to simply create a new object that would be reinjected into the framework. > Some other comments: > pathInfo/requestURI > would be good to have some consistency between these names They are mostly based off their CGI environment equivalents. > getFields > I don't think the ordering is generally important to people, so why > not ignore it, because if people want to preserve it, they can always > write some code to do that, but it's hardly needed as default > functionality. I agree, it should go. > getFieldDict > It would be great if the user could set the behaviour they want for > multiple keys. > I know I *always* want to discard any extra values. Including an > option to do this rather than return a list would prevent lots of > people doing post-processing That seems too difficult to define. I don't think there should be customizations, because that makes it too difficult to work in a heterogeneous environment. If you turn that setting on and some application you are using needs it off, then you get a configuration mess. Wrappers could provide more friendly interfaces.
> General comment here: there are quite a few different methods to > handle getting/setting get/post fields. Perhaps this would be made > simpler by using a standard dictionary interface. That would also > clear up confusion about what parameters to pass to setFieldDict etc. > Another question is whether people really need get and post arguments > to be processed differently. People do need to access them separately, as that's a common feature request. Usually they'd be accessing some combined version of those, but the option should be there. > Also, is it necessary for all attributes to be accessed by methods? > Particularly (no pun intended) things like "method", "time" would seem > to make more sense as attributes. If anyone really needs to run some > code to access them, I wrote the interface with wrappers in mind, and I thought purely using methods would be easier and more explicit. > The input method seems strange. Perhaps this should be called read? In > general, there needs to be a clear separation between low-level > accessing of the request stream, and higher-level accessing of > processed get/post fields. Perhaps a way to do this would be to > analyse how the most popular existing servers do things, then define a > set of low-level methods which would cover their functionality. If > this was done well, the higher-level methods could be written so that > they always fall back to use the underlying low-level methods if they > aren't overridden, so at least people only have to implement basic > functionality to match the API. I guess there are two ways you could go with that -- if a method is derivative of other methods, then just leave it out and let a wrapper implement it. But that doesn't work particularly well if we want to use the request/response as part of the standard library (without any wrapper in the library). So an abstract base class might be a good idea, with subclasses implementing the actual construction and some of the basic methods.
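[Editorial note: the abstract-base-class idea in the last paragraph might look roughly like this. Method names loosely echo the getFields/getFieldDict names from the draft interface; the concrete subclass is purely illustrative:]

```python
class BaseRequest:
    """Abstract base; subclasses supply the environment-specific parts."""

    def get_method(self):
        raise NotImplementedError   # e.g. CGI reads os.environ['REQUEST_METHOD']

    def get_fields(self):
        raise NotImplementedError   # returns a list of (name, value) pairs

    def get_field_dict(self):
        # A derived method implemented once on top of the primitives,
        # so subclasses only implement the low-level interface.
        fields = {}
        for name, value in self.get_fields():
            fields.setdefault(name, []).append(value)
        return fields


class StubCGIRequest(BaseRequest):
    """Illustrative concrete subclass; a real one would parse the CGI env."""

    def __init__(self, method, fields):
        self._method, self._fields = method, list(fields)

    def get_method(self):
        return self._method

    def get_fields(self):
        return list(self._fields)


req = StubCGIRequest('GET', [('name', 'a'), ('name', 'b'), ('page', '1')])
```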
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From barry at python.org Thu Oct 23 11:37:13 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 23 11:37:18 2003 Subject: [Web-SIG] some thoughts In-Reply-To: <20031023105912.V10747@onyx.ispol.com> References: <20031023105912.V10747@onyx.ispol.com> Message-ID: <1066923432.11634.132.camel@anthem> On Thu, 2003-10-23 at 11:17, Gregory (Grisha) Trubetskoy wrote: > What I would really like to see come out of this SIG is an agreement to > work towards developing a set of standards, rather than a bunch of code. /Some/ code wouldn't hurt, but I definitely agree that the early focus of the SIG should be on standards, much like the db-sig came up with DB-API 1.0 and 2.0. E.g. I'd really like for my CGI based scripts to be written against a CGI-API that would Just Work in mod_python, Twisted, Zope, CGIHTTPServer, etc, etc. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/web-sig/attachments/20031023/3027e0db/attachment.bin From ianb at colorstudy.com Thu Oct 23 12:20:31 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 12:20:42 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <20031023135857.GA8007@rogue.amk.ca> Message-ID: On Thursday, October 23, 2003, at 08:58 AM, amk@amk.ca wrote: > What's the scope of improving client-side HTTP support? > > I suggest aiming for something you could write a web browser or web > scraper > on top of. That means storing and returning cookies from the server, > writing > them to a file, and a page cache that handles HTTP's cache expiration > rules. > HTML formatting is out of scope, but a specialized parser for > extracting a > list of form elements or for picking apart a table might not be. 
> > Does anyone want to produce a feature list and proposed design? ClientCookie and ClientForm (http://wwwsearch.sourceforge.net) seem like a possible starting point. I haven't used them much, but they seem like they resist being a framework (which is a good thing) and just do their one job. I don't think you could build a browser on top of them (though that doesn't even apply to ClientForm, which is more of a browser alternative). But if you added caching and authentication into ClientCookie that would probably be a reasonable basis (maybe authentication is already there, I don't know). Looking at ClientCookie just a little more, it could even be integrated directly into urllib2 (it mostly matches that API already). Really, all of this could go into urllib2... -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Thu Oct 23 12:27:25 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 12:27:31 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Message-ID: On Thursday, October 23, 2003, at 07:38 AM, Steve Holden wrote: > I'm not sure that we should be arguing to include something that > depends > on a specific environment like Apache in the standard library. We > should > certainly be trying to promote a standard of some sort, however, which > seems to conflict. > > I see the parallel more as being with the DB API - there are Oracle > modules and ODBC modules (which are cross-engine) and SQL Server > modules > and so on. What we need is something to provide closely similar > interfaces to different web server engines - whether those engines are > in pure Python or external components. This was my idea of what the request/response stdlib classes could accomplish -- if not a formal specification, at least a reference implementation which other people could use as a model. 
> The one problem I see with mod_python is its defaulting behavior - you > can get the same content several different ways. Specifically, the > following URLs > > http://server/ > http://server/index.py > http://server/index.py.index > > all refer to the same content, and this makes it rather difficult to > come up with a scheme for producing sensible relative URLs -- the > browsers don't always interpret the path the same way the server does > -- > which in turn can make it difficult to produce easily portable web > content. In general I would note that URL introspection is (as far as I've seen) poorly handled by nearly everyone. In part because it's hard -- you can have multiple layers of things going on, with proxies, various virtual host configurations, aliases and location-specific handlers, mod_rewrite to mix everything up beyond hope, all before Python even becomes involved in the process. Then there's a wide variety of ways the URL can continue to be mapped even after that. Portably figuring out where an application exists, what its base is, and how it should best refer to other pages is difficult. Then add things like non-cookie session IDs, :action GET variables, and other things that break out of the model. It's challenging to make a good system to map URLs to resources, but people haven't really tried to meet the challenge of mapping resources back into URLs (and maybe it shouldn't even be attempted in a general way, but rather accomplished through some sort of configuration). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From janssen at parc.com Thu Oct 23 16:03:31 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:03:55 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Wed, 22 Oct 2003 23:19:56 PDT." 
<3F97730C.70107@sjsoft.com> Message-ID: <03Oct23.130335pdt."58611"@synergy1.parc.xerox.com> > I haven't used Medusa, but I have used Twisted and the standard Python > libraries. > Some notes: > 1) Twisted is definitely too complex to include. The question is, would > it be possible to rip out a simple web server from Twisted or would it > require a whole lot of extras that don't fit in the standard libraries? > This may amount to a re-write. I've become quite fond of Medusa, myself. It's small, uses standard Python (pure Python), is *not* under active development (a benefit, if you think about it). I've written a number of services using its framework. I haven't actually tried Twisted, because it seems overly complex for the various tasks I want to perform. I'm not interested in an all-singing all-dancing Apache clone -- I'd just use Apache for that. Bill From janssen at parc.com Thu Oct 23 16:04:08 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:04:34 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 01:53:17 PDT." <3F9796FD.6050003@sjsoft.com> Message-ID: <03Oct23.130412pdt."58611"@synergy1.parc.xerox.com> Yes, I like this idea. David Fraser writes: > Maybe for this reason we should stick to the existing HTTP server in > stdlib, but fix it up and improve it and change it to match the new API Bill From janssen at parc.com Thu Oct 23 16:06:28 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:08:27 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 05:49:32 PDT." Message-ID: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> > My experience is that people who've been walked through the Twisted code > one-to-one by a Twisted developer "get it", but that just reading the > docs or listening to conference presentations doesn't cut the mustard. > Or maybe that's just me... No, it's not just you. 
Bill From janssen at parc.com Thu Oct 23 16:12:14 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:12:49 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Thu, 23 Oct 2003 06:58:57 PDT." <20031023135857.GA8007@rogue.amk.ca> Message-ID: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> amk writes: > What's the scope of improving client-side HTTP support? > > I suggest aiming for something you could write a web browser or web scraper > on top of. That means storing and returning cookies from the server, writing > them to a file, and a page cache that handles HTTP's cache expiration rules. > HTML formatting is out of scope, but a specialized parser for extracting a > list of form elements or for picking apart a table might not be. My original idea was to look at something like cURL (http://curl.haxx.se/), and make sure anything you could do with that tool, you could do with Python. Might be a bit ambitious; here's the lead paragraph from the cURL web page: Curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and a busload of other useful tricks. Currently, for example, there's no way in the Python standard libraries to do a file upload (a POST with multipart/form-data). Then there are issues about handling the Web-centric formats you get back. There's no CSS parser, for instance. It's hard to understand a modern Web page without one. A Javascript interpreter? Bill From janssen at parc.com Thu Oct 23 16:12:50 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:13:19 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 07:25:07 PDT." 
<3F97E4C3.4070704@bath.ac.uk> Message-ID: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> > Actually, I wrote an application using the cgi module this week - it's > just been deployed as the system to manage http://coupons.lawrence.com/ :) Sure, I write them all the time. But what's missing? What do you have to work around? Bill From janssen at parc.com Thu Oct 23 16:14:52 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 16:16:32 2003 Subject: [Web-SIG] Python and the web In-Reply-To: Your message of "Thu, 23 Oct 2003 07:59:20 PDT." Message-ID: <03Oct23.131453pdt."58611"@synergy1.parc.xerox.com> Great comments! Thanks, Mike. Bill From ianb at colorstudy.com Thu Oct 23 16:15:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 16:16:34 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.130335pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Thursday, October 23, 2003, at 03:03 PM, Bill Janssen wrote: > I've become quite fond of Medusa, myself. It's small, uses standard > Python (pure Python), is *not* under active development (a benefit, if > you think about it). I've written a number of services using its > framework. Those are all good arguments for Medusa being appropriate for the standard library. Or, if not Medusa, something similar (it sounds like Medusa might just need a little love and care to modernize it). 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From barry at python.org Thu Oct 23 16:19:59 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 23 16:20:07 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.130637pdt."58611"@synergy1.parc.xerox.com> Message-ID: <1066940399.11634.290.camel@anthem>

On Thu, 2003-10-23 at 16:06, Bill Janssen wrote:
> > My experience is that people who've been walked through the Twisted code
> > one-to-one by a Twisted developer "get it", but that just reading the
> > docs or listening to conference presentations doesn't cut the mustard.
> > Or maybe that's just me...
>
> No, it's not just you.

Agreed. I've been using Twisted as the framework for my Mailman 3 experiments and I didn't get it until I spent an evening on irc with Moshe and Itamar. That was incredibly helpful, and I recommend that same approach for everyone.

-Barry

From cs1spw at bath.ac.uk Thu Oct 23 17:56:09 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Thu Oct 23 17:56:24 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F984E79.8080501@bath.ac.uk>

Bill Janssen wrote:
>>Actually, I wrote an application using the cgi module this week - it's
>>just been deployed as the system to manage http://coupons.lawrence.com/ :)
>
> Sure, I write them all the time. But what's missing? What do you
> have to work around?

The biggest thing for me is distinguishing between GET and POST data.
Sending HTTP headers (including cookies) is also highly inconvenient as with the cgi module they have to be manually constructed as HTTP name:value pairs and sent before the rest of the text.

This is where the request/response object model becomes very attractive - maybe something like the following:

    import web.cgi

    req = web.cgi.HTTPRequest()  # Auto-populates with data from environment
    if req.POST:
        # Form has been posted
        body = 'Hi there, %s' % req.POST['name']
    else:
        body = '<form>...</form>'

    res = web.cgi.HTTPResponse()
    res.content_type = 'text/html'
    res.set_cookie('name', 'Simon')
    res['X-Additional-Header'] = 'Another header'
    res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)
    print res

Output:

    Content-Type: text/html
    Set-Cookie: name=Simon
    X-Additional-Header: Another header
    Content-Length: 30

    <html>
    <head><title>Hi there</title></head>
    <form>...</form>
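A rough sketch of the GET/POST separation being asked for, layered over a plain CGI-style environment dictionary. split_request_data is a hypothetical helper (not an existing module), and parse_qs is spelled with its modern urllib.parse name:

```python
from urllib.parse import parse_qs

def split_request_data(environ, body=''):
    """Return (get_data, post_data) for a CGI-style request.

    GET data always comes from QUERY_STRING; POST data is taken from
    the request body only when REQUEST_METHOD is POST, so the two
    sources are never conflated.
    """
    get_data = parse_qs(environ.get('QUERY_STRING', ''))
    post_data = {}
    if environ.get('REQUEST_METHOD', 'GET').upper() == 'POST':
        post_data = parse_qs(body)
    return get_data, post_data
```

This keeps the convenience of one call while still letting the application ask specifically where a value came from.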
Cheers, Simon Willison http://simon.incutio.com/

From janssen at parc.com Thu Oct 23 19:45:53 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 19:46:16 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Thu, 23 Oct 2003 14:56:09 PDT." <3F984E79.8080501@bath.ac.uk> Message-ID: <03Oct23.164558pdt."58611"@synergy1.parc.xerox.com>

I usually use a simple "response" object with a few standard methods:

    class response:

        def open (self, content_type = "text/html"):
            """Returns a file object open for write"""

        def redirect (self, url):
            """Sends a redirect message to the specified URL"""

        def error (self, code, message):
            """Sends back error CODE (a valid HTTP code) with MESSAGE"""

        def reply (self, message):
            """Sends back reply string MESSAGE"""
            return self.error(200, message)

        def return_file (self, typ, path):
            """returns the file of MIME type TYP from PATH"""

        def add_cookie (self, name, value):
            """Add the cookie to the reply"""

But that's probably too simple.

Bill

From jjl at pobox.com Thu Oct 23 20:46:46 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 20:47:05 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.131222pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Thu, 23 Oct 2003, Bill Janssen wrote:
> amk writes:
> > What's the scope of improving client-side HTTP support?
> >
> > I suggest aiming for something you could write a web browser or web scraper
> > on top of. That means storing and returning cookies from the server, writing
> > them to a file, and a page cache that handles HTTP's cache expiration rules.
> > HTML formatting is out of scope, but a specialized parser for extracting a
> > list of form elements or for picking apart a table might not be.

I've been working on that kind of stuff.
http://wwwsearch.sourceforge.net/ I certainly think automatic cookie handling would be appropriate for the std lib. I've written code to do that (based on a port from libwww-perl, but substantially changed since then), which is already integrated into urllib2 (albeit ATM including a lot of junk for backwards-compatibility and some cut-n-pasting necessary because it's not (yet) actually part of the Python standard library). The only problem is that it's rather large. I claim this is (mostly) not my fault ;-) because the cookie standards are a royal mess. For a number of reasons, it will be significantly smaller in the form I hope will get into the Python standard lib., but it'll still be biggish. Still, you *could* quite easily write a much less anal implementation that worked most of the time. One risk of that is that you'd have to put up with a constant stream of bugs from people finding that website x breaks your simple implementation. At least, Ronald Tschalar (author of one of two Java libraries both named HTTPClient) tells me that was his experience. The fundamental problem is that the cookie 'standard' is really just Mozilla and MSIE's behaviour. For a brief summary of the sad tale, see: http://wwwsearch.sourceforge.net/ClientCookie/doc.html#standards OTOH, my code goes to some effort to enforce as many restrictions as possible to prevent cookies getting set and returned when they shouldn't. That could be cut without losing functionality (but obviously, losing security, for those who care about that). That seems pointless to me now that the code is pretty stable, though. One thing about my implementation that might seem like it should be cut out is RFC 2965 support.
It seems fairly safe to say that RFC 2965 is all but officially dead as an internet standard (and the same goes for RFC 2109, though I'm told a few servers implement it in some form -- *clients* have taken bits and pieces from the standard, but very few of those could be called RFC 2109 implementations: I regard those bits of the RFC 2109 standard as simply parts of the current state of the de-facto Netscape protocol). The one guy who was driving forward errata for RFC 2965 on the http-state mailing list seems to have succumbed to cookie-fatigue. I guess it's still useful on intranets. Half of the reason it's still in my code is simply that the Netscape cookie protocol is a messy de facto standard, and it seems far easier and more secure to specify it by the ways it differs from the RFC standard than to have it stand on its own feet. It also allows you to easily tighten up the Netscape rules if you feel like it (assuming that doesn't break the particular site you're using). The remaining 25% of the reason it's there is that I don't have the heart to rip it out ;-) So, that's my pitch for justifying the inclusion of ClientCookie (in a somewhat reduced form) in the standard library. Jeremy Hylton seemed to like the idea of having it in the std lib, but I don't know if he looked at the code :-) A related issue is urllib2's 'handler' system, which I've discovered isn't quite flexible enough to implement a number of useful features (including automatic cookie handling). I think it's possible to fix this without breaking anybody's code. Full details here: http://www.python.org/sf/759792 Jeremy said a few months back that he'd look at it, but I've heard nothing from him since... As for forms, originally I thought the forms code I wrote (ClientForm -- again, based on a port of Gisle Aas' libwww-perl, and again quite substantially changed since then) might be nice in the std lib, but I changed my mind a long while ago for a number of reasons.
But if anybody wants to talk about HTML form parsers, of course, feel free to start a thread. Same goes for HTML table parsing -- I'm not convinced the standard library is the place for this. I certainly think a function for doing file uploads would be great, though. Steve Purcell has some code for that in his old webunit module (there seems to be a new Python module called webunit here http://mechanicalcat.net/tech/webunit but the code download link is broken), and so do I in ClientForm. My code depends on a modified version of MimeWriter. I think it would be nice to fix MimeWriter so it could do this job. I think that's possible without breaking old code, though I know almost nothing about MIME. > My original idea was to look at something like cURL > (http://curl.haxx.se/), and make sure anything you could do with that > tool, you could do with Python. Might be a bit ambitious; here's the > lead paragraph from the cURL web page: > > Curl is a command line tool for transferring files with URL syntax, > supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and > LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP > uploading, kerberos, HTTP form based upload, proxies, cookies, > user+password authentication, file transfer resume, http proxy > tunneling and a busload of other useful tricks. I don't think it's a good idea to start on some new grand library, certainly not in the std lib. Gradual evolution seems more appropriate. Most of the stuff you list is either already there, or would fit it quite neatly into the current framework without any major upheavals. > Then there are issues about handling the Web-centric formats you get > back. There's no CSS parser, for instance. It's hard to understand a > modern Web page without one. What uses do you have in mind for that? > A Javascript interpreter? Whaaat?? You want a JS interpreter included with the Python distribution? You're kidding, right? 
:-) John From ianb at colorstudy.com Thu Oct 23 21:10:26 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 23 21:11:09 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Message-ID: On Thursday, October 23, 2003, at 07:46 PM, John J Lee wrote: > I've been working on that kind of stuff. > > http://wwwsearch.sourceforge.net/ > > I certainly think automatic cookie handling would be appropriate for > the > std lib. I've written code to do that (based on a port from > libwww-perl, > but substantially changed since then), which is already integrated into > urllib2 (albeit ATM including a lot of junk for backwards-compatibility > and some cut-n-pasting necessary because it's not (yet) actually part > of > the Python standard library). The only problem is that it's rather > large. > I claim this is (mostly) not my fault ;-) because the cookie standards > are > a royal mess. For a number of reasons, it will be significantly > smaller > in the form I hope will get into the Python standard lib., but it'll > still > be bigish. How big can it really be? I don't see how that would be a problem. Cookies suck, they act all funny and always seem unpredictable. If your library can hide that, great! It's certainly not worth simplifying the code if it means making the library less robust. I'm all for hiding crufty stuff behind simpler interfaces. If the cruft leaks out that might be an issue, but you probably have a more informed opinion about whether you have been able to keep it in or not. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Thu Oct 23 21:13:54 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 21:14:07 2003 Subject: [Web-SIG] client-side support: PEP 268 In-Reply-To: <20031022165217.I11797@lyra.org> References: <20031022165217.I11797@lyra.org> Message-ID: Greg (or anybody else, for that matter), would you mind looking at these doc bugs? 
http://www.python.org/sf/793553 http://www.python.org/sf/798244 John

From jjl at pobox.com Thu Oct 23 21:22:43 2003 From: jjl at pobox.com (John J Lee) Date: Thu Oct 23 21:23:53 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: References: Message-ID: On Thu, 23 Oct 2003, Ian Bicking wrote:
> On Thursday, October 23, 2003, at 07:46 PM, John J Lee wrote:
[...]
> How big can it really be? I don't see how that would be a problem.

Well, much bigger than it should be for the job that cookies do. And there's a big difference in size between a module that handles cookies, and one that knows about all the endless nonsense involved in doing the Right Thing.
[...]
> I'm all for hiding crufty stuff behind simpler interfaces. If the
> cruft leaks out that might be an issue, but you probably have a more
> informed opinion about whether you have been able to keep it in or not.

That's not an issue. For most people, it doesn't even *have* an interface -- you'd just do urllib2.urlopen as usual (ATM, you do ClientCookie.urlopen, of course). Well, possibly you'd have to call build_opener / install_opener, too, to explicitly request cookie handling...

John

From janssen at parc.com Thu Oct 23 22:08:29 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 22:09:45 2003 Subject: [Web-SIG] file uploads In-Reply-To: Your message of "Thu, 23 Oct 2003 17:46:46 PDT." Message-ID: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com>

> I certainly think a function for doing file uploads would be great,
> though.

It's not difficult. I adapted this code from a version in the Python cookbook, by Wade Leftwich:

    import httplib, mimetypes, os   # os is needed for os.path.basename below

    def https_post_multipart(host, port, selector, fields, files):
        """
        Post fields and files to an http host as multipart/form-data.
        FIELDS is a sequence of (name, value) elements for regular form
        fields. FILES is a sequence of (name, filename [, value]) elements
        for data to be uploaded as files. Return the server's response page.
        """
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTPS(host, port)
        h.putrequest('POST', selector)
        h.putheader('Content-Type', content_type)
        h.putheader('Content-Length', str(len(body)))
        h.endheaders()
        h.send(body)
        errcode, errmsg, headers = h.getreply()
        return errcode, errmsg, headers, h.file.read()

    def http_post_multipart(host, port, password, selector, fields, files):
        """
        Post fields and files to an http host as multipart/form-data.
        FIELDS is a sequence of (name, value) elements for regular form
        fields. FILES is a sequence of (name, filename [, value]) elements
        for data to be uploaded as files. Return the server's response page.
        """
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTP(host, port)
        h.putrequest('POST', selector)
        if password:
            h.putheader('Password', password)
        h.putheader('Content-Type', content_type)
        h.putheader('Content-Length', str(len(body)))
        h.endheaders()
        h.send(body)
        errcode, errmsg, headers = h.getreply()
        return errcode, errmsg, headers, h.file.read()

    def encode_multipart_formdata(fields, files):
        """
        fields is a sequence of (name, value) elements for regular form
        fields. files is a sequence of (name, filename, value) elements
        for data to be uploaded as files. Return (content_type, body)
        ready for httplib.HTTP instance.
        """
        BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
        CRLF = '\r\n'
        L = []
        for (key, value) in fields:
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"' % key)
            L.append('')
            L.append(value)
        for file in files:
            key = file[0]
            filename = file[1]
            if len(file) > 2:
                value = file[2]
            else:
                value = None
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"; filename="%s"'
                     % (key, os.path.basename(filename)))
            L.append('Content-Type: %s' % get_content_type(filename))
            if value:
                L.append('')
                L.append(value)
            else:
                L.append('Content-Transfer-Encoding: binary')
                L.append('')
                fp = open(filename, 'rb')   # binary mode, or uploads get corrupted
                L.append(fp.read())
                fp.close()
        L.append('--' + BOUNDARY + '--')
        L.append('')
        body = CRLF.join(L)
        content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
        return content_type, body

    def get_content_type(filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

From janssen at parc.com Thu Oct 23 22:11:56 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 23 22:12:24 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 18:22:43 PDT." Message-ID: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com>

Another possibility would be to mimic the Java 1.4.1 libraries for the Web. For instance, we could have the "URL" object, which has a method called "open()", which when called gives you a "Connection", which can be of subtype "HTTPConnection", "FTPConnection", etc. Call the "create_request()" method on that "Connection" to get a new Request instance, use "set_header()", "set_cookie()", "set_body()", etc., then call the "send()" method, getting back a ReplyPromise instance, which can then be interrogated periodically to get a Reply instance, etc.
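Roughly, that layering could look like the following in Python. Every class and method name here is illustrative only (none of this is an existing library), and send() returns a canned reply instead of touching the network:

```python
class Reply:
    def __init__(self, status, body):
        self.status = status
        self.body = body

class ReplyPromise:
    # Stand-in promise: filled synchronously here; a real one would be
    # interrogated periodically while a network operation completes.
    def __init__(self, reply):
        self._reply = reply
    def poll(self):
        return self._reply

class Request:
    def __init__(self, connection, path):
        self.connection = connection
        self.path = path
        self.headers = {}
        self.body = ''
    def set_header(self, name, value):
        self.headers[name] = value
    def set_cookie(self, name, value):
        self.set_header('Cookie', '%s=%s' % (name, value))
    def set_body(self, body):
        self.body = body
    def send(self):
        # A real send() would write the request over the connection;
        # here we just echo the body back so the control flow is visible.
        return ReplyPromise(Reply(200, 'echo: ' + self.body))

class HTTPConnection:
    def __init__(self, host):
        self.host = host
    def create_request(self, path='/'):
        return Request(self, path)

class URL:
    def __init__(self, url):
        self.url = url
    def open(self):
        # Only the http scheme is sketched here.
        return HTTPConnection(self.url.split('/')[2])
```

Even this toy version takes five classes to make a single request, which is the verbosity trade-off the Java style brings with it.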
Bill From cs1spw at bath.ac.uk Fri Oct 24 01:40:29 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 01:40:38 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F98BB4D.2020100@bath.ac.uk> Bill Janssen wrote: > Another possibility would be to mimic the Java 1.4.1 libraries for the > Web. For instance, we could have the "URL" object, which has a method > called "open()", which when called gives you a "Connection", which can > be of subtype "HTTPConnection", "FTPConnection", etc. Call the > "create_request()" method on that "Connection" to get a new Request > instance, use "set_header()", "set_cookie()", "set_body()", etc., then > call the "send()" method, getting back a ReplyPromise instance, which > can then be interrogated periodically to get a Reply instance, etc. Ugh. One of the things I love about Python is that unlike Java it doesn't force you to have horribly verbose interfaces with dozens of different classes. A URL is a string, file-like-objects are file-like-objects and most of the modules in the standard library only make you deal with one or two classes and a few useful utility methods. I'm all for replicating the capabilities of Java libraries (if they have a good bunch of features) but replicating the exact APIs seems to me like a lost opportunity to take advantage of Python's more expressive syntax. Cheers, Simon Willison http://simon.incutio.com/ From thijs at fngtps.com Fri Oct 24 02:11:11 2003 From: thijs at fngtps.com (Thijs van der Vossen) Date: Fri Oct 24 02:11:16 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? Message-ID: <200310240811.13760.thijs@fngtps.com> ---------- Forwarded Message ---------- Subject: Re: [Web-SIG] Client-side support: what are we aiming for? 
Date: Friday 24 October 2003 08:08 From: Thijs van der Vossen To: web-sig@python.org

On Friday 24 October 2003 02:46, John J Lee wrote:
> So, that's my pitch for justifying the inclusion of ClientCookie (in a
> somewhat reduced form) in the standard library. Jeremy Hylton seemed to
> like the idea of having it in the std lib, but I don't know if he looked
> at the code :-)

I would really like to have client cookie support in the standard library too.

> As for forms, originally I thought the forms code I wrote (ClientForm --
> again, based on a port of Gisle Aas' libwww-perl, and again quite
> substantially changed since then) might be nice in the std lib, but I
> changed my mind a long while ago for a number of reasons. But if anybody
> wants to talk about HTML form parsers, of course, feel free to start a
> thread. Same goes for HTML table parsing -- I'm not convinced the
> standard library is the place for this.

I tend to agree with this. HTML form and/or table parsing is almost only used for stuff like screen-scraping, but I don't think this is so common it should be included in the standard library. Retrieving data from the web will be done more and more through web service interfaces like XML-RPC and SOAP or with REST-style interfaces.

Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540

From davidf at sjsoft.com Fri Oct 24 03:33:27 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:33:33 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: Message-ID: <3F98D5C7.8030802@sjsoft.com>

Steve Holden wrote:
> [David Fraser]
>> Ian Bicking wrote:
>>> On Friday, October 17, 2003, at 03:51 PM, Bill Janssen wrote:
> [...]
>>>> 2) A standard Apache plug-in. Does mod_python fill this role? (Should
>>>> this really be part of the stdlib?) It would be useful if the APIs
>>>> used here were similar to those used in the API support.
>>> mod_python pretty much fits this. I don't see any reason to develop
>>> anything else (at least in terms of Apache integration). I don't
>>> think it would make sense as part of the stdlib -- it depends on
>>> Apache just as much as Python, and people install Apache in all sorts
>>> of different ways.
>> Yes, in Apache, mod_python is pretty much it. As far as the API goes, I
>> think mod_python is an important one to look at at the design stage
>> rather than trying to fit an API to it later, since Apache is fairly
>> standard and mod_python is used by lots of different people. You don't
>> want mod_python to have to be rewritten to comply with the API later.
>
> I'm not sure that we should be arguing to include something that depends
> on a specific environment like Apache in the standard library. We should
> certainly be trying to promote a standard of some sort, however, which
> seems to conflict.
>
> I see the parallel more as being with the DB API - there are Oracle
> modules and ODBC modules (which are cross-engine) and SQL Server modules
> and so on. What we need is something to provide closely similar
> interfaces to different web server engines - whether those engines are
> in pure Python or external components.

Agreed. What I'm saying isn't that mod_python should be put in the standard library, but that the design of the web server API should be carefully done so that it doesn't require major changes to mod_python etc.

> The one problem I see with mod_python is its defaulting behavior - you
> can get the same content several different ways. Specifically, the
> following URLs
>
>     http://server/
>     http://server/index.py
>     http://server/index.py.index
>
> all refer to the same content, and this makes it rather difficult to
> come up with a scheme for producing sensible relative URLs -- the
> browsers don't always interpret the path the same way the server does --
> which in turn can make it difficult to produce easily portable web
> content.

Hmmm ... looks like you are using AddHandler for .py files. I generally find that placing the Python files outside of the web directory, in libraries, works better. Then you can use SetHandler to get mod_python to handle everything, or AddHandler for specific file types to get it to handle some URLs. It makes more sense to me to have a URL of index.htm rather than index.py (why should the user care what I'm using to produce the file?)

Hope that is relevant and/or helpful

David

From davidf at sjsoft.com Fri Oct 24 03:35:43 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:35:49 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: References: Message-ID: <3F98D64F.5050905@sjsoft.com>

Steve Holden wrote:
>> I initially didn't want to embark on a big class renaming because I
>> thought Twisted would quickly and completely replace Medusa, but that
>> seems unlikely to happen.
>
> Well, if those Twisted guys would stop implementing neat ideas and do
> some serious work explaining the structure of the framework they would
> probably find their code was more widely used. I suspect it will take
> Twisted a long time to mature because the developers are who and what
> they are. Their enthusiasm is admirable, but sometimes I get a bit
> annoyed by the hand waving :-)
>
> My experience is that people who've been walked through the Twisted code
> one-to-one by a Twisted developer "get it", but that just reading the
> docs or listening to conference presentations doesn't cut the mustard.
> Or maybe that's just me...
> regards

I think Twisted is inappropriate for a basic standalone web server to be included in the standard library. It's very fancy and hand-wavy, but that's great - we need something like that around. Eventually their ideas will find their way into other systems as well. But the standard library one should be simple, nice, clean, and extendable.

David

From davidf at sjsoft.com Fri Oct 24 03:38:49 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:38:58 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F984E79.8080501@bath.ac.uk> References: <03Oct23.131258pdt."58611"@synergy1.parc.xerox.com> <3F984E79.8080501@bath.ac.uk> Message-ID: <3F98D709.9070806@sjsoft.com>

Simon Willison wrote:
> Bill Janssen wrote:
>>> Actually, I wrote an application using the cgi module this week -
>>> it's just been deployed as the system to manage
>>> http://coupons.lawrence.com/ :)
>>
>> Sure, I write them all the time. But what's missing? What do you
>> have to work around?
>
> The biggest thing for me is distinguishing between GET and POST data.
> Sending HTTP headers (including cookies) is also highly inconvenient
> as with the cgi module they have to be manually constructed as HTTP
> name:value pairs and sent before the rest of the text.
>
> This is where the request/response object model becomes very
> attractive - maybe something like the following:
>
>     import web.cgi
>
>     req = web.cgi.HTTPRequest()  # Auto-populates with data from environment
>     if req.POST:
>         # Form has been posted
>         body = 'Hi there, %s' % req.POST['name']
>     else:
>         body = '<form>...</form>'
>
>     res = web.cgi.HTTPResponse()
>     res.content_type = 'text/html'
>     res.set_cookie('name', 'Simon')
>     res['X-Additional-Header'] = 'Another header'
>     res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)
>     print res

For CGI, it would seem to make sense that you do something like the following:

    res = web.cgi.HTTPResponse(sys.stdout)
    res.content_type = 'text/html'
    res.set_cookie('name', 'Simon')
    res['X-Additional-Header'] = 'Another header'
    res.send_headers()
    res.write('<html>\n<head><title>Hi there</title></head>\n%s' % body)

Then if you end up writing multiple parts, they can be output to stdout as they are written, rather than having to generate the entire response object first

David

From davidf at sjsoft.com Fri Oct 24 03:48:36 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:48:42 2003 Subject: [Web-SIG] Python and the web In-Reply-To: References: Message-ID: <3F98D954.7060603@sjsoft.com>

Michael C. Neel wrote:
> For a http server module, this is not a great need for myself but it
> would still be good to have. My idea would be a server class that you
> derived a server from, overriding the phases of the request you needed
> to work with, a la the way Apache works. Something like:
>
>     class MyServer(HTTPServer):
>
>         def authhandler(self, req):
>             if self.validate(req.user, req.password):
>                 return True
>             else:
>                 return False
>
>         def handler(self, req):
>             page = req.uri.filename
>             try:
>                 req.send(open(page, 'r').read())
>                 return True
>             except:
>                 return False
>
> That's basic, but if you've worked with the apache API in mod_python,
> mod_perl, or C you get the idea. Also it would be nice if the default
> handlers provided a working server, if some options were set like a
> DocumentRoot:
>
>     class MyServer(HTTPServer):
>         documentroot = "/var/www/html"

I think this architecture is great... but obviously people coming from a non-Apache background may have other ideas. However, the key is defining the request/response system well, and then multiple different server structures could be built on that.

> I would say that Apache's 1.3 API should be a better goal, and leave
> out the new features in the 2.0 API. First is the KISS principle; next
> is we should not be trying to replace Apache but rather provide a
> reasonably useful web server in the stdlib. Also, if someone needs a
> feature of the 2.0 style API, they can always add that in the derived
> class.
> > On the other hand, the 2.0 API is an improvement of the 1.3 API, and allows things like Filters etc which would be great to include. David From davidf at sjsoft.com Fri Oct 24 03:50:30 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:50:34 2003 Subject: [Web-SIG] some thoughts In-Reply-To: <1066923432.11634.132.camel@anthem> References: <20031023105912.V10747@onyx.ispol.com> <1066923432.11634.132.camel@anthem> Message-ID: <3F98D9C6.4040704@sjsoft.com> Barry Warsaw wrote: >On Thu, 2003-10-23 at 11:17, Gregory (Grisha) Trubetskoy wrote: > > > >>What I would really like to see come out of this SIG is an agreement to >>work towards developing a set of standards, rather than a bunch of code. >> >> > >/Some/ code wouldn't hurt, but I definitely agree that the early focus >of the SIG should be on standards, much like the db-sig came up with >DB-API 1.0 and 2.0. E.g. I'd really like for my CGI based scripts to be >written against a CGI-API that would Just Work in mod_python, Twisted, >Zope, CGIHTTPServer, etc, etc. > > Maybe a better way round to look at it is that your Python Web-API based scripts could also run in a CGI server. I think if the API is based too much on CGI we will lose the benefits of web/application servers, whereas the conversion the other way round should be simpler David From davidf at sjsoft.com Fri Oct 24 03:51:22 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:51:26 2003 Subject: [Web-SIG] file uploads In-Reply-To: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> References: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F98D9FA.7000603@sjsoft.com> Bill Janssen wrote: >>I certainly think a function for doing file uploads would be great, >>though. >> >> > >It's not difficult. > > But it should be in the standard library... 
something based on the code you included would be great David From davidf at sjsoft.com Fri Oct 24 03:53:53 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:53:58 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <3F98BB4D.2020100@bath.ac.uk> References: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> <3F98BB4D.2020100@bath.ac.uk> Message-ID: <3F98DA91.80908@sjsoft.com> Simon Willison wrote: > Bill Janssen wrote: > >> Another possibility would be to mimic the Java 1.4.1 libraries for the >> Web. For instance, we could have the "URL" object, which has a method >> called "open()", which when called gives you a "Connection", which can >> be of subtype "HTTPConnection", "FTPConnection", etc. Call the >> "create_request()" method on that "Connection" to get a new Request >> instance, use "set_header()", "set_cookie()", "set_body()", etc., then >> call the "send()" method, getting back a ReplyPromise instance, which >> can then be interrogated periodically to get a Reply instance, etc. > > > Ugh. One of the things I love about Python is that unlike Java it > doesn't force you to have horribly verbose interfaces with dozens of > different classes. A URL is a string, file-like-objects are > file-like-objects and most of the modules in the standard library only > make you deal with one or two classes and a few useful utility methods. > > I'm all for replicating the capabilities of Java libraries (if they > have a good bunch of features) but replicating the exact APIs seems to > me like a lost opportunity to take advantage of Python's more > expressive syntax. Absolutely. We need to be using attributes rather than method accesses, and dictionary/list-derived classes where sensible and possible. 
(We shouldn't imitate dictionaries by mimicking methods, and we should use the new-style classes if we need to extend them) David From davidf at sjsoft.com Fri Oct 24 03:55:46 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:55:55 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <200310240811.13760.thijs@fngtps.com> References: <200310240811.13760.thijs@fngtps.com> Message-ID: <3F98DB02.3010407@sjsoft.com> Thijs van der Vossen wrote: >>As for forms, originally I thought the forms code I wrote (ClientForm -- >>again, based on a port of Gisle Aas' libwww-perl, and again quite >>substantially changed since then) might be nice in the std lib, but I >>changed my mind a long while ago for a number of reasons. But if anybody >>wants to talk about HTML form parsers, of course, feel free to start a >>thread. Same goes for HTML table parsing -- I'm not convinced the >>standard library is the place for this. >> >> >I tend to agree with this. HTML form and/or table parsing is almost only used >for stuff like screen-scraping, but I don't think this is so common it should >be included in the standard library. Retrieving data from the web will be >done more and more through web service interfaces like XML-RPC and SOAP or >with REST-style interfaces. > > Actually HTML parsing would be fantastic for testing web applications, so maybe that could be related to the Web API. The parsing doesn't have to be very intelligent or do validation, HTML syntax is fairly simple. I think that does belong in the standard library.
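The kind of undemanding HTML parsing David describes is easy to demonstrate with the standard library; here is a minimal sketch (shown with the modern html.parser module; the collector class and the test page are purely illustrative, not a proposed API) that pulls form fields out of a generated page so a test suite can assert against them:

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of <input> elements from a page,
    e.g. to check a generated form in a test without full validation."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                self.fields[attrs["name"]] = attrs.get("value", "")

page = '<form><input name="title" value="Hello"><input name="author" value="Moof"></form>'
parser = FormFieldCollector()
parser.feed(page)
print(parser.fields)  # {'title': 'Hello', 'author': 'Moof'}
```

The parser is deliberately forgiving: it never validates, it just walks the tags, which matches the "doesn't have to be very intelligent" requirement above.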
David From davidf at sjsoft.com Fri Oct 24 03:58:03 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 03:58:26 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <200310240946.37859.t.vandervossen@fngtps.com> References: <3F98D5C7.8030802@sjsoft.com> <200310240946.37859.t.vandervossen@fngtps.com> Message-ID: <3F98DB8B.1000605@sjsoft.com> Thijs van der Vossen wrote: >On Friday 24 October 2003 09:33, David Fraser wrote: > > >>>I'm not sure that we should be arguing to include something that depends >>>on a specific environment like Apache in the standard library. We should >>>certainly be trying to promote a standard of some sort, however, which >>>seems to conflict. >>> >>>I see the parallel more as being with the DB API - there are Oracle >>>modules and ODBC modules (which are cross-engine) and SQL Server modules >>>and so on. What we need is something to provide closely similar >>>interfaces to different web server engines - whether those engines are >>>in pure Python or external components. >>> >>> >>Agreed. What I'm saying isn't that mod_python should be put in the >>standard library, but that the design of the web server API should be >>carefully done so that it doesn't require major changes to mod_python etc. >> >> >Mod_python is probably _not_ a good starting point for a generic web server >API because its purpose is to directly expose the Apache API. It makes no >sense to model a generic interface on a mostly direct mapping to the >internals of _one_ specific server. > > I'm not saying that the interface should be modelled on mod_python. But that mod_python is an important thing to consider when designing the interface. Apache is the most popular web server on the web.
What this means is, if a Python Web API is designed that requires lots of unintuitive code for a mod_python implementation, it's badly designed David From davidf at sjsoft.com Fri Oct 24 04:04:57 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 04:05:07 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> Message-ID: <3F98DD29.30706@sjsoft.com> Ian Bicking wrote: > On Thursday, October 23, 2003, at 05:05 AM, David Fraser wrote: > >>> The interface I wrote is at: >>> >>> http://colorstudy.com/~ianb/IHTTP_01.py >> >> Had a look at this, it's nice for a start. However I agree with you >> that the transaction interface is confusing... for example, what does >> "setTransaction" mean/do? > > Some of the methods were for setting up the request, or modifying the > request so it can be forwarded internally. It might be fine to leave > the request/response setup undefined -- it would be defined by the > context, e.g., cgi would set it up one way, mod_python another, etc. > For forwarding I think it might be better to simply create a new object > that would be reinjected into the framework. > >> Some other comments: >> pathInfo/requestURI >> would be good to have some consistency between these names > > They are mostly based off their CGI environment equivalent. OK, now I understand, but is this a good way to name things for the future? >> getFieldDict >> It would be great if the user could set the behaviour they want for >> multiple keys. >> I know I *always* want to discard any extra values. Including an >> option to do this rather than return a list would prevent lots of >> people doing post-processing > > That seems too difficult to define. I don't think there should be > customizations, because that makes it too difficult to work in a > heterogeneous environment.
If you turn that setting on and some > application you are using needs it off, then you get a configuration > mess. Wrappers could provide more friendly interfaces. If you defined a setField method as you said above, then people could override it to throw away duplicate values. Maybe this is the way to go >> General comment here: there are quite a few different methods to >> handle getting/setting get/post fields. Perhaps this would be made >> simpler by using a standard dictionary interface. That would also >> clear up confusion about what parameters to pass to setFieldDict etc. >> Another question is whether people really need get and post arguments >> to be processed differently. > > People do need to access them separately, as that's a common feature > request. Usually they'd be accessing some combined version of those, > but the option should be there. Fine. So we need a clever way of providing them in either form. I think using dictionaries for this is essential - even if it means defining a single dictionary that remembers which fields are get and which are post, and provides different wrappers to see the different elements. >> Also, is it necessary for all attributes to be accessed by methods? >> Particularly (no pun intended) things like "method", "time" would >> seem to make more sense as attributes. If anyone really needs to run >> some code to access them, > > I wrote the interface with wrappers in mind, and I thought purely > using methods would be easier and more explicit. It is more explicit, but I don't think it's easier. It may require a few extra lines of code for some servers to provide attributes, but it makes the user side much easier. The server side gets written once, many users use it, so it makes sense to make things as easy as possible for the user. For example, the DB-API has an attribute called rowcount.
In order to implement that, I had to create a getrowcount() method, then put in a __getattr__ method that called getrowcount() to read the rowcount attribute. A few lines of code. But it makes all the client-side code much more logical and easier to read. Obviously this is a tradeoff and some things should be methods, some should be attributes. >> The input method seems strange. Perhaps this should be called read? >> In general, there needs to be a clear separation between low-level >> accessing of the request stream, and higher-level accessing of >> processed get/post fields. Perhaps a way to do this would be to >> analyse how the most popular existing servers do things, then define >> a set of low-level methods which would cover their functionality. If >> this was done well, the higher-level methods could be written so that >> they always fall back to use the underlying low-level methods if they >> aren't overridden, so at least people only have to implement basic >> functionality to match the API. > > I guess there's two ways you could go with that -- if a method is > derivative of other methods, then just leave it out and let a wrapper > implement it. But that doesn't work particularly well if we want to > use the request/response as part of the standard library (without any > wrapper in the library). So an abstract base class might be a good > idea, with subclasses implementing the actual construction and some of > the basic methods. Right, abstract base class + simple implementation is the way to go. 
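The abstract-base-class-plus-simple-implementation approach endorsed above can be sketched as follows; every name here (BaseRequest, read_body, get_env, DummyRequest) is hypothetical, not part of any agreed interface, and it uses the modern stdlib rather than the 2003-era modules discussed in the thread:

```python
import abc
from urllib.parse import parse_qs

class BaseRequest(abc.ABC):
    """Hypothetical abstract request: backends implement only the
    low-level primitives; the high-level helpers are derived from them."""

    @abc.abstractmethod
    def read_body(self):
        """Low-level: return the raw request body as bytes."""

    @abc.abstractmethod
    def get_env(self, name):
        """Low-level: return a CGI-style variable ('' if absent)."""

    # High-level helper that falls back to the low-level primitives,
    # so a backend gets field parsing for free once the primitives work.
    def fields(self):
        return parse_qs(self.read_body().decode("utf-8"))

    # Attribute-style access (rather than a getMethod() call),
    # implemented once in the base class as a property.
    @property
    def method(self):
        return self.get_env("REQUEST_METHOD") or "GET"

class DummyRequest(BaseRequest):
    """Minimal concrete implementation, e.g. for tests."""
    def __init__(self, body, env):
        self._body, self._env = body, env
    def read_body(self):
        return self._body
    def get_env(self, name):
        return self._env.get(name, "")

req = DummyRequest(b"a=1&a=2&b=3", {"REQUEST_METHOD": "POST"})
print(req.method)    # POST
print(req.fields())  # {'a': ['1', '2'], 'b': ['3']}
```

Each server binding would subclass BaseRequest and implement only the two primitives; the property also illustrates the rowcount-style point that attribute access costs the implementer a few lines but simplifies every caller.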
David From t.vandervossen at fngtps.com Fri Oct 24 04:09:38 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Fri Oct 24 04:10:33 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98DB8B.1000605@sjsoft.com> References: <200310240946.37859.t.vandervossen@fngtps.com> <3F98DB8B.1000605@sjsoft.com> Message-ID: <200310241009.39834.t.vandervossen@fngtps.com> On Friday 24 October 2003 09:58, David Fraser wrote: > Thijs van der Vossen wrote: > >Mod_python is probably _not_ a good starting point for a generic web > > server API because its purpose is to directly expose the Apache API. It > > makes no sense to model a generic interface on a mostly direct mapping to > > the internals of _one_ specific server. > > I'm not saying that the interface should be modelled on mod_python. Ok. That's clear then. > But that mod_python is an important thing to consider when designing the > interface. Apache is the most popular web server on the web. > What this means is, if a Python Web API is designed that requires lots > of unintuitive code for a mod_python implementation, it's badly designed If a Python Web API is designed that requires lots of unintuitive code for _any_ server implementation, it's badly designed. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From moof at metamoof.net Fri Oct 24 10:50:36 2003 From: moof at metamoof.net (Moof) Date: Fri Oct 24 10:51:12 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID: <3F993C3C.5080402@metamoof.net> David Fraser wrote: > Actually HTML parsing would be fantastic for testing web applications, > so maybe that could be related to the Web API. Actually, that is a very important point. Many Python programmers are fans of Test-driven development.
I'm currently developing an app with Webware and Cheetah, and find it very difficult to write tests for a lot of the stuff I do. This is mostly due to a huge amount of background work I need to do to set up an emulation environment first (make sure my request and session objects work correctly as far as I need them to for my testing, replacing the Page write and writeln methods, and so on) and even then, verifying a whole generated page is a pain. So a standard HTML parser would be nice, as well as keeping TDD in mind when we design request and response (and possibly session) objects. > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. > I think that does belong in the standard library. Speaking of validation, a sort of standard form validation library would be nice: something to say "I'm expecting this value to be an int between 1-31" or "I'm expecting this to be a string with the following legal characters" and so on. It's not that difficult to write yourself, but I seem to find myself reinventing the wheel every time I do. A standard "best practice" way of doing this would be wonderful. Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From cs1spw at bath.ac.uk Fri Oct 24 11:25:29 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 11:25:36 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F98DD29.30706@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> Message-ID: <3F994469.20304@bath.ac.uk> David Fraser wrote: > Fine. So we need a clever way of providing them in either form. I think > using dictionaries for this is essential - even if it means defining a > single dictionary that remembers which fields are get and which are > post, and provides different wrappers to see the different elements. 
I am convinced that the neatest way of handling this is to replicate PHP's form field dictionaries, in particular these three:

GET - the form data that came in via GET
POST - the form data that came in via POST
REQUEST - the above two dictionaries combined (POST over-riding GET)

However, there is one special case that needs considering: multiple form fields of the same name. For example, the following URL: script.py?a=1&a=2 What should GET['a'] return? There are three possibilities: return a list (or tuple) containing 1 and 2, or return 1 (the first value) or return 2 (the second value). The first has a huge disadvantage in that all of a sudden accessing the GET dictionary could return a list or a string - code will then have to start checking the type of the returned data before doing anything with it. The second and third have the disadvantage that some form data gets "lost" by the dictionary. PHP has an interesting way of dealing with this, based on special syntax used for the names of form elements. If you have two query string arguments of the same name, PHP over-writes the first with the second. However, if the form field names end in [] PHP creates an array of them instead. For example:

script.py?a=1&a=2 GET['a'] == 2
script.py?a[]=1&a[]=2 GET['a'] == [1, 2]

In fact, PHP extends this to allow for dictionary style data structures to be passed in from forms as well:

script.py?a[first]=1&a[second]=2 GET['a'] == {'first': 1, 'second': 2}

This is a pretty neat solution, but carries the slight disadvantage that information about the way an application is internally structured (i.e. that it processes form input as a list or dictionary) is exposed in the application HTML. That said, from previous experience with PHP it is an extremely powerful technique. For example, check out this example form for editing a blog entry:
[HTML form markup lost in archiving: three inputs labelled Title, Author and Entry, named with the entry[...] convention]
Submitting this form in PHP results in a dictionary-style data structure called 'entry' being made available to the script, neatly encapsulating the data about the entry sent from the form. I'm sure there's an elegant solution to all of this, but I'm not sure what it is :) -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Fri Oct 24 11:35:32 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:35:38 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <3F994469.20304@bath.ac.uk> Message-ID: On Friday, October 24, 2003, at 10:25 AM, Simon Willison wrote: > David Fraser wrote: >> Fine. So we need a clever way of providing them in either form. I >> think using dictionaries for this is essential -- even if it means >> defining a single dictionary that remembers which fields are get and >> which are post, and provides different wrappers to see the different >> elements. > > I am convinced that the neatest way of handling this is to replicate > PHP's form field dictionaries, in particular these three: > > GET - the form data that came in via GET > POST - the form data that came in via POST > REQUEST - the above two dictionaries combined (POST over-riding GET) > > However, there is one special case that needs considering: multiple > form fields of the same name. For example, the following URL: > > script.py?a=1&a=2 > > What should GET['a'] return? There are three possibilities: return a > list (or tuple) containing 1 and 2, or return 1 (the first value) or > return 2 (the second value). The first has a huge disadvantage in that > all of a sudden accessing the GET dictionary could return a list or a > string - code will then have to start checking the type of the > returned data before doing anything with it. The second and third have > the disadvantage that some form data gets "lost" by the dictionary.
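Simon's PHP bracket convention (a[]=1&a[]=2 giving a list, a[first]=1 giving a dictionary) can be prototyped in a few lines on top of the standard query parser; this sketch is illustrative only, uses string values throughout, and ignores PHP's handling of nested brackets:

```python
from urllib.parse import parse_qsl

def php_style_fields(query):
    """Group query parameters the way PHP does: plain names keep the
    last value, 'name[]' accumulates a list, 'name[key]' builds a dict."""
    fields = {}
    for name, value in parse_qsl(query):
        if name.endswith("[]"):
            fields.setdefault(name[:-2], []).append(value)
        elif "[" in name and name.endswith("]"):
            base, key = name[:-1].split("[", 1)
            fields.setdefault(base, {})[key] = value
        else:
            fields[name] = value  # later values overwrite earlier ones
    return fields

print(php_style_fields("a=1&a=2"))                 # {'a': '2'}
print(php_style_fields("a[]=1&a[]=2"))             # {'a': ['1', '2']}
print(php_style_fields("a[first]=1&a[second]=2"))  # {'a': {'first': '1', 'second': '2'}}
```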
I think this is already really decided -- if (and only if) there are multiple values, then a list should appear in the output. I.e., {'a': ['1', '2']}. This is how cgi works, and how almost all Python request objects work. When there's near-consensus in previous implementations, I think we should keep the conventional behavior. Plus, it means less things to decide, which should make the design faster to create. There are more elegant ways to deal with this, but they are also more complex, and there's no One Right Way. The conventional way throws nothing away, can be adapted to any previously existing URL scheme, and does not require any trust in the user agent to submit correct input. It's also easy to go from the conventional input to another style of input, but difficult to go the other way. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From barry at python.org Fri Oct 24 11:44:49 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 11:44:55 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <1067010289.11634.378.camel@anthem> On Fri, 2003-10-24 at 11:35, Ian Bicking wrote: > I think this is already really decided -- if (and only if) there are > multiple values, then a list should appear in the output. I.e., {'a': > ['1', '2']}. This is how cgi works, and how almost all Python request > objects work. When there's near-consensus in previous implementations, > I think we should keep the conventional behavior. Plus, it means less > things to decide, which should make the design faster to create. I agree that it's basically decided, but I want to be clear in any standard that we develop, exactly what the return types are in that case, and/or how to test for one or the other. E.g. you can't use len() because both lists and strings are sequences. If the way to type test the value is going to be "isinstance(val, list)", let's set that in stone. 
Here's another alternative, if Python 2.2 is the minimal requirement (and I think it should be, if not Python 2.3). Return string and list subclasses, which will act perfectly string-like and list-like in those contexts, but which support extended protocols. See attached example. >>> show(s) single value: hello >>> show(l) multi value: hello, world -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: websig.py Type: text/x-python Size: 307 bytes Desc: Url : http://mail.python.org/pipermail/web-sig/attachments/20031024/0ba07080/websig.py From grisha at modpython.org Fri Oct 24 11:47:19 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 11:47:24 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <200310241009.39834.t.vandervossen@fngtps.com> References: <200310240946.37859.t.vandervossen@fngtps.com> <3F98DB8B.1000605@sjsoft.com> <200310241009.39834.t.vandervossen@fngtps.com> Message-ID: <20031024113028.P26153@onyx.ispol.com> On Fri, 24 Oct 2003, Thijs van der Vossen wrote: > On Friday 24 October 2003 09:58, David Fraser wrote: > > Thijs van der Vossen wrote: > > >Mod_python is probably _not_ a good starting point for a generic web > > > server API because it's purpose is to directly expose the Apache API. It > > > makes no sense to model a generic interface on a mostly direct mapping to > > > the internals of _one_ specific server. > > > > I'm not saying that the interface should be modelled on mod_python. > > Ok. That's clear then. I don't know how useful the mod_python interface would be since, as Thijs pointed out, it exposes the Apache API, with only a slight effort to make it user-friendly. All of the "cool" parts of mod_python (publisher, psp) exist as a layer on top of the core API. However it would be nice if whatever API we come up with, it would be *implementable* within mod_python. 
Of particular concern would be the multi-process nature of httpd, which implies that one cannot simply assume that the memory space is global to all requests and there needs to be an inter-process communication/locking mechanism if state is to be maintained on the server side (easier said than done). As a sidenote, a multi-process server is a feature, not a limitation, because it works around the Python GIL bottleneck, allowing you to take advantage of multiprocessor machines, which is a very important consideration in high-end applications. Grisha From ianb at colorstudy.com Fri Oct 24 11:48:56 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:49:01 2003 Subject: [Web-SIG] Client-side API In-Reply-To: <03Oct23.191158pdt."58611"@synergy1.parc.xerox.com> Message-ID: <939C9B28-0639-11D8-A49B-000393C2D67E@colorstudy.com> On Thursday, October 23, 2003, at 09:11 PM, Bill Janssen wrote: > Another possibility would be to mimic the Java 1.4.1 libraries for the > Web. For instance, we could have the "URL" object, which has a method > called "open()", While I wouldn't necessarily endorse copying a Java library -- because there's good stuff in Python already -- there were some ideas about unifying filesystem and URL access with the path module as a model: http://www.jorendorff.com/articles/python/path/ http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Fri Oct 24 11:54:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 11:54:07 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98D709.9070806@sjsoft.com> Message-ID: <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> Some minor nits...
On Friday, October 24, 2003, at 02:38 AM, David Fraser wrote: > For CGI, it would seem to make sense that you do something like the > following: > res = web.cgi.HTTPResponse(sys.stdout) req = web.cgi.HTTPRequest() res = req.response > res.content_type = 'text/html' res.setHeader('content-type', 'text/html') # I don't really see a reason that this header needs special attention > res.set_cookie('name', 'Simon') > res['X-Additional-Header'] = 'Another header' res.setHeader('X-additional-header', 'Another header') # It's not clear what dictionary access to the response object would mean. # res.headers['X-additional-header'] = 'Another header' might be okay # but it makes it difficult to add multiple headers by the same name -- but # I don't know if HTTP ever really calls for that anyway. > res.send_headers() > res.write('
<html><body>Hi there</body></html>\n%s' % body) # This, but also: res.write('<html><body>Hi there</body></html>
\n%s' % body) res.setHeader('X-Yet-Another-Header', 'Yet another value') res.commit() # res.flush()? Sends headers *and* any body, can be called multiple times res.setHeader('Content-type', 'text/plain') # raises exception res.write('') # does not raise exception -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Fri Oct 24 11:56:40 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 11:58:18 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F994469.20304@bath.ac.uk> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> Message-ID: <20031024114945.M26153@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > script.py?a=1&a=2 > > What should GET['a'] return? I think this is adequately addressed in the FieldStorage starting with Python 2.2 with getfirst() and getlist(): http://www.python.org/doc/current/lib/node404.html Grisha From ianb at colorstudy.com Fri Oct 24 12:01:51 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 12:03:27 2003 Subject: [Web-SIG] Request/Response features In-Reply-To: <3F98DD29.30706@sjsoft.com> Message-ID: <613CA59C-063B-11D8-A49B-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 03:04 AM, David Fraser wrote: >>> Some other comments: >>> pathInfo/requestURI >>> would be good to have some consistency between these names >> >> They are mostly based off their CGI environment equivalent. > > OK, now I understand, but is this a good way to name things for the > future? Pluses: CGI request variable names are being used by nearly every framework. They are translatable from other languages, where they are also often used. They are familiar and documented. If we don't use CGI names, then we can't use the names at all, as it would be a bad false cognate to use (for instance) requestURI and extraPath. 
Minuses: CGI variables don't have well standardized (or implemented) semantics. IIS and Apache (at least) send slightly different things. But we can probably paper over those differences as they arise. >>> getFieldDict >>> It would be great if the user could set the behaviour they want for >>> multiple keys. >>> I know I *always* want to discard any extra values. Including an >>> option to do this rather than return a list would prevent lots of >>> people doing post-processing >> >> That seems too difficult to define. I don't think there should be >> customizations, because that makes it too difficult to work in a >> heterogeneous environment. If you turn that setting on and some >> application you are using needs it off, then you get a configuration >> mess. Wrappers could provide more friendly interfaces. > > If you defined a setField method as you said above, then people could > override it to throw away duplicate values. Maybe this is the way to > go No, I retract my suggestion for setField ;) This could be handled by a wrapper, like:

def getField(self, key):
    try:
        value = HTTPRequest.getField(self, key)
    except KeyError:
        value = HTTPRequest.getField(self, key + '[]')
        if not isinstance(value, list):
            return [value]
        return value
    if isinstance(value, list):
        return value[0]
    return value

This can be adapted to whatever equivalent of getField we use. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From neel at mediapulse.com Fri Oct 24 12:02:18 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 24 12:03:41 2003 Subject: [Web-SIG] Re: Form field dictionaries Message-ID: > I agree that it's basically decided, but I want to be clear in any > standard that we develop, exactly what the return types are in that > case, and/or how to test for one or the other. E.g. you > can't use len() > because both lists and strings are sequences. If the way to type test > the value is going to be "isinstance(val, list)", let's set that in > stone.
> I've always used if type(val) == type([]), because I can never remember type names =p Mike From greg-keyword-python.0eae23 at subrosa.ca Fri Oct 24 12:23:49 2003 From: greg-keyword-python.0eae23 at subrosa.ca (Gregory Collins) Date: Fri Oct 24 12:17:22 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024114945.M26153@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> Message-ID: <87he1yhaei.fsf@genghis.subrosa.ca> "Gregory (Grisha) Trubetskoy" writes: > On Fri, 24 Oct 2003, Simon Willison wrote: > > > script.py?a=1&a=2 > > > > What should GET['a'] return? > > I think this is adequately addressed in the FieldStorage starting with > Python 2.2 with getfirst() and getlist(): I agree, I think this is the appropriate solution; I'd rather see all the typechecking pushed down into the library function rather than being exposed to the programmer. If the argument I'm looking for doesn't make sense as a list then I wouldn't care if it was given twice; if I'm expecting something to be a list then I'd want it to be a list even if it were empty or singleton. Gregory D. Collins From davidf at sjsoft.com Fri Oct 24 12:28:28 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 12:28:33 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024114945.M26153@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> Message-ID: <3F99532C.9020308@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, Simon Willison wrote: > > > >>script.py?a=1&a=2 >> >>What should GET['a'] return? 
>> >> > >I think this is adequately addressed in the FieldStorage starting with >Python 2.2 with getfirst() and getlist(): > >http://www.python.org/doc/current/lib/node404.html > >Grisha > > > That's fine, but I think it's important that these methods are available as an addition to a standard dictionary interface. I think the key point is, if somebody wants a list of values, they probably know that they want a list. It's very difficult to write code by accident that would handle a list of values as well as a string. So if somebody knows they want a list in certain circumstances, they could call getlist() But I think the default dictionary return value should be the same as getfirst(). That saves endless checks for lists for those who don't need them. David From cs1spw at bath.ac.uk Fri Oct 24 12:31:13 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 12:31:34 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F9953D1.5050604@bath.ac.uk> Ian Bicking wrote: > I think this is already really decided -- if (and only if) there are > multiple values, then a list should appear in the output. I.e., {'a': > ['1', '2']}. This is how cgi works, and how almost all Python request > objects work. I don't have enough practical Python web development experience to back this up, but it seems to me that this could lead to an awful lot of unhandled exceptions. For example: username = GET['username'].lower() This would work fine provided no one fed two username values to the script, at which point it would die with an exception: Traceback (most recent call last): File "", line 1, in -toplevel- GET['username'].lower() AttributeError: 'list' object has no attribute 'lower' Adding exception handling to every piece of code that accesses string values from a form field dictionary would be a pretty tall order. One alternative might be some kind of enhanced form field access object that adds a layer of validation. 
For example:

form = web.cgi.ValidatingForm()
try:
    username = form.get_string('username')
    id = form.get_int('id')
    permissions = form.get_list('permissions')
except ValidationError:
    print 'Invalid form data'
    redisplayform()

Form validation like this though is really a whole other topic. -- Simon Willison Web development weblog: http://simon.incutio.com/ From davidf at sjsoft.com Fri Oct 24 12:31:13 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 12:31:36 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F9953D1.6050407@sjsoft.com> Michael C. Neel wrote: >>I agree that it's basically decided, but I want to be clear in any >>standard that we develop, exactly what the return types are in that >>case, and/or how to test for one or the other. E.g. you >>can't use len() >>because both lists and strings are sequences. If the way to type test >>the value is going to be "isinstance(val, list)", let's set that in >>stone. >> >> >I've always used if type(val) == type([]), because I can never remember >type names =p > >Mike > > If the list is actually a class derived from a list, then that won't catch it. That's why isinstance is used here David From barry at python.org Fri Oct 24 12:36:52 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 12:36:57 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <3F9953D1.6050407@sjsoft.com> References: <3F9953D1.6050407@sjsoft.com> Message-ID: <1067013412.10257.9.camel@anthem> On Fri, 2003-10-24 at 12:31, David Fraser wrote: > If the list is actually a class derived from a list, then that won't > catch it. That's why isinstance is used here Right, and remember since Python 2.2, the type names (well for strings and lists) are what used to be built-in coercion functions in earlier Pythons. But I'd love to see a solution that didn't require type tests.
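One shape a type-test-free solution could take is the string/list subclass idea from Barry's earlier (scrubbed) websig.py attachment; this sketch is a guess at that idea, not the original code, and the class and method names are invented:

```python
class SingleValue(str):
    """A string that also answers the list-style protocol."""
    def getfirst(self):
        return str(self)
    def getlist(self):
        return [str(self)]

class MultiValue(list):
    """A list of strings with the same extended protocol."""
    def getfirst(self):
        return self[0]
    def getlist(self):
        return list(self)

def show(value):
    # Either type can be handled without an isinstance() check,
    # because both support getlist()/getfirst().
    print("values:", ", ".join(value.getlist()))

show(SingleValue("hello"))            # values: hello
show(MultiValue(["hello", "world"]))  # values: hello, world
```

Because SingleValue is still a real str (and MultiValue a real list), existing code like value.lower() or len(value) keeps working; only code that cares about multiplicity needs the extended methods.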
-Barry From ianb at colorstudy.com Fri Oct 24 12:36:59 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 12:37:04 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> Message-ID: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: > That's fine, but I think it's important that these methods are > available as an addition to a standard dictionary interface. > I think the key point is, if somebody wants a list of values, they > probably know that they want a list. > It's very difficult to write code by accident that would handle a list > of values as well as a string. > So if somebody knows they want a list in certain circumstances, they > could call getlist() > But I think the default dictionary return value should be the same as > getfirst(). > That saves endless checks for lists for those who don't need them. Every time I have encountered an unexpected list it has been because of a bug somewhere else in my code. I might use a getone() method that threw some exception when a list was encountered, but I'd *never* want to use getfirst(). getfirst() is sloppy programming. 
(getlist() is perfectly fine though)

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From cs1spw at bath.ac.uk Fri Oct 24 12:37:06 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 12:37:16 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <87he1yhaei.fsf@genghis.subrosa.ca> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> Message-ID: <3F995532.9040309@bath.ac.uk>

Gregory Collins wrote:

>> I think this is adequately addressed in the FieldStorage starting with
>> Python 2.2 with getfirst() and getlist():
>
> I agree, I think this is the appropriate solution; I'd rather see all
> the typechecking pushed down into the library function rather than
> being exposed to the programmer. If the argument I'm looking for
> doesn't make sense as a list then I wouldn't care if it was given
> twice; if I'm expecting something to be a list then I'd want it to be
> a list even if it were empty or singleton.

The vast majority of data sent from forms comes in as simple name/value pairs, which are crying out to be accessed from a dictionary. This is my problem with the current FieldStorage() class - it forces you to write code like this:

    username = form.getfirst("username", "")

when code like this is far more intuitive:

    username = form['username']

The extended syntax is there to deal with the very rare case of multiple data arriving for the same key. Is it really worth doubling the length of the code needed to access the form variables for the sake of a very rare edge case? This is why I'd prefer to find an alternative solution.
-- Simon Willison Web development weblog: http://simon.incutio.com/

From barry at python.org Fri Oct 24 12:40:10 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 12:40:14 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <3F99532C.9020308@sjsoft.com> Message-ID: <1067013609.10257.12.camel@anthem>

BTW, I'll note, interestingly enough, that in some recent cgi-ish applications I've written, I've always wanted __getitem__() to return a list. If there's one form variable by that name, I coerce the singleton string to a list of one element. For various reasons, it's been quite handy to treat everything uniformly in this manner. Maybe that's another option for the library.

-Barry

From greg-keyword-python.0eae23 at subrosa.ca Fri Oct 24 12:50:01 2003 From: greg-keyword-python.0eae23 at subrosa.ca (Gregory Collins) Date: Fri Oct 24 12:43:06 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F995532.9040309@bath.ac.uk> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: <87d6cmh96u.fsf@genghis.subrosa.ca>

Simon Willison writes:

> The extended syntax is there to deal with the very rare case of
> multiple data arriving for the same key. Is it really worth doubling
> the length of the code needed to access the form variables for the
> sake of a very rare edge case? This is why I'd prefer to find an
> alternative solution.

So if you access the object as a dictionary, should it behave as getfirst() or not? I'd argue for the former; in the rare instances you'd want a list I don't think it's onerous to have to type out obj.getlist("foo").

Gregory D.
Collins

From cs1spw at bath.ac.uk Fri Oct 24 13:01:49 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 13:02:34 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99532C.9020308@sjsoft.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <3F99532C.9020308@sjsoft.com> Message-ID: <3F995AFD.1010607@bath.ac.uk>

David Fraser wrote:

> It's very difficult to write code by accident that would handle a list
> of values as well as a string.
> So if somebody knows they want a list in certain circumstances, they
> could call getlist()
> But I think the default dictionary return value should be the same as
> getfirst().
> That saves endless checks for lists for those who don't need them.

+1 - that sounds like a nice compromise

Simon

From ianb at colorstudy.com Fri Oct 24 13:02:23 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 13:02:41 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067013609.10257.12.camel@anthem> Message-ID:

On Friday, October 24, 2003, at 11:40 AM, Barry Warsaw wrote:

> BTW, I'll note interestingly enough that in some recent cgi-ish
> applications I've written, I've always wanted __getitem__() to return a
> list. If there's one form variable by that name, I coerce the singleton
> string to a list of one element. For various reasons, it's been quite
> handy to treat everything uniformly in this manner.
>
> Maybe that's another option for the library.

No, please no options! You already could get this through some getlist() method, or just make a wrapper, or just fiddle with the request in place:

    req.lfields = {}
    for name, value in req.fields():  # or whatever
        if not isinstance(value, list):
            value = [value]
        req.lfields[name] = value

Any of these will leave the request object usable by other code that expects normal behavior.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From sholden at holdenweb.com Fri Oct 24 13:17:49 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 13:22:51 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <3F98D5C7.8030802@sjsoft.com> Message-ID:

> > The one problem I see with mod_python is its defaulting behavior - you
> > can get the same content several different ways. Specifically, the
> > following URLs
> >
> >     http://server/
> >     http://server/index.py
> >     http://server/index.py.index
> >
> > all refer to the same content, and this makes it rather difficult to
> > come up with a scheme for producing sensible relative URLs -- the
> > browsers don't always interpret the path the same way the server does --
> > which in turn can make it difficult to produce easily portable web
> > content.
>
> Hmmm ... looks like you are using AddHandler for .py files. I generally
> find that placing the Python files outside of the web directory, in
> libraries, works better. Then you can use SetHandler to get mod_python
> to handle everything, or AddHandler for specific file types to get it to
> handle some URLs. It makes more sense to me to have a URL of index.htm
> rather than index.py (why should the user care what I'm using to produce
> the file?)
>
> Hope that is relevant and/or helpful

Both, thanks very much. I only recently started using mod_python - it's already been pointed out that my complaint is specific to the publisher subsystem, and now I have what looks like a much better idea. Thanks a lot. The real problem is I just did what many new users do, and followed along from the documentation. Which, by the way, is rather better than for many other pieces of open source software, but there's always room for improvement. Thanks again!
regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From sholden at holdenweb.com Fri Oct 24 13:27:08 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 13:32:03 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <1067010289.11634.378.camel@anthem> Message-ID:

> -----Original Message-----
> From: web-sig-bounces+sholden=holdenweb.com@python.org
> [mailto:web-sig-bounces+sholden=holdenweb.com@python.org] On Behalf Of
> Barry Warsaw
> Sent: Friday, October 24, 2003 11:45 AM
> To: Ian Bicking
> Cc: web-sig@python.org
> Subject: Re: [Web-SIG] Re: Form field dictionaries
>
> On Fri, 2003-10-24 at 11:35, Ian Bicking wrote:
>
> > I think this is already really decided -- if (and only if) there are
> > multiple values, then a list should appear in the output. I.e., {'a':
> > ['1', '2']}. This is how cgi works, and how almost all Python request
> > objects work. When there's near-consensus in previous implementations,
> > I think we should keep the conventional behavior. Plus, it means less
> > things to decide, which should make the design faster to create.
>
> I agree that it's basically decided, but I want to be clear in any
> standard that we develop, exactly what the return types are in that
> case, and/or how to test for one or the other. E.g. you can't use len()
> because both lists and strings are sequences. If the way to type test
> the value is going to be "isinstance(val, list)", let's set that in
> stone.
>
> Here's another alternative, if Python 2.2 is the minimal requirement
> (and I think it should be, if not Python 2.3). Return string and list
> subclasses, which will act perfectly string-like and list-like in those
> contexts, but which support extended protocols. See attached example.
>
> >>> show(s)
> single value: hello
> >>> show(l)
> multi value: hello, world

I've argued in the past that the correct approach is to determine in advance which fields can take multiple values, and reject multiple values for other fields as an error early in the form processing. The reason I say this is that it's annoying and inefficient to distinguish between a possibly-multi-valued field with only one value and a possibly-multi-valued field with multiple values.

Here's an excerpt from a message I sent to ReportLab colleagues, and although it refers to a specific framework the intent should be obvious. The bottom line is that if a field *could* have multiple values I *always* want to see it as a list, even if the list only has a single member. And, of course, I *know* I'm right about this :-)

> def getCGIParams(*names):
>     "returns dictionary of parameters found in the cgi script"
>     dictionary = {}
>     import cgi
>     form = cgi.FieldStorage()
>     for name in form.keys():
>         value = form.getvalue(name)
>         if isinstance(value, list):
>             if name not in names:
>                 raise IllegalMultiValue
>             dictionary[name] = [quoteValue(v) for v in value]
>         else:
>             if name in names:
>                 dictionary[name] = [quoteValue(value)]
>             else:
>                 dictionary[name] = quoteValue(value)
>     return dictionary
>
> This has the advantages that a) possibly-multi-valued arguments
> are always represented as lists, and client code can use
> len(arglist) to determine iteration count and subscripting
> to select items, and b) we trap much earlier the case
> where we see multiple values of arguments that are only
> supposed to occur once.
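[Steve's approach -- declare up front which fields may be multi-valued, and reject anything else early -- can be restated against a raw query string. This is an illustrative rewrite, not Steve's actual code: the function and exception names are made up, and in current Python the parse_qs parser lives in urllib.parse rather than the cgi module.]

```python
from urllib.parse import parse_qs  # modern home of the old cgi.parse_qs

class IllegalMultiValue(Exception):
    """Raised when a single-valued field arrives with multiple values."""

def get_params(query, multi_ok=()):
    # parse_qs always maps each name to a list of values.
    params = {}
    for name, values in parse_qs(query).items():
        if name in multi_ok:
            # Declared multi-valued: always a list, even with one member.
            params[name] = values
        elif len(values) > 1:
            # Trap the bad request early, before application code runs.
            raise IllegalMultiValue(name)
        else:
            params[name] = values[0]
    return params
```

[With this shape, get_params("a=1&b=2&b=3", multi_ok=("b",)) yields {'a': '1', 'b': ['2', '3']}, while an undeclared duplicate such as "a=1&a=2" raises IllegalMultiValue before any application code sees it.]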
regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From ianb at colorstudy.com Fri Oct 24 13:39:02 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 13:39:06 2003 Subject: [Web-SIG] Exceptions (was: Form field dictionaries) In-Reply-To: Message-ID:

On Friday, October 24, 2003, at 12:27 PM, Steve Holden wrote:

> I've argued in the past that the correct approach is to determine in
> advance which fields can take multiple values, and reject multiple
> values for other fields as an error early in the form processing.

This brings up error handling. If you encounter a bad request (e.g., multiple fields where they're not expected), what do you do? An internal exception isn't good, because it's not really an internal error -- you get Internal Server Error, log messages imply your code is broken, etc. It would be nice instead to be able to throw an exception that would be translated into the proper response (in the CGI environment this could use a process like cgitb; in other environments the hook is a bit easier to put in). Of course, once you have a bad request exception, redirect, forbidden, authentication required, and other responses all make sense as exceptions too...
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From davidf at sjsoft.com Fri Oct 24 13:53:14 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 13:53:25 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: References: Message-ID: <3F99670A.8050901@sjsoft.com>

Steve Holden wrote:

> > On Fri, 2003-10-24 at 11:35, Ian Bicking wrote:
> >
> > > I think this is already really decided -- if (and only if) there are
> > > multiple values, then a list should appear in the output. I.e., {'a':
> > > ['1', '2']}. This is how cgi works, and how almost all Python request
> > > objects work. When there's near-consensus in previous implementations,
> > > I think we should keep the conventional behavior. Plus, it means less
> > > things to decide, which should make the design faster to create.
> >
> > I agree that it's basically decided, but I want to be clear in any
> > standard that we develop, exactly what the return types are in that
> > case, and/or how to test for one or the other. E.g. you can't use len()
> > because both lists and strings are sequences. If the way to type test
> > the value is going to be "isinstance(val, list)", let's set that in
> > stone.
> >
> > Here's another alternative, if Python 2.2 is the minimal requirement
> > (and I think it should be, if not Python 2.3). Return string and list
> > subclasses, which will act perfectly string-like and list-like in those
> > contexts, but which support extended protocols. See attached example.
> >
> > >>> show(s)
> > single value: hello
> > >>> show(l)
> > multi value: hello, world
>
> I've argued in the past that the correct approach is to determine in
> advance which fields can take multiple values, and reject multiple
> values for other fields as an error early in the form processing.
>
> The reason I say this is that it's annoying and inefficient to
> distinguish between a possibly-multi-valued field with only one value
> and a possibly-multi-valued field with multiple values.
>
> Here's an excerpt from a message I sent to ReportLab colleagues, and
> although it refers to a specific framework the intent should be obvious.
> The bottom line is that if a field *could* have multiple values I
> *always* want to see it as a list, even if the list only has a single
> member. And, of course, I *know* I'm right about this :-)

Agreed, you should get a list, and only a list, if you want a list. You should otherwise get a single value regardless.

> def getCGIParams(*names):
>     "returns dictionary of parameters found in the cgi script"
>     dictionary = {}
>     import cgi
>     form = cgi.FieldStorage()
>     for name in form.keys():
>         value = form.getvalue(name)
>         if isinstance(value, list):
>             if name not in names:
>                 raise IllegalMultiValue
>             dictionary[name] = [quoteValue(v) for v in value]
>         else:
>             if name in names:
>                 dictionary[name] = [quoteValue(value)]
>             else:
>                 dictionary[name] = quoteValue(value)
>     return dictionary
>
> This has the advantages that a) possibly-multi-valued arguments
> are always represented as lists, and client code can use
> len(arglist) to determine iteration count and subscripting
> to select items, and b) we trap much earlier the case
> where we see multiple values of arguments that are only
> supposed to occur once.
This kind of thing could be fairly simply implemented on top of the API, so the question is, is it common and important enough to be included, and what should the syntax be?

David

From davidf at sjsoft.com Fri Oct 24 14:00:30 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 24 14:00:53 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> Message-ID: <3F9968BE.1010009@sjsoft.com>

Ian Bicking wrote:

> On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote:
>
>> That's fine, but I think it's important that these methods are
>> available as an addition to a standard dictionary interface.
>> I think the key point is, if somebody wants a list of values, they
>> probably know that they want a list.
>> It's very difficult to write code by accident that would handle a
>> list of values as well as a string.
>> So if somebody knows they want a list in certain circumstances, they
>> could call getlist()
>> But I think the default dictionary return value should be the same as
>> getfirst().
>> That saves endless checks for lists for those who don't need them.
>
> Every time I have encountered an unexpected list it has been because
> of a bug somewhere else in my code. I might use a getone() method
> that threw some exception when a list was encountered, but I'd *never*
> want to use getfirst(). getfirst() is sloppy programming. (getlist()
> is perfectly fine though)

There seems to be a lot of agreement on this... So let's take it that the interface will be a dictionary, with an extra method defined, getlist, which will return multiple items if multiple items were defined, or a list containing a single item otherwise. The next question is, how do we handle the Get/Post/Both situation?
One way would be to have methods on the request object that return the desired dictionary. Somebody also suggested including Cookies, as is done in PHP - I'm not sure this is a good idea.

David

From grisha at modpython.org Fri Oct 24 14:55:07 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 14:55:13 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> <3F9968BE.1010009@sjsoft.com> Message-ID: <20031024141819.R70244@onyx.ispol.com>

On Fri, 24 Oct 2003, David Fraser wrote:

> The next question is, how do we handle the Get/Post/Both situation?

Just to clarify nomenclature - POST /blah/blah.py?foo=bar is a valid request. The part after ? is called "query information"; this is defined in RFC 1808 and RFC 1738. CGI (which has no formal RFC, but there is Ken Coar's excellent draft) introduces something called "path-info", but its meaning is rather vague outside of CGI since it relies on the notion of a script, which isn't very meaningful in most non-CGI environments. The data submitted in the body of the POST request is called "form data" and I believe is described in RFC 1867. I think that query information and form data can be combined in a single mapping object, because if you want just query data, you can always parse the url directly via urlparse, and if you want only form data, you can read and parse it directly as a mime object. Path-info I think should be left where it belongs - in the cgi-specific module.
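[Grisha's suggested combination -- one mapping that folds query information and form data together -- might look something like the sketch below. The function name and the merge policy (query-string pairs first, then body pairs for the same name) are assumptions, not anything the thread settled on; urlsplit and parse_qs are standard urllib.parse functions.]

```python
from urllib.parse import urlsplit, parse_qs

def combined_fields(url, post_body=""):
    """Merge query information and form data into one name -> [values] map.

    Query-string pairs come first, then POST-body pairs for the same
    name are appended, so neither source of data is thrown away.
    """
    fields = parse_qs(urlsplit(url).query)
    for name, values in parse_qs(post_body).items():
        fields.setdefault(name, []).extend(values)
    return fields
```

[So a POST to /blah/blah.py?foo=bar with body foo=baz&x=1 yields {'foo': ['bar', 'baz'], 'x': ['1']} -- both the query value and the form value for foo survive in order.]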
Grisha From sholden at holdenweb.com Fri Oct 24 14:52:20 2003 From: sholden at holdenweb.com (Steve Holden) Date: Fri Oct 24 14:57:15 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> Message-ID: > -----Original Message----- > From: web-sig-bounces+sholden=holdenweb.com@python.org > [mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of > David Fraser > Sent: Friday, October 24, 2003 2:01 PM > To: Ian Bicking > Cc: web-sig@python.org > Subject: Re: [Web-SIG] Form field dictionaries > > > Ian Bicking wrote: > > > On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: > > > >> That's fine, but I think it's important that these methods are > >> available as an addition to a standard dictionary interface. > >> I think the key point is, if somebody wants a list of values, they > >> probably know that they want a list. > >> It's very difficult to write code by accident that would handle a > >> list of values as well as a string. > >> So if somebody knows they want a list in certain > circumstances, they > >> could call getlist() > >> But I think the default dictionary return value should be > the same as > >> getfirst(). > >> That saves endless checks for lists for those who don't need them. > > > > > > Every time I have encountered an unexpected list it has > been because > > of a bug somewhere else in my code. I might use a getone() method > > that threw some exception when a list was encountered, but > I'd *never* > > want to use getfirst(). getfirst() is sloppy programming. > (getlist() > > is perfectly fine though) > > There seems to be a lot of agreement on this... > So let's take it that the interface will be a dictionary, > with an extra > method defined, getlist, which will return multiple items if multiple > items were defined, or a list containing a single item otherwise. > The next question is, how do we handle the Get/Post/Both situation? 
> One way would be to have methods on the request object that return the
> desired dictionary. Somebody also suggested including Cookies, as is
> done in PHP - I'm not sure this is a good idea.

The only nit I would pick is to have getlist() return a list even when the response contained a single value.

regards

-- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/

From ianb at colorstudy.com Fri Oct 24 15:11:54 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 15:12:02 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9968BE.1010009@sjsoft.com> Message-ID:

On Friday, October 24, 2003, at 01:00 PM, David Fraser wrote:

> Ian Bicking wrote:
>
>> Every time I have encountered an unexpected list it has been because
>> of a bug somewhere else in my code. I might use a getone() method
>> that threw some exception when a list was encountered, but I'd
>> *never* want to use getfirst(). getfirst() is sloppy programming.
>> (getlist() is perfectly fine though)
>
> There seems to be a lot of agreement on this...
> So let's take it that the interface will be a dictionary, with an
> extra method defined, getlist, which will return multiple items if
> multiple items were defined, or a list containing a single item
> otherwise.

Additionally, getlist should return the empty list if the key isn't found, as this follows naturally (but a KeyError for normal access when a value isn't found). I also think cgi's default of throwing away empty fields should not be supported, even optionally. But I haven't really heard reaction to the idea that you get a BadRequest or other exception if you try to get a key that has multiple values. Throwing information away is bad, and unPythonic (though very PHPish). I don't think we should copy PHP here.
I have *never* encountered a situation where throwing away extra values found in the query is the correct solution. Either the form that is doing the submission has a bug, or else the script needs to figure out some (explicit!) way to handle the ambiguity.

We also need a way to get at the raw values. I suppose you could do:

    fields = {}
    for key in req.fields:
        v = req.getlist(key)
        if len(v) == 1:
            fields[key] = v[0]
        else:
            fields[key] = v

But that's kind of annoying, since the request object probably contains this kind of dictionary already. This will be required for backward compatibility, if we want this request to be wrapped to support existing request interfaces.

As long as we're thinking about type information, there's also file uploads. cgi makes them look like normal fields, but at considerable expense to the overall API (always using .value). Everyone else puts the file-like objects into the variable, so you might end up testing:

    val = req['somefield']
    try:
        val = val.read()
    except AttributeError:
        pass

Most of the time this isn't required, as you will seldom get a file upload from a source where you don't expect it. But though less common, it's the same basic issue as the list issue.

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From jjl at pobox.com Fri Oct 24 15:12:26 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 24 15:12:31 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID:

On Fri, 24 Oct 2003, David Fraser wrote:

> Thijs van der Vossen wrote:

[...]

> Actually HTML parsing would be fantastic for testing web applications,
> so maybe that could be related to the Web API.

There's already HTML parsing in the std lib, of course. Do you mean a DOM-like API of some kind? What in particular? I'm not certain there would be agreement about what is needed.
> The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. [...] Hmm, well, it's simple when it's valid, and especially when it doesn't miss out optional tags, etc. Could you specify more closely what you have in mind? John From jjl at pobox.com Fri Oct 24 15:31:03 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 24 15:31:27 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F993C3C.5080402@metamoof.net> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> Message-ID: On Fri, 24 Oct 2003, Moof wrote: > David Fraser wrote: > > > > Actually HTML parsing would be fantastic for testing web > applications, > so maybe that could be related to the Web API. > > > Actually, that is a very important point. Many python programmers are > fans of Test-driven development. I'm currently developing an app with > Webware and Cheetah, and find it very difficult to write tests for a lot > of the stuff I do. This is mostly due to a huge amount of background > work I need to do to set up an emulation environment first (make sure my > request and session objects work correctly as far as I need them to for > my testing, replacing the Page write and writeln methods, and so on) and > even then, verifying a whole generated page is a pain. Isn't it the fault of the framework you're using if it doesn't make unit testing easy? Still, I guess it's true that HTML parsing is a necessary part of some unit tests (not only functional tests). > So a standard HTML parser would be nice, as well as keeping TDD in mind > when we design request and response (and possibly session) objects. We already have an HTML parser (two, in fact). > > The parsing doesn't have to be very intelligent or do validation, > HTML > syntax is fairly simple. > > I think that does belong in the standard library. 
> Speaking of validation, a sort of standard form validation library would
> be nice: something to say "I'm expecting this value to be an int between
> 1-31" or "I'm expecting this to be a string with the following legal
> characters" and so on. It's not that difficult to write yourself, but I
> seem to find myself reinventing the wheel every time I do. A standard
> "best practice" way of doing this would be wonderful.

I guess that would look similar to ClientForm? If not, what? I'm not enthusiastic about putting something like that in the std lib. One reason is that, unless you build it on top of a DOM-like API, you end up with a library that gets you 'so far and no further' -- as soon as you need to know what that URL is (the one underneath the third table from the top), you're stuck, because the parser that built the forms object model didn't record that. So it makes a lot of sense to build this kind of forms- and tables-parsing code on top of a DOM-like API that represents the whole document, not just the forms and/or the tables. And if you're going to be DOM-*like*, it makes sense to do it on top of the HTML DOM *itself*, so you can support embedded scripting. But HTML DOM 'as deployed' in browsers is not pretty, and doesn't really belong in the standard library.

Well, that was the train of thought that led to DOMForm, anyway. I can see that embedded scripting might be of little interest to many people, so maybe there's a place for a Pythonic HTML DOM-like API in the std lib. Does anybody else care about interoperability with the HTML DOM proper, or is it just me?
John From gstein at lyra.org Fri Oct 24 16:20:28 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:26:13 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: <20031024132028.C15765@lyra.org> In the most recent incarnation of a webapp of mine (subwiki), I almost went with a request/response object paradigm and even started a bit of refactoring along those lines. However, I ended up throwing out that dual-object concept. When you stop and think about it: *every* request object will have a matching response object. Why have two objects if they come in pairs? You will never see one without the other, and they are intrinsically tied to each other. So why separate them? I set up the subwiki core to instantiate a "handler" each time a request comes in. That Handler instance provides access to the request info, and is the conduit for generating the response. The app dispatches to the appropriate command function, passing the handler. The Handler is actually set up as a base class, with two subclasses so far: cgi, and cmdline. This lets me do some testing from the command line, along with the standard cgi model of access. At some point, I'll implement a mod_python subclass to do the request/response handling. (as a side note, I'll also point out that Apache operates this way, too; everything is based around the request_rec structure; it holds all the input data, output headers, the input and output filter chains, etc) In any kind of server-side framework design, I would give a big +1 to keeping it simple with a single "handler" type of object rather than a dual-object design. 
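[Greg's single-object design could be sketched roughly as below. Everything here is hypothetical (subwiki's actual Handler is not shown in the thread); the point is just that one object carries both the request info and the response conduit, and each environment -- cgi, cmdline, mod_python -- subclasses it:]

```python
class Handler:
    """One object per request: holds request info, conduit for the response."""

    def __init__(self):
        self.headers = []           # outgoing (name, value) pairs
        self.body = []              # accumulated response body chunks

    def set_header(self, name, value):
        self.headers.append((name, value))

    def write(self, text):
        self.body.append(text)

class CmdlineHandler(Handler):
    """Command-line subclass: render the response as plain text for testing."""

    def render(self):
        head = "\n".join("%s: %s" % pair for pair in self.headers)
        return head + "\n\n" + "".join(self.body)

def hello_command(handler):
    # Application code is dispatched with just the handler -- it never
    # sees separate request and response objects.
    handler.set_header("Content-Type", "text/plain")
    handler.write("hello")
```

[A CGI subclass would instead read os.environ and print to stdout in its own conduit methods; the command dispatch code stays identical, which is what makes command-line testing cheap.]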
Cheers,
-g

-- Greg Stein, http://www.lyra.org/

From gstein at lyra.org Fri Oct 24 16:22:03 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:27:06 2003 Subject: [Web-SIG] urllib docs (was: client-side support: PEP 268) In-Reply-To: ; from jjl@pobox.com on Fri, Oct 24, 2003 at 02:13:54AM +0100 References: <20031022165217.I11797@lyra.org> Message-ID: <20031024132203.D15765@lyra.org>

On Fri, Oct 24, 2003 at 02:13:54AM +0100, John J Lee wrote:

> Greg (or anybody else, for that matter), would you mind looking at these
> doc bugs?
>
> http://www.python.org/sf/793553
> http://www.python.org/sf/798244

I avoid the urllib libraries in my client code, and tend to stick to just the httplib connections. I only barely get near those, so I don't have any particular knowledge to fix those doc issues. Sorry :-(

Cheers,
-g

-- Greg Stein, http://www.lyra.org/

From gstein at lyra.org Fri Oct 24 16:24:40 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:29:37 2003 Subject: [Web-SIG] file uploads In-Reply-To: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com>; from janssen@parc.com on Thu, Oct 23, 2003 at 07:08:29PM -0700 References: <03Oct23.190833pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031024132440.E15765@lyra.org>

On Thu, Oct 23, 2003 at 07:08:29PM -0700, Bill Janssen wrote:

> > I certainly think a function for doing file uploads would be great,
> > though.
> ...
> def https_post_multipart(host, port, selector, fields, files):
>     """
>     Post fields and files to an http host as multipart/form-data.
>     FIELDS is a sequence of (name, value) elements for regular form fields.
>     FILES is a sequence of (name, filename [, value]) elements for data to
>     be uploaded as files.
>     Return the server's response page.
>     """
>     content_type, body = encode_multipart_formdata(fields, files)
>     h = httplib.HTTPS(host, port)

Note that that class is deprecated. In any "new" code which is developed [by this SIG], please stick with the HTTP(S)Connection objects.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 16:45:30 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:50:57 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F98DB02.3010407@sjsoft.com>; from davidf@sjsoft.com on Fri, Oct 24, 2003 at 09:55:46AM +0200 References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> Message-ID: <20031024134530.F15765@lyra.org> On Fri, Oct 24, 2003 at 09:55:46AM +0200, David Fraser wrote: >... > Actually HTML parsing would be fantastic for testing web applications, > so maybe that could be related to the Web API. > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. > I think that does belong in the standard library. There has been an HTML parser in the standard library for *YEARS*. I don't think there is an action item here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 16:52:36 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 16:57:35 2003 Subject: [Web-SIG] validation (was: Form field dictionaries) In-Reply-To: ; from sholden@holdenweb.com on Fri, Oct 24, 2003 at 01:27:08PM -0400 References: <1067010289.11634.378.camel@anthem> Message-ID: <20031024135236.G15765@lyra.org> On Fri, Oct 24, 2003 at 01:27:08PM -0400, Steve Holden wrote: >... > I've argued in the past that the correct approach is to determine in > advance which fields can take multiple values, and reject multiple > values for other fields as an error early in the form processing. Actually, I would upgrade this *way* past what you're thinking here. I think that every input/form field should have a definition and associated validation for it. Simple reason: cross-site scripting attacks. CSS attacks are a very real worry, and I think any core, form-handling on the server should provide easy mechanisms for dealing with it. Within ViewCVS, I process all incoming parameters. 
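[Editor's sketch] The whitelist-style parameter validation Greg describes might be sketched like this. The parameter names and patterns below are invented for illustration; they are not ViewCVS's actual table:

```python
import re

# Every legal parameter gets a definition: name -> compiled format pattern.
_legal_params = {
    'rev':  re.compile(r'^\d+$'),      # numeric revision
    'view': re.compile(r'^[a-z]+$'),   # simple keyword
}

class ParamError(Exception):
    pass

def validate_params(params):
    """Reject any parameter that is unrecognized or malformed."""
    for name, value in params.items():
        pattern = _legal_params.get(name)
        if pattern is None:
            raise ParamError('unrecognized parameter: %s' % name)
        if not pattern.match(value):
            raise ParamError('malformed value for %s: %r' % (name, value))
```

Because nothing unrecognized or malformed survives this gate, templates downstream never see attacker-shaped input.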
If the param is not recognized, an error is thrown. If the param does not match a specific format (e.g. numeric or matching ), then an error is thrown. ViewCVS doesn't have multi-valued parameters, but the validation concept could easily test a mismatch between single/multi values. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 24 17:06:44 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 24 17:12:29 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Fri, Oct 24, 2003 at 10:54:01AM -0500 References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> Message-ID: <20031024140644.H15765@lyra.org> On Fri, Oct 24, 2003 at 10:54:01AM -0500, Ian Bicking wrote: >... > > res.content_type = 'text/html' > > res.setHeader('content-type', 'text/html') > # I don't really see a reason that this header needs special attention Because it is a header that should almost always be set. Might even be required (dunno off the top of my head; RFC 2616 would say). Note that a character set should be set in there, too. Omitting the character set can cause problems, although I forget the exact nature of those. A few years ago, Apache httpd went and did a lot of work to add character sets into the Content-Type header; providing defaults and directives to make it easier and whatnot. It was done for some security reason, if I recall correctly. > > res.set_cookie('name', 'Simon') > > res['X-Additional-Header'] = 'Another header' > > res.setHeader('X-additional-header', 'Another header') > # It's not clear what dictionary access to the response object would > mean. > # res.headers['X-additional-header'] = 'Another header' might be okay > # but it makes it difficult to add multiple headers by the same name -- > but > # I don't know if HTTP ever really calls for that anyway. 
HTTP specifically discusses what happens when you see two headers with the same name:

    Some-Header: foo
    Some-Header: bar

is equivalent to:

    Some-Header: foo, bar

i.e. concatenate with a comma. While it is allowed, there is *generally* no reason for the API to enable writing separate headers, nor a reason to expose same-named headers as separate (i.e. just concatenate them internally). Note that I say "generally" because I've seen a client that could not deal properly with a long header value. By separating the tokens in the header across multiple instances, the client worked. IOW, a single line couldn't be longer than about 64 characters, but its internal value-concatenation worked just fine for long logical values.

> > res.send_headers()
> > res.write('<html>Hi there</html>\n%s' % body)
>
> # This, but also:
> res.write('<html>Hi there</html>\n%s' % body)
> res.setHeader('X-Yet-Another-Header', 'Yet another value')
> res.commit()
> # res.flush()? Sends headers *and* any body, can be called multiple times
> res.setHeader('Content-type', 'text/plain')
> # raises exception

That shouldn't raise an exception *until* you use a method which writes body-content. If you're talking about a method to send and *end* the header block, then it needs a better name. i.e. send_headers() (and yes, after calling that method, further headers would raise an exception) Both the client and server libraries should also respect the "Expect" header around the header/body transition point. See RFC 2616 for more info about that. Essentially, the client can send the headers, wait for the server to say "go ahead" (or throw errors back to the client), and *then* upload that 5 gigabyte body. It provides a way for the server to resolve authz/authn (or other) problems before you get into the business of uploading huge bodies. Cheers, -g -- Greg Stein, http://www.lyra.org/ From janssen at parc.com Fri Oct 24 17:55:26 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 17:55:58 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 22:40:29 PDT." <3F98BB4D.2020100@bath.ac.uk> Message-ID: <03Oct24.145534pdt."58611"@synergy1.parc.xerox.com> > I'm all for replicating the capabilities of Java libraries (if they have > a good bunch of features) but replicating the exact APIs seems to me > like a lost opportunity to take advantage of Python's more expressive > syntax. Sure, I agree with that completely. I was thinking more of defining a few classes from which to export these API's we've been discussing. Bill From janssen at parc.com Fri Oct 24 17:59:11 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 17:59:39 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: Your message of "Fri, 24 Oct 2003 00:38:49 PDT."
<3F98D709.9070806@sjsoft.com> Message-ID: <03Oct24.145919pdt."58611"@synergy1.parc.xerox.com>

> res = web.cgi.HTTPResponse(sys.stdout)
> res.content_type = 'text/html'
> res.set_cookie('name', 'Simon')
> res['X-Additional-Header'] = 'Another header'
> res.send_headers()
> res.write('<html>Hi there</html>\n%s' % body)

How about something like:

result = web.cgi.HTTPResponse(request)
result.set_content_type('text/html')
result.set_cookie('name', 'Simon')
result.set_header('X-Additional-Header', 'Another header')
result.write('<html>Hi there</html>\n%s' % body)
...

I would assume that the 'result' instance would be auto-buffered and flushed when necessary (or when the flush() method is called, just as with file objects). Bill From janssen at parc.com Fri Oct 24 18:20:49 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:21:59 2003 Subject: [Web-SIG] Client-side API In-Reply-To: Your message of "Thu, 23 Oct 2003 22:40:29 PDT." <3F98BB4D.2020100@bath.ac.uk> Message-ID: <03Oct24.152055pdt."58611"@synergy1.parc.xerox.com> > Ugh. One of the things I love about Python is that unlike Java it > doesn't force you to have horribly verbose interfaces with dozens of > different classes. A URL is a string, file-like-objects are I agree that the Java standard library is horrible, and mainly because of all the different variations on different classes (plus the horrible lack of (1) multiple inheritance, and (2) operator overloading). However, classes are a useful way to partition an API -- I'm not ready to give up on them yet. Bill From janssen at parc.com Fri Oct 24 18:23:54 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:24:13 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Fri, 24 Oct 2003 00:55:46 PDT." <3F98DB02.3010407@sjsoft.com> Message-ID: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> > The parsing doesn't have to be very intelligent or do validation, HTML > syntax is fairly simple. Successfully parsing HTML is incredibly complex, because of the variations in the various standards. > I think that does belong in the standard library. I agree, the ability should be there. My sense is that the existing XML packages do pretty well in handling both XHTML and HTML; the missing pieces are the ancillary standards, like CSS and Javascript.
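[Editor's sketch] For context, the event-driven stdlib parser under discussion is used by subclassing it and overriding handler methods. In 2003 this shape lived in the HTMLParser and htmllib modules; the sketch below uses the modern html.parser module, but the model is the same, and it illustrates Bill's point that the parser is a syntax framework only — any knowledge of block elements, IDs, etc. has to be added by the subclass:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while feeding in HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

p = LinkCollector()
p.feed('<html><body><a href="/one">one</a> <a href="/two">two</a></body></html>')
```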
Bill From janssen at parc.com Fri Oct 24 18:27:59 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:28:23 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 08:25:29 PDT." <3F994469.20304@bath.ac.uk> Message-ID: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> First of all, let me say that I find FieldStorage so distasteful that the first thing I do is wrap it in a dictionary. Secondly, I don't think there's a need for separate GET and POST dictionaries -- there's only one kind of request at any one time, all you need is a REQUEST dictionary. Thirdly, the case where the same parameter is used more than once is so rare (and well-known to the implementor of the server script) that providing the value as a tuple in that case makes more sense than anything else. Bill From janssen at parc.com Fri Oct 24 18:29:54 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:30:25 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 08:44:49 PDT." <1067010289.11634.378.camel@anthem> Message-ID: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> Cute idea, Barry. It ties in with what I was thinking about for URLs, which would also be "string" subclasses, but support methods (OK, attributes) such as "scheme", "host", "port", etc. Bill From janssen at parc.com Fri Oct 24 18:36:18 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:36:50 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Fri, 24 Oct 2003 13:20:28 PDT." <20031024132028.C15765@lyra.org> Message-ID: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? You > will never see one without the other, and they are intrinsically tied to > each other. So why separate them? 
> Mainly because they are two separate concepts. For instance, in my code, I always pass two arguments; one is the response, which the user manipulates to send back something to the caller, and the other is the request, which is basically a dictionary of all parameter values, plus a few extra special ones like 'path'. Bill From janssen at parc.com Fri Oct 24 18:41:27 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 18:41:50 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Your message of "Fri, 24 Oct 2003 13:45:30 PDT." <20031024134530.F15765@lyra.org> Message-ID: <03Oct24.154130pdt."58611"@synergy1.parc.xerox.com> > There has been an HTML parser in the standard library for *YEARS*. I don't > think there is an action item here. It's not a particularly *good* HTML parser, though. It's just a simple syntax framework. It doesn't know about things like block elements, which elements take IDs and which don't, etc. When I was working on the Plucker distiller (a web crawler and HTML parser), I had to add oodles of code to it. Looking at the documentation for 2.3, I see "class HTMLParser: This is the basic HTML parser class. It supports all entity names required by the HTML 2.0 specification (RFC 1866). It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements." We can do better than that. 4.01, at least. Bill From barry at python.org Fri Oct 24 19:09:47 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 24 19:09:55 2003 Subject: [Web-SIG] Re: Form field dictionaries In-Reply-To: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.153000pdt."58611"@synergy1.parc.xerox.com> Message-ID: <1067036987.10257.69.camel@anthem> On Fri, 2003-10-24 at 18:29, Bill Janssen wrote: > Cute idea, Barry. It ties in with what I was thinking about for URLs, > which would also be "string" subclasses, but support methods (OK, > attributes) such as "scheme", "host", "port", etc. +1!
I'm using objects of this style in several places in my Mailman3 experiments, and it's really really cool. -Barry From cs1spw at bath.ac.uk Fri Oct 24 19:12:45 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 19:12:51 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99B1ED.1090802@bath.ac.uk> Bill Janssen wrote: > Secondly, I don't think there's a need for separate GET and POST > dictionaries -- there's only one kind of request at any one time, all > you need is a REQUEST dictionary. I'm a huge fan of being able to distinguish between data from a query string (GET data) and data that has been POSTed. I posted my reasons for caring about this to the Quixote mailing list a few days ago, but I'll repeat them here through the magic of copy and paste:

1. By differentiating between the two the same 'key' can be used twice. For example, a form submitting to a page called 'forms?id=1' can itself include an id attribute in the POST data without overriding the id in the URL

2. My rule of thumb is "only modify data on a POST" - that way there's no chance of someone bookmarking a URL that updates a database (for example).

3. It is useful to be able to detect if a form has been submitted or not. In PHP, I frequently check for POSTed data and display a form if none is available, and assume the form has been submitted if there is.

4. Security. While ensuring data has come from POST rather than GET provides absolutely no security against a serious intruder, it does discourage amateurs from "hacking the URL" to see if they can cause any damage. Security through obscurity admittedly, but it adds a bit of extra peace of mind.
( From http://mail.mems-exchange.org/pipermail/quixote-users/2003-October/002013.html ) The 2nd point above is supported by this quote from the HTTP spec: """ In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe" """ http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1 If you don't know which bit of data came from GET and which came from POST you have no way of ensuring that only POSTed data changes the "state" of data on the server. I accept that there is a great deal of convenience in only having to look in one place for data from both POST and GET, which is why I advocate a third dictionary (or dictionary like object) called something like REQUEST which combines the data from the other two. -- Simon Willison Web development weblog: http://simon.incutio.com/ From grisha at modpython.org Fri Oct 24 19:36:01 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 19:36:10 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031024192925.R71890@onyx.ispol.com> For what it's worth, I never liked the request/response separation either. I like a single object from which you can read() and to which you can write(), just like a file. Imagine if for file IO you had to have an object to read and another one to write? (I would agree that perhaps "request" is a misnomer, but I can't think of anything better) On Fri, 24 Oct 2003, Bill Janssen wrote: > > When you stop and think about it: *every* request object will have a > > matching response object. Why have two objects if they come in pairs? You > > will never see one without the other, and they are intrinsically tied to > > each other. So why separate them? 
> > > > Mainly because they are two separate concepts. For instance, in my > code, I always pass two arguments; one is the response, which the user > manipulates to send back something to the caller, and the other is the > request, which is basically a dictionary of all parameter values, plus > a few extra special ones like 'path'. > > Bill > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/grisha%40modpython.org > From janssen at parc.com Fri Oct 24 20:51:41 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 20:52:11 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 16:12:45 PDT." <3F99B1ED.1090802@bath.ac.uk> Message-ID: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> > I'm a huge fan of being able to distinguish between that data from a > query string (GET data) and data that has been POSTed. I posted my > reasons for caring about this to the Quixote mailing list a few days > ago, but I'll repeat them here through the magic of copy and paste: > [...list of reasons you want to know the HTTP command omitted...] The way to differentiate them (if you care) is to look at the "command" attribute of the request object, IMO. That would tell you whether you were looking at data from GET, or POST, or HEAD, or whatever. I see no reason to pass the data differently, though. Parameters are parameters. (Of course, you can use query data with POST as well as with GET.) 
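[Editor's sketch] One way to reconcile the two positions — Simon's separate GET and POST dictionaries plus a combined REQUEST, with Bill's "command" attribute on the request object — might look like this. All of the names here are hypothetical:

```python
from urllib.parse import parse_qs

def _flatten(parsed):
    """parse_qs gives lists; keep a list only when a key repeats."""
    return {k: v if len(v) > 1 else v[0] for k, v in parsed.items()}

class Request:
    def __init__(self, method, query_string, post_body=''):
        self.command = method  # 'GET', 'POST', 'HEAD', ...
        self.GET = _flatten(parse_qs(query_string))
        self.POST = _flatten(parse_qs(post_body)) if method == 'POST' else {}
        # Combined view for convenience; POST wins on a name collision,
        # but the query-string value is still reachable through req.GET.
        self.REQUEST = {**self.GET, **self.POST}

# A form POSTed to 'forms?id=1' whose body also carries an id:
req = Request('POST', 'id=1', 'id=2&name=Simon')
```

With this shape, code that only cares about "the parameters" reads req.REQUEST, while code enforcing "only modify data on a POST" checks req.command and req.POST.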
Bill From grisha at modpython.org Fri Oct 24 21:59:46 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 21:59:50 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99B1ED.1090802@bath.ac.uk> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> Message-ID: <20031024215125.A1810@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > The 2nd point above is supported by this quote from the HTTP spec: > > """ > In particular, the convention has been established that the GET and HEAD > methods SHOULD NOT have the significance of taking an action other than > retrieval. These methods ought to be considered "safe" > """ > > http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1 For everyone's amusement, here are the last two of the three paragraphs of this section: In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested. Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them. At first I thought this was completely wacky and didn't belong in an RFC at all. But having read it a couple of times, I'm thinking that they are referring here to *browser implementations*, not web apps, so I don't think it's relevant to our discussion.
Grisha From grisha at modpython.org Fri Oct 24 22:10:04 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 22:10:08 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F99B1ED.1090802@bath.ac.uk> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> Message-ID: <20031024220036.K1810@onyx.ispol.com> On Fri, 24 Oct 2003, Simon Willison wrote: > 2. My rule of thumb is "only modify data on a POST" - that way there's > no chance of someone bookmarking a URL that updates a database (for > example). I get upset at web pages that refuse to cooperate when I submit things via query strings. I think a reliable way to avoid accidental updates is to rely on a session mechanism; only modifying on POST only results in mild user annoyance IMHO. > 3. It is useful to be able to detect if a form has been submitted or > not. In PHP, I frequently check for POSTed data and display a form if > none is available, assume the form has been submitted if there is. I don't like doing things like this because they rely on protocol internals to drive application logic... > 4. Security. While ensuring data has come from POST rather than GET > provides absolutely no security against a serious intruder, it does > discourage amateurs from "hacking the URL" to see if they can cause any > damage. Security through obscurity admitedly, but it adds a bit of extra > peace of mind. Again, I don't agree; hackable URL's are a good thing! :-) And it is, indeed, security by obscurity. If you have good data validation, there should be no need for any obscurity. Grisha From ianb at colorstudy.com Fri Oct 24 22:06:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 22:11:07 2003 Subject: [Web-SIG] More prior art, less experimentation Message-ID: I'm feeling a little uncomfortable with the way some of these suggestions are moving. 
I feel like people are trying to make another framework, which I don't think is the appropriate goal for web-sig or for the standard library. Python doesn't need another framework, and I don't think it's reasonable or particularly polite to try to trump the work that a lot of people have done over the years, using some sort of back door to perceived authority that web-sig might provide. Nothing we're talking about is anything that hasn't been discussed before in the context of other projects. Nothing we are considering implementing (at least server-side) is something that hasn't been implemented before. We *do* have the opportunity to create something that can unify the Python web experience and provide the basis for more adoption of Python for web programming. To do that we will have to repeat the work done many times before. We should aspire to quality, but I think we need to hold ourselves back from aesthetic experimentation, and respect convention above our own preferences. We can still indulge our own fancies outside of the standard library, and building on the standard library -- nothing we do should preclude your individual preferences toward web programming, but it should not preclude other people's preference either. But most of all it should provide the foundation upon which the mature, *existing* frameworks can build. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From cs1spw at bath.ac.uk Fri Oct 24 22:40:58 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 22:41:03 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99E2BA.9030701@bath.ac.uk> Bill Janssen wrote: >>I'm a huge fan of being able to distinguish between that data from a >>query string (GET data) and data that has been POSTed. 
I posted my >>reasons for caring about this to the Quixote mailing list a few days >>ago, but I'll repeat them here through the magic of copy and paste: >>[...list of reasons you want to know the HTTP command omitted...] > > The way to differentiate them (if you care) is to look at the > "command" attribute of the request object, IMO. That would tell you > whether you were looking at data from GET, or POST, or HEAD, or > whatever. I see no reason to pass the data differently, though. > Parameters are parameters. That doesn't work. The following is a perfectly valid form that sends data via GET and POST at the same time:
<form action="forms?id=1" method="post">
Name: <input type="text" name="name"><br>
Email: <input type="text" name="email"><br>
<input type="submit" value="Submit">
</form>
I really don't understand why people are opposed to being able to tell the difference between GET and POST data. To me it seems like a basic requirement of any web library - but I'm obviously almost alone in thinking that. -- Simon Willison Web development weblog: http://simon.incutio.com/ From cs1spw at bath.ac.uk Fri Oct 24 22:44:57 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 22:45:02 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024215125.A1810@onyx.ispol.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> <20031024215125.A1810@onyx.ispol.com> Message-ID: <3F99E3A9.2010801@bath.ac.uk> Gregory (Grisha) Trubetskoy wrote: > For everyone's amusement, here is last two out of the three paragraphs of > this section: > > In particular, the convention has been established that the GET and > HEAD methods SHOULD NOT have the significance of taking an action > other than retrieval. These methods ought to be considered "safe". > This allows user agents to represent other methods, such as POST, PUT > and DELETE, in a special way, so that the user is made aware of the > fact that a possibly unsafe action is being requested. > > Naturally, it is not possible to ensure that the server does not > generate side-effects as a result of performing a GET request; in > fact, some dynamic resources consider that a feature. The important > distinction here is that the user did not request the side-effects, > so therefore cannot be held accountable for them. > > At first I thought this was completely wacky and didn't belong in an RFC > at all. But having read it a couple of times, I'm thinking that they are > referring here to *browser implementations*, not web apps, so I don't > think it's relevant to our discussion. I understand it to be a recommendation to developers of server side applications. 
It's saying "don't write apps that do something other than just blindly serve up content on a GET or HEAD" - in other words, only modify data stored on the server (the classic example being altering data in a database) in a POST or PUT request. Obviously this doesn't mean you shouldn't do anything dynamic on GETs, it just means that a user GETing a resource shouldn't result in a permanent change to the state maintained by the server. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Fri Oct 24 23:01:41 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 24 23:02:13 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024215125.A1810@onyx.ispol.com> Message-ID: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 08:59 PM, Gregory (Grisha) Trubetskoy wrote: [GET vs. POST semantics...] > At first I thought this was completely wacky and didn't belong in an > RFC > at all. But having read it a couple of times, I'm thinking that they > are > referring here to *browser implementations*, not web apps, so I don't > think it's relevant to our discussion. It's very relevant to web applications, but not to the environment in which those applications are written. It's not relevant to our work here. In reference to the rest of the discussion -- I think it's enough to say that some people want to distinguish (sometimes) between these two types of variables. Simon is not the only one. It should be an option, because it's not hard to do. We're not telling people how to write their applications, we're giving them the tools to write their applications as they choose, and this is a valid way to write an application. 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Fri Oct 24 23:20:19 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 24 23:20:22 2003 Subject: [Web-SIG] Moving forward Message-ID: <20031024230301.B51905@onyx.ispol.com> I get the feeling that so far most of what has been posted to this list has been water under the bridge. I would be delighted to see this SIG succeed, and to that end here is my proposal of a first baby step. Problem: The scope of this SIG seems so broad that I doubt any single member of this list has a good grasp on more than half of what the SIG covers. Frankly, it's not clear what the scope is. Proposed Resolution: Begin defining the scope in detail. Once we agree on general categories, we can start playing with it by looking at what presently exists in stdlib in each category, what exists, but not in stdlib, and what does not exist. Once we have a clear scope, we can talk about a PEP. I think we can start with at least the following general categories:

o HTML Parsing and Generation
o XML Parsing and Generation
o An HTTP Server
o An HTTP Client
o SSL
o A general interface to an HTTP server for web apps
o PSP Standards and Implementation

Grisha From janssen at parc.com Fri Oct 24 23:21:52 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 23:22:26 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Fri, 24 Oct 2003 20:01:41 PDT." <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: <03Oct24.202156pdt."58611"@synergy1.parc.xerox.com> > In reference to the rest of the discussion -- I think it's enough to > say that some people want to distinguish (sometimes) between these two > types of variables. Simon is not the only one. It should be an > option, because it's not hard to do.
We're not telling people how to > write their applications, we're giving them the tools to write their > applications as they choose, and this is a valid way to write an > application. +1. Bill From janssen at parc.com Fri Oct 24 23:32:53 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 24 23:33:21 2003 Subject: [Web-SIG] So what's missing? Message-ID: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Apropos Ian's comments today, I'd like to suggest that at this stage we focus on what's missing, rather than on how to fix/change things. What have you needed that isn't in the standard libraries? Here's my list:

Client-side:

* CSS parser. I can't really do visual interpretation of Web pages without understanding their layout.

* post-multipart (both http and https).

* Asynchronous fetch. When working over the Plucker distiller, which is a web crawler of sorts, I really wanted a higher-powered client side HTTP library. In particular, I wanted to be able to start a fetch, go on to other things, and come back to the fetch periodically, checking to see whether there was data available.

* Connection caching. Again, when pulling lots of pages from lots of sites, I want to be able to save the open connection to a host/port combo and re-use it, if the server doesn't kill the connection. There should be a pool of connections, with a user-settable limit, so that we don't run out of sockets/file-descriptors.

* Anything else I can do with cURL to an HTTP or HTTPS URL.

Server-side:

* Server-side SSL support in the socket module, and some interface to management of certificates/identities for SSL. I want to build HTTPS servers with Python.

* Some kind of response object usable in CGI scripts. This would make a few simple actions simple: write a response as a file (instead of using sys.stdout), return an error with a message, redirect to another URL, return a file.

* A standard server framework on the order of Medusa.
This should support a standalone Python web server, with the ability to serve files, and the ability to add new handlers. Not sure it has to support CGI invocation. What else are we missing? Bill From cs1spw at bath.ac.uk Fri Oct 24 23:39:28 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 23:39:32 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F99F070.4080202@bath.ac.uk> Bill Janssen wrote: > What else are we missing? An improved standard request object providing an interface to data sent from the client. This should include an interface that is designed for re-implementation by major web frameworks. This will essentially be an improved version of the CGI module. -- Simon Willison Web development weblog: http://simon.incutio.com/ From cs1spw at bath.ac.uk Fri Oct 24 23:59:11 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 24 23:59:15 2003 Subject: [Web-SIG] CSS parsers: prior art Message-ID: <3F99F50F.8020400@bath.ac.uk> I just had a look around for prior art: libraries that handle CSS parsing for languages other than Python. Perl has a number of interesting modules for handling CSS in CPAN: CSS-Tiny http://search.cpan.org/~adamk/CSS-Tiny-1.02/lib/CSS/Tiny.pm This module provides a small, lightweight, object-oriented API for reading and writing CSS files "with as little code as possible". It seems like it would map nicely to a simple Pythonic module that makes use of operator overloading. CSS 1.05 http://search.cpan.org/~iamcal/CSS-1.05/CSS.pm This is a large, object-oriented library that appears to provide access to a variety of alternative parsers and formatters. The basic principle involves converting CSS declarations into an object tree.
CSS-SAC http://search.cpan.org/~rberjon/CSS-SAC-0.05/SAC.pm This is an event-based CSS parser modelled on the W3C's Simple API for CSS: http://www.w3.org/TR/SAC/ Also of interest is the W3C's DOM specification for styling: http://www.w3.org/TR/DOM-Level-2-Style/ It seems we are spoilt for choice when it comes to picking an API to base a Python CSS module on. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Sat Oct 25 01:03:20 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 01:03:26 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 10:32 PM, Bill Janssen wrote: > Client-side: To client-side, I would add that authentication is too hard in urllib2, and only works for HTTP (for trivial reasons). I think urllib2's subclasses are unnecessarily complicated -- authentication handling could be put directly in the HTTP/HTTPS, both basic and digest. Goes together with post/multipart, and I think these shouldn't be too hard to add. There is also some talk about putting urllib2 and urlparse together, i.e., have a URL object. The distinction between the urllib, urllib2, and urlparse libraries is not very good, e.g., urllib.quote (and friends) are more related to urlparse than urllib. A URL object could unify all these. Cookie handling also fits into this, but from the opposite direction from a URL object, since we are creating something of a user agent. You'd almost want to do:

ua = UserAgent()
url = web.URL('http://whatever.com')
content = ua.get(url)

Or something like that. I think an explicit agent is called for, separate from the URLs that it may retrieve. But only when you start considering cookies and caching. If you want to take it a little further, WebDAV URLs support a bunch of other features. Nice to at least keep the door open for that.
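A minimal sketch of that explicit-agent idea, expressed in present-day stdlib terms (urllib.request and http.cookiejar, the descendants of 2003's urllib2); the UserAgent class is invented here for illustration, not an existing API:

```python
import urllib.request
from http.cookiejar import CookieJar

class UserAgent:
    """Hypothetical explicit agent: owns the cookies (and, later,
    perhaps a cache), separate from the URLs it fetches."""

    def __init__(self):
        self.cookies = CookieJar()
        # The opener carries the agent's state into every request.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))

    def get(self, url):
        # Network I/O happens here; cookies set by the response
        # accumulate on the agent for subsequent requests.
        with self._opener.open(url) as resp:
            return resp.read()
```

Cookies accumulate on the agent across get() calls, which is exactly the kind of state that does not belong on a URL object.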
> Server-side: > > * Server-side SSL support in the socket module, and some interface to > management of certificates/identities for SSL. I want to build > HTTPS servers with Python. > > * Some kind of response object usable in CGI scripts. This would make > a few simple actions simple: write a response as a file (instead of > using sys.stdout), return an error with a message, redirect to > another URL, return a file. I'd still really like to get a response and request object, first implemented for CGI but possible to target to other environments. It's because I really want this that I didn't want us to get too experimental -- just a request and response object are very doable, and would be a real accomplishment. But we can get off track with this. Good (standard) Python libraries aren't frameworks, they are straight-forward, well-documented interfaces, which is all I'm looking for here. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From moof at metamoof.net Sat Oct 25 06:00:22 2003 From: moof at metamoof.net (Moof) Date: Sat Oct 25 06:02:10 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> Message-ID: <3F9A49B6.4050709@metamoof.net> John J Lee wrote: > On Fri, 24 Oct 2003, Moof wrote: >> [moaning about unit testing with webware and cheetah] > Isn't it the fault of the framework you're using if it doesn't make unit > testing easy? Quite so. However I'm appealing to say that whatever we develop here be reasonable to use when we're trying to simulate requests and responses for unit tests... > Still, I guess it's true that HTML parsing is a necessary > part of some unit tests (not only functional tests). > >> So a standard HTML parser would be nice, as well as keeping TDD in mind >> when we design request and response (and possibly session) objects. 
> > > > We already have an HTML parser (two, in fact). Bill seems to be talking about this elsewhere. >> > The parsing doesn't have to be very intelligent or do validation, >> HTML > syntax is fairly simple. >> > I think that does belong in the standard library. >> >> >> Speaking of validation, a sort of standard form validation library would >> be nice: something to say "I'm expecting this value to be an int between >> 1-31" or "I'm expecting this to be a string with the following legal >> characters" and so on. It's not that difficult to write yourself, but I >> seem to find myself reinventing the wheel every time I do. A standard >> "best practice" way of doing this would be wonderful. > > > > I guess that would look similar to ClientForm? If not, what? Nonono, though that could be useful for some sort of automated testing, the thought hadn't even occurred. When somebody fills in a form, the browser will return a string either as a GET or POST method with all the values filled in, which we seem to have decided elsewhere should be returned as a dictionary to the programmer. I want a nice standard way of saying field 'fromDay' in this dictionary should be an integer between 1 and 31. That 'fromMonth' should be an integer between 1 and 12. That 'username' should contain only ascii letters and numbers and this small amount of punctuation, and should be no shorter than x characters, and no longer than y characters which is what I set the length limit to on the form. That sort of thing. I should be able to get a list of errors that I can associate with the various fields, process them into intelligible sentences I can throw back at the user as errors on the page. Yes, you can do this from javascript on the same page, but this is for people who either have javascript turned off, or aren't necessarily using my page as input as it's been hijacked from elsewhere, or someone's trying to be malicious.
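The wish list above maps naturally onto a small declarative checker. A rough sketch, with all names invented here rather than taken from any real library:

```python
def int_in_range(name, lo, hi):
    """Build a checker: value must parse as an int between lo and hi."""
    def check(value, errors):
        try:
            n = int(value)
        except (TypeError, ValueError):
            errors.append("%s must be a whole number" % name)
            return None
        if not lo <= n <= hi:
            errors.append("%s must be between %d and %d" % (name, lo, hi))
            return None
        return n
    return check

def validate(form, schema):
    """Run every field's checker; return cleaned values plus a list of
    intelligible error sentences to throw back at the user."""
    errors = []
    clean = {}
    for field, check in schema.items():
        clean[field] = check(form.get(field), errors)
    return clean, errors

schema = {
    'fromDay': int_in_range('fromDay', 1, 31),
    'fromMonth': int_in_range('fromMonth', 1, 12),
}
clean, errors = validate({'fromDay': '14', 'fromMonth': '19'}, schema)
# clean['fromDay'] == 14; errors == ['fromMonth must be between 1 and 12']
```

The same error list can be rendered next to each field server-side, independent of whatever JavaScript does or does not run in the browser.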
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From jjl at pobox.com Sat Oct 25 08:16:35 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:16:41 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.152400pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Fri, 24 Oct 2003, Bill Janssen wrote: [...HTML parsing...] > > I think that does belong in the standard library. > > I agree, the ability should be there. My sense is that the existing > XML packages do pretty well in handling both XHTML and HTML; the > missing pieces are the ancillary standards, like CSS and Javascript. Again: what do CSS and JavaScript have to do with the standard library? John From jjl at pobox.com Sat Oct 25 08:16:52 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:16:59 2003 Subject: [Web-SIG] Defining a standard interface for common web tasks In-Reply-To: <20031024140644.H15765@lyra.org> References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> <20031024140644.H15765@lyra.org> Message-ID: On Fri, 24 Oct 2003, Greg Stein wrote: > On Fri, Oct 24, 2003 at 10:54:01AM -0500, Ian Bicking wrote: [...] > > # res.headers['X-additional-header'] = 'Another header' might be okay > > # but it makes it difficult to add multiple headers by the same name -- > > but > > # I don't know if HTTP ever really calls for that anyway. > > HTTP specifically discusses what happens when you see two headers with the > same name: > > Some-Header: foo > Some-Header: bar > > is equivalent to: > > Some-Header: foo, bar > > i.e. concatenate with a comma. While it is allowed, there is *generally* > no reason for the API to enable writing separate headers, nor a reason to > expose same-named headers as separate (i.e. just concatenate them > internally). 
> > Note that I say "generally" because I've seen a client that could not deal > properly with a long header value. By separating the tokens in the header [...] Another thing that breaks this is the Cookie header: cookie values may contain commas (and they do!). Of course, this may not be relevant here, since Python programmers aren't going to be so silly as to put commas in their cookie values :-) John From jjl at pobox.com Sat Oct 25 08:38:55 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:39:00 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> References: <8DB31624-06A8-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Friday, October 24, 2003, at 10:32 PM, Bill Janssen wrote: > > Client-side: > > To client-side, I would add that authentication is too hard in urllib2, > and only works for HTTP (for trivially reasons). I think think > urllib2's subclasses are unnecessarily complicated -- authentication > handling could be put directly in the HTTP/HTTPS, both basic and > digest. It's a minor issue, but it seems nicer to me to have authentication separate if it can easily be separate -- that fits in with the general philosophy of urllib2 that you pick 'n mix the features you want. What are the trivial reasons for it breaking on non-HTTP auth? > Goes together with post/multipart, and I think these shouldn't > be too hard to add. How does this go together with post/multipart? Do you just mean that you're likely to post the multipart data using urllib2.urlopen? > There is also some talk about putting urllib2 and urlparse together, > i.e., have a URL object. The distinction between the urllib, urllib2, > and urlparse libraries is not very good, e.g., urllib.quote (and > friends) are more related to urlparse than urllib. A URL object could > unify all these. 
It's an appealing idea, especially given the cuteness of string subclassing ;-) > Cookie handling also fits into this, but from the opposite direction > from a URL object, since we are creating something of a user agent. > You'd almost want to do: > > ua = UserAgent() > url = web.URL('http://whatever.com') > content = ua.get(url) > > Or something like that. I think an explicit agent is called for, > separate from the URLs that it may retrieve. But only when you start > considering cookies and caching. [...] Are you suggesting replacing urllib2, building on top of it, or extending it? urllib2's handlers already get a lot of the 'user-agent' job done. What requirements does caching impose that urllib2 doesn't meet? There's already a CacheFTPHandler. John From jjl at pobox.com Sat Oct 25 08:51:05 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 08:52:01 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: On Fri, 24 Oct 2003, Bill Janssen wrote: [...] > * CSS parser. I can't really do visual interpretation of Web pages > without understanding their layout. Does anybody other than Bill want this? > * post-multipart (both http and https). Everybody is agreed this is needed. > * Asynchronous fetch. When working over the Plucker distiller, which [...] Nice, but not easy. Would it not introduce a lot of new code?
John From t.vandervossen at fngtps.com Sat Oct 25 08:54:20 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Sat Oct 25 08:54:24 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: Message-ID: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> On vrijdag, okt 24, 2003, at 21:12 Europe/Amsterdam, John J Lee wrote: > On Fri, 24 Oct 2003, David Fraser wrote: >> Thijs van der Vossen wrote: > [...] >> Actually HTML parsing would be fantastic for testing web applications, >> so maybe that could be related to the Web API. Sorry, but _I_ never said _that_. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From jjl at pobox.com Sat Oct 25 09:00:03 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 09:00:10 2003 Subject: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> References: <59B56DA0-06EA-11D8-8AA9-000393678182@fngtps.com> Message-ID: On Sat, 25 Oct 2003, Thijs van der Vossen wrote: > On vrijdag, okt 24, 2003, at 21:12 Europe/Amsterdam, John J Lee wrote: > > On Fri, 24 Oct 2003, David Fraser wrote: > >> Thijs van der Vossen wrote: > > [...] > >> Actually HTML parsing would be fantastic for testing web applications, > >> so maybe that could be related to the Web API. > > Sorry, but _I_ never said _that_. No, David did, as my quoting indicates. I should have removed the reference to you, though -- sorry. John From t.vandervossen at fngtps.com Sat Oct 25 09:10:02 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Sat Oct 25 09:10:47 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> On zaterdag, okt 25, 2003, at 14:51 Europe/Amsterdam, John J Lee wrote: >> * Asynchronous fetch. When working over the Plucker distiller, which > [...] > > Nice, but not easy. Would it not introduce a lot of new code? 
There > used > to be asynchttp and asyncurl libraries, I think, built on top of > asyncore. > First (obviously) somebody would need to actually put the work in here. > Second, would it be possible to do this without a lot of code > duplication > between the current urllib{2,} / httplib libraries and the new stuff? > Is > it worth it, when you can use threads instead? This is already trivial with the asyncore libraries. If I remember correctly there is a nice example of this in Steve's 'Python Web Programming', but you might also want to take a look at http://python.org/doc/current/lib/asyncore-example.html Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From ianb at colorstudy.com Sat Oct 25 14:25:25 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 14:25:30 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: >> To client-side, I would add that authentication is too hard in >> urllib2, >> and only works for HTTP (for trivially reasons). I think think >> urllib2's subclasses are unnecessarily complicated -- authentication >> handling could be put directly in the HTTP/HTTPS, both basic and >> digest. > > It's a minor issue, but it seems nicer to me to have authentication > separate if it can easily be separate -- that fits in with the general > philosophy of urllib2 that you pick 'n mix the features you want. What > are the trivial reasons for it breaking on non-HTTP auth? There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and though the two concepts are orthogonal they are still tied into each other. Another option would be to take HTTPS out of the class hierarchy, and make SSL a feature of HTTPHandler (and maybe the other handlers too, FTP/SSL does exist after all). 
The AuthHandlers are a little annoying too, you can't just give them a username/password. You have to give them some manager object that can be queried for a password for a username/realm/URL. This is a nice option to have, but in most cases you don't need that kind of generality, and it makes it a lot harder to understand what you need to do. username=x, password=y are very easy to understand. >> Goes together with post/multipart, and I think these shouldn't >> be too hard to add. > > How does this go together with post/multipart? Do you just mean that > you're likely to post the multipart data using urllib2.urlopen? Yes, that's what I mean -- same code involved. > >> There is also some talk about putting urllib2 and urlparse together, >> i.e., have a URL object. The distinction between the urllib, urllib2, >> and urlparse libraries is not very good, e.g., urllib.quote (and >> friends) are more related to urlparse than urllib. A URL object could >> unify all these. > > It's an appealing idea, especially given the cuteness of string > subclassing ;-) > > >> Cookie handling also fits into this, but from the opposite direction >> from a URL object, since we are creating something of a user agent. >> You'd almost want to do: >> >> ua = UserAgent() >> url = web.URL('http://whatever.com') >> content = ua.get(url) >> >> Or something like that. I think an explicit agent is called for, >> separate from the URLs that it may retrieve. But only when you start >> considering cookies and caching. > [...] > > Are you suggesting replacing urllib2, building on top of it, or > extending > it? urllib2's handlers already get a lot of the > 'user-agent' job > done. > What requirements does caching impose that urllib2 doesn't meet? > There's > already a CacheFTPHandler. I think a URL class would probably build on top of urllib2, but would also need some more features. And obviously urllib2 can't go anywhere, so we might as well use it.
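A URL-as-string sketch of the kind being discussed, shown with today's urllib.parse names (these helpers lived in urlparse/urllib in 2003); the class and its properties are invented for illustration:

```python
from urllib.parse import urljoin, urlsplit

class URL(str):
    """The 'cuteness of string subclassing': a str that can take
    itself apart, gathering urlparse-style helpers in one place."""

    @property
    def scheme(self):
        return urlsplit(self).scheme

    @property
    def host(self):
        return urlsplit(self).hostname

    @property
    def path(self):
        return urlsplit(self).path

    def join(self, relative):
        # Relative-reference resolution, as a browser would do it.
        return URL(urljoin(self, relative))

u = URL('http://whatever.com/docs/index.html')
api = u.join('api.html')  # URL('http://whatever.com/docs/api.html')
```

Because it *is* a string, such an object passes unchanged through every API that expects a URL string today, which is what makes the idea backward-compatible with urllib2.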
The caching in CacheFTPHandler is connection caching, not result caching. HTTP has a wide array of ways to indicate caching, check for updates, etc. Enough that it becomes kind of complicated, which is why I don't think that fits well into the idea of a URL object (which should be quite simple, at least from the outside). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sat Oct 25 16:23:29 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 16:23:55 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> References: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com> Message-ID: On Sat, 25 Oct 2003, Thijs van der Vossen wrote: [...] > > > * Asynchronous fetch. When working over the Plucker distiller, [...] > > Second, would it be possible to do this without a lot of code > > duplication between the current urllib{2,} / httplib libraries and the > > new stuff? Is it worth it, when you can use threads instead? > > This is already trivial with the asyncore libraries. If I remember [...] So what is this for? http://asynchttp.sourceforge.net/ 28k of Python code isn't exactly 'trivial', is it? John From jjl at pobox.com Sat Oct 25 16:54:06 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 16:54:33 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> References: <99FE7110-0718-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: [...] > > It's a minor issue, but it seems nicer to me to have authentication > > separate if it can easily be separate -- that fits in with the general > > philosophy of urllib2 that you pick 'n mix the features you want. What > > are the trivial reasons for it breaking on non-HTTP auth?
> > There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and > though the two concepts are orthogonal they are still tied into each > other. Another option would be to take HTTPS out of the class > hierarchy, and make SSL a feature of HTTPHandler (and maybe the other Well, that would break code. And adding an HTTPSBasicAuthHandler is only five lines or so (even less if you want a class that handles both HTTP and HTTPS). [...] > The AuthHandlers are a little annoying too, you can't just give them a > username/password. You have to give them some manager object that can > be queried for a password for a username/realm/URL. This is a nice > option to have, but in most cases you don't need that kind of > generality, and it makes it a lot harder to understand what you need to > do. username=x, password=y are very easy to understand. That's just a documentation issue, I think -- and possibly adding some convenience method. I wrote some docs for this, and I keep asking for people who seem to be actually using these features to check this documentation bug, but nobody has yet: http://www.python.org/sf/798244 You don't have to provide a password manager object in fact: just let the HTTPBasicAuthHandler create one for you, and use the add_password method (which admittedly does require realm and uri as well as username / password -- perhaps None should act as a wildcard there?). > >> Cookie handling also fits into this, but from the opposite direction > >> from a URL object, since we are creating something of a user agent. > >> You'd almost want to do: > >> > >> ua = UserAgent() > >> url = web.URL('http://whatever.com') > >> content = ua.get(url) > >> > >> Or something like that. I think an explicit agent is called for, > >> separate from the URLs that it may retrieve. But only when you start > >> considering cookies and caching. > > [...] > > > > Are you suggesting replacing urllib2, building on top of it, or > > extending it? 
urllib2's handlers already get a lot of the > > 'user-agent' job done. What requirements does caching impose that > > urllib2 doesn't meet? There's > > already a CacheFTPHandler. > > I think a URL class would probably build on top of urllib2, but > would also need some more features. And obviously urllib2 can't go > anywhere, so we might as well use it. OK. Does this URL class proposal fit with that path module PEP, do you think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) before, but I've forgotten everything about it :-) > The caching in CacheFTPHandler is connection caching, not result OK. > caching. HTTP has a wide array of ways to indicate caching, check for > updates, etc. Enough that it becomes kind of complicated, which is why > I don't think that fits well into the idea of a URL object (which > should be quite simple, at least from the outside). That doesn't answer my question. To repeat: What requirements does caching impose that *urllib2* doesn't meet? And why do we need a new UserAgent class when we already have urllib2 and its handlers? John From jjl at pobox.com Sat Oct 25 17:05:18 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 17:07:05 2003 Subject: Fwd: Re: [Web-SIG] Client-side support: what are we aiming for? In-Reply-To: <3F9A49B6.4050709@metamoof.net> References: <200310240811.13760.thijs@fngtps.com> <3F98DB02.3010407@sjsoft.com> <3F993C3C.5080402@metamoof.net> <3F9A49B6.4050709@metamoof.net> Message-ID: On Sat, 25 Oct 2003, Moof wrote: > John J Lee wrote: [...] > >> Speaking of validation, a sort of standard form validation library would [...] > > I guess that would look similar to ClientForm? If not, what? > > > Nonono, though that could be useful for some sort of automated testing, [...] > programmer. I want a nice standard way of saying field 'fromDay' in this > dictionary should be an integer between 1 and 31. That 'fromMonth' [...] Oh, I see -- I was fooled by the close proximity of the discussion of HTML parsing.
I agree that would be useful. John From ianb at colorstudy.com Sat Oct 25 17:41:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 17:42:11 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> On Saturday, October 25, 2003, at 03:54 PM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: > >> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote: > [...] >>> It's a minor issue, but it seems nicer to me to have authentication >>> separate if it can easily be separate -- that fits in with the >>> general >>> philosophy of urllib2 that you pick 'n mix the features you want. >>> What >>> are the trivial reasons for it breaking on non-HTTP auth? >> >> There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and >> though the two concepts are orthogonal they are still tied into each >> other. Another option would be to take HTTPS out of the class >> hierarchy, and make SSL a feature of HTTPHandler (and maybe the other > > Well, that would break code. And adding an HTTPSBasicAuthHandler is > only > five lines or so (even less if you want a class that handles both HTTP > and > HTTPS). All the handlers start getting in the way. If we added authentication support to HTTPHandler, the other classes could still be left in there. Authentication is part of HTTP, after all -- and the distinction between basic and digest auth doesn't seem necessary (implemented differently, but you shouldn't need to know which one you're going to need). It seems like HTTPHandler could do what HTTPBasicAuthHandler (and DigestAuthHandler) do if it is given a password manager. And that it could even create a password manager if it was given a username and password, or not, but then the password manager should accept a username and password in __init__ so that you don't have to do multiple sets to set that up.
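The convenience being asked for fits in one small class. A hypothetical sketch against today's urllib.request (HTTPPasswordMgr is the real stdlib class; SimplePasswordMgr is invented here):

```python
import urllib.request

class SimplePasswordMgr(urllib.request.HTTPPasswordMgr):
    """Hypothetical convenience manager: takes username/password in
    __init__ and answers every realm/URI query with that one pair."""

    def __init__(self, username, password):
        urllib.request.HTTPPasswordMgr.__init__(self)
        self._cred = (username, password)

    def find_user_password(self, realm, authuri):
        # No realm/URI bookkeeping: always hand back the one credential.
        return self._cred

# An auth handler then needs no separate add_password() calls:
# urllib.request.HTTPBasicAuthHandler(SimplePasswordMgr('x', 'y'))
```

This keeps the general manager-object protocol intact while giving the common one-off-script case the username=x, password=y simplicity described above.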
In general, I just don't feel like there needs to be quite so many handlers in urllib2. One featureful HTTP implementation would be easier to work with (and, I think, easier to extend). > [...] >> The AuthHandlers are a little annoying too, you can't just give them a >> username/password. You have to give them some manager object that can >> be queried for a password for a username/realm/URL. This is a nice >> option to have, but in most cases you don't need that kind of >> generality, and it makes it a lot harder to understand what you need >> to >> do. username=x, password=y are very easy to understand. > > That's just a documentation issue, I think -- and possibly adding some > convenience method. I wrote some docs for this, and I keep asking for > people who seem to be actually using these features to check this > documentation bug, but nobody has yet: > > http://www.python.org/sf/798244 > > > You don't have to provide a password manager object in fact: just let > the > HTTPBasicAuthHandler create one for you, and use the add_password > method > (which admittedly does require realm and uri as well as username / > password -- perhaps None should act as a wildcard there?). Yes, a wildcard could definitely be good. This is particularly important with scripts, i.e., one-off programs where you just want to grab something from a URL. >>>> Cookie handling also fits into this, but from the opposite direction >>>> from a URL object, since we are creating something of a user agent. >>>> You'd almost want to do: >>>> >>>> ua = UserAgent() >>>> url = web.URL('http://whatever.com') >>>> content = ua.get(url) >>>> >>>> Or something like that. I think an explicit agent is called for, >>>> separate from the URLs that it may retrieve. But only when you >>>> start >>>> considering cookies and caching. >>> [...] >>> >>> Are you suggesting replacing urllib2, building on top of it, or >>> extending it? urllib2's handlers already get a lot of the >>> 'user-agent' job done.
What requirements does caching impose that >>> urllib2 doesn't meet? There's >>> already a CacheFTPHandler. >> >> I think a URL class would probably build on top of urllib2, but >> would also need some more features. And obviously urllib2 can't go >> anywhere, so we might as well use it. > > OK. Does this URL class proposal fit with that path module PEP, do you > think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) > before, > but I've forgotten everything about it :-) No, there's no PEP, for this or for a filesystem path object. These were the links from the other email: http://www.jorendorff.com/articles/python/path/ http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org >> The caching in CacheFTPHandler is connection caching, not result > > OK. > > >> caching. HTTP has a wide array of ways to indicate caching, check for >> updates, etc. Enough that it becomes kind of complicated, which is >> why >> I don't think that fits well into the idea of a URL object (which >> should be quite simple, at least from the outside). > > That doesn't answer my question. To repeat: What requirements does > caching impose that *urllib2* doesn't meet? And why do we need a new > UserAgent class when we already have urllib2 and its handlers? All the normal HTTP caching, like If-Modified-Since and ETags. If you handle this, you have to store the retrieved results, handle the metadata for those results, and provide control (where to put the cache, when and how to expire it, what items are in the cache, flush the cache, maybe a memory cache, etc). That could be done in a handler, but it feels like a separate object to me (an object which might still go in urllib2). But looking back on what Bill was asking for, I think he was thinking more along the lines of connection caching, like CacheFTPHandler, and that would probably go in a handler.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sat Oct 25 20:12:09 2003 From: jjl at pobox.com (John J Lee) Date: Sat Oct 25 20:12:33 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> References: <028299D8-0734-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: [...] > In general, I just don't feel like there needs to be quite so many > handlers in urllib2. One featureful HTTP implementation would be > easier to work with (and, I think, easier to extend). Well, that was a large part of the purpose of urllib2 -- to let you choose what 'clever' stuff it does. If you don't want something, you just don't use that handler. More importantly, if you want to do something slightly differently, you supply your own handler. If you shift stuff from an auth handler into the HTTP{S,}Handler, anybody out there who's written their own auth handler will have their auth code suddenly stop being invoked by urllib2. Whatever special authorization they were doing (maybe just reading from a database, maybe fixing a bug, real or imagined, in urllib2) will stop happening, and their code will probably break. Anyway, it may or may not be the perfect system, but I'm not convinced it needs changing. Can you give a specific example of where having lots of handlers becomes oppressive? [...about inconvenience of having to provide realm and URI for auth...] > Yes, a wildcard could definitely be good. This is particularly > important with scripts, i.e., one-off programs where you just want to > grab something from a URL. OK. Do we have a document where we're recording these proposals? Is there a wiki? [...] > > OK. Does this URL class proposal fit with that path module PEP, do you > > think? Somebody mentioned that PEP (it was a PEP, wasn't it...?) 
> > before, but I've forgotten everything about it :-)
>
> No, there's no PEP, for this or for a filesystem path object. These
> were the links from the other email:
>
> http://www.jorendorff.com/articles/python/path/
>
> http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1057651032.22842.python-list%40python.org

Thanks. Again, is there somewhere to record this URL class idea and
the fact that this path module is related?

[...]
> > That doesn't answer my question. To repeat: What requirements does
> > caching impose that *urllib2* doesn't meet? And why do we need a
> > new UserAgent class when we already have urllib2 and its handlers?
>
> All the normal HTTP caching, like If-Modified-Since and ETags. If you
> handle this, you have to store the retrieved results, handle the
> metadata for those results, and provide control (where to put the
> cache, when and how to expire it, what items are in the cache, flush
> the cache, maybe a memory cache, etc). That could be done in a
> handler, but it feels like a separate object to me (an object which
> might still go in urllib2).

So, merely because you think "it feels like a new object", you're
proposing to create a whole new layer of complexity for users to
learn? Why should people have to learn a new API just to get caching?
If somebody had implemented HTTP caching and found the handler
mechanism lacking, or had a specific argument that showed it to be so,
a new layer *might* be justified. Otherwise, I think it's a bad idea.

> But looking back on what Bill was asking for, I think he was thinking
> more along the lines of connection caching, like CacheFTPHandler, and
> that would probably go in a handler.

Yep.

John

From jjl at pobox.com Sat Oct 25 20:18:23 2003
From: jjl at pobox.com (John J Lee)
Date: Sat Oct 25 20:18:44 2003
Subject: [Web-SIG] Threading and client-side support
Message-ID: 

First, I should state that I'm almost entirely ignorant of all things
threads. Be gentle with me.
What is the current state of thread-safety in the Python standard library client-side web code (ie. httplib, urllib, urllib2)? I ask because my cookies code is currently entirely thread-ignorant, and I'm wondering if it should have appropriate thread synchronization -- and if so, what problems I'm supposed to be preventing, and how to prevent them. John From ianb at colorstudy.com Sat Oct 25 21:00:39 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Oct 25 21:01:24 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote: > On Sat, 25 Oct 2003, Ian Bicking wrote: > [...] >> In general, I just don't feel like there needs to be quite so many >> handlers in urllib2. One featureful HTTP implementation would be >> easier to work with (and, I think, easier to extend). > > Well, that was a large part of the purpose of urllib2 -- to let you > choose > what 'clever' stuff it does. If you don't want something, you just > don't > use that handler. More importantly, if you want to do something > slightly > differently, you supply your own handler. > > If you shift stuff from an auth handler into the HTTP{S,}Handler, > anybody > out there who's written their own auth handler will have their auth > code > suddenly stop being invoked by urllib2. Whatever special authorization > they were doing (maybe just reading from a database, maybe fixing a > bug, > real or imagined, in urllib2) will stop happening, and their code will > probably break. a) There's not a lot of different ways to deal with a 401 response. Is there something that's not covered by basic and digest authentication? b) Accessing a database should happen in the password manager, not the handler. The handler handles the protocol, the database is not tied to the protocol. 
I'm not proposing that the password manager go away (though it would
be nice if it was hidden for simple usage)

c) This doesn't have to affect backward compatibility anyway. We can
leave HTTPBasicAuthHandler in there (deprecated), but also fold its
functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require
that HTTPHandler *not* handle authentication.

> Anyway, it may or may not be the perfect system, but I'm not
> convinced it needs changing. Can you give a specific example of where
> having lots of handlers becomes oppressive?

The documentation is certainly a problem (e.g., the
HTTPBasicAuthHandler page), though it could be organized differently
without changing the code. It's definitely ravioli code
(http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO
it's hard to document ravioli code well. (It's not so important how
things are structured internally, but currently urllib2 also exposes
that complex class structure)

Also urlopen is not really extensible. You can't tell urlopen to use
authentication information (and it doesn't obey browser URL
conventions, like http://user:password@domain/). And we want to add
structured POST data to that method (but also allow non-structured
data), and cookies, and it might be nice to set the user-agent, and
maybe other things that I haven't thought of. If urlopen doesn't
support these extra features then programmers have to learn a new API
as their program becomes more complex. Yet none of these features
would be all that difficult to add via urlopen or perhaps other simple
functions (instead of via classes). I don't think there's any need for
classes in the external API -- fetching URLs is about doing things,
not representing things, and functions are easier to understand for
doing.

> [...about inconvenience of having to provide realm and URI for
> auth...]
>> Yes, a wildcard could definitely be good.
This is particularly >> important with scripts, i.e., one-off programs where you just want to >> grab something from a URL. > > OK. Do we have a document where we're recording these proposals? Is > there a wiki? No, we don't have anything. Should we use the main Python Wiki? Something else? Opinions? [...] >>> That doesn't answer my question. To repeat: What requirements does >>> caching impose that *urllib2* doesn't meet? And why do we need a new >>> UserAgent class when we already have urllib2 and its handlers? >> >> All the normal HTTP caching, like If-Modified-Since and E-Tags. If >> you >> handle this, you have to store the retrieved results, handle the >> metadata for those results, and provide control (where to put the >> cache, when and how to expire it, what items are in the cache, flush >> the cache, maybe a memory cache, etc). That could be done in a >> handler, but it feels like a separate object to me (an object which >> might still go in urllib2). > > So, merely because you think "it feels like a new object", you're > proposing to create a whole new layer of complexity for users to learn? > Why should people have to learn a new API just to get caching? If > somebody had implemented HTTP caching and found the handler mechanism > lacking, or had a specific argument that showed it to be so, a new > layer > *might* be justified. Otherwise, I think it's a bad idea. I think fetching and caching are two separate things. The caching requires a context. The fetching doesn't. I think fetching things should be simplified, with an API that's not very object-oriented. Since a cache is persistent it has to have a persistent representation, so it needs to be some sort of object. I also don't see how caching would fit very well into the handler structure. Maybe there'd be a HTTPCachingHandler, and you'd instantiate it with your caching policy? 
(where it stores files, how many files, etc) Also a HTTPBasicAuthCachingHandler, HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This caching is orthogonal -- not just to things like authentication, but even to HTTP (to some degree). The handler structure doesn't allow orthogonal features. Except through mixins, but don't get me started on mixins... Using a separate class, not related to Handlers, isn't more complex. Either way we have to provide the same features and the same options, and document all of those. No matter which way you cut it, it's new stuff, it's another layer. Implementing it in a new class is just calling it what it is. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Sun Oct 26 08:24:39 2003 From: jjl at pobox.com (John J Lee) Date: Sun Oct 26 08:24:47 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: References: Message-ID: On Sat, 25 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote: [...] > a) There's not a lot of different ways to deal with a 401 response. Is > there something that's not covered by basic and digest authentication? You may have a point. > b) Accessing a database should happen in the password manager, not the > handler. The handler handles the protocol, the database is not tied to > the protocol. I'm not proposing that the password manager go away > (though it would be nice if it was hidden for simple usage) OK, and another one. :-) > c) This doesn't have to effect backward compatibility anyway. We can > leave HTTPBasicAuthHandler in there (deprecated), but also fold it's > functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require > that HTTPHandler *not* handle authentication. Well, it does if you do something important in your auth handler that never gets called because HTTPHandler has decided it knows best when it comes to 40x. 
But like you say, there's probably not much important that you could
do since password management is already abstracted out. I *still*
don't see why you're complaining about the current state of affairs,
though.

> > Anyway, it may or may not be the perfect system, but I'm not
> > convinced it needs changing. Can you give a specific example of
> > where having lots of handlers becomes oppressive?
>
> The documentation is certainly a problem (e.g., the
> HTTPBasicAuthHandler page), though it could be organized differently
> without changing the code. It's definitely ravioli code
> (http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO
> it's hard to document ravioli code well. (It's not so important how
> things are structured internally, but currently urllib2 also exposes
> that complex class structure)

It's pretty simple conceptually: OpenerDirector asks all the handlers
if they want to handle, not handle, or abort a response. It does the
same for errors. Most of the handlers' functions are self-explanatory
from their class names (OK, I guessed CacheFTPHandler wrong, but it
was 50-50 :-). I wouldn't call that ravioli.

I'm still waiting for that example.

> Also urlopen is not really extensible. You can't tell urlopen to use

Not directly, no. You have to do it via build_opener, or via
OpenerDirector itself (or another class). That's probably not ideal:
what did you have in mind instead?

> authentication information (and it doesn't obey browser URL
> conventions, like http://user:password@domain/).

What is that convention? Is it standardised in an RFC? I see
ProxyHandler knows about that syntax. Obviously it's not an intrinsic
limitation of the handler system.

> And we want to add
> structured POST data to that method (but also allow non-structured

We do? Why not just have a function (to make file upload data,
assuming that's what you're thinking of)?
> data), and cookies, and it might be nice to set the user-agent, and
> maybe other things that I haven't thought of. If urlopen doesn't
> support these extra features then programmers have to learn a new API
> as their program becomes more complex.

Well, I can do those things already (cookies, set user-agent) using
urllib2. User-Agent is a bit ugly, I'll grant you, but I don't lose
sleep over it. I did find an extension (backwards-compatible, I hope &
believe) that made things much cleaner -- see the RFE I mentioned
earlier. But no need for a whole new layer.

Mind you, if your idea can do the same job as my RFE, then it should
certainly be considered alongside that.

> Yet none of these features
> would be all that difficult to add via urlopen or perhaps other
> simple functions (instead of via classes). I don't think there's any
> need for classes in the external API -- fetching URLs is about doing
> things, not representing things, and functions are easier to
> understand for doing.

Details? The only example you've given so far involved a UserAgent
class.

[...]
> > So, merely because you think "it feels like a new object", you're
> > proposing to create a whole new layer of complexity for users to
> > learn? Why should people have to learn a new API just to get
> > caching? If somebody had implemented HTTP caching and found the
> > handler mechanism lacking, or had a specific argument that showed
> > it to be so, a new layer *might* be justified. Otherwise, I think
> > it's a bad idea.
>
> I think fetching and caching are two separate things. The caching
> requires a context. The fetching doesn't. I think fetching things

The context is provided by the handler.

[...]
> I also don't see how caching would fit very well into the handler
> structure. Maybe there'd be a HTTPCachingHandler, and you'd
> instantiate it with your caching policy?
> (where it stores files, how many files, etc) Also a
> HTTPBasicAuthCachingHandler, HTTPDigestAuthCachingHandler,
> HTTPSCachingHandler, and so on? This caching is orthogonal -- not
> just to things like authentication, but

My assumption was that it wasn't orthogonal, since RFC 2616 seems to
have rather a lot to say on the subject.

If it *is* (or part of it is) orthogonal, three options come to mind.
Let's say you have a cache class.

1. All the normal handlers know about the cache class, but have
   caching off by default.

2. Write a CacheHandler with a default_open. If there's a cache hit,
   return it, otherwise return None (let somebody else try to handle
   it).

3. Subclass (or replace without bothering to subclass) OpenerDirector.
   I guess open is probably what you'd want to change, but I don't
   know about HTTP and other protocols' caching rules.

I haven't thought it through so I certainly don't claim to know how
any of these will turn out (though I'd guess 2. would do the job of
any caching that's orthogonal to the various protocol schemes). If you
want to justify a new layer, though, it's up to you to show caching
*doesn't* fit urllib2 as-is. YAGNI.

> even to HTTP (to some degree). The handler structure doesn't allow
> orthogonal features. Except through mixins, but don't get me started
> on mixins...

I don't think that's true -- see above.

Again, my 'processors' patch is relevant here (see that RFE). But no
point in re-iterating here the long discussion I posted on the SF bug
tracker.

> Using a separate class, not related to Handlers, isn't more complex.
> Either way we have to provide the same features and the same options,
> and document all of those.

I think it would be fruitless to comment on this until you put forward
some details.

> No matter which way you cut it, it's new
> stuff, it's another layer. Implementing it in a new class is just
> calling it what it is.

Well, um, no. Having a new layer is different to not having a new
layer.
Otherwise, what was this little discussion of ours all about??

Another thing I think we shouldn't forget is that nobody has actually
said they're going to write any caching code yet! Are you? Do you have
any other requirements driving the need for this new layer, or is it
all down to caching?

John

From t.vandervossen at fngtps.com Sun Oct 26 14:50:11 2003
From: t.vandervossen at fngtps.com (Thijs van der Vossen)
Date: Sun Oct 26 14:50:41 2003
Subject: [Web-SIG] So what's missing?
In-Reply-To: 
References: <8B4D11EB-06EC-11D8-8AA9-000393678182@fngtps.com>
Message-ID: <3F9C2573.8070207@fngtps.com>

John J Lee wrote:
> On Sat, 25 Oct 2003, Thijs van der Vossen wrote:
> [...]
>>>> * Asynchronous fetch. When working over the Plucker distiller,
> [...]
>>> Second, would it be possible to do this without a lot of code
>>> duplication between the current urllib{2,} / httplib libraries and
>>> the new stuff? Is it worth it, when you can use threads instead?
>>
>> This is already trivial with the asyncore libraries. If I remember
> [...]
>
> So what is this for?
>
> http://asynchttp.sourceforge.net/

From this page: "Our goal is to provide the functionality of the
excellent 'httplib' module without using blocking sockets."

> 28k of Python code isn't exactly 'trivial', is it?

Nope, but it's relatively trivial to use the asyncore libraries to
asynchronously get multiple pages (once again, there is a nice example
in Steve's book). Providing exactly the same functionality as httplib
will obviously be more work.

Regards,
Thijs
> >>>Second, would it be possible to do this without a lot of code > >>>duplication between the current urllib{2,} / httplib libraries and the > >>>new stuff? Is it worth it, when you can use threads instead? > >> > >>This is already trivial with the asyncore libraries. If I remember [...] > > So what is this for? > > > > http://asynchttp.sourceforge.net/ [...] > > 28k of Python code isn't exactly 'trivial', is it? > > Nope, but it's relatively trivial to use the asyncore libraries to > asynchronous get multiple pages (once again, there is a nice example in > Steve's book). Providing exactly the same functionality as httplib will > obviously be more work. Bill said he wanted a 'higher-powered HTTP client library', by which I assume he meant something more than sub-httplib. John From ianb at colorstudy.com Sun Oct 26 17:39:45 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Sun Oct 26 17:40:03 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: Message-ID: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: >> c) This doesn't have to effect backward compatibility anyway. We can >> leave HTTPBasicAuthHandler in there (deprecated), but also fold it's >> functionality into HTTPHandler. HTTPBasicAuthHandler doesn't require >> that HTTPHandler *not* handle authentication. > > Well, it does if you do something important in your auth handler that > never gets called because HTTPHandler has decided it knows best when it > comes to 40x. But like you say, there's probably not much important > that > you could do since password management is already abstracted out. Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into HTTPHandler. You could still override it, and HTTPBasicAuthHandler would still override it (and somewhat differently, because HTTPHandler.http_error_401 should handle both basic and digest auth). It's a pretty small change, really. 
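[Editor's sketch: whether http_error_401 lives in HTTPHandler or in a
separate auth handler, the work done on the retry is the same. The
helper below is hypothetical, not urllib2 API; it only builds the
credential header a basic-auth 401 handler would attach. The test
values are the canonical example from RFC 2617. Digest auth would
build a different, challenge-dependent value here.]

```python
import base64

def basic_auth_header(user, password):
    """The Authorization value a 401 handler attaches before retrying
    the request (hypothetical helper, not part of urllib2)."""
    creds = "%s:%s" % (user, password)
    encoded = base64.b64encode(creds.encode("latin-1")).decode("ascii")
    return "Basic " + encoded

# the canonical RFC 2617 example credentials
header = basic_auth_header("Aladdin", "open sesame")
```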
>>> Anyway, it may or may not be the perfect system, but I'm not >>> convinced >>> it needs changing. Can you give a specific example of where having >>> lots >>> of handlers becomes oppressive? >> >> The documentation is certainly a problem (e.g., the >> HTTPBasicAuthHandler page), though it could be organized differently >> without changing the code. It's definitely ravioli code >> (http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO >> it's hard to document ravioli code well. (It's not so important how >> things are structured internally, but currently urllib2 also exposes >> that complex class structure) > > It's pretty simple conceptually: OpenerDirector asks all the handlers > if > they want to handle, not handle, or abort a response. It does the same > for errors. Most of the handlers' functions are self-explanatory from > their class names (OK, I guessed CacheFTPHandler wrong, but it was > 50-50 > :-). I wouldn't call that ravioli. It might work conceptually internally, and probably big internal changes aren't necessary. But it doesn't work conceptually for the programmer that has a task-oriented desire. The programmer starting to use urllib2 doesn't want to understand a framework of handlers, they want to get something off the net. urlopen() is the only easy way to do that in urllib2, everything else requires a lot more thinking. And urlopen() isn't very featureful. > I'm still waiting for that example. I thought I gave examples: documentation, proliferation of classes, non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal to authentication). > >> Also urlopen is not really extensible. You can't tell urlopen to use > > Not directly, no. You have to do it via build_opener, or via > OpenerDirector itself (or another class. That's probably not ideal: > what > did you have in mind instead? Maybe keyword arguments that get passed to the handlers. 
E.g.:

urlopen('http://whatever.com',
        username='bob', password='secret',
        postFields={...},
        postFiles={'image': ('test.jpg', '... image body ...')},
        addHeaders={'User-Agent': 'superbot 3000'})

It could get a little out of hand with all the protocols and all the
features, but I can't think of a better way to do it. And I think the
features would still be easier to document even when urlopen() took
all sorts of funny options, than they are when there's separate
handlers. But maybe urllib2 just needs better documentation with
useful examples; that signature is pretty hairy. But it's still easier
to read and write than any OO-based system. I'm concerned about the
external ease of use, not the internal conceptual integrity.

>> authentication information (and it doesn't obey browser URL
>> conventions, like http://user:password@domain/).
>
> What is that convention? Is it standardised in an RFC?

It's a URL convention that's been around a very long time, I don't
know if it is in an RFC.

> I see
> ProxyHandler knows about that syntax. Obviously it's not an intrinsic
> limitation of the handler system.

I don't really know how a handler is chosen -- can it figure out
whether it should use HTTPHandler, HTTPBasicAuthHandler, or
HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest
can't be determined until you try to fetch the object.

>> And we want to add
>> structured POST data to that method (but also allow non-structured
>
> We do? Why not just have a function (to make file upload data,
> assuming that's what you're thinking of)?

That would work too.
> I did find an extension (backwards-compatible, I hope & believe)
> made things much cleaner -- see the RFE I mentioned earlier. But no
> need for a whole new layer.
>
> Mind you, if your idea can do the same job as my RFE, then it should
> certainly be considered alongside that.

Hmm... I just looked at the RFE now, so I'm still not sure what it
would mean to this.

>> Yet none of these features
>> would be all that difficult to add via urlopen or perhaps other
>> simple functions (instead of via classes). I don't think there's
>> any need for classes in the external API -- fetching URLs is about
>> doing things, not representing things, and functions are easier to
>> understand for doing.
>
> Details? The only example you've given so far involved a UserAgent
> class.

Details about what? You're asking for details and examples, but I've
provided some already and I don't know what you're looking for.
Example of what? I don't have an implementation, or any set
implementation in mind, and I haven't suggested that.

> [...]
>>> So, merely because you think "it feels like a new object", you're
>>> proposing to create a whole new layer of complexity for users to
>>> learn? Why should people have to learn a new API just to get
>>> caching? If somebody had implemented HTTP caching and found the
>>> handler mechanism lacking, or had a specific argument that showed
>>> it to be so, a new layer *might* be justified. Otherwise, I think
>>> it's a bad idea.
>>
>> I think fetching and caching are two separate things. The caching
>> requires a context. The fetching doesn't. I think fetching things
>
> The context is provided by the handler.

But we're fetching URLs, not handlers. The URL is context-less,
intrinsically. The handler isn't context-less, but that's part of what
I don't like about urllib2's handler-oriented perspective.

> [...]
>> I also don't see how caching would fit very well into the handler
>> structure.
Maybe there'd be a HTTPCachingHandler, and you'd >> instantiate it with your caching policy? (where it stores files, how >> many files, etc) Also a HTTPBasicAuthCachingHandler, >> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This >> caching is orthogonal -- not just to things like authentication, but > > My assumption was that it wasn't orthogonal, since RFC 2616 seems to > have > rather a lot to say on the subject. Well, if they aren't orthogonal, then they should all be implemented in a single class. Implementing features in subclasses means that they can't be easily used in combination. Why not have just one good HTTP handler class? It's all one protocol (and HTTPS is exactly the same protocol). Many parts of the caching mechanics aren't part of RFC 2616 -- specifically persistence, metadata storage and querying, and cache control. These aren't part of HTTP at all. > If it *is* (or part of it is) orthogonal, three options come to mind. > Let's say you have a cache class. > > 1. All the normal handlers know about the cache class, but have caching > off by default. > > 2. Write a CacheHandler with a default_open. If there's a cache hit, > return it, otherwise return None (let somebody else try to handle > it). > > 3. Subclass (or replace without bothering to subclassing) > OpenerDirector. > I guess open is probably what you'd want to change, but I don't know > about HTTP and other protocols' caching rules. > > I haven't thought it through so I certainly don't claim to know how > any of > these will turn out (though I'd guess 2. would do the job of any > caching > that's orthogonal to the various protocol schemes). If you want to > justify a new layer, though, it's up to you to show caching *doesn't* > fit > urllib2 as-is. YAGNI. 1 seems like a lot of trouble. 2 won't work, since CacheHandler can't return None and let someone else do the work, because it has to know about what the result is so that it can cache the result. 
It would have to be 3, since it's really about intercepting handler
calls. I would imagine that it should wrap OpenerDirector, and perhaps
subclass it as well. Then protocols can be added to the caching and
non-caching directors at the same time. But it seems like there can be
only one OpenerDirector... that messes things up. Multiple caches with
different policies should be possible. Which leads us back to a
separate class that handles caching.

>> even to HTTP (to some degree). The handler structure doesn't allow
>> orthogonal features. Except through mixins, but don't get me started
>> on mixins...
>
> I don't think that's true -- see above.
>
> Again, my 'processors' patch is relevant here (see that RFE). But no
> point in re-iterating here the long discussion I posted on the SF bug
> tracker.

I missed that when you posted it. That might handle some of these
features. It seems a little too global to me. For instance, how would
you handle two distinct user agents with respect to the referer
header? Seems like it would also make sense as an OpenerDirector
subclass/wrapper. At least portions of it are similar to doing caching
(like cookies and referers), which is to say a request that is made in
a specific context.

One example of an application that would require separate contexts
would be when testing concurrency in a web application -- you want to
simulate multiple users logging in and performing actions
concurrently. You can't do this if the context is stored globally.

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com Sun Oct 26 17:51:45 2003
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Oct 26 17:52:12 2003
Subject: [Web-SIG] Threading and client-side support
In-Reply-To: 
Message-ID: 

On Saturday, October 25, 2003, at 07:18 PM, John J Lee wrote:
> First, I should state that I'm almost entirely ignorant of all things
> threads. Be gentle with me.
> What is the current state of thread-safety in the Python standard
> library client-side web code (ie. httplib, urllib, urllib2)?

As far as I know they are threadsafe.

> I ask because my cookies code is currently entirely thread-ignorant,
> and I'm wondering if it should have appropriate thread
> synchronization -- and if so, what problems I'm supposed to be
> preventing, and how to prevent them.

It's all about concurrent access. For instance, looking at
ClientCookie, the question would be what would happen when
ClientCookie.urlopen was called while another ClientCookie.urlopen was
running. For instance, in ClientCookie._urllib2_support.urlopen,
build_opener() can be called twice. If this is a problem then the code
isn't threadsafe (i.e., if build_opener() isn't threadsafe then
urlopen isn't threadsafe). urlopen() can protect build_opener() with a
lock, like:

urlopen_lock = threading.Lock()

def urlopen(url, data=None):
    global _opener
    if _opener is None:
        urlopen_lock.acquire()
        try:
            if _opener is None:
                # it might not be None, because we might have called
                # build_opener() sometime between the first if and
                # acquiring the lock...
                _opener = build_opener()
        finally:
            urlopen_lock.release()
    return _opener.open(url, data)

There's a little more complexity there so that you don't have to
acquire the lock every time you call urlopen(). _opener.open() still
has to be threadsafe at this point (and you'll definitely want it to
be threadsafe, so requests don't have to be done serially). Where you
have to do this sort of thing depends on what parts of the system are
exposed so that they can be used concurrently.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Mon Oct 27 00:30:27 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 00:31:15 2003 Subject: [Web-SIG] http headers (was: Defining a standard interface for common web tasks) In-Reply-To: ; from jjl@pobox.com on Sat, Oct 25, 2003 at 01:16:52PM +0100 References: <3F98D709.9070806@sjsoft.com> <4987A126-063A-11D8-A49B-000393C2D67E@colorstudy.com> <20031024140644.H15765@lyra.org> Message-ID: <20031026213027.A24764@lyra.org> On Sat, Oct 25, 2003 at 01:16:52PM +0100, John J Lee wrote: >... > > i.e. concatenate with a comma. While it is allowed, there is *generally* > > no reason for the API to enable writing separate headers, nor a reason to > > expose same-named headers as separate (i.e. just concatenate them > > internally). > > > > Note that I say "generally" because I've seen a client that could not deal > > properly with a long header value. By separating the tokens in the header > [...] > > Another thing that breaks this is the Cookie header: cookie values may > contain commas (and they do!). Of course, this may not be relevant here, > since Python programmers aren't going to be so silly as to put commas in > their cookie values :-) Yup. Good point. The WWW-Authenticate header has ambiguity here, too, although most of those issues have been sorted. With a bit of work, you can usually tease apart the header into the various challenges the server is offering up. 
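[Editor's sketch: Greg's folding rule as a small helper, hypothetical
and not from any stdlib module -- duplicate fields are comma-joined as
RFC 2616 section 4.2 permits, except cookie headers, which are kept as
separate entries because their values may themselves contain commas.]

```python
NO_FOLD = {"Cookie", "Set-Cookie"}

def fold_headers(pairs):
    """Collapse repeated header fields into one comma-joined value,
    except cookie headers, where comma-joining would be ambiguous."""
    folded = {}
    for name, value in pairs:
        key = name.title()  # normalize case: 'accept' -> 'Accept'
        if key in NO_FOLD:
            # keep cookie values apart rather than comma-joining
            folded.setdefault(key, []).append(value)
        elif key in folded:
            folded[key] += ", " + value
        else:
            folded[key] = value
    return folded

folded = fold_headers([
    ("Accept", "text/html"),
    ("accept", "text/plain"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2, c=3"),
])
```

WWW-Authenticate would need the same list treatment, since its
challenge syntax also embeds commas.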
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Oct 27 00:47:41 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 00:47:46 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com>; from janssen@parc.com on Fri, Oct 24, 2003 at 05:51:41PM -0700 References: <3F99B1ED.1090802@bath.ac.uk> <03Oct24.175145pdt."58611"@synergy1.parc.xerox.com> Message-ID: <20031026214741.B24764@lyra.org> On Fri, Oct 24, 2003 at 05:51:41PM -0700, Bill Janssen wrote: > > I'm a huge fan of being able to distinguish between that data from a > > query string (GET data) and data that has been POSTed. I posted my > > reasons for caring about this to the Quixote mailing list a few days > > ago, but I'll repeat them here through the magic of copy and paste: > > [...list of reasons you want to know the HTTP command omitted...] > > The way to differentiate them (if you care) is to look at the > "command" attribute of the request object, IMO. That would tell you Actually, it is called the "method" rather than "command". See section 9 of RFC 2616. Cheers, -g -- Greg Stein, http://www.lyra.org/ From davidf at sjsoft.com Mon Oct 27 02:41:16 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:41:35 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <3F9CCC1C.7010400@sjsoft.com> Steve Holden wrote: >>-----Original Message----- >>From: web-sig-bounces+sholden=holdenweb.com@python.org >>[mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of >>David Fraser >>Sent: Friday, October 24, 2003 2:01 PM >>To: Ian Bicking >>Cc: web-sig@python.org >>Subject: Re: [Web-SIG] Form field dictionaries >> >> >>Ian Bicking wrote: >> >> >> >>>On Friday, October 24, 2003, at 11:28 AM, David Fraser wrote: >>> >>> >>> >>>>That's fine, but I think it's important that these methods are >>>>available as an addition to a standard dictionary interface. 
>>>>I think the key point is, if somebody wants a list of values, they >>>>probably know that they want a list. >>>>It's very difficult to write code by accident that would handle a >>>>list of values as well as a string. >>>>So if somebody knows they want a list in certain >>>> >>>> >>circumstances, they >> >> >>>>could call getlist() >>>>But I think the default dictionary return value should be >>>> >>>> >>the same as >> >> >>>>getfirst(). >>>>That saves endless checks for lists for those who don't need them. >>>> >>>> >>>Every time I have encountered an unexpected list it has >>> >>> >>been because >> >> >>>of a bug somewhere else in my code. I might use a getone() method >>>that threw some exception when a list was encountered, but >>> >>> >>I'd *never* >> >> >>>want to use getfirst(). getfirst() is sloppy programming. >>> >>> >>(getlist() >> >> >>>is perfectly fine though) >>> >>> >>There seems to be a lot of agreement on this... >>So let's take it that the interface will be a dictionary, >>with an extra >>method defined, getlist, which will return multiple items if multiple >>items were defined, or a list containing a single item otherwise. >>The next question is, how do we handle the Get/Post/Both situation? >>One way would be to have methods on the request object that >>return the >>desired dictionary >>Somebody also suggested including Cookies, as is done in PHP >>- I'm not >>sure this is a good idea >> >> >> >The only nit I would pick is to have getlist() return a list even when >the response contained a single value. 
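A minimal sketch of that interface (class name and internal storage are illustrative only, not a settled API): plain indexing returns the single value, getlist() always returns a list, empty if the key is absent:

```python
class FieldMapping(dict):
    # Illustrative sketch: values are stored internally as lists of
    # strings, one entry per occurrence of the field in the request.
    def __getitem__(self, key):
        values = dict.__getitem__(self, key)  # KeyError if key is absent
        return values[0]

    def getlist(self, key):
        # Always a list: possibly empty, possibly of length one.
        return dict.get(self, key, [])

# e.g. a form submitted with one "name" field and two "tag" fields:
fields = FieldMapping({"name": ["simon"], "tag": ["a", "b"]})
```

Whether plain access should instead raise when a field has multiple values is a separate question.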
> > Sure, that's what I meant above, sorry if it wasn't clear David From davidf at sjsoft.com Mon Oct 27 02:42:34 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:42:39 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024141819.R70244@onyx.ispol.com> References: <4A29F5B4-0640-11D8-A49B-000393C2D67E@colorstudy.com> <3F9968BE.1010009@sjsoft.com> <20031024141819.R70244@onyx.ispol.com> Message-ID: <3F9CCC6A.7020503@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, David Fraser wrote: > > > >>The next question is, how do we handle the Get/Post/Both situation? >> >> > >Just to clarify nomenclature - > >POST /blah/blah.py?foo=bar > >is a valid request. The part after ? is called "query information", this >is defined in RFC 1808 and RFC 1738. > >CGI (which has no formal RFC, but there is Ken Coar's excellent draft) >introduces something called "path-info", but its meaning is rather vague >outside of cgi since it relies on a notion of a script, which isn't very >meaningful in most non-CGI environments. > >The data submitted in the body of the POST request is called "form data" >and I believe is described in RFC 1867. > >I think that query information and form data can be combined in a single >mapping object, because if you want just query data, you can always parse >the url directly via urlparse, and if you want only form data, you can >read and parse it directly as a mime object. > > Great. No need to complicate things unnecessarily! >Path-info I think should be left where it belongs - in the cgi-specific >module.
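The query-information side of that split can be sketched like this (using today's module names; in the Python of the day this was `urlparse.urlparse` plus `cgi.parse_qs`):

```python
from urllib.parse import urlparse, parse_qs  # urlparse / cgi.parse_qs in 2.x

def query_info(request_url):
    # Parse only the "query information" part of a request URL,
    # independently of any form data carried in a POST body.
    return parse_qs(urlparse(request_url).query)

# POST /blah/blah.py?foo=bar&foo=baz -- the query info is available
# regardless of the method used:
info = query_info("/blah/blah.py?foo=bar&foo=baz")
```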
> > Yes, we shouldn't integrate cgi-specific things into a general API David From davidf at sjsoft.com Mon Oct 27 02:45:11 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:45:48 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <3F9CCD07.4070502@sjsoft.com> Ian Bicking wrote: > On Friday, October 24, 2003, at 01:00 PM, David Fraser wrote: > >> Ian Bicking wrote: >> >>> Every time I have encountered an unexpected list it has been because >>> of a bug somewhere else in my code. I might use a getone() method >>> that threw some exception when a list was encountered, but I'd >>> *never* want to use getfirst(). getfirst() is sloppy programming. >>> (getlist() is perfectly fine though) >> >> >> There seems to be a lot of agreement on this... >> So let's take it that the interface will be a dictionary, with an >> extra method defined, getlist, which will return multiple items if >> multiple items were defined, or a list containing a single item >> otherwise. > > > Additionally, getlist should return the empty list if the key isn't > found, as this follows naturally (but a KeyError for normal access > when a value isn't found). I also think cgi's default of throwing > away empty fields should not be supported, even optionally. > > But I haven't really heard reaction to the idea that you get a > BadRequest or other exception if you try to get a key that has multiple > values. Throwing information away is bad, and unPythonic (though very > PHPish). I don't think we should copy PHP here. I have *never* > encountered a situation where throwing away extra values found in the > query is the correct solution. Either the form that is doing the > submission has a bug, or else the script needs to figure out some > (explicit!) way to handle the ambiguity. What about comparing multiple values to see if they're the same? I don't see throwing values away as such a bad problem... > We also need a way to get at the raw values.
I suppose you could do:
>
> fields = {}
> for key in req.fields.keys():
>     v = req.getlist(key)
>     if len(v) == 1: fields[key] = v[0]
>     else: fields[key] = v
>
> But that's kind of annoying, since the request object probably > contains this kind of dictionary already. This will be required for > backward compatibility, if we want this request to be wrapped to > support existing request interfaces. I think the correct solution is to be explicit about the keys you want lists for ... as this would have to be coded explicitly somewhere in the code anyway. David From davidf at sjsoft.com Mon Oct 27 02:47:12 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:47:26 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024192925.R71890@onyx.ispol.com> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> Message-ID: <3F9CCD80.10502@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >For what it's worth, I never liked the request/response separation either. >I like a single object from which you can read() and to which you can >write(), just like a file. Imagine if for file IO you had to have an >object to read and another one to write? > >(I would agree that perhaps "request" is a misnomer, but I can't think of >anything better) > > Connection? I think someone suggested "Transaction" for this, but it sounds out of place here... David >On Fri, 24 Oct 2003, Bill Janssen wrote: > > >>>When you stop and think about it: *every* request object will have a >>>matching response object. Why have two objects if they come in pairs? You >>>will never see one without the other, and they are intrinsically tied to >>>each other. So why separate them? >>> >>> >>Mainly because they are two separate concepts.
For instance, in my >>code, I always pass two arguments; one is the response, which the user >>manipulates to send back something to the caller, and the other is the >>request, which is basically a dictionary of all parameter values, plus >>a few extra special ones like 'path'. >> >>Bill >> >> From davidf at sjsoft.com Mon Oct 27 02:49:00 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:49:05 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> References: <20031024132028.C15765@lyra.org> Message-ID: <3F9CCDEC.30303@sjsoft.com> Greg Stein wrote: >In the most recent incarnation of a webapp of mine (subwiki), I almost >went with a request/response object paradigm and even started a bit of >refactoring along those lines. However, I ended up throwing out that >dual-object concept. > >When you stop and think about it: *every* request object will have a >matching response object. Why have two objects if they come in pairs? You >will never see one without the other, and they are intrinsically tied to >each other. So why separate them? > >I set up the subwiki core to instantiate a "handler" each time a request >comes in. That Handler instance provides access to the request info, and >is the conduit for generating the response. The app dispatches to the >appropriate command function, passing the handler. > >The Handler is actually set up as a base class, with two subclasses so >far: cgi, and cmdline. This lets me do some testing from the command line, >along with the standard cgi model of access. At some point, I'll implement >a mod_python subclass to do the request/response handling. 
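That handler shape might be sketched like this (all names invented for illustration; subwiki's real classes will differ):

```python
class Handler:
    # One instance per incoming request: provides access to the request
    # info, and is the conduit for generating the response.
    def get_field(self, name):
        raise NotImplementedError

    def write(self, data):
        raise NotImplementedError

class CommandLineHandler(Handler):
    # Lets the same application code be driven from the command line for
    # testing; a cgi (or later mod_python) subclass would read from the
    # environment and write to stdout instead.
    def __init__(self, fields):
        self.fields = fields
        self.output = []

    def get_field(self, name):
        return self.fields.get(name)

    def write(self, data):
        self.output.append(data)

def run_app(handler):
    # Application code sees only the Handler interface, never the
    # CGI/mod_python details.
    handler.write("Hello, %s" % handler.get_field("name"))

handler = CommandLineHandler({"name": "web-sig"})
run_app(handler)
```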
> >(as a side note, I'll also point out that Apache operates this way, too; everything is based around the request_rec structure; it holds all the input data, output headers, the input and output filter chains, etc) > > >In any kind of server-side framework design, I would give a big +1 to >keeping it simple with a single "handler" type of object rather than a >dual-object design. > >Cheers, >-g > +1 from me too. We should also think about things that may/may not be supported by the API, such as filters in Apache 2. These seem to me to be a very Pythonic concept that could easily be layered on top of any underlying API. If the request-response object is well designed, filters could fit snugly on top of it. David From davidf at sjsoft.com Mon Oct 27 02:52:59 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:53:06 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031024220036.K1810@onyx.ispol.com> References: <03Oct24.152805pdt."58611"@synergy1.parc.xerox.com> <3F99B1ED.1090802@bath.ac.uk> <20031024220036.K1810@onyx.ispol.com> Message-ID: <3F9CCEDB.6090506@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Fri, 24 Oct 2003, Simon Willison wrote: > > >>2. My rule of thumb is "only modify data on a POST" - that way there's >>no chance of someone bookmarking a URL that updates a database (for >>example). >> >> >I get upset at web pages that refuse to cooperate when I submit things via >query strings. > >I think a reliable way to avoid accidental updates is to rely on a session >mechanism; modifying only on POST results in mild user annoyance >IMHO. > > >>3. It is useful to be able to detect if a form has been submitted or >>not. In PHP, I frequently check for POSTed data and display a form if >>none is available, assume the form has been submitted if there is. >> >> >I don't like doing things like this because they rely on protocol >internals to drive application logic... > > >>4. Security.
While ensuring data has come from POST rather than GET >>provides absolutely no security against a serious intruder, it does >>discourage amateurs from "hacking the URL" to see if they can cause any >>damage. Security through obscurity admittedly, but it adds a bit of extra >>peace of mind. >> >> >Again, I don't agree; hackable URLs are a good thing! :-) > >And it is, indeed, security by obscurity. If you have good data >validation, there should be no need for any obscurity. > > Absolutely. And I really like the bookmarklet for Mozilla that lets you transform all POST forms into GET forms so you can hack the URLs :-) http://www.squarefree.com/bookmarklets/forms.html David From davidf at sjsoft.com Mon Oct 27 02:58:23 2003 From: davidf at sjsoft.com (David Fraser) Date: Mon Oct 27 02:58:27 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> References: <03Oct24.203301pdt."58611"@synergy1.parc.xerox.com> Message-ID: <3F9CD01F.4070204@sjsoft.com> Bill Janssen wrote: >Apropos Ian's comments today, I'd like to suggest that at this stage >we focus on what's missing, rather than on how to fix/change things. >What have you needed that isn't in the standard libraries? Here's my >list: > > I think the key reason for discussing how to fix/change things is that a general web application API is needed that will allow code to run on top of any framework that's supported. This requires a lot of in-depth discussion about the subtleties of how things work... Anyway we seem to have cleared up a fair amount of that, so I would see the definition of this API as my primary interest David From jjl at pobox.com Mon Oct 27 09:45:16 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 09:45:56 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: Message-ID: On Sun, 26 Oct 2003, Ian Bicking wrote: > On Saturday, October 25, 2003, at 07:18 PM, John J Lee wrote: [...]
> > What is the current state of thread-safety in the Python standard > > library client-side web code (ie. httplib, urllib, urllib2)? > > As far as I know they are threadsafe. I suppose I should ask on python-dev if there's a policy / tradition here. [...] > urlopen_lock = threading.Lock() > def urlopen(url, data=None): [...] OK, thanks, that's basically as my vague understanding had it, but I had the impression that there were all kinds of flavours of thread-safety, guaranteeing various subtly different things? I guess I've got some reading to do... Some thinking out loud in case anybody cares to help clear up my current confusion: Hmm, urllib2 doesn't do what your example does, but I suppose OpenerDirectors don't currently have any state that could get lost in a race condition in that particular case. That would change with cookie handling. Am I going to have a hard time spotting all the places where I need locks? I can't see any other place where I'd need locks other than in CookieJar. I suppose I need to lock all access to all CookieJar methods, so that neither reading or writing state can happen whenever CookieJar state is changing? I suppose I'd also need to just label the .cookies attribute as non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it -- yuck). Can I justify saying that some of this is the application's problem? For example, perhaps the .filename and attribute of CookieJar could mess things up if altered by one thread while another thread was reading it in order to open a file? Is it the application's own stupid fault if it fails to lock access to that attribute in cases where that might happen, or is it CookieJar's problem? John From jjl at pobox.com Mon Oct 27 10:00:20 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 10:01:26 2003 Subject: [Web-SIG] So what's missing? 
In-Reply-To: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> References: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> Message-ID: On Sun, 26 Oct 2003, Ian Bicking wrote: > On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: [...] > Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into > HTTPHandler. You could still override it, and HTTPBasicAuthHandler > would still override it (and somewhat differently, because > HTTPHandler.http_error_401 should handle both basic and digest auth). > It's a pretty small change, really. So is the benefit. It's

    a = HTTPBasicAuthHandler()
    a.add_password(user="joe", password="joe")
    o = build_opener(a)

vs.

    o = build_opener(HTTPHandler(user="joe", password="joe"))

(assuming defaults for realm and uri -- BTW, there seems to be an HTTPPasswordMgrWithDefaultRealm already, which I guess is some way to what you want) If we're still using build_opener, and HTTPBasicAuthHandler were to override HTTPHandler, it would have to be derived from it. Not that a build_opener work-alike couldn't be devised, of course. [...] > > I'm still waiting for that example. > > I thought I gave examples: documentation, proliferation of classes, > non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal to > authentication). Lack of documentation doesn't justify changes to the code. There is not any harmful proliferation of classes, I think: the function of the handlers is pretty obvious in most cases (though obviously the docs could be better). I don't recognize the orthogonality problem you're referring to. [...]

> urlopen('http://whatever.com',
>         username='bob',
>         password='secret',
>         postFields={...},
>         postFiles={'image': ('test.jpg', '... image body ...')},
>         addHeaders={'User-Agent': 'superbot 3000'})

[...] > write than any OO-based system. I'm concerned about the external ease > of use, not the internal conceptual integrity.
OK, maybe I'm overconcerned about this layer -- if it's a simple convenience thing like this, fine (as long as it actually is useful and simple, of course). My biggest concern was that you seemed to be advocating a new UserAgent class, which would presumably more-or-less duplicate OpenerDirector (you probably want to skip to the end of this post at this point, because I think you may have missed a crucial point about that class). OpenerDirector is not such a great name, actually: maybe UserAgent or URLOpener would have been better... > >> authentication information (and it doesn't obey browser URL > >> conventions, like http://user:password@domain/). > > > > What is that convention? Is it standardised in an RFC? > > It's a URL convention that's been around a very long time, I don't know > if it is in an RFC. > > > I see > > ProxyHandler knows about that syntax. Obviously it's not an intrinsic > > limitation of the handler system. > > I don't really know how a handler is chosen -- can it figure out > whether it should use HTTPHandler, HTTPBasicAuthHandler, or > HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest > can't be determined until you try to fetch the object. The user and password here are for the proxy, not the server (there's some code duplication here actually, but that's just a bug). Dunno if that's standard use of that syntax. [...] > > Mind you, if your idea can do the same job as my RFE, then it should > > certainly be considered alongside that. > > Hmm... I just looked at the RFE now, so I'm still not sure what it > would mean to this. Sorry, I don't understand 'what it would mean to this'. What's 'this'? > >> Yet none of these features > >> would be all that difficult to add via urlopen or perhaps other simple > >> functions, (instead of via classes). 
I don't think there's any need >> for classes in the external API -- fetching URLs is about doing >> things, >> not representing things, and functions are easier to understand for >> doing. > > Details? The only example you've given so far involved a UserAgent > class. Details about what? You're asking for details and examples, but I've provided some already and I don't know what you're looking for. You provided some examples of features you think would require some kind of layer on top of urllib2. I thought you were originally suggesting a new UserAgent class or similar (that was you, wasn't it?). I don't think that's necessary. But in the post I'm replying to here, you gave an example of adding args to urlopen. I do agree that something like that could be useful. I think the docs should be changed here to make it clear that urlopen is just a convenience function that uses a global OpenerDirector. [...] > >> I think fetching and caching are two separate things. The caching > >> requires a context. The fetching doesn't. I think fetching things > > > > The context is provided by the handler. > > But we're fetching URLs, not handlers. The URL is context-less, > intrinsically. The handler isn't context-less, but that's part of what > I don't like about urllib2's handler-oriented perspective. I don't understand what you just said, but I think we're agreed something that doesn't require calling build_opener or OpenerDirector.add_handler could be convenient. > > [...] > >> I also don't see how caching would fit very well into the handler > >> structure. Maybe there'd be a HTTPCachingHandler, and you'd > >> instantiate it with your caching policy? (where it stores files, how > >> many files, etc) Also a HTTPBasicAuthCachingHandler, > >> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on?
This > >> caching is orthogonal -- not just to things like authentication, but > > My assumption was that it wasn't orthogonal, since RFC 2616 seems to > have > rather a lot to say on the subject. > Well, if they aren't orthogonal, then they should all be implemented in > a single class. Yes. Off the top of my head, I'd say something like (taking note of your point below about needing to actually cache responses as well as return cached data!):

    class AbstractHTTPCacheHandler:
        def cached_open(self, request):
            # return cached response, or None if no cache hit
        def cache(self, response):
            # cache response if appropriate

    class HTTPCacheHandler(AbstractHTTPCacheHandler):
        http_open = cached_open
        http_response = cache

or, if you want a class that does both HTTP and HTTPS:

    class HTTPXCacheHandler(AbstractHTTPCacheHandler):
        https_open = http_open = cached_open
        https_response = http_response = cache

[...] > Why not have just one good HTTP handler class? Why would you want one when you can easily do whatever you want with a convenience function or two, and / or a class derived from OpenerDirector, or something that works like build_opener, etc.? Not so easy to go in the other direction, and separate out the various features of a big, all-singing all-dancing HTTP handler. That was a big part of the motivation for urllib2 in the first place: inflexibility of urllib. > Many parts of the caching mechanics aren't part of RFC 2616 -- > specifically persistence, metadata storage and querying, and cache > control. These aren't part of HTTP at all. I'll take your word for that, but I admit I don't see where that causes problems for urllib2. > > If it *is* (or part of it is) orthogonal, three options come to mind. > > Let's say you have a cache class. > > > > 1. All the normal handlers know about the cache class, but have caching > > off by default. > > > > 2. Write a CacheHandler with a default_open.
If there's a cache hit, > > return it, otherwise return None (let somebody else try to handle > > it). > > > > 3. Subclass (or replace without bothering to subclassing) > > OpenerDirector. > > I guess open is probably what you'd want to change, but I don't know > > about HTTP and other protocols' caching rules. > > > > I haven't thought it through so I certainly don't claim to know how > > any of > > these will turn out (though I'd guess 2. would do the job of any > > caching > > that's orthogonal to the various protocol schemes). If you want to > > justify a new layer, though, it's up to you to show caching *doesn't* > > fit > > urllib2 as-is. YAGNI. > > 1 seems like a lot of trouble. Doesn't appeal to me either. > 2 won't work, since CacheHandler can't > return None and let someone else do the work, because it has to know > about what the result is so that it can cache the result. At last, a real problem! Actually, I think this is a problem already solved by my 'processors' idea, though perhaps not quite in its current form -- that should be easy to fix, though (ATM, IIRC, they're separate from handlers: you can't have an object that is both a handler and a processor -- and they don't currently have default_request and default_response methods, either). > It would > have to be 3, since it's really about intercepting handler calls. I > would imagine that it should wrap OpenerDirector, and perhaps subclass > it as well. Then protocols can be added to the caching and non-caching > directors at the same time. > > But it seems like there can be only one OpenDirector... that messes Nope. You can have as many as you like, with as many different implementations as you like. There is only the inconvenience of having to cut-n-paste build_opener (certainly build_opener isn't ideal as it is, but I guess people agree with me that that's a pretty small issue, since nobody has bothered to finish OpenerFactory). > things up. 
Multiple caches with different policies should be possible. > Which leads us back to a separate class that handles caching. > > >> even to HTTP (to some degree). The handler structure doesn't allow > >> orthogonal features. Except through mixins, but don't get me started > >> on mixins... > > > > I don't think that's true -- see above. > > > > Again, my 'processors' patch is relevant here (see that RFE). But no > > point in re-iterating here the long discussion I posted on the SF bug > > tracker. > > I missed that when you posted it. That might handle some of these > features. It seems a little too global to me. For instance, how would > you handle two distinct user agents with respect to the referer header? Two OpenerDirectors! new_opener = build_opener() new_opener.addheaders = [("User-agent", "Mozilla/5.0")] old_opener = build_opener() old_opener.addheaders = [("User-agent", "Mozilla/4.0")] new_opener.open("http://www.a.com/") old_opener.open("http://www.b.com/") > Seems like it would also make sense as a OpenerDirectory > subclass/wrapper. IIRC, there are issues with redirection that prevent that. > At least portions of it are similar to doing caching > (like cookies and referers), which is to say a request that is made in > a specific context. One example of an application that would require > separate contexts would be when testing concurrency in a web > application -- you want to simulate multiple users logging in and > performing actions concurrently. You can't do this if the context is > stored globally. Perhaps this is all you're missing? Nothing is global until you use install_opener. 
    o = build_opener()   # build OpenerDirector
    o.open(url)          # nothing global here, urlopen doesn't know about our opener
    install_opener(o)    # install OpenerDirector globally, for use by urlopen
    urlopen(url)

John From amk at amk.ca Mon Oct 27 10:07:09 2003 From: amk at amk.ca (amk@amk.ca) Date: Mon Oct 27 10:07:14 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: Message-ID: <20031027150709.GA29045@rogue.amk.ca> On Mon, Oct 27, 2003 at 02:45:16PM +0000, John J Lee wrote: > I suppose I should ask on python-dev if there's a policy / tradition here. The rough tradition would be: Thread-safety is good, and library modules shouldn't be non-threadsafe unless there's a very good reason. > changing? I suppose I'd also need to just label the .cookies attribute as > non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it > -- yuck). Assuming .cookies is a Python dictionary (I haven't looked at the CookieJar code), there's no locking needed. Locking is necessary when a data structure is temporarily inconsistent, or some invariant is temporarily broken. For example, let's say you had two dictionaries, .cookies which maps name -> Cookie object, and .durations which maps name -> an integer giving the duration of the cookie, and it's stated that every entry in .cookies always has a corresponding entry in .durations. In this case you need locking, because when you add an entry like this:

    self.cookies[name] = Cookie()  # danger point
    self.durations[name] = value

If a thread switch occurs at the danger point, another thread might loop over cookies.items(), see the missing duration, and die with a KeyError, so you need to have a lock around the two statements, and make read accesses use the lock. (You could also set the value in .durations first and avoid locking, but that's not possible in general.)
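In code, that locking discipline looks roughly like this (hypothetical class and method names, not actual CookieJar code; acquire/release is used rather than a with-statement, matching the Python of the day):

```python
import threading

class CookieStore:
    # Hypothetical sketch: .cookies and .durations must stay in step,
    # so writes and compound reads all take the same lock.
    def __init__(self):
        self._lock = threading.Lock()
        self.cookies = {}
        self.durations = {}

    def set(self, name, cookie, duration):
        self._lock.acquire()
        try:
            self.cookies[name] = cookie     # danger point without the lock
            self.durations[name] = duration
        finally:
            self._lock.release()

    def items_with_durations(self):
        # A compound read: looks at both dictionaries, so it must hold
        # the lock to see a consistent snapshot.
        self._lock.acquire()
        try:
            return [(n, c, self.durations[n])
                    for n, c in self.cookies.items()]
        finally:
            self._lock.release()

store = CookieStore()
store.set("session", "abc", 3600)
```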
But if you're assigning to a single attribute (self.filename = 'foo'), there's no point in time where the attribute is inconsistent, a mix of the old and new names; instead it's first the old value, and then it's set to 'foo'. So no lock is needed. --amk From ianb at colorstudy.com Mon Oct 27 13:47:49 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Oct 27 13:48:01 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: Message-ID: <0FD31AC9-08AE-11D8-A3EF-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 08:45 AM, John J Lee wrote: > [...] >> urlopen_lock = threading.Lock() >> def urlopen(url, data=None): > [...] > > OK, thanks, that's basically as my vague understanding had it, but I > had > the impression that there were all kinds of flavours of thread-safety, > guaranteeing various subtly different things? I guess I've got some > reading to do... Different parts of the system may be threadsafe, while others are not. For instance DB-API has threadsafety "levels", which is just a way of indicating which parts of the system are threadsafe, e.g., level 0 means nothing is threadsafe, level 1 means connections aren't threadsafe so you have to use one connection for each thread, and higher levels mean that objects deeper in the system become threadsafe. The analog of level 0 is bad, because you have to serialize all operations for the entire process. Level 1 isn't so bad (it's what most DB-API drivers have), it just means you have to create a new handler/connection/whatever object for each thread (but you have to be very explicit about that requirement). Or if object creation is expensive you have to do pooling, which is an incentive to make object creation cheap. > Some thinking out loud in case anybody cares to help clear up my > current > confusion: > > Hmm, urllib2 doesn't do what your example does, but I suppose > OpenerDirectors don't currently have any state that could get lost in a > race condition in that particular case. 
That would change with cookie > handling. I'm not sure about urllib2 in particular, but anything you initialize at the module level doesn't have to be protected. So in ClientCookie if you didn't lazily create the opener, it wouldn't be a problem. Or, if it's no big deal if you recreate the object twice then it's not a problem -- just unnecessarily recreating an object because of a very specific race condition isn't a problem. But if that meant that one of the objects created got lost, but maybe someone would still have a reference to that object (so it wasn't *completely* lost), then that would be a problem (and probably a very hard to debug problem if you encounter it). > Am I going to have a hard time spotting all the places where I need > locks? > I can't see any other place where I'd need locks other than in > CookieJar. > I suppose I need to lock all access to all CookieJar methods, so that > neither reading or writing state can happen whenever CookieJar state is > changing? I suppose I'd also need to just label the .cookies > attribute as > non-threadsafe (or get rid of it, or add a __getattr__ to allow > locking it > -- yuck). Can I justify saying that some of this is the application's > problem? For example, perhaps the .filename and attribute of CookieJar > could mess things up if altered by one thread while another thread was > reading it in order to open a file? Is it the application's own stupid > fault if it fails to lock access to that attribute in cases where that > might happen, or is it CookieJar's problem? You can't be sure of what concurrency expectations the application has. But in general reads don't have to be protected, unless someone is reading multiple things and expecting consistency between those reads. If it's a problem that you read value A, then someone changes the related value B in another thread, then you read B and it doesn't fit with A, then there's a threading issue for a read. 
Andrew pointed out a possible example of this with cookies and expiration. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Mon Oct 27 14:26:48 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Oct 27 14:26:59 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3F9CCD80.10502@sjsoft.com>; from davidf@sjsoft.com on Mon, Oct 27, 2003 at 09:47:12AM +0200 References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> Message-ID: <20031027112648.A27607@lyra.org> On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > Gregory (Grisha) Trubetskoy wrote: > > >For what it's worth, I never liked the request/response separation either. > >I like a single object from which you can read() and to which you can > >write(), just like a file. Imagine if for file IO you had to have an > >object to read and another one to write? Woah. Nice analogy. Thanks. > >(I would agree that perhaps "request" is a misnomer, but I can't think of > >anything better) > > Connection? I think someone suggested "Transaction" for this, but it > sounds out of place here... Nope. A number of request/response pairs occur on a given connection. The two are rather independent concepts. That was one of the basic tenets to my redesign of httplib. The old HTTP(S) classes are individual requests, which sucks for performance. With the new HTTP(S)Connection, you can open a connection, and then issue multiple requests over it. The name for the thing can be one of two things, I believe, depending on where you focus: - focus on the transaction itself - focus on the thing handling the transaction Per my original note here, SubWiki tends towards the latter. Each incoming request instantiates a Handler which deals with both reading/writing at a basic level (although there are still external entities which treat the Handler instance like in the first focus type). 
"Transaction" does sound out of place since that has connotations of a database transaction. I don't have any better suggestions (as I've never had to ponder a name for it since I didn't choose that focus :-) In the first model, the transaction is a passive entity, dealt with by some other code which does the processing. In the second model, the transaction and processing are bundled into the same object -- this is where you'd instantiate some thing and call a "run" method on it, which Does The Right Thing. I tend to disfavor that model because the conflation of data and request processing gets to be very cumbersome and tangled. Instantiating objects, custom to the request (type), to do the processing is all well and fine, but pass along a (relatively) passive data object to it (IMO). Cheers, -g -- Greg Stein, http://www.lyra.org/ From jjl at pobox.com Mon Oct 27 16:47:31 2003 From: jjl at pobox.com (John J Lee) Date: Mon Oct 27 16:48:20 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031027112648.A27607@lyra.org> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> <20031027112648.A27607@lyra.org> Message-ID: On Mon, 27 Oct 2003, Greg Stein wrote: > On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: [...] > > Connection? I think someone suggested "Transaction" for this, but it > > sounds out of place here... > > Nope. A number of request/response pairs occur on a given connection. The [...] > "Transaction" does sound out of place since that has connotations of a > database transaction. I don't have any better suggestions (as I've never [...] Exchange? John From neel at mediapulse.com Mon Oct 27 17:06:29 2003 From: neel at mediapulse.com (Michael C. 
Neel) Date: Mon Oct 27 17:06:33 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: > On Mon, 27 Oct 2003, Greg Stein wrote: > > On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > [...] > > > Connection? I think someone suggested "Transaction" for > this, but it > > > sounds out of place here... > > > > Nope. A number of request/response pairs occur on a given > connection. The > [...] > > "Transaction" does sound out of place since that has > connotations of a > > database transaction. I don't have any better suggestions > (as I've never > [...] > > Exchange? > Yea, that word isn't loaded with connotations =) (anyone whose office is on MS Exchange for email knows what I mean) mike From janssen at parc.com Mon Oct 27 17:57:30 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Oct 27 17:57:50 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Your message of "Sun, 26 Oct 2003 21:47:41 PST." <20031026214741.B24764@lyra.org> Message-ID: <03Oct27.145733pst."58611"@synergy1.parc.xerox.com> > Actually, it is called the "method" rather than "command". See section 9 > of RFC 2616. Sure. I was slipping into a Medusa-ism. Bill From janssen at parc.com Mon Oct 27 18:21:07 2003 From: janssen at parc.com (Bill Janssen) Date: Mon Oct 27 18:21:30 2003 Subject: [Web-SIG] A list is available (http://www.parc.com/janssen/web-sig/needed.html) Message-ID: <03Oct27.152114pst."58611"@synergy1.parc.xerox.com> I'll try to act as a scribe and gather various individual suggestions together. Please feel free to send mail to correct any malscription you spot. http://www.parc.com/janssen/web-sig/needed.html Bill From jjl at pobox.com Tue Oct 28 05:31:05 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 05:31:13 2003 Subject: [Web-SIG] So what's missing? In-Reply-To: References: <4C6956A2-0805-11D8-A3EF-000393C2D67E@colorstudy.com> Message-ID: On Mon, 27 Oct 2003, John J Lee wrote: [...]
> class AbstractHTTPCacheHandler:
>     def cached_open(self, request):
>         # return cached response, or None if no cache hit
>     def cache(self, response):
>         # cache response if appropriate
[...]

I should have said:

    def cache(self, request, response):

John From jjl at pobox.com Tue Oct 28 05:35:33 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 05:35:40 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: <20031027150709.GA29045@rogue.amk.ca> References: <20031027150709.GA29045@rogue.amk.ca> Message-ID: On Mon, 27 Oct 2003 amk@amk.ca wrote: > On Mon, Oct 27, 2003 at 02:45:16PM +0000, John J Lee wrote: > > I suppose I should ask on python-dev if there's a policy / tradition here. > > The rough tradition would be: Thread-safety is good, and library modules > shouldn't be non-threadsafe unless there's a very good reason. Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe (except for problems noted in the source: FTP connection caching in urllib2, FTP content caching in urllib)? > > changing? I suppose I'd also need to just label the .cookies attribute as > > non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it > > -- yuck). > > Assuming .cookies is a Python dictionary (I haven't looked at the CookieJar > code), there's no locking needed. Locking is necessary when a data > structure is temporarily inconsistent, or some invariant is temporarily > broken. Yes, I realise that. .cookies is a nested dict (currently documented as publicly readable, though FWLIW will probably have to cease to be soon, for non-thread related reasons):

    self.cookies[domain][path][name]

So my set_cookie method certainly needs locking, because there are tests like this:

    c = self.cookies
    if not c.has_key(cookie.domain):
        c[cookie.domain] = {}

I guess what I was really worrying about, though (without fully realizing it), was higher-level integrity issues over and above mere thread-safety.
For example, if one thread is iterating over cookies and reading their values, and halfway through, another thread calls extract_cookies to extract the cookies from an HTTP response, causing some cookies to be added and/or removed, that might cause trouble, but isn't a thread-safety issue (and is the application's problem, not mine). I guess the methods I have for loading / saving to a file also fall into this category, but I'm still a little confused. Since the relevant level of granularity is the bytecode instruction (right?), am I right in assuming you may have to start thinking about what your code looks like in bytecode form? I guess you play with the compiler module until you get to know which operations are single bytecode instructions and which are not? [...] > But if you're assigning to a single attribute (self.filename = 'foo'), > there's no point in time where the attribute is inconsistent, a mix of the > old and new names; instead it's first the old value, and then it's set to > 'foo'. So no lock is needed. OK. I wasn't sure whether that was a single bytecode or not, but I suppose that makes sense given Python's semantics. I saw masses of 'synchronize's on strings in a Java implementation of cookie handling (jCookie), and I'm far from sure what they're all there for... John From amk at amk.ca Tue Oct 28 07:46:46 2003 From: amk at amk.ca (amk@amk.ca) Date: Tue Oct 28 07:46:51 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: References: <20031027150709.GA29045@rogue.amk.ca> Message-ID: <20031028124646.GB1095@rogue.amk.ca> On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: > Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? No idea; reading the code would be needed to figure that out. > So my set_cookie method certainly needs locking, because there are tests > like this: Correct; that case would need locking. 
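A rough sketch of what that locking might look like. This is not the real ClientCookie code: the class below is reduced to just the nested dict, the `set_cookie` signature is simplified to plain strings for illustration, and the original's `has_key` test is spelled with the modern `in` operator:

```python
import threading

class CookieJar:
    """Illustrative fragment only, not the real ClientCookie class.
    Cookies live in a nested mapping: cookies[domain][path][name]."""

    def __init__(self):
        self._lock = threading.RLock()
        self.cookies = {}

    def set_cookie(self, domain, path, name, value):
        # The check-then-insert sequence is not atomic, so the whole
        # update happens while holding the lock; otherwise two threads
        # could both see a missing domain key and one would clobber
        # the dict the other just created.
        with self._lock:
            c = self.cookies
            if domain not in c:
                c[domain] = {}
            if path not in c[domain]:
                c[domain][path] = {}
            c[domain][path][name] = value
```

An RLock (rather than a plain Lock) lets other locked methods call `set_cookie` without deadlocking on their own lock.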
--amk From davidf at sjsoft.com Tue Oct 28 07:52:24 2003 From: davidf at sjsoft.com (David Fraser) Date: Tue Oct 28 07:52:52 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031027112648.A27607@lyra.org> References: <03Oct24.153625pdt."58611"@synergy1.parc.xerox.com> <20031024192925.R71890@onyx.ispol.com> <3F9CCD80.10502@sjsoft.com> <20031027112648.A27607@lyra.org> Message-ID: <3F9E6688.9030805@sjsoft.com> Greg Stein wrote: >On Mon, Oct 27, 2003 at 09:47:12AM +0200, David Fraser wrote: > > >>Gregory (Grisha) Trubetskoy wrote: >> >> >> >>>For what it's worth, I never liked the request/response separation either. >>>I like a single object from which you can read() and to which you can >>>write(), just like a file. Imagine if for file IO you had to have an >>>object to read and another one to write? >>> >>> > >Woah. Nice analogy. Thanks. > > > >>>(I would agree that perhaps "request" is a misnomer, but I can't think of >>>anything better) >>> >>> >>Connection? I think someone suggested "Transaction" for this, but it >>sounds out of place here... >> >> > >Nope. A number of request/response pairs occur on a given connection. The >two are rather independent concepts. That was one of the basic tenets to >my redesign of httplib. The old HTTP(S) classes are individual requests, >which sucks for performance. With the new HTTP(S)Connection, you can open >a connection, and then issue multiple requests over it. > > OK. Now in this case, you clearly can't handle more than one request/response on a single connection at a time. So would it be feasible (I'm not suggesting it's necessarily a good idea) to use a Connection object, which changes state to reflect the request-responses? Or should a Connection object create separate request-response objects for each event? The reason I'm asking is, surely the response write method will simply flow through to the underlying Connection.
Though this may be an implementation detail, it may say something about how the API should work. >The name for the thing can be one of two things, I believe, depending on >where you focus: > > - focus on the transaction itself > - focus on the thing handling the transaction > >Per my original note here, SubWiki tends towards the latter. Each incoming >request instantiates a Handler which deals with both reading/writing at a >basic level (although there are still external entities which treat the >Handler instance like in the first focus type). > >"Transaction" does sound out of place since that has connotations of a >database transaction. I don't have any better suggestions (as I've never >had to ponder a name for it since I didn't choose that focus :-) > >In the first model, the transaction is a passive entity, dealt with by >some other code which does the processing. In the second model, the >transaction and processing are bundled into the same object -- this is >where you'd instantiate some thing and call a "run" method on it, which >Does The Right Thing. I tend to disfavor that model because the conflation >of data and request processing gets to be very cumbersome and tangled. >Instantiating objects, custom to the request (type), to do the processing >is all well and fine, but pass along a (relatively) passive data object to >it (IMO). > >Cheers, >-g > > > Looking at it from an API point of view, the difference is between creating a request-response object structure which any of the various implementors can handle in their code, and creating a handler object structure which the various implementors have to conform to, either by changing the existing implementation or by writing wrappers around their existing code. I think the request-response object idea is clearly simpler from this point of view... 
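The two designs being compared can be caricatured in a few lines of Python (all names below are hypothetical illustrations, not an actual API from the thread):

```python
# Model 1: a passive transaction object, processed by external code.
class Transaction:
    def __init__(self, method, path):
        self.method = method
        self.path = path
        self.out = []          # accumulated response body

    def write(self, data):
        self.out.append(data)

def process(txn):
    # External code does the work, writing through the passive object.
    txn.write("handled %s %s" % (txn.method, txn.path))

# Model 2: data and processing bundled into one Handler with a run().
class Handler:
    def __init__(self, method, path):
        self.method = method
        self.path = path
        self.out = []

    def write(self, data):
        self.out.append(data)

    def run(self):
        # The object processes itself -- Does The Right Thing.
        self.write("handled %s %s" % (self.method, self.path))
```

The passive-object model lets any number of independent processing functions operate on the same conforming object, which is the implementation-neutrality David's point is driving at.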
David From jjl at pobox.com Tue Oct 28 12:25:54 2003 From: jjl at pobox.com (John J Lee) Date: Tue Oct 28 12:27:19 2003 Subject: [Web-SIG] Threading and client-side support In-Reply-To: <20031028124646.GB1095@rogue.amk.ca> References: <20031027150709.GA29045@rogue.amk.ca> <20031028124646.GB1095@rogue.amk.ca> Message-ID: [background for python-dev-ers: In the process of making my client-side cookie module a suitable candidate for inclusion in the standard library, I'm trying to make it thread-safe] On Tue, 28 Oct 2003 amk@amk.ca wrote: > On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: > > Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? > > No idea; reading the code would be needed to figure that out. That might not be helpful if the person reading it (me) has zero threading experience ;-) I certainly plan to gain that experience, but surely *somebody* already knows whether they're thread-safe? I presume they are, broadly, since a couple of violations of thread safety are commented in urllib2 and urllib. Right? John From jbauer at rubic.com Tue Oct 28 15:58:54 2003 From: jbauer at rubic.com (Jeff Bauer) Date: Tue Oct 28 16:00:39 2003 Subject: [Web-SIG] A list is available Message-ID: <3F9ED88E.577AEB9@rubic.com> I haven't had a need for this (well, not since old-Zope PCGI days), but a way to monitor HTTP/S traffic with stdlib tools might represent a valid use case, if not an actual web component. I was thinking of something based around a simple asynchronous proxy server. Anyway, I thought I'd bring it up since Bill Janssen is compiling a list. 
Jeff Bauer Rubicon Research From moof at metamoof.net Tue Oct 28 17:49:45 2003 From: moof at metamoof.net (Moof) Date: Tue Oct 28 17:53:01 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> References: <20031024132028.C15765@lyra.org> Message-ID: <3F9EF289.8040500@metamoof.net> Greg Stein wrote: > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? You > will never see one without the other, and they are intrinsically tied to > each other. So why separate them? An example where a separate response object is useful, though this could well be due to lazy programming, or could be circumvented other ways: I'm currently writing an app in WebKit, and amongst other things, I find myself writing parts of the page, followed by doing some calculations, followed by writing other parts of the page. Alternatively, I find myself validating user input and doing calculations, and then writing the whole page as a result. Either way, if there's an error that occurs somewhere along the line, due to faulty input, I tell the page to forward the request to another servlet that can handle the errors (normally right back to the servlet that generated the form that inputted the faulty data). It's a bit of a poor man's exception, because Page.forward() doesn't *actually* break out of the current context, so I need to break out manually, either with a break statement or more normally by continuing til an uncaught exception is thrown. The forward directive will be taken into account as soon as the page ends, and will just delete the current response object and call the forwarded servlet with a new response object which will buffer and eventually send out the data that the servlet eventually generates. Then again, it could just be lazy programming on my part. 
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From moof at metamoof.net Tue Oct 28 17:51:06 2003 From: moof at metamoof.net (Moof) Date: Tue Oct 28 17:54:20 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> References: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> Message-ID: <3F9EF2DA.3060604@metamoof.net> Ian Bicking wrote: > In reference to the rest of the discussion -- I think it's enough to say that some people want to distinguish (sometimes) between these two types of variables. Simon is not the only one. It should be an option, because it's not hard to do. We're not telling people how to write their applications, we're giving them the tools to write their applications as they choose, and this is a valid way to write an application. +1 for the reasons stated. It's good to be able to distinguish. One appeasement approach would be to do a webkit-like thing. Currently in webkit you can choose to get stuff out of the submitted data (GET and POST are scrunched together) or out of the cookies, or you can just ask for request.value() (which is aliased also to request.__getattr__) which will look in both places and returns the first thing it comes to. So how about a request.postvalues dict, a request.getvalues dict, and a request.values dict (or pseudo-dict) which will return the value from whichever. The main downside I can see with this is a long ensuing argument about whether GET should take precedence over POST or vice-versa.
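The request.values idea can be sketched as a small pseudo-dict. The names come from Moof's proposal but the class itself is hypothetical, and the POST-over-GET precedence shown is just one of the two debated policies (swap the order for the other):

```python
class RequestValues:
    """Hypothetical read-only view over GET and POST data that
    consults POST first, then falls back to GET."""

    def __init__(self, getvalues, postvalues):
        self.getvalues = getvalues    # parsed query-string fields
        self.postvalues = postvalues  # parsed form-body fields

    def __getitem__(self, name):
        # Precedence policy lives in exactly one place.
        if name in self.postvalues:
            return self.postvalues[name]
        return self.getvalues[name]

    def get(self, name, default=None):
        try:
            return self[name]
        except KeyError:
            return default
```

A request object would then expose the three views side by side, so code that cares about the source uses `getvalues`/`postvalues` and code that doesn't uses the combined view.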
Moof -- Giles Antonio Radford, a.k.a Moof Sympathy, eupathy, and, currently, apathy coming to you at: From gtalvola at nameconnector.com Tue Oct 28 18:08:52 2003 From: gtalvola at nameconnector.com (Geoffrey Talvola) Date: Tue Oct 28 18:09:03 2003 Subject: [Web-SIG] [server-side] request/response objects Message-ID: <61957B071FF421419E567A28A45C7FE59AF763@mailbox.nameconnector.com> Moof wrote: > An example where a separate response object is useful, though > this could > well be due to lazy programming, or could be circumvented other ways: > > I'm currently writing an app in WebKit, and amongst other > things, I find > myself writing parts of the page, followed by doing some calculations, > followed by writing other parts of the page. Alternatively, I find > myself validating user input and doing calculations, and then writing > the whole page as a result. Either way, if there's an error > that occurs > somewhere along the line, due to faulty input, I tell the page to > forward the request to another servlet that can handle the errors > (normally right back to the servlet that generated the form that > inputted the faulty data). > > It's a bit of a poor man's exception, because Page.forward() doesn't > *actually* break out of the current context, so I need to break out > manually, either with a break statement or more normally by continuing > til an uncaught exception is thrown. Actually, in Webware CVS Page.forward() _does_ break out of the current context by raising an EndResponse exception that gets caught in the framework. You are probably using a released version of Webware which doesn't work this way, but instead substitutes a "dummy" response object for the real response object to swallow up any output from the original servlet. I agree with your point, which I take to be this: it's nice to be able to throw away any response that may have accumulated so far and re-process the request. And that seems to argue for separate request and response objects. 
> > The forward directive will be taken into account as soon as the page > ends, and will just delete the current response object and call the > forwarded servlet with a new response object which will buffer and > eventually send out the data that the servlet eventually generates. I don't think this is how WebKit ever worked. I'm pretty sure that both in current Webware CVS and in previous releases, the forwarded-to servlet processes the request immediately, not when the page ends. > > Then again, it could just be lazy programming on my part. - Geoff From cs1spw at bath.ac.uk Tue Oct 28 18:24:37 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Tue Oct 28 18:24:44 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9EF2DA.3060604@metamoof.net> References: <8EEBD92C-0697-11D8-93B2-000393C2D67E@colorstudy.com> <3F9EF2DA.3060604@metamoof.net> Message-ID: <3F9EFAB5.2090800@bath.ac.uk> Moof wrote: > One appeasement approach would be to do a webkit-like thing. Currently in > webkit you can choose to get stuff out of the submitted data (GET and > POST are scrunched together) or out of the cookies, or you can just ask > for request.value() (which is aliased also to request.__getattr__) which > will look in both places and returns the first thing it comes to. > > So how about a request.postvalues dict, a request.getvalues dict, and a > request.values dict (or pseudo-dict) which will return the value from > whichever. The main downside I can see with this is a long ensuing > argument about whether GET should take precedence over POST or vice-versa. I'm quite fond of request.GET and request.POST personally, but that's my PHP background speaking. I'm not sure that upper case dictionary names are particularly pythonic. request.getvalues and request.postvalues seem a bit verbose for my liking. Is there really a long ensuing argument about precedence of GET over POST?
I had always assumed that the standard way of tackling this was for POST data to over-write GET data since POST was the actual HTTP action used in a combined request. -- Simon Willison Web development weblog: http://simon.incutio.com/ From davidf at sjsoft.com Wed Oct 29 00:02:55 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 00:03:03 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3F9EF289.8040500@metamoof.net> References: <20031024132028.C15765@lyra.org> <3F9EF289.8040500@metamoof.net> Message-ID: <3F9F49FF.2000201@sjsoft.com> Moof wrote: > Greg Stein wrote: > > > When you stop and think about it: *every* request object will have a > > matching response object. Why have two objects if they come in > pairs? You > > will never see one without the other, and they are intrinsically > tied to > > each other. So why separate them? > > > An example where a separate response object is useful, though this > could well be due to lazy programming, or could be circumvented other > ways: > > I'm currently writing an app in WebKit, and amongst other things, I > find myself writing parts of the page, followed by doing some > calculations, followed by writing other parts of the page. > Alternatively, I find myself validating user input and doing > calculations, and then writing the whole page as a result. Either way, > if there's an error that occurs somewhere along the line, due to > faulty input, I tell the page to forward the request to another > servlet that can handle the errors (normally right back to the servlet > that generated the form that inputted the faulty data). > > It's a bit of a poor man's exception, because Page.forward() doesn't > *actually* break out of the current context, so I need to break out > manually, either with a break statement or more normally by continuing > til an uncaught exception is thrown. 
> > The forward directive will be taken into account as soon as the page > ends, and will just delete the current response object and call the > forwarded servlet with a new response object which will buffer and > eventually send out the data that the servlet eventually generates. > > Then again, it could just be lazy programming on my part. > > Moof There's no requirement that, just because the API defines a response object that is the same as the request object, you have to use that object to build up your response. The response side of the request object would mainly be used to *write* the response back to the client. Since once you have started writing, you can't throw it away, it seems to me your situation would be entirely the same (you would have your own "response" which you would only write back when you wanted to) David From aquarius-lists at kryogenix.org Wed Oct 29 01:21:14 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Wed Oct 29 01:19:25 2003 Subject: [Web-SIG] Form field dictionaries References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > Gregory Collins wrote: >>>I think this is adequately addressed in the FieldStorage starting with >>>Python 2.2 with getfirst() and getlist(): >> >> I agree, I think this is the appropriate solution; I'd rather see all >> the typechecking pushed down into the library function rather than >> being exposed to the programmer. If the argument I'm looking for >> doesn't make sense as a list then I wouldn't care if it was given >> twice; if I'm expecting something to be a list then I'd want it to be >> a list even if it were empty or singleton. > > The vast majority of data sent from forms comes in as simple name/value > pairs, which are crying out to be accessed from a dictionary.
This is my > problem with the current FieldStorage() class - it forces you to write > code like this: > > username = form.getfirst("username", "") > > When code like this is far more intuitive: > > username = form['username'] Would it be worth having form['fieldname'] default to doing a getfirst()? That way, if you're *expecting* a list, you can look for one by doing form.getlist("username") and if not you just get one entry (getfirst should possibly be getlast, but that's a different issue). This is a bit non-discoverable, though... sil -- "Computer games don't affect kids. I mean if Pacman had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music." -- Kristian Wilson, Nintendo From ianb at colorstudy.com Wed Oct 29 01:46:22 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 01:46:42 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Message-ID: <9C0B687E-09DB-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 12:21 AM, Stuart Langridge wrote: > Would it be worth having form['fieldname'] default to doing a > getfirst()? That way, if you're *expecting* a list, you can look for > one by doing form.getlist("username") and if not you just get one entry > (getfirst should possibly be getlast, but that's a different issue). > This is a bit non-discoverable, though... getfirst, getlast? Why ever would you choose one over the other? (Why ever would you choose either?) Explicit is better than implicit. In the face of ambiguity, refuse the temptation to guess. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 02:17:57 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:18:38 2003 Subject: [Web-SIG] So what's missing? 
In-Reply-To: Message-ID: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 09:00 AM, John J Lee wrote: > On Sun, 26 Oct 2003, Ian Bicking wrote: >> On Sunday, October 26, 2003, at 07:24 AM, John J Lee wrote: > [...] >> Essentially we'd just move HTTPBasicAuthHandler.http_error_401 into >> HTTPHandler. You could still override it, and HTTPBasicAuthHandler >> would still override it (and somewhat differently, because >> HTTPHandler.http_error_401 should handle both basic and digest auth). >> It's a pretty small change, really. > > So is the benefit. It's > > a = HTTPBasicAuthHandler() > a.add_password(user="joe", password="joe") > o = build_opener(a) > > vs. > > o = build_opener(HTTPHandler(user="joe", password="joe")) > > > (assuming defaults for realm and uri -- BTW, there seems to be an > HTTPPasswordMgrWithDefaultRealm already, which I guess is some way to > what > you want) Yes, I just recently noticed that too. Why it is implemented in a separate class I cannot fathom. > If we're still using build_opener, and HTTPBasicAuthHandler were to > override HTTPHandler, it would have to be derived from it. Not that a > build_opener work-alike couldn't be devised, of course. > > [...] >>> I'm still waiting for that example. >> >> I thought I gave examples: documentation, proliferation of classes, >> non-orthogonality of features (e.g., HTTPS vs. HTTP isn't orthogonal >> to >> authentication). > > Lack of documentation doesn't justify changes to the code. There is > not > any harmful proliferation of classes, I think: the function of the > handlers is pretty obvious in most cases (though obviously the docs > could > be better). I don't recognize the orthogonality problem you're > referring > to. I'm not as concerned with the internals, but rather the exposed interface. This isn't a concern purely about lack of documentation either, but about the thoroughness and conciseness of that documentation. 
A good interface lends itself to good documentation. I don't think this interface can result in good documentation -- it will either be incomplete, difficult to navigate, or verbose (or all), as a reflection of the way in which internal implementation is exposed. > [...] >> urlopen('http://whatever.com', >> username='bob', >> password='secret', >> postFields={...}, >> postFiles={'image': ('test.jpg', '... image body ...')}, >> addHeaders={'User-Agent': 'superbot 3000'}) > [...] >> write than any OO-based system. I'm concerned about the external ease >> of use, not the internal conceptual integrity. > > OK, maybe I'm overconcerned about this layer -- if it's a simple > convenience thing like this, fine (as long as it actually is useful > and simple, of course). > > My biggest concern was that you seemed to be advocating a new UserAgent > class, which would presumably more-or-less duplicate OpenerDirector > (you > probably want to skip to the end of this post at this point, because I > think you may have missed a crucial point about that class). > OpenerDirector is not such a great name, actually: maybe UserAgent or > URLOpener would have been better... > >>>> authentication information (and it doesn't obey browser URL >>>> conventions, like http://user:password@domain/). >>> >>> What is that convention? Is it standardised in an RFC? >> >> It's a URL convention that's been around a very long time, I don't >> know >> if it is in an RFC. >> >>> I see >>> ProxyHandler knows about that syntax. Obviously it's not an >>> intrinsic >>> limitation of the handler system. >> >> I don't really know how a handler is chosen -- can it figure out >> whether it should use HTTPHandler, HTTPBasicAuthHandler, or >> HTTPDigestAuthHandler just from this URL? Obviously basic vs. digest >> can't be determined until you try to fetch the object. > > The user and password here are for the proxy, not the server (there's > some > code duplication here actually, but that's just a bug). 
Dunno if > that's > standard use of that syntax. > > > [...] >>> Mind you, if your idea can do the same job as my RFE, then it should >>> certainly be considered alongside that. >> >> Hmm... I just looked at the RFE now, so I'm still not sure what it >> would mean to this. > > Sorry, I don't understand 'what it would mean to this'. What's 'this'? This discussion. >>>> Yet none of these features >>>> would be all that difficult to add via urlopen or perhaps other >>>> simple >>>> functions, (instead of via classes). I don't think there's any need >>>> for classes in the external API -- fetching URLs is about doing >>>> things, >>>> not representing things, and functions are easier to understand for >>>> doing. >>> >>> Details? The only example you've given so far involved a UserAgent >>> class. >> >> Details about what? You're asking for details and examples, but I've >> provided some already and I don't know what you're looking for. > > You provided some examples of features you think would require some > kind > of layer on top of urllib2. I thought you were originally suggesting a > new UserAgent class or similar (that was you, wasn't it?). I don't > think > that's necessary. In the context of stateful HTTP requests, yes, I still think some object along the lines of a UserAgent is the best interface. > But in the post I'm replying to here, you gave an example of adding > args > to urlopen. I do agree that something like that could be useful. I > think > the docs should be changed here to make it clear that urlopen is just a > convenience function that uses a global OpenerDirector. > > [...] >>>> I think fetching and caching are two separate things. The caching >>>> requires a context. The fetching doesn't. I think fetching things >>> >>> The context is provided by the handler. >> >> But we're fetching URLs, not handlers. The URL is context-less, >> intrinsically.
The handler isn't context-less, but that's part of >> what >> I don't like about urllib2's handler-oriented perspective. > I don't understand what you just said, but I think we're agreed > something > that doesn't require calling build_opener or OpenerDirector.add_handler > could be convenient. Okay, good. That my statement was nonsensical was part of my point, but that's probably not a helpful way to make a point ;) >>> [...] >>>> I also don't see how caching would fit very well into the handler >>>> structure. Maybe there'd be a HTTPCachingHandler, and you'd >>>> instantiate it with your caching policy? (where it stores files, how >>>> many files, etc) Also a HTTPBasicAuthCachingHandler, >>>> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on? This >>>> caching is orthogonal -- not just to things like authentication, but >>> >>> My assumption was that it wasn't orthogonal, since RFC 2616 seems to >>> have >>> rather a lot to say on the subject. >> >> Well, if they aren't orthogonal, then they should all be implemented >> in >> a single class. > > Yes. Off the top of my head, I'd say something like (taking note of > your > point below about needing to actually cache responses as well as return > cached data!):

> class AbstractHTTPCacheHandler:
>     def cached_open(self, request):
>         # return cached response, or None if no cache hit
>     def cache(self, response):
>         # cache response if appropriate
>
> class HTTPCacheHandler(AbstractHTTPCacheHandler):
>     http_open = cached_open
>     http_response = cache
>
> or, if you want a class that does both HTTP and HTTPS:
>
> class HTTPXCacheHandler(AbstractHTTPCacheHandler):
>     https_open = http_open = cached_open
>     https_response = http_response = cache

[...] >> Why not have just one good HTTP handler class? > > Why would you want one when you can easily do whatever you want with a > convenience function or two, and / or a class derived from > OpenerDirector, > or something that works like build_opener, etc.?
Not so easy to go in > the > other direction, and separate out the various features of a big, > all-singing all-dancing HTTP handler. That was a big part of the > motivation for urllib2 in the first place: inflexibility of urllib. Why would I want two pieces if I could have one that can do both their jobs? And why fold different ideas together into one notion of handler? HTTP and HTTPS are almost exactly the same. Basic and digest auth are almost exactly the same. Using a cache and not using a cache are almost exactly the same. All these details can be combined reliably in many ways, but the structure of handlers seems to get in the way. But maybe this comes down to a disagreement about coding aesthetics. I don't like inheritance, especially when it gets clever. But if that's just an implementation detail, then eh... I can live. It's when it gets exposed through the public interface (as it is in urllib2) that it bothers me. [...] >> 2 won't work, since CacheHandler can't >> return None and let someone else do the work, because it has to know >> about what the result is so that it can cache the result. > > At last, a real problem! Actually, I think this is a problem already > solved by my 'processors' idea, though perhaps not quite in its current > form -- that should be easy to fix, though (ATM, IIRC, they're separate > from handlers: you can't have an object that is both a handler and a > processor -- and they don't currently have default_request and > default_response methods, either). The processors really sound like wrappers to me. >> I missed that when you posted it. That might handle some of these >> features. It seems a little too global to me. For instance, how >> would >> you handle two distinct user agents with respect to the referer >> header? > > Two OpenerDirectors! 
>
> new_opener = build_opener()
> new_opener.addheaders = [("User-agent", "Mozilla/5.0")]
>
> old_opener = build_opener()
> old_opener.addheaders = [("User-agent", "Mozilla/4.0")]
>
> new_opener.open("http://www.a.com/")
> old_opener.open("http://www.b.com/")

Okay, I didn't realize that. That makes it much better, though the name OpenerDirector distracts. >> Seems like it would also make sense as an OpenerDirector >> subclass/wrapper. > > IIRC, there are issues with redirection that prevent that. How so? For instance, with referer, don't you essentially just want to do something like:

class RefererDirector(OpenerDirector):
    def __init__(self):
        OpenerDirector.__init__(self)
        self.last_url = ''
    def open(self, fullurl, data=None):
        if isinstance(fullurl, str):
            fullurl = Request(fullurl)
        if self.last_url:
            fullurl.add_header('Referer', self.last_url)
        result = OpenerDirector.open(self, fullurl, data=data)
        self.last_url = result.geturl()
        return result

This is essentially how a browser works, isn't it? Does a header get lost somewhere? If so, then that seems like a bug in the handler. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 01:51:20 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:18:41 2003 Subject: [Web-SIG] A list is available (http://www.parc.com/janssen/web-sig/needed.html) In-Reply-To: <03Oct27.152114pst."58611"@synergy1.parc.xerox.com> Message-ID: <4D8D751E-09DC-11D8-ABB3-000393C2D67E@colorstudy.com> On Monday, October 27, 2003, at 05:21 PM, Bill Janssen wrote: > I'll try to act as a scribe and gather various individual suggestions > together. Please feel free to send mail to correct any malscription > you spot. > > http://www.parc.com/janssen/web-sig/needed.html Should this list go on the Wiki?
A list was already started at http://www.python.org/cgi-bin/moinmoin/WebSIGTasks -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 02:32:23 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 02:32:30 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <20031024132028.C15765@lyra.org> Message-ID: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> On Friday, October 24, 2003, at 03:20 PM, Greg Stein wrote: > In the most recent incarnation of a webapp of mine (subwiki), I almost > went with a request/response object paradigm and even started a bit of > refactoring along those lines. However, I ended up throwing out that > dual-object concept. > > When you stop and think about it: *every* request object will have a > matching response object. Why have two objects if they come in pairs? > You > will never see one without the other, and they are intrinsically tied > to > each other. So why separate them? The biggest justification for me is: because that's what everyone does. SkunkWeb doesn't separate them, but I can't think of any others in Python. The request/response distinction is ubiquitous throughout web programming. I guess it's natural to people. But it doesn't even matter why: it is the way it is. Another justification is that the request is essentially static. It is created and complete, then it is processed. When the request is complete, the response has just barely begun existence. The request object could very well be immutable at this point. (Unfortunately that probably would make compatibility with previous code too difficult, but that's an aside) You can very reasonably pass around the request with the expectation that the response will not be touched, or even vice versa (though that is less common -- which is a bit backwards if you follow a convention that the response belongs to the request). The request and response aren't particularly interwoven either. 
Request cookies have nothing to do with response cookies (and any attempt to combine their semantics would be futile). Request variables follow arcane paths through all kinds of representations when you trace them back to their source. And then there's simply the naming issue: request and response are pretty clear names. Everyone knows what they are. Everyone can guess at their interface, and certainly can read their interface. There's no compelling alternative name for the combined object -- "handler" implies almost nothing, "transaction" implies the incorrect thing, "connection" implies a low-level interface... The difficulty of writing, say, request.response.write(something) vs. handler.write(something) doesn't seem like a big deal to me. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From jjl at pobox.com Wed Oct 29 06:56:59 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 06:57:17 2003 Subject: urllib2.UserAgent [was: Re: [Web-SIG] So what's missing?] In-Reply-To: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> References: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: OK, having slept on it, I just had a tiny epiphany. I haven't been listening to my own arguments. Since OpenerDirector is *already* a UserAgent-type thing, but not a very friendly one, we should just create a new OpenerDirector, and name it UserAgent. I don't see that as a wrapper, so my delicate sensibilities aren't offended by it ;-) So, I'm persuaded: sorry it took me so long...

Problems to be solved:

- awkward to dynamically change behaviour of user-agent -- you have to build an OpenerDirector every time you want to change things
- unhelpful separation by default of HTTP and HTTPS
- unhelpful separation by default of various server authentication schemes
- no ability to do partial fetches
- no ability to do HEAD and PUT

...any more?

The last two need changes in the rest of urllib2, of course.
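[A minimal sketch of the kind of friendlier UserAgent object John describes, addressing the first bullet (reconfiguring behaviour without rebuilding the opener). It is written against urllib.request, the current home of OpenerDirector (urllib2 at the time of this thread); the UserAgent name comes from the discussion, but set_user_agent and the chosen handler set are this sketch's assumptions, not an agreed API.]

```python
import urllib.request  # in 2003, OpenerDirector and its handlers lived in urllib2


class UserAgent(urllib.request.OpenerDirector):
    """Hypothetical friendlier OpenerDirector: one long-lived object
    whose behaviour can be changed on the fly, instead of building a
    fresh opener every time something changes."""

    def __init__(self):
        urllib.request.OpenerDirector.__init__(self)
        # Install a basic default handler set, much as build_opener() would.
        for handler_class in (urllib.request.HTTPHandler,
                              urllib.request.HTTPRedirectHandler,
                              urllib.request.UnknownHandler):
            self.add_handler(handler_class())

    def set_user_agent(self, agent_string):
        # Swap the User-Agent header in place, keeping any other headers.
        kept = [(k, v) for k, v in self.addheaders
                if k.lower() != "user-agent"]
        self.addheaders = [("User-agent", agent_string)] + kept
```

With something like this, the "two distinct user agents" example earlier in the thread becomes two UserAgent instances, or a single instance reconfigured between fetches.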
I'll have a look at some of the Perl & Java UserAgent-type classes for ideas, and probably write a class to base discussions on. On Wed, 29 Oct 2003, Ian Bicking wrote: [...about my processors patch to urllib2...] > The processors really sound like wrappers to me. [...] No, they work rather like handlers, and are definitely internal to urllib2. > >> Seems like it would also make sense as an OpenerDirector > >> subclass/wrapper. > > > > IIRC, there are issues with redirection that prevent that. > > How so? For instance, with referer, don't you essentially just want to > do something like: I've forgotten the details, but I'm pretty confident they're not very interesting :-) They're in my bug tracker item, I think. John From david at sundayta.com Wed Oct 29 08:21:55 2003 From: david at sundayta.com (david) Date: Wed Oct 29 08:22:00 2003 Subject: [Web-SIG] Request and Response objects Message-ID: <3F9FBEF3.5090307@sundayta.com> Hi, New to the list, but I have read the archive. There was a discussion about whether there should be a single object for both request and response. I would like to suggest that the best approach is the one pinched from the solution used by Turbine http://jakarta.apache.org/turbine/turbine-2.3/apidocs/org/apache/turbine/util/RunData.html This is a single object that contains the request and response as well as anything else useful to pass around the server. Would that be a way to move forward? By the way, I like names that include context for this, e.g. a RunContext that contains Request and Response. Just a small 2c. Dave -- David Warnock: http://davew.typepad.com/42 | Sundayta Ltd: http://www.sundayta.com iDocSys for Document Management. VisibleResults for Fundraising. Development and Hosting of Web Applications and Sites.
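[For concreteness, a minimal sketch of the Turbine-style container suggested above. Every name here -- RunContext, extras, and the toy Request/Response stand-ins -- is a hypothetical illustration for this discussion, not a proposed API.]

```python
class Request:
    """Illustrative stand-in: created complete by the server, then only read."""
    def __init__(self, fields, headers=None):
        self.fields = dict(fields)
        self.headers = dict(headers or {})


class Response:
    """Illustrative stand-in: starts empty, accumulates output."""
    def __init__(self):
        self.headers = {}
        self.body = []

    def write(self, text):
        self.body.append(text)


class RunContext:
    """One object passed around the server, a la Turbine's RunData:
    it carries the request/response pair plus anything else useful."""
    def __init__(self, request):
        self.request = request
        self.response = Response()
        self.extras = {}  # session, user, config, ...
```

Code that prefers the two-object view can still unpack it: `req, resp = ctx.request, ctx.response`, as Bill notes later in the thread.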
From grisha at modpython.org Wed Oct 29 09:26:08 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 09:26:43 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> Message-ID: <20031029092503.E78438@onyx.ispol.com> On Wed, 29 Oct 2003, Stuart Langridge wrote: > Would it be worth having form['fieldname'] default to doing a > getfirst()? +1 on this From grisha at modpython.org Wed Oct 29 09:29:34 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 09:29:37 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031029092503.E78438@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> Message-ID: <20031029092715.M78438@onyx.ispol.com> On Wed, 29 Oct 2003, Gregory (Grisha) Trubetskoy wrote: > > > On Wed, 29 Oct 2003, Stuart Langridge wrote: > > > Would it be worth having form['fieldname'] default to doing a > > getfirst()? > > +1 on this > Sorry I take it back - I misread it (not finished my coffee yet) - I am in favor of form['fieldname'] to act the same as getfirst() only if there is a single element, otherwise it should return a list. 
Grisha From barry at python.org Wed Oct 29 09:30:50 2003 From: barry at python.org (Barry Warsaw) Date: Wed Oct 29 09:30:55 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <20031029092715.M78438@onyx.ispol.com> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> <20031029092715.M78438@onyx.ispol.com> Message-ID: <1067437849.4918.12.camel@anthem> On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > I am in favor of form['fieldname'] to act the same as getfirst() only if > there is a single element, otherwise it should return a list. +1 -Barry From davidf at sjsoft.com Wed Oct 29 09:40:42 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 09:41:51 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067437849.4918.12.camel@anthem> References: <8C59525A-056C-11D8-AF55-000393C2D67E@colorstudy.com> <3F98DD29.30706@sjsoft.com> <3F994469.20304@bath.ac.uk> <20031024114945.M26153@onyx.ispol.com> <87he1yhaei.fsf@genghis.subrosa.ca> <3F995532.9040309@bath.ac.uk> <20031029092503.E78438@onyx.ispol.com> <20031029092715.M78438@onyx.ispol.com> <1067437849.4918.12.camel@anthem> Message-ID: <3F9FD16A.404@sjsoft.com> Barry Warsaw wrote: >On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > > > >>I am in favor of form['fieldname'] to act the same as getfirst() only if >>there is a single element, otherwise it should return a list. >> >> > >+1 > > > -2! (That's two factorial :-) I want form['fieldname'] to always return a single element. You should always know when you're wanting a list. Returning a list otherwise requires lots of exceptional-case-checking code that is unnecessary. But then I'm just repeating myself...
David From sholden at holdenweb.com Wed Oct 29 10:11:33 2003 From: sholden at holdenweb.com (Steve Holden) Date: Wed Oct 29 10:16:35 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <3F9FD16A.404@sjsoft.com> Message-ID: [David Fraser] > Barry Warsaw wrote: > > >On Wed, 2003-10-29 at 09:29, Gregory (Grisha) Trubetskoy wrote: > > > > > > > >>I am in favor of form['fieldname'] to act the same as > getfirst() only if > >>there is a single element, otherwise it should return a list. > >> > >> > > > >+1 > > > > > > > -2! > (That's two factorial :-) > I want form['fieldname'] to always return a single element. > You should always know when you're wanting a list. > Returning a list otherwise requires lots of exceptional-case-checking > code that is unneccessary. > But then I'm just repeating myself... > In which case, let _me_ repeat _myself_ :-) If an argument has multiple values, this should only be handled if the processing element (page code) has indicated that multiple values are acceptable for that argument. When an argument is possibly multi-valued, form['fieldname'] should *always* be a list, even if it has only one element [and I don't see why it shouldn't be legal to see an empty list if the argument doesn't appear in the URL or POST input at all]. If no indication has been given that multiple occurrences are acceptable then an exception should be raised which, if not trapped by the web app, should eventually result in (say) a 422 (unprocessable entity) or a 406 (not acceptable) server response. When an argument is *not* allowed to be multi-valued then form['fieldname'] should return a string, and an error should be raised if the argument has multiple occurrences. If the argument doesn't appear in the URL or POST data at all then it's arguable that a KeyError should be raised, again resulting in a server error if untrapped. 
I'd be prepared to allow a "sloppy" option to have form['fieldname'] return an empty string under those circumstances, a la ASP, and to return the first of multiple occurrences. But I *do* think that "sloppy" would be an apposite name for such tactics. If we're building a new API then for heaven's sake let's not repeat the mistakes of earlier generations. And let's try not to have each of these discussions more than two or three times a month :-) regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From neel at mediapulse.com Wed Oct 29 10:21:00 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Oct 29 10:21:06 2003 Subject: [Web-SIG] Request and Response objects Message-ID: > There was a discussion about whether there should be a single > object for > both request and response. At first I thought that having a separate request and response object didn't offer any advantages. This is most likely because I've worked with Apache for so long, which only has one object to handle both. Upon more thought though, I'm starting to think having them as separate objects might be better. Separate, a project could focus only on the side of the process it is interested in. An example would be an XSLT engine. So in theory it could take in any request object -- from cgi, mod_python, or Python's stdlib server -- prepare an XML response based on the request, then pass this XML data to the XSLT response object to do the skinning. Since the developers of this XSLT response object have no reason to care about the request side, it seems better that they don't need to even be aware of it.
On a related note, for all those out there like mod_python that have in place a request or request and response objects now, I think the best solution would be for them to include a conversion function in their objects to convert their formats to whatever the SIG comes up with as the standards. I *do not* want mod_python to match the SIG's standard, I want it to match the Apache API; but being able to convert between the two at the cost of a few cpu cycles would be great. On a less related note, I don't know if an XSLT parser made the list yet, but if it could be added it's something I would really like to see. Mike From barry at python.org Wed Oct 29 12:12:21 2003 From: barry at python.org (Barry Warsaw) Date: Wed Oct 29 11:12:18 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: References: Message-ID: <1067447541.13656.3.camel@geddy> On Wed, 2003-10-29 at 10:11, Steve Holden wrote: > If an argument has multiple values, this should only be handled if the > processing element (page code) has indicated that multiple values are > acceptable for that argument. When an argument is possibly multi-valued, > form['fieldname'] should *always* be a list, even if it has only one > element [and I don't see why it shouldn't be legal to see an empty list > if the argument doesn't appear in the URL or POST input at all]. If no > indication has been given that multiple occurrences are acceptable then > an exception should be raised which, if not trapped by the web app, > should eventually result in (say) a 422 (unprocessable entity) or a 406 > (not acceptable) server response. I tend to agree with Steve here, but maybe we can have our cake and eat it too. Dumb-ass suggestion of the day: what if the field values were represented by a dict subclass, and we had several different subclasses, each of which specified the exact behavior for __getitem__(). E.g.
David could have his "__getitem__ is getfirst" behavior, Steve could have his verified-multiples behavior, and I could have my "always return a list" behavior. We'd then be reduced to choosing a default and a few interfaces and everyone would be happy. -Barry From mailinglists at qinternet.com Wed Oct 29 11:17:39 2003 From: mailinglists at qinternet.com (Brian Olsen - Lists) Date: Wed Oct 29 11:17:41 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <6A4B74EA-0A2B-11D8-9FB2-000502B9AE42@qinternet.com> On Wednesday, October 29, 2003, at 02:32 AM, Ian Bicking wrote: > On Friday, October 24, 2003, at 03:20 PM, Greg Stein wrote: >> In the most recent incarnation of a webapp of mine (subwiki), I almost >> went with a request/response object paradigm and even started a bit of >> refactoring along those lines. However, I ended up throwing out that >> dual-object concept. >> >> When you stop and think about it: *every* request object will have a >> matching response object. Why have two objects if they come in pairs? >> You >> will never see one without the other, and they are intrinsically tied >> to >> each other. So why separate them? > > The biggest justification for me is: because that's what everyone > does. SkunkWeb doesn't separate them, but I can't think of any others > in Python. The request/response distinction is ubiquitous throughout > web programming. I guess it's natural to people. But it doesn't even > matter why: it is the way it is. > Another justification is that the request is essentially static. It > is created and complete, then it is processed. When the request is > complete, the response has just barely begun existence. The request > object could very well be immutable at this point.
(Unfortunately > that probably would make compatibility with previous code too > difficult, but that's an aside) You can very reasonably pass around > the request with the expectation that the response will not be > touched, or even vice versa (though that is less common -- which is a > bit backwards if you follow a convention that the response belongs to > the request). > The request and response aren't particularly interwoven either. > Request cookies have nothing to do with response cookies (and any > attempt to combine their semantics would be futile). Request > variables follow arcane paths through all kinds of representations > when you trace them back to their source. > > And then there's simply the naming issue: request and response are > pretty clear names. Everyone knows what they are. Everyone can guess > at their interface, and certainly can read their interface. There's > no compelling alternative name for the combined object -- "handler" > implies almost nothing, "transaction" implies the incorrect thing, > "connection" implies a low-level interface... > > The difficulty of writing, say, request.response.write(something) vs. > handler.write(something) doesn't seem like a big deal to me. Reading this thread, it sounds more like an aesthetic choice than anything. I like single objects, but this is also aesthetic. (Maybe you can call the single object HTTPConnection? That's what it is, no?) But if I am going to fight something for no particular reason, it will be against dual-objects, just to be against the dual-object status quo. :-) Fight for the single-object!! 
Brian From ianb at colorstudy.com Wed Oct 29 11:17:53 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 11:17:57 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <1067447541.13656.3.camel@geddy> Message-ID: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 11:12 AM, Barry Warsaw wrote: > Dumb-ass suggestion of the day: what if the field values were > represented by a dict subclass, and we had several different > subclasses, > each of which specified the exact behavior for __getitem__(). E.g. > David could have his "__getitem__ is getfirst" behavior, Steve could > have > his verified-multiples behavior, and I could have my "always return a > list" behavior. We'd then be reduced to choosing a default and a few > interfaces and everyone would be happy. That would make me unhappy... next thing you know, you'll be introducing a magic quoting dict subclass... -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Wed Oct 29 11:31:59 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 11:39:56 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: Message-ID: <6B07DB2A-0A2D-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 09:11 AM, Steve Holden wrote: > In which case, let _me_ repeat _myself_ :-) > > If an argument has multiple values, this should only be handled if the > processing element (page code) has indicated that multiple values are > acceptable for that argument. When an argument is possibly > multi-valued, > form['fieldname'] should *always* be a list, even if it has only one > element [and I don't see why it shouldn't be legal to see an empty list > if the argument doesn't appear in the URL or POST input at all].
If no > indication has been given that multiple occurrences are acceptable then > an exception should be raised which, if not trapped by the web app, > should eventually result in (say) a 422 (unprocessable entity) or a 406 > (not acceptable) server response. > > When an argument is *not* allowed to be multi-valued then > form['fieldname'] should return a string, and an error should be raised > if the argument has multiple occurrences. If the argument doesn't > appear > in the URL or POST data at all then it's arguable that a KeyError > should > be raised, again resulting in a server error if untrapped. We can also handle this through the particulars of the method calls we use. E.g.:

def getone(self, field, default=NoDefault):
    try:
        value = self._rawfields[field]
        if isinstance(value, list):
            raise BadRequestError, "Multiple values were not expected for the field %s" % field
        return value
    except KeyError:
        if default is NoDefault:
            raise
        return default

This doesn't require any declaration, and follows the typical implicit type checking that's usually done in Python code. Of course, that BadRequestError is another (important) point of discussion. > I'd be prepared to allow a "sloppy" option to have form['fieldname'] > return an empty string under those circumstances, a la ASP, and to > return the first of multiple occurrences. But I *do* think that > "sloppy" > would be an apposite name for such tactics. > > If we're building a new API then for heaven's sake let's not repeat the > mistakes of earlier generations. And let's try not to have each of > these > discussions more than two or three times a month :-) I'm not sure if people will ever really be happy with one decision. Which makes me feel like we should just expose the status quo -- you get a dictionary that might contain lists -- and let people process that as they want. If we provide multiple options, then we do so explicitly and without strong bias.
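[The getone idea sketched above, rendered as a small self-contained example. FormFields, NoDefault, and BadRequestError are stand-in names for this illustration -- the thread leaves the real error type open -- and the modern raise syntax is used here.]

```python
class NoDefault:
    """Sentinel class: distinguishes 'no default supplied' from default=None."""


class BadRequestError(Exception):
    """Stand-in for whatever error type the API would actually raise."""


class FormFields:
    def __init__(self, rawfields):
        # rawfields maps names to a string, or to a list for repeated fields,
        # mirroring "a dictionary that might contain lists".
        self._rawfields = dict(rawfields)

    def getone(self, field, default=NoDefault):
        try:
            value = self._rawfields[field]
            if isinstance(value, list):
                raise BadRequestError(
                    "Multiple values were not expected for the field %s" % field)
            return value
        except KeyError:
            if default is NoDefault:
                raise
            return default
```

Steve's stricter scheme would replace the isinstance check with a lookup in a declared set of multi-valued field names; the method-call shape stays the same.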
http://cvs.sourceforge.net/viewcvs.py/*checkout*/webware-sandbox/Sandbox/ianbicking/FormEncode/DictCall.txt?content-type=text%2Fplain&rev=1.1 http://cvs.sourceforge.net/viewcvs.py/*checkout*/webware-sandbox/Sandbox/ianbicking/FormEncode/DictCall.py?content-type=text%2Fplain&rev=1.1 This uses a method signature to handle list conversion and a bunch of other conversions as well, like ints, ordered lists, and dictionaries. But it's not complete, and I doubt it could be made into something complete (and I've already moved on). OTOH, something that was complete would be burdensome in some situations. And other possible features, like Zope's :action or Webware's _action_ fields, are naturally tied to a specific environment. List vs. string fields are just the tip of an iceberg of general validation, and validation is not something we can tackle right now (at least not for the standard library). -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Wed Oct 29 16:12:35 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 16:12:41 2003 Subject: [Web-SIG] htmlgen Message-ID: <20031029161141.G82536@onyx.ispol.com> Should HTML-generating capability a la HTMLgen go on the missing list as well? Grisha From jjl at pobox.com Wed Oct 29 16:20:50 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 16:21:23 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029161141.G82536@onyx.ispol.com> References: <20031029161141.G82536@onyx.ispol.com> Message-ID: On Wed, 29 Oct 2003, Gregory (Grisha) Trubetskoy wrote: > Should HTML-generating capability a la HTMLgen go on the missing list as > well? The trouble is, there's no "one obvious way" to do this, so I'd think it's not a great candidate for the standard library.
John From ianb at colorstudy.com Wed Oct 29 16:29:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 16:29:56 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029161141.G82536@onyx.ispol.com> Message-ID: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 03:12 PM, Gregory (Grisha) Trubetskoy wrote: > Should HTML-generating capability a la HTMLgen go on the missing list > as > well? It seems like a good candidate -- it's been around a long time (in one form or another), its scope is very defined, and it's something people often look for. HTMLgen has some quirkiness to it, though. It's not as tight as a simple HTML generator could be. Would it make sense to use a more minimal XML generator, that could also do XHTML generation (maybe with a little validation thrown in)? Is there any library like this already included in Python xml packages? I do like generating HTML with a Python syntax (when in-code HTML generation is called for). Quixote's PTL has some stuff related to this as well (at least related to quoting), but I don't remember much about it. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From randyp at cycla.com Wed Oct 29 16:30:38 2003 From: randyp at cycla.com (Randy Pearson) Date: Wed Oct 29 16:31:07 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: Message-ID: <21294262503873@dserver.cycla.com> The question seems to be whether to have two closely-coordinated request/response objects versus a single object. I can see a few points in favor of the former: 1. Independence for extensibility. By having separate classes for each object, they are free to grow independently. So, if Jane develops an interesting extension to the Request class (by way of subclassing), and Bob does the same for the Response class, it becomes much easier to combine these in a best-of-breed approach. If all one class, this would be difficult or impossible. 2. Multiplicity.
If a single object is used, there is an implicit assumption of a 1:1 relation between requests and responses. But is that always the case? Consider two cases. Case 1: Your "response" to a request includes both the standard response _and_ a generated email of content-type text/html. Case 2: You have a mixed-mode site that includes both static and dynamic content, and in some instances you update some of the static (published) content in response to an incoming request. In both of these cases, you are producing more than one "response", and if your response class encapsulates the ability to produce both, you might easily want to operate on multiple response objects in parallel. 3. Timing. If processing a request may cause a time-out, you may prefer to queue the request and provide the usual auto-refresh type of HTML response, polling for completion status. In this case, you have new needs: the ability to queue a request and the ability to store and resurrect a response. It's hard to see a single combined object dealing with all of this. Perhaps some form of a mediator or facade could be created to provide an interface between these objects, but in any event, they don't strike me as deriving from a single class. -- Randy From jjl at pobox.com Wed Oct 29 16:51:24 2003 From: jjl at pobox.com (John J Lee) Date: Wed Oct 29 16:51:34 2003 Subject: urllib2.UserAgent [was: Re: [Web-SIG] So what's missing?] In-Reply-To: References: <058AF77D-09E0-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: On Wed, 29 Oct 2003, John J Lee wrote: [...]
> Problems to be solved:
>
> - awkward to dynamically change behaviour of user-agent -- you have
>   to build an OpenerDirector every time you want to change things
> - unhelpful separation by default of HTTP and HTTPS
> - unhelpful separation by default of various server authentication
>   schemes
> - no ability to do partial fetches
> - no ability to do HEAD and PUT
>
> ...any more?
> > The last two need changes in the rest of urllib2, of course. [...] A few other things this class should handle (eventually) in a friendly fashion. Some of them require work on httplib / urllib / urllib2.

- timeouts
- connection caching
- robots.txt observance (using the existing std. lib. module)
- caching
- convenient debugging (showing redirections, response bodies, etc.)
- cookies
- HTML HEAD section http-equiv handling
- Refresh handling
- seekability of responses (required for doing http-equiv)
- control of From and User-Agent headers; maybe just leave this as-is: i.e. the addheaders attribute

John From cs1spw at bath.ac.uk Wed Oct 29 17:26:43 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 18:13:54 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> References: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <3FA03EA3.9000907@bath.ac.uk> Ian Bicking wrote: > HTMLgen has some quirkiness to it, though. It's not as tight as a > simple HTML generator could be. Would it make sense to use a more > minimal XML generator, that could also do XHTML generation (maybe with a > little validation thrown in)? Is there any library like this already > included in Python xml packages? I do like generating HTML with a > Python syntax (when in-code HTML generation is called for).
It uses an extremely simple stack based push/pop model: http://www.xml.com/pub/a/2003/04/09/py-xml.html -- Simon Willison Web development weblog: http://simon.incutio.com/ From janssen at parc.com Wed Oct 29 18:48:07 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 18:48:32 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Tue, 28 Oct 2003 23:32:23 PST." <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> I think you can separate them and combine them at the same time, without much trouble. For instance, Ian used the example "request.response.write()", implying that the response object is accessible from the request object, which makes sense to me. So in one view, there's just one object, the request, and the response object is just a part of that. But for those who prefer it, it's easy to assign response = request.response and deal with the two different variables. Bill From cs1spw at bath.ac.uk Wed Oct 29 19:26:34 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 19:26:40 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> References: <03Oct29.154815pst."58611"@synergy1.parc.xerox.com> Message-ID: <3FA05ABA.5050909@bath.ac.uk> Bill Janssen wrote: > I think you can separate them and combine them at the same time, > without much trouble. For instance, Ian used the example > "request.response.write()", implying that the response object is > accessible from the request object, which makes sense to me. So in > one view, there's just one object, the request, and the response > object is just a part of that. But for those who prefer it, it's easy > to assign > > response = request.response > > and deal with the two different variables. I have to admit I prefer keeping the two completely separate, as is done by the Java servlet specification. 
As mentioned by someone else, the big difference between the two is that request should be read only while response can have its state altered. An advantage of this is that you can potentially do interesting things with the two objects - like pickling the request object and logging it somewhere, or pickling the response object and caching it once it has been populated to speed up future requests for the same data. I can see the POV of people who prefer a single object or nested objects as well though. This is going to be a tricky issue to resolve. If there are no utterly convincing arguments for one approach or the other we could take it to a vote? -- Simon Willison Web development weblog: http://simon.incutio.com/ From janssen at parc.com Wed Oct 29 20:04:37 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 20:05:03 2003 Subject: [Web-SIG] htmlgen In-Reply-To: Your message of "Wed, 29 Oct 2003 13:12:35 PST." <20031029161141.G82536@onyx.ispol.com> Message-ID: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Can you describe what HTMLgen does? Bill > > Should HTML-generating capability a la HTMLgen go on the missing list as > well? > > Grisha From janssen at parc.com Wed Oct 29 20:09:34 2003 From: janssen at parc.com (Bill Janssen) Date: Wed Oct 29 20:11:20 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: Your message of "Wed, 29 Oct 2003 16:26:34 PST." <3FA05ABA.5050909@bath.ac.uk> Message-ID: <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> > I can see the POV of people who prefer a single object or nested objects > as well though. This is going to be a tricky issue to resolve. If there > are no utterly convincing arguments for one approach or the other we > could take it to a vote? I tend to prefer protracted formal discussion till the pros and cons force a choice, a la Rittel/Webber "wicked problems". See http://www.poppendieck.com/wicked.htm. 
Bill From grisha at modpython.org Wed Oct 29 20:14:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 20:14:37 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <21294262503873@dserver.cycla.com> References: <21294262503873@dserver.cycla.com> Message-ID: <20031029195318.C82536@onyx.ispol.com> Let me argue the single request point with some specifics. IMO dual objects create a semantics mess, here are a couple of examples:

o The point that I already brought up that reading from one object and writing to another is unintuitive and misleading.

o Where does the connection information such as remote host, the raw socket, etc information belong, request or response?

o Mod_python (or httpd rather) allows for cleanups to be registered, to run after the request is finished being processed. Again - where would a clean up fit in, at the end of a _request_ or at the end of a _response_? (and when _does_ a request really end?)

o What about server information (document root, etc)?

o If there exists such a thing as a subrequest or internal redirect, then in httpd's single object framework you can access the previous and next request objects via req.prev or req.next. With two objects, it would be something like response.subreq and response.subreq.resp, and to dig one level deeper (req.next.next in single object model), it would be response.subreq.resp.subreq.resp Or if I am within a subrequest, how can I get at the parent (req.prev)? - you see my point, I hope.

6. When processing is aborted, which could happen while the request is being read or while the response is being written - the logic should not be duplicated in two different objects.

These are a few problems that I can think of with the dual object model, yet so far I haven't seen anything seriously convincing in advocacy of the dual object model :-) Grisha From amk at amk.ca Wed Oct 29 21:48:07 2003 From: amk at amk.ca (A.M.
Kuchling) Date: Wed Oct 29 21:47:16 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <080F620B-0A57-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <7DDEF574-0A83-11D8-B82C-0003931BF218@amk.ca> On Wednesday, October 29, 2003, at 04:29 PM, Ian Bicking wrote: > Quixote's PTL has some stuff related to this as well (at least related > to quoting), but I don't remember much about it. http://www.mems-exchange.org/software/quixote/doc/PTL.html is the relevant documentation. Basically, the 'htmltext' data type behaves like a string. In operations involving both htmltext and regular strings, the regular string is coerced to htmltext; coercing a string to htmltext involves quoting HTML/XML special characters. For example:

>>> from quixote import html
>>> html.htmltext('<b>abc</b>')
<htmltext '<b>abc</b>'>
>>> h = html.htmltext
>>> h('<b>%s</b>') % 'Magic chars: <, >, &'
<htmltext '<b>Magic chars: &lt;, &gt;, &amp;</b>'>
>>> h('<b>abc</b>') + '&'
<htmltext '<b>abc</b>&amp;'>

If a templating package uses htmltext for portions of the template that were known to be trusted, then you don't have to remember to pass untrusted data from the browser through cgi.escape() or some equivalent; the coercion handles it for you, thus closing one source of security holes. Quixote's PTL then layers some compiler magic on top of this so you don't have htmltext() constructors all over the place, but you don't need to buy into PTL to use htmltext. Adding it to the stdlib might not be a bad idea. --amk From grisha at modpython.org Wed Oct 29 22:46:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Oct 29 22:46:37 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> References: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031029224104.O82536@onyx.ispol.com> On Wed, 29 Oct 2003, Bill Janssen wrote: > > Can you describe what HTMLgen does?

>>> from HTMLgen import *
>>> ul = UL(["blah", "blah"])
>>> ul.append(H(1, "bleh"))
>>> print ul
<UL>
<LI>blah
<LI>blah
<H1>bleh</H1>
</UL>
>>> From cs1spw at bath.ac.uk Wed Oct 29 23:01:51 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Wed Oct 29 23:02:00 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031029224104.O82536@onyx.ispol.com> References: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> <20031029224104.O82536@onyx.ispol.com> Message-ID: <3FA08D2F.6040701@bath.ac.uk> Gregory (Grisha) Trubetskoy wrote: >>Can you describe what HTMLgen does? > >>>>from HTMLgen import * >>>>ul = UL(["blah", "blah"]) >>>>ul.append(H(1, "bleh")) >>>>print ul > >
><UL>
><LI>blah
><LI>blah
><H1>bleh</H1>
></UL>
A big problem here is one of style. I prefer my HTML to be lower case with explicit end tags (even when optional), and often work in XHTML where end tags are required. I also like my lists to have their <li>s indented with 2 spaces. The point I'm trying to make is that different people have different preferences for HTML, and there is no one correct way of writing it. This is why I'm opposed to HTML generation tools in the standard library - there are simply too many styles. HTML generation tools already exist outside the standard library in abundance and I see no pressing need for the default Python install to ship with one that has been chosen over all of the others. If there's an obvious demand from Python's user base for an HTML generation system in the standard library then by all means there should be one, but I don't see any reason to include one without good reason when there is no obviously "correct" way of going about it. -- Simon Willison Web development weblog: http://simon.incutio.com/ From ianb at colorstudy.com Wed Oct 29 22:52:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Oct 29 23:03:16 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <03Oct29.170442pst."58611"@synergy1.parc.xerox.com> Message-ID: <892DED96-0A8C-11D8-ABB3-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 07:04 PM, Bill Janssen wrote: > Can you describe what HTMLgen does? http://starship.python.net/crew/friedrich/HTMLgen/html/main.html But the core portion is really about creating HTML, something along the lines of:

HTML(HEAD(TITLE('my page')),
     BODY(H1('my page'),
          IMG(src="/mypicture.jpg", width=100, height=100),
          ...))

With the output -- either directly or through str() -- being corresponding HTML. There are several similar systems, with slight differences. I used a class with magic attributes, like html.br(), for one system. Someone else did something like BODY(bgcolor="#aaaaaa")[H1('title'), P()['some content']], and some other variations exist. HTMLgen also includes some aspects that are more like templating, where you define the structure for an entire page. But in its more basic form it's often useful for creating valid HTML snippets inside Python code.
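The attribute-magic style Ian mentions (html.br(), html.h1('x'), etc.) can be roughed out in a few lines. This is a sketch under assumed names -- `Tag` and `_Builder` are not from any system discussed in the thread -- using modern Python's html.escape for the quoting:

```python
from html import escape  # escapes <, >, & (and quotes)

class Tag:
    """One element; str() renders it and its children as markup."""
    def __init__(self, name, *children, **attrs):
        self.name, self.children, self.attrs = name, list(children), attrs

    def __call__(self, *children):
        # Allows the html.a(href=...)('text') calling style.
        self.children.extend(children)
        return self

    def __str__(self):
        # 'class_' -> 'class', since 'class' is a Python keyword
        attrs = ''.join(' %s="%s"' % (k.rstrip('_'), escape(str(v), quote=True))
                        for k, v in self.attrs.items())
        body = ''.join(str(c) if isinstance(c, Tag) else escape(str(c))
                       for c in self.children)
        return '<%s%s>%s</%s>' % (self.name, attrs, body, self.name)

class _Builder:
    """Each attribute access manufactures a Tag factory on the fly."""
    def __getattr__(self, name):
        return lambda *children, **attrs: Tag(name, *children, **attrs)

html = _Builder()

print(html.ul(html.li('blah'), html.li('blah')))
# -> <ul><li>blah</li><li>blah</li></ul>
```

Plain-string children are escaped automatically, which gives the same safety property amk describes for htmltext: untrusted text cannot inject markup unless it is wrapped in a Tag.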
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From davidf at sjsoft.com Wed Oct 29 23:31:08 2003 From: davidf at sjsoft.com (David Fraser) Date: Wed Oct 29 23:31:14 2003 Subject: [Web-SIG] Request and Response objects In-Reply-To: <20031029195318.C82536@onyx.ispol.com> References: <21294262503873@dserver.cycla.com> <20031029195318.C82536@onyx.ispol.com> Message-ID: <3FA0940C.2080301@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >Let me argue the single request point with some specifics. > >IMO dual objects create a semantics mess, here is a couple of examples: > >o The point that I already brought up that reading from one object and >writing to another is unintuitive and misleading. > >o Where does the connection information such as remote host, the raw >socket, etc information belong, request or response? > >o Mod_python (or httpd rather) allows for cleanups to be registered, to >run after the request is finished being processed. Again - where would a >clean up fit in, at the end of a _request_ or at the end of a _response_? >(and when _does_ a request really end?) > >o What about server information (document root, etc)? > >o If there exists such a thing as a subrequest or internal redirect, then >in httpd's single object framework you can access the previous and next >request objects via req.prev or req.next. With two objects, it would be >something like response.subreq and response.subreq.resp, and to dig one >level deeper (req.next.next in single object model), it would be >response.subreq.resp.subreq.resp > >Or if I am within a subrequest, how can I get at the parent (req.prev)? > - you see my point, I hope. > >6. When processing is aborted, which could happen while the request is >being read or while the response is being written - the logic should not >be duplicated in two different objects. 
> >These are a few problems that I can think of with the dual object model, >yet so far I haven't seen anything seriously convincing in advocacy of the >dual object model :-) > >Grisha > > Great explanation, Grisha. A lot of the arguments for the dual object model are about what you can do with a separate object. But these seem to me to miss the point .... you can create your own "response"-type class that holds the *value* of a response, and as many instances of it per request as you want to. But the actual Web API response object is for *writing* the response back to the client. You can only write one response back per request, so it makes sense for them to be the same object. (The only extension would be to filter what is being written, but this is a separate issue). David From ianb at colorstudy.com Thu Oct 30 01:00:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 01:00:44 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA08D2F.6040701@bath.ac.uk> Message-ID: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> On Wednesday, October 29, 2003, at 10:01 PM, Simon Willison wrote: > A big problem here is one of style. I prefer my HTML to be lower case > with explicit end tags (even when optional), and often work in XHTML > where end tags are required. I also like my lists to have their
<li>s > indented with 2 spaces. HTMLgen is kind of old and predates XHTML. Any newer system would create XHTML and use lower-case tags. As far as indentation, well, the HTML isn't intended to be terribly readable from these systems. The point is to make the source readable. (And you actually could make the HTML well indented using these systems, but it's usually not that important) > The point I'm trying to make is that different people have different > preferences for HTML, and there is no one correct way of writing it. > This is why I'm opposed to HTML generation tools in the standard > library - there are simply too many styles. HTML generation tools > already exist outside the standard library in abundance and I see no > pressing need for the default Python install to ship with one that has > been chosen over all of the others. I think you probably have more opinion about HTML than many Python programmers. > If there's an obvious demand from Python's user base for an HTML > generation system in the standard library then by all means there > should be one, but I don't see any reason to include one without good > reason when there is no obviously "correct" way of going about it. If we were talking about a templating system, then yes, way too much personal preference there, but this isn't really a templating system. While not everyone will want to use this, the actual variations (despite frequent reimplementation) are not that great. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From aquarius-lists at kryogenix.org Thu Oct 30 02:46:21 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Thu Oct 30 02:44:22 2003 Subject: [Web-SIG] htmlgen References: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: Ian Bicking spoo'd forth: > On Wednesday, October 29, 2003, at 10:01 PM, Simon Willison wrote: >> A big problem here is one of style.
I prefer my HTML to be lower case >> with explicit end tags (even when optional), and often work in XHTML >> where end tags are required. I also like my lists to have their
<li>s >> indented with 2 spaces. > > HTMLgen is kind of old and predates XHTML. Any newer system would > create XHTML and use lower-case tags. Without wishing to make life more complex for everything, it should be able to do HTML 4.01 as well; there are still problems with XHTML (by which I mean which content-type it's served as -- serving it as xml doesn't work in all browsers and serving it as html means that browsers treat it as tag soup), so I'm still using 4.01 Strict for most projects. sil -- Soon -- as it measured time -- it would have work to do once again. Thousands upon thousands of worlds. -- "Fallen Star", Simon Clay From grisha at modpython.org Thu Oct 30 11:33:18 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 11:33:22 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> References: <4CB0F998-0A9E-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030112919.J97494@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > > should be one, but I don't see any reason to include one without good > > reason when there is no obviously "correct" way of going about it. > > If we were talking about a templating system, then yes, way too much > personal preference there, but this isn't really a templating system. HTMLgen has a DocumentTemplate thing which is a bare bones templating system allowing for substitution in a text file. I think something primitive of this sort and perhaps implemented based on this: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 (which can probably be even further optimized) would be nice to have in stdlib.
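Something this primitive can be done with one regex pass. A sketch (this is not the recipe's actual code, and `fill` is an assumed name); unknown placeholders are passed through untouched rather than raising KeyError, which is the behavior asked for later in the thread:

```python
import re

_PLACEHOLDER = re.compile(r'%\((\w+)\)s')

def fill(template, values):
    """Like template % values, but unknown keys pass through unchanged."""
    def repl(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return _PLACEHOLDER.sub(repl, template)

print(fill('Hello %(title)s %(name)s', {'name': 'Smith'}))
# -> Hello %(title)s Smith
```

A second pass with an empty dict (or a variant whose fallback is '') would blank out whatever is left over.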
Grisha From ianb at colorstudy.com Thu Oct 30 11:44:52 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 11:44:59 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030112919.J97494@onyx.ispol.com> Message-ID: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy wrote: > HTMLgen has a DocumentTemplate thing which is a bare bones templating > system allowing for substitution in a text file. I think something > primitive of this sort and perhaps implemented based on this: > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 > > (which can probably be even further optimized) > > would be nice to have in stdlib. A templating system in its most naive form is just a kind of string substitution. If that's the kind of thing we're looking for, then perhaps -- but it has to be usefully better than %. (Though % would be more useful if it had other formatting options, like %h does HTML quoting, or %u does URL quoting... but where would it stop?) There's a PEP out there for $ string substitution, but it's static substitution (i.e., it always fills from locals()). Guido just mentioned recently on python-dev that he didn't want to improve % (specifically a request that "%{var}" be equivalent to "%(var)s") because he wanted to leave room for a better solution. What better solution? I don't know... I think it has to be something both elegant and useful, minimal and flexible. 
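The %h / %u conversions Ian muses about do not exist in Python's % operator; a sketch of what they might look like, with an assumed helper name `fmt` and today's stdlib quoting functions:

```python
import re
from html import escape
from urllib.parse import quote_plus

def fmt(template, *args):
    """%s plain, %h HTML-quoted, %u URL-quoted -- hypothetical conversions."""
    args = list(args)
    def repl(match):
        value = str(args.pop(0))  # consume positional args left to right
        kind = match.group(1)
        if kind == 'h':
            return escape(value, quote=True)
        if kind == 'u':
            return quote_plus(value)
        return value
    return re.sub(r'%([shu])', repl, template)

print(fmt('<a href="/q?t=%u">%h</a>', 'a&b', '<tag>'))
# -> <a href="/q?t=a%26b">&lt;tag&gt;</a>
```

The open question Ian raises ("where would it stop?") is exactly why a fixed set of conversions like this never made it into %.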
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From amk at amk.ca Thu Oct 30 11:53:16 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 30 11:53:24 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> References: <3FA05ABA.5050909@bath.ac.uk> <03Oct29.170937pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031030165316.GA12422@rogue.amk.ca> On Wed, Oct 29, 2003 at 05:09:34PM -0800, Bill Janssen wrote: > I tend to prefer protracted formal discussion till the pros and cons > force a choice, a la Rittel/Webber "wicked problems". See > http://www.poppendieck.com/wicked.htm. I doubt this is such a problem, though; it doesn't really *matter* if there's one object or two, and neither side has any overwhelming arguments on the point, so ultimately it'll come down to taste. --amk From randyp at cycla.com Thu Oct 30 12:02:10 2003 From: randyp at cycla.com (Randy Pearson) Date: Thu Oct 30 12:02:43 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <3FA05ABA.5050909@bath.ac.uk> Message-ID: <17011139006433@dserver.cycla.com> > ... the big > difference between the two is that request should be read only while > response can have its state altered.... If request is read-only, how would you create unit tests for other components? A testing harness would need the ability to instantiate and alter pseudo requests outside of the HTTP server context. I do agree that, from the response's point-of-view, the request is immutable. 
-- Randy From grisha at modpython.org Thu Oct 30 12:05:38 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 12:06:03 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030120232.T98038@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > substitution (i.e., it always fills from locals()). Guido just > mentioned recently on python-dev that he didn't want to improve % > (specifically a request that "%{var}" be equivalent to "%(var)s") > because he wanted to leave room for a better solution. What better > solution? I don't know... I think it has to be something both elegant > and useful, minimal and flexible. This is along the lines of what I think. Another thing with %() is that if the dictionary doesn't have a corresponding value you get key error as opposed to leaving it as is or defaulting to nothing. I might actually take the time to put something together, then we can ponder on whether it's worth including. Grisha From davidf at sjsoft.com Thu Oct 30 12:20:37 2003 From: davidf at sjsoft.com (David Fraser) Date: Thu Oct 30 12:21:14 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030120232.T98038@onyx.ispol.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030120232.T98038@onyx.ispol.com> Message-ID: <3FA14865.7090902@sjsoft.com> Gregory (Grisha) Trubetskoy wrote: >On Thu, 30 Oct 2003, Ian Bicking wrote: > > > >>substitution (i.e., it always fills from locals()). Guido just >>mentioned recently on python-dev that he didn't want to improve % >>(specifically a request that "%{var}" be equivalent to "%(var)s") >>because he wanted to leave room for a better solution. What better >>solution? I don't know... I think it has to be something both elegant >>and useful, minimal and flexible. >> >> > >This is along the lines of what I think. 
Another thing with %() is that if >the dictionary doesn't have a corresponding value you get key error as >opposed to leaving it as is or defaulting to nothing. > >I might actually take the time to put something together, then we can >ponder on whether it's worth including. > >Grisha > > I'm not sure about how useful this kind of variable substitution would be for html ... any examples? David From ianb at colorstudy.com Thu Oct 30 12:37:27 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 12:37:33 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <17011139006433@dserver.cycla.com> Message-ID: On Thursday, October 30, 2003, at 11:02 AM, Randy Pearson wrote: >> ... the big >> difference between the two is that request should be read only while >> response can have its state altered.... > > If request is read-only, how would you create unit tests for other > components? A testing harness would need the ability to instantiate and > alter pseudo requests outside of the HTTP server context. You'd be able to create artificial requests, and copy requests with changes. Immutable objects usually have to have better support for these sorts of things for just this reason. So maybe you'd have something like: # Ignoring some details here... vars = request.vars vars.update({'action': 'delete'}) forward(request.clone(path='/target/delete', variables = vars)) Or: req = HTTPRequest(variables={}, method='GET', ...) While perhaps with CGI you'd use: req = HTTPRequest.fromEnvironment() (.fromCGI()? just .cgi()?) Anyway, I think there's compatibility problems with this, but if I was doing it from scratch I might do this. 
(Immutability would be a little soft, though -- you could, for instance, set the response for the request after the object was created, but you couldn't change the response once it had been set) -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Thu Oct 30 12:46:05 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 12:46:12 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA14865.7090902@sjsoft.com> Message-ID: On Thursday, October 30, 2003, at 11:20 AM, David Fraser wrote: > I'm not sure about how useful this kind of variable substitution would > be for html ... any examples?

defaults = {'username': req.cookie('username', '')}
defaults.update(req.fields)
if request.fields.get('username'):
    defaults['message'] = "<b>Login incorrect</b><br>"
defaults['action'] = '/loginform'
form = '''<form action="%(action)s" method="POST">
%(message)s
Username: <input type="text" name="username" value="%(username)s"><br>
Password: <input type="password" name="password"><br>
<input type="submit">
</form>'''.substitute(defaults)

## Using something HTMLgen-ish:

defaults = {'username': req.cookie('username', '')}
defaults.update(req.fields)
if request.fields.get('username'):
    defaults['message'] = html.b("Login incorrect") + html.br()
defaults['action'] = '/loginform'
form = html.form(action=defaults['action'], method="POST")(
    defaults.get('message'),
    'Username: ',
    html.input(type="text", name="username", value=defaults.get('username')),
    html.br(),
    'Password: ',
    html.input(type="password", name="password", value=defaults.get('password')),
    html.input(type="submit"))

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Thu Oct 30 12:54:09 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 12:54:13 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA14865.7090902@sjsoft.com> References: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030120232.T98038@onyx.ispol.com> <3FA14865.7090902@sjsoft.com> Message-ID: <20031030124343.Q98038@onyx.ispol.com> On Thu, 30 Oct 2003, David Fraser wrote: > Gregory (Grisha) Trubetskoy wrote: > > >This is along the lines of what I think. Another thing with %() is that if > >the dictionary doesn't have a corresponding value you get key error as > >opposed to leaving it as is or defaulting to nothing. > > > I'm not sure about how useful this kind of variable substitution would > be for html ... any examples? It comes in handy in various HTML formatting, e.g. let's say we have a menu, and you want one item highlighted:

HTML = """
<a href="/home" %(home)s>Home</a><br>
<a href="/products" %(prod)s>Products</a><br>
<a href="/about" %(about)s>About</a><br>
    """ To highlight home you'd have to do something like: HTML % {'home' : 'class="highlighted"', 'prod':'', 'about':''} But it's nice to not have to list every menu option (less typing, and more importantly, you can change the template without having to fix the code), something functionally equivalent to: HTML % {'home' : 'class="highlighted"'} (this would raise key error) Grisha From ianb at colorstudy.com Thu Oct 30 13:11:08 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Oct 30 13:11:14 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030124343.Q98038@onyx.ispol.com> Message-ID: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> On Thursday, October 30, 2003, at 11:54 AM, Gregory (Grisha) Trubetskoy wrote: > It comes in handy in various HTML formatting, e.g. let's say we have a > menu, and you want one item highlighted: > > HTML = """ > Home
    > Products
    > About
    > """ > > To highlight home you'd have to do something like: > > HTML % {'home' : 'class="highlighted"', 'prod':'', 'about':''} > > But it's nice to not have to list every menu option (less typing, and > more importantly, you can change the template without having to fix the > code), something functionally equivalent to: > > HTML % {'home' : 'class="highlighted"'} > > (this would raise key error) This would solve this particular problem: class EmptyStringDict(dict): def __getitem__(self, item): try: return dict.__getitem__(self, item) except KeyError: return '' You might add a test for None as well, and replace None with '' (which is what I always want in these sorts of situations). A more structured description can work even better, though. Something like: classes = {'home': 'highlighted'} html( html.a(href="home", class_=classes.get('home'))('Home'), html.br(), html.a(href="products", class_=classes.get('products'))('Produccts'), html.br(), html.a(href="about", class_=classes.get('about'))('About'), html.br(), ) In this example, any attribute with a value None will simply be excluded. (Perhaps there should also be a way to indicate an attribute with no value, like "checked" -- I've used None for that and a special object for exclude before, or that could be reversed) -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From grisha at modpython.org Thu Oct 30 14:09:29 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Thu Oct 30 14:09:33 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> References: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030140302.M98038@onyx.ispol.com> On Thu, 30 Oct 2003, Ian Bicking wrote: > This would solve this particular problem: > > class EmptyStringDict(dict): > def __getitem__(self, item): > try: > return dict.__getitem__(self, item) > except KeyError: > return '' Neat trick! 
Here is an even more generic version:

class DefaultDict(dict):
    def __init__(self, init={}, default=""):
        self.default = default
        dict.__init__(self, init)
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            return self.default

Now I can do:

>>> "Hello %(title)s %(name)s, how are you?" % DefaultDict({'title' : 'Mr.', 'name' : 'Smith'})
'Hello Mr. Smith, how are you?'
>>>
>>> "Hello %(title)s %(name)s, how are you?" % DefaultDict({'name' : 'Smith'})
'Hello Smith, how are you?'
>>>

Grisha From amk at amk.ca Thu Oct 30 14:27:18 2003 From: amk at amk.ca (amk@amk.ca) Date: Thu Oct 30 14:27:28 2003 Subject: [Web-SIG] HTML parsing: anyone use formatter? Message-ID: <20031030192718.GA13220@rogue.amk.ca> [Crossposted to python-dev, web-sig, and xml-sig. Followups to web-sig@python.org, please.] I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for all the missing elements. I've currently been adding just empty methods to the HTMLParser class, but the existing methods actually help render the HTML by calling methods on a Formatter object. For example, the definitions for the H1 element look like this:

def start_h1(self, attrs):
    self.formatter.end_paragraph(1)
    self.formatter.push_font(('h1', 0, 1, 0))

def end_h1(self):
    self.formatter.end_paragraph(1)
    self.formatter.pop_font()

Question: should I continue supporting this in new methods? This can only go so far; a tag such as <b> or <i> is easy for me to handle, but handling
a tag like <table> or <form> would require greatly expanding the Formatter class's repertoire. I suppose the more general question is, does anyone use Python's formatter module? Do we want to keep it around, or should htmllib be pushed toward doing just HTML parsing? formatter.py is a long way from being able to handle modern web pages and it would be a lot of work to build a decent renderer. --amk From barry at python.org Thu Oct 30 16:01:00 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 30 16:01:08 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030140302.M98038@onyx.ispol.com> References: <6F834591-0B04-11D8-9E10-000393C2D67E@colorstudy.com> <20031030140302.M98038@onyx.ispol.com> Message-ID: <1067547658.5295.165.camel@anthem> On Thu, 2003-10-30 at 14:09, Gregory (Grisha) Trubetskoy wrote: > Neat trick! Here is an even more generic version: http://mail.python.org/pipermail/python-dev/2003-October/039369.html :) -Barry From gward at python.net Thu Oct 30 21:51:17 2003 From: gward at python.net (Greg Ward) Date: Thu Oct 30 21:51:20 2003 Subject: [Web-SIG] Random thoughts Message-ID: <20031031025116.GA7401@cthulhu.gerg.ca> I'm just catching up on the archive for this list. Some random thoughts: * a new package, 'web', is definitely in order. "from web import cookies", "from web import http" just sounds right. (That contradicts Greg Stein's proposal in PEP 267, but I assume he's not strongly wedded to that.) * I'm all for stealing good ideas from other sources (eg. PHP, the Java servlet API), but I'm not keen on the exact semantics Simon has mentioned from PHP. In particular, I hope no one is seriously considering global dictionaries called COOKIES or GET. Clearly, the Right Way is:

request.get_cookie("session_id")
request.get_form_var("name")

(spelling and terminology yet to be decided; eg. I could live with it getcookie() and getformvar() ;-) An aside: in the query string ?name=Greg&colour=blue&age=31 what exactly are 'name', 'colour', and 'age'?
Are they form variables? query variables? parameters? fields? Is this specified anywhere? (In Quixote's HTTPRequest class, they're called "form variables" -- hence get_form_var() -- but I've never been terribly thrilled with that terminology. At the moment, I like "query variables".)

* on the fields-with-multiple-values issue: I'm with Steve Holden and David Fraser. (I.e., the programmer should know which query variables expect multiple values, and the request object should always return a list for those variables.) cgi.py is Dead Wrong here; the type of an object should be predictable from the code, not dependent on the HTTP client! (But I disagree with Steve on handling multiple values for a query variable that expects a single value: in that case, IMHO sloppy should be the default, and you should get the first value. I don't want to guard every get_form_var() call with an "except KeyError" to avoid broken/malicious clients crashing the script!) (I've mentally toyed with funky types like Barry suggested, but I think that sort of context-sensitive trickery is unPythonic. Just because you can do something doesn't mean you should.) Perhaps something like Quixote's form framework belongs in the standard library -- the Widget classes solve a lot of problems with handling HTML forms. There's some out-of-date documentation here: http://www.mems-exchange.org/software/quixote/doc/widgets.html

* the "PATHINFO" variable is not CGI-specific. Zope and Quixote are both utterly dependent on PATHINFO, and they're not tied to CGI. (There are strong connections, but you can run a Quixote app with mod_python, Medusa, or Twisted -- no CGI there!) Also, the Java servlet API has a getPathInfo() method, and Java servlets are most certainly not CGI scripts. "pathinfo" is just the part of the URL that the HTTP server doesn't look at.
;-)

* I oppose Simon Willison's practice of using the same variable in the "GET" and "POST" part of a request, but I will defend to the death his right to do so. (But not in Quixote, where a narrower definition of what is Right, Good, and Truthful prevails.)

Enough for now. I still have lots of archive to read. ;-( Greg -- Greg Ward http://www.gerg.ca/ And now for something completely different. From janssen at parc.com Thu Oct 30 22:46:21 2003 From: janssen at parc.com (Bill Janssen) Date: Thu Oct 30 22:46:51 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Thu, 30 Oct 2003 18:51:17 PST." <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com>

> * I oppose Simon Willison's practice of using the same variable
> in the "GET" and "POST" part of a request, but I will defend to the
> death his right to do so. (But not in Quixote, where a narrower
> definition of what is Right, Good, and Truthful prevails.)

I don't get it. Any particular request only has one method, not two: "GET" and "POST". Are you talking about for some reason special-casing these two methods in the Request class? I think it makes more sense to do things generically:

    request.path (e.g., '/foo/bar')
    request.method (e.g., "GET")
    request.part (e.g., "#bletch", perhaps without the #)
    request.headers
    request.parameters (either the query parms, or the multipart/form-data values)
    request.response() => returns a Response object tied to this request

    response.error(code, message)    Sends back an error
    response.reply(htmltext)         Sends back a message
    response.open(ContentType="text/html", code=200) => file object to write to
    fp.write(...)
    fp.close()                       Sends back the response
    response.redirect(URL)           Sends back redirect to the URL

Bill

From gstein at lyra.org Fri Oct 31 01:26:20 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 01:26:42 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca>; from gward@python.net on Thu, Oct 30, 2003 at 09:51:17PM -0500 References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031030222620.B1901@lyra.org> On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > I'm just catching up on the archive for this list. Some random > thoughts: > > * a new package, 'web', is definitely in order. > "from web import cookies", "from web import http" just sounds right. > (That contradicts Greg Stein's proposal in PEP 267, but I assume > he's not strongly wedded to that.) Correct. The name isn't the important part of the PEP. That said, "web" is a big misnomer for [package containing] an http client library, but that's a bikeshed of an entirely different color :-) I'm more interested in a way of constructing a connection to a server, where that connection has some various combination of features:

* SSL
* Basic/Digest/??? authentication
* WebDAV
* Proxy
* Proxy auth

The current model for the client side uses two, distinct classes to deal with the SSL feature. I have an entirely separate module for the WebDAV stuff. And authentication isn't even handled in the core http classes, but over in urllib(2). Same for proxy support. PEP 267 is about a refactoring to bring these features under one cover, and to move some features from urllib down into the basic connection classes to be used by any http client.
(and yah, urllib would still expose some concepts since ftp still needs some authn, but it could just defer to the "new" httplib authn facilities) Cheers, -g -- Greg Stein, http://www.lyra.org/ From ianb at colorstudy.com Fri Oct 31 01:28:01 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 01:28:32 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <607181AE-0B6B-11D8-88D0-000393C2D67E@colorstudy.com> On Oct 30, 2003, at 9:46 PM, Bill Janssen wrote:

>> * I oppose Simon Willison's practice of using the same variable
>> in the "GET" and "POST" part of a request, but I will defend to the
>> death his right to do so. (But not in Quixote, where a narrower
>> definition of what is Right, Good, and Truthful prevails.)
>
> I don't get it. Any particular request only has one method, not two:
> "GET" and "POST". Are you talking about for some reason
> special-casing these two methods in the Request class? I think it
> makes more sense to do things generically:
>
> request.path (e.g., '/foo/bar')
> request.method (e.g., "GET")
> request.part (e.g., "#bletch", perhaps without the #)

No real way to access this.

> request.headers
> request.parameters (either the query parms, or the multipart/form-data values)

I think fields is a better name -- common, and a bit shorter (since it's the most used part of the request)

> request.response() => returns a Response object tied to this request
> response.error(code, message)    Sends back an error

Message, like response.error(404, "Not Found"), or response.error(403, "Administrator permission is required to access this resource")

> response.reply(htmltext)    Sends back a message

or setBody perhaps -- reply implies that the text will be immediately (irrevocably?) sent. Maybe that's good, or maybe a separate commit/close is better.
> response.open(ContentType="text/html", code=200) => file object to
> write to

I'm not sure I understand the purpose of the keyword arguments.

> fp.write(...)
> fp.close()                       Sends back the response
> response.redirect(URL)           Sends back redirect to the URL

-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From anthony at interlink.com.au Fri Oct 31 01:28:34 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Oct 31 01:31:34 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031030222620.B1901@lyra.org> Message-ID: <200310310628.h9V6SYdw023795@localhost.localdomain> >>> Greg Stein wrote > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > I'm just catching up on the archive for this list. Some random > > thoughts: > > > > * a new package, 'web', is definitely in order. > > "from web import cookies", "from web import http" just sounds right. > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > he's not strongly wedded to that.) > > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) Wouldn't it be better to have something more like:

    web/
        client.py
        cgi.py
        server.py

.. and the like? web.http seems so very redundant, web.client seems more meaningful. -- Anthony Baxter It's never too late to have a happy childhood.
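[To make the request/response interface discussed above concrete, here is one possible shape for it in Python. This is purely a sketch: every class and method name here is hypothetical and belongs to no real library mentioned in the thread; the first-value-wins field lookup follows Greg Ward's earlier suggestion.]

```python
# Hypothetical sketch of a generic request/response API.
# None of these names come from an actual library.

class Response:
    def __init__(self):
        self.code = 200
        self.headers = {"Content-Type": "text/html"}
        self.body = ""

    def error(self, code, message):
        # Send back an error status and message.
        self.code = code
        self.body = message

    def redirect(self, url):
        # Send back a redirect to the given URL.
        self.code = 302
        self.headers["Location"] = url


class Request:
    def __init__(self, path, method="GET", fields=None, headers=None):
        self.path = path          # e.g. '/foo/bar'
        self.method = method      # e.g. 'GET'
        self.headers = headers or {}
        # Every field value is stored as a list, so the type is
        # predictable regardless of what the client sent.
        self.fields = fields or {}

    def get_field(self, name, default=None):
        # Sloppy by default: return the first value rather than
        # raising KeyError on broken or malicious input.
        values = self.fields.get(name)
        if values:
            return values[0]
        return default

    def response(self):
        # Return a Response object for answering this request.
        return Response()
```

[With that sketch, a field that arrives twice -- fields={'age': ['31', '32']} -- still quietly yields its first value from get_field('age'), the sloppy-by-default behaviour argued for earlier in the thread.]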
From gstein at lyra.org Fri Oct 31 01:33:29 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 01:33:49 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Thu, Oct 30, 2003 at 10:44:52AM -0600 References: <20031030112919.J97494@onyx.ispol.com> <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> Message-ID: <20031030223329.C1901@lyra.org> On Thu, Oct 30, 2003 at 10:44:52AM -0600, Ian Bicking wrote: > On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy > wrote: > > HTMLgen has a DocumentTemplate thing which is a bare bones templating > > system allowing for substitution in a text file. I think something > > primitive of this sort and perhaps implemented based on this: > > > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330 > > > > (which can probably be even further optimized) > > > > would be nice to have in stdlib. > > A templating system in its most naive form is just a kind of string > substitution. If that's the kind of thing we're looking for, then > perhaps -- but it has to be usefully better than %. (Though % would be

Right. Simple interpolation is rarely enough. The features that I found to be useful in a templating system:

* interpolation
* conditionals
* iteration
* structured objects (i.e. something like: foo.bar)
* including sub-templates

I've also found that *restricting* the functionality to just this limited set helps to provide clarity and avoid complex abuses of templates. I look at the task simply as "rendering data" and prefer a simple syntax and functionality to match that. Cheers, -g p.s.
yah yah, this is an implicit pimping of my ezt module :-) http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py -- Greg Stein, http://www.lyra.org/ From amk at amk.ca Fri Oct 31 06:41:09 2003 From: amk at amk.ca (amk@amk.ca) Date: Fri Oct 31 06:41:32 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031031114109.GA16773@rogue.amk.ca> On Thu, Oct 30, 2003 at 07:46:21PM -0800, Bill Janssen wrote: > I don't get it. Any particular request only has one method, not two: > "GET" and "POST". Are you talking about for some reason > special-casing these two methods in the Request class? I think it Simon wants to differentiate between where a variable comes from; http://example/?password=foo is treated differently than when the 'password' variable is specified in the body of a POST. --amk From neel at mediapulse.com Fri Oct 31 09:43:58 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 31 09:44:06 2003 Subject: [Web-SIG] htmlgen Message-ID: > -----Original Message----- > From: Greg Stein [mailto:gstein@lyra.org] > Sent: Friday, October 31, 2003 1:33 AM > To: web-sig@python.org > Subject: Re: [Web-SIG] htmlgen > > p.s. yah yah, this is an implicit pimping of my ezt module :-) > http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py > I can +1 ezt having used it before; it's the exact type of lightweight template system that should be part of stdlib. It covers the basics and you can extend it from there if you need more. It's also worth noting that it's not in any way tied to HTML (I use it for email templates mostly). I'd recommend to all here to take a few moments and play with it, then give feedback on any changes you think should be made. No need to solve this from scratch if we don't have to.
Mike From amk at amk.ca Fri Oct 31 10:09:22 2003 From: amk at amk.ca (amk@amk.ca) Date: Fri Oct 31 10:09:45 2003 Subject: [Web-SIG] htmlgen In-Reply-To: References: Message-ID: <20031031150922.GA17539@rogue.amk.ca> On Fri, Oct 31, 2003 at 09:43:58AM -0500, Michael C. Neel wrote: > I'd recommend to all here to take a few moments and play with it, then > give feedback on any changes you think should be made. No need to solve > this from scratch if we don't have to. ... well, except for the other 12 templating solutions that already exist. ezt looks very cute, but it's clear that no one has the same requirements for templating. Let's just walk away from trying to choose one. --amk From barry at python.org Fri Oct 31 10:16:02 2003 From: barry at python.org (Barry Warsaw) Date: Fri Oct 31 10:16:08 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca> References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <1067613362.5173.8.camel@anthem> On Thu, 2003-10-30 at 21:51, Greg Ward wrote: > (But I disagree with Steve on handling multiple values for a query > variable that expects a single value: in that case, IMHO sloppy > should be the default, and you should get the first value. I don't > want to guard every get_form_var() call with a "except KeyError" to > avoid broken/malicious clients crashing the script!) Agreed! I'd much rather test for None-ness or provide my own default. > (I've mentally toyed with funky types like Barry suggested, but I > think that sort of context-sensitive trickery is unPythonic. Just > because you can do something doesn't mean you should.) Greg's been reading my Oblique Strategies again. 
:) -Barry From davidf at sjsoft.com Fri Oct 31 10:42:52 2003 From: davidf at sjsoft.com (David Fraser) Date: Fri Oct 31 10:43:02 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031030223329.C1901@lyra.org> References: <20031030112919.J97494@onyx.ispol.com> <6230765A-0AF8-11D8-9E10-000393C2D67E@colorstudy.com> <20031030223329.C1901@lyra.org> Message-ID: <3FA282FC.8060406@sjsoft.com> Greg Stein wrote:

>On Thu, Oct 30, 2003 at 10:44:52AM -0600, Ian Bicking wrote:
>>On Thursday, October 30, 2003, at 10:33 AM, Gregory (Grisha) Trubetskoy
>>wrote:
>>>HTMLgen has a DocumentTemplate thing which is a bare bones templating
>>>system allowing for substitution in a text file. I think something
>>>primitive of this sort and perhaps implemented based on this:
>>>
>>>http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330
>>>
>>>(which can probably be even further optimized)
>>>
>>>would be nice to have in stdlib.
>>
>>A templating system in its most naive form is just a kind of string
>>substitution. If that's the kind of thing we're looking for, then
>>perhaps -- but it has to be usefully better than %. (Though % would be
>
>Right. Simple interpolation is rarely enough. The features that I found to
>be useful in a templating system:
>
>* interpolation
>* conditionals
>* iteration
>* structured objects (i.e. something like: foo.bar)
>* including sub-templates
>
>I've also found that *restricting* the functionality to just this limited
>set helps to provide clarity and avoid complex abuses of templates. I look
>at the task simply as "rendering data" and prefer a simple syntax and
>functionality to match that.
>
>Cheers,
>-g
>
>p.s.
yah yah, this is an implicit pimping of my ezt module :-)
> http://svn.webdav.org/repos/projects/ezt/trunk/ezt.py

What I've found really helpful in my jtoolkit framework is to allow anything to go inside a tag object (in between the start and end tags), including a string, another tag object, or a list of any of the above. The toolkit then expands any of the required items.

    pagelinks = []
    for pagelinknum in range(1, len(pages)+1):
        pagelinktext = "Page %d" % pagelinknum
        if pagelinknum == currentpagenum:
            pagelinktext += " (current)"
        pagelinklink = '?page=%d' % pagelinknum
        pagelinks.append(widgets.Link(pagelinklink, pagelinktext))
        pagelinks.append(' ')

e.g. widgets.Page(title, contents=[widgets.Paragraph(pagelinks), restofcontents])

David

From grisha at modpython.org Fri Oct 31 10:54:46 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 31 10:54:50 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031031025116.GA7401@cthulhu.gerg.ca> References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031031094609.T12375@onyx.ispol.com> On Thu, 30 Oct 2003, Greg Ward wrote:

> An aside: in the query string
>
> ?name=Greg&colour=blue&age=31
>
> what exactly are 'name', 'colour', and 'age'?

Short answer: "field names"

Long answer: I cannot claim to be an absolute expert on the matter, but here is my best understanding: In ?name=Greg&colour=blue&age=31, "name=Greg&colour=blue&age=31" is called "searchpart", "query information" or simply "query" from RFC 1808 sec 2.1 "URL Syntactic Components":

    <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
    - [snip] -
    "?" query ::= query information, as per Section 3.3 of RFC 1738 [2].

Then if we look at RFC 1738, it describes an HTTP URL specifically: An HTTP URL takes the form:

    http://<host>:<port>/<path>?<searchpart>

Now, RFC 1866 (HTML) introduces the concept of a "form". Forms have a METHOD attribute which lets you specify how the form is to be submitted. When method is 'GET', the form will be submitted as "query information", described above.
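[The GET-submission behaviour described here can be checked with the Python standard library. The snippet below uses the modern urllib.parse spelling; in 2003 the same helpers were spread across the urllib and cgi modules. Everything shown is standard-library behaviour, not an API proposed in this thread.]

```python
from urllib.parse import urlencode, parse_qs

# Field names and values are joined as name=value pairs separated
# by '&', with characters unsafe in a URL encoded on the way out.
query = urlencode({"name": "Greg Ward", "colour": "blue", "age": "31"})

# Decoding recovers the field names and values; each value comes
# back as a list, since a field name may legally repeat.
fields = parse_qs("name=Greg%20Ward&colour=blue&age=31")
print(fields["name"])  # ['Greg Ward']
```

[Note that urlencode prefers '+' for spaces while '%20' is equally valid on the wire; parse_qs accepts both forms.]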
Since there are limits to what is allowed in a URL, the data has to be "url encoded", as described in 8.2.1 of RFC 1866:

    2. The fields are listed in the order they appear in the document
       with the name separated from the value by `=' and the pairs
       separated from each other by `&'.

[Note BTW that the order is specified] Therefore, 'name', 'colour', and 'age' are "field names", and 'Greg', 'blue', '31' are "field values". A more clever example would be:

    ?name=Greg%20Ward&colour=blue&age=31

Here, "Greg Ward" is a form field value, while "Greg%20Ward" is a random chunk of a URL query with no particular meaning, just as "0Ward&col". Here is the interesting part (RFC 1866 8.2.3):

    To process a form whose action URL is an HTTP URL and whose method
    is `POST', the user agent conducts an HTTP POST transaction using
    the action URI, and a message body of type
    `application/x-www-form-urlencoded' format as above.

Note that it doesn't say that the action URI cannot contain a query, so based on this, I can have a form like this: References: <20031031025116.GA7401@cthulhu.gerg.ca> Message-ID: <20031031105512.G12375@onyx.ispol.com> On Thu, 30 Oct 2003, Greg Ward wrote: > * the "PATHINFO" variable is not CGI-specific. "the PATHINFO variable" only has meaning in a particular context, here is the CGI definition: http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt Section 6.1.6:

    The PATH_INFO metavariable specifies a path to be interpreted by the
    CGI script. It identifies the resource or sub-resource to be returned
    by the CGI script, and it is derived from the portion of the URI path
    following the script name but preceding any query data.

> Also, the Java servlet API has a getPathInfo() method

Yes, it does: http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletRequest.html#getPathInfo() "any extra path information associated with the URL the client sent when it made this request.
The extra path information follows the servlet path but precedes the query string and will start with a "/" character". So this definition relies on the notion of a "servlet", which is OK since this is part of J2EE. Then they go on to say that it is "Same as the value of the CGI variable PATH_INFO", but it really isn't, "similar" would be a better word. I think I can live with a pathinfo that is "implementation specific", or if we were to define a "Python Enterprise Architecture" with our own definition of a servlet (or whatever), but for the Python standard library to try to define it *outside of any context* would be a mistake I think. Grisha From jjl at pobox.com Fri Oct 31 11:34:02 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 11:34:10 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <20031030222620.B1901@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: On Thu, 30 Oct 2003, Greg Stein wrote: > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > I'm just catching up on the archive for this list. Some random > > thoughts: > > > > * a new package, 'web', is definitely in order. > > "from web import cookies", "from web import http" just sounds right. > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > he's not strongly wedded to that.) > > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) He was talking about the server side! > I'm more interested in a way of constructing a connection to a server, > where that connection has some various combination of features: > > * SSL That's already down at the httplib level (and the socket level, of course). > * Basic/Digest/??? authentication That's naturally done at the urllib / urllib2 level, given the way it works. > * WebDAV I plead ignorance. 
> * Proxy > * Proxy auth Somebody has submitted a patch (515003) to shift this to a lower level than urllib2. I have no opinion as yet. > The current model for the client side uses two, distinct classes to deal > with the SSL feature. Sorry, which classes are they? > I have an entirely separate module for the WebDAV > stuff. How should it be integrated (if at all), in your opinion (assuming you want it in the standard library)? > And authentication isn't even handled in the core http classes, but > over in urllib(2). Same for proxy support. See above. > PEP 267 is about a refactoring to bring these features under one cover, Er, "Optimized Access to Module Namespaces"? Which PEP *did* you mean? I haven't seen it. John From jjl at pobox.com Fri Oct 31 11:35:24 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 11:36:26 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <200310310628.h9V6SYdw023795@localhost.localdomain> References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: On Fri, 31 Oct 2003, Anthony Baxter wrote: [...] > Wouldn't it be better to have something more like: > > web/ > client.py > cgi.py > server.py > > .. and the like? web.http seems so very redundant, web.client seems more > meaningful. Nobody has yet explained to me why we need a new module for client-side code. John From ianb at colorstudy.com Fri Oct 31 11:45:04 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 11:45:57 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> On Oct 31, 2003, at 10:34 AM, John J Lee wrote: >> * WebDAV > > I plead ignorance. I don't think urllib2 and WebDAV will work very well together, though maybe... in the end, a WebDAV interface has to be a lot more complex than a URL-fetching interface. 
So even if WebDAV was built on urllib2, it would end up looking a lot different in the end. Though thinking about it... for the most part a WebDAV client could *use* urllib2. The most important things are just using different methods (PROPFIND, PUT, etc), and setting the body of the request -- these are probably already easy to do with urllib2. Dealing with multiple error responses, and some of the other error responses that WebDAV defines, may be more challenging for urllib2 (or not, I don't know) -- you can do compound operations with WebDAV, and so there may be an error message associated with a specific subrequest. There's some sort of "multiple response" response code, but the actual responses are in the body of the response. urllib2 could just do nothing and pass all the information on to the WebDAV client and let it reinterpret the results. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Fri Oct 31 11:46:22 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 11:46:31 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: References: <200310310628.h9V6SYdw023795@localhost.localdomain> Message-ID: On Oct 31, 2003, at 10:35 AM, John J Lee wrote: > On Fri, 31 Oct 2003, Anthony Baxter wrote: > [...] >> Wouldn't it be better to have something more like: >> >> web/ >> client.py >> cgi.py >> server.py >> >> .. and the like? web.http seems so very redundant, web.client seems >> more >> meaningful. > > Nobody has yet explained to me why we need a new module for client-side > code. Nobody likes the name urllib2? That "2" is pretty icky... That's probably not a good enough justification, though.
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From cs1spw at bath.ac.uk Fri Oct 31 11:49:16 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Fri Oct 31 11:49:21 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <20031031150922.GA17539@rogue.amk.ca> References: <20031031150922.GA17539@rogue.amk.ca> Message-ID: <3FA2928C.4080004@bath.ac.uk> amk@amk.ca wrote: >>I'd recommend to all here to take a few moments and play with it, then >>give feedback on any changes you think should be made. No need to solve >>this from scratch if we don't have to. > > ... well, except for the other 12 templating solutions that already exist. > > ezt looks very cute, but it's clear that no one has the same requirements > for templating. Let's just walk away from trying to choose one. +1. Everyone's templating style is different. At work, we just spent a couple of days implementing our own having looked at over a dozen existing systems because none of them quite matched our requirements. Templating is the kind of problem to which there is no straightforward solution, and I see no benefit in including it in the standard library when so many template systems are already available that cover so many different styles. -- Simon Willison Web development weblog: http://simon.incutio.com/ From jjl at pobox.com Fri Oct 31 12:52:54 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 12:53:00 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: On Fri, 31 Oct 2003, Ian Bicking wrote: > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > >> * WebDAV > > > > I plead ignorance. > [...info about WebDAV from Ian...]
Sounds (I'm saying this with virtually no knowledge of the protocol, of course) like it would be best built on top of urllib2 rather than integrated with it. Do you agree, Greg S.? John From jjl at pobox.com Fri Oct 31 12:55:55 2003 From: jjl at pobox.com (John J Lee) Date: Fri Oct 31 12:56:06 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: On Fri, 31 Oct 2003, John J Lee wrote: [...] > best built on top of urllib2 rather than integrated with it. [...] Or entirely separate from it, of course... John From neel at mediapulse.com Fri Oct 31 13:22:03 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Oct 31 13:22:07 2003 Subject: [Web-SIG] htmlgen Message-ID: > ... well, except for the other 12 templating solutions that > already exist. > > ezt looks very cute, but it's clear that no one has the same > requirements > for templating. Let's just walk away from trying to choose one. > Now that's a scary thought. A problem that is common to several domains is not addressed because there may be more than one way to address it? Yes there are several template options out there, but how many can be considered for inclusion? Albatross (one I personally find to be extremely useful) doesn't have much if any scope outside of HTML templates, so it wouldn't be a good candidate. PSP (Python server pages) also has a limited scope. Going through the list to see which systems are good candidates, i.e. those that really just provide a good alternative to %()s, should produce a manageably sized list to consider. Also inclusion of a template system doesn't preclude the use of any other template systems, so I don't see the harm. Python is billed as "batteries included" so we should be making choosing Python for the web more than just a syntax preference.
Mike From t.vandervossen at fngtps.com Fri Oct 31 15:48:58 2003 From: t.vandervossen at fngtps.com (Thijs van der Vossen) Date: Fri Oct 31 16:32:32 2003 Subject: [Web-SIG] htmlgen In-Reply-To: <3FA2928C.4080004@bath.ac.uk> References: <20031031150922.GA17539@rogue.amk.ca> <3FA2928C.4080004@bath.ac.uk> Message-ID: <3FA2CABA.6010205@fngtps.com> Simon Willison wrote: > amk@amk.ca wrote: >>> I'd recommend to all here to take a few moments and play with it, then >>> give feedback on any changes you think should be made. No need to solve >>> this from scratch if we don't have to. >> >> ... well, except for the other 12 templating solutions that already >> exist. >> >> ezt looks very cute, but it's clear that no one has the same requirements >> for templating. Let's just walk away from trying to choose one. > > +1. Everyone's templating style is different. At work, we just spent a > couple of days implementing our own having looked at over a dozen > existing systems because none of them quite matched our requirements. > Templating is the kind of problem to which there is nostraight forward > solution, and I see no benefit of including it in the standard library > when so many template systems are already available that cover so many > different styles. +1. My company came to the same conclusion and also developed our own matching our requirements. Let's please drop the issue of templating, we will never find a solution fitting everyone's needs. Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From janssen at parc.com Fri Oct 31 18:42:16 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Oct 31 18:42:44 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: Your message of "Fri, 31 Oct 2003 03:41:09 PST." 
<20031031114109.GA16773@rogue.amk.ca> Message-ID: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> > Simon wants to differentiate between where a variable comes from; > http://example/?password=foo is treated differently than when > the 'password' variable is specified in the body of a POST. > > --amk That makes more sense, but I don't see the connection to GET and POST. Thanks. Bill From ianb at colorstudy.com Fri Oct 31 18:50:36 2003 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Oct 31 18:50:42 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> References: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> Message-ID: <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> On Oct 31, 2003, at 5:42 PM, Bill Janssen wrote: >> Simon wants to differentiate between where a variable comes from; >> http://example/?password=foo is treated differently than when >> the 'password' variable is specified in the body of a POST. >> >> --amk > > That makes more sense, but I don't see the connection to GET and POST. > Thanks. A more accurate description would be "URL parameters" or "query parameters" instead of GET. Though POST variables really are POST variables (request body parameters, maybe, but that's kind of confusing). And if you have POST, it's a natural tendency to consider the "opposite" of POST as GET and call them GET variables. 
-- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From gstein at lyra.org Fri Oct 31 19:18:02 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:18:29 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: ; from jjl@pobox.com on Fri, Oct 31, 2003 at 04:34:02PM +0000 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <20031031161802.C3462@lyra.org> On Fri, Oct 31, 2003 at 04:34:02PM +0000, John J Lee wrote: > On Thu, 30 Oct 2003, Greg Stein wrote: > > > On Thu, Oct 30, 2003 at 09:51:17PM -0500, Greg Ward wrote: > > > I'm just catching up on the archive for this list. Some random > > > thoughts: > > > > > > * a new package, 'web', is definitely in order. > > > "from web import cookies", "from web import http" just sounds right. > > > (That contradicts Greg Stein's proposal in PEP 267, but I assume > > > he's not strongly wedded to that.) NOTE: typo here. Greg Ward meant to say "PEP 268" (http://www.python.org/peps/pep-0268.html) > > Correct. The name isn't the important part of the PEP. That said, "web" is > > a big misnomer for [package containing] an http client library, but that's > > a bikeshed of an entirely different color :-) > > He was talking about the server side! No, Greg Ward was talking about an http client. Otherwise, he would not have mentioned PEP 268. > > I'm more interested in a way of constructing a connection to a server, > > where that connection has some various combination of features: > > > > * SSL > > That's already down at the httplib level (and the socket level, of > course). I know that (given that I wrote the current httplib :-). However, I maintain that the implementation uses an improper design. > > * Basic/Digest/??? authentication > > That's naturally done at the urllib / urllib2 level, given the way it > works. There is nothing "natural" about it. 
That is where it resides, but authentication is part of the HTTP specification and should be able to be used by anything attempting to interact at the HTTP level. HTTP is far more than "fetch the contents of this URL." My list was specifically intended to say: each of these items belongs in the core HTTP (client) service layer. Not urllib. > > * WebDAV > > I plead ignorance. RFC 2518 and RFC 3253. Essentially, WebDAV provides a way to write to your web server. It also provides for versioning support. And a lot of other stuff. WebDAV provides a lot of interesting features, layered on top of HTTP. Thus, any HTTP layer should also be able to provide DAV facilities. > > * Proxy > > * Proxy auth > > Somebody has submitted a patch (515003) to shift this to a lower level > than urllib2. I have no opinion as yet. Oh, geez. Again with the improper design model. Following in this lead, we'll end up with a combinatoric explosion of every feature combination ending up with its own class. /me goes to comment on that patch > > The current model for the client side uses two, distinct classes to deal > > with the SSL feature. > > Sorry, which classes are they? HTTPConnection and HTTPSConnection. (or HTTP and HTTPS for the backwards compat stuff). See above about combinatorics using this design model. > > I have an entirely separate module for the WebDAV stuff. > > How should it be integrated (if at all), in your opinion (assuming you > want it in the standard library)? See PEP 268. > > And authentication isn't even handled in the core http classes, but > > over in urllib(2). Same for proxy support. > > See above. See PEP 268 :-) > > PEP 267 is about a refactoring to bring these features under one cover, > > Er, "Optimized Access to Module Namespaces"? Which PEP *did* you mean? > I haven't seen it. Sorry, I just blindly repeated the number from Greg Ward's post. It really should be 268. 
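Greg Stein's combinatorics complaint can be made concrete with a toy sketch: instead of one subclass per feature combination (HTTPConnection, HTTPSConnection, proxied variants, authenticated variants, ...), a single connection class composes features as options. All names below are invented for illustration and are not a proposal for the actual API:

```python
class Connection:
    """One connection class; features are options, not subclasses.

    With n independent features, the subclass-per-combination design
    needs up to 2**n classes; composing them as options needs one.
    """

    def __init__(self, host, use_ssl=False, auth=None, proxy=None):
        self.host = host
        self.use_ssl = use_ssl
        self.auth = auth    # e.g. ("basic", "user", "secret"); hypothetical shape
        self.proxy = proxy  # e.g. ("proxyhost", 3128); hypothetical shape

    def describe(self):
        """List which optional features this connection composes."""
        features = []
        if self.use_ssl:
            features.append("ssl")
        if self.auth:
            features.append("auth")
        if self.proxy:
            features.append("proxy")
        return features
```

Any combination — SSL with digest auth through a proxy, say — is then a matter of constructor arguments rather than a new class.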
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Oct 31 19:28:53 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:29:17 2003 Subject: [Web-SIG] Re: client-side In-Reply-To: ; from jjl@pobox.com on Fri, Oct 31, 2003 at 05:52:54PM +0000 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: <20031031162853.D3462@lyra.org> On Fri, Oct 31, 2003 at 05:52:54PM +0000, John J Lee wrote: > On Fri, 31 Oct 2003, Ian Bicking wrote: > > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > > >> * WebDAV > > > > > > I plead ignorance. > > > [...info about WebDAV from Ian...] > > Sounds (I'm saying this with virtually no knowledge of the protocol, of > course) like it would be best built on top of urllib2 rather than > integrated with it. Do you agree, Greg S.? WebDAV belongs on top of httplib, not urllib. And... hey, what do you know! ... that is exactly how I implemented davlib.py many years ago. In fact, creating davlib.py was the impetus for rebuilding httplib into a connection-based client model rather than the old request-based model. urllib is about fetching content. That's about it. WebDAV was designed specifically for writing-to/managing your server remotely. Not to mention that "V" in its name, for versioning. Cheers, -g p.s. 
http://www.lyra.org/greg/python/ for info on davlib.py -- Greg Stein, http://www.lyra.org/ From grisha at modpython.org Fri Oct 31 19:30:31 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Fri Oct 31 19:32:07 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> References: <03Oct31.154221pst."58611"@synergy1.parc.xerox.com> <05FFB5B0-0BFD-11D8-B230-000393C2D67E@colorstudy.com> Message-ID: <20031031192847.N16489@onyx.ispol.com> On Fri, 31 Oct 2003, Ian Bicking wrote: > On Oct 31, 2003, at 5:42 PM, Bill Janssen wrote: > >> Simon wants to differentiate between where a variable comes from; > >> http://example/?password=foo is treated differently than when > >> the 'password' variable is specified in the body of a POST. > >> > >> --amk > > > > That makes more sense, but I don't see the connection to GET and POST. > > Thanks. > > A more accurate description would be "URL parameters" or "query > parameters" instead of GET. Though POST variables really are POST > variables (request body parameters, maybe, but that's kind of > confusing). And if you have POST, it's a natural tendency to consider > the "opposite" of POST as GET and call them GET variables. And what's even more fun is when a GET variable is submitted via POST :-) Grisha From gstein at lyra.org Fri Oct 31 19:30:19 2003 From: gstein at lyra.org (Greg Stein) Date: Fri Oct 31 19:32:09 2003 Subject: client-side [was: Re: [Web-SIG] Random thoughts] In-Reply-To: <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com>; from ianb@colorstudy.com on Fri, Oct 31, 2003 at 10:45:04AM -0600 References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> <93AE4DA2-0BC1-11D8-88D0-000393C2D67E@colorstudy.com> Message-ID: <20031031163019.E3462@lyra.org> Simple answer: to see what a DAV client would look like, see davlib.py. http://www.lyra.org/greg/python/ It really *wouldn't* use urllib, which is all about fetching. 
On Fri, Oct 31, 2003 at 10:45:04AM -0600, Ian Bicking wrote: > On Oct 31, 2003, at 10:34 AM, John J Lee wrote: > >> * WebDAV > > > > I plead ignorance. > > I don't think urllib2 and WebDAV will work very well together, though > maybe... in the end, a WebDAV interface has to be a lot more complex > than a URL-fetching interface. So even if WebDAV was built on urllib2, > it would end up looking a lot different in the end. > > Though thinking about it... for the most part a WebDAV client could > *use* urllib2. The most important things are just using different > methods (PROPFIND, PUT, etc), and setting the body of the request -- > these are probably already easy to do with urllib2. Dealing with > multiple error responses, and some of the other error responses that > WebDAV defines, may be more challenging for urllib2 (or not, I don't know) > -- you can do compound operations with WebDAV, and so there may be an > error message associated with a specific subrequest. There's some sort > of "multiple response" response code, but the actual responses are in > the body of the response. urllib2 could just do nothing and pass all > the information on to the WebDAV client and let it reinterpret the > results. 
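To make the layering concrete: a DAV helper sits directly on the HTTP request layer and issues WebDAV methods such as PROPFIND. The sketch below is not davlib.py's actual API — it only formats the raw request text such a call would put on the wire, with header choices that are this example's assumptions:

```python
# A minimal property-listing request body (allprop = "all properties").
PROPFIND_BODY = (
    '<?xml version="1.0"?>'
    '<propfind xmlns="DAV:"><allprop/></propfind>'
)

def build_propfind(host, path, depth=0):
    """Return the raw request text a PROPFIND on `path` would send.

    Depth: 0 = the resource itself, 1 = it plus immediate children.
    Illustrative only; a real client would send this over a
    connection object rather than building a string.
    """
    lines = [
        "PROPFIND %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Depth: %d" % depth,
        "Content-Type: text/xml",
        "Content-Length: %d" % len(PROPFIND_BODY),
        "",
        PROPFIND_BODY,
    ]
    return "\r\n".join(lines)
```

The point of the sketch is that nothing here is "fetch the contents of this URL": the method, the Depth header, and the XML body all live at the HTTP level, which is why a DAV library naturally sits on httplib rather than urllib.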
> > -- > Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org -- Greg Stein, http://www.lyra.org/ From gward at python.net Fri Oct 31 21:23:05 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:23:08 2003 Subject: [Web-SIG] [server-side] request/response objects In-Reply-To: <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> References: <20031024132028.C15765@lyra.org> <09A2F67A-09E2-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <20031101022305.GA5781@cthulhu.gerg.ca> On 29 October 2003, Ian Bicking said: > The difficulty of writing, say, request.response.write(something) vs. > handler.write(something) doesn't seem like a big deal to me. FWIW, this is how Quixote works. We started out with completely separate HTTPRequest and HTTPResponse objects (borrowed from Zope, and drastically stripped down). Then somewhere along the line, someone (Neil S. I think) noted, like Greg S., that you can't have a response without a request, and vice-versa. So now the HTTPResponse object is accessible as request.response. It's convenient and simple, and I agree that the request and response are indeed distinct concepts. But Greg S.'s "handler" idea has an appeal too. One thing that bugs me about Quixote's request.response is that the request is "special" because it's what's passed around, and the response is subordinate to it. That's wrong; although the request comes first chronologically, the two are equally important in a typical web app. So right now I think I'm 51% in favour of a single object. But I'm not sure if "handler" is the right name, though: in English, I would call it an "HTTP request/response cycle", but that's a bit of a mouthful for a classname. (Except in Java, but LetsNotGoThere.) 
Maybe HTTPTransaction -- tack on the "HTTP" and it's pretty clear we're not talking about databases. Greg -- Greg Ward http://www.gerg.ca/ I hope something GOOD came in the mail today so I have a REASON to live!! From gward at python.net Fri Oct 31 21:27:18 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:27:20 2003 Subject: [Web-SIG] More prior art, less experimentation In-Reply-To: References: Message-ID: <20031101022718.GB5781@cthulhu.gerg.ca> On 24 October 2003, Ian Bicking said: > We *do* have the opportunity to create something that can unify the > Python web experience and provide the basis for more adoption of Python > for web programming. To do that we will have to repeat the work done > many times before. We should aspire to quality, but I think we need to > hold ourselves back from aesthetic experimentation, and respect > convention above our own preferences. We can still indulge our own > fancies outside of the standard library, and building on the standard > library -- nothing we do should preclude your individual preferences > toward web programming, but it should not preclude other people's > preference either. But most of all it should provide the foundation > upon which the mature, *existing* frameworks can build. +1000. Hence my statement about disagreeing with the practice of overlapping GET and POST variables, but supporting that practice *in the standard library*. And, simultaneously, *not* supporting that practice in Quixote, where a slightly different aesthetic prevails. Whatever we come up with here must be agnostic with respect to many choices, eg. how to map URLs to code (or data) or how to generate HTML (or XML, or whatever) pages from code. (Those two decisions are, IMHO, at the heart of most web frameworks, and the most prone to religious discussions -- ie. they have no place in the stdlib.) Greg -- Greg Ward http://www.gerg.ca/ Save energy: be apathetic. 
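The single-object idea discussed above — one transaction holding both halves of the cycle, with neither subordinate to the other — might look something like this toy sketch (all names invented; this is neither Quixote's API nor anyone's actual proposal):

```python
class Request:
    """The incoming half of one HTTP exchange."""
    def __init__(self, method, path):
        self.method = method
        self.path = path

class Response:
    """The outgoing half: status plus a buffered body."""
    def __init__(self):
        self.status = 200
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def body(self):
        return "".join(self.chunks)

class HTTPTransaction:
    """Pairs the two halves of one request/response cycle, so code
    passes around the transaction rather than privileging the request."""
    def __init__(self, method, path):
        self.request = Request(method, path)
        self.response = Response()
```

Handlers would then take the transaction, reaching for `t.request` or `t.response` as needed, instead of the request carrying the response as an attribute.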
From gward at python.net Fri Oct 31 21:48:13 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:48:19 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <20031030222620.B1901@lyra.org> References: <20031031025116.GA7401@cthulhu.gerg.ca> <20031030222620.B1901@lyra.org> Message-ID: <20031101024813.GA9101@cthulhu.gerg.ca> On 30 October 2003, Greg Stein said: > Correct. The name isn't the important part of the PEP. That said, "web" is > a big misnomer for [package containing] an http client library, but that's > a bikeshed of an entirely different color :-) Really? I know "world-wide web" (capitalized or not) is a ridiculously over-used, over-broad term, but what's the alternative? If an HTTP client library isn't about "the web", then what the heck is it about? (BTW, whoever said that "web.client" and "web.server" are better names than "web.http" is right. I think. So far I've agreed with every idea I've seen on this sig, including the mutually contradicting ones. ;-) Greg -- Greg Ward http://www.gerg.ca/ Never put off till tomorrow what you can put off till the day after tomorrow. From gward at python.net Fri Oct 31 21:54:58 2003 From: gward at python.net (Greg Ward) Date: Fri Oct 31 21:55:01 2003 Subject: [Web-SIG] Random thoughts In-Reply-To: <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> References: <20031031025116.GA7401@cthulhu.gerg.ca> <03Oct30.194623pst."58611"@synergy1.parc.xerox.com> Message-ID: <20031101025458.GA9131@cthulhu.gerg.ca> [me] > * I oppose Simon Willison's practice of using the same variable > in the "GET" and "POST" part of a request, but I will defend to the > death his right to do so. (But not in Quixote, where a narrower > definition of what is Right, Good, and Truthful prevails.) [Bill Janssen] > I don't get it. Any particular request only has one method, not two: > "GET" and "POST". Are you talking about for some reason > special-casing these two methods in the Request class? 
I think it > makes more sense to do things generically: Sorry, lame/fuzzy terminology on my part. AMK cleared it up nicely. Greg -- Greg Ward http://www.gerg.ca/ If you can read this, thank a programmer. From richardjones at optushome.com.au Sat Oct 25 00:08:13 2003 From: richardjones at optushome.com.au (Richard Jones) Date: Mon Nov 3 14:52:29 2003 Subject: [Web-SIG] Client-side support - webunit is back :) Message-ID: <200310251408.13911.richardjones@optushome.com.au> [sorry, I'm not subscribed to this list - I simply don't have the spare cycles] I noticed some archive messages saying webunit code was off the air. I've been migrating my website, and the code's back now. See webunit's PyPI page for info: http://www.python.org/pypi?:action=display&name=webunit&version=1.3.3 and the code is at: http://mechanicalcat.net/tech/webunit/ Richard ps. from the discussion, it sounds like my code does pretty much everything that has been asked of client-side code. It's not pretty but is used in Real Life. From aahz at pythoncraft.com Tue Oct 28 13:10:10 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 14:52:34 2003 Subject: [Python-Dev] Re: [Web-SIG] Threading and client-side support In-Reply-To: References: <20031027150709.GA29045@rogue.amk.ca> <20031028124646.GB1095@rogue.amk.ca> Message-ID: <20031028181009.GA20129@panix.com> On Tue, Oct 28, 2003, John J Lee wrote: > On Tue, 28 Oct 2003 amk@amk.ca wrote: >> On Tue, Oct 28, 2003 at 10:35:33AM +0000, John J Lee wrote: >>> >>> Thanks. So, in particular, httplib, urllib and urllib2 are thread-safe? >> >> No idea; reading the code would be needed to figure that out. 
> > That might not be helpful if the person reading it (me) has zero > threading experience ;-) > > I certainly plan to gain that experience, but surely *somebody* > already knows whether they're thread-safe? I presume they are, > broadly, since a couple of violations of thread safety are commented > in urllib2 and urllib. Right? Generally speaking, any code that does not rely on global objects is thread-safe in Python. For more information, let's take this to python-list. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From thijs at vandervossen.net Thu Oct 30 02:55:24 2003 From: thijs at vandervossen.net (Thijs van der Vossen) Date: Mon Nov 3 14:52:39 2003 Subject: [Web-SIG] Form field dictionaries In-Reply-To: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> References: <7325A4A1-0A2B-11D8-ABB3-000393C2D67E@colorstudy.com> Message-ID: <200310300855.24441.thijs@vandervossen.net> On Wednesday 29 October 2003 17:17, Ian Bicking wrote: > On Wednesday, October 29, 2003, at 11:12 AM, Barry Warsaw wrote: > > Dumb-ass suggestion of the day: what if the field values were > > represented by a dict subclass, and we had several different > > subclasses, > > each of which specified the exact behavior for __getitem__(). E.g. > > David could have his "_getitem__ is getfirst" behavior, Steve could > > have > > his verified-multiples behavior, and I could have my "always return a > > list" behavior. We'd then be reduced to choosing a default and a few > > interfaces and everyone would be happy . > > That would make me unhappy... next thing you know, you'll be > introducing a magic quoting dict subclass... Aargh! Maybe it's time to move to Ruby for web development without magic? 
;-) Regards, Thijs -- Fingertips __ www.fngtps.com __ +31.(0)20.4896540 From fincher.8 at osu.edu Thu Oct 30 16:03:15 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Mon Nov 3 14:52:46 2003 Subject: [Web-SIG] Re: [Python-Dev] HTML parsing: anyone use formatter? In-Reply-To: <20031030192718.GA13220@rogue.amk.ca> References: <20031030192718.GA13220@rogue.amk.ca> Message-ID: <200310301603.15437.fincher.8@osu.edu> On Thursday 30 October 2003 02:27 pm, amk@amk.ca wrote: > I suppose the more general question is, does anyone use Python's formatter > module? Do we want to keep it around, or should htmllib be pushed toward > doing just HTML parsing? formatter.py is a long way from being able to > handle modern web pages and it would be a lot of work to build a decent > renderer. I've never used it myself, though I'll admit that some software I've used (for searching the IMDB) does use it. Jeremy
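Aahz's rule of thumb from the threading discussion above — code that avoids shared global state is generally thread-safe, and what little must be shared should be guarded — can be illustrated with a small sketch (names invented; this is not any stdlib pattern in particular):

```python
import threading

results = []
results_lock = threading.Lock()

def worker(n):
    # Per-thread state: each worker builds its own local list,
    # so no other thread can see or mutate it. This is the safe part.
    local = [i * n for i in range(5)]
    # Shared state: the one append to the common list is guarded.
    with results_lock:
        results.append(sum(local))

threads = [threading.Thread(target=worker, args=(k,)) for k in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

By the same logic, an HTTP client object used by exactly one thread needs no locking at all; it is module-level caches and shared handlers (the kind of thing commented on in urllib and urllib2) that raise the thread-safety question.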