[Web-SIG] So what's missing?

Sat Oct 25 14:25:25 EDT 2003

On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote:
> On Sat, 25 Oct 2003, Ian Bicking wrote:
>> To client-side, I would add that authentication is too hard in urllib2,
>> and only works for HTTP (for trivial reasons).  I think
>> urllib2's subclasses are unnecessarily complicated -- authentication
>> handling could be put directly in the HTTP/HTTPS handlers, both basic
>> and digest.
>
> It's a minor issue, but it seems nicer to me to have authentication
> separate if it can easily be separate -- that fits in with the general
> philosophy of urllib2 that you pick 'n mix the features you want.  What
> are the trivial reasons for it breaking on non-HTTP auth?

There's an HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and 
though the two concepts are orthogonal they are still tied into each 
other.  Another option would be to take HTTPS out of the class 
hierarchy, and make SSL a feature of HTTPHandler (and maybe the other 
handlers too, FTP/SSL does exist after all).

The AuthHandlers are a little annoying too: you can't just give them a 
username/password.  You have to give them some manager object that can 
be queried for a password for a username/realm/URL.  This is a nice 
option to have, but in most cases you don't need that kind of 
generality, and it makes it a lot harder to understand what you need to 
do.  username=x, password=y are very easy to understand.
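
A rough sketch of what the simpler interface could look like, written 
against urllib.request (the Python 3 name for urllib2).  The helper 
basic_auth_opener and the example.com URL are made up for illustration; 
only the manager and handler classes come from the library.

```python
import urllib.request


def basic_auth_opener(url, username, password):
    """Hypothetical convenience wrapper: hide the password-manager step."""
    # The library wants a manager object that can be queried per realm/URL.
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under this URL".
    mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    return urllib.request.build_opener(handler)


# The caller only deals with username=x, password=y.
opener = basic_auth_opener("http://example.com/", username="x", password="y")
```

The generality is still there underneath if you need it; the wrapper 
just covers the common case.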

>> Goes together with post/multipart, and I think these shouldn't
>> be too hard to add.
>
> How does this go together with post/multipart?  Do you just mean that
> you're likely to post the multipart data using urllib2.urlopen?

Yes, that's what I mean -- same code involved.

>
>> There is also some talk about putting urllib2 and urlparse together,
>> i.e., have a URL object.  The distinction between the urllib, urllib2,
>> and urlparse libraries is not very good, e.g., urllib.quote (and
>> friends) are more related to urlparse than urllib.  A URL object could
>> unify all these.
>
> It's an appealing idea, especially given the cuteness of string
> subclassing ;-)
>
>
>> Cookie handling also fits into this, but from the opposite direction
>> from a URL object, since we are creating something of a user agent.
>> You'd almost want to do:
>>
>> ua = UserAgent()
>> url = web.URL('http://whatever.com')
>> content = ua.get(url)
>>
>> Or something like that.  I think an explicit agent is called for,
>> separate from the URLs that it may retrieve.  But only when you start
>> considering cookies and caching.
> [...]
>
> Are you suggesting replacing urllib2, building on top of it, or extending
> it?  urllib2's handlers already get a lot of the 'user-agent' job done.
> What requirements does caching impose that urllib2 doesn't meet?  There's
> already a CacheFTPHandler.

I think a URL class would probably build on top of urllib2, but 
would also need some more features.  And obviously urllib2 can't go 
anywhere, so we might as well use it.
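
To make that concrete, here is a minimal sketch of a URL object as a 
string subclass, assuming Python 3 names (urllib.parse for 
urlparse/urllib.quote, urllib.request for urllib2).  The class and its 
methods (host, child, get) are hypothetical, not an existing API.

```python
import urllib.request
from urllib.parse import quote, urlsplit


class URL(str):
    """Hypothetical URL object; still usable anywhere a plain string is."""

    @property
    def host(self):
        # Parsing lives on the object instead of in a separate module.
        return urlsplit(self).hostname

    def child(self, segment):
        # Quoting is applied here, so callers never call quote() themselves.
        base = self if self.endswith("/") else self + "/"
        return URL(base + quote(segment, safe=""))

    def get(self, opener=None):
        # Retrieval is delegated to an opener, i.e. built on top of urllib2.
        opener = opener or urllib.request.build_opener()
        with opener.open(self) as response:
            return response.read()


url = URL("http://whatever.com")
page = url.child("a b/c")
```

Passing the opener in keeps the agent-like state (cookies, auth) separate 
from the URL itself.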

The caching in CacheFTPHandler is connection caching, not result 
caching.  HTTP has a wide array of ways to indicate caching, check for 
updates, etc.  Enough that it becomes kind of complicated, which is why 
I don't think that fits well into the idea of a URL object (which 
should be quite simple, at least from the outside).
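
As one small example of what result caching involves, a sketch of 
revalidating a cached copy with a conditional GET, using urllib.request 
(the Python 3 name for urllib2).  The function name and the idea of 
keeping the old response headers in a dict are assumptions for 
illustration; this covers only one of the HTTP caching mechanisms.

```python
import urllib.request


def conditional_request(url, cached_headers):
    """Build a request asking the server whether a cached copy is stale."""
    req = urllib.request.Request(url)
    etag = cached_headers.get("ETag")
    last_modified = cached_headers.get("Last-Modified")
    if etag:
        # Validator from the earlier response; matched by the server.
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


req = conditional_request(
    "http://whatever.com/",
    {"ETag": '"abc"', "Last-Modified": "Sat, 25 Oct 2003 14:25:25 GMT"},
)
```

Something still has to store the old body and deal with the "not 
modified" reply, expiry times, and so on, which is the part that gets 
complicated.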

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org