urllib, urllib2, httplib -- Begging for consolidation?

Wed May 8 06:03:09 EDT 2002

brueckd at tbye.com wrote in message news:<mailman.1020782351.10319.python-list at python.org>...
> On Tue, 7 May 2002, A. Keyton Weissinger wrote:
> 
> > Am I the only one that thinks these need to be pulled together some? I saw a
> > PEP (268?) where there are some rumblings about adding some things to it as
> > well. Maybe a combo project?
> 
> Yes, part of the problem is that it's not obvious when you should use 
> which (e.g. urllib vs. urllib2).

Indeed.

> BUT, if there were to occur some sort of consolidation (meaning, 
> introducing incompatibilities or a whole new module), then we should use 
> that as an opportunity to restructure/redesign that whole set of modules 
> because, IMO, they've evolved past their original design. If we can come 
> up with a good organization, the actual implementation could be handled by 
> various members of the community.

I think we should stick with the urlopen concept because it's very
powerful - just open a URL and pretend that it's a file. The clever
design will arise when specialised features of various protocols need
to be specified whilst using the general interface, but then there are
plenty of other packages which deal with this kind of problem; for
example, the DB-API has ways of allowing database-specific
functionality to be specified when opening database connections.
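
As a minimal sketch of the "pretend it's a file" idea: the spelling below assumes Python 3's urllib.request (the modules discussed in this thread spell it urllib.urlopen), and the read_url helper, the temporary file and the data: URL are made up purely for illustration. No network access is needed.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_url(url):
    """Read everything behind a URL as if it were an ordinary file."""
    # The object returned by urlopen is file-like: it has read() and
    # can be used as a context manager, whatever the URL scheme is.
    with urlopen(url) as response:
        return response.read()


# A temporary local file stands in for a remote resource (hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_bytes(b"hello from a URL\n")
    file_contents = read_url(path.as_uri())  # a file:// URL

# The same call handles a different scheme without any changes.
data_contents = read_url("data:text/plain,hello")
```

The point is that the caller never has to know which protocol module did the work.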

> The original premise of urllib, that it helps your app open any type of 
> URL in roughly the same way, is pretty neat but now both urllib and 
> urllib2 have lots of stuff tacked on that is pretty HTTP-specific. Also, 
> I usually need to support only one protocol and I know in advance which 
> that is (usually HTTP, sometimes FTP), but the httplib docs imply that 
> httplib is more of an internal module.

What I found, after experimentation, was that httplib just wasn't
needed for my purposes. I wrongly believed that redirects weren't
going to be handled by urlopen and that I would have to extend various
httplib classes to get the required functionality, but in fact urlopen
proved to handle everything I needed in a transparent way.
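
To make the redirect behaviour concrete, here is a rough sketch under some assumptions: it uses Python 3 module names (urllib.request, http.server), and the throwaway local server, the RedirectingHandler class and the /old and /new paths are all invented for the demonstration. An empty ProxyHandler is used only so that any proxy settings in the environment do not interfere with the loopback request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, which has the content."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final content"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Same machinery as urlopen, minus any proxies from the environment.
opener = build_opener(ProxyHandler({}))
with opener.open("http://127.0.0.1:%d/old" % port, timeout=5) as response:
    final_url = response.geturl()  # the URL actually retrieved
    body = response.read()

server.shutdown()
server.server_close()
```

The caller asks for /old and simply receives the content of /new, with no httplib subclassing involved.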

> So... if we were to change something, I'd like us to build a rich HTTP
> library that supports the super easy use case (gimme the data at this URL,
> optionally posting this data right here first) as well as more complicated
> cases (add in these request headers before sending the request to the
> origin). It would be in this module (or one closely tied to it) that we'd
> capture knowledge about the HTTP protocol, such as parsing and building
> HTTP 1.0 and 1.1 compliant request and response headers, handling cookies,
> basic and digest authentication, '\n' vs. '\r\n' line endings, easy-to-use
> HTTPS, etc. Supporting routines (like quote, urlencode, urlparse) can
> either be imported and exposed through the HTTP module, or kept in a
> module with better defined boundaries.
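
For what it's worth, the two use cases described above can be sketched with one small helper. This is only an illustration: build_request is a hypothetical name, the example.com URLs are placeholders, and the code assumes the Python 3 spellings urllib.request.Request and urllib.parse.urlencode. Nothing is sent over the network; the request objects are only constructed and inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, fields=None, headers=None):
    """Hypothetical helper: one entry point for simple and detailed requests."""
    # Form fields, if given, are encoded into a request body; the
    # presence of a body is what turns the request into a POST.
    data = urlencode(fields).encode("ascii") if fields else None
    return Request(url, data=data, headers=headers or {})


# The super easy case: just the URL.
simple = build_request("http://example.com/")

# The more complicated case: posted data plus an extra request header.
detailed = build_request(
    "http://example.com/search",
    fields={"q": "python urllib", "page": 2},
    headers={"Accept-Language": "en"},
)
```

Either object could then be passed to urlopen in place of a plain URL string.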

I'd like to see support for FTP and other protocols being just as
important as HTTP, especially since I haven't had as much luck with
FTP in conjunction with urlopen as I have with HTTP, although these
problems can be extremely hard to diagnose sometimes. I strongly
disagree that URL manipulation should be accessed through an
HTTP-specific module - the last thing we need is a "beware of the
leopard" situation in the standard library (where things are tucked
away in obscure or bizarre places, depending on the context of the
enquiry).
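
To illustrate why URL manipulation belongs in protocol-neutral territory, here is a small sketch using an FTP URL. It assumes the Python 3 module urllib.parse, and the ftp.example.org host and its paths are placeholders invented for the example.

```python
from urllib.parse import quote, urljoin, urlsplit

# Splitting a URL into its parts works the same for FTP as for HTTP.
parts = urlsplit("ftp://ftp.example.org/pub/python/README")

# Quoting a path is not tied to any particular protocol either.
quoted = quote("/pub/my file.txt")

# Resolving a relative reference against a base URL.
joined = urljoin("ftp://ftp.example.org/pub/python/README", "../perl/README")
```

None of these calls knows or cares about HTTP.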

> We could take the same approach with other protocols, and include modules 
> for FTP, plain files, etc. With all those in place we could still have the 
> "open any type of URL" routine built on top, but it should work only for 
> the simplest of use cases; if you need something more complex then you'd 
> go use the corresponding protocol library yourself.

The key to this exercise is making the uncommon case almost as easy to
handle as the common case so that one doesn't necessarily need to
learn a completely new framework in order to get that 1% of
functionality that the common case doesn't deal with.
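
One possible shape for this, sketched under assumptions: the code uses the handler mechanism of Python 3's urllib.request (build_opener and BaseHandler), and the "demo" scheme and DemoHandler class are invented for illustration. The unusual case is plugged in once, and the caller keeps using the same open-and-read interface.

```python
import email
import io
from urllib.request import BaseHandler, build_opener
from urllib.response import addinfourl


class DemoHandler(BaseHandler):
    """Hypothetical handler for a made-up "demo:" URL scheme."""

    def demo_open(self, req):
        # The opener dispatches to <scheme>_open, so this method is
        # called for any URL whose scheme is "demo".
        payload = req.selector.encode("ascii")
        headers = email.message_from_string("Content-Type: text/plain\n\n")
        return addinfourl(io.BytesIO(payload), headers, req.full_url)


# The special case is registered once...
opener = build_opener(DemoHandler())

# ...and afterwards looks exactly like the common case to the caller.
with opener.open("demo:hello") as response:
    demo_body = response.read()
    demo_type = response.headers["Content-Type"]
```

Whether a redesigned library should expose exactly this kind of hook is of course open to debate.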

> I'm not suggesting that we scrap the current protocol modules (they've been 
> very, very useful); it's just that over time they've grown up and are due 
> for some redesign/refactoring (the kind that will not be backwards 
> compatible).

I think that various parts of urllib can be retained in an
almost-backward-compatible way, but I'm not so sure about httplib.

Paul