urllib, urllib2, httplib -- Begging for consolidation?

John J. Lee jjl at pobox.com
Thu Jun 6 17:55:07 EDT 2002


On Wed, 5 Jun 2002 brueckd at tbye.com wrote:

> On Wed, 5 Jun 2002, John J. Lee wrote:
[...]
> Before I jump in I do want to emphasize that I'm not trying to be too
> critical of httplib/urllib/urllib2 - the OP asked if they should be
> consolidated, and my short response is "yes, and reorganized too". :-)

OK -- but there's no reason to consolidate the implementation, right?  I
don't really think there's any reason to consolidate the interface either
(which I suppose is what you mean by reorganisation), but the
documentation for httplib might include a pointer to urllib{2,}, since as
you point out below it's probably not obvious that that's where the HTTP
redirection stuff is.

[...]
> So, depending on your specific needs, you'd "plug in" to this hierarchy at
> the appropriate level. If all you need is to go fetch some object, you'd
[...]
> The key though is that each level you'd have to reinvent as little as
> possible - you'd build on related work from lower levels.

Almost goes without saying.  But again, I think that's what we've got
already.

[...]
> functionality instead of enforced layering). The approach we have today is
> *sort of* like this, except that richer and smarter functionality is being
> added at the *top* of the hierarchy.

The redirection stuff was added between httplib and the OpenerDirector /
urlopen stuff.  That's not at the top.

> Problem #1 is what makes me throw my hands up in frustration when we talk
> about, e.g., expanding the urllib APIs to have some way to do a HEAD
> request: it doesn't belong on that level of functionality.

I think you mean "it doesn't belong in a module named 'urllib2'"... or at
least you should mean that ;)
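
To make the HEAD case concrete, here is a minimal sketch of doing it at
the httplib level.  It uses the Python 3 module name (httplib became
http.client), and the helper name head() is purely illustrative, not
something the library provides.

```python
import http.client  # called httplib in Python 2


def head(conn: http.client.HTTPConnection, path: str = "/"):
    """Send HEAD over an existing connection; return (status, headers)."""
    conn.request("HEAD", path)
    resp = conn.getresponse()
    resp.read()  # a HEAD response carries no body; this just completes it
    return resp.status, dict(resp.getheaders())
```

Something like head(http.client.HTTPConnection("www.example.com"), "/")
would be the usual call; the caller owns the connection and closes it.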

> That generic
> interface is for making it trivially easy to fetch the contents associated
> with a URL, independent as much as possible from protocol.

Only part of urllib2 (OpenerDirector etc.) is a generic any-old-url-scheme
API.  The rest (e.g. AbstractHTTPHandler, HTTPRedirectHandler) is
HTTP-specific, and can be used on its own, without going through the
generic stuff.
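
As a rough illustration of that HTTP-specific layer, here is a sketch
that customises only the redirect handling.  It uses the Python 3 module
name (urllib2 became urllib.request), and the class name NoRedirect is
made up for the example.

```python
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """HTTP-specific policy: refuse to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is built, so the
        # 3xx response surfaces as an HTTPError from opener.open().
        return None


# build_opener() leaves out the default HTTPRedirectHandler because an
# instance of a subclass is supplied in its place.
opener = urllib.request.build_opener(NoRedirect())
```

Only the redirect policy changes here; the rest of the default handler
chain is left alone.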

[...]
> Probably 90% of the problem is naming and/or module organization - urllib,
> a module to retrieve the contents given a generic URL, should be just that
> and little else, and protocol-specific knowledge should be accessible from
> protocol-specific modules.

Given how things are now and the need for backwards compatibility, what do
you suggest doing?  I think putting a pointer in the httplib docs (if
there's not one there already) is the best that can be done now.


John



