urllib, urllib2, httplib -- Begging for consolidation?

brueckd at tbye.com
Thu May 9 13:31:04 EDT 2002


On 9 May 2002, Paul Boddie wrote:

> For me, what I've mostly been doing with urllib is to connect to
> locations and to download files. Indeed, having this functionality in
> the standard library is incredibly useful when a server

Right, and I'm not in any way advocating the elimination of this, only
that more of the meat of the functionality be built in the corresponding
protocol libraries (and not urllib), on top of which is built a thin,
general-purpose API that does simple "protocol triage" if you will.
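
To make "protocol triage" concrete, here is a rough sketch of how thin that
top layer could be. It is written in present-day Python 3 spelling
(urllib.parse), and register_opener, open_url and the "demo" scheme are
made-up names for illustration, not anything in the standard library. The
generic layer looks only at the URL scheme and hands everything else to
whatever the protocol module registered.

```python
import io
from urllib.parse import urlsplit

# Hypothetical registry: URL scheme -> protocol-specific opener.
_OPENERS = {}

def register_opener(scheme, opener):
    """Register a callable that takes the parsed URL and returns a file-like object."""
    _OPENERS[scheme.lower()] = opener

def open_url(url):
    """Inspect the scheme only, then delegate to the protocol's own opener."""
    parts = urlsplit(url)
    try:
        opener = _OPENERS[parts.scheme]
    except KeyError:
        raise ValueError("no opener registered for scheme %r" % parts.scheme)
    return opener(parts)

# Stand-in opener for illustration; a real one would live in the protocol
# module and would actually talk to the server.
register_opener("demo", lambda parts: io.BytesIO(parts.path.encode("ascii")))
```

With that in place, open_url("demo://user@myserver/docs/resource") returns a
file-like object, and all the real work stays in the module that registered
the opener.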

> between HTTP and FTP for such simple operations, and the following
> URLs, which can be used to access a remote server (with authentication
> details provided) and to download a resource, both make sense:
> 
>   http://user@myserver:8080/docs/resource
>   https://user@myserver:8080/docs/resource
>   ftp://user@myserver/pub/docs/resource

Yes, I'm not disagreeing that at the simplest level these resources can be 
treated (read) as simple files. What I _am_ saying is that this simplest 
form of retrieval should also be that easy using the respective protocol 
modules, so that incrementally more complex tasks with a specific protocol 
are incrementally more work and, more importantly, pretty obvious "next 
steps".

> Even the following URLs, which I've just made up, share several common
> details with those above:
> 
>   pop3://user@myserver/INBOX/123
>   imap://user@myserver/INBOX/123

Yes, the URLs are similar, but again the usefulness begins to break down. 
If your program is reading a mailbox message it'll need to know how to 
handle the mail headers, so the likelihood of you treating it as a plain 
file is close to nil.
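
As a small illustration of why, here is a sketch using the standard email
package in present-day Python 3. The message text and the summarize helper
are invented for the example. Even the most trivial mail reader wants the
headers split out from the body rather than a flat stream of bytes.

```python
from email import message_from_string

# Invented sample message; a real one would come from the POP3/IMAP server.
RAW_MESSAGE = (
    "From: user@myserver\n"
    "To: someone@example.com\n"
    "Subject: Test message\n"
    "\n"
    "Body text here.\n"
)

def summarize(raw):
    """Parse a raw message and pull out the pieces a mail program needs."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "body": msg.get_payload(),  # a plain string for non-multipart messages
    }
```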

> It would be interesting to hear what kinds of things force you to deal
> with httplib.

Adding request headers, dealing with cookies or HTTP/1.1 features in 
general, reusing the connection, etc. But in reality your question implies 
that I've done a poor job of conveying my point: it's not a question of 
whether or not things _can_ be done via urllib. It's that, if we were to 
reorganize these modules, then once you know you're using a particular 
protocol you should be able to use that protocol's module just as 
easily as you can use urllib. If you want to do something 
protocol-specific, urllib should never be the place to do it (whereas 
today you have stuff like FancyURLopener which knows how to follow HTTP 
redirects - great stuff, but it doesn't belong in urllib).
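
For the record, this is the sort of thing I end up in httplib for. The sketch
below uses the Python 3 spelling of the module (http.client), and fetch_all
is a hypothetical helper, not part of any library. It sends custom request
headers and reuses one connection for several requests.

```python
import http.client

def fetch_all(host, paths, port=80, headers=None):
    """Fetch several paths over one connection, sending the same custom headers."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path, headers=headers or {})
            response = conn.getresponse()
            # The body has to be read completely before the connection
            # can carry the next request.
            results.append((response.status, response.read()))
    finally:
        conn.close()
    return results
```

Something like fetch_all("myserver", ["/docs/a", "/docs/b"], port=8080,
headers={"Accept-Language": "en"}) would then return a list of
(status, body) pairs, assuming the server keeps the connection open between
requests.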

> > A similar approach might work well for the different protocol libraries - 
> > go to the appropriate module to open the one you want (setting it up with 
> > any protocol-specific information), after which you have a file-like 
> > object that your code can use generically.

> I believe that specialisation may be introduced into the objects
> returned by whatever function was used to create them (even if the
> creation was delegated to other functions, classes or modules).

Yes, the two preceding paragraphs propose the same thing.

> Again, I would say that many protocols lend themselves to a filesystem
> type of view of the remote location.

At the simplest level, yes, which is why I'm in favor of keeping the 
open-generic-url functionality. At anything beyond the simplest level, 
however, the way you look at them diverges wildly. Files lend themselves 
to true random access for both reading and writing, and generally don't 
have security credentials specified in their "url". HTTP lends itself to 
stateless transactions, where the data returned may vary wildly depending 
on the headers and data submitted; it is also commonly used to initiate 
action on the server (in which case "retrieving the object at the given 
URL" makes little sense). FTP is mostly limited to retrieving all of a 
file or a later chunk of a partial file.

> The very nature of URLs tends to suggest a filesystem abstraction of
> network resources - that's why they were invented, after all.

URLs suggest a uniform method to _locate_ resources, not _use_ them.

-Dave





