Adding resume (206) support to urllib(2)
Hi guys,

I've just been putting together a podcasting doodad and have included resuming support in it. Is this something that's already in the pipeline, or should I abstract it out to urllib and submit a patch?

Dan
Dan> I've just been putting together a podcasting doodad and have
Dan> included resuming support in it. Is this something that's already
Dan> in the pipeline or should I abstract it out to urllib and submit a
Dan> patch?

Check urllib2 before putting any effort into extending urllib. If a patch is needed, it should go in urllib2.

Skip
Daniel Watkins schrieb:
I've just been putting together a podcasting doodad and have included resuming support in it. Is this something that's already in the pipeline or should I abstract it out to urllib and submit a patch?
Not sure where you got the impression that 206 is "resume"; in my copy of the spec it's "partial content", and you must have put a Range: header into the request to get that in the first place.

If I had to use that, I'd implement it right on top of httplib, and wouldn't bother with urllib*: this is really specific to http, and adding it to urllib would break the abstraction.

In any case, there is no "pipeline" it may be in (except for changes that have already been committed to the trunk). Something may have been submitted as a patch or feature request, but a quick search reveals no relevant open issues.

Regards,
Martin
Martin v. Löwis wrote:
I've just been putting together a podcasting doodad and have included resuming support in it. Is this something that's already in the pipeline or should I abstract it out to urllib and submit a patch?
Not sure where you got the impression that 206 is "resume"; in my copy of the spec it's "partial content", and you must have put a Range: header into the request to get that in the first place.
If I had to use that, I'd implement it right on top of httplib, and wouldn't bother with urllib*: this is really specific to http, and adding it to urllib would break the abstraction.
given that urllib2 already supports partial requests, I'm not sure I see the point of reimplementing this on top of httplib. an example:

import urllib2

request = urllib2.Request("http://www.pythonware.com/daily/index.htm")
request.add_header("range", "bytes=0-999")

http_file = urllib2.urlopen(request)

print http_file.headers["content-range"]
print len(http_file.read())

this prints:

bytes 0-999/245105
1000

</F>
Fredrik Lundh schrieb:
given that urllib2 already supports partial requests, I'm not sure I see the point of reimplementing this on top of httplib. an example:
import urllib2
request = urllib2.Request("http://www.pythonware.com/daily/index.htm")
request.add_header("range", "bytes=0-999")
But what does this do if the URL was a file URL, or an ftp URL? You have to know the syntax of the range header, and you have to know the syntax of the content-range header, to process it. With that, you can just as easily use httplib:

py> import httplib
py> h = httplib.HTTPConnection("www.pythonware.com")
py> h.putrequest("GET", "/daily/index.htm")
py> h.putheader("range", "bytes=0-999")
py> h.endheaders()
py> r = h.getresponse()
py> r.getheader("content-range")
'bytes 0-999/245105'
py> len(r.read())
1000

If you add protocol-specifics to urllib, the abstraction that urllib provides goes away, and you are better off (IMO) to use the underlying protocol library in the first place.

I'm not sure what the OP wanted to contribute in the first place (given that it "works" already either way), but it might have been a convenience API for the range header, and a parser for the content-range header. That should go, IMO, into httplib, so that all users of httplib get access to it, not just urllib*.

Regards,
Martin
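[The convenience API and header parser described above could be quite small. A sketch, assuming nothing beyond the header syntax shown in the transcripts; the names `parse_content_range` and `make_range` are hypothetical, not actual httplib APIs:]

```python
import re

# Hypothetical helper: parse a Content-Range value such as
# "bytes 0-999/245105" into (first_byte, last_byte, total_length).
# A total of "*" (length unknown) is returned as None.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")

def parse_content_range(value):
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("malformed Content-Range: %r" % value)
    first, last, total = match.groups()
    return (int(first), int(last), None if total == "*" else int(total))

# Convenience for the request side: build a Range header value,
# e.g. for resuming a download from byte offset `start`.
def make_range(start, end=None):
    if end is None:
        return "bytes=%d-" % start
    return "bytes=%d-%d" % (start, end)
```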
On Wed, Dec 13, 2006 at 08:30:00AM +0100, "Martin v. Löwis" wrote:
If you add protocol-specifics to urllib, the abstraction that urllib provides goes away, and you are better off (IMO) to use the underlying protocol library in the first place.
IMO you'd better not, because urllib2 provides not only an abstraction but also a lot of services (authenticated proxies, cached FTP files)...

Oleg.
--
Oleg Broytmann            http://phd.pp.ru/            phd@phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann schrieb:
IMO you'd better not, because urllib2 provides not only an abstraction but also a lot of services (authenticated proxies, cached FTP files)...
If you are using http ranges, cached FTP files won't do any good. As for authenticated proxies: I think they ought to be implemented in httplib as well. If everybody wants urllib to become just a better library for accessing http servers, I probably can't do much about it, though.

Regards,
Martin
On Wed, Dec 13, 2006 at 09:05:49AM +0100, "Martin v. Löwis" wrote:
As for authenticated proxies: I think they ought to be implemented in httplib as well.
Agreed.
If everybody wants urllib to become just a better library for accessing http servers, I probably can't do much about it, though.
HTTP is one of the most widely known and used protocols. Would you rather have a big httplib and a small, abstract urllib? So abstract that it doesn't allow a user to change protocol-specific handling?

Oleg.
--
Oleg Broytmann            http://phd.pp.ru/            phd@phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann schrieb:
HTTP is one of the most widely known and used protocols. Would you rather have a big httplib and a small, abstract urllib? So abstract that it doesn't allow a user to change protocol-specific handling?
Personally, I think very elaborate support for HTTP in httplib, and very few generalizations and abstractions in urllib*, would be the "right" way to do it. For example, there might be the notion of an "http session" object, where a single application request can resolve to multiple http requests (with redirection, authentication negotiation, cookies, 100 continue, implicit headers, etc.).

For compatibility, urllib* can't drop features, and we'd need contributors who would contribute such a refactoring, but IMO that would be the right way. If applications use urllib *only* for http, and *only* because it has these multi-request, implicit-headers features, something is wrong with the abstractions.

Regards,
Martin
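[The session idea described above can be sketched roughly as follows. This is purely illustrative: no such class exists in httplib, all names here are invented, and only the redirect-following part of the multi-request behaviour is shown:]

```python
class HTTPSession(object):
    """Illustrative sketch of an "http session" object: one
    application-level get() may issue several wire-level requests.
    Only redirects are handled here; a real session would also
    negotiate authentication, send cookies, handle 100-continue, etc.
    """

    def __init__(self, opener, max_redirects=5):
        self.opener = opener            # callable performing one raw request:
        self.max_redirects = max_redirects  # url -> (status, headers, body)

    def get(self, url):
        for _ in range(self.max_redirects):
            status, headers, body = self.opener(url)
            if status in (301, 302, 303, 307) and "location" in headers:
                url = headers["location"]   # follow the redirect
                continue
            return status, headers, body
        raise IOError("too many redirects")
```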
On Wed, Dec 13, 2006 at 10:00:48PM +0100, "Martin v. Löwis" wrote:
Personally, I think very elaborate support for HTTP in httplib, and very few generalizations and abstractions in urllib* would be the "right" way to do it, IMO. For example, there might be the notion of an "http session" object where a single application request can resolve to multiple http requests (with redirection, authentication negotiation, cookies, 100 continue, implicit headers, etc).
I see.
For compatibility, urllib* can't drop features
Leave it for py3k, then.
and we'd need contributors who contribute such a refactoring
That's the hardest part.
If applications use urllib *only* for http, and *only* because it has these multi-request, implicit headers features, something is wrong with the abstractions.
I think I agree.

Oleg.
--
Oleg Broytmann            http://phd.pp.ru/            phd@phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.
Martin v. Löwis wrote:
given that urllib2 already supports partial requests, I'm not sure I see the point of reimplementing this on top of httplib. an example:
import urllib2
request = urllib2.Request("http://www.pythonware.com/daily/index.htm")
request.add_header("range", "bytes=0-999")
But what does this do if the URL was a file URL, or an ftp URL?
same thing as if you use range on an HTTP server that doesn't support ranges: you get all the data, and there's no content-range field in the response header.
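[In other words, resuming code can't assume the range was honoured; it has to check the response. A minimal sketch of that check; the helper name is invented for illustration, with arguments modelled on httplib's response attributes:]

```python
def range_was_honoured(status, content_range):
    """True if the server actually served partial content.

    A server that supports ranges answers 206 Partial Content and
    includes a Content-Range header; a server that ignores the Range
    header answers 200 with the full body and no Content-Range.
    """
    return status == 206 and content_range is not None


# A resuming downloader would branch on this, e.g.:
#   if range_was_honoured(r.status, r.getheader("content-range")):
#       ... append the new bytes to the partial file ...
#   else:
#       ... discard the partial file and download from scratch ...
```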
You have to know the syntax of the range header, and you have to know the syntax of the content-range header, to process it. With that, you can just as easily use httplib:
I'm not sure "as easily" is the right way to describe something that requires more code yet leaves out practical things such as redirection support, host and user-agent headers, etc. </F>
participants (5)
- "Martin v. Löwis"
- Daniel Watkins
- Fredrik Lundh
- Oleg Broytmann
- skip@pobox.com