How to grab a part of web page?

brueckd at tbye.com brueckd at tbye.com
Wed Jul 10 12:12:47 EDT 2002


On 10 Jul 2002, William Park wrote:

> Gerhard Häring <gerhard.haering at gmx.de> wrote:
> >> I think it's possible.  When I use 'wget -c' to download HTML, sometimes
> >> I see it start from an offset, rather than from the beginning.
> > 
> > Doing this while having a packet sniffer running showed me what the
> > corresponding HTTP header is.
> 
> From the docs that I came across,
> 
>     HTTP/1.1:
> 	- "Range" header in request
> 	- "multipart/byteranges" type in the response
> 
>     Older HTTP:
> 	- "Request-Range" header
> 	- "multipart/x-byteranges" type
> 
> but the exact syntax I don't remember.

(Sorry, this isn't a direct reply to the OP's post.)

Byte-range requests can ask for multiple ranges, but both the requests and 
the responses are then more complex and poorly supported, so it's usually 
best to stick with a straightforward single-range request:

GET /foo HTTP/1.1
Range: bytes=500-1000

With a single-range request the response comes back as a normal response 
(i.e. not as a multipart MIME message):

HTTP/1.1 206 Partial Content
Content-Range: bytes 500-1000/2000
Content-Length: 501

(500-1000/2000 means the response carries the bytes from offset 500 
through 1000, inclusive, out of an object whose total size is 2000 bytes; 
the Content-Length header confirms that 501 bytes are in the response.)
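
For anyone who wants to try the single-range exchange from Python, here is a
minimal sketch using only the standard library. The helper names
(build_range_request, parse_content_range, fetch_range) are made up for
illustration, and the URL you pass in is your own; treat it as a starting
point rather than a finished client.

```python
import re
import urllib.request


def build_range_request(url, first, last):
    """Return a Request asking for bytes first..last (both inclusive)."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={first}-{last}")
    return req


def parse_content_range(value):
    """Parse e.g. 'bytes 500-1000/2000' into (first, last, total).

    total is None when the server reports the size as '*' (unknown).
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    return first, last, total


def fetch_range(url, first, last):
    """Send the request (needs network access); return (status, body)."""
    with urllib.request.urlopen(build_range_request(url, first, last)) as resp:
        return resp.status, resp.read()
```

A status of 206 means the server honored the range. A server that does not
support ranges may simply answer 200 with the whole object, so check the
status before assuming the body is only the requested slice.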

You can also include an If-Range request header (carrying an ETag or a 
Last-Modified date) so that the server responds with the requested range 
if the object is unchanged, or with the entire object (a normal 200 
response) if it has changed since then.
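
A rough sketch of the If-Range variant, again with the standard library only.
The function names and the idea of passing in a previously saved validator
(an ETag string or a Last-Modified date string) are assumptions for
illustration; how you store that validator is up to you.

```python
import urllib.request


def build_conditional_range_request(url, first, last, validator):
    """Return a Request for bytes first..last that is valid only if the
    object still matches `validator` (an ETag or a Last-Modified date)."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={first}-{last}")
    req.add_header("If-Range", validator)
    return req


def classify_response(status):
    """Interpret the status code of the reply to a conditional range request."""
    if status == 206:
        return "partial"   # object unchanged: body is just the requested range
    if status == 200:
        return "full"      # object changed (or ranges unsupported): whole body
    return "other"
```

When resuming a download, a "full" result means the bytes saved so far are
stale and should be thrown away in favor of the new body.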

-Dave





