How to grab a part of web page?
brueckd at tbye.com
Wed Jul 10 12:12:47 EDT 2002
On 10 Jul 2002, William Park wrote:
> Gerhard Häring <gerhard.haering at gmx.de> wrote:
> >> I think it's possible. When I use 'wget -c' to download HTML, sometimes
> >> I see it start from an offset, rather than from the beginning.
> >
> > Doing this while having a packet sniffer running showed me what the
> > corresponding HTTP header is.
>
> From the docs that I came across,
>
> HTTP/1.1:
> - "Range" header in request
> - "multipart/byteranges" type in the response
>
> Older HTTP:
> - "Request-Range" header
> - "multipart/x-byteranges" type
>
> but the exact syntax I don't remember.
(Sorry, this isn't a reply to the OP's post.)
Byte-range requests can ask for multiple ranges, but both the requests and
the responses are more complex and poorly supported, so it's usually best to
stick with the straightforward single-range request:
GET /foo HTTP/1.1
Range: bytes=500-1000
With a single-range request the response comes back as a normal response
(i.e. not as a multipart MIME message):
HTTP/1.1 206 Partial Content
Content-Range: bytes 500-1000/2000
Content-Length: 501
(500-1000/2000 means the response carries the bytes from offset 500 through
offset 1000, inclusive, and the complete object is 2000 bytes long. The
Content-Length header confirms that the response contains 501 bytes.)
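To make the inclusive-offset arithmetic concrete, here is a small Python
sketch that pulls apart a Content-Range value like the one above. The names
parse_content_range and range_length are made up for illustration, and the
parser only handles the simple "bytes first-last/total" form (with "*" for
an unknown total), not every variant a server might send.

```python
import re

# Hypothetical helper for illustration; not part of any library.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range header value.

    total is None when the server sends '*' (total size unknown).
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    first = int(match.group(1))
    last = int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if last < first:
        raise ValueError("last offset precedes first: %r" % value)
    return first, last, total


def range_length(first, last):
    # Both offsets are inclusive, hence the +1.
    return last - first + 1
```

Comparing range_length(first, last) against the Content-Length header is a
cheap sanity check before appending the data to a partial download.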
You can also include an If-Range request header (carrying an ETag or a
Last-Modified date) so that the server responds with the requested range if
the object is unchanged, or with the entire object if it has changed.
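A rough sketch of building such a request with Python 3's urllib.request
follows. The function names, the URL and the ETag value are placeholders I
made up, and the status handling assumes the server honors Range at all
(one that doesn't will simply answer 200 with the whole object).

```python
import urllib.request


def build_range_request(url, first, last, validator=None):
    """Build a single-range GET; validator is an ETag or date string."""
    headers = {"Range": "bytes=%d-%d" % (first, last)}
    if validator is not None:
        # Server sends the range only if the object still matches.
        headers["If-Range"] = validator
    return urllib.request.Request(url, headers=headers)


def describe_status(status):
    """Interpret the response status of a range request."""
    if status == 206:
        return "partial"   # body holds just the requested bytes
    if status == 200:
        return "full"      # body holds the entire (possibly new) object
    return "other"
```

Usage would look something like this (not run here, since it needs a
network connection):

```python
req = build_range_request("http://example.com/foo", 500, 1000, '"abc123"')
with urllib.request.urlopen(req) as resp:
    kind = describe_status(resp.status)
    data = resp.read()
```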
-Dave