Download the "head" of a large file?

Mon Jul 27 22:16:42 EDT 2009

On Mon, 27 Jul 2009 19:40:25 -0300, John Yeung
<gallium.arsenide at gmail.com> wrote:

> On Jul 27, 4:38 pm, erikcw <erikwickst... at gmail.com> wrote:
>> I'm trying to figure out how to download just the first few lines of a
>> large (50 MB) text file from a server to save bandwidth.  Can Python do
>> this?
>>
>> Something like the Python equivalent of
>> curl http://url.com/file.xml | head -c 2048
>
> urllib.urlopen gives you a file-like object, which you can then read
> line by line or in fixed-size chunks.  For example:
>
> import urllib
> chunk = urllib.urlopen('http://url.com/file.xml').read(2048)
>
> At that point, chunk is just bytes, which you can write to a local
> file, print, or whatever it is you want.

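As the OP wants to save bandwidth, it's better to request exactly the
amount of data needed. That is, add a Range header field [1] to the
request, and inspect the response for a corresponding Content-Range
header [2]. A server that does not support ranges will ignore the header
and send the whole file with a plain 200 status, so check the status code
too.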

py> import urllib2
py> url = "http://www.python.org/"
py> req = urllib2.Request(url)
py> req.add_header('Range', 'bytes=0-10239')  # first 10K
py> f = urllib2.urlopen(req)
py> data = f.read()
py> print repr(data[-30:]), len(data)
'\t    <a href="http://www.zope.' 10240
py> f.headers['Content-Range']
'bytes 0-10239/18196'
py> f.getcode()
206            # 206=Partial Content
py> f.close()
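
For anyone on Python 3, the same idea might look roughly like the sketch
below. It is only a sketch: fetch_head and its opener parameter are names
made up for illustration, and urllib.request is assumed as the Python 3
home of what urllib2 provides. The read is capped at nbytes so that a
server which ignores Range (and answers 200 with the full body) still
cannot make us download everything.

```python
import urllib.request


def fetch_head(url, nbytes, opener=urllib.request.urlopen):
    """Return (data, status) with at most the first nbytes of url.

    fetch_head and opener are hypothetical names for this sketch;
    opener only exists so a stand-in can replace urlopen.
    """
    req = urllib.request.Request(url)
    # Range offsets are inclusive: the first nbytes are bytes 0..nbytes-1.
    req.add_header('Range', 'bytes=0-%d' % (nbytes - 1))
    resp = opener(req)
    try:
        # Cap the read in case the server ignored the Range header.
        data = resp.read(nbytes)
        status = resp.status  # 206 if the range was honoured, 200 if not
    finally:
        resp.close()
    return data, status
```

Calling fetch_head(url, 2048) would then be the rough equivalent of
"curl url | head -c 2048"; a status of 206 means the server sent only the
requested range.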

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16

-- 
Gabriel Genellina