Downloading binary files - Python3
Peter Otten
__peter__ at web.de
Sat Mar 21 10:45:48 EDT 2009
Anders Eriksson wrote:
> Hello,
>
> I have made a short program that given an url will download all referenced
> files on that url.
>
> It works, but I'm thinking it could use some optimization since it's very
> slow.
>
> I create a list of tuples where each tuple consist of the url to the file
> and the path to where I want to save it. E.g
> (http://somewhere.com/foo.mp3, c:\Music\foo.mp3)
>
> The downloading part (which is the part I need help with) looks like this:
> def GetFiles():
Consider passing 'hreflist' explicitly. Global variables make your script
harder to manage in the long run.
>     """do the actual copying of files"""
>     for url,path in hreflist:
>         print(url,end=" ")
You can force Python to write out its internal buffer by calling
sys.stdout.flush().
You may also take a look at the logging package.
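A minimal sketch of that idea (the report() helper name is just for
illustration, not from the original post):

```python
import sys

def report(url):
    # Print the URL without a trailing newline, then flush stdout
    # so the text appears before the download begins rather than after.
    print(url, end=" ")
    sys.stdout.flush()

report("http://somewhere.com/foo.mp3")
print("done")
```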
>         srcdata = urlopen(url).read()
For large files you would read the source in chunks:
src = urlopen(url)
with open(path, mode="wb") as dstfile:
    while True:
        chunk = src.read(2**20)
        if not chunk:
            break
        dstfile.write(chunk)
Instead of writing this loop yourself you can use
shutil.copyfileobj(src, dstfile)
or even
urllib.request.urlretrieve(url, path)
which also takes care of opening the file.
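To see copyfileobj's chunked copying without touching the network, you can
exercise it with in-memory file objects (the data here is made up for
illustration):

```python
import io
import shutil

# Stand-ins for urlopen(url) and open(path, mode="wb"):
src = io.BytesIO(b"pretend this is a large binary download")
dst = io.BytesIO()

# copyfileobj copies in fixed-size chunks (here 1 MiB) instead of
# reading the whole source into memory at once.
shutil.copyfileobj(src, dst, 2**20)

print(dst.getvalue())
```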
>         dstfile = open(path,mode='wb')
>         dstfile.write(srcdata)
>         dstfile.close()
>     print("Done!")
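Putting those pieces together, the whole loop might look like this sketch
(get_files takes hreflist as a parameter, as suggested above; urlretrieve
streams the file to disk for you):

```python
import sys
from urllib.request import urlretrieve

def get_files(hreflist):
    """Download each (url, path) pair in hreflist."""
    for url, path in hreflist:
        print(url, end=" ")
        sys.stdout.flush()      # show the URL before the download starts
        urlretrieve(url, path)  # fetches url and writes it to path
        print("done")
```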
>
> hreflist is the list of tuples.
>
> at the moment, the output of print(url,end=" ") does not appear before the
> actual download; instead it appears at the same time as print("Done!").
> I would like it to behave the way I intended.
>
> Is downloading a binary file using: srcdata = urlopen(url).read()
> the best way? Is there some other way that would speed up the downloading?
The above method may not be faster (the operation is I/O-bound), but it
handles large files gracefully.
Peter