python download

Tue Feb 23 15:17:19 EST 2010

On Tue, Feb 23, 2010 at 2:42 PM, monkeys paw <monkey at joemoney.net> wrote:
> I used the following code to download a PDF file, but the
> file was invalid after running the code. Is there a problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
> a = open('adobe.pdf', 'w')
> for line in urllib2.urlopen(url):
>    a.write(line)

Two guesses:

First, you need to call a.close() when you're done writing to the file.

This will happen automatically when you have no more references to the
file, but I'm guessing that you're running this code in IDLE or some
other IDE, and a is still a valid reference to the file after you run
that snippet.
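
To illustrate the first point, here is a minimal sketch (not your code; the
file name and the bytes written are made up for the example) that uses a
with block, available from Python 2.6, so the file is closed and flushed
as soon as the block ends, even if a is never cleaned up by the IDE:

```python
import os
import tempfile

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# Leaving the with block closes the file, which flushes any buffered
# data to disk, even if an exception is raised inside the block.
with open(path, 'wb') as f:
    f.write(b'%PDF-1.4 example bytes')

# The file object is closed here, so the size on disk is complete.
size = os.path.getsize(path)
```

With an explicit a.close() the effect is the same; the with block just
makes it hard to forget.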

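Second, you're treating the PDF file as text (you're assuming it has
lines, you're not writing the file in binary mode, etc.).  A PDF is
binary data, and on Windows a file opened in text mode ('w') writes
every newline byte as a carriage return plus newline, which would
corrupt it.  I would do something like this instead: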

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)] on win32
IDLE 2.6.4

>>> import urllib2
>>> url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
>>> a = open('C:/test.pdf', 'wb')
>>> data = urllib2.urlopen(url).read()
>>> a.write(data)
>>> a.close()

That seems to work for me, in that it downloads a 16-page PDF
document, and that document opens without error or any other obvious
problems.
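
If the file were large, reading it all into memory with read() might not
be ideal.  A rough sketch of a chunked variant follows; the function name
download and the chunk_size default are my own choices for illustration,
and the import fallback is only there so the same code also runs on
Python 3, where urllib2 was folded into urllib.request:

```python
import shutil

try:
    from urllib2 import urlopen          # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def download(url, path, chunk_size=16 * 1024):
    """Copy the response body at url to path, byte for byte."""
    response = urlopen(url)
    try:
        # 'wb' writes the bytes exactly as received, with no newline
        # translation; the with block closes the file when done.
        with open(path, 'wb') as out:
            # Read and write chunk_size bytes at a time.
            shutil.copyfileobj(response, out, chunk_size)
    finally:
        response.close()
```

It would be called as download(url, 'C:/test.pdf') with the same url as
above.  I have not run this variant against that site.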

-- 
Jerry