The pyPdf or pdfminer library may help here.<br><br><div class="gmail_quote">On Wed, Feb 24, 2010 at 1:47 AM, Tim Chase <span dir="ltr">&lt;<a href="mailto:python.list@tim.thechases.com">python.list@tim.thechases.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">monkeys paw wrote:<br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

I used the following code to download a PDF file, but the<br>

file was invalid after running the code. Is there a problem<br>

with the write operation?<br>

<br>

import urllib2<br>

url = '<a href="http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf" target="_blank">http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf</a>'<br>

a = open('adobe.pdf', 'w')<br>

</blockquote>

<br></div>

Sure you don't need this to be 'wb' instead of 'w'?<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

for line in urllib2.urlopen(url):<br>

     a.write(line)<br>

</blockquote>

<br></div>

I also don't know whether this "for line ... a.write(line)" loop does newline translation.  If it's a binary file, you should use .read() instead (perhaps with a modest block size, writing in a loop if the file can end up being large).<br>
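<br>

As a rough sketch of the block-wise approach described above, assuming Python 3 (where urllib2.urlopen has become urllib.request.urlopen): the helper names copy_binary and download and the 16 KiB CHUNK_SIZE are illustrative choices, not anything from the original script.<br>

```python
from urllib.request import urlopen

CHUNK_SIZE = 16 * 1024  # hypothetical block size; tune as needed


def copy_binary(source, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like object to dest_path in fixed-size blocks."""
    total = 0
    # 'wb' turns off newline translation, so bytes reach the disk unchanged.
    with open(dest_path, 'wb') as out:
        while True:
            block = source.read(chunk_size)
            if not block:  # an empty bytes object signals end of stream
                break
            out.write(block)
            total += len(block)
    return total


def download(url, dest_path):
    """Fetch url and save the response body as-is; returns bytes written."""
    with urlopen(url) as response:
        return copy_binary(response, dest_path)
```

With this, download(url, 'adobe.pdf') would stand in for the original open/for/write sequence. copy_binary accepts any object with a .read(n) method, so it can be tried out on an in-memory stream without a network connection.<br>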


<br>

-tkc<div><div></div><div class="h5"><br>


</div></div></blockquote></div><br>
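To illustrate why the 'w' versus 'wb' question matters, here is a small Python 3 sketch that simulates Windows-style text-mode newline translation on any platform (via newline='\r\n') and compares it with a binary-mode write. The payload bytes and the helper names are made up for the demonstration, and the thread does not say whether the original script actually ran on Windows.<br>

```python
import os
import tempfile

# Made-up bytes standing in for PDF content; not a real PDF.
payload = b"%PDF-1.4\n\x00\x01binary\ndata"


def write_text_mode(path, data):
    # newline='\r\n' mimics what text mode does on Windows: each '\n'
    # written is expanded to '\r\n'. latin-1 maps bytes 0-255 one-to-one,
    # so the only change on disk comes from the newline translation.
    with open(path, 'w', encoding='latin-1', newline='\r\n') as f:
        f.write(data.decode('latin-1'))


def write_binary_mode(path, data):
    # 'wb' writes the bytes exactly as given.
    with open(path, 'wb') as f:
        f.write(data)


def read_back(path):
    with open(path, 'rb') as f:
        return f.read()


workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'text_mode.pdf')
binary_path = os.path.join(workdir, 'binary_mode.pdf')

write_text_mode(text_path, payload)
write_binary_mode(binary_path, payload)

text_result = read_back(text_path)
binary_result = read_back(binary_path)

print(len(payload), len(text_result), len(binary_result))
print(binary_result == payload, text_result == payload)
```

In the text-mode copy every \n byte has become \r\n, so the file is longer than the payload and no longer byte-identical, which can be enough to make a PDF unreadable. The binary-mode copy matches the payload exactly.<br>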