Download excel file from web?

patf at patf at
Tue Jul 29 00:07:20 CEST 2008

On Jul 28, 3:00 pm, "p... at" <p... at> wrote:
> Hi - experienced programmer but this is my first Python program.
> This URL will retrieve an excel spreadsheet containing (that day's)
> msci stock index returns.
> Want to write python to download and save the file.
> So far I've arrived at this:
> [quote]
> # import pdb
> import urllib2
> from win32com.client import Dispatch
> xlApp = Dispatch("Excel.Application")
> # test 1
> # xlApp.Workbooks.Add()
> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
> # xlBook = xlApp.ActiveWorkbook
> # xlBook.SaveAs(Filename='C:\\test.xls')
> # pdb.set_trace()
> response = urllib2.urlopen('
> excel?
> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> +25%2C+2008&export=Excel_IEIPerfRegional')
> # test 2 - returns check = False
> check_for_data = urllib2.Request('
> indexperf/excel?
> priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
> +25%2C+2008&export=Excel_IEIPerfRegional').has_data()
> xlApp = response.fp
> print(
> print(
> xlApp.write
> xlApp.Close
> [/quote]

Whoops, hit Send when I wanted Preview.  Looks like the html [quote] tag
doesn't work from (nice).

Anyway, in test 1 above, I worked out how to instantiate an Excel
object, put some stuff in it, then save it to disk.

So, in theory, I'm retrieving my excel spreadsheet with

response = urllib2.urlopen()

Except what then do I do with this?

Well, for one, I read some of the urllib2 documentation and found the
Request class with the method has_data() on it.  It returns False.
Hmm, that's not encouraging.
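
A note on that False, as a hedged sketch rather than a definitive answer:
has_data() reports whether the Request object carries a body to *send*
(POST data), not whether the server sent anything back, so False is the
expected result for a plain GET.  The example below checks the same thing
through the `data` attribute, which exists in both Python 2's urllib2 and
Python 3's urllib.request.  The URL is a hypothetical stand-in (the real one
is not shown above), and building a Request does not touch the network.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3: same Request class, new home

# Hypothetical URL, used only to build Request objects (no network access).
url = "http://example.com/indexperf/excel?export=Excel_IEIPerfRegional"

# A plain GET request: nothing to send, so there is no request body.
get_request = urllib2.Request(url)

# A request with a body: urllib treats this as a POST.
post_request = urllib2.Request(url, data=b"priceLevel=0")

# This is the condition has_data() tested in Python 2.
get_has_body = get_request.data is not None
post_has_body = post_request.data is not None

get_method = get_request.get_method()
post_method = post_request.get_method()
```

So a False here says nothing about whether the download itself worked; that
has to be checked on the object urlopen returns.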

I suppose the trick is to understand what urllib2.urlopen is returning
to me, rummage around in there, and hopefully find my Excel file.
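
One possible approach, sketched under the assumption that the server simply
sends back the raw bytes of the .xls file: the object returned by urlopen is
file-like, so its contents can be copied straight into a file opened in
binary mode, without going through Excel at all.  The names save_response
and msci.xls are hypothetical, and an in-memory BytesIO stands in for the
real response so the example runs without network access.

```python
import io
import os
import shutil
import tempfile

def save_response(response, path):
    """Copy everything readable from a file-like response into path."""
    # 'wb' matters: a spreadsheet is binary data, and text mode on
    # Windows would translate line endings and corrupt it.
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)

# Stand-in for urllib2.urlopen(url): any object with a read() method works.
fake_response = io.BytesIO(b"stand-in bytes, not a real workbook")

target = os.path.join(tempfile.mkdtemp(), "msci.xls")
save_response(fake_response, target)

with open(target, "rb") as saved:
    saved_bytes = saved.read()
```

With a real download, the call would presumably look like
save_response(urllib2.urlopen(url), 'C:\\msci.xls'), after which the file
can be opened in Excel as usual.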

I use pdb to debug.  This is interesting:

(Pdb) dir(response)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
'readline', 'readlines', 'url']

I suppose the members with __*__ are methods, and the names without the
underbars are attributes (variables) (?).
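
A quick way to check that guess, offered as a sketch: dir() lists methods
and data attributes together, and the underscores only mark names Python
itself treats specially.  The built-in callable() tells the two apart.  The
helper name split_members is hypothetical, and a BytesIO object stands in
for the response so the example runs on its own.

```python
import io

def split_members(obj):
    """Return (methods, attributes) among the non-dunder names of obj."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("__"):
            continue                    # skip special names like __init__
        try:
            value = getattr(obj, name)
        except Exception:
            continue                    # some names can fail on access
        if callable(value):
            methods.append(name)        # can be called: a method
        else:
            attributes.append(name)     # plain data: an attribute
    return methods, attributes

# Stand-in object; at the pdb prompt one would pass `response` instead.
stand_in = io.BytesIO(b"data")
methods, attributes = split_members(stand_in)
```

On the response object this would be expected to put names such as read and
geturl among the methods and names such as code and url among the
attributes, though that depends on the Python version.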

Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff).  Would be happy to learn if
that's the case (and if that gets the job done for me).

