Download excel file from web?

Mon Jul 28 18:29:09 EDT 2008

patf at well.com wrote:
> On Jul 28, 3:00 pm, "p... at well.com" <p... at well.com> wrote:
>> Hi - experienced programmer but this is my first Python program.
>>
>> This URL will retrieve an excel spreadsheet containing (that day's)
>> msci stock index returns.
>>
>> http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
>>
>> Want to write python to download and save the file.
>>
>> So far I've arrived at this:
>>
>> [quote]
>> # import pdb
>> import urllib2
>> from win32com.client import Dispatch
>>
>> xlApp = Dispatch("Excel.Application")
>>
>> # test 1
>> # xlApp.Workbooks.Add()
>> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
>> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
>> # xlBook = xlApp.ActiveWorkbook
>> # xlBook.SaveAs(Filename='C:\\test.xls')
>>
>> # pdb.set_trace()
>> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
>> # test 2 - returns check = False
>> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>>
>> xlApp = response.fp
>> print(response.fp.name)
>> print(xlApp.name)
>> xlApp.write
>> xlApp.Close
>> [/quote]
> 
> Whoops, hit Send when I wanted Preview.  Looks like the html [quote] tag
> doesn't work from groups.google.com (nice).
> 
> Anyway, in test 1 above, I determined how to instantiate an excel
> object; put some stuff in it; then save to disk.
> 
> So, in theory, I'm retrieving my excel spreadsheet with
> 
> response = urllib2.urlopen()
> 
> Except what then do I do with this?
> 
> Well, for one, I read some of the urllib2 documentation and found the
> Request class with the method has_data() on it.  It returns False.
> Hmm, that's not encouraging.
> 
> I suppose the trick is to understand what urllib2.urlopen is returning
> to me; rummage around in there; and hopefully find my excel file.
> 
> I use pdb to debug.  This is interesting:
> 
> (Pdb) dir(response)
> ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
> 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
> 'read', 'readline', 'readlines', 'url']
> (Pdb)
> 
> I suppose the members with __*__ are methods; and the names without the
> underscores are attributes (variables) (?).

No, dir() lists the names of all attributes, and methods are attributes
too. The underscores say nothing about which is which: read is a method,
for example.
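
If you want to see which of those names are methods, callable() will
tell you. A rough sketch, written for Python 3; split_members is a name
made up for illustration, and io.BytesIO only stands in for the response
object, since both are file-like:

```python
import io


def split_members(obj):
    """Sort the public names from dir(obj) into methods and plain attributes."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith('_'):
            continue  # skip private and double-underscore names
        if callable(getattr(obj, name)):
            methods.append(name)
        else:
            attributes.append(name)
    return methods, attributes


# io.BytesIO is a stand-in for the object returned by urlopen()
methods, attributes = split_members(io.BytesIO(b'spam'))
print('methods:   ', methods)
print('attributes:', attributes)
```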

> Or maybe this isn't at all the right direction to take (maybe there
> are much better modules to do this stuff).  Would be happy to learn if
> that's the case (and if that gets the job done for me).

The docs (http://docs.python.org/lib/module-urllib2.html) are pretty 
clear on this:

"""
This function returns a file-like object with two additional methods:
"""
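
The two extra methods the docs go on to list are geturl() and info().
They are handy for checking what actually came back before you save it.
A small sketch, assuming Python 3 (urllib.request instead of urllib2)
and using a data: URL as a stand-in for the real one so it runs offline:

```python
import urllib.request

# Stand-in URL (an assumption for the demo): four bytes served with an Excel MIME type
url = 'data:application/vnd.ms-excel;base64,AAEC/w=='

response = urllib.request.urlopen(url)
try:
    final_url = response.geturl()                      # URL actually retrieved
    content_type = response.info().get_content_type()  # taken from the headers
    body = response.read()                             # the raw bytes
finally:
    response.close()

print(content_type, len(body))
```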

And then for file-like objects:

http://docs.python.org/lib/bltin-file-objects.html

"""
read([size])
     Read at most size bytes from the file (less if the read hits EOF 
before obtaining size bytes). If the size argument is negative or 
omitted, read all data until EOF is reached. The bytes are returned as a 
string object. An empty string is returned when EOF is encountered 
immediately. (For certain files, like ttys, it makes sense to continue 
reading after an EOF is hit.) Note that this method may call the 
underlying C function fread() more than once in an effort to acquire as 
close to size bytes as possible. Also note that when in non-blocking 
mode, less data than what was requested may be returned, even if no size 
parameter was given.
"""
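
So read() hands you the raw bytes, and all that is left is writing them
to disk; Excel and win32com are not needed for that. A minimal sketch,
assuming Python 3 (where urllib2 became urllib.request); download() is a
made-up helper, and the data: URL is a stand-in so the example runs
without the network:

```python
import os
import shutil
import tempfile
import urllib.request


def download(url, filename):
    """Fetch url and write the response bytes to filename unchanged."""
    response = urllib.request.urlopen(url)
    try:
        # 'wb' matters on Windows: text mode would corrupt a binary file
        with open(filename, 'wb') as out:
            shutil.copyfileobj(response, out)
    finally:
        response.close()
    return filename


# Demo with a stand-in URL (an assumption): four bytes, base64-encoded
target = os.path.join(tempfile.mkdtemp(), 'demo.bin')
download('data:application/octet-stream;base64,AAEC/w==', target)
with open(target, 'rb') as f:
    saved = f.read()
```

With the real URL you would call download(url, 'C:\\test.xls') instead.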

Diez