Download excel file from web?

Guilherme Polo ggpolo at gmail.com
Mon Jul 28 18:52:36 EDT 2008


On Mon, Jul 28, 2008 at 7:43 PM, patf at well.com <patf at well.com> wrote:
> On Jul 28, 3:33 pm, "p... at well.com" <p... at well.com> wrote:
>> On Jul 28, 3:29 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>>
>> > p... at well.com schrieb:
>>
>> > > On Jul 28, 3:00 pm, "p... at well.com" <p... at well.com> wrote:
>> > >> Hi - experienced programmer but this is my first Python program.
>>
>> > >> This URL will retrieve an Excel spreadsheet containing (that day's)
>> > >> MSCI stock index returns.
>>
>> > >> http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
>>
>> > >> I want to write Python to download and save the file.
>>
>> > >> So far I've arrived at this:
>>
>> > >> [quote]
>> > >> # import pdb
>> > >> import urllib2
>> > >> from win32com.client import Dispatch
>>
>> > >> xlApp = Dispatch("Excel.Application")
>>
>> > >> # test 1
>> > >> # xlApp.Workbooks.Add()
>> > >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
>> > >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
>> > >> # xlBook = xlApp.ActiveWorkbook
>> > >> # xlBook.SaveAs(Filename='C:\\test.xls')
>>
>> > >> # pdb.set_trace()
>> > >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
>> > >> # test 2 - returns check = False
>> > >> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>>
>> > >> xlApp = response.fp
>> > >> print(response.fp.name)
>> > >> print(xlApp.name)
>> > >> xlApp.write
>> > >> xlApp.Close
>> > >> [/quote]
>>
>> > > Whoops, hit Send when I wanted Preview.  Looks like the html [quote] tag
>> > > doesn't work from groups.google.com (nice).
>>
>> > > Anyway, in test 1 above, I determined how to instantiate an Excel
>> > > object, put some stuff in it, then save it to disk.
>>
>> > > So, in theory, I'm retrieving my excel spreadsheet with
>>
>> > > response = urllib2.urlopen()
>>
>> > > Except what then do I do with this?
>>
>> > > Well, for one, I read some of the urllib2 documentation and found the
>> > > Request class with the method has_data() on it.  It returns False.
>> > > Hmm, that's not encouraging.
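A note on has_data(): in urllib2 it only reports whether the Request carries POST data (it returns data is not None), so False just means this is a plain GET; it says nothing about what the server sends back. A small sketch using Python 3's urllib.request, urllib2's successor, where has_data() was removed in favour of the data attribute. The example.com URL is a placeholder, and building a Request does not contact the server:

```python
from urllib.request import Request

# A plain GET, like the one in the original script: no request body.
get_req = Request("http://example.com/report.xls")
# A POST carries a body, which is all that urllib2's has_data() checked for.
post_req = Request("http://example.com/report.xls", data=b"scope=0")

print(get_req.data is None, get_req.get_method())    # True GET
print(post_req.data is None, post_req.get_method())  # False POST
```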
>>
>> > > I suppose the trick is to understand what urllib2.urlopen is returning
>> > > to me, rummage around in there, and hopefully find my Excel file.
>>
>> > > I use pdb to debug.  This is interesting:
>>
>> > > (Pdb) dir(response)
>> > > ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
>> > > 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
>> > > 'read', 'readline', 'readlines', 'url']
>> > > (Pdb)
>>
>> > > I suppose the members with __*__ are methods, and the names without the
>> > > underbars are attributes (variables) (?).
>>
>> > No, these are the names of all attributes and methods. read is a method,
>> > for example.
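To tell methods from plain data attributes, callable() is enough. The sketch below uses Python 3's urllib.request.urlopen on a local file:// URL as a stand-in for the MSCI download so it runs offline; the temporary file and its contents are placeholders. It also shows why response.read on its own gives you the method object, while response.read() gives you the bytes:

```python
import os
import pathlib
import tempfile
import urllib.request

# Placeholder "download": a small local file served through a file:// URL.
fd, path = tempfile.mkstemp(suffix=".xls")
with os.fdopen(fd, "wb") as f:
    f.write(b"placeholder spreadsheet bytes")

response = urllib.request.urlopen(pathlib.Path(path).as_uri())

# Methods are callable; data attributes such as url are not.
for name in ("read", "geturl", "info", "url"):
    kind = "method" if callable(getattr(response, name)) else "attribute"
    print(name, "->", kind)

print(response.read)    # the method object itself; nothing has been read yet
data = response.read()  # calling it returns the body as bytes
print(type(data), len(data))

response.close()
os.remove(path)
```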
>>
>> right - I got it backwards.
>>
>> > > Or maybe this isn't at all the right direction to take (maybe there
>> > > are much better modules to do this stuff).  Would be happy to learn if
>> > > that's the case (and if that gets the job done for me).
>>
>> > The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
>> > clear on this:
>>
>> > """
>> > This function returns a file-like object with two additional methods:
>> > """
>>
>> > And then for file-like objects:
>>
>> > http://docs.python.org/lib/bltin-file-objects.html
>>
>> > """
>> > read([size])
>> >      Read at most size bytes from the file (less if the read hits EOF
>> > before obtaining size bytes). If the size argument is negative or
>> > omitted, read all data until EOF is reached. The bytes are returned as a
>> > string object. An empty string is returned when EOF is encountered
>> > immediately. (For certain files, like ttys, it makes sense to continue
>> > reading after an EOF is hit.) Note that this method may call the
>> > underlying C function fread() more than once in an effort to acquire as
>> > close to size bytes as possible. Also note that when in non-blocking
>> > mode, less data than what was requested may be returned, even if no size
>> > parameter was given.
>> > """
>>
>> > Diez
>>
>> Just stumbled upon .read:
>>
>> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
>>
>> Now the question is: what to do with this?  I'll look at the
>> documentation that you point to.
>>
>> thanx - pat
>
> Or rather (next iteration):
>
> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
>
> The file is generally something like 26 KB so specifying 1,000,000
> seems like a good idea (first approximation).
>
> And then when I do:
>
> print(response)
>
> I get a whole lot of garbage (and some non-garbage), so I know I'm
> onto something.
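The "garbage" is expected: a spreadsheet is binary, so printing it shows raw bytes, and those bytes already are the file. If you want a sanity check before saving, the leading bytes identify the container: legacy .xls workbooks are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, and .xlsx workbooks are ZIP archives starting with PK\x03\x04. A small sketch; sniff_excel is a hypothetical helper, not a library function, and a server that exports HTML or CSV under an .xls name would show up as "unknown":

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (compound file)
ZIP_MAGIC = b"PK\x03\x04"                      # any ZIP, which includes .xlsx

def sniff_excel(data):
    """Hypothetical helper: guess the container format from leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"

print(sniff_excel(OLE2_MAGIC + b"\x00" * 8))                     # xls
print(sniff_excel(b"<html><body>not a workbook</body></html>"))  # unknown
```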
>
> When I read the .read documentation further, it says that read() has
> returned the data as a string object.  Now - how do I convince Python
> that the string object is in fact an excel file - and save it to disk?
>

You don't need to convince Python, just write it to a file.
More reading for you: http://docs.python.org/tut/node9.html
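Concretely, that means opening a file in binary mode and writing the bytes unchanged; Excel (and win32com) is not needed for the download at all. A minimal Python 3 sketch with urllib.request standing in for urllib2 (in Python 2 the same idea is urllib2.urlopen(url).read() written to a file opened with 'wb'). The function name download, the example URL and the output filename are placeholders; the full MSCI export URL from the thread would go in their place. Binary mode matters: a Python 3 text-mode file refuses bytes, and on Windows text mode would translate line endings and corrupt the workbook.

```python
import shutil
import urllib.request

def download(url, filename):
    """Save the body of url to filename byte-for-byte (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        # Copies in chunks until EOF, so no size guess is needed.
        shutil.copyfileobj(response, out)

# Usage (placeholder values; substitute the full MSCI export URL):
# download("http://example.com/report.xls", r"C:\msci.xls")
```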

> pat



-- 
-- Guilherme H. Polo Goncalves


