Financial time series data

Frederic Rentsch anthra.norell at bluewin.ch
Fri Sep 3 19:48:13 CEST 2010


On Fri, 2010-09-03 at 16:48 +0200, Virgil Stokes wrote:
> On 03-Sep-2010 15:45, Frederic Rentsch wrote:
> > On Fri, 2010-09-03 at 13:29 +0200, Virgil Stokes wrote:
> >> A more direct question on accessing stock information from Yahoo.
> >>
> >> First, use your browser to go to:
> >> http://finance.yahoo.com/q/cp?s=%5EGSPC+Components
> >>
> >> Now, you see the first 50 rows of a 500 row table of information on
> >> the S&P 500 index. You can left-click on
> >>
> >>    1 -50 of 500 |First|Previous|Next|Last
> >>
> >> below the table to position to any of the 10 pages.
> >>
> >> I would like to use Python to do the following.
> >>
> >> Loop on each of the 10 pages and for each page extract information for
> >> each row --- How can this be accomplished automatically in Python?
> >>
> >> Let's take the first page (as shown by default). It is easy to see the
> >> link to the data for "A" is http://finance.yahoo.com/q?s=A. That is, I
> >> can just move my cursor over the "A" and I see this URL in the message
> >> at the bottom of my browser (Explorer 8). If I left-click on "A" then I
> >> will go to this link --- Do this!
> >>
> >> You should now see a table which shows information on this stock and
> >> this is the information that I would like to extract. I would like to
> >> do this for all 500 stocks without the need to enter the symbols for
> >> them (e.g. "A", "AA", etc.). It seems clear that this should be
> >> possible since all the symbols are in the first column of each of the
> >> 10 tables --- but it is not at all clear how to extract these
> >> automatically in Python.
> >>
> >> Hopefully, you understand my problem. Again, I would like Python to
> >> cycle through these 10 pages and extract this information for each
> >> symbol in this table.
> >>
> >> --V
> >>
> >>
> >>
> > Here's a quick hack to get the SP500 symbols from the visual page with
> > the index letters. From this collection you can then order fifty at a
> > time from the download facility. (If you get a better idea from Yahoo,
> > you'll post it of course.)
> >
> >
> >
> > def get_SP500_symbols ():
> >     import urllib
> >     symbols = []
> >     url = 'http://finance.yahoo.com/q/cp?s=^GSPC&alpha=%c'
> >     for c in [chr(n) for n in range (ord ('A'), ord ('Z') + 1)]:
> >         print url % c
> >         f = urllib.urlopen (url % c)
> >         html = f.readlines ()
> >         f.close ()
> >         for line in html:
> >             if line.lstrip ().startswith ('</script><span id="yfs_params_vcr"'):
> >                 line_split = line.split (':')
> >                 s = [item.strip ().upper () for item in line_split [5].replace ('"', '').split (',')]
> >                 symbols.extend (s [:-3])
> >
> >     return symbols
> >     # Not quite 500 (!?)
> >
> >
> > Frederic
> >
> >
> >
> I made a few modifications --- very minor. But, I believe that it is a little 
> faster.
> 
> import urllib2
> 
> def get_SP500_symbolsX ():
>     symbols = []
>     for page in range(10):  # 10 pages of 50 rows each, c=0 through c=9
>         url = 'http://finance.yahoo.com/q/cp?s=%5EGSPC&c='+str(page)
>         print url
>         f = urllib2.urlopen (url)
>         html = f.readlines ()
>         f.close ()
>         for line in html:
>             if line.lstrip ().startswith ('</script><span id="yfs_params_vcr"'):
>                 line_split = line.split (':')
>                 s = [item.strip ().upper () for item in line_split [5].replace ('"','').split (',')]
>                 symbols.extend (s [:-3])
> 
>     return symbols
>     # Not quite 500 -- which is correct (for example p. 2 has only 49 symbols!)
>     # Actually the S&P 500 as shown does not contain 500 stocks (symbols)
> 
> 
> symbols = get_SP500_symbolsX()
> pass

Oh, yes, and there's no use reading lines to the end once the symbols
are in the bag. The symbol-line-finder conditional section should end
with "break".
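
A minimal sketch of what that could look like, assuming the page has
already been read into a list (or any iterable) of lines. The function
name extract_symbols and the sample line are made up for illustration.
The field index 5 and the three trailing non-symbol items are simply
carried over from the code quoted above; they are not a documented
Yahoo format.

```python
MARKER = '</script><span id="yfs_params_vcr"'

def extract_symbols(lines):
    """Return the symbols from the first marker line, then stop reading.

    Hypothetical helper: the layout assumptions (field 5, last three
    items are not symbols) come from the quoted code.
    """
    symbols = []
    for line in lines:
        if line.lstrip().startswith(MARKER):
            fields = line.split(':')
            items = fields[5].replace('"', '').split(',')
            symbols = [item.strip().upper() for item in items][:-3]
            break  # symbols are in the bag, skip the rest of the page
    return symbols

# Made-up sample page; the real line is much longer.
sample_page = [
    '<html>',
    '  </script><span id="yfs_params_vcr">f0:f1:f2:f3:f4:"a","aa","aapl","x","y","z":tail',
    '<p>rest of page</p>',
]
print(extract_symbols(sample_page))
```

Because of the break, the loop never touches the lines after the marker
line. With a file-like object instead of a list, the rest of the page
would not even be iterated over.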
   And do let us know if you get an answer from Yahoo. Hacks like this
are unreliable. They will almost certainly fail the next time the page
gets redesigned, which can happen at any time.

Frederic
 



