parse html:what is the meaning of "//"?
Stefan Behnel
stefan_ml at behnel.de
Fri Sep 16 07:02:06 EDT 2011
alias, 16.09.2011 08:39:
> code1:
> import lxml.html
> import urllib
> down='http://finance.yahoo.com/q/op?s=C+Options'
> content=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(content)
I see this quite often, but many people don't know that this can be
simplified to
import lxml.html
url = 'http://finance.yahoo.com/q/op?s=C+Options'
root = lxml.html.parse(url).getroot()
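which is less code and also substantially more efficient, because the
parser reads directly from the source instead of going through an
intermediate Python string.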
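A minimal sketch of the two parsing routes, using an in-memory file object
in place of the real URL so that it runs without network access (the
"page" bytes are made up for illustration). As far as I know,
lxml.html.parse() accepts a file name, a URL or a file-like object:

```python
import io
import lxml.html

# Made-up stand-in for the downloaded page.
page = b"<html><body><p>hello</p></body></html>"

# Route 1: read everything into a string first, then parse it.
root_a = lxml.html.document_fromstring(page)

# Route 2: let the parser read from the source itself.
# parse() returns an ElementTree, so getroot() gives the <html> element.
root_b = lxml.html.parse(io.BytesIO(page)).getroot()

print(root_a.tag, root_b.tag)
print(root_b.text_content())
```

Both routes should give an equivalent tree. Note that passing a URL
directly relies on libxml2's own loader, which to my knowledge handles
plain http but not https.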
> table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
> tds=table.xpath("tr[@valign='top']//td")
> for td in tds:
>     print td.text_content()
>
> what i get is :
> Call Options
> Expire at close Friday, September 16, 2011
> these are what i want.
>
> code2
> import lxml.html
> import urllib
> down='http://finance.yahoo.com/q/op?s=C+Options'
> content=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(content)
> table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
> tds=table.xpath("//tr[@valign='top']//td")
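Here, the leading "//" makes the expression absolute: it finds all "tr" tags
anywhere in the document, no matter which element you call xpath() on,
instead of taking just the ones that are direct children of the "table" tag.
That's what "//" is there for, it's a recursive descendant selector. To
search recursively below the table only, start the expression with ".//".
You might want to read up on XPath expressions.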
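The difference can be checked on a small inline document. This is only a
sketch: the HTML, the class names "title" and "data", and the helper
cell_texts() are made up for illustration and are not the real Yahoo page.

```python
import lxml.html

# Made-up document: two tables, only the first one is the one we want.
HTML = """
<html><body>
  <table class="title">
    <tr valign="top"><td>Call Options</td><td>Expire at close Friday</td></tr>
  </table>
  <table class="data">
    <tr valign="top"><td>48.00</td><td>N/A</td></tr>
  </table>
</body></html>
"""

root = lxml.html.document_fromstring(HTML)
table = root.xpath("//table[@class='title']")[0]

def cell_texts(expr):
    """Evaluate expr with the selected table as context node."""
    return [td.text_content() for td in table.xpath(expr)]

# Relative path: only "tr" elements that are direct children of the table.
children_only = cell_texts("tr[@valign='top']//td")

# Leading "//": starts at the document root, so the other table matches too.
whole_document = cell_texts("//tr[@valign='top']//td")

# Leading ".//": recursive, but restricted to the table's own subtree.
table_subtree = cell_texts(".//tr[@valign='top']//td")

print(children_only)
print(whole_document)
print(table_subtree)
```

The first and third lists contain only the cells of the selected table,
while the second one also picks up the cells of the other table, which
matches the extra rows in your output.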
> what i get is :
> N/A
> N/A
> 2
> 114
> 48.00
> C110917P00048000
> 16.75
> 0.00
> N/A
> N/A
> 0
> 23
> 50.00
> C110917P00050000
> 23.16
> 0.00
> N/A
> N/A
> 115
> 2,411
>
>
> Highlighted options are in-the-money.
I don't see any highlighting in your text above, and I don't know what you
mean by "in-the-money".
Stefan