parse html to get tr content

contro opinion contropinion at gmail.com
Sat Sep 24 06:49:10 CEST 2011


Here is my code:
import lxml.html
sfile = 'http://finance.yahoo.com/q/op?s=A+Options'
root = lxml.html.parse(sfile).getroot()
# Outer table that wraps the options data
t = root.xpath("//table[@class='yfnc_datamodoutline1']")[0]
# Every <tr> descendant of that table, at any nesting depth
trs = t.xpath(".//tr")
for i, tr in enumerate(trs):
    # Row index, number of child elements, concatenated text of the row
    print(i, len(tr), tr.text_content())

The output is:
0 1 StrikeSymbolLastChgBidAskVolOpen Int25.00A111022C0002500010.70
0.007.007.4531528.00A111022C000280004.35
0.004.654.7520121029.00A111022C000290003.80
0.003.954.0542642530.00A111022C000300003.110.013.253.35559731.00A111022C000310002.700.162.662.71740732.00A111022C000320002.110.082.122.17236433.00A111022C000330001.870.311.651.702956834.00A111022C000340001.360.151.261.302664935.00A111022C000350000.960.040.940.984547736.00A111022C000360000.720.120.690.724378637.00A111022C000370000.510.030.490.52511,43538.00A111022C000380000.35
0.000.340.354429339.00A111022C000390000.16
0.000.220.26914940.00A111022C000400000.180.030.150.185330141.00A111022C000410000.33
0.000.100.14218442.00A111022C000420000.08
0.000.060.10314745.00A111022C000450000.10 0.000.010.05200243
1 8 StrikeSymbolLastChgBidAskVolOpen Int
2 8 25.00A111022C0002500010.70 0.007.007.45315
3 8 28.00A111022C000280004.35 0.004.654.75201210
4 8 29.00A111022C000290003.80 0.003.954.05426425
5 8 30.00A111022C000300003.110.013.253.355597
6 8 31.00A111022C000310002.700.162.662.717407
7 8 32.00A111022C000320002.110.082.122.172364
8 8 33.00A111022C000330001.870.311.651.7029568
9 8 34.00A111022C000340001.360.151.261.3026649
10 8 35.00A111022C000350000.960.040.940.9845477
11 8 36.00A111022C000360000.720.120.690.7243786
12 8 37.00A111022C000370000.510.030.490.52511,435
13 8 38.00A111022C000380000.35 0.000.340.3544293
14 8 39.00A111022C000390000.16 0.000.220.269149
15 8 40.00A111022C000400000.180.030.150.1853301
16 8 41.00A111022C000410000.33 0.000.100.142184
17 8 42.00A111022C000420000.08 0.000.060.103147
18 8 45.00A111022C000450000.10 0.000.010.05200243

I want to know why, for i=0, the value of tr.text_content() is:
StrikeSymbolLastChgBidAskVolOpen Int25.00A111022C0002500010.70
0.007.007.4531528.00A111022C000280004.35
0.004.654.7520121029.00A111022C000290003.80
0.003.954.0542642530.00A111022C000300003.110.013.253.35559731.00A111022C000310002.700.162.662.71740732.00A111022C000320002.110.082.122.17236433.00A111022C000330001.870.311.651.702956834.00A111022C000340001.360.151.261.302664935.00A111022C000350000.960.040.940.984547736.00A111022C000360000.720.120.690.724378637.00A111022C000370000.510.030.490.52511,43538.00A111022C000380000.35
0.000.340.354429339.00A111022C000390000.16
0.000.220.26914940.00A111022C000400000.180.030.150.185330141.00A111022C000410000.33
0.000.100.14218442.00A111022C000420000.08
0.000.060.10314745.00A111022C000450000.10 0.000.010.05200243
I find this strange and hard to understand.
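
One possible explanation, which I have not confirmed against the live page: the outer table may contain a single row whose only cell wraps an inner table holding the real data rows. That would match the output, where row 0 has one child and carries the text of every other row, because ".//tr" also matches the outer row and text_content() concatenates the text of all descendants. The sketch below reproduces the effect on a made-up miniature page. SAMPLE and its contents are hypothetical stand-ins, not the actual Yahoo markup.

```python
import lxml.html

# Hypothetical miniature of the page (an assumption, not the real markup):
# the outer table has one row whose single cell wraps an inner table.
SAMPLE = """
<html><body>
<table class="yfnc_datamodoutline1">
  <tr><td>
    <table>
      <tr><th>Strike</th><th>Symbol</th></tr>
      <tr><td>25.00</td><td>A111022C00025000</td></tr>
    </table>
  </td></tr>
</table>
</body></html>
"""

root = lxml.html.document_fromstring(SAMPLE)
outer = root.xpath("//table[@class='yfnc_datamodoutline1']")[0]

# ".//tr" matches every descendant row, including the outer wrapper row.
all_rows = outer.xpath(".//tr")

# Keep only rows that do not themselves contain a nested table.
data_rows = outer.xpath(".//tr[not(.//table)]")

# Read each cell separately so the values do not run together.
cells = [[cell.text_content().strip() for cell in row] for row in data_rows]

for i, row in enumerate(cells):
    print(i, len(row), row)
```

If the real page is structured this way, filtering with "not(.//table)" drops the wrapper row, and reading the cells one by one avoids the run-together text seen in the output above.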