Mailman 3 parse problem - lxml - The Python XML Toolkit

Jan. 18, 2012

Here is my code:

import urllib.request
import lxml.html

# Listing page to scrape
down = "http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/e..."
page = urllib.request.urlopen(down).read()
root = lxml.html.document_fromstring(page)

# Rows with class "tr_normal" that contain an <img> somewhere inside
data1 = root.xpath('//tr[@class="tr_normal" and .//img]')
print("the row which contains img  :")
for u in data1:
    print(u.text_content())

# Rows with class "tr_normal" that contain no <img>
data2 = root.xpath('//tr[@class="tr_normal" and not(.//img)]')
print("the row which do not contain img  :")
for u in data2:
    print(u.text_content())

The output is (many lines omitted):

the row which contains img  :
00329
the row which do not contain img  :
00001长江实业1,000#HOF
................many lines omitted
00327百富环球1,000#H
00328ALCO HOLDINGS2,000#

I wonder why so many rows are missing from my output. For example, these rows appear on the page
http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/e...
but not in my results:

00330    思捷环球     100    #    H    O    F
00331    春天百货     2,000    #    H
00332    NGAI LIK IND     4,000    #
................many lines omitted

How can I get these rows?
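
One way to narrow this down, offered as a minimal diagnostic sketch and not as a confirmed fix: count every tr the parser produced, and dump the markup of the row that contains the img. If the total number of tr elements is already short, the rows were lost while parsing, not filtered out by the XPath. The sketch is written for Python 3; SAMPLE and summarize_rows are hypothetical names, and the inline HTML is a made-up stand-in, not the real page.

```python
import lxml.html

# Hypothetical stand-in for the real page: three rows, one containing an <img>.
SAMPLE = """
<table>
  <tr class="tr_normal"><td>00001</td><td>A</td></tr>
  <tr class="tr_normal"><td>00329</td><td><img src="x.gif"></td></tr>
  <tr class="tr_normal"><td>00330</td><td>B</td></tr>
</table>
"""


def summarize_rows(html):
    """Count the rows the parser produced and capture the markup of rows with an <img>."""
    root = lxml.html.document_fromstring(html)
    with_img = root.xpath('//tr[@class="tr_normal" and .//img]')
    return {
        # every <tr> in the parsed tree, regardless of class
        "all_rows": len(root.xpath('//tr')),
        # rows carrying the class both queries depend on
        "normal_rows": len(root.xpath('//tr[@class="tr_normal"]')),
        "with_img": len(with_img),
        "without_img": len(root.xpath('//tr[@class="tr_normal" and not(.//img)]')),
        # serialized markup of the <img> rows, to see what the parser put inside them
        "img_markup": [lxml.html.tostring(r, encoding="unicode") for r in with_img],
    }


summary = summarize_rows(SAMPLE)
for key, value in summary.items():
    print(key, value)
```

To run this against the real page, pass the downloaded bytes to summarize_rows in place of SAMPLE. If all_rows is much smaller than the number of rows visible in a browser, the markup around row 00329 would be the first place to look.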
