Why can't I parse the web page with an almost identical XPath expression?
Piet van Oostrum
piet at vanoostrum.org
Wed Mar 6 21:13:17 EST 2013
python <mailtomanage at 163.com> writes:
> import urllib
> import lxml.html
> down='http://v.163.com/special/visualizingdata/'
> file=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(file)
> urllist=root.xpath('//div[@class="down s-fc3 f-fl"]//a')
> for url in urllist:
>     print url.get("href")
>
> I get the output:
> http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4
>
> When I change
>
> xpath('//div[@class="down s-fc3 f-fl"]//a')
>
> into
>
> xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> that is to say,
>
> urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> why do I get nothing?
There is only one <div class="col f-cb"> in the document and that div
contains only a single <div class="down s-fc3 f-fl"> but the latter does
not contain any <a>. The URLs that you get in the first code are not
contained in a <div class="col f-cb">. They are contained in a <div
class="m-tdli">, however.
So xpath('//div[@class="m-tdli"]//div[@class="down s-fc3 f-fl"]//a') works.
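
A minimal sketch of the situation, in case it helps to see it in isolation. The SAMPLE markup below is a hypothetical, heavily simplified stand-in for the real page (one "col f-cb" div whose inner "down s-fc3 f-fl" div has no link, and one "m-tdli" div whose inner div does), and the helper name hrefs is mine. To keep it runnable without extra packages it uses the standard library's xml.etree.ElementTree rather than lxml; with lxml the same expressions would be passed to root.xpath() with a leading '//' instead of './/'.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page structure described above;
# the real page is much larger.
SAMPLE = """
<html><body>
  <div class="col f-cb">
    <div class="down s-fc3 f-fl">no link in here</div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl">
      <a href="http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4">download</a>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)


def hrefs(path):
    """Return the href of every element matched by the given path."""
    return [a.get("href") for a in root.findall(path)]


down = 'div[@class="down s-fc3 f-fl"]'

# No ancestor constraint: finds the link wherever the "down" div sits.
unscoped = hrefs('.//' + down + '//a')

# Constrained to "col f-cb": that div's "down" child holds no <a>.
via_col = hrefs('.//div[@class="col f-cb"]//' + down + '//a')

# Constrained to "m-tdli": this is the div that actually wraps the link.
via_tdli = hrefs('.//div[@class="m-tdli"]//' + down + '//a')

print(len(unscoped), len(via_col), len(via_tdli))
```

Adding an ancestor step to an XPath expression can only narrow the result, so when the narrowed query comes back empty, the thing to check is whether the links really live under that ancestor.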
--
Piet van Oostrum <piet at vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]