why can not parse the web in almost same xpath expression?

Piet van Oostrum piet at vanoostrum.org
Thu Mar 7 03:13:17 CET 2013

python <mailtomanage at 163.com> writes:

>     import urllib
>     import lxml.html
>     down='http://v.163.com/special/visualizingdata/'
>     file=urllib.urlopen(down).read()
>     root=lxml.html.document_fromstring(file)
>     urllist=root.xpath('//div[@class="down s-fc3 f-fl"]//a') 
>     for url in urllist:
>          print url.get("href")
> I get this output:
> http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4  
> When I change
>     xpath('//div[@class="down s-fc3 f-fl"]//a')
> into
>     xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
> that is to say,
>     urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
> why do I get nothing?

There is only one <div class="col f-cb"> in the document, and that div
contains only a single <div class="down s-fc3 f-fl">, but the latter does
not contain any <a>, so your second expression matches nothing. The URLs
that you get from the first snippet are not contained in a
<div class="col f-cb">. They are contained in a <div class="m-tdli">,
however.
So xpath('//div[@class="m-tdli"]//div[@class="down s-fc3 f-fl"]//a') works.
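
The effect can be reproduced offline with a small sketch. The markup below is
a made-up, heavily simplified stand-in for the page (the real page is much
larger, and the example.invalid URLs are placeholders). It is written for
Python 3 and uses the standard library's xml.etree.ElementTree in place of
lxml so that it runs without extra packages; ElementTree only accepts relative
paths, hence the leading "." in each expression. The helper name hrefs is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mirroring the structure described above:
# the single "col f-cb" div holds a "down" div with no link, while the
# links sit inside "m-tdli" divs.
PAGE = """
<html><body>
  <div class="col f-cb">
    <div class="down s-fc3 f-fl">no link in here</div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl"><a href="http://example.invalid/a.mp4">get</a></div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl"><a href="http://example.invalid/b.mp4">get</a></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

def hrefs(path):
    """Return the href of every <a> element matched by the given path."""
    return [a.get("href") for a in root.findall(path)]

DOWN = "div[@class='down s-fc3 f-fl']"

# No ancestor constraint: finds the links wherever the "down" div sits.
all_links = hrefs(".//" + DOWN + "//a")

# Constrained to "col f-cb": that div's "down" child has no <a>, so no match.
col_links = hrefs(".//div[@class='col f-cb']//" + DOWN + "//a")

# Constrained to "m-tdli": this is where the links actually live.
tdli_links = hrefs(".//div[@class='m-tdli']//" + DOWN + "//a")

print(all_links)
print(col_links)
print(tdli_links)
```

With lxml the same three expressions can be passed to root.xpath() without
the leading ".", as in the original code.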

Piet van Oostrum <piet at vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
