> I am having issues with the urllib and lxml.html modules.
> Here is my original code:
>
> import urllib
> import lxml.html
> down = ''
> file = urllib.urlopen(down).read()
> root = lxml.html.document_fromstring(file)
> xpath_str = "//div[@class='down s-fc3 f-fl']/a"
> urllist = root.xpath(xpath_str)
> for url in urllist:
>     print url.get("href")
> When run, it returns this output:
> http://
> http://
> http://
> http://
> http://
> http://
> But, when I change the line
> xpath_str='//div[@class="down s-fc3 f-fl"]//a'
> into
> xpath_str='//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'
> that is to say,
> urllist = root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
> I do not receive any output. What is the flaw in this code?
> It is strange that the shorter one works and the longer one does
> not, since they have the same XPath structure!

Are you sure this is related to Python? It looks like you just have an issue parsing the HTML.

I know little about what you're trying to do, but:

1/ you're shadowing the built-in 'file' type by using it as a variable name
2/ your selector is probably wrong: '@class="col f-cb"' is an exact string comparison, so it will fail if, in the document, the div class is "col  f-cb" (2 spaces), "f-cb col", "col f-cb extra", etc.
3/ your short selector returns all matching elements without regard for the parent, hence it is not sensitive to issue 2/
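
To illustrate point 2/, here is a minimal sketch of matching individual class names instead of comparing the whole attribute string. The markup in SAMPLE, the example.com URLs and the helper names class_token and find_hrefs are made up for the illustration; I have not seen your real page, so treat this as an assumption about what it looks like.

```python
import lxml.html


def class_token(name):
    """Build an XPath 1.0 predicate matching one whitespace-separated class name."""
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % name


# Hypothetical markup: same class names as in the thread, but reordered
# and with an extra space, which defeats an exact @class="..." comparison.
SAMPLE = """
<div class="f-cb  col">
  <div class="down f-fl s-fc3"><a href="http://example.com/a.mp4">a</a></div>
</div>
<div class="other">
  <div class="down s-fc3 f-fl"><a href="http://example.com/b.mp4">b</a></div>
</div>
"""

# The exact-match selector from the question.
STRICT = '//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'


def find_hrefs(html_text):
    """Return hrefs of links under a 'down' div nested in a 'col f-cb' div."""
    root = lxml.html.document_fromstring(html_text)
    outer = "//div[%s and %s]" % (class_token("col"), class_token("f-cb"))
    inner = "//div[%s]//a" % class_token("down")
    return [a.get("href") for a in root.xpath(outer + inner)]


root = lxml.html.document_fromstring(SAMPLE)
print(len(root.xpath(STRICT)))  # exact string comparison finds nothing here
print(find_hrefs(SAMPLE))       # token-based matching finds the nested link
```

The concat/normalize-space trick pads the attribute with spaces so that ' col ' can only match a complete class name, whatever the order or spacing in the document.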

How to get all .mp4 links:

hrefList = root.xpath('//a[@href]')
mp4List = [ref for ref in hrefList if '.mp4' in ref.get('href', '')]

From this list you can access each element's parent and children.

for mp4 in mp4List:
    print(mp4.get('href'))
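
As a rough sketch of reaching the parent and children of each matched link, something like the following should work. SAMPLE, its example.com URLs and the function name describe_mp4_links are invented for the example; getparent() and iterating over an element's children are standard lxml element operations.

```python
import lxml.html

# Hypothetical markup standing in for the real page.
SAMPLE = """
<div class="down s-fc3 f-fl">
  <a href="http://example.com/video.mp4"><span>Download</span></a>
  <a href="http://example.com/page.html">Other</a>
</div>
"""


def describe_mp4_links(html_text):
    """Collect each .mp4 link together with details of its parent and children."""
    root = lxml.html.document_fromstring(html_text)
    results = []
    for a in root.xpath('//a[@href]'):
        if '.mp4' not in a.get('href', ''):
            continue
        parent = a.getparent()  # the element that contains the link
        results.append({
            'href': a.get('href'),
            'parent_tag': parent.tag,
            'parent_class': parent.get('class'),
            'child_tags': [child.tag for child in a],  # direct children of <a>
        })
    return results


for info in describe_mp4_links(SAMPLE):
    print(info)
```

Checking the parent's class in Python this way is an alternative to encoding the whole ancestry in one XPath expression.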




