how to get the source of html in lxml?

Dave Angel d at
Mon Dec 31 07:51:47 CET 2012

On 12/31/2012 01:32 AM, contro opinion wrote:
> import urllibimport lxml.html
> down=''
> file=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(file)
> body=root.xpath('//div[@class="articalContent  "]')[0]print body.text_content()
> When I run the code, what I get is the text content. How can I get the HTML
> source code of it?

That's got several syntax errors, but if you remove the parts with
errors, you'll find the HTML source of the page in the misnamed
variable 'file'. The read() method returns the page contents as a string.
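
If what is wanted is the markup of just the matched element rather than the
whole page, one option is lxml.html.tostring(). The following is only a
sketch under assumptions: the sample markup, the class name "content", and
the variable names page_source, text_only and element_html are made up for
illustration, since the URL in the quoted code is empty and nothing is
downloaded here.

```python
import lxml.html

# Hypothetical sample markup standing in for the downloaded page.
# In the quoted code this string would come from urlopen(...).read().
page_source = (
    '<html><body>'
    '<div class="content"><p>Hello <b>world</b></p></div>'
    '</body></html>'
)

root = lxml.html.document_fromstring(page_source)
body = root.xpath('//div[@class="content"]')[0]

# text_content() drops the tags and returns only the text.
text_only = body.text_content()

# tostring() serializes the element together with its markup.
# encoding="unicode" asks for a text string instead of bytes.
element_html = lxml.html.tostring(body, encoding="unicode")

print(text_only)
print(element_html)
```

Here page_source plays the role of the string returned by read(), while
element_html holds only the serialized div.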


