lxml to parse html
Stefan Behnel
stefan_ml at behnel.de
Mon Jan 23 03:56:11 EST 2012
contro opinion, 23.01.2012 08:34:
> import lxml.html
> myxml='''
> <cooperate>
> <job DecreaseHour="1" table="tpa_radio_sum">
> </job>
>
> <job DecreaseHour="2"
> table="tpa_radio_sum">
> </job>
>
>
> <job DecreaseHour="3" table="tpa_radio_sum">
> </job>
> </cooperate>
> '''
> root=lxml.html.fromstring(myxml)
> nodes1=root.xpath('//job[@DecreaseHour="1"]')
> nodes2=root.xpath('//job[@ne_type="101"]')
> print "nodes1=",nodes1
> print "nodes2=",nodes2
>
> What I get is:
> nodes1=[] and
> nodes2=[<Element job at 0x13636f0>]
> Why is nodes1 []? And why is nodes2=[<Element job at 0x13636f0>]?
Not on my side. I get two empty lists.
> It is such a strange thing. Why?
The really strange thing that I don't understand is why you would use an
HTML parser to parse an XML document. You should use lxml.etree instead.
Stefan
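
For readers of the archive, here is a minimal sketch of the suggestion above: parsing the same document with an XML parser instead of lxml.html. It assumes lxml is installed and falls back to the standard library's ElementTree otherwise. It uses findall() with a simple path expression because both libraries support that, whereas the original post used xpath(), which only lxml provides. As far as I know, the HTML parser lowercases attribute names, which would explain why @DecreaseHour matches nothing there. Treat that as an assumption rather than a confirmed diagnosis.

```python
# Prefer lxml.etree; fall back to the standard library's ElementTree,
# which offers a compatible API for what this example needs.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Same document as in the original post (blank lines removed).
myxml = '''
<cooperate>
<job DecreaseHour="1" table="tpa_radio_sum">
</job>
<job DecreaseHour="2"
table="tpa_radio_sum">
</job>
<job DecreaseHour="3" table="tpa_radio_sum">
</job>
</cooperate>
'''

# An XML parser keeps attribute names exactly as written (case-sensitive).
root = etree.fromstring(myxml)

# findall() accepts a limited path syntax in both lxml and ElementTree.
nodes1 = root.findall('.//job[@DecreaseHour="1"]')
nodes2 = root.findall('.//job[@ne_type="101"]')

print("nodes1 =", len(nodes1), "match(es)")
print("nodes2 =", len(nodes2), "match(es)")
```

With an XML parser, the first query should find the one job element whose DecreaseHour is "1". The second should find nothing, since no element in the document has an ne_type attribute.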