What is the difference between etree.XML and etree.HTML?
Hi,

In the following example, XML and HTML work equally well. Does anybody have an example showing when they will be different? Thanks.

    from lxml import etree
    tree = etree.XML('<foo><bar>abc</bar></foo>')
    tree = etree.HTML('<foo><bar>abc</bar></foo>')
    print type(tree)
    r = tree.xpath('//bar')
    print [x.tag for x in r]

--
Regards,
Peng
On 20.12.2017 at 00:55, Peng Yu wrote:
> from lxml import etree
> tree = etree.XML('<foo><bar>abc</bar></foo>')
> tree = etree.HTML('<foo><bar>abc</bar></foo>')
> print type(tree)
> r = tree.xpath('//bar')
> print [x.tag for x in r]
    from lxml import etree
    tree = etree.XML('<html><p>abc</html>')
    print type(tree)
    r = tree.xpath('//p')
    print [x.tag for x in r]

gives:

    python ~/test.py
    Traceback (most recent call last):
      File "/net/homes/schoepf/test.py", line 2, in <module>
        tree = etree.XML('<html><p>abc</html>')
      File "lxml.etree.pyx", line 3072, in lxml.etree.XML (src/lxml/lxml.etree.c:70460)
      File "parser.pxi", line 1828, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:106689)
      File "parser.pxi", line 1716, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:105478)
      File "parser.pxi", line 1086, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:100105)
      File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:94543)
      File "parser.pxi", line 690, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:96003)
      File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:95050)
    lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: p line 1 and html, line 1, column 20

whereas

    from lxml import etree
    tree = etree.HTML('<html><p>abc</html>')
    print type(tree)
    r = tree.xpath('//p')
    print [x.tag for x in r]

gives

    python ~/test.py
    <type 'lxml.etree._Element'>
    ['p']
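The two runs above can be put side by side in one script. This is only a minimal sketch, assuming Python 3 and a reasonably recent lxml; the helper name `parse_both` is made up for illustration and is not part of lxml. It also shows a second difference that the well-formed `<foo>` example hides: the HTML parser wraps the input in `html` and `body` elements, so the root element is not the same even when both calls succeed.

```python
from lxml import etree


def parse_both(markup):
    """Parse the same markup with etree.XML and etree.HTML.

    Hypothetical helper for illustration only. Returns a tuple of
    (xml_result, html_root_tag), where xml_result is the root tag if the
    strict XML parser accepts the input, or the string 'XMLSyntaxError'
    if it rejects it.
    """
    try:
        xml_result = etree.XML(markup).tag
    except etree.XMLSyntaxError:
        # The XML parser requires well-formed input and raises otherwise.
        xml_result = 'XMLSyntaxError'
    # The HTML parser recovers from missing end tags and always returns
    # a tree rooted at an <html> element.
    html_root_tag = etree.HTML(markup).tag
    return xml_result, html_root_tag


well_formed = parse_both('<foo><bar>abc</bar></foo>')
malformed = parse_both('<html><p>abc</html>')

print(well_formed)
print(malformed)
```

For the well-formed input, the XML root should be `foo` while the HTML root is `html`. For the input with the unclosed `<p>`, only the HTML call returns a tree. An XPath such as `//bar` or `//p` still matches in the HTML tree because `//` searches at any depth.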
participants (2)
- Markus Schöpflin
- Peng Yu