[lxml-dev] I get CDATA inside a parsed HTML <script> element and cannot retrieve its text
Hello all!

I'm very new to lxml, and I think I may have found a bug. AFAIK, lxml does not expose a direct interface to CDATA sections. But when I use the etree.HTML function, I get the content of <script> as a CDATA section!

>>> html = etree.HTML('<script> alert("Hello!"); </script>')
>>> etree.tostring(html)
'<html><head><script><![CDATA[ alert("Hello!"); ]]></script></head></html>'

The problem is that I cannot retrieve the content of the <script> tag, because lxml does not allow it:

>>> script = html.find('.//script')
>>> len(script)
0
>>> print script.text
None

EXPECTED:

>>> print script.text
alert("Hello!");

Is this really a bug, or am I misunderstanding something?

--
Best regards,
Alexander mailto:alexander.kozlovsky@gmail.com
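For anyone reproducing this today, here is the same check as a small Python 3 script (the session above uses Python 2's print statement). It is a sketch that assumes lxml is installed and linked against a libxml2 release that includes the fix mentioned in the reply; on an affected build, script.text would come back as None instead of the script source.

```python
from lxml import etree

# Parse an HTML fragment; libxml2's HTML parser wraps it in <html><head>.
html = etree.HTML('<script> alert("Hello!"); </script>')

# Locate the <script> element anywhere in the tree.
script = html.find('.//script')

# The script body is character data, not child elements,
# so the element has no children.
print(len(script))

# On a fixed libxml2, the script body is exposed as the element's .text.
print(repr(script.text))
```

If the second line prints None, the libxml2 that lxml is using still has the bug described in this thread.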
Alexander Kozlovsky wrote:
This is a bug in libxml2. It has been fixed, so updating to the latest version (possibly a nightly build) should resolve it.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
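Before upgrading, it can help to confirm which libxml2 lxml is actually using. A minimal sketch using lxml's version constants follows; note that the reply does not say which libxml2 release contains the fix, so there is no threshold to compare against here.

```python
from lxml import etree

# libxml2 version lxml is running against at runtime, as a (major, minor, micro) tuple.
print("libxml2 (runtime):", etree.LIBXML_VERSION)

# libxml2 version lxml was compiled against; this can differ from the runtime one.
print("libxml2 (compiled):", etree.LIBXML_COMPILED_VERSION)

# The lxml version itself.
print("lxml:", etree.LXML_VERSION)
```

A mismatch between the runtime and compiled versions usually means lxml picked up a different shared libxml2 than the one it was built with, which is worth checking if an upgrade does not seem to take effect.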