Mailman 3 [lxml-dev] html parsing, .text - lxml - The Python XML Toolkit

28 Jan 2011

When I try to get the text of a tag in HTML, it returns the text only if no
child tag comes before that text.

Here is code that demonstrates the problem:

import lxml.html
html = """<strong><strong>another text</strong><br>some text</strong>"""
doc = lxml.html.fromstring(html)
# "some text" is included here, but when I try to get the text for this tag:
print(doc.text_content())
print(doc.text)        # returns None, although the tag contains "some text"
for a in doc:
    print(a.text)      # no child tag has the text "some text" either

It only works if the text comes before the tags:
html = """<strong>some text<strong>another text</strong><br></strong>"""

But I need to parse web pages where the text comes after tags. Can you help me?

Versions:
lxml.etree:        (2, 2, 6, 0)
libxml used:       (2, 7, 7)
libxml compiled:   (2, 7, 6)
libxslt used:      (1, 1, 26)
libxslt compiled:  (1, 1, 26)
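
For reference, here is a minimal sketch of where lxml keeps text that follows a child tag. It reuses the markup from the post and is not taken from any reply in this thread. In the ElementTree data model that lxml follows, an element's .text holds only the text before its first child, while text after a child's closing tag is stored in that child's .tail. The variable names tails and pieces are just illustrative.

```python
import lxml.html

html = "<strong><strong>another text</strong><br>some text</strong>"
doc = lxml.html.fromstring(html)

# .text covers only the text before the first child tag; there is none here.
print(doc.text)  # None

# Text that follows a child tag is stored on that child as .tail,
# so "some text" is found on the <br> element, not on the outer <strong>.
tails = [child.tail for child in doc]
print(tails)

# itertext() yields every .text and .tail below the element in document order.
pieces = list(doc.itertext())
print(pieces)
```

If only the text directly inside the outer tag is wanted, without the text of nested tags, collecting doc.text together with each child's .tail is one way to do it.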

Miro Mintal

Joaquin Cuenca Abela

John W. Shipman

participants (3)