Re: [lxml-dev] html parsing, .text

28 Jan 2011


You also need to use the "tail" property. "text" holds the text inside the
element, up to its first child; "tail" holds the text that follows the
element's closing tag.

for a in doc:
    # text: content before the first child; tail: content after the closing tag
    print(a.text, a.tail)
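
To make the text/tail split concrete, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree rather than lxml.html (both expose the same "text" and "tail" attributes), so the <br> from the question is written as <br/> to keep the markup well-formed XML. The helper name direct_text is hypothetical, not part of either library.

```python
import xml.etree.ElementTree as ET


def direct_text(element):
    """Collect the text that belongs directly to `element`, in document order.

    That is the element's own .text plus the .tail of each direct child.
    (Hypothetical helper, for illustration only.)
    """
    pieces = []
    if element.text:
        pieces.append(element.text)
    for child in element:
        if child.tail:
            pieces.append(child.tail)
    return pieces


# The markup from the question, with <br/> so it parses as XML.
doc = ET.fromstring("<strong><strong>another text</strong><br/>some text</strong>")

for child in doc:
    # "some text" shows up as the tail of <br>, not as doc.text.
    print(child.tag, repr(child.text), repr(child.tail))

print(direct_text(doc))
```

With lxml.html the same loop should behave the same way, since "some text" is stored as the tail of the <br> element rather than as the text of the outer <strong>.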

Cheers,

On Fri, Jan 28, 2011 at 3:45 PM, Miro Mintal wrote:
...
When I try to get the text of a tag in HTML, it returns the text only if no
tag comes before that text.
Here is code demonstrating the problem:
import lxml.html
html = """<strong><strong>another text</strong><br>some text</strong>"""
doc = lxml.html.fromstring(html)
print(doc.text_content())   # "some text" is here, but when I try to get the text for this tag:
print(doc.text)             # returns None, but the tag has the text "some text"
for a in doc:
    a.text                  # no subtag has the text "some text"
It only works if the text comes before the tags:
html = """<strong>some text<strong>another text</strong><br></strong>"""
But I need to parse web pages with text after tags. Can you help me?
Version:
lxml.etree:        (2, 2, 6, 0)
libxml used:       (2, 7, 7)
libxml compiled:   (2, 7, 6)
libxslt used:      (1, 1, 26)
libxslt compiled:  (1, 1, 26)
-- 
Joaquin Cuenca Abela