Error from .itertext() “ValueError: Input object has no element: HtmlComment”
Hi

I'm trying to iterate through the text content of a subtree using elt.itertext() (v3.5.0b1 git master branch) as follows:

    import lxml.html.soupparser as soupparser
    import requests

    doc = requests.get("http://f10.5post.com/forums/showthread.php?t=1142017").content
    tree = soupparser.fromstring(doc)
    nodes = tree.getchildren()
    for elt in nodes:
        for t in elt.itertext():
            print t

But I keep getting an error saying

    File "src/lxml/iterparse.pxi", line 248, in lxml.etree.iterwalk.__init__ (src/lxml/lxml.etree.c:134032)
    File "src/lxml/apihelpers.pxi", line 67, in lxml.etree._rootNodeOrRaise (src/lxml/lxml.etree.c:15220)
    ValueError: Input object has no element: HtmlComment

Is there a way to skip all HTML comments? Also, what does this error actually mean?

Any help will be appreciated.

Thanks
John
John Munroe wrote on 26.06.2015 at 08:14:
I'm trying to iterate through the text content of a subtree using elt.itertext() (v3.5.0b1 git master branch) as follows:
    import lxml.html.soupparser as soupparser
    import requests

    doc = requests.get("http://f10.5post.com/forums/showthread.php?t=1142017").content
    tree = soupparser.fromstring(doc)
    nodes = tree.getchildren()
    for elt in nodes:
        for t in elt.itertext():
            print t
But I keep getting an error saying
    File "src/lxml/iterparse.pxi", line 248, in lxml.etree.iterwalk.__init__ (src/lxml/lxml.etree.c:134032)
    File "src/lxml/apihelpers.pxi", line 67, in lxml.etree._rootNodeOrRaise (src/lxml/lxml.etree.c:15220)
    ValueError: Input object has no element: HtmlComment
Is there a way to skip all HTML comments?
itertext('*') will iterate only over elements. You can also use itertext(etree.Element) if you find that more readable.
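For anyone finding this thread later, here is a minimal sketch of that suggestion. It is not taken from the original script: the inline markup is a made-up stand-in for the forum page, plain lxml.etree is used in place of soupparser so the example has no network or BeautifulSoup dependency, and the helper names element_children and text_chunks are hypothetical. It also filters the child list with iterchildren('*'), on the assumption that comment nodes among the children are what trigger the error.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page: two top-level comments
# sit next to the real content, as they apparently do in the report.
root = etree.fromstring(
    "<root><!-- banner --><p>Hello <b>world</b></p><!-- ad --><p>Bye</p></root>"
)


def element_children(parent):
    """Return only the child elements; '*' leaves out comments and PIs."""
    return list(parent.iterchildren("*"))


def text_chunks(parent):
    """Collect the text of every child element, skipping comment nodes."""
    chunks = []
    for elt in element_children(parent):
        # itertext('*') restricts the walk to elements;
        # itertext(etree.Element) is the other spelling mentioned above.
        chunks.extend(elt.itertext("*"))
    return chunks


print(text_chunks(root))
```

With the loop written this way, itertext() is never called on a comment node, so the start-node check that raised the ValueError should not be reached.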
Also, what does this error actually mean?
It means that there is no subtree to iterate over here because the start element you pass is a comment. I guess that's a bug. There's nothing wrong with iterating over comments.

Stefan