
26 Aug 2011, 3:47 a.m.
Hi,
I have a question about parsing HTML.
>>> from lxml.html import fromstring, tostring
It preserves trailing whitespace in some cases:
>>> html = """<div>some <i>text</i> </div>"""
>>> html == tostring(fromstring(html))
True
But it seems to break when encountering unknown tags (such as the "blah" tag below).
>>> html = """<div>some <blah>text</blah> </div>"""
>>> html == tostring(fromstring(html))
False
How can I fix it to include trailing whitespace for all tags?
Thanks, John.
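To narrow down exactly which characters are lost in a case like the second one, a small round-trip diff can help. The sketch below is only an illustration: `roundtrip_diff` is a hypothetical helper name, not part of lxml, and it uses only the standard library's `difflib`. It assumes the caller passes in a function that parses and re-serializes a string.

```python
from difflib import SequenceMatcher


def roundtrip_diff(source, roundtrip):
    """Return (operation, before, after) tuples for every span that
    differs between `source` and `roundtrip(source)`.

    `roundtrip` is any callable that takes a str and returns a str,
    for example a parse-then-serialize wrapper (an assumption of this
    sketch, supplied by the caller).
    """
    result = roundtrip(source)
    # autojunk=False keeps the heuristic junk filter from affecting
    # the comparison on longer inputs.
    matcher = SequenceMatcher(None, source, result, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, source[i1:i2], result[j1:j2]))
    return changes
```

With lxml, the callable could be something like `lambda s: tostring(fromstring(s), encoding="unicode")`, so that the serialized result is text rather than bytes. An empty list means the round trip was lossless; a `delete` entry whose `before` value is a single space would point at the dropped trailing whitespace.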
John Benediktsson