data:image/s3,"s3://crabby-images/4debe/4debe65927c3d69045ff833da66543df8df5da7a" alt=""
Aug. 25, 2011
6:47 p.m.
Hi,

I have a question about parsing HTML with lxml.

>>> from lxml.html import fromstring, tostring

It preserves trailing whitespace in some cases:

>>> html = """<div>some <i>text</i> </div>"""
>>> html == tostring(fromstring(html))
True

But it seems to lose that whitespace when it encounters unknown tags (such as the "blah" tag below):

>>> html = """<div>some <blah>text</blah> </div>"""
>>> html == tostring(fromstring(html))
False

How can I make it keep trailing whitespace for all tags?

Thanks,
John.
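One possible workaround, assuming the input is always well-formed XML (no bare `<br>`, no HTML-only entities such as `&nbsp;`): parse it with an XML parser instead of the HTML one. An XML parser keeps whitespace-only text and tail strings, so nothing is dropped after an unrecognized tag. The sketch below swaps in the standard library's `xml.etree.ElementTree` so it runs without lxml; `roundtrip_xml` is a hypothetical helper name, and lxml's own `lxml.etree.fromstring` should behave similarly for well-formed input. It does not change how `lxml.html` itself treats unknown tags.

```python
import xml.etree.ElementTree as ET


def roundtrip_xml(fragment: str) -> str:
    """Parse a fragment and serialize it back to a string.

    Hypothetical helper. Assumes the fragment is well-formed XML;
    real-world HTML that is not well-formed will raise ET.ParseError.
    """
    root = ET.fromstring(fragment)
    # encoding="unicode" returns str (not bytes) and emits no XML declaration,
    # so the result can be compared directly with the input string.
    return ET.tostring(root, encoding="unicode")


for html in ("<div>some <i>text</i> </div>",
             "<div>some <blah>text</blah> </div>"):
    # The whitespace after </blah> is kept as the <blah> element's tail.
    print(html == roundtrip_xml(html))
```

The trade-off is strictness: an XML parser has no notion of "known" or "unknown" tags, which is why it treats `<i>` and `<blah>` identically, but it also rejects markup that an HTML parser would tolerate.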
John Benediktsson