I've modified my fork to preserve whitespace after a tag. It's a bit hacky
(https://github.com/orf/lxml/blob/master/src/lxml/html/diff.py#L731) but it
works. I've sent a pull request with the changes on GitHub.
Tom
On 1 August 2013 05:29, Stefan Behnel wrote:
Tom ., 26.07.2013 15:21:
I'm using lxml's htmldiff function to (surprise) diff two HTML snippets. However, I have found that if the snippet includes a <pre> tag, any newlines are stripped from it. For example:
    html = "<pre>test\ntest2\ntest3</pre>"
    print repr(htmldiff(html, html))
    u'<pre>test test2 test3</pre>'
Is there a reason for this or, better yet, is there a way I can stop this from occurring? I was under the impression that newlines and whitespace inside a pre tag are significant and shouldn't be stripped the way they can be in other HTML tags.
Thanks for your patch, I merged it into current master.
https://github.com/lxml/lxml/pull/124
I think the change is a good thing, but playing with it a bit, it doesn't really feel completely right yet. Consider this example:
    >>> print(htmldiff('<p> first\nsecond\nthird</p>',
    ...                '<p> first\n second\nthird </p>'))
    <p>first second third </p>
It still drops the whitespace at the beginning, but not at the end. It also seems to copy the content from the second argument, not the first. Not sure if that's good or bad, but it seems surprising.
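One way to picture why leading whitespace could vanish while trailing whitespace survives is a tokenizer that attaches whitespace only to the word in front of it. The sketch below is standard-library Python and is not lxml's actual code; the function names are hypothetical and the "whitespace travels with the preceding word" model is an assumption made purely for illustration.

```python
import re


def split_collapsing(text):
    """Split on any whitespace; the original spacing is discarded."""
    return text.split()


def split_preserving(text):
    """Return (word, trailing_whitespace) pairs.

    Whitespace before the first word has no preceding word to attach
    to, so it is dropped. Hypothetical model, not lxml's tokenizer.
    """
    return [(m.group(1), m.group(2))
            for m in re.finditer(r'(\S+)(\s*)', text)]


def join_collapsing(words):
    """Rejoin words with a single space each."""
    return ' '.join(words)


def join_preserving(tokens):
    """Rejoin words with the whitespace that originally followed them."""
    return ''.join(word + space for word, space in tokens)


pre_text = 'test\ntest2\ntest3'
p_text = ' first\n second\nthird '

# Newlines are lost when the tokens carry no whitespace of their own.
print(repr(join_collapsing(split_collapsing(pre_text))))

# Newlines survive when each token remembers what followed it.
print(repr(join_preserving(split_preserving(pre_text))))

# The leading space is lost, the trailing one is kept.
print(repr(join_preserving(split_preserving(p_text))))
```

Under that assumption, preserving whitespace at the start of an element's text would need either a separate leading-whitespace field on the first token or a dedicated whitespace token.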
What do others think here?
Stefan