Hi all,

Stefan Behnel wrote:
What bothers me is that lxml is consistent with ElementTree in that it adds whitespace around comment texts. I have no idea why ElementTree does that in the first place. AFAICT, this happily breaks things like SSI.
I took a second look at this and found that lxml can't actually support it. Since the parser does not ignore comments (as ET's does) and since we don't serialise on our own, we can't make sure that spaces are always added around the comment text. We could add them in the API calls to Comment(), but then comments created through the API would be inconsistent with comments in parsed trees. So the only solution I can see is to be incompatible with ET here and not add spaces around comment text. This means that
    >>> c = Comment("test")
will result in "<!--test-->" in the serialised XML data, as opposed to ElementTree's "<!-- test -->". On the other hand, accessing the .text attribute will be identical in both:
    >>> c.text
    'test'
    >>> c.text = "TEST"
    >>> c.text
    'TEST'
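To make the concern concrete, here is a toy sketch in plain Python. It is not lxml or ElementTree code: serialise_comment(), parse_comment() and round_trip() are hypothetical helpers, and the include path is made up. It only models what happens when a serialiser pads comment text while a parser keeps comments verbatim, and why padding hides an SSI directive (which has to start with "<!--#").

```python
# Toy model only: these helpers are hypothetical, not lxml or ElementTree code.

def serialise_comment(text, pad):
    """Serialise comment text, optionally padding it with one space per side."""
    return "<!-- %s -->" % text if pad else "<!--%s-->" % text


def parse_comment(markup):
    """Return the comment text exactly as written, as a comment-preserving parser would."""
    assert markup.startswith("<!--") and markup.endswith("-->")
    return markup[4:-3]


def round_trip(text, pad, times=3):
    """Serialise and re-parse the same comment several times."""
    for _ in range(times):
        text = parse_comment(serialise_comment(text, pad))
    return text


padded = round_trip("test", pad=True)     # gains one space per side on each pass
verbatim = round_trip("test", pad=False)  # comes back unchanged

# An SSI directive must begin with "<!--#", so a padded comment no longer qualifies.
ssi = '#include virtual="/footer.html" '  # hypothetical include path
ssi_padded = serialise_comment(ssi, pad=True)
ssi_verbatim = serialise_comment(ssi, pad=False)
```

In this model the unpadded variant is the only one that hands back exactly what the parser or the user put in.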
So, the only problem here is serialisation. I personally believe that the lxml way of doing it is better, since it does not modify the comment provided by the parser or the user. Unless someone can convince me of the opposite, this is how lxml 1.0 will work.

Stefan