Re: [lxml-dev] Can't change text in comments
Hi Noah, Noah Slater wrote:
I have noticed that setting the 'text' of a Comment does not seem to change anything when the Comment is serialised.
Oh well, /that/ was an old bug you found there. The code in that area dates from SVN revision 8082 - we're at 27509 now. Changing the text of a comment was never implemented, although ET supports it. The reason (I believe) why Martijn did not implement it at the time (unless he just forgot) is that changing comment texts in libxml2 is a bit tricky. There are no libxml2 functions for doing that and it requires checking the document dictionary to see if the original comment string isn't still in use anywhere else. Great place to add some more segfaults. I implemented this and actually found a couple of other bugs that I fixed (including the test cases that relied on these bugs). What bothers me is that lxml is consistent with ElementTree in that it adds whitespace around comment texts. I have no idea why ElementTree does that in the first place. AFAICT, this happily breaks things like SSI. Maybe Fredrik can enlighten us here? Stefan
Hi all, Stefan Behnel wrote:
What bothers me is that lxml is consistent with ElementTree in that it adds whitespace around comment texts. I have no idea why ElementTree does that in the first place. AFAICT, this happily breaks things like SSI.
I took a second look at this and found that lxml can't actually support this. Since the parser does not ignore comments (as ET does) and since we don't serialise on our own, we can't make sure that we always add spaces around the comment text. We can do that through the API calls to Comment(), but that would make things inconsistent compared to parsed trees. So, the only solution I can see is to be incompatible with ET here and not add spaces around comment texts. This means that
c = Comment("test")
will result in "<!--test-->" in the serialised XML data, as opposed to ElementTree's "<!-- test -->". On the other hand, accessing the .text attribute will be identical in both:
c.text 'test' c.text = "TEST" c.text 'TEST'
So, the only problem is serialisation here. I personally believe that the lxml way of doing it is better, since it does not modify the comment provided by parser or user. Unless someone can convince me of the opposite, this will be the way lxml 1.0 will work then. Stefan
So, the only problem is serialisation here. I personally believe that the lxml way of doing it is better, since it does not modify the comment provided by parser or user. Unless someone can convince me of the opposite, this will be the way lxml 1.0 will work then.
+1 -- "Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman
Stefan Behnel wrote:
Hi all,
Stefan Behnel wrote:
What bothers me is that lxml is consistent with ElementTree in that it adds whitespace around comment texts. I have no idea why ElementTree does that in the first place. AFAICT, this happily breaks things like SSI.
I took a second look at this and found that lxml can't actually support this. Since the parser does not ignore comments (as ET does) and since we don't serialise on our own, we can't make sure that we always add spaces around the comment text. We can do that through the API calls to Comment(), but that would make things inconsistent compared to parsed trees.
Right, I recall looking into that long ago and coming to the same conclusion. I should've marked that in the compatibility file. Regards, Martijn
participants (4)
-
Fredrik Lundh
-
Martijn Faassen
-
Noah Slater
-
Stefan Behnel