[lxml-dev] ElementTree.write implementation

Hello,

I discovered lxml last week, and I must say it definitely fills a gap, as I am pretty fed up with the complexity of the libxml2 Python bindings. I think I will use lxml for my XML work.

I played around a bit with the ElementTree.write method, and I think it contains a bug: it does not respect the encoding parameter it is given. The output is always us-ascii encoded, and characters missing from us-ascii are replaced by Unicode character entities.

I looked up the implementation of ElementTree.write in the lxml source, and I think I found the problem. The tree is serialized with the xmlDocDumpMemory function, which does not do any encoding conversion. The output of that function is then encoded using the Python library's encoding functions, but this has no effect, because the output of xmlDocDumpMemory is already in us-ascii encoding.

I think a solution would be to implement ElementTree.write using a libxml2 xmlOutputBuffer and the xmlSaveToEnc function. I am willing to supply a patch for this, if there are no concerns that I haven't thought of.

Thanks a lot,
Emil
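To make the expected behaviour concrete, here is a minimal sketch of what "respecting the encoding parameter" means. It is an assumption-laden illustration, not lxml code and not the proposed patch: it uses the standard-library xml.etree.ElementTree module (whose API lxml follows) as a stand-in, and the `serialize` helper and the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def serialize(root, encoding):
    """Hypothetical helper: write a tree into memory with the given encoding."""
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding=encoding)
    return buf.getvalue()


# Sample document with two characters that are not in us-ascii.
root = ET.Element("greeting")
root.text = "Gr\u00fc\u00dfe"

# With us-ascii, the non-ASCII characters have to become character references.
ascii_bytes = serialize(root, "us-ascii")

# With utf-8, the same characters should be written as encoded bytes instead.
utf8_bytes = serialize(root, "utf-8")

print(ascii_bytes)
print(utf8_bytes)
```

If write() ignored its encoding argument, as described above, both calls would produce the same us-ascii bytes.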

Emil Kroymann wrote:
Yes, I was fed up with the libxml2 Python bindings too, and this motivated me to write lxml. I'm happy to hear that you've decided to give it a try.
A patch would be great! It looks like you are more aware of this issue right now than I am, and I need to review the libxml2 APIs again in light of it. All the things one needs to know about libxml2 in order to be able to forget about it! :) Thanks! Martijn

Hey, Emil Kroymann wrote:
More info for your patch. :) I just realized that tostring() is somewhat broken: it doesn't actually work for individual element output, so right now it only works correctly (apart from the ascii problem you mention) for the root element. If you could figure out a way to do individual element dumps using the same output buffer APIs, that would be very nice. Regards, Martijn
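As a rough sketch of the behaviour being asked for, the snippet below serializes a single non-root element. It again assumes the standard-library xml.etree.ElementTree module as a stand-in for the ElementTree API, so it shows the intended result of tostring() on a child element rather than lxml's implementation; the sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Sample tree: a root element with two children and no whitespace between them.
root = ET.fromstring("<list><item>a</item><item>b</item></list>")

# Pick a child rather than the root.
second = root[1]

# Serializing the child should yield only that element's subtree,
# not the whole document it belongs to.
fragment = ET.tostring(second, encoding="us-ascii")
whole = ET.tostring(root, encoding="us-ascii")

print(fragment)
print(whole)
```

The point of the comparison is that `fragment` contains just the chosen `item` element, while `whole` contains the enclosing `list` as well.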

participants (2): Emil Kroymann, Martijn Faassen