[Fwd: Re: [lxml-dev] ElementTree.write implementation]
Emil Kroymann wrote:
> Sorry for the delay, but here is the patch. I have never used Pyrex before, so I'm not certain it is free of memory leaks. I think the xmlOutputBuffer memory is freed correctly, even though there is no explicit call to free it.
Thanks for the patch! I'll study the code to verify the memory issue.
> Another thing is that the encoding parameter given to ElementTree.write now uses the libxml2 encoding names rather than the Python encoding names. There could be differences.
I'm mostly concerned about making UTF-8 work, so I'll make sure it at least recognizes the UTF-8 names. I notice you didn't run the lxml tests correctly after applying the patch; the tests do a lot of work with StringIO, and your check:
    if not tree.PyFile_Check(file):
        raise ValueError("Not a file!")
breaks these tests. Feel free to help me investigate how to handle the StringIO scenario correctly. :)
Regards,
Martijn
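The encoding-name mismatch Emil raises can be illustrated with a minimal sketch. Python's codecs.lookup() resolves whatever alias a caller passes ("utf8", "UTF_8", "us-ascii", ...) to Python's canonical codec name; the hypothetical helper libxml2_encoding_name() then maps that canonical name through a small hand-written table to a name assumed to be accepted by libxml2. The table is an illustrative assumption, not taken from lxml or libxml2.

```python
import codecs

# Hypothetical table: Python canonical codec name -> name we ASSUME libxml2
# accepts. Illustrative only; not taken from lxml or libxml2.
_ASSUMED_LIBXML2_NAMES = {
    "utf-8": "UTF-8",
    "utf-16": "UTF-16",
    "iso8859-1": "ISO-8859-1",
    "ascii": "ASCII",
}


def libxml2_encoding_name(python_name):
    """Translate a Python encoding name or alias to an assumed libxml2 name."""
    # codecs.lookup() normalizes aliases to the codec's canonical name and
    # raises LookupError if Python does not know the encoding at all.
    canonical = codecs.lookup(python_name).name
    try:
        return _ASSUMED_LIBXML2_NAMES[canonical]
    except KeyError:
        raise ValueError(
            "no libxml2 encoding name known for %r" % python_name) from None


print(libxml2_encoding_name("utf8"))      # UTF-8
print(libxml2_encoding_name("us-ascii"))  # ASCII
```

Normalizing on the Python side first means callers can keep using the Python names they already know, and unsupported encodings fail early with a Python exception instead of inside libxml2.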
Martijn Faassen wrote:
> I notice you didn't run the lxml tests correctly after applying the patch; the tests do a lot of work with StringIO and your check:
>     if not tree.PyFile_Check(file):
>         raise ValueError("Not a file!")
> breaks these tests.
I think the StringIO case has to be handled separately, as there is probably no underlying Unix file descriptor for a StringIO object. One could use the xmlOutputBufferCreateIO function to implement this case. This function expects custom write and close callbacks for the created xmlOutputBuffer as parameters. I will work this out on the weekend.
Emil
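Emil's idea can be sketched in plain Python: rather than insisting on a real file (the PyFile_Check test), accept any object with a write() method and route output through a write callback. The sketch does not call libxml2; make_write_callback() and has_os_file_descriptor() are hypothetical helpers, and the callback only imitates the return convention of libxml2's xmlOutputWriteCallback (number of bytes written, or -1 on error).

```python
import io


def make_write_callback(target):
    """Wrap any object with a write() method as a libxml2-style write callback.

    The returned callable takes a chunk of bytes and returns the number of
    bytes written, or -1 if writing raised an exception.
    """
    write = getattr(target, "write", None)
    if write is None:
        raise TypeError("%r has no write() method" % (target,))

    def callback(chunk):
        try:
            write(chunk)
        except Exception:
            return -1  # report failure the way a C callback would
        return len(chunk)

    return callback


def has_os_file_descriptor(target):
    """Return True if *target* is backed by a real OS-level file descriptor."""
    try:
        target.fileno()
    except (AttributeError, OSError):
        # io.UnsupportedOperation (raised by BytesIO/StringIO) subclasses
        # OSError, so in-memory buffers end up here.
        return False
    return True


buf = io.BytesIO()
print(has_os_file_descriptor(buf))  # False
callback = make_write_callback(buf)
print(callback(b"<root/>"))         # 7
print(buf.getvalue())               # b'<root/>'
```

In a Pyrex wrapper, the real-file check would then only be an optional fast path, while everything else goes through the callback; that wiring to xmlOutputBufferCreateIO is assumed here, not shown.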
Emil Kroymann wrote:
> Martijn Faassen wrote:
>> I notice you didn't run the lxml tests correctly after applying the patch; the tests do a lot of work with StringIO and your check:
>>     if not tree.PyFile_Check(file):
>>         raise ValueError("Not a file!")
>> breaks these tests.
> I think the StringIO case has to be handled separately, as there is probably no underlying Unix file descriptor for a StringIO object. One could use the xmlOutputBufferCreateIO function to implement this case. This function expects custom write and close callbacks for the created xmlOutputBuffer as parameters.
Note that you should take a look at what I checked in today: it already handles the StringIO case for write() (the old code did too, but it cheated by first writing all output to a memory buffer). If you could test whether this works correctly for your encoding use cases, I'd be grateful. I also checked in a unit test that does a simple check with non-ASCII contents.
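A check in the spirit of the non-ASCII unit test Martijn mentions might look like the following. It is an assumption about the shape of such a test, not the test that was checked in, and it uses Python's standard-library xml.etree.ElementTree (whose API lxml.etree follows) rather than lxml itself. It writes an element with non-ASCII text into an in-memory io.BytesIO, once as UTF-8 and once with ElementTree's default US-ASCII encoding, and parses both results back.

```python
import io
import unittest
import xml.etree.ElementTree as ET


class NonAsciiWriteTest(unittest.TestCase):
    TEXT = "Gr\u00fc\u00dfe"  # "Grüße": contains non-ASCII characters

    def _write(self, **kwargs):
        root = ET.Element("greeting")
        root.text = self.TEXT
        buf = io.BytesIO()  # in-memory target with no OS file descriptor
        ET.ElementTree(root).write(buf, **kwargs)
        return buf.getvalue()

    def test_utf8_roundtrip(self):
        data = self._write(encoding="utf-8")
        self.assertIn(self.TEXT.encode("utf-8"), data)
        self.assertEqual(ET.fromstring(data).text, self.TEXT)

    def test_default_encoding_uses_character_references(self):
        # The default output encoding is US-ASCII, so characters outside
        # ASCII are written as numeric character references.
        data = self._write()
        self.assertIn(b"&#252;", data)  # u-umlaut is U+00FC = 252
        self.assertEqual(ET.fromstring(data).text, self.TEXT)
```

Saved to a file, the test case can be run with the standard unittest runner.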
> I will work this out on the weekend.
Thanks! The code I checked in is rather ugly in that I had to special-case writing to a StringIO completely. I used the xmlsave APIs, but I wouldn't object to going back to an xmlOutputBuffer-style API. libxml2 has a plethora of these APIs and none does exactly what I want; xmlsave doesn't appear to allow writing to memory, for instance...
Additionally, I looked through the libxml2 APIs to see whether I could find an easy way to write an *element*, as opposed to a whole document, in order to fully support tostring(). While I can find various ways to do it, I haven't yet found one that saves to a buffer (the dump() method in lxml.etree saves to a file). I also need to check whether tostring() has the same requirements as dump() concerning output of an element's 'tail'.
Regards,
Martijn
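The 'tail' question can be seen in Python's standard-library ElementTree, whose API lxml.etree follows: tostring() on a single element serializes the element plus the tail text that follows it inside its parent. This checks only the stdlib behaviour; it says nothing about what lxml did at the time of this thread. Dropping the tail is shown here by serializing a shallow copy whose tail is cleared, one possible approach among several.

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><b>bold</b> and more text</doc>")
child = root[0]

# The tail is the text after </b> but still inside <doc>.
print(repr(child.tail))                        # ' and more text'

# Serializing just the child element also emits its tail.
print(ET.tostring(child, encoding="unicode"))  # <b>bold</b> and more text

# Serialize without the tail by clearing it on a shallow copy, leaving the
# original tree untouched.
bare = copy.copy(child)
bare.tail = None
print(ET.tostring(bare, encoding="unicode"))   # <b>bold</b>
```

Later lxml releases expose this choice directly through the with_tail keyword of lxml.etree.tostring(), which defaults to including the tail.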
participants (2)
- Emil Kroymann
- Martijn Faassen