[Fwd: Re: [lxml-dev] ElementTree.write implementation]

Emil Kroymann wrote:
Thanks for the patch! I'll study the code to verify the memory issue.
I'm mostly concerned about making UTF-8 work, so I'll make sure it at least recognizes those. I notice you didn't run the lxml tests correctly after applying the patch; the tests do a lot of work with StringIO, and your check:

    if not tree.PyFile_Check(file):
        raise ValueError("Not a file!")

breaks these tests. Feel free to help me investigate how to handle the StringIO scenario correctly. :)

Regards,
Martijn

Martijn Faassen schrieb:
I think the StringIO case has to be handled separately, as there is probably no underlying Unix file descriptor for a StringIO object. One could use the xmlOutputBufferCreateIO function to implement this case. This function expects custom write and close callbacks for the created xmlOutputBuffer as parameters. I will work this out on the weekend.

Emil
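A minimal Python sketch of the distinction described above: a real file exposes an OS-level descriptor through fileno(), while an in-memory StringIO or BytesIO does not, so output has to be pushed through a write callback instead. The helper names (has_file_descriptor, write_via_callback, serialize_to) are hypothetical and are not part of lxml or libxml2; the callback only imitates the role of the write callback handed to xmlOutputBufferCreateIO.

```python
import io
import tempfile


def has_file_descriptor(f):
    """Return True if f is backed by an OS-level file descriptor."""
    try:
        f.fileno()
    except (AttributeError, OSError, ValueError):
        # io.StringIO and io.BytesIO raise io.UnsupportedOperation here,
        # which subclasses both OSError and ValueError.
        return False
    return True


def write_via_callback(chunks, write):
    """Feed already-encoded byte chunks to a write callback.

    Hypothetical stand-in for an output buffer's write callback.
    Returns the total number of bytes handed over.
    """
    total = 0
    for chunk in chunks:
        write(chunk)
        total += len(chunk)
    return total


def serialize_to(f, chunks):
    """Report which strategy applies to f, then write the chunks."""
    strategy = "descriptor" if has_file_descriptor(f) else "callback"
    # In this sketch both paths call f.write; a C implementation could
    # write to the descriptor directly in the first case.
    write_via_callback(chunks, f.write)
    return strategy
```

With this sketch, an io.BytesIO object takes the "callback" path and a file from tempfile.TemporaryFile() takes the "descriptor" path, which is the split the message above suggests.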

Emil Kroymann wrote:
Note that you should take a look at what I checked in today -- it already handles the StringIO case for write() (the old code did too, but it cheated by first writing all output to a memory buffer). If you could test whether this works correctly for your encoding use cases, I'd be grateful. Note that I also checked in a unit test that does a simple check with non-ASCII contents.
I will work this out on the weekend.
Thanks! The code I checked in is rather ugly in that I had to completely special-case writing to a StringIO. I used the xmlsave APIs, but I wouldn't object to going back to an xmlOutputBuffer-style API again. libxml2 has a plethora of these APIs and none does exactly what I want; xmlsave doesn't appear to allow writing to memory, for instance...

Additionally, I looked through the libxml2 APIs to see whether I could find an easy way to write an *element* as opposed to a whole document, in order to fully support tostring(). While I can find various ways to do it, I don't think I have run into a way to save to a buffer yet (the dump() method in lxml.etree saves to a file). I also need to check whether tostring() has similar requirements to dump() concerning outputting the 'tail' of an element.

Regards,
Martijn
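The 'tail' question can be illustrated with the standard library's xml.etree.ElementTree, used here only as a stand-in for the ElementTree API that lxml follows; whether lxml's tostring() should behave the same way is exactly what remains to be checked. The helper tostring_without_tail is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><b>bold</b> and a tail</doc>")
child = root[0]

# Serializing a single element: the standard library implementation
# also emits the text that follows the element's end tag (its tail).
with_tail = ET.tostring(child, encoding="unicode")


def tostring_without_tail(element):
    """Serialize an element while leaving out its tail text."""
    clone = copy.copy(element)  # shallow copy; children are shared
    clone.tail = None           # only the copy loses its tail
    return ET.tostring(clone, encoding="unicode")
```

Working on a shallow copy keeps the original tree untouched, so the element still carries its tail when the whole document is written later.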
