Re: [lxml] create xml file incrementally

17 Nov 2012

On 11/17/12 12:17, Stefan Behnel wrote:
...
> The problem with generating XML as part of the WSGI output phase is two-fold.
> One is that certain stages in the WSGI pipeline may still force the
> iterable to be unfolded, before sending everything out. Not XML related,
> more of a general "sending large data" problem. That's obviously in the
> hands of the application designers, but if they incrementally generate
> their content they have to take care that their whole web stack handles
> this nicely.
>
> The more important problem is that serialisation errors will only be
> detected very late, further down in the WSGI pipeline and way outside the
> application code that might want to handle them. If the data is coming from
> a non-trivial source (and I would expect most sources of large amounts of
> data to be non-trivial), this means that you will end up sending a
> potentially large amount of data to the client before you notice that there
> is a problem that you have to handle.
See https://github.com/arskom/spyne/issues/187 (if you are wondering what
Spyne is, see http://spyne.io).

So I'm quite familiar with both of the issues you mention. The first can
be addressed fairly easily with rigorous testing (see the streaming
example in Spyne's examples directory for a combination that works). The
second is more complicated. The problem stems from the shortcomings of
the established application-level protocols: none of them was designed
to communicate mid-stream errors to the client. Unfortunately, I don't
think there's a way around this short of designing a new RPC protocol.
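A toy illustration of the second point follows. `rows`, `xml_app` and `serve` are hypothetical names, and `serve` is a deliberately simplified stand-in for a WSGI server (no sockets, no header formatting) that forwards each body chunk as soon as it is produced. It shows that by the time the data source fails, the status line and part of the document are already on their way to the client:

```python
def rows():
    """Hypothetical data source whose third row cannot be produced."""
    yield 1
    yield 2
    raise ValueError("row 3 could not be serialised")


def xml_app(environ, start_response):
    # The status line and headers are committed before any row is fetched.
    start_response("200 OK", [("Content-Type", "text/xml")])

    def body():
        yield b"<rows>"
        for row_id in rows():
            yield b'<row id="%d"/>' % row_id
        yield b"</rows>"

    return body()


def serve(app):
    """Toy stand-in for a WSGI server that forwards each chunk immediately."""
    wire = []      # everything the client has received so far
    pending = {}   # status line waiting for the first non-empty chunk

    def start_response(status, headers, exc_info=None):
        pending["status"] = status
        return wire.append  # write() callable; unused by this app

    try:
        for chunk in app({}, start_response):
            if chunk:
                if "status" in pending:
                    # Like a real server, emit the status with the first body bytes.
                    wire.append(pending.pop("status").encode("latin-1"))
                wire.append(chunk)
    except ValueError as exc:
        # Too late: the client already has a 200 status and a partial document.
        return wire, exc
    return wire, None


wire, error = serve(xml_app)
print(wire)   # [b'200 OK', b'<rows>', b'<row id="1"/>', b'<row id="2"/>']
print(error)  # row 3 could not be serialised
```

The error only reaches the server loop after a `200 OK` and a truncated `<rows>` document have been handed to the client.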

That said, most of my data comes from a SELECT query, where mid-stream
errors are quite infrequent, so it's not much of a problem as long as
error handling is done as early as possible. That's why, when consuming
a generator, Spyne runs it until the first yield before sending out any
headers, for protocols that support headers (only HTTP and SOAP as of now).

Simon, I'm also aware of the technique you point to, but as the WSGI
spec also mentions, it comes with its own overhead, so it should be
used only as a last resort.
...
> Anyway, I think the two use cases are sufficiently different to have two
> different interfaces. A "yield" based pull approach (potentially using
> "yield from" for structural chaining) doesn't fold well into a push
> interface for writing incrementally into a file.
So, does this mean my earlier 'append_generator' suggestion has a green 
light?
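The difference between a "yield" based pull interface and a push interface can be made concrete with a small sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, and `pull_rows`, `pull_row`, `push_rows` and `push_as_pull` are hypothetical names; none of this is the `append_generator` proposal itself, whose API is not shown here:

```python
import io
import xml.etree.ElementTree as ET


def pull_row(row_id):
    yield ET.tostring(ET.Element("row", id=str(row_id)))


def pull_rows(row_ids):
    """Pull style: the consumer (e.g. a WSGI server) asks for each chunk."""
    yield b"<rows>"
    for row_id in row_ids:
        yield from pull_row(row_id)  # structural chaining via "yield from"
    yield b"</rows>"


def push_rows(out, row_ids):
    """Push style: the producer drives and writes each piece into `out` at once."""
    out.write(b"<rows>")
    for row_id in row_ids:
        out.write(ET.tostring(ET.Element("row", id=str(row_id))))
    out.write(b"</rows>")


def push_as_pull(row_ids):
    """Adapting push to pull without threads or greenlets: the producer must
    run to completion before the first chunk can be handed out."""
    buf = io.BytesIO()
    push_rows(buf, row_ids)
    yield buf.getvalue()


buf = io.BytesIO()
push_rows(buf, [1, 2])
print(list(pull_rows([1, 2])))     # four separate chunks
print(buf.getvalue())
print(list(push_as_pull([1, 2])))  # a single, fully buffered chunk
```

Both produce the same bytes, but only the pull version hands the document out piece by piece; feeding the push version to a pull consumer either buffers everything, as `push_as_pull` does, or needs a thread or greenlet to interleave the two.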

Best,
Burak

Burak Arslan