On 11/17/12 12:17, Stefan Behnel wrote:
The problem with generating XML as part of the WSGI output phase is two-fold.
One is that certain stages in the WSGI pipeline may still force the iterable to be unfolded before sending everything out. This is not XML-related; it's more of a general "sending large data" problem. That's obviously in the hands of the application designers, but if they generate their content incrementally, they have to make sure that their whole web stack handles this nicely.
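Here is a minimal sketch of the kind of pipeline stage described above. It is a hypothetical middleware that joins the application's iterable into one chunk, for instance to add a Content-Length header. The names streaming_app, buffering_middleware and collect are made up for illustration. Only the WSGI calling convention (PEP 3333) is real.

```python
from typing import Callable, Iterable

# Simplified WSGI signatures for this sketch (see PEP 3333 for the full ones).
StartResponse = Callable[..., Callable[[bytes], None]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def streaming_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical app that produces its XML body one chunk at a time."""
    start_response("200 OK", [("Content-Type", "application/xml")])

    def body():
        yield b"<rows>"
        for i in range(3):
            yield b"<row>%d</row>" % i
        yield b"</rows>"

    return body()


def buffering_middleware(app: WSGIApp) -> WSGIApp:
    """Hypothetical stage that unfolds the whole iterable to set Content-Length.

    Nothing reaches the server until the full body exists in memory,
    which silently defeats the app's incremental generation.
    """
    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return lambda data: None  # write() is not supported in this sketch

        app_iter = app(environ, capture)
        try:
            body = b"".join(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()
        headers = captured["headers"] + [("Content-Length", str(len(body)))]
        start_response(captured["status"], headers)
        return [body]

    return wrapped


def collect(app: WSGIApp):
    """Drive an app like a tiny server would; return (status, headers, chunks)."""
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"], response["headers"] = status, headers
        return lambda data: None

    app_iter = app({}, start_response)
    try:
        chunks = list(app_iter)
    finally:
        if hasattr(app_iter, "close"):
            app_iter.close()
    return response["status"], response["headers"], chunks
```

With collect(streaming_app), the server sees five separate chunks. With collect(buffering_middleware(streaming_app)), it sees a single chunk, and only after the app has produced everything.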
The more important problem is that serialisation errors will only be detected very late, further down in the WSGI pipeline and way outside the application code that might want to handle them. If the data is coming from a non-trivial source (and I would expect most sources of large amounts of data to be non-trivial), this means that you will end up sending a potentially large amount of data to the client before you notice that there is a problem that you have to handle.
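A toy simulation of this second point follows. It assumes a hypothetical data source that fails on its third row. By the time the exception escapes the body generator, the status line and the first chunks have already been handed to the server. At that point the application can no longer turn the failure into a clean error response. rows_from_source, xml_app and serve_once are illustrative names, not part of any library.

```python
from typing import Iterable, Iterator


def rows_from_source() -> Iterator[dict]:
    """Hypothetical data source that fails partway through."""
    yield {"id": 1}
    yield {"id": 2}
    raise ValueError("row 3 could not be serialised")


def xml_app(environ: dict, start_response) -> Iterable[bytes]:
    # Headers are committed before a single row has been fetched.
    start_response("200 OK", [("Content-Type", "application/xml")])

    def body():
        yield b"<rows>"
        for row in rows_from_source():
            yield b'<row id="%d"/>' % row["id"]
        yield b"</rows>"

    return body()


def serve_once(app):
    """Toy server loop: records what it would already have written to the socket."""
    sent = []
    state = {}

    def start_response(status, headers, exc_info=None):
        state["status"] = status
        return sent.append

    app_iter = app({}, start_response)
    try:
        for chunk in app_iter:
            sent.append(chunk)  # in a real server, this is already on the wire
    except Exception as exc:
        # The error arrives here, far from the application code that caused it.
        return state["status"], sent, exc
    finally:
        if hasattr(app_iter, "close"):
            app_iter.close()
    return state["status"], sent, None
```

The "client" has already received a 200 status and a truncated document when the error surfaces.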
See: https://github.com/arskom/spyne/issues/187 (if you wonder what Spyne is, see http://spyne.io)

So I'm quite familiar with both of the issues you mention. The first point can be addressed rather easily with rigorous testing (see the streaming example in Spyne's examples directory for a combination that works).

The second point is more complicated. The problem stems from the shortcomings of the established application-level protocols: none of them were designed to communicate mid-stream errors to the client. Unfortunately, I don't think there's a way around this short of designing a new RPC protocol.

That said, most of my data comes from a select query, where mid-stream errors are quite infrequent, so it's not much of a problem as long as the error handling is done as early as possible. That's why, when consuming a generator, Spyne runs it until the first yield before sending out any headers, for protocols that support headers (only HTTP and SOAP as of now).

Simon, I'm also aware of the technique that you point to, but as the WSGI spec also mentions, it comes with its own overhead, so it should be used only as a last resort.
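The "run until the first yield" idea can be sketched in plain WSGI terms as follows. This is not Spyne's actual code. primed_response, ClosingIterator, ok_body and failing_body are hypothetical names, and the 500 response body is a placeholder. The technique only helps with errors raised before the first chunk. Anything that fails later still surfaces mid-stream.

```python
import itertools
from typing import Callable, Iterable, Iterator


class ClosingIterator:
    """Re-attaches the already-consumed first chunk and keeps close() working."""

    def __init__(self, first: bytes, body: Iterator[bytes]):
        self._chunks = itertools.chain([first], body)
        self._body = body

    def __iter__(self):
        return self._chunks

    def close(self):
        # WSGI servers call close() on the returned iterable; forward it.
        if hasattr(self._body, "close"):
            self._body.close()


def primed_response(make_body: Callable[[], Iterator[bytes]],
                    start_response) -> Iterable[bytes]:
    """Advance the body generator to its first chunk before sending headers."""
    body = make_body()
    try:
        first = next(body)
    except StopIteration:
        # Empty body: nothing can fail any more, so headers are safe to send.
        start_response("200 OK", [("Content-Type", "application/xml")])
        return []
    except Exception:
        # Nothing has been sent yet, so a proper error status is still possible.
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")])
        return [b"error before any output was produced\n"]
    start_response("200 OK", [("Content-Type", "application/xml")])
    return ClosingIterator(first, body)


def ok_body() -> Iterator[bytes]:
    yield b"<rows>"
    yield b"</rows>"


def failing_body() -> Iterator[bytes]:
    raise ValueError("query failed")
    yield b""  # unreachable; only makes this a generator function
```

Calling a generator function runs none of its body. That is why the code calls next() explicitly to execute the setup code (for example, issuing the query) before committing to a status line.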
Anyway, I think the two use cases are sufficiently different to have two different interfaces. A "yield" based pull approach (potentially using "yield from" for structural chaining) doesn't fit well into a push interface for writing incrementally into a file.
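To illustrate the distinction, here is the same hypothetical row structure produced both ways. It uses plain strings with no escaping. rows_xml, row_xml and write_rows_xml are illustrative names, not lxml API. In the pull version, the consumer drives and nested elements are chained with "yield from". In the push version, the producer drives and writes into a file-like object.

```python
import io
from typing import Iterable, Iterator, TextIO


# Pull style: the consumer decides when the next piece is produced.
def row_xml(row_id: int) -> Iterator[str]:
    yield '<row id="%d">' % row_id
    yield "</row>"


def rows_xml(ids: Iterable[int]) -> Iterator[str]:
    yield "<rows>"
    for i in ids:
        yield from row_xml(i)  # structural chaining of a sub-generator
    yield "</rows>"


# Push style: the producer runs to completion, writing into the target.
def write_rows_xml(out: TextIO, ids: Iterable[int]) -> None:
    out.write("<rows>")
    for i in ids:
        out.write('<row id="%d">' % i)
        out.write("</row>")
    out.write("</rows>")
```

Draining a pull producer into a file is a one-liner, out.writelines(rows_xml(ids)). Going the other way means turning push-style writes into a lazily pulled iterator, which needs buffering or a separate thread. That asymmetry is one argument for keeping the two interfaces separate.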
So, does this mean my earlier 'append_generator' suggestion has a green light?

Best,
Burak