Hi all,
I spent a fun half day implementing async support for xmlfile(). In
the latest master (and the upcoming lxml 4.0), you can do this:
    async def writer(out_stream, xml_messages):
        async with xmlfile(out_stream) as xf:
            async with xf.element(
                    '{http://etherx.jabber.org/streams}stream'):
                async for el in xml_messages:
                    await xf.write(el)
                    await xf.flush()
This assumes that "out_stream" has an "async def write()" method.
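For illustration, here is a minimal sketch of a stream that satisfies that requirement, driving the writer above with asyncio. The `BytesSink` class and the `messages()` async generator are hypothetical stand-ins (not part of lxml); the sketch assumes Python 3.7+ (for `asyncio.run()`) and an lxml version with async xmlfile() support.

```python
import asyncio
from lxml import etree


class BytesSink:
    """Hypothetical output stream: its write() is a coroutine, as xmlfile() requires."""

    def __init__(self):
        self.chunks = []

    async def write(self, data):
        # A real stream would await a network send here; we just collect the bytes.
        self.chunks.append(data)


async def messages():
    # Hypothetical async source of elements to serialise.
    for i in range(2):
        el = etree.Element('message')
        el.text = str(i)
        yield el


async def writer(out_stream, xml_messages):
    async with etree.xmlfile(out_stream) as xf:
        async with xf.element('{http://etherx.jabber.org/streams}stream'):
            async for el in xml_messages:
                await xf.write(el)
                await xf.flush()  # push each message out immediately


sink = BytesSink()
asyncio.run(writer(sink, messages()))
output = b''.join(sink.chunks)
print(output.decode('ascii'))
```

Flushing after every element is what makes this usable for a long-lived stream like XMPP: the peer sees each message as soon as it is written, while the outer stream element stays open.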
Now, while this feature is available in all supported Python versions
(Py2.6+), only Py3.5 and later support the async/await syntax in Python
code, so you would probably want to use Py3.5 or Py3.6 with it.
The implementation is here; it's really straightforward:
https://github.com/lxml/lxml/commit/293b5b4221d4f27a7bf3e7c898d616fd80c9a46a
I'd be happy to get (user) feedback on this. Also, when thinking about
other potential async tools in lxml, XMLPullParser() comes to mind, but
it's probably not as easy. Further suggestions, ideas and pull requests
are welcome.
Stefan
All,
We use xmlfile inside a Twisted application, following more or less the
exact recipe from the docs, i.e. a generator/coroutine:
    def gen(f):
        # wrap the transport in a fake file-like
        tf = TransportFile(f)
        with etree.xmlfile(tf, encoding='us-ascii', buffered=False) as xf:
            xf.write_declaration()
            with xf.element('open-tag'):
                try:
                    while True:
                        el = (yield)
                        xf.write(el)
                except GeneratorExit:
                    pass
The TransportFile is a very simple adapter from the file() interface to
a Twisted transport; it does nothing special other than relaying the
.write() call.
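A minimal sketch of such an adapter, under stated assumptions: `FakeTransport` is a hypothetical stand-in for the Twisted transport (only its `write()` is used), and this `TransportFile` is a guess at the adapter described above, not the actual code. It also shows the generator's lifecycle: priming with `next()`, feeding elements with `send()`, and `close()`, which raises GeneratorExit inside the generator so that both `with` blocks exit and the closing tag is written.

```python
from lxml import etree


class FakeTransport:
    """Hypothetical stand-in for a Twisted transport: records written bytes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class TransportFile:
    """Hypothetical file-like adapter: relays write() to the transport."""

    def __init__(self, transport):
        self.transport = transport

    def write(self, data):
        self.transport.write(data)


def gen(f):
    tf = TransportFile(f)
    with etree.xmlfile(tf, encoding='us-ascii', buffered=False) as xf:
        xf.write_declaration()
        with xf.element('open-tag'):
            try:
                while True:
                    el = (yield)
                    xf.write(el)
            except GeneratorExit:
                pass  # fall through so the with blocks close the document


transport = FakeTransport()
g = gen(transport)
next(g)                         # prime: runs up to the first (yield)
g.send(etree.Element('ping'))   # serialised immediately (buffered=False)
g.close()                       # GeneratorExit -> '</open-tag>' is written
output = b''.join(transport.written)
```

Closing the generator explicitly, rather than leaving it to garbage collection at interpreter shutdown, makes the point at which the xmlfile is finalised deterministic.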
In certain circumstances, I get a SEGV when the Python interpreter shuts
down (we discovered this because the test suite started exiting
non-zero). This goes away if I do:
    gen.close()
...before shutdown, suggesting an ordering problem. The catch is that I
can only seem to trigger it when running our test suite, and I'm not
sure which of the various tests (and therefore which use case) triggers
it; I'll need to spend some time isolating it.
In case anyone can spot an obvious problem, the versions are:
libxml2-2.9.4-2.fc26.x86_64
libxslt-1.1.29-1.fc26.x86_64
python2-2.7.13-11.fc26.x86_64
lxml == 3.8.0 (manylinux1 wheel inside virtualenv via pip)
...and a GDB backtrace is here:
https://gist.github.com/philmayers/ec4c193d2e794d28c71eafeda5f0ce7e
I'm not entirely sure why the backtrace is relatively useless; I do have
the python2-debuginfo packages installed, but there are no symbols for
the last few calls. However, AFAICT it's just after this line:
https://github.com/lxml/lxml/blob/lxml-3.8.0/src/lxml/serializer.pxi#L1115
...which is about where I get lost; I don't see how the
_element_stack.pop() can fail since the list is checked just above it.
I'm not consciously creating any threads, but I do see some starting in
GDB. I'm assuming these are Twisted-internal threads that don't touch
lxml (e.g. the DNS resolver), but honestly I don't know for sure.
Although I don't have a self-contained example yet, I can reproduce this
at will; let me know if there is additional debugging anyone wants to see.
Thanks for the great software BTW!
Regards,
Phil