[lxml-dev] Threading

Hi all,

I looked a bit at the threading issues and found that there is not much to gain from using threads anyway. Calling code will suffer from the GIL, so it would be up to lxml to release the GIL during long-running operations for threading to bring any benefit.

However, releasing the GIL is non-trivial, as most of these operations can call back into the Python API and the interpreter: parsing calls resolvers, as does XSLT. XPath can call extension functions, and even the error messages end up in Python objects and lists, etc.

So, there are only very few places that could easily be wrapped by ALLOW_THREADS:

* everything that traverses trees (which is so fast that the gain may be eaten by the locking overhead)

and, if error messages can somehow be made to work without requiring the GIL:

* serialisation to memory or 'real' files
* validation (which currently does not support custom resolvers)

An alternative would be to always create separate Python threads for things like error handling and resolving, but that would require always releasing the GIL whenever these things *might* get called.

So, it would be relatively easy to release the GIL in the ElementDepthFirstIterator to get a thread speedup for findall() etc., but everything else will be real work. And since the major overhead is in the parsers and serialisers, I don't think it's worth changing this bit just to be able to say "you can gain from threading".

Considering this, I changed the FAQ entry on threading to simply state that threading is not supported, with a short explanation that the gain would be marginal without major changes in lxml.

Stefan
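The callback problem described above can be illustrated with a small toy model. This is only a sketch in plain Python, not lxml code: `ToyGIL` and `native_operation` are hypothetical names made up for the illustration, and an ordinary `threading.Lock` stands in for the real GIL. It shows that an operation without callbacks never needs the lock, while one that calls back into Python (as a resolver or an XPath extension function would) has to take it once per call.

```python
import threading


class ToyGIL:
    """Hypothetical stand-in for the interpreter lock; counts acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # only modified while the lock is held
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def native_operation(gil, items, callback=None):
    """Simulate library work that runs with the lock released.

    Without a callback the lock is never touched. With a callback,
    every single call has to re-acquire it first.
    """
    results = []
    for item in items:
        if callback is None:
            results.append(item)  # pure "C-level" step, no Python code involved
        else:
            with gil:  # Python code may only run while holding the lock
                results.append(callback(item))
    return results


gil = ToyGIL()

# Case 1: plain traversal, no callbacks into Python.
plain = native_operation(gil, range(1000))
traversal_acquisitions = gil.acquisitions

# Case 2: the same work, but each step calls back into Python.
with_callbacks = native_operation(gil, range(1000), callback=lambda x: x * 2)
callback_acquisitions = gil.acquisitions - traversal_acquisitions

print(traversal_acquisitions, callback_acquisitions)
```

In this model the traversal case takes the lock zero times and the callback case takes it once per item, which is the locking overhead that could eat up any gain from running in parallel.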
participants (1)
-
Stefan Behnel