Hi all,

There was an experiment based on CPython's code called PyParallel <https://github.com/pyparallel/pyparallel> that allows running threads in parallel without STM and without requiring changes to the source code of either Python code or C extensions. The only limitation is that it disallows mutation of global state in a parallel context. I briefly mentioned it before on PyPy's Freenode channel.

I'd like to discuss why the approach is useful, how it can benefit PyPy users and how it can be implemented.

Running in parallel without mutating global state would let a server handle each request in its own thread. It would also allow logging in parallel, or sending an HTTP request (or an AMQP message), without sharing the response with the main thread. This is useful in some cases, and since PyParallel managed to keep the same semantics, it shouldn't break CPyExt.

If we keep to the following rules:

1. No global state mutation is allowed
2. No new keywords or code modifications required
3. No CPyExt code is allowed (for now)

I believe that users can benefit from this implementation, at least somewhat, if it is done correctly.

As for implementation: if we can trace the code running in the thread and ensure that it is not mutating global state, and that CPyExt is never used during the thread's lifetime, we can simply release the GIL while such a thread runs. That requires less knowledge than using STM and fewer code modifications. However, I think that attempting to do so will introduce the same issue with caching traces (Armin, am I correct here?).

As for CPyExt, we could copy the same code modifications that PyParallel made, but I suspect that it will be so slow that the benefit of running in parallel will be completely lost in all cases except very long-running threads.

Is what I'm suggesting even possible? How challenging will it be?

Thanks,
Omer Katz.
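P.S. To make rule 1 a bit more concrete, here is a rough sketch of the simplest possible check for "this code does not mutate global state". It is only an illustration: it uses the standard `dis` module to look for bytecode that rebinds or deletes a module-level name, which is not how PyParallel enforces the rule and not how a check inside PyPy's tracer would work. The names `writes_globals`, `handle_request` and `bump_counter` are made up for the example.

```python
import dis

# Opcodes that rebind or delete a module-level name.
GLOBAL_WRITE_OPS = {"STORE_GLOBAL", "DELETE_GLOBAL"}


def writes_globals(func):
    """Return the global names that func's own bytecode rebinds or deletes."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in GLOBAL_WRITE_OPS
    }


counter = 0


def handle_request(payload):
    # Only touches its argument and locals, so it passes this check.
    return payload.upper()


def bump_counter():
    # Rebinds a module-level name, so it fails this check.
    global counter
    counter += 1


print(writes_globals(handle_request))
print(writes_globals(bump_counter))
```

A static check like this misses a lot: it does not see mutation through method calls or attribute stores on global objects (for example appending to a module-level list), and it does not look into functions that the thread calls. That is why I think the check would have to happen while the code is actually running, which is where tracing comes in.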