To speed-up function calls, the interpreter uses a linear, resizable stack to store function call frames, an idea taken from LuaJIT. The stack stores the interpreter registers (local variables + space for temporaries) plus some extra information per-function call. This avoids the need for allocating PyFrameObjects for each call. For compatibility, the PyFrameObject type still exists, but they are created lazily as-needed (such as for exception handling and for sys._getframe).
The optimized function calls have about an order of magnitude less overhead than the current CPython implementation.
The change also simplifies the use of deferred reference counting with the data that is stored per-call like the function object. The interpreter can usually avoid incrementing the reference count of the function object during a call. Like other objects on the stack, a borrowed reference to the function is indicated by setting the least-significant-bit.

Congrats Sam! This is incredible work! One quick question after reading the design doc:

When you mean "an order of magnitude less overhead than the current CPython implementation" do you mean compared with the main branch? We recently implemented already almost everything is listed in this paragraph:

https://github.com/python/cpython/pull/27077

We also pack some extra similar optimizations in this other PR, including stealing the frame arguments from python to python calls:

https://github.com/python/cpython/pull/28488

This could explain why the performance is closer to the current master branch as you indicate:

 It gets about the same average performance as the “main” branch of CPython 3.11 as of early September 2021.

Cheers from cloudy London,
Pablo Galindo Salgado

On Fri, 8 Oct 2021 at 03:49, Sam Gross <colesbury@gmail.com> wrote:
Hi,

I've been working on changes to CPython to allow it to run without the global interpreter lock. I'd like to share a working proof-of-concept that can run without the GIL. The proof-of-concept involves substantial changes to CPython internals, but relatively few changes to the C-API. It is compatible with many C extensions: extensions must be rebuilt, but usually require small or no modifications to source code. I've built compatible versions of packages from the scientific Python ecosystem, and they are installable through the bundled "pip".

Source code:
https://github.com/colesbury/nogil

Design overview:
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

My goal with the proof-of-concept is to demonstrate that removing the GIL is feasible and worthwhile, and that the technical ideas of the project could serve as a basis of such an effort.

I'd like to start a discussion about these ideas and gauge the community's interest in this approach to removing the GIL.

Regards,
Sam Gross
colesbury@gmail.com / sgross@fb.com
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/
Code of Conduct: http://python.org/psf/codeofconduct/