inline Python functions and methods

The Python community has a five-year plan to push the limits of Python's speed. One of the things that reduces Python execution speed is calling methods or functions that are not in the nearest scope. My suggestion is to introduce inline functions, just as they exist in C. They could be defined as global functions or class methods, but would be created in the scope of the function that calls them at parse time. I also don't think it would be hard to adapt to. Inline functions might look something like:

    inline def func():
        pass

    inline async def func():
        pass

    inline class Executor:
        inline def func():
            pass

For an inline method, the class must be defined as inline as well, in order to bring into scope whatever class variables or methods the inline function might rely on. This is just a suggestion; what do you think about it?
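For context, the slowdown being targeted here can be measured, and partly worked around, without any new syntax, by binding a global or builtin name to a local one. A minimal sketch (the function names and timing setup are illustrative, not from the original post):

    import timeit

    def use_global(values):
        total = 0
        for v in values:
            total += abs(v)          # global/builtin lookup of `abs` on every iteration
        return total

    def use_local(values, abs=abs):  # bind the builtin to a local name once
        total = 0
        for v in values:
            total += abs(v)          # local lookups compile to a cheap array access
        return total

    data = list(range(-1000, 1000))
    print(timeit.timeit(lambda: use_global(data), number=2000))
    print(timeit.timeit(lambda: use_local(data), number=2000))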

Hi Tobias,

On Wed, Dec 08, 2021 at 05:53:56AM -0000, TobiasHT wrote:
Do you have some benchmarks showing that the cost of the global lookup is a significant slowdown?
The first thing I think of is that I don't know what it does. Let's pretend that we're Python programmers, not C programmers, and so might not know what the C "inline" keyword does. Can you explain, using Python terms, how an inline function and a regular function will be different?

You say that they are "created in the scope of the function that is calling it at parse time", but I'm not sure I understand the consequences of that. (I assume you mean the function's *local* scope, not the function's *surrounding* scope.) So if I have an inline function and it gets used twice:

    inline def spam(*args):
        # Do something with args...

    def eggs(a, b, c):
        thing = spam(a, b, c)
        ...

    def cheese(x):
        obj = spam(x, 2)
        ...

does that mean that the compiler will translate the above to:

    def eggs(a, b, c):
        def spam(*args):
            # Do something with args...
        thing = spam(a, b, c)
        ...

    def cheese(x):
        def spam(*args):
            # Do something with args...
        obj = spam(x, 2)
        ...

and the top-level "spam" will not actually exist?

    globals()['spam']  # KeyError

If the top-level spam *does* exist, what kind of object is it?

Going back to the functions eggs() and cheese() that call the inline function, and so have a copy of spam compiled into their body: doesn't that double (or triple) the memory used? Are you sure that this will lead to a performance increase?

Nested functions are created at runtime, not compile time, same as non-nested functions. This is pretty fast, because they are created from pre-compiled parts, but there is still some runtime cost. It is not clear to me that the cost of assembling the function will be less than the cost of a global lookup.

What happens if we write the reference to the inline function before we write the inline function?

    def eggs(a, b, c):
        thing = spam(a, b, c)
        ...

    inline def spam(*args):

How does that resolve with Python's execution rules? Modules are compiled in one pass. How does the interpreter know that eggs' reference to spam is to an inline function?

Note: this is also being discussed on Discuss: https://discuss.python.org/t/inline-python-functions-and-methods/12412/1 Anyone who cares a lot about this issue may want to follow both discussions. -- Steve
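Steve's point that nested functions are assembled at runtime from pre-compiled parts is easy to see with the dis module; a small illustration (names invented for the example):

    import dis

    def outer():
        def spam(*args):    # the code object is compiled once, but a new
            return args     # function object is assembled on every call
        return spam

    # The MAKE_FUNCTION opcode in the output below executes each time
    # outer() runs -- that is the runtime cost being discussed.
    dis.dis(outer)

    # Two calls yield two distinct function objects built from the same code:
    print(outer() is outer())  # False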

On 9/12/21 12:03 am, Steven D'Aprano wrote:
If that's what's intended, it wouldn't really be an inline function. And I doubt it would make any significant difference to speed, because looking up the function is only a small part of what makes function calls expensive in Python -- most of it is in the parameter processing, which can get quite complicated.

What inlining usually means is to copy the body of the function in place of the call, with appropriate parameter substitutions. That would eliminate most of the overhead of a function call, but there are problems with doing it in Python. Imported modules would have to be located and parsed at compile time, something that doesn't currently happen. And it can't be done in general -- the location of an imported module isn't known for sure until run time, because changes can be made dynamically to sys.path. -- Greg
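To make that definition concrete, here is a hypothetical before/after of copying a function body in place of the call, with the parameter substituted (names invented for the example):

    def add_one(x):
        return x + 1

    def caller(values):
        return [add_one(v) for v in values]   # one function call per element

    # What an inlining compiler would conceptually produce: the body of
    # add_one pasted into the call site, with `x` replaced by `v`.
    def caller_inlined(values):
        return [v + 1 for v in values]        # no call overhead at all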

On 9/12/21 2:07 am, TobiasHT wrote:
If a function fails to be inlined at compile time due to the dynamic behavior of Python, then the normal function-call behavior can be the fallback.
The problem is that the compiler might *think* it knows where the module is at compile time, but at run time it turns out to be different. Maybe you could come up with some scheme to allow this to be detected and invalidate all the code resulting from inlining its functions, but that could get very complicated, and it would be hard to be sure it works properly in all situations.

Another thing to consider is that all this would only help with calling stand-alone functions, which is a relatively rare thing to do in Python. Most of the time you're calling a method of some object, and you don't know until each call which function that will be. That's another reason I'm doubtful it would help much in practice. -- Greg
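The method-call problem can be shown in a few lines: which function a method call resolves to is only decided at the moment of each call, so a copy inlined at compile time could silently go stale. A hypothetical illustration:

    class Greeter:
        def greet(self):
            return "hello"

    def shout(obj):
        return obj.greet().upper()   # which `greet` runs is decided here, per call

    g = Greeter()
    print(shout(g))                  # HELLO

    # Rebinding the method at run time changes every future call;
    # a compile-time inlined copy of the old body would now be wrong.
    Greeter.greet = lambda self: "goodbye"
    print(shout(g))                  # GOODBYE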

So for the point of benchmarks, this is a link to some of the hacks developed by Pythonistas to boost Python's speed. Among the hacks, there are topics called "reducing dots" and "local variables": https://wiki.python.org/moin/PythonSpeed/PerformanceTips

Also, I would explain to a Pythonista that an inline function is a function which is copied into the same spot where it's called. I know it's a somewhat vague definition, but that's what I could come up with so far.

By scope, yes, I do mean the function's local scope and not the surrounding scope. And per the example you've provided, yes, spam shall appear inside the functions as stated, and I think it should not appear in the global scope, as hypothesized, since the global lookup is what we're trying to avoid in the first place.

The rest of the concerns you've mentioned above are thinking points that I must take into consideration; that's exactly why I was poking around for such insights that I might have skipped while thinking about this.
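For readers who don't follow the link, the "reducing dots" and "local variables" tips on that page both amount to hoisting lookups out of hot loops; a small sketch in that style (names are illustrative):

    import math

    def norms_slow(points):
        out = []
        for x, y in points:
            out.append(math.sqrt(x * x + y * y))  # `math.sqrt` and `out.append`
        return out                                # are looked up every iteration

    def norms_fast(points):
        sqrt = math.sqrt     # "reducing dots": resolve the attribute once
        out = []
        append = out.append  # "local variables": bind the bound method locally
        for x, y in points:
            append(sqrt(x * x + y * y))
        return out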

So I went back and revised my idea for inline functions in Python, and I realized that it would be harder to implement them the way I had originally thought, due to Python's dynamic nature. However, the idea itself doesn't seem so original, as Cinder already implements inline byte-code caching, which significantly boosts its performance. I do not think they ever upstreamed those changes, however.

So a modification of my idea goes like this. Inline functions can still be written with the inline keyword, but evaluated at runtime. This is because there are some cases, such as in conditionals, where a function might never be called at all, and so wouldn't need to be optimized. Consider code like:

    if True:
        inline def func_version_one(*args):
            # func body
            pass
    else:
        inline def func_version_two(*args):
            # func body
            pass

The right function to perform inlining on would be determined at runtime and cached in the same scope as where it performs its operations, for cases where the program performs large iterations or even infinite loops, and other cases that need optimization.

Like I said before, it's still a work in progress, and I'm still putting lots of factors into consideration, including your incredible insight as the core dev team.
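The conditional-definition point can be demonstrated with plain def today: which body a name ends up bound to is only knowable when the branch actually executes, so any inlining decision would have to happen at run time too. A tiny illustration (the environment-variable condition is made up):

    import os

    # The compiler cannot know which branch will run, so it cannot know
    # which body `func` will end up with.
    if os.environ.get("FAST_PATH"):
        def func(*args):
            return "fast version"
    else:
        def func(*args):
            return "fallback version"

    print(func())   # decided only when the module actually runs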

On 11/12/21 5:40 pm, TobiasHT wrote:
The right function to perform inlining on would be determined at runtime and cached in the same scope as where it performs its operations, for cases where the program performs large iterations or even infinite loops, and other cases that need optimization.
If it's to be a run-time optimisation, you could consider dropping the inline declaration and just have the implementation decide whether it's worth inlining things, based on factors such as the size of the function and how often it's called. Then the language wouldn't have to be changed at all, and programmers wouldn't need to have foresight to decide when to declare things inline. -- Greg
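A toy sketch of the kind of implementation-side heuristic Greg describes -- purely hypothetical, and not how CPython or Cinder actually work -- in which a call counter decides when a function has become hot enough to be worth specializing:

    import functools

    HOT_THRESHOLD = 1000   # made-up tuning knob

    def maybe_specialize(fn):
        # Count calls; past the threshold, a real runtime would inline or
        # specialize the function. This toy just records the transition.
        count = 0

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal count
            count += 1
            if count == HOT_THRESHOLD:
                wrapper.hot = True   # a JIT would swap in a fast version here
            return fn(*args, **kwargs)

        wrapper.hot = False
        return wrapper

    @maybe_specialize
    def square(x):
        return x * x

    for i in range(2000):
        square(i)
    print(square.hot)   # True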

participants (3): Greg Ewing, Steven D'Aprano, TobiasHT