[Python-ideas] Re: Permanent code objects (less memory, quicker load, less Unix Copy On Write)

June 20, 2020

On Fri, Jun 19, 2020 at 06:33:59PM +1200, Greg Ewing wrote:
> On 19/06/20 9:28 am, Steven D'Aprano wrote:
> > I know very little about how this works except a vague rule of thumb
> > that in the 21st century memory locality is king. If you want code to be
> > fast, keep it close together, not spread out.
>
> Python objects are already scattered all over memory, and a
> function already consists of several objects -- the function
> object itself, a dict, a code object, lists of argument and
> local variable names, etc. I doubt whether the suggested
> change would make locality noticeably worse.

There's a difference between "I doubt" and "it won't". Unless you're an 
expert on C level optimizations, like Victor or Serhiy, which I 
definitely am not, I think we're both just guessing.
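
For what it's worth, the objects Greg lists can at least be inspected 
from Python itself. The following is only a minimal sketch, assuming 
CPython, where id() happens to return the object's memory address. The 
names `sample` and `component_objects` are made up for illustration, and 
newer CPython versions may build some of these attributes on demand, so 
the printed addresses are indicative at best. Note that CPython stores 
the name "lists" as tuples.

```python
def sample(a, b=2):
    """Hypothetical function used only for illustration."""
    total = a + b
    return total


def component_objects(func):
    """Return (label, object) pairs for separate objects behind func."""
    code = func.__code__
    return [
        ("function", func),
        ("function __dict__", func.__dict__),
        ("code object", code),
        ("constants (co_consts)", code.co_consts),
        ("global names (co_names)", code.co_names),
        ("local names (co_varnames)", code.co_varnames),
    ]


for label, obj in component_objects(sample):
    # Assumption: CPython, where id() is the object's address in memory.
    print(f"{label:28} {type(obj).__name__:10} at {id(obj):#x}")
```

This shows how many distinct objects are involved, not how they behave 
in the CPU cache. That still needs measuring.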

Here is some evidence that cache misses make a real difference to 
performance. It reports a 70% slowdown in function calls caused by an 
increase in L1 cache misses:

https://bugs.python.org/issue28618

There's also been a lot of work done on using immortal objects. The 
results have been mixed at best:

https://bugs.python.org/issue40255

Jonathan's initial post claimed to have shown that this technique will 
be of benefit:

"We show that extending Python, to provide and take advantage of 
permanent code objects, will bring some benefits."

but there is a huge gulf between faster in theory and faster in 
practice, and we should temper our enthusiasm rather than claim 
certainty in the face of so much uncertainty. I know that Python-Ideas 
is not held to the same standards as scientific papers, but if you claim 
to have shown a benefit, you really ought to have *actually* shown a 
benefit, not just identified a promising area for future study.

Making objects immortal is not free: it risks memory leaks, and the 
evidence (as far as I can tell) is that it helps only a small subset of 
Python users (those that fork() lots of worker processes) at the expense 
of the majority of users.
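
Why the fork() case is special: in CPython the reference count is stored 
inside the object, so merely taking a reference writes to the object's 
memory, and in a forked child that write is enough to make the operating 
system copy the page. The sketch below only demonstrates the first half 
of that (the count changing without the object being modified). It does 
not fork or measure page copies, and `refcount_delta` is a hypothetical 
helper name.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count changes when one
    extra reference to it is stored, without mutating the object."""
    payload = ["read-only", "data"]   # contents are never changed below
    holders = []
    before = sys.getrefcount(payload)
    holders.append(payload)           # only takes a new reference
    after = sys.getrefcount(payload)
    return after - before


print(refcount_delta())
```

On an ordinary CPython build I would expect this to report one extra 
reference. An immortal object would skip that update, which is where 
both the saving for fork-heavy programs and the cost of the extra check 
come from.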

Personally, based on my *extremely limited* (i.e. infinitesimal) 
knowledge of C-level optimizations on 21st century CPUs, I don't think 
this is a promising area to explore, except maybe as an option. (A 
runtime switch, perhaps, or a build option?) If Jonathan, or anyone 
else, thinks differently and is willing to run some before and after 
benchmarks, I look forward to being proven wrong :-)
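
A before-and-after comparison does not need to be elaborate to be 
useful. Here is a minimal sketch using the standard timeit module. 
`noop` and `time_calls` are hypothetical names, and an empty function 
is only a stand-in for whatever workload the proposal is supposed to 
help.

```python
import timeit


def noop():
    """Hypothetical stand-in for the function whose call cost we time."""
    return None


def time_calls(func, number=200_000, repeat=5):
    """Return the best observed per-call time in seconds."""
    timer = timeit.Timer(func)
    # Take the minimum over several repeats to reduce noise from
    # other activity on the machine.
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number


print(f"{time_calls(noop) * 1e9:.1f} ns per call")
```

Run the same script under the unmodified interpreter and under the 
patched one, then compare the two figures. A dedicated tool such as 
pyperf should give more stable numbers than this.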

-- 
Steven