
On Fri, Jun 19, 2020 at 06:33:59PM +1200, Greg Ewing wrote:
> On 19/06/20 9:28 am, Steven D'Aprano wrote:
> > I know very little about how this works except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.
>
> Python objects are already scattered all over memory, and a function already consists of several objects -- the function object itself, a dict, a code object, lists of argument and local variable names, etc. I doubt whether the suggested change would make locality noticeably worse.

There's a difference between "I doubt" and "it won't". Unless you're an expert on C-level optimizations, like Victor or Serhiy, which I definitely am not, I think we're both just guessing.

Here is some evidence that cache misses make a real difference to performance. A 70% slowdown on calling functions, due to an increase in L1 cache misses:

https://bugs.python.org/issue28618

There's also been a lot of work done on using immortal objects. The results have been mixed at best:

https://bugs.python.org/issue40255

Jonathan's initial post claimed to have shown that this technique will be of benefit:

    "We show that extending Python, to provide and take advantage of permanent code objects, will bring some benefits."

but there is a huge gulf between faster in theory and faster in practice, and we should temper our enthusiasm and not claim certainty in the face of a vast gulf of uncertainty.

I know that Python-Ideas is not held to the same standards as scientific papers, but if you claim to have shown a benefit, you really ought to have *actually* shown a benefit, not just identified a promising area for future study.

Making objects immortal is not free: it risks memory leaks, and the evidence (as far as I can tell) is that it helps only a small subset of Python users (those that fork() lots of worker processes) at the expense of the majority of users.

Personally, based on my *extremely limited* (i.e. infinitesimal) knowledge of C-level optimizations on 21st century CPUs, I don't think this is a promising area to explore, except maybe as an option. (A runtime switch, perhaps, or a build option?)

If Jonathan, or anyone else, thinks differently and is willing to run some before and after benchmarks, I look forward to being proven wrong :-)

--
Steven
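
For anyone who wants to try such a before-and-after comparison, here is a minimal sketch of a function-call micro-benchmark using the standard library's timeit module. The names empty() and time_calls() are made up for illustration and come from nobody's patch. The idea would be to run the same script on an unmodified build and on a modified one and compare the numbers. A micro-benchmark like this only hints at call overhead; it says nothing definitive about cache behaviour in real programs.

```python
import timeit


def empty():
    """Hypothetical do-nothing function, so call overhead dominates the timing."""
    return None


def time_calls(func, number=100_000, repeat=5):
    """Return the best observed time per call, in seconds.

    Taking the minimum over several repeats reduces noise from other
    processes; it does not remove it.
    """
    timer = timeit.Timer(func)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    per_call = time_calls(empty)
    print(f"{per_call * 1e9:.1f} ns per call")
```

Run it several times on each build, on an otherwise idle machine, before drawing any conclusion from a difference.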