[Python-ideas] Re: Less is more? Smaller code and data to fit more into the CPU cache?

March 27, 2022

...
On 22 Mar 2022, at 15:57, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi
As you may have seen, AMD has recently announced CPUs that have much larger L3 caches. Does anyone know of any work that's been done to research or make critical Python code and data smaller so that more of it fits in the CPU cache? I'm particularly interested in measured benefits.
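One way to get a rough number for "smaller data" is to compare the per-instance footprint of an ordinary class with a class that declares __slots__. The sketch below is an illustration only, not something from this thread: the class names and the shallow_size helper are hypothetical, the sizes are CPython-specific and vary between versions, and a smaller footprint does not by itself demonstrate a cache benefit.

```python
import sys


class PointDict:
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Same fields, but stored in fixed slots with no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def shallow_size(obj):
    """Bytes for the instance plus its __dict__, if any (attribute values excluded)."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


plain = shallow_size(PointDict(1.0, 2.0))
slotted = shallow_size(PointSlots(1.0, 2.0))
print(f"with __dict__: {plain} bytes, with __slots__: {slotted} bytes")
```

Whether the saving turns into a measurable speedup depends on the workload, so it would still need to be timed on real code.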
A few years ago (5? 10?) there was a blog post about making the Python eval loop fit into the L1 cache.
The author gave up on the work because, he claimed, it was too hard to contribute any changes to Python at the time.
Sadly, I have not kept a link to the blog post.

What I recall is that the author found that GCC was producing far more code than was required to implement sections of ceval.c.
As I recall, the claim was that fixing that would shrink the ceval code by 50%. He had a PoC that showed the improvements.

Then there was research on the opcodes and eliminating unnecessary code in their implementation,
but I do not trust that I remember the details of this; it's been too long since I read the blog.

Barry
...
This search
  https://www.google.com/search?q=python+performance+CPU+cache+size
provides two relevant links
  https://www.oreilly.com/library/view/high-performance-python/9781449361747/c...
  https://www.dlr.de/sc/Portaldata/15/Resources/dokumente/pyhpc2016/slides/PyH...
but I found not much else that was relevant.
AnandTech writes about the chips with triple the L3 cache:
  https://www.anandtech.com/show/17323/amd-releases-milan-x-cpus-with-3d-vcach...
"As with other chips that incorporate larger caches, the greatest benefits are going to be found in workloads that spill out of contemporary-sized caches, but will neatly fit into the larger cache."
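The effect described in that quote can be probed, very roughly, by timing random reads over buffers of increasing size and watching for the cost per read to rise once the working set no longer fits in a cache level. The following is a minimal sketch under stated assumptions, not a rigorous benchmark: the function names and buffer sizes are hypothetical, interpreter overhead dominates each read, and the index list itself competes for cache space, so any step in the curve will be muted.

```python
import array
import random
import time


def time_random_reads(n_items, n_reads=100_000, seed=0):
    """Return (average seconds per read, checksum) for random reads
    over a buffer of n_items 8-byte integers."""
    rng = random.Random(seed)
    data = array.array("q", range(n_items))  # data[i] == i
    indices = [rng.randrange(n_items) for _ in range(n_reads)]
    total = 0
    start = time.perf_counter()
    for i in indices:
        total += data[i]
    elapsed = time.perf_counter() - start
    return elapsed / n_reads, total


def sweep(sizes_kib=(16, 256, 4096, 32768)):
    """Map each working-set size in KiB to its average seconds per read."""
    results = {}
    for kib in sizes_kib:
        n_items = kib * 1024 // 8  # 8 bytes per "q" item
        per_read, _ = time_random_reads(n_items)
        results[kib] = per_read
    return results


if __name__ == "__main__":
    for kib, per_read in sweep().items():
        print(f"{kib:>6} KiB: {per_read * 1e9:.1f} ns per read")
```

The sizes would need adjusting to straddle the cache sizes of the machine under test, and repeated runs are needed before reading anything into the differences.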
And also:
  https://www.anandtech.com/show/17313/ryzen-7-5800x3d-launches-april-20th-plu...
"As detailed by the company back at CES 2022 and reiterated in today’s announcement, AMD has found that the chip is 15% faster at gaming than their Ryzen 9 5900X."
I already know that using Non-Uniform Memory Access (NUMA) raises the difficult problem of cache coherence.
  https://en.wikipedia.org/wiki/Non-uniform_memory_access
  https://en.wikipedia.org/wiki/Cache_coherence
-- 
Jonathan


Barry Scott