New GitHub issue #119730 from diegorusso:<br>

<hr>

<pre>

# Feature or enhancement

### Proposal:

Issue https://github.com/python/cpython/issues/116017 already explains the problem with the memory allocation used by the JIT.

To provide more data points, I debugged this a little further: I added some debugging output to `_PyJIT_Compile` and then ran pyperformance.

The debugging output covers the memory allocated and the padding used to align it to the page size.

The function was called 1,288,249 times. These are the totals for the padding due to the 16K page size (on macOS) and for the memory actually used:

- Total Padding size: 16,490,764,792

- Total Code/Data size: 6,737,241,608

71% of the allocated memory is wasted on padding, whilst only 29% is used by code and data. This indicates that the memory needed for these objects is *usually* much smaller than the page size.
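
As a quick check of the figures above, here is a minimal sketch of the arithmetic in Python. The `padding_for` helper is hypothetical (it is not part of the JIT) and simply assumes each allocation is rounded up to a whole number of 16K pages, as described in this issue.

```python
PAGE_SIZE = 16 * 1024  # 16K page size reported for macOS in this issue


def padding_for(size, page_size=PAGE_SIZE):
    """Hypothetical helper: bytes of padding needed to round `size` up to whole pages."""
    return -size % page_size


# Totals measured during the pyperformance run described above.
calls = 1_288_249
total_padding = 16_490_764_792
total_code_data = 6_737_241_608
total_allocated = total_padding + total_code_data

# Share of allocated memory that is padding versus code/data.
padding_share = total_padding / total_allocated
code_data_share = total_code_data / total_allocated

# Average code/data bytes per call to _PyJIT_Compile.
avg_code_data = total_code_data / calls

print(f"padding:   {padding_share:.1%}")
print(f"code/data: {code_data_share:.1%}")
print(f"average code/data per call: {avg_code_data:.0f} bytes")
```

The average code/data size per call comes out well below a single 16K page, which is consistent with the observation that most of each page is padding.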

This is a brain dump from @brandtbucher to help out with the implementation:

> for 3.14 we'll probably need to look into some sort of slab allocator that will let us share pages between executors. We can allocate by either batching the compiles or stopping the world to flip the permission bits, and then deallocate by maintaining refcounts of each page or something. [...] 

One benefit that could come with an arena allocator is the ability to JIT a bunch of guaranteed-in-range trampolines for long jumps to library/C-API calls, rather than needing to create a ton of redundant in-line trampolines inline in the trace (or using global offset table hacks). That should save us memory *and* speed things up, I think.
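
To make the page-sharing idea more concrete, here is a rough Python simulation of the bookkeeping only. It is a sketch under stated assumptions, not a proposed implementation: `SlabArena`, `Page`, and the 16-byte `ALIGN` are all hypothetical names and values, and it does not model permission flipping, batching, stopping the world, or trampolines.

```python
from dataclasses import dataclass

PAGE_SIZE = 16 * 1024  # 16K pages, as on macOS in this issue
ALIGN = 16             # assumed alignment for each executor's code (hypothetical)


@dataclass
class Page:
    used: int = 0      # bytes handed out from this page so far
    refcount: int = 0  # number of live executors placed in this page


class SlabArena:
    """Hypothetical bump allocator that packs several executors into one page."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pages = {}       # page id -> Page
        self._next_id = 0
        self._current = None  # id of the page currently being filled

    def allocate(self, size):
        """Reserve `size` bytes and return (page_id, offset)."""
        size = (size + ALIGN - 1) // ALIGN * ALIGN  # round up to ALIGN
        if size > self.page_size:
            raise ValueError("allocations larger than a page are not modelled")
        cur = self._current
        if cur is None or self.pages[cur].used + size > self.page_size:
            # Current page is full (or absent): start a fresh one.
            cur = self._next_id
            self._next_id += 1
            self.pages[cur] = Page()
            self._current = cur
        page = self.pages[cur]
        offset = page.used
        page.used += size
        page.refcount += 1
        return cur, offset

    def free(self, page_id):
        """Drop one executor's reference; release the page when none remain."""
        page = self.pages[page_id]
        page.refcount -= 1
        if page.refcount == 0:
            if page_id == self._current:
                page.used = 0  # reuse the page that is still being filled
            else:
                del self.pages[page_id]


arena = SlabArena()
slots = [arena.allocate(5000) for _ in range(4)]
print(slots)             # three executors share page 0, the fourth starts page 1
for page_id, _ in slots[:3]:
    arena.free(page_id)
print(len(arena.pages))  # page 0 was released once its last executor was freed
```

With roughly 5,000-byte executors, three fit in one 16K page in this model, where page-aligned allocation would use three separate pages.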

### Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

### Links to previous discussion of this feature:

This has been discussed with Brandt via email and in person at PyCon 2024.

</pre>

<hr>

<a href="https://github.com/python/cpython/issues/119730">View on GitHub</a>

<p>Labels: type-feature</p>

<p>Assignee: </p>