Hi

SUMMARY: We're starting to discuss implementation. I'm going to focus on what can be done, with only a few changes to the interpreter.

First consider this:
    >>> from sys import getrefcount as grc
    >>> def fn(obj): return grc(obj)

    >>> grc(fn.__code__), grc(fn.__code__.co_code)
    (2, 2)
    >>> fn(fn.__code__), fn(fn.__code__.co_code)
    (5, 4)
    >>> grc(fn.__code__), grc(fn.__code__.co_code)
    (2, 2)

    # This is the bytecode.
    >>> fn.__code__.co_code
    b't\x00|\x00\x83\x01S\x00'
 
What's happening here? While the interpreter executes the pure Python function fn, it changes the refcount of both fn.__code__ and fn.__code__.co_code. This is one of the problems we have to solve, to make progress.

These refcounts are stored in the objects themselves. So unless the interpreter is changed, these Python objects can't be stored in read-only memory. We may have to change code and co_code objects also.
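To see that the count really does live inside the object, here is a small sketch. It assumes a default (non-free-threaded) CPython build, where id(obj) is the object's address and the refcount sits in the first machine word of the header. The name raw_refcount is my own invention, not part of any API.

```python
import ctypes

def raw_refcount(obj):
    # Assumption: CPython, default build. id(obj) is the address of the
    # object and the reference count occupies the first word of its header.
    return ctypes.c_ssize_t.from_address(id(obj)).value

payload = bytearray(b"abc")     # a fresh, ordinary (non-immortal) object
before = raw_refcount(payload)
holders = [payload]             # take one more reference to the object
after = raw_refcount(payload)

# Taking a reference wrote to memory inside the object itself.
print(after - before)
```

The point is only that taking a reference is a write to the object's own memory, which is exactly what read-only storage forbids.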

Let's focus on the bytecode, as it's the busiest and often largest part of the code object. The (ordinary) code object has a field which is (a pointer to) the co_code attribute, which is a Python bytes object. This is the bytecode, as a Python object.

Let's instead give the C implementation of fn.__code__ TWO fields. The first is a pointer, as usual, to the co_code attribute of the code object. The second is a pointer to the raw data of the co_code object.

When the interpreter executes the code object, the second field tells the interpreter where to start executing. (This might be why the refcount of fn.__code__.co_code is incremented during the execution of fn.) The interpreter doesn't even have to look at the first field.

If we want the raw bytecode of a code object to lie in read-only memory, it is enough to set the second pointer to that location. In both cases, the interpreter reads the memory location of the raw bytecode and executes accordingly.
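Here is a rough Python model of the two-field idea, just to fix the picture. CodeModel, raw and iter_bytecode are hypothetical names of mine, and a memoryview stands in for the C pointer. It is a sketch of the proposal, not a description of how CPython lays out code objects today.

```python
from dataclasses import dataclass

@dataclass
class CodeModel:
    """Hypothetical model of a code object carrying the two fields."""
    co_code: object    # first field: the Python-level object (bytes today)
    raw: memoryview    # second field: direct view of the raw bytecode

def iter_bytecode(code):
    """Walk the bytecode through the second field only, two bytes per step."""
    raw = code.raw
    for offset in range(0, len(raw), 2):
        yield raw[offset], raw[offset + 1]

def fn(obj):
    return obj

# Ordinary case: the second field points into the bytes object itself.
ordinary = fn.__code__.co_code
in_object = CodeModel(co_code=ordinary, raw=memoryview(ordinary))

# Read-only case: the second field points at storage held elsewhere.
external = memoryview(bytearray(ordinary)).toreadonly()
elsewhere = CodeModel(co_code=None, raw=external)

# The walk never consults the first field, so both cases look the same.
pairs_in_object = list(iter_bytecode(in_object))
pairs_elsewhere = list(iter_bytecode(elsewhere))
```

In the second case co_code is left as None only because the object that belongs in that slot has not been introduced yet.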

This leaves the problem of the first field. At present, it can only be a bytes object. When the raw bytecode is in read-only memory, we need a second sort of object. Its purpose is to 'do the right thing'.

Let's call this sort of object perma_bytes. It's like a bytes object, except the data is stored elsewhere, in read-only permanent storage.

Aside: If the co_code attribute of a code object is ordinary bytes (not perma_bytes), then the two pointer addresses differ by a constant, namely the size of the header of a Python bytes object.
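The constant offset in this aside can be checked from Python. This is a sketch only. It assumes CPython, where a bytes object ends with a one-byte array that holds the data, so that the header size is bytes.__basicsize__ - 1. HEADER_SIZE and raw_data_address are names I made up.

```python
import ctypes

# Assumption: CPython. The data of a bytes object starts at a fixed offset
# from the object's address, namely the basic size minus the one-byte array
# that is declared at the end of the C struct.
HEADER_SIZE = bytes.__basicsize__ - 1

def raw_data_address(b):
    # id(b) is the object's address in CPython.
    return id(b) + HEADER_SIZE

# The same offset works whatever the content or length of the bytes object.
for sample in (b"t\x00|\x00\x83\x01S\x00", b"a much longer bytes object" * 10):
    copied = ctypes.string_at(raw_data_address(sample), len(sample))
    assert copied == sample
```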

Any Python language operation on perma_bytes is done by performing the same Python operation on bytes, but on the raw data that is pointed to. (That raw data had better still be there, otherwise chaos or worse will result.)
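A pure Python sketch of the delegation might look like this. PermaBytes is a hypothetical stand-in for perma_bytes, and a read-only memoryview stands in for the permanent storage. A real version would be written in C and would avoid the copy that tobytes() makes here.

```python
class PermaBytes:
    """Hypothetical bytes-like object whose data is stored elsewhere."""

    def __init__(self, buffer):
        view = memoryview(buffer)
        # Never allow writes through this object, even if the buffer permits them.
        self._view = view if view.readonly else view.toreadonly()

    def _as_bytes(self):
        # Perform the operation on ordinary bytes built from the raw data.
        return self._view.tobytes()

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        return self._as_bytes()[index]

    def __eq__(self, other):
        if isinstance(other, PermaBytes):
            other = other._as_bytes()
        return self._as_bytes() == other

    def __hash__(self):
        return hash(self._as_bytes())

    def __repr__(self):
        return f"PermaBytes({self._as_bytes()!r})"


storage = b"t\x00|\x00\x83\x01S\x00"   # pretend this is in permanent storage
pb = PermaBytes(storage)
print(len(pb), pb[:2], pb == storage)
```

Only a handful of operations are shown. The rest of the bytes interface would follow the same pattern.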

So what have we gained, and what have we lost?

LOST:
1. The fn.__code__ object is bigger by the size of a pointer.
2. Added perma_bytes object.
3. Bytes operations on fn.__code__.co_code objects are slower.

GAINED:
1. Can store co_code data in read-only permanent storage.
2. perma_bytes might be useful elsewhere.

It may be possible to improve the outcome, by making more changes to the interpreter. I don't see a way of getting a useful outcome, by making fewer.

Here's another way of looking at things. If all the refcounts were stored in a single array, and the data stored elsewhere, the changing refcount wouldn't be a problem. Using perma_bytes allows the refcount and the data to be stored at different locations, thereby avoiding the refcount problem!
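As a toy illustration of this way of looking at things, here is a side table that keeps the counts in one writable array and the data in read-only views. SideTable and its methods are hypothetical names. This is a thought experiment, not a claim about how any interpreter manages memory.

```python
from array import array

class SideTable:
    """Hypothetical layout: counts in one array, data stored elsewhere."""

    def __init__(self, payloads):
        # The data is only ever reachable through read-only views.
        self.payloads = tuple(memoryview(p).toreadonly() for p in payloads)
        # All the counts live together in one writable array.
        self.refcounts = array("q", [0] * len(self.payloads))

    def incref(self, index):
        self.refcounts[index] += 1

    def decref(self, index):
        self.refcounts[index] -= 1


table = SideTable([b"t\x00|\x00\x83\x01S\x00"])
table.incref(0)
table.incref(0)
table.decref(0)
# The count changed, and no write ever touched the payload.
print(table.refcounts[0], table.payloads[0].tobytes())
```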

I hope this is clear enough, and that it helps. And that it is correct.

I'll let T. S. Eliot have the last word:

    https://faculty.washington.edu/smcohen/453/NamingCats.html
    The Naming of Cats is a difficult matter,
    It isn’t just one of your holiday games;
    You may think at first I’m as mad as a hatter
    When I tell you, a cat must have THREE DIFFERENT NAMES.

We're giving the raw data TWO DIFFERENT POINTERS.

with best wishes

Jonathan