Permanent code objects (less memory, quicker load, less Unix Copy On Write)
On 18 Jun 2020, at 10:36, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi All
Summary: Shared objects in Unix are a major influence. This proposal can be seen as a first step towards packaging pure Python modules as Unix shared objects.
First, there's a high level overview. Then some technical stuff in Appendices.
An object is transient if it can be garbage collected. An object is permanent if it will never be garbage collected. Every interpreted Python function has a code object (that contains instructions for the interpreter). Many of these code objects persist to the end of the program, and are used for little else than providing interpreter instructions.
We show that extending Python, to provide and take advantage of permanent code objects, will bring some benefits. The cost is expected to be quite small.
When a Python function is called, the interpreter increases the refcount of its code object. At the end of the function's execution, the interpreter decreases the refcount. (An example below shows this.)
If Python were extended to take advantage of permanent code objects, then for example popular code objects could be loaded into memory in this way. This can reduce memory usage (by sharing immutable resources) and reduce startup time.
In addition, a Unix forked process would have less need to do copy-on-write (see below). This is related to packaging pure Python modules as Unix shared objects.
The core of implementing this change would be to provide if ... else ... branching around the interpreter source code that changes the refcount of a code object. The interpreter itself will of course want direct access to the permanent code object. There is no harm in that.
The cost is that unprivileged access to fn.__code__ will be slower, due to an additional indirection. However, as such accesses are rare in ordinary programs, the cost is expected to be small.
It might be helpful, after checking the analysis and before coding, to do some simple timing tests and calculations to estimate the performance benefits and costs of making such a change. These would of course depend on the use case.
To make the code avoid COW you would need to be able to make sure that all code memory blocks are not mixed in with PyObject memory blocks. Then the ref count dance will not trigger COW for the code. Barry
I hope this helps.
Jonathan
APPENDICES ===========
SOME IMPLEMENTATION DETAILS AND COMMENTS Because fn.__code__ must not return a permanent object, some sort of opaque proxy would be required. Because Python programs rarely inspect fn.__code__, in practice the cost of this additional indirection is likely to be small.
As things are, the time spent changing the refcount of fn.__code__ is probably insignificant. The benefit is that permanent code objects are made immutable, and so can be stored safely in read-only memory (that can be shared across all processes and users). Code objects are special, in that they are only rarely looked at directly. Their main purpose is to be used by the interpreter.
Python allows the user to replace fn.__code__ by a different code object. This is a rarely done dirty trick. The transient / permanent nature of fn.__code__ could be stored as a hidden field on the fn object. This would reduce the cost of the if ... else ... branching, as it amounts to caching the transient / permanent nature of fn.__code__.
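For readers who have not seen the trick, here is a minimal sketch (it works here because both functions take no arguments and have no free variables):

    def f():
        return "old"

    def g():
        return "new"

    f.__code__ = g.__code__   # the 'dirty trick': swap in different code
    assert f() == "new"

A permanence scheme would presumably need to leave this behaviour intact.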
FORK AND COPY ON WRITE On Unix, the fork system call causes a process to make a child of itself. The parent and child share memory. To avoid confusion and errors, when either asks the system to write to shared memory, the system ensures that both parent and child have their own copy (of the page of memory that is being written to). This is an expensive operation. See: https://en.wikipedia.org/wiki/Copy-on-write
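To illustrate (a minimal, Unix-only sketch): merely calling a function in a forked child writes to the refcount field of its code object, dirtying the page that parent and child had been sharing.

    import os
    import sys

    def work():
        return 42

    # After fork, parent and child share the pages holding work.__code__.
    pid = os.fork()
    if pid == 0:
        # Child: calling work() bumps the code object's refcount -- a
        # write, so the kernel must give the child its own copy of the page.
        print("child refcount:", sys.getrefcount(work.__code__))
        work()
        os._exit(0)
    os.waitpid(pid, 0)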
INTERPRETER SESSION
>>> from sys import getrefcount as grc
# Identical functions with different code objects.
>>> def f1(obj): return grc(obj)
>>> def f2(obj): return grc(obj)
>>> f1.__code__ is f2.__code__
False

# Initial values.
>>> grc(f1.__code__), grc(f2.__code__)
(2, 2)

# Calling f1 increases the refcount of f1.__code__.
>>> f1(f1), f1(f2), f2(f1), f2(f2)
(6, 4, 4, 6)

# If fn is a generator function, then x = fn() will increase the
# refcount of fn.__code__.
>>> def f1(): yield True
>>> grc(f1.__code__)
2

# Let's create and store 10 generators.
>>> iterables = [f1() for i in range(10)]
>>> grc(f1.__code__)
22

# Let's get one item from each.
>>> [next(i) for i in iterables]
[True, True, True, True, True, True, True, True, True, True]
>>> grc(f1.__code__)
22

# Let's exhaust all the iterables. This reduces the refcount.
>>> [next(i, False) for i in iterables]
[False, False, False, False, False, False, False, False, False, False]
>>> grc(f1.__code__)
12

# Nearly done. Now let go of the iterables.
>>> del iterables
>>> grc(f1.__code__)
2
On Thu, Jun 18, 2020 at 9:34 AM Barry Scott <barry@barrys-emacs.org> wrote:
To make the code avoid COW you would need to be able to make sure that all code memory blocks are not mixed in with PyObject memory blocks.
Then the ref count dance will not trigger COW for the code.
Indeed. CPython already has its own memory allocator, yes? How hard would it be to allocate all "immortal" objects in one region of memory, and regular objects in another? Presumably that would only be a cost at allocation time, and probably not a large one. So the trick is to determine which objects are immortal.

And Jonathan has a good point: though Python is perfectly capable of creating and destroying code objects (functions, classes, etc.), in practice most survive for the length of the program, so in most cases little memory would be wasted by making them immortal. And maybe the interpreter could be smart about guessing which are most likely to be mortal. If this really does make a difference, we could also add ways for the programmer to mark certain code objects as mortal or immortal as need be.

Finally -- and I'm way out of my depth here -- does this mean there is potential to significantly improve the performance of multiprocessing? Which would be really, really great, as the GIL has proven an intractable barrier to certain kinds of multi-threading.

-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 18 Jun 2020, at 18:37, Christopher Barker <pythonchb@gmail.com> wrote:
Indeed. CPython already has its own memory allocator, yes? How hard would it be to allocate all "immortal" objects in one region of memory, and regular objects in another? Presumably that would only be a cost at allocation time, and probably not a large one. So the trick is to determine which objects are immortal.
The key part of the idea is that the memory holding the ref count is not adjacent to the memory holding the object's state. Further, rarely modified state should be kept away from usually modified state: the PyObject header in the ref-count heap, code objects in the rarely-modified heap, list and dict contents in the usually-modified heap.

I'm assuming that it's no longer true that a PyObject header is prepended to the memory of an object's state. Something like this has been described, I think on the C-API mailing list, in some form.

Barry
On Thu, Jun 18, 2020 at 06:49:13PM +0100, Barry Scott wrote:
The key part of the idea is that the memory holding the ref count is not adjacent to the memory holding the object's state. Further, rarely modified state should be kept away from usually modified state.
Isn't that going to play havoc with modern CPU pipelines and pre-fetching? Every time a function is called, there will be cache misses galore and performance will plummet?

I know very little about how this works, except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.

https://stackoverflow.com/questions/16699247/what-is-a-cache-friendly-code

When I'm programming at the Python level I don't need to worry about any of this (and in fact I can't do anything about it), but I expect that for the C implementation it will be pretty critical.

-- Steven
On 19/06/20 9:28 am, Steven D'Aprano wrote:
I know very little about how this works except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.
Python objects are already scattered all over memory, and a function already consists of several objects -- the function object itself, a dict, a code object, lists of argument and local variable names, etc. I doubt whether the suggested change would make locality noticeably worse. -- Greg
Hi Greg

You wrote, replying to Steven's remark that "in the 21st century memory locality is king":

Python objects are already scattered all over memory, and a function already consists of several objects -- the function object itself, a dict, a code object, lists of argument and local variable names, etc. I doubt whether the suggested change would make locality noticeably worse.
I like this response. I'd also add a link to https://en.wikipedia.org/wiki/Non-uniform_memory_access

NUMA arises because quick memory is expensive, with CPU registers the quickest of all, revolving hard disks (and then tape) at the slow end, and RAM and SSD somewhere in the middle. In addition, modern CPUs are multicore, each core with its own registers and cache. Optimising such systems is difficult, particularly as what's best depends on the data being processed.

One part of testing would be to provide guidance as to the situations in which permanent code objects would bring some benefit. Even with UMA (no caches etc.), there could be the benefit of reduced use of memory, as we would no longer keep two almost identical copies of the same object. An aside: precisely with UMA there is no benefit to locality.

I'd say getting as much of the busy code as you can into the CPU caches will bring benefits. Unix introduces shared objects, and on Linux C-coded extensions are available as shared objects.

>>> import lxml.etree as etree
>>> etree
<module 'lxml.etree' from '[...] etree.cpython-36m-x86_64-linux-gnu.so'>

Summary: The proposal is that the Python interpreter be extended, so that it can access pure Python code objects that are part of a Unix shared object.

I hope this helps. Thank you, Greg and Steve, for your interest and contributions. -- Jonathan
On Fri, Jun 19, 2020 at 06:33:59PM +1200, Greg Ewing wrote:
On 19/06/20 9:28 am, Steven D'Aprano wrote:
I know very little about how this works except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.
Python objects are already scattered all over memory, and a function already consists of several objects -- the function object itself, a dict, a code object, lists of argument and local variable names, etc. I doubt whether the suggested change would make locality noticeably worse.
There's a difference between "I doubt" and "it won't". Unless you're an expert on C-level optimizations, like Victor or Serhiy, which I definitely am not, I think we're both just guessing.

Here is some evidence that cache misses make a real difference for performance. A 70% slowdown on calling functions, due to an increase in L1 cache misses: https://bugs.python.org/issue28618

There's also been a lot of work done on using immortal objects. The results have been mixed at best: https://bugs.python.org/issue40255

Jonathan's initial post claimed to have shown that this technique will be of benefit: "We show that extending Python, to provide and take advantage of permanent code objects, will bring some benefits." But there is a huge gulf between faster in theory and faster in practice, and we should temper our enthusiasm and not claim certainty in the face of a vast gulf of uncertainty. I know that Python-Ideas is not held to the same standards as scientific papers, but if you claim to have shown a benefit, you really ought to have *actually* shown a benefit, not just identified a promising area for future study.

Making objects immortal is not free, it risks memory leaks, and the evidence (as far as I can tell) is that it helps only a small subset of Python users (those that fork() lots of worker processes) at the expense of the majority of users.

Personally, based on my *extremely limited* (i.e. infinitesimal) knowledge of C-level optimizations on 21st-century CPUs, I don't think this is a promising area to explore, except maybe as an option. (A runtime switch, perhaps, or a build option?) If Jonathan, or anyone else, thinks differently and is willing to run some before-and-after benchmarks, I look forward to being proven wrong :-)

-- Steven
On 20/06/20 1:15 pm, Steven D'Aprano wrote:
Here is some evidence that cache misses makes a real difference for performance. A 70% slow down on calling functions, due to an increase in L1 cache misses:
There's no doubt that cache misses are a big issue for machine instructions. But the same reasoning doesn't automatically extend to bytecode instructions, or other data in Python objects. The interpreter executes tens to hundreds of machine instructions for each bytecode instruction fetched.
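For example (a small sketch; the exact bytecode varies by Python version), a single source line already expands to several bytecode instructions, each dispatched through the interpreter loop:

    import dis

    # One line of source becomes a handful of bytecode instructions; the
    # interpreter spends many machine instructions executing each one.
    dis.dis(compile("total += data[i]", "<example>", "exec"))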
but there is a huge gulf between faster in theory and faster in practice
Yes, and also between slower in theory and slower in practice. There's no substitute for measurement when it comes to things like this. -- Greg
On 18 Jun 2020, at 22:28, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Jun 18, 2020 at 06:49:13PM +0100, Barry Scott wrote:
The key part of the idea is that the memory holding the ref count is not adjacent to the memory holding the object's state. Further, rarely modified state should be kept away from usually modified state.
Isn't that going to play havoc with modern CPU pipelines and pre-fetching?
Not for instructions no.
Every time a function is called, there will be cache-misses galore and performance plummets?
Is there? What's to say that all the memory that is involved is not in cache lines in the cache? I cannot guess what the performance will be. I'd need to run performance tests to see what happens.
I know very little about how this works except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.
Remember that the caches are large these days and can hold many cache lines. So what "local" really means is: in the cache.

If this idea works, then I think the winners will be apps that fork lots of child processes that will not modify memory set up by the parent. In other words, while this is an interesting thought experiment, I think it's not going to help single-process Python programs. The patch set that Guido highlighted is from just such an app.
https://stackoverflow.com/questions/16699247/what-is-a-cache-friendly-code
When I'm programming at the Python level I don't need to worry about any of this (and in fact I can't do anything about it), but I expect that for the C implementation, that will be pretty critical.
Indeed. Barry
On Fri, 19 Jun 2020 20:30:03 +0100 Barry Scott <barry@barrys-emacs.org> wrote:
I know very little about how this works except a vague rule of thumb that in the 21st century memory locality is king. If you want code to be fast, keep it close together, not spread out.
Remember that the caches are large these days and can hold many cache lines. So what "local" really means is: in the cache.
There's no such thing as "the cache". There are usually several levels of cache.

L1 cache is closest to the CPU and is very fast (latencies of access 3-4 cycles, and very high bandwidths). L2 cache is a bit further, still relatively fast, and often private per-core (typical latencies of access around 10-20 cycles). L3 cache is actually often relatively slow and shared between several or all cores (latencies vary quite a bit between CPU models, and can even be variable depending on exact data placement, but typical numbers are around 30-50 cycles). (Three levels of cache is the most common arrangement these days, but you may find CPUs without an L3, or even with an L3 and an L4.)

So if you hit in your L3 cache but miss in your L2 cache, you're already taking a large toll, during which your CPU may be stalled, waiting for data. Of course, accessing main memory is much slower yet (with latencies in the several hundreds of cycles, depending on memory speed and CPU frequency).

Now, the typical L2 cache size is around 512 kiB, and the footprint of the average Python application is certainly much larger than that. This is mitigated by the fact that modern CPUs have sophisticated prefetching techniques that try to *predict* the memory ranges that will be accessed by future instructions, so as to read them from memory into (L1 or L2) cache in advance. However, this only works if those addresses are predictable at all. For example, accesses into a hash table (such as a dict) are by construction rarely predictable.

Regards Antoine.
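A rough, machine-dependent sketch of that last point (sizes and timings are illustrative assumptions, not measurements): once the working set outgrows the caches, an unpredictable access order pays latencies that a sequential scan does not.

    import random
    import time

    N = 2_000_000                 # big enough to outgrow typical caches
    data = list(range(N))
    seq = list(range(N))          # predictable, prefetch-friendly order
    rnd = list(range(N))
    random.shuffle(rnd)           # unpredictable, hash-table-like order

    for name, order in (("sequential", seq), ("random", rnd)):
        t0 = time.perf_counter()
        total = 0
        for i in order:
            total += data[i]
        print(name, round(time.perf_counter() - t0, 3), "seconds")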
On Sun, 21 Jun 2020 11:07:05 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
There's no such thing as "the cache". There are usually several levels of cache. L1 cache is closest to the CPU [...]
... Note by "closest to the CPU" I really mean "closest to the CPU core's execution units". Those caches are on-chip, otherwise communication would be much too slow. Regards Antoine.
Hi Barry

Thank you for your interest in my proposal. Let me try to answer your question. You wrote:

To make the code avoid COW you would need to be able to make sure that all code memory blocks are not mixed in with PyObject memory blocks. Then the ref count dance will not trigger COW for the code.
The relevant parts of my proposal are (emphasis added):

When a Python function is called, the interpreter increases the refcount of its code object. At the end of the function's execution, the interpreter decreases the refcount. (An example below shows this.)

**If Python were extended to take advantage of permanent code objects**, then for example popular code objects could be loaded into memory in this way. This can reduce memory usage (by sharing immutable resources) and reduce startup time.
I wanted this to mean that if the interpreter needs access to a permanent code object, it simply accesses it, without changing any reference counts. For this reason, permanent code objects don't need a refcount field. And without that field, there is no refcount dance.

The interpreter has privileged access to permanent code objects (just as it has to its own internals, unless deliberately exposed). However, the code that is being interpreted has no such access. To enable something like

>>> fn.__code__.co_code
b'whatever-the-bytecodes-are'

Python must have fn.__code__ return something that DOES have a refcount field. It could be some sort of proxy object (a sketch follows below). Or it could be a 'transient copy' of the permanent code object. I tried to explain this in the Appendix, perhaps not well enough.

I hope this helps. -- Jonathan
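As a pure-Python illustration only (the class is hypothetical, not a proposed API), the proxy idea might look like this: a small refcounted wrapper that forwards attribute reads to the permanent, uncounted code object.

    class CodeProxy:
        # Hypothetical sketch: a refcounted stand-in for a permanent
        # code object. In C, _permanent would be a raw pointer into
        # read-only, shareable memory; here an ordinary code object
        # stands in for it.

        __slots__ = ("_permanent",)

        def __init__(self, permanent_code):
            self._permanent = permanent_code

        def __getattr__(self, name):
            # co_code, co_consts, etc. are reached through one extra
            # indirection -- the cost the proposal mentions.
            return getattr(self._permanent, name)

    def fn():
        return 42

    proxy = CodeProxy(fn.__code__)
    assert proxy.co_name == "fn"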
On 18 Jun 2020, at 18:42, Jonathan Fine <jfine2358@gmail.com> wrote:
Did my last reply cover a possible implementation of this? I.e., the code is nowhere near the ref count that triggers COW. Barry
Hi Barry

You wrote:

Did my last reply cover a possible implementation of this? I.e., the code is nowhere near the ref count that triggers COW.

Could you say: do you think it's possible to extend Python so that it can use permanent code objects, when they are made available? For the moment, that is the main question. Is the basic scheme possible? (For me, I use Unix shared objects as an existence proof.) If so, then I suspect that we are focused on different parts of the implementation details.

Oh, and thank you, Christopher, for your encouraging reply. -- Jonathan
On 18 Jun 2020, at 19:00, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi Barry
You wrote:

Did my last reply cover a possible implementation of this? I.e., the code is nowhere near the ref count that triggers COW.
Could you say: do you think it's possible to extend Python so that it can use permanent code objects, when they are made available?
We need to define terms here. What do you mean by permanent? If you mean that, after forking, the child does not trigger COW on code that it does not modify, then yes, I think that can be implemented. Also, the child can modify the code if that is part of its algorithms, without problem.
For the moment, that is the main question. Is the basic scheme possible? (For me, I use Unix shared objects as an existence proof.)
Shared objects are not the same as forking the address space and avoiding COW. In the case of .so (and .dll) files, the memory used for the code and read-only data in an .so is only loaded into memory once, and the operating system makes the page tables point to the one copy. Further paging of that data does not use the Linux swap file (aka page file); clean pages can simply be re-read from the .so. A .pyc file cannot be used in the same way as an .so can, by simply mapping the file into memory and paging in the code as you touch it. Being able to create Python shared object files is another problem.
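A small sketch of the contrast (the source file name is hypothetical; the 16-byte .pyc header applies to CPython 3.7 and later): loading a .pyc means unmarshalling, which builds fresh heap objects in every process, whereas an .so's pages are simply mapped and shared.

    import marshal
    import py_compile

    # Compile a source file (assumed to exist) to a .pyc, then load it
    # the way the import system does: by unmarshalling.
    pyc_path = py_compile.compile("example.py")   # hypothetical file
    with open(pyc_path, "rb") as f:
        f.read(16)               # skip the header (magic, flags, date, size)
        code = marshal.load(f)   # a brand-new code object in this process
    exec(code)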
If so, then I suspect that we are focused on different parts of the implementation details.
I think that's right. I think I am showing how what you want, if I have not misunderstood, could be implemented.
Oh, and thank you Christopher for your encouraging reply.
Indeed! Barry
Hi Barry

You wrote:

We need to define terms here. What do you mean by permanent?

Good question. I think I answered it in my original post:

An object is transient if it can be garbage collected. An object is permanent if it will never be garbage collected.
You also wrote:
Being able to create python share object files is another problem.
At the level I'm working at, that's what might unkindly be called an implementation detail. It is in fact a very important matter. But a simple-minded implementation would be enough to produce a proof-of-concept. You wrote:

I think I am showing how what you want, if I have not misunderstood, could be implemented.
Oh, that's very good. In my original post my focus was on explaining as clearly as I could the basic concepts. For that reason (and others) I said little about implementation. If you like, I'd be very happy to discuss implementation with you and others, either here or on the Python C-API list. However, I'd like to put that off for a few days. And this would give others on this list an opportunity to comment. -- Jonathan
On 18 Jun 2020, at 19:30, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi Barry
You wrote:
We need to define terms here. What do you mean by permanent?
Good question. I think I answered it in my original post:
An object is transient if it can be garbage collected. An object is permanent if it will never be garbage collected.
Aha. OK, that's what I think of as rarely modified, and only shared by forking.
You also wrote:
Being able to create python share object files is another problem.
At the level I'm working at, that's what might unkindly be called an implementation detail. It is in fact a very important matter. But a simple-minded implementation would be enough to produce a proof-of-concept.
It's a huge design problem. There are no simple PoCs that come to my mind.
I think I am showing how what you want, if I have not misunderstood, could be implemented.
Oh, that's very good. In my original post my focus was on explaining as clearly as I could the basic concepts. For that reason (and others) I said little about implementation. If you like, I'd be very happy to discuss implementation with you and others, either here or on the Python C-API list.
The reason that this is a problem in the first place is an implementation detail... Barry
Hello,

I think you forgot the all-important parts:

1) How does it work technically?
2) What performance gain on which benchmark?

Regards Antoine.
Hi Antoine

Thank you for your interest. You wrote:

I think you forgot the all-important parts: 1) How does it work technically? 2) What performance gain on which benchmark?

In my original post I wrote:

It might be helpful, after checking the analysis and before coding, to do some simple timing tests and calculations to estimate the performance benefits and costs of making such a change. These would of course depend on the use case.

I hope that adequately answers your second question. The technical appendix I provided made a start on answering your first question. My first goal was to produce a satisfactory description of the problem, and in general terms of how it might be solved. I'll be available to discuss how it might work technically next week.

I hope this helps. -- Jonathan
On Thu, Jun 18, 2020 at 5:36 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Python allows the user to replace fn.__code__ by a different code object. This is a rarely done dirty trick.
A dirty trick to you maybe, but occasionally useful. For example, it can be used to implement goto: https://github.com/snoack/python-goto (I have used this once or twice.)

With that said, your proposal is unclear to me on whether this would force immutability on all code objects (and thereby prevent all bytecode modification), or whether it would have an opt-out (or opt-in) mechanism. I'm a solid -1 if this forces immutability on all code objects with no way to opt out for those cases where you do wish to modify the bytecode. Otherwise, I have no opinion (as I lack knowledge of the concepts your proposal is based on).
On Thu, Jun 18, 2020 at 09:30:30PM -0400, Jonathan Goble wrote:
With that said, your proposal is unclear to me on whether this would force immutability on all code objects (and thereby prevent all bytecode modification), or whether it would have an opt-out (or opt-in) mechanism.
Code objects are already immutable.

py> def f(): pass
...
py> o = f.__code__
py> o.co_consts = (1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

-- Steven
One thought that I had is that this whole proposal seems to be based on code blocks never needing to be collected? Given the program:

    def fun1(v):
        return v

    def fun2(v):
        return v + 1

    fun1 = fun2

the function code block that was originally bound to the name fun1 should now be collected, as it is no longer referenced. And if this code is all in module mod1, there could be, way off somewhere else in code that imports this module, a statement like

    mod1.fun1 = fun2

the classic "monkey patch". So the compiler can't know if the code object really will be eternal; at best it knows that the code object can only be generated once by how it was compiled, so if we allow it to be leaked we have a finite loss.
Hi Richard

Thank you for your interest. You wrote:

One thought that I had is that this whole proposal seems to be based on code blocks never needing to be collected?

That's not quite what I meant to say. One part of the basic idea is that permanent code objects be made available to the Python interpreter. The second part is that the interpreter does not keep reference counts to these permanent objects.

What I did not say explicitly, or not clearly enough, was that the previous use would continue unchanged. The only change would be that a function object would have a flag, which would tell the interpreter whether the associated code object was transient or permanent. (This, as I recall, I did mention in my original post.)

Thank you for providing an example. This has made your concern very clear. It is my intention that this example, and also 'change the code' monkey patching, would continue to work as before.

To summarise: permanent code objects are to be optional. Use them only if they help in some way.

I hope this helps. Once again, thank you for your interest and contribution. With best wishes, Jonathan
On Fri, Jun 19, 2020 at 09:36:24AM +0100, Jonathan Fine wrote:
What I did not say explicitly, or not clearly enough, was that the previous use would continue unchanged. The only change would be that a function object would have a flag, which would tell the interpreter whether the associated code object was transient or permanent. (This, as I recall, I did mention in my original post.)
Who sets the flag? If user code can set the flag, then users can maliciously or accidentally set the flag on code objects which should be garbage collected, causing a memory leak, or clear the flag on code objects which should not be garbage collected, potentially causing a segfault.

I think that if anyone is imagining a process where the interpreter does something like this:

    # this happens on every single reference to any object
    if type(obj) is a code object and flag is set:
        pass
    else:
        increment the reference count

(and similar for decrements), then the cost of checking the type and/or flag is going to be significant. According to issue 40255, a similar experiment led to a 10% slowdown. https://bugs.python.org/issue40255

There are more practical ways to implement immortal objects. I don't know what they are :-) but they must exist, because Python had them before (and maybe still does?) and people keep experimenting with them.

-- Steven
I remember vaguely that about two decades ago Greg Stein hatched an idea for code objects loaded from a read-only segment in shared libraries. I believe we went as far as ensuring that the interpreter could read bytecode from things other than strings, and I vaguely recall seeing a design for a layout of the code object as well. But the idea was never consummated, and I think there were unsolved problems regarding reference counts and constants incorporated in the code object. The objectives were the same as the subject line of this thread, and I believe so were the objections.

--Guido van Rossum (python.org/~guido)
Hi All

Guido wrote:

I remember vaguely that about two decades ago Greg Stein hatched an idea for code objects loaded from a read-only segment in shared libraries.

[Thank you for this, Guido. Your memory is good.]

Here's a thread from 2009, where Guido said: "Greg Stein reached this same conclusion (and similar numbers) over 10 years ago ..."

Subject: Remove GIL with CAS instructions?
https://mail.python.org/archives/list/python-ideas@python.org/thread/6ZONFLM...

I looked up https://en.wikipedia.org/wiki/Compare-and-swap to read about CAS. Guido said this in the context of Antoine's statement: "Which makes me agree with the commonly expressed opinion that CPython would probably need to ditch refcounting (at least in the critical paths) if we want to remove the GIL."

In 2007 Guido posted to Artima: It isn't Easy to Remove the GIL: https://www.artima.com/weblogs/viewpost.jsp?thread=214235

In this post Guido writes: "In 1999 Greg Stein (with Mark Hammond?) produced a fork of Python (1.5 I believe) that removed the GIL, replacing it with fine-grained locks on all mutable data structures. [...] However, after benchmarking, it was shown that even on the platform with the fastest locking primitive (Windows at the time) it slowed down single-threaded execution nearly two-fold."

Guido also referenced this write-up from Greg: https://mail.python.org/pipermail/python-dev/2001-August/017099.html

I hope this helps. -- Jonathan
Hm, I remember Greg's free threading too, but that's not the idea I was trying to recall this time. There really was something about bytecode objects being loaded from a read-only segment to speed up code loading. (Much quicker than unmarshalling a .pyc file.) I don't think we ever got the details worked out to the point where we could benchmark.
On a related note, there was a patch that I’d written for Python 3.6 to store code objects in the read only segment of the interpreter binary for faster interpreter startup. I’d sent the patch to Larry Hastings, who graciously ported it to Python 3.8 and posted it on bpo[1]. - Jeethu [1]: https://bugs.python.org/issue34690
On 21.06.2020 01:47, Guido van Rossum wrote:
Hm, I remember Greg's free threading too, but that's not the idea I was trying to recall this time. There really was something about bytecode objects being loaded from a read-only segment to speed up code loading. (Much quicker than unmarshalling a .pyc file.) I don't think we ever got the details worked out to the point where we could benchmark.
Perhaps you are thinking about reading byte code from C arrays, as is done when freezing Python modules. I'm using this logic in PyRun to freeze (more or less) the entire Python stdlib.

It does have some effect on startup time, and because the OS typically shares such static code segments between processes, it helps a bit when you run multiple processes. But since the C arrays only store byte code and not code objects, you still need to create code objects and store a copy of the byte code in memory for every single process. This could probably be optimized by having the code object point to the C array to hold the byte code data, but I haven't looked into that.

I mainly use PyRun for packaging Python-based products for distribution on Linux, and as a replacement for virtualenv which doesn't rely on system Python installations, not so much to speed up anything.
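As a sketch of that per-process cost (pure Python; the bytes constant stands in for the static C array that freeze generates):

    import marshal

    # The marshalled form could live in shared, read-only memory...
    FROZEN = marshal.dumps(compile("x = 1", "<frozen example>", "exec"))

    def load_frozen():
        # ...but each call (think: each process) still builds a fresh,
        # heap-allocated code object from it.
        return marshal.loads(FROZEN)

    code1 = load_frozen()
    code2 = load_frozen()
    assert code1 is not code2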
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jun 21 2020)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
On Sun, Jun 21, 2020 at 02:53 M.-A. Lemburg <mal@egenix.com> wrote:
Perhaps you are thinking about reading byte code from C arrays, as is done when freezing Python modules.
[...]
This could probably be optimized by having the code object point to the C array to hold the byte code data, but I haven't looked into that.
I believe this was what Greg Stein's idea here was about. (As well as Jonathan Fine's in this thread?) But the current use of code objects makes this hard. Perhaps the code objects could have a memoryview object to hold the bytecode instead of a bytes object. --Guido -- --Guido (mobile)
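A rough sketch of what the memoryview direction could look like, assuming a hypothetical file bytecode.bin that holds raw byte code, with made-up offsets:

import mmap

# Map the file read-only; the OS shares these pages between every
# process that maps the same file, and a fork never copies them,
# since nobody writes to them.
with open("bytecode.bin", "rb") as f:
    rodata = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

view = memoryview(rodata)    # zero-copy view over the mapping
co_code = view[16:80]        # slicing is zero-copy too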
On Mon, Jun 22, 2020 at 12:00 AM Guido van Rossum <guido@python.org> wrote:
I believe this was what Greg Stein's idea here was about. (As well as Jonathan Fine's in this thread?) But the current use of code objects makes this hard. Perhaps the code objects could have a memoryview object to hold the bytecode instead of a bytes object.
memoryview is a heavy object. Using memoryview instead of a bytes object will increase memory usage. I think a lightweight bytes-like object is better. My rough idea is:

* New code and pyc format
  * pyc has a "rodata" segment
  * It can be copied into a single memory block, or can be mmapped.
  * co_code should be aligned to at least 2 bytes.
  * code.co_code can point to a memory block in "rodata".
  * docstring, signature, and lnotab can be lazily loaded from "rodata".
  * signature is serialized in a JSON-like format.
* Allow multiple modules in a single file
  * Reduces the number of open file descriptors when using mmap
  * Merge more constants
* New Python object: PyROData
  * It is like a read-only bytearray.
  * But the body may be mmap-ped, instead of malloc-ed.
  * code objects own a reference to PyROData.
  * When PyROData is deallocated, it munmaps or frees the "rodata" segment.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
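To illustrate the intent (not the real format), here's a minimal sketch of loading co_code from an mmapped "rodata" segment; the header layout and names are invented for the example:

import mmap
import struct

# Invented header: offset and length of one function's byte code
# within the "rodata" segment.
HEADER = struct.Struct("<II")

def load_co_code(path):
    with open(path, "rb") as f:
        rodata = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    offset, length = HEADER.unpack_from(rodata, 0)
    # A code object would hold this zero-copy slice in place of a
    # private bytes object; deallocating would munmap the segment.
    return memoryview(rodata)[offset:offset + length]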
On Mon, Jun 22, 2020 at 5:19 PM Inada Naoki <songofacandy@gmail.com> wrote:
I think a lightweight bytes-like object is better. My rough idea is:
[...]
With exec, it's possible to create a function that doesn't have any corresponding pyc or module. Would functions and code objects need to cope with both this style and the current model? ChrisA
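For example, nothing here has a module or .pyc behind it:

namespace = {}
exec("def f():\n    return 42\n", namespace)
f = namespace["f"]
print(f.__code__.co_filename)    # '<string>' -- no file to mmap
print(type(f.__code__.co_code))  # plain heap-allocated bytes

So code objects born from exec would presumably keep the current malloc-ed representation, alongside the mmapped one.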
On 22 Jun 2020, at 08:15, Inada Naoki <songofacandy@gmail.com> wrote:
[...]
* co_code should be aligned to at least 2 bytes.
Would higher alignment help? malloc uses 8- or 16-byte alignment, doesn't it? Would that be better for packing the byte code into cache lines?
Barry
On Mon, Jun 22, 2020 at 8:27 PM Barry Scott <barry@barrys-emacs.org> wrote:
Would higher alignment help? malloc uses 8- or 16-byte alignment, doesn't it? Would that be better for packing the byte code into cache lines?
It may, but I am not sure. I said "at least 2 bytes" because we use "word code": we read the word code through a `uint16_t *`, not an `unsigned char *`.
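The 2-byte units are easy to see from Python (this is how CPython 3.6+ word code is laid out; earlier versions used variable-length instructions):

import dis

def add_one(x):
    return x + 1

code = add_one.__code__.co_code
assert len(code) % 2 == 0  # word code: every instruction is 2 bytes
for opcode, oparg in zip(code[::2], code[1::2]):
    print(dis.opname[opcode], oparg)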
If you want to proceed in this direction, it would be better to do some more research into current CPU architectures and then build a VM-optimized byte code storage object which is well aligned, fits into today's caches, and improves locality. freeze.py could then write out this format as well, so that the object can directly point to the structure and have the OS deal with memory mapping and sharing byte code across processes.

I've had some good performance increases when I used the above approach in the mxTextTools tagging engine, which is a low-level VM for tagging and searching in text.

If compilers are made aware of the structure the VM will use, they may also be able to apply additional optimizations for faster byte code access, prefetching, etc.
I like where this is going. It would be nice if certain constants could also be loaded from RO memory.
-- --Guido (mobile)
Hi

SUMMARY: We're starting to discuss implementation. I'm going to focus on what can be done with only a few changes to the interpreter.

First consider this:

>>> from sys import getrefcount as grc
>>> def fn(obj): return grc(obj)
>>> grc(fn.__code__), grc(fn.__code__.co_code)
(2, 2)
>>> fn(fn.__code__), fn(fn.__code__.co_code)
(5, 4)
>>> grc(fn.__code__), grc(fn.__code__.co_code)
(2, 2)
# This is the bytecode.
>>> fn.__code__.co_code
b't\x00|\x00\x83\x01S\x00'

What's happening here? While the interpreter executes the pure Python function fn, it changes the refcount of both fn.__code__ and fn.__code__.co_code. This is one of the problems we have to solve to make progress. These refcounts are stored in the objects themselves, so unless the interpreter is changed, these Python objects can't be stored in read-only memory. We may have to change code and co_code objects also.

Let's focus on the bytecode, as it's the busiest and often largest part of the code object. The (ordinary) code object has a field which is (a pointer to) the co_code attribute, which is a Python bytes object. This is the bytecode, as a Python object.

Let's instead give the C implementation of fn.__code__ TWO fields. The first is a pointer, as usual, to the co_code attribute of the code object. The second is a pointer to the raw data of the co_code object. When the interpreter executes the code object, the second field tells the interpreter where to start executing. (This might be why the refcount of fn.__code__.co_code is incremented during the execution of fn.) The interpreter doesn't even have to look at the first field.

If we want the raw bytecode of a code object to lie in read-only memory, it is enough to set the second pointer to that location. In both cases, the interpreter reads the memory location of the raw bytecode and executes accordingly.

This leaves the problem of the first field. At present, it can only be a bytes object. When the raw bytecode is in read-only memory, we need a second sort of object. Its purpose is to 'do the right thing'. Let's call this sort of object perma_bytes. It's like a bytes object, except the data is stored elsewhere, in read-only permanent storage.

Aside: If the co_code attribute of a code object is ordinary bytes -- not perma_bytes -- then the two pointer addresses differ by a constant, namely the size of the header of a Python bytes object.

Any Python language operation on perma_bytes is done by performing the same Python operation on bytes, but on the raw data that is pointed to. (That raw data had better still be there, otherwise chaos or worse will result.)

So what have we gained, and what have we lost?

LOST:
1. The fn.__code__ object is bigger by the size of a pointer.
2. A perma_bytes object type is added.
3. Bytes operations on fn.__code__.co_code objects are slower.

GAINED:
1. We can store co_code data in read-only permanent storage.
2. perma_bytes might be useful elsewhere.

It may be possible to improve the outcome by making more changes to the interpreter. I don't see a way of getting a useful outcome by making fewer.

Here's another way of looking at things. If all the refcounts were stored in a single array, and the data stored elsewhere, the changing refcount wouldn't be a problem. Using perma_bytes allows the refcount and the data to be stored at different locations, thereby avoiding the refcount problem!

I hope this is clear enough, and that it helps. And that it is correct. I'll let T. S. Eliot have the last word: https://faculty.washington.edu/smcohen/453/NamingCats.html

The Naming of Cats is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a cat must have THREE DIFFERENT NAMES.

We're giving the raw data TWO DIFFERENT POINTERS.

with best wishes

Jonathan
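Here is a rough Python sketch of the perma_bytes idea above; the class name and methods are illustrative only, since the real object would be implemented in C, with the interpreter reading the raw-data pointer directly:

class PermaBytes:
    """Bytes-like view over externally owned, read-only data."""

    def __init__(self, view):
        # view: e.g. a slice of an mmapped read-only "rodata" segment.
        self._view = memoryview(view)

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        return self._view[index]

    def __bytes__(self):
        # Copies only when a real bytes object is explicitly requested.
        return bytes(self._view)

    def __eq__(self, other):
        return bytes(self) == other

Every Python-level operation is forwarded to the raw data the wrapper points at; only the small per-process wrapper is ever refcounted, so the data itself can live in shared read-only memory.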
participants (13)

- Antoine Pitrou
- Barry Scott
- Chris Angelico
- Christopher Barker
- Greg Ewing
- Guido van Rossum
- Inada Naoki
- Jeethu Rao
- Jonathan Fine
- Jonathan Goble
- M.-A. Lemburg
- Richard Damon
- Steven D'Aprano