[Python-ideas] Make Python code read-only

Tue May 20 18:57:53 CEST 2014

Hi,

I'm trying to find the best option to make CPython faster. I would
like to discuss here a first idea of making the Python code read-only
to allow new optimizations.

Make Python code read-only
==========================

I propose to add an option to Python to make the code read-only. In
this mode, module namespaces, class namespaces and function attributes
become read-only. It is still possible to add a "__readonly__ =
False" marker to keep a module, a class and/or a function modifiable.

I chose to make the code read-only by default instead of the opposite.
In my tests, almost all code can be made read-only without major
issues; little code requires the "__readonly__ = False" marker.

A module is only made read-only by importlib after the module is
loaded. The module is still modifiable while its code is executed,
until importlib has set all its attributes (e.g. __loader__).
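
To make the timing concrete, here is a rough pure-Python emulation. It
is only a sketch and not how the fork is implemented: the
FreezableModule class, the load() helper and the "__frozen__" flag are
hypothetical names, and only attribute access is guarded (direct
writes to the module dictionary are not blocked).

```python
import types


class FreezableModule(types.ModuleType):
    """Hypothetical module type that rejects attribute writes once frozen."""

    def _check_writable(self):
        ns = self.__dict__
        # "__frozen__" is an illustrative flag, not part of the proposal.
        if ns.get("__frozen__", False) and ns.get("__readonly__", True):
            raise ValueError("read-only dictionary")

    def __setattr__(self, name, value):
        self._check_writable()
        super().__setattr__(name, value)

    def __delattr__(self, name):
        self._check_writable()
        super().__delattr__(name)


def load(name, source):
    """Emulate the loader: run the body, set attributes, then freeze."""
    mod = FreezableModule(name)
    exec(source, mod.__dict__)         # the module body can still write
    mod.__loader__ = None              # stand-in for attributes set by importlib
    mod.__dict__["__frozen__"] = True  # from now on, writes are rejected
    return mod
```

A module loaded with load("demo", "x = 1") then rejects "demo.x = 2"
with ValueError, while a module whose source contains
"__readonly__ = False" stays modifiable.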

I have a proof of concept: a fork of Python 3.5 making code read-only
if the PYTHONREADONLY environment variable is set to 1. Commands to
try it:

    hg clone http://hg.python.org/sandbox/readonly
    cd readonly && ./configure && make
    PYTHONREADONLY=1 ./python -c 'import os; os.x = 1'
    # ValueError: read-only dictionary

Status of the standard library (Lib/*.py): 139 modules are read-only,
25 are modifiable. Except for the sys module, all modules written in C
are read-only.

I'm surprised that so little code relies on the ability to modify
everything. Most of the code can be read-only.

Optimizations possible when the code is read-only
=================================================

* Inline calls to functions.

* Replace calls to pure functions (without side effects) with their
result. For example, len("abc") can be replaced with 3.

* Constants can be replaced with their values (at least for simple
types like bytes, int and str).

It is for example possible to implement these optimizations by
manipulating the Abstract Syntax Tree (AST) during the compilation
from the source code to bytecode. See my astoptimizer project which
already implements similar optimizations:

    https://bitbucket.org/haypo/astoptimizer
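
As an illustration of the second optimization, here is a minimal
sketch of an AST pass that replaces len() calls on string or bytes
literals with their result. It is not taken from astoptimizer: the
FoldLen class and the optimize() helper are hypothetical names, and it
uses the ast.Constant node of recent Python versions. The rewrite is
only valid if "len" cannot be rebound, which is what read-only
namespaces would guarantee.

```python
import ast


class FoldLen(ast.NodeTransformer):
    """Replace len(<str or bytes literal>) with the resulting integer."""

    def visit_Call(self, node):
        self.generic_visit(node)  # fold nested calls first
        if (isinstance(node.func, ast.Name)
                and node.func.id == "len"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, (str, bytes))):
            folded = ast.Constant(value=len(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


def optimize(source):
    """Parse the source, apply the pass and return the rewritten tree."""
    tree = FoldLen().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree


tree = optimize('n = len("abc")')
namespace = {}
exec(compile(tree, "<demo>", "exec"), namespace)
```

After this runs, the assignment in the tree holds a constant node
instead of a call, and namespace["n"] is 3.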

More optimizations
==================

My main motivation to make code read-only is to specialize a function:
optimize a function for a specific environment (type of parameters,
external symbols like other functions, etc). Checking the type of
parameters can be fast (especially when implemented in C), but it
would be expensive to check that all global variables used in the
function were not modified since the function has been "specialized".
For example, if os.path.isabs(path) is called: you have to check that
the "os.path" and "os.path.isabs" attributes were not modified and
that the isabs() function itself was not modified. If we know that
globals are read-only, these checks are no longer needed, so it
becomes cheap to decide whether the specialized function can be used.
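
To show what these checks look like without read-only code, here is a
small sketch of a guarded specialization. All names (make_specialized,
is_abs_generic, is_abs_fast, the guards list) are hypothetical, and a
real implementation would also have to check that the function body
itself was not replaced. Every call pays for the guard loop; with
read-only globals the loop could be dropped.

```python
import os.path


def make_specialized(func, fast_func, guards):
    """Return a wrapper that uses fast_func only while all guards hold.

    guards is a list of (owner, attribute name, expected object) tuples.
    """
    def wrapper(*args, **kwargs):
        for owner, name, expected in guards:
            if getattr(owner, name, None) is not expected:
                # Something was modified: fall back to the generic code.
                return func(*args, **kwargs)
        return fast_func(*args, **kwargs)
    return wrapper


def is_abs_generic(path):
    # Looks up "os", "os.path" and "isabs" again on every call.
    return os.path.isabs(path)


_isabs = os.path.isabs  # looked up once, when the function is specialized


def is_abs_fast(path):
    return _isabs(path)


is_abs = make_specialized(
    is_abs_generic,
    is_abs_fast,
    guards=[(os, "path", os.path), (os.path, "isabs", os.path.isabs)],
)
```

If os.path.isabs is monkey-patched after specialization, the guard on
"isabs" fails and is_abs() falls back to is_abs_generic().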

It becomes possible to "learn" types (trace the execution of the
application, and then compile for the recorded types). Knowing the
type of function parameters, result and local variables opens an
interesting class of new optimizations, but I prefer to discuss this
later, after discussing the idea of making the code read-only.

One point remains unclear to me. There is a short time window between
the moment a module is loaded and the moment it is made read-only.
During this window, we cannot rely on the read-only property of the
code. Specialized code cannot be used safely before the module is
known to be read-only. I don't know yet how the switch from "slow"
code to optimized code should be implemented.

Issues with read-only code
==========================

* Currently, to keep my implementation simple, it is not possible to
make a module, class or function modifiable again. With a registry of
callbacks, it may be possible to re-enable modification and call
code to disable optimizations.

* PyPy implements this but, thanks to its JIT, it can optimize the
modified code again during execution. Writing a JIT is very complex;
I'm trying to find a compromise between the fast PyPy and the slow
CPython. Adding a JIT to CPython is out of my scope: it requires too
many modifications of the code.

* With read-only code, monkey-patching cannot be used anymore, which
makes running tests annoying. An obvious solution is to disable
read-only mode to run tests, which can be seen as unsafe since tests
are usually used to trust the code.

* The sys module cannot be made read-only because modifying sys.stdout
and sys.ps1 is a common use case.

* The warnings module tries to add a __warningregistry__ global
variable to the module where the warning was emitted, so that warnings
that should only be emitted once are not repeated. The problem is that
the module namespace is made read-only before this variable is added.
A workaround would be to maintain these dictionaries in the warnings
module directly, but it becomes harder to clear the dictionary when a
module is unloaded or reloaded. Another workaround is to add
__warningregistry__ before making a module read-only.

* Lazy initialization of module variables does not work anymore. A
workaround is to use a mutable type. It can be a dict used as a
namespace for module modifiable variables.

* The interactive interpreter sets a "_" variable in the builtins
namespace. I have no workaround for this. The "_" variable is no
longer created in read-only mode. Don't run the interactive
interpreter in read-only mode.

* It is not yet possible to make the namespace of packages read-only.
For example, "import encodings.utf_8" adds the symbol "utf_8" to the
encodings namespace. A workaround is to load all submodules before
making the namespace read-only. This cannot be done for some large
packages. For example, the encodings package has a lot of submodules,
of which only a few are needed.

Read the documentation for more information:

   http://hg.python.org/sandbox/readonly/file/tip/READONLY.txt

More optimizations
==================

See my notes for all ideas to optimize CPython:

   http://haypo-notes.readthedocs.org/faster_cpython.html

I explain there why I prefer to optimize CPython instead of working on
PyPy or another Python implementation like Pyston, Numba or similar
projects.

Victor