[Python-ideas] Make Python code read-only

Eric Snow ericsnowcurrently at gmail.com
Wed May 21 00:04:22 CEST 2014


An interesting idea.  Comments below.

On May 20, 2014 10:58 AM, "Victor Stinner" <victor.stinner at gmail.com> wrote:
> Make Python code read-only
> ==========================
>
> I propose to add an option to Python to make the code read-only. In
> this mode, module namespace, class namespace and function attributes
> become read-only. It is still possible to add a "__readonly__ =
> False" marker to keep a module, a class and/or a function modifiable.

Make __readonly__ a data descriptor (getset in the C-API) on
ModuleType, type, and FunctionType and people could toggle it as
needed.  The descriptor could look something like this (in pure
Python):

import os

class ReadonlyDescriptor:
    # Read once at class creation, i.e. ignore later changes to PYTHONREADONLY.
    # (os.environ keys are str, not bytes.)
    DEFAULT = os.environ.get('PYTHONREADONLY', False)
    def __init__(self, *, default=None):
        if default is None:
            default = self.DEFAULT
        self.default = default
    def __get__(self, obj, cls):
        if obj is None:
            return self
        try:
            return obj.__dict__['__readonly__']
        except KeyError:
            # First access: cache the default on the object itself.
            readonly = bool(self.default)
            obj.__dict__['__readonly__'] = readonly
            return readonly
    def __set__(self, obj, value):
        obj.__dict__['__readonly__'] = value

Alternately, the object structs for the 3 types (e.g. PyModuleObject)
could each grow a "readonly" field (or an extra flag option if there
is an appropriate flag).  The descriptor (in C) would use that instead
of obj.__dict__['__readonly__'].  However, I'd prefer going through
__dict__.

Either way, the 3 types would share a tp_setattro implementation that
checked the read-only flag.  That way there's no need to make sweeping
changes to the 3 types, nor to the dict type.

def __setattr__(self, name, value):
    if self.__readonly__:
        raise AttributeError('readonly')
    super().__setattr__(name, value)
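
Put together, the two pieces could be exercised roughly like this.  It
is only a sketch: ReadonlyModule is a made-up subclass standing in for
ModuleType growing the descriptor itself, the environment-variable
default is left out, and letting '__readonly__' itself through
__setattr__ is an assumption (without it the flag could not be switched
off again once set).

```python
import types

class ReadonlyDescriptor:
    """Data descriptor that keeps the flag in the object's __dict__."""
    def __init__(self, default=False):
        self.default = default
    def __get__(self, obj, cls):
        if obj is None:
            return self
        return obj.__dict__.setdefault('__readonly__', bool(self.default))
    def __set__(self, obj, value):
        obj.__dict__['__readonly__'] = bool(value)

class ReadonlyModule(types.ModuleType):
    # Hypothetical stand-in for ModuleType with the descriptor built in.
    __readonly__ = ReadonlyDescriptor()

    def __setattr__(self, name, value):
        # Assumption: the flag itself stays writable so it can be toggled.
        if name != '__readonly__' and self.__readonly__:
            raise AttributeError('readonly')
        super().__setattr__(name, value)

mod = ReadonlyModule('demo')
mod.x = 1                   # allowed: the flag defaults to False
mod.__readonly__ = True
try:
    mod.x = 2               # rejected while the flag is set
except AttributeError:
    blocked = True
else:
    blocked = False
mod.__readonly__ = False    # toggle back
mod.x = 3                   # allowed again
```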

FWIW, the idea of a flag for read-only could be applied to objects in
general, particularly in a future language addition.  "__readonly__"
is a good name for the flag so the precedent set by the three types in
this proposal would be a good one.

>
> I chose to make the code read-only by default instead of the opposite.
> In my test, almost all code can be made read-only without major issue;
> only a little code requires the "__readonly__ = False" marker.

Read-only by default would be backwards-incompatible, but having a
command-line flag (and/or env var) to enable it would be useful.

For classes a decorator could be nice, though it should wait until it
was more obviously worth doing.  I'm not sure it would matter for
functions, though the same decorator would probably work.
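
Such a decorator could be as small as the sketch below.  The names are
hypothetical: ReadonlyMeta stands in for `type` itself honouring the
flag (which is what the proposal would provide), and `readonly` is just
an illustrative name for the decorator.

```python
class ReadonlyMeta(type):
    # Hypothetical stand-in for `type` checking a __readonly__ flag.
    def __setattr__(cls, name, value):
        # The flag is looked up on the class itself, so it is not inherited.
        if name != '__readonly__' and cls.__dict__.get('__readonly__', False):
            raise AttributeError('readonly')
        super().__setattr__(name, value)

def readonly(cls):
    """Mark a class read-only once its body has finished executing."""
    cls.__readonly__ = True
    return cls

@readonly
class Config(metaclass=ReadonlyMeta):
    retries = 3

try:
    Config.retries = 5      # rejected: the class is flagged read-only
except AttributeError:
    blocked = True
else:
    blocked = False
```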

>
> A module is only made read-only by importlib after the module is
> loaded. The module is still modifiable when code is executed until
> importlib has set all its attributes (ex: __loader__).

With a data descriptor and __setattr__ like I described above, there
is no need to make any changes to importlib.

> Optimizations possible when the code is read-only
> =================================================
...
> More optimizations
> ==================

+1

> One point remains unclear to me. There is a short time window between
> the moment a module is loaded and the moment it is made read-only.
> During this window, we cannot rely on the read-only property of the
> code. Specialized code cannot be used safely before the module is
> known to be read-only.

How big a problem would this be in practice?

> Issues with read-only code
> ==========================
>
> * Currently, to keep my implementation simple, it's not possible to
> make a module, class or function modifiable again. With a registry of
> callbacks, it may be possible to re-enable modification and call
> code to disable optimizations.

With the data descriptor approach toggling read-only would work.
Enabling/disabling optimizations at that point would depend on how
they were implemented.

> * Lazy initialization of module variables does not work anymore. A
> workaround is to use a mutable type. It can be a dict used as a
> namespace for module modifiable variables.

What do you mean by "lazy initialization of module variables"?
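
If it is the pattern below, where a module-level name is rebound on
first use, then the second half shows the dict workaround as I
understand it.  This is a guess at the meaning, and all the names are
made up for illustration.

```python
_table = None               # lazily initialised module variable

def get_table():
    global _table           # rebinds a module-level name on first use
    if _table is None:
        _table = {n: n * n for n in range(4)}
    return _table

_state = {}                 # workaround: a mutable namespace that is never rebound

def get_table_via_state():
    # Only the dict's contents change; the module namespace is left alone.
    if 'table' not in _state:
        _state['table'] = {n: n * n for n in range(4)}
    return _state['table']
```

Note that a `global` assignment stores straight into the module's dict
rather than going through the module's __setattr__, so whether the
first form breaks depends on where read-only is enforced.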

> * It is not possible yet to make the namespace of packages read-only.
> For example, "import encodings.utf_8" adds the symbol "utf_8" to the
> encodings namespace. A workaround is to load all submodules before
> making the namespace read-only. This cannot be done for some large
> packages. For example, the encodings package has a lot of submodules,
> of which only a few are needed.

If read-only is only enforced via __setattr__ then the workaround is
to bind the submodule directly via pkg.__dict__.
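
A rough sketch of that workaround, assuming enforcement lives only in
__setattr__ (ReadonlyModule and the module names are hypothetical
stand-ins, not existing APIs):

```python
import types

class ReadonlyModule(types.ModuleType):
    # Hypothetical module type that enforces read-only in __setattr__ only.
    def __setattr__(self, name, value):
        if name != '__readonly__' and self.__dict__.get('__readonly__', False):
            raise AttributeError('readonly')
        super().__setattr__(name, value)

pkg = ReadonlyModule('pkg')
pkg.__readonly__ = True
sub = types.ModuleType('pkg.sub')

try:
    pkg.sub = sub           # ordinary attribute binding is rejected
except AttributeError:
    blocked = True
else:
    blocked = False

pkg.__dict__['sub'] = sub   # direct bind, does not go through __setattr__
```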

-eric

