[Python-3000] Draft PEP: Module Initialization and finalization

Tue Apr 11 20:05:59 CEST 2006

Abstract: Module initialization currently has a few deficiencies.
There is no cleanup for modules, the entry point name might give
naming conflicts, the entry functions don't follow the usual
calling convention, and multiple interpreters are not supported
well. This PEP addresses these issues.

Module Finalization
-------------------

Currently, C modules are usually initialized once and then "live"
forever. The only exception is when Py_Finalize() is called and the
interpreter is then initialized again: the module's initialization
routine is invoked a second time. This is bad from a resource
management point of view: memory and other resources might get
allocated each time initialization is called, but there is no way
to reclaim them. As a result, there is currently no way to
completely release all resources Python has allocated.

Entry point name conflicts
--------------------------

The entry point is currently called init<module>. This might
conflict with other symbols also called init<something>. In
particular, initsocket is known to have conflicted in the
past (this specific problem got resolved as a side effect of
renaming the module to _socket).

Entry point signature
---------------------

The entry point is currently a procedure (returning void).
This deviates from the usual calling conventions; callers
can find out whether there was an error during initialization
only by checking PyErr_Occurred. The entry point should
return a PyObject*, which will be the module created, or
NULL in case of an exception.
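
The difference for the caller can be sketched in plain C. This is
only an illustration: PyObject, error_pending, simulate_failure and
the module name "example" are stand-ins invented for the sketch and
are not the real CPython definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for this sketch; not the real CPython definitions. */
typedef struct PyObject { const char *name; } PyObject;

static int error_pending = 0;      /* stand-in for the error indicator */
static int simulate_failure = 0;   /* hypothetical switch to force an error */
static PyObject example_module = { "example" };

/* Current style: returns void, so failure shows up only in the indicator. */
static void initexample(void)
{
    if (simulate_failure)
        error_pending = 1;
}

/* Proposed style: the return value itself reports success or failure. */
static PyObject *PyInit_example(void)
{
    if (simulate_failure) {
        error_pending = 1;         /* an exception is still set */
        return NULL;
    }
    return &example_module;
}

/* Caller for the current style: has to consult the error indicator. */
int import_old_style(int fail)
{
    simulate_failure = fail;
    error_pending = 0;
    initexample();
    return error_pending ? -1 : 0;
}

/* Caller for the proposed style: checks the result like any other call. */
int import_new_style(int fail)
{
    PyObject *module;

    simulate_failure = fail;
    error_pending = 0;
    module = PyInit_example();
    if (module == NULL)
        return -1;
    assert(!error_pending);        /* a module was returned, so no error */
    return 0;
}
```

Both callers report the same outcome, but only the second one can
do so from the return value alone.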

Multiple Interpreters
---------------------

Currently, extension modules share their state across all
interpreters. This allows for undesirable information leakage
across interpreters: one script could permanently corrupt
objects in an extension module, possibly breaking all
scripts in other interpreters.

Specification
-------------

The module initialization routines change their signature
to

  PyObject *PyInit_<modulename>(PyInterpreterState*)

The initialization routine will be invoked once per
interpreter, when the module is imported. It should
return a new module object each time.

In addition, the module MAY implement a finalizer

  PyObject *PyFinalize_<modulename>(PyInterpreterState*)

which returns None on success.
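
As a rough sketch of how such a pair might fit together, the
following plain C code pairs an initializer with a finalizer for a
hypothetical module named "demo". The types, the none_object
stand-in for None, and the live_buffers counter are assumptions
made for the illustration; the sketch has no reference counting,
so the finalizer frees the module object itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types for this sketch; not the real CPython definitions. */
typedef struct PyObject { int is_module; } PyObject;
typedef struct PyInterpreterState {
    PyObject *demo_module;   /* module object created for this interpreter */
    char *demo_buffer;       /* a resource the module acquires at init */
} PyInterpreterState;

static PyObject none_object;     /* stand-in for None */
static int live_buffers = 0;     /* buffers currently allocated */

/* Entry point in the proposed style: new module, or NULL on error. */
PyObject *PyInit_demo(PyInterpreterState *interp)
{
    PyObject *module;

    assert(interp != NULL);
    module = malloc(sizeof *module);
    if (module == NULL)
        return NULL;
    interp->demo_buffer = malloc(256);
    if (interp->demo_buffer == NULL) {
        free(module);
        return NULL;
    }
    module->is_module = 1;
    interp->demo_module = module;
    live_buffers++;
    return module;
}

/* Optional finalizer: releases what PyInit_demo acquired. */
PyObject *PyFinalize_demo(PyInterpreterState *interp)
{
    assert(interp != NULL);
    if (interp->demo_buffer != NULL) {
        free(interp->demo_buffer);
        interp->demo_buffer = NULL;
        live_buffers--;
    }
    free(interp->demo_module);
    interp->demo_module = NULL;
    return &none_object;
}

/* Runs several init/finalize cycles; returns buffers still allocated. */
int demo_init_finalize_cycles(int cycles)
{
    PyInterpreterState interp = { NULL, NULL };
    int i;

    for (i = 0; i < cycles; i++) {
        if (PyInit_demo(&interp) == NULL)
            return -1;
        if (PyFinalize_demo(&interp) != &none_object)
            return -1;
    }
    return live_buffers;
}
```

With a finalizer in place, repeated initialization no longer
accumulates resources: every cycle releases what it acquired.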

In order to store per-module state in C variables,
the following API is introduced:

  struct PyModule_Slot;
  void
  PyInterpreter_AllocateSlot(PyInterpreterState*,
                             PyModule_Slot*, size_t);
  void*
  PyInterpreter_AccessSlot(PyInterpreterState*,
                           PyModule_Slot*);

Each module should declare a single global variable
of type struct PyModule_Slot. This will get initialized to
some unique value on the first call of
PyInterpreter_AllocateSlot; this and each subsequent call
also allocate and zero-initialize a block of memory
(per interpreter and module).

To simplify access, the module code can put the
lines

  PyModule_Slot module_slot;
  struct state{
    /* members, e.g. PyObject *member; */
  };
  #define STATE PyModule_STATE(module_slot, struct state)

after including Python.h, and then access the module's
state simply with STATE->member. This macro expands to

  ((struct state*)PyInterpreter_AccessSlot(
    PyInterpreter_Current(), &module_slot))
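
To make the intended behaviour concrete, here is a toy model of the
slot mechanism in plain C. It is one possible reading of the
description above, not an implementation proposal: the layout of
PyModule_Slot and PyInterpreterState, the MAX_SLOTS limit, the
current_interp variable and the definition of PyModule_STATE are
all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS 16              /* arbitrary limit for this sketch */

/* Assumed layouts; the draft leaves both structs opaque. */
typedef struct PyModule_Slot {
    int index;                    /* 0 until a unique index is assigned */
} PyModule_Slot;

typedef struct PyInterpreterState {
    void *blocks[MAX_SLOTS + 1];  /* one block per slot; entry 0 unused */
} PyInterpreterState;

static int next_index = 1;
static PyInterpreterState *current_interp;   /* stand-in for thread state */

static PyInterpreterState *PyInterpreter_Current(void)
{
    return current_interp;
}

static void
PyInterpreter_AllocateSlot(PyInterpreterState *interp,
                           PyModule_Slot *slot, size_t size)
{
    if (slot->index == 0) {       /* first call: pick a unique value */
        assert(next_index <= MAX_SLOTS);
        slot->index = next_index++;
    }
    free(interp->blocks[slot->index]);
    interp->blocks[slot->index] = calloc(1, size);   /* zeroed block */
}

static void *
PyInterpreter_AccessSlot(PyInterpreterState *interp, PyModule_Slot *slot)
{
    return interp->blocks[slot->index];
}

/* Module side, following the pattern in the text. */
#define PyModule_STATE(slot, type) \
    ((type *)PyInterpreter_AccessSlot(PyInterpreter_Current(), &(slot)))

static PyModule_Slot module_slot;
struct state {
    long counter;
};
#define STATE PyModule_STATE(module_slot, struct state)

static long bump(void)
{
    return ++STATE->counter;
}

/* Bumps the counter twice in interpreter a and once in b, then
   returns 10 * (value seen in a) + (value seen in b). */
long demo_two_interpreters(void)
{
    PyInterpreterState a = {{NULL}}, b = {{NULL}};
    long in_a, in_b;

    PyInterpreter_AllocateSlot(&a, &module_slot, sizeof(struct state));
    PyInterpreter_AllocateSlot(&b, &module_slot, sizeof(struct state));
    if (PyInterpreter_AccessSlot(&a, &module_slot) == NULL ||
        PyInterpreter_AccessSlot(&b, &module_slot) == NULL) {
        free(a.blocks[module_slot.index]);
        free(b.blocks[module_slot.index]);
        return -1;                /* allocation failed */
    }

    current_interp = &a;
    bump();
    bump();
    current_interp = &b;
    bump();
    in_b = STATE->counter;
    current_interp = &a;
    in_a = STATE->counter;

    free(a.blocks[module_slot.index]);
    free(b.blocks[module_slot.index]);
    current_interp = NULL;
    return in_a * 10 + in_b;
}
```

The same module code sees a different counter depending on which
interpreter is current, which is the isolation the slot API is
meant to provide.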

Discussion
----------

It would be possible to support the existing init<module>
functions if that is desirable; in that case, nothing
would change for modules that keep the old entry point.