[Python-3000] PEP 3121: Module Initialization and finalization

Fri Apr 27 20:23:49 CEST 2007

On 4/27/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Continuing a discussion from last April, I added
> PEP 3121, included below for convenience. Please
> comment.
>
> Regards,
> Martin
>
> PEP: 3121
> Title: Module Initialization and finalization
> Version: $Revision: 54998 $
> Last-Modified: $Date: 2007-04-27 10:31:58 +0200 (Fr, 27 Apr 2007) $
> Author: Martin v. Löwis <martin at v.loewis.de>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 27-Apr-2007
> Python-Version: 3.0
> Post-History:
>
> Abstract
> ========
>
> Module initialization currently has a few deficiencies.  There is no
> cleanup for modules, the entry point name might give naming conflicts,
> the entry functions don't follow the usual calling convention, and
> multiple interpreters are not supported well. This PEP addresses these
> issues.
>

Thanks for trying to solve this, Martin!

> Module Finalization
> ===================
>
> Currently, C modules are initialized usually once and then "live"
> forever. The only exception is when Py_Finalize() is called: then
> the initialization routine is invoked a second time. This is bad
> from a resource management point of view: memory and other resources
> might get allocated each time initialization is called, but there
> is no way to reclaim them. As a result, there is currently no
> way to completely release all resources Python has allocated.
>
> Entry point name conflicts
> ==========================
>
> The entry point is currently called init<module>. This might conflict
> with other symbols also called init<something>. In particular,
> initsocket is known to have conflicted in the past (this specific
> problem got resolved as a side effect of renaming the module to
> _socket).
>
> Entry point signature
> =====================
>
> The entry point is currently a procedure (returning void).  This
> deviates from the usual calling conventions; callers can find out
> whether there was an error during initialization only by checking
> PyErr_Occurred. The entry point should return a PyObject*, which will
> be the module created, or NULL in case of an exception.
>
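
To make the signature problem above concrete, here is a minimal sketch in plain C. Everything in it (toy_module, toy_error_set, old_init, new_init, import_old, import_new) is a hypothetical stand-in rather than CPython API; it only contrasts a void entry point, whose caller has to consult a separate error indicator, with one that reports success or failure through its return value.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types and globals: none of this is CPython API. */
typedef struct { const char *name; } toy_module;

int toy_error_set = 0;                /* plays the role of the pending-exception flag */
toy_module toy_xx = { "xx" };
toy_module *toy_registered = NULL;    /* what the old-style init leaves behind */

/* Old style: returns void, so failure is visible only through the flag. */
void old_init(int fail)
{
    if (fail) {
        toy_error_set = 1;
        return;
    }
    toy_registered = &toy_xx;
}

/* New style: the module (or NULL on error) is the return value. */
toy_module *new_init(int fail)
{
    if (fail) {
        toy_error_set = 1;
        return NULL;
    }
    return &toy_xx;
}

/* Caller of the old style: must check the flag after the call. */
toy_module *import_old(int fail)
{
    toy_error_set = 0;
    toy_registered = NULL;
    old_init(fail);
    if (toy_error_set)                /* analogous to checking PyErr_Occurred() */
        return NULL;
    return toy_registered;
}

/* Caller of the new style: NULL already means "an error is set". */
toy_module *import_new(int fail)
{
    toy_module *m;
    toy_error_set = 0;
    m = new_init(fail);
    assert((m == NULL) == (toy_error_set != 0));
    return m;
}
```

With the second form the caller follows the same "NULL means error" rule used by most other functions that return an object pointer.
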
> Multiple Interpreters
> =====================
>
> Currently, extension modules share their state across all
> interpreters. This allows for undesirable information leakage across
> interpreters: one script could permanently corrupt objects in an
> extension module, possibly breaking all scripts in other interpreters.
>

Everything after the abstract and up to here describes a problem, so
turning each of these sections into a sub-section of a single
"Problems" section would make the document more organized and easier
to read.

> Specification
> =============
>
> The module initialization routines change their signature
> to::
>
>   PyObject *PyInit_<modulename>()
>
> The initialization routine will be invoked once per
> interpreter, when the module is imported. It should
> return a new module object each time.
>
> In order to store per-module state in C variables,
> each module object will contain a block of memory
> that is interpreted only by the module. The amount
> of memory used for the module is specified at
> the point of creation of the module.
>
> In addition to the initialization function, a module
> may implement a number of additional callback
> functions, which are invoked when the module's
> tp_traverse, tp_clear, and tp_free functions are
> invoked, and when the module is reloaded.
>
> The entire module definition is combined in a struct
> PyModuleDef::
>
>   struct PyModuleDef{
>     PyModuleDef_Base m_base;  /* To be filled out by the interpreter */
>     Py_ssize_t m_size; /* Size of per-module data */
>     PyMethodDef *m_methods;
>     inquiry m_reload;
>     traverseproc m_traverse;
>     inquiry m_clear;
>     freefunc m_free;
>   };
>
> Creation of a module is changed to expect an optional
> PyModuleDef*. The module state will be
> null-initialized.
>
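
A rough model of the per-module state block described above, written as self-contained C. The names toy_moduledef, toy_module, toy_module_new, toy_module_get_data, toy_module_free and the counter module are all assumptions made for illustration; the sketch only shows a block of m_size bytes that starts out zero-filled and is private to each module object, which is what keeps two interpreters from seeing each other's data.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the proposed structures; not CPython API. */
typedef struct {
    size_t m_size;              /* size of the per-module data block */
} toy_moduledef;

typedef struct {
    const toy_moduledef *def;   /* may be NULL: the definition is optional */
    void *data;                 /* private block, interpreted only by the module */
} toy_module;

/* Create one module object; its state block starts out zero-filled. */
toy_module *toy_module_new(const toy_moduledef *def)
{
    toy_module *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->def = def;
    m->data = NULL;
    if (def != NULL && def->m_size > 0) {
        m->data = calloc(1, def->m_size);
        if (m->data == NULL) {
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Accessor in the spirit of the proposed PyModule_GetData(). */
void *toy_module_get_data(toy_module *m)
{
    assert(m != NULL);
    return m->data;
}

void toy_module_free(toy_module *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}

/* Example state for a hypothetical "counter" extension module. */
struct counter_state { long calls; };
const toy_moduledef counter_def = { sizeof(struct counter_state) };

/* A "module method": receives the module and works on its own state. */
long counter_bump(toy_module *m)
{
    struct counter_state *st = toy_module_get_data(m);
    return ++st->calls;
}
```

Calling toy_module_new(&counter_def) twice models two interpreters importing the same extension: each gets its own counter, and bumping one leaves the other untouched.
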
> Each module method with be passed the module object

I think you meant "will", not "with".

> as the first parameter. To access the module data,
> a function::
>
>   void* PyModule_GetData(PyObject*);
>
> will be provided. In addition, to look up a module
> more efficiently than going through sys.modules,
> a function::
>
>   PyObject* PyState_FindModule(struct PyModuleDef*);
>
> will be provided. This lookup function will use an
> index located in the m_base field, to find the
> module by index, not by name.
>
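
One way to picture the index-based lookup described above is the following plain-C sketch. All names (toy_moduledef_base, m_index, toy_interp, toy_state_add_module, toy_state_find_module) are hypothetical, and the interpreter is passed explicitly here as a simplification, whereas the proposed function takes only the definition and would presumably use the current interpreter.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MODULES 8

/* Toy stand-ins; not CPython API. */
typedef struct {
    int m_index;                /* 0 = not assigned yet; set by the interpreter side */
} toy_moduledef_base;

typedef struct {
    toy_moduledef_base m_base;
    const char *m_name;
} toy_moduledef;

typedef struct {
    toy_moduledef *def;
} toy_module;

typedef struct {
    toy_module *by_index[TOY_MAX_MODULES + 1];  /* slot 0 is never used */
} toy_interp;

int toy_next_index = 1;         /* process-wide, shared by all interpreters */

/* Record a freshly created module in one interpreter's table. */
int toy_state_add_module(toy_interp *interp, toy_module *mod)
{
    toy_moduledef *def = mod->def;
    if (def->m_base.m_index == 0) {
        if (toy_next_index > TOY_MAX_MODULES)
            return -1;          /* table full */
        def->m_base.m_index = toy_next_index++;
    }
    interp->by_index[def->m_base.m_index] = mod;
    return 0;
}

/* Lookup is plain array indexing: no name hashing or string comparison. */
toy_module *toy_state_find_module(const toy_interp *interp,
                                  const toy_moduledef *def)
{
    int i = def->m_base.m_index;
    assert(i >= 0 && i <= TOY_MAX_MODULES);
    return i == 0 ? NULL : interp->by_index[i];
}
```

The index belongs to the definition and is therefore the same in every interpreter, while the table it indexes is per interpreter, so the same definition resolves to a different module object in each one.
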
> As all Python objects should be controlled through
> the Python memory management, usage of "static"
> type objects is discouraged, unless the type object
> itself has no memory-managed state.

Ooh, I like this side-effect of the proposal!

> To simplify
> definition of heap types, a new method::
>
>   PyTypeObject* PyType_Copy(PyTypeObject*);
>
> is added.
>
> Example
> =======
>
> xxmodule.c would be changed to remove the initxx
> function, and add the following code instead::
>
>   struct xxstate{
>     PyObject *ErrorObject;
>     PyObject *Xxo_Type;
>   };
>
>   #define xxstate(o) ((struct xxstate*)PyModule_GetData(o))
>
>   static int xx_traverse(PyObject *m, visitproc v,
>                          void *arg)
>   {
>     Py_VISIT(xxstate(m)->ErrorObject);
>     Py_VISIT(xxstate(m)->Xxo_Type);
>     return 0;
>   }
>
>   static int xx_clear(PyObject *m)
>   {
>     Py_CLEAR(xxstate(m)->ErrorObject);
>     Py_CLEAR(xxstate(m)->Xxo_Type);
>     return 0;
>   }
>
>   static struct PyModuleDef xxmodule = {
>     {}, /* m_base */
>     sizeof(struct xxstate),
>     xx_methods,
>     0,  /* m_reload */
>     xx_traverse,
>     xx_clear,
>     0,  /* m_free - not needed, since all is done in m_clear */
>   };
>
>   PyObject*
>   PyInit_xx()
>   {
>     PyObject *res = PyModule_New("xx", &xxmodule);
>     if (!res) return NULL;
>     xxstate(res)->ErrorObject = PyErr_NewException("xx.error", NULL, NULL);
>     if (!xxstate(res)->ErrorObject) {
>       Py_DECREF(res);
>       return NULL;
>     }
>     xxstate(res)->Xxo_Type = PyType_Copy(&Xxo_Type);
>     if (!xxstate(res)->Xxo_Type) {
>       Py_DECREF(res);
>       return NULL;
>     }
>     return res;
>   }
>

How would I raise xx.error in C code now?  I am guessing like this::

  PyObject* module = PyState_FindModule(&xxmodule);
  if (!module)
    return NULL;
  PyObject* xx_error = xxstate(module)->ErrorObject;
  if (!xx_error) {
    PyErr_SetString(PyExc_SystemError, "xx.error missing");
    return NULL;
  }
  PyErr_SetString(xx_error, "oops");
  return NULL;

Since most objects will move to being memory-managed, one needs to
worry about checking that the object still exists.  I assume I didn't
go overboard with the error checking here, right?  I guess people are
going to end up writing helper functions to access the various data
fields as the above would get rather tedious if you had to write it
more than twice.
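
A helper of the kind mentioned above might look roughly like the following. To keep it compilable on its own, the sketch uses stand-ins instead of the proposed API: toy_object, toy_module, toy_current_xx (in place of the per-interpreter module lookup), toy_raised and toy_system_error (in place of the pending exception and SystemError) are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins so the sketch compiles on its own; not CPython API. */
typedef struct { const char *name; } toy_object;

struct xxstate {
    toy_object *ErrorObject;
    toy_object *Xxo_Type;
};

typedef struct { struct xxstate state; } toy_module;

toy_object toy_system_error = { "SystemError" };
toy_module *toy_current_xx = NULL;   /* stands in for the module lookup result */
toy_object *toy_raised = NULL;       /* type of the "pending exception" */

/* One accessor per state field: all the checking lives here, once. */
toy_object *xx_get_error(void)
{
    toy_module *m = toy_current_xx;
    if (m == NULL || m->state.ErrorObject == NULL) {
        toy_raised = &toy_system_error;   /* module or field is missing */
        return NULL;
    }
    return m->state.ErrorObject;
}

/* Call sites shrink to a couple of lines. Always returns NULL, like a
   C function that has just set an exception. */
toy_object *xx_fail(void)
{
    toy_object *err = xx_get_error();
    if (err != NULL)
        toy_raised = err;                 /* stands in for PyErr_SetString(err, ...) */
    assert(toy_raised != NULL);           /* some exception is always pending here */
    return NULL;
}
```

Whether the "field is missing" check is really needed depends on whether the state can be cleared while module functions are still callable, which the draft does not spell out.
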

>
> Discussion
> ==========
>
> Tim Peters reports in [1]_ that PythonLabs considered such a feature
> at one point, and lists the following additional hooks which aren't
> currently supported in this PEP:
>
>  * when the module object is deleted from sys.modules
>
>  * when Py_Finalize is called
>
>  * when Python exits
>

Wouldn't the above be covered by the deallocation of the module?

Overall I like the idea.  I think people will need to get used to the
idea of writing more accessor functions for the data field, though, if
using static variables to hold things like exceptions becomes
discouraged.

-Brett