[Python-3000] PEP 3121: Module Initialization and finalization

"Martin v. Löwis" martin at v.loewis.de
Fri Apr 27 10:33:52 CEST 2007


Continuing a discussion from last April, I added
PEP 3121, included below for convenience. Please
comment.

Regards,
Martin

PEP: 3121
Title: Module Initialization and finalization
Version: $Revision: 54998 $
Last-Modified: $Date: 2007-04-27 10:31:58 +0200 (Fr, 27 Apr 2007) $
Author: Martin v. Löwis <martin at v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2007
Python-Version: 3.0
Post-History:

Abstract
========

Module initialization currently has a few deficiencies.  There is no
cleanup for modules, the entry point name might give naming conflicts,
the entry functions don't follow the usual calling convention, and
multiple interpreters are not supported well. This PEP addresses these
issues.

Module Finalization
===================

Currently, C modules are usually initialized once and then "live"
forever. The only exception arises when Py_Finalize() is called: if
the interpreter is initialized again afterwards, the module's
initialization routine is invoked a second time. This is bad
from a resource management point of view: memory and other resources
might get allocated each time initialization is called, but there
is no way to reclaim them. As a result, there is currently no
way to completely release all resources Python has allocated.

Entry point name conflicts
==========================

The entry point is currently called init<module>. This might conflict
with other symbols also called init<something>. In particular,
initsocket is known to have conflicted in the past (this specific
problem got resolved as a side effect of renaming the module to
_socket).

Entry point signature
=====================

The entry point is currently a procedure (returning void).  This
deviates from the usual calling conventions; callers can find out
whether there was an error during initialization only by checking
PyErr_Occurred. The entry point should return a PyObject*, which will
be the module created, or NULL in case of an exception.

Multiple Interpreters
=====================

Currently, extension modules share their state across all
interpreters. This allows for undesirable information leakage across
interpreters: one script could permanently corrupt objects in an
extension module, possibly breaking all scripts in other interpreters.

Specification
=============

The module initialization routines change their signature
to::

  PyObject *PyInit_<modulename>()

The initialization routine will be invoked once per
interpreter, when the module is imported. It should
return a new module object each time.

In order to store per-module state in C variables,
each module object will contain a block of memory
that is interpreted only by the module. The amount
of memory used for the module is specified at
the point of creation of the module.
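
The idea of a per-module memory block can be sketched in plain C,
independent of the Python headers. The names `sketch_module`,
`sketch_module_new`, `sketch_module_state`, and `sketch_module_free`
are hypothetical and are not part of the API proposed here; the sketch
only assumes that the state block is sized at creation and zero-filled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a module object; not CPython API. */
typedef struct {
    const char *name;
    size_t state_size;   /* size requested when the module was created */
    void *state;         /* opaque to everything but the owning module */
} sketch_module;

/* Create a module with a private, zero-filled state block. */
sketch_module *sketch_module_new(const char *name, size_t state_size)
{
    assert(name != NULL);
    sketch_module *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->name = name;
    m->state_size = state_size;
    m->state = NULL;
    if (state_size > 0) {
        m->state = calloc(1, state_size);
        if (m->state == NULL) {
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Return the module's private state block (NULL if it has none). */
void *sketch_module_state(sketch_module *m)
{
    assert(m != NULL);
    return m->state;
}

/* Release the module together with its state block. */
void sketch_module_free(sketch_module *m)
{
    if (m == NULL)
        return;
    free(m->state);
    free(m);
}

/* Example of state a module might keep once per interpreter. */
struct counter_state {
    int calls;
};
```

Two objects created this way, one per interpreter, hold independent
`counter_state` blocks, so a change made through one is not visible
through the other.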

In addition to the initialization function, a module
may implement a number of additional callback
functions, which are invoked when the module's
tp_traverse, tp_clear, and tp_free functions are
invoked, and when the module is reloaded.

The entire module definition is combined in a struct
PyModuleDef::

  struct PyModuleDef{
    PyModuleDef_Base m_base;  /* To be filled out by the interpreter */
    Py_ssize_t m_size; /* Size of per-module data */
    PyMethodDef *m_methods;
    inquiry m_reload;
    traverseproc m_traverse;
    inquiry m_clear;
    freefunc m_free;
  };

Creation of a module is changed to expect an optional
PyModuleDef*. The module state will be
null-initialized.
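
How a definition struct can tie a state size to cleanup callbacks may
be illustrated with a simplified, stand-alone model. Everything here
(`sketch_def`, `sketch_mod`, `sketch_create`, `sketch_destroy`,
`buf_state`) is hypothetical. In particular, calling `m_clear` before
`m_free` on destruction is an assumption of this sketch, not something
the text above specifies:

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_mod;

/* Hypothetical, reduced counterpart of the definition struct. */
typedef struct {
    size_t m_size;                        /* size of per-module data */
    int  (*m_clear)(struct sketch_mod *); /* optional, may be NULL */
    void (*m_free)(void *);               /* optional, may be NULL */
} sketch_def;

typedef struct sketch_mod {
    const sketch_def *def;  /* optional, may be NULL */
    void *state;            /* zero-filled block of def->m_size bytes */
} sketch_mod;

/* Create a module; the definition is optional, the state starts zeroed. */
sketch_mod *sketch_create(const sketch_def *def)
{
    sketch_mod *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->def = def;
    m->state = NULL;
    if (def != NULL && def->m_size > 0) {
        m->state = calloc(1, def->m_size);
        if (m->state == NULL) {
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Destroy a module, giving it the chance to release its resources. */
void sketch_destroy(sketch_mod *m)
{
    assert(m != NULL);
    if (m->def != NULL) {
        if (m->def->m_clear != NULL)
            m->def->m_clear(m);   /* assumed order: clear, then free */
        if (m->def->m_free != NULL)
            m->def->m_free(m);
    }
    free(m->state);
    free(m);
}

/* Example module that owns one heap buffer. */
struct buf_state {
    char *buffer;
};

int buffers_released = 0;   /* counts calls to buf_clear */

int buf_clear(sketch_mod *m)
{
    struct buf_state *st = m->state;
    free(st->buffer);
    st->buffer = NULL;
    buffers_released++;
    return 0;
}

const sketch_def buf_def = { sizeof(struct buf_state), buf_clear, NULL };
```

With such a callback in place, resources acquired during initialization
have a defined point at which they are released.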

Each module method will be passed the module object
as the first parameter. To access the module data,
a function::

  void* PyModule_GetState(PyObject*);

will be provided. In addition, to look up a module
more efficiently than going through sys.modules,
a function::

  PyObject* PyState_FindModule(struct PyModuleDef*);

will be provided. This lookup function will use an
index located in the m_base field, to find the
module by index, not by name.
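
Lookup by index rather than by name might work along the following
lines. This is a stand-alone sketch: `lookup_base`, `lookup_def`,
`lookup_interp`, `lookup_register`, and `lookup_find` are hypothetical
names, and the fixed table size and the convention that index 0 means
"not yet assigned" are assumptions made only for the example:

```c
#include <assert.h>
#include <stddef.h>

#define LOOKUP_MAX_MODULES 16   /* arbitrary limit for the sketch */

/* Hypothetical counterpart of the m_base field. */
typedef struct {
    size_t m_index;             /* 0 means "not yet assigned" */
} lookup_base;

typedef struct {
    lookup_base m_base;         /* filled in on first registration */
    const char *name;
} lookup_def;

/* Hypothetical per-interpreter table of module objects. */
typedef struct {
    void *modules[LOOKUP_MAX_MODULES];
} lookup_interp;

size_t lookup_next_index = 1;   /* process-wide index counter */

/* Record a module in one interpreter, assigning an index on first use. */
int lookup_register(lookup_interp *interp, lookup_def *def, void *module)
{
    assert(interp != NULL && def != NULL);
    if (def->m_base.m_index == 0) {
        if (lookup_next_index >= LOOKUP_MAX_MODULES)
            return -1;          /* table full */
        def->m_base.m_index = lookup_next_index++;
    }
    interp->modules[def->m_base.m_index] = module;
    return 0;
}

/* Find the module for a definition with one array access, no name lookup. */
void *lookup_find(const lookup_interp *interp, const lookup_def *def)
{
    assert(interp != NULL && def != NULL);
    if (def->m_base.m_index == 0)
        return NULL;            /* never registered anywhere */
    return interp->modules[def->m_base.m_index];
}
```

Because the index lives in the shared definition while the table lives
in each interpreter, the same definition resolves to a different module
object in each interpreter.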

As all Python objects should be controlled through
the Python memory management, usage of "static"
type objects is discouraged, unless the type object
itself has no memory-managed state. To simplify the
definition of heap types, a new function::

  PyTypeObject* PyType_Copy(PyTypeObject*);

is added.

Example
=======

xxmodule.c would be changed to remove the initxx
function, and add the following code instead::

  struct xxstate{
    PyObject *ErrorObject;
    PyObject *Xxo_Type;
  };

  #define xxstate(o) ((struct xxstate*)PyModule_GetState(o))

  static int xx_traverse(PyObject *m, visitproc v,
                         void *arg)
  {
    Py_VISIT(xxstate(m)->ErrorObject);
    Py_VISIT(xxstate(m)->Xxo_Type);
    return 0;
  }

  static int xx_clear(PyObject *m)
  {
    Py_CLEAR(xxstate(m)->ErrorObject);
    Py_CLEAR(xxstate(m)->Xxo_Type);
    return 0;
  }

  static struct PyModuleDef xxmodule = {
    {}, /* m_base */
    sizeof(struct xxstate),
    xx_methods,
    0,  /* m_reload */
    xx_traverse,
    xx_clear,
    0,  /* m_free - not needed, since all is done in m_clear */
  };

  PyObject*
  PyInit_xx()
  {
    PyObject *res = PyModule_New("xx", &xxmodule);
    if (!res) return NULL;
    xxstate(res)->ErrorObject = PyErr_NewException("xx.error", NULL, NULL);
    if (!xxstate(res)->ErrorObject) {
      Py_DECREF(res);
      return NULL;
    }
    xxstate(res)->Xxo_Type = (PyObject *)PyType_Copy(&Xxo_Type);
    if (!xxstate(res)->Xxo_Type) {
      Py_DECREF(res);
      return NULL;
    }
    return res;
  }


Discussion
==========

Tim Peters reports in [1]_ that PythonLabs considered such a feature
at one point, and lists the following additional hooks which aren't
currently supported in this PEP:

 * when the module object is deleted from sys.modules

 * when Py_Finalize is called

 * when Python exits

 * when the Python DLL is unloaded (Windows only)


References
==========

.. [1] Tim Peters, reporting earlier conversation about such a feature
   http://mail.python.org/pipermail/python-3000/2006-April/000726.html


Copyright
=========

This document has been placed in the public domain.


