[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

Antoine Pitrou solipsis at pitrou.net
Sun Aug 11 12:33:16 CEST 2013

Hi Eli,

On Sat, 10 Aug 2013 17:12:53 -0700
Eli Bendersky <eliben at gmail.com> wrote:
> Note how doing some sys.modules acrobatics and re-importing suddenly
> changes the internal state of a previously imported module. This happens
> because:
> 1. The first import of 'csv' (which then imports `_csv) creates
> module-specific state on the heap and associates it with the current
> sub-interpreter. The list of dialects, amongst other things, is in that
> state.
> 2. The 'del's wipe 'csv' and '_csv' from the cache.
> 3. The second import of 'csv' also creates/initializes a new '_csv' module
> because it's not in sys.modules. This *replaces* the per-sub-interpreter
> cached version of the module's state with the clean state of a new module

I would say this is pretty much expected. The converse would be a bug
IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
subinterpreter support:

  "Extension module initialization currently has a few deficiencies.
  There is no cleanup for modules, the entry point name might give
  naming conflicts, the entry functions don't follow the usual calling
  convention, and multiple interpreters are not supported well."

Re-initializing state when importing a module anew makes extension
modules more like pure Python modules, which is a good thing.

I think the piece of interpretation you offered yesterday on IRC may be
the right explanation for the ET shenanigans:

  "Maybe the bug is that ParseError is kept in per-module state, and
  also exported from the module?"

PEP 3121 doesn't offer any guidelines for using its API, and its
example shows PyObject* fields in a module state.

I'm starting to think that it might be a bad use of PEP 3121. PyObjects
can, and therefore should be stored in the extension module dict where
they will participate in normal resource management (i.e. garbage
collection). If they are in the module dict, then they shouldn't be
held alive by the module state too, otherwise the (currently tricky)
lifetime management of extension modules can produce oddities.

So, the PEP 3121 "module state" pointer (the optional opaque void*
thing) should only be used to hold non-PyObjects.  PyObjects should go
to the module dict, like they do in normal Python modules.  Now, the
reason our PEP 3121 extension modules abuse the module state pointer to
keep PyObjects is two-fold:

1. it's surprisingly easier (it's actually a one-liner if you don't
handle errors - a rather bad thing, but all PEP 3121 extension modules
currently don't handle a NULL return from PyState_FindModule...)

2. it protects the module from any module dict monkeypatching. It's not
important if you are using a generic API on the PyObject, but it is if
the PyObject is really a custom C type with well-defined fields.

Those two issues can be addressed if we offer an API for it. How about:

  PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
                                  const char *name,
                                  PyObject *restrict_type)

  *def* is a pointer to the module definition.
  *name* is the attribute to look up on the module dict.
  *restrict_type*, if non-NULL, is a type object the looked up attribute
  must be an instance of.

  Lookup an attribute in the current interpreter's extension module
  instance for the module definition *def*.
  Returns a *new* reference (!), or NULL if an error occurred.
  An error can be:
  - no such module exists for the current interpreter (ImportError?
      RuntimeError? SystemError?)
  - no such attribute exists in the module dict (AttributeError)
  - the attribute doesn't conform to *restrict_type* (TypeError)

So code can be written like:

  PyObject *dialects = PyState_GetModuleAttr(
      &_csvmodule, "dialects", &PyDict_Type);
  if (dialects == NULL)
      return NULL;



More information about the Python-Dev mailing list