[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

Eli Bendersky eliben at gmail.com
Sun Aug 11 15:26:55 CEST 2013


On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hi Eli,
>
> On Sat, 10 Aug 2013 17:12:53 -0700
> Eli Bendersky <eliben at gmail.com> wrote:
> >
> > Note how doing some sys.modules acrobatics and re-importing suddenly
> > changes the internal state of a previously imported module. This happens
> > because:
> >
> > 1. The first import of 'csv' (which then imports `_csv) creates
> > module-specific state on the heap and associates it with the current
> > sub-interpreter. The list of dialects, amongst other things, is in that
> > state.
> > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > 3. The second import of 'csv' also creates/initializes a new '_csv'
> module
> > because it's not in sys.modules. This *replaces* the per-sub-interpreter
> > cached version of the module's state with the clean state of a new module
>
> I would say this is pretty much expected.


I'm struggling to see how it's expected. The two imported csv modules are
different (i.e. different id() of members), and yet some state is shared
between them. I think the root reason for it is that "PyModuleDev
_csvmodule" is uniqued per interpreter, not per module instance.

Even if dialects were not a PyObject, this would still be problematic,
don't you think? And note that here, unlike the ET.ParseError case, I don't
think the problem is exporting internal per-module state as a module
attribute. The following two are un-reconcilable, IMHO:

1. Wanting to have two instances of the same module in the same interpterer.
2. Using a global shared PyModuleDef between all instances of the same
module in the same interpterer.



> The converse would be a bug
> IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
> subinterpreter support:
>
>   "Extension module initialization currently has a few deficiencies.
>   There is no cleanup for modules, the entry point name might give
>   naming conflicts, the entry functions don't follow the usual calling
>   convention, and multiple interpreters are not supported well."
>
> Re-initializing state when importing a module anew makes extension
> modules more like pure Python modules, which is a good thing.
>
>
> I think the piece of interpretation you offered yesterday on IRC may be
> the right explanation for the ET shenanigans:
>
>   "Maybe the bug is that ParseError is kept in per-module state, and
>   also exported from the module?"
>
> PEP 3121 doesn't offer any guidelines for using its API, and its
> example shows PyObject* fields in a module state.
>
> I'm starting to think that it might be a bad use of PEP 3121. PyObjects
> can, and therefore should be stored in the extension module dict where
> they will participate in normal resource management (i.e. garbage
> collection). If they are in the module dict, then they shouldn't be
> held alive by the module state too, otherwise the (currently tricky)
> lifetime management of extension modules can produce oddities.
>
>
> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects.  PyObjects should go
> to the module dict, like they do in normal Python modules.  Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
>
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
>
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.
>
> Those two issues can be addressed if we offer an API for it. How about:
>
>   PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
>                                   const char *name,
>                                   PyObject *restrict_type)
>
>   *def* is a pointer to the module definition.
>   *name* is the attribute to look up on the module dict.
>   *restrict_type*, if non-NULL, is a type object the looked up attribute
>   must be an instance of.
>
>   Lookup an attribute in the current interpreter's extension module
>   instance for the module definition *def*.
>   Returns a *new* reference (!), or NULL if an error occurred.
>   An error can be:
>   - no such module exists for the current interpreter (ImportError?
>       RuntimeError? SystemError?)
>   - no such attribute exists in the module dict (AttributeError)
>   - the attribute doesn't conform to *restrict_type* (TypeError)
>
> So code can be written like:
>
>   PyObject *dialects = PyState_GetModuleAttr(
>       &_csvmodule, "dialects", &PyDict_Type);
>   if (dialects == NULL)
>       return NULL;
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20130811/4615d0be/attachment.html>


More information about the Python-Dev mailing list