[Python-Dev] advice needed: best approach to enabling "metamodules"?
Guido van Rossum
guido at python.org
Tue Dec 2 01:00:50 CET 2014
On Mon, Dec 1, 2014 at 1:38 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Mon, Dec 1, 2014 at 4:06 AM, Guido van Rossum <guido at python.org> wrote:
> > On Sun, Nov 30, 2014 at 5:42 PM, Nathaniel Smith <njs at pobox.com> wrote:
> >> On Mon, Dec 1, 2014 at 1:27 AM, Guido van Rossum <guido at python.org>
> >> > Nathaniel, did you look at Brett's LazyLoader? It overcomes the
> >> > issue by using a module loader that makes all modules instances of a
> >> > (trivial) Module subclass. I'm sure this approach can be backported as
> >> > far
> >> > as you need to go.
> >> The problem is that by the time your package's code starts running,
> >> it's too late to install such a loader. Brett's strategy works well
> >> for lazy-loading submodules (e.g., making it so 'import numpy' makes
> >> 'numpy.testing' available, but without the speed hit of importing it
> >> immediately), but it doesn't help if you want to actually hook
> >> attribute access on your top-level package (e.g., making 'numpy.foo'
> >> trigger a DeprecationWarning -- we have a lot of stupid exported
> >> constants that we can never get rid of because our rules say that we
> >> have to deprecate things before removing them).
> >> Or maybe you're suggesting that we define a trivial heap-allocated
> >> subclass of PyModule_Type and use that everywhere, as a
> >> quick-and-dirty way to enable __class__ assignment? (E.g., return it
> >> from PyModule_New?) I considered this before but hesitated b/c it
> >> could potentially break backwards compatibility -- e.g. if code A
> >> creates a PyModule_Type object directly without going through
> >> PyModule_New, and then code B checks whether the resulting object is a
> >> module by doing isinstance(x, type(sys)), this will break. (type(sys)
> >> is a pretty common way to get a handle to ModuleType -- in fact both
> >> types.py and importlib use it.) So in my mind I sorta lumped it in
> >> with my Option 2, "minor compatibility break". OTOH maybe anyone who
> >> creates a module object without going through PyModule_New deserves
> >> whatever they get.
> > Couldn't you install a package loader using some install-time hook?
> > Anyway, I still think that the issues with heap types can be overcome.
> > didn't you bring that up before here? Was the conclusion that it's
> > impossible?
> I've brought it up several times but no-one's really discussed it :-).
That's because nobody dares to touch it. (Myself included -- I increased
the size of typeobject.c from ~50 to ~5000 lines in a single intense
editing session more than a decade ago, and since then it's been basically
> I finally attempted a deep dive into typeobject.c today myself. I'm
> not at all sure I understand the intricacies correctly here, but I
> *think* __class__ assignment could be relatively easily extended to
> handle non-heap types, and in fact the current restriction to heap
> types is actually buggy (IIUC).
> object_set_class is responsible for checking whether it's okay to take
> an object of class "oldto" and convert it to an object of class
> "newto". Basically it's goal is just to avoid crashing the interpreter
> (as would quickly happen if you e.g. allowed ".__class__ = dict").
> Currently the rules (spread across object_set_class and
> compatible_for_assignment) are:
> (1) both oldto and newto have to be heap types
> (2) they have to have the same tp_dealloc
> (3) they have to have the same tp_free
> (4) if you walk up the ->tp_base chain for both types until you find
> the most-ancestral type that has a compatible struct layout (as
> checked by equiv_structs), then either
> (4a) these ancestral types have to be the same, OR
> (4b) these ancestral types have to have the same tp_base, AND they
> have to have added the same slots on top of that tp_base (e.g. if you
> have class A(object): pass and class B(object): pass then they'll both
> have added a __dict__ slot at the same point in the instance struct,
> so that's fine; this is checked in same_slots_added).
> The only place the code assumes that it is dealing with heap types is
> in (4b) -- same_slots_added unconditionally casts the ancestral types
> to (PyHeapTypeObject*). AFAICT that's why step (1) is there, to
> protect this code. But I don't think the check actually works -- step
> (1) checks that the types we're trying to assign are heap types, but
> this is no guarantee that the *ancestral* types will be heap types.
> [Also, the code for __bases__ assignment appears to also call into
> this code with no heap type checks at all.] E.g., I think if you do
> class MyList(list):
> __slots__ = ()
> class MyDict(dict):
> __slots__ = ()
> MyList().__class__ = MyDict()
> then you'll end up in same_slots_added casting PyDict_Type and
> PyList_Type to PyHeapTypeObjects and then following invalid pointers
> into la-la land. (The __slots__ = () is to maintain layout
> compatibility with the base types; if you find builtin types that
> already have __dict__ and weaklist and HAVE_GC then this example
> should still work even with perfectly empty subclasses.)
Have you filed this as a bug? I believe nobody has discovered this problem
before. I've confirmed it as far back as 2.5 (I don't have anything older
> Okay, so suppose we move the heap type check (step 1) down into
> same_slots_added (step 4b), since AFAICT this is actually more correct
> anyway. This is almost enough to enable __class__ assignment on
> modules, because the cases we care about will go through the (4a)
> branch rather than (4b), so the heap type thing is irrelevant.
> The remaining problem is the requirement that both types have the same
> tp_dealloc (step 2). ModuleType itself has tp_dealloc ==
> module_dealloc, while all(?) heap types have tp_dealloc ==
Yeah, I can't see a way that type_new() can create a type whose tp_dealloc
> Here again, though, I'm not sure what purpose this
> check serves. subtype_dealloc basically cleans up extra slots, and
> then calls the base class tp_dealloc. So AFAICT it's totally fine if
> oldto->tp_dealloc == module_dealloc, and newto->tp_dealloc ==
> subtype_dealloc, so long as newto is a subtype of oldto -- b/c this
> means newto->tp_dealloc will end up calling oldto->tp_dealloc anyway.
I guess the simple check is an upper bound (or whatever that's called -- my
math-speak is rusty ;-) for the necessary-and-sufficient check that you're
> OTOH it's not actually a guarantee of anything useful to see that
> oldto->tp_dealloc == newto->tp_dealloc == subtype_dealloc, because
> subtype_dealloc does totally different things depending on the
> ancestry tree -- MyList and MyDict above pass the tp_dealloc check,
> even though list.tp_dealloc and dict.tp_dealloc are definitely *not*
> So I suspect that a more correct way to do this check would be something
> PyTypeObject *old__real_deallocer = oldto, *new_real_deallocer = newto;
> while (old_real_deallocer->tp_dealloc == subtype_dealloc)
> old_real_deallocer = old_real_deallocer->tp_base;
> while (new_real_deallocer->tp_dealloc == subtype_dealloc)
> new_real_deallocer = new_real_deallocer->tp_base;
> if (old_real_deallocer->tp_dealloc != new_real_deallocer)
> error out;
I'm not set up to disagree with you on this any more...
> Module subclasses would pass this check. Alternatively it might make
> more sense to add a check in equiv_structs that
> (child_type->tp_dealloc == subtype_dealloc || child_type->tp_dealloc
> == parent_type->tp_dealloc); I think that would accomplish the same
> thing in a somewhat cleaner way.
> Obviously this code is really subtle though, so don't trust any of the
> above without review from someone who knows typeobject.c better than
> me! (Antoine?)
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-Dev