PEP 447: Adding type.__getdescriptor__

Hi, I’m trying to get up to speed again w.r.t. PEP 447 (and to be honest, with CPython’s development workflow as its been too long since I’ve actually done work on the CPython codebase…). Let me start by recapping the PEP (<https://www.python.org/dev/peps/pep-0447/ <https://www.python.org/dev/peps/pep-0447/>>) is about: The problem I’m trying to solve is that super.__getattribute__ basicy assumes looking at the __dict__ of classes on the MRO is all that’s needed to check if those classes provide an attribute. The core of both object.__getattribute__ and super.__getattribute__ for finding a descriptor on the class is basically (from the PEP): def _PyType_Lookup(tp, name): mro = tp.mro() assert isinstance(mro, tuple) for base in mro: assert isinstance(base, type) try: return base.__dict__[name] except KeyError: pass return None Note that the real implementation in in C, and accesses base.__dict__ directly instead of through Python’s attribute lookup (which would lead to infinite recursion). This is problematic when the class is dynamically populated, as the descriptor may not yet be present in the class __dict__. This is not a problem for normal attribute lookup, as you can replace the __getattribute__ method of the class to do the additional work (at the cost of having to reimplement all of __getattribute__). This is a problem for super() calls though, as super will unconditionally use the lookup code above. The PEP proposed to replace “base.__dict__[name]” by “base.__class__.__getdescriptor__(name)” (once again, in C with direct access to the two slots instead of accessing them through normal attribute lookup). I have two open questions about this PEP: 1) Last time around Mark Shannon worried that this introduces infinite recursion in the language itself (in my crummy summary, please read this message to get the real concern <https://mail.python.org/pipermail/python-dev/2015-July/140938.html <https://mail.python.org/pipermail/python-dev/2015-July/140938.html>>). Is this truly a problem? I don’t think there is a problem, but I’m worried that I don’t fully understand Mark’s concerns. 2) PEP 487 introduced __init_subclass__ as a class method to avoid having to write a metaclass for a number of use cases. My PEP currently does require a metaclass, but it might be nicer to switch to a regular class method instead (like __init_subclass__). My primary usecase for this PEP is PyObjC, which does populate the class dictionary on demand for performance and correctness reasons and PyObC therefore currently includes a replacement for super. For PyObjC’s implementation making __getdescriptor__ a classmethod would be a win as this would avoid creating yet another level of metaclasses in C code (PyObjC already has a Python class and metaclass for every Objective-C class, PEP 447 would require the introduction of a metaclass for those metaclasses). BTW. Two other open issues w.r.t. this PEP are forward porting the patch (in issue 18181) and performing benchmarks. The patch doesn’t apply cleanly to the current tip of the tree, I’ll try to update it tomorrow. And with some luck will manage to update PyObjC’s implementation as well to validate the design. Ronald

On 1 December 2017 at 01:23, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
I think the second point there may actually allow you to resolve the first one, by way of making `__getdescriptor__` an optional override of the typical lookup algorithm. That is: def _PyType_Lookup(tp, name): # New optional override for descriptor lookups try: # Ordinary attribute lookup, *NOT* a descriptor lookup getdesc = tp.__getdescriptor__ except AttributeError: pass else: return getdesc(name) # Default algorithm used in the absence of an override mro = tp.mro() assert isinstance(mro, tuple) for base in mro: assert isinstance(base, type) try: return base.__dict__[name] except KeyError: pass return None If you did go this way, then we'd want to add a "types.getdescriptor(cls, name)" API to expose _PyType_Lookup at the Python layer, since "getattr(type(obj), name)" wouldn't be an accurate emulation of the algorithm any more. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Maybe, but how would this work with super()? Super walks the MRO of type of the instance, but skips the class on the MRO. This is not equivalent to walking the MRO of the second class on the MRO when you use multiple inheritance, This also has some other disadvantages. The first is that tp.__getdescriptor__ would replace the default behaviour for the entire MRO and it would be possible to have different behavior for classes on the MRO. The second, minor. one is that __getdescriptor__ would have to reimplement the default logic of walking the MRO, but that logic is fairly trivial. BTW. getattr(type(obj), name) is not an accurate emulation of _PyType_Lookup even now, thanks to metaclasses. In particular: ``` class A_meta (type): @property def description(self): return "meta description" @description.setter def description(self, v): raise RuntimeError class A (metaclass=A_meta): @property def description(self): return "description" @description.setter def description(self, v): raise RuntimeError a = A() print(A.description) # prints “meta description" print(a.description) # prints “description" print(getattr(type(a), 'description’)) # prints “meta description" ``` The setter definitions are necessary to ensure that the properties are data descriptors, which are handled differently than function descriptors by __getattribute__. Ronald

On 1 December 2017 at 19:15, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
I believe those can both be addressed by structuring the override a little differently, and putting it at the level of individual attribute retrieval on a specific class (rather than on all of its subclasses): def _PyType_Lookup(tp, name): mro = tp.mro() assert isinstance(mro, tuple) for base in mro: assert isinstance(base, type) try: getdesc = base.__dict__["__getdescriptor__"] except KeyError: try: return base.__dict__[name] except KeyError: pass else: try: return getdesc(tp, base, name) except AttributeError: pass return None In that version, the __getdescriptor__ signature would be: def __getdescriptor__(dynamic_cls, base_cls, attr): ... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

That’s basically what’s in the PEP, except that the PEP says that type will implement __getdescriptor__ to make cooperative subclassing easier. The current patch inlines type.__getdescriptor__ in the lookup code for efficiency reasons (that is, the runtime cost for not using this feature is basically a pointer test instead of a function call), but would work just as well without inlining. I’m pretty sure that the first concern isn’t really there, in the end attribute/descriptor resolution ends up in object and type whose implementation must be magic in some way, even without this PEP. The second question is more a design question: what’s the better design, having __getdescriptor__ as a class method on classes or as method on metaclasses? Either one would work, but a class method appears to be easier to use and with the introduction of __init_subclass__ there is a precedent for going for a class method. The current PEP claims that a method on a metaclass would be better to avoid subtle problems, but ignores the conceptual cost of adding a metaclass. The subtle problem is that a class can have two direct superclasses with a __getdescriptor__ when using multiple inheritance, but that can already be an issue for other methods and that currently includes __getattribute__ for most of not all usecases where __getdescriptor__ would be useful. Ronald

On 1 December 2017 at 21:04, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
The second question is more a design question: what’s the better design, having __getdescriptor__ as a class method on classes or as method on metaclasses? Either one would work, but a class method appears to be easier to use and with the introduction of __init_subclass__ there is a precedent for going for a class method.
The current PEP claims that a method on a metaclass would be better to avoid subtle problems, but ignores the conceptual cost of adding a metaclass. The subtle problem is that a class can have two direct superclasses with a __getdescriptor__ when using multiple inheritance, but that can already be an issue for other methods and that currently includes __getattribute__ for most of not all usecases where __getdescriptor__ would be useful.
I think it's having it being a method on the metaclass that creates the infinite regress Mark was worried about: since type's metaclass *is* type, if "__getdescriptor__" is looked up as a regular descriptor in its own right, then there's no base case to terminate the recursive lookup. By contrast, defining it as a class method opens up two options: 1. Truly define it as a class method, and expect implementors to call super().__getdescriptor__() if their own lookup fails. I think this will be problematic and a good way to get the kinds of subtle problems that prompted you to initially opt for the metaclass method. 2. Define it as a class method, but have the convention be for the *caller* to worry about walking the MRO, and hence advise class implementors to *never* call super() from __getdescriptor__ implementations (since doing so would nest MRO walks, and hence inevitably have weird outcomes). Emphasise this convention by passing the current base class from the MRO as the second argument to the method. The reason I'm liking option 2 is that it leaves the existing __getattribute__ implementations fully in charge of the MRO walk, and *only* offers a way to override the "base.__dict__[name]" part with a call to "base.__dict__['__getdescriptor__'](cls, base, name)" instead. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

But type.__getattribute__ is already special, it cannot be the same as its superclass implementation (because that’s object), and object.__getattribute__ logically uses type.__getattribute__ to get at type.__dict__. Adding __getdescriptor__ in the mix makes type.__getattribute__ a bit messier, but not by much.
The only subtle problem is having a class using multiple inheritance that uses two __getdescriptor__ implementations from two superclasses, where both do something beyond looking up the name in __dict__. Failing to call super().__getdescriptor__() is similar to failing to do so for other methods.
But that’s how I already define the method, that is the PEP proposes to change the MRO walking loop to: for cls in mro_list: try: return cls.__getdescriptor__(name) # was cls.__dict__[name] except AttributeError: # was KeyError pass Note that classes on the MRO control how to try to fetch the name at that level. The code is the same for __getdescriptor__ as a classmethod and as a method on the metaclass. I don’t think there’s a good technical reason to pick either option, other than that the metaclass option forces an exception when creating a class that inherits (using multiple inheritance) from two classes that have a custom __getdescriptor__. I’m not convinced that this is a good enough reason to go for the metaclass option.
Right. That’s why I propose __getdescriptor__ in the first place. This allows Python coders to do extra work (or different work) to fetch an attribute of a specific class in the MRO and works both with regular attribute lookup as well as lookup through super(). The alternative I came up with before writing the PEP is to add a special API that can be used by super(), but that leads to more code duplication as coders would have to implement both __getattribute__ and this other method. I guess another options would be a method that does the work including walking the MRO, but that leads to more boilerplate for users of the API. BTW. I haven’t had a lot of time to work on the implementation. The code in typeobject.c has changed enough that this needs more work than tweaking the patch until it applies cleanly. Ronald

On 2 December 2017 at 01:12, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
That's not exactly the same as what I'm suggesting, and it's the part that has Mark concerned about an infinite regression due to the "cls.__getdescriptor__" subexpression. What I'm suggesting: try: getdesc = base.__dict__["__getdescriptor__"] except KeyError: # Use existing logic else: try: getdesc(cls, base, name) except AttributeError: pass * Neither type nor object implement __getdescriptor__ * Calling super() in __getdescriptor__ would be actively discouraged without a base class to define the cooperation rules * If it's missing in the base class dict, fall back to checking the base class dict directly for the requested attribute * cls is injected into the call by the MRO walking code *not* the normal bound method machinery * Only "base.__dict__" needs to be assured of getting a hit on every base class What's currently in the PEP doesn't clearly define how it thinks "cls.__getdescriptor__" should work without getting itself into an infinite loop.
I don’t think there’s a good technical reason to pick either option, other than that the metaclass option forces an exception when creating a class that inherits (using multiple inheritance) from two classes that have a custom __getdescriptor__. I’m not convinced that this is a good enough reason to go for the metaclass option.
I'm still not clear on how you're planning to break the recursive loop for "cls.__getdescriptor__" when using a metaclass.
Yeah, I'm definitely in agreement with the intent of the PEP. I'm just thinking that we should aim to keep the expected semantics of the hook as close as we can to the semantics of the code it's replacing, rather than risking the introduction of potentially nested MRO walks (we can't outright *prevent* the latter, but we *can* quite clearly say "That's almost certainly a bad idea, avoid it if you possibly can"). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I honestly don’t understand why that’s better than what is in the PEP, other than the way to locate __getdescriptor__. In particular: why change the signature of __getdescriptor__ from __getdescriptor__(base, name) to __getdescriptor__(base, cls, name)? And to keep things clear, what are “cls” and “base”? Based on your initial proposal I’m assuming that “cls” is the type of the object whose attribute we’re looking up, and “base” is the current class on the MRO that we’re looking at, that is: def _PyType_Lookup(cls, name): mro = cls.mro() for base in mro: … Getting back to the way __getdescriptor__ is accessed: Locating this using base.__dict__[‘__getdescriptor__’] instead of base.__getdescriptor__ makes it clear that this method is not accessed using normal attribute lookup, and matches the (currently non-functional) patch that access slots on the type object from C code. The current implementation access the slot on the meta type, that is type(base).__dict__[‘__getdescriptor__’], but that doesn’t fundamentally change the mechanics. My proposal would then end up as: for base in mro: try: getdescr = base.__dict__[“__getdescriptor__”] except KeyError: try: return base.__dict__[name] except KeyError: pass else: getdesc(base, name) This is for the classmethod version of the PEP that I’m currently preferring. The way the __getdescriptor__ implementation is located also might explain the confusion: I’m reasoning based on the implementation and that doesn’t match the PEP text in this regard. I didn’t get how this affected comprehension of the proposal, but think I’m getting there :-)
* Neither type nor object implement __getdescriptor__
I’m not convinced that this is strictly necessary, but also am not opposed to this. Another reason for not implementing __getdescriptor__ on type and object is that this means that the existence of the methods cannot confuse users (as this is a pretty esoteric API that most users will never have to touch). BTW. My patch more or less inlines the default __getdescriptor__, but that’s for performance reasons. Not implementing __getdescriptor__ on object and type would avoid possible confusion about this, and would make it clearer that the attribute lookup cache is still valid (and removing that cache would definitely be a bad idea)
As mentioned above I don’t understand why the change in interface is needed, in particular why __getdescriptor__ needs to have access to the original class and not just the current class on the MRO.
See above. The PEP doesn’t explain that “cls.__getdescriptor__” isn’t normal attribute access but uses slot access.
Does my updated proposal (base.__dict__[‘__getdescriptor__’]) adres this issue? My intend with the PEP was indeed to stay as close as possible to the current behaviour and just replace peeking into base.__dict__ by calling an overridable special method on base. The alternative is to make it possible to use something that isn’t builtin.dict as the __dict__ for types, but that requires significant changes to CPython because the C code currently assumes that __dict__ is an instance of exactly builtin.dict and changing that is likely significantly more work than just replacing PyDict_GetItem by PyMapping_GetItem (different failure modes, different refcount rules, different re-entrancy, …) BTW. Thanks for feedback, I’m finally starting to understand how the PEP isn’t clear enough and how to fix that. Ronald

On 3 December 2017 at 21:36, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
It's the specifically the way we locate __getdescriptor__ that I'm interested in :)
In particular: why change the signature of __getdescriptor__ from __getdescriptor__(base, name) to __getdescriptor__(base, cls, name)?
So that the __getdescriptor__ implementation in the base has access to the class currently being accessed. That way you can have a common algorithm in the base class, but then tune that algorithm based on class level attributes on the class being defined. My inspiration for the suggestion is a combination of the signature of https://docs.python.org/3/reference/datamodel.html#object.__get__ (where you always get passed the class that holds the property, but are only sometimes passed the instance being accessed), and the signature of super() itself (which you have to pass both the dynamic class, and the current position in the MRO).
Yep (I wrote out the full version a few posts further up the thread, but only quoted the inner snippet here).
Yep, I was going off the description in the PEP, which just shows a normal "base.__getdescriptor__" access. If the implementation already doesn't do that and instead works as you suggest above, then that's a good thing, and we're actually already in agreement about how it should work - the PEP just needs to be updated to say that :)
Right, I think the null case here is to *not* include them, and then see if that causes any insurmountable problems. I don't expect it will, which would leave us with the simpler option of omitting them.
I think we could likely live without it, but it's easy to pass in, and adding it later would be difficult. If you use the signature order "(cls, base, name)", it also means that __getdescriptor__ implementations will behave more like regular class methods, even though they're not actually looked up that way.
Aye, it's sounding like it really is just the "cls.__getdescriptor__" short hand in the PEP that was confusing me, and the way you're intending to implement it matches the way I'm suggesting it should work.
The alternative is to make it possible to use something that isn’t builtin.dict as the __dict__ for types, but that requires significant changes to CPython because the C code currently assumes that __dict__ is an instance of exactly builtin.dict and changing that is likely significantly more work than just replacing PyDict_GetItem by PyMapping_GetItem (different failure modes, different refcount rules, different re-entrancy, …)
I agree with that - changing the execution namespace is one thing, but changing the actual storage underlying cls.__dict__ would be a far more difficult prospect. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (2)
-
Nick Coghlan
-
Ronald Oussoren