There is nothing more humbling than sending nonsense out to an entire mailing list. It will teach me to stop and process my stream of consciousness critically instead of just firing off a message. You were right that I was not at all considering how Python variables must of course work (both assignment and lookup). Your message was quite thorough; I hope it came from a sense of teaching and not frustration. If the latter, I am sorry to be the cause.
There are other supporting changes, but the core idea is just that the type struct (typeobject.c) now has one more field (I called it tp_setself) that under normal circumstances is just 0. Then in the insertdict function (dictobject.c), which does just what it sounds like, after looking up the old value and before setting anything new, I added a block that checks whether tp_setself is defined on the type of the old value. If it is, the user defined a __setself__ method on that type. That method is called with the value that was to be assigned, causing whatever side effects the user chose for it, and the rest of the insertdict body is never run.
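To make the control flow concrete, here is a rough pure-Python sketch of the same check. It is not the patch itself: SetSelfNamespace is a hypothetical dict subclass standing in for the namespace dict, its __setitem__ stands in for insertdict, and a plain getattr on the type stands in for the tp_setself slot. The identity test mirrors the old_value != value pointer comparison in the C code.

```python
class SetSelfNamespace(dict):
    """Hypothetical stand-in for a namespace dict using the patched insertdict."""

    def __setitem__(self, key, value):
        # Look up the old value first, as insertdict does.
        old_value = self.get(key)
        # Stand-in for checking tp_setself on the old value's type.
        hook = getattr(type(old_value), "__setself__", None)
        if hook is not None and old_value is not value:
            # Hand the new value to the old object and skip the normal store.
            hook(old_value, value)
            return
        # Normal path: no hook defined, so store as usual.
        super().__setitem__(key, value)


class Foo:
    def __init__(self, o):
        self.o = o

    def __setself__(self, v):
        self.v = v


ns = SetSelfNamespace()
ns["f"] = Foo(5)           # nothing stored under "f" yet, so a normal store
ns["f"] = "hello world"    # the old value defines __setself__, so it is called
print(ns["f"].v)           # hello world
print(type(ns["f"]).__name__)  # Foo
```

Note that an unpatched interpreter will not route ordinary assignments through a class like this, so the sketch only models the branch order (look up the old value, check for the hook, call it or fall through), not the real name-binding behaviour.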
This is by no means pull-request-level code (there is plenty of cleanup to do, tests to run, and checking of how this behaves when nested, etc.), but it is a proof of concept. For the sake of transparency, I will note that there is an issue when __repr__ is called in the interpreter. It causes insertdict to be called, which could lead to infinite recursion. This is the reason for the if old_value != value check in the diff, as it prevents the recursion. However, it would probably be better not to let the user build infinite cycles anyway.
This has a runtime penalty on every set that Python executes: looking up the type struct of the old value, a pointer offset into that struct (to read the tp_setself field), and a comparison to null, which is to say not much at all. Another side effect is that the type struct grows by the width of a pointer. As other type fields have been added for things like async, I am guessing this is not a large issue.
There is of course a runtime cost added by whatever is done in the __setself__ method, but that is something a user would expect since they are adding the method.
The question of how confusing it would be to read code with this behavior still remains, i.e. this might be very surprising when using a library that defines it. On this point I see the argument both ways and do not really have an opinion. In some ways it is surprising that you can overload other operators but not assignment (as people might be familiar with from other languages). People don't seem to expect that + always works the same way in any given library's code. However, Python's assignment behavior has been fixed for a long time, so a change now could come as a surprise.
Below is how this runs on the python I have built on my system with the above patch:
Python 3.7.4rc1+ (heads/3.7-dirty:7b2a913bd8, Jun 20 2019, 14:43:10)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo:
...     def __init__(self, o):
...         self.o = o
...     def __setself__(self, v):
...         self.v = v
...
>>> f = Foo(5)
>>> print(f)
<__main__.Foo object at 0x7f486bb8d300>
>>> print(f.o)
5
>>> print(f.v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'v'
>>> f = "hello world"
>>> print(f.v)
hello world
>>> print(f)
<__main__.Foo object at 0x7f486bb8d300>
>>>
>>> class Bar:
...     def __init__(self, o):
...         self.o = o
...
>>>
>>> b = Bar(5)
>>> print(b)
<__main__.Bar object at 0x7f486bb8d6f0>
>>> print(b.o)
5
>>> b = "hello world"
>>> print(b)
hello world
>>>