[Python-ideas] Overloading assignment concrete proposal (Re: Re: Operator as first class citizens -- like in scala -- or yet another new operator?)

19 Jun 2019

      The thread on operators as first-class citizens keeps getting vague ideas about assignment overloading that wouldn't actually work, or don't even make sense. I think it's worth writing down the simplest design that would actually work, so people can see why it's not a good idea (or explain why they think it would be anyway).

In pseudocode, just as x += y means this:
    xval = globals()['x']
    try:
        result = xval.__iadd__(y)
    except AttributeError:
        result = xval.__add__(y)
    globals()['x'] = result
… x = y would mean this:
    try:
        xval = globals()['x']
        result = xval.__iassign__(y)
    except (LookupError, AttributeError):
        result = y
    globals()['x'] = result
If you don't understand why this would work or why it wouldn't be a great idea (or want to nitpick details), read on; otherwise, you can skip the rest of this message.
---
First, why is there even a problem? Because Python doesn't even have "variables" in the same sense that languages like C++ that allow assignment overloading do.
In C++, a variable is an "lvalue", a location with identity and type, and an object is just a value that lives in a location. So assignment is an operation on variables: x = 2 is the same as XClass::operator=(&x, 2).
In Python, an object is a value that lives wherever it wants, with identity and type, and a variable is just a name that can be bound to a value in a namespace. So assignment is an operation on namespaces, not on variables: x = 2 is the same as dict.__setitem__(globals(), 'x', 2).
The same thing is true for more complicated assignments. For example, a.x = 2 is just an operation on a's namespace instead of the global namespace: type(a).__setattr__(a, 'x', 2). Likewise, a.b['x'] = 2 is type(a.b).__setitem__(a.b, 'x', 2), and so on.
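A small sketch of those equivalences, for anyone who wants to check them. The Box class and the explicit namespace dicts are made up for illustration; the point is only that each assignment statement can be replaced by a store into some namespace.

```python
class Box:
    """Hypothetical empty class, used only to get an instance namespace."""


# x = 2 is a store into the namespace the statement runs in.
ns_statement = {}
exec("x = 2", ns_statement)              # run the statement with this dict as globals
ns_explicit = {}
dict.__setitem__(ns_explicit, 'x', 2)    # the explicit equivalent

# a.x = 2 is a store into a's attribute namespace.
a = Box()
type(a).__setattr__(a, 'x', 2)

# a.b['x'] = 2 is a store into whatever object a.b currently is.
a.b = {}
type(a.b).__setitem__(a.b, 'x', 2)
```

In none of these cases is the old value of the target (if any) consulted.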
---

But Python allows overloading augmented assignment. How does that work? There's a perfectly normal namespace lookup at the start and namespace store at the end—but in between, the existing value of the target gets to specify the value being assigned.
Immutable types like int don't define __iadd__, and __add__ creates and returns a new object. So, x += y ends up the same as x = x + y.
But mutable types like list define an __iadd__ that mutates self in-place and then returns self, so x gets harmlessly rebound to the same object it was already bound to. So x += y ends up the same as x.extend(y); x = x.
The exact same technique would work for overloading normal assignment. The only difference is that x += y is illegal if x is unbound, while x = y obviously has to be legal (and mean there is no value to intercept the assignment). So, the fallback happens when xval doesn't define __iassign__, but also when x isn't bound at all.
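The augmented-assignment behavior described above is easy to check today. This is just a sketch; the variable names are arbitrary.

```python
# Immutable int: no __iadd__, so += falls back to __add__ and rebinds the name.
int_has_iadd = hasattr(int, '__iadd__')
x = 1000
before = x
x += 1
int_is_new_object = x is not before

# Mutable list: __iadd__ mutates in place and returns self.
list_has_iadd = hasattr(list, '__iadd__')
items = [1, 2]
alias = items
items += [3]
list_same_object = items is alias

# Augmented assignment to an unbound name fails at the initial lookup.
try:
    exec("unbound_name += 1", {})
    unbound_outcome = 'ok'
except NameError:
    unbound_outcome = 'NameError'
```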

So, for immutable types like int, for almost all mutable types like list, and whenever x is unbound, x = y does the same thing it always did.
But special types that want to act like transparent mutable handles define an __iassign__ that mutates self in place and returns self, so x gets harmlessly rebound to the same object. So x = y ends up the same as, say, x.set_target(y); x = x.
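Since __iassign__ doesn't exist, the proposed behavior can only be emulated. Here is a rough sketch that does so on an explicit namespace dict. The assign() helper, the Handle class, and set_target() are all hypothetical names. One deliberate difference from the pseudocode: the hook is looked up on the type, as real special methods are, and called outside the try, so an AttributeError raised inside the hook isn't silently swallowed.

```python
def assign(namespace, name, value):
    """Emulate the proposed `name = value` on an explicit namespace dict."""
    try:
        current = namespace[name]
        hook = type(current).__iassign__   # hypothetical special method
    except (LookupError, AttributeError):
        result = value                     # unbound name or no hook: plain binding
    else:
        result = hook(current, value)      # the existing value decides what gets stored
    namespace[name] = result


class Handle:
    """Hypothetical transparent handle that intercepts assignment."""

    def __init__(self, target):
        self.target = target

    def set_target(self, value):
        self.target = value

    def __iassign__(self, value):
        self.set_target(value)
        return self                        # name gets rebound to the same object


ns = {}
assign(ns, 'n', 1)         # unbound: plain binding
assign(ns, 'n', 2)         # int defines no __iassign__: plain rebinding
handle = Handle('old')
assign(ns, 'h', handle)    # unbound: binds the handle itself
assign(ns, 'h', 'new')     # intercepted: mutates the handle, binding unchanged
```

After the last call, ns['h'] is still the original handle, and only its target has changed.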
This all works the same if the variables are local rather than global, or for more complicated targets like attribute references or subscriptions, and even for target lists; the intercept still happens the same way, between the (more complicated) lookup and storage steps.
---
Now, why is this a bad idea?
First, the benefit of __iassign__ is a lot smaller than __iadd__. A sizable fraction of "x += y" statements are for mutable "x" values, but only a rare handful of "x = y" statements would be for special handle "x" values. Even the same cost for a much smaller benefit would be a much harder sell.
But the runtime performance cost difference is huge. If augmented assignment weren't overloadable, it would still have to look up the value, look up and call a special method on it, and store the value. The only cost overloading adds is trying two special methods instead of one, which is tiny. But regular assignment doesn't have to do a value lookup or a special method call at all, only a store; adding those steps would roughly double the cost of every new variable assignment, and even more for every reassignment. And assignments are very common in Python, even within inner loops, so we're talking about a huge slowdown to almost every program out there.
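One way to see the difference in work, at least in CPython, is to look at which names the compiled bytecode loads. This sketch uses the dis module; the names_loaded() helper is made up here, and since opcode names differ between CPython versions it only checks LOAD_NAME, which is what module-level code uses for name lookups.

```python
import dis


def names_loaded(source):
    """Return the names that module-level `source` looks up via LOAD_NAME."""
    code = compile(source, "<example>", "exec")
    return {ins.argval for ins in dis.get_instructions(code)
            if ins.opname == "LOAD_NAME"}


plain_loads = names_loaded("x = y")        # only y is looked up; x is just stored
augmented_loads = names_loaded("x += y")   # x must be looked up before the store
```

The proposed x = y would need the extra lookup of x, plus a special-method lookup on its value, every time.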

Also, the fact that assignment always means assignment makes Python code easier both for humans to skim, and for automated programs to process. Consider, for example, a static type checker like mypy. Today, x = 2 means that x must now be an int, always. But if x could be a Signal object with an overloaded __iassign__, then, x = 2 might mean that x must now be an int, or it might mean that x must now be whatever type(x).__iassign__ returns.
Finally, the complexity of __iassign__ is at least a little higher than __iadd__. Notice that in my pseudocode above, I cheated: obviously the xval = and result = lines are not supposed to recursively call the same pseudocode, but to directly store a value in a new temporary local variable. In the real implementation, there wouldn't even be such a temporary variable (in CPython, the values would just be pushed on the stack), but for documenting the behavior, teaching it to students, etc., that doesn't matter. Being precise here wouldn't be hugely difficult, but it is a little more difficult than with __iadd__, where there's no similar potential confusion even possible.

On Wednesday, June 19, 2019, 10:54:04 AM PDT, Andrew Barnert via Python-ideas wrote:

 On Jun 18, 2019, at 12:43, nate lust  wrote:

I have been following this discussion for a long time, and coincidentally I recently started working on a project that could make use of assignment overloading. (As an aside, it is a configuration system for an astronomical data analysis pipeline that makes heavy use of descriptors to work around historical decisions and backward compatibility.) Our system makes use of nested chains of objects and descriptors and proxy objects to manage where state is actually stored. The whole system could collapse down nicely if there were assignment overloading. However, this works OK most of the time, but sometimes at the end of the chain things can become quite complicated. I was new to this code base and tasked with making some additions to it, and wished for an assignment operator, but knew the data binding model of Python was incompatible from p.
This got me thinking. I didn't actually need to overload assignment per se; data binding could stay just how it was, but if there was a magic method that worked similarly to how __get__ works for descriptors but would be called on any variable lookup (if the method was defined), it would allow for something akin to assignment.

What counts as “variable lookup”? In particular:

For example:
    class Foo:
        def __init__(self):
            self.value = 6
            self.myself = weakref.ref(self)

        def important_work(self):
            print(self.value)

… why doesn’t every one of those “self” lookups call self.__get_self__()? It’s a local variable being looked up by name, just like your “foo” below, and it finds the same value, which has the same __get_self__ method on its type.
The only viable answer seems to be that it does. So, to avoid infinite circularity, your class needs to use the same kind of workaround used for attribute lookup in classes that define __getattribute__ and/or __setattr__:

    def important_work(self):
        print(object.__get_self__(self).value)

    def __get_self__(self):
        return object.__get_self__(self).myself

But even that won’t work here, because you still have to look up self to call the superclass method on it. I think it would require some new syntax, or at least something horrible involving locals(), to allow you to write the appropriate methods.

    def __get_self__(self):
        return self.myself

Besides recursively calling itself for that “self” lookup, why doesn’t this also call weakref.ref.__get_self__ for that “myself” lookup? It’s an attribute lookup rather than a local namespace lookup, but surely you need that to work too, or as soon as you store a Foo instance in another object it stops overloading.
For this case there’s at least an obvious answer: because weakref.ref doesn’t override that method, the variable lookup doesn’t get intercepted. But notice that this means every single value access in Python now has to do an extra special-method lookup that almost always does nothing, which is going to be very expensive.

    def __setattr__(self, name, value):
        self.value = value

You can’t write __setattr__ methods this way. That assignment statement just calls self.__setattr__('value', value), which will endlessly recurse. That’s why you need something like the object method call to break the circularity.
Also, this will take over the attribute assignments in your __init__ method. And, because it ignores the name and always sets the value attribute, it means that self.myself = is just going to override value rather than setting myself.
To solve both of these problems, you want a standard __setattr__ body here:

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)

But that immediately makes it obvious that your __setattr__ isn’t actually doing anything, and could just be left out entirely.
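Both behaviors can be checked with a quick sketch. The Naive and Standard class names are made up, and a plain string stands in for the weakref so the example stays minimal.

```python
class Naive:
    """Hypothetical class with the problematic __setattr__ body."""

    def __setattr__(self, name, value):
        self.value = value          # re-enters __setattr__ on every call


class Standard:
    def __init__(self):
        self.value = 6
        self.myself = 'placeholder'

    def __setattr__(self, name, value):
        # Delegating to object breaks the circularity and respects `name`.
        object.__setattr__(self, name, value)


try:
    Naive().anything = 1
    naive_outcome = 'ok'
except RecursionError:
    naive_outcome = 'RecursionError'

s = Standard()
standard_attrs = dict(vars(s))
```

The Standard instance ends up exactly as it would with no __setattr__ defined at all.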

foo = Foo() # Create an instance
foo # The interpreter would return foo.myself
foo.value # The interpreter would return foo.myself.value

foo = 19 # The interpreter would run foo.myself = 19 which would invoke foo.__setattr__('myself', 19)

For this last one, why would it do that? There’s no lookup here at all, only an assignment.
The only way to make this work would be for the interpreter to look up the current value of the target on every assignment before assigning to it, so that lookup could be overloaded. If that were doable, then assignment would already be overloadable, and this whole discussion wouldn’t exist.
But, even if you did add that, __get_self__ is just returning the value self.myself, not some kind of reference to it. How can the interpreter figure out that the weakref.ref value it got came from looking up the name “myself” on the Foo instance? (This is the same reason __getattr__ can’t help you override attribute setting, and a separate method __setattr__ is needed.) To make this work, you’d need a __set_self__ to go along with __get_self__. Otherwise, your changes not only don’t provide a way to do assignment overloading, they’d break assignment overloading if it existed.
Also, all of the extra stuff you’re trying to add on top of assignment overloading can already be done today. You just want a transparent proxy: a class whose instances act like a reference to some other object, and delegate all methods (and maybe attribute lookups and assignments) to it. This is already pretty easy; you can define __getattr__ (and __setattr__) to do it dynamically, or you can do some clever stuff to create static delegating methods (and properties) explicitly at object-creation or class-creation time. Then foo.value returns foo.myself.value, foo.important_work() calls the Foo method but foo.__str__() calls foo.myself.__str__(); you can even make it pass isinstance checks if you want. The only thing it can’t do is overload assignment.
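A minimal sketch of the dynamic version, assuming a single wrapped object. Proxy, Config, and _target are hypothetical names, and a real proxy would need to delegate more special methods than just __str__.

```python
class Proxy:
    """Hypothetical transparent proxy that delegates to a target object."""

    def __init__(self, target):
        # Bypass our own __setattr__ while storing the target.
        object.__setattr__(self, '_target', target)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _target itself is found normally.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        setattr(self._target, name, value)

    def __str__(self):
        # Implicit special-method lookup skips __getattr__, so delegate explicitly.
        return str(self._target)


class Config:
    def __init__(self):
        self.value = 6

    def __str__(self):
        return f"Config(value={self.value})"


real = Config()
foo = Proxy(real)
foo.value = 19                                  # delegated attribute assignment
delegated = (real.value, foo.value, str(foo))
foo = 19                                        # plain rebinding: the proxy is never consulted
```

After the last line, foo is just the int 19 and real is untouched, which is exactly the one thing the proxy can't intercept.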
I think the real problem here is that you’re thinking about references to variables rather than values, and overloading operators on variables rather than values, and neither of those makes sense in Python. Looking up, or assigning to, a local variable named “foo” is not an operation on “the foo variable”, because there is no such thing; it’s an operation on the locals namespace.