[Python-ideas] Augmented assignment [was Re: Adding "+" and "+=" operators to dict]

Steven D'Aprano steve at pearwood.info
Sun Feb 15 06:40:15 CET 2015


On Sat, Feb 14, 2015 at 07:10:19PM -0800, Andrew Barnert wrote:
> On Feb 14, 2015, at 17:30, Steven D'Aprano <steve at pearwood.info> wrote:
[snip example of tuple augmented assignment which both succeeds and 
fails at the same time]

> >> I have argued that this never would have come up if augmented assignment
> >> were only used for in-place operations,
> > 
> > And it would never happen if augmented assignment *never* was used for 
> > in-place operations. If it always required an assignment, then if the 
> > assignment failed, the object in question would be unchanged.
> > 
> > Alas, there's no way to enforce the rule that __iadd__ doesn't modify 
> > objects in place, and it actually is a nice optimization when they can 
> > do so.
> 
> No, it's not just a nice optimization, it's an important part of the 
> semantics. The whole point of being able to share mutable objects is 
> being able to mutate them in a way that all the sharers see.

Sure. I didn't say that it was "just" (i.e. only) a nice optimization. 
Augmented assignment methods like __iadd__ are not only permitted but 
encouraged to perform changes in place.
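
For example, list.__iadd__ extends the list in place and returns
self, so every name bound to that list sees the change:

py> x = [1]
py> y = x
py> x += [2]
py> y
[1, 2]
py> x is y
True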

As you go on to explain, those semantics are a language design 
choice. Had the Python developers made different choices, Python 
would naturally be different. But given the way Python works, you 
cannot enforce any such "no side-effects" rule for mutable objects.


> If __iadd__ didn't modify objects in-place, you'd have this:
> 
>     py> a, b = [], []
>     py> c = [a, b]
>     py> c[1] += [1]
>     py> b
>     []

Correct. And that's what happens if you write c[1] = c[1] + [1]. If you 
wanted to modify the object in place, you could write c[1].extend([1]) 
or c[1].append(1).
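
To make the contrast concrete:

py> a, b = [], []
py> c = [a, b]
py> c[1].extend([1])    # in place: the shared object changes
py> b
[1]
py> c[1] = c[1] + [1]   # rebinding: b keeps its old value
py> b
[1]
py> c[1]
[1, 1]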

The original PEP for this feature:

http://legacy.python.org/dev/peps/pep-0203/

lists two rationales:

"simplicity of expression, and support for in-place operations". It 
doesn't go into much detail for the reason *why* in-place operations are 
desirable, but the one example given (numpy arrays) talks about avoiding 
needing to create a new object, possibly running out of memory, and then 
"possibly delete the original object, depending on reference count". To 
me, saving memory sounds like an optimization :-)

But of course you are correct that Python would be very different 
indeed if augmented assignment didn't allow mutation in place.


[...]
> > I wonder if we can make this work more clearly if augmented assignments 
> > checked whether the same object is returned and skipped the assignment 
> > in that case?
> 
> I already answered that earlier. There are plenty of objects that 
> necessarily rely on the assumption that item/attr assignment always 
> means __setitem__/__setattr__.
> 
> Consider any object with a cache that has to be invalidated when one 
> of its members or attributes is set. Or a sorted list that may have to 
> move an element if it's replaced. Or really almost any case where a 
> custom __setattr__, an @property or custom descriptor, or a 
> non-trivial __setitem__ is useful. All of these would break.

Are there many such objects where replacing a member with itself is a 
meaningful change?

E.g. would the average developer reasonably expect that, as a 
deliberate design feature, spam = spam is *guaranteed* not to be a 
no-op? I know that in Python today it may not be a no-op, if spam is 
an expression such as foo.bar or foo[bar], and I'm not suggesting 
that it would be reasonable to change the compiler semantics so that 
normal "=" binding skips the assignment when both sides refer to the 
same object.

But I wonder whether *augmented assignment* should do so. I don't 
suggest this lightly, but only to fix a wart in the language. See 
below.


> What you're essentially proposing is that augmented assignment is no 
> longer really assignment, so classes that want to manage assignment in 
> some way can't manage augmented assignment.

I am suggesting that perhaps we should rethink the idea that augmented 
assignment is *unconditionally* a form of assignment. We're doing this 
because the current behaviour breaks under certain circumstances. If 
it simply raised an exception, that would be okay; but the fact that 
the operation succeeds and yet still raises an exception is pretty 
bad.

In the case of mutable objects inside immutable ones, we know the 
augmented operation actually doesn't require there to be an assignment, 
because the mutation succeeds even though the assignment fails. 
Here's the example again, for anyone who is skimming the thread or 
has gotten lost:

t = ([1, 2], None)
t[0] += [1]
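
At the interactive prompt, the augmented assignment raises, and yet 
the mutation has already happened:

py> t = ([1, 2], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1, 2, 1], None)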


The PEP says:

    The __iadd__ hook should behave similar to __add__,
    returning the result of the operation (which could be `self')
    which is to be assigned to the variable `x'.

I'm suggesting that if __iadd__ returns self, the assignment be skipped. 
That would solve the tuple case above. You're saying it might break code 
that relies on some setter such as __setitem__ or __setattr__ being 
called. E.g.

myobj.spam = []  # spam is a property
myobj.spam += [1]  # myobj expects this to call spam's setter

But that's already broken, because the caller can trivially bypass the 
setter for any other in-place mutation:

myobj.spam.append(1)
myobj.spam[3:7] = [1, 2, 3, 4, 5]
del myobj.spam[2]
myobj.spam.sort()

etc. In other words, in the "cache invalidation" case (etc.), no real 
class that directly exposes a mutable object to the outside world can 
rely on a setter being called. It would have to wrap it in a proxy to 
intercept mutator methods, or live with the fact that it won't be 
notified of mutations.
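
Such a proxy might look something like this bare-bones sketch (the 
names are my own, not from any real library, and a real wrapper 
would need to cover every mutator method):

class NotifyingList:
    def __init__(self, data, on_mutate):
        self._data = list(data)
        self._on_mutate = on_mutate

    def append(self, item):
        self._data.append(item)
        self._on_mutate()

    def __setitem__(self, index, value):
        self._data[index] = value
        self._on_mutate()

    def __delitem__(self, index):
        del self._data[index]
        self._on_mutate()

    def __getitem__(self, index):
        return self._data[index]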

I used to think that Python had no choice but to perform an 
unconditional assignment, because it couldn't tell whether the operation 
was a mutation or not. But I think I was wrong. If the result of 
__iadd__ is self, then either the operation was a mutation, or the 
assignment is "effectively" a no-op. (That is, the result of the op 
hasn't changed anything.)
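
Spelled out as a rough sketch, the proposal looks something like 
this ('load' and 'store' stand in for whatever the target is: a 
name, an item or an attribute; the names are mine, and the real 
change would live in the interpreter, not in a Python function):

def augmented_add(load, store, operand):
    obj = load()
    if hasattr(type(obj), '__iadd__'):
        result = type(obj).__iadd__(obj, operand)
    else:
        result = obj + operand
    # The proposed change: only assign when __iadd__ hands back
    # a different object.
    if result is not obj:
        store(result)

t = ([1, 2], None)

def store_t0(value):
    t[0] = value        # would raise TypeError if ever called

augmented_add(lambda: t[0], store_t0, [1])
# No exception: list.__iadd__ returned the list itself, so the
# doomed tuple assignment was skipped; t is now ([1, 2, 1], None).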

I put "effectively" in scare quotes because setters will currently 
be called in this situation:

myobj.spam = "a"
myobj.spam += ""  # due to interning, "a" + "" may be the same object
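
In current CPython the concatenation really does return the very 
same object, although the language doesn't guarantee it:

py> s = "a"
py> (s + "") is s
True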

Currently that will call spam's setter, and the argument will be the 
very same object as spam's current value. It may be that the setter is 
rather naive, and it doesn't bother to check whether the new value is 
actually different from the old value before performing its cache 
invalidation (or whatever). So you are right that this change will 
affect some code.

That doesn't mean we can't fix this. It just means we have to go through 
a transition period, like for any other change to Python's semantics. 
During the transition, you may need to import from __future__, or there 
may be a warning, or both. After the transition, writing:

myobj.spam = myobj.spam

will still call the spam setter, always. But augmented assignment may 
not, if __iadd__ returns the same object. (Not just an object with 
the same value; it has to be the actual same object.)

I think that's a reasonable change to make, to remove this nasty gotcha 
from the language.

Anyone relying on a cache being invalidated when it is touched, even 
if the touch otherwise makes no difference, will just have to deal 
with a slight change in the definition of "touched": augmented 
assignment will no longer count. Instead of using 
cache.time_to_live += 0 to force an invalidation, use 
cache.time_to_live = cache.time_to_live. Or better still, provide an 
explicit cache.invalidate() method.


-- 
Steve

