
On Mar 12, 2020, at 08:08, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
TL;DR: should we make `del x` an expression that returns the value of `x`.
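In other words, the hypothetical `y = (del x)` would collapse what today takes two statements; the parenthesized form in the comment below is the proposed syntax, not valid Python today:

```python
# Proposed (not valid Python today):
#     y = (del x)
# The closest spelling today:
x = 42
y = x      # keep the value alive under a new name...
del x      # ...then unbind the old one
```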
I agree that a “move” would have to be a keyword rather than a function, that adding a new keyword is too high a bar, and that “del” is the best option among the existing keywords, since an expression statement containing a del expression would be moving to an ignored temporary, which would have the same semantics as the del statement.

The first problem is that it doesn't actually have the same semantics, even visibly to the user. For example, in interactive mode, evaluating an expression statement causes the result of the expression to be bound to the `_` variable. So `del x` at the REPL will no longer release the value; you'll have to `del x` and then rebind `_` (by, say, evaluating `None`). Also, you need to think through what happens with a del expression in all kinds of contexts that currently can't contain a del: eval, lambdas, comprehensions (can I del something from the outer scope?), and so on; just defining what it does in the specific case of deleting the loop variable isn't sufficient. I think this could all be worked out, but I'm not sure it's needed.

Is there a realistic use case where this optimization gives you a significant benefit? I tried running timeit on your loop with and without the del, and the difference is virtually undetectable. (A harness that reproduces the comparison is sketched at the end of this reply.) If I remove the print so the loop does nothing at all, it's a bit over 2% faster with the del. But if I then use a different iterator, one that doesn't optimize by reusing the same value, it's about 4% slower with the del (presumably the cost of executing the del itself?).

So it seems this del expression is something that would be used rarely, when you need to micro-optimize a loop and 2% one way or the other matters. Meanwhile, wouldn't it slow down all existing code that uses del? Instead of a del statement that just removes a binding, there would be a del expression that removes a binding and pushes a value onto the stack, which, when used as an expression statement, is then popped off and ignored. That means the value, instead of being decref'd once, has to be incref'd, decref'd, and then decref'd again later; even if you can optimize away the extra ref twiddle in the middle, being released later (at least one more pass through the eval loop) is going to slow down some code. And that's before you even look at cases like interactive mode.

Also, experience with other languages makes me worry that retrofitting a move operation onto an existing language for optimization purposes rarely goes well. Since you mentioned std::move, I assume you're familiar with C++, so you must have spent time debugging code where you thought a value was being moved but it was actually being copied, or where some template instantiation won't compile because it's trying to do an A(A&) followed by a B(A&&) instead of a B(A&) but can't tell you what the problem is, or where changing a spam(string) to spam(const string&) to speed things up on one implementation (by turning two copies into one) ended up slowing things down far worse on another (by turning zero copies into one), and so on. Compare Rust, where moving values around is easy to write, read, and debug because the language and stdlib were designed around move rather than having it retrofitted. Obviously you're not proposing to turn Python into a mess like C++, but you are proposing to add complexity for something that can only ever be a micro-optimization, and we can only hope it won't be used pervasively.
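For reference, the comparison described above can be reproduced with a harness along these lines; the iterable and repeat count here are my own choices, not necessarily the ones behind the numbers quoted above:

```python
import itertools
import timeit

def loop_with_del():
    for v in itertools.combinations(range(10), 3):
        del v  # drop the last outside reference, so combinations() can reuse its result tuple

def loop_without_del():
    for v in itertools.combinations(range(10), 3):
        pass

# With the del, combinations() recycles one result tuple; without it,
# a fresh tuple is (logically) allocated on every iteration.
print("with del:   ", timeit.timeit(loop_with_del, number=20_000))
print("without del:", timeit.timeit(loop_without_del, number=20_000))
```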
## Motivation
I noticed yesterday that `itertools.combinations` has an optimization for when the previously returned tuple has no references left, in which case it reuses that tuple. Namely, the following code:
```python
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(v))
...     del v  # without this, the optimization can't take place
...
2500926199840
2500926199840
2500926199840
```
will print the same id three times. However, when used in a list comprehension, the optimization can't kick in, and I have no way of using the `del` keyword:
```python
>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]
```
It looks like it's still reusing the tuple every _other_ time. Presumably this is just the object allocator stepping in, rather than anything tricky that itertools does? But if that's good enough to make it only 2% slower in your use case, and probably even closer in realistic ones, maybe that implies that if anything is needed here at all, we should look for an automatic mechanism that applies everywhere rather than a user-driven one that applies narrowly.

What if a for loop, instead of nexting the iterator and binding the result to the loop variable, instead unbound the loop variable, nexted the iterator, and then bound the result? Presumably this would slow down all loops a very tiny bit (by the cost of checking whether a name is bound) but speed up some, and it would speed up this one even more than your explicit del could (because it doesn't need a whole extra opcode in the loop body, if nothing else). Maybe a clever optimization in the compiler and/or interpreter could figure out when it is and isn't worth doing?
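To make the suggested change concrete, here is a rough sketch of the two behaviours written as ordinary while loops; the NameError handling stands in for the "is the name bound?" check, and the mechanics are an approximation rather than real interpreter behaviour:

```python
import itertools

# Roughly how "for v in it: print(id(v))" behaves today: the old value
# of v is released only when v is rebound, *after* next() has run.
it = itertools.combinations([1, 2, 3], 1)
while True:
    try:
        v = next(it)
    except StopIteration:
        break
    print(id(v))      # ids may alternate, as in the comprehension above

# Roughly the suggested behaviour: unbind v *before* calling next(), so
# the iterator's previous result has no outside references left to block reuse.
it = itertools.combinations([1, 2, 3], 1)
while True:
    try:
        del v         # the per-iteration cost: clearing a binding...
    except NameError:
        pass          # ...which may not exist on the first pass
    try:
        v = next(it)
    except StopIteration:
        break
    print(id(v))      # CPython: the same id every time
```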
## Optional extension
For consistency, `x = (del foo.attr)` and `x = (del foo[i])` could also become legal expressions, and `__delete__`, `__delattr__`, and `__delitem__` would now have return values. Existing types would be free to continue to return `None`.
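As a purely hypothetical illustration of what a meaningful return value could look like (the class below is my invention, not part of the proposal): a mapping whose `__delitem__` returns the removed value, much as `dict.pop` does today:

```python
class ReturningDict(dict):
    """Hypothetical: __delitem__ returns the value it removes."""

    def __delitem__(self, key):
        value = self[key]        # raises KeyError if absent, just like del
        super().__delitem__(key)
        return value             # today this return value is silently ignored

d = ReturningDict(a=1, b=2)
del d["a"]                       # legal today, but the value is discarded
removed = d.__delitem__("b")     # an explicit call can already see it;
print(removed)                   # under the proposal, x = (del d["b"])
                                 # would be the natural spelling -> prints 2
```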
Returning None seems like it would be confusing. Your code either expects del to return the value, or it doesn't; you can't write code that expects del to sometimes return the value and sometimes not. But not allowing it at all seems even more confusing: the Python grammar doesn't have a bunch of different rules for "target" in different contexts, and for the few special cases it does have, people always want to un-special them the first time they run into them.