
On Mar 12, 2020, at 11:23, Marco Sulla <mail.python.org@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 19:10, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
No, because the return value only lives until (effectively) the end of the statement. A statement has no value, so the effect of an expression statement is to immediately discard whatever the value of the expression was. (In CPython this is an explicit stack pop.)
Except for the case of interactive mode, of course, where an expression statement binds the value to the _ variable.
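[You can see that discard directly in the compiled bytecode; a minimal sketch (the exact opcode names vary across CPython versions, e.g. BINARY_ADD before 3.11 and BINARY_OP after):]

```python
import dis

# An expression statement compiles to the expression followed by a
# POP_TOP, which immediately discards the otherwise-unused result.
code = compile("a + b", "<demo>", "exec")
ops = [ins.opname for ins in dis.get_instructions(code)]
assert "POP_TOP" in ops  # the result of a + b is popped and discarded
```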
Okay, so it seems I was confused by the REPL feature.
Anyway, since the object is not discarded until the expression statement ends, it has no effect on speed.
Well, in many cases, I think it should be possible to tell from either the AST or the bytecode that a value that’s popped and discarded could have been popped earlier. Compilers for lots of other languages routinely do this kind of static analysis, even though it’s a lot more complicated in, say, C++ than in Python.

But I don’t think that would actually help this case. Each temporary is immediately consumed by the next opcode, not discarded, so the problem must be that the __add__ method (or rather the C API slot) can’t tell that one of its operands is a temporary and can therefore be reused if that’s helpful. And as I understand it (from a quick scan), the reason it can’t tell isn’t that the refcount isn’t 1 (which is something CPython could maybe fix if it were a problem, but it isn’t). Rather, the refcount is already 1, but a refcount of 1 doesn’t actually prove the value is a temporary, because there are PyNumber C API functions that may be using the value without incref’ing it that end up calling the numpy code. That is something I don’t think CPython could fix without removing those API functions.

Anyway, I think you’re right that del couldn’t help here. Even if it could, surely the solution to “a+b+c+d is slow” can’t be “just write (del (del a+b)+c)+d and it’ll be fast” if we want anyone to use Python without laughing and/or cursing.
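[A rough way to observe the refcount distinction from Python itself; a sketch only, since exact counts are CPython implementation details that vary by version, and sys.getrefcount adds a reference of its own:]

```python
import sys

counts = []

class Probe:
    """Record how many references CPython sees to the left operand of +."""
    def __add__(self, other):
        counts.append(sys.getrefcount(self))
        return Probe()

p = Probe()
p + Probe()        # self is the named value p
Probe() + Probe()  # self is an unnamed temporary
named, temp = counts
# The named operand shows at least one extra reference: its variable
# binding. The temporary is held only by the interpreter internals.
assert temp < named
```

This is the distinction the numpy patch exploits: an operand whose refcount is 1 at the C level cannot be reachable from any Python name, so its buffer can be reused (subject to the C API caveat above).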
On the contrary, from what I have read, the numpy patch removes the temporary ndarrays immediately. This speeds up calculations with large ndarrays. So there's no need to change the `del` behaviour. Python could implement something similar to the numpy patch for large immutables.
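[For readers following along, the effect of that patch can be illustrated by hand: when numpy sees that a large operand's refcount is 1, it reuses the operand's buffer instead of allocating a new temporary, which is morally equivalent to this explicit in-place rewrite (the array size here is arbitrary):]

```python
import numpy as np

a, b, c, d = (np.arange(100_000, dtype=np.float64) + k for k in range(4))

# Plain chained addition allocates a fresh array for each +:
chained = a + b + c + d

# The elision optimisation effectively reuses the first temporary's
# buffer, like this explicit in-place version:
tmp = a + b   # one temporary allocation
tmp += c      # reuse its buffer in place
tmp += d
assert np.array_equal(chained, tmp)  # same operations, same order
```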
But they’re not actually immutable values. They’re mutable values that we happen to know aren’t being mutated by any Python code, but that might still be mutated by arbitrary C API code farther up the stack. How could CPython detect that, except by the same kind of hack numpy uses: checking whether there are any C API calls on the stack and assuming that any of them might have mutated any values they could see?

I suppose it could track calls out to C functions as they happen and mark every value that was live before the call, and then instead of numpy checking refs==1 it could check refs==1 && !c_flagged, and then it wouldn’t need the C frame hackery. But that seems like it would slow down everything in exchange for occasionally helping numpy and a few other C extension libs. And I’m not sure it would work anyway. Values created by C extensions would have to start off flagged, but then values created and used internally by numpy won’t look temporary to numpy, right?