
On Mar 12, 2020, at 17:47, Marco Sulla <python-ideas@marco.sulla.e4ward.com> wrote:
> Well, so all this discussion is only for freeing _one_ memory location earlier?
Basically, yes. And now I see why you were confused: not about Python, but about the proposal itself, because you expected there must be more to it than this and kept trying to figure out what that was. Apologies for missing that.

The OP's motivating use case at the start of the thread was to let itertools.combinations keep reusing the same tuple object instead of (almost always) alternating back and forth between two tuple objects, and that will only ever save a few hundred bytes of peak heap use. I don't think the OP was worried about saving those bytes so much as about saving the cost of calling the tuple C-API destructor and constructor over and over. But it's still definitely a micro-optimization that would only occasionally be needed.

In the numpy case that came up later, the object in question can be a lot bigger. People regularly run numpy calculations on arrays that take up 40% of their RAM, so releasing one array one step earlier can mean the difference between having 80% of your RAM in use at a time and 120%, which is obviously a big deal. But I don't think the proposal actually helps there in the same way it helps for the combinations tuple.
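To make the combinations point concrete, here's a quick way to watch it happen on CPython. The specific ids are an implementation detail (tuple free list, allocator reuse), so treat this as an illustrative sketch rather than guaranteed behavior:

    from itertools import combinations

    # In a plain for loop the loop variable still references the previous
    # result tuple when the next one is requested, so combinations has to
    # allocate a fresh tuple; the previous one is freed a moment later, and
    # its memory block is typically handed right back on the following
    # iteration.  On CPython you will usually see the ids ping-pong between
    # two addresses (an implementation detail, not a guarantee).
    for combo in combinations(range(4), 2):
        print(id(combo), combo)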
> Seriously... as the other users already said, if someone really needs it, he can use normal loops instead of comprehensions and split complex expressions so temporary objects are immediately freed.
Well, it’s actually the opposite problem: temporary objects already get freed immediately; breaking an expression up into separate statements means you have to assign the intermediate values, which means they’re no longer temporaries and therefore no longer get freed immediately. A del expression would give you a more convenient way to regain that temporary-like behavior after refactoring into separate statements and giving the intermediates names. But I agree anyway: having to be more verbose when you need to write a specific micro-optimization doesn’t seem like a huge problem to me.
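As a small sketch of that trade-off (step1, step2, step3, and data are placeholder names I made up, not anything from the thread):

    # Placeholder pipeline stages, just so the sketch runs.
    def step1(x): return [v * 2 for v in x]
    def step2(x): return [v + 1 for v in x]
    def step3(x): return sum(x)

    data = range(1_000_000)

    # Chained in a single expression, each intermediate list is dropped as
    # soon as the next call has consumed it, since nothing else references it.
    result = step3(step2(step1(data)))

    # Split into statements for readability, the intermediates have names and
    # stay alive until they are rebound or the scope ends, so peak memory is
    # higher.
    a = step1(data)
    b = step2(a)
    result = step3(b)

    # Today's workaround is to add del statements between the steps, which is
    # exactly the extra verbosity being discussed here.
    a = step1(data)
    b = step2(a)
    del a
    result = step3(b)
    del b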
> Furthermore, the 2x speedup observations come from benchmarks. In the real world, how can you be sure that the L2 cache is not already filled up? :-)
I suspect the numpy benchmarks are applicable to a lot of real-life numpy code. But those benchmarks are for a patch to numpy that’s only vaguely similar to this proposal, and that patch neither requires the proposal nor could benefit from it, so…