Make `del x` an expression evaluating to `x`
TL;DR: should we make `del x` an expression that returns the value of `x`.

## Motivation

I noticed yesterday that `itertools.combinations` has an optimization for when the returned tuple has no remaining ref-counts, and reuses it - namely, the following code:

>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(v))
...     del v  # without this, the optimization can't take place
2500926199840
2500926199840
2500926199840

will print the same id three times. However, when used as a list comprehension, the optimization can't step in, and I have no way of using the `del` keyword

>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]

`itertools.combinations` is not the only place to make this optimization - parts of numpy use it too, allowing

a = (b * c) + d

to elide the temporary `b*c`. This elision can't happen with the spelling

bc = b * c
a = bc + d

My suggestion would be to make `del x` an expression, with semantics "unbind the name `x`, and evaluate to its value". This would allow:

>>> [id(del v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926200992, 2500926200992]

and

bc = b * c
a = (del bc) + d  # in C++, this would be `std::move(bc) + d`

## Why a keyword

Unbinding a name is not something that a user would expect a function to do. Functions operate on values, not names, so `move(bc)` would be surprising.

## Why `del`

`del` already has "unbind this name" semantics, and allowing it to be used as an expression would break no existing code. `move x` might be more understandable, but adding new keywords is expensive.

## Optional extension

For consistency, `x = (del foo.attr)` and `x = (del foo[i])` could also become legal expressions, and `__delete__`, `__delattr__`, and `__delitem__` would now have return values. Existing types would be free to continue to return `None`.
I may be wrong, but as far as I know, `del variable_name` does not optimize the code, in the sense of performance. On the contrary, it should slow it down, since it simply removes the binding of the variable name from the namespace, allowing the gc to free the memory if that variable was the last reference to the object. In a for loop and a list comprehension, I suppose it's unneeded, since the variable points automatically to another memory location, so the reference is removed automatically. Furthermore, if you want to delete a temporary variable, you could do:

bc = b * c
a = bc + d
del bc

but if bc is not a very big object, I suspect it will simply slow down the program.

On Thu, 12 Mar 2020 at 16:06, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
TL;DR: should we make `del x` an expression that returns the value of `x`.
## Motivation
I noticed yesterday that `itertools.combinations` has an optimization for when the returned tuple has no remaining ref-counts, and reuses it - namely, the following code:
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(v))
...     del v  # without this, the optimization can't take place
2500926199840
2500926199840
2500926199840
will print the same id three times. However, when used as a list comprehension, the optimization can't step in, and I have no way of using the `del` keyword
>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]
`itertools.combinations` is not the only place to make this optimization - parts of numpy use it too, allowing
a = (b * c) + d
to elide the temporary `b*c`. This elision can't happen with the spelling
bc = b * c
a = bc + d
My suggestion would be to make `del x` an expression, with semantics "unbind the name `x`, and evaluate to its value". This would allow:
>>> [id(del v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926200992, 2500926200992]
and
bc = b * c
a = (del bc) + d  # in C++, this would be `std::move(bc) + d`
## Why a keyword
Unbinding a name is not something that a user would expect a function to do. Functions operate on values, not names, so `move(bc)` would be surprising.
## Why `del`
`del` already has "unbind this name" semantics, and allowing it to be used as an expression would break no existing code.
`move x` might be more understandable, but adding new keywords is expensive.
## Optional extension
For consistency, `x = (del foo.attr)` and `x = (del foo[i])` could also become legal expressions, and `__delete__`, `__delattr__`, and `__delitem__` would now have return values. Existing types would be free to continue to return `None`.
Marco Sulla wrote:
I may be wrong, but as far as I know, `del variable_name` does not optimize the code, in the sense of performance.
Correct, by itself `del variable_name` does not optimize the code - however, there exist functions implemented in C (which I gave examples of) with special cases for when `sys.getrefcount(x) == 1`, which means they are free to repurpose an existing object.
allowing the gc to free the memory if that variable was the last reference to the object.
The gc may not be relevant here - in CPython, objects are cleaned up immediately as part of reference counting. The GC only kicks in for circular object graphs.
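A minimal CPython-only sketch of that distinction (the `Node` class and its `__del__` prints are just for illustration):

```python
import gc

class Node:
    def __del__(self):
        print("freed", id(self))

# Non-cyclic: refcounting frees the object the moment the last reference goes.
n = Node()
del n            # "freed ..." is printed immediately, no collector involved

# Cyclic: refcounting alone can't reclaim it; the cycle collector must run.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b         # nothing printed yet - the cycle keeps both objects alive
gc.collect()     # now both "freed ..." lines appear
```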
In a for loop and a list comprehension, I suppose it's unneeded, since the variable points automatically to another memory location, so the reference is removed automatically.
The nth reference is not removed until the n+1th reference has been created. This is unavoidable in a for loop without breaking existing code, but I think could (and should?) be changed in a list comprehension
Furthermore, if you want to delete a temporary variable, you could do:
bc = b * c
a = bc + d
del bc
This doesn't hit the optimization I was talking about, because `ndarray.__add__(bc, d)` contains `if sys.getrefcount(self) == 1`, but here the local variable results in a second refcount.
On Mar 12, 2020, at 08:53, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
The nth reference is not removed until the n+1th reference has been created. This is unavoidable in a for loop without breaking existing code,
First, is it unavoidable? What could it break if the loop unbound the variable before nexting the iterator? Unless the iterator knew what variable was being used and explicitly looked it up in its __next__ (which I think you could only do with non-portable frame hackery), there would be no visible difference inside the iterator except for the one you’re explicitly looking for here. And there wouldn’t be any visible difference in the loop code.

In fact, you could probably test this with a bytecode-hacking decorator that just replaces any GET_ITER/FOR_ITER/STORE_FAST n with GET_ITER/DELETE_FAST n/FOR_ITER/STORE_FAST n, and use that to prove that it improves performance in your case, only slightly worsens it in other cases (a real implementation could modify the behavior of the bytecodes instead of inserting a new one, but this proof of concept can’t), and doesn’t break any code (write an import hook that applies this transformation to all code, and then run the test suite with the hook injected).
but I think could (and should?) be changed in a list comprehension
I think that would actually be confusing. Comprehension loops are exactly the same as for statements in this way, which means one thing to learn instead of two.

In fact, if I were designing a new language, the loop variable would be a new binding (a separate variable) each time through, because I think that’s what people actually expect. (Everyone runs into the problem of capturing loop variables in closures —not just in Python, but in radically different languages like C++. And everyone not only has a hard time understanding this the first time, they also keep running into it over and over even after they’ve learned it.) But it’s obviously too late to do that for Python. And the fact that Python explicitly leaves the binding after the loop makes the behavior that most languages have less confusing in Python than in most languages (but still somewhat confusing).

Unbinding at the start of the loop instead of the end doesn’t have that problem—there is no place that user code can see where anything is changed. And that works just as well in for statements as in comprehensions, so there’s no difference to learn, understand, and remember.
On Thu, 12 Mar 2020 at 15:06, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
will print the same id three times. However, when used as a list comprehension, the optimization can't step in, and I have no way of using the `del` keyword
>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]
You can get the same effect by not naming the value being thrown away:
>>> list(map(id, itertools.combinations([1,2,3], 1)))
[2267883926192, 2267883926192, 2267883926192]
`itertools.combinations` is not the only place to make this optimization - parts of numpy use it too, allowing
a = (b * c) + d
to elide the temporary `b*c`. This elision can't happen with the spelling
bc = b * c
a = bc + d
My suggestion would be to make `del x` an expression, with semantics "unbind the name `x`, and evaluate to its value". This would allow:
>>> [id(del v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926200992, 2500926200992]
and
bc = b * c
a = (del bc) + d  # in C++, this would be `std::move(bc) + d`
But why would you want this anyway? Do you have an example of an application where this sort of micro-optimisation is significant enough to justify a fairly major language change? (And yes, I know the optimisation in itertools could just as easily be dismissed as "why is this needed?", but that didn't need a language change to implement...) Paul
You can get the same effect by not naming the value being thrown away
This is absolutely true, although I don't think that's a particularly strong argument against it. The same can be said of `std::move` in C++. The purpose of this suggestion is primarily to allow introducing names without it coming at the cost of unwanted reference counts. Consider the following expression, with no intermediate names:

h = get_h_long_name(get_g_long_name(get_f_long_name(x)))

With my proposal, this would become exactly equivalent to the following in terms of reference counts:

f = get_f_long_name(del x)
g = get_g_long_name(del f)
h = get_h_long_name(del g)
Do you have an example of an application where this sort of micro-optimisation is significant enough to justify a fairly major language change?
As an example of these optimizations being valuable, see https://github.com/numpy/numpy/pull/7997, which claims the optimization I described at the beginning resulted in a 1.5x-2x speedup.
but that didn't need a language change to implement...
The language change I'm proposing isn't about _implementing_ these optimizations, it's about allowing users to exploit them without having to sacrifice naming their variables. Eric
On Thu, 12 Mar 2020 at 16:42, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Do you have an example of an application where this sort of micro-optimisation is significant enough to justify a fairly major language change?
As an example of these optimizations being valuable, see https://github.com/numpy/numpy/pull/7997, which claims the optimization I described at the beginning resulted in a 1.5x-2x speedup.
I was amused by the first comment in that thread: "First, let me say that this is terrifying and this comment should not be taken as condoning it."
but that didn't need a language change to implement...
The language change I'm proposing isn't about _implementing_ these optimizations, it's about allowing users to exploit them without having to sacrifice naming their variables.
Fair point. But I still question the cost vs benefit of this. After all, this is a *language* change you're proposing - not just limited to CPython with its particular implementation of GC. Consider

x = some_complex_object()
y = (del x)

What is to stop a Python implementation (not CPython, maybe, but some other one) from discarding the object in x as soon as it's unreferenced by the (del x) expression? Then y wouldn't be able to hold that value. After all, in theory, after the first line our object has a refcount of 1, then the (del x) changes that refcount to 0, then y = (the value) changes it back to 1. But it could be discarded in that period when it has refcount 0.

The above paragraph talked about refcounts, and about precisely when objects are discarded. None of which is part of the language definition. Can you describe your intended semantics for (del x) *without* using implementation-specific terminology? Because that would be important if this were to become a language feature.

Just to be clear - I see what you want to achieve here, and I understand why it might be important to you. But the bar for a *language* feature is by necessity higher than that.

Paul
On Fri, Mar 13, 2020 at 4:04 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 12 Mar 2020 at 16:42, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Do you have an example of an application where this sort of micro-optimisation is significant enough to justify a fairly major language change?
As an example of these optimizations being valuable, see https://github.com/numpy/numpy/pull/7997, which claims the optimization I described at the beginning resulted in a 1.5x-2x speedup.
I was amused by the first comment in that thread: "First, let me say that this is terrifying and this comment should not be taken as condoning it."
but that didn't need a language change to implement...
The language change I'm proposing isn't about _implementing_ these optimizations, it's about allowing users to exploit them without having to sacrifice naming their variables.
Fair point. But I still question the cost vs benefit of this. After all, this is a *language* change you're proposing - not just limited to CPython with its particular implementation of GC. Consider
x = some_complex_object()
y = (del x)
What is to stop a Python implementation (not CPython, maybe, but some other one) from discarding the object in x as soon as it's unreferenced by the (del x) expression? Then y wouldn't be able to hold that value. After all, in theory, after the first line our object has a refcount of 1, then the (del x) changes that refcount to 0, then y = (the value) changes it back to 1. But it could be discarded in that period when it has refcount 0.
No, it wouldn't - the use of the value as a return value counts as a reference. It's exactly the same as any other function that returns a brand-new value. ChrisA
On Thu, 12 Mar 2020 at 18:19, Chris Angelico <rosuav@gmail.com> wrote:
No, it wouldn't - the use of the value as a return value counts as a reference. It's exactly the same as any other function that returns a brand-new value.
So the memory of that object will never be freed... since there's a reference that can't be deleted until the current scope is finished. This in practice will break `del variable`
On Fri, Mar 13, 2020 at 4:39 AM Marco Sulla <mail.python.org@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 18:19, Chris Angelico <rosuav@gmail.com> wrote:
No, it wouldn't - the use of the value as a return value counts as a reference. It's exactly the same as any other function that returns a brand-new value.
So the memory of that object will never be freed... since there's a reference that can't be deleted until the current scope is finished. This in practice will break `del variable`
I suspect you're misunderstanding the way references are counted somewhere, but I'm not sure where. An object being used in an expression is referenced by being "currently in use" and won't be disposed of (in CPython, it's on the internal stack of objects that's used by the evaluator), but you don't have to "delete" this reference in any way. The action of deleting a *name* is not the same as disposing of an *object*. You can consider "del x" to be very similar to "x = None", except that instead of rebinding to some other object, it unbinds x altogether. Whatever behaviour you would expect of "x = None" in terms of the previous value for x, the same will be true of "del x". ChrisA
On Thu, 12 Mar 2020 at 19:05, Chris Angelico <rosuav@gmail.com> wrote:
The action of deleting a *name* is not the same as disposing of an *object*. You can consider "del x" to be very similar to "x = None", except that instead of rebinding to some other object, it unbinds x altogether.
Exactly. But if `x = None` returned the previous value of x, the refcount of the object would not be decreased, and Python would not free the object's memory. An example:

(venv) marco@buzz:~/sources$ python
Python 3.8.2 (tags/v3.8.2-dirty:7b3ab5921f, Mar 1 2020, 16:16:55) [GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sys import getrefcount
>>> a = "nfnjgjsbsjbjbsjlbslj"
>>> getrefcount(a)
2
>>> a
'nfnjgjsbsjbjbsjlbslj'
>>> getrefcount(a)
3
On Fri, Mar 13, 2020 at 5:16 AM Marco Sulla <python-ideas@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 19:05, Chris Angelico <rosuav@gmail.com> wrote:
The action of deleting a *name* is not the same as disposing of an *object*. You can consider "del x" to be very similar to "x = None", except that instead of rebinding to some other object, it unbinds x altogether.
Exactly. But if `x = None` returned the previous value of x, the refcount of the object would not be decreased, and Python would not free the object's memory. An example:
(venv) marco@buzz:~/sources$ python
Python 3.8.2 (tags/v3.8.2-dirty:7b3ab5921f, Mar 1 2020, 16:16:55) [GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sys import getrefcount
>>> a = "nfnjgjsbsjbjbsjlbslj"
>>> getrefcount(a)
2
>>> a
'nfnjgjsbsjbjbsjlbslj'
>>> getrefcount(a)
3
I don't understand your point. What you're showing here is that the name binding 'a' has a reference to that string, and then getrefcount itself has one (because the value is "in use" by being a parameter), and then you add another one using the REPL's "_" binding, at which point there are three references. What has this to do with the behaviour if something returned a value?

The broad idea of "del x" returning a value isn't inherently ridiculous. If you consider name bindings to be like a dictionary (and, in fact, it will often be implemented with one), then "del x" is very similar to "namespace.pop('x')", which returns the value it just removed. The semantics, in terms of whether an object can be garbage collected, would be identical. When the interpreter dels a name, it simply removes one reference. ChrisA
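To make the dict analogy concrete, here is a tiny illustrative sketch (a plain dict standing in for a module-level namespace; function locals are optimized and don't behave like this):

```python
ns = {"x": [1, 2, 3]}        # pretend this dict is a namespace

value = ns.pop("x")          # roughly what a "del x that returns x" would resemble
print(value)                 # [1, 2, 3] - the object survives via the new reference
print("x" in ns)             # False - only the *name* is gone
```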
On Thu, 12 Mar 2020 at 19:35, Chris Angelico <rosuav@gmail.com> wrote:
I don't understand your point.
Yes, Andrew Barnert already explained that to me :)
The broad idea of "del x" returning a value isn't inherently ridiculous
The point is Eric Wieser does not really want this. What he wants is something similar to the numpy patch he linked: https://github.com/numpy/numpy/pull/7997, that is, an automatic delete of temporary, large objects. For example:

z = a + b + c + d

will create a temporary (a+b) object and a temporary (a + b) + c object. With that patch, if a, b, c and d are all ndarrays, the temporary objects are discarded as soon as they are no longer referenced, instead of at the end of the statement.

He thought that the change of `del` he proposed will give him that behavior, but this is not true. And I don't think it's very simple to implement this feature for __all__ Python objects. Maybe for immutable ones, but it should be done one by one, I suppose, since there's no "immutable interface".
On Fri, Mar 13, 2020 at 6:53 AM Marco Sulla <mail.python.org@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 19:35, Chris Angelico <rosuav@gmail.com> wrote:
I don't understand your point.
Yes, Andrew Barnert already explained me that :)
The broad idea of "del x" returning a value isn't inherently ridiculous
The point is Eric Wieser does not really want this. What he want is something similar to the numpy patch he linked: https://github.com/numpy/numpy/pull/7997
that is an automatic delete of temporary, large objects. For example:
z = a + b + c + d
will create a temporary (a+b) object and a temporary (a + b) + c object. With that patch, if a, b, c and d are all ndarrays, the temporary objects are discarded as soon as they are no longer referenced, instead of at the end of the statement.
They actually ARE already discarded, as can be seen with a simpler example:

class Thing:
    def __init__(self, name):
        self.name = name
        print("Creating", self)
    def __repr__(self):
        return '%' + self.name + '%'
    def __add__(self, other):
        return type(self)(self.name + other.name)
    def __del__(self):
        print("Garbage collecting", self)

a = Thing("a")
b = Thing("b")
c = Thing("c")
d = Thing("d")
z = a + b + c + d
print(z)

The "ab" object gets disposed of before "abcd" gets created.
He thought that the change of `del` he proposed will give him that behavior, but this is not true. And I don't think it's very simple to implement this feature for __all__ Python objects. Maybe for immutable ones, but it should be done one by one, I suppose, since there's no "immutable interface".
Yeah. The only way you could really optimize this would be to notice that the "ab" object has only one reference, and so it can be mutated. But that would be pretty finicky, and it would depend entirely on refcounting semantics (which are not a guarantee of the language - a mark-and-sweep GC is 100% valid, and wouldn't be able to do this optimization). Optimizing this CAN theoretically be done, but it's probably a lot simpler and cleaner to just use in-place modifications (eg augmented assignment) explicitly. ChrisA
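For instance, a rough sketch of that explicit in-place spelling with numpy (the array sizes and names here are placeholders, not taken from the thread's benchmarks):

```python
import numpy as np

a, b, c, d = (np.ones(1_000_000) for _ in range(4))

# z = a + b + c + d would allocate two throwaway intermediate arrays;
# augmented assignment builds one result buffer and updates it in place.
z = a + b     # the only new allocation
z += c        # updates z's buffer in place
z += d
```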
On Thu, 12 Mar 2020 at 21:22, Chris Angelico <rosuav@gmail.com> wrote:
They actually ARE already discarded
0____O You're right. So *how* could juliantaylor say he measured a speedup of 2x for large ndarrays? He also added benchmarks, which are still in the numpy code. Furthermore, he stated that what he did is also done by Python itself for strings. Maybe it's an old "feature"? Since it seems this happens for _every_ object. I tried to install numpy version 1.14.0rc1, the latest version before the patch, but it's not compatible with Python 3.8, since PyThreadState was changed in Python 3.7 (hey, what about "no more breaking changes?" :-P)
On Mar 12, 2020, at 14:32, Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
On Thu, 12 Mar 2020 at 21:22, Chris Angelico <rosuav@gmail.com> wrote:
They actually ARE already discarded
0____O You're right. So *how* could juliantaylor say he measured a speedup of 2x for large ndarrays?
Because Julian’s patch has nothing to do with discarding the temporary references. It has to do with knowing that an array is safe to reuse. Having a refcount of 1 is not sufficient to know that for arbitrary Python objects; it is apparently sufficient for numpy arrays if you can prove that there’s nothing but Python code and numpy C code on the stack that might have touched them. So Julian wrote a patch that can check that by walking the C stack. (In fact, I’m not sure even that much is really true. If you use ctypes.pythonapi to call the PyNumber functions from Python, can you trick numpy into incorrectly reusing an array? But I’m guessing they thought about that and decided that’s worth breaking; people who screw with objects at the C level from Python had better know exactly what they’re doing or they get what’s coming to them.)
He also added benchmarks, which are still in the numpy code. Furthermore, he stated that what he did is also done by Python itself for strings.
Yes, but only for strings. Because they are immutable (once shared with the interpreter) and have no reference-borrowing APIs, strings with 1 refcount are safe to reuse, so CPython can try to expand them in-place instead of creating a new string, and everything works. That isn’t true for arbitrary Python objects, only for strings (and a small number of other builtin types that presumably aren’t worth optimizing in this way). As I said before, this is not too different from the fact that the compiler may merge two equal string constants into a single object, and the interpreter may intern strings at runtime, and similar things happen for other known safe immutable types like int, but none of this works for arbitrary Python objects. `"spam" is "spam"` won’t break anything, so the compiler can make that true, but it had better not make `[1] is [1]` true, right? So pointing out that strings have a particular optimization does not argue that all objects should have this optimization.
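A tiny demonstration of that last point (the `is` results are CPython implementation details, not language guarantees):

```python
a = "spam"
b = "spam"
print(a is b)    # usually True on CPython: equal string constants get merged/interned

x = [1]
y = [1]
print(x is y)    # always False: two mutable lists must remain distinct objects
```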
On Thu, 12 Mar 2020 at 23:55, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
Because Julian’s patch has nothing to do with discarding the temporary references. It has to do with knowing that an array is safe to reuse.
Sorry, but I do not understand. Are you saying that the patch does not create a new temporary ndarray, but modifies the old temporary ndarray in place? And that it knows the array is temporary because its refcount is 1?

Anyway, I modified Chris's code a little:

class Thing:
    def __init__(self, name):
        self.name = name
        print("Creating", self)
    def __repr__(self):
        return f"{self.__class__.__name__}({self.name})"
    def __add__(self, other):
        return self.__class__(self.name + other.name)
    def __del__(self):
        print("Deleting", self, " - ID:", id(self))

a = Thing("a")
b = Thing("b")
c = Thing("c")
d = Thing("d")
print("######## Before statement")
z = a + b + c + d
print("######## After statement")

Result:

Creating Thing(a)
Creating Thing(b)
Creating Thing(c)
Creating Thing(d)
######## Before statement
Creating Thing(ab)
Creating Thing(abc)
Deleting Thing(ab) - ID: 140185716001088
Creating Thing(abcd)
Deleting Thing(abc) - ID: 140185715998976
######## After statement
Deleting Thing(a) - ID: 140185717096992
Deleting Thing(b) - ID: 140185717130192
Deleting Thing(c) - ID: 140185717176784
Deleting Thing(d) - ID: 140185715998880
Deleting Thing(abcd) - ID: 140185716001088

As you can see, Thing(abcd) has the same id as Thing(ab). So what Eric Wieser wanted is already implemented in Python, for temporary objects.
Little test:
>>> from itertools import combinations
>>> [id(v) for v in combinations(range(10), 1)]
[140347856824784, 140347856101328, 140347856824784, 140347856101328, 140347856824784, 140347856101328, 140347856824784, 140347856101328, 140347856824784, 140347856101328]
>>> for v in combinations(range(10), 1):
...     print(id(v))
...
140347856706048
140347856824784
140347856706048
140347856824784
140347856706048
140347856824784
140347856706048
140347856824784
140347856706048
140347856824784
As you can see, the temporary object is not immediately removed, but it's removed after the reassignment. This was not very clear in Eric Wieser's example, since the iterable was too short. I suppose Python, as Andrew Barnert suggested, could immediately discard the old object for comprehensions. But, really, I don't see a very big advantage. Only if the elements of the iterable are __really__ big could it be a little improvement, but I suppose it will slow down all the other comprehensions (as Barnert already pointed out).
On 13/03/20 12:30 pm, Marco Sulla via Python-ideas wrote:
As you can see, Thing(abcd) has the same id as Thing(ab). So what Eric Wieser wanted is already implemented in Python, for temporary objects.
This is probably an accident. It's not really re-using an object, it's just that the object created for a+b+c+d happens to land in the same block of memory that was freed when a+b became unreferenced. It's not actually saving the cost of a memory allocation the way the numpy optimisation does. -- Greg
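One way to see that this is allocator reuse rather than object reuse is the following sketch (CPython-specific; whether the ids actually match is a coincidence of the allocator, not something guaranteed):

```python
a = object()
addr = id(a)
del a              # the object is freed; its memory block goes back to the allocator

b = object()       # a new, unrelated object...
print(id(b) == addr)   # ...often lands in the same freed block, so the ids can match
```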
On Thu, Mar 12, 2020 at 2:32 PM Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
On Thu, 12 Mar 2020 at 21:22, Chris Angelico <rosuav@gmail.com> wrote:
They actually ARE already discarded
0____O You're right. So *how* could juliantaylor say he measured a speedup of 2x for large ndarrays?
I think that currently you have roughly the following situation with abcd = a+b+c+d:

temp_ab = a+b         (with a and b having refcounts of 2 or more)
temp_abc = temp_ab+c  (with temp_ab having a refcount of 1, and c a refcount of 2 or more)
del temp_ab
abcd = temp_abc+d     (with temp_abc having a refcount of 1, and d a refcount of 2 or more)
del temp_abc

The temp_ab+c and temp_abc+d computations can reuse the temporary arrays, but only if you walk the stack to verify that there are no hidden references, which is slow, non-portable and "terrifying" to quote one comment on numpy PR #7997. If you can't reuse the arrays, then you have to allocate a third hunk of RAM for the result, which is slower.

If there were better information about "temporariness", then the temporaries could be reused without the expensive and complicated stack walk, which would allow this optimization to benefit smaller arrays. If there were del expressions, then (del a)+b+c+d would allow reuse in the a+b expression as well - but only if you walked the stack or had temporariness information. So I think that del expressions are orthogonal to the stack tracing hack in terms of optimization potential.

C++11 introduced temporariness information through the type system with rvalue references. I don't know how you'd do it in Python. Of course, you can get the same effect in Python right now with:

a += b; a += c; a += d; abcd = a; del a
He thought that the change of del he proposed will give him that behavior, but this is not true.
Unless I'm forgetting part of the conversation, that's not true. Note that the numpy patch is merged. Today, you get the optimization with `z = a + b + c + d`. What you don't get is the same optimization if you use:

```
ab = a + b
abc = ab + c
z = abc + d
```

The language feature I propose is to allow you to _keep_ the optimization that was present in `z = a + b + c + d`, but write it as

```
ab = a + b
abc = (del ab) + c
z = (del abc) + d
```
On Fri, Jun 12, 2020 at 4:55 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
He thought that the change of del he proposed will give him that behavior, but this is not true.
Unless I'm forgetting part of the conversation, that's not true. Note that the numpy patch is merged. Today, you get the optimization with `z = a + b + c + d`. What you don't get is the same optimization if you use:

```
ab = a + b
abc = ab + c
z = abc + d
```

The language feature I propose is to allow you to _keep_ the optimization that was present in `z = a + b + c + d`, but write it as

```
ab = a + b
abc = (del ab) + c
z = (del abc) + d
```
But does that mean that typing `del x` at the REPL will print the value of `x`? That seems wrong. If `del x` is an expression that returns the value of `x` (and then unbinds it), then the REPL would seem to have no choice but to print the returned value -- the REPL has no indication that it came from `del`.
Yes, it would mean that, which is particularly nasty considering the fact that `_` in python and `Out` in IPython would then acquire the very reference that was being deleted. I think that my proposed change would be unacceptable without addressing this problem.

To resolve this, `del` could be made both a statement and an expression. The parsing rules would be very similar to how the walrus operator is handled today, but instead of being a SyntaxError in statement mode, it would simply produce a different ast node:

* `del x` -> `Delete(x)` (as `x := 0` -> SyntaxError)
* `(del x)` -> `Expr(DeleteExpr(x))` (as `(x := 0)` -> `Expr(NamedExpr(...))`)

Eric

On Fri, 12 Jun 2020 at 17:21, Guido van Rossum <guido@python.org> wrote:
On Fri, Jun 12, 2020 at 4:55 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
He thought that the change of del he proposed will give him that behavior, but this is not true.
Unless I'm forgetting part of the conversation, that's not true. Note that the numpy patch is merged. Today, you get the optimization with `z = a + b + c + d`. What you don't get is the same optimization if you use:

```
ab = a + b
abc = ab + c
z = abc + d
```

The language feature I propose is to allow you to _keep_ the optimization that was present in `z = a + b + c + d`, but write it as

```
ab = a + b
abc = (del ab) + c
z = (del abc) + d
```
But does that mean that typing `del x` at the REPL will print the value of `x`? That seems wrong. If `del x` is an expression that returns the value of `x` (and then unbinds it), then the REPL would seem to have no choice but to print the returned value -- the REPL has no indication that it came from `del`.
Yes, that's reasonable, and the PEG parser supports this (just put the `del_stmt` rule before the `expression` rule). I'm not sponsoring your proposal though -- you're going to have to convince other core devs (and ultimately the Steering Council) that this is worth it. Once you have a core dev willing to sponsor, they can help you write a PEP. On Fri, Jun 12, 2020 at 9:36 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Yes, it would mean that, which is particularly nasty considering the fact that `_` in python and `Out` in IPython would then acquire the very reference that was being deleted. I think that my proposed change would be unacceptable without addressing this problem.
To resolve this, `del` could be made both a statement and an expression. The parsing rules would be very similar to how the walrus operator is handled today, but instead of being a SyntaxError in statement mode, it would simply produce a different ast node:
* `del x` -> `Delete(x)` (as `x := 0` -> SyntaxError)
* `(del x)` -> `Expr(DeleteExpr(x))` (as `(x := 0)` -> `Expr(NamedExpr(...))`)
Eric
On Fri, 12 Jun 2020 at 17:21, Guido van Rossum <guido@python.org> wrote:
On Fri, Jun 12, 2020 at 4:55 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
He thought that the change of del he proposed will give him that behavior, but this is not true.
Unless I'm forgetting part of the conversation, that's not true. Note that the numpy patch is merged. Today, you get the optimization with `z = a + b + c + d`. What you don't get is the same optimization if you use:

```
ab = a + b
abc = ab + c
z = abc + d
```

The language feature I propose is to allow you to _keep_ the optimization that was present in `z = a + b + c + d`, but write it as

```
ab = a + b
abc = (del ab) + c
z = (del abc) + d
```
But does that mean that typing `del x` at the REPL will print the value of `x`? That seems wrong. If `del x` is an expression that returns the value of `x` (and then unbinds it), then the REPL would seem to have no choice but to print the returned value -- the REPL has no indication that it came from `del`.
On Mar 12, 2020, at 10:52, Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
On Thu, 12 Mar 2020 at 18:19, Chris Angelico <rosuav@gmail.com> wrote:
No, it wouldn't - the use of the value as a return value counts as a reference. It's exactly the same as any other function that returns a brand-new value.
So the memory of that object will never be freed... since there's a reference that can't be deleted until the current scope is finished. This in practice will break `del variable`
No, because the return value only lives until (effectively) the end of the statement. A statement has no value, so the effect of an expression statement is to immediately discard whatever the value of the expression was. (In CPython this is an explicit stack pop.) Except for the case of interactive mode, of course, where an expression statement binds the value to the _ variable.
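A small sketch of that lifetime in a non-interactive CPython run (the `Noisy` class is invented for the demo):

```python
class Noisy:
    def __del__(self):
        print("value released")

def run():
    Noisy()                       # expression statement: the result is popped and discarded
    print("next statement runs")  # "value released" has already been printed by now

run()
```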
On Thu, 12 Mar 2020 at 19:10, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
No, because the return value only lives until (effectively) the end of the statement. A statement has no value, so the effect of an expression statement is to immediately discard whatever the value of the expression was. (In CPython this is an explicit stack pop.)
Except for the case of interactive mode, of course, where an expression statement binds the value to the _ variable.
Okay, so it seems I was confused by the REPL feature. Anyway, since the object is not discarded until the expression statement ends, it has no effect on speed. On the contrary, from what I have read, the numpy patch removes the temporary ndarrays immediately. This speeds up calculations with large ndarrays. So there's no need to change the `del` behaviour. Python could implement something similar to the numpy patch for large immutables. The problem is: how?
On Mar 12, 2020, at 11:23, Marco Sulla <mail.python.org@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 19:10, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
No, because the return value only lives until (effectively) the end of the statement. A statement has no value, so the effect of an expression statement is to immediately discard whatever the value of the expression was. (In CPython this is an explicit stack pop.)
Except for the case of interactive mode, of course, where an expression statement binds the value to the _ variable.
Okay, so it seems I was confused by the REPL feature.
Anyway, since the object is not discarded until the expression statement ends, it has no effect on speed.
Well, in many cases, I think it should be possible to tell from either the AST or the bytecode that a value that’s popped and discarded could have been popped earlier. Compilers for lots of other languages routinely do this kind of static analysis even though it’s a lot more complicated in, say, C++ than in Python.

But I don’t think that would actually help this case. Each temporary is immediately consumed by the next opcode, not discarded, so the problem must be that the __add__ method (or rather the C API slot) can’t tell that one of its operands is a temporary and can therefore be reused if that’s helpful. And as I understand it (from a quick scan) the reason it can’t tell isn’t that the refcount isn’t 1 (which is something CPython could maybe fix if it were a problem, but it isn’t). Rather, it is already 1, but a refcount of 1 doesn’t actually prove a temporary value, because there are PyNumber C API functions that may be using the value without incref’ing it that end up calling the numpy code. Which is something I don’t think CPython could fix without removing those API functions.

Anyway, I think you’re right that del couldn’t help here. Even if it could, surely the solution to “a+b+c+d is slow” can’t be “just write (del (del a+b)+c)+d and it’ll be fast” if we want anyone to use Python without laughing and/or cursing.
On the contrary, from what I have read, the numpy patch removes the temporary ndarrays immediately. This speeds up calculations with large ndarrays. So there's no need to change the `del` behaviour. Python could implement something similar to the numpy patch for large immutables.
But they’re not actually immutable values. They’re mutable values that we happen to know aren’t being mutated by any Python code, but might still be mutated by arbitrary C API code farther up the stack. How could CPython detect that, except by the same kind of hack that numpy does to see if there are any C API calls on the stack and assume that any of them might have mutated any values they could see?

I suppose it could track calls out to C functions as they happen and mark every value that was live before the call, and then instead of numpy checking refs==1 it could check refs==1 && !c_flagged, and then it wouldn’t need the C frame hackery. But that seems like it would slow down everything in exchange for occasionally helping numpy and a few other C extension libs. And I’m not sure it would work anyway. Values created by C extensions have to start off flagged, but then values created and used internally by numpy won’t look temporary to numpy, right?
On Thu, 12 Mar 2020 at 19:52, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
I suppose it could track calls out to C functions as they happen and mark every value that was live before the call, and then instead of numpy checking refs==1 it could check refs==1 && !c_flagged, and then it wouldn’t need the C frame hackery. But that seems like it would slow down everything in exchange for occasionally helping numpy and a few other C extension libs.
The author of the patch says this is already implemented in string concatenation in Python itself. Maybe we should look at the implementation, but I don't know where and what to search.
On Mar 12, 2020, at 12:02, Marco Sulla <mail.python.org@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 19:52, Andrew Barnert <abarnert@yahoo.com.via.e4ward.com> wrote:
I suppose it could track calls out to C functions as they happen and mark every value that was live before the call, and then instead of numpy checking refs==1 it could check refs==1 && !c_flagged, and then it wouldn’t need the C frame hackery. But that seems like it would slow down everything in exchange for occasionally helping numpy and a few other C extension libs.
The author of the patch says this is already implemented in string concatenation in Python itself. Maybe we should look at the implementation, but I don't know where and what to search.
Strings, unlike numpy arrays, are immutable. (You can’t mutate them from Python; you can mutate them from the C API but it’s against the rules to do so after they’ve been shared with the interpreter.) And strings, unlike numbers (which numpy arrays act as), don’t have any C API functions that steal references. So they don’t require any kind of tracking or stack-fumbling. For a simpler parallel: the CPython compiler can merge two strings into a single object, and the CPython interpreter can intern strings at runtime, because this is guaranteed to be safe, but obviously neither can do the same thing for arbitrary objects.
And as I understand it (from a quick scan) the reason it can’t tell isn’t that the refcount isn’t 1 (which is something CPython could maybe fix if it were a problem, but it isn’t). Rather, it is already 1, but a refcount of 1 doesn’t actually prove a temporary value
This is at odds with my understanding, which is that the reason it can’t tell _is_ that the refcount isn’t 1. When you do `(b * c) + d`, `(b * c).__add__(d)` is passed `self` with a refcount of 1 (on the stack). I believe this hits the optimization today in numpy. When you do `bc + d`, `bc.__add__(d)` is passed `self` with a refcount of 2 (one on the stack, and one in `locals()`). My proposal of del is that `(del bc) + d` would enter `bc.__add__(d)` with `self` passed with a refcount of 1.
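The difference is observable from pure Python with a toy class (a rough sketch; `sys.getrefcount` adds a reference of its own, and the absolute numbers vary across CPython versions, but the named case always reads one higher than the temporary case):

```python
import sys

class Probe:
    def __add__(self, other):
        # Reports how many references to the left operand exist at this point.
        print("refcount seen inside __add__:", sys.getrefcount(self))
        return other

d = object()

Probe() + d    # temporary left operand: lower count
bc = Probe()
bc + d         # named left operand: one extra reference held by the local `bc`
```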
Even if it could, surely the solution to “`a+b+c+d` is slow” can’t be “just write `(del (del a+b)+c)+d` and it’ll be fast”
That would be a syntax error under my proposal. `del` remains an operator only on names, not expressions.
On Mar 12, 2020, at 12:33, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
And as I understand it (from a quick scan) the reason it can’t tell isn’t that the refcount isn’t 1 (which is something CPython could maybe fix if it were a problem, but it isn’t). Rather, it is already 1, but a refcount of 1 doesn’t actually prove a temporary value
This is at odds with my understanding, which is that the reason it can’t tell _is_ that the refcount isn’t 1. When you do `(b * c) + d`, `(b * c).__add__(d)` is passed `self` with a refcount of 1 (on the stack). I believe this hits the optimization today in numpy. When you do `bc + d`, `bc.__add__(d)` is passed `self` with a refcount of 2 (one on the stack, and one in `locals()`).
The GitHub issue you referenced is about optimizing this: r = -a + b * c**2 The explanation of the problem is that “Temporary arrays generates in expressions are expensive”. And the detail says:
The tricky part is that via the C-API one can use the PyNumber_ directly and skip increasing the reference count when not needed.
In other words, in their example, in the `b * c**2`, even though both `b` and the result of `c**2` pass into `type(b).__add__` with a refcount of 1, they can’t actually know that it’s safe to treat either one as a temporary, because if the caller of `__add__` or anything higher up the stack is a C extension function, it could have a still-alive-in-C PyObject* that refers to one of those values despite the refcount being 1. That’s why they have to walk the stack to see if there are any C functions (besides numpy functions that they know play nice): if not, they can assume refcount==1 means temporary, but otherwise, they can’t.
My proposal of del is that `(del bc) + d` would enter `bc.__add__(d)` with `self` passed with a refcount of 1.
So your proposal doesn’t help their problem. At best, it gives them the same behavior they already have, which they still need to optimize.
On Thu, 12 Mar 2020 at 17:20, Chris Angelico <rosuav@gmail.com> wrote:
What is to stop a Python implementation (not CPython, maybe, but some other one) from discarding the object in x as soon as it's unreferenced by the (del x) expression? Then y wouldn't be able to hold that value. After all, in theory, after the first line our object has a refcount of 1, then the (del x) changes that refcount to 0, then y = (the value) changes it back to 1. But it could be discarded in that period when it has refcount 0.
No, it wouldn't - the use of the value as a return value counts as a reference. It's exactly the same as any other function that returns a brand-new value.
Sorry, my error. What I was trying to get at is that the semantics are subtle and very closely tied to precise details of how CPython's refcounting works. And I then proceeded to either undermine or prove my own argument (I can't tell which!) by misunderstanding refcount semantics myself :-)

I suspect that at least part of my point stands, though - describing the semantics of the proposed "del expression" in a way that (a) can be understood independently of the CPython implementation, and (b) ensures that CPython actually implements the intended semantics of "allow the refcount-1 optimisation to apply", is likely to be extremely difficult.

But I'll bow out of the discussion at this point - there's too many posts for me to keep up with it.

Paul
On Thu, 12 Mar 2020 at 17:42, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
As an example of these optimizations being valuable, see https://github.com/numpy/numpy/pull/7997, which claims the optimization I described at the beginning resulted in a 1.5x-2x speedup.
This speedup is observed only for large objects. I quote:
Heuristicly it seems to be beneficial around 400kb array sizes (which is about the typical L2 cache size).
I think anyway that this could be a good idea, without changing `del`. The problem is: how can Python know that a generic operation will create a temporary variable? The change you linked was done specifically for ndarrays, which you know are immutable. How can Python know that an object is immutable?
It looks like this can actually be built as a function today:

import inspect

def move(name):
    return inspect.currentframe().f_back.f_locals.pop(name)

which works as follows, but it feels awkward to pass variable names as strings (and it will confuse linters):

>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(move("v")))
1718903397008
1718903397008
1718903397008

Eric
On Thu, 12 Mar 2020 at 16:57, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
It looks like this can actually be built as a function today:
def move(name): return inspect.currentframe().f_back.f_locals.pop(name)
Which works as follows, but it feels awkward to pass variable names by strings (and will confuse linters):
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(move("v")))
1718903397008
1718903397008
1718903397008
I'm both impressed and horrified by this :-) Paul
On Thu, 12 Mar 2020 at 14:09, Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 12 Mar 2020 at 16:57, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
It looks like this can actually be built as a function today:
def move(name): return inspect.currentframe().f_back.f_locals.pop(name)
Which works as follows, but it feels awkward to pass variable names by strings (and will confuse linters):
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(move("v")))
1718903397008
1718903397008
1718903397008
I'm both impressed and horrified by this :-)
Even so - it just works because it is in the top-level scope of a module - and would not work if called from a function, where modifications to `locals()` actually do not affect the local variables. (And that behavior, which was sort of "in the air", was well documented and defined prior to Python 3.8 in PEP 558.)

But for the module-level scope, locals() == globals() is just a plain dictionary, not one working as a proxy to slotted local variables. (And in a class body, locals() is whatever is returned from the metaclass `__prepare__` method.)
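A small sketch of that caveat (CPython behavior; inside a function, `locals()` is only a snapshot of the fast locals, so the name doesn't actually go away):

```python
def wont_move():
    v = "still here"
    # Popping from the snapshot does not unbind the real local variable.
    locals().pop("v", None)
    return v

print(wont_move())   # prints "still here"; the move() trick only works at module scope
```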
On Mar 12, 2020, at 08:08, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
TL;DR: should we make `del x` an expression that returns the value of `x`.
I agree that a “move” would have to be a keyword rather than a function, and that adding a new keyword is too high a bar, and that “del” is the best option of the existing keywords, since an expression statement with a del expression would be moving to an ignored temporary, having the same semantics as the del statement.

The first problem is that it doesn’t actually have the same semantics, even visibly to the user. For example, in interactive mode, evaluating an expression statement causes the result of the expression to be bound to the _ variable. So del x at the REPL will no longer release the value; you’ll have to del x and then rebind _ (by, say, evaluating None). Also, you need to think through what happens with a del expression inside all kinds of contexts that currently can’t have del—eval, lambdas and comprehensions (can I del something from the outer scope?), etc.; just defining what it does in the specific case of deleting the loop variable isn’t sufficient. I think this could all be worked out, but I’m not sure this is needed.

Is there a realistic use case where this optimization gives you a significant benefit? I tried running timeit on your loop with and without the del and it’s virtually undetectable. If I remove the print so the loop does nothing at all, then it’s a bit over 2% faster with the del. But if I then use a different iterator that doesn’t optimize and reuse the same value this way, it’s about 4% slower with the del (presumably the cost of executing the del itself?). So, it seems like this del expression is something that would be used rarely, when you need to micro-optimize a loop and 2% one way or the other matters.

Meanwhile, wouldn’t it slow down all existing code that uses del? Instead of a del statement that just removes a binding, there will be a del expression that removes a binding and pushes a value on the stack, which, when used as an expression statement, will be popped off and ignored. This also means that the value, instead of getting decref’d once, instead has to be incref’d, decref’d, and then decref’d again later, and even if you can optimize out the extra ref twiddle in the middle, being released later (at least one more pass through the eval loop) is going to slow down some code. And that’s before you even look at the cases of interactive mode, etc.

Also, experience with other languages makes me worry that adding move to an existing language for optimization purposes rarely goes well. Since you mentioned std::move, I assume you’re familiar with C++, so you must have spent time debugging code where you thought a value was being moved but it was actually being copied, or where some template instantiation won’t compile because it’s trying to do an A(A&) followed by a B(A&&) instead of a B(A&) but it can’t tell you what the problem is, or where changing a spam(string) to spam(const string&) to speed things up on one implementation (by turning two copies into one) ended up slowing things down far worse on another (by turning zero copies into one), and so on. Compare Rust, where moving values around is easy to write, read, and debug because the language and stdlib were designed around move, rather than it being retrofitted in. Obviously you’re not proposing to turn Python into a big mess like C++, but you are proposing to add complexity for something that can only be a micro optimization that we can only hope isn’t going to be used pervasively.
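For reference, a rough harness for that measurement (results are machine- and version-dependent; this is not the original benchmark code):

```python
import itertools
import timeit

def loop_with_del():
    for v in itertools.combinations([1, 2, 3], 1):
        del v        # lets combinations() reuse the result tuple

def loop_without_del():
    for v in itertools.combinations([1, 2, 3], 1):
        pass

print("with del:   ", timeit.timeit(loop_with_del, number=200_000))
print("without del:", timeit.timeit(loop_without_del, number=200_000))
```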
## Motivation
I noticed yesterday that `itertools.combinations` has an optimization for when the returned tuple has no remaining ref-counts, and reuses it - namely, the following code:
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(v))
...     del v  # without this, the optimization can't take place
2500926199840
2500926199840
2500926199840
will print the same id three times. However, when used as a list comprehension, the optimization can't step in, and I have no way of using the `del` keyword
>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]
It looks like it’s still reusing the tuple every _other_ time. Presumably this is just the object allocator stepping in rather than anything tricky that itertools does? But if that’s good enough to make it only 2% slower in your use case and probably even closer in realistic ones, maybe that implies that if anything is needed here in the first place, we should look for an automated way that applies everywhere rather than a user-driven way that applies narrowly.

What if a for loop, instead of nexting the iterator and binding the result to the loop variable, instead unbound the loop variable, nexted the iterator, and bound the result to the loop variable? Presumably this would slow down all loops a very tiny bit (by the cost of checking whether a name is bound) but speed up some—and it would speed up this one even more than your explicit del could (because it doesn’t need a whole extra opcode to loop over, if nothing else). Maybe there’s a way a clever optimization in the compiler and/or interpreter could figure out when it is and isn’t worth doing?
## Optional extension
For consistency, `x = (del foo.attr)` and `x = (del foo[i])` could also become legal expressions, and `__delete__`, `__delattr__`, and `__delitem__` would now have return values. Existing types would be free to continue to return `None`.
Returning None seems like it would be confusing. Your code either expects del to return the value, or it doesn’t; you can’t write code that expects del to sometimes return the value but sometimes doesn’t. But not allowing it at all seems even more confusing. Python grammar doesn’t have a bunch of different rules for “target” for different contexts, and the few special cases it does have, people always want to un-special them the first time they run into them.
On Thu, Mar 12, 2020 at 10:43 AM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
What if a for loop, instead of nexting the iterator and binding the result to the loop variable, instead unbound the loop variable, nexted the iterator, and bound the result to the loop variable? Presumably this would slow down all loops a very tiny bit (by the cost of checking whether a name is bound) but speed up some—and it would speed up this one even more than your explicit del could (because it doesn’t need a whole extra opcode to loop over, if nothing else). Maybe there’s a way a clever optimization in the compiler and/or interpreter could figure out when it is and isn’t worth doing?
For a "statement" for-loop this would change the semantics, since (even though not everyone likes this behavior) Python currently requires that the loop control variable retain the last value assigned to it. But in comprehensions it might work, since the comprehension control variable lives in an inner scope (at least if it's a simple variable).

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
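A minimal illustration of the difference:

for i in range(3):
    pass
print(i)   # prints 2: a for-loop's control variable keeps its last value in the enclosing scope

[j for j in range(3)]
try:
    print(j)
except NameError:
    # the comprehension's control variable lives in its own inner scope, so j is not defined here
    print("j is not defined")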
This was exactly what I was thinking of when I said
This is unavoidable in a for loop without breaking existing code, but I think could (and should?) be changed in a list comprehension
since the comprehension control variable lives in an inner scope

Unfortunately, I was wrong in the last half of this statement. The inner scope can still be closed over via a lambda, such as in the following, and I think that makes changing the semantics of either a breaking change.

>>> funcs = [lambda: i for i in range(2)]
>>> [f() for f in funcs]
[1, 1]  # often not intended, but demonstrates the last-value-assigned semantics

Eric
On Thu, 12 Mar 2020 at 18:42, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
What if a for loop, instead of nexting the iterator and binding the result to the loop variable, instead unbound the loop variable, nexted the iterator, and bound the result to the loop variable?
I missed that. But I do not understand how this can speed up any loop. I mean, if Python does this, it does an additional operation at every loop cycle, the unbinding. How can it be faster?

Furthermore, maybe I'm wrong, but doesn't reassigning another object to a variable automatically unbind the variable from the previous object? As far as I know, Python is "pass by value", where the value is a pointer, like Java. Indeed, any Python variable is a PyObject*, a pointer to a PyObject. When you assign a new object to a variable, what you are really doing is changing the value of the variable from one pointer to another. So the variable now points to a new memory location, and the old object has no more references other than itself. Am I right?
On Mar 12, 2020, at 13:22, Marco Sulla <python-ideas@marco.sulla.e4ward.com> wrote:
On Thu, 12 Mar 2020 at 18:42, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
What if a for loop, instead of nexting the iterator and binding the result to the loop variable, instead unbound the loop variable, nexted the iterator, and bound the result to the loop variable?
I missed that. But I do not understand how this can speed up any loop. I mean, if Python does this, it does an additional operation at every loop cycle, the unbinding. How can it be faster?
Because rebinding includes unbinding if it was already bound, so the unbinding happens either way. Basically, instead of this pseudocode:

    push it->tp_iternext(it) on the stack
    if f_locals[idx]:
        decref f_locals[idx]
        f_locals[idx] = NULL
    f_locals[idx] = stack pop
    incref f_locals[idx]

… you’d do this:

    if f_locals[idx]:
        decref f_locals[idx]
        f_locals[idx] = NULL
    push it->tp_iternext(it) on the stack
    f_locals[idx] = stack pop
    incref f_locals[idx]

No extra cost (or, if you don’t optimize it out, the only extra cost is checking whether the variable is already bound an extra time, which is just checking a pointer in an array against NULL), and the benefit is that the object is decref’d before you call tp_iternext.

Why does this matter? Well, that’s the whole point of the proposal. A decref may reduce the count to 0. In this case, the object is freed before tp_iternext is called, so if tp_iternext needed to do a big allocation for each value, the object allocator will probably reuse the last one instead of going back to the heap. A decref may also reduce the count to 1, if the iterator is storing a copy of the same object internally. In general this doesn’t help, but if the iterator is written in C and it knows the object is a special known-safe type like tuple (which is immutable and has no reference-borrowing APIs) it can reuse it safely. As combinations apparently does.

All that being said, as Guido explained, I don’t think my idea is workable. I think what we really want is to release the object before tp_iternext iff it’s not going to raise StopIteration, and there’s no way to predict that in advance without solving the halting problem, so…
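Incidentally, the refcount-to-1 reuse can be observed directly from Python, relying purely on CPython implementation details:

import itertools

it = itertools.combinations([1, 2, 3], 1)

v = next(it)
first = id(v)

w = next(it)            # v is still alive, so the iterator has to build a fresh tuple
print(id(w) == first)   # False

second = id(w)
del v, w                # nothing outside the iterator references its last result now
print(id(next(it)) == second)  # usually True on CPython: the previous tuple is reused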
Furthermore, maybe I'm wrong, but doesn't reassigning another object to a variable automatically unbind the variable from the previous object? As far as I know, Python is "pass by value", where the value is a pointer, like Java.
That’s misleading terminology. Java uses “pass by value” and Ruby uses “call by reference” to mean doing the same thing Python does, so describing it as either “by value” or “by reference” is just going to confuse as many people as it helps. Barbara Liskov explained why it was a meaningless distinction for languages that aren’t sufficiently ALGOL-like back around 1980, and I don’t know why people keep confusingly trying to force languages to fit anyway 40 years later. Better to just describe what it does.
Indeed, any Python variable is a PyObject*, a pointer to a PyObject.
No. Any Python _value_ is a PyObject*. It doesn’t matter whether the value is a temporary, stored in a variable, stored in a list element, stored in 17 different variables, whatever. And that’s all specific to the CPython implementation. In Jython or PyPy, a Python value is a Java object or a Python object in the underlying Python.

So what’s a variable? Well, Python doesn’t have variables in the same sense as a language like C++. It has namespaces, that map names to values. A variable in Python’s syntax is just a lookup of a name in a namespace in Python’s semantics. And a namespace is in general just a dictionary. That’s pretty much all there is to variables.

(There’s an optimization for locals, which are converted into indexes into a C array of values stored on the frame object instead, which is why we have all those issues with locals() and exec. And there’s also the cell thing for closure variables. And there’s nothing stopping you from replacing a namespace’s __dict__ with an object of a different type that does almost anything you can imagine. But ignore all of that.)

If you understand dicts, you understand variables, and you don’t need to mention PyObject* to understand dicts (unless you want to use them from the C API).
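For illustration, at module level (locals are the optimized special case mentioned above):

x = 1
print(globals()['x'])    # 1: the module namespace really is a dict
globals()['y'] = 2       # binding a name is just a dict update
print(y)                 # 2
del globals()['x']       # unbinding is just deleting the key
print('x' in globals())  # False; using x now would raise NameError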
When you assign a new object to a variable, what you are really doing is changing the value of the variable from one pointer to another.
You’re just updating the namespace dict to map the same key to a different value.
So the variable now points to a new memory location, and the old object has no more references other than itself. Am I right?
Well, the dict entry now holds a new value, and the old value has one reference fewer, which may be 0, in which case it’s garbage and can be cleaned up. It doesn’t hold a reference to itself (except in special cases, e.g., `self.circular = self` or `xs.append(xs)`).

In CPython, where values are PyObject* under the covers, the hash buckets in a dict include PyObject* slots for the key and value, and the dict’s __setitem__ takes care of incref’ing the stored value, and incref’ing the key if it’s new or decref’ing the old value if it’s a replacement. And CPython knows to delete an object as soon as a decref brings it to 0 refs.

(What about fastlocals? The code to load and store variables to fastlocal slots does the same incref and decref stuff, but there are no keys to worry about, because the compiler already turned them into indexes into an array. And an unbound local variable is a NULL in the array, as opposed to just not being in the dict. And if you want to dig into cells, they’re not much more complicated.)

But again, that’s all CPython specific. In, say, Jython, the hash buckets in a dict just have two Java objects for the key and value, which aren’t pointers (although under another set of covers your JVM is probably implemented in C using pointers all over the place), and nobody’s tracking refcounts; the JVM scans memory whenever it feels like it and deletes any objects (including Python ones) that aren’t referenced by anyone. This is why any optimizations like permutations reusing the same tuple if the refcount is 1 only make sense for CPython (and only from the C API rather than from Python itself).
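A CPython-specific sketch of those reference counts (sys.getrefcount's own argument temporarily adds one):

import sys

value = object()
print(sys.getrefcount(value))   # 2: the name `value` plus getrefcount's temporary argument

alias = value
print(sys.getrefcount(value))   # 3: a second name now refers to the same object

alias = None                    # rebinding alias drops its reference to the old object
print(sys.getrefcount(value))   # back to 2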
Well, so all this discussion is only for freeing _one_ memory location earlier? Seriously... as the other users already said, if someone really needs it, he can use normal loops instead of comprehensions and split complex expressions so temporary objects are immediately freed. Furthermore, the 2x speedup observations come from benchmarks. In the real world, how can you be sure that the L2 cache is not already filled up? :-)
On Mar 12, 2020, at 17:47, Marco Sulla <python-ideas@marco.sulla.e4ward.com> wrote:
Well, so all this discussion is only for freeing _one_ memory location earlier?
Basically, yes. And now I get why you were confused—you’re not confused about Python, you’re confused about the proposal, because you expected that there must be more to it than this, and kept trying to figure out what it was. Apologies for missing that.

The OP’s motivating use case at the start of the thread was to allow itertools.combinations to keep reusing the same tuple object instead of (almost always) alternating back and forth between two tuple objects, and that’s only ever going to save a few hundred bytes of peak heap use. I don’t think the OP was worried about saving those bytes so much as saving the cost of calling the tuple C-API destructor and constructor over and over. But it’s still definitely a micro-optimization that would only occasionally be needed.

In the numpy case that came up later, the object in question can be a lot bigger. People do regularly run numpy calculations on arrays that take up 40% of their RAM, so releasing one array one step earlier can mean the difference between having 80% of your RAM in use at a time and 120%, which is obviously a big deal. But I don’t think the proposal actually helps there in the same way as it helps for the combinations tuple.
Seriously... as the other users already said, if someone really needs it, he can use normal loops instead of comprehensions and split complex expressions so temporary objects are immediately freed.
Well, it’s actually the opposite problem: temporary objects are already sort of free; breaking them up into separate statements means you have to assign the intermediate values, which means they’re no longer temporary and therefore no longer free. A del expression would give you a more convenient way to regain the better temporary behavior after refactoring into separate statements and giving the intermediates names. But I agree anyway. The fact that you have to be more verbose when you need to write a specific micro-optimization doesn’t seem like a huge problem to me.
Furthermore, the 2x speedup observations come from benchmarks. In the real world, how can you be sure that the L2 cache is not already filled up? :-)
I suspect the numpy benchmarks are applicable in a lot of real life numpy code. But those benchmarks are for a patch to numpy that’s only vaguely similar to this proposal, and neither requires it nor could benefit from it, so…
On Thu, Mar 12, 2020 at 10:43 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Also, you need to think through what happens with a del expression inside all kinds of contexts that currently can’t have del—eval, lambdas and comprehensions (can I del something from the outer scope?), etc.; just defining what it does in the specific case of deleting the loop variable isn’t sufficient.
del VAR currently follows the same scoping rules as VAR = EXPR, so I think that a hypothetical (del VAR) expression could and should follow the same scoping rules as (VAR := EXPR). del VAR essentially is an assignment, but of a "null pointer" instead of an object reference.
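A small sketch of the existing behaviour, where del participates in scoping exactly like an assignment:

x = "global"

def reads_global():
    return x        # no binding of x in this function, so this reads the global

def dels_local():
    print(x)        # UnboundLocalError: the del below makes x local to this function,
    del x           # just as an assignment to x would

print(reads_global())   # "global"
dels_local()            # raises UnboundLocalError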
On Thu, Mar 12, 2020 at 03:02:28PM -0000, Eric Wieser wrote:
TL;DR: should we make `del x` an expression that returns the value of `x`.
## Motivation
I noticed yesterday that `itertools.combinations` has an optimization for when the returned tuple has no remaining ref-counts, and reuses it - namely, the following code:
>>> for v in itertools.combinations([1, 2, 3], 1):
...     print(id(v))
...     del v  # without this, the optimization can't take place
2500926199840
2500926199840
2500926199840
[...]
My suggestion would be to make `del x` an expression, with semantics "unbind the name `x`, and evaluate to its value". This would allow:
>>> [id(del v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926200992, 2500926200992]
Pity that locals() is so weird, because this works fine in module level code, but not inside comprehensions or functions:

py> for v in combinations([1, 2, 3], 1):
...     print(id(locals().pop('v')))
...
3083202636
3083202636
3083202636

Mind you, I think that the cost of calling locals().pop is going to outweigh any hypothetical saving you get from reusing the object.

--
Steven
On 12.03.20 16:02, Eric Wieser wrote:
`itertools.combinations` is not the only place to make this optimization - parts of numpy use it too, allowing
a = (b * c) + d
to elide the temporary `b*c`. This elision can't happen with the spelling
bc = b * c
a = bc + d

This is not surprising since the user deliberately created the temporary variable. It's not the responsibility of Numpy to optimize that away (in fact it can't since `bc` might be reused later on).
My suggestion would be to make `del x` an expression, with semantics "unbind the name `x`, and evaluate to its value". This would allow:
>>> [id(del v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926200992, 2500926200992]
and
bc = b * c
a = (del bc) + d  # in C++, this would be `std::move(bc) + d`
If I wanted to split the computation over multiple lines and yet have it optimized, I would just reuse the same (target) name instead of creating a temporary one and then discarding it in the next step:

a = b * c
a += d

If there are many lines to the computation, one can place a comment at the top, indicating that the result (`a`) is built incrementally, in order to prevent confusion about names.
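For instance, assuming b, c and d are numpy arrays of the same shape, the in-place form avoids the second temporary without any new syntax:

import numpy as np

b = np.random.rand(1_000_000)
c = np.random.rand(1_000_000)
d = np.random.rand(1_000_000)

a = b * c   # one temporary allocation for the product is unavoidable here
a += d      # in-place add into the same buffer: no second temporary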
On 13/03/20 4:02 am, Eric Wieser wrote:
For consistency, `x = (del foo.attr)` and `x = (del foo[i])` could also become legal expressions, and `__delete__`, `__delattr__`, and `__delitem__` would now have return values. Existing types would be free to continue to return `None`.
That would mean you couldn't rely on 'del foo.attr' or 'del foo[item]' doing what you expect, unless you know for sure you're dealing with a type that's been updated to follow the new semantics. -- Greg
I've realized that I've actually seen code use a trick to enable exactly this optimization in the past, without needing language extensions (although I don't remember where). Taking two of my motivating examples from elsewhere in the thread:

bc = b*c
a = bc + d

f = get_f_long_name(x)
g = get_g_long_name(del f)
h = get_h_long_name(del g)

This can be written today with exactly the intended semantics by using lists and pop:

bc = [b * c]
a = bc.pop()

f = [get_f_long_name(x)]
g = [get_g_long_name(f.pop())]
h = get_h_long_name(g.pop())

I'd argue that this is a little less readable, but it makes the incremental improvement provided by a language change less significant, which makes the change harder to justify.
On Fri, Mar 13, 2020 at 7:10 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Mar 12, 2020 at 03:02:28PM -0000, Eric Wieser wrote:
TL;DR: should we make `del x` an expression that returns the value of `x`.
What would `del x, y, z` return?
A tuple of the former values of x, y, and z. I don't see what else it could logically do. ChrisA
12.03.20 17:02, Eric Wieser wrote:
>>> [id(v) for v in itertools.combinations([1, 2, 3], 1)]
[2500926200992, 2500926199072, 2500926200992]
Note that the first id is repeated for the third item. And if you use a larger example
[id(v) for v in itertools.combinations(range(10), 1)]
[139701363452336, 139701363451616, 139701363452336, 139701363451616, 139701363452336, 139701363451616, 139701363452336, 139701363451616, 139701363452336, 139701363451616]
you will see that two ids are interleaved. There are many causes of reusing the same id. First, the combinations() iterator keeps a reference to a tuple and reuses it if it is free. Second, the tuple allocator keeps a pool of preallocated objects for small tuples. Third, the Python memory allocator keeps a pool of small blocks. Fourth, the libc memory allocator is also more likely to return the same memory address if you repeatedly allocate and free the same amount of memory. All of these are optimizations and implementation artifacts. If you are worried that the second tuple produced by combinations() sits in slower cache, you can make a first-level cache in combinations() keeping references to two tuples. Or make the second-level cache more efficient. Of course, only if you have evidence that there is a problem and that this solution fixes it. Did you do any benchmarks?
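For example, the small-tuple free list alone will often hand back the same object (a CPython implementation detail, not something to rely on):

a = tuple([1])        # build the tuple at runtime so it isn't a shared compile-time constant
addr = id(a)
del a                 # the freed 1-tuple goes back onto CPython's small-tuple free list
b = tuple([2])        # the next 1-tuple allocation typically gets the same object back
print(id(b) == addr)  # often True on CPython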