Decreasing refcount for locals before popping frame

Consider this example code:

    def test():
        a = A()

    test()

Currently, the locals (i.e. `a`) are cleared only after the function has returned:

If we attach a finalizer to `a` immediately after the declaration then the frame stack available via `sys._getframe()` inside the finalizer function does not include the frame used to evaluate the function (i.e. with the code object of the `test` function).

The nearest frame is that of the top-level module (where we make the call to the function).

This is in practical terms no different than:

    def test():
        return A()

    test()

There's no way to distinguish between the two cases even though in the second example, the object is dropped only after the frame (used to evaluate the function) has been cleared.

The effect I am trying to achieve is:

    def test():
        a = A()
        del a

Here's a use-case to motivate this need:

In Airflow, we're considering introducing some "magic" to help users write:

    with DAG(...):
        # some code here

That is, without declaring a top-level variable such as `dag`.

However, we can't detect the following situation:

    def create():
        with DAG(...) as dag:
            # some code here

    create()

The DAG is not returned from the function but nevertheless, we can't distinguish between this code and the correct version:

    def create():
        with DAG(...) as dag:
            # some code here
        return dag

In this case, calling `create` will then "return" the DAG and of course, without a variable assignment, the finalizer will be called – but now we can detect this.

I'm thinking that it ought to be possible to clear out `frame->localsplus` before leaving the function frame.

I played around with "ceval.c" and only got segfaults. It's complicated machinery :-)

Thoughts?

I don't know if there's anything specifically stopping this, but from what I understand, the precise moment that a finalizer gets called is unspecified, so relying on any sort of behavior there is undefined and non-portable. Implementations like PyPy don't always use reference counting, so their garbage collection might get called some unspecified amount of time later.

I'm not familiar with Airflow, but would you be able to decorate the create() function to check for good return values? Something like:

    import functools

    def dag_initializer(func):
        @functools.wraps(func)
        def wrapper():
            with DAG(...) as dag:
                result = func(dag)
            del dag
            if not isinstance(result, DAG):
                raise ValueError(f"{func.__name__} did not return a dag")
            return result
        return wrapper

    @dag_initializer
    def create(dag):
        "some code here"

Dennis Sweeney wrote:
> I don't know if there's anything specifically stopping this, but from what I understand, the precise moment that a finalizer gets called is unspecified, so relying on any sort of behavior there is undefined and non-portable. Implementations like PyPy don't always use reference counting, so their garbage collection might get called some unspecified amount of time later.

It's unspecified of course for the language as such, but in the specific case of CPython (which we're targeting), I think the refcounting logic is here to stay and, generally speaking, can be relied on. Of course some version may come along to break expectations and I suppose we might cross that bridge when we get to it.

> I'm not familiar with Airflow, but would you be able to decorate the create() function to check for good return values?

We could, but for the most part people don't define DAGs inside functions – it happens, but it is not the simplest usage pattern. It's not so much about the function itself, but about being able to determine if a DAG was dropped at the top-level of the module.

If the frame clearing behavior was changed so that locals were reclaimed before popping the frame, I think the line number (i.e. `f_lineno`) would have to be that of the function definition, i.e. `def test():` in the examples above.

As has been mentioned, there is no guarantee that your variable will even be finalized (or even destroyed) after the frame finishes. For example, if your variable goes into a reference cycle for whatever reason it may not be cleared until a GC run happens (and in some situations it may not even be cleared at any point).

The language gives you no guarantees over when or how objects will be finalized or destroyed, and any attempt at relying on specific behaviour is doomed to fail because it can change between versions and implementations.

On Thu, 28 Apr 2022, 14:14 Malthe, <mborch@gmail.com> wrote:
> [...]

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/D5HCLMN4...
Code of Conduct: http://python.org/psf/codeofconduct/

Pablo Galindo Salgado wrote:
> As it has been mentioned there is no guarantee that your variable will even be finalized (or even destroyed) after the frame finishes. For example, if your variable goes into a reference cycle for whatever reason it may not be cleared until a GC run happens (and in some situations it may not even be cleared at any point).

I think there is a reasonable guarantee in CPython that it will happen exactly when you leave the frame, assuming there are no cycles or other references to the object. There's always the future, but I don't see a very near future where this will change fundamentally.

Relying too much on CPython's behavior is a bad thing, but I think there are cases where it makes sense and can be a pragmatic choice. Certainly lots of programs have successfully relied on `sys._getframe` over the years.
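Both points in this exchange (prompt finalization under reference counting, and deferred finalization once a cycle is involved) can be seen in a small sketch. The class `Tracked` and the functions `plain` and `cyclic` are hypothetical names chosen for illustration, and the behaviour shown is CPython-specific; other implementations may finalize at different times.

```python
import gc


class Tracked:
    """Hypothetical class that logs its own finalization."""

    finalized = []

    def __init__(self, name):
        self.name = name
        self.me = None

    def __del__(self):
        Tracked.finalized.append(self.name)


def plain():
    obj = Tracked("plain")  # the local is the only reference


def cyclic():
    obj = Tracked("cyclic")
    obj.me = obj  # self-reference: refcount stays above zero after return


was_enabled = gc.isenabled()
gc.disable()  # stop the cycle collector from running on its own
try:
    plain()
    after_plain = list(Tracked.finalized)
    cyclic()
    after_cyclic = list(Tracked.finalized)
    gc.collect()  # explicit collection finally breaks the cycle
    after_collect = list(Tracked.finalized)
finally:
    if was_enabled:
        gc.enable()

print(after_plain, after_cyclic, after_collect)
```

On CPython the plain object should be finalized as soon as `plain()` returns, while the cyclic one should stay alive until `gc.collect()` runs.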

Can you show a runnable example of the successful and unsuccessful usage of `with DAG(): ...`?

On Fri, Apr 29, 2022, 6:31 AM Malthe <mborch@gmail.com> wrote:
> [...]

On Fri, 29 Apr 2022 at 06:38, Thomas Grainger <tagrain@gmail.com> wrote:
> Can you show a run-able example of the successful and unsuccessful usage of `with DAG(): ... `?

    from airflow import DAG

    # correct:
    dag = DAG("my_dag")

    # incorrect:
    DAG("my_dag")

The with construct really has nothing to do with it, but it is a common source of confusion:

    # incorrect
    with DAG("my_dag"):
        ...

It is less obvious (to some) in this way that the entire DAG will not be picked up. You will in fact have to write:

    # correct
    with DAG("my_dag") as dag:
        ...

This way, you're capturing the DAG in the top-level scope which is the requirement.

Does this only apply to DAGfiles? E.g. https://airflow.apache.org/docs/apache-airflow/1.10.12/concepts.html#scope

You can use a `__del__` method that warns on collection, like an unawaited coroutine.

Also, if you're in control of importing the dagfile, you can record all created DAGs and report any that are missing from the globals of the module.

On Fri, Apr 29, 2022, 7:45 AM Malthe <mborch@gmail.com> wrote:
> [...]
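The idea of recording every created DAG and comparing against the module globals could look roughly like the sketch below. Everything here is an assumption made for illustration: `Dag` is a hypothetical stand-in class (not Airflow's `DAG`), `find_unexposed` is an invented helper, and the "dagfile" is just a source string executed into a fresh module. It does not reflect how Airflow actually loads DAG files.

```python
import types


class Dag:
    """Hypothetical stand-in for a DAG class that registers every instance."""

    created = []

    def __init__(self, dag_id):
        self.dag_id = dag_id
        Dag.created.append(self)  # strong reference keeps the object alive


def find_unexposed(source, name="dagfile"):
    """Execute `source` as a module and return ids of DAGs not bound at top level."""
    module = types.ModuleType(name)
    module.__dict__["Dag"] = Dag  # make the class available to the source
    start = len(Dag.created)
    exec(compile(source, name, "exec"), module.__dict__)
    exposed = {id(value) for value in vars(module).values() if isinstance(value, Dag)}
    return [dag.dag_id for dag in Dag.created[start:] if id(dag) not in exposed]


SOURCE = '''
good = Dag("kept")

def create():
    with_local = Dag("dropped_in_function")

create()
Dag("dropped_at_top_level")
'''

missing = find_unexposed(SOURCE)
print(missing)
```

Because the registry holds its own references, this check does not depend on when finalizers run, which sidesteps the portability concerns raised earlier in the thread.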

On Fri, 29 Apr 2022 at 06:50, Thomas Grainger <tagrain@gmail.com> wrote:
> You can use a `__del__` method that warns on collection - like an unawaited coroutine
>
> Also if you're in control of importing the dagfile you can record all created dags and report any that are missing from the globals of the module

Yes and I think this is the best we can do given how frames are being cleared.

We can notify the user that a DAG was instantiated and not exposed at the top-level which is almost guaranteed to be a mistake. There's probably no good way currently to do better (for some value of "better").

Thanks
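A warn-on-collection finalizer, in the spirit of the "coroutine was never awaited" warning, might be sketched as follows. The `Dag` class and its `registered` flag are hypothetical; the flag stands in for whatever mechanism a loader would use to mark a DAG as picked up. The timing shown relies on CPython's reference counting.

```python
import warnings


class Dag:
    """Hypothetical stand-in class, not Airflow's DAG."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.registered = False  # assumed flag, set once a loader picks the DAG up

    def __del__(self):
        if not self.registered:
            warnings.warn(
                f"DAG {self.dag_id!r} was never registered", ResourceWarning
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    Dag("dropped")  # no name bound: finalized right away on CPython

    kept = Dag("kept")
    kept.registered = True  # pretend a loader found it
    del kept  # finalized, but no warning

messages = [str(w.message) for w in caught]
print(messages)
```

The warning only says that a DAG was created and then lost. As noted above, it cannot say from which frame the object was dropped.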

Can you ping me on the airflow PR for this change? (@graingert)

On Fri, Apr 29, 2022, 7:54 AM Malthe <mborch@gmail.com> wrote:
> [...]
participants (4)
- Dennis Sweeney
- Malthe
- Pablo Galindo Salgado
- Thomas Grainger