Dealing with static local variables in NumPy
Hello,

This is my first post to this group; I'd like to start by expressing my appreciation for the amazing work in developing and maintaining NumPy.

I have a question. NumPy has quite a lot of static local variables (variables defined as static inside a function), like this one (core/src/multiarraymodule.c, line 4483):

    if (raise_exceptions) {
        static PyObject *too_hard_cls = NULL;
        /* ... */
    }

I understand that these variables provide local caching and are important for efficiency. They do however cause some issues when dealing with multiple subinterpreters, where the static local variable might have been initialized by one of the subinterpreters, and is not reset when accessed by another subinterpreter. More generally, they cannot be reset when the NumPy module is released, and thus will likely cause an issue if it is reloaded after being released.

I have seen the issue mentioned in at least one pull request (https://github.com/numpy/numpy/pull/15169) and in several issues. If I understand correctly, the issue is not considered important because subinterpreters are not yet prominent in CPython, and static local variables provide an important service in caching data locally (instead of exposing these variables globally). So the benefits outweigh the costs and risks (removing them would be a huge change to the code base).

I happen to maintain, compile and run a version of Python on iOS (https://github.com/holzschu/a-shell/ or https://apps.apple.com/us/app/a-shell/id1473805438), where I have to remove all these static local variables because of the specifics of the platform (in order to run Python multiple times, I have to release and reset all modules). Right now, I'm maintaining the changes to the code base in a separate branch (https://github.com/holzschu/numpy/) and not necessarily in a very clean way.

With the recent renewed interest in subinterpreters, I was wondering if there was a way I could contribute these changes back to the main numpy branch. I would have to clean up the code, obviously, and probably get guidance on how to do it cleanly, but the first question is: would there be an interest, or is that something I should keep in my separate branch?

From a technical point of view, about 80% of these static local variables are just before a call to npy_cache_import(), and the most efficient way to remove them (in terms of lines of code) is just to remove the part where npy_cache_import uses the static local variable. You pay a price in performance, but gain in usability.

Best regards,
Nicolas Holzschuch
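The trade-off described in this post can be illustrated with a minimal plain-C sketch. This is not NumPy code and it deliberately avoids the CPython API so that it compiles on its own: `Interp`, `slow_import`, `lookup_cached` and `lookup_uncached` are hypothetical names, and an `int` stands in for the `PyObject` that a real import would return. The cached variant keeps the first result it ever sees, whichever interpreter produced it; the uncached variant is always correct but repeats the import on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one (sub)interpreter and the objects it owns. */
typedef struct {
    int import_count;  /* how many times the slow import ran here */
    int too_hard_cls;  /* stand-in for the imported PyObject */
} Interp;

/* Stand-in for the expensive import that the cache is meant to avoid. */
const int *slow_import(Interp *interp)
{
    assert(interp != NULL);
    interp->import_count++;
    return &interp->too_hard_cls;
}

/* Cached variant: the static local is shared by every caller and is
 * never reset, so later interpreters see the first one's object. */
const int *lookup_cached(Interp *interp)
{
    static const int *cache = NULL;
    if (cache == NULL) {
        cache = slow_import(interp);
    }
    return cache;
}

/* Uncached variant: no shared state, but the import cost is paid on
 * every call. */
const int *lookup_uncached(Interp *interp)
{
    return slow_import(interp);
}
```

With two `Interp` values `a` and `b`, calling `lookup_cached(&a)` and then `lookup_cached(&b)` returns a pointer into `a` both times, which is the stale-cache problem; `lookup_uncached(&b)` returns the right object but increments `b.import_count` each time.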
On Tue, 2023-08-29 at 08:01 +0000, Nicolas Holzschuch wrote:
> I understand that these variables provide local caching and are important for efficiency. They do however cause some issues when dealing with multiple subinterpreters, where the static local variable might have been initialized by one of the subinterpreters, and is not reset when accessed by another subinterpreter. More globally, they cannot be reset when the Numpy module is released, and thus will likely cause an issue if it is reloaded after being released.
Right, but in the end these caches (or almost all of them) are there for a reason, and just removing them does not seem acceptable to me.

However, there are better ways to solve this: you can move the cached values into module state. In the vast majority of cases that should not be hard, since the patterns are known. In a few cases it may be harder, but I believe CPython offers decent solutions now (I am not sure what they look like). For a long time I had hoped that the HPy drive would solve this, but there is no reason to wait for it.

In any case, contributions to this effect are very much welcome; I have been hoping they would come for a long time. But I am not excited about just removing the "static".

- Sebastian
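The module-state idea suggested in this reply can be sketched in plain C under some loud assumptions. Nothing here is NumPy or CPython code: `Interp`, `ModuleState`, `Module`, `module_init`, `module_clear` and `module_lookup` are hypothetical names, and an `int` again stands in for a cached `PyObject`. In a real extension the state struct would roughly correspond to the per-module memory that CPython allocates when `m_size` in `PyModuleDef` is positive and that is retrieved with `PyModule_GetState`, with the reset done in the `m_clear` hook; that mapping is an analogy, not a description of how NumPy would implement it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one (sub)interpreter and the objects it owns. */
typedef struct {
    int import_count;  /* how many times the slow import ran here */
    int too_hard_cls;  /* stand-in for the imported PyObject */
} Interp;

/* Per-module cache: replaces the function-local static variable. */
typedef struct {
    const int *too_hard_cls;  /* NULL until first use */
} ModuleState;

/* One loaded copy of the module, tied to the interpreter that loaded it. */
typedef struct {
    Interp *owner;
    ModuleState state;
} Module;

/* Called when the module is loaded: the cache starts out empty. */
void module_init(Module *mod, Interp *owner)
{
    assert(mod != NULL && owner != NULL);
    mod->owner = owner;
    mod->state.too_hard_cls = NULL;
}

/* Called when the module is released: the cache is dropped, so a later
 * reload cannot observe stale pointers. */
void module_clear(Module *mod)
{
    mod->state.too_hard_cls = NULL;
}

/* The cache is still hit after the first call, but it now lives in the
 * module's own state instead of in a process-wide static. */
const int *module_lookup(Module *mod)
{
    ModuleState *st = &mod->state;
    if (st->too_hard_cls == NULL) {
        mod->owner->import_count++;  /* the slow import runs once */
        st->too_hard_cls = &mod->owner->too_hard_cls;
    }
    return st->too_hard_cls;
}
```

Each `Module` caches the object of its own `Interp`, so repeated lookups stay cheap, and `module_clear` followed by another lookup triggers a fresh import rather than returning a pointer from a previous load.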
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
On Fri, Sep 1, 2023 at 11:11 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
> In any case, contributions to this effect are very much welcome, I have been hoping they would come for a long time, but I am not excited about just removing the "static".
This was discussed at the last community meeting. We are open to the idea, but would like to see how it works out in practice, in particular what the code looks like.

Chuck
Participants (3): Charles R Harris, Nicolas Holzschuch, Sebastian Berg