On Tue, Aug 21, 2018 at 6:12 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs@pobox.com> wrote:
My suggestion: at numpy import time, check for an envvar, like say NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the __array_function__ dispatches turn into no-ops. This lets interested downstream libraries and users try this out, but makes sure that we won't have a hundred thousand end users depending on it without realizing.
- makes it easy for end-users to check how much overhead this adds (by running their code with it enabled vs disabled)
- if/when we decide to commit to supporting it for real, we just remove the envvar.
I'm slightly concerned that the cost of reading an environment variable with os.environ could exaggerate the performance cost of __array_function__. It takes about 1 microsecond to read an environment variable on my laptop, which is comparable to the full overhead of __array_function__.
That's why I said "at numpy import time" :-). I was imagining we'd check it once at import, and then from then on it'd be stashed in some C global, so after that the overhead would just be a single predictable branch 'if (array_function_is_enabled) { ... }'.
Indeed, I missed the "at numpy import time" bit :).
In that case, I'm concerned that it isn't always possible to set environment variables once before importing NumPy. The environment variable solution works great if users have full control of their own Python binaries, but that isn't always the case today in this era of server-less infrastructure and online notebooks.
One example offhand is Google's Colaboratory (https://research.google.com/colaboratory), a web-based Jupyter notebook. NumPy is always loaded when a notebook is opened, as you can check by inspecting sys.modules. Now, I work with the developers of Colaboratory, so we could probably figure out a work-around together, but I'm pretty sure this would also come up in the context of other tools.
I mean, the idea of the envvar is to be a temporary measure to enable devs to experiment with a provisional feature, while being awkward enough that people don't build lots of stuff assuming it's there. It doesn't have to be 100% supported in every environment.
Another problem is unit testing. Does pytest use a separate Python process for running the tests in each file? I don't know and that feels like an implementation detail that I shouldn't have to know :). Yes, in principle I could use a subprocess in my __array_function__ for unit tests, but that would be really awkward.
Set the envvar before invoking pytest? For numpy itself we'll need to write a few awkward tests involving subprocesses to make sure the envvar parsing is working properly, but I don't think this is a big deal. As long as we only have 1-2 places that __array_function__ dispatch funnels through, we just need to make sure that they work properly with/without the envvar; no need to test every API separately. Or if it is an issue we can have some private API that's only available to the numpy test suite...
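A rough sketch of what one of those subprocess-based tests might look like. The child program here only stands in for "import numpy and report whether dispatch is enabled"; `flag_seen_by_child` and `CHILD` are hypothetical names, and the parsing rule (exactly "1" enables) is an assumption.

```python
import os
import subprocess
import sys

ENVVAR = "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"  # name from the proposal

# Stand-in for the real check: the child reports how it sees the envvar.
CHILD = (
    "import os\n"
    f"print(os.environ.get({ENVVAR!r}, '0') == '1')\n"
)


def flag_seen_by_child(value=None):
    """Run a fresh interpreter with the envvar set to `value` (or unset)."""
    env = dict(os.environ)
    env.pop(ENVVAR, None)  # start from a known state
    if value is not None:
        env[ENVVAR] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

A test would then compare `flag_seen_by_child("1")` against `flag_seen_by_child()`. Since each call starts a new process, the result does not depend on whether the test runner has already imported NumPy.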
So we may want to switch to an explicit Python API instead, e.g., np.enable_experimental_array_function().
If we do this, then libraries that want to use __array_function__ will just call it themselves at import time. The point of the env-var is that our policy is not to break end-users, so if we want an API to be provisional and experimental then it's end-users who need to be aware of that before using it. (This is also an advantage of checking the envvar only at import time: it means libraries can't easily just setenv() to enable the functionality behind users' backs.)
I'm in complete agreement that only authors of end-user applications should invoke this option, but just because something is technically possible doesn't mean that people will actually do it or that we need to support that use case :).
I didn't say "authors of end-user applications", I said "end-users" :-). That said, I dunno. My intuition is that if we have a function call like this then libraries that define __array_function__ will merrily call it in their package __init__ and it accomplishes nothing, but maybe I'm being too cynical and untrusting.
-n
--
Nathaniel J. Smith -- https://vorpus.org