
On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs@pobox.com> wrote:
My suggestion: at numpy import time, check for an envvar, like say NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the __array_function__ dispatches turn into no-ops. This lets interested downstream libraries and users try this out, but makes sure that we won't have a hundred thousand end users depending on it without realizing.
- makes it easy for end-users to check how much overhead this adds (by running their code with it enabled vs disabled)
- if/when we decide to commit to supporting it for real, we just remove the envvar.
I'm slightly concerned that the cost of reading an environment variable with os.environ could exaggerate the performance cost of __array_function__. It takes about 1 microsecond to read an environment variable on my laptop, which is comparable to the full overhead of __array_function__.
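A quick way to check that number on your own machine is something like the following (timings will of course vary):

    # Rough measurement of the cost of a single os.environ lookup.
    import os
    import timeit

    os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "1"
    # timeit reports total seconds for 1,000,000 runs, which is numerically
    # the same as microseconds per call.
    total = timeit.timeit(
        'os.environ.get("NUMPY_EXPERIMENTAL_ARRAY_FUNCTION")',
        setup="import os",
        number=1000000)
    print("~%.2f microseconds per lookup" % total)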
That's why I said "at numpy import time" :-). I was imagining we'd check it once at import, and then from then on it'd be stashed in some C global, so after that the overhead would just be a single predictable branch 'if (array_function_is_enabled) { ... }'.
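In pure-Python terms the idea would be roughly the following sketch (illustrative only; in NumPy itself the flag would live in a C global, and the helper name here is made up):

    import os

    # Read the environment variable exactly once, when the module is imported.
    ARRAY_FUNCTION_ENABLED = os.environ.get(
        "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION", "0") == "1"

    def maybe_dispatch(default_impl, args, kwargs):
        # Hypothetical helper wrapping each overridable NumPy function: when
        # the feature is off, the only per-call overhead is this one branch.
        if not ARRAY_FUNCTION_ENABLED:
            return default_impl(*args, **kwargs)
        ...  # otherwise, run the usual __array_function__ dispatch logic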
Indeed, I missed the "at numpy import time" bit :). In that case, I'm concerned that it isn't always possible to set environment variables once before importing NumPy.

The environment variable solution works great if users have full control of their own Python binaries, but that isn't always the case today in this era of serverless infrastructure and online notebooks. One example offhand is Google's Colaboratory ( https://research.google.com/colaboratory), a web-based Jupyter notebook. NumPy is always loaded when a notebook is opened, as you can check by inspecting sys.modules. Now, I work with the developers of Colaboratory, so we could probably figure out a work-around together, but I'm pretty sure this would also come up in the context of other tools.

Another problem is unit testing. Does pytest use a separate Python process for running the tests in each file? I don't know, and that feels like an implementation detail that I shouldn't have to know :). Yes, in principle I could use a subprocess in my __array_function__ unit tests, but that would be really awkward.
So we may want to switch to an explicit Python API instead, e.g., np.enable_experimental_array_function().
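Roughly, I'm imagining something like this sketch (the names, and the disable counterpart, are purely illustrative; the real version would flip the same internal flag that the envvar check would set):

    # Hypothetical process-wide switch, sketched in Python.
    _ARRAY_FUNCTION_ENABLED = False

    def enable_experimental_array_function():
        """Opt this process in to __array_function__ dispatch."""
        global _ARRAY_FUNCTION_ENABLED
        _ARRAY_FUNCTION_ENABLED = True

    def disable_experimental_array_function():
        """Opt back out again (useful for exercising both paths in tests)."""
        global _ARRAY_FUNCTION_ENABLED
        _ARRAY_FUNCTION_ENABLED = False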
If we do this, then libraries that want to use __array_function__ will just call it themselves at import time. The point of the env-var is that our policy is not to break end-users, so if we want an API to be provisional and experimental then it's end-users who need to be aware of that before using it. (This is also an advantage of checking the envvar only at import time: it means libraries can't easily just setenv() to enable the functionality behind users' backs.)
I'm in complete agreement that only authors of end-user applications should invoke this option, but just because something is technically possible doesn't mean that people will actually do it or that we need to support that use case :). numpy.seterr() is a good example. It allows users to globally set how NumPy handles errors, but well-written libraries still don't call it.

TensorFlow has a similar function, tf.enable_eager_execution(), for enabling "eager mode", which is also worth examining: https://www.tensorflow.org/api_docs/python/tf/enable_eager_execution To solve the testing issue, they wrote a decorator for use with tests, run_in_graph_and_eager_modes(): https://www.tensorflow.org/api_docs/python/tf/contrib/eager/run_test_in_grap...
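For NumPy, an analogous test helper could look something like this sketch (purely illustrative, assuming an enable/disable pair along the lines of the sketch above):

    import functools

    def run_with_and_without_array_function(test_func):
        # Hypothetical decorator: run the wrapped test twice, once with
        # __array_function__ dispatch disabled and once with it enabled.
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for enabled in (False, True):
                if enabled:
                    enable_experimental_array_function()
                else:
                    disable_experimental_array_function()
                test_func(*args, **kwargs)
        return wrapper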