On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs@pobox.com> wrote:
>>> My suggestion: at numpy import time, check for an envvar, like say
>>> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the
>>> __array_function__ dispatches turn into no-ops. This lets interested
>>> downstream libraries and users try this out, but makes sure that we
>>> won't have a hundred thousand end users depending on it without
>>> realizing.
>>>
>>> - makes it easy for end-users to check how much overhead this adds (by
>>>   running their code with it enabled vs disabled)
>>> - if/when we decide to commit to supporting it for real, we just
>>>   remove the envvar.
>>
>> I'm slightly concerned that the cost of reading an environment variable
>> with os.environ could exaggerate the performance cost of
>> __array_function__. It takes about 1 microsecond to read an environment
>> variable on my laptop, which is comparable to the full overhead of
>> __array_function__.

> That's why I said "at numpy import time" :-). I was imagining we'd
> check it once at import, and then from then on it'd be stashed in some
> C global, so after that the overhead would just be a single
> predictable branch 'if (array_function_is_enabled) { ... }'.

Indeed, I missed the "at numpy import time" bit :).
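
To make that concrete, here is a minimal sketch of how the import-time check might work. The _ARRAY_FUNCTION_ENABLED flag and the _dispatch helper are my own illustrative names (simplified from NEP 18), not NumPy's actual internals, and the real branch would presumably live in C:

    import os

    # Read the environment variable exactly once, when numpy is imported.
    # Every later dispatch then pays only for one predictable branch on a
    # global flag, not an os.environ lookup per call.
    _ARRAY_FUNCTION_ENABLED = (
        os.environ.get("NUMPY_EXPERIMENTAL_ARRAY_FUNCTION", "0") == "1"
    )

    def _dispatch(func, relevant_args, args, kwargs):
        # Hypothetical dispatcher wrapping a NumPy function `func`.
        if not _ARRAY_FUNCTION_ENABLED:
            # Feature disabled: dispatch is a no-op that falls straight
            # through to the plain NumPy implementation.
            return func(*args, **kwargs)
        types = tuple(type(arg) for arg in relevant_args)
        for arg in relevant_args:
            overload = getattr(arg, "__array_function__", None)
            if overload is not None:
                result = overload(func, types, args, kwargs)
                if result is not NotImplemented:
                    return result
        return func(*args, **kwargs)

For what it's worth, the ~1 microsecond cost of the per-call lookup this avoids is easy to reproduce with something like: python -m timeit -s "import os" "os.environ.get('NUMPY_EXPERIMENTAL_ARRAY_FUNCTION')"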

In that case, I'm concerned that it isn't always possible to set environment variables once before importing NumPy. The environment variable solution works great if users have full control of their own Python binaries, but that isn't always the case today in this era of serverless infrastructure and online notebooks.

One example offhand is Google's Colaboratory (https://research.google.com/colaboratory), a web-based Jupyter notebook. NumPy is always loaded when a notebook is opened, as you can check from inspecting sys.modules. Now, I work with the developers of Colaboratory, so we could probably figure out a workaround together, but I'm pretty sure this would also come up in the context of other tools.
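
For example, running this as the very first cell of a fresh notebook demonstrates the problem:

    import sys

    # The Colaboratory runtime has already imported NumPy before any user
    # code runs, so setting the environment variable now would be too late.
    print("numpy" in sys.modules)  # prints True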

Another problem is unit testing. Does pytest use a separate Python process for running the tests in each file? I don't know, and that feels like an implementation detail I shouldn't have to know :). Yes, in principle I could use a subprocess in my unit tests for __array_function__, but that would be really awkward.
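
For illustration, that subprocess workaround would look something like this in a pytest-style test (my_duck_array and its DuckArray class are hypothetical stand-ins for a library implementing __array_function__):

    import os
    import subprocess
    import sys

    def test_concatenate_dispatches_to_duck_array():
        # Spawn a fresh interpreter so the environment variable is set
        # before NumPy is first imported -- awkward, but the only way to
        # control an import-time check from inside a test suite.
        env = dict(os.environ, NUMPY_EXPERIMENTAL_ARRAY_FUNCTION="1")
        code = "\n".join([
            "import numpy as np",
            "import my_duck_array",
            "x = my_duck_array.DuckArray([1, 2, 3])",
            "result = np.concatenate([x, x])",
            "assert isinstance(result, my_duck_array.DuckArray)",
        ])
        subprocess.run([sys.executable, "-c", code], env=env, check=True)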

>> So we may want to switch to an explicit Python API instead, e.g.,
>> np.enable_experimental_array_function().

> If we do this, then libraries that want to use __array_function__ will
> just call it themselves at import time. The point of the env-var is
> that our policy is not to break end-users, so if we want an API to be
> provisional and experimental then it's end-users who need to be aware
> of that before using it. (This is also an advantage of checking the
> envvar only at import time: it means libraries can't easily just
> setenv() to enable the functionality behind users' backs.)

I'm in complete agreement that only authors of end-user applications should invoke this option, but just because something is technically possible doesn't mean that people will actually do it or that we need to support that use case :).

numpy.seterr() is a good example. It allows users to set NumPy's error handling globally, but well-written libraries still don't do that.
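
To illustrate the contrast (these are real NumPy APIs; the division-by-zero example is mine):

    import numpy as np

    # An end-user application can reasonably set global behavior:
    np.seterr(all="raise")  # turn floating-point warnings into errors

    # A well-behaved library instead scopes any change it needs, rather
    # than mutating global state behind the application's back:
    with np.errstate(divide="ignore"):
        result = np.log(np.arange(5))  # log(0) -> -inf, silently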

TensorFlow has a similar function, tf.enable_eager_execution(), for enabling "eager mode", which is also worth examining:
https://www.tensorflow.org/api_docs/python/tf/enable_eager_execution

To solve the testing issue, they wrote a decorator for use with tests, run_in_graph_and_eager_modes(): https://www.tensorflow.org/api_docs/python/tf/contrib/eager/run_test_in_graph_and_eager_modes
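
An analogous helper for NumPy might look roughly like this. Note that np.enable_experimental_array_function() is only proposed above, and the matching disable function is purely my assumption:

    import functools
    import numpy as np

    def run_with_and_without_array_function(test_func):
        # Run a test body twice: once with the experimental protocol
        # enabled, once with it disabled, restoring the default state.
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            np.enable_experimental_array_function()
            try:
                test_func(*args, **kwargs)
            finally:
                np.disable_experimental_array_function()  # hypothetical
            test_func(*args, **kwargs)  # again, with the feature off
        return wrapper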