[Numpy-discussion] lazy evaluation

mark florisson markflorisson88 at gmail.com
Tue Jun 5 18:02:54 EDT 2012

On 5 June 2012 22:29, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Jun 5, 2012 at 9:47 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 5 June 2012 20:17, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>> On 5 June 2012 17:38, Nathaniel Smith <njs at pobox.com> wrote:
>>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson
>>>>> <markflorisson88 at gmail.com> wrote:
>>>>>> On 5 June 2012 14:58, Nathaniel Smith <njs at pobox.com> wrote:
>>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
>>>>>>> <markflorisson88 at gmail.com> wrote:
>>>>>>>> It would be great if we implemented the NEP listed above, but with a
>>>>>>>> few extensions. I think Numpy should handle the lazy evaluation part
>>>>>>>> and determine when expressions should be evaluated, etc. However, for
>>>>>>>> each user operation, Numpy would call back a user-installed hook
>>>>>>>> implementing some interface, so that various packages can provide
>>>>>>>> their own hooks to evaluate vector operations however they want. This
>>>>>>>> would include packages such as Theano, which could run things on the
>>>>>>>> GPU, Numexpr, and in the future
>>>>>>>> https://github.com/markflorisson88/minivect (which will likely get an
>>>>>>>> LLVM backend, and possibly be integrated with Numba to allow inlining
>>>>>>>> of numba ufuncs). That project tries to bring all the different array
>>>>>>>> expression compilers together in a single framework, to provide
>>>>>>>> efficient array expressions specialized for any data layout (nditer
>>>>>>>> on steroids if you will, with SIMD, threading and inlining
>>>>>>>> capabilities).
>>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell
>>>>>>> which operations should be deferred and which should be forced, etc.
>>>>>> Yes, but for the user the difference should not be visible (unless
>>>>>> operations can raise exceptions, in which case you choose the safe
>>>>>> path, or let the user configure what to do).
>>>>>>> While it would be less magical, I think a more explicit API would in
>>>>>>> the end be easier to use... something like
>>>>>>>  a, b, c, d = deferred([a, b, c, d])
>>>>>>>  e = a + b * c  # 'e' is a deferred object too
>>>>>>>  f = np.dot(e, d)  # so is 'f'
>>>>>>>  g = force(f)  # 'g' is an ndarray
>>>>>>>  # or
>>>>>>>  force(f, out=g)
>>>>>>> But at that point, this could easily be an external library, right?
>>>>>>> All we'd need from numpy would be some way for external types to
>>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
>>>>>>> several reasons to want that functionality, and it seems like
>>>>>>> developing these "improved numexpr" ideas would be much easier if they
>>>>>>> didn't require doing deep surgery to numpy itself...
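A minimal sketch of the explicit API from the message above as a pure-Python library, assuming operator overloading is enough for `+` and `*`. The names `deferred` and `force` come from the toy API; `Deferred` and `dot` are hypothetical (`dot` stands in for `np.dot`, which would need numpy-side override support, the thing being asked for here, to dispatch to such an object). Evaluation is naive: nodes are computed bottom-up with no fusion.

```python
import numpy as np


class Deferred:
    """A node in an unevaluated expression graph (hypothetical toy type)."""

    # Makes ndarray's operators return NotImplemented for Deferred operands,
    # so Python falls back to our __radd__/__rmul__ (NumPy >= 1.13).
    __array_ufunc__ = None

    def __init__(self, func, *args):
        self.func = func  # callable applied to the forced arguments
        self.args = args  # operands: Deferred nodes or plain values

    def __add__(self, other):
        return Deferred(np.add, self, other)

    def __radd__(self, other):
        return Deferred(np.add, other, self)

    def __mul__(self, other):
        return Deferred(np.multiply, self, other)

    def __rmul__(self, other):
        return Deferred(np.multiply, other, self)


def deferred(arrays):
    """Wrap each array in a leaf node that simply returns it when forced."""
    return [Deferred(lambda x: x, np.asarray(a)) for a in arrays]


def dot(x, y):
    """Hypothetical stand-in for np.dot on deferred operands."""
    return Deferred(np.dot, x, y)


def force(node, out=None):
    """Evaluate the graph bottom-up (no fusion) and return an ndarray."""
    if isinstance(node, Deferred):
        result = np.asarray(node.func(*[force(arg) for arg in node.args]))
    else:
        result = np.asarray(node)
    if out is not None:
        out[...] = result
        return out
    return result


a, b, c, d = deferred([np.ones(3), np.arange(3.0), np.full(3, 2.0), np.eye(3)])
e = a + b * c  # a Deferred; nothing computed yet
f = dot(e, d)  # still a Deferred
g = force(f)   # an ndarray
```

Because the type is explicit, the rule for what is deferred is simply "anything touching a `Deferred`"; plain ndarray code is unaffected.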
>>>>>> Definitely, but apart from monkey-patch-chaining I think some
>>>>>> modifications to numpy would be required; they would be reasonably
>>>>>> simple, though. Most of the functionality would be handled in one
>>>>>> function, called by most ufuncs (the ones you care about, such as
>>>>>> add) and ufunc methods, e.g.
>>>>>> if ((result = NPy_LazyEval("add", op1, op2))) return result;
>>>>>> inserted after argument unpacking and sanity checking. You could also
>>>>>> do a per-module hook, and have the function look at
>>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C
>>>>>> or Cython code.
>>>>>> How did you have overrides in mind?
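A pure-Python model of the hook mechanism described above, assuming hooks are tried in installation order and return `NotImplemented` to decline (playing the role of a NULL return from the C call), in which case ordinary eager numpy evaluation runs. `install_hook`, `lazy_eval` and `add` are hypothetical names standing in for the proposed `NPy_LazyEval`; numpy has no such hook.

```python
import numpy as np

_hooks = []  # hypothetical registry of user-installed evaluation hooks


def install_hook(hook):
    """Register hook(name, *operands), which returns a result or NotImplemented."""
    _hooks.append(hook)


def lazy_eval(name, *operands):
    """Python analogue of NPy_LazyEval: offer the operation to each hook."""
    for hook in _hooks:
        result = hook(name, *operands)
        if result is not NotImplemented:
            return result
    return NotImplemented


def add(op1, op2):
    """A ufunc-like entry point with the hook call after argument checks."""
    op1, op2 = np.asarray(op1), np.asarray(op2)  # argument unpacking
    result = lazy_eval("add", op1, op2)
    if result is not NotImplemented:
        return result
    return np.add(op1, op2)  # no hook claimed it: plain eager numpy


recorded = []


def recording_hook(name, *operands):
    # Toy backend: claims only large operations and merely records them.
    if all(op.size >= 1000 for op in operands):
        recorded.append(name)
        return ("deferred", name)  # placeholder for a real lazy object
    return NotImplemented


install_hook(recording_hook)
small = add([1, 2], [3, 4])              # declined: evaluated by numpy
big = add(np.ones(1000), np.ones(1000))  # claimed by the hook
```

Declining by default is what makes unsupported operations "simply mean numpy evaluation".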
>>>>> My vague idea is that core numpy operations are about as fundamental
>>>>> for scientific users as the Python builtin operations are, so they
>>>>> should probably be overridable in a similar way. So we'd teach numpy
>>>>> functions to check for methods named like "__numpy_ufunc__" or
>>>>> "__numpy_dot__" and let themselves be overridden if found. Like how
>>>>> __gt__ and __add__ and stuff work. Or something along those lines.
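NumPy later adopted an override protocol along these lines: since NumPy 1.13, ufuncs look for an `__array_ufunc__` method on their inputs and let it take over. A minimal sketch of a type that uses it to record ufunc calls instead of running them; the `Lazy` class and its `compute` method are illustrative only. `np.dot` is not a ufunc and would need the separate `__array_function__` protocol (NumPy 1.17+).

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class Lazy(NDArrayOperatorsMixin):
    """Records ufunc calls instead of running them (illustrative only)."""

    def __init__(self, value=None, op=None):
        self.value = value  # concrete ndarray for leaves
        self.op = op        # (ufunc, method, inputs, kwargs) for pending nodes

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # numpy calls this for e.g. np.sin(x) when an input is a Lazy;
        # the mixin routes operators such as x + 1.0 through np.add as well.
        if "out" in kwargs:
            return NotImplemented  # out= is not supported in this sketch
        return Lazy(op=(ufunc, method, inputs, kwargs))

    def compute(self):
        """Evaluate recorded operations depth-first."""
        if self.op is None:
            return self.value
        ufunc, method, inputs, kwargs = self.op
        args = [x.compute() if isinstance(x, Lazy) else x for x in inputs]
        return getattr(ufunc, method)(*args, **kwargs)


x = Lazy(np.arange(3.0))
y = np.sin(x) + 1.0        # both steps recorded; nothing computed yet
total = np.add.reduce(y)   # ufunc methods are intercepted too
```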
>>>>>> I also found this thread:
>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
>>>>>> but I think you want more than just overriding ufuncs: you want
>>>>>> numpy to govern when stuff is allowed to be lazy and when it should
>>>>>> be evaluated (e.g. when it is indexed or slice-assigned, although the
>>>>>> latter may itself be lazy, etc.). You don't want some funny object
>>>>>> back that doesn't work with things which are not overridden in numpy.
>>>>> My point is that probably numpy should *not* govern the decision about
>>>>> what stuff should be lazy and what should be evaluated; that should be
>>>>> governed by some combination of the user and
>>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make
>>>>> those decisions obvious and explicit. (And if the funny objects had an
>>>>> __array_interface__ attribute that automatically forced evaluation
>>>>> when accessed, then they'd work fine with code that was expecting an
>>>>> array, or if they were assigned to a "real" ndarray, etc.)
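A minimal sketch of the parenthetical idea above: a deferred object whose `__array_interface__` property forces evaluation on first access, so `np.asarray` and other array-consuming code see an ordinary array. `DeferredExpr` and its `thunk` argument are hypothetical. The result is cached on the object, both because numpy may read the interface more than once and because an array built from the interface keeps a reference to the exposing object, which in turn keeps the buffer alive.

```python
import numpy as np


class DeferredExpr:
    """Hypothetical deferred value that evaluates when numpy asks for data."""

    def __init__(self, thunk):
        self._thunk = thunk   # zero-argument callable producing an array
        self._value = None    # cached result once forced
        self.evaluations = 0  # how many times the thunk actually ran

    def force(self):
        if self._value is None:
            self._value = np.asarray(self._thunk())
            self.evaluations += 1
        return self._value

    @property
    def __array_interface__(self):
        # Accessing the interface forces evaluation; the cached array owns
        # the memory that the returned dict points at.
        return self.force().__array_interface__


a = np.arange(5.0)
expr = DeferredExpr(lambda: a * 2 + 1)
before = expr.evaluations  # 0: nothing evaluated yet
arr = np.asarray(expr)     # forces evaluation through __array_interface__
```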
>>>> That's disappointing though, since the performance drawbacks can
>>>> severely limit the usefulness for people with big data sets. Ideally,
>>>> you would take your intuitive numpy code, and make it go fast, without
>>>> jumping through hoops. Numpypy has lazy evaluation; I don't know how
>>>> good a job it does, but it does mean you can finally get fast numpy
>>>> code in an intuitive way (and even run it on a GPU if that is possible
>>>> and beneficial).
>>> All of these proposals require the user to jump through hoops -- the
>>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more
>>> importantly, a set of rules that people have to learn and keep in mind
>>> for which numpy operations are affected, which ones aren't, which
>>> operations can't be performed while deferredstate is True, etc. So
>>> this has two problems: (1) these rules are opaque, (2) it's far from
>>> clear what the rules should be.
>> Right, I guess I should have commented on that. I don't think the
>> deferredstate stuff is needed at all, execution can always be deferred
>> as long as it does not affect semantics. So if something is marked
>> readonly because it is used in an expression and then written to, you
>> evaluate the expression and then perform the write. The only way to
>> break stuff, I think, would be to use pointers through the buffer
>> interface or PyArray_DATA and not respect the sudden readonly
>> property. A deferred expression is only evaluated once in any valid
>> GIL-holding context (so it shouldn't break threads either).
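A minimal sketch of the "evaluate the expression, then perform the write" rule described above, assuming all writes go through a wrapper type so the rule can be enforced in Python; raw views and C-level pointers can still bypass it. `LazyArray` and `Pending` are hypothetical, and only `LazyArray` operands are supported.

```python
import numpy as np


class Pending:
    """An unevaluated elementwise expression over LazyArray operands."""

    def __init__(self, func, *operands):
        self.func = func
        self.operands = operands
        self.result = None
        for op in operands:
            op.dependents.append(self)  # operands must flush us before a write

    def evaluate(self):
        if self.result is None:  # evaluate at most once
            self.result = self.func(*(op.data for op in self.operands))
        return self.result


class LazyArray:
    def __init__(self, data):
        self.data = np.array(data)  # private copy, written only via __setitem__
        self.dependents = []        # pending expressions that read this array

    def __add__(self, other):
        return Pending(np.add, self, other)

    def __setitem__(self, index, value):
        # Expressions that read the old contents must see them, so force
        # them now, before the write, instead of at the usual later point.
        for expr in self.dependents:
            expr.evaluate()
        self.dependents.clear()
        self.data[index] = value


x = LazyArray([1.0, 2.0, 3.0])
y = LazyArray([10.0, 20.0, 30.0])
s = x + y     # deferred
x[0] = 100.0  # forces s first, then writes
```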
> I don't think you can get away with switching numpy to defer all
> operations by default. I just don't see how you could make it
> transparent.
> One obvious abstraction leak is that the readonly flag is never a
> reliable way to detect or prevent writes --
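>  import numpy as np
>  a = np.arange(10)
>  b = a.view()
>  a.flags.writeable = False
>  assert a[0] == 0
>  b[0] = 1  # the view is still writeable...
>  assert a[0] == 1  # ...so the "read-only" array changed anyway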

Right, that's a good point, although arguably each view should share a
data structure with its owner, so that you would mark the underlying
memory region as read-only rather than an individual view.
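A minimal sketch of locking the memory region rather than a single view, assuming a hypothetical registry keyed by the array that owns the data: every view reaches its owner by following `.base`, so a lock recorded on the owner covers all of them. `owner`, `lock_region` and `checked_setitem` are hypothetical; numpy itself only has the per-array `writeable` flag.

```python
import numpy as np

_locked = {}  # id(owner) -> owner; holding the owner keeps its id stable


def owner(arr):
    """Follow the .base chain to the ndarray that owns the memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr


def lock_region(arr):
    root = owner(arr)
    _locked[id(root)] = root


def checked_setitem(arr, index, value):
    """Refuse writes into a locked region, whichever view they come through."""
    if id(owner(arr)) in _locked:
        raise ValueError("memory region is locked by a pending expression")
    arr[index] = value


a = np.arange(10)
b = a.view()
lock_region(a)
refused = False
try:
    checked_setitem(b, 0, 1)  # b is a different view of the same memory
except ValueError:
    refused = True
```

This still only protects writes that go through the check, and arrays sharing memory through a non-ndarray base (for example two `np.frombuffer` calls on the same buffer) would need `np.shares_memory` instead.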

> Another would be that memory and CPU usage suddenly become very unpredictable --
>  import numpy as np
>  def f():
>    a = np.zeros((2, 1000000))
>    return np.sum(a, axis=1)
>  s = f()
>  # With deferred evaluation, 'a' has left scope but is still pinned in memory
>  # This operation allows the memory to be freed, but takes a ton of CPU time:
>  print(s[0])

Another valid point. I guess the expression-evaluating library could
govern the evaluation here: it can see the shapes of the operands and
of the lazy result, and decide it isn't worth making the result lazy.
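A minimal sketch of that kind of decision, assuming a simple size heuristic: a result that is much smaller than the operands it would pin in memory (such as a reduction) is evaluated eagerly so the operands can be freed, while elementwise results may stay lazy. `should_defer` and the `max_ratio` threshold are arbitrary choices for illustration.

```python
import numpy as np


def should_defer(operand_shapes, result_shape, max_ratio=16):
    """Keep a result lazy only if the operands it pins in memory are at most
    max_ratio times larger than the result (hypothetical heuristic)."""
    operand_size = sum(int(np.prod(shape)) for shape in operand_shapes)
    result_size = max(int(np.prod(result_shape)), 1)
    return operand_size <= max_ratio * result_size


# np.sum(a, axis=1) with a.shape == (2, 1000000): tiny result, huge operand.
reduce_lazy = should_defer([(2, 1000000)], (2,))
# An elementwise a + b: the result is as large as each operand.
elementwise_lazy = should_defer([(2, 1000000), (2, 1000000)], (2, 1000000))
```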

But your point stands: we cannot predict every use case that might
break in some subtle way, so we can't switch to evaluating everything
lazily. On the other hand, if it works fine for numpypy, then, assuming
the numpy C API is not an obstruction, we should be able to make it work too.

> Another standard problem with such schemes is making sure that
> exceptions are raised at the correct place. (You can only defer
> operations if you can guarantee that they cannot fail.)
> The PyPy approach is the Right Thing, but it's very very difficult. I
> would either help them, or else try to find another approach that
> gives 90% of the benefit for 10% of the effort. That's just me though
> :-).
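One way to keep exceptions at the right place, sketched under the assumption that shape errors are the failure mode that matters: check broadcasting eagerly when the deferred operation is built and defer only the arithmetic. `defer_add` is hypothetical; `np.broadcast_shapes` needs NumPy 1.20 or later. Data-dependent failures, such as floating-point errors turned into exceptions with `np.errstate(all='raise')`, cannot be checked up front, which is exactly why they limit what can safely be deferred.

```python
import numpy as np


def defer_add(x, y):
    """Return a zero-argument thunk computing x + y; shape errors raise now."""
    x, y = np.asarray(x), np.asarray(y)
    # Raises ValueError here, at the call site, if the shapes can't broadcast.
    np.broadcast_shapes(x.shape, y.shape)

    def thunk():
        # Only the arithmetic itself is deferred.
        return np.add(x, y)

    return thunk


ok = defer_add(np.ones((3, 1)), np.ones(4))  # broadcasts to (3, 4)
failed_early = False
try:
    defer_add(np.ones(3), np.ones(4))  # incompatible shapes
except ValueError:
    failed_early = True
```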

I don't know; it also ties you to that specific platform. What I'd
like is to experiment with various approaches and choose the one that
suits a given set of use cases best.

>>> If we (initially) implement
>>> "deferredness" in a third-party library with an explicit
>>> "deferredarray" type, then that works around both of them: it makes
>>> the rules transparent (operations using that type are deferred,
>>> operations using ndarray aren't), and gives you room to experiment
>>> with different approaches without having to first accomplish some
>>> major change in the numpy code base (and maybe get it wrong and have
>>> to change it again later). That's what I meant when I said in my first
>>> message that the more explicit API actually seemed like it would be
>>> easier for people to use in the long run.
>> Right, ok, that makes sense. I'm not sure how much more
>> experimentation is needed though. Theano, for instance, already does
>> this kind of thing; it's just not as convenient to use as regular
>> numpy code (for the cases that work in both; Theano does other
>> stuff as well).
> Does Theano have the same rules for what is deferred and what isn't
> that Numba and minivect do? Are you sure that the same hook interface
> will work for generating all of their internal representations?

Well, they are all different projects with different goals. Numba (at
this point) creates ufuncs which numpy can then evaluate; Theano is
always lazy until you say "build me a callable function" (which is not
entirely just-in-time, but almost); and minivect is meant to be used as
a specializer and code generator for Cython and, hopefully, other
projects such as numba and theano. But yes, it is possible to generate
all these representations (at least for numexpr, theano and minivect),
since an unsupported operation simply falls back to numpy evaluation,
which already works. I believe numba goes straight from bytecode to
LLVM (correct me if I'm wrong), without an intermediate
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
