[Numpy-discussion] lazy evaluation

mark florisson markflorisson88 at gmail.com
Tue Jun 5 18:06:46 EDT 2012

On 5 June 2012 22:36, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 06/05/2012 10:47 PM, mark florisson wrote:
>> On 5 June 2012 20:17, Nathaniel Smith<njs at pobox.com>  wrote:
>>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson
>>> <markflorisson88 at gmail.com>  wrote:
>>>> On 5 June 2012 17:38, Nathaniel Smith<njs at pobox.com>  wrote:
>>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson
>>>>> <markflorisson88 at gmail.com>  wrote:
>>>>>> On 5 June 2012 14:58, Nathaniel Smith<njs at pobox.com>  wrote:
>>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
>>>>>>> <markflorisson88 at gmail.com>  wrote:
>>>>>>>> It would be great if we implement the NEP listed above, but with a few
>>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and
>>>>>>>> determine when expressions should be evaluated, etc. However, for each
>>>>>>>> user operation, Numpy will call back a user-installed hook
>>>>>>>> implementing some interface, to allow various packages to provide
>>>>>>>> their own hooks to evaluate vector operations however they want. This
>>>>>>>> will include packages such as Theano, which could run things on the
>>>>>>>> GPU, Numexpr, and in the future
>>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an
>>>>>>>> LLVM backend in the future, and possibly integrated with Numba to
>>>>>>>> allow inlining of numba ufuncs). The project above tries to bring
>>>>>>>> together all the different array expression compilers in a
>>>>>>>> single framework, to provide efficient array expressions specialized
>>>>>>>> for any data layout (nditer on steroids if you will, with SIMD,
>>>>>>>> threaded and inlining capabilities).
>>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell
>>>>>>> which operations should be deferred and which should be forced, etc.
>>>>>> Yes, but for the user the difference should not be visible (unless
>>>>>> operations can raise exceptions, in which case you choose the safe
>>>>>> path, or let the user configure what to do).
>>>>>>> While it would be less magical, I think a more explicit API would in
>>>>>>> the end be easier to use... something like
>>>>>>>   a, b, c, d = deferred([a, b, c, d])
>>>>>>>   e = a + b * c  # 'e' is a deferred object too
>>>>>>>   f = np.dot(e, d)  # so is 'f'
>>>>>>>   g = force(f)  # 'g' is an ndarray
>>>>>>>   # or
>>>>>>>   force(f, out=g)
>>>>>>> But at that point, this could easily be an external library, right?
>>>>>>> All we'd need from numpy would be some way for external types to
>>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
>>>>>>> several reasons to want that functionality, and it seems like
>>>>>>> developing these "improved numexpr" ideas would be much easier if they
>>>>>>> didn't require doing deep surgery to numpy itself...
>>>>>> Definitely, but besides monkey-patch-chaining I think some
>>>>>> modifications would be required, but they would be reasonably simple.
>>>>>> Most of the functionality would be handled in one function, which most
>>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add)
>>>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2))) return result;
>>>>>> , which is inserted after argument unpacking and sanity checking. You
>>>>>> could also do a per-module hook, and have the function look at
>>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C
>>>>>> or Cython code.
>>>>>> How did you have overrides in mind?
>>>>> My vague idea is that core numpy operations are about as fundamental
>>>>> for scientific users as the Python builtin operations are, so they
>>>>> should probably be overrideable in a similar way. So we'd teach numpy
>>>>> functions to check for methods named like "__numpy_ufunc__" or
>>>>> "__numpy_dot__" and let themselves be overridden if found. Like how
>>>>> __gt__ and __add__ and stuff work. Or something along those lines.
>>>>>> I also found this thread:
>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
>>>>>> , but I think you want more than just to override ufuncs, you want
>>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should
>>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that
>>>>>> itself may also be lazy), etc). You don't want some funny object back
>>>>>> that doesn't work with things which are not overridden in numpy.
>>>>> My point is that probably numpy should *not* govern the decision about
>>>>> what stuff should be lazy and what should be evaluated; that should be
>>>>> governed by some combination of the user and
>>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make
>>>>> those decisions obvious and explicit. (And if the funny objects had an
>>>>> __array_interface__ attribute that automatically forced evaluation
>>>>> when accessed, then they'd work fine with code that was expecting an
>>>>> array, or if they were assigned to a "real" ndarray, etc.)
>>>> That's disappointing though, since the performance drawbacks can
>>>> severely limit the usefulness for people with big data sets. Ideally,
>>>> you would take your intuitive numpy code, and make it go fast, without
>>>> jumping through hoops. Numpypy has lazy evaluation; I don't know how
>>>> good a job it does, but it does mean you can finally get fast numpy
>>>> code in an intuitive way (and even run it on a GPU if that is possible
>>>> and beneficial).
>>> All of these proposals require the user to jump through hoops -- the
>>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more
>>> importantly, a set of rules that people have to learn and keep in mind
>>> for which numpy operations are affected, which ones aren't, which
>>> operations can't be performed while deferredstate is True, etc. So
>>> this has two problems: (1) these rules are opaque, (2) it's far from
>>> clear what the rules should be.
>> Right, I guess I should have commented on that. I don't think the
>> deferredstate stuff is needed at all, execution can always be deferred
>> as long as it does not affect semantics. So if something is marked
>> readonly because it is used in an expression and then written to, you
>> evaluate the expression and then perform the write. The only way to
>> break stuff, I think, would be to use pointers through the buffer
>> interface or PyArray_DATA and not respect the sudden readonly
>> property. A deferred expression is only evaluated once in any valid
>> GIL-holding context (so it shouldn't break threads either).
> I think Nathaniel's point is that the point where you get a 10-second
> pause to wait for computation is part of the semantics of current NumPy:
>   print 'Starting computation'
>   z = (x + y).sum()
>   print 'Computation done'
>   print 'Result was', z
> I think that if this wasn't the case, newbies would be tripped up a
> lot and things would feel a lot less intuitive. Certainly when working
> from the IPython command line.
> Also, to remain sane in IPython (or when using a debugger, etc.), I'd want
> "print z"
> to print something like "unevaluated array", not to trigger a
> computation. Same with str(z) and so on.

I guess you could detect that at runtime, or just make it
configurable. As for triggering computation somewhere else, I guess I
find it preferable to horrible performance :)
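
A minimal sketch of the "make it configurable" option, under stated
assumptions: the Deferred class and its force_on_print switch are
hypothetical names used only for illustration, and nothing like them exists
in NumPy. Printing leaves the expression unevaluated unless the switch is
set or the value has already been forced.

```python
import numpy as np


class Deferred:
    """Hypothetical wrapper around a computation that has not run yet."""

    # Hypothetical configuration switch: should printing force evaluation?
    force_on_print = False

    def __init__(self, thunk):
        self._thunk = thunk      # zero-argument callable producing the result
        self._value = None       # cached result once forced

    def force(self):
        # Evaluate at most once, then reuse the cached result.
        if self._value is None:
            self._value = self._thunk()
        return self._value

    def __repr__(self):
        # str() falls back to __repr__, so print(z) follows the same rule.
        if self._value is not None or self.force_on_print:
            return repr(self.force())
        return "<unevaluated array>"


x = np.arange(3.0)
y = np.ones(3)
z = Deferred(lambda: (x + y).sum())
print(z)          # does not trigger the computation
print(z.force())  # explicit evaluation
```

With the switch left off, inspecting z in IPython or a debugger costs
nothing; setting Deferred.force_on_print = True restores the eager-looking
behaviour at the price of a pause at print time.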

> I don't think a context manager modifying thread-local global state like
> with np.lazy:
>     ...
> would be horribly intrusive.
> But I also think it'd be good to start with being very explicit (x =
> np.lazy_multiply(a, b); compute(x)) -- such an API should be available
> anyway -- and then have the discussion once that works.

Maybe that's the best way forward. I guess I'd prefer an import
numpy.lazy_numpy as numpy in that case. I don't really like the with
statement here, since ideally you'd just experiment with swapping in
another module and see if your code still runs fine.
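
For concreteness, here is a rough sketch of what the very explicit API
could look like. It is only an illustration under assumptions: Lazy, lazy,
lazy_multiply and compute are hypothetical helpers written for this
example, not existing NumPy functions. The one real hook it relies on is
the __array__ protocol, which lets code that expects an ndarray force
evaluation.

```python
import numpy as np


class Lazy:
    """Hypothetical deferred expression node; not part of NumPy."""

    def __init__(self, func, *args):
        self.func = func         # NumPy callable to apply when forced
        self.args = args         # operands: Lazy nodes or array-likes
        self._value = None       # cached result

    def compute(self):
        # Evaluate operands recursively, then apply func exactly once.
        if self._value is None:
            args = [a.compute() if isinstance(a, Lazy) else a
                    for a in self.args]
            self._value = self.func(*args)
        return self._value

    def __add__(self, other):
        return Lazy(np.add, self, other)

    def __mul__(self, other):
        return Lazy(np.multiply, self, other)

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray and friends: forces evaluation.
        return np.asarray(self.compute(), dtype=dtype)


def lazy(arr):
    """Wrap a concrete array as a leaf of a deferred expression."""
    return Lazy(np.asarray, arr)


def lazy_multiply(a, b):
    return Lazy(np.multiply, a, b)


def compute(x):
    return x.compute()


a = np.arange(4.0)
b = np.full(4, 2.0)
x = lazy_multiply(a, b) + lazy(a)   # nothing is evaluated yet
print(compute(x))                   # evaluation happens here
```

A module exposing the same names as numpy but returning such objects is
what would make the module-swapping experiment possible; whether every
numpy function can be mirrored that way is left open here.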

> Dag