<div dir="ltr"><div><div><div><div>Julian,<br><br></div>This is really, really cool!<br><br></div>I have been wanting something like this for years (over a decade? wow!), but always thought it would require hacking the interpreter to intercept operations. This is a really inspired idea, and could buy numpy a lot of performance.<br><br></div>I'm afraid I can't say much about the implementation details -- but great work!<br><br></div>-Chris<br><br><div><div><br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 30, 2016 at 2:50 PM, Julian Taylor <span dir="ltr"><<a href="mailto:jtaylor.debian@googlemail.com" target="_blank">jtaylor.debian@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 30.09.2016 23:09, <a href="mailto:josef.pktd@gmail.com">josef.pktd@gmail.com</a> wrote:<br>

> On Fri, Sep 30, 2016 at 9:38 AM, Julian Taylor<br>

> <<a href="mailto:jtaylor.debian@googlemail.com">jtaylor.debian@googlemail.com</a><wbr>> wrote:<br>

>> hi,<br>

>> Temporary arrays generated in expressions are expensive as they imply<br>

>> extra memory bandwidth which is the bottleneck in most numpy operations.<br>

>> For example:<br>

>><br>

>> r = a + b + c<br>

>><br>

>> creates the a + b temporary and then adds c to it.<br>

>> This can be rewritten to be more efficient using inplace operations:<br>

>><br>

>> r = b + c<br>

>> r += a<br>
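A minimal sketch of the rewrite above, assuming NumPy is importable as np. The array size and values are arbitrary illustrations, not taken from the original post. It only checks that the hand-written inplace form gives the same result as the one-line expression.<br>

```python
import numpy as np

# Arbitrary example data; any same-shape float arrays would do.
a = np.arange(5, dtype=np.float64)
b = np.ones(5)
c = np.full(5, 2.0)

# One-line form: allocates a temporary for the first addition
# and a second array for the final result.
r_expr = a + b + c

# Hand-written inplace form: only one new array is allocated.
r_inplace = b + c
r_inplace += a

same = np.array_equal(r_expr, r_inplace)
```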

><br>

> general question (I wouldn't understand the details even if I looked.)<br>

><br>

> how is this affected by broadcasting and type promotion?<br>

><br>

> One of the main reasons that I don't like to use inplace operations in<br>

> general is that I'm often not sure when type promotion occurs and when<br>

> arrays expand during broadcasting.<br>

><br>

> for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.<br>

> another case when I switch away from broadcasting is when b + c is int<br>

> or bool and a is float. Thankfully, we get error messages for casting<br>

> now.<br>

<br>

</span>the temporary is only avoided when the casting follows the safe rule, so<br>

it should be the same as what you get without inplace operations. E.g.<br>

float32-temporary + float64 will not be converted to the unsafe float32<br>

+= float64 which a normal inplace operation would allow. But<br>

float64-temp + float32 is transformed.<br>
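As a rough illustration of the casting point (a sketch using public NumPy calls only, not the code in the PR), the safe rule can be inspected with np.can_cast, and the difference between the out-of-place and the explicit inplace result dtype can be observed directly. The variable names are made up for the example.<br>

```python
import numpy as np

# "Safe" casting never loses precision: float32 -> float64 is safe,
# float64 -> float32 is not.
f32_to_f64_safe = np.can_cast(np.float32, np.float64, casting="safe")
f64_to_f32_safe = np.can_cast(np.float64, np.float32, casting="safe")

t32 = np.ones(3, dtype=np.float32)
x64 = np.full(3, 0.1, dtype=np.float64)

# Out-of-place: the result is promoted to float64.
out_dtype = (t32 + x64).dtype

# Explicit inplace: allowed under the default "same_kind" rule,
# but the result is stored as float32.
t32 += x64
inplace_dtype = t32.dtype
```

This is why float32-temporary + float64 has to stay out-of-place: reusing the float32 buffer would change the dtype the user gets back.<br>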

<br>

Currently the only broadcasting that will be transformed is temporary +<br>

scalar value, otherwise it will only work on matching array sizes.<br>

Nothing really prevents full broadcasting, but<br>

it's not implemented yet in the PR.<br>
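A small sketch of the shape side of this, assuming a hypothetical helper named temp_can_hold_result (it is not part of NumPy or the PR). It only tests whether the temporary already has the broadcast result shape, which is a necessary condition for reusing it as the output. The PR as described above is stricter and handles only the scalar and matching-size cases.<br>

```python
import numpy as np

def temp_can_hold_result(temp, other):
    """Hypothetical helper: True if temp already has the broadcast result shape."""
    return np.broadcast(temp, other).shape == temp.shape

temp = np.ones(4)           # stands in for a 1-D b + c temporary
scalar = 2.0
same_size = np.zeros(4)
a_2d = np.zeros((3, 4))

fits_scalar = temp_can_hold_result(temp, scalar)
fits_same = temp_can_hold_result(temp, same_size)
# The result would have shape (3, 4), so the 1-D temporary is too small.
fits_2d = temp_can_hold_result(temp, a_2d)
```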

<div class="HOEnZb"><div class="h5"><br>

><br>

>><br>

>> This saves some memory bandwidth and can speed up the operation by 50%<br>

>> for very large arrays or even more if the inplace operation allows it to<br>

>> be completed entirely in the CPU cache.<br>

><br>

> I didn't realize the difference can be so large. That would make<br>

> streamlining some code worth the effort.<br>

><br>

> Josef<br>

><br>

><br>

>><br>

>> The problem is that inplace operations are a lot less readable so they<br>

>> are often only used in well optimized code. But due to Python's<br>

>> refcounting semantics we can actually do some inplace conversions<br>

>> transparently.<br>

>> If an operand in python has a reference count of one it must be a<br>

>> temporary so we can use it as the destination array. CPython itself does<br>

>> this optimization for string concatenations.<br>

>><br>

>> In numpy we have the issue that we can be called from the C-API directly<br>

>> where the reference count may be one for other reasons.<br>

>> To solve this we can check the backtrace until the python frame<br>

>> evaluation function. If there are only numpy and python functions in<br>

>> between that and our entry point we should be able to elide the temporary.<br>

>><br>

>> This PR implements this:<br>

>> <a href="https://github.com/numpy/numpy/pull/7997" rel="noreferrer" target="_blank">https://github.com/numpy/<wbr>numpy/pull/7997</a><br>

>><br>

>> It currently only supports Linux with glibc (which has reliable<br>

>> backtraces via unwinding) and maybe macOS depending on how good their<br>

>> backtrace is. On Windows the backtrace APIs are different and I don't<br>

>> know them but in theory it could also be done there.<br>

>><br>

>> A problem is that checking the backtrace is quite expensive, so should<br>

>> only be enabled when the involved arrays are large enough for it to be<br>

>> worthwhile. In my testing this seems to be around 180-300KiB sized<br>

>> arrays, basically where they start spilling out of the CPU L2 cache.<br>
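To make the size cutoff concrete, here is a hedged sketch of such a check written in Python. The constant ELIDE_MIN_BYTES and the function large_enough_to_elide are hypothetical names, and the value is simply the low end of the range quoted above, not a number taken from the PR.<br>

```python
import numpy as np

# Hypothetical cutoff: the low end of the 180-300KiB range quoted above.
ELIDE_MIN_BYTES = 180 * 1024

def large_enough_to_elide(arr, min_bytes=ELIDE_MIN_BYTES):
    """Hypothetical check: is the array big enough to justify the backtrace cost?"""
    return arr.nbytes >= min_bytes

small = np.zeros(1000)       # 1000 float64 values, 8000 bytes
large = np.zeros(100_000)    # 100000 float64 values, 800000 bytes

small_ok = large_enough_to_elide(small)
large_ok = large_enough_to_elide(large)
```

With float64 data, this cutoff corresponds to 23040 elements.<br>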

>><br>

>> I made a little crappy benchmark script to test this cutoff in this branch:<br>

>> <a href="https://github.com/juliantaylor/numpy/tree/elide-bench" rel="noreferrer" target="_blank">https://github.com/<wbr>juliantaylor/numpy/tree/elide-<wbr>bench</a><br>

>><br>

>> If you are interested you can run it with:<br>

>> python setup.py build_ext -j 4 --inplace<br>

>> ipython --profile=null check.ipy<br>

>><br>

>> At the end it will plot the ratio between elided and non-elided runtime.<br>

>> It should get larger than one around 180KiB on most cpus.<br>

>><br>

>> If no one points out some flaw in the approach, I'm hoping to get this<br>

>> into the next numpy version.<br>

>><br>

>> cheers,<br>

>> Julian<br>

>><br>

>><br>

>> ______________________________<wbr>_________________<br>

>> NumPy-Discussion mailing list<br>

>> <a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>

>> <a href="https://mail.scipy.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.scipy.org/<wbr>mailman/listinfo/numpy-<wbr>discussion</a><br>

>><br>


><br>

<br>

<br>

</div></div><br>

<br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R            (206) 526-6959   voice<br>7600 Sand Point Way NE   (206) 526-6329   fax<br>Seattle, WA  98115       (206) 526-6317   main reception<br><br><a href="mailto:Chris.Barker@noaa.gov" target="_blank">Chris.Barker@noaa.gov</a></div>

</div>