[Numpy-discussion] automatically avoiding temporary arrays

Robert McLeod robbmcleod at gmail.com
Wed Oct 5 08:06:06 EDT 2016

On Wed, Oct 5, 2016 at 1:11 PM, srean <srean.list at gmail.com> wrote:

> Thanks Francesc, Robert for giving me a broader picture of where this fits
> in. I believe numexpr does not  handle slicing, so that might be another
> thing to look at.

Dereferencing would be relatively simple to add into numexpr, as it would
just be some getattr() calls.  Personally I will add that at some point
because it will clean up my code.

Slicing, maybe only for continuous blocks in memory?



would be possible, but

imageStack[:, ::2, ::2]

would not be trivial (I think...).  I seem to remember someone asked David
Cooke about slicing and he said something along the lines of, "that's what
Numba is for."  Perhaps NumPy backended by Numba is more so what you are
looking for, as it hooks into the byte compiler? The main advantage of
numexpr is that a series of numpy functions in <expression> can be enclosed
in  ne.evaluate( "<expression>" ) and it provides a big acceleration for
little programmer effort, but it's not nearly as sophisticated as Numba or

> On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod <robbmcleod at gmail.com>
> wrote:
>> As Francesc said, Numexpr is going to get most of its power through
>> grouping a series of operations so it can send blocks to the CPU cache and
>> run the entire series of operations on the cache before returning the block
>> to system memory.  If it was just used to back-end NumPy, it would only
>> gain from the multi-threading portion inside each function call.
> Is that so ?
> I thought numexpr also cuts down on number of temporary buffers that get
> filled (in other words copy operations) if the same expression was written
> as series of operations. My understanding can be wrong, and would
> appreciate correction.
> The 'out' parameter in ufuncs can eliminate extra temporaries but its not
> composable. Right now I have to manually carry along the array where the in
> place operations take place. I think the goal here is to eliminate that.

 The numexpr virtual machine does create temporaries where needed when it
parses the abstract syntax tree for all the operations it has to do.  I
believe the main advantage is that the temporaries are created on the CPU
cache, and not in system memory. It's certainly true that numexpr doesn't
create a lot of OP_COPY operations, rather it's optimized to minimize them,
so probably it's fewer ops than naive successive calls to numpy within
python, but I'm unsure if there's any difference in operation count between
a hand-optimized numpy with out= set and numexpr.  Numexpr just does it for

This blog post from Tim Hochberg is useful for understanding the
performance advantages of blocking versus multithreading:



Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch <robert.mcleod at ethz.ch>
robbmcleod at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20161005/baae3a52/attachment.html>

More information about the NumPy-Discussion mailing list