[Numpy-discussion] automatically avoiding temporary arrays

Julian Taylor jtaylor.debian at googlemail.com
Mon Oct 3 06:16:48 EDT 2016


The problem with this approach is that we don't really want numpy
holding on to hundreds of megabytes of memory by default, so it would
need to be a user option.
A context manager could work too, but it would probably encourage
premature optimization.
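
If it were opt-in, the context manager could look something like this
shape-wise (a minimal sketch of the scoping only; the flag below is
hypothetical and the cache itself is left out):

import contextlib

# hypothetical flag a temporary-array cache would consult;
# nothing like this exists in numpy today
_cache_enabled = False

@contextlib.contextmanager
def cache_temporaries():
    """Opt in to caching freed array buffers within a block (sketch)."""
    global _cache_enabled
    _cache_enabled = True
    try:
        yield
    finally:
        _cache_enabled = False

# usage: with cache_temporaries(): r = a + b + c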

Very new Linux kernels (4.5+) finally support MADV_FREE, which gives
memory back to the system but does not require faulting it back in if
nothing else needed it. So this might be an option now.
But libc allocators will probably start using it at some point too, and
then numpy won't need to do this itself.
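
From Python the system call looks roughly like this (a minimal sketch
using mmap.madvise, which only appeared later in Python 3.8; numpy
itself would call madvise(2) from C):

import mmap

# anonymous 256 MiB scratch mapping held across expressions
buf = mmap.mmap(-1, 256 * 1024 * 1024)
# ... use buf as temporary storage ...
# the pages stay mapped, but the kernel may reclaim them lazily
# if memory gets tight (requires Linux 4.5+)
buf.madvise(mmap.MADV_FREE)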


On 02.10.2016 15:10, Benjamin Root wrote:
> Just thinking aloud, an idea I had recently takes a different approach.
> The problem with temporaries isn't so much that they exist, but rather
> that they keep getting malloc'ed and freed. What if numpy kept a small
> LRU cache of weakref'ed temporaries? Whenever a new numpy array is
> requested, numpy could see if there is already one in its cache of
> matching size and use it. If you think about it, expressions that result
> in many temporaries would quite likely have many of them being the same
> size in memory.
> 
> Don't know how feasible it would be to implement though.
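> 
> Something like this, just to sketch it (completely untested, names made
> up; a real version would need weakrefs, LRU eviction and locking):
> 
> import numpy as np
> from collections import defaultdict
> 
> # hypothetical pool of released buffers, keyed by size in bytes
> _pool = defaultdict(list)
> 
> def pool_empty(shape, dtype=np.float64):
>     """Like np.empty, but reuses a released buffer of matching size."""
>     nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
>     if _pool[nbytes]:
>         return _pool[nbytes].pop().view(dtype).reshape(shape)
>     return np.empty(shape, dtype=dtype)
> 
> def pool_release(arr):
>     """Hand a contiguous array's buffer back to the pool."""
>     _pool[arr.nbytes].append(arr.reshape(-1).view(np.uint8))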
> 
> Cheers!
> Ben Root
> 
> 
>     On Sat, Oct 1, 2016 at 2:38 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> 
>     Julian,
> 
>     This is really, really cool!
> 
>     I have been wanting something like this for years (over a decade?
>     wow!), but always thought it would require hacking the interpreter
>     to intercept operations. This is a really inspired idea, and could
>     buy numpy a lot of performance.
> 
>     I'm afraid I can't say much about the implementation details -- but
>     great work!
> 
>     -Chris
> 
> 
> 
> 
>     On Fri, Sep 30, 2016 at 2:50 PM, Julian Taylor
>     <jtaylor.debian at googlemail.com> wrote:
> 
>         On 30.09.2016 23:09, josef.pktd at gmail.com wrote:
>         > On Fri, Sep 30, 2016 at 9:38 AM, Julian Taylor
>         > <jtaylor.debian at googlemail.com> wrote:
>         >> hi,
>         >> Temporary arrays generated in expressions are expensive, as they imply
>         >> extra memory bandwidth, which is the bottleneck in most numpy operations.
>         >> For example:
>         >>
>         >> r = a + b + c
>         >>
>         >> creates the b + c temporary and then adds a to it.
>         >> This can be rewritten to be more efficient using inplace operations:
>         >>
>         >> r = b + c
>         >> r += a
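>         >>
>         >> To see the effect yourself, a rough benchmark sketch (not the
>         >> benchmark script mentioned below):
>         >>
>         >> import numpy as np
>         >> import timeit
>         >>
>         >> a, b, c = (np.ones(10**7) for _ in range(3))
>         >>
>         >> def with_temporary():
>         >>     return a + b + c    # allocates b + c, then a + tmp
>         >>
>         >> def with_inplace():
>         >>     r = b + c
>         >>     r += a              # reuses r's buffer: one less allocation
>         >>     return r
>         >>
>         >> print(timeit.timeit(with_temporary, number=20))
>         >> print(timeit.timeit(with_inplace, number=20))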
>         >
>         > general question (I wouldn't understand the details even if I looked)
>         >
>         > how is this affected by broadcasting and type promotion?
>         >
>         > One of the main reasons that I don't like to use inplace operations
>         > in general is that I'm often not sure when type promotion occurs and
>         > when arrays expand during broadcasting.
>         >
>         > for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.
>         > another case where I switch away from inplace operations is when
>         > b + c is int or bool and a is float. Thankfully, we get error
>         > messages for casting now.
> 
>         the temporary is only avoided when the casting follows the safe
>         rule, so it should be the same as what you get without inplace
>         operations. E.g. a float32-temporary + float64 will not be
>         converted to the unsafe float32 += float64, which a normal
>         inplace operation would allow. But float64-temp + float32 is
>         transformed.
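>
>         Concretely, just to illustrate the rule with those dtypes:
>
>         import numpy as np
>
>         a32 = np.ones(10**6, dtype=np.float32)
>         b64 = np.ones(10**6, dtype=np.float64)
>
>         # (a32 + a32) + b64: the float32 temporary is not reused, since
>         # float32 += float64 would be an unsafe cast; result is float64
>         r1 = a32 + a32 + b64
>
>         # (b64 + b64) + a32: the float64 temporary is reused, since
>         # casting float32 up to float64 is safe
>         r2 = b64 + b64 + a32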
> 
>         Currently the only broadcasting that will be transformed is
>         temporary + scalar value; otherwise it only works on matching
>         array shapes. There is not really anything that prevents full
>         broadcasting, but it's not implemented yet in the PR.
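>
>         In terms of expressions (illustrative):
>
>         import numpy as np
>
>         x = np.ones((1000, 1000))
>         y = np.ones(1000)
>
>         r = (x + x) + 2.0   # temporary + scalar: can be elided
>         r = (x + x) + x     # matching shapes: can be elided
>         r = (x + x) + y     # broadcast against y: not elided yet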
> 
>         >
>         >>
>         >> This saves some memory bandwidth and can speed up the operation
>         >> by 50% for very large arrays, or even more if the inplace
>         >> operation allows it to be completed entirely in the CPU cache.
>         >
>         > I didn't realize the difference can be so large. That would make
>         > streamlining some code worth the effort.
>         >
>         > Josef
>         >
>         >
>         >>
>         >> The problem is that inplace operations are a lot less readable,
>         >> so they are often only used in well-optimized code. But due to
>         >> python's refcounting semantics we can actually do some inplace
>         >> conversions transparently.
>         >> If an operand in python has a reference count of one it must be
>         >> a temporary, so we can use it as the destination array. CPython
>         >> itself does this optimization for string concatenation.
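>         >>
>         >> Conceptually, for r = a + b + c the elision turns the second
>         >> add into an explicit-output add (a sketch of the idea, not the
>         >> C implementation):
>         >>
>         >> import numpy as np
>         >>
>         >> a = b = c = np.ones(10**6)
>         >> # in "a + b + c" the intermediate b + c reaches the second add
>         >> # with refcount 1, so numpy can do the equivalent of:
>         >> tmp = b + c
>         >> r = np.add(a, tmp, out=tmp)  # tmp's buffer doubles as output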
>         >>
>         >> In numpy we have the issue that we can be called from the C-API
>         >> directly, where the reference count may be one for other reasons.
>         >> To solve this we can check the backtrace up to the python frame
>         >> evaluation function. If there are only numpy and python functions
>         >> between that and our entry point, we should be able to elide the
>         >> temporary.
>         >>
>         >> This PR implements this:
>         >> https://github.com/numpy/numpy/pull/7997
>         >>
>         >> It currently only supports Linux with glibc (which has reliable
>         >> backtraces via unwinding) and maybe macOS, depending on how good
>         >> its backtrace is. On Windows the backtrace APIs are different and
>         >> I don't know them, but in theory it could also be done there.
>         >>
>         >> A problem is that checking the backtrace is quite expensive, so
>         >> it should only be enabled when the involved arrays are large
>         >> enough for it to be worthwhile. In my testing this seems to be
>         >> around 180-300 KiB sized arrays, basically where they start
>         >> spilling out of the CPU L2 cache.
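>         >>
>         >> For scale (simple arithmetic, not a number from the PR):
>         >>
>         >> 256 * 1024 // 8  # 256 KiB of float64 = 32768 elements,
>         >>                  # so roughly np.ones(32768) and up qualify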
>         >>
>         >> I made a little crappy benchmark script to test this cutoff in
>         >> this branch:
>         >> https://github.com/juliantaylor/numpy/tree/elide-bench
>         >>
>         >> If you are interested you can run it with:
>         >> python setup.py build_ext -j 4 --inplace
>         >> ipython --profile=null check.ipy
>         >>
>         >> At the end it will plot the ratio between elided and non-elided
>         >> runtime. It should get larger than one around 180 KiB on most
>         >> CPUs.
>         >>
>         >> If no one points out some flaw in the approach, I'm hoping to
>         >> get this into the next numpy version.
>         >>
>         >> cheers,
>         >> Julian
>         >>
>         >>
> 
> 
> 
> 
> 
> 
> 
>     -- 
> 
>     Christopher Barker, Ph.D.
>     Oceanographer
> 
>     Emergency Response Division
>     NOAA/NOS/OR&R            (206) 526-6959   voice
>     7600 Sand Point Way NE   (206) 526-6329   fax
>     Seattle, WA  98115       (206) 526-6317   main reception
> 
>     Chris.Barker at noaa.gov
> 
> 
> 
> 
> 



