[Numpy-discussion] automatically avoiding temporary arrays

Hush Hush lxx9xx at gmail.com
Sun Oct 2 20:15:13 EDT 2016


The same idea was published two years ago:

http://hiperfit.dk/pdf/Doubling.pdf


On Mon, Oct 3, 2016 at 8:53 AM, <numpy-discussion-request at scipy.org> wrote:

> Today's Topics:
>
>    1. Re: automatically avoiding temporary arrays (Benjamin Root)
>    2. Re: Dropping sourceforge for releases. (David Cournapeau)
>    3. Re: Dropping sourceforge for releases. (Vincent Davis)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 2 Oct 2016 09:10:45 -0400
> From: Benjamin Root <ben.v.root at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Subject: Re: [Numpy-discussion] automatically avoiding temporary
>         arrays
> Message-ID:
>         <CANNq6Fn9eGOnrSGz8Duo8J9oTe3N6xTCSqDDP7nRyNqjFKpAjQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just thinking aloud, an idea I had recently takes a different approach. The
> problem with temporaries isn't so much that they exist, but rather that
> they keep getting malloc'ed and cleared. What if numpy kept a small LRU
> cache of weakref'ed temporaries? Whenever a new numpy array is requested,
> numpy could see if there is already one of matching size in its cache and
> use it. If you think about it, expressions that result in many temporaries
> would quite likely have many of them being the same size in memory.
>
> Don't know how feasible it would be to implement though.
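
A rough sketch of what such a cache might look like, in pure Python. Everything in it is made up for illustration and is not anything that exists in numpy: the BufferCache name, its acquire/release methods, keying on size in bytes, and the capacity of 4. To stay short it also holds plain strong references instead of weakrefs.

```python
from collections import OrderedDict


class BufferCache:
    """Hypothetical LRU cache of scratch buffers, keyed by size in bytes."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._free = OrderedDict()  # nbytes -> list of idle buffers

    def acquire(self, nbytes):
        """Return an idle buffer of exactly nbytes, or allocate a new one."""
        idle = self._free.get(nbytes)
        if idle:
            buf = idle.pop()
            if not idle:
                del self._free[nbytes]
            return buf
        return bytearray(nbytes)

    def release(self, buf):
        """Hand a buffer back; evict from the least recently used size if full."""
        nbytes = len(buf)
        self._free.setdefault(nbytes, []).append(buf)
        self._free.move_to_end(nbytes)
        while sum(len(v) for v in self._free.values()) > self.capacity:
            oldest = next(iter(self._free))
            self._free[oldest].pop()
            if not self._free[oldest]:
                del self._free[oldest]


cache = BufferCache()
first = cache.acquire(8000)
cache.release(first)
second = cache.acquire(8000)   # same size: the idle buffer is reused
third = cache.acquire(8000)    # nothing idle any more: fresh allocation
```

A reused buffer still holds whatever was written to it last, so a cache like this would only suit temporaries that get fully overwritten. With numpy, np.frombuffer(buf, dtype=np.float64) would give a writable array view over such a bytearray.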
>
> Cheers!
> Ben Root
>
>
> On Sat, Oct 1, 2016 at 2:38 PM, Chris Barker <chris.barker at noaa.gov>
> wrote:
>
> > Julian,
> >
> > This is really, really cool!
> >
> > I have been wanting something like this for years (over a decade? wow!),
> > but always thought it would require hacking the interpreter to intercept
> > operations. This is a really inspired idea, and could buy numpy a lot of
> > performance.
> >
> > I'm afraid I can't say much about the implementation details -- but great
> > work!
> >
> > -Chris
> >
> >
> >
> >
> > On Fri, Sep 30, 2016 at 2:50 PM, Julian Taylor <
> > jtaylor.debian at googlemail.com> wrote:
> >
> >> On 30.09.2016 23:09, josef.pktd at gmail.com wrote:
> >> > On Fri, Sep 30, 2016 at 9:38 AM, Julian Taylor
> >> > <jtaylor.debian at googlemail.com> wrote:
> >> >> hi,
> >> >> Temporary arrays generated in expressions are expensive as they imply
> >> >> extra memory bandwidth, which is the bottleneck in most numpy
> >> >> operations.
> >> >> For example:
> >> >>
> >> >> r = a + b + c
> >> >>
> >> >> creates the a + b temporary and then adds c to it.
> >> >> This can be rewritten to be more efficient using inplace operations:
> >> >>
> >> >> r = b + c
> >> >> r += a
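
For reference, a small runnable version of the two forms quoted above. The array contents and the names r_expr and buf are only illustrative; the point is that the in-place form writes into the array that b + c produced instead of allocating another one.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5)
c = np.full(5, 2.0)

# One expression: an intermediate array is allocated and then discarded.
r_expr = a + b + c

# Explicit in-place form: the array produced by b + c is reused for the result.
r = b + c
buf = r            # remember which array object holds the data
r += a             # writes into the existing array, no new allocation

same_object = r is buf
same_values = np.array_equal(r, r_expr)
```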
> >> >
> >> > general question (I wouldn't understand the details even if I looked.)
> >> >
> >> > how is this affected by broadcasting and type promotion?
> >> >
> >> > One of the main reasons that I don't like to use inplace operations in
> >> > general is that I'm often not sure when type promotion occurs and when
> >> > arrays expand during broadcasting.
> >> >
> >> > for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.
> >> > another case when I switch away from broadcasting is when b + c is int
> >> > or bool and a is float. Thankfully, we get error messages for casting
> >> > now.
> >>
> >> the temporary is only avoided when the casting follows the safe rule, so
> >> it should be the same as what you get without inplace operations. E.g.
> >> float32-temporary + float64 will not be converted to the unsafe float32
> >> += float64 which a normal inplace operation would allow. But
> >> float64-temp + float32 is transformed.
> >>
> >> Currently the only broadcasting that will be transformed is temporary +
> >> scalar value, otherwise it will only work on matching array sizes.
> >> There is not really anything that prevents full broadcasting, but
> >> it's not implemented yet in the PR.
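
The conditions described here can be approximated from Python with np.result_type and np.can_cast. The could_reuse helper below is hypothetical, written only to mirror the description (safe casting, and either a scalar or a matching shape); it is not taken from the PR.

```python
import numpy as np


def could_reuse(temp, other):
    """Hypothetical check: may `temp` hold the result of temp + other?"""
    result_dtype = np.result_type(temp, other)
    same_shape = np.ndim(other) == 0 or np.shape(other) == temp.shape
    return same_shape and np.can_cast(result_dtype, temp.dtype, casting="safe")


f32 = np.ones(4, dtype=np.float32)
f64 = np.ones(4, dtype=np.float64)

reuse_f32_temp = could_reuse(f32, f64)   # float64 result does not fit safely
reuse_f64_temp = could_reuse(f64, f32)   # float32 operand fits into float64
reuse_scalar = could_reuse(f64, 2.0)     # temporary + scalar value
reuse_mismatch = could_reuse(f64, np.ones((3, 4)))  # would need broadcasting
```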
> >>
> >> >
> >> >>
> >> >> This saves some memory bandwidth and can speed up the operation by 50%
> >> >> for very large arrays, or even more if the inplace operation allows it
> >> >> to be completed entirely in the CPU cache.
> >> >
> >> > I didn't realize the difference can be so large. That would make
> >> > streamlining some code worth the effort.
> >> >
> >> > Josef
> >> >
> >> >
> >> >>
> >> >> The problem is that inplace operations are a lot less readable, so they
> >> >> are often only used in well-optimized code. But due to Python's
> >> >> refcounting semantics we can actually do some inplace conversions
> >> >> transparently.
> >> >> If an operand in Python has a reference count of one it must be a
> >> >> temporary, so we can use it as the destination array. CPython itself
> >> >> does this optimization for string concatenations.
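
Reference counts can be observed from Python with sys.getrefcount. The absolute numbers include the reference held by the call itself and differ between CPython versions, so the sketch below only compares them. The Operand class is a stand-in invented for the example.

```python
import sys


class Operand:
    """Stand-in for an array; any Python object has a reference count."""


x = Operand()
count_one_name = sys.getrefcount(x)

y = x                                  # a second name for the same object
count_two_names = sys.getrefcount(x)

del y
count_after_del = sys.getrefcount(x)
```

Each extra name adds one to the count. An intermediate result in an expression is bound to no name at all, which is why a count of one at the C level can identify it.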
> >> >>
> >> >> In numpy we have the issue that we can be called from the C-API
> >> >> directly, where the reference count may be one for other reasons.
> >> >> To solve this we can check the backtrace until the Python frame
> >> >> evaluation function. If there are only numpy and Python functions in
> >> >> between that and our entry point we should be able to elide the
> >> >> temporary.
> >> >>
> >> >> This PR implements this:
> >> >> https://github.com/numpy/numpy/pull/7997
> >> >>
> >> >> It currently only supports Linux with glibc (which has reliable
> >> >> backtraces via unwinding) and maybe macOS depending on how good their
> >> >> backtrace is. On Windows the backtrace APIs are different and I don't
> >> >> know them, but in theory it could also be done there.
> >> >>
> >> >> A problem is that checking the backtrace is quite expensive, so it
> >> >> should only be enabled when the involved arrays are large enough for
> >> >> it to be worthwhile. In my testing this seems to be around 180-300 KiB
> >> >> sized arrays, basically where they start spilling out of the CPU L2
> >> >> cache.
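
To put the cutoff in terms of element counts, a quick calculation. The helper names and the 256 KiB value in large_enough are assumptions picked from inside the quoted 180-300 KiB range, not values from the PR.

```python
def elements_for(kib, itemsize=8):
    """Number of elements of `itemsize` bytes that fit in `kib` KiB."""
    return kib * 1024 // itemsize


def large_enough(n_elements, itemsize=8, cutoff_kib=256):
    """Hypothetical size gate; cutoff_kib is an illustrative value."""
    return n_elements * itemsize >= cutoff_kib * 1024


low = elements_for(180)    # 23040 float64 elements
high = elements_for(300)   # 38400 float64 elements
```

So for float64 data the quoted range corresponds to arrays of roughly 23,000 to 38,000 elements.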
> >> >>
> >> >> I made a little crappy benchmark script to test this cutoff in this
> >> >> branch:
> >> >> https://github.com/juliantaylor/numpy/tree/elide-bench
> >> >>
> >> >> If you are interested you can run it with:
> >> >> python setup.py build_ext -j 4 --inplace
> >> >> ipython --profile=null check.ipy
> >> >>
> >> >> At the end it will plot the ratio between elided and non-elided
> >> >> runtime. It should get larger than one around 180 KiB on most CPUs.
> >> >>
> >> >> If no one points out some flaw in the approach, I'm hoping to get
> >> >> this into the next numpy version.
> >> >>
> >> >> cheers,
> >> >> Julian
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> NumPy-Discussion mailing list
> >> >> NumPy-Discussion at scipy.org
> >> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >>
> >> >
> >
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R            (206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115       (206) 526-6317   main reception
> >
> > Chris.Barker at noaa.gov
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <https://mail.scipy.org/pipermail/numpy-discussion/attachments/20161002/2e65258f/attachment-0001.html>
>
> ------------------------------
>
> Message: 2
> Date: Sun, 2 Oct 2016 22:26:28 +0100
> From: David Cournapeau <cournape at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Subject: Re: [Numpy-discussion] Dropping sourceforge for releases.
> Message-ID:
>         <CAGY4rcUzVtnbhbJ542Vjrx3T8-ffO-jhiw06YrMqhk3ezae6XA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> +1 from me.
>
> If we really need some distribution on top of github/pypi, note that
> bintray (https://bintray.com/) is free for OSS projects, and is a much
> better experience than sourceforge.
>
> David
>
> On Sun, Oct 2, 2016 at 12:02 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>
> > Hi All,
> >
> > Ralf has suggested dropping sourceforge as a NumPy release site. There
> > was discussion of doing that some time back but we have not yet done it.
> > Now that we put wheels up on PyPI for all supported architectures,
> > sourceforge is not needed. I note that there are still some 15,000
> > downloads a week from the site, so it is still used.
> >
> > Thoughts?
> >
> > Chuck
> >
>
> ------------------------------
>
> Message: 3
> Date: Sun, 2 Oct 2016 17:53:32 -0600
> From: Vincent Davis <vincent at vincentdavis.net>
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Subject: Re: [Numpy-discussion] Dropping sourceforge for releases.
> Message-ID:
>         <CALyJZZX=KfKrsOh2QHZahwxW_p0sYvFCPpBVFXU0RNt7s8J4XQ@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> +1. I am very skeptical of anything on SourceForge; it negatively impacts
> my opinion of any project that requires me to download from SourceForge.
>
> On Saturday, October 1, 2016, Charles R Harris <charlesr.harris at gmail.com>
> wrote:
>
> > Hi All,
> >
> > Ralf has suggested dropping sourceforge as a NumPy release site. There
> > was discussion of doing that some time back but we have not yet done it.
> > Now that we put wheels up on PyPI for all supported architectures,
> > sourceforge is not needed. I note that there are still some 15,000
> > downloads a week from the site, so it is still used.
> >
> > Thoughts?
> >
> > Chuck
> >
>
>
> --
> Sent from mobile app.
> Vincent Davis
> 720-301-3003
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 121, Issue 3
> ************************************************
>