[Numpy-discussion] ufunc for sum of squared difference
Matthew Harrigan
harrigan.matthew at gmail.com
Fri Nov 11 11:25:58 EST 2016
I started a ufunc to compute the sum of square differences here
<https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
It is about 4x faster and uses half the memory compared to
np.sum(np.square(x-c)). This could significantly speed up common
operations like std and var (where c=np.mean(x). It faster because its a
single pass instead of 3, and also because the inner loop is specialized
for the common reduce case, which might be an optimization considering more
generally.
I think I have answered my question #2&3 above.
Can someone please point me to an example where "data" was used in a ufunc
inner loop? How can that value be set at runtime? Thanks
On Fri, Nov 4, 2016 at 5:33 PM, Sebastian Berg <sebastian at sipsolutions.net>
wrote:
> On Fr, 2016-11-04 at 15:42 -0400, Matthew Harrigan wrote:
> > I didn't notice identity before. Seems like frompyfunc always sets
> > it to None. If it were zero maybe it would work as desired here.
> >
> > In the writing your own ufunc doc, I was wondering if the pointer to
> > data could be used to get a constant at runtime. If not, what could
> > that be used for?
> > static void double_logit(char **args, npy_intp *dimensions,
> > npy_intp* steps, void* data)
> > Why would the numerical accuracy be any different? The subtraction
> > and square operations look identical and I thought np.sum just calls
> > np.add.reduce, so the reduction step uses the same code and would
> > therefore have the same accuracy.
> >
>
> Sorry, did not read it carefully, I guess `c` is the mean, so you are
> doing the two pass method.
>
> - Sebastian
>
>
> > Thanks
> >
> > On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg <sebastian at sipsolution
> > s.net> wrote:
> > > On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > > > I was reading this and got thinking about if a ufunc could
> > > compute
> > > > the sum of squared differences in a single pass without a
> > > temporary
> > > > array. The python code below demonstrates a possible approach.
> > > >
> > > > import numpy as np
> > > > x = np.arange(10)
> > > > c = 1.0
> > > > def add_square_diff(x1, x2):
> > > > return x1 + (x2-c)**2
> > > > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > > > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > > > print(np.sum(np.square(x-c)))
> > > >
> > > > I have (at least) 4 questions:
> > > > 1. Is it possible to pass run time constants to a ufunc written
> > > in C
> > > > for use in its inner loop, and if so how?
> > >
> > > I don't think its anticipated, since a ufunc could in most cases
> > > use a
> > > third argument, but a 3 arg ufunc can't be reduced. Not sure if
> > > there
> > > might be some trickery possible.
> > >
> > > > 2. Is it possible to pass an initial value to reduce to avoid the
> > > > clean up required for the first element?
> > >
> > > This is the identity normally. But the identity can only be 0, 1 or
> > > -1
> > > right now I think. The identity is what the output array gets
> > > initialized with (which effectively makes it the first value passed
> > > into the inner loop).
> > >
> > > > 3. Does that ufunc work, or are there special cases which cause
> > > it to
> > > > fall apart?
> > > > 4. Would a very specialized ufunc such as this be considered for
> > > > incorporating in numpy since it would help reduce time and memory
> > > of
> > > > functions already in numpy?
> > > >
> > >
> > > Might be mixing up things, however, IIRC the single pass approach
> > > has a
> > > bad numerical accuracy, so that I doubt that it is a good default
> > > algorithm.
> > >
> > > - Sebastian
> > >
> > >
> > > > Thank you,
> > > > Matt
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at scipy.org
> > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20161111/338b4178/attachment.html>
More information about the NumPy-Discussion
mailing list