[Numpy-discussion] Ready for review: PyArrayNeighIterObject, an iterator to iterate over a neighborhood in arbitrary arrays

Sat Jun 13 14:51:06 EDT 2009

On Sat, Jun 13, 2009 at 12:35 PM, David Cournapeau <cournape at gmail.com> wrote:

> On Sun, Jun 14, 2009 at 3:22 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>
> > 1) Since reference counting is such a pain, you should document that the
> > constructor returns a new reference and that the PyArrayIterObject does
> > not need to have its reference count incremented before the call and that
> > the reference count is unchanged on failure.
>
> OK.
>
> > 2) Why are _update_coord_iter(c) and _inc_set_ptr(c) macros? Why are they
> > defined inside functions? If left as macros, they should be in CAPS, but
> > why not just write them out?
>
> They are macros because they are reused in the 2d specialized functions
> (I will add 3d too)
>

IIRC, inline doesn't recurse, so there is some advantage to having these as
macros. But I really dislike seeing macros defined inside of functions,
especially when they aren't exclusive to that function. So at least move
them outside. But often it is clearer for code maintenance to simply write
them out; it just takes a few more lines.  IOW, use macros judiciously.
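
To make the two options concrete, here is a minimal sketch in C. It is not
the code under review: demo_iter, DEMO_UPDATE_COORD and demo_update_coord
are hypothetical names, and the wrap-around logic is only assumed to
resemble what a coordinate-update helper might do. It shows a macro moved
to file scope and named in CAPS, next to the same logic written as a
static inline function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for iterator state; NOT NumPy's actual struct. */
typedef struct {
    ptrdiff_t coord;  /* current coordinate along one axis */
    ptrdiff_t lower;  /* inclusive lower bound */
    ptrdiff_t upper;  /* inclusive upper bound */
} demo_iter;

/* Option 1: macro at file scope, in CAPS. The do/while(0) wrapper makes
 * the expansion behave like a single statement after if/else. */
#define DEMO_UPDATE_COORD(it)                \
    do {                                     \
        if ((it)->coord < (it)->upper) {     \
            (it)->coord += 1;                \
        } else {                             \
            (it)->coord = (it)->lower;       \
        }                                    \
    } while (0)

/* Option 2: the same logic as a static inline function. Arguments are
 * type-checked and the body can be stepped through in a debugger. */
static inline void demo_update_coord(demo_iter *it)
{
    assert(it != NULL && it->lower <= it->upper);
    if (it->coord < it->upper) {
        it->coord += 1;
    } else {
        it->coord = it->lower;
    }
}
```

Both forms advance the coordinate and wrap back to the lower bound once the
upper bound is reached. Whether the compiler actually inlines the function
depends on the compiler and its flags, so any speed claim should be checked
with a benchmark.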

>
> > 3) Is it really worth the hassle to use inline functions? What does it
> > buy in terms of speed that justifies the complication?
>
> Which complication are you talking about? Except for NPY_INLINE, I see
> none. In terms of speed, we are talking about several times faster.

That's what I wanted to hear. But in C++ it is generally best for simplicity
and debugging to start out not using inlines, then add them if benchmarks
show a decent advantage. And in those cases it is best if the inlines are
just a few lines long.

>
> Think about using it for correlate, for example: you have an NxN image
> with an MxM kernel: PyArrayNeigh_IterNext will be called NxNxMxM
> times... I don't remember the numbers, but it was several times slower
> without the inline with gcc 4.3 on Linux. The 2d optimized functions,
> which just do manual loop unrolling, already buy up to a factor of 2.
>

So what is the tradeoff between just unrolling the loops vs inline
functions?
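
For reference while comparing the two, a rough sketch in C of what is being
traded off. This is not the patch itself: demo_neigh_iter, demo_next,
demo_next2d and demo_count_steps are hypothetical names, and the odometer
style coordinate update is an assumption about how such an iterator could
advance. The generic step loops over the axes, the 2d variant writes that
loop out by hand, and the counter shows the NxNxMxM number of steps quoted
above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXDIMS 4

/* Hypothetical neighborhood iterator state; NOT NumPy's actual struct. */
typedef struct {
    int nd;                         /* number of dimensions in use */
    ptrdiff_t coord[DEMO_MAXDIMS];  /* current offset per axis, starting at 0 */
    ptrdiff_t upper[DEMO_MAXDIMS];  /* inclusive upper bound per axis */
} demo_neigh_iter;

/* Generic step: walk the axes from last to first, odometer style. */
static inline void demo_next(demo_neigh_iter *it)
{
    assert(it->nd >= 1 && it->nd <= DEMO_MAXDIMS);
    for (int i = it->nd - 1; i >= 0; i--) {
        if (it->coord[i] < it->upper[i]) {
            it->coord[i] += 1;
            return;
        }
        it->coord[i] = 0;  /* wrap this axis and carry to the next one */
    }
}

/* 2d specialization: the loop over axes is unrolled by hand. */
static inline void demo_next2d(demo_neigh_iter *it)
{
    assert(it->nd == 2);
    if (it->coord[1] < it->upper[1]) {
        it->coord[1] += 1;
        return;
    }
    it->coord[1] = 0;
    if (it->coord[0] < it->upper[0]) {
        it->coord[0] += 1;
        return;
    }
    it->coord[0] = 0;
}

/* Count the per-neighbor steps for an n x n image and an m x m kernel. */
static long demo_count_steps(int n, int m)
{
    demo_neigh_iter it = {2, {0, 0}, {m - 1, m - 1}};
    long steps = 0;
    for (int row = 0; row < n; row++) {
        for (int col = 0; col < n; col++) {
            for (int k = 0; k < m * m; k++) {
                demo_next2d(&it);
                steps++;
            }
        }
    }
    return steps;
}
```

The step function sits in the innermost loop, so call overhead is paid
n*n*m*m times if it is not inlined, while unrolling only removes the small
loop over axes inside each step. The two are therefore complementary rather
than alternatives, but which one matters more here is a question for a
benchmark, not for this sketch.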

Chuck