[Numpy-discussion] Ready for review: PyArrayNeighIterObject, an iterator to iterate over a neighborhood in arbitrary arrays
cournape at gmail.com
Sun Jun 14 03:59:09 EDT 2009
On Sun, Jun 14, 2009 at 3:51 AM, Charles R
Harris<charlesr.harris at gmail.com> wrote:
> On Sat, Jun 13, 2009 at 12:35 PM, David Cournapeau <cournape at gmail.com>
>> On Sun, Jun 14, 2009 at 3:22 AM, Charles R
>> Harris<charlesr.harris at gmail.com> wrote:
>> > 1) Since reference counting is such a pain, you should document that the
>> > constructor returns a new reference and that the PyArrayIterObject does
>> > not
>> > need to have its reference count incremented before the call and that
>> > the
>> > reference count is unchanged on failure.
>> > 2) Why are _update_coord_iter(c) and _inc_set_ptr(c) macros? Why are
>> > they
>> > defined inside functions? If left as macros, they should be in CAPS, but
>> > why
>> > not just write them out?
>> They are macro because they are reused in the 2d specialized functions
>> (I will add 3d too)
> IIRC, inline doesn't recurse, so there is some advantage to having these as
> macros. But I really dislike seeing macros defined inside of functions,
> especially when they aren't exclusive to that function.
Well, they are kind of exclusive to this function - and the special 2d
case; they are not supposed to be used by themselves (which is why
they are undefined right away). But I changed this anyway to ALL CAPS
and defined outside.
> IOW, use macros judiciously.
It may not be obvious, because it looks really simple, but the code is
heavily optimized. I spent several hours to find an implementation
which works as it does now. The macro is used for a reason :)
> That's what I wanted to hear. But in c++ is generally best for simplicity
> and debugging to start out not using inlines, then add them is benchmarks
> show a decent advantage. And in those cases it is best if the inlines are
> just a few lines long.
That's how I did it :) - and the inlines are short I think. To be more
concrete about the numbers, the new correlate which uses the iterator
gives me (first shape for x, second for y, compute correlate(x, y),
time of one iteration on average for timeit):
# Using inline + unrolling (unrolling not implemented for 3d arrays)
100x100 with 20x20 0.0521752119064
300x300 with 10x10 0.0856973171234
9x9x9 with 8x8x8 0.0261607170105
# No explicit inline or unrolling
100x100 with 20x20 0.0859595060349
300x300 with 10x10 0.154387807846
9x9x9 with 8x8x8 0.0328009128571
This is on RHEL5, with a relatively old compiler (gcc 4.1). That's the
kind of code where optimization flags/compiler/os matters a lot
(differences are over 100 % on a same machine with different
combinations of OS/Compilers). As a comparison, the inline+unrolling
gives same speed as the old code for correlate, which assumed
contiguity and manipulated indexes directly.
More information about the NumPy-Discussion