Fwd: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)
[Manual PR notification]
 Forwarded message 
From: timcera
Date: Sat, Jun 9, 2012 at 10:13 PM
Subject: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)
To: njsmith
This PR submitted a few months ago adds a substantial new API to numpy, so
it'd be great to get more review. Noone's replied yet, though...
Any thoughts, anyone? Is it useful, could it be better...?
n
On 9 Jun 2012 22:47, "Nathaniel Smith"
[Manual PR notification]
 Forwarded message  From: timcera Date: Sat, Jun 9, 2012 at 10:13 PM Subject: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303) To: njsmith
Each element is assigned the result of a function based on it's neighbors. Neighbors are selected based on a weight array.
It uses the new pad routines to pad arrays if neighboring values are required that would be off the edge of the input array.
Will be great to have the masked array settled because right now you can only sort of exclude from the neighborhood using a zero in the weight array. Zero or np.IGNORE don't affect np.sum, but functions like np.mean and np.std would give different answers. Because of this my early implementations of neighbor included an optional mask array along with the weight array, but I decided would be best to wait for the new masked arrays.
This in some ways could be considered a generalization of a convolution, and comparison with existing numpy/scipy convolution results are included in the tests. The advantage to neighbor is that any function that accepts a 1d array, and returns a single result, can be used instead of convolution only using summation. The convolution functions require the weight array to be flipped to get the same answer as neighbor.
You can merge this Pull Request by running:
git pull https://github.com/timcera/numpy neighbor
Or you can view, comment on it, or merge it online at:
https://github.com/numpy/numpy/pull/303
 Commit Summary 
* ENH: Initial implementation of a 'neighbor' calculation where the each
 File Changes 
M numpy/lib/__init__.py (2) A numpy/lib/neighbor.py (305) A numpy/lib/tests/test_neighbor.py (278)
 Patch Links 
https://github.com/numpy/numpy/pull/303.patch https://github.com/numpy/numpy/pull/303.diff
 Reply to this email directly or view it on GitHub: https://github.com/numpy/numpy/pull/303
On Wed, Oct 10, 2012 at 1:58 AM, Travis E. Oliphant < notifications@github.com> wrote:
I'm not sure what to make of no comments on this PR. This seems like a useful addition. @timcera https://github.com/timcera are you still interested in having this PR merged?
Yes.
I mentioned in PR comments that the lack of discussion is because my code
engenders speechless awe in anyone who looks at it. :) Of course
speechless awe can come from two different reasons! Hopefully it is
because my code is so awesome.
Seriously, I really wanted some input, especially after I found
#31https://github.com/numpy/numpy/issues/31
.
On Wed, Oct 10, 2012 at 7:24 AM, Eric Moore
This seems to be trying to solve a very similar problem to #31https://github.com/numpy/numpy/issues/31
Internally I implemented something like rolling window, but I don't return the windows. Instead the contents of the windows are used for calculation of each windows 'central' cell in the results array. After seeing the rolling window function I thought it might be nice to bring that out into a callable function, so that similar functionality would be available. That particular function isn't useful to me directly, but perhaps others? Kindest regards, Tim
I missed the original post but I personally find this addition especially useful for my work in computational neuroscience. I did something vaguely similar in a small framework (http://dana.loria.fr/, you can look more specifically at http://dana.loria.fr/doc/connection.html for details). Examples are available from: http://dana.loria.fr/examples.html The actual computation can be made in several ways depending on the properties of the kernel but the idea is to compute an array "K" such that given an array "A" and a kernel "k", A*K holds the expected result. This also work with sparse array for example when the kernel is very small. I suspect the PR will be quite efficient compared to what I did. Nicolas On Oct 10, 2012, at 18:55 , Cera, Tim wrote:
On Wed, Oct 10, 2012 at 1:58 AM, Travis E. Oliphant
wrote: I'm not sure what to make of no comments on this PR. This seems like a useful addition. @timcera are you still interested in having this PR merged? Yes.
I mentioned in PR comments that the lack of discussion is because my code engenders speechless awe in anyone who looks at it. :) Of course speechless awe can come from two different reasons! Hopefully it is because my code is so awesome.
Seriously, I really wanted some input, especially after I found #31.
On Wed, Oct 10, 2012 at 7:24 AM, Eric Moore
wrote: This seems to be trying to solve a very similar problem to #31 Internally I implemented something like rolling window, but I don't return the windows. Instead the contents of the windows are used for calculation of each windows 'central' cell in the results array.
After seeing the rolling window function I thought it might be nice to bring that out into a callable function, so that similar functionality would be available. That particular function isn't useful to me directly, but perhaps others?
Kindest regards, Tim _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Wed, 20121010 at 12:55 0400, Cera, Tim wrote:
On Wed, Oct 10, 2012 at 1:58 AM, Travis E. Oliphant
wrote: I'm not sure what to make of no comments on this PR. This seems like a useful addition. @timcera are you still interested in having this PR merged?
<snip>
Internally I implemented something like rolling window, but I don't return the windows. Instead the contents of the windows are used for calculation of each windows 'central' cell in the results array.
Rolling window can help with everything but the I guess typical case of neighbor(..., mode='same', pad=None), where the idea must fail since not all neighborhoods are the same size. It is probably quite a bit faster to replace the shapeintersect with it in those cases, but not sure if it is worth it. What I personally do not quite like is that for the case of the function being a ufunc, it not being (..., mode='same', pad=None) and weights=np.ones(...), the rolling window approach will be much faster as it can be fully vectorized with the new numpy versions. But thats just me and the documentation would have a "see also", so I guess users should be able to figure that out if they need the speed. Plus I don't see an elegant way to find the neighborhoods for (mode='same', pad=None) so having a function to help with it should be nice. On a general note, maybe it would make sense to allow a binary mask instead of the np.isfinite check and would it be better if the returned arrays are not raveled unless there is this mask? Also there is a ** missing for the kwargs in the function call.
After seeing the rolling window function I thought it might be nice to bring that out into a callable function, so that similar functionality would be available. That particular function isn't useful to me directly, but perhaps others?
Kindest regards, Tim _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Thu, Oct 11, 2012 at 10:50 AM, Nicolas Rougier
I missed the original post but I personally find this addition especially useful for my work in computational neuroscience.
I did something vaguely similar in a small framework (http://dana.loria.fr/, you can look more specifically at http://dana.loria.fr/doc/connection.html for details). Examples are available from: http://dana.loria.fr/examples.html
The actual computation can be made in several ways depending on the properties of the kernel but the idea is to compute an array "K" such that given an array "A" and a kernel "k", A*K holds the expected result. This also work with sparse array for example when the kernel is very small. I suspect the PR will be quite efficient compared to what I did.
Would the current PR be useful to you if merged asis? A common pitfall with these sorts of contributions is that we realize only after merging it that there is some tiny detail of the API that makes it notquiteusable for some people with related problems, so it'd be awesome if you could take a closer look. n
On 10.10.2012 15:42, Nathaniel Smith wrote:
This PR submitted a few months ago adds a substantial new API to numpy, so it'd be great to get more review. Noone's replied yet, though...
Any thoughts, anyone? Is it useful, could it be better...?
Fast neighbor search is what scipy.spatial.cKDTree is designed for. There is an brand new version in SciPy master. Sturla
For the neighbor module, the neighborhood is input specified by the
'weight' array. All values in the neighborhood are processed by a function.
In the geosciences, ArcGIS is a very important tool and the neighbor module
is very loosely modeled after
http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An%20overview%...
If the language is confusing, now is the time to change the names.
Kindest regards,
Tim
On Fri, Oct 12, 2012 at 8:33 AM, Sturla Molden
On 10.10.2012 15:42, Nathaniel Smith wrote:
This PR submitted a few months ago adds a substantial new API to numpy, so it'd be great to get more review. Noone's replied yet, though...
Any thoughts, anyone? Is it useful, could it be better...?
Fast neighbor search is what scipy.spatial.cKDTree is designed for. There is an brand new version in SciPy master.
Sturla _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
If you followed the link http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An%20overview%... note that the current neighborhood implementation that we are talking about implements the ArcGIS 'Focal*' functionality, not the 'Block*' ones. Note also that ArcGIS is limited to 2d, and a 3x3 neighborhood. Kindest regards, Tim
Sorry, I'm away from the lab and did not have a chance to test is yet. I will do next week. Nicolas On Oct 11, 2012, at 15:48 , Nathaniel Smith wrote:
On Thu, Oct 11, 2012 at 10:50 AM, Nicolas Rougier
wrote: I missed the original post but I personally find this addition especially useful for my work in computational neuroscience.
I did something vaguely similar in a small framework (http://dana.loria.fr/, you can look more specifically at http://dana.loria.fr/doc/connection.html for details). Examples are available from: http://dana.loria.fr/examples.html
The actual computation can be made in several ways depending on the properties of the kernel but the idea is to compute an array "K" such that given an array "A" and a kernel "k", A*K holds the expected result. This also work with sparse array for example when the kernel is very small. I suspect the PR will be quite efficient compared to what I did.
Would the current PR be useful to you if merged asis? A common pitfall with these sorts of contributions is that we realize only after merging it that there is some tiny detail of the API that makes it notquiteusable for some people with related problems, so it'd be awesome if you could take a closer look.
n _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
I'm still rather sure GIS functionality belongs in scipy.spatial instead of numpy.
Sturla
Sendt fra min iPad
Den 12. okt. 2012 kl. 15:29 skrev "Cera, Tim"
For the neighbor module, the neighborhood is input specified by the 'weight' array. All values in the neighborhood are processed by a function.
In the geosciences, ArcGIS is a very important tool and the neighbor module is very loosely modeled after http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An%20overview%...
If the language is confusing, now is the time to change the names.
Kindest regards, Tim
On Fri, Oct 12, 2012 at 8:33 AM, Sturla Molden
wrote: On 10.10.2012 15:42, Nathaniel Smith wrote:
This PR submitted a few months ago adds a substantial new API to numpy, so it'd be great to get more review. Noone's replied yet, though...
Any thoughts, anyone? Is it useful, could it be better...?
Fast neighbor search is what scipy.spatial.cKDTree is designed for. There is an brand new version in SciPy master.
Sturla _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
_______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Fri, Oct 12, 2012 at 1:04 PM, Sturla Molden
I'm still rather sure GIS functionality belongs in scipy.spatial instead of numpy.
From the link:
""" FocalMax Finds the highest value for each cell location on an input grid within a specified neighborhood and sends it to the corresponding cell location on the output grid. """ Isn't this much more generic than GIS? Maybe the criteria should be: "Would you expect to find extensive discussion on this functionality within a standard text on spatial data structures?" Sliding windows and nice boundary handling seems less "spatial" to me, and much further from Btrees, quadtress, kdtrees, persistent data structures, etc. Can you elaborate your position a bit more? Sorry if I'm being dense.
On 10.10.2012 15:42, Nathaniel Smith wrote:
This PR submitted a few months ago adds a substantial new API to numpy, so it'd be great to get more review. Noone's replied yet, though...
Any thoughts, anyone? Is it useful, could it be better...?
Fast neighbor search is what scipy.spatial.cKDTree is designed for. There is an brand new version in SciPy master.
It's not neighbor *search*, it's applying a function over an (arbitrary chosen and weighted) moving neighborhood in an nd array. It would be useful for the author of the PR to post a detailed comparison of this functionality with scipy.ndimage.generic_filter, which appears to have very similar functionality. Zach
On Sun, Oct 14, 2012 at 8:24 PM, Zachary Pincus
It would be useful for the author of the PR to post a detailed comparison of this functionality with scipy.ndimage.generic_filter, which appears to have very similar functionality.
I'll be durned. I created neighbor because I didn't find what I wanted, and to find now that I just didn't look in the right place is well ... Let's just say that I went for a long run last night. Searching for ndimage, I found that is has been around a long, long time. First in numarray, then moved to scipy. Really I could only nitpick about minor differences  kinda like a primary political campaign. On the face of it though, generic_filter looks better. First off it is written in C so likely will be faster and more efficient memory use. I didn't look at optimizing neighbor at all and at least my part of it is pure Python. Of course for all of the small differences, I like my choices better. :) I would like to make a mild suggestion. Emphasis on mild. Maybe ndimage, in all or in part, should be brought into (back into?) Numpy and renamed. About the PR. Given that the neighbor functionality exists already, I will close the PR later today. Move along, nothing to see here... Side note: I wrote arraypad with the future idea that it would become easyish to include that functionality in other places, for example in neighbor. A Don't Repeat Yourself idea. Previously I had only seen Fortran pad capabilities in some of the Fast Fourier Transform functions. The generic_filter function includes several padding functions  written in C. This means that if arraypad needs be more efficient we have C code to base a better arraypad. Another side node: The measurements functions in ndimage are called zonal functions in the GIS field. Kindest regards, Tim
Hi Tim, It's tricky to find functionality sometimes because as you've seen numpy and especially scipy are spread over very diverse domains with very diverse terminology! Usually someone on one or the other of the lists can help folks find some functionality, if not by name than by description... In any case, though, I hope you'll keep your code around and accessible to people in standalone form. Despite being a bit slower than the ndimage stuff, it seems like you've got a better set of boundary conditions, and some other niceties that may appeal to others too. Zach
On Sun, Oct 14, 2012 at 8:24 PM, Zachary Pincus
wrote: It would be useful for the author of the PR to post a detailed comparison of this functionality with scipy.ndimage.generic_filter, which appears to have very similar functionality.
I'll be durned. I created neighbor because I didn't find what I wanted, and to find now that I just didn't look in the right place is well ... Let's just say that I went for a long run last night.
Searching for ndimage, I found that is has been around a long, long time. First in numarray, then moved to scipy.
Really I could only nitpick about minor differences  kinda like a primary political campaign. On the face of it though, generic_filter looks better. First off it is written in C so likely will be faster and more efficient memory use. I didn't look at optimizing neighbor at all and at least my part of it is pure Python. Of course for all of the small differences, I like my choices better. :)
I would like to make a mild suggestion. Emphasis on mild. Maybe ndimage, in all or in part, should be brought into (back into?) Numpy and renamed.
About the PR. Given that the neighbor functionality exists already, I will close the PR later today. Move along, nothing to see here...
Side note: I wrote arraypad with the future idea that it would become easyish to include that functionality in other places, for example in neighbor. A Don't Repeat Yourself idea. Previously I had only seen Fortran pad capabilities in some of the Fast Fourier Transform functions. The generic_filter function includes several padding functions  written in C. This means that if arraypad needs be more efficient we have C code to base a better arraypad.
Another side node: The measurements functions in ndimage are called zonal functions in the GIS field.
Kindest regards, Tim _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
participants (7)

Cera, Tim

Nathaniel Smith

Nicolas Rougier

Sebastian Berg

Sturla Molden

T J

Zachary Pincus