[Numpy-discussion] Proposal: scipy.spatial

David Bolme bolme1234 at comcast.net
Thu Oct 9 16:28:34 EDT 2008


I have written up basic nearest neighbor algorithm.  It does a brute  
force search so it will be slower than kdtrees as the number of points  
gets large.  It should however work well for high dimensional data.  I  
have also added the option for user defined distance measures.  The  
user can set a default "p".  "p" has the same functionality if it is a  
float.  "p" can also be a function that computes a distance matrix or  
the measure can be selected using the strings: "Manhattan",  
"Euclidean", or "Correlation".

https://pyvision.svn.sourceforge.net/svnroot/pyvision/trunk/src/pyvision/vector/knn.py

The interface is similar to Anne's code and in many cases can be used  
as a drop in replacement.  I have posted the code to my own project  
because I have a short term need and I do not have access to the scipy  
repository.  Feel free to include the code with scipy under the scipy  
license.

I did find a typo your documentation.
typo "trie -> tree" - ... kd-tree is a binary trie, each of whose ...

Also I found the use of k in the documentation some what confusing as  
it is the dimensionality of the data points in the kd-tree and the  
number of neighbors for k-nearest neighbors.

>> I believe that with mean subtracted and unit length vectors, a
>> Euclidean knn algorithm will produces the same result as if the
>> vectors were compared using correlation.  I am not sure if kd-trees
>> will perform well on the normalized vectors as they have a very
>> specific geometry.  If my math checks out it may be worth adding
>> Pearson's correlation as a default option or as a separate class.
>
> Actually it's probably easier if the user just does the  
> prenormalization.

I agree.  I think we should keep your class as-is and maybe create a  
class that wraps the kdtree and handles the normalization for  
correlation.  I would also like to look at cover trees, however that  
will have to wait a couple months until I have more time.

Dave
---
http://www.cs.colostate.edu/~bolme



More information about the NumPy-Discussion mailing list