[SciPy-user] Scipy for Cluster Detection

Thu Dec 27 01:36:53 EST 2007

On 27-Dec-07, at 1:24 AM, David Warde-Farley wrote:
>
> This doesn't sound like what would typically be considered a

This should read, "This doesn't sound like what would typically be  
considered a clustering problem." What I meant is, "clustering"  
usually means finding a natural way of grouping the data (either  
hierarchically or non-hierarchically) where you only have some notion  
of similarity or distance to work with.
]
In your problem, your similarity metric is essentially binary ("are  
you less than d away from each other", yes or no) and so it reduces to  
a rather graph-theoretic problem.

That said, if your number of particles is particularly large, Nathan's  
suggestion of a kd-tree solution sounds like it's on the right track.

"Locality sensitive hashing" comes to mind, which is basically a fast  
way of finding the nearest neighbour of a given point -- if you knew  
this then you'd be able to tell whether any point at all was within d,  
since if the nearest neighbour isn't then none of them are.

Take a look at http://web.mit.edu/andoni/www/LSH/index.html

Cheers,

David