[SciPy-User] maximally sparse subset of points

Fri Jul 29 09:48:05 EDT 2011

Gustavo,
  I'd go with KDTree, fast and easy in 4d;
if you build a tree with big leaves, leafsize ~ N/M,
then take the midpoints of each leaf, that should do ?

To walk the leaves, add the below to .../scipy/spatial/kdtree.py
(or .pyx, but building the tree is not much slower in pure python).

cheers
  -- denis

    def forleaves( self, func, *args, **kwargs ):
        """ call func( data ) for each leaf, e.g.
                leafmid = []
                def leaffunc( data, leafmid=leafmid ):
                    leafmid.append( data.mean(axis=0 ))
        """
        q = [self.tree]
        while q:
            node = heappop(q)
            if isinstance( node, KDTree.leafnode ):
                data = self.data[node.idx]
                func( data, *args, **kwargs )  # test-leaves.py
            else:
                heappush( q, node.less )
                heappush( q, node.greater )

On Jul 27, 11:45 pm, Gustavo Goretkin <gustavo.goret... at gmail.com>
wrote:
> I have a dataset of N points (in 4 dimensions) and I'd like to select a
> smaller subset, size M, of those points that are maximally spread out. An
> approximation is fine. Other than the K-d tree, is there anything in SciPy