[SciPy-User] maximally sparse subset of points

Gustavo Goretkin

27 Jul 2011

2:45 p.m.

I have a dataset of N points (in 4 dimensions) and I'd like to select a smaller subset, size M, of those points that are maximally spread out. An approximation is fine. Other than the K-d tree, is there anything in SciPy or other Python module to help accomplish this? Thanks

Gael Varoquaux

28 Jul 2011

6:10 a.m.

On Wed, Jul 27, 2011 at 05:45:20PM -0400, Gustavo Goretkin wrote:

I have a dataset of N points (in 4 dimensions) and I'd like to select a smaller subset, size M, of those points that are maximally spread out.

The problem you are trying to solve is close to the k-medoids problem. I don't know of any Python module implementing k-medoids. Alternatively, the k_init function used to initialize k-means in scikits.learn [1] might be a useful approximation. It's a pretty brutal approximation, and it might not work for you, but it should be fast.

Gaël

[1] https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/clust...
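To make the seeding idea concrete, here is a minimal NumPy sketch of k-means++-style seeding (D^2 sampling), which is the general idea behind a k-means initializer such as k_init. It is not scikit-learn's code: the helper name `seed_indices`, the random test data, and the sizes N = 1000, M = 20 are all assumptions made for illustration.

```python
import numpy as np

def seed_indices(points, m, seed=0):
    """Pick m row indices by k-means++-style D^2 sampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # Start from one point chosen uniformly at random.
    chosen = [int(rng.integers(n))]
    # Squared distance from every point to its nearest chosen point so far.
    d2 = np.sum((points - points[chosen[0]]) ** 2, axis=1)
    for _ in range(m - 1):
        # Draw the next point with probability proportional to d2, so points
        # far from the current selection are favoured. Already chosen points
        # have d2 == 0 and cannot be drawn again.
        nxt = int(rng.choice(n, p=d2 / d2.sum()))
        chosen.append(nxt)
        d2 = np.minimum(d2, np.sum((points - points[nxt]) ** 2, axis=1))
    return np.array(chosen)

# Assumed example data: N = 1000 random points in 4 dimensions, M = 20.
X = np.random.default_rng(1).random((1000, 4))
idx = seed_indices(X, 20)
subset = X[idx]
```

Because the draw is random rather than greedy, the selection is only spread out on average, which fits the "brutal approximation" caveat above. It assumes the data has at least m distinct points.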

denis

29 Jul 2011

6:48 a.m.

Gustavo,

I'd go with KDTree, which is fast and easy in 4d. If you build a tree with big leaves, leafsize ~ N/M, and then take the midpoint of each leaf, that should do?

To walk the leaves, add the method below to .../scipy/spatial/kdtree.py (or .pyx, but building the tree is not much slower in pure Python).

cheers
  -- denis

    def forleaves( self, func, *args, **kwargs ):
        """ call func( data ) for each leaf, e.g.
                leafmid = []
                def leaffunc( data, leafmid=leafmid ):
                    leafmid.append( data.mean(axis=0 ))
        """
        q = [self.tree]
        while q:
            node = heappop(q)
            if isinstance( node, KDTree.leafnode ):
                data = self.data[node.idx]
                func( data, *args, **kwargs )  # test-leaves.py
            else:
                heappush( q, node.less )
                heappush( q, node.greater )
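The same leaf walk can be sketched without editing SciPy's source, using only the tree attributes the method above relies on (`tree`, `leafnode`, `idx`, `less`, `greater`). This is a sketch under assumptions, not a definitive implementation: the helper name `leaf_index_arrays`, the random test data, and the choice to keep the actual data point nearest each leaf mean (so the result is a true subset of the input) are all illustrative.

```python
import numpy as np
from scipy.spatial import KDTree

def leaf_index_arrays(tree):
    """Yield the array of data indices stored in each leaf (hypothetical helper)."""
    stack = [tree.tree]            # a plain stack is enough; visiting order is irrelevant
    while stack:
        node = stack.pop()
        if isinstance(node, KDTree.leafnode):
            yield np.asarray(node.idx)
        else:
            stack.append(node.less)
            stack.append(node.greater)

# Assumed example data: N = 1000 random points in 4 dimensions.
X = np.random.default_rng(0).random((1000, 4))
M = 20                             # rough target size of the subset
tree = KDTree(X, leafsize=len(X) // M)

picked = []
for idx in leaf_index_arrays(tree):
    leaf = X[idx]
    mid = leaf.mean(axis=0)        # the leaf "midpoint" from the message above
    # Assumption: keep the real data point closest to that midpoint.
    picked.append(int(idx[np.argmin(np.sum((leaf - mid) ** 2, axis=1))]))
subset = X[picked]
```

Note that leafsize is only an upper bound on points per leaf, so the number of leaves (and hence the size of `subset`) comes out somewhere between M and roughly 2M rather than exactly M.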
