[SciPy-User] multi-dimensional scaling

Fri May 3 11:33:58 EDT 2013

On 03/05/13 16:20, Nelle Varoquaux wrote:
>
>
>
> On 3 May 2013 17:18, Dan Stowell <dan.stowell at eecs.qmul.ac.uk> wrote:
>
>     On 03/05/13 13:39, Nelle Varoquaux wrote:
>      >
>      >      > I'm looking in scipy for something to perform multi-dimensional
>      >      > scaling*. I don't see anything - have I missed it? Is it easy to
>      >      > make it from scipy components?
>      >      >
>      >      > Thanks
>      >      > Dan
>      >      >
>      >      > * http://en.wikipedia.org/wiki/Multidimensional_scaling
>      >
>      >
>      >     MDS is more a class of approaches than a specific algorithm. If you
>      >     want to do "classic" MDS with Euclidean distances as the metric,
>      >     then you would use PCA to implement that:
>      >
>      >     http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional
>      >
>      >     And PCA is just a simple eigendecomposition that you can build from
>      >     the basic linear algebra tools in numpy. I'm happy to send over the
>      >     short wrapper code I wrote to do "PCA" on data in a vaguely smart
>      >     way, if you want.
>      >
>      >
>      > There are both metric MDS (SMACOF algorithm) and NMDS (non-metric)
>      > in scikit-learn (and PCA :) ).
>
>     Thanks both. In my case, for each pair of points I have a list of binary
>     values representing match-or-no-match, so I will use Hamming distance
>     rather than Euclidean. It looks like sklearn.manifold.MDS(metric=False)
>     will do the job for me. Just need to update my installation of sklearn
>     to 0.12+...
>
>     By the way: the example here
>     <http://scikit-learn.org/stable/auto_examples/manifold/plot_mds.html>
>     uses a variable called "similarities", but most of the way through they
>     are really dissimilarities, and then later (AFTER their use in mds)
>     converted to similarities - a touch confusing.
>
>     Also, the documentation for fit_transform() here
>     <http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.manifold.MDS.html>
>     just uses "X" and "Input data" and doesn't explicitly say whether it
>     expects similarities or dissimilarities. It would really help if the
>     documentation was a little clearer about that. (I think it wants
>     dissimilarities - please correct me if I'm wrong...)
>
>
> I'll try to improve the documentation on the MDS in the near future.
> Thanks for the feedback.
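The classical-MDS route mentioned in the quoted thread (an eigendecomposition built from numpy's linear algebra tools) can be sketched as follows. This is a minimal illustration, not the wrapper code offered in the thread, and the function name `classical_mds` is hypothetical. It double-centres the squared dissimilarities and keeps the top eigenvectors. For Euclidean distances the coordinates match PCA scores up to the sign of each axis. For non-Euclidean dissimilarities such as Hamming distance, some eigenvalues may come out negative, and this sketch simply clips them to zero.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a square dissimilarity matrix D.

    Hypothetical helper for illustration. Returns an (n, n_components)
    array whose Euclidean distances approximate D; if D is exactly
    Euclidean, the reconstruction is exact up to rotation/reflection.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centring matrix J = I - (1/n) * 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies (eigenvalues come back ascending)
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean D) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Usage: recover 2-D points (up to rotation/reflection) from their distances
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
D = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
coords = classical_mds(D, n_components=2)
D_hat = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D_hat))  # True
```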

And thanks for the code!

I've written a simple example script, but I must be doing something 
wrong. It simply generates two classes of data points then does NMDS on 
them, but contrary to my expectations it doesn't cluster the two classes 
separately from each other in the solution. If you have any hints I'd be 
grateful.
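A minimal sketch of the data-generation half of such a test, assuming each point is a vector of binary match/no-match values: two classes are made as noisy copies of two random binary prototypes and compared with scipy's `pdist(..., metric="hamming")`, which returns the fraction of positions that differ. The helper `make_two_classes` and its parameters are hypothetical. If the between-class mean clearly exceeds the within-class mean, the dissimilarities already separate the classes, and the next thing to check is how the matrix reaches MDS: `sklearn.manifold.MDS` computes Euclidean distances between the rows of its input unless it is constructed with `dissimilarity='precomputed'`, so it is worth confirming that parameter in your installed version.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def make_two_classes(n_per_class=20, n_bits=50, flip_prob=0.1, seed=0):
    """Hypothetical generator: each sample is a noisy copy of one of two
    random binary prototypes (each bit flipped with probability flip_prob)."""
    rng = np.random.default_rng(seed)
    prototypes = rng.integers(0, 2, size=(2, n_bits)).astype(bool)
    labels = np.repeat([0, 1], n_per_class)
    flips = rng.random((labels.size, n_bits)) < flip_prob
    X = prototypes[labels] ^ flips  # XOR flips the selected bits
    return X, labels


X, labels = make_two_classes()
# Square matrix of Hamming dissimilarities (fraction of differing bits)
D = squareform(pdist(X, metric="hamming"))

# Compare average dissimilarity within and between the two classes
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(labels.size, dtype=bool)
within = D[same & off_diag].mean()
between = D[~same].mean()
print(f"mean within-class: {within:.3f}, mean between-class: {between:.3f}")
```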

Thanks
Dan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mds_test.py
Type: text/x-python
Size: 1441 bytes
Desc: not available
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20130503/464a8b8b/attachment.py>