[scikit-learn] Dimension Reduction - MDS

Alexandre Gramfort alexandre.gramfort at inria.fr
Thu Oct 11 07:12:31 EDT 2018


hi Guillaume,

You cannot use our MDS solver at this scale. Even if it fits in RAM,
it will be slow.
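
For a rough sense of scale, here is a minimal sketch of the memory arithmetic, assuming a dense square distance matrix of 8-byte doubles. The function name is made up for illustration, and any working copies a solver allocates are not counted:

```python
def dense_distance_matrix_gib(n_samples, bytes_per_entry=8):
    """Approximate size in GiB of a dense n_samples x n_samples matrix.

    Assumption: float64 entries (8 bytes each); any temporary copies a
    solver makes come on top of this figure.
    """
    return n_samples * n_samples * bytes_per_entry / 2**30


# Sample counts mentioned in the thread, plus the full 69827-row dataset.
for n in (100, 1_000, 10_000, 69_827):
    print(f"{n:>6} samples -> {dense_distance_matrix_gib(n):10.4f} GiB")
```

The 69827-row case comes out at roughly 36 GiB for a single copy of the matrix, so a machine with 64G of RAM can start swapping once a second array of that size is allocated.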

I would play with https://github.com/lmcinnes/umap unless you really
want a classic MDS.

Alex

On Thu, Oct 11, 2018 at 10:31 AM Guillaume Favelier
<Guillaume.Favelier at lip6.fr> wrote:
>
> Hello J.B,
>
> Thank you for your quick reply.
>
> > If you try with a very small (e.g., 100 sample) data file, does your code
> > employing MDS work?
> > As you increase the number of samples, does the script continue to work?
> So I tried the same script while increasing the number of samples (100,
> 1000 and 10000) and it works indeed without swapping on my workstation.
>
> > That is 4,900,000,000 entries, plus overhead for a data structure.
> I thought that even that many doubles could be processed with 64G of
> RAM. Is there something to configure to allow this computation?
>
> The typical datasets I use can have around 200-300k rows with a few columns
> (usually up to 3).
>
> Best regards,
>
> Guillaume
>
> Quoting "Brown J.B. via scikit-learn" <scikit-learn at python.org>:
>
> > Hello Guillaume,
> >
> > You are computing a distance matrix of shape 70000x70000 to generate MDS
> > coordinates.
> > That is 4,900,000,000 entries, plus overhead for a data structure.
> >
> > If you try with a very small (e.g., 100 sample) data file, does your code
> > employing MDS work?
> > As you increase the number of samples, does the script continue to work?
> >
> > Hope this helps you get started.
> > J.B.
> >
> > On Tue, Oct 9, 2018 at 18:22, Guillaume Favelier <Guillaume.Favelier at lip6.fr> wrote:
> >
> >> Hi everyone,
> >>
> >> I'm trying to use a dimension reduction algorithm [1] on my dataset [2]
> >> in a Python script [3], but Python consumes most of my main memory and
> >> even swaps on my configuration [4], so instead of the expected result I
> >> get a memory error.
> >>
> >> I have the impression that this behaviour is not intended, so could you
> >> please help me find out what I did wrong or missed?
> >>
> >> [1]: MDS -
> >> http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
> >> [2]: dragon.csv - 69827 rows, 3 columns (x,y,z)
> >> [3]: dragon.py - 10 lines
> >> [4]: dragon_swap.png - htop on my workstation
> >>
> >> TAR archive:
> >> https://drive.google.com/open?id=1d1S99XeI7wNEq131wkBUCBrctPQRgpxn
> >>
> >> Best regards,
> >>
> >> Guillaume Favelier
> >>
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn
> >>