[scikit-learn] How to determine suitable cluster algo

Matti Viljamaa matti.v.viljamaa at gmail.com
Fri Jan 25 15:31:20 EST 2019


Also,

Remember that some algorithms may exhibit “sweet spots” with respect to computation time and gained accuracy.

So you might want to keep measuring “explained variance” as you add complexity to your models, and then plot model complexity against explained variance.

E.g. with MLPClassifier you’d plot the number of hidden layers against explained variance to find where adding hidden layers starts to yield diminishing gains.
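
A minimal sketch of that idea, under a few assumptions of mine: synthetic data from make_classification stands in for a real dataset, the layer width of 16 is arbitrary, and cross-validated accuracy (the default score for MLPClassifier) is used as the quality measure. For a regressor, passing scoring="explained_variance" to cross_val_score would give explained variance instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic data stands in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate architectures: 1, 2 and 3 hidden layers of 16 units each
# (the width 16 is an arbitrary choice).
layer_options = [(16,), (16, 16), (16, 16, 16)]

mean_scores = []
for layers in layer_options:
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=500, random_state=0)
    # Default scoring for a classifier is accuracy; 3-fold CV keeps it cheap.
    scores = cross_val_score(clf, X, y, cv=3)
    mean_scores.append(scores.mean())

# Print complexity vs. score; these pairs are what one would plot.
for layers, score in zip(layer_options, mean_scores):
    print(f"{len(layers)} hidden layer(s): mean CV score = {score:.3f}")
```

Where the score stops improving noticeably from one row to the next is roughly where extra layers stop paying for their training time.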

From: Matti Viljamaa
Sent: Friday, 25 January 2019 13:43
To: Scikit-learn mailing list
Subject: Re: [scikit-learn] How to determine suitable cluster algo

For determining what one can afford computationally, see e.g.:
https://stackoverflow.com/questions/22443041/predicting-how-long-an-scikit-learn-classification-will-take-to-run
https://www.reddit.com/r/scikit_learn/comments/a746h0/is_there_any_way_to_estimate_how_long_a_given/
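
As a rough sketch of how one might estimate this empirically (my own illustration, not taken from the links above): time the fit on a few growing subsamples, fit a power law to the timings, and extrapolate. KMeans, the synthetic make_blobs data, the subsample sizes and the helper predict_seconds are all assumptions chosen for the example, and the extrapolation is only a ballpark figure.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data stands in for the real dataset (assumption for illustration).
X, _ = make_blobs(n_samples=8000, n_features=10, centers=5, random_state=0)

# Time the fit on growing subsamples.
sizes = [1000, 2000, 4000, 8000]
timings = []
for n in sizes:
    start = time.perf_counter()
    KMeans(n_clusters=5, n_init=3, random_state=0).fit(X[:n])
    timings.append(time.perf_counter() - start)

# Fit log(time) = slope * log(n) + intercept, i.e. time ~ c * n**slope.
slope, intercept = np.polyfit(np.log(sizes), np.log(timings), 1)


def predict_seconds(n):
    """Hypothetical helper: extrapolate the fitted power law to n samples."""
    return float(np.exp(intercept) * n ** slope)


for n, t in zip(sizes, timings):
    print(f"n={n}: {t:.3f} s")
print(f"rough estimate for n=1,000,000: {predict_seconds(1_000_000):.1f} s")
```

Timings on small subsamples are noisy, so it is worth repeating the measurement a few times before trusting the fitted exponent.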

From: lampahome
Sent: Friday, 25 January 2019 3:42
To: Scikit-learn mailing list
Subject: Re: [scikit-learn] How to determine suitable cluster algo

Maybe the suitable way is trial and error?

My concern is that my dataset is very large, so I can't try every number of clusters from 1 to N when I have N samples.
That costs too much time for me.

Maybe I should choose the initial number of clusters based on execution time?

And then decide in the next step whether to increase or decrease the number of clusters?

thx
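
One possible way to make this trial-and-error affordable, sketched under assumptions that are not from the thread: cluster a random subsample instead of the full data, try only a coarse grid of cluster counts rather than every value from 1 to N, and record both the fit time and a quality score for each. MiniBatchKMeans, the silhouette score, the subsample size and the grid of k values are all choices made here for illustration.

```python
import time

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data stands in for the large real dataset (assumption).
X, _ = make_blobs(n_samples=20000, n_features=8, centers=6, random_state=0)

# Work on a random subsample to keep each trial cheap.
rng = np.random.default_rng(0)
sample = X[rng.choice(len(X), size=2000, replace=False)]

# Coarse grid of cluster counts instead of every k from 1 to N.
candidate_ks = [2, 4, 8, 16, 32]
results = []
for k in candidate_ks:
    start = time.perf_counter()
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit(sample)
    elapsed = time.perf_counter() - start
    score = silhouette_score(sample, model.labels_)
    results.append((k, elapsed, score))
    print(f"k={k}: fit time {elapsed:.3f} s, silhouette {score:.3f}")

# Pick the k with the highest silhouette score as a starting point.
best_k = max(results, key=lambda r: r[2])[0]
print("starting point for a finer search:", best_k)
```

From there the search can be refined around best_k (for example the values between its neighbours on the grid), which is the increase/decrease step, without ever scanning the full range.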


