[scikit-learn] Elbow method module for estimating of optimal clustering

Maiia Bakhova myabakhova at gmail.com
Wed Apr 14 16:06:36 EDT 2021


Hello,
A while ago I wrote a numeric implementation of the Elbow method for
estimating the optimal number of clusters produced by K-means, and presented
it in the scikit-learn digest as a function. As a result, my GitHub repo was
cloned many times, and I was encouraged to work on it further. I also
received feedback from people with whom I discussed it. The method now
follows the scikit-learn Estimator format and uses bootstrapping to check
whether the chosen number is reasonably stable rather than an artifact of
randomness.
It returns a suggested optimal number and a dictionary of all calculated
suggestions with their corresponding frequencies.
Here is the method:
https://github.com/Mathemilda/Numeric_ElbowMethod_For_K-means/blob/master/EstimatedClusterNumberWithWCSS.py
Here is an example of its application in a Jupyter notebook:
https://github.com/Mathemilda/Numeric_ElbowMethod_For_K-means/blob/master/A%20scikit-learn%20compatible%20method%20with%20WCSS%20metric.ipynb
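For a quick feel of the idea without opening the repository, here is a
minimal sketch. It is not the code from the repo: the function names
(wcss_curve, elbow_by_second_difference, bootstrap_elbow) are hypothetical,
and the elbow rule used here (largest second difference of the WCSS curve)
is only an assumed stand-in for the numeric criterion in the actual module.
It shows the general shape: compute WCSS for a range of k, pick an elbow,
repeat on bootstrap resamples, and report the most frequent suggestion
together with a dictionary of frequencies.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def wcss_curve(X, k_values, random_state=0):
    """Within-cluster sum of squares (KMeans.inertia_) for each k."""
    return np.array([
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in k_values
    ])


def elbow_by_second_difference(k_values, wcss):
    """Assumed elbow rule: the k where the WCSS curve bends the most.

    The first and last k have no second difference, so they cannot be chosen.
    """
    second_diff = wcss[:-2] - 2 * wcss[1:-1] + wcss[2:]
    return k_values[1 + int(np.argmax(second_diff))]


def bootstrap_elbow(X, k_values, n_bootstrap=20, seed=0):
    """Repeat the elbow estimate on bootstrap resamples and count the votes."""
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_bootstrap):
        # Resample the rows of X with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        wcss = wcss_curve(X[idx], k_values)
        votes[elbow_by_second_difference(k_values, wcss)] += 1
    best_k = votes.most_common(1)[0][0]
    return best_k, dict(votes)


# Toy data: three well-separated blobs (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [5, 8.66]],
    cluster_std=1.0,
    random_state=42,
)
best_k, frequencies = bootstrap_elbow(X, k_values=list(range(1, 7)), n_bootstrap=10)
print(best_k, frequencies)
```

If one value of k collects nearly all the votes, the suggestion is probably
stable; if the votes are spread over several values, the elbow is likely
ambiguous for that data set.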
I have received a number of other suggestions, such as incorporating other
metrics and methods.
I have seen a discussion about this, with some pull requests, on the
scikit-learn GitHub, but it does not appear to be finished. As I understand
it, a lot of people would like to have something now,
so I offer my work.
Please do not hesitate to send questions or suggestions,
Mya

-- 
Maiia Bakhova
Mathematician in Data Science
http://myabakhova.blogspot.com
https://www.linkedin.com/in/myabakhova