[scikit-learn] Drawing contours in KMeans

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Wed Dec 9 15:40:15 EST 2020


Dear Mahmood,

Andrew's solution with a circle will guarantee you render an image in which
every point is covered within some circle.
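
A minimal sketch of that circle approach, reconstructed from Andrew's
description rather than from his notebook: the radius of each circle is the
largest distance from a centroid to the points assigned to it. The helper
names enclosing_circles and draw_circles are hypothetical, and X is assumed
to be an (n_samples, 2) array.

```python
import numpy as np
from matplotlib.patches import Circle
from sklearn.cluster import KMeans


def enclosing_circles(X, n_clusters=6, random_state=0):
    """Fit KMeans and return (centers, radii, labels), one circle per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    centers = km.cluster_centers_
    # Radius = distance from the centroid to its farthest assigned point.
    radii = np.array([
        np.linalg.norm(X[labels == k] - centers[k], axis=1).max()
        for k in range(n_clusters)
    ])
    return centers, radii, labels


def draw_circles(ax, centers, radii, color="red"):
    """Add one unfilled circle per cluster to an existing matplotlib Axes."""
    for center, radius in zip(centers, radii):
        ax.add_patch(Circle(center, radius, fill=False, edgecolor=color))
    ax.set_aspect("equal")  # otherwise the circles render as ovals
```

Typical use would be to scatter-plot X on an Axes first and then call
draw_circles(ax, centers, radii) on the same Axes.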

However, if the data contain outliers or artifacts, you might get circles
that are excessively large and distort the image you want.
For example, imagine there were a single red point in Andrew's image at
the coordinate (3,10); the resulting circle would then cover all points in
the entire plot, which is unlikely to be what you want.
You could instead generate a density estimate for each class and then
have matplotlib render the contour lines (i.e., the curves along which the
estimate takes a specific value), but as was said, this is not the job of
KMeans, but rather of general data analysis.
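
As a rough illustration of the density idea, the sketch below fits a
Gaussian kernel density estimate to each cluster with scipy and draws a
single contour line per cluster. Everything here is an assumption on my
part: the helper names are hypothetical, labels is taken to be the output
of KMeans.fit_predict, and the contour level (the density exceeded by about
90% of the cluster's own points) is just one possible choice.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_and_level(points, xx, yy, keep=90):
    """KDE of `points` (n, 2) on the mesh (xx, yy), plus a contour level.

    The level is the density value exceeded by roughly `keep` percent of
    the cluster's own points, so a few outliers fall outside the contour.
    """
    kde = gaussian_kde(points.T)  # gaussian_kde expects shape (dims, n)
    zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    level = np.percentile(kde(points.T), 100 - keep)
    return zz, level


def draw_density_contours(ax, X, labels, grid_size=100, pad=1.0):
    """Draw one density contour per cluster on an existing matplotlib Axes."""
    xs = np.linspace(X[:, 0].min() - pad, X[:, 0].max() + pad, grid_size)
    ys = np.linspace(X[:, 1].min() - pad, X[:, 1].max() + pad, grid_size)
    xx, yy = np.meshgrid(xs, ys)
    for k in np.unique(labels):
        zz, level = density_and_level(X[labels == k], xx, yy)
        ax.contour(xx, yy, zz, levels=[level])
```

Note that gaussian_kde needs more points than dimensions in each cluster
and fails on degenerate (e.g., perfectly collinear) clusters.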

The ellipsoid solution proposed to you is, in a sense, a middle ground
between these two solutions (the circles and the density plots).
You could adjust the parameters of an ellipse (the center, the two axis
lengths and, unless it is axis-aligned, a rotation angle, so 4 or 5 in
total) to cover "most" of the points for a particular class and tolerate
that the ellipses don't cover a few outliers or artifacts (e.g., the
coordinate (3,10) I mentioned above).
The resulting functional forms of the ellipses might be more precise than
circles and less complex than density contours, and might lead to
actionable knowledge depending on your context/domain.
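
One way to obtain such an ellipse, sketched here under stated assumptions,
is the PCA-style route: take the eigenvectors of each cluster's covariance
matrix as the axis directions and scale the axes by the eigenvalues. The
scaling below assumes the cluster is roughly 2-D Gaussian, in which case a
squared Mahalanobis radius of -2*ln(1 - coverage) encloses about that
fraction of the points. The function names are hypothetical.

```python
import numpy as np
from matplotlib.patches import Ellipse


def covariance_ellipse(points, coverage=0.9):
    """Return (center, width, height, angle_deg) for an (n, 2) point cloud."""
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Chi-square quantile with 2 degrees of freedom, in closed form.
    r2 = -2.0 * np.log(1.0 - coverage)
    # Full axis lengths, major axis first.
    width, height = 2.0 * np.sqrt(r2 * eigvals[::-1])
    major = eigvecs[:, -1]  # direction of largest variance
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return center, width, height, angle


def draw_cluster_ellipses(ax, X, labels, coverage=0.9):
    """Add one unfilled ellipse per cluster to an existing matplotlib Axes."""
    for k in np.unique(labels):
        center, width, height, angle = covariance_ellipse(X[labels == k], coverage)
        ax.add_patch(Ellipse(center, width, height, angle=angle, fill=False))
```

A single far-away point inflates the covariance somewhat but, unlike the
maximum-distance circle, does not force the ellipse to reach it. The five
returned numbers (two for the center, two axis lengths, one angle) are the
functional form of the ellipse.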

Hope this helps.
J.B. Brown

On Wed, Dec 9, 2020 at 21:08, Mahmood Naderan <mahmood.nt at gmail.com> wrote:

> >Mebbe principal components analysis would suggest an
> >ellipsoid containing "most" of the points in a "cloud".
>
> Sorry I didn't understand. Can you explain more?
> Regards,
> Mahmood
>
>
>
>
> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn <
> scikit-learn at python.org> wrote:
>
>> [scikit-learn] Drawing contours in KMeans
>>
>>
>> Mebbe principal components analysis would suggest an ellipsoid containing
>> "most" of the points in a "cloud".
>>
>>
>>
>>
>> "You won't find the right answers if you don't ask the right questions!"
>> (Robert Helmbold, 2013)
>>
>>
>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe <
>> ahowe42 at gmail.com> wrote:
>>
>>
>> Ok, I see. Well the attached notebook demonstrates doing this by simply
>> finding the maximum distance from each centroid to its data points and
>> drawing a circle using that radius. It's simple, but will hopefully at
>> least point you in a useful direction.
>> [image: image.png]
>> Andrew
>>
>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>> J. Andrew Howe, PhD
>> LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
>> ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
>> Open Researcher and Contributor ID (ORCID)
>> <http://orcid.org/0000-0002-3553-1990>
>> Github Profile <http://github.com/ahowe42>
>> Personal Website <http://www.andrewhowe.com>
>> I live to learn, so I can learn to live. - me
>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>
>>
>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan <mahmood.nt at gmail.com>
>> wrote:
>>
>> I mean a circle/contour to group the points in a cluster for better
>> representation.
>> For example, if there are six clusters, it will be more meaningful to
>> group large data points in a circle or contour.
>>
>> Regards,
>> Mahmood
>>
>>
>>
>>
>> On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe <ahowe42 at gmail.com> wrote:
>>
>> Contours generally indicate a third variable - often a probability
>> density. KMeans doesn't provide density estimates, so what precisely would
>> you want the contours to represent?
>>
>> Andrew
>>
>>
>>
>> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan <mahmood.nt at gmail.com>
>> wrote:
>>
>> Hi
>> I use the following code to highlight the cluster centers with some red
>> dots.
>>
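>> from sklearn.cluster import KMeans
>> import matplotlib.pyplot as plt
>>
>> # `a` is the (n_samples, 2) data array being clustered
>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10,
>>                 random_state=0)
>> pred_y = kmeans.fit_predict(a)
>> plt.scatter(a[:, 0], a[:, 1])
>> # mark the cluster centers with large red dots
>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
>>             s=100, c='red')
>> plt.show()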
>>
>> I would like to know if it is possible to draw contours over the
>> clusters. Is there any way for that?
>> Please let me know if there is a function or option in KMeans.
>>
>> Regards,
>> Mahmood
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 44525 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20201209/2c484e0b/attachment-0001.png>

