[scikit-learn] Drawing contours in KMeans

Abhishek Ghose abhishek.ghose.82 at gmail.com
Wed Dec 9 16:21:27 EST 2020


Hi,

A quick approach I use is to draw the convex hull (via scipy) around the
points in each cluster.
Here's a short example, in which k-means with k=2 is run on synthetic data:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
from scipy.spatial import ConvexHull


# synthetic 2-D data drawn from two blobs
X, _ = make_blobs(centers=2)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# uncomment the next line if you're using a notebook
#%matplotlib inline
for label in set(kmeans.labels_):
    # points assigned to the current cluster
    X_clust = X[kmeans.labels_ == label]
    # 'QJ' joggles the input so that collinear points don't raise an error
    hull = ConvexHull(X_clust, qhull_options='QJ')
    # repeat the first vertex at the end so that the boundary is closed
    vertices_cycle = hull.vertices.tolist()
    vertices_cycle.append(hull.vertices[0])
    plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1],
             'k--', lw=1)
    plt.scatter(X_clust[:, 0], X_clust[:, 1])
# needed when running as a script rather than in a notebook
plt.show()

Note:

   1. You can still have overlaps between boundaries - but I think this is
   a good effort-to-results tradeoff.
   2. To draw a closed boundary, you'd need to add the first vertex to the
   list returned by the hull function - the above code does that.
   3. You'd need to handle the case for clusters with <=2 points explicitly
   - not shown in the above code.
   4. I use the "QJ" option (other options at the qhull library page, which
   scipy internally uses: http://www.qhull.org/html/qh-optq.htm) to joggle
   the points a bit when they lie on a line.
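
For note 3, here is a minimal sketch of one way to handle small clusters. The
helper name cluster_outline is hypothetical (it is not a scipy or
scikit-learn function). It assumes that, for clusters with fewer than 3
points, returning the points themselves (a single point, or the segment
joining two points) is an acceptable "boundary".

```python
import numpy as np
from scipy.spatial import ConvexHull


def cluster_outline(points):
    """Return an (m, 2) array of outline coordinates for one 2-D cluster.

    Hypothetical helper for illustration; not part of scipy or scikit-learn.
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        # Too few points for a 2-D hull: fall back to the point itself,
        # or to the segment joining the two points.
        return points.copy()
    # 'QJ' joggles the input so that collinear points don't raise an error.
    hull = ConvexHull(points, qhull_options='QJ')
    # Repeat the first vertex so that the plotted boundary is closed.
    cycle = np.append(hull.vertices, hull.vertices[0])
    return points[cycle]


# Four corners of a square plus one interior point.
square = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]])
outline = cluster_outline(square)

# A two-point "cluster" is returned as a plain segment.
pair = cluster_outline([[0, 0], [1, 1]])
```

In the loop above, the hull and vertex handling could then be replaced by
outline = cluster_outline(X_clust) followed by
plt.plot(outline[:, 0], outline[:, 1], 'k--', lw=1). A single-point cluster
draws no line at all, but the scatter call still shows the point.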

Regards


On Wed, Dec 9, 2020 at 12:41 PM Brown J.B. via scikit-learn <
scikit-learn at python.org> wrote:

> Dear Mahmood,
>
> Andrew's solution with a circle will guarantee you render an image in
> which every point is covered within some circle.
>
> However, if data contains outliers or artifacts, you might get circles
> which are excessively large and distort the image you want.
> For example, imagine if there were a single red point in Andrew's image at
> the coordinate (3,10); then, the resulting circle would cover all points in
> the entire plot, which is unlikely to be what you want.
> You could potentially generate a density estimate for each class and then
> have matplotlib render the contour lines (i.e., the curves along which the
> estimate takes a specific value), but as was said, this is not the job of
> KMeans, but rather of general data analysis.
>
> The ellipsoid solution proposed to you is, in a sense, a middle ground
> between these two solutions (the circles and the density plots).
> You could adjust the (4 or 5) parameters of an ellipsoid to cover "most"
> of the points for a particular class and tolerate that the ellipsoids don't
> cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned
> above).
> The resulting functional forms of the ellipses might be more precise than
> circles and less complex than density contours, and might lead to
> actionable knowledge depending on your context/domain.
>
> Hope this helps.
> J.B. Brown
>
> On Wed, Dec 9, 2020 at 21:08 Mahmood Naderan <mahmood.nt at gmail.com> wrote:
>
>> >Mebbe principal components analysis would suggest an
>> >ellipsoid containing "most" of the points in a "cloud".
>>
>> Sorry I didn't understand. Can you explain more?
>> Regards,
>> Mahmood
>>
>>
>>
>>
>> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn <
>> scikit-learn at python.org> wrote:
>>
>>> Mebbe principal components analysis would suggest an ellipsoid
>>> containing "most" of the points in a "cloud".
>>>
>>>
>>>
>>>
>>> "You won't find the right answers if you don't ask the right questions!"
>>> (Robert Helmbold, 2013)
>>>
>>>
>>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe <
>>> ahowe42 at gmail.com> wrote:
>>>
>>>
>>> Ok, I see. Well the attached notebook demonstrates doing this by simply
>>> finding the maximum distance from each centroid to its datapoints and
>>> drawing a circle using that radius. It's simple, but will hopefully at
>>> least point you in a useful direction.
>>> [image: image.png]
>>> Andrew
>>>
>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>> J. Andrew Howe, PhD
>>> LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
>>> ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
>>> Open Researcher and Contributor ID (ORCID)
>>> <http://orcid.org/0000-0002-3553-1990>
>>> Github Profile <http://github.com/ahowe42>
>>> Personal Website <http://www.andrewhowe.com>
>>> I live to learn, so I can learn to live. - me
>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>>
>>>
>>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan <mahmood.nt at gmail.com>
>>> wrote:
>>>
>>> I mean a circle/contour to group the points in a cluster for better
>>> representation.
>>> For example, if there are six clusters, it will be more meaningful to
>>> group large data points in a circle or contour.
>>>
>>> Regards,
>>> Mahmood
>>>
>>>
>>>
>>>
>>> On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe <ahowe42 at gmail.com> wrote:
>>>
>>> Contours generally indicate a third variable - often a probability
>>> density. Kmeans doesn't provide density estimates, so what precisely would
>>> you want the contours to represent?
>>>
>>> Andrew
>>>
>>>
>>> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan <mahmood.nt at gmail.com>
>>> wrote:
>>>
>>> Hi
>>> I use the following code to highlight the cluster centers with some red
>>> dots.
>>>
>>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10,
>>> random_state=0)
>>> pred_y = kmeans.fit_predict(a)
>>> plt.scatter(a[:,0], a[:,1])
>>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,
>>> 1], s=100, c='red')
>>> plt.show()
>>>
>>> I would like to know if it is possible to draw contours over the
>>> clusters. Is there any way for that?
>>> Please let me know if there is a function or option in KMeans.
>>>
>>> Regards,
>>> Mahmood
>>>
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn


-- 
Computers: The eventual realization of Douglas Adams' musings - the world
depends on machines controlled by mice.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 44525 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20201209/09ad1340/attachment-0001.png>

