[scikit-learn] Drawing contours in KMeans
Abhishek Ghose
abhishek.ghose.82 at gmail.com
Wed Dec 9 19:25:38 EST 2020
Sorry, just noticed that I had forgotten to attach a sample image.
Regards
On Wed, Dec 9, 2020 at 1:21 PM Abhishek Ghose <abhishek.ghose.82 at gmail.com>
wrote:
> Hi,
>
> A quick way I use is to draw a convex hull (scipy) around the points in a
> cluster.
> Here's a short example - k-means with k=2 is run on synthetic data:
>
> from sklearn.datasets import make_blobs
> from sklearn.cluster import KMeans
> from matplotlib import pyplot as plt
> from scipy.spatial import ConvexHull
>
>
> X, _ = make_blobs(centers=2)
> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>
> # uncomment the next line if you're using a notebook
> #%matplotlib inline
> for label in set(kmeans.labels_):
>     X_clust = X[kmeans.labels_ == label]
>     hull = ConvexHull(X_clust, qhull_options='QJ')
>     # repeat the first vertex so the plotted outline is closed
>     vertices_cycle = hull.vertices.tolist()
>     vertices_cycle.append(hull.vertices[0])
>     plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1],
>              'k--', lw=1)
>     plt.scatter(X_clust[:, 0], X_clust[:, 1])
> plt.show()
>
> Note:
>
> 1. You can still have overlaps between boundaries - but I think this
> is a good effort-to-results tradeoff.
> 2. To draw a closed boundary, you'd need to add the first vertex to
> the list returned by the hull function - the above code does that.
> 3. You'd need to handle clusters with fewer than 3 points explicitly
> (a 2-D convex hull needs at least 3 points) - not shown in the above code.
> 4. I use the "QJ" option to joggle the points a bit when they lie on a
> line. Other options are listed on the page of the qhull library, which
> scipy uses internally: http://www.qhull.org/html/qh-optq.htm
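A minimal sketch (not from the original message) of how notes 2-4 might be combined into one helper, assuming 2-D data. The helper name `hull_outline`, the synthetic `make_blobs` data, `centers=3`, and the fallback of drawing small clusters as-is are all illustrative choices, not part of the thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop in a notebook
import numpy as np
from matplotlib import pyplot as plt
from scipy.spatial import ConvexHull
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def hull_outline(points):
    """Hypothetical helper: points to plot as a closed outline around a 2-D cluster.

    Clusters with fewer than 3 points cannot form a 2-D hull, so they are
    returned unchanged (a single point or a line segment).
    """
    if len(points) < 3:
        return points
    # 'QJ' joggles the input so collinear points still yield a hull
    hull = ConvexHull(points, qhull_options="QJ")
    # for 2-D input the hull vertices are ordered; repeat the first to close it
    cycle = np.append(hull.vertices, hull.vertices[0])
    return points[cycle]


X, _ = make_blobs(centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

fig, ax = plt.subplots()
outlines = {}
for label in np.unique(labels):
    X_clust = X[labels == label]
    outline = hull_outline(X_clust)
    outlines[label] = outline
    ax.plot(outline[:, 0], outline[:, 1], "k--", lw=1)
    ax.scatter(X_clust[:, 0], X_clust[:, 1])
```

In interactive use, call `plt.show()` or `fig.savefig(...)` at the end; a one-point cluster then shows up only through its scatter marker.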
>
> Regards
>
>
> On Wed, Dec 9, 2020 at 12:41 PM Brown J.B. via scikit-learn <
> scikit-learn at python.org> wrote:
>
>> Dear Mahmood,
>>
>> Andrew's solution with a circle guarantees an image in which every point
>> lies inside some circle.
>>
>> However, if the data contain outliers or artifacts, you might get circles
>> which are excessively large and distort the image you want.
>> For example, imagine if there were a single red point in Andrew's image
>> at the coordinate (3,10); then, the resulting circle would cover all points
>> in the entire plot, which is unlikely to be what you want.
>> You could potentially generate a density estimate for each class and then
>> have matplotlib render the contour lines (i.e., the curves along which the
>> estimate takes a specific value), but as was said, this is not the job of
>> Kmeans, but rather of general data analysis.
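A sketch of the density-contour idea, assuming scipy's `gaussian_kde` as the per-cluster density estimate (one possible choice; the message does not name an estimator). The contour level, the density value exceeded at 90% of a cluster's own points, is a hypothetical rule picked so that a few outliers fall outside the line; the synthetic data and grid margin are also assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Evaluation grid covering the data with a margin of 1 unit
xs = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
ys = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
gx, gy = np.meshgrid(xs, ys)
grid = np.vstack([gx.ravel(), gy.ravel()])  # shape (2, n_grid), as gaussian_kde expects

fig, ax = plt.subplots()
levels = {}
for label in np.unique(labels):
    X_clust = X[labels == label]
    kde = gaussian_kde(X_clust.T)  # gaussian_kde takes (n_dims, n_points)
    density = kde(grid).reshape(gx.shape)
    # Hypothetical level: 10th percentile of the density at the cluster's own
    # points, so roughly 90% of them lie inside the drawn contour
    level = np.quantile(kde(X_clust.T), 0.10)
    levels[label] = level
    ax.contour(gx, gy, density, levels=[level], colors="k", linestyles="--")
    ax.scatter(X_clust[:, 0], X_clust[:, 1], s=10)
```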
>>
>> The ellipsoid solution proposed to you is, in a sense, a middle ground
>> between these two solutions (the circles and the density plots).
>> You could adjust the (4 or 5) parameters of an ellipsoid to cover "most"
>> of the points for a particular class and tolerate that the ellipsoids don't
>> cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned
>> above).
>> The resulting functional forms of the ellipses might be more precise than
>> circles and less complex than density contours, and might lead to
>> actionable knowledge depending on your context/domain.
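One way to realize the "cover most of the points" ellipse, sketched under assumptions not in the message: the shape comes from each cluster's sample covariance, and the size is set from a quantile (here 95%, a hypothetical choice) of the points' squared Mahalanobis distances, so the farthest 5% are left outside. This uses the 5 parameters of a 2-D ellipse: centre (2), two axis lengths (2) and an angle (1). The helper name `coverage_ellipse` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.patches import Ellipse
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def coverage_ellipse(points, coverage=0.95, **kwargs):
    """Hypothetical helper: ellipse shaped by the cluster covariance and sized
    so that about `coverage` of the points fall inside it."""
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    diff = points - center
    # Squared Mahalanobis distance of every point from the centre
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    scale = np.sqrt(np.quantile(d2, coverage))
    # Eigenvectors give the axis directions, eigenvalues the spread along them
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    width, height = 2 * scale * np.sqrt(eigvals[::-1])  # full axis lengths
    # Angle of the major axis (eigenvector of the largest eigenvalue)
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    return Ellipse(center, width, height, angle=angle, fill=False, **kwargs)


X, _ = make_blobs(centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

fig, ax = plt.subplots()
for label in np.unique(labels):
    X_clust = X[labels == label]
    ax.scatter(X_clust[:, 0], X_clust[:, 1], s=10)
    ax.add_patch(coverage_ellipse(X_clust, coverage=0.95,
                                  edgecolor="k", linestyle="--"))
```

Lowering `coverage` trades covering every point for robustness to outliers such as the (3,10) example above.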
>>
>> Hope this helps.
>> J.B. Brown
>>
>> 2020年12月9日(水) 21:08 Mahmood Naderan <mahmood.nt at gmail.com>:
>>
>>> >Mebbe principal components analysis would suggest an
>>> >ellipsoid containing "most" of the points in a "cloud".
>>>
>>> Sorry, I didn't understand. Could you explain in more detail?
>>> Regards,
>>> Mahmood
>>>
>>>
>>>
>>>
>>> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn <
>>> scikit-learn at python.org> wrote:
>>>
>>>>
>>>>
>>>> Mebbe principal components analysis would suggest an ellipsoid
>>>> containing "most" of the points in a "cloud".
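To illustrate what principal components analysis contributes here, a small sketch using scikit-learn's `PCA` on a single synthetic elongated cloud (an assumed stand-in for one cluster): `components_` gives the directions of the ellipse axes, and the square roots of `explained_variance_` give the spread along them. The multiplier `k = 2` is a rule-of-thumb assumption, not something stated in the thread.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic elongated "cloud" standing in for one cluster (assumption)
rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

pca = PCA(n_components=2).fit(cloud)
center = pca.mean_                      # ellipse centre
axis_dirs = pca.components_             # rows: unit vectors along the ellipse axes
sds = np.sqrt(pca.explained_variance_)  # standard deviation along each axis

# Ellipse with semi-axes k * sds along axis_dirs
k = 2.0
semi_axes = k * sds
# pca.transform centres the data and projects it onto the axes;
# dividing by the semi-axes expresses each point in "ellipse units"
scores = pca.transform(cloud) / semi_axes
inside = (scores ** 2).sum(axis=1) <= 1.0
print(f"fraction of points inside: {inside.mean():.2f}")
```

For a roughly Gaussian cloud, `k = 2` encloses most but not all points; increasing `k` grows the ellipse along both axes.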
>>>>
>>>>
>>>>
>>>>
>>>> "You won't find the right answers if you don't ask the right
>>>> questions!" (Robert Helmbold, 2013)
>>>>
>>>>
>>>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe <
>>>> ahowe42 at gmail.com> wrote:
>>>>
>>>>
>>>> OK, I see. Well, the attached notebook demonstrates doing this by simply
>>>> finding the maximum distance from each centroid to its data points and
>>>> drawing a circle with that radius. It's simple, but will hopefully at
>>>> least point you in a useful direction.
>>>> [image: image.png]
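The attached notebook is not preserved in the archive, so the following is only a sketch of the approach as described (radius = distance from each centroid to its farthest assigned point), on assumed synthetic data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.patches import Circle
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

fig, ax = plt.subplots()
radii = {}
for label, center in enumerate(kmeans.cluster_centers_):
    X_clust = X[kmeans.labels_ == label]
    # Radius = distance from the centroid to its farthest assigned point
    radius = np.linalg.norm(X_clust - center, axis=1).max()
    radii[label] = radius
    ax.scatter(X_clust[:, 0], X_clust[:, 1], s=10)
    ax.add_patch(Circle(center, radius, fill=False, linestyle="--"))
ax.set_aspect("equal")  # so the circles are drawn as circles, not ellipses
```

Because the radius is set by the single farthest point, one outlier is enough to inflate a circle, which is the concern raised later in the thread.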
>>>> Andrew
>>>>
>>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>>> J. Andrew Howe, PhD
>>>> LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
>>>> ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
>>>> Open Researcher and Contributor ID (ORCID)
>>>> <http://orcid.org/0000-0002-3553-1990>
>>>> Github Profile <http://github.com/ahowe42>
>>>> Personal Website <http://www.andrewhowe.com>
>>>> I live to learn, so I can learn to live. - me
>>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>>>
>>>>
>>>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan <mahmood.nt at gmail.com>
>>>> wrote:
>>>>
>>>> I mean a circle/contour to group the points in a cluster for better
>>>> representation.
>>>> For example, if there are six clusters, it would be more meaningful to
>>>> enclose the many data points of each cluster in a circle or contour.
>>>>
>>>> Regards,
>>>> Mahmood
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe <ahowe42 at gmail.com> wrote:
>>>>
>>>> Contours generally indicate a third variable - often a probability
>>>> density. Kmeans doesn't provide density estimates, so what precisely would
>>>> you want the contours to represent?
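If the contours are meant to show a probability density, one option is to swap k-means for a model that provides one. The sketch below uses scikit-learn's `GaussianMixture` (a different clustering model from k-means, named here as a substitution) and contours its log-density from `score_samples`; the synthetic data, `n_components=3`, and the 8 contour levels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

# Grid covering the data with a margin of 1 unit
xs = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
ys = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
gx, gy = np.meshgrid(xs, ys)
# score_samples returns the log of the mixture density at each grid point
log_density = gmm.score_samples(np.column_stack([gx.ravel(), gy.ravel()]))
log_density = log_density.reshape(gx.shape)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, s=10)
ax.contour(gx, gy, log_density, levels=8, colors="k", linewidths=0.5)
```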
>>>>
>>>> Andrew
>>>>
>>>>
>>>>
>>>> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan <mahmood.nt at gmail.com>
>>>> wrote:
>>>>
>>>> Hi
>>>> I use the following code to highlight the cluster centers with some red
>>>> dots.
>>>>
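>>>> from sklearn.cluster import KMeans
>>>> from matplotlib import pyplot as plt
>>>>
>>>> # `a` is the (n_samples, 2) data array being clustered
>>>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100,
>>>>                 n_init=10, random_state=0)
>>>> pred_y = kmeans.fit_predict(a)
>>>> plt.scatter(a[:, 0], a[:, 1])
>>>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
>>>>             s=100, c='red')
>>>> plt.show()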
>>>>
>>>> I would like to know if it is possible to draw contours over the
>>>> clusters. Is there any way to do that?
>>>> Please let me know if there is a function or option for this in KMeans.
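KMeans itself only fits and predicts; drawing is left to matplotlib. A common workaround, sketched here as one option, is to call `kmeans.predict` on a fine grid and shade each cluster's region with `contourf`, which outlines the boundaries k-means actually uses. The synthetic stand-in for `a`, the grid resolution, and the colormap are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for the 2-D array `a` in the question
a, _ = make_blobs(n_samples=600, centers=6, random_state=0)
kmeans = KMeans(n_clusters=6, init="k-means++", max_iter=100,
                n_init=10, random_state=0)
pred_y = kmeans.fit_predict(a)

# Predict a cluster label for every point of a grid over the data
xs = np.linspace(a[:, 0].min() - 1, a[:, 0].max() + 1, 300)
ys = np.linspace(a[:, 1].min() - 1, a[:, 1].max() + 1, 300)
gx, gy = np.meshgrid(xs, ys)
regions = kmeans.predict(np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

fig, ax = plt.subplots()
# Levels at half-integers put one filled band per label 0..5
ax.contourf(gx, gy, regions, levels=np.arange(7) - 0.5, alpha=0.2, cmap="tab10")
ax.scatter(a[:, 0], a[:, 1], c=pred_y, s=10)
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=100, c="red")
```

These regions tile the whole plane, so unlike the hull or ellipse approaches they do not show how far each cluster's points actually extend.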
>>>>
>>>> Regards,
>>>> Mahmood
>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>
>
>
> --
> Computers: The eventual realization of Douglas Adams' musings - the world
> depends on machines controlled by mice.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 44525 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20201209/92e35350/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmeans_convexhull.PNG
Type: image/png
Size: 29519 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20201209/92e35350/attachment-0003.png>