[scikit-learn] Rerunning Kmeans with Python

Stephen Malcolm stephen_malcolm at hotmail.com
Sun Oct 4 17:14:23 EDT 2020


Hello all,

I've written some code to run k-means on a data set (please see below),
and I've plotted the results with my two clusters and their centroids.

However, I have to re-run k-means several times and produce a plot for each run (showing the different centroid positions).
Can someone point me in the right direction on how to write the extra code for this task?

Then I have to conclude whether k-means is stable. I believe this is judged by the lowest sum of squared errors?
Thank you in advance.

#pandas used to read dataset and return the data
#numpy and matplotlib to represent and visualize the data
#sklearn to implement kmeans algorithm

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

#import the data
data = pd.read_csv('file.csv')

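#extract values
V1 = data['V1']
V2 = data['V2']

#stack the two columns into an (n_samples, 2) array
V1_V2 = np.column_stack((V1, V2))

#fit kmeans with two clusters and get the cluster label of each point
km_res = KMeans(n_clusters=2).fit(V1_V2)
y_kmeans = km_res.predict(V1_V2)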

#plot the points coloured by cluster
plt.scatter(V1, V2, c=y_kmeans, cmap='viridis', s=50, alpha=0.5)
plt.xlabel('V1')
plt.ylabel('V2')
plt.title('Visualization of raw data')

#overlay the cluster centroids
clusters = km_res.cluster_centers_
plt.scatter(clusters[:,0], clusters[:,1], c='blue', s=150)
plt.show()
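
One possible way to do the repeated runs, as a minimal sketch rather than a definitive answer: fit one KMeans model per random seed, draw each run in its own subplot, and compare the sum of squared errors, which scikit-learn exposes as `inertia_`. The helper names `run_kmeans_repeatedly` and `plot_runs`, the seed list, and the synthetic stand-in data are all assumptions for illustration; the real `V1_V2` array would replace `X`. Setting `init="random"` and `n_init=1` is also a choice made here so that each run uses a single random start and any differences between runs stay visible.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def run_kmeans_repeatedly(X, seeds, n_clusters=2):
    """Fit one KMeans model per seed, each with a single random initialisation."""
    models = []
    for seed in seeds:
        # One random start per run, controlled by the seed, so that runs can differ.
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        models.append(km.fit(X))
    return models


def plot_runs(X, models):
    """Draw one subplot per fitted model: points coloured by label, centroids on top."""
    fig, axes = plt.subplots(1, len(models), figsize=(4 * len(models), 4), squeeze=False)
    for ax, km in zip(axes[0], models):
        ax.scatter(X[:, 0], X[:, 1], c=km.labels_, cmap="viridis", s=50, alpha=0.5)
        centers = km.cluster_centers_
        ax.scatter(centers[:, 0], centers[:, 1], c="blue", s=150)
        ax.set_xlabel("V1")
        ax.set_ylabel("V2")
        ax.set_title(f"SSE = {km.inertia_:.1f}")
    fig.tight_layout()
    return fig


# Hypothetical stand-in data (two blobs); replace X with V1_V2 from the code above.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(5.0, 1.0, size=(100, 2)),
])

seeds = [0, 1, 2, 3]  # arbitrary choice of seeds
models = run_kmeans_repeatedly(X, seeds)
fig = plot_runs(X, models)

# Print the sum of squared errors and the centroids of every run for comparison.
for seed, km in zip(seeds, models):
    print(f"seed={seed}  SSE={km.inertia_:.2f}  centroids={km.cluster_centers_.round(2).tolist()}")
```

Calling `plt.show()` afterwards displays the figure. If the centroids and the SSE barely change from seed to seed, that suggests the clustering is stable on this data; the lowest SSE on its own identifies the best-fitting run rather than stability. Cluster labels can swap between runs (cluster 0 in one run may be cluster 1 in another), so it is safer to compare centroid positions than label numbers.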

