Hello All,
I'm having some trouble adding a graphical overlay, i.e. an ellipse, to my plot.
I want to do this because I need to explain/portray the mean, standard deviation and outliers, and hence evaluate the suitability of the dataset.
Could you please let me know what code I'm missing or need to add in order to draw this ellipse?
I have no trouble plotting the data points and the mean with the code below; however, the ellipse (whose width and height come from the standard deviation) doesn't appear.
I get no errors; instead, I get a separate, empty graph (with neither the data points nor the ellipse) below the plotted one.
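For reference, the quantities the ellipse is meant to show can be sketched as follows. This assumes an axis-aligned ellipse (no rotation, so any correlation between V1 and V2 is ignored) centred on the mean, with semi-axes of k standard deviations; `ellipse_params`, `outside_ellipse` and `k` are hypothetical names, not part of any library. `matplotlib.patches.Ellipse` takes the full width and height, so a 1σ ellipse needs `2*std`, which is what `std_dev[0]*2` in the code below does. A point counts as an outlier here when ((x - μx)/(kσx))² + ((y - μy)/(kσy))² > 1.

```python
import numpy as np


def ellipse_params(x, y, k=1.0):
    """Return the centre and FULL width/height of an axis-aligned ellipse
    reaching k standard deviations either side of the mean (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centre = (x.mean(), y.mean())
    # .std() uses ddof=0 (population std), like np.std(data, 0) in the post
    width = 2 * k * x.std()
    height = 2 * k * y.std()
    return centre, width, height


def outside_ellipse(x, y, k=1.0):
    """Boolean mask: True for points lying outside the k-sigma ellipse."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    (cx, cy), width, height = ellipse_params(x, y, k)
    a, b = width / 2, height / 2  # semi-axes
    # Ellipse equation: a point is inside when the sum is <= 1, outside when > 1
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 > 1


# Tiny made-up example: four points at the mean plus two far-out points
xs = [0, 0, 0, 0, 3, -3]
ys = [0, 0, 0, 0, 3, -3]
print(outside_ellipse(xs, ys).tolist())  # [False, False, False, False, True, True]
print(outside_ellipse(xs, ys, k=3).any())  # False
```

With the real data, something like `outside_ellipse(data['V1'], data['V2'], k=2)` would flag points more than two standard deviations out, under the axis-aligned assumption above.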
Please find my code below:
#pandas used to read dataset and return the data
#numpy and matplotlib to represent and visualize the data
#sklearn to implement kmeans algorithm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
#import the data
data = pd.read_csv('banknotes.csv')
#extract values
x=data['V1']
y=data['V2']
#print range to determine normalization
print ("X_max : ",x.max())
print ("X_min : ",x.min())
print ("Y_max : ",y.max())
print ("Y_min : ",y.min())
#normalize values
mean_x=x.mean()
mean_y=y.mean()
max_x=x.max()
max_y=y.max()
min_x=x.min()
min_y=y.min()
x = (x - mean_x) / (max_x - min_x)
y = (y - mean_y) / (max_y - min_y)
#write the normalized values back explicitly (avoids chained assignment on data)
data['V1'] = x
data['V2'] = y
#statistical analysis using mean and standard deviation
import matplotlib.patches as patches
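mean = np.mean(data, 0)
std_dev = np.std(data, 0)
#.iloc selects by position; plain mean[0] on a column-labelled Series is deprecated in pandas
ellipse = patches.Ellipse((mean.iloc[0], mean.iloc[1]), std_dev.iloc[0]*2, std_dev.iloc[1]*2, alpha=0.25)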
plt.xlabel('V1')
plt.ylabel('V2')
plt.title('Visualization of raw data');
plt.scatter(data.iloc[:, 0], data.iloc[:, 1])
plt.scatter(mean.iloc[0], mean.iloc[1])
plt.figure(figsize=(6, 6))
fig,graph = plt.subplots()
graph.add_patch(ellipse)
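A minimal sketch of the plotting step with everything drawn on a single Axes, assuming x and y are the two columns to plot (e.g. the normalized V1 and V2). A likely reason for the behaviour described above is that `plt.figure(figsize=(6, 6))` and `plt.subplots()` each open a new figure, so the points end up on the first figure while the ellipse is added to a later one. `plot_with_ellipse` and its `k` parameter are hypothetical names for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def plot_with_ellipse(x, y, k=1.0):
    """Scatter x vs y, mark the mean, and overlay a k-sigma ellipse,
    all on ONE Axes object (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    std_x, std_y = x.std(), y.std()  # population std, as np.std(data, 0)

    # Create the figure and axes once, up front; no separate plt.figure() call
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(x, y, s=10, label='data')
    ax.scatter(mean_x, mean_y, color='red', label='mean')

    # Ellipse takes the centre plus the FULL width and height
    ellipse = patches.Ellipse((mean_x, mean_y), 2 * k * std_x, 2 * k * std_y,
                              alpha=0.25, color='orange')
    ax.add_patch(ellipse)  # added to the same axes as the scatter points

    ax.set_xlabel('V1')
    ax.set_ylabel('V2')
    ax.set_title('Visualization of raw data')
    ax.legend()
    return fig, ax
```

In the script above, `fig, ax = plot_with_ellipse(x, y)` followed by `plt.show()` could take the place of the plotting lines. The design choice is simply to create the figure once and call methods on `ax`, rather than mixing `plt.*` calls that act on whichever figure is current.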
Kind Regards,
Stephen