[scikit-learn] affinity propagation not giving desired answer
Neal Becker
ndbecker2 at gmail.com
Wed Jan 23 13:26:44 EST 2019
I am not too familiar with affinity propagation, but just trying it out.
The problem is to cluster using a distance metric that is Euclidean distance
but with a limit. When the distance is greater than some threshold, then the
metric is -Inf. In other words, a point can be accepted into a cluster only
if the distance from the point to the cluster center is less than some
threshold.
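To make the metric concrete, here is a minimal sketch of it as a similarity matrix. It assumes the squared-distance form, the limit of 1, and the -1e6 stand-in for -Inf that the example code uses; the function name `thresholded_similarity` and the three demo points are made up for illustration.

```python
import numpy as np

def thresholded_similarity(points, max_sq_dist=1.0, far_value=-1e6):
    """Negative squared Euclidean distance, replaced by far_value beyond the limit."""
    # Pairwise coordinate differences via broadcasting: shape (n, n, dims).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Pairs farther apart than the limit get the large negative stand-in for -Inf.
    return np.where(sq_dist > max_sq_dist, far_value, -sq_dist)

# Hypothetical demo points: two close together, one far away.
demo_points = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 4.0]])
similarity = thresholded_similarity(demo_points)
```

This should produce the same matrix as a double loop over all pairs, only without the Python-level loops.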
It seems my test with affinity propagation will sometimes produce a correct
result, but other times the result seems to violate the condition. In the
example code, a couple of outlier points seem to be in clusters that are not
close at all.
I've tried playing with parameters (such as preference) without eliminating
the problem. Any suggestions?
---------
import numpy as np
from sklearn.cluster import AffinityPropagation
# from randomgen import RandomGenerator, Xoroshiro128
# rs = RandomGenerator (Xoroshiro128 (0))
from numpy.random import RandomState
rs = RandomState(3)
pts = rs.uniform (-5, 5, (50,2))
import seaborn as sns
import matplotlib.pyplot as plt
def distance (ax, ay, bx, by):
    d = (ax - bx)**2 + (ay - by)**2
    if d > 1:
        return -1e6
    else:
        return -d
d = np.empty ((pts.shape[0], pts.shape[0]))
for i in range(pts.shape[0]):
    for j in range(pts.shape[0]):
        d[i,j] = distance(pts[i,0], pts[i,1], pts[j,0], pts[j,1])
preference = -20 #np.mean (d[d > -1e6])
print ('preference:', preference)
clustering = AffinityPropagation(affinity='precomputed', verbose=True,
                                 preference=preference)
res = clustering.fit(d)
c = clustering
colors = np.array(sns.color_palette("hls", np.max(c.labels_)+1))
print('n_clusters:', np.max(c.labels_)+1)
centers = pts[c.cluster_centers_indices_]
plt.scatter (pts[:,0], pts[:,1], c=colors[c.labels_])
plt.scatter (centers[:,0], centers[:,1], marker='X', s=100, c=colors)
plt.show()
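Rather than judging the plot by eye, the violating points could be listed directly. This is only a sketch: the helper name `find_violations` and the demo arrays are hypothetical, and it assumes the same squared-distance limit of 1 as the `distance` function above.

```python
import numpy as np

def find_violations(points, labels, center_indices, max_sq_dist=1.0):
    """Return indices of points farther than the limit from their assigned exemplar."""
    # Coordinates of each cluster's exemplar, then of each point's own exemplar.
    centers = points[np.asarray(center_indices)]
    assigned = centers[np.asarray(labels)]
    sq_dist = np.sum((points - assigned) ** 2, axis=1)
    return np.flatnonzero(sq_dist > max_sq_dist)

# Hypothetical demo: all three points assigned to the exemplar at index 0.
demo_points = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 4.0]])
demo_labels = np.array([0, 0, 0])
demo_centers = np.array([0])
violations = find_violations(demo_points, demo_labels, demo_centers)
```

With the script above, the call would be `find_violations(pts, c.labels_, c.cluster_centers_indices_)`; an empty result would mean every point lies within the limit of its cluster center.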