[scikit-learn] affinity propagation not giving desired answer

Neal Becker ndbecker2 at gmail.com
Wed Jan 23 13:26:44 EST 2019


I am not too familiar with affinity propagation, but just trying it out.  
The problem is to cluster using a distance metric that is Euclidean distance 
but with a limit.  When the distance is greater than some threshold, the 
metric is -Inf.  In other words, a point can be accepted into a cluster only 
if the distance from the point to the cluster center is less than some 
threshold.
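
A minimal sketch of that metric, assuming the same choices as the example code further down: a threshold of 1 on the squared distance, and a large negative constant standing in for -Inf. The function name `thresholded_similarity` and its parameters are only illustrative, not part of scikit-learn.

```python
import numpy as np

def thresholded_similarity(pts, max_sq_dist=1.0, far_value=-1e6):
    """Return a similarity matrix for an (n, 2) array of points.

    Similarity is the negative squared euclidean distance; pairs farther
    apart than max_sq_dist get far_value (a stand-in for -Inf) instead.
    """
    # Pairwise differences via broadcasting: shape (n, n, 2).
    diff = pts[:, None, :] - pts[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    return np.where(sq > max_sq_dist, far_value, -sq)

# Tiny hand-checkable example: two nearby points and one far away.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
sim = thresholded_similarity(pts)
```

The result should match the double loop in the example code, just without the Python-level iteration.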

It seems my test with affinity propagation will sometimes produce a correct 
result, but other times the result seems to violate the condition.  In the 
example code, a couple of outlier points seem to be in clusters that are not 
close at all.
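
To make "violates the condition" concrete, here is a rough check, assuming the threshold of 1 on the squared distance used in the example code. It takes the points together with the `labels_` and `cluster_centers_indices_` arrays from a fitted AffinityPropagation object and lists the points that are too far from their assigned center. The helper name `find_violations` is hypothetical.

```python
import numpy as np

def find_violations(pts, labels, center_indices, max_sq_dist=1.0):
    """Return indices of points farther than the threshold from their center."""
    centers = pts[np.asarray(center_indices)]   # coordinates of each exemplar
    assigned = centers[np.asarray(labels)]      # exemplar assigned to each point
    sq = np.sum((pts - assigned) ** 2, axis=1)  # squared distance to that exemplar
    return np.flatnonzero(sq > max_sq_dist)

# Hand-made example: all three points share exemplar 0, but the third
# point lies well outside the threshold.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
labels = np.array([0, 0, 0])
center_indices = np.array([0])
violations = find_violations(pts, labels, center_indices)
```

With the example code, this would be called as find_violations(pts, c.labels_, c.cluster_centers_indices_).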

I've tried playing with parameters (such as preference) without eliminating 
the problem.  Any suggestions?

---------
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from numpy.random import RandomState
from sklearn.cluster import AffinityPropagation

# from randomgen import RandomGenerator, Xoroshiro128
# rs = RandomGenerator (Xoroshiro128 (0))
rs = RandomState(3)
pts = rs.uniform (-5, 5, (50,2))

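def distance (ax, ay, bx, by):
    # Similarity is the negative squared euclidean distance, with a large
    # negative value (standing in for -Inf) beyond the threshold of 1.
    d = (ax - bx)**2 + (ay - by)**2
    if d > 1:
        return -1e6
    else:
        return -d
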
d = np.empty ((pts.shape[0], pts.shape[0]))
for i in range(pts.shape[0]):
    for j in range(pts.shape[0]):
        d[i,j] = distance(pts[i,0], pts[i,1], pts[j,0], pts[j,1])

preference = -20 #np.mean (d[d > -1e6])
print ('preference:', preference)
clustering = AffinityPropagation(affinity='precomputed', verbose=True,
                                 preference=preference)

res = clustering.fit(d)
c = clustering
colors = np.array(sns.color_palette("hls", np.max(c.labels_)+1))
print('n_clusters:', np.max(c.labels_)+1)
centers = pts[c.cluster_centers_indices_]
plt.scatter (pts[:,0], pts[:,1], c=colors[c.labels_])
plt.scatter (centers[:,0], centers[:,1], marker='X', s=100, c=colors)
plt.show()



