[scikit-learn] affinity propagation not giving desired answer

Neal Becker ndbecker2 at gmail.com
Wed Jan 23 13:26:44 EST 2019

I am not too familiar with affinity propagation, but am just trying it out.  
The problem is to cluster using a distance metric that is Euclidean distance 
but with a limit.  When the distance is greater than some threshold, then the 
metric is -Inf.  In other words, a point can be accepted into a cluster only 
if the distance from the point to the cluster center is less than some 
threshold.

It seems my test with affinity propagation will sometimes produce a correct 
result, but other times the result seems to violate the condition.  In the 
example code, a couple of outlier points seem to be in clusters that are not 
close at all.

I've tried playing with parameters (such as preference) without eliminating 
the problem.  Any suggestions?

import numpy as np
from sklearn.cluster import AffinityPropagation

# from randomgen import RandomGenerator, Xoroshiro128
# rs = RandomGenerator (Xoroshiro128 (0))
from numpy.random import RandomState
rs = RandomState(3)
pts = rs.uniform (-5, 5, (50,2))
import seaborn as sns
import matplotlib.pyplot as plt

def distance (ax, ay, bx, by):
    # similarity = negative squared euclidean distance, with a cutoff
    d = (ax - bx)**2 + (ay - by)**2
    if d > 1:
        return -1e6
    return -d

d = np.empty ((pts.shape[0], pts.shape[0]))
for i in range(pts.shape[0]):
    for j in range(pts.shape[0]):
        d[i,j] = distance(pts[i,0], pts[i,1], pts[j,0], pts[j,1])

preference = -20 #np.mean (d[d > -1e6])
print ('preference:', preference)
clustering = AffinityPropagation(affinity='precomputed', verbose=True,
                                 preference=preference)

res = clustering.fit(d)
c = clustering
colors = np.array(sns.color_palette("hls", np.max(c.labels_)+1))
print('n_clusters:', np.max(c.labels_)+1)
centers = pts[c.cluster_centers_indices_]
plt.scatter (pts[:,0], pts[:,1], c=colors[c.labels_])
plt.scatter (centers[:,0], centers[:,1], marker='X', s=100, c=colors)
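
To make "violates the condition" concrete, a check along the following lines could be run after the fit. This is only a minimal sketch: the helper name find_violations and its max_sq_dist parameter are made up for illustration, the cutoff is on squared distance to match the `d > 1` test in distance(), and the demo data is hand-made rather than output from the script above.

```python
import numpy as np

def find_violations(pts, labels, center_indices, max_sq_dist=1.0):
    """Return indices of points whose squared distance to their assigned
    exemplar exceeds max_sq_dist (hypothetical helper, for illustration)."""
    pts = np.asarray(pts, dtype=float)
    labels = np.asarray(labels)
    centers = pts[np.asarray(center_indices)]
    # Look up the exemplar assigned to each point via its cluster label.
    assigned = centers[labels]
    sq_dist = np.sum((pts - assigned) ** 2, axis=1)
    return np.flatnonzero(sq_dist > max_sq_dist)

# Tiny hand-made example: the last point is labelled 0 but lies far
# from exemplar 0, so it should be reported.
demo_pts = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 3.0], [3.2, 3.0], [0.0, 4.0]])
demo_centers = [0, 2]
demo_labels = [0, 0, 1, 1, 0]
demo_violations = find_violations(demo_pts, demo_labels, demo_centers)
print(demo_violations)
```

With the script above, and assuming the fit converged so that every label is a valid index into the list of exemplars, the call would be find_violations(pts, c.labels_, c.cluster_centers_indices_).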
