[scikit-learn] Finding a single cluster in 1d data
Raphael C
drraph at gmail.com
Thu Apr 12 15:22:44 EDT 2018
I have a set of points in 1d, represented by a list X of floating point
numbers. The list has one dense section and the rest is sparse, and I
want to find the dense part. I can't release the actual data, but here
is a simulation (the list is called points in the code):
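import random
import matplotlib.pyplot as plt

N = 100
start = 0
points = []

# Sparse section: expovariate(rate) has mean gap 1/rate = 10
rate = 0.1
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)

# Dense section: mean gap 1/rate = 0.1, ten times as many points
rate = 10
for i in range(N*10):
    points.append(start)
    start = start + random.expovariate(rate)

# Sparse section again
rate = 0.1
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)

plt.hist(points, bins=100)
plt.show()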
I would like to use scikit-learn to find the dense region. This feels
a little like outlier detection, or like the task of finding one cluster
with noise.
Is there a suitable method in scikit-learn for this task?
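
To make the "one cluster with noise" framing concrete, here is a minimal
sketch using DBSCAN, which labels low-density points as noise (-1). It is
one possible candidate, not a confirmed answer. The helper names simulate
and find_dense_region are hypothetical, and eps=2.0 and min_samples=10 are
guesses chosen to suit the simulated gaps above; the real data would
likely need different values.

```python
import random

import numpy as np
from sklearn.cluster import DBSCAN


def simulate(n=100, seed=0):
    """Recreate the sparse / dense / sparse simulation with a fixed seed."""
    rng = random.Random(seed)
    points, start = [], 0.0
    # (rate, count) per section; expovariate(rate) has mean gap 1/rate
    for rate, count in [(0.1, n), (10, 10 * n), (0.1, n)]:
        for _ in range(count):
            points.append(start)
            start += rng.expovariate(rate)
    return points


def find_dense_region(points, eps=2.0, min_samples=10):
    """Return (low, high, mask) for the largest DBSCAN cluster.

    eps and min_samples are assumed values, not tuned for real data.
    """
    X = np.asarray(points, dtype=float).reshape(-1, 1)  # sklearn expects 2D input
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels[labels >= 0]  # label -1 marks noise
    if clustered.size == 0:
        raise ValueError("DBSCAN found no cluster; try a larger eps")
    biggest = np.bincount(clustered).argmax()  # most populated cluster label
    mask = labels == biggest
    return X[mask, 0].min(), X[mask, 0].max(), mask


points = simulate()
low, high, mask = find_dense_region(points)
print(f"dense region: [{low:.1f}, {high:.1f}] with {mask.sum()} points")
```

In 1d this amounts to chaining together points whose neighbours lie within
eps, so the result depends heavily on how eps compares with the typical
gap in the sparse and dense sections.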
Raphael