[scikit-learn] Finding a single cluster in 1d data

Raphael C drraph at gmail.com
Sat Apr 14 06:06:05 EDT 2018


Thank you very much!  I didn't know about jenkspy.

Raphael

On 13 April 2018 at 02:19, Pedro Pazzini <pedropazzini at gmail.com> wrote:
> Hi Raphael.
>
> An option to highlight a dense region in your vector is to use a density
> estimator (http://scikit-learn.org/stable/modules/density.html).
>
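
For the simulated data from the original question, the density-estimator idea could be sketched roughly as below. This is only a sketch under assumptions: the bandwidth of 5.0 and the "more than half the peak density" rule are illustrative guesses, not recommended settings, and simulate() is a hypothetical helper that re-creates the simulation with a fixed seed.

```python
import random

import numpy as np
from sklearn.neighbors import KernelDensity


def simulate(n=100, seed=0):
    """Sparse, dense, sparse stretches of 1d points, as in the question."""
    rng = random.Random(seed)
    points, position = [], 0.0
    for rate, count in ((0.1, n), (10, 10 * n), (0.1, n)):
        for _ in range(count):
            points.append(position)
            position += rng.expovariate(rate)
    return np.array(points)


points = simulate()
X = points.reshape(-1, 1)  # scikit-learn expects (n_samples, n_features)

# Assumption: bandwidth=5.0 happens to suit this simulation; tune it for real data.
kde = KernelDensity(kernel="gaussian", bandwidth=5.0).fit(X)
density = np.exp(kde.score_samples(X))  # score_samples returns log-density

# Assumed rule: a point is "dense" if its density exceeds half the peak density.
dense = points[density > 0.5 * density.max()]
print(f"dense region: {dense.min():.1f} to {dense.max():.1f} "
      f"({dense.size} of {points.size} points)")
```

The threshold decides where the dense region is cut off at its edges, so it is worth plotting the estimated density before trusting the reported interval.
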
> But I think the Python module jenkspy
> (https://pypi.python.org/pypi/jenkspy and https://github.com/mthh/jenkspy)
> can also help you. It finds the natural breaks of 1d data
> (https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization). I think
> that if you find a good value for the 'nb_class' parameter, you can separate
> the dense region of your data from the sparse part.
>
> K-means is a generalization of Jenks natural breaks optimization to
> multivariate data, so you could perhaps also use the K-means module of
> scikit-learn for this. Personally, though, I find the jenkspy module more
> straightforward for this approach.
>
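
A rough sketch of the K-means variant with scikit-learn, again on a re-creation of the simulated data. The choice of three clusters (sparse, dense, sparse) and the fixed seeds are assumptions made for illustration. K-means places cluster boundaries midway between neighbouring centres, so the cluster around the dense stretch will usually also pick up some nearby sparse points.

```python
import random

import numpy as np
from sklearn.cluster import KMeans

# Re-create the simulation from the question with a fixed seed.
rng = random.Random(0)
points, position = [], 0.0
for rate, count in ((0.1, 100), (10, 1000), (0.1, 100)):
    for _ in range(count):
        points.append(position)
        position += rng.expovariate(rate)
points = np.array(points)

# Assumption: three clusters, one for each stretch of the simulated data.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(points.reshape(-1, 1))

# In 1d each cluster is an interval; print its range, size and
# points per unit length so the densest interval stands out.
for label in np.argsort(km.cluster_centers_.ravel()):
    members = points[km.labels_ == label]
    span = members.max() - members.min()
    print(f"{members.min():8.1f} .. {members.max():8.1f}  "
          f"n={members.size:5d}  per unit={members.size / span:.2f}")
```
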
> I hope it helps.
>
> Pedro Pazzini
>
> 2018-04-12 16:22 GMT-03:00 Raphael C <drraph at gmail.com>:
>>
>> I have a set of points in 1d represented by a list X of floating point
>> numbers.  The list has one dense section and the rest is sparse and I
>> want to find the dense part. I can't release the actual data but here
>> is a simulation:
>>
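>> import random
>>
>> import matplotlib.pyplot as plt
>>
>> N = 100
>>
>> start = 0
>> points = []
>> # Sparse section: mean gap between points is 1/rate = 10.
>> rate = 0.1
>> for i in range(N):
>>     points.append(start)
>>     start = start + random.expovariate(rate)
>> # Dense section: ten times as many points, mean gap 0.1.
>> rate = 10
>> for i in range(N*10):
>>     points.append(start)
>>     start = start + random.expovariate(rate)
>> # Second sparse section.
>> rate = 0.1
>> for i in range(N):
>>     points.append(start)
>>     start = start + random.expovariate(rate)
>> plt.hist(points, bins=100)
>> plt.show()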
>>
>> I would like to use scikit-learn to find the dense region. This feels
>> a little like outlier detection or the task of finding one cluster
>> with noise.
>>
>> Is there a suitable method in scikit-learn for this task?
>>
>> Raphael
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn

