Any clustering algo to cluster multiple timing series data?
Clustering algorithms cluster samples by calculating the Euclidean distance. I wonder whether any clustering algorithm can cluster time series data?

Example: every item has a number of units sold per day.

Item,Day1,Day2,Day3,Day4,Day5
A,1,5,1,5,1
B,10,50,10,50,10
C,4,70,30,10,50

The day-to-day change ratios of A and B are both 500%, 20%, 500%, 20%, so I want A and B to be in the same cluster and C in another one.

Is there any way to cluster by these change ratios without calculating them for each sample myself?

Thanks
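For reference, the "difference ratio" described above can be computed in a couple of lines. This is only a sketch using NumPy; the names `sales`, `ratios` and `pairwise_euclidean` are made up for illustration, and the ratio is assumed to mean Day(k+1) / Day(k). It also shows why the raw counts are a problem: on the unscaled numbers B is closer to C than to A.

```python
import numpy as np

# Rows are items A, B, C; columns are Day1..Day5 (data from the question).
sales = np.array([
    [1, 5, 1, 5, 1],
    [10, 50, 10, 50, 10],
    [4, 70, 30, 10, 50],
], dtype=float)

# Day-over-day ratio, Day(k+1) / Day(k). Assumes no day has zero sales,
# otherwise this divides by zero.
ratios = sales[:, 1:] / sales[:, :-1]


def pairwise_euclidean(x):
    """Return the matrix of Euclidean distances between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


raw_dist = pairwise_euclidean(sales)     # distances on the raw counts
ratio_dist = pairwise_euclidean(ratios)  # distances on the change ratios

print(ratios)
print(raw_dist.round(1))
print(ratio_dist.round(2))
```

The `ratios` array can then be passed to any Euclidean-distance clusterer in place of the raw counts.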
What about dynamic time warping?

(In reply to lampahome <pahome.chen@mirlab.org>, 17 Jan 2019, 05:31.)
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
You can use it to get a single similarity/closeness number between two time series and then feed that into a clustering algorithm.

For instance, look at
https://github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping
as a first idea: if you expand the distance function

    d = lambda x, y: abs(x - y)

to a multivariate local distance

    d2 = lambda a, b: np.sqrt(float((a[0] - b[0])**2 + (a[1] - b[1])**2))

(or any other n-dim metric), then you have an algorithm that could cluster the time series.

It also works when the time series are of equal length.

Best
Mikkel Brynildsen

From: scikit-learn <scikit-learn-bounces+mbrynildsen=grundfos.com@python.org> On Behalf Of lampahome
Sent: 17 January 2019 08:45
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Any clustering algo to cluster multiple timing series data?

Mikkel Haggren Brynildsen <mbrynildsen@grundfos.com> wrote on Thu, 17 Jan 2019 at 3:07 PM:
What about dynamic time warping?

I thought DTW is used for two datasets of different lengths, but my datasets all have the same length. Maybe it doesn't work?
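The idea in this reply (one DTW distance per pair of series, then a clustering algorithm on top) could be sketched roughly as follows. This is not the code from the linked repository. It is a plain textbook DTW written for illustration, combined with SciPy's hierarchical clustering as one possible clusterer. The helper names `dtw_distance` and `dtw_matrix` are made up. Note that DTW warps the time axis only, so on unscaled data it is still sensitive to the magnitude of each series.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(s, t, local=lambda x, y: abs(x - y)):
    """Textbook O(len(s) * len(t)) dynamic time warping cost.

    `local` is the point-to-point distance; swap in an n-dim metric
    for multivariate series.
    """
    n, m = len(s), len(t)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = local(s[i - 1], t[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],      # s advances
                                    cost[i, j - 1],      # t advances
                                    cost[i - 1, j - 1])  # both advance
    return cost[n, m]


def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances."""
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    return dist


# The three equal-length series from the original question.
series = [
    np.array([1, 5, 1, 5, 1], dtype=float),
    np.array([10, 50, 10, 50, 10], dtype=float),
    np.array([4, 70, 30, 10, 50], dtype=float),
]

dist = dtw_matrix(series)

# Average-linkage hierarchical clustering on the precomputed distances,
# cut so that at most two clusters remain.
tree = linkage(squareform(dist), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(dist)
print(labels)
```

Equal-length input is not a problem: the straight diagonal alignment is one of the admissible warping paths, so for equal-length series the DTW cost is never larger than the plain sum of pointwise differences.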
You can have a look at: https://tslearn.readthedocs.io/en/latest/

Alex

(In reply to Mikkel Haggren Brynildsen <mbrynildsen@grundfos.com>, Thu, Jan 17, 2019 at 9:01 AM.)
How about scaling the data first with MinMaxScaler and then clustering? My thinking is that scaling maps the values into the 0~1 range, so the magnitude of each series no longer matters. After scaling, the data shows the relative increase/decrease between points. Clustering by Euclidean distance should then work?
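A minimal sketch of this idea with scikit-learn, assuming one series per row as in the original example. One thing to watch: `MinMaxScaler` scales each column (feature) independently, so with series stored as rows the array has to be transposed before scaling and transposed back afterwards. KMeans is used here only as an example of a Euclidean-distance clusterer; the variable names are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Rows are items A, B, C; columns are Day1..Day5 (data from the question).
sales = np.array([
    [1, 5, 1, 5, 1],
    [10, 50, 10, 50, 10],
    [4, 70, 30, 10, 50],
], dtype=float)

# MinMaxScaler works column by column, so transpose to put each series
# in its own column, scale to [0, 1], then transpose back.
scaled = MinMaxScaler().fit_transform(sales.T).T

# Cluster the scaled series by Euclidean distance.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

print(scaled.round(3))
print(labels)
```

After per-series scaling, A and B map to the same 0/1 pattern, so they end up in one cluster and C in the other. Min-max scaling removes offset and range, which is not quite the same thing as comparing day-to-day ratios, so the two approaches can disagree on other data.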
participants (3)
- Alexandre Gramfort
- lampahome
- Mikkel Haggren Brynildsen