[scikit-learn] Is there any general way to make clustering huge time-series dataset better?

lampahome pahome.chen at mirlab.org
Thu Jun 20 10:33:17 EDT 2019


I have a huge time-series dataset that has to be loaded batch by batch.

My procedure is as follows:
Scale to the range 0~1
Shuffle (because I use Birch, not MiniBatchKMeans)
Train a Birch model with partial_fit
Evaluate with silhouette_score (larger is better)
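
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the actual pipeline: iter_batches is a hypothetical stand-in for the real batch loader and yields synthetic data, and the threshold value and batch sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle


def iter_batches(n_batches=5, batch_size=200, n_features=8, seed=0):
    """Hypothetical stand-in for the real batch loader (synthetic data)."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-10, 10, size=(4, n_features))
    for _ in range(n_batches):
        idx = rng.integers(0, len(centers), size=batch_size)
        yield centers[idx] + rng.normal(scale=0.5, size=(batch_size, n_features))


# Pass 1: learn the global min/max incrementally so that every batch
# is scaled to 0~1 with the same parameters.
scaler = MinMaxScaler()
for batch in iter_batches():
    scaler.partial_fit(batch)

# Pass 2: scale, shuffle, and feed each batch to Birch.
# n_clusters=None keeps the raw subclusters, so no cluster count is given.
model = Birch(threshold=0.1, n_clusters=None)  # threshold is an assumed value
for i, batch in enumerate(iter_batches()):
    X = shuffle(scaler.transform(batch), random_state=i)
    model.partial_fit(X)

# Evaluate on a single scaled batch: silhouette_score needs pairwise
# distances, so it is computed on a sample rather than on everything.
X_eval = scaler.transform(next(iter_batches()))
labels = model.predict(X_eval)
n_found = len(np.unique(labels))
# The score is only defined when there are at least 2 distinct labels.
score = silhouette_score(X_eval, labels) if n_found > 1 else float("nan")
print(n_found, score)
```

Scaling in a separate first pass is a design choice made here so that all batches share one min/max; if only one pass over the data is possible, the scaling range would have to be known or estimated beforehand.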

I use Birch because it has partial_fit and does not need the number of
clusters to be specified.
But I found that, judged by silhouette_score and the db score, it ends up
with a smaller number of clusters.

When I look at the data, there should be more clusters than the clustering
results give.
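
One way to look at this more closely might be to compare the number of clusters and both scores across several Birch thresholds. The sketch below assumes that "db score" means davies_bouldin_score; the data is synthetic (make_blobs) and the threshold values are purely illustrative.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 6 blobs, scaled to 0~1.
X, _ = make_blobs(n_samples=600, centers=6, n_features=5, random_state=0)
X = MinMaxScaler().fit_transform(X)

results = []
for threshold in (0.05, 0.1, 0.2, 0.4):  # illustrative values only
    model = Birch(threshold=threshold, n_clusters=None)
    # Feed the data in chunks to mimic batch-wise loading.
    for chunk in np.array_split(X, 6):
        model.partial_fit(chunk)
    labels = model.predict(X)
    k = len(np.unique(labels))
    if 2 <= k < len(X):
        sil = silhouette_score(X, labels)     # higher is better
        db = davies_bouldin_score(X, labels)  # lower is better
    else:
        sil = db = float("nan")               # both scores are undefined here
    results.append((threshold, k, sil, db))

for threshold, k, sil, db in results:
    print(f"threshold={threshold:<5} clusters={k:<4} "
          f"silhouette={sil:.3f} davies_bouldin={db:.3f}")
```

Printing the cluster count next to both scores makes it possible to see whether the best-scoring setting is also the one with the fewest clusters.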

Should I change the evaluation method, or something else?

thx