[scikit-learn] Any drawbacks when using partial_fit?

lampahome pahome.chen at mirlab.org
Thu Jun 27 06:40:09 EDT 2019


I am trying to use Birch to cluster time-series data incrementally.

Because of insufficient memory, I train it batch by batch: 50 batches of
1000 samples each.

I found that when I train on only the first batch, it clusters well.

After the first batch, I train on the following batches with the same
model, calling partial_fit on each of them.
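
A minimal sketch of this workflow, for reference. The data here is synthetic
(make_blobs stands in for the real time series), and the threshold and
n_clusters values are assumptions chosen only so the example runs quickly;
they are not the settings from the actual run.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data (assumption): 50 batches of 1000 samples.
X, _ = make_blobs(n_samples=50_000, centers=5, n_features=2,
                  cluster_std=0.5, random_state=0)
batches = np.array_split(X, 50)

# threshold and n_clusters are illustrative placeholder values.
model = Birch(threshold=1.0, n_clusters=5)

# Feed every batch to the same model; each call updates the existing CF tree
# and then re-runs the global clustering step on the current subclusters.
for batch in batches:
    model.partial_fit(batch)

# Cluster assignments for the first batch under the final model.
labels = model.predict(batches[0])
```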

I found that the clustering result becomes worse after many rounds of
training, by the time all batches are finished.
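
One possible way to put a number on this change is to re-predict a fixed
reference batch after every round and compare against its labels from the
first round. The sketch below assumes synthetic data and placeholder Birch
parameters, and uses the adjusted Rand index purely as an illustrative
agreement measure; it only shows how much the assignments move, not which
result is better.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the real data (assumption).
X, _ = make_blobs(n_samples=50_000, centers=5, n_features=2,
                  cluster_std=0.5, random_state=0)
batches = np.array_split(X, 50)
reference = batches[0]  # fixed set of samples to re-label after every round

# Placeholder parameters, not the settings from the actual run.
model = Birch(threshold=1.0, n_clusters=5)

baseline = None
scores = []
for batch in batches:
    model.partial_fit(batch)
    current = model.predict(reference)
    if baseline is None:
        baseline = current  # labels right after training on the first batch
    # 1.0 means the same grouping as after round one; lower means it changed.
    scores.append(adjusted_rand_score(baseline, current))
```

Plotting or printing scores per round shows at which point the grouping of
the first batch starts to diverge from the initial result.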

Some samples end up in a cluster whose other samples look very different
from them.

Is there any way to make it better? Or am I using the wrong method?