[scikit-learn] How to deal with hierarchical and real-time analysis in machine learning?

Max Halford maxhalford25 at gmail.com
Wed Feb 13 05:13:26 EST 2019


Hey lampahome,

I'm currently working on an online learning library called creme:
https://creme-ml.github.io/. Each estimator and transformer has a
fit_one(x, y) method so that you can learn from a stream of data, one
sample at a time. I've only been working on it for a little under a
month, but it might be of interest to you nonetheless. Maybe it will
give you some ideas.
There's an introductory tutorial on GitHub.
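
To make the fit_one(x, y) idea concrete, here is a minimal sketch in plain Python. It is not creme's actual code or API: the class OnlineLinearRegression, its predict_one method, the learning rate, and the synthetic stream() generator are all assumptions made up for illustration. It only shows the general pattern of updating a model from one sample at a time so that the full dataset never has to sit in memory.

```python
from collections import defaultdict


class OnlineLinearRegression:
    """Hypothetical one-sample-at-a-time learner (illustration only, not creme's code)."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # one weight per feature name
        self.intercept = 0.0

    def predict_one(self, x):
        # x is a dict mapping feature names to numeric values.
        return self.intercept + sum(
            self.weights[name] * value for name, value in x.items()
        )

    def fit_one(self, x, y):
        # One stochastic gradient step on the squared error of this sample.
        error = self.predict_one(x) - y
        self.intercept -= self.learning_rate * error
        for name, value in x.items():
            self.weights[name] -= self.learning_rate * error * value
        return self


def stream(n_samples=5000):
    """Yield (features, target) pairs one by one, as if read from disk or a socket."""
    for i in range(n_samples):
        x = {"a": (i % 10) / 10, "b": (i % 7) / 7}
        # Synthetic, noise-free target: y = 2*a - b + 0.5
        yield x, 2.0 * x["a"] - 1.0 * x["b"] + 0.5


model = OnlineLinearRegression(learning_rate=0.1)
for x, y in stream():
    model.fit_one(x, y)  # only the current sample is held in memory

prediction = model.predict_one({"a": 0.5, "b": 0.5})
```

For data split into regions, one possible design (again an assumption, not something the library prescribes) is to keep a dict with one such model per region and route each incoming sample to the model of its region.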

Kind regards.

On 13/02/2019, lampahome <pahome.chen at mirlab.org> wrote:
> For example, I may have a huge number of different regions, and each
> region may have many or few points.
>
> I also want to analyze the newest data together with older data in real
> time, but I don't want to load all the data into memory because I don't
> have enough memory.
>
> What I thought I could use is partial_fit, to accept streaming data as new
> data comes in.
>
> But the incoming data is hierarchical, and it's hard to cluster because I
> don't have the older and newer data together to cluster.
>
> How can I design the system better?
>
> Thanks
>


-- 
Max Halford
+336 28 25 13 38

