[scikit-learn] partial_fit implementation for IsolationForest
donkey-hotei at cryptolab.net
Thu Jun 9 08:32:49 EDT 2016
hi nicolas,
excuse me, didn't mean to drop this thread for so long.
> There is a paper from the same authors as iforest but for streaming
> data: http://ijcai.org/Proceedings/11/Papers/254.pdf
>
> For now it is not cited enough (24) to satisfy the sklearn
> requirements. Waiting for more citations, this could be a nice
> addition to sklearn-contrib.
agreed. I started on a rough implementation of hstree, but it is not
scikit-learn compatible yet, so let's see what happens...
some guidance here would be nice: maybe a new splitter will have to be
added?
> Otherwise, we could imagine extending iforest to streaming data by
> building new trees when data come (and removing the oldest ones),
> prediction still being based on the average depth of the forest. I'm
> not sure this heuristic could be merged on scikit-learn, since it is
> not based on well-cited papers. At the same time, it is a natural and
> simple extension of iforest to streaming data...
>
> Any opinion on it?
It is, as I thought, a simple extension - my first naive approach was to
use the 'warm_start' attribute of the BaseBagging parent class to
preserve older estimators. Then, in the 'partial_fit' method, a loop
pops some number n of estimators off the ensemble before calling the
original 'fit' method again on the incoming data, which adds new
estimators to the ensemble.
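
To make the pop-then-refit idea concrete, here is a minimal sketch in plain
Python. It is not the scikit-learn code and does not touch BaseBagging or
'warm_start'; it only illustrates the heuristic of dropping some trees and
growing new ones on each incoming batch, with the score still based on the
average depth over the forest. The names StreamingIsolationForest,
n_replace and anomaly_score are made up for illustration, and the choice to
drop the oldest trees first is an assumption taken from the quoted
suggestion.

```python
import math
import random
from collections import deque

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split on along this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x):
    """Depth at which x lands, plus c(size) for the unsplit leaf."""
    depth = 0
    while tree[0] == "node":
        _, dim, split, left, right = tree
        tree = left if x[dim] < split else right
        depth += 1
    return depth + average_path_length(tree[1])


class StreamingIsolationForest:
    """Hypothetical sketch: fixed-size forest refreshed batch by batch."""

    def __init__(self, n_estimators=50, n_replace=10, max_samples=64, seed=0):
        self.n_estimators = n_estimators
        self.n_replace = n_replace      # trees swapped out per batch (assumed knob)
        self.max_samples = max_samples
        self.rng = random.Random(seed)
        # deque(maxlen=...) silently drops the oldest tree on each append
        self.trees = deque(maxlen=n_estimators)

    def partial_fit(self, X):
        """X is a list of equal-length tuples; the first call fills the forest."""
        if len(X) < 2:
            raise ValueError("need at least two points per batch")
        n_new = self.n_replace if self.trees else self.n_estimators
        psi = min(self.max_samples, len(X))
        max_depth = math.ceil(math.log2(psi))
        for _ in range(n_new):
            sample = self.rng.sample(X, psi)
            self.trees.append((build_tree(sample, 0, max_depth, self.rng), psi))
        return self

    def anomaly_score(self, X):
        """Score in (0, 1]; higher means more anomalous."""
        scores = []
        for x in X:
            # depth is normalised per tree so batches of different size mix
            mean_norm_depth = sum(
                path_length(tree, x) / average_path_length(psi)
                for tree, psi in self.trees
            ) / len(self.trees)
            scores.append(2.0 ** -mean_norm_depth)
        return scores
```

Calling partial_fit once per incoming batch keeps the forest at n_estimators
trees, and with n_replace=10 out of 50 the whole forest has turned over after
five more batches. How large n_replace should be relative to how fast the
data changes is left open here.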
We run into the problem of concept drift. Is this the way you'd
implement it? If not, how would you approach it?
thanks so much for reading,
isaak