[scikit-learn] Decision stubs?
stuart at stuartreynolds.net
Sun Aug 27 18:23:45 EDT 2017
Is it possible to efficiently get at the branch statistics that
decision tree algorithms iterate over in scikit-learn?
For example, if the root population has the class counts in the output vector:
Then I'd like to iterate over:
# For a boolean (2 valued category)
f1=True: c0=3000, c1=450
f1=False: c0=300, c1=30
f1=Null: c0=1700, c1=20 # Are nulls considered?
# For a continuous value
f2<10: c0= ... c1= ...
f2>=10: c0= ... c1= ...
f2<22: c0= ... c1= ...
f2>=22: c0= ... c1= ...
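The per-branch class counts for a categorical feature can be sketched as follows. This is only an illustration of the statistics described above, not scikit-learn's internal code. The helper name `categorical_branch_counts` is hypothetical, and it assumes the feature has already been encoded as strings (with missing values written as "Null") and that the classes are integers 0..n_classes-1.

```python
import numpy as np

def categorical_branch_counts(labels, y, n_classes):
    """Class counts for each distinct value of one categorical feature.

    labels: 1-D array of strings such as "True", "False", "Null"
            (hypothetical encoding; missing values get their own branch).
    y:      1-D array of integer class ids in 0..n_classes-1.
    """
    labels = np.asarray(labels)
    y = np.asarray(y)
    # Map each distinct feature value to an integer code.
    values, codes = np.unique(labels, return_inverse=True)
    # Count every (value, class) pair in a single pass.
    flat = np.bincount(codes * n_classes + y,
                       minlength=len(values) * n_classes)
    table = flat.reshape(len(values), n_classes)
    return {str(v): table[i] for i, v in enumerate(values)}

# Rebuild the f1 example above: one label and one class id per row.
labels = np.repeat(["True", "False", "Null"], [3450, 330, 1720])
y = np.concatenate([
    np.repeat([0, 1], [3000, 450]),   # f1=True
    np.repeat([0, 1], [300, 30]),     # f1=False
    np.repeat([0, 1], [1700, 20]),    # f1=Null
])
counts = categorical_branch_counts(labels, y, n_classes=2)
for value, (c0, c1) in counts.items():
    print(f"f1={value}: c0={c0}, c1={c1}")
```

Whether nulls should form their own branch is a choice made in this sketch; it does not answer how scikit-learn treats them.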
I'd like to experiment with building models on-demand for each input
row at predict time.
To work efficiently, I'd like to reduce the training set to the 'most
significant' sub-space(s) using the population statistics.
I can do it in pandas, although it's fairly inefficient to iterate over
each feature column many times.
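For a continuous feature, one way to avoid revisiting the column for every threshold is to sort it once and take cumulative class counts. The sketch below is a minimal NumPy version of that idea. The function name `threshold_branch_counts` is hypothetical, and it assumes integer class ids 0..n_classes-1 and no missing values in the column.

```python
import numpy as np

def threshold_branch_counts(x, y, n_classes):
    """Class counts on both sides of every candidate threshold of one column.

    Returns (thresholds, left, right) where left[k] holds the class counts
    of rows with x < thresholds[k] and right[k] those with x >= thresholds[k].
    Assumes x contains no NaN values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    order = np.argsort(x, kind="stable")       # sort the column once
    xs, ys = x[order], y[order]
    onehot = np.eye(n_classes, dtype=np.int64)[ys]
    cum = np.cumsum(onehot, axis=0)            # cum[i] = counts of xs[:i+1]
    total = cum[-1]
    # Index of the first row of each new distinct value (skipping the minimum).
    first = np.flatnonzero(np.diff(xs) > 0) + 1
    thresholds = xs[first]
    left = cum[first - 1]                      # rows strictly below threshold
    right = total - left                       # remaining rows
    return thresholds, left, right

# Small made-up column, only to show the output shape.
t, left, right = threshold_branch_counts([1, 1, 2, 3], [0, 1, 0, 1], n_classes=2)
for thr, l, r in zip(t, left, right):
    print(f"f2<{thr:g}: c0={l[0]} c1={l[1]}   f2>={thr:g}: c0={r[0]} c1={r[1]}")
```

Each column costs one sort plus one cumulative sum, so all thresholds for that feature come out of a single pass rather than one scan per threshold.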