[scikit-learn] sample_weights in RandomForestRegressor

Thomas Evangelidis tevang3 at gmail.com
Sun Jul 15 19:51:28 EDT 2018


I am somewhat confused about the use of the sample_weight parameter in the
fit() method of RandomForestRegressor. Here is my problem:

I am trying to predict the binding affinity of small molecules to a
protein. I have a training set of 709 molecules and a blind test set of 180
molecules. I want to find the features that are most important for
correctly predicting the binding affinity of the 180 molecules in my
blind test set. My rationale is that if I give more weight to the training
molecules that are similar to the test molecules, I will get higher
importances for the features that have higher predictive ability for this
specific blind test set. To this end, I weighted each of the 709 training
set molecules by its maximum similarity to the 180 test molecules, fitted
an RF with those weights, selected only the features with high importance,
and trained a new RF on all 709 molecules. I got some results, but I am not
satisfied with them. Is this the right way to use sample_weight in RF? I
would appreciate any advice or a suggested workflow.
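
To make the procedure concrete, here is a minimal sketch of the workflow
described above. It is an illustration under stated assumptions, not the
exact code I ran: the data are synthetic, the similarity measure
(1 / (1 + Euclidean distance)) is a hypothetical stand-in for a real
molecular similarity, the cutoff of 5 features is arbitrary, and fitting
the final model without weights is only one reading of the last step.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)

# Synthetic stand-ins (assumption): 709 training and 180 test molecules,
# each described by 20 numeric features.
X_train = rng.rand(709, 20)
y_train = 2.0 * X_train[:, 0] + X_train[:, 1] + 0.1 * rng.randn(709)
X_test = rng.rand(180, 20)

# Hypothetical similarity: 1 / (1 + Euclidean distance) between every
# training molecule and every test molecule.
diff = X_train[:, None, :] - X_test[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))      # shape (709, 180)
similarity = 1.0 / (1.0 + dist)

# Weight each training molecule by its maximum similarity to the test set.
weights = similarity.max(axis=1)             # shape (709,)

# Step 1: fit a weighted forest and read off the feature importances.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train, sample_weight=weights)
importances = rf.feature_importances_

# Step 2: keep the most important features (arbitrary cutoff of 5).
selected = np.argsort(importances)[::-1][:5]

# Step 3: train a new forest on all 709 molecules using only those features
# (unweighted here, which is an assumption).
rf_final = RandomForestRegressor(n_estimators=100, random_state=0)
rf_final.fit(X_train[:, selected], y_train)
predictions = rf_final.predict(X_test[:, selected])
```

Note that sample_weight is passed to fit(), not to the constructor, and
must contain one non-negative value per training sample.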



Dr Thomas Evangelidis

Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr
       tevang3 at gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/