[scikit-learn] suggested machine learning algorithm
ericmajinglong at gmail.com
Sat Oct 1 14:37:35 EDT 2016
A number of people I've learned from have given me the following "recipe",
which I hold to loosely.
1. Start with Random Forest - it should be able to give you good
baseline predictive capacity.
2. Let's say you don't care about interpretability, but only care about
predictive value. Keep tweaking RF parameters (use grid search + cross
validation), or switch to gradient boosting.
3. Let's say you do care about interpretability. Use RF's
feature_importances_ to pick out the features that are important for
prediction. Try linear regression on just those; you may also want to try
multiplying those features together to get their "interaction" products.
(This is using RF as a feature selection method.)
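Steps 1 and 2 of the recipe can be sketched roughly as follows. This is a minimal illustration, not a prescribed workflow: the data are a synthetic placeholder (42 x 4, to match the question's shape), and the hyperparameter grid and estimator settings are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Placeholder data: 42 samples x 4 features (synthetic, NOT real measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = X @ np.array([0.5, -0.2, 0.3, 0.0]) + rng.normal(scale=0.5, size=42)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Step 1: Random Forest baseline, scored by cross-validated R^2.
rf = RandomForestRegressor(n_estimators=300, random_state=0)
baseline = cross_val_score(rf, X, y, cv=cv, scoring="r2")
print(f"RF baseline R^2: {baseline.mean():.3f} +/- {baseline.std():.3f}")

# Step 2a: tune RF hyperparameters with grid search + cross-validation.
# The grid below is an arbitrary example, not a recommendation.
param_grid = {"max_depth": [2, 3, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid,
    cv=cv,
    scoring="r2",
)
search.fit(X, y)
print("best RF params:", search.best_params_, f"CV R^2={search.best_score_:.3f}")

# Step 2b: or switch to gradient boosting (shallow trees, small learning rate).
gb = GradientBoostingRegressor(
    n_estimators=200, max_depth=2, learning_rate=0.05, random_state=0
)
gb_scores = cross_val_score(gb, X, y, cv=cv, scoring="r2")
print(f"GB R^2: {gb_scores.mean():.3f} +/- {gb_scores.std():.3f}")
```

Note that `best_score_` is measured on the same folds used to pick the parameters, so it tends to be optimistic; with very few observations, nested cross-validation (wrapping `search` itself in `cross_val_score`) gives a fairer estimate.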
Beyond this, I am sure more "expert" types will be able to chime in, and
also correct me if I've said anything wrong here.
On Sat, Oct 1, 2016 at 10:59 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> Dear scikit-learn users and developers,
> I have a dataset consisting of 42 observations (molnames) and 4 variables
> (VDWAALS, EEL, EGB, ESURF) with which I want to make a predictive model
> that estimates the experimental value (Expr). I tried multivariate linear
> regression using 10,000 bootstrap repeats, each time using 21 observations
> for training and the remaining 21 for testing, but the average correlation
> was only R = 0.1727 +- 0.19779.
> molname        VDWAALS       EEL       EGB     ESURF      Expr
> CHEMBL108457  -20.4848  -96.5826   23.4584   -5.4045  -7.27193
> CHEMBL388269  -50.3860   28.9403  -51.5147   -6.4061   -6.8022
> CHEMBL244078  -49.1466  -21.9869   17.7999   -6.4588  -6.61742
> CHEMBL244077  -53.4365  -32.8943   34.8723   -7.0384  -6.61742
> CHEMBL396772  -51.4111  -34.4904   36.0326   -6.5443  -5.82207
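The evaluation described above (fit on 21 observations, test on the other 21, repeat, average the correlation) can be sketched as below. Strictly speaking, drawing the 21 training points without replacement is repeated random subsampling (Monte Carlo cross-validation) rather than a bootstrap, which samples with replacement. The data are a synthetic placeholder, and 1,000 repeats are used instead of 10,000 only to keep the sketch quick.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit

# Placeholder data (synthetic); substitute the 42 x 4 descriptor matrix and Expr.
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = X @ np.array([0.5, -0.2, 0.3, 0.0]) + rng.normal(scale=0.5, size=42)

# Random 50/50 splits: 21 observations to train, the other 21 to test.
splitter = ShuffleSplit(n_splits=1000, test_size=21, random_state=0)
rs = []
for train_idx, test_idx in splitter.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    # Pearson correlation between measured and predicted test values.
    rs.append(np.corrcoef(y[test_idx], pred)[0, 1])
rs = np.asarray(rs)
print(f"R = {rs.mean():.4f} +/- {rs.std():.4f}")
```

With only 21 test points per split, each R is itself noisy, which is consistent with the large spread (+- 0.19779) reported above.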
> I would like your advice about what other machine learning algorithms I
> could try with these data. E.g., can I build a decision tree, or are the
> observations and variables too few to avoid overfitting? I could include
> more variables, but there will always be only 42 observations.
> I would greatly appreciate any advice!
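One way to check the overfitting question empirically is to compare training and held-out R^2 for a linear model and for decision trees of increasing depth; a large train/test gap is the usual sign of overfitting. This is a sketch on synthetic placeholder data, and the depths and the `RepeatedKFold` settings are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Placeholder data (synthetic, NOT real measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 4))
y = X @ np.array([0.5, -0.2, 0.3, 0.0]) + rng.normal(scale=0.5, size=42)

# Repeating 5-fold CV many times smooths out the split-to-split noise.
cv = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)
models = {
    "linear": LinearRegression(),
    "tree depth=2": DecisionTreeRegressor(max_depth=2, random_state=0),
    "tree depth=4": DecisionTreeRegressor(max_depth=4, random_state=0),
    "tree unpruned": DecisionTreeRegressor(random_state=0),
}
results = {}
for name, model in models.items():
    res = cross_validate(model, X, y, cv=cv, scoring="r2", return_train_score=True)
    results[name] = (res["train_score"].mean(), res["test_score"].mean())
    train_r2, test_r2 = results[name]
    print(f"{name:14s} train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

An unpruned tree fits the training folds perfectly (train R^2 = 1) on continuous features like these, so its held-out score is the number that matters; limiting `max_depth` or `min_samples_leaf` is the usual way to keep a single tree from memorizing 42 points.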
> scikit-learn mailing list
> scikit-learn at python.org