<div dir="ltr">It is probably a good idea to start by setting aside part of your training data as a held-out development set that is not used for training, which you can use to create learning curves and estimate probable performance on unseen data. I really recommend Andrew Ng's machine learning course material from Stanford and Coursera. It shows you how to use learning curves to understand your problem and also the way that different estimators behave.<div><br></div><div><br></div><div>There are many estimators that will achieve an extremely good fit to typical training data, but the differences between estimators show up mostly in what happens with unseen test data. Personally, I always start by seeing how well simple classifiers or regressors do (Naive Bayes, linear regression, etc.), then try regularized linear models like ElasticNet, then try SVMs, then try random forests or other ensemble models. That way, I end up using the powerful and complex models only when the data demands it.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 23 June 2016 at 10:20, muhammad waseem <span dir="ltr">&lt;<a href="mailto:m.waseem.ahmad@gmail.com" target="_blank">m.waseem.ahmad@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi All, <div>I am trying to use random forests for a regression problem, with 10 input variables and one output variable. I am getting a very good fit even with default parameters and a low n_estimators. Even with n_estimators = 10, I get an R^2 value of 0.95 on the testing dataset (MSE=23) and a value of 0.99 for the training set. I was wondering whether this is common with random forests or whether I am missing something. Could you please share your experience? The total number of samples (training + testing) is 10971.</div><div>Also, what are the most important parameters (max_depth, bootstrap, max_leaf_nodes etc.) that I need to play with to tune my model even further? Lastly, is there a way I can visualise a single tree of my forest (just for demonstration purposes)?</div><div>Please see the figure below, which demonstrates how well it is fitting with default values.</div><div><br></div><div><br></div><div><br></div><div><img src="cid:ii_1557c8e6609011d8" alt="Inline image 1" width="412" height="130"></div><div>Thanks</div><div>Kindest Regards</div><span class="HOEnZb"><font color="#888888"><div>Waseem</div></font></span></div>
<br></blockquote></div><br></div>
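<div dir="ltr">To make the suggested workflow concrete, here is a minimal sketch of holding out a development set, trying simple models before complex ones, and then looking at a learning curve for the forest. The data is a synthetic stand-in from make_friedman1 (an assumption, since the original dataset is not available), and the hyperparameter values (alpha, C, n_estimators, the split size) are arbitrary illustrative choices, not tuned recommendations.</div>
<pre>
```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption): 10 input variables, one output.
X, y = make_friedman1(n_samples=2000, n_features=10, noise=1.0, random_state=0)

# Hold out a development set that is never used for fitting.
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Simple models first, more complex ones later (hyperparameters are illustrative).
models = {
    "linear regression": LinearRegression(),
    "elastic net": ElasticNet(alpha=0.1),
    "SVM (RBF kernel)": SVR(C=10.0),
    "random forest": RandomForestRegressor(n_estimators=10, random_state=0),
}

# Compare the fit on the training data with the fit on the held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_dev, y_dev))
    print(f"{name:18s} train R^2 = {scores[name][0]:.3f}  dev R^2 = {scores[name][1]:.3f}")

# Learning curve for the forest: cross-validated R^2 as the training set grows.
sizes, train_scores, cv_scores = learning_curve(
    RandomForestRegressor(n_estimators=10, random_state=0),
    X_train,
    y_train,
    train_sizes=[0.2, 0.5, 1.0],
    cv=3,
)
for n, tr, cv in zip(sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    print(f"n = {int(n):5d}  train R^2 = {tr:.3f}  cv R^2 = {cv:.3f}")
```
</pre>
<div dir="ltr">A large gap between the training and held-out scores points to overfitting, and a cross-validation score that is still rising at the largest training size suggests that more data would help. If a simple model already gets close to the forest on the held-out set, that is a reason to prefer it.</div>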