<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
code
{mso-style-priority:99;
font-family:"Courier New";}
span.gmail-pre
{mso-style-name:gmail-pre;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Each tree is built from a random sample drawn with replacement (a bootstrap sample) from the provided training data. The rows not drawn for a given tree are used to calculate the out-of-bag score.
The “bag” is the sampled data.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The “random” refers to several aspects of the algorithm: random sampling of the training rows for each tree, and random sampling of features at each node.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">So, for each of the n_estimators trees:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">&nbsp;&nbsp;&nbsp;&nbsp;Draw a random sample, with replacement, of the training data<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">&nbsp;&nbsp;&nbsp;&nbsp;Build a tree on that sample – at each node this involves a
<b>random sample of features</b> and a search for the best threshold for each feature in the sample.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">&nbsp;&nbsp;&nbsp;&nbsp;Use the rest of the training data, not in the sample, to calculate the out-of-bag score<o:p></o:p></span></p>
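<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The two kinds of sampling in the steps above can be sketched with the standard library alone. This is only an illustration, not scikit-learn's implementation: the helper names bootstrap_split and sample_features are made up for this example, and no tree is actually fitted.<o:p></o:p></span></p>

```python
import random


def bootstrap_split(n_rows, rng):
    """Return (in_bag, out_of_bag) row indices for one tree.

    Hypothetical helper for illustration only.
    """
    # Sample n_rows indices WITH replacement: this is the "bag".
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    # Rows never drawn are "out of bag" for this tree.
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag


def sample_features(n_features, max_features, rng):
    """Pick the candidate features examined at one node (no replacement).

    Hypothetical helper for illustration only.
    """
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(0)
n_rows, n_features, n_estimators = 10, 4, 3

for i in range(n_estimators):
    in_bag, out_of_bag = bootstrap_split(n_rows, rng)
    # A real tree would repeat the feature draw at every node it splits.
    candidates = sample_features(n_features, 2, rng)
    print("tree", i, "bag:", sorted(in_bag), "oob:", out_of_bag,
          "features at one node:", candidates)
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Each tree gets its own bag, so a row that is out of bag for one tree is usually in the bag for others.<o:p></o:p></span></p>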
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">So Random Forest already incorporates “random features”, in addition to random samples; both are part of the same algorithm.<o:p></o:p></span></p>
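<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">In scikit-learn terms, a minimal sketch might look like the following. The toy data set and the parameter values are assumptions chosen only for illustration. The bootstrap parameter controls the row sampling and max_features controls the feature sampling, so setting bootstrap=False should give a forest that randomizes features only.<o:p></o:p></span></p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, assumed here purely for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Random rows per tree AND a random feature subset at each split.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",   # number of features considered at each split
    bootstrap=True,        # draw a bootstrap sample of rows for each tree
    oob_score=True,        # score each row using trees that did not see it
    random_state=0,
)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)

# Random features only: every tree is fitted on the full training set.
# There is no out-of-bag data in this case, so oob_score is left off.
features_only = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    bootstrap=False,
    random_state=0,
)
features_only.fit(X, y)
print("trees built:", len(features_only.estimators_))
```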
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">For details, see Gilles Louppe's PhD thesis on random forests: <a href="https://github.com/glouppe/phd-thesis">https://github.com/glouppe/phd-thesis</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:red;background:white"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:red;background:white">__________________________________________________________________________________________</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#212121"><br>
</span><b><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D;background:white">Dale Smith</span></b><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D;background:white"> | Macy's Systems and Technology | IFS
eCommerce | Data Science<br>
</span><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">770-658-5176 | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith@macys.com<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com@python.org]
<b>On Behalf Of </b>??<br>
<b>Sent:</b> Tuesday, September 13, 2016 4:16 AM<br>
<b>To:</b> scikit-learn@python.org<br>
<b>Subject:</b> [scikit-learn] is RandomForest random samples or random features?<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">I have read the sklearn user guide section on RandomForest:<br>
<br>
"""<br>
In random forests (see <a href="http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier" title="sklearn.ensemble.RandomForestClassifier">
<span class="gmail-pre"><span style="font-size:10.0pt;font-family:"Courier New"">RandomForestClassifier</span></span></a> and
<a href="http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor" title="sklearn.ensemble.RandomForestRegressor">
<span class="gmail-pre"><span style="font-size:10.0pt;font-family:"Courier New"">RandomForestRegressor</span></span></a> classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set.<br>
"""<o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">But I think of RandomForest as:<br>
"""<br>
features ("attributes", "predictors", "independent variables") are randomly sampled<br>
"""<o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Does RandomForest use random samples or random features? Where can I find a random-features version of RandomForest?<o:p></o:p></p>
</div>
<p class="MsoNormal">thx.<o:p></o:p></p>
</div>
</div>
</body>
</html>