<div dir="ltr">Trees are not a traditional choice for bag-of-words models, but you should at least use the random forest's parameters to limit the size (depth, branching) of the trees.</div><div class="gmail_extra"><br><div class="gmail_quote">On 16 March 2017 at 12:20, Sasha Kacanski <span dir="ltr">&lt;<a href="mailto:skacanski@gmail.com" target="_blank">skacanski@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div>Hi,<br></div>As soon as the number of trees and features goes up, 70 GB of RAM is gone and I get out-of-memory errors.<br></div>The file size is 700 MB. The dataframe quickly shrinks from 14 to 2 columns, but there is a ton of text ...<br></div>With 10 estimators and 100 features per word I can't handle ~900k records ...<br></div>The training set, about 15% of the data, works perfectly fine, but when it comes to the test set, that is it.<br><br></div>I can split the data and process the pieces in parallel, but I believe that will simply skew the results...<br><br clear="all"><div><div><div><div><div><div><div><div><div>Any ideas?<span class="HOEnZb"><font color="#888888"><br><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>-- <br><div class="m_3979999317911961391gmail_signature" data-smartmail="gmail_signature">Aleksandar Kacanski<br></div>
</div></font></span></div></div></div></div></div></div></div></div></div>
<br>______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br></blockquote></div><br></div>
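<div dir="ltr">To make the suggestion at the top of this message concrete, here is a rough sketch of why capping depth bounds the size of the forest. scikit-learn trees are binary, so a tree of depth d has at most 2^(d+1) - 1 nodes. The helper names and the parameter values are made up for illustration, and the thread does not say which estimator is in use, so RandomForestClassifier is an assumption (RandomForestRegressor takes the same size-limiting parameters).</div>

```python
# Rough upper bound on forest size when tree depth is capped.
# Helper names and parameter values are hypothetical, for illustration only.


def max_nodes_per_tree(max_depth):
    """Upper bound on the node count of a binary tree of the given depth."""
    return 2 ** (max_depth + 1) - 1


def max_forest_nodes(n_estimators, max_depth):
    """Upper bound on the total node count across all trees in the forest."""
    return n_estimators * max_nodes_per_tree(max_depth)


# Size-limiting parameters accepted by scikit-learn's random forests.
# The values are not recommendations; tune them on your own data.
forest_params = {
    "n_estimators": 10,
    "max_depth": 20,         # caps how deep each tree may grow
    "min_samples_leaf": 5,   # stops splits that would create tiny leaves
    "max_leaf_nodes": None,  # set to an int to cap the leaf count directly
}

bound = max_forest_nodes(forest_params["n_estimators"], forest_params["max_depth"])
print("at most", bound, "nodes in the whole forest")

# With scikit-learn installed, the same settings would be passed as:
#   from sklearn.ensemble import RandomForestClassifier
#   clf = RandomForestClassifier(**forest_params)
```

<div dir="ltr">With max_depth left at its default of None, trees grow until the leaves are pure or too small to split, so on ~900k text records each tree can become very large. The bound above is loose (min_samples_leaf usually keeps trees far smaller), but it shows that the worst case stops depending on the number of records once depth is capped.</div>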