[scikit-learn] best way to scale on the random forest for text w bag of words ...

Joel Nothman joel.nothman at gmail.com
Wed Mar 15 21:44:05 EDT 2017


Trees are not a traditional choice for bag-of-words models, but you should
make sure you are at least using the parameters of the random forest to
limit the size (depth, branching) of the trees.
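Below is a minimal sketch of that advice. It uses a tiny made-up corpus and illustrative parameter values; none of these numbers come from the thread. CountVectorizer keeps the bag of words as a scipy.sparse matrix, which RandomForestClassifier accepts directly. The max_depth, min_samples_leaf and max_leaf_nodes parameters cap how large each tree can grow.

```python
# Sketch: keep the bag-of-words matrix sparse and cap the size of each tree.
# The corpus, labels and parameter values are illustrative assumptions only.
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "cheap pills buy now",
    "meeting moved to friday",
    "buy cheap watches now",
    "project status update attached",
    "limited offer buy now",
    "lunch on friday with the team",
]
labels = [1, 0, 1, 0, 1, 0]

# CountVectorizer returns a sparse CSR matrix; max_features caps the vocabulary.
vectorizer = CountVectorizer(max_features=1000)
X = vectorizer.fit_transform(docs)
print(sparse.issparse(X))  # sparse input: no .toarray() densification needed

forest = RandomForestClassifier(
    n_estimators=10,
    max_depth=20,          # hard cap on tree depth
    min_samples_leaf=2,    # do not create leaves with fewer samples than this
    max_leaf_nodes=256,    # cap on leaves (and hence nodes) per tree
    max_features="sqrt",   # number of features considered at each split
    n_jobs=-1,
    random_state=0,
)
forest.fit(X, labels)

# Inspect the largest fitted tree to confirm the caps were respected.
print(max(est.get_depth() for est in forest.estimators_))
print(max(est.get_n_leaves() for est in forest.estimators_))
```

The memory of a fitted forest grows roughly with the total number of nodes across all n_estimators trees. It can therefore help to tighten these caps before raising the number of trees.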

On 16 March 2017 at 12:20, Sasha Kacanski <skacanski at gmail.com> wrote:

> Hi,
> As soon as the number of trees and features goes higher, 70 GB of RAM is gone
> and I am getting out-of-memory errors.
> The file size is 700 MB. The DataFrame quickly shrinks from 14 to 2 columns,
> but there is a ton of text...
> With 10 estimators and 100 features per word, I can't handle ~900k
> records...
> The training set, about 15% of the data, does perfectly fine, but when the
> test set comes, that is it.
>
> I can split things up and multiprocess them, but I believe that will simply
> skew the results...
>
> Any ideas?
>
>
> --
> Aleksandar Kacanski
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
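The quoted message says that training fits in memory but scoring the test set does not. Assuming that is where the memory goes, splitting only the test set should not skew anything. A fitted forest scores each row independently, so predicting in chunks gives the same probabilities as predicting all rows at once, and it can lower peak memory. Splitting the training data is a different matter, because it changes the fitted model. The helper predict_proba_in_chunks, the chunk size and the random sparse data below are hypothetical, chosen for illustration only.

```python
# Sketch: score a large sparse test set in row chunks to bound peak memory.
# predict_proba_in_chunks, chunk_size and the random data are hypothetical.
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier


def predict_proba_in_chunks(model, X, chunk_size=10_000):
    """Call model.predict_proba on consecutive row slices of X and stack them."""
    parts = []
    for start in range(0, X.shape[0], chunk_size):
        # Row slicing a CSR matrix is cheap and keeps each slice sparse.
        parts.append(model.predict_proba(X[start:start + chunk_size]))
    return np.vstack(parts)


rng = np.random.RandomState(0)
X_train = sparse.random(500, 200, density=0.05, format="csr", random_state=rng)
y_train = rng.randint(0, 2, size=500)
X_test = sparse.random(2_000, 200, density=0.05, format="csr", random_state=rng)

forest = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=0)
forest.fit(X_train, y_train)

full = forest.predict_proba(X_test)  # all rows at once
chunked = predict_proba_in_chunks(forest, X_test, chunk_size=300)
print(np.allclose(full, chunked))  # per-row scores do not depend on chunking
```

Each chunk could also be handed to a separate worker process. This also leaves the results unchanged, as long as every worker uses the same fitted forest.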

