[scikit-learn] How to get the most important features from a RF efficiently

Raphael C drraph at gmail.com
Thu Jul 21 11:22:09 EDT 2016


I have a set of feature vectors, each with about 40,000 features,
associated with binary class labels. I can train a random forest
classifier in sklearn, and it works well. However, I would like to see
which features are most important.

I tried simply printing out forest.feature_importances_, but this takes
about 1 second per feature, or about 40,000 seconds overall. This
is much longer than the time needed to train the classifier in
the first place.

Is there a more efficient way to find out which features are most important?

Raphael
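
For illustration, here is a minimal sketch of one way to rank features
without paying the cost per feature. On a fitted forest,
feature_importances_ is a property that aggregates the per-tree
importances every time it is read, so reading it inside a loop repeats
that work. Whether this is what caused the 1 second per feature above is
an assumption; the post does not show the code that was used. The
synthetic data, the sizes, and the names top_k and top_idx are
hypothetical stand-ins, not taken from the original message.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data (hypothetical sizes, far fewer
# than the 40,000 features described in the post).
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Read the attribute exactly once and keep the resulting array;
# each access recomputes the average over all trees.
importances = forest.feature_importances_

# Indices of the top_k features, most important first.
top_k = 20
top_idx = np.argsort(importances)[::-1][:top_k]

for i in top_idx:
    print(f"feature {i}: {importances[i]:.4f}")
```

With the array stored in a local variable, sorting 40,000 values with
np.argsort is cheap compared with training the forest.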

On 21 July 2016 at 15:58, Nelson Liu <nfliu at uw.edu> wrote:
> Hi,
> If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the
> maintainers don't have control over downtime and issues like the one you're
> having). Can you connect to GitHub, or any site on GitHub Pages?
>
> Thanks
> Nelson
>
> On Thu, Jul 21, 2016, 07:52 Rahul Ahuja <rahul.ahuja at live.com> wrote:
>>
>> Hi there,
>>
>> The sklearn website has been down for a couple of days. Please look into it.
>>
>> I live in Karachi, Pakistan.
>>
>> Kind regards,
>> Rahul Ahuja
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>

