How to get the most important features from an RF efficiently
I have a set of feature vectors associated with binary class labels, each of which has about 40,000 features. I can train a random forest classifier in sklearn, and it works well. However, I would like to see which features are the most important.
I tried simply printing out forest.feature_importances_, but this takes about 1 second per feature, making about 40,000 seconds overall. This is much, much longer than the time needed to train the classifier in the first place.
Is there a more efficient way to find out which features are most important?
Raphael
On 21 July 2016 at 15:58, Nelson Liu <nfliu@uw.edu> wrote:
Hi,
If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the maintainers don't have control over downtime and issues like the one you're having). Can you connect to GitHub, or to any other site on GitHub Pages?
Thanks,
Nelson
On Thu, Jul 21, 2016, 07:52 Rahul Ahuja <rahul.ahuja@live.com> wrote:
Hi there,
The sklearn website has been down for a couple of days. Please look into it.
I live in Karachi, Pakistan.
Kind regards,
Rahul Ahuja
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
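For the question in the first message (listing the most important features of a fitted forest), one common approach is to read feature_importances_ once and sort the feature indices by importance with NumPy. This is a sketch under stated assumptions, not the answer given in this thread: it assumes Python 3 and a recent scikit-learn, uses a synthetic data set from make_classification (500 features rather than 40,000) in place of the real data, and k = 10 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data (assumption): binary labels, many features.
X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Fetch the importances once into a local array.
importances = forest.feature_importances_

# Indices of the k most important features, highest importance first.
k = 10  # illustrative choice
top_k = np.argsort(importances)[::-1][:k]

for rank, idx in enumerate(top_k, start=1):
    print(f"{rank:2d}. feature {idx}: {importances[idx]:.4f}")
```

A full argsort over 40,000 values is fast, so there is no need for anything cleverer at that size.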
The problem was that I had a loop like

    for i in xrange(len(clf.feature_importances_)):
        print clf.feature_importances_[i]

which recomputes the feature importance array at every step. Obvious in hindsight.
Raphael
(Follow-up to my message of 21 July 2016 at 16:22, Raphael C <drraph@gmail.com>.)
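A minimal sketch of the fix implied above, assuming Python 3 and a recent scikit-learn (the loop in the message is Python 2). In scikit-learn's forest estimators, feature_importances_ is a property that averages the per-tree importances each time it is read, so the slow loop redoes that work on every iteration; binding the array to a local name once avoids it. The data set is synthetic and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem standing in for the real 40,000-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each read of the property builds a fresh array from all trees, so two
# reads return two different objects (holding equal values).
first, second = clf.feature_importances_, clf.feature_importances_
print(first is second)  # False

# Slow pattern (one full recomputation per feature):
#     for i in range(len(clf.feature_importances_)):
#         print(clf.feature_importances_[i])

# Fast pattern: compute once, then iterate over the local array.
importances = clf.feature_importances_
for i, value in enumerate(importances):
    print(i, value)
```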
Participants (1): Raphael C