Hello,

Let's assume that I have data with 1000 features. I want to apply SVM-RFE to this data, where each time 10% of the features are removed. How can one get the accuracy over all the levels of the elimination stages? For example, I want to get the performance over 1000 features, 900 features, 800 features, ..., 2 features, 1 feature. Also, I want to keep track of the features at each level.

Best

---------------------------------------------------------------------------------------
Prof. Malik Yousef (Associate Professor)
The Head of the Galilee Digital Health Research Center (GDH)
Zefat Academic College, Department of Information Systems
Home Page: https://malikyousef.com/
Google Scholar Profile: https://scholar.google.com/citations?user=9UCZ_q4AAAAJ&hl=en&oi=ao
----------------------------------------------------------------------------------------------------
Dear Malik,

Your request to check the performance at each step of SVM-RFE is a pretty common task. Since the contributors to scikit-learn have done a great job of making the interface to RFE easy to use, the only real work required from you is to build a small wrapper function that: (a) computes the step sizes you want to output prediction performances for, and (b) loops over the step sizes, making each step size the n_features_to_select attribute of RFE (built from the remaining features), making predictions from an SVM retrained (and possibly optimized) on the reduced feature set, and then outputting the metric(s) appropriate to your problem.

Tracing the feature weights is then done by accessing the "coef_" attribute of the trained linear SVM. This can be output in loop step (b) as well.

> where each time 10% of the features are removed. How can one get the
> accuracy over all the levels of the elimination stages? For example, I want
> to get the performance over 1000 features, 900 features, 800 features, ...,
> 2 features, 1 feature.
Just a technicality, but with 10% reduction you would have 1000, 900, 810, 729, 656, ... . Either way, if you allow your wrapper function to take a pre-computed list of feature sizes, you can flexibly switch between systematic and context-informed ways of specifying the feature sizes (and resulting weights) to trace.

Hope this helps.

J.B. Brown
Kyoto University Graduate School of Medicine
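A minimal sketch of such a wrapper, assuming scikit-learn with a linear-kernel SVC; the toy data, the 10% schedule helper, and the function name rfe_trace are illustrative, not something from the thread:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data standing in for the 1000-feature problem.
X, y = make_classification(n_samples=100, n_features=50, random_state=0)

def ten_percent_schedule(n_features):
    """Feature counts after repeated 10% (rounded-down) eliminations."""
    sizes = []
    n = n_features
    while n >= 1:
        sizes.append(n)
        n = int(n * 0.9)
    return sizes

def rfe_trace(X, y, sizes, cv=3):
    """For each target size: fit RFE with a linear SVM, then record the
    CV accuracy, the surviving feature indices, and the SVM weights."""
    results = []
    for n in sizes:
        svm = SVC(kernel="linear")
        rfe = RFE(estimator=svm, n_features_to_select=n, step=0.1).fit(X, y)
        kept = np.flatnonzero(rfe.support_)          # surviving features
        score = cross_val_score(svm, X[:, kept], y, cv=cv).mean()
        results.append((n, score, kept, rfe.estimator_.coef_))
    return results

for n, score, kept, _ in rfe_trace(X, y, ten_percent_schedule(X.shape[1])[:3]):
    print(n, round(score, 3))
```

Feeding rfe_trace a pre-computed list of sizes (as suggested above) keeps the schedule logic separate from the evaluation loop.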
I think you can also use RFECV directly without doing any wrapping.

On 11/20/19 12:24 AM, Brown J.B. via scikit-learn wrote:
> Your request to check the performance at each step of SVM-RFE is a pretty
> common task. [...]

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Sat, Nov 23, 2019, 2:12 Andreas Mueller <t3kcit@gmail.com>:
> I think you can also use RFECV directly without doing any wrapping.

Yes, RFECV works well (and I should know, as an appreciative long-time user ;-) ), but does it actually provide a mechanism (accessors) for tracing the step-by-step feature weights and predictive ability as the features are successively reduced? (Or perhaps I just can't find it because I'm looking at the 0.20.1 and 0.21.2 documentation...?)

J.B.
It does not provide access for tracing the step-by-step feature weights and predictive ability; the user provides n_features.

Malik

On Mon, Nov 25, 2019 at 1:36 PM Brown J.B. via scikit-learn <scikit-learn@python.org> wrote:
> Yes, RFECV works well, but does it actually provide a mechanism (accessors)
> for tracing the step-by-step feature weights and predictive ability as the
> features are successively reduced? [...]
It does provide the ranking of the features in the ranking_ attribute, and it provides the cross-validation accuracies for all subsets in grid_scores_. It doesn't provide the feature weights for all subsets, but that would be easy to add if it's desired.

On 11/25/19 10:50 AM, Malik Yousef wrote:
> It does not provide access for tracing the step-by-step feature weights and
> predictive ability; the user provides n_features.
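For reference, a minimal RFECV sketch showing those accessors on toy data. One caveat not in the thread: grid_scores_ was the attribute in 2019-era releases, while scikit-learn >= 1.0 exposes the same per-subset scores as cv_results_["mean_test_score"], so the snippet tries both:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

selector = RFECV(SVC(kernel="linear"), step=1, cv=3)
selector.fit(X, y)

print(selector.ranking_)   # rank 1 = selected; higher rank = eliminated earlier

# Per-subset CV accuracies: grid_scores_ in old releases,
# cv_results_["mean_test_score"] in scikit-learn >= 1.0.
scores = getattr(selector, "grid_scores_", None)
if scores is None:
    scores = selector.cv_results_["mean_test_score"]
print(len(scores))         # one score per candidate subset size
```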
Tue, Dec 3, 2019, 5:36 Andreas Mueller <t3kcit@gmail.com>:
> It does provide the ranking of the features in the ranking_ attribute, and
> it provides the cross-validation accuracies for all subsets in grid_scores_.
> It doesn't provide the feature weights for all subsets, but that would be
> easy to add if it's desired.

I would guess that there is some population of the user base that would like to track the per-iteration feature weights. It appears to me that a straightforward (un-optimized) implementation would be to place a NaN value for a feature once it is eliminated, so that a numpy.ndarray can be returned and immediately passed to matplotlib.pcolormesh or other visualization routines in various libraries.

Just an idea.

J.B.
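A toy sketch of that NaN-padding idea; the function name weight_trace and the history inputs are illustrative, not an existing scikit-learn API:

```python
import numpy as np

def weight_trace(weight_history, support_history, n_features):
    """Build one row per elimination step, with NaN marking features
    already removed; the result drops straight into pcolormesh.

    weight_history[i]:  coef_ values of step i (surviving features only)
    support_history[i]: boolean mask of features surviving at step i
    """
    trace = np.full((len(weight_history), n_features), np.nan)
    for row, (w, mask) in enumerate(zip(weight_history, support_history)):
        trace[row, mask] = w   # scatter weights back to original positions
    return trace

# Example: 4 features, two steps; feature 2 eliminated before step 1.
w_hist = [np.array([0.5, -0.2, 0.1, 0.3]), np.array([0.6, -0.1, 0.4])]
m_hist = [np.array([True, True, True, True]),
          np.array([True, True, False, True])]
trace = weight_trace(w_hist, m_hist, 4)
print(trace)
```

Because matplotlib leaves NaN cells blank, eliminated features would show up as growing gaps across the rows of the heatmap.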
PR welcome ;)

On 12/3/19 11:02 PM, Brown J.B. via scikit-learn wrote:
> I would guess that there is some population of the user base that would
> like to track the per-iteration feature weights. [...]
I certainly am guilty of only commenting on the mailing list and not engaging more via GitHub! :) (Much like many of you PIs on this list, the typical ActualWork-GrantWriting-ReportWriting-InvitedLectures-RealLifeParenting cycle eats the day away.) While I've failed previously to get involved after showing interest, let's see if I can't actually succeed for once.

Thu, Dec 5, 2019, 1:14 Andreas Mueller <t3kcit@gmail.com>:
> PR welcome ;)
participants (3)
- Andreas Mueller
- Brown J.B.
- Malik Yousef