Request / Proposal: integrating IEEE paper in scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark datasets
Dear scikit-learn mailing list,
Similarly to the existing feature_selection.*RFE and RFECV* <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>, this is a request to openly discuss the *PROPOSAL* and requirements of *feature_selection.EFS and/or EFSCV*, which would stand for "Evolutionary Feature Selection", starting with 8 algorithms or methods to be used with scikit-learn estimators, as published in IEEE https://arxiv.org/abs/2303.10182 by the paper's authors. They have agreed to help integrate it (in cc).
*PROPOSAL* Implement/integrate the paper https://arxiv.org/abs/2303.10182 into scikit-learn:
*1) CODE*
- implementing *feature_selection.EFS and/or EFSCV* (a space for the evolutionary computing community interested in feature selection)
RFE is:
feature_selection.*RFE*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Recursive feature elimination with cross-validation to select features.
The "EFS" could be:
feature_selection.*EFS*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature ranking and feature elimination with *8 different algorithms, SFE, SFE-PSO* etc. *<- new algorithms could be added and benchmarked with evolutionary computing, swarm, genetic etc.*
feature_selection.*EFSCV*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature elimination with cross-validation to select features.
*2) DATASETS & CANCER BENCHMARK*
- curating and integrating a fetcher for the 40 *cancer_benchmark* datasets, either directly in scikit-learn or pullable from an external, maintained source (a space for contributing and expanding high-dimensional datasets on cancer topics).
fetch_cancer_benchmark(*[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_cal...>
Loads 40 individual cancer-related high-dimensional datasets for benchmarking feature selection methods (classification).
*3) TUTORIAL / WEBSITE*
- writing a tutorial to replicate the IEEE paper results with *feature_selection.EFS and/or EFSCV* on *cancer_benchmark (40 datasets)*
I find the IEEE work https://arxiv.org/abs/2303.10182 a very interesting and novel approach to high-dimensional datasets, as it reports small subsets of predictive features selected with SVM and KNN across 40 datasets. Replicability under a BSD-3 license and scikit-learn's quality standards could make benchmarking novel feature selection algorithms easier, in my initial opinion. Since this is my very first contact with both the IEEE paper authors and the scikit-learn list, we would welcome some help or guidance on how an integration could work, and on whether there is any interest in this direction at all.
Kind regards
Dalibor Hrg
https://www.linkedin.com/in/daliborhrg/
On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <alexandre.gramfort@inria.fr> wrote:
Dear Dalibor
you should discuss this on the main scikit-learn mailing list.
https://mail.python.org/mailman/listinfo/scikit-learn
Alex
On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg@gmail.com> wrote:
Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
This is a request to openly discuss the idea of a potential feature_selection.*EFS* <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>, which would stand for "Evolutionary Feature Selection" (EFS for short), starting with 8 algorithms as published in IEEE https://arxiv.org/abs/2303.10182 by the authors on high-dimensional datasets. I find this work a very interesting and novel approach to high-dimensional datasets, especially for health fields, and it could mean a lot to the ML community and the scikit-learn project, in my initial opinion.
A Jupyter Notebook and scikit-learn tutorial replicating this IEEE paper/work as feature_selection.*EFS* <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....> with its 8 algorithms could be a near-term goal. Eventually, scikit-learn EFSCV and diverse classification algorithms could be benchmarked for a "joint paper" in JOSS or a health journal.
My initial idea (it does not need to be this way and is open to discussion) is roughly the following:
RFE has:
feature_selection.*RFE*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Recursive feature elimination with cross-validation to select features.
The "EFS" could have:
feature_selection.*EFS*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature ranking and feature elimination with *8 different algorithms, SFE, SFE-PSO* etc. *<- new algorithms could be added and benchmarked with evolutionary computing, swarm, genetic etc.*
feature_selection.*EFSCV*(estimator, *[, ...]) <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection....>
Feature elimination with cross-validation to select features.
Looking forward to an open discussion on whether Evolutionary Feature Selection (EFS) is something for the sklearn project, or maybe for a separate pip-installable package.
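To make the analogy with RFE more concrete, here is a minimal sketch of what an EFS-style selector might look like. Everything in it is an assumption made for illustration: the class name `EFSSketch`, the `score_func` and `n_iter` parameters, and the single-bit-flip search are hypothetical, and they are neither the algorithms from the paper nor an existing scikit-learn API. Only the `fit` / `transform` / `get_support()` / `support_` naming follows the convention of existing selectors such as RFE. The sketch uses the standard library only, so a user-supplied `score_func(mask)` stands in for fitting and scoring an estimator on the selected columns.

```python
import random


class EFSSketch:
    """Hypothetical EFS-like selector; illustration only, not the paper's method."""

    def __init__(self, score_func, n_iter=200, random_state=0):
        self.score_func = score_func      # maps a boolean mask to a score (higher is better)
        self.n_iter = n_iter              # number of candidate masks to try
        self.random_state = random_state  # seed for reproducible runs

    def fit(self, X, y=None):
        rng = random.Random(self.random_state)
        n_features = len(X[0])
        mask = [True] * n_features        # start with every feature selected
        best = self.score_func(mask)
        for _ in range(self.n_iter):
            candidate = list(mask)
            j = rng.randrange(n_features)
            candidate[j] = not candidate[j]   # mutate: flip one feature in or out
            if not any(candidate):
                continue                      # skip the empty subset
            score = self.score_func(candidate)
            if score >= best:                 # keep the candidate if it is not worse
                mask, best = candidate, score
        self.support_ = mask
        self.score_ = best
        return self

    def get_support(self):
        """Return the boolean mask of selected features."""
        return list(self.support_)

    def transform(self, X):
        """Keep only the selected columns of X (a list of rows)."""
        return [[v for v, keep in zip(row, self.support_) if keep] for row in X]


def toy_score(mask):
    # Assumed toy objective: features 0 and 2 are useful, every selected feature costs 0.1.
    return 1.0 * mask[0] + 1.0 * mask[2] - 0.1 * sum(mask)


X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
selector = EFSSketch(toy_score, n_iter=200, random_state=0).fit(X)
X_selected = selector.transform(X)
```

In a real implementation the score would presumably come from fitting the wrapped estimator on the masked columns, and an EFSCV-style variant would average that score over cross-validation folds.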
Kind regards Dalibor Hrg https://www.linkedin.com/in/daliborhrg/
On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <b.ahadzade@yahoo.com> wrote:
Dear Dalibor Hrg,
Thank you very much for your attention to the SFE algorithm. Thank you very much for the time you took to guide me and my colleagues. According to your guidance, we will add this algorithm to the scikit-learn library as soon as possible.
Kind regards,
Ahadzadeh.
On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg <dalibor.hrg@gmail.com> wrote:
Dear Authors,
you have done some amazing work on feature selection, published in IEEE: https://arxiv.org/abs/2303.10182 . I noticed that the Python code here has no LICENSE file or any licensing information: https://github.com/Ahadzadeh2022/SFE and that the paper mentions some links for downloading the data.
I would be interested in working with you to:
Step 1) make and release a pip package, publish this code in JOSS https://joss.readthedocs.io (for example https://joss.theoj.org/papers/10.21105/joss.04611) under a BSD-3 license, and replicate the table results of the IEEE paper. All 8 algorithms could potentially live in one class "EFS", meaning "Evolutionary Feature Selection", with the 8 algorithms (SFE among them) selectable as options. Or something like that.
Step 2) try to integrate it and work with the scikit-learn people. I would recommend integrating this under https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection similarly to sklearn.feature_selection.RFE. I believe this would be a great contribution to the best open library for ML, scikit-learn.
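The "one class, several selectable options" idea from Step 1 could be organized with a small registry, sketched below under stated assumptions. The names `ALGORITHMS`, `register`, and `EFSDispatch` are hypothetical, and the two strategies `"random_search"` and `"backward_elimination"` are simple stand-ins chosen only to show the dispatch; they are not the paper's eight algorithms, each of which would be registered under its own name (for example `"SFE"`).

```python
import random

ALGORITHMS = {}  # hypothetical registry: option name -> search function


def register(name):
    """Decorator that stores a search strategy in the registry under `name`."""
    def decorator(func):
        ALGORITHMS[name] = func
        return func
    return decorator


@register("random_search")
def random_search(score_func, n_features, n_iter, rng):
    # Placeholder strategy: sample random masks and keep the best one seen.
    best_mask = [True] * n_features
    best_score = score_func(best_mask)
    for _ in range(n_iter):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):
            continue  # skip the empty subset
        score = score_func(mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask


@register("backward_elimination")
def backward_elimination(score_func, n_features, n_iter, rng):
    # Placeholder strategy: repeatedly drop the feature whose removal helps most.
    # `rng` is unused; it is accepted so all strategies share one signature.
    mask = [True] * n_features
    best = score_func(mask)
    for _ in range(n_iter):
        if sum(mask) <= 1:
            break  # always keep at least one feature
        trials = [
            (score_func(mask[:j] + [False] + mask[j + 1:]), j)
            for j in range(n_features) if mask[j]
        ]
        top_score, drop = max(trials)
        if top_score <= best:
            break  # no single removal improves the score
        best = top_score
        mask[drop] = False
    return mask


class EFSDispatch:
    """Hypothetical single entry point that selects a strategy by name."""

    def __init__(self, score_func, algorithm="backward_elimination", n_iter=100, random_state=0):
        if algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm {algorithm!r}; choose from {sorted(ALGORITHMS)}")
        self.score_func = score_func
        self.algorithm = algorithm
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = random.Random(self.random_state)
        search = ALGORITHMS[self.algorithm]
        self.support_ = search(self.score_func, len(X[0]), self.n_iter, rng)
        return self


def toy_score(mask):
    # Assumed toy objective: features 0 and 2 are useful, every selected feature costs 0.1.
    return 1.0 * mask[0] + 1.0 * mask[2] - 0.1 * sum(mask)


X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
selector = EFSDispatch(toy_score, algorithm="backward_elimination").fit(X)
```

With this layout, adding a ninth algorithm would only mean writing one more function with the same signature and decorating it with `@register("...")`; the class itself would not change.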
I am unsure about the status of the datasets and their licenses. However, the datasets could be fetched externally from the OpenML.org repository (see, for example, https://scikit-learn.org/stable/datasets/loading_other_datasets.html) or from CERN Zenodo, where the "benchmark datasets" could be expanded. It depends a bit on the dataset licenses.
Overall, I hope this can greatly increase the visibility of your published work and also let others credit you in papers in a more citable and replicable way. I believe your IEEE paper and work definitely deserve a spot in scikit-learn. There is a need for replicable code on "Evolutionary Methods for Feature Selection" and for such a benchmark on life-science datasets, and you have done some great work so far.
Let me know what you think.
Best regards, Dalibor Hrg
Starting with Efroymson's stepwise regression, the selection of relevant regressors has a long history. Of course, Efroymson's case is an old and simple one within a very wide set of more general problems, where the number of variables and the missingness pattern make things very hard to tackle.
I had a look at the paper, which seems to me to be based on a wide review of the literature and an in-depth focus on the main extant algorithms. I do not consider myself an expert on the matter. However, the subject is so important that, in view of the thorough analysis the authors performed, I think this enterprise is worthwhile.
My best regards.
Ulderico Santarelli.
On Sun, Sep 24, 2023 at 11:12, Dalibor Hrg <dalibor.hrg@gmail.com> wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Dear Dalibor,
As detailed in the FAQ,
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for...
"""
We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations, and wide use and usefulness.
"""
These days, I would say that the bar is even higher, as we are finding that we prioritize things such as high-quality documentation or better dataframe support over new algorithms.
Best,
Gaël
On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
-- Gael Varoquaux Research Director, INRIA http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Dear Gael,
Thanks for the clarification. Yes, I see: such methods or approaches need broader evidence of use and more citations. This is roughly what I was thinking.
Looking at the sister projects https://scikit-learn.org/stable/related_projects.html#related-projects and especially the package "Boruta" https://github.com/scikit-learn-contrib/boruta_py, a small question for a hint: do you think a pip package like Boruta would be the closest fit, implementing the method together with the cancer benchmark dataset and replicating the paper results?
Certainly, there is potential to benchmark RFE and EFS against each other and publish how they perform, and to demonstrate them on diverse high-dimensional datasets from other domains and other publications. Doing that is a long-term journey to show the usefulness of the method/algorithm.
Best,
Dalibor
On Sun, Sep 24, 2023, 21:37 Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
Dear Dalibor,
As detailed in the FAQ,
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for... """ We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations, and wide use and usefulness. """
These days, I would say that the bar is even harder, as we are finding that we prioritize things such as high-quality documentation or better dataframe support to new algorithms.
Best,
Gaël
On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
Dear scikit-learn mailing list
similarly to standing feature_selection.RFE and RFECV, this is a request to openly discuss the PROPOSAL and requirements of feature_selection.EFS and/or EFSCV which would stand for "Evolutionary Feature Selection" with starting 8 algorithms or methods to be used with scikit-learn estimators, just as published in IEEE https://arxiv.org/abs/2303.10182 by the authors of paper. They agreed to help integrate it (in cc).
PROPOSAL Implement/integrate https://arxiv.org/abs/2303.10182 paper into scikit-learn:
1) CODE
• implementing feature_selection.EFS and/or EFSCV (a space for evolutionary computing community interested in feature selection)
RFE is:
feature_selection.RFE Feature ranking with recursive feature (estimator, *[, ...]) elimination.
feature_selection.RFECV Recursive feature elimination with (estimator, *[, ...]) cross-validation to select features.
The "EFS" could be:
Feature ranking and feature elimination with 8 feature_selection.EFS different algorithms, SFE, SFE-PSO etc. <- new (estimator, *[, ...]) algorithms could be added and benchmarked with evolutionary computing, swarm, genetic etc.
feature_selection.EFSCV Feature elimination with cross-validation to select (estimator, *[, ...]) features
2) DATASETS & CANCER BENCHMARK
• curating and integrating fetch of cancer_benchmark 40 datasets, directly in scikit-learn or externally pullable somehow and maintained (space for contributing expanding high-dimensional datasets on cancer topics).
fetch_cancer-benchmark Loads 40 individual cancer related high-dimensional (*[,, ...]) datasets for benchmarking feature selection methods (classification).
3) TUTORIAL / WEBSITE
• writing tutorial to replicate IEEE paper results with feature_selection.EFS and/or EFSCV on cancer_benchmark (40 datasets)
I have identified IEEE work https://arxiv.org/abs/2303.10182 to be of very interesting novelty in working with high-dimensional datasets as it reports small subsets of predictive features selected with SVM, KNN across 40 datasets. Replicability under BSD-3 and high quality under scikit-learn could assure benchmarking novel feature selection algorithms easier - in my very first opinion. Since this is the very first touch of myself with IEEE paper authors and the scikit-learn list altogether, we would welcome some help/guide how integration could work out, and if there is any interest on that line at all.
Kind regards Dalibor Hrg https://www.linkedin.com/in/daliborhrg/
On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort < alexandre.gramfort@inria.fr
wrote:
Dear Dalibor
you should discuss this on the main scikit-learn mailing list.
Alex
On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg@gmail.com>
wrote:
Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
This is a request to openly discuss the idea of potential for feature_selection.EFS which would stand for "Evolutionary Feature Selection" or shortly EFS with starting 8 algorithms as
published in
IEEE https://arxiv.org/abs/2303.10182 by the authors on high-dimensional datasets. I have identified this work to be of
very
interesting novelty in working with high-dimensional datasets, especially for health fields, and it could mean a lot to the ML community and scikit-learn project - in my very first opinion.
A Jupyter Notebook and scikit-learn tutorial replicating this
IEEE
paper/work as feature_selection.EFS and 8 algorithms in it could
be a
near term goal. And eventually, scikit-learn EFSCV and diverse classification algorithms could be benchmarked for "joint paper"
in
JOSS, or a health journal.
My initial idea (doesn't need to be that way or is open to
discussion)
has some first thought like this:
RFE has:
feature_selection.RFE Feature ranking with recursive
feature
(estimator, *[, ...]) elimination.
feature_selection.RFECV Recursive feature elimination with (estimator, *[, ...]) cross-validation to select features.
The "EFS" could have:
Feature ranking and feature elimination
with 8
feature_selection.EFS different algorithms, SFE, SFE-PSO etc.
<- new
(estimator, *[, ...]) algorithms could be added and
benchmarked with
evolutionary computing, swarm, genetic
etc.
feature_selection.EFSCV Feature elimination with
cross-validation to
(estimator, *[, ...]) select features
Looking forward to an open discussion and if Evolutionary Feature Selection EFS is something for sklearn project, or maybe a
separate pip
install package.
Kind regards Dalibor Hrg https://www.linkedin.com/in/daliborhrg/
On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <
b.ahadzade@yahoo.com
> wrote:
Dear Dalibor Hrg,
Thank you very much for your attention to the SFE algorithm.
Thank
you very much for the time you took to guide me and my
colleagues.
According to your guidance, we will add this algorithm to the scikit-learn library as soon as possible.
Kind regards, Ahadzadeh. On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30,
Dalibor
Hrg <dalibor.hrg@gmail.com> wrote:
Dear Authors,
you have done some amazing work on feature selection, published in IEEE: https://arxiv.org/abs/2303.10182 . I have noticed that the Python code at https://github.com/Ahadzadeh2022/SFE comes without a LICENSE file or any licensing information, and that the paper mentions some links for downloading the data.
I would be interested in working with you on the following:
Step 1) Make and release a pip package, publish this code in JOSS https://joss.theoj.org/papers/10.21105/joss.04611 under a BSD-3 license, and replicate the table results of the IEEE paper. All 8 algorithms could potentially live in one class "EFS", meaning "Evolutionary Feature Selection", selectable as 8 options, SFE among them. Or something like that.
Step 2) Try to integrate it and work with the scikit-learn people. I would recommend integrating this under stable/modules/classes.html#module-sklearn.feature_selection, similarly to sklearn.feature_selection.RFE. I believe this would be a great contribution to the best open library for ML, scikit-learn.
I am unsure about the status of the datasets and the licenses therein. But the datasets could be fetched externally from the OpenML.org repository (for example, https://scikit-learn.org/stable/datasets/loading_other_datasets.html) or from CERN Zenodo, where the "benchmark datasets" could be expanded. It depends a bit on the dataset licenses.
Overall, I hope this can hugely increase the visibility of your published work, and also let others credit you in papers in a more citable and replicable way. I believe your IEEE paper and work definitely deserve a spot in scikit-learn. There is a need for some replicable code on "Evolutionary Methods for Feature Selection" and for such a benchmark on life-science datasets, and you have done some great work so far.
Let me know what you think.
Best regards, Dalibor Hrg
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
P.S. As for the effort involved, I fully agree with what is written in the FAQ. I wonder if this could become an EU project covering high-dimensional datasets from multiple domains. It looks like an opportunity to discuss over a virtual coffee, if anybody is interested. I am unsure whether the scikit-learn community or its groups collaborate on investigating directions or on maintenance through funded projects, but just saying. Perhaps this discussion is an opportunity for that.
Cheerio, Dalibor
On Sun, Sep 24, 2023 at 3:29 AM Dalibor Hrg <dalibor.hrg@gmail.com> wrote:
Dear Gael,
Thanks for the clarification. Yes, I see: there is a need for broader use, evidence and citations of such methods or approaches. This is roughly what I was thinking.
Looking at the related projects https://scikit-learn.org/stable/related_projects.html#related-projects and especially the package "Boruta" https://github.com/scikit-learn-contrib/boruta_py, a small question for a hint: do you think a pip package like Boruta would be the closest fit, implemented together with the cancer benchmark dataset and replicating the paper results?
Certainly, there is potential to benchmark RFE and EFS, publish how they perform on the benchmark, and demonstrate them on diverse high-dimensional datasets coming from other domains and other publications. Doing that is a long-term journey to show the usefulness of the method/algorithm.
Best, Dalibor
On Sun, Sep 24, 2023, 21:37 Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
Dear Dalibor,
As detailed in the FAQ, https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for...
"""
We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations, and wide use and usefulness.
"""
These days, I would say that the bar is even higher, as we are finding that we prioritize things such as high-quality documentation or better dataframe support over new algorithms.
Best,
Gaël
On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
Dear scikit-learn mailing list,
similarly to the existing feature_selection.RFE and RFECV, this is a request to openly discuss the PROPOSAL and requirements of feature_selection.EFS and/or EFSCV, which would stand for "Evolutionary Feature Selection", starting with 8 algorithms or methods to be used with scikit-learn estimators, as published in IEEE https://arxiv.org/abs/2303.10182 by the authors of the paper. They agreed to help integrate it (in cc).
PROPOSAL Implement/integrate https://arxiv.org/abs/2303.10182 paper into scikit-learn:
1) CODE
• implementing feature_selection.EFS and/or EFSCV (a space for evolutionary computing community interested in feature selection)
RFE is:
feature_selection.RFE(estimator, *[, ...])    Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, ...])  Recursive feature elimination with cross-validation to select features.
The "EFS" could be:
feature_selection.EFS(estimator, *[, ...])    Feature ranking and feature elimination with 8 different algorithms, SFE, SFE-PSO etc. <- new algorithms could be added and benchmarked with evolutionary computing, swarm, genetic etc.
feature_selection.EFSCV(estimator, *[, ...])  Feature elimination with cross-validation to select features.
2) DATASETS & CANCER BENCHMARK
• curating and integrating fetch of cancer_benchmark 40 datasets, directly in scikit-learn or externally pullable somehow and maintained (space for contributing expanding high-dimensional datasets on cancer topics).
fetch_cancer-benchmark(*[, ...])  Loads 40 individual cancer-related high-dimensional datasets for benchmarking feature selection methods (classification).
3) TUTORIAL / WEBSITE
• writing tutorial to replicate IEEE paper results with feature_selection.EFS and/or EFSCV on cancer_benchmark (40 datasets)
I find the IEEE work https://arxiv.org/abs/2303.10182 to be of very interesting novelty for high-dimensional datasets, as it reports small subsets of predictive features selected with SVM and KNN across 40 datasets. Replicability under BSD-3 and the quality standards of scikit-learn could make benchmarking novel feature selection algorithms easier, in my very first opinion. Since this is my very first contact with both the IEEE paper authors and the scikit-learn list, we would welcome some help/guidance on how integration could work out, and on whether there is any interest along that line at all.
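For reference, the existing RFE / RFECV interface that the proposal wants to mirror can be exercised as in the minimal sketch below. The synthetic data from make_classification and the linear SVM are assumptions chosen only to keep the example self-contained; they are not the cancer benchmark or the setup from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV
from sklearn.svm import SVC

# Synthetic stand-in for a high-dimensional dataset (NOT the cancer benchmark).
X, y = make_classification(
    n_samples=120, n_features=40, n_informative=5, n_redundant=0, random_state=0
)

# RFE needs an estimator exposing coef_ or feature_importances_;
# a linear-kernel SVM provides coef_.
estimator = SVC(kernel="linear")

# RFE: eliminate 5 features per round until a fixed number remains.
rfe = RFE(estimator, n_features_to_select=5, step=5).fit(X, y)
rfe_mask = rfe.get_support()

# RFECV: let cross-validation decide how many features to keep.
rfecv = RFECV(estimator, step=5, cv=3).fit(X, y)
rfecv_mask = rfecv.get_support()

# Both selectors are transformers, so they reduce X to the chosen columns.
X_reduced = rfe.transform(X)
print(X_reduced.shape, int(rfecv_mask.sum()))
```

An EFS / EFSCV class following the same fit / get_support / transform pattern would presumably plug into pipelines the same way, but that is only what the proposal suggests, not an existing scikit-learn API.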
Kind regards Dalibor Hrg https://www.linkedin.com/in/daliborhrg/
On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <alexandre.gramfort@inria.fr> wrote:
Dear Dalibor
you should discuss this on the main scikit-learn mailing list.
Alex
On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg@gmail.com> wrote:
Dear sklearn feature_selection.RFE Team and IEEE Authors (in cc),
This is a request to openly discuss the potential for feature_selection.EFS, which would stand for "Evolutionary Feature Selection", or EFS for short, starting with the 8 algorithms published in IEEE https://arxiv.org/abs/2303.10182 by the authors on high-dimensional datasets. I find this work to be of very interesting novelty for high-dimensional datasets, especially in health fields, and it could mean a lot to the ML community and the scikit-learn project, in my very first opinion.
A Jupyter Notebook and scikit-learn tutorial replicating this IEEE paper/work as feature_selection.EFS with its 8 algorithms could be a near-term goal. Eventually, scikit-learn EFSCV and diverse classification algorithms could be benchmarked for a "joint paper" in JOSS or a health journal.
My initial idea (it doesn't need to be that way and is open to discussion) looks something like this:
RFE has:
feature_selection.RFE(estimator, *[, ...])    Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, ...])  Recursive feature elimination with cross-validation to select features.
The "EFS" could have:
feature_selection.EFS(estimator, *[, ...])    Feature ranking and feature elimination with 8 different algorithms, SFE, SFE-PSO etc. <- new algorithms could be added and benchmarked with evolutionary computing, swarm, genetic etc.
feature_selection.EFSCV(estimator, *[, ...])  Feature elimination with cross-validation to select features.
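To make the idea of wrapper-style evolutionary feature selection concrete, here is a minimal sketch of a generic (1+1) evolutionary search over boolean feature masks. It is a textbook baseline and NOT the SFE algorithm from the paper; the names one_plus_one_ea and make_toy_score, the nearest-centroid fitness and the per-feature penalty are all hypothetical choices made only for illustration.

```python
import random
from typing import Callable, List, Sequence


def one_plus_one_ea(
    n_features: int,
    score: Callable[[Sequence[bool]], float],
    n_iter: int = 200,
    seed: int = 0,
) -> List[bool]:
    """Generic (1+1) evolutionary search over feature masks (not the paper's SFE)."""
    rng = random.Random(seed)
    best = [True] * n_features        # start with every feature selected
    best_score = score(best)
    flip_prob = 1.0 / n_features      # standard bit-flip mutation rate
    for _ in range(n_iter):
        child = [(not bit) if rng.random() < flip_prob else bit for bit in best]
        if not any(child):            # never evaluate an empty feature set
            continue
        child_score = score(child)
        if child_score >= best_score:  # accept ties to move across plateaus
            best, best_score = child, child_score
    return best


def make_toy_score(X, y, penalty=0.01):
    """Toy fitness: nearest-centroid training accuracy minus a per-feature penalty."""
    def score(mask):
        cols = [j for j, keep in enumerate(mask) if keep]
        centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        correct = 0
        for x, t in zip(X, y):
            pred = min(
                centroids,
                key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
            )
            correct += pred == t
        return correct / len(X) - penalty * len(cols)
    return score


# Toy data: feature 0 carries the class signal, features 1..7 are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[t * 4.0 + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 1) for _ in range(7)]
     for t in y]

mask = one_plus_one_ea(8, make_toy_score(X, y), n_iter=300, seed=0)
print(mask)
```

In a real package the fitness would more likely be a cross-validated score of a scikit-learn estimator on the selected columns, and the search strategy would be one of the 8 algorithms from the paper, selectable as an option.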
Looking forward to an open discussion on whether Evolutionary Feature Selection (EFS) is something for the sklearn project, or better suited to a separate pip-installable package.
Kind regards Dalibor Hrg https://www.linkedin.com/in/daliborhrg/
participants (3): Dalibor Hrg, Gael Varoquaux, Ulderico Santarelli