[scikit-learn] Contribute to Scikit-learn

Sebastian Raschka mail at sebastianraschka.com
Mon Sep 3 12:43:22 EDT 2018


Hi all,

First of all, I think that having more feature selection capabilities in scikit-learn would be nice, especially an algorithm from the wrapper category that also accounts for dependence/interaction between features.

Regarding the SequentialFeatureSelection class: we actually decided to simplify this a little bit (compared to the mlxtend variant) and only include the "simple" or "regular" forward and backward selection, not the floating variants. We probably don't want to go overboard and put too many algorithms into a core package such as sklearn. Instead, we could focus on the main ones and delegate the others (e.g., genetic algorithms, which may implementation-wise rely on an external GP package?) to contrib projects?
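
For readers who have not seen it, the "regular" (non-floating) forward selection mentioned above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the code from the PR and not the mlxtend API; the names sequential_forward_selection, score_fn, and toy_score are hypothetical and chosen just for this example. In practice score_fn would typically be something like a cross-validated estimator score on the candidate feature subset.

```python
from typing import Callable, List, Tuple


def sequential_forward_selection(
    n_features: int,
    score_fn: Callable[[Tuple[int, ...]], float],
    k: int,
) -> List[int]:
    """Greedily pick k feature indices, adding the best one per round.

    Hypothetical helper for illustration; no floating (conditional
    removal) step is performed.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected: List[int] = []
    remaining = list(range(n_features))
    while len(selected) < k:
        # Score each candidate together with the already-selected features,
        # so interactions between features can influence the choice.
        best = max(remaining, key=lambda j: score_fn(tuple(selected + [j])))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset: Tuple[int, ...]) -> float:
    """Made-up score: feature 1 helps alone, features 0 and 2 only as a pair."""
    s = set(subset)
    score = 0.0
    if 1 in s:
        score += 0.5
    if {0, 2} <= s:
        score += 1.0
    # Small penalty per feature so that useless features do not look free.
    score -= 0.01 * len(s)
    return score


chosen = sequential_forward_selection(n_features=4, score_fn=toy_score, k=3)
print(chosen)
```

Backward selection would be the mirror image: start from all features and drop, in each round, the one whose removal hurts the score least. Because every candidate is scored jointly with the current subset, this kind of wrapper method can pick up feature interactions that univariate filters miss, although the greedy search can still overlook pairs that are only useful together.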

Anyway, regarding the PR ...
I didn't mean to drag it on for that long, but between PR and review, other things always came up and I never got around to adding the docs; at some point I actually forgot about it. I think the current state is that the implementation is more or less OK and maybe just needs some polishing. What's primarily missing, though, are the docs and more comprehensive unit tests. This is something I can do in the next few days or weeks (now that I am aware of it), but I also wouldn't mind if someone else works on it.

So, let me know if you would like to work on the PR; otherwise, I will make a note for next weekend to look into adding the docs. In any case, I would appreciate feedback regarding the current implementation.

Best,
Sebastian

> On Sep 3, 2018, at 7:50 AM, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:
> 
> I would add that Sequential Forward Selection is in the process of
> being ported by Sebastian (@rasbt)
> to scikit-learn:
> 
> https://github.com/scikit-learn/scikit-learn/pull/8684
> 
> However, I am sure that Sebastian would be grateful if you wish to
> take over the PR and to move it forward.
> But Sebastian is probably going to comment himself ;)
> 
> Cheers,
> On Mon, 3 Sep 2018 at 13:35, Oliver Tomic <olivertomic at zoho.com> wrote:
>> 
>> Hi Shuki and Yaniv,
>> 
>> the sequential forward selection algorithm is already implemented in the mlxtend Python package, which is complementary to scikit-learn.
>> https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
>> 
>> best wishes
>> Oliver
>> 
>> 
>> 
>> 
>> ---- On Mon, 03 Sep 2018 13:17:30 +0200 Shuki Cohen <shokyco at gmail.com> wrote ----
>> 
>> 
>> 
>> On Mon, Sep 3, 2018 at 1:21 PM Shuki Cohen <shokyco at gmail.com> wrote:
>> 
>> Hi all,
>> 
>> A friend of mine and I found that Scikit-learn lacks some feature selection functionality, and we would like to contribute to address this need. More specifically, we want to add:
>> 1. Sequential Forward Selection algorithm
>> 2. Multivariate Feature Selection
>> to the Scikit-learn code base. We are writing to ask whether such a project would have a good chance of being added to the next version.
>> 
>> Thanks in advance
>> Shuki & Yaniv
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>> 
>> 
>> 
> 
> 
> 
> -- 
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/


