From hadrien.lorenzo at inria.fr Mon Mar 1 08:19:28 2021
From: hadrien.lorenzo at inria.fr (Hadrien Lorenzo)
Date: Mon, 1 Mar 2021 14:19:28 +0100 (CET)
Subject: [scikit-learn] Sparse Partial Least Squares (PLS)
In-Reply-To: <766112199.7795329.1614601648384.JavaMail.zimbra@inria.fr>
References: <766112199.7795329.1614601648384.JavaMail.zimbra@inria.fr>
Message-ID: <658313534.7832967.1614604768708.JavaMail.zimbra@inria.fr>
Dear Maintainers,
I have been working on sparse PLS for many years now (PhD + postdoc at INRIA and INSERM; see https://hadrienlorenzo.netlify.app for a brief overview) and have published on its applications. The main problems concern dealing with missing values in the multi-output setting and degenerate n&lt;
From j.tan at rug.nl Mon Mar 1 12:12:31 2021
From: j.tan at rug.nl (Tan, J.)
Date: Mon, 1 Mar 2021 18:12:31 +0100
Subject: [scikit-learn] Invitation to participate in a survey about
Scikit-Learn
Message-ID:
Dear Scikit-Learn contributor,
We are doing research on understanding how developers manage a special kind
of Technical Debt in *Python.*
We kindly ask 15-20 minutes of your time to fill out our survey. To help
you decide whether to fill it in, we clarify two points.
"Why should I answer this survey?"
Your participation is essential for us to correctly understand how
developers manage Technical Debt.
"What is in it for me?"
Your valuable contributions to *Scikit-Learn* are part of the information
we analyzed for this study. Thus, if you help us further by answering
this survey, there are two immediate benefits:
- you help to improve the efficiency of maintaining the quality of
*Scikit-Learn*.
- the results will be used to propose recommendations to manage
technical debt and create tool support.
Here is the link to the survey.
Thank you for your time and attention.
Kind regards,
Jie Tan, Daniel Feitosa and Paris Avgeriou
Software Engineering and Architecture group
Faculty of Science and Engineering
University of Groningen, the Netherlands
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From seralouk at hotmail.com Sat Mar 6 15:00:10 2021
From: seralouk at hotmail.com (Serafeim Loukas)
Date: Sat, 6 Mar 2021 20:00:10 +0000
Subject: [scikit-learn] Consensus Clustering
In-Reply-To: <0E057EC7-9B81-470D-8FB4-E723362D2728@hotmail.com>
References: <1071563725.605939.1604163089727.ref@mail.yahoo.com>
<1071563725.605939.1604163089727@mail.yahoo.com>
<0E057EC7-9B81-470D-8FB4-E723362D2728@hotmail.com>
Message-ID: <9C03210A-25F2-41B9-81DC-D99153B92753@hotmail.com>
Hi all,
Is there an implemented method for Consensus Clustering (https://link.springer.com/article/10.1023%2FA%3A1023949509487)?
Cheers,
Makis
From reshama.stat at gmail.com Mon Mar 8 08:00:00 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Mon, 8 Mar 2021 08:00:00 -0500
Subject: [scikit-learn] Data Umbrella: AFME Sprint Report
Message-ID:
Hello,
The Data Umbrella AFME (Africa & Middle East) scikit-learn sprint was on
February 6, 2021, and the report is now available. [a]
For folks who are interested in contributing, or an educator who would like
to share with students, there are resources for getting started in
contributing to scikit-learn. [b]
[a]
https://reshamas.github.io/data-umbrella-afme-2021-scikit-learn-sprint-report/
[b]
https://www.dataumbrella.org/open-source/contributing-to-scikit-learn
Best,
Reshama
---
Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies
From marmochiaskl at gmail.com Tue Mar 9 06:13:17 2021
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Tue, 9 Mar 2021 12:13:17 +0100
Subject: [scikit-learn] scikit-learn github label wiki
Message-ID:
Dear list,
dear triagers,
dear core-devs
I have started a page on the scikit-learn wiki, about the use of labels in
the scikit-learn github repository:
https://github.com/scikit-learn/scikit-learn/wiki/label
The page is meant to help with labeling issues and pull requests,
improving the standardization of the workflow and perhaps accelerating
the automation process (thanks to the identification of 'obvious'
procedures).
The idea, in the end, is to make the review process more understandable
for contributors, but also to make reviewers trust the system a bit more
... :)
Triagers and core devs, feel free to comment and edit there.
List, feel free to comment and ask questions about the triaging process
on this mailing list; any help is welcome!
Thanks for your attention.
Best,
Chiara
From joel.nothman at gmail.com Tue Mar 9 18:43:06 2021
From: joel.nothman at gmail.com (Joel Nothman)
Date: Wed, 10 Mar 2021 10:43:06 +1100
Subject: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata
In-Reply-To:
References:
Message-ID:
We are now twenty days into the voting period. I assume ten remain?
While votes are still scarce, issues that have been raised include:
- The naming of "metadata". There is some support for the word
"auxiliary" instead of "meta". (I agree that weight/groups are not
typically what I would call metadata.)
- Whether the API for setting requests should be more generic, e.g.
rolled into the "tags" concept:
https://github.com/scikit-learn/scikit-learn/pull/16079#issuecomment-794091868
- Areas in which the WIP PR is still immature and needs substantial
review: even if the SLEP is accepted, this is unlikely to be ready for
1.0, and the API (although not the structure of the model) will
continue to be refined.
Thanks all for your considered critique and contributions.
On Sat, 27 Feb 2021 at 20:42, Joel Nothman wrote:
> Hi all,
>
> Just a reminder that we are ten days into the month-long voting period,
> with one vote on record. Core devs, please find time to consider this
> proposal. Thanks to Andy's suggestion, we have added an example of the new
> API to the opening section:
>
>
> This SLEP proposes an API where users can request that certain metadata
> be passed to a consumer by the meta-estimator it is wrapped in.
>
> The following example illustrates the new request_metadata parameter for
> making scorers, the request_sample_weight estimator method, the metadata parameter
> replacing fit_params in cross_validate, and the automatic passing of
> groups to
> GroupKFold to enable nested grouped cross validation. Here, the user
> requests that the sample_weight metadata key should be passed to a
> customised accuracy scorer (although a predefined "weighted_accuracy"
> scorer could be introduced), and to the LogisticRegressionCV.
> GroupKFold requests groups by default.
>
> >>> from sklearn.metrics import accuracy_score, make_scorer
> >>> from sklearn.model_selection import cross_validate, GroupKFold
> >>> from sklearn.linear_model import LogisticRegressionCV
> >>> weighted_acc = make_scorer(accuracy_score,
> ...                            request_metadata=['sample_weight'])
> >>> group_cv = GroupKFold()
> >>> lr = LogisticRegressionCV(
> ...     cv=group_cv,
> ...     scoring=weighted_acc,
> ... ).request_sample_weight(fit=True)
> >>> cross_validate(lr, X, y, cv=group_cv,
> ...                metadata={'sample_weight': my_weights,
> ...                          'groups': my_groups},
> ...                scoring=weighted_acc)
>
>
> On Thu, 18 Feb 2021 at 00:08, Joel Nothman wrote:
>
>> With thanks to Alex, Adrin and Christian, we have a proposal to implement
>> what we used to call "sample props" that should be expressive enough for us
>> to resolve tens of issues and PRs, but will be largely unobtrusive for most
>> current users.
>>
>> Core developers, please cast your vote in this PR after considering
>> the proposal, which has a partial implementation in #16079.
>>
>>
>> In brief, the problem we are trying to solve:
>>
>> Scikit-learn has limited support for information pertaining to each
>> sample (henceforth "sample properties") to be passed through an estimation
>> pipeline. The user can, for instance, pass fit parameters to all members of
>> a FeatureUnion, or to a specified member of a Pipeline using dunder (__)
>> prefixing:
>>
>> >>> from sklearn.pipeline import Pipeline
>> >>> from sklearn.linear_model import LogisticRegression
>> >>> pipe = Pipeline([('clf', LogisticRegression())])
>> >>> pipe.fit([[1, 2], [3, 4]], [5, 6],
>> ...          clf__sample_weight=[.5, .7])
>>
>> Several other meta-estimators, such as GridSearchCV, support forwarding
>> these fit parameters to their base estimator when fitting. Yet a number of
>> important use cases are currently not supported.
>>
>> Features we currently do not support and wish to include:
>>
>> - passing sample properties (e.g. sample_weight) to a scorer used in
>> cross-validation
>> - passing sample properties (e.g. groups) to a CV splitter in nested
>> cross validation
>> - passing sample properties (e.g. sample_weight) to some scorers and
>> not others in a multi-metric cross-validation setup
>>
>> Solution: Each consumer requests
>>
>> A meta-estimator passes along to its children only what they request. A
>> meta-estimator needs to request, on behalf of its children, any metadata
>> that descendant consumers request.
>>
>> Each object that could receive metadata should have a method called
>> get_metadata_request() which returns a dict that specifies which
>> metadata is consumed by each of its methods (the keys of this dictionary
>> are therefore method names, e.g. fit, transform, etc.).
>> Estimators supporting weighted fitting may return {} by default, but
>> have a method called request_sample_weight which allows the user to
>> specify the requested sample_weight in each of its methods. make_scorer
>> accepts request_metadata as a keyword parameter through which the user
>> can specify what metadata is requested.
>>
>> Regards,
>>
>> Joel
>>
>
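To make the request mechanism described above concrete, here is a minimal, hypothetical sketch of a consumer implementing it. The SLEP was still under vote at the time, so none of these names or structures are guaranteed to match any released scikit-learn API; this only mirrors the proposal text.

```python
# Hypothetical sketch of the proposed SLEP006 request mechanism;
# not a released scikit-learn API.

class WeightedEstimator:
    """A consumer that can use sample_weight in fit."""

    def __init__(self):
        # Nothing is requested by default, matching the "{} by default"
        # behaviour described in the proposal.
        self._requests = {}

    def request_sample_weight(self, fit=False):
        # The user opts in to sample_weight routing for fit().
        if fit:
            self._requests.setdefault("fit", []).append("sample_weight")
        return self

    def get_metadata_request(self):
        # Keys are method names; values list the metadata keys that
        # method consumes.
        return dict(self._requests)


est = WeightedEstimator().request_sample_weight(fit=True)
print(est.get_metadata_request())  # {'fit': ['sample_weight']}
```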
From phamgiaminh0112 at gmail.com Sun Mar 21 09:57:58 2021
From: phamgiaminh0112 at gmail.com (Pham Gia Minh)
Date: Sun, 21 Mar 2021 20:57:58 +0700
Subject: [scikit-learn] Outlier Detection Problem
Message-ID:
Hello scikit-learn group. I am a beginner with libsvm. I am working on an
outlier detection problem. I train the data in the format