[scikit-learn] Sprint discussion points?

Andreas Mueller t3kcit at gmail.com
Thu Feb 14 08:31:27 EST 2019


As I said, I think it's too much and we need to prioritize.

We could either rank issues and start with some and see how far we get, 
or we could go as far as to schedule meetings on the different topics.

Also, I think I'll only be arriving late Tuesday morning.


On 2/14/19 8:05 AM, Adrin wrote:
> I've been working on some bias mitigation metrics and methods, and that
> use case changes the data, as does up/down sampling as a transformer.
> Almost all those methods also need sample properties for the
> observations to work. I'm trying to make them "sklearn compatible", but
> for now it's pretty hacky. So I'd be happy if we discuss the union of
> what Joel and Andy suggest.
>
> Cheers,
> Adrin.
>
> On Thu, Feb 14, 2019, 11:47 Guillaume Lemaître <g.lemaitre58 at gmail.com
> <mailto:g.lemaitre58 at gmail.com>> wrote:
>
>     I am really interested in the union of the list given by Andy and
>     Joel.
>
>     I'd like to have some discussions related to the "impute" module.
>     Compared to the other topics, it is not a high-priority discussion,
>     though.
>
>     On Thu, 14 Feb 2019 at 05:31, Joel Nothman <joel.nothman at gmail.com
>     <mailto:joel.nothman at gmail.com>> wrote:
>
>         Convergence in logistic regression
>         (https://github.com/scikit-learn/scikit-learn/issues/11536) is
>         indeed one problem (and it presents a general issue of what
>         max_iter means when you have several solvers, or how good
>         defaults are selected). But I was sure we had problems with
>         non-determinism on some platforms... but now I can't find them.
>
>         > my students have basically no way to figure out what
>         > features the coefficients in their linear model correspond to,
>         > that seems a bit more important to me.
>
>         Yes, I agree... Assuming coefficients are helpful, rather than
>         using permutation-based measures of importance, for instance.
>
>         I generally think a review of distances might be a good thing
>         at some point, given the confusing triplication across
>         sklearn.neighbors, sklearn.metrics.pairwise, scipy.spatial...
>         and that minkowski with p=2 is not implemented the same as
>         euclidean.
>
>
>         On Thu, 14 Feb 2019 at 12:56, Andreas Mueller
>         <t3kcit at gmail.com <mailto:t3kcit at gmail.com>> wrote:
>
>             Do you have a reference for the logistic regression
>             stability? Is it convergence warnings?
>
>             Happy to discuss the other two issues, though I feel they
>             seem easier than most of what's on my list.
>
>             I have no idea what's going on with OPTICS tbh, and I'll
>             leave it up to you and the others to decide whether that's
>             something we should discuss.
>             I can try to read up and weigh in but that might not be
>             the most effective way to do it.
>
>             The sample props issue is something I left out because I
>             personally don't feel it's a priority compared to all the
>             other things;
>             my students have basically no way to figure out what
>             features the coefficients in their linear model correspond
>             to, that seems a bit more important to me.
>
>             We can put it on the discussion list again, but I'm not
>             super enthusiastic about it.
>
>             How should we prioritize things?
>
>
>             On 2/13/19 8:08 PM, Joel Nothman wrote:
>>             Yes, I was thinking the same. I think there are some
>>             other core issues to solve, such as:
>>
>>             * euclidean_distances numerical issues
>>             * commitment to ARM testing and debugging
>>             * logistic regression stability
>>
>>             We should also nut out OPTICS issues or remove it from
>>             0.21. I'm still keen on trying to work out sample props
>>             (supporting weighted scoring at least), but perhaps I'm
>>             being persuaded this will never be a top-priority
>>             requirement, and the solutions add much complexity.
>>
>>             On Thu, 14 Feb 2019 at 07:39, Andreas Mueller
>>             <t3kcit at gmail.com <mailto:t3kcit at gmail.com>> wrote:
>>
>>                 Hey all.
>>
>>                 Should we collect some discussion points for the sprint?
>>
>>                 There's an unusual number of core devs present and I
>>                 think we should seize the opportunity.
>>                 Maybe we should create a page in the wiki or add it
>>                 to the sprint page?
>>
>>                 Things that are high on my list of priorities are:
>>
>>                   * slicing pipelines
>>                   * add get_feature_names to pipelines
>>                   * freezing estimator
>>                   * faster multi-metric scoring
>>                   * fit_transform doing something other than
>>                     fit.transform
>>                   * imbalance-learn interface / subsampling in pipelines
>>                   * Specifying search spaces and valid
>>                     hyperparameters
>>                     (https://github.com/scikit-learn/scikit-learn/issues/13031).
>>                   * allowing EstimatorCV-style speed-up in GridSearches
>>                   * storing pandas column names and using them as
>>                     feature names
>>
>>
>>                 Trying to discuss all of these might be too much, but
>>                 maybe we can figure out a subset and make sure we
>>                 have SLEPs to discuss?
>>                 Most of these issues are on the roadmap; issue 13031
>>                 is related to #18 but not directly on the roadmap.
>>
>>                 Thanks,
>>                 Andy
>>                 _______________________________________________
>>                 scikit-learn mailing list
>>                 scikit-learn at python.org <mailto:scikit-learn at python.org>
>>                 https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
>
>
>
>     -- 
>     Guillaume Lemaitre
>     INRIA Saclay - Parietal team
>     Center for Data Science Paris-Saclay
>     https://glemaitre.github.io/
>
>
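
Joel's remark in the thread that minkowski with p=2 is not implemented the
same as euclidean, together with the euclidean_distances numerical issues
on his list, can be illustrated with a minimal pure-Python sketch. It
assumes the discrepancy comes from computing the Euclidean distance through
the expanded form ||x||^2 - 2<x, y> + ||y||^2 instead of summing squared
differences directly. The function names below are hypothetical and are not
the scikit-learn or SciPy implementations.

```python
import math


def minkowski_direct(x, y, p):
    """Minkowski distance computed directly: (sum |x_i - y_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_expanded(x, y):
    """Euclidean distance via ||x||^2 - 2<x, y> + ||y||^2 (assumed form).

    The squared distance is clipped at zero because rounding can make the
    expanded expression slightly negative.
    """
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return math.sqrt(max(xx - 2.0 * xy + yy, 0.0))


# Well-conditioned points: both formulations agree.
d_direct = minkowski_direct([3.0, 4.0], [0.0, 0.0], 2)
d_expanded = euclidean_expanded([3.0, 4.0], [0.0, 0.0])

# Points far from the origin but close to each other: the true distance
# is 1.0, and the direct form recovers it, while the expanded form
# subtracts two numbers near 1e16 and no longer returns exactly 1.0.
far_x, far_y = [1e8 + 1.0], [1e8]
far_direct = minkowski_direct(far_x, far_y, 2)
far_expanded = euclidean_expanded(far_x, far_y)

print(d_direct, d_expanded)
print(far_direct, far_expanded)
```

The two formulations are mathematically identical, so any difference in the
second pair of results is purely a floating-point effect. That is one
plausible reason a review of the distance implementations could be
worthwhile.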