[scikit-learn] Sprint discussion points?

Thu Feb 14 11:40:05 EST 2019

> or we could go as far as to schedule meetings on the different topics.

Given the number of issues to discuss this is probably the best approach IMO

On 2/14/19 8:31 AM, Andreas Mueller wrote:
>
> As I said, I think it's too much and we need to prioritize.
>
> We could either rank issues and start with some and see how far we get,
> or we could go as far as to schedule meetings on the different topics.
>
> Also, I'll only be arriving Tuesday late morning, I think.
>
>
> On 2/14/19 8:05 AM, Adrin wrote:
>> I've been working on some bias mitigation metrics and methods and that
>> use case changes the data as well as up/down sampling as a transformer.
>> Almost all those methods also need sample properties for the
>> observations to work. I'm trying to make them "sklearn compatible", but
>> for now it's pretty hacky. So I'd be happy if we discuss the union of
>> what Joel and Andy suggest.
>>
>> Cheers,
>> Adrin.
>>
>> On Thu, Feb 14, 2019, 11:47 Guillaume Lemaître
>> <g.lemaitre58 at gmail.com> wrote:
>>
>>     I am really interested in the union of the list given by Andy and
>>     Joel.
>>
>>     I'd like to have some discussions related to the "impute"
>>     module. Compared to the other topics, it is not a high-priority
>>     discussion though.
>>
>>     On Thu, 14 Feb 2019 at 05:31, Joel Nothman
>>     <joel.nothman at gmail.com> wrote:
>>
>>         Convergence in logistic regression
>>         (https://github.com/scikit-learn/scikit-learn/issues/11536) is
>>         indeed one problem (and it presents a general issue of what
>>         max_iter means when you have several solvers, or how good
>>         defaults are selected). But I was sure we had problems with
>>         non-determinism on some platforms... but now I can't find them.
>>
>>         > my students have basically no way to figure out what
>>         > features the coefficients in their linear model correspond
>>         > to, that seems a bit more important to me.
>>
>>         Yes, I agree... Assuming coefficients are helpful, rather
>>         than using permutation-based measures of importance, for
>>         instance.
>>
>>         I generally think a review of distances might be a good thing
>>         at some point, given the confusing triplication across
>>         sklearn.neighbors, sklearn.metrics.pairwise, scipy.spatial...
>>         and that minkowski with p=2 is not implemented the same as euclidean.
>>
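On the euclidean_distances / minkowski point above, here is a minimal sketch of why two mathematically equal formulas can disagree numerically. It is plain Python, not scikit-learn or scipy code. As far as I understand, euclidean_distances uses the expanded dot-product form for speed, while a straightforward Minkowski computation with p=2 works on the coordinate differences directly. The function names and the input values are my own, chosen only to make the cancellation visible in float64.

```python
def sq_dist_direct(x, y):
    """Squared Euclidean distance from coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist_expanded(x, y):
    """Squared distance via the expansion x.x - 2 x.y + y.y.

    The three terms are huge and nearly cancel, so rounding error in
    any of them can dominate the small true result. Negative results
    caused by rounding are clipped to zero.
    """
    return max(dot(x, x) - 2.0 * dot(x, y) + dot(y, y), 0.0)


# Two points that are far from the origin but close to each other.
x = [1e8]
y = [1e8 + 1.0]

direct = sq_dist_direct(x, y)      # exact here: the difference is 1.0
expanded = sq_dist_expanded(x, y)  # loses the answer to cancellation

print("direct:  ", direct)
print("expanded:", expanded)
```

The effect shows up much earlier with float32 inputs, since the terms being cancelled only carry about 7 significant digits there.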
>>
>>         On Thu, 14 Feb 2019 at 12:56, Andreas Mueller
>>         <t3kcit at gmail.com> wrote:
>>
>>             Do you have a reference for the logistic regression
>>             stability? Is it convergence warnings?
>>
>>             Happy to discuss the other two issues, though I feel they
>>             seem easier than most of what's on my list.
>>
>>             I have no idea what's going on with OPTICS tbh, and I'll
>>             leave it up to you and the others to decide whether
>>             that's something we should discuss.
>>             I can try to read up and weigh in but that might not be
>>             the most effective way to do it.
>>
>>             Sample props is something I left out because I
>>             personally don't feel it's a priority compared to all the
>>             other things;
>>             my students have basically no way to figure out what
>>             features the coefficients in their linear model
>>             correspond to, that seems a bit more important to me.
>>
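On the coefficient naming problem above, a toy sketch of what tracking names through a transformer would give you. Everything here is hypothetical: `one_hot_with_names` is a made-up helper in plain Python, not a scikit-learn API, and the coefficient values are invented for illustration.

```python
def one_hot_with_names(rows, column):
    """Expand one categorical column; return (encoded_rows, output_names).

    Hypothetical helper: the point is only that the step reports the
    names of the columns it produces alongside the data.
    """
    categories = sorted({row[column] for row in rows})
    names = [f"{column}={cat}" for cat in categories]
    encoded = [
        [1.0 if row[column] == cat else 0.0 for cat in categories]
        for row in rows
    ]
    return encoded, names


rows = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
X, feature_names = one_hot_with_names(rows, "color")

# Coefficients as a fitted linear model might expose them (made-up values).
coef = [0.7, -0.2]

# With names carried along, each coefficient can be read off by feature.
named_coef = dict(zip(feature_names, coef))
print(named_coef)
```

Without the names, all a user sees is a coefficient array whose length no longer matches the original column count.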
>>             We can put it on the discussion list again, but I'm not
>>             super enthusiastic about it.
>>
>>             How should we prioritize things?
>>
>>
>>             On 2/13/19 8:08 PM, Joel Nothman wrote:
>>>             Yes, I was thinking the same. I think there are some
>>>             other core issues to solve, such as:
>>>
>>>             * euclidean_distances numerical issues
>>>             * commitment to ARM testing and debugging
>>>             * logistic regression stability
>>>
>>>             We should also nut out OPTICS issues or remove it from
>>>             0.21. I'm still keen on trying to work out sample props
>>>             (supporting weighted scoring at least), but perhaps I'm
>>>             being persuaded this will never be a top-priority
>>>             requirement, and the solutions add much complexity.
>>>
>>>             On Thu, 14 Feb 2019 at 07:39, Andreas Mueller
>>>             <t3kcit at gmail.com> wrote:
>>>
>>>                 Hey all.
>>>
>>>                 Should we collect some discussion points for the sprint?
>>>
>>>                 There's an unusual number of core devs present and I
>>>                 think we should seize the opportunity.
>>>                 Maybe we should create a page in the wiki or add it
>>>                 to the sprint page?
>>>
>>>                 Things that are high on my list of priorities are:
>>>
>>>                   * slicing pipelines
>>>                   * add get_feature_names to pipelines
>>>                   * freezing estimator
>>>                   * faster multi-metric scoring
>>>                   * fit_transform doing something other than
>>>                     fit.transform
>>>                   * imbalance-learn interface / subsampling in pipelines
>>>                   * Specifying search spaces and valid
>>>                     hyperparameters
>>>                     (https://github.com/scikit-learn/scikit-learn/issues/13031).
>>>                   * allowing EstimatorCV-style speed-up in GridSearches
>>>                   * storing pandas column names and using them as
>>>                     feature names
>>>
>>>
>>>                 Trying to discuss all of these might be too much,
>>>                 but maybe we can figure out a subset and make sure
>>>                 we have SLEPs to discuss?
>>>                 Most of these issues are on the roadmap; issue 13031
>>>                 is related to #18 but not directly on the roadmap.
>>>
>>>                 Thanks,
>>>                 Andy
>>>                 _______________________________________________
>>>                 scikit-learn mailing list
>>>                 scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>                 https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>>
>>
>>     -- 
>>     Guillaume Lemaitre
>>     INRIA Saclay - Parietal team
>>     Center for Data Science Paris-Saclay
>>     https://glemaitre.github.io/