[scikit-learn] Sprint discussion points?

Joel Nothman joel.nothman at gmail.com
Wed Feb 13 23:28:55 EST 2019


Convergence in logistic regression (
https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed one
problem (and it presents a general issue of what max_iter means when you
have several solvers, or how good defaults are selected). But I was sure we
had problems with non-determinism on some platforms... but I can't find them now.

> my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit more
> important to me.

Yes, I agree... Assuming coefficients are helpful, rather than using
permutation-based measures of importance, for instance.

I generally think a review of distances might be a good thing at some
point, given the confusing triplication across sklearn.neighbors,
sklearn.metrics.pairwise, scipy.spatial... and that minkowski with p=2 is
not implemented the same way as euclidean.
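
To illustrate the kind of discrepancy I mean, here is a minimal sketch in
plain Python. It is not the code from sklearn or scipy; the two function
names are made up for illustration. It contrasts summing squared differences
directly with the expanded form dot(x, x) - 2 * dot(x, y) + dot(y, y), which
can lose precision when the points sit far from the origin but close to each
other:

```python
import math


def direct_euclidean(x, y):
    """Distance computed by summing squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def expanded_euclidean(x, y):
    """Distance via the expansion ||x||^2 - 2 x.y + ||y||^2."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    # Clip at zero: rounding can push the expression slightly negative.
    return math.sqrt(max(xx - 2.0 * xy + yy, 0.0))


# Two points that are exactly 1.0 apart, but far from the origin.
x = [1e8 + 1.0]
y = [1e8]

direct = direct_euclidean(x, y)
expanded = expanded_euclidean(x, y)
print("direct:  ", direct)
print("expanded:", expanded)
```

The two results need not agree in float64, because the squared norms are
much larger than the squared distance and the subtraction cancels most of
the significant digits.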


On Thu, 14 Feb 2019 at 12:56, Andreas Mueller <t3kcit at gmail.com> wrote:

> Do you have a reference for the logistic regression stability? Is it
> convergence warnings?
>
> Happy to discuss the other two issues, though I feel they seem easier than
> most of what's on my list.
>
> I have no idea what's going on with OPTICS tbh, and I'll leave it up to
> you and the others to decide whether that's something we should discuss.
> I can try to read up and weigh in but that might not be the most effective
> way to do it.
>
> Sample props is something I left out because I personally don't feel
> it's a priority compared to all the other things;
> my students have basically no way to figure out what features the
> coefficients in their linear model correspond to, that seems a bit more
> important to me.
>
> We can put it on the discussion list again, but I'm not super enthusiastic
> about it.
>
> How should we prioritize things?
>
>
> On 2/13/19 8:08 PM, Joel Nothman wrote:
>
> Yes, I was thinking the same. I think there are some other core issues to
> solve, such as:
>
> * euclidean_distances numerical issues
> * commitment to ARM testing and debugging
> * logistic regression stability
>
> We should also nut out OPTICS issues or remove it from 0.21. I'm still
> keen on trying to work out sample props (supporting weighted scoring at
> least), but perhaps I'm being persuaded this will never be a top-priority
> requirement, and the solutions add much complexity.
>
> On Thu, 14 Feb 2019 at 07:39, Andreas Mueller <t3kcit at gmail.com> wrote:
>
>> Hey all.
>>
>> Should we collect some discussion points for the sprint?
>>
>> There's an unusual amount of core-devs present and I think we should
>> seize the opportunity.
>> Maybe we should create a page in the wiki or add it to the sprint page?
>>
>> Things that are high on my list of priorities are:
>>
>>    - slicing pipelines
>>    - add get_feature_names to pipelines
>>    - freezing estimator
>>    - faster multi-metric scoring
>>    - fit_transform doing something other than fit.transform
>>    - imbalanced-learn interface / subsampling in pipelines
>>    - Specifying search spaces and valid hyper parameters (
>>    https://github.com/scikit-learn/scikit-learn/issues/13031).
>>    - allowing EstimatorCV-style speed-up in GridSearches
>>    - storing pandas column names and using them as feature names
>>
>>
>> Trying to discuss all of these might be too much, but maybe we can figure
>> out a subset and make sure we have SLEPs to discuss?
>> Most of these issues are on the roadmap; issue 13031 is related to #18
>> but not directly on the roadmap.
>>
>> Thanks,
>> Andy
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>