-------- Forwarded Message --------
Subject: Re: [scikit-learn] Topic for thesis work on scikit learn
Date: Tue, 23 Jan 2018 10:16:36 -0500
From: Andreas Mueller <t3kcit@gmail.com>
To: Gaurav Dhingra <gauravdhingra.gxyd@gmail.com>
Hi Gaurav.
Is your mentor experienced in contributing to sklearn? Will they be able to review your code to the scikit-learn standards? Have you worked on any other pull requests so far?
Getting anything into scikit-learn without close collaboration with the community is quite tricky.
Having a faster K-means implementation based on recent research in the area would be interesting. There's also interest in adding Robust PCA, probabilistic inference trees, and improving the latent Dirichlet allocation code.
You can find issues on any of these in the issue tracker, which also has many more feature requests.
Andy
On 12/31/2017 05:46 AM, Gaurav Dhingra wrote:
Hi Andreas,
I think I'll get access to a local mentor from my college, so I think I can rule that issue out. For the technical side, though, I would /like/ to rely more on feedback from the scikit-learn community, since my aim isn't to make something for my own use but something useful to the scikit-learn community that eventually gets merged into master.
I'm currently looking for a topic that I can take up. I tried looking at the scikit-learn wiki, but it doesn't mention what I'm looking for (no topics are listed). Do you have a topic in mind that would be a useful addition to scikit-learn? Even if you could point me to the appropriate links, I would be happy to look into them.
On Wednesday 01 November 2017 01:43 AM, Andreas Mueller wrote:
Hi Gaurav.
Do you have a local mentor? I think having a mentor who can guide you during a thesis is very important. You could get some feedback from the community on a contribution, but that can be slow and is entirely on a volunteer basis, so there is no guarantee that you'll get the necessary feedback in time to finish your thesis.
Mentoring a thesis, in particular without knowing you, is a serious commitment, so I'm not sure someone from inside the project will want to do this. I saw you already made a contribution in https://github.com/scikit-learn/scikit-learn/pull/10005, but that's a very different scope from what I expect would be several months of work.
In this regard, I've made a few more contributions; here is the link: https://github.com/scikit-learn/scikit-learn/pulls/gxyd, though I know none of them is a big contribution. If you think I should work on a bigger PR, could you please suggest an issue?
Thanks.
Best, Andy
On 10/31/2017 03:31 PM, Gaurav Dhingra wrote:
Hi everyone,
I am a final-year (5th year) undergraduate Applied Mathematics student in India. I am thinking of doing my final-year thesis by doing some work (the coding part) on scikit-learn, so I was wondering if anyone could tell me whether there are available topics (not necessarily the names of those topics) that I could work on as an undergraduate student. I would like to expand upon this in December, when my exams are over, but in the meantime I would like to take a step in that direction by knowing whether there will be topics available for me to work on.
It could be that the available topics are not easy for an undergraduate; even in that case, I would like to do some research on them first.
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- Gaurav Dhingra (sent from Thunderbird email client)
Hi Andreas,
On Tuesday 23 January 2018 09:12 PM, Gaurav Dhingra wrote:
Hi Gaurav.
Is your mentor experienced in contributing to sklearn?
No, she isn't.
Will they be able to review your code to the scikit-learn standards?
No.
Have you worked on any other pull requests so far?
I've worked on a few. Please have a look at https://github.com/scikit-learn/scikit-learn/pulls/gxyd; in fact, I expect that 3 of the open PRs will be merged soon.
Getting anything into scikit-learn without close collaboration with the community is quite tricky.
Having a faster K-means implementation based on recent research in the area would be interesting. There's also interest in adding Robust PCA, probabilistic inference trees, and improving the latent Dirichlet allocation code.
I tried to look into what the /scikit-learn community/ or /devs/ consider a priority for their code base (instead of me looking only for topics I like). When I looked, I thought of https://github.com/scikit-learn/scikit-learn/issues/8337 or https://github.com/scikit-learn/scikit-learn/issues/6557 as possible topics. But since I'm aware that your unavailability (being busy with teaching) could be an issue, I simultaneously looked for other options. I had a conversation with Joel (he was kind enough to PM me); this is what he said (only the important part of the conversation):
| Tricky things we’ve been trying to do for years:
| * estimator tags
| * sample props
| tools for optimising cluster parameters (e.g. #6948)
| sample props == #4497 and associated
| related to clusterer parameters, #6160
| estimator tags relates to #6715
| #6777 looks tricky from an ML perspective.
I'm thinking of choosing https://github.com/scikit-learn/scikit-learn/pull/6948 (ENH optimal n_clusters value), i.e. completing that PR. If you have availability to review my PRs (if I do open them), then I'd be glad to work with you on either /Conditional inference trees/ or /adding post-pruning for decision trees/. I'm aware that, as Joel put it earlier, /Andreas has escaped into the teaching world/.
Anyway, I don't expect my guide to give me feedback on scikit-learn code, though she will definitely have theoretical explanations for my questions. Also, since we can have a co-guide (apart from the local guide), I would definitely consider someone from scikit-learn for that role, whether it be you or maybe Joel. But Joel is also expected to return to the academic world. If things don't work out (if neither you nor Joel nor anyone else from the scikit-learn community is available), it will take me a little longer, but I'll probably get there eventually.
You can find issues on any of these in the issue tracker, which also has many more feature requests.
Andy
-- Gaurav Dhingra (sent from Thunderbird email client)
On 01/23/2018 11:09 AM, Gaurav Dhingra wrote:
| Tricky things we’ve been trying to do for years: | * estimator tags | * sample props
I actually have a student working on estimator tags right now.
I'm thinking of choosing https://github.com/scikit-learn/scikit-learn/pull/6948 (ENH optimal n_clusters value), i.e. completing that PR. If you have availability to review my PRs (if I do open them), then I'd be glad to work with you on either /Conditional inference trees/ or /adding post-pruning for decision trees/.
No, I don't have time to review PRs.
If things don't work out (if neither you nor Joel nor anyone else from the scikit-learn community is available), it will take me a little longer, but I'll probably get there eventually.
No, it's not possible to contribute to scikit-learn without working with someone from the community. Each pull request requires two reviewers to be merged, and that is usually a prolonged process of back and forth. Without someone stepping up to review, you can't get your code in.