[scikit-learn] Fwd: Re: Topic for thesis work on scikit learn

Gaurav Dhingra gauravdhingra.gxyd at gmail.com
Tue Jan 23 11:09:23 EST 2018


Hi Andreas,


On Tuesday 23 January 2018 09:12 PM, Gaurav Dhingra wrote:
>
>
>
>
> -------- Forwarded Message --------
> Subject: 	Re: [scikit-learn] Topic for thesis work on scikit learn
> Date: 	Tue, 23 Jan 2018 10:16:36 -0500
> From: 	Andreas Mueller <t3kcit at gmail.com>
> To: 	Gaurav Dhingra <gauravdhingra.gxyd at gmail.com>
>
>
>
> Hi Gaurav.
>
> Is your mentor experienced in contributing to sklearn?
>

No, she isn't.

> Will they be able to review your code to the scikit-learn standards?
>

No.

> Have you worked on any other pull requests so far?
>

I've worked on a few. Please have a look at
https://github.com/scikit-learn/scikit-learn/pulls/gxyd; in fact, I expect
that 3 of the open PRs will be merged soon.

> Getting anything into scikit-learn without close collaboration with 
> the community is quite tricky.
>
> Having a faster K-means implementation based on recent research in the
> area would be interesting.
> There's also interest in adding Robust PCA, probabilistic inference
> trees, and improving the latent Dirichlet allocation code.
>

I tried to look into what the scikit-learn community/devs consider a
priority to have in their code base (instead of looking explicitly
for topics I like). When I looked, I thought of
https://github.com/scikit-learn/scikit-learn/issues/8337 or
https://github.com/scikit-learn/scikit-learn/issues/6557 as possible
topics. But since I'm aware that your unavailability (being busy with
teaching) can be an issue, I simultaneously looked for other
options. I had a conversation with Joel (he was kind enough to PM me);
this is what he said (only the important part of the conversation):

| Tricky things we’ve been trying to do for years:
|     * estimator tags
|     * sample props
|     * tools for optimising cluster parameters (e.g. #6948)
| sample props == #4497 and associated
| related to clusterer parameters, #6160
| estimator tags relates to #6715
| #6777 looks tricky from an ML perspective.

I'm thinking of choosing
https://github.com/scikit-learn/scikit-learn/pull/6948 (ENH optimal
n_clusters value), i.e. completing that PR. If you are available to
review my PRs (if I do open them), then I'd be glad to
work with you on either /conditional inference trees/ or /adding
post-pruning for decision trees/.

I'm aware, as Joel put it earlier, that /Andreas has escaped into the
teaching world/. Anyway, I don't expect my guide to give me feedback
on scikit-learn code, though she will definitely have theoretical
explanations for my questions. Also, since we can have a
co-guide (apart from the local guide), I would definitely consider
someone from scikit-learn for that role, whether it be you or maybe
Joel. But even Joel is expected to get back to the academic world.

If things don't work out (neither you nor Joel nor anyone else from
the scikit-learn community is available), I'll take a little longer,
but I'll probably get there eventually.

> You can find issues on any of these in the issue tracker, which also 
> has many more feature requests.
>
> Andy
>
>
> On 12/31/2017 05:46 AM, Gaurav Dhingra wrote:
>>
>> Hi Andreas,
>>
>> I think I'll get access to a local mentor from my college, so I think
>> I can rule that issue out, though for technicalities I would still /like/
>> to depend more on feedback from the scikit-learn community,
>> since my aim wouldn't be to make something for my own use but rather 
>> something that would be more useful for the scikit-learn community, 
>> so that it eventually gets merged into master.
>>
>> I'm currently looking for a topic that I can take up. I tried looking
>> into the scikit-learn wiki, but it doesn't mention what I'm looking
>> for (no topic is mentioned). Do you have some topic in mind that
>> could be useful for addition to scikit-learn? Even if you could 
>> direct me to appropriate links I would be happy to look into those.
>>
>>
>> On Wednesday 01 November 2017 01:43 AM, Andreas Mueller wrote:
>>> Hi Gaurav.
>>>
>>> Do you have a local mentor? I think having a mentor that can guide 
>>> you during a thesis is very important.
>>> You could get some feedback from the community for a contribution, 
>>> but that can be slow,
>>> and is entirely on volunteer basis, so there is no guarantee that 
>>> you'll get the necessary feedback in time
>>> to finish your thesis.
>>>
>>> Mentoring a thesis - in particular without knowing you - is a 
>>> serious commitment, so I'm not sure someone
>>> from inside the project will want to do this. I saw you already made 
>>> a contribution in 
>>> https://github.com/scikit-learn/scikit-learn/pull/10005
>>> but that's a very different scope than doing what I expect would be
>>> several months of work.
>>
>> In this regard, I've made a few more contributions; here is the
>> link: https://github.com/scikit-learn/scikit-learn/pulls/gxyd, though
>> I know none of them is a big contribution. If you think I should work
>> on a big enough PR, could you please suggest an issue in that regard?
>>
>> Thanks.
>>
>>>
>>>
>>> Best,
>>> Andy
>>>
>>> On 10/31/2017 03:31 PM, Gaurav Dhingra wrote:
>>>> Hi everyone,
>>>>
>>>> I am a final year (5th year) undergraduate Applied Mathematics
>>>> student in India. I am thinking of doing my final year thesis by
>>>> doing some work (the coding part) on scikit-learn, so I was wondering
>>>> whether anyone could tell me if there are available topics (not
>>>> necessarily the names of those topics) that I could work on as an
>>>> undergraduate student. I would want to expand upon this in December
>>>> when my exams are over, but in the meantime I would like to take a
>>>> step in that direction by just knowing whether there will be
>>>> available topics that I could work on.
>>>>
>>>> It could be the case that the available topics are not so easy for an
>>>> undergraduate; even in that case I would like to do some research
>>>> on the topics first.
>>>>
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> -- 
>> Gaurav Dhingra
>> (sent from Thunderbird email client)
>

-- 
Gaurav Dhingra
(sent from Thunderbird email client)
