[scikit-learn] GSoC 2017 : "Parallel Decision Tree Building"
Aman Pratik
amanpratik10 at gmail.com
Sun Mar 26 13:31:43 EDT 2017
Hello Jacob,
Here is the second draft of my proposal:
Proposal : Second Draft
<https://github.com/amanp10/scikit-learn/wiki/GSoC-2017-:-Parallel-Decision-Tree-Building>
It is still incomplete in some places, mostly in the details. I will need
a little more time for that. Meanwhile, I await your feedback and guidance.
Thank you
On 23 March 2017 at 02:38, Jacob Schreiber <jmschreiber91 at gmail.com> wrote:
> Hi Aman
>
> Likely the easiest way to parallelize decision tree building is to
> parallelize the finding of the best split at each node, as it checks every
> non-constant feature for the best split. Several other approaches focus on
> how to parallelize tree building in the streaming or distributed cases,
> which we are not interested in at the moment (though partially fitting
> decision trees is a good separate project).
>
> As I mentioned in the github issue, it is likely easier to focus on this
> single issue for GSoC as opposed to making it distinct from the multiclass
> prediction, as this will provide similar speedups either way but be more
> general.
>
> It'd be great if you could add your experience directly to the gist and
> perhaps links to prior work if you have any of those.
>
> Something major missing from this is a proposed timeline. Several projects
> fail because they are overly ambitious or not well managed time-wise.
> Showing a timeline will help us manage the project later on, and ensure
> that you're aware of what the steps of the project will be.
>
> Thanks for the effort so far! Let me know when you've made updates.
>
> Jacob
>
> On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik <amanpratik10 at gmail.com>
> wrote:
>
>> Hello Developers,
>>
>> This is Aman Pratik. I am currently pursuing a B.Tech at the Indian
>> Institute of Technology, Varanasi. After some research, I have found
>> material on decision trees and their parallelization, and I would like to
>> propose my first draft for the project "Parallel Decision Tree Building" for GSoC 2017.
>>
>> Proposal : First Draft
>> <https://github.com/amanp10/scikit-learn/wiki/GSoC-2017-:-Parallel-Decision-Tree-Building>
>>
>> Why me?
>>
>> I have been working in Python for the past 2 years and have a good
>> understanding of machine learning algorithms. I am quite familiar with
>> scikit-learn both as a user and as a developer.
>>
>> These are the issues/PRs I have worked on, or am still working on, over the past few months:
>>
>> [MRG+1] Issue#5803 : Regression Test added #8112
>> <https://github.com/scikit-learn/scikit-learn/pull/8112>
>>
>> [MRG] Issue#6673:Make a wrapper around functions that score an individual
>> feature #8038 <https://github.com/scikit-learn/scikit-learn/pull/8038>
>>
>> [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in
>> GaussianProcessRegressor #7997
>> <https://github.com/scikit-learn/scikit-learn/pull/7997>
>>
>> My GitHub Profile: amanp10 <https://www.github.com/amanp10>
>>
>> I have worked with parallelization in one of my PRs, so I am not new to
>> it. I have used Cython a couple of times, though only as a beginner. I have
>> not used decision trees much, but I am familiar with the theory and the
>> algorithm. I am also familiar with benchmarking, unit testing and the other
>> technical skills this project would require.
>>
>> Meanwhile, I have started studying the subject and gaining experience
>> with Cython. I look forward to guidance from the potential mentors or
>> anyone else willing to help.
>>
>> Thank you
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
>
>
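Jacob's suggestion above (parallelize the search for the best split at each node, since every non-constant feature is scanned independently) can be sketched in pure Python as follows. This is a hypothetical illustration, not scikit-learn's implementation: the names `gini`, `best_split_for_feature` and `find_best_split` are made up for this sketch, it assumes a classification tree scored with Gini impurity, and it uses a standard-library thread pool only to show the structure (one task per feature, then a reduction to the overall best). In CPython these pure-Python threads will not actually run in parallel because of the GIL; a real speedup would need the per-feature scan to run without the GIL, for example in Cython `nogil` code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Sequence, Tuple

# Hypothetical split record: (weighted impurity, feature index, threshold).
Split = Tuple[float, int, float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_for_feature(X, y, feature: int) -> Optional[Split]:
    """Scan every candidate threshold of one feature; None if it is constant."""
    values = sorted(set(row[feature] for row in X))
    if len(values) < 2:
        # Constant feature: no split is possible, so it is skipped.
        return None
    best = None
    n = len(y)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
        right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


def find_best_split(X, y, n_jobs: int = 2) -> Optional[Split]:
    """Evaluate each feature as a separate task, then keep the best split."""
    n_features = len(X[0])
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map returns results in feature order, whatever the scheduling.
        candidates = list(
            pool.map(lambda f: best_split_for_feature(X, y, f), range(n_features))
        )
    candidates = [c for c in candidates if c is not None]
    if not candidates:
        return None
    # Ties are broken by the lower feature index, so the result is deterministic.
    return min(candidates, key=lambda c: (c[0], c[1]))


# Toy data: feature 1 is constant, features 0 and 2 both separate y perfectly.
X = [[0.0, 5.0, 1.0], [1.0, 5.0, 2.0], [2.0, 5.0, 3.0], [3.0, 5.0, 4.0]]
y = [0, 0, 1, 1]
print(find_best_split(X, y, n_jobs=3))  # (0.0, 0, 1.5)
```

Because each feature's scan only reads `X` and `y` and returns its own candidate, the tasks share no mutable state; the only synchronization point is the final `min` over the per-feature results.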