GSoC Proposal: Circular statistics
Thanks to Josef and other colleagues for the productive discussion! It clarified a lot for me!
This is a first iteration of my GSoC proposal.
The goal of the project is to implement circular statistics functionality in scipy.stats.
Motivation:
There exist several Python packages related to this theme, but most of them have some shortcomings. Also, it would be very convenient to have this functionality in SciPy. It will simplify users' work because they will not have to search for and install other packages.
What will be done:
Base class rv_circular that will provide infrastructure for distribution characteristics. List of planned characteristics:
pdf, cdf, moments, mean, dispersion, standard deviation, percentiles, median, entropy, kurtosis, skewness.
Derived classes for distributions. Methods for calculating values of characteristics can be redefined in these classes. List of planned distributions:
Uniform distribution,
Triangular distribution,
Cardioid distribution,
Wrapped distributions: Cauchy, Normal, Von Mises.
Functions for statistical tests:
Tests of uniformity and Goodness-of-Fit.
Tests related to Von Mises distribution: tests for mean and concentration parameters, multi-sample tests.
Some non-parametric tests.
I have developed a small demo version of a circular stats toolbox:
https://github.com/yakovlevvs/Circular-Statistics
It is very rough and will be rewritten; I created it just to demonstrate how I envision the arrangement.
Approximate timeline:
April: Reading books, understanding the related mathematics;
May – June 20: Implementing the rv_circular class, with documentation and tests;
June 20 – July 10: Implementing distribution classes;
July 10 – August 20: Implementing statistical tests and point estimation;
Related literature:
Kanti V. Mardia, Peter E. Jupp: Directional Statistics, 2000.
S. Rao Jammalamadaka, A. SenGupta: Topics in Circular Statistics.
On Sat, Mar 18, 2017 at 12:59 AM, Vladislav Iakovlev <iavlaserg@gmail.com> wrote:
Thanks to Josef and other colleagues for productive discussion! It clarified a lot for me!
This is a first iteration of my GSoC proposal.
The goal of the project is to implement circular statistics functionality in scipy.stats.
Motivation:
There exist several Python packages related to this theme, but most of them have some shortcomings.
Would be good to list packages and shortcomings.
Also, it would be very convenient to have this functionality in SciPy. It will simplify users' work because they will not have to search for and install other packages.
What will be done:
Base class rv_circular that will provide infrastructure for distribution characteristics. List of planned characteristics:
pdf, cdf, moments, mean, dispersion, standard deviation, percentiles, median, entropy, kurtosis, skewness.
Derived classes for distributions. Methods for calculating values of characteristics can be redefined in these classes. List of planned distributions:
Uniform distribution,
Triangular distribution,
Cardioid distribution,
Wrapped distributions: Cauchy, Normal, Von Mises.
Seems like a reasonable list. Triangular may be a bit less common. There's also https://en.wikipedia.org/wiki/Wrapped_exponential_distribution and https://en.wikipedia.org/wiki/Wrapped_asymmetric_Laplace_distribution, and the R package "circular" (useful for testing against) has even more.
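The wrapped families listed above share one construction: a density on the real line is summed over all shifts by multiples of 2*pi. A minimal sketch for the wrapped normal follows, assuming a truncated sum is accurate enough; the name wrapped_normal_pdf and the n_wraps parameter are hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import integrate, stats


def wrapped_normal_pdf(theta, mu=0.0, sigma=1.0, n_wraps=10):
    """Wrapped normal density on the circle (hypothetical helper).

    Sums the normal pdf over shifts theta + 2*pi*k for k in
    [-n_wraps, n_wraps]; the truncation is an assumption that is
    reasonable when sigma is not very large.
    """
    theta = np.asarray(theta, dtype=float)
    k = np.arange(-n_wraps, n_wraps + 1)
    # Add a trailing axis so every angle is paired with every shift.
    shifted = theta[..., np.newaxis] + 2 * np.pi * k
    return stats.norm.pdf(shifted, loc=mu, scale=sigma).sum(axis=-1)


# Sanity checks one might run against a reference implementation:
# the density should integrate to 1 over one full turn, and the length
# of the first trigonometric moment should equal exp(-sigma**2 / 2).
total, _ = integrate.quad(wrapped_normal_pdf, 0, 2 * np.pi, args=(1.0, 0.8))
first_moment, _ = integrate.quad(
    lambda t: np.cos(t - 1.0) * wrapped_normal_pdf(t, 1.0, 0.8), 0, 2 * np.pi
)
```

The same wrapping helper pattern would apply to the wrapped exponential or wrapped Laplace, although heavy-tailed densities such as the Cauchy converge slowly under truncation and are better served by their closed forms.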
Functions for statistical tests:
Tests of uniformity and Goodness-of-Fit.
Tests related to Von Mises distribution: tests for mean and concentration parameters, multi-sample tests.
Some non-parametric tests.
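As an illustration of what a test of uniformity could look like, here is a minimal sketch of the Rayleigh test. It uses the large-sample result that 2*n*Rbar**2 is approximately chi-square with 2 degrees of freedom under uniformity; the function name rayleigh_test and its return convention are assumptions, not an existing scipy.stats function.

```python
import numpy as np
from scipy import stats


def rayleigh_test(angles):
    """Rayleigh test of circular uniformity (hypothetical helper).

    angles : sample of angles in radians.
    Returns the statistic 2*n*Rbar**2 and an asymptotic p-value.
    """
    angles = np.asarray(angles, dtype=float)
    n = angles.size
    # Mean resultant length: length of the average unit vector.
    c = np.cos(angles).sum()
    s = np.sin(angles).sum()
    rbar = np.hypot(c, s) / n
    statistic = 2 * n * rbar ** 2
    # Large-sample approximation; small samples would need a correction.
    pvalue = stats.chi2.sf(statistic, df=2)
    return statistic, pvalue


# Usage: a concentrated von Mises sample should reject uniformity.
rng = np.random.default_rng(0)
sample = rng.vonmises(0.0, 4.0, size=200)
statistic, pvalue = rayleigh_test(sample)
```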
I have developed a small demo version of a circular stats toolbox:
https://github.com/yakovlevvs/Circular-Statistics
It is very rough and will be rewritten; I created it just to demonstrate how I envision the arrangement.
That seems quite sensible. It matches your rv_circular and derived classes description above. Did you think about whether to implement frozen versions, like rv_continuous and rv_discrete do? Probably makes sense to do this, if only for design symmetry.
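To make the frozen-distribution question concrete, here is a rough sketch of how a base class, one derived class and a frozen wrapper could fit together. Everything here is hypothetical (rv_circular, rv_circular_frozen, trig_moment, cardioid_gen); it is neither the demo repository's code nor an existing SciPy API, only an illustration of the design symmetry with rv_continuous.

```python
import numpy as np
from scipy import integrate


class rv_circular:
    """Hypothetical base class for distributions on [0, 2*pi)."""

    def _pdf(self, theta, *args):
        raise NotImplementedError  # supplied by derived classes

    def pdf(self, theta, *args):
        # Wrap angles onto [0, 2*pi) before evaluating the density.
        theta = np.mod(np.asarray(theta, dtype=float), 2 * np.pi)
        return self._pdf(theta, *args)

    def trig_moment(self, p, *args):
        # Generic p-th trigonometric moment E[exp(i*p*theta)] by quadrature;
        # a derived class may override this with a closed form.
        re, _ = integrate.quad(
            lambda t: np.cos(p * t) * self._pdf(t, *args), 0, 2 * np.pi)
        im, _ = integrate.quad(
            lambda t: np.sin(p * t) * self._pdf(t, *args), 0, 2 * np.pi)
        return complex(re, im)

    def mean(self, *args):
        # Mean direction: argument of the first trigonometric moment.
        return np.mod(np.angle(self.trig_moment(1, *args)), 2 * np.pi)

    def __call__(self, *args):
        # Calling the distribution fixes its parameters ("freezes" it).
        return rv_circular_frozen(self, *args)


class rv_circular_frozen:
    """Distribution with its parameters fixed, as in rv_continuous."""

    def __init__(self, dist, *args):
        self.dist = dist
        self.args = args

    def pdf(self, theta):
        return self.dist.pdf(theta, *self.args)

    def mean(self):
        return self.dist.mean(*self.args)


class cardioid_gen(rv_circular):
    # Cardioid density (1 + 2*rho*cos(theta - mu)) / (2*pi), |rho| <= 1/2.
    def _pdf(self, theta, mu, rho):
        return (1 + 2 * rho * np.cos(theta - mu)) / (2 * np.pi)


cardioid = cardioid_gen()
frozen = cardioid(1.0, 0.3)   # mu = 1.0, rho = 0.3
direction = frozen.mean()     # mean direction of the frozen distribution
```

With this layout a derived class only has to define _pdf, and the frozen object simply forwards calls with the stored parameters.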
Approximate timeline:
April: Reading books, understanding related mathematics;
May – June 20: Implementing the rv_circular class, with documentation and tests;
June 20 – July 10: Implementing distribution classes;
July 10 – August 20: Implementing statistical tests and point estimation;
This will need a bit more detail; normally you write something with a granularity of one or two weeks per task. It's important to get that right, both because it helps you think about how much you can promise to do in 12 weeks and because there's an evaluation point halfway through the program.
@Evgeni: were you interested in mentoring this topic? I could too potentially.
Cheers, Ralf
Related literature:
Kanti V. Mardia, Peter E. Jupp: Directional Statistics, 2000.
S. Rao Jammalamadaka, A. SenGupta: Topics in Circular Statistics.
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
@Evgeni: were you interested in mentoring this topic? I could too potentially.
I am, yes. Would be good to have a co-mentor though, so it's not an inside offline job.
Hello,
I have improved my proposal; it is now available here: https://github.com/yakovlevvs/scipy/wiki/GSoC-proposal:-Implementation-of-circular-statistics
If there are any shortcomings, please let me know.
---------- Forwarded message ----------
From: Vladislav Iakovlev <iavlaserg@gmail.com>
Date: 2017-03-17 14:59 GMT+03:00
Subject: GSoC Proposal: Circular statistics
To: scipy-dev@scipy.org
Hi Vladislav,
On Fri, Mar 31, 2017 at 9:59 AM, Vladislav Iakovlev <iavlaserg@gmail.com> wrote:
Hello,
I have improved my proposal, now it is placed here: https://github.com/yakovlevvs/scipy/wiki/GSoC-proposal:-Implementation-of-circular-statistics
The reasons you list for ignoring the three existing packages don't look convincing when I read the text. They're also all license-compatible. So I think you want to explain that your proposed implementation will have more functionality or have certain advantages (beyond being in SciPy), and how you can use those packages (for testing against, or taking some code from them, or ...?).
Where you say "This is a short draft of the future functionality", please be clear what that is and is not (e.g. a quick and dirty prototype of the API, not meant to be a basis for future work); you don't want a proposal reviewer to judge that code on its quality.
And a trivial but nevertheless important comment: please spell check your proposal before submission.
week 1: "initiation method"? Not clear what you mean: __init__, or an initial implementation?
week 4: it's fine to take time out to prepare for and do your exams, but it's then expected that you make up that time, for example by starting the coding period earlier.
Overall comments on the Planned functionality and Timeline sections: there are too many bullet points, it doesn't read well and there aren't many details. You may well understand all the details, but that won't be clear to a reviewer. You should insert a few details about key concepts. For example, for someone who doesn't know scipy.stats in depth, what rv_circular is supposed to be will be unclear. What is special about rv_* classes or what functionality do they provide?
Cheers, Ralf
participants (3): Evgeni Burovski, Ralf Gommers, Vladislav Iakovlev