Suggest addition to scipy.stats : Mann-Kendall test
Dear list, the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results. Best regards, Martin <pre> # -*- coding: utf-8 -*- """ Created on Wed Jul 29 09:16:06 2015 @author: Michael Schramm """ from __future__ import division import numpy as np from scipy.stats import norm, mstats def mk_test(x, alpha = 0.05): """ This function is derived from code originally posted by Sat Kumar Tomer (satkumartomer@gmail.com) See also: http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm The purpose of the Mann-Kendall (MK) test (Mann 1945, Kendall 1975, Gilbert 1987) is to statistically assess if there is a monotonic upward or downward trend of the variable of interest over time. A monotonic upward (downward) trend means that the variable consistently increases (decreases) through time, but the trend may or may not be linear. The MK test can be used in place of a parametric linear regression analysis, which can be used to test if the slope of the estimated linear regression line is different from zero. The regression analysis requires that the residuals from the fitted regression line be normally distributed; an assumption not required by the MK test, that is, the MK test is a non-parametric (distribution-free) test. Hirsch, Slack and Smith (1982, page 107) indicate that the MK test is best viewed as an exploratory analysis and is most appropriately used to identify stations where changes are significant or of large magnitude and to quantify these findings. Input: x: a vector of data alpha: significance level (0.05 default) Output: trend: tells the trend (increasing, decreasing or no trend) h: True (if trend is present) or False (if trend is absence) p: p value of the significance test z: normalized test statistics Examples -------- >>> x = np.random.rand(100) >>> trend,h,p,z = mk_test(x,0.05) """ n = len(x) # calculate S s = 0 for k in range(n-1): for j in range(k+1,n): s += np.sign(x[j] - x[k]) # calculate the unique data unique_x = np.unique(x) g = len(unique_x) # calculate the var(s) if n == g: # there is no tie var_s = (n*(n-1)*(2*n+5))/18 else: # there are some ties in data tp = np.zeros(unique_x.shape) for i in range(len(unique_x)): tp[i] = sum(unique_x[i] == x) var_s = (n*(n-1)*(2*n+5) + np.sum(tp*(tp-1)*(2*tp+5)))/18 if s>0: z = (s - 1)/np.sqrt(var_s) elif s == 0: z = 0 elif s<0: z = (s + 1)/np.sqrt(var_s) # calculate the p_value p = 2*(1-norm.cdf(abs(z))) # two tail test h = abs(z) > norm.ppf(1-alpha/2) if (z<0) and h: trend = 'decreasing' elif (z>0) and h: trend = 'increasing' else: trend = 'no trend' return trend, h, p, z </pre> ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
Hi, Mon, 18 Jul 2016 10:11:31 +0000, Schultz, Martin kirjoitti:
the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results.
Thanks for the information and code. Note that the following technical points would need to be addressed for including such a routine in Scipy: - No license information, so the code or works derived from it cannot be used. - Tests. While checking the results manually is valuable, it would also be necessary to write automated tests --- to make it possible for other people to verify the correctness, and to ensure the code stays working in the future. If you do not have time for this yourself, getting it done would need someone else to volunteer. -- Pauli Virtanen
Hi Pauli, thank you for the feedback. I would be willing to inquire with the original author(s) about licensing, and I can provide a few small test datasets with results for inclusion into an automated test, but I don't think I would be able to write the automated test myself. So, if someone volunteers on that end, I will communicate the data with her or him. May I get back to you in case of questions concerning the license terms etc. if needed? Martin Schultz -----Original Message----- From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Pauli Virtanen Sent: Tuesday, July 19, 2016 1:30 AM To: scipy-dev@scipy.org Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test Hi, Mon, 18 Jul 2016 10:11:31 +0000, Schultz, Martin kirjoitti:
the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results.
Thanks for the information and code. Note that the following technical points would need to be addressed for including such a routine in Scipy: - No license information, so the code or works derived from it cannot be used. - Tests. While checking the results manually is valuable, it would also be necessary to write automated tests --- to make it possible for other people to verify the correctness, and to ensure the code stays working in the future. If you do not have time for this yourself, getting it done would need someone else to volunteer. -- Pauli Virtanen _______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
On Tue, Jul 19, 2016 at 2:50 AM, Schultz, Martin <m.schultz@fz-juelich.de> wrote:
Hi Pauli,
thank you for the feedback. I would be willing to inquire with the original author(s) about licensing, and I can provide a few small test datasets with results for inclusion into an automated test, but I don't think I would be able to write the automated test myself. So, if someone volunteers on that end, I will communicate the data with her or him. May I get back to you in case of questions concerning the license terms etc. if needed?
Martin Schultz
-----Original Message----- From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Pauli Virtanen Sent: Tuesday, July 19, 2016 1:30 AM To: scipy-dev@scipy.org Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test
Hi,
Mon, 18 Jul 2016 10:11:31 +0000, Schultz, Martin kirjoitti:
the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results.
Thanks for the information and code.
Note that the following technical points would need to be addressed for including such a routine in Scipy:
- No license information, so the code or works derived from it cannot be used.
- Tests. While checking the results manually is valuable, it would also be necessary to write automated tests --- to make it possible for other people to verify the correctness, and to ensure the code stays working in the future.
If you do not have time for this yourself, getting it done would need someone else to volunteer.
The function looks interesting, I've never heard of this hypothesis test. The implementation in the function is very straightforward and I expect that it will be pretty slow for longer arrays, especially when there are just a few ties. Maybe it is possible to streamline the function along some of the existing functions. I don't know if it can be based on ranked data, which IIRC are already in cython. (shapiro which is another all pair comparison algorithm is in Fortran, IIRC) (aside: the vsp package in the docstring looks interesting, and might have a few more things that are missing in scipy.stats/statsmodels.) Josef
-- Pauli Virtanen
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
Hi Pauli,
thank you for the feedback. I would be willing to inquire with the original author(s) about licensing, and I can provide a few small test datasets with results for inclusion into an automated test, but I don't
Just so you know, there's a Kendall's tau routine in scipy.mstats. On 19 July 2016 at 10:04:48, josef.pktd@gmail.com (josef.pktd@gmail.com) wrote: On Tue, Jul 19, 2016 at 2:50 AM, Schultz, Martin <m.schultz@fz-juelich.de> wrote: think I would be able to write the automated test myself. So, if someone volunteers on that end, I will communicate the data with her or him. May I get back to you in case of questions concerning the license terms etc. if needed?
Martin Schultz
-----Original Message----- From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Pauli
Virtanen
Sent: Tuesday, July 19, 2016 1:30 AM To: scipy-dev@scipy.org Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test
Hi,
Mon, 18 Jul 2016 10:11:31 +0000, Schultz, Martin kirjoitti:
the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results.
Thanks for the information and code.
Note that the following technical points would need to be addressed for including such a routine in Scipy:
- No license information, so the code or works derived from it cannot be used.
- Tests. While checking the results manually is valuable, it would also be necessary to write automated tests --- to make it possible for other people to verify the correctness, and to ensure the code stays working in the future.
If you do not have time for this yourself, getting it done would need someone else to volunteer.
The function looks interesting, I've never heard of this hypothesis test. The implementation in the function is very straightforward and I expect that it will be pretty slow for longer arrays, especially when there are just a few ties. Maybe it is possible to streamline the function along some of the existing functions. I don't know if it can be based on ranked data, which IIRC are already in cython. (shapiro which is another all pair comparison algorithm is in Fortran, IIRC) (aside: the vsp package in the docstring looks interesting, and might have a few more things that are missing in scipy.stats/statsmodels.) Josef
-- Pauli Virtanen
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
On Tue, Jul 19, 2016 at 4:21 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
Just so you know, there's a Kendall's tau routine in scipy.mstats.
the kendall_tau in scipy.stats has been rewritten with a mix of numpy and cython, and, based on how far I followed the PR, has very good performance now. I don't know if anything can be reused because the hypothesis is pretty different, sign instead of correlation and univariate instead of bivariate. Still only mstats has the seasonal version of kendall_tau. vsp package also has a seasonal version of the mann-kendall trend test Josef
On 19 July 2016 at 10:04:48, josef.pktd@gmail.com (josef.pktd@gmail.com) wrote:
On Tue, Jul 19, 2016 at 2:50 AM, Schultz, Martin <m.schultz@fz-juelich.de> wrote:
Hi Pauli,
thank you for the feedback. I would be willing to inquire with the original author(s) about licensing, and I can provide a few small test datasets with results for inclusion into an automated test, but I don't think I would be able to write the automated test myself. So, if someone volunteers on that end, I will communicate the data with her or him. May I get back to you in case of questions concerning the license terms etc. if needed?
Martin Schultz
-----Original Message----- From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Pauli Virtanen Sent: Tuesday, July 19, 2016 1:30 AM To: scipy-dev@scipy.org Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test
Hi,
Mon, 18 Jul 2016 10:11:31 +0000, Schultz, Martin kirjoitti:
the routine below evaluates the Mann-Kendall test for non-parametrically checking the significance of any trend estimate. It would provide a nice complement to the existing theilslopes routine and others in scipy.stats. I found this code when searching for "Mann Kendall python" and saw that the original link was no longer existing. In my opinion it would be good to preserve this piece of work and make it available to others. I tested this routine on hundreds of datasets and it seemed to work well. This implementation was also compared to a few calculations with the Matlab implementation of this test and provided identical results.
Thanks for the information and code.
Note that the following technical points would need to be addressed for including such a routine in Scipy:
- No license information, so the code or works derived from it cannot be used.
- Tests. While checking the results manually is valuable, it would also be necessary to write automated tests --- to make it possible for other people to verify the correctness, and to ensure the code stays working in the future.
If you do not have time for this yourself, getting it done would need someone else to volunteer.
The function looks interesting, I've never heard of this hypothesis test.
The implementation in the function is very straightforward and I expect that it will be pretty slow for longer arrays, especially when there are just a few ties. Maybe it is possible to streamline the function along some of the existing functions. I don't know if it can be based on ranked data, which IIRC are already in cython. (shapiro which is another all pair comparison algorithm is in Fortran, IIRC)
(aside: the vsp package in the docstring looks interesting, and might have a few more things that are missing in scipy.stats/statsmodels.)
Josef
-- Pauli Virtanen
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
On Tue, Jul 19, 2016 at 1:04 AM, <josef.pktd@gmail.com> wrote:
The function looks interesting, I've never heard of this hypothesis test.
EPA guidance specifies that the Mann-Kendall test be used to determine if there's a trend (seasonal, or long-term) in pollutant concentrations. Even though regulators sometimes *require* that you use their ProUCL[0] software, this would be still be a very welcome addition to the stats module for environmental engineers like me. Martin: if you get the pull request started, I'll help your write the tests. Just @-mention me in the PR's discussion (@phobson). -Paul [0] https://www.epa.gov/land-research/proucl-software
Dear all, discussion has apparently ceased on this topic. As I don’t see myself in a position to open the pull request for this and lead the implementation (I am too unexperienced with GitHub and don’t have the time to learn all the rules necessary to create a good scipy code), I am asking if someone would volunteer to take the lead. I will help where I can, but you don’t want me to be responsible for this. Thanks, Martin From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Paul Hobson Sent: Wednesday, July 20, 2016 3:24 AM To: SciPy Developers List Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test On Tue, Jul 19, 2016 at 1:04 AM, <josef.pktd@gmail.com<mailto:josef.pktd@gmail.com>> wrote: The function looks interesting, I've never heard of this hypothesis test. EPA guidance specifies that the Mann-Kendall test be used to determine if there's a trend (seasonal, or long-term) in pollutant concentrations. Even though regulators sometimes *require* that you use their ProUCL[0] software, this would be still be a very welcome addition to the stats module for environmental engineers like me. Martin: if you get the pull request started, I'll help your write the tests. Just @-mention me in the PR's discussion (@phobson). -Paul [0] https://www.epa.gov/land-research/proucl-software ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
On Wed, Jul 27, 2016 at 8:23 AM, Schultz, Martin <m.schultz@fz-juelich.de> wrote:
Dear all,
discussion has apparently ceased on this topic. As I don’t see myself in a position to open the pull request for this and lead the implementation (I am too unexperienced with GitHub and don’t have the time to learn all the rules necessary to create a good scipy code), I am asking if someone would volunteer to take the lead. I will help where I can, but you don’t want me to be responsible for this.
You could also write test cases with and without ties and whatever options are used for this function. This would simplify the work for whoever is implementing this. Those could just be added to a scipy issue. I was thinking whether it's possible to just use the ranks `rankdata` to simplify the calculations and remove some of the loops, but didn't figure out whether that works. (I'm busy with "models" and didn't plan to go back to "stats" until fall.) Josef
Thanks,
Martin
From: SciPy-Dev [mailto:scipy-dev-bounces@scipy.org] On Behalf Of Paul Hobson Sent: Wednesday, July 20, 2016 3:24 AM To: SciPy Developers List Subject: Re: [SciPy-Dev] Suggest addition to scipy.stats : Mann-Kendall test
On Tue, Jul 19, 2016 at 1:04 AM, <josef.pktd@gmail.com> wrote:
The function looks interesting, I've never heard of this hypothesis test.
EPA guidance specifies that the Mann-Kendall test be used to determine if there's a trend (seasonal, or long-term) in pollutant concentrations.
Even though regulators sometimes *require* that you use their ProUCL[0] software, this would be still be a very welcome addition to the stats module for environmental engineers like me.
Martin: if you get the pull request started, I'll help your write the tests. Just @-mention me in the PR's discussion (@phobson).
-Paul
[0] https://www.epa.gov/land-research/proucl-software
------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
participants (5)
-
josef.pktd@gmail.com
-
Paul Hobson
-
Pauli Virtanen
-
Pierre GM
-
Schultz, Martin