Adding tau-a to the scipy.stats.kendalltau variants and changing the Somers' D calculation to use tau-a instead of crosstab for a significant runtime improvement

Hello,

this is my first time trying to contribute, so please don't be too harsh.

When I recently used the scipy.stats.somersd function on larger data, I ran into considerable runtime problems. I found an equivalent way to calculate Somers' D using D(Y|X) = tau_a(X, Y)/tau_a(X, X), for which I added support for variant "a" to the scipy.stats.kendalltau function. The improvement was significant for large datasets, where this approach ran approximately 30 times faster. I believe the reason is the crosstab calculation in the current implementation, whereas kendalltau counts discordant pairs with a Cython implementation, which makes it much faster.

It would be great to have someone I could ask if I have questions while submitting my contribution, and who could maybe also review my code.

Thanks a lot and best regards from Vienna,
Paul
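To make the identity D(Y|X) = tau_a(X, Y)/tau_a(X, X) concrete, here is a minimal pure-Python sketch. It is only an illustration and not the code proposed above: the helper names `tau_a` and `somers_d` are hypothetical, and the pair counting is a naive O(n^2) loop rather than the fast Cython routine mentioned in the message. Tau-a is taken here as (concordant - discordant) / (n choose 2), with no correction for ties.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def tau_a(x, y):
    """Kendall's tau-a (hypothetical helper): (C - D) / (n choose 2).

    Pairs tied in x or in y contribute 0 to the numerator, and the
    denominator is not adjusted for ties.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    s = 0
    for i, j in combinations(range(n), 2):
        s += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)


def somers_d(x, y):
    """Somers' D(Y|X) computed as tau_a(X, Y) / tau_a(X, X)."""
    denom = tau_a(x, x)  # fraction of pairs that are not tied in x
    if denom == 0:
        raise ValueError("x is constant, so D(Y|X) is undefined")
    return tau_a(x, y) / denom


x = [1, 1, 2, 2, 3]  # two tied pairs in x
y = [1, 2, 3, 4, 5]  # no ties in y

print(somers_d(x, y))  # D(Y|X): pairs tied in x are excluded
print(somers_d(y, x))  # D(X|Y): the measure is asymmetric
```

Since tau_a(X, X) is just the fraction of pairs not tied in X, the ratio reduces to (C - D) divided by the number of pairs untied in X. On small inputs the result can be checked against scipy.stats.somersd(x, y).statistic.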

Hi Paul,

Feel free to email me, or even better, message me or the #newcomers channel on our community Slack (https://join.slack.com/t/scipy-community/shared_invite/zt-1a76bomjr-fuS1ZTnm...), if you have any questions about submitting a PR. That performance improvement sounds promising!

Cheers,
Lucas
In reply to the message from P. v.H. <p.von.hirschhausen@gmail.com> of 19 Jan 2024, 12:41.