New feature: fitting distributions to censored data.

Hey folks,

A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in https://github.com/scipy/scipy/pull/13699. This is a feature in the statistics roadmap, and is part of the work for the CZI EOSS cycle 1 grant.

Censored data is data where the true value of the measurement is unknown, but there is a known bound. The common terminology is:

* left-censored: The true value is greater than some known bound.
* right-censored: The true value is less than some known bound.
* interval-censored: The true value is between known bounds.

(By allowing the bounds to be +inf and -inf, we can think of all these cases as interval-censored.)

In the PR, a new class, `CensoredData`, is defined to represent censored data. The `fit` method of the univariate continuous distributions is updated to accept an instance of `CensoredData`. The `CensoredData` class constructor accepts two arrays, `lower` and `upper`, to hold the known bounds of the data. (If `lower[i] == upper[i]`, it means that data value is not censored.)

Because it is quite common to encounter data sets in which the only censored values are either all right-censored or all left-censored, two convenience methods, `CensoredData.right_censored(x, censored)` and `CensoredData.left_censored(x, censored)`, are provided to create a `CensoredData` instance from an array of values and a corresponding boolean array that indicates whether each value is censored.

Here's a quick example, with data from https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm. The data table shows 10 regular values and 10 right-censored values. The results reported there for fitting the two-parameter Weibull distribution (location fixed at 0) to that data are

shape = 1.7208
scale = 606.5280

Here's the calculation using the proposed API in SciPy:

In [55]: from scipy.stats import weibull_min, CensoredData

Create the `CensoredData` instance:

In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)

In [57]: data = CensoredData.right_censored(x, x == 500)

In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)

Fit `weibull_min` (with the location fixed at 0) to the censored data:

In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)

In [60]: shape
Out[60]: 1.720797180719942

In [61]: scale
Out[61]: 606.527565269458

Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.

Warren
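[Editor's note: to make the `lower`/`upper` constructor described above concrete, here is a minimal sketch of building a `CensoredData` instance directly from the two bound arrays. It assumes the constructor behaves as described in the PR; the API is still under discussion, so details may change.]

import numpy as np
from scipy.stats import CensoredData

# Three observations:
#   - the first is not censored (lower == upper),
#   - the second is interval-censored: the true value lies in [2.0, 5.0],
#   - the third is one-sided: a finite bound on one side, an infinite
#     bound on the other.
lower = np.array([1.5, 2.0, 4.0])
upper = np.array([1.5, 5.0, np.inf])

data = CensoredData(lower=lower, upper=upper)

Uncensored entries are simply those where the two bounds coincide; using -inf or +inf for a bound reduces an interval to one-sided censoring.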

On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Thanks Warren. Overall this looks quite good to me.
This is a feature in the statistics roadmap, and is part of the work for the CZI EOSS cycle 1 grant.
Censored data is data where the true value of the measurement is unknown, but there is a known bound. The common terminology is:
* left-censored: The true value is greater than some known bound.
* right-censored: The true value is less than some known bound.
* interval-censored: The true value is between known bounds.
(By allowing the bounds to be +inf and -inf, we can think of all these cases as interval-censored.)
In the PR, a new class, `CensoredData`, is defined to represent censored data. The `fit` method of the univariate continuous distributions is updated to accept an instance of `CensoredData`. The `CensoredData` class constructor accepts two arrays, `lower` and `upper`, to hold the known bounds of the data. (If `lower[i] == upper[i]`, it means that data value is not censored.)
Because it is quite common to encounter data sets in which the only censored values are either all right-censored or all left-censored, two convenience methods, `CensoredData.right_censored(x, censored)` and `CensoredData.left_censored(x, censored)`, are provided to create a `CensoredData` instance from an array of values and a corresponding boolean array that indicates whether each value is censored.
Here's a quick example, with data from https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm. The data table shows 10 regular values and 10 right-censored values. The results reported there for fitting the two-parameter Weibull distribution (location fixed at 0) to that data are
shape = 1.7208 scale = 606.5280
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:

data = CensoredData(x).rightcensor(500)

There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?

Cheers,
Ralf
In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)
Fit `weibull_min` (with the location fixed at 0) to the censored data:
In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
In [60]: shape
Out[60]: 1.720797180719942
In [61]: scale
Out[61]: 606.527565269458
Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.
Warren

On Fri, Mar 26, 2021 at 5:11 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Thanks Warren. Overall this looks quite good to me.
This is a feature in the statistics roadmap, and is part of the work for the CZI EOSS cycle 1 grant.
Censored data is data where the true value of the measurement is unknown, but there is a known bound. The common terminology is:
* left-censored: The true value is greater than some known bound.
* right-censored: The true value is less than some known bound.
It's the other way around, isn't it?
* interval-censored: The true value is between known bounds.
(By allowing the bounds to be +inf and -inf, we can think of all these cases as interval-censored.)
In the PR, a new class, `CensoredData`, is defined to represent censored data. The `fit` method of the univariate continuous distributions is updated to accept an instance of `CensoredData`. The `CensoredData` class constructor accepts two arrays, `lower` and `upper`, to hold the known bounds of the data. (If `lower[i] == upper[i]`, it means that data value is not censored.)
Because it is quite common to encounter data sets in which the only censored values are either all right-censored or all left-censored, two convenience methods, `CensoredData.right_censored(x, censored)` and `CensoredData.left_censored(x, censored)`, are provided to create a `CensoredData` instance from an array of values and a corresponding boolean array that indicates whether each value is censored.
Here's a quick example, with data from
https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm. The data table shows 10 regular values and 10 right-censored values. The results reported there for fitting the two-parameter Weibull distribution (location fixed at 0) to that data are
shape = 1.7208 scale = 606.5280
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
The value in `x` itself is the bound. Each data point can be censored with a different bound. The censoring bound is not a shared property of the whole dataset; it just happened to be the case for this example (which may indicate that a more general motivating example should be used, at least while the API is under design).

For example, suppose you are studying the duration of some condition in a longitudinal study. Individuals entered the study at different times, and now you have to close the books and write the paper, but some stubborn individuals still have the condition. The durations for those individuals would be right-censored, but with different durations because of their different entry points.

--
Robert Kern
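[Editor's note: as an illustration of Robert's point, not taken from the PR, here is a hypothetical right-censored data set in which each censored observation has its own bound, built with the `right_censored` convenience method described in the original message. The durations are made up.]

import numpy as np
from scipy.stats import CensoredData

# Hypothetical study durations (days). Individuals who still had the
# condition when the study ended are right-censored: we only know the
# true duration exceeds the observed one, and that observed value
# differs from person to person.
durations = np.array([310., 452., 125., 587., 290., 410.])
still_ongoing = np.array([False, True, False, True, False, True])

data = CensoredData.right_censored(durations, still_ongoing)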

On 3/26/21, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Mar 26, 2021 at 5:11 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Thanks Warren. Overall this looks quite good to me.
This is a feature in the statistics roadmap, and is part of the work for the CZI EOSS cycle 1 grant.
Censored data is data where the true value of the measurement is unknown, but there is a known bound. The common terminology is:
* left-censored: The true value is greater than some known bound.
* right-censored: The true value is less than some known bound.
It's the other way around, isn't it?
Argh, yes, you are correct. That's what I get for last-second editing before hitting the "send" button. It should be:

* left-censored: The true value is *less than* some known bound.
* right-censored: The true value is *greater than* some known bound.

Warren
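[Editor's note: a tiny sketch of the corrected terminology in terms of the convenience constructors, using hypothetical detection-limit data and the `left_censored` method described earlier in the thread.]

import numpy as np
from scipy.stats import CensoredData

# Hypothetical concentrations; two samples fell below the detection
# limit of 0.5, so they are left-censored: the true value is *less
# than* 0.5, and 0.5 is the known bound recorded for them.
conc = np.array([0.5, 1.2, 0.5, 2.7, 0.8])
below_limit = np.array([True, False, True, False, False])

data = CensoredData.left_censored(conc, below_limit)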
* interval-censored: The true value is between known bounds.
(By allowing the bounds to be +inf and -inf, we can think of all these cases as interval-censored.)
In the PR, a new class, `CensoredData`, is defined to represent censored data. The `fit` method of the univariate continuous distributions is updated to accept an instance of `CensoredData`. The `CensoredData` class constructor accepts two arrays, `lower` and `upper`, to hold the known bounds of the data. (If `lower[i] == upper[i]`, it means that data value is not censored.)
Because it is quite common to encounter data sets in which the only censored values are either all right-censored or all left-censored, two convenience methods, `CensoredData.right_censored(x, censored)` and `CensoredData.left_censored(x, censored)`, are provided to create a `CensoredData` instance from an array of values and a corresponding boolean array that indicates whether each value is censored.
Here's a quick example, with data from
https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm. The data table shows 10 regular values and 10 right-censored values. The results reported there for fitting the two-parameter Weibull distribution (location fixed at 0) to that data are
shape = 1.7208 scale = 606.5280
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
The value in `x` itself is the bound. Each data point can be censored with a different bound. The censoring bound is not a shared property of the whole dataset; it just happened to be the case for this example (which may indicate that a more general motivating example should be used, at least while the API is under design).

For example, suppose you are studying the duration of some condition in a longitudinal study. Individuals entered the study at different times, and now you have to close the books and write the paper, but some stubborn individuals still have the condition. The durations for those individuals would be right-censored, but with different durations because of their different entry points.
-- Robert Kern

On 3/26/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Thanks Warren. Overall this looks quite good to me.
Thanks Ralf! (More responses below...)
This is a feature in the statistics roadmap, and is part of the work for the CZI EOSS cycle 1 grant.
Censored data is data where the true value of the measurement is unknown, but there is a known bound. The common terminology is:
* left-censored: The true value is greater than some known bound.
* right-censored: The true value is less than some known bound.
* interval-censored: The true value is between known bounds.
(By allowing the bounds to be +inf and -inf, we can think of all these cases as interval-censored.)
In the PR, a new class, `CensoredData`, is defined to represent censored data. The `fit` method of the univariate continuous distributions is updated to accept an instance of `CensoredData`. The `CensoredData` class constructor accepts two arrays, `lower` and `upper`, to hold the known bounds of the data. (If `lower[i] == upper[i]`, it means that data value is not censored.)
Because it is quite common to encounter data sets in which the only censored values are either all right-censored or all left-censored, two convenience methods, `CensoredData.right_censored(x, censored)` and `CensoredData.left_censored(x, censored)`, are provided to create a `CensoredData` instance from an array of values and a corresponding boolean array that indicates whether each value is censored.
Here's a quick example, with data from https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm. The data table shows 10 regular values and 10 right-censored values. The results reported there for fitting the two-parameter Weibull distribution (location fixed at 0) to that data are
shape = 1.7208 scale = 606.5280
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
The API is a work in progress. I didn't do something like that because, in general, the lower bound for right-censored data isn't necessarily the same for each censored value. (See, for example, the data shown in the second slide of http://www.ams.sunysb.edu/~zhu/ams588/Lecture_3_likelihood.pdf, and the example at https://support.sas.com/documentation/cdl/en/qcug/63922/HTML/default/viewer..... Both of those data sets are used for unit tests in the PR.) However, the case of a single bound for left- or right-censored data is common, so it might be nice to have a convenient way to write it.

Another possible enhancement to the CensoredData API is a `count` argument that gives the number of times the value is to be repeated, but I figured I would propose that in a follow-up PR.
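[Editor's note: for comparison, here is how the repeated censored value in the NIST example has to be written out under the current proposal; a hypothetical `count` argument of the kind mentioned above would let the repetition be expressed directly instead. That argument does not exist in the PR.]

import numpy as np
from scipy.stats import CensoredData

# The ten uncensored failure times from the NIST example, plus the
# right-censored value 500 repeated ten times, expanded by hand.
observed = np.array([54., 187., 216., 240., 244., 335., 361., 373., 375., 386.])
x = np.concatenate([observed, np.repeat(500.0, 10)])
censored = np.concatenate([np.zeros(observed.size, dtype=bool),
                           np.ones(10, dtype=bool)])

data = CensoredData.right_censored(x, censored)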
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
(The constructor args don't have to be boolean, so I assume you meant "have to be 1D arrays".) I suppose accepting a scalar for one of them and using broadcasting would work. I did a lot of searching for examples to use as unit tests, and I don't recall any where *all* the values were censored, so I don't think such behavior would actually be useful. And I would worry that someone might misinterpret what using a scalar means.

Warren
Cheers, Ralf
In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)
Fit `weibull_min` (with the location fixed at 0) to the censored data:
In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
In [60]: shape
Out[60]: 1.720797180719942
In [61]: scale
Out[61]: 606.527565269458
Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.
Warren

On Fri, Mar 26, 2021 at 5:56 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
On 3/26/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
The API is a work in progress. I didn't do something like that because, in general, the lower bound for right-censored data isn't necessarily the same for each censored value.
Thanks, after the explanation of Robert and the examples below it makes sense to me. I added some more comments on the PR.

(See, for example, the data shown in the second slide of http://www.ams.sunysb.edu/~zhu/ams588/Lecture_3_likelihood.pdf, and the example at https://support.sas.com/documentation/cdl/en/qcug/63922/HTML/default/viewer.... . Both of those data sets are used for unit tests in the PR.) However, the case of a single bound for left- or right-censored data is common, so it might be nice to have a convenient way to write it.
Another possible enhancement to the CensoredData API is a `count` argument that gives the number of times the value is to be repeated, but I figured I would propose that in a follow-up PR.
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
(The constructor args don't have to be boolean, so I assume you meant "have to be 1D arrays".) I suppose accepting a scalar for one of them and using broadcasting would work. I did a lot of searching for examples to use as unit tests, and I don't recall any where *all* the values were censored, so I don't think such behavior would actually be useful. And I would worry that someone might misinterpret what using a scalar means.
Yes, that makes sense. The asymmetry between the constructor and the `*_censored` classmethods is a concern, though; it looks like you can't get from one to the other. The scalar in your example threw me off; I agree a scalar for `lower` or `upper` in the constructor is potentially confusing.

Cheers,
Ralf
Warren
Cheers, Ralf
In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)
Fit `weibull_min` (with the location fixed at 0) to the censored data:
In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
In [60]: shape
Out[60]: 1.720797180719942
In [61]: scale
Out[61]: 606.527565269458
Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.
Warren

On 3/27/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Mar 26, 2021 at 5:56 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
On 3/26/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
The API is a work in progress. I didn't do something like that because, in general, the lower bound for right-censored data isn't necessarily the same for each censored value.
Thanks, after the explanation of Robert and the examples below it makes sense to me. I added some more comments on the PR.
(See, for example, the data shown in the second slide of http://www.ams.sunysb.edu/~zhu/ams588/Lecture_3_likelihood.pdf, and the example at https://support.sas.com/documentation/cdl/en/qcug/63922/HTML/default/viewer.... . Both of those data sets are used for unit tests in the PR.) However, the case of a single bound for left- or right-censored data is common, so it might be nice to have a convenient way to write it.
Another possible enhancement to the CensoredData API is a `count` argument that gives the number of times the value is to be repeated, but I figured I would propose that in a follow-up PR.
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
(The constructor args don't have to be boolean, so I assume you meant "have to be 1D arrays".) I suppose accepting a scalar for one of them and using broadcasting would work. I did a lot of searching for examples to use as unit tests, and I don't recall any where *all* the values were censored, so I don't think such behavior would actually be useful. And I would worry that someone might misinterpret what using a scalar means.
Yes, that makes sense. The asymmetry between the constructor and the `*_censored` classmethods is a concern, though; it looks like you can't get from one to the other. The scalar in your example threw me off; I agree a scalar for `lower` or `upper` in the constructor is potentially confusing.
Cheers, Ralf
After several comments and questions from Ralf (and having had earlier questions from Matt about the API), I think it makes sense to have an issue just for the API design for censored data. Here it is:

https://github.com/scipy/scipy/issues/13757

Comments, critiques, etc. are welcome, especially from anyone who has used censored data in anger [*].

Warren

[*] http://onlineslangdictionary.com/meaning-definition-of/use-in-anger
Warren
Cheers, Ralf
In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)
Fit `weibull_min` (with the location fixed at 0) to the censored data:
In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
In [60]: shape
Out[60]: 1.720797180719942
In [61]: scale
Out[61]: 606.527565269458
Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.
Warren

Hi team,

I would like to invite feedback on the design of the censored data API again, as development is picking back up. Please join the discussion in https://github.com/scipy/scipy/issues/13757.

Thanks,
Matt

On Sat, Mar 27, 2021 at 1:55 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
On 3/27/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Mar 26, 2021 at 5:56 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
On 3/26/21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser <warren.weckesser@gmail.com> wrote:
Hey folks,
A new enhancement for fitting probability distributions to censored data by maximum likelihood estimation is in progress in
Here's the calculation using the proposed API in SciPy:
In [55]: from scipy.stats import weibull_min, CensoredData
Create the `CensoredData` instance:
In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375, 386] + [500]*10)
In [57]: data = CensoredData.right_censored(x, x == 500)
This `x == 500` looks a little odd API-wise. Why not `right_censored(x, 500)`? Or, more importantly, why not something like:
data = CensoredData(x).rightcensor(500)
The API is a work in progress. I didn't do something like that because, in general, the lower bound for right-censored data isn't necessarily the same for each censored value.
Thanks, after the explanation of Robert and the examples below it makes sense to me. I added some more comments on the PR.
(See, for example, the data shown in the second slide of http://www.ams.sunysb.edu/~zhu/ams588/Lecture_3_likelihood.pdf, and the example at https://support.sas.com/documentation/cdl/en/qcug/63922/HTML/default/viewer.... . Both of those data sets are used for unit tests in the PR.) However, the case of a single bound for left- or right-censored data is common, so it might be nice to have a convenient way to write it.
Another possible enhancement to the CensoredData API is a `count` argument that gives the number of times the value is to be repeated, but I figured I would propose that in a follow-up PR.
There are of course multiple ways of doing this, but the use of classmethods and only taking arrays seems unusual. Also in the constructor, why do `lower` and `upper` have to be boolean arrays rather than scalars?
(The constructor args don't have to be boolean, so I assume you meant "have to be 1D arrays".) I suppose accepting a scalar for one of them and using broadcasting would work. I did a lot of searching for examples to use as unit tests, and I don't recall any where *all* the values were censored, so I don't think such behavior would actually be useful. And I would worry that someone might misinterpret what using a scalar means.
Yes, that makes sense. The asymmetry between the constructor and the `*_censored` classmethods is a concern, though; it looks like you can't get from one to the other. The scalar in your example threw me off; I agree a scalar for `lower` or `upper` in the constructor is potentially confusing.
Cheers, Ralf
After several comments and questions from Ralf (and having had earlier questions from Matt about the API), I think it makes sense to have an issue just for the API design for censored data. Here it is:
https://github.com/scipy/scipy/issues/13757
Comments, critiques, etc. are welcome, especially from anyone who has used censored data in anger [*].
Warren
[*] http://onlineslangdictionary.com/meaning-definition-of/use-in-anger
Warren
Cheers, Ralf
In [58]: print(data)
CensoredData(20 values: 10 not censored, 10 right-censored)
Fit `weibull_min` (with the location fixed at 0) to the censored data:
In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
In [60]: shape
Out[60]: 1.720797180719942
In [61]: scale
Out[61]: 606.527565269458
Matt Haberland has already suggested quite a few improvements to the PR. Additional comments would be appreciated.
Warren
--
Matt Haberland
Assistant Professor
BioResource and Agricultural Engineering
08A-3K, Cal Poly