fitting discrete probability distributions to data
hi,
i have some data:
A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array (dimensions ~20k x 50).
B) a 1D array that is just a particular row of that 2d array
i need to fit a sum of 2 negative binomial distributions to A), and to fit a single negative binomial distrib. to B).
i have spent a while now reading the documentation for scipy.stats and the statsmodels package and various stack overflow posts, etc., but i do not yet understand how to go about fitting a discrete probability distribution to a vector of data.
specific subquestions:
- do i need to load data in as a pandas df? an ndarray? does it not matter?
- i understand endog and exog in the context of the examples given in the docs (where you have one column that you want to use to predict some other column) but not what they should be in the case where i basically am trying to fit a curve to the normalized histogram of my data
- if someone can explain how to fit with statsmodels' "Negative Binomial" ( http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.disc...) that would be a good start. but i do also need to know how to fit to a sum of two of these, or possibly a sum of two other discrete distributions
- is the patsy formula syntax relevant here? i have never used R and could not find an example of the "R-like" syntax that is similar enough to my use case to parse how it works
- honestly i don't know what i'm doing, please help!
if these questions reveal grave ignorance, or are not directly relevant enough to scipy for this mailing list, i apologize and thanks for bearing with me. i barely know how to flip a coin, this stuff is new to me.
thanks a lot
c
On Wed, Mar 11, 2015 at 7:14 PM, c <twocolorflipflop@gmail.com> wrote:
hi,
i have some data:
A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array (dimensions ~20k x 50).
B) a 1D array that is just a particular row of that 2d array
i need to fit a sum of 2 negative binomial distributions to A), and to fit a single negative binomial distrib. to B).
i have spent a while now reading the documentation for scipy.stats and the statsmodels package and various stack overflow posts, etc., but i do not yet understand how to go about fitting a discrete probability distribution to a vector of data.
Do you have the data in the form of histograms (counts) or the original data ?
statsmodels can only estimate based on the original data which is assumed to consist of observations drawn from a Negative Binomial distribution. Fitting histogram and fitting mixtures of distributions is not supported "out of the box", and would require some custom models.
If you just want to fit a distribution to a histogram or discrete counts, then using curve_fit or leastsq is one possibility.
Josef
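A rough sketch of the curve_fit route mentioned above, assuming the raw observations are non-negative integers and that the model is written with scipy.stats.nbinom in its (n, p) parameterization. The helper names (nb_pmf, nb_mixture_pmf, fit_histogram), the starting values and the bounds are made up for illustration and are not part of scipy or statsmodels. Note that this is a least-squares fit to the normalized histogram, not a maximum likelihood estimate.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import nbinom


def nb_pmf(k, n, p):
    """pmf of a single negative binomial in scipy's (n, p) form."""
    return nbinom.pmf(k, n, p)


def nb_mixture_pmf(k, w, n1, p1, n2, p2):
    """Weighted sum of two negative binomial pmfs; w is the weight of the first."""
    return w * nbinom.pmf(k, n1, p1) + (1.0 - w) * nbinom.pmf(k, n2, p2)


def fit_histogram(samples, model, p0, bounds):
    """Least-squares fit of `model` to the normalized histogram of `samples`."""
    samples = np.asarray(samples)
    k = np.arange(samples.max() + 1)
    # relative frequency of each integer value 0, 1, ..., max(samples)
    freq = np.bincount(samples, minlength=k.size) / samples.size
    params, _ = curve_fit(model, k, freq, p0=p0, bounds=bounds)
    return params
```

For a single negative binomial one could call fit_histogram(samples, nb_pmf, p0=(3.0, 0.5), bounds=([1e-3, 1e-3], [100.0, 0.999])). The mixture works the same way with nb_mixture_pmf and five starting values and bounds (with w kept between 0 and 1), but the result can depend strongly on the starting values, so it is worth trying several.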
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
yup, i have the original data
On Thu, Mar 12, 2015 at 1:42 AM, <josef.pktd@gmail.com> wrote:
Do you have the data in the form of histograms (counts) or the original data ?
statsmodels can only estimate based on the original data which is assumed to consist of observations drawn from a Negative Binomial distribution. Fitting histogram and fitting mixtures of distributions is not supported "out of the box", and would require some custom models.
If you just want to fit a distribution to a histogram or discrete counts, then using curve_fit or leastsq is one possibility.
Josef
(please comment inline or post at the bottom in scipy related mailing lists)
On Wed, Mar 11, 2015 at 7:49 PM, c <twocolorflipflop@gmail.com> wrote:
yup, i have the original data
To estimate a single Negative Binomial, you can use statsmodels' NegativeBinomial model and regress on a constant.
endog is your negative binomial data, exog = np.ones(len(data)) and
result = sm.NegativeBinomial(data, exog).fit()
result.params has the estimated parameters, but they are in a mean-dispersion parameterization used for regression, not in the "standard" parameterization of a Negative Binomial distribution. There is somewhere (!?) a helper function to transform the params into the standard form as used, for example, by scipy.stats.nbinom.
Estimating the mixture of two or more NegativeBinomial distributions takes a bit of work.
Josef
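A minimal sketch of the constant-only fit described above, together with a hand-written conversion to the (n, p) form of scipy.stats.nbinom. It assumes the default "nb2" likelihood of sm.NegativeBinomial, where the mean is exp(const) and the variance is mean + alpha * mean**2. The functions nb2_to_scipy and fit_single_nb are hypothetical helpers written for this example; they are not the statsmodels helper function referred to above.

```python
import math

import numpy as np


def nb2_to_scipy(const, alpha):
    """Map NB2 regression params (log-mean constant, dispersion alpha)
    to the (n, p) parameters of scipy.stats.nbinom."""
    mu = math.exp(const)           # mean implied by the constant
    n = 1.0 / alpha                # "number of successes", may be non-integer
    p = 1.0 / (1.0 + alpha * mu)   # success probability
    return n, p


def fit_single_nb(data):
    """Fit one negative binomial by regressing the counts on a constant."""
    # imported here so nb2_to_scipy can be used without statsmodels installed
    import statsmodels.api as sm

    data = np.asarray(data)
    exog = np.ones(len(data))      # constant-only design
    result = sm.NegativeBinomial(data, exog).fit(disp=False)
    const, alpha = result.params   # [constant, alpha] for an ndarray input
    return nb2_to_scipy(const, alpha)
```

With the original observations in a 1D array data, n, p = fit_single_nb(data) gives values that can be passed to scipy.stats.nbinom.pmf(k, n, p) to overlay the fitted distribution on the normalized histogram. An ndarray is enough as input; a pandas DataFrame is not required for this.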
participants (2)
- c
- josef.pktd@gmail.com