An application question for a change: I need to produce pseudorandom numbers drawn from a lognormal distribution (see http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm). Does anyone have existing code for this in Numeric or numarray?
Stephen Walton wrote:
An application question for a change: I need to produce pseudorandom numbers drawn from a lognormal distribution (see http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm). Does anyone have existing code for this in Numeric or numarray?
In [24]: scipy.stats.lognorm? Type: instance Base Class: scipy.stats.distributions.lognorm_gen String Form: <scipy.stats.distributions.lognorm_gen instance at 0x45187f0c> Namespace: Interactive Docstring: A lognormal continuous random variable. Continuous random variables are defined from a standard form chosen for simplicity of representation. The standard form may require some shape parameters to complete its specification. The distributions also take optional location and scale parameters using loc= and scale= keywords (defaults: loc=0, scale=1) I'm not a stats person, so perhaps this one is not enough for you. If that's the case, sorry about the noise. Best, f
Fernando Perez wrote:
I'm not a stats person, so perhaps this one is not enough for you. If that's the case, sorry about the noise.
No, Fernando, ths was what I was looking for, thanks. I have to say, as it is relevant to a previous discussion, that some of the more object-oriented documentation of the stats package is pretty opaque to a new user. For example, help(scipy.stats.distributions) includes the statement: | rvs(self, *args, **kwds) | Random variates of given type. | | *args | ===== | The shape parameter(s) for the distribution (see docstring of the | instance object for more information) Now, I submit a naive new user of Python is not going to know what the "docstring of the instance object" is. Fortunately, scipy.stats.distributions.lognormal? shows it very nicely :-) FYI, it turns out that the distribution of sunspot areas is log-normal, and I'm working on a project to generate a very simple fake sunspot activity cycle. The first simple version, which just generates sunspots which all have the same area, is coded in MATLAB, but I'm promising myself that this project is going to be my first major one done in scipy.
Stephen Walton wrote:
An application question for a change: I need to produce pseudorandom numbers drawn from a lognormal distribution (see http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm). Does anyone have existing code for this in Numeric or numarray?
It is interesting to me to see this question. If I understand you correctly, I can say that Speakeasy has what you want. Actually, Speakeasy can produce random numbers from any user-defined distribution, even one that is not analytical. A colleague of mine once (in the early '90s) used that capability in modeling nuclear scintillation detectors. He took a *measured* pulse height distribution from a PMT and used it to generate random numbers that he input into a model of a scintillation crystal. His "Monte Carlo" simulation was 13 lines long. I was intrigued by the usefulness of this feature, so I always have my eye open to see if any other software package has that capability. I haven't searched exhaustively, but I haven't seen it in Matlab (or octave or scilab) or Mathematica or Mathcad. Or any of the usual python packages. Speakeasy (http://www.speakeasy.com) is a lot like Matlab. I once spoke to the guy who runs Speakeasy, and the story he told me was that Speakeasy and Matlab were once one and the same, but split over a disagreement on approach: one faction wanted to enhance flexibility and functionality by building .m files on a core of fortran routines, while the other faction wanted to optimize speed and performance by building lots of fortran components. The former became Matlab, the latter Speakeasy. I have no idea if this story is true. Clearly the marketing people went with Matlab. I just check their web site, and they are still around. They had a following in financial circles and it looks like they are focused there now. -gary
Gary wrote:
Stephen Walton wrote:
An application question for a change: I need to produce pseudorandom numbers drawn from a lognormal distribution (see http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm). Does anyone have existing code for this in Numeric or numarray?
It is interesting to me to see this question.
If I understand you correctly, I can say that Speakeasy has what you want. Actually, Speakeasy can produce random numbers from any user-defined distribution, even one that is not analytical. A colleague of mine once (in the early '90s) used that capability in modeling nuclear scintillation detectors. He took a *measured* pulse height distribution from a PMT and used it to generate random numbers that he input into a model of a scintillation crystal. His "Monte Carlo" simulation was 13 lines long. I was intrigued by the usefulness of this feature, so I always have my eye open to see if any other software package has that capability. I haven't searched exhaustively, but I haven't seen it in Matlab (or octave or scilab) or Mathematica or Mathcad. Or any of the usual python packages.
Well, it depends on how tricky the function is. If it's univariate, and you can write out the pdf or cdf as a function, then I believe you can subclass scipy.stats.rv_continuous, and it's rvs() method will numerically invert the cdf to generate it's random numbers. If the function is highly multivariate, you might need to do Markov-Chain Monte Carlo which is implemented by PyMC. If you have a bunch of data points from a continuous distribution, but no functional description, then you can make a kernel density estimate and draw random numbers from that. As of last night, that functionality is in scipy.stats.gaussian_kde. Resampling from discrete data is pretty trivial to handwrite. Is there anything else you need? -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
Robert Kern wrote:
Gary wrote:
Stephen Walton wrote:
An application question for a change: I need to produce pseudorandom numbers drawn from a lognormal distribution (see http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm). Does anyone have existing code for this in Numeric or numarray?
It is interesting to me to see this question.
If I understand you correctly, I can say that Speakeasy has what you want. Actually, Speakeasy can produce random numbers from any user-defined distribution, even one that is not analytical. A colleague of mine once (in the early '90s) used that capability in modeling nuclear scintillation detectors. He took a *measured* pulse height distribution from a PMT and used it to generate random numbers that he input into a model of a scintillation crystal. His "Monte Carlo" simulation was 13 lines long. I was intrigued by the usefulness of this feature, so I always have my eye open to see if any other software package has that capability. I haven't searched exhaustively, but I haven't seen it in Matlab (or octave or scilab) or Mathematica or Mathcad. Or any of the usual python packages.
Well, it depends on how tricky the function is.
If it's univariate, and you can write out the pdf or cdf as a function, then I believe you can subclass scipy.stats.rv_continuous, and it's rvs() method will numerically invert the cdf to generate it's random numbers.
If the function is highly multivariate, you might need to do Markov-Chain Monte Carlo which is implemented by PyMC.
If you have a bunch of data points from a continuous distribution, but no functional description, then you can make a kernel density estimate and draw random numbers from that. As of last night, that functionality is in scipy.stats.gaussian_kde.
Resampling from discrete data is pretty trivial to handwrite.
Is there anything else you need?
World peace and/or longer prison sentences for parole violators? I wasn't aware of the rvs() method. Now I am. I've never heard of kernel density estimate. Now I have. I think I'll try it out on data from the intro physics lab that I teach. The students are seeing histograms for the first time, and they don't like it when the ones they get with their data don't look like the nice bell curves in the book. Thanks, gary
Gary wrote:
Robert Kern wrote:
Is there anything else you need?
World peace and/or longer prison sentences for parole violators?
Scipy is a large, broad package, but I think this may be outside its purview.
I wasn't aware of the rvs() method. Now I am. I've never heard of kernel density estimate. Now I have. I think I'll try it out on data from the intro physics lab that I teach. The students are seeing histograms for the first time, and they don't like it when the ones they get with their data don't look like the nice bell curves in the book.
Kernel density estimation is a fairly broad subject all by itself and something of an art rather than science. What I implemented is fairly basic, although that may be just fine for an intro physics lab. There is at least one other Python package for doing KDE that has some more options. Particularly, one problem with Gaussian kernels is that they have infinite support; they are non-zero across the whole, infinite domain. This feature may not be desirable for some data, like those which are constrained to be positive. And for the univariate case, you can sometimes get better (in the area of KDE "better" == "looks nicer to me") with what is called "k-nearest neighbors;" the width of each kernel gets adjusted to be peakier where there are lots of data and flatter where there is less data. KDE is a twiddler's paradise. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
Robert Kern wrote:
If it's univariate, and you can write out the pdf or cdf as a function, then I believe you can subclass scipy.stats.rv_continuous, and it's rvs() method will numerically invert the cdf to generate it's random numbers.
This is so cool! I had a desire to generate values on (0,1) where values near 0.5 were less probable than values at the endpoints. Here's the implementation: #---------------begin------------------------------ import Numeric as Num from scipy.stats.distributions import rv_continuous # # CDF for the sunspot generation function. If we make it a subclass # of rv_continuous we get sunspot.rvs for free :-) # class sunspot_gen(rv_continuous): pmin=0.1 def _pdf(self,x): pmax=(Num.pi-2*self.pmin)/(Num.pi-2) return((pmax-self.pmin)*(1-Num.sin(Num.pi*x))+self.pmin) def _cdf(self,x): pmax=(Num.pi-2*self.pmin)/(Num.pi-2) return -(cos(pi*x)*pmin-cos(pi*x)*pmax-pmax*x*pi-pmin+pmax)/pi sunspot = sunspot_gen(a=0.,b=1.,name='sunspot', longname='A sunspot subdivision',extradoc=""" Sunspot distribution This distribution represents the probability of a subdivision of a sunspot into two spots of size x and (1-x), where x is a value from sunspot.rvs(0). The PDF has a high probability of x near 0 or 1 and a low probability of x near 0.5. """) #-----------------end-------------------------------- My only question: what should I replace the 'import Numeric as Num' with if I want to be able to work within the framework of using either Numeric or numarray? 'import numerix as Num' doesn't seem to work.
On Mon, 2005-03-07 at 12:04 -0800, Stephen Walton wrote:
Robert Kern wrote:
If it's univariate, and you can write out the pdf or cdf as a function, then I believe you can subclass scipy.stats.rv_continuous, and it's rvs() method will numerically invert the cdf to generate it's random numbers.
This is so cool! I had a desire to generate values on (0,1) where values near 0.5 were less probable than values at the endpoints. Here's the implementation:
#---------------begin------------------------------ import Numeric as Num from scipy.stats.distributions import rv_continuous
# # CDF for the sunspot generation function. If we make it a subclass # of rv_continuous we get sunspot.rvs for free :-) #
class sunspot_gen(rv_continuous): pmin=0.1 def _pdf(self,x): pmax=(Num.pi-2*self.pmin)/(Num.pi-2) return((pmax-self.pmin)*(1-Num.sin(Num.pi*x))+self.pmin) def _cdf(self,x): pmax=(Num.pi-2*self.pmin)/(Num.pi-2) return -(cos(pi*x)*pmin-cos(pi*x)*pmax-pmax*x*pi-pmin+pmax)/pi sunspot = sunspot_gen(a=0.,b=1.,name='sunspot', longname='A sunspot subdivision',extradoc="""
Sunspot distribution
This distribution represents the probability of a subdivision of a sunspot into two spots of size x and (1-x), where x is a value from sunspot.rvs(0). The PDF has a high probability of x near 0 or 1 and a low probability of x near 0.5.
""") #-----------------end--------------------------------
My only question: what should I replace the 'import Numeric as Num' with if I want to be able to work within the framework of using either Numeric or numarray? 'import numerix as Num' doesn't seem to work.
Try "import scipy_base.numerix as Num" but understand that it probably won't work with numarray until scipy as a whole is ported (right now all we've got is scipy_base). It should work with Numeric however. Regards, Todd
participants (5)
-
Fernando Perez
-
Gary
-
Robert Kern
-
Stephen Walton
-
Todd Miller