[Numpy-discussion] Generating Bell Curves (was: Using normal() )

Fri Apr 25 09:51:36 EDT 2008

Rich Shepard wrote:
>    Thanks to several of you I produced test code using the normal density
> function, and it does not do what we need. Neither does the Gaussian
> function using fwhm that I've tried. The latter comes closer, but the ends
> do not reach y=0 when the inflection point is y=0.5.
>
>    So, let me ask the collective expertise here how to generate the curves
> that we need.
>
>    We need to generate bell-shaped curves given a midpoint, width (where y=0)
> and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is
> usually [0, 100], but can vary. Using the NumPy arange() function to produce
> the x values (e.g., arange(0, 100, 0.1)), I need a function that will produce
> the associated y values for a bell-shaped curve. These curves represent the
> membership functions for fuzzy term sets, and generally adjacent curves
> overlap where y=0.5. It would be a bonus to be able to adjust the skew and
> kurtosis of the curves, but the controlling data would be the
> center/midpoint and width, with defaults for inflection point, and other
> parameters.
>
>    I've been searching for quite some time without finding a solution that
> works as we need it to work.
>
> TIA,
>
> Rich
>
>   
Hi,
You could use a Gamma distribution to get a skewed distribution. But to
extend Keith's comment, continuous distributions typically range from
minus infinity or zero to positive infinity and, furthermore, the
probability of any single point in a continuous distribution is always
zero. The only way you are going to get a curve that reaches zero at a
finite width from a single continuous distribution is via some truncated
distribution - essentially Keith's reply.
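
A minimal sketch of the truncation idea, assuming a Gaussian shape that is
cut off at center +/- width/2 and then shifted and rescaled so the peak is 1
and the edges are 0. The function name truncated_bell and the default of
placing the edges 3 sigma from the center are illustrative assumptions, not
something taken from this thread, and the result is a membership curve
rather than a normalized probability density.

```python
import numpy as np

def truncated_bell(x, center, width, sigma=None):
    """Gaussian shape cut off at center +/- width/2, rescaled to [0, 1]."""
    half = width / 2.0
    if sigma is None:
        sigma = half / 3.0  # assumption: edges sit 3 sigma from the center
    g = np.exp(-0.5 * ((x - center) / sigma) ** 2)
    edge = np.exp(-0.5 * (half / sigma) ** 2)  # value of g at the cut-off
    y = (g - edge) / (1.0 - edge)              # edges map to 0, peak to 1
    inside = np.abs(x - center) <= half
    # Clip guards against tiny negative values from rounding near the edges.
    return np.where(inside, np.clip(y, 0.0, 1.0), 0.0)

x = np.arange(0, 100, 0.1)
y = truncated_bell(x, center=50.0, width=40.0)
```

Changing sigma for a fixed width moves the height at which the curve bends,
so it is the knob to experiment with for the inflection-point requirement.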

Alternatively, you may get away with a discrete distribution like the
Poisson, since it very quickly approaches normality but is skewed. A
multinomial distribution may also work, but that requires more
assumptions. In either case, you have to map the points into the valid
space, because it is the distribution within the set that is used, not
the distribution of the data.
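
One possible reading of the mapping step, sketched under assumptions: the x
range is divided into n_bins integer counts, the Poisson probabilities for
those counts are computed, and the result is scaled so the largest value is
1. The name poisson_membership, the n_bins parameter, and the linear mapping
from x to counts are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def poisson_membership(x, x_min, x_max, lam, n_bins=100):
    """Map x onto counts 0..n_bins and return the Poisson pmf scaled to peak 1."""
    k = np.arange(n_bins + 1)
    # log pmf = k*log(lam) - lam - log(k!), computed in logs to avoid overflow
    log_fact = np.array([math.lgamma(i + 1) for i in k])
    pmf = np.exp(k * math.log(lam) - lam - log_fact)
    pmf /= pmf.max()
    # Linear map from [x_min, x_max] to the nearest integer count
    idx = np.rint((np.asarray(x) - x_min) / (x_max - x_min) * n_bins).astype(int)
    return pmf[np.clip(idx, 0, n_bins)]

x = np.arange(0, 100, 0.1)
y = poisson_membership(x, 0.0, 100.0, lam=10.0)
```

The curve is stepped rather than smooth, and its tails approach zero without
reaching it, so it addresses the skew but not the y=0 requirement.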

I do not see the requirement for overlapping curves because the expected
distribution of each set should be independent of the data and of the
other sets. In that case, you just find the mean and variance of each
set to get the degree of overlap you require. The inflection point
requirement is very hard to understand as it could have different
meanings, such as the curves just crossing or having the same area under
the curve. I don't see any simple solution to that, although two normals
with the same variance but different means probably would work. If the
sets are dependent then you need a multivariate solution. Really you
probably need a mixture of distributions and/or to generate your own
function to get something that meets your full requirements.
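
Two sketches of the ideas in this paragraph, both under stated assumptions.
The first uses peak-normalized Gaussians with a common sigma chosen so that
neighbours a given spacing apart cross at y=0.5; solving
exp(-(d/2)^2 / (2 sigma^2)) = 0.5 gives sigma = d / (2 sqrt(2 ln 2)). The
second is one possible "own function", a raised cosine, swapped in here
because it reaches exactly 0 at center +/- width/2. All function names are
hypothetical.

```python
import numpy as np

def gaussian_membership(x, center, sigma):
    """Gaussian shape scaled so the peak is 1 (not a normalized density)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sigma_for_half_crossing(spacing):
    """Sigma at which two curves `spacing` apart cross at y = 0.5."""
    return spacing / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def raised_cosine_membership(x, center, width):
    """Raised cosine: 1 at center, 0.5 at center +/- width/4, 0 at the edges."""
    half = width / 2.0
    y = 0.5 * (1.0 + np.cos(np.pi * (x - center) / half))
    return np.where(np.abs(x - center) <= half, y, 0.0)

x = np.arange(0, 100, 0.1)
centers = [25.0, 50.0, 75.0]
sigma = sigma_for_half_crossing(25.0)
gauss_curves = [gaussian_membership(x, c, sigma) for c in centers]
# Width of twice the spacing makes adjacent raised cosines cross at y = 0.5.
cosine_curves = [raised_cosine_membership(x, c, 50.0) for c in centers]
```

The Gaussian curves cross at 0.5 but never reach 0, and their inflection
points sit at y = exp(-1/2), about 0.607, rather than 0.5. The raised cosine
has its inflection points at y=0.5 and reaches 0 at the edges, but it offers
no control over skew or kurtosis.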

Regards
Bruce