[Numpy-discussion] Generating Bell Curves (was: Using normal() )
Bruce Southey
bsouthey at gmail.com
Fri Apr 25 09:51:36 EDT 2008
Rich Shepard wrote:
> Thanks to several of you I produced test code using the normal density
> function, and it does not do what we need. Neither does the Gaussian
> function using fwhm that I've tried. The latter comes closer, but the ends
> do not reach y=0 when the inflection point is y=0.5.
>
> So, let me ask the collective expertise here how to generate the curves
> that we need.
>
> We need to generate bell-shaped curves given a midpoint, width (where y=0)
> and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is
> usually [0, 100], but can vary. Using the NumPy arange() function to produce
> the x values (e.g, arange(0, 100, 0.1)), I need a function that will produce
> the associated y values for a bell-shaped curve. These curves represent the
> membership functions for fuzzy term sets, and generally adjacent curves
> overlap where y=0.5. It would be a bonus to be able to adjust the skew and
> kurtosis of the curves, but the controlling data would be the
> center/midpoint and width, with defaults for inflection point, and other
> parameters.
>
> I've been searching for quite some time without finding a solution that
> works as we need it to work.
>
> TIA,
>
> Rich
>
>
Hi,
You could use a Gamma distribution to get a skewed distribution. But to
extend Keith's comment, continuous distributions typically go from
minus infinity or zero to positive infinity and, furthermore, the
probability of a single point in a continuous distribution is always
zero. The only way you are going to get this from a single continuous
distribution is via some truncated distribution - essentially Keith's
reply.
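As a rough sketch of the truncated, skewed idea above: the Gamma density can be divided by its own peak value so the curve tops out at y=1, then set to zero outside the interval of interest. The function name truncated_gamma_curve and the default shape, scale, lo and hi values are made up for illustration only. Note that truncation leaves a small step at hi rather than a smooth approach to zero.

```python
import numpy as np

def truncated_gamma_curve(x, shape=4.0, scale=8.0, lo=0.0, hi=100.0):
    """Gamma-shaped curve rescaled to peak at 1 and cut to zero outside (lo, hi].

    Hypothetical helper: the parameter names and defaults are assumptions.
    """
    if shape <= 1.0:
        raise ValueError("shape must exceed 1 so the curve has an interior peak")
    x = np.asarray(x, dtype=float)
    t = x - lo                      # the Gamma support starts at lo
    mode = (shape - 1.0) * scale    # peak location, measured from lo
    y = np.zeros_like(x)
    inside = (t > 0.0) & (x <= hi)
    r = t[inside] / mode
    # Gamma density divided by its value at the mode:
    # (t/mode)**(shape-1) * exp(-(t-mode)/scale)
    y[inside] = r ** (shape - 1.0) * np.exp(-(t[inside] - mode) / scale)
    return y

x = np.arange(0, 100, 0.1)
y = truncated_gamma_curve(x)   # right-skewed, peak near x = 24
```

Raising shape makes the curve more symmetric; lowering it toward 1 increases the skew.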
Alternatively, you may get away with a discrete distribution like the
Poisson since it very quickly approaches normality but is skewed. A
multinomial distribution may also work but that requires more assumptions. In
either case, you have to map the points into the valid space because it is
the distribution within the set that is used, not the distribution of the
data.
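One possible reading of the Poisson suggestion, sketched under assumptions: evaluate the Poisson probabilities at the integers 0..n-1, scale them so the largest is 1, and map the integer index linearly onto the x range. The function name poisson_curve, the linear mapping and the default lam are illustrative choices, not something specified in this thread.

```python
import numpy as np

def poisson_curve(n_points=101, lam=20.0, x_lo=0.0, x_hi=100.0):
    """Peak-normalized Poisson probabilities mapped onto [x_lo, x_hi].

    Hypothetical helper: the linear index-to-x mapping is an assumption.
    """
    k = np.arange(n_points)
    # log(k!) built as a running sum of logs, with log(0!) = 0 first
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_points)))))
    # log of the Poisson pmf: -lam + k*log(lam) - log(k!)
    log_pmf = -lam + k * np.log(lam) - log_fact
    pmf = np.exp(log_pmf)
    x = np.linspace(x_lo, x_hi, n_points)   # map index k onto the x range
    return x, pmf / pmf.max()

x, y = poisson_curve()
```

The curve is right-skewed for small lam and looks increasingly normal as lam grows.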
I do not see the requirement for overlapping curves because the expected
distribution of each set should be independent of the data and of the
other sets. In that case, you just find the mean and variance of each
set to get the degree of overlap you require. The inflection point
requirement is very hard to understand as it has different meanings such as
just crossing or same area under the curve. I don't see any simple
solution to that - two normals with the same variance but different
means probably would. If the sets are dependent then you need a
multivariate solution. Really you probably need a mixture of
distributions and/or generate your own function to get something that
meets your full requirements.
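A minimal sketch of both ideas, under stated assumptions. First, one candidate for "your own function" is a raised cosine, which is not a probability distribution but is exactly 1 at the center, exactly 0 at center +/- width/2, and passes through y=0.5 (its inflection point) at center +/- width/4. Second, for two peak-normalized normal curves with the same sigma, the crossing sits midway between the means, and sigma can be chosen so that the crossing height is 0.5. The names raised_cosine, gaussian and sigma_for_half_crossing are hypothetical, and none of this is claimed to be the solution Rich needs.

```python
import numpy as np

def raised_cosine(x, center, width):
    """1 at center, 0 at center +/- width/2 and beyond, 0.5 at center +/- width/4."""
    x = np.asarray(x, dtype=float)
    u = (x - center) / (width / 2.0)        # u = +/-1 at the edges
    y = 0.5 * (1.0 + np.cos(np.pi * u))
    return np.where(np.abs(u) <= 1.0, y, 0.0)

def gaussian(x, mean, sigma):
    """Normal-shaped curve scaled so the peak is 1 (not a density)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def sigma_for_half_crossing(mean_a, mean_b):
    """Common sigma at which two peak-normalized Gaussians cross at y = 0.5."""
    d = abs(mean_b - mean_a) / 2.0          # distance from each mean to the midpoint
    return d / np.sqrt(2.0 * np.log(2.0))   # solves exp(-d**2 / (2*sigma**2)) = 0.5

x = np.arange(0, 100, 0.1)
low = raised_cosine(x, 30.0, 40.0)    # adjacent curves spaced width/2 apart
high = raised_cosine(x, 50.0, 40.0)   # cross at x = 40 where both are 0.5

s = sigma_for_half_crossing(40.0, 60.0)
g_low, g_high = gaussian(x, 40.0, s), gaussian(x, 60.0, s)
```

The Gaussian pair never reaches y=0, while the raised cosine does; skew and kurtosis control would need a different function.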
Regards
Bruce