[SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ?

Yaroslav Halchenko lists at onerussian.com
Fri Mar 13 10:10:06 EDT 2009


> Fixing numerical integration over the distance of a machine epsilon of
> a function that has a singularity at the boundary was not very high on
> my priority list.
fair enough, but still sad ;)
it is just that I got frustrated since the upgrade to 0.7.0 caused quite
a few places to break, and this one was one of the 'sore' ones ;)

> If there is a real use case that requires this, I can do a temporary
well... I think I've mentioned before how I ran into this issue: our
unittests of PyMVPA [1] fail.  Primarily (I think) due to my silly
distribution matching function.

> fix. As far as I have seen, you use explicitly this special case as a
> test case and not a test that would reflect a failing use case.
well -- this test case is just an example. that distribution matching is
the one which causes it in unittests

> Overall, I prefer to have a general solution to the boundary problem
> for numerical integration, instead of messing around with the
> theoretically correct boundaries.
sure! a proper solution would be nice.  as for "messing with boundaries":
imho it depends on how they are "messed up with" ;) Maybe the original
self.{a,b} could be left alone, but for numeric integration some others
could be introduced (self.{a,b}_eps), which are used for integration,
with some correction term added whenever we are within the 'theoretical'
boundaries, to compensate for the "missing" part of [self.a, self.a_eps]
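
A minimal sketch of that idea, in plain Python so it runs without SciPy.
Everything named here (EPS, A_EPS, B_EPS, edge_mass, midpoint,
cdf_with_eps) is made up for illustration and is not how
stats.distributions does it. The example uses the arcsine law on [-1, 1],
whose pdf blows up at both ends, and takes the correction term from the
leading-order expansion of the pdf near the boundary:

```python
import math

A, B = -1.0, 1.0                  # theoretical support, left alone
EPS = 1e-3                        # hypothetical offset, chosen by hand
A_EPS, B_EPS = A + EPS, B - EPS   # bounds actually used for integration


def pdf(x):
    """Arcsine-law pdf on (-1, 1); singular at both boundaries."""
    return 1.0 / (math.pi * math.sqrt(1.0 - x * x))


def edge_mass(t):
    """Approximate probability mass within distance t of a boundary.

    Near x = A the pdf behaves like 1 / (pi * sqrt(2 * t)), which
    integrates to sqrt(2 * t) / pi.  This plays the role of the
    "correction term" for the part that is not integrated numerically.
    """
    return math.sqrt(2.0 * t) / math.pi


def midpoint(f, lo, hi, n=20000):
    """Plain midpoint rule; it never evaluates f at the end points."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def cdf_with_eps(x):
    if x <= A:
        return 0.0
    if x >= B:
        return 1.0
    if x < A_EPS:                 # inside the left boundary layer
        return edge_mass(x - A)
    if x > B_EPS:                 # inside the right boundary layer
        return 1.0 - edge_mass(B - x)
    # regular region: correction for [A, A_EPS] plus the numeric integral
    return edge_mass(EPS) + midpoint(pdf, A_EPS, x)
```

For this particular pdf the result can be compared against the closed
form 0.5 + asin(x)/pi. For a distribution without a singularity the
offset would be zero and the correction would vanish.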

Most of the distributions would not need to have them different from a,b
and would have 0 correction.
Testing is imho quite obvious -- just go through all distributions and
try to obtain .cdf values at sample points in the vicinity of the
boundaries. I know that it is not exhaustive if a distribution has
singularities within, but well -- it is better than nothing imho ;)
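
Roughly what I have in mind for such a test, as a sketch only. The helper
name check_cdf_near_boundaries, the offsets, and the CASES table are all
hypothetical; the cdfs are hand-written stand-ins so that the snippet runs
without SciPy, and one of them is broken on purpose to show a failure:

```python
import math


def check_cdf_near_boundaries(cdf, a, b, offsets=(1e-15, 1e-12, 1e-8, 1e-4)):
    """Probe cdf just inside each finite boundary; return found problems."""
    xs = []
    if math.isfinite(a):
        xs += [a + d * max(1.0, abs(a)) for d in offsets]
    if math.isfinite(b):
        xs += [b - d * max(1.0, abs(b)) for d in offsets]

    problems, prev = [], 0.0
    for x in sorted(xs):
        try:
            p = float(cdf(x))
        except Exception as exc:
            problems.append((x, "raised %r" % (exc,)))
            continue
        if math.isnan(p) or not 0.0 <= p <= 1.0:
            problems.append((x, "bad value %r" % (p,)))
        elif p < prev - 1e-12:
            problems.append((x, "not monotone"))
        else:
            prev = p
    return problems


# Stand-ins for "all distributions": name -> (cdf, a, b)
CASES = {
    "uniform": (lambda x: x, 0.0, 1.0),
    "arcsine": (lambda x: 0.5 + math.asin(x) / math.pi, -1.0, 1.0),
    "expon": (lambda x: -math.expm1(-x), 0.0, math.inf),
    # deliberately broken near the lower boundary, to show a failure
    "broken": (lambda x: float("nan") if x < 1e-10 else x, 0.0, 1.0),
}

report = {name: check_cdf_near_boundaries(*case)
          for name, case in CASES.items()}
```

With the real distributions one would presumably feed in each
distribution's .cdf together with its a and b instead of the stand-ins.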

I bet you can come up with a better solution.

> Also, I would like to know what the references for the rdist are.
actually I've found empirically that rdist is the one which I needed,
and indeed there is not much information on the web:

rdist corresponds to the distribution of a (single) coordinate
component for a point located (uniformly) on a hypersphere of radius 1
in a space of N dimensions. When N is large it is well approximated by
a Gaussian, but in low dimensions it is very different and quite
interesting (e.g. flat in N=3)
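
A quick way to check this description numerically, again in plain Python.
The function names are made up, and the mapping c = N - 1 to the rdist
shape parameter is my assumption, based on the coordinate density being
proportional to (1 - x^2)^((N-3)/2):

```python
import math
import random


def coordinate_pdf(x, n):
    """pdf of one coordinate of a uniform point on the unit sphere in R^n.

    Valid for n >= 2 and -1 < x < 1.
    """
    c = n - 1  # assumed mapping to the rdist shape parameter
    beta = math.gamma(0.5) * math.gamma(c / 2.0) / math.gamma((c + 1) / 2.0)
    return (1.0 - x * x) ** (c / 2.0 - 1.0) / beta


def sample_coordinate(n, rng):
    """Draw a uniform point on the sphere by normalizing Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return v[0] / norm


rng = random.Random(0)
xs3 = [sample_coordinate(3, rng) for _ in range(20000)]
# In N=3 the coordinate should be uniform on [-1, 1], so about half
# of the samples should land in [-0.5, 0.5].
frac_inner = sum(abs(x) <= 0.5 for x in xs3) / len(xs3)
```

For N=3 the formula reduces to a constant 1/2 on (-1, 1), which is the
"flat" case mentioned above; for N=2 the density is singular at both
ends, which is exactly the kind of boundary that makes numeric
integration unhappy.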

n.b. actually my boss told me that there is a family of distributions where
this one belongs to but I've forgotten which one ;) will ask today again

> Google search for r distribution is pretty useless, and I have not yet
> found a reference or an explanation of the rdist and its uses.
there was just a single page which I ran into which described rdist and
plotted sample pdfs. but I can't find it now

[1] http://www.pymvpa.org/
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]




