[Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

Thu Feb 16 12:07:58 EST 2012

On Thu, Feb 16, 2012 at 11:47 AM,  <josef.pktd at gmail.com> wrote:
> On Thu, Feb 16, 2012 at 11:30 AM,  <josef.pktd at gmail.com> wrote:
>> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
>> <warren.weckesser at enthought.com> wrote:
>>>
>>>
>>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig <pierre.haessig at crans.org>
>>> wrote:
>>>>
>>>> Le 16/02/2012 16:20, josef.pktd at gmail.com a écrit :
>>>>
>>>>> I don't see any way to fix multivariate_normal for this case, except
>>>>> for dropping svd or for random perturbing a covariance matrix with
>>>>> multiplicity of singular values.
>>>>
>>>> Hi,
>>>> I just made a quick search in what R guys are doing. It happens there are
>>>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>>>> instance, mvtnorm
>>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>>>> the related function from the source code of this package.
>>>>
>>>> Interestingly enough, it seems they provide 3 different methods (svd,
>>>> eigen values, and Cholesky).
>>>> I don't have the time now to dive in the assessments of pros and cons of
>>>> those three. Maybe one works for our problem, but I didn't check yet.
>>>>
>>>> Pierre
>>>>
>>>
>>>
>>> For some alternatives to numpy's multivariate_normal, see
>>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
>>> (Cholesky and eigh) are just a couple lines of code.
>>
>> Thanks both,
>>
>> The main point is that it is a "Needs decision"
>>
>> Robert argued several times on the mailing list why he chose svd.
>> (with svd covariance can be closer to singular then with cholesky)
>>
>> In statsmodels we usually just use Cholesky for similar
>> transformation, and I use occasionally an eigh version. (I need to
>> look up the thread but I got puzzled about results with eig and
>> multiplicity of eigenvalues before.)
>>
>> The R code is GPL, but the few lines of code look standard without any
>> special provision for non-deterministic linear algebra.
>>
>> If multivariate_normal switches from svd to cholesky or eigh, we still
>> need to check that we don't run into similar "determinacy" problems
>> with numpy's linalg (I think in statsmodels we use mostly scipy, so I
>> don't know.)
>
> np.linalg.eigh always produces the same eigenvectors, both running
> repeatedly in the same session and running the script several times on
> the command line.
>
> so eigh looks good as alternative to svd for this case, I don't know
> if we buy numerical problems in other corner cases, but for near
> singularity it's always possible to check the smallest eigenvalue

cholesky is also deterministic in my runs

What I would suggest is to use cholesky first, catch the singular
exception and then use eigh. With eigh we would get perfectly
correlated random variables.

Again if my reading of linalg comments is correct, cholesky is the
fastest way to detect singularity of a matrix, and is faster then eigh
in the non-singular case.

I have no idea if there is an almost singular case, where cholesky
fails, but the current svd would produce a not perfectly correlated
random sample (up to numerical precision).

(Alternative, which I don't think I like so much, is to use a small
Ridge correction (multiply diagonal by 1 + x*nulp ? This would bound
it away from perfect correlation, I guess.)

Josef

>
> Josef
>
>>
>> Josef
>>
>>>
>>> Warren
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>