[SciPy-User] multivariate empirical distribution function, avoid double loop ?

Robert Kern robert.kern at gmail.com
Wed Aug 24 19:25:12 EDT 2011


On Wed, Aug 24, 2011 at 09:23,  <josef.pktd at gmail.com> wrote:
> Does anyone know whether there is an algorithm that avoids the double
> loop to get a multivariate empirical distribution function?
>
> for point in data:
>     count how many points in data are smaller or equal to point
>
> with 1d data it's just argsort(argsort(data))
>
> double loop version with some test cases is attached.
>
> I didn't see a way that sorting would help.
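
For reference, the double loop described in the quoted question can be sketched roughly as below. The attachment itself is not reproduced here, so the function name `ecdf_counts_loop` and the small test array are assumptions made for illustration only. The last lines check the 1d remark: without ties, argsort(argsort(x)) gives 0-based ranks, so the "smaller or equal" count is that rank plus one.

```python
import numpy as np

def ecdf_counts_loop(data):
    """For each row, count the rows that are <= it in every column.

    Hypothetical baseline: O(nobs**2) Python-level iterations.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    counts = np.empty(nobs, dtype=int)
    for i in range(nobs):
        count = 0
        for j in range(nobs):
            # row j is "smaller or equal" only if it is so in all variables
            if np.all(data[j] <= data[i]):
                count += 1
        counts[i] = count
    return counts

# made-up data with shape (nobs, kvars) = (4, 2)
data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.0, 0.0]])
counts = ecdf_counts_loop(data)

# 1d case without ties: double argsort gives 0-based ranks
x = np.array([3.0, 1.0, 2.0])
ranks_1d = np.argsort(np.argsort(x))
counts_1d = ecdf_counts_loop(x[:, np.newaxis])
```

Dividing `counts` by the number of observations would give the empirical distribution function evaluated at each data point.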

If you can bear to make a few (nobs, nobs) bool arrays, you can do
just a kvars-sized loop in Python:

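import numpy as np

# data: (nobs, kvars) array; dominates[i, j] ends up True when row i
# is strictly greater than row j in every variable.
dominates = np.ones((len(data), len(data)), dtype=bool)
for x in data.T:
    dominates &= x[:, np.newaxis] > x
sorta_ranks = dominates.sum(axis=1)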
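
The snippet above uses a strict `>`, so it counts the points that each point strictly dominates. In 1d without ties that matches the 0-based argsort(argsort(data)). A minimal sketch of the same idea with `<=`, which matches the "smaller or equal" count in the question, might look like this. The function names and the random test data are assumptions for illustration, not part of the original post.

```python
import numpy as np

def ecdf_counts_vectorized(data):
    """Count, for each row, the rows that are <= it in every column.

    Loops only over the kvars columns; needs (nobs, nobs) bool arrays.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    below = np.ones((nobs, nobs), dtype=bool)
    for x in data.T:
        # below[i, j] stays True only while x[j] <= x[i] for each column
        below &= x[np.newaxis, :] <= x[:, np.newaxis]
    return below.sum(axis=1)

def strict_dominance_counts(data):
    """Same loop with a strict >, as in the snippet above."""
    data = np.asarray(data)
    dominates = np.ones((len(data), len(data)), dtype=bool)
    for x in data.T:
        dominates &= x[:, np.newaxis] > x
    return dominates.sum(axis=1)

# small made-up example, shape (nobs, kvars) = (4, 2)
small = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.0, 0.0]])
le_counts = ecdf_counts_vectorized(small)
gt_counts = strict_dominance_counts(small)

# cross-check against a brute-force (nobs, nobs, kvars) comparison,
# using integer data so that ties actually occur
rng = np.random.default_rng(0)
big = rng.integers(0, 5, size=(50, 3))
brute = np.all(big[np.newaxis, :, :] <= big[:, np.newaxis, :], axis=2).sum(axis=1)
fast = ecdf_counts_vectorized(big)
```

The trade-off is memory: each (nobs, nobs) bool array takes nobs**2 bytes, so this only pays off while nobs stays moderate.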

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


