[SciPy-user] kmeans2 random initialization

Tue Apr 1 14:41:32 EDT 2008

On Tue, Apr 1, 2008 at 3:41 AM, David Warde-Farley <dwf at cs.toronto.edu> wrote:
>  This might not be relevant, depending on how the covariance is
>  computed, but one 'gotcha' I've seen with numerical algorithms that
>  assume positive-definiteness is that occasionally floating point
>  oddities will induce (very slight) non-symmetry of the input matrix,
>  and thus the algorithm will choke; it's easily solved by averaging the
>  matrix with its transpose (though there are probably more efficient
>  ways).

Ultimately, the covariance is computed like so:

  dot(X, X.T.conj())

Whether the result is exactly symmetric depends on the implementation
of dot(). At least on my MacBook Pro, with Accelerate.framework
providing the BLAS (an ATLAS derivative), I could not generate a
matrix that was not exactly symmetric over many random inputs. I think
that less-than-full-rank data is the more likely problem.
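
For anyone who wants to check both possibilities on their own machine,
here is a minimal sketch. It is not the kmeans2 code itself: the helper
names, the seed, and the array shapes are made up for illustration, and
it assumes X holds one variable per row so that dot(X, X.T.conj()) is
square in the number of variables. It measures how far the product is
from symmetric, applies the averaging trick from the quoted message, and
compares the rank of the product for well-sampled and under-sampled data.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability


def covariance_like(X):
    # Same expression as in the message above (no centering or scaling).
    return np.dot(X, X.T.conj())


def max_asymmetry(C):
    # Largest absolute difference between C and its conjugate transpose.
    return float(np.abs(C - C.T.conj()).max())


def symmetrize(C):
    # Average the matrix with its (conjugate) transpose.
    return 0.5 * (C + C.T.conj())


# Hypothetical data: 5 variables (rows) by N observations (columns).
X_full = rng.standard_normal((5, 50))  # more observations than variables
X_low = rng.standard_normal((5, 3))    # fewer observations than variables

C_full = covariance_like(X_full)
C_low = covariance_like(X_low)

# Whether this prints exactly 0.0 depends on the BLAS behind dot().
print("asymmetry of dot(X, X.T.conj()):", max_asymmetry(C_full))
print("asymmetry after averaging:", max_asymmetry(symmetrize(C_full)))

# A 5x5 product built from only 3 observations cannot have rank above 3,
# so it is singular rather than positive definite.
print("rank, 50 observations:", np.linalg.matrix_rank(C_full))
print("rank, 3 observations:", np.linalg.matrix_rank(C_low))
```

If the first number is zero but the rank comes out below the number of
variables, the symmetry fix will not help; the data itself is the issue.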

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco