[Numpy-discussion] Proposal for matrix_rank function in numpy

Tue Dec 15 12:01:53 EST 2009

Hi,

Following on from the occasional discussion on the list, can I propose
a small matrix_rank function for inclusion in numpy/linalg?

I suggest it because it seems rather a basic need for linear algebra,
and it's very small and simple...

I've appended an implementation with some doctests in the hope that it
will be acceptable,

Robert - I hope you don't mind me quoting you in the notes.

Thanks a lot,

Matthew

def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    Parameters
    ----------
    M : array-like
        array of <=2 dimensions
    tol : {None, float}
         threshold below which SVD values are considered zero. If `tol`
         is None, and `S` is an array with singular values for `M`, and
         `eps` is the epsilon value for datatype of `S`, then `tol` set
         to ``S.max() * eps``.

    Examples
    --------
    >>> matrix_rank(np.eye(4)) # Full rank matrix
    4
    >>> matrix_rank(np.c_[np.eye(4),np.eye(4)]) # Rank deficient matrix
    4
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    0
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    1
    >>> matrix_rank(np.zeros((4,)))
    0
    >>> matrix_rank([1]) # accepts array-like
    1

    Notes
    -----
    Golub and van Loan define "numerical rank deficiency" as using
    tol=eps*S[0] (note that S[0] is the maximum singular value and thus
    the 2-norm of the matrix). There really is not one definition, much
    like there isn't a single definition of the norm of a matrix. For
    example, if your data come from uncertain measurements with
    uncertainties greater than floating point epsilon, choosing a
    tolerance of about the uncertainty is probably a better idea (the
    tolerance may be absolute if the uncertainties are absolute rather
    than relative, even). When floating point roundoff is your concern,
    then "numerical rank deficiency" is a better concept, but exactly
    what the relevant measure of the tolerance is depends on the
    operations you intend to do with your matrix. [RK, numpy mailing
    list]

    References
    ----------
    Matrix Computations by Golub and van Loan
    '''
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M==0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)