Proposal for matrix_rank function in numpy
Hi,

Following on from the occasional discussion on the list, can I propose a small matrix_rank function for inclusion in numpy/linalg? I suggest it because it seems rather a basic need for linear algebra, and it's very small and simple...

I've appended an implementation with some doctests in the hope that it will be acceptable,

Robert - I hope you don't mind me quoting you in the notes.

Thanks a lot,

Matthew

def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    Parameters
    ----------
    M : array-like
        array of <=2 dimensions
    tol : {None, float}
        threshold below which SVD values are considered zero. If `tol`
        is None, and `S` is an array with singular values for `M`, and
        `eps` is the epsilon value for datatype of `S`, then `tol` set
        to ``S.max() * eps``.

    Examples
    --------
    >>> matrix_rank(np.eye(4)) # Full rank matrix
    4
    >>> matrix_rank(np.c_[np.eye(4), np.eye(4)]) # Rank deficient matrix
    4
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    0
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    1
    >>> matrix_rank(np.zeros((4,)))
    0
    >>> matrix_rank([1]) # accepts array-like
    1

    Notes
    -----
    Golub and van Loan define "numerical rank deficiency" as using
    tol=eps*S[0] (note that S[0] is the maximum singular value and thus
    the 2-norm of the matrix). There really is not one definition, much
    like there isn't a single definition of the norm of a matrix. For
    example, if your data come from uncertain measurements with
    uncertainties greater than floating point epsilon, choosing a
    tolerance of about the uncertainty is probably a better idea (the
    tolerance may be absolute if the uncertainties are absolute rather
    than relative, even). When floating point roundoff is your concern,
    then "numerical rank deficiency" is a better concept, but exactly
    what the relevant measure of the tolerance is depends on the
    operations you intend to do with your matrix.
    [RK, numpy mailing list]

    References
    ----------
    Matrix Computations by Golub and van Loan
    '''
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)
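To see what the `tol` default does in practice, here is a minimal sketch (the proposed function is reproduced so the demo is self-contained; the near-singular test matrix is an illustrative choice, not from the thread):

```python
import numpy as np
import numpy.linalg as npl

def matrix_rank(M, tol=None):
    # The proposed implementation, reproduced for the demonstration below.
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)

# A diagonal matrix whose smallest singular value (1e-10) is tiny
# compared to the others, but still far above machine epsilon.
A = np.diag([1.0, 1.0, 1e-10])

# Default tol = S.max() * eps ~ 2.2e-16, so the tiny value still counts:
print(matrix_rank(A))            # -> 3

# With a tolerance at the scale of (say) measurement uncertainty,
# the tiny singular value is treated as zero:
print(matrix_rank(A, tol=1e-8))  # -> 2
```

This is exactly the distinction the Notes section draws between roundoff-driven and uncertainty-driven tolerances.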
On Tue, Dec 15, 2009 at 12:01 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Following on from the occasional discussion on the list, can I propose a small matrix_rank function for inclusion in numpy/linalg?
I suggest it because it seems rather a basic need for linear algebra, and it's very small and simple...
I've appended an implementation with some doctests in the hope that it will be acceptable,
Robert - I hope you don't mind me quoting you in the notes.
Thanks a lot,
Matthew
def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    Parameters
    ----------
    M : array-like
        array of <=2 dimensions
    tol : {None, float}
        threshold below which SVD values are considered zero. If `tol`
        is None, and `S` is an array with singular values for `M`, and
        `eps` is the epsilon value for datatype of `S`, then `tol` set
        to ``S.max() * eps``.

    Examples
    --------
    >>> matrix_rank(np.eye(4)) # Full rank matrix
    4
    >>> matrix_rank(np.c_[np.eye(4), np.eye(4)]) # Rank deficient matrix
    4
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    0
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    1
    >>> matrix_rank(np.zeros((4,)))
    0
    >>> matrix_rank([1]) # accepts array-like
    1

    Notes
    -----
    Golub and van Loan define "numerical rank deficiency" as using
    tol=eps*S[0] (note that S[0] is the maximum singular value and thus
    the 2-norm of the matrix). There really is not one definition, much
    like there isn't a single definition of the norm of a matrix. For
    example, if your data come from uncertain measurements with
    uncertainties greater than floating point epsilon, choosing a
    tolerance of about the uncertainty is probably a better idea (the
    tolerance may be absolute if the uncertainties are absolute rather
    than relative, even). When floating point roundoff is your concern,
    then "numerical rank deficiency" is a better concept, but exactly
    what the relevant measure of the tolerance is depends on the
    operations you intend to do with your matrix.
    [RK, numpy mailing list]

    References
    ----------
    Matrix Computations by Golub and van Loan
    '''
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
This was missing from numpy compared to matlab and gauss. If we put it in linalg next to np.linalg.cond, then we could shorten the name to `rank`, since the meaning of np.linalg.rank should be pretty obvious. Josef
On 12/15/2009 11:12 AM, josef.pktd@gmail.com wrote:
On Tue, Dec 15, 2009 at 12:01 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Following on from the occasional discussion on the list, can I propose a small matrix_rank function for inclusion in numpy/linalg?
I suggest it because it seems rather a basic need for linear algebra, and it's very small and simple...
I've appended an implementation with some doctests in the hope that it will be acceptable,
Robert - I hope you don't mind me quoting you in the notes.
Thanks a lot,
Matthew
def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    Parameters
    ----------
    M : array-like
        array of <=2 dimensions
    tol : {None, float}
        threshold below which SVD values are considered zero. If `tol`
        is None, and `S` is an array with singular values for `M`, and
        `eps` is the epsilon value for datatype of `S`, then `tol` set
        to ``S.max() * eps``.

    Examples
    --------
    >>> matrix_rank(np.eye(4)) # Full rank matrix
    4
    >>> matrix_rank(np.c_[np.eye(4), np.eye(4)]) # Rank deficient matrix
    4
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    0
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    1
    >>> matrix_rank(np.zeros((4,)))
    0
    >>> matrix_rank([1]) # accepts array-like
    1

    Notes
    -----
    Golub and van Loan define "numerical rank deficiency" as using
    tol=eps*S[0] (note that S[0] is the maximum singular value and thus
    the 2-norm of the matrix). There really is not one definition, much
    like there isn't a single definition of the norm of a matrix. For
    example, if your data come from uncertain measurements with
    uncertainties greater than floating point epsilon, choosing a
    tolerance of about the uncertainty is probably a better idea (the
    tolerance may be absolute if the uncertainties are absolute rather
    than relative, even). When floating point roundoff is your concern,
    then "numerical rank deficiency" is a better concept, but exactly
    what the relevant measure of the tolerance is depends on the
    operations you intend to do with your matrix.
    [RK, numpy mailing list]

    References
    ----------
    Matrix Computations by Golub and van Loan
    '''
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)
This was missing from numpy compared to matlab and gauss.
If we put it in linalg next to np.linalg.cond, then we could shorten the name to `rank`, since the meaning of np.linalg.rank should be pretty obvious.
Josef
+1 for the function, but we cannot shorten the name because of the existing numpy.rank() function. Bruce
Hi,
+1 for the function, but we cannot shorten the name because of the existing numpy.rank() function.
I don't feel strongly about the name, but I imagine you could do

from numpy.linalg import rank as matrix_rank

if you weren't using the numpy.linalg namespace already...

Best,

Matthew
On 12/15/2009 1:39 PM, Bruce Southey wrote:
+1 for the function, but we cannot shorten the name because of the existing numpy.rank() function.
1. Is it a rule that there cannot be a name duplication in this different namespace?
2. Is there a commitment to keeping both np.rank and np.ndim? (I.e., can np.rank never be deprecated?)

If the answers are both 'yes', then perhaps linalg.rank2d is a possible shorter name.

Alan Isaac
On 12/15/2009 12:47 PM, Alan G Isaac wrote:
On 12/15/2009 1:39 PM, Bruce Southey wrote:
+1 for the function, but we cannot shorten the name because of the existing numpy.rank() function.
1. Is it a rule that there cannot be a name duplication in this different namespace?
In my view this is still the same numpy namespace. An example of the potential problems is just using an incorrect import statement somewhere:

from numpy import rank

instead of

from numpy.linalg import rank

For a package you control, you should really prevent this type of user mistake.
2. Is there a commitment to keeping both np.rank and np.ndim? (I.e., can np.rank never be deprecated?)
I do not see that as practical, because of the number of releases it takes to actually remove a function. Also, the current rank function has existed for a very long time in Numerical Python (it is present in Numeric). So it could be confusing for a user to think that the function has just been moved rather than being a different function.
If the answers are both 'yes', then perhaps linalg.rank2d is a possible shorter name.
Alan Isaac
0. Actually, I do interpret rank in terms of the linear algebra definition, but obviously other people have other meanings. Bruce
On Tue, Dec 15, 2009 at 11:01, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Following on from the occasional discussion on the list, can I propose a small matrix_rank function for inclusion in numpy/linalg?
I suggest it because it seems rather a basic need for linear algebra, and it's very small and simple...
I've appended an implementation with some doctests in the hope that it will be acceptable,
I think you need a real example of a nontrivial numerically rank-deficient matrix. Note that c_[eye(4), eye(4)] is actually a full-rank matrix. A matrix is full rank if its numerical rank is equal to min(rows, cols), not max(rows, cols). Taking I = eye(4); I[-1, -1] = 0.0 should be a sufficient example.
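Robert's observation is easy to check directly from the singular values; a quick sketch:

```python
import numpy as np

# The doctest's "rank deficient" example: a 4x8 matrix can have rank at
# most min(4, 8) = 4, and this one attains it, so it is full rank.
A = np.c_[np.eye(4), np.eye(4)]
S = np.linalg.svd(A, compute_uv=False)
print(S)   # four singular values, all sqrt(2) -- none near zero

# The suggested replacement: zero out one diagonal entry of eye(4).
I = np.eye(4)
I[-1, -1] = 0.0
S2 = np.linalg.svd(I, compute_uv=False)
print(S2)  # one singular value is exactly 0, so the numerical rank is 3
```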
Robert - I hope you don't mind me quoting you in the notes.
I certainly don't mind. However, you do not need to cite me; I'm in the authors list already. On the other hand, you probably shouldn't copy-and-paste anything I write on the mailing list to use in a docstring. On the mailing list, I am answering a particular question and use a different voice than is appropriate for a docstring. Also, a full citation of Golub and Van Loan would be appropriate:

.. [1] G. H. Golub and C. F. Van Loan, _Matrix Computations_.
   Baltimore: Johns Hopkins University Press, 1996.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Hi,

Is it reasonable to summarize that, to avoid confusion, we keep 'matrix_rank' as the name?

I've edited as Robert suggested, attempting to adopt a more suitable tone in the docstring...

Thanks a lot,

Matthew

def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    Parameters
    ----------
    M : array-like
        array of <=2 dimensions
    tol : {None, float}
        threshold below which SVD values are considered zero. If `tol`
        is None, and `S` is an array with singular values for `M`, and
        `eps` is the epsilon value for datatype of `S`, then `tol` set
        to ``S.max() * eps``.

    Examples
    --------
    >>> matrix_rank(np.eye(4)) # Full rank matrix
    4
    >>> I = np.eye(4); I[-1, -1] = 0. # rank deficient matrix
    >>> matrix_rank(I)
    3
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    0
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    1
    >>> matrix_rank(np.zeros((4,)))
    0
    >>> matrix_rank([1]) # accepts array-like
    1

    Notes
    -----
    Golub and van Loan [1]_ define "numerical rank deficiency" as using
    tol=eps*S[0] (where S[0] is the maximum singular value and thus the
    2-norm of the matrix). This is one definition of rank deficiency,
    and the one we use here. When floating point roundoff is the main
    concern, then "numerical rank deficiency" is a reasonable choice. In
    some cases you may prefer other definitions. The most useful measure
    of the tolerance depends on the operations you intend to use on your
    matrix. For example, if your data come from uncertain measurements
    with uncertainties greater than floating point epsilon, choosing a
    tolerance near that uncertainty may be preferable. The tolerance may
    be absolute if the uncertainties are absolute rather than relative.

    References
    ----------
    .. [1] G. H. Golub and C. F. Van Loan, _Matrix Computations_.
       Baltimore: Johns Hopkins University Press, 1996.
    '''
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)
On Tue, Dec 15, 2009 at 3:16 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Is it reasonable to summarize that, to avoid confusion, we keep 'matrix_rank' as the name?
I've edited as Robert suggested, attempting to adopt a more suitable tone in the docstring...
Thanks a lot,
Matthew
What comes next when someone offers up a useful function like this? We are using an earlier version of this in statsmodels and wouldn't mind seeing it in numpy. Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in, and an enhancement ticket should be filed? Skipper
Hi,
Is it reasonable to summarize that, to avoid confusion, we keep 'matrix_rank' as the name?
I've edited as Robert suggested, attempting to adopt a more suitable tone in the docstring...
What comes next when someone offers up a useful function like this? We are using an earlier version of this in statsmodels and wouldn't mind seeing it in numpy. Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in, and an enhancement ticket should be filed?
I'm happy to write the doctests as tests. My feeling is there is no objection to this function at the moment, so it would be reasonable, unless I hear otherwise, to commit to SVN. Best, Matthew
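The doctests Matthew mentions could be rewritten as plain test assertions along these lines (a sketch only; the function body is reproduced from the thread, and the test function name is illustrative, not an actual numpy test):

```python
import numpy as np
import numpy.linalg as npl

def matrix_rank(M, tol=None):
    # Proposed implementation, reproduced so the tests are self-contained.
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M == 0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)

def test_matrix_rank():
    # Full rank matrix
    assert matrix_rank(np.eye(4)) == 4
    # Rank-deficient matrix (Robert's corrected example)
    I = np.eye(4)
    I[-1, -1] = 0.0
    assert matrix_rank(I) == 3
    # All zeros - zero rank
    assert matrix_rank(np.zeros((4, 4))) == 0
    # 1 dimension - rank 1 unless all 0
    assert matrix_rank(np.ones((4,))) == 1
    assert matrix_rank(np.zeros((4,))) == 0
    # Accepts array-like
    assert matrix_rank([1]) == 1

test_matrix_rank()
print("all matrix_rank tests pass")
```

Unlike doctests, these assertions compare values rather than printed output, which sidesteps the formatting fragility discussed later in the thread.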
On Wed, Dec 16, 2009 at 2:13 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Is it reasonable to summarize that, to avoid confusion, we keep 'matrix_rank' as the name?
I've edited as Robert suggested, attempting to adopt a more suitable tone in the docstring...
What comes next when someone offers up a useful function like this? We are using an earlier version of this in statsmodels and wouldn't mind seeing it in numpy. Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in, and an enhancement ticket should be filed?
I'm happy to write the doctests as tests. My feeling is there is no objection to this function at the moment, so it would be reasonable, unless I hear otherwise, to commit to SVN.
Sounds good. I didn't know you had commit privileges, and I didn't want to see this get lost in the shuffle. Skipper
On Wed, Dec 16, 2009 at 02:13:08PM -0500, Matthew Brett wrote:
I'm happy to write the doctests as tests. My feeling is there is no objection to this function at the moment, so it would be reasonable, unless I hear otherwise, to commit to SVN.
I have one small comment: I am really happy to see you working on this function. It will be very useful, and it's great to have it in a generally available package. However, I wonder whether it belongs in numpy.linalg or scipy.linalg. I was under the impression that we should direct users who have linalg problems to scipy, as it can do much more. My 2 cents, Gaël
Hi, On Wed, Dec 16, 2009 at 2:16 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Wed, Dec 16, 2009 at 02:13:08PM -0500, Matthew Brett wrote:
I'm happy to write the doctests as tests. My feeling is there is no objection to this function at the moment, so it would be reasonable, unless I hear otherwise, to commit to SVN.
I have one small comment: I am really happy to see you working on this function. It will be very useful, and it's great to have it in a generally available package. However, I wonder whether it belongs in numpy.linalg or scipy.linalg. I was under the impression that we should direct users who have linalg problems to scipy, as it can do much more.
It's another option, and one I thought of, but in this case, the use is so ubiquitous in linear algebra, and the function so small, that it would seem a pity to require installing scipy to get it. See you, Matthew
Hi Gael, On 16-Dec-09, at 2:16 PM, Gael Varoquaux wrote:
I was under the impression that we should direct users who have linalg problems to scipy, as it can do much more.
I agree about pushing users in that direction, but I think that's mostly a consequence of all the wrapped Fortran routines that already exist there. If this thing can be implemented in a short Python function I don't see the harm in having it in NumPy. David
On Wed, Dec 16, 2009 at 10:56 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in
Just curious: is there a policy against pure doctests in numpy? I've always found that doctests and 'real tests' serve complementary purposes and are both useful:

- small, clear tests that make for *illustrative* examples for the end user should be left in the docstring, and picked up by the test suite as normal doctests.

- tests with a lot more logic that get cumbersome to write as doctests can go into 'normal' tests in the test suite.

- There's also a valid third category: for cases where it's convenient to write the test interactively but one doesn't want the example in the main docstring, putting a function in the test suite that simply has the doctest as a docstring works (it may require a little decorator, I don't recall).

I'm just wondering if there's a policy of requiring that all tests become non-doctests...

Cheers,

f
On Fri, Dec 18, 2009 at 8:21 PM, Fernando Perez <fperez.net@gmail.com>wrote:
On Wed, Dec 16, 2009 at 10:56 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in
Just curious: is there a policy against pure doctests in numpy? I've always found that doctests and 'real tests' serve complementary purposes and are both useful:
- small, clear tests that make for *illustrative* examples for the end user should be left in the docstring, and picked up by the test suite as normal doctests.
- tests with a lot more logic that get cumbersome to write as doctests can go into 'normal' tests into the test suite.
- There's also a valid third category: for cases where it's convenient to write the test interactively but one doesn't want the example in the main docstring, putting a function in the test suite that simply has the doctest as a docstring works (it may require a little decorator, I don't recall).
I'm just wondering if there's a policy of requiring that all tests become non-doctests...
I don't think there is a policy, but there is a growing tradition to put "serious" tests in a test suite and avoid making them doctests. Functional tests need to be easy to read, maintain, and document, and doctests don't really fit that description. Your policy for examples in docstrings looks like a good one, though. Chuck
On Fri, Dec 18, 2009 at 10:21 PM, Fernando Perez <fperez.net@gmail.com> wrote:
On Wed, Dec 16, 2009 at 10:56 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in
Just curious: is there a policy against pure doctests in numpy? I've always found that doctests and 'real tests' serve complementary purposes and are both useful:
- small, clear tests that make for *illustrative* examples for the end user should be left in the docstring, and picked up by the test suite as normal doctests.
- tests with a lot more logic that get cumbersome to write as doctests can go into 'normal' tests into the test suite.
- There's also a valid third category: for cases where it's convenient to write the test interactively but one doesn't want the example in the main docstring, putting a function in the test suite that simply has the doctest as a docstring works (it may require a little decorator, I don't recall).
I'm just wondering if there's a policy of requiring that all tests become non-doctests...
doctests have cross-platform rendering/printing/formatting problems: 1e-01 versus 1e-001, for example; there was a test failure recently with, I think, cheby. Also, nans render differently, even scalar nan versus nans in arrays. I don't know about differences across versions of python, numpy, ..., but doctests are very fragile for numbers because they also test the formatting. Also, the precision is not as easy to control on a test-by-test basis as with assert_almost_equal or similar, and not as easy to adjust when the implementation changes (e.g. in stats). I think doctests are faster to write but more work to maintain. But I don't know of any "official" policy; http://projects.scipy.org/numpy/wiki/TestingGuidelines#doctests explains how to do them but makes no recommendation about whether to use them or not. Josef
Cheers,
f
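Josef's precision point can be sketched concretely: numpy.testing compares values to a chosen number of decimals, so the test is independent of how the platform formats the printed repr:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

x = np.sin(np.linspace(0, np.pi, 3))  # [0, 1, ~1.2e-16]

# A doctest would have to match the printed repr exactly (fragile across
# platforms, e.g. 1e-01 vs 1e-001); a value-based check does not, and
# the precision is set per test with the `decimal` argument:
assert_array_almost_equal(x, [0.0, 1.0, 0.0], decimal=6)
print("value-based comparison passed")
```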
On Fri, Dec 18, 2009 at 21:21, Fernando Perez <fperez.net@gmail.com> wrote:
On Wed, Dec 16, 2009 at 10:56 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
Presumably the doctests should be turned into actual tests (noting Robert's comment) to make it more likely that it gets in
Just curious: is there a policy against pure doctests in numpy? I've always found that doctests and 'real tests' serve complementary purposes and are both useful:
- small, clear tests that make for *illustrative* examples for the end user should be left in the docstring, and picked up by the test suite as normal doctests.
- tests with a lot more logic that get cumbersome to write as doctests can go into 'normal' tests into the test suite.
- There's also a valid third category: for cases where it's convenient to write the test interactively but one doesn't want the example in the main docstring, putting a function in the test suite that simply has the doctest as a docstring works (it may require a little decorator, I don't recall).
I'm just wondering if there's a policy of requiring that all tests become non-doctests...
My policy and rationale, which I believe is reflected in the docstring standard, is that examples in the docstrings should put pedagogical concerns above all others. In my experience, a properly robust doctest sacrifices the readability, clarity, and terseness of a good example. Thus, not all examples run as doctests, so docstrings are not added to numpy's test suite. For example, for floating point functions, one should use allclose(), etc. to test the results against the gold standard. However, using that in the example makes it less clear what is going on. Compare:

In [2]: np.sin(np.linspace(0, 2*np.pi, 10))
Out[2]:
array([  0.00000000e+00,   6.42787610e-01,   9.84807753e-01,
         8.66025404e-01,   3.42020143e-01,  -3.42020143e-01,
        -8.66025404e-01,  -9.84807753e-01,  -6.42787610e-01,
        -2.44929360e-16])

In [4]: np.allclose(np.sin(np.linspace(0, 2*np.pi, 10)),
   ...:             np.array([0.0, .642787610, .984807753, .866025404,
   ...:                       .342020143, -.342020143, -.866025404,
   ...:                       -.984807753, -.642787610, 0.0]))
Out[4]: True

I certainly don't mind properly written doctests outside of the real docstrings (e.g. in particular files under test/ that just contain doctests); they're a handy way to write certain tests, although they have well known and annoying limits for numerical work. I just don't want documentation examples to be doctests.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Fri, Dec 18, 2009 at 8:10 PM, Robert Kern <robert.kern@gmail.com> wrote:
My policy and rationale, which I believe is reflected in the docstring standard, is that examples in the docstrings should put pedagogical concerns above all others. In my experience, a properly robust doctest sacrifices the readability, clarity, and terseness of a good example. Thus, not all examples run as doctests, so docstrings are not added to numpy's test suite. For example, for floating point functions, one should use allclose(), etc. to test the results against the gold standard.
I think we mostly agree, up to a point: I also emphasized pedagogical value, but I think it would be better in the long run if we could also run all examples as part of the test suite. This would act as a protection against examples going stale due to changes in the underlying code. A false example is worse than no example at all. I wonder if this couldn't be addressed by simply having a suitable set of printing options wrapped up in a utility that doctests could all use (4 digits only, setting certain flags, linewidth, etc). This could help with most of the problems with robustness and maintenance, while allowing us to ensure that we can always guarantee that examples actually do work. Just a thought. Cheers, f
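Fernando's "suitable set of printing options" could be as simple as a shared setup call that all doctests run first; a sketch (the particular option values are illustrative, not an agreed-upon numpy convention):

```python
import numpy as np

# Pin the float formatting that doctest output depends on: fixed
# precision, no scientific notation for suppressible tiny values, and a
# fixed line width, so the expected output of an example matches on
# every platform.
np.set_printoptions(precision=4, suppress=True, linewidth=70)

print(np.sin(np.linspace(0, 2 * np.pi, 5)))
```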
participants (10)
- Alan G Isaac
- Bruce Southey
- Charles R Harris
- David Warde-Farley
- Fernando Perez
- Gael Varoquaux
- josef.pktd@gmail.com
- Matthew Brett
- Robert Kern
- Skipper Seabold