LA improvements (was: dot function or dot notation, matrices, arrays?)
Starting a new thread for this.
On Tue, Dec 22, 2009 at 7:13 PM, Anne Archibald
I think we have one major lacuna: vectorized linear algebra. If I have to solve a whole whack of four-dimensional linear systems, right now I need to either write a python loop and use linear algebra on them one by one, or implement my own linear algebra. It's a frustrating lacuna, because all the machinery is there: generalized ufuncs and LAPACK wrappers. Somebody just needs to glue them together. I've even tried making a start on it, but numpy's ufunc machinery and generic type system is just too much of a pain for me to make any progress as is.
Please be more specific: what (which aspects) have been "too much of a pain"? (I ask out of ignorance, not out of challenging your opinion/experience.)
I think if someone wanted to start building a low-level
Again, please be more specific: what do you mean by this? (I know generally what is meant by "low level," but I'd like you to spell out a little more fully what you mean by this in this context.)
generalized ufunc library interface to LAPACK, that would be a big improvement in numpy/scipy's linear algebra. Pretty much everything else strikes me as a question of notation. (Not to trivialize it: good notation makes a tremendous difference.)
Thanks, Anne. DG
Anne _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2009/12/23 David Goldsmith
Starting a new thread for this.
On Tue, Dec 22, 2009 at 7:13 PM, Anne Archibald
wrote: I think we have one major lacuna: vectorized linear algebra. If I have to solve a whole whack of four-dimensional linear systems, right now I need to either write a python loop and use linear algebra on them one by one, or implement my own linear algebra. It's a frustrating lacuna, because all the machinery is there: generalized ufuncs and LAPACK wrappers. Somebody just needs to glue them together. I've even tried making a start on it, but numpy's ufunc machinery and generic type system is just too much of a pain for me to make any progress as is.
Please be more specific: what (which aspects) have been "too much of a pain"? (I ask out of ignorance, not out of challenging your opinion/experience.)
It's been a little while since I took a really close look at it, but I'll try to describe the problems I had. Chiefly I had problems with documentation - the only way I could figure out how to build additional gufuncs was monkey-see-monkey-do, just copying an existing one in an existing file and hoping the build system figured it out. It was also not at all clear how to, say, link to LAPACK, let alone decide based on input types which arguments to promote and how to call out to LAPACK. I'm not saying this is impossible, just that it was enough frustrating no-progress to defeat my initial "hey, I could do that" impulse.
I think if someone wanted to start building a low-level
Again, please be more specific: what do you mean by this? (I know generally what is meant by "low level," but I'd like you to spell out a little more fully what you mean by this in this context.)
Sure. Let me first say that all this is kind of beside the point - the hard part is not designing an API, so it's a bit silly to dream up an API without implementing anything. I had pictured two interfaces to the vectorized linear algebra code. The first would simply provide more-or-less direct access to vectorized versions of the linear algebra functions we have now, with no dimension inference. Thus inv, pinv, svd, lu factor, lu solve, et cetera - but not dot. Dot would have to be split up into vector-vector, vector-matrix, matrix-vector, and matrix-matrix products, since one can no longer use the dimensionality of the inputs to figure out what is wanted. The key idea would be that the "linear algebra dimensions" would always be the last one(s); this is fairly easy to arrange with rollaxis when it isn't already true, would tend to reduce copying on input to LAPACK, and is what the gufunc API wants. This is mostly what I meant by low-level. (A second generation would do things like combine many vector-vector products into a single LAPACK matrix-vector product.) The higher-level API I was imagining - remember, vaporware here - had a Matrix and a Vector class, each holding an arbitrarily-dimensioned array of the relevant object. The point of this is to avoid having to constantly specify whether you want a matrix-vector or matrix-matrix product; it also tidily avoids the always-two-dimensional nuisance of the current matrix API. Anne
On 23-Dec-09, at 10:34 AM, Anne Archibald wrote:
It's been a little while since I took a really close look at it, but I'll try to describe the problems I had. Chiefly I had problems with documentation - the only way I could figure out how to build additional gufuncs was monkey-see-monkey-do, just copying an existing one in an existing file and hoping the build system figured it out. It was also not at all clear how to, say, link to LAPACK, let alone decide based on input types which arguments to promote and how to call out to LAPACK.
I tried to create a new generalized ufunc (a logsumexp to go with logaddexp, so as to avoid all the needless exp's and log's that would be incurred by logaddexp.reduce) and had exactly the same problem. I did get it to build but it was misbehaving (returning an array of the same size as the input) and I couldn't figure out quite why. I agree that the documentation is lacking, but I think it's (rightly) a low priority in the midst of the release candidate.
The key idea would be that the "linear algebra dimensions" would always be the last one(s); this is fairly easy to arrange with rollaxis when it isn't already true, would tend to reduce copying on input to LAPACK, and is what the gufunc API wants.
Would it actually reduce copying if you were using default C-ordered arrays? Maybe I'm mistaken but I thought one almost always had to copy in order to translate C to Fortran order except for a few functions that can take row-ordered stuff. Otherwise, +1 all the way. David
On Wed, Dec 23, 2009 at 10:30 AM, David Warde-Farley
On 23-Dec-09, at 10:34 AM, Anne Archibald wrote:
It's been a little while since I took a really close look at it, but I'll try to describe the problems I had. Chiefly I had problems with documentation - the only way I could figure out how to build additional gufuncs was monkey-see-monkey-do, just copying an existing one in an existing file and hoping the build system figured it out. It was also not at all clear how to, say, link to LAPACK, let alone decide based on input types which arguments to promote and how to call out to LAPACK.
I tried to create a new generalized ufunc (a logsumexp to go with logaddexp, so as to avoid all the needless exp's and log's that would be incurred by logaddexp.reduce) and had exactly the same problem. I did get it to build but it was misbehaving (returning an array of the same size as the input) and I couldn't figure out quite why. I agree that the documentation is lacking, but I think it's (rightly) a low priority in the midst of the release candidate.
Thanks Anne (and Dave): it may seem to you to be "a bit silly to dream up an API without implementing anything," but I think it's useful to get these things "on the record" so to speak, and as a person charged with being especially concerned w/ the doc, it's particularly important for me to hear when its specific deficiencies are productivity blockers...
The key idea would be that the "linear algebra dimensions" would always be the last one(s); this is fairly easy to arrange with rollaxis when it isn't already true, would tend to reduce copying on input to LAPACK, and is what the gufunc API wants.
Would it actually reduce copying if you were using default C-ordered arrays? Maybe I'm mistaken but I thought one almost always had to copy in order to translate C to Fortran order except for a few functions that can take row-ordered stuff.
Otherwise, +1 all the way.
...and of course, discussing these things here begins a dialog that can be the beginning of getting these improvements made - not necessarily by you... :-) Thanks again, for humoring me DG
On 23-Dec-09, at 2:19 PM, David Goldsmith wrote:
Thanks Anne (and Dave): it may seem to you to be "a bit silly to dream up an API without implementing anything," but I think it's useful to get these things "on the record" so to speak, and as a person charged with being especially concerned w/ the doc, it's particularly important for me to hear when its specific deficiencies are productivity blockers...
In fact, there are gufuncs in the tests that are quite instructive and would form the basis of good documentation, though not enough of them to give a complete picture of what the generalized ufunc architecture can do (I remember looking for an example of a particular supported pattern and coming up short, though I can't for the life of me remember which). The existing documentation, plus source code from the umath_tests module marked up descriptively (what all the parameters do, especially the ones which currently receive magic numbers) would probably be the way to go down the road. David
On Wed, Dec 23, 2009 at 2:26 PM, David Warde-Farley
On 23-Dec-09, at 2:19 PM, David Goldsmith wrote:
Thanks Anne (and Dave): it may seem to you to be "a bit silly to dream up an API without implementing anything," but I think it's useful to get these things "on the record" so to speak, and as a person charged with being especially concerned w/ the doc, it's particularly important for me to hear when its specific deficiencies are productivity blockers...
In fact, there are gufuncs in the tests that are quite instructive and would form the basis of good documentation, though not enough of them to give a complete picture of what the generalized ufunc architecture can do (I remember looking for an example of a particular supported pattern and coming up short,
If you came up short, how/why are you certain that the existing arch would support it?
though I can't for the life of me remember which).
The existing documentation, plus source code from the umath_tests module marked up descriptively (what all the parameters do, especially the ones which currently receive magic numbers) would probably be the way to go down the road.
David
Perfect, David! Thanks... DG
On Wed, Dec 23, 2009 at 05:30:16PM -0800, David Goldsmith wrote:
On Wed, Dec 23, 2009 at 2:26 PM, David Warde-Farley
wrote: On 23-Dec-09, at 2:19 PM, David Goldsmith wrote:
Thanks Anne (and Dave): it may seem to you to be "a bit silly to dream up an API without implementing anything," but I think it's useful to get these things "on the record" so to speak, and as a person charged with being especially concerned w/ the doc, it's particularly important for me to hear when its specific deficiencies are productivity blockers...
In fact, there are gufuncs in the tests that are quite instructive and would form the basis of good documentation, though not enough of them to give a complete picture of what the generalized ufunc architecture can do (I remember looking for an example of a particular supported pattern and coming up short,
If you came up short, how/why are you certain that the existing arch would support it?
The existing documentation made the capabilities of generalized ufuncs pretty clear, however not much is demonstrated in terms of the appropriate C API (or code generator) constructs. David
On Wed, Dec 23, 2009 at 2:26 PM, David Warde-Farley
The existing documentation, plus source code from the umath_tests module marked up descriptively (what all the parameters do, especially the ones which currently receive magic numbers) would probably be the way to go down the road.
David
Searching the Wiki, the most pertinent remaining doc is: http://docs.scipy.org/numpy/docs/numpy-docs/reference/c-api.generalized-ufun... Is this "the existing documentation" that you refer to above? If not, perhaps you can find and point me to what it refers in the Wiki? If it is, I infer from the above that, modulo "cleaning it up" a bit, it's fine, it just needs to be supplemented with some examples taken from: numpy\core\src\umath\umath_tests.c.src, correct? DG
2009/12/23 David Warde-Farley
On 23-Dec-09, at 10:34 AM, Anne Archibald wrote:
The key idea would be that the "linear algebra dimensions" would always be the last one(s); this is fairly easy to arrange with rollaxis when it isn't already true, would tend to reduce copying on input to LAPACK, and is what the gufunc API wants.
Would it actually reduce copying if you were using default C-ordered arrays? Maybe I'm mistaken but I thought one almost always had to copy in order to translate C to Fortran order except for a few functions that can take row-ordered stuff.
That's a good point. But even if you need to do a transpose, it'll be faster to transpose data in a contiguous block than data scattered all over memory. Maybe more to the point, broadcasting adds axes to the beginning, so that (say) two-dimensional arrays can act as "matrix scalars". Anne
participants (3)
-
Anne Archibald
-
David Goldsmith
-
David Warde-Farley