New matvec and vecmat functions
Hi All,

I have a PR [1] that adds `np.matvec` and `np.vecmat` gufuncs for matrix-vector and vector-matrix calculations, to add to plain matrix-matrix multiplication with `np.matmul` and the inner vector product with `np.vecdot`. They call BLAS where possible for speed. I'd like to hear whether these are good additions.

I also note that for complex numbers, `vecmat` is defined as `x†A`, i.e., the complex conjugate of the vector is taken. This seems to be the standard and is what we used for `vecdot` too (`x†x`). However, it is *not* what `matmul` does for vector-matrix or indeed vector-vector products (remember that those are possible only if the vector is one-dimensional, i.e., not with a stack of vectors). I think this is a bug in matmul, which I'm happy to fix. But I'm posting here in part to get feedback on that.

Thanks!

Marten

[1] https://github.com/numpy/numpy/pull/25675

p.s. Separately, with these functions available, in principle these could be used in `__matmul__` (and thus for `@`) and the specializations in `np.matmul` removed. But that can be a separate PR (if it is wanted at all).
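For orientation, a rough sketch of how the four operations line up on stacked operands (shapes only; np.vecdot requires NumPy 2.0+, and np.matvec/np.vecmat require a NumPy new enough to include the PR):

import numpy as np

A = np.zeros((5, 3, 4))   # stack of five 3x4 matrices
B = np.zeros((5, 4, 2))   # stack of five 4x2 matrices
x = np.zeros((5, 4))      # stack of five 4-vectors
y = np.zeros((5, 3))      # stack of five 3-vectors

np.matmul(A, B).shape     # (5, 3, 2): matrix-matrix product
np.matvec(A, x).shape     # (5, 3):    matrix-vector product
np.vecmat(y, A).shape     # (5, 4):    conjugated vector times matrix, y† A
np.vecdot(x, x).shape     # (5,):      conjugated inner product, x† x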
For the dot product I can convince myself this is a math-definition thing and accept the conjugation. But for "vecmat", why the complex conjugate of the vector? Are we assuming that 1-D things are always columns? I am also a bit lost on the difference between dot, vdot, and vecdot.

Also, if __matmul__ and np.matmul give different results, I think you will enjoy many fun tickets. Personally I would make them agree, no matter what the reasoning was at the time of divergence.
For vecmat, the logic is indeed that 1-D vectors are always columns, so one needs to transpose and, for complex values, also conjugate.

On the differences between dot, vdot, and vecdot: dot is basically weird, enough so for matmul to be added. vdot flattens the arrays before calculating x†x, while vecdot uses the last axis (or an explicitly given one).

On matmul: indeed, we should ensure __matmul__ and np.matmul give identical results. What I meant was that np.matmul doesn't necessarily have to do the same special-casing for vectors; it could just deal with matrices only, and let __matmul__ call vecdot, vecmat, matvec, or matmul as appropriate. Anyway, somewhat orthogonal to the discussion here.

-- Marten
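For concreteness, a minimal sketch of the differences described above (illustrative arrays; np.vecdot requires NumPy 2.0 or later):

import numpy as np

x = np.array([[1 + 2j, 3j], [2, 1 - 1j]])
y = np.array([[1j, 2], [3, 4 + 1j]])

np.vdot(x, y)    # flattens both arrays and conjugates x: sum(conj(x.ravel()) * y.ravel())
np.vecdot(x, y)  # conjugates x and contracts over the last axis -> shape (2,)
np.dot(x, y)     # plain matrix product for the 2-D inputs, no conjugation -> shape (2, 2)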
On Tue, 23 Jan 2024 at 22:18, Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
I also note that for complex numbers, `vecmat` is defined as `x†A`, i.e., the complex conjugate of the vector is taken. This seems to be the standard and is what we used for `vecdot` too (`x†x`). However, it is *not* what `matmul` does for vector-matrix or indeed vector-vector products (remember that those are possible only if the vector is one-dimensional, i.e., not with a stack of vectors). I think this is a bug in matmul, which I'm happy to fix. But I'm posting here in part to get feedback on that.
Does matmul not mean "matrix multiplication"? There is no conjugation in matrix multiplication. Have I misunderstood what matmul is supposed to be? (If it does mean anything other than matrix multiplication then it is an extremely poor choice of name.)

-- Oscar
Right now, there is this weird special-casing of 1-D arguments, which are treated as vectors that are promoted to matrices [1]. I agree that it is illogical to have, which is why I had the postscript about removing it from np.matmul, and instead letting `@` call vecdot, vecmat, matvec, or matmul as appropriate.

-- Marten

[1] From the docstring of np.matmul:
"""
... The behavior depends on the arguments in the following way.
- If both arguments are 2-D they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
"""

Note that the description is consistent with the current behaviour, but I don't think it is correct (certainly for two 1-D arguments), and my suggestion corresponds to changing the third item to:

- If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions and taking the conjugate if complex. After matrix multiplication the prepended 1 is removed.
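A small illustration of the current, non-conjugating special case (illustrative values):

import numpy as np

x = np.array([1j, 2.0])
A = np.eye(2)

x @ A          # array([0.+1.j, 2.+0.j]): x is used as a row vector, *not* conjugated
x @ x          # (3+0j): likewise no conjugation in the vector-vector case
np.vdot(x, x)  # (5+0j): the conjugating inner product, for comparison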
I can understand the desire to generalise the idea of matrix multiplication for when the arrays are not both 2-D, but taking the complex conjugate makes absolutely no sense in the context of matrix multiplication.

You note above that "vecmat is defined as x†A", but my interpretation of that is that vecmat(x, A) == matmul(conjugate(transpose(x)), A). If you want to define vecmat like that then maybe that makes sense in some contexts, but including the conjugate as an implicit part of matmul is something that I would find very confusing: such a function should not be called matmul.

-- Oscar
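In code, the equivalence described above would read roughly as follows (a sketch; np.vecmat requires a NumPy new enough to include the PR, and for a 1-D x the transpose is a no-op):

import numpy as np

x = np.array([1j, 2.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

np.vecmat(x, A)  # x† A, with the vector conjugated
np.conj(x) @ A   # the same result, spelled out via matmul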
Ah, that's indeed fair. So I'll drop the idea of changing what the special-casing of 1-D arrays does in matmul. The options should just be to keep things as they are, or to remove that ability altogether. I'd personally tend to the latter.

-- Marten
Why do these belong in NumPy? What is the broad field of application of these functions? And, does a more general concept underpin them?

Thanks,
Alan Isaac
Multiplication of a matrix with a vector is about as common as matrix with matrix or vector with vector, and not currently easy to do for stacks of vectors, so I think the case for matvec is similarly strong to that for matmul and vecdot. Arguably, vecmat is slightly less common, though it completes the quad.

-- Marten
FWIW, +1 for matvec & vecmat to complement matmat (erm, matmul). Having a binop where one argument is a matrix and the other is a stack/batch of vectors is indeed awkward otherwise, and a dedicated function to clearly distinguish "two matrices" from "a matrix and a batch of vectors" sounds great from a user perspective. As to vecmat doing hermitian conjugation of the vector argument --- I'd be in favor (because <\psi | \hat H | \psi>) but this is a weak preference. ср, 24 янв. 2024 г., 21:28 Marten van Kerkwijk <mhvk@astro.utoronto.ca>:
Why do these belong in NumPy? What is the broad field of application of these functions? And, does a more general concept underpin them?
Multiplication of a matrix with a vector is about as common as matrix with matrix or vector with vector, and not currently easy to do for stacks of vectors, so I think the case for matvec is similarly strong as that for matmul and vecdot.
Arguably, vecmat is slightly less common, though completes the quad.
-- Marten
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: evgeny.burovskiy@gmail.com
FWIW, +1 for matvec & vecmat to complement matmat (erm, matmul). Having a binop where one argument is a matrix and the other is a stack/batch of vectors is indeed awkward otherwise, and a dedicated function to clearly distinguish "two matrices" from "a matrix and a batch of vectors" sounds great from a user perspective.
As to vecmat doing Hermitian conjugation of the vector argument: I'd be in favor (because ⟨ψ| Ĥ |ψ⟩), but this is a weak preference.
Indeed, this is what Wikipedia calls the "physics convention" [1], for both the vector dot product <x, y> = x†y and the vector-matrix product x†A († = conjugate transpose).

-- Marten

[1] https://en.wikipedia.org/wiki/Sesquilinear_form
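As a sketch of that convention in use, the bra-ket expectation value mentioned above could be written with the new gufuncs roughly as follows (assuming a NumPy that includes np.matvec and np.vecdot):

import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = H + H.conj().T                          # make H Hermitian
psi = rng.normal(size=(10, 4)) + 1j * rng.normal(size=(10, 4))   # batch of 10 state vectors

expect = np.vecdot(psi, np.matvec(H, psi))  # <psi| H |psi> for each state -> shape (10,)
np.allclose(expect.imag, 0)                 # True: expectation values of a Hermitian H are real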
On Wed, 24 Jan 2024 at 19:29, Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
Why do these belong in NumPy? What is the broad field of application of these functions? And, does a more general concept underpin them?
Multiplication of a matrix with a vector is about as common as matrix with matrix or vector with vector, and not currently easy to do for stacks of vectors, so I think the case for matvec is similarly strong as that for matmul and vecdot.
If either argument is conjugated then I would not describe this simply as "multiplication". A different terminology would be better (and perhaps different function names).

-- Oscar
On Wed, Jan 24, 2024 at 2:27 PM Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
Why do these belong in NumPy? What is the broad field of application of these functions? And, does a more general concept underpin them?
Multiplication of a matrix with a vector is about as common as matrix with matrix or vector with vector, and not currently easy to do for stacks of vectors, so I think the case for matvec is similarly strong as that for matmul and vecdot.
Arguably, vecmat is slightly less common, though completes the quad.
-- Marten
Could you please offer some code or math notation to help communicate this? I am forced to guess at the need.

The words "matrix" and "vector" are ambiguous. After all, matrices (of given shape) are a type of vector (i.e., can be added and scaled). So if by "matrix" you mean "2d array" and by "stack of vectors" you effectively mean "2d array", this sounds like a use for np.dot (possibly after a transpose). However, I am going to guess that here by "vector" you actually mean a matrix (i.e., a 2d array) with only one row or only one column, so a "stack" of them is actually 3d. Perhaps the needless dimension is then the real problem and can either not be produced or can be squeezed away.

Thanks,
Alan Isaac
A stack of matrices in this context is an ndarray in which the last two dimensions are interpreted as the rows and columns of matrices (second-to-last = rows, last = columns); a stack of vectors is an ndarray in which the last dimension is interpreted as column vectors. (Well, the axes in all these functions can be chosen arbitrarily, but those are the defaults.)

So a simple example for matvec would be a rotation matrix that I'd like to apply to a large number of vectors (say, points in space); this is currently not easy. Complex vectors might be Fourier components. (Note that I was rather sloppy in calling it multiplication rather than using the term vector-matrix product, etc.; it is definitely not element-wise!)

The vector-matrix product comes up a bit less, but as mentioned by Evgeni, in physics there is, e.g., the bra-ket stuff with Hamiltonians (⟨ψ| Ĥ |ψ⟩), and in linear least squares one often gets terms which for real data are written as Xᵀ Z, but for complex data would be Xᴴ Z [1].

Hope this clarifies things!

-- Marten

[1] https://en.wikipedia.org/wiki/Linear_least_squares
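The rotation example might look roughly like this (a sketch; np.matvec requires a NumPy new enough to include the PR):

import numpy as np

theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
points = np.random.default_rng(42).random((1000, 2))   # stack of 1000 2-D vectors

rotated = np.matvec(rot, points)                        # shape (1000, 2)

# Without matvec the axes have to be massaged by hand, e.g.:
also_rotated = (rot @ points[..., np.newaxis])[..., 0]  # or: points @ rot.T
np.allclose(rotated, also_rotated)                      # True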
Thanks, but I'm still confused about the need. Perhaps the following illustrates why. With respect to the example you offered, what am I missing?

Alan Isaac

import numpy as np

rng = np.random.default_rng()
rot = np.array([[0, -1], [1, 0]])  # a rotation matrix

print("first type of stack of vectors:")
vecs1 = rng.random((2, 10))
print(vecs1.shape == (2, 10))
result1 = rot.dot(vecs1)

print("second type of stack of vectors:")
rng = np.random.default_rng(314)
vecs2 = vecs1.T
result2 = rot.dot(vecs2.T)
print(f"same result2? {(result1 == result2).all()}")

print("third type of stack of vectors:")
vecs3 = np.reshape(vecs2, (10, 2, 1))
result3 = rot.dot(vecs3.squeeze().T)
print(f"same result3? {(result1 == result3).all()}")

print("fourth type of stack of vectors:")
vecs4 = np.reshape(vecs2, (10, 1, 2))
result4 = rot.dot(vecs4.squeeze().T)
print(f"same result4? {(result1 == result4).all()}")
Hi Alan,

The problem with .dot is not that it is not possible, but more that it is not obvious exactly what will happen, given the overloading of multiple use cases; indeed, this is why `np.matmul` was created. For the stacks-of-vectors case in particular, it is surprising that the vector dimension is the one-but-last rather than the last dimension (at least if the total dimension is more than 2). The idea here is to have a function that does just one obvious thing (sadly, for matmul we didn't quite stick to that...).

All the best,

Marten
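A small sketch of the surprise with np.dot mentioned above (illustrative shapes):

import numpy as np

rot = np.eye(2)
stack = np.ones((10, 2, 3))   # a 3-D array

np.dot(rot, stack).shape      # (2, 10, 3): dot contracts with the *second-to-last* axis of stack
np.matmul(rot, stack).shape   # (10, 2, 3): matmul treats the last two axes as matrices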
I tend to agree with not using the complex conjugate for vecmat, but would prefer having separate functions for that that make it explicit in the name. I also note that mathematicians use sesquilinear forms, which have the vector conjugate on the other side, so there are different conventions. I prefer the Dirac convention myself, but many mathematical methods texts use the opposite. It is tricky for the teacher in introductory courses, right up there with vectors being called contravariant when they are actually covariant (the coefficients are contravariant).

Anyway, I think having the convention explicit in the name will avoid confusion.

Chuck
Hi Chuck,

Indeed, which argument to conjugate seems not generally agreed on -- I definitely found Wikipedia pages using both conventions! But I think numpy made the choice with vdot and vecdot, so that seems mostly a matter of documentation. I'm slightly confused whether or not you agree that the complex conjugate is useful, but I can see your point of a name that makes it clear. Or perhaps one could have a wrapper function that lets one switch between conjugating the vector or not?

One thing perhaps worth mentioning is that if we decide that vecmat does conjugate, the non-conjugating version can still trivially be done by a transpose, using matvec(matrix.mT, vector) [1], while if vecmat does not conjugate by default, getting the conjugate version using the "obvious" vecmat(vector.conj(), matrix) implies a potentially costly copy of the whole vector array. (This can be avoided using the much less obvious np.vecdot(vec[..., np.newaxis, :], mat.mT), but I'm not sure that one really wants to suggest that...)

More generally, in my own experience working with complex matrices and vectors, just about any time one needs a transpose, a conjugate is generally needed too.

All the best,

Marten

[1] Indeed, the implementation of vecmat uses the matvec loops with the matrix transposed for all but the complex numbers.
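The equivalences mentioned above, written out as a rough sketch (np.vecmat/np.matvec require a NumPy new enough to include the PR; the .mT attribute requires NumPy 2.0+):

import numpy as np

vec = np.array([1 + 1j, 2 - 1j])
mat = np.array([[1 + 0j, 2], [3, 4 + 1j]])

conjugating     = np.vecmat(vec, mat)          # x† A
non_conjugating = np.matvec(mat.mT, vec)       # xᵀ A, via the transposed matrix
np.allclose(conjugating, np.conj(vec) @ mat)   # True
np.allclose(non_conjugating, vec @ mat)        # True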
Participants (6):
- Alan Isaac
- Charles R Harris
- Evgeni Burovski
- Ilhan Polat
- Marten van Kerkwijk
- Oscar Benjamin