Feature proposal: weighted gram matrix / sandwich product
Dear numpy community,

I would like to propose a new feature in np.linalg to compute the weighted Gram matrix (sandwich product), see https://github.com/numpy/numpy/issues/29559.

Proposed new feature or change:
The solvers for many important statistical models like Generalized Linear Models often need to compute `X.T @ W @ X`, where `X` is a 2-dimensional array of features (features in columns, observations in rows) and `W` is very often a diagonal array of (current) weights. This computation is usually the main computational bottleneck. It would be great if numpy could provide an efficient implementation of it, e.g. `np.linalg.sandwich_product(X, weight=w)`.

Why numpy? Getting this into BLAS implementations seems even more unrealistic to me. Numpy has support for SIMD via Highway.

Computational alternatives

Drawbacks of `(X.T * diag(W)) @ X`:
* Additional memory allocation to compute `X.T * diag(W)`, same size as the (usually large) `X`.
* The result is symmetric, but this fact is not used, so a speedup of at least a factor of 2 is possible. Note that without weights, `X.T @ X` uses BLAS syrk. But the weights are crucial.

Drawbacks of `Z = np.sqrt(diag(W))[:, None] * X` followed by `Z.T @ Z`:
* Additional memory allocation for `Z`, same size as the (usually large) `X`.
* Taking square roots of the possibly large `diag(W)`.

Additional information
https://github.com/Quantco/tabmat has an implementation of it with XSIMD.

Best
Christian Lorentzen
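For reference, the two alternatives discussed above can be written out like this (a small self-contained sketch; the variable names are illustrative, not a proposed API):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1_000, 5
X = rng.standard_normal((n, p))
w = rng.uniform(0.1, 2.0, size=n)   # positive diagonal weights as a vector

# Alternative 1: scale the columns of X.T (i.e. each observation) by its
# weight. Allocates a p x n temporary, the same size as X.
xtwx1 = (X.T * w) @ X

# Alternative 2: fold sqrt(w) into X and use a plain Gram product.
# Also allocates an n x p temporary, and requires w >= 0.
Z = np.sqrt(w)[:, None] * X
xtwx2 = Z.T @ Z
```

Both agree with the explicit `X.T @ np.diag(w) @ X`, but each materializes a dense temporary the size of `X`, which is the cost the proposal wants to avoid.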
Why not:

XTWX = (X.T * weights[None, :]) @ X

if `weights` is a vector? Allocating `diag(weights)` is the big memory and CPU hog, at least for my generally long and skinny X matrices.

-Kevin

On Fri, Aug 15, 2025 at 10:08 AM Christian Lorentzen via NumPy-Discussion <numpy-discussion@python.org> wrote:
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: jacobs@bioinformed.com
If W is just a vector, i.e. my main use case, then what you write is what I wanted to imply and is actually how I implemented it in several cases. The point is that `X.T * weights[None, :]` still doubles memory (same size as X), and this solution doesn't exploit the symmetry.

Best
Christian

On 17.08.2025 22:13, Kevin Jacobs wrote:
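One way to bound the temporary (without yet exploiting symmetry) is to accumulate the product over row blocks. This is just an illustrative sketch assuming `w` is a vector of diagonal weights, not the proposed numpy implementation:

```python
import numpy as np

def sandwich_chunked(X, w, chunk=4096):
    """Compute X.T @ diag(w) @ X by accumulating over row blocks.

    The weighted temporary is only p x chunk instead of p x n, so the
    extra memory no longer scales with the number of observations.
    (Illustrative sketch, not a proposed API.)
    """
    n, p = X.shape
    out = np.zeros((p, p), dtype=np.result_type(X.dtype, w.dtype))
    for start in range(0, n, chunk):
        Xb = X[start:start + chunk]        # block of rows (observations)
        wb = w[start:start + chunk]
        out += (Xb.T * wb) @ Xb            # temporary is p x chunk
    return out
```

For long and skinny `X` (large n, small p) this trades one big allocation for many small BLAS calls; a compiled implementation could fuse the weighting into the matmul kernel instead.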
This is a good place for numba if performance and memory use are a concern.

Good NumPy: 125 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Naive numba (memory efficient): 386 ms ± 1.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Parallel numba (memory efficient, cpu bounded): 46.8 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Here is a gist that produced the timings: https://gist.github.com/bashtage/39c50b223580b9bc3f763a3b1be3e478

All were on a 12 core Ryzen.

Kevin

On Mon, Aug 18, 2025 at 12:35 PM Christian Lorentzen via NumPy-Discussion <numpy-discussion@python.org> wrote:
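To illustrate the factor-of-2 saving from symmetry mentioned earlier (this is not the code from the gist, just a naive sketch; a compiled numba or SIMD implementation would block and vectorize these loops):

```python
import numpy as np

def sandwich_upper(X, w):
    """Weighted Gram matrix computing only the upper triangle, then
    mirroring it. Roughly half the multiply-adds of the full product.
    Naive sketch for illustration, not the numba code from the gist.
    """
    n, p = X.shape
    out = np.empty((p, p), dtype=np.result_type(X.dtype, w.dtype))
    for i in range(p):
        wxi = w * X[:, i]            # one length-n temporary per column
        for j in range(i, p):        # upper triangle only
            out[i, j] = wxi @ X[:, j]
            out[j, i] = out[i, j]    # mirror into the lower triangle
    return out
```

The loop count is p * (p + 1) / 2 dot products instead of p * p, which is where the factor-of-2 comes from; the remaining per-column temporary is length n, not n x p.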
participants (3)
- Christian Lorentzen
- Kevin Jacobs
- Kevin Sheppard