question: OLS with sparse design and dense parameter
(sunday night quest for another sparse recipe) This is a sparse algorithm question: scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case? suppose I want to do estimate a standard linear model by OLS y = x * b, find b = argmin(y - x*b) x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, but we don't have a small number of dominant singular values, the solution of the linear system will be dense. the basic parts of my minimum example nobsi, n_groups, k_vars = 10, 2000, 3 #nobsi, n_groups, k_vars = 2, 3, 3 xi = np.arange(nobsi)[:,None]**np.arange(3) #vander x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr() (balanced kron structure is just to make a quick example) x_sp.shape (20000, 6000) first try xpx = x_sp.T.dot(x_sp) # sparse xpy = x_sp.T.dot(y) # dense p_dense = np.linalg.solve(xpx.toarray(), xpy) p_sp1 = sparse.linalg.spsolve(xpx, xpy) p_sp2 = sparse.linalg.lsmr(x_sp, y)[0] timing: 15.6659998894 0.00500011444092 0.00600004196167 Is this too small to worry, and anything works, or is there a recommendation? definitely don't use dense. Thanks, Josef
I guess the classic solution for ols are sparse cholesky or sparse qr. scikits.sparse has a cholmod wrapper, and could be modified to have a suitesparseqr wrapper relatively easily. On 28 Apr 2014 03:31, <josef.pktd@gmail.com> wrote:
(sunday night quest for another sparse recipe)
This is a sparse algorithm question:
scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case?
suppose I want to do estimate a standard linear model by OLS y = x * b, find b = argmin(y - x*b)
x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, but we don't have a small number of dominant singular values, the solution of the linear system will be dense.
the basic parts of my minimum example
nobsi, n_groups, k_vars = 10, 2000, 3
#nobsi, n_groups, k_vars = 2, 3, 3
xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
(balanced kron structure is just to make a quick example)
x_sp.shape
(20000, 6000)
first try
xpx = x_sp.T.dot(x_sp) # sparse
xpy = x_sp.T.dot(y) # dense
p_dense = np.linalg.solve(xpx.toarray(), xpy)
p_sp1 = sparse.linalg.spsolve(xpx, xpy)
p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
timing: 15.6659998894 0.00500011444092 0.00600004196167
Is this too small to worry, and anything works, or is there a recommendation?
definitely don't use dense.
Thanks,
Josef
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
I needed SuiteSparseQR for my work and added the backslash function of it to scikits.sparse. The modification is available here: https://github.com/duga3/scikits-sparse 2014-04-28 8:42 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
I guess the classic solution for ols are sparse cholesky or sparse qr. scikits.sparse has a cholmod wrapper, and could be modified to have a suitesparseqr wrapper relatively easily.
On 28 Apr 2014 03:31, <josef.pktd@gmail.com> wrote:
(sunday night quest for another sparse recipe)
This is a sparse algorithm question:
scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case?
suppose I want to do estimate a standard linear model by OLS y = x * b, find b = argmin(y - x*b)
x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, but we don't have a small number of dominant singular values, the solution of the linear system will be dense.
the basic parts of my minimum example
nobsi, n_groups, k_vars = 10, 2000, 3
#nobsi, n_groups, k_vars = 2, 3, 3
xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
(balanced kron structure is just to make a quick example)
x_sp.shape
(20000, 6000)
first try
xpx = x_sp.T.dot(x_sp) # sparse
xpy = x_sp.T.dot(y) # dense
p_dense = np.linalg.solve(xpx.toarray(), xpy)
p_sp1 = sparse.linalg.spsolve(xpx, xpy)
p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
timing: 15.6659998894 0.00500011444092 0.00600004196167
Is this too small to worry, and anything works, or is there a recommendation?
definitely don't use dense.
Thanks,
Josef
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Mon, Apr 28, 2014 at 2:45 AM, Severin Holzer-Graf <sholzergr@gmail.com> wrote:
I needed SuiteSparseQR for my work and added the backslash function of it to scikits.sparse. The modification is available here: https://github.com/duga3/scikits-sparse
2014-04-28 8:42 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
I guess the classic solution for ols are sparse cholesky or sparse qr. scikits.sparse has a cholmod wrapper, and could be modified to have a suitesparseqr wrapper relatively easily.
On 28 Apr 2014 03:31, <josef.pktd@gmail.com> wrote:
(sunday night quest for another sparse recipe)
This is a sparse algorithm question:
scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case?
suppose I want to do estimate a standard linear model by OLS y = x * b, find b = argmin(y - x*b)
x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, but we don't have a small number of dominant singular values, the solution of the linear system will be dense.
the basic parts of my minimum example
nobsi, n_groups, k_vars = 10, 2000, 3
#nobsi, n_groups, k_vars = 2, 3, 3
xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
(balanced kron structure is just to make a quick example)
x_sp.shape
(20000, 6000)
first try
xpx = x_sp.T.dot(x_sp) # sparse
xpy = x_sp.T.dot(y) # dense
p_dense = np.linalg.solve(xpx.toarray(), xpy)
p_sp1 = sparse.linalg.spsolve(xpx, xpy)
p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
timing: 15.6659998894 0.00500011444092 0.00600004196167
Is this too small to worry, and anything works, or is there a recommendation?
definitely don't use dense.
Thanks,
Josef
Thanks for the answers, I would like to stick to scipy for now. I don't know if sparse usage in statsmodels will get heavy enough to require additional optional dependencies. At least not any time soon. scipy.sparse has several solvers for linear systems and one of them should work well enough for my purposes (traditional statistics/econometrics instead of "Big Data"), for now. Josef
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
participants (3)
-
josef.pktd@gmail.com -
Nathaniel Smith -
Severin Holzer-Graf