scipy.io.loadmat and matvec multiply
I've got a largeish array that I have saved in a .MAT file that I need to use for matvec multiply several times. It seems that if I copy the array before running the matvec I get a significant speedup. Is this known? I can attach a link to a particular .MAT file if helpful.
--
Jonathan Taylor
Dept. of Statistics
Sequoia Hall, 137
390 Serra Mall
Stanford, CA 94305
Tel: 650.723.9230
Fax: 650.725.8977
Web: http://www-stat.stanford.edu/~jtaylo
On Fri, 2017-08-25 at 23:08 -0700, Jonathan Taylor wrote:
> I've got a largeish array that I have saved in a .MAT file that I need to use for matvec multiply several times.
> It seems that if I copy the array before running the matvec I get a significant speedup. Is this known?

If you do the copy in a way such that the format of the matrix is different (e.g. a different sparse matrix format), then the speed can differ. Check print(type(original_matrix), type(copied_matrix)).
-- Pauli Virtanen
On Fri, Aug 25, 2017 at 11:39 PM, Pauli Virtanen <pav@iki.fi> wrote:
> If you do the copy in a way such that the format of the matrix is different (e.g. different sparse matrix format), then the speed can differ. Check print(type(original_matrix), type(copied_matrix)).

If it's a dense matrix, then it's also possible that the original matrix gets Fortran layout, and the copy is C layout. To test that you want: print(original_matrix.strides, copied_matrix.strides)
-n
-- Nathaniel J. Smith -- https://vorpus.org
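The layout check suggested above can be tried without the .MAT file. The following is a minimal sketch, not code from the thread: it uses a random Fortran-ordered array as a stand-in for the loaded matrix (the shape 2000 x 2500 and the float64 dtype are assumptions chosen only for illustration) and compares it with a plain copy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the loaded matrix: Fortran (column-major) layout.
# Shape and dtype are assumptions for illustration only.
original_matrix = np.asfortranarray(rng.standard_normal((2000, 2500)))

# ndarray.copy() defaults to order='C', so the copy is row-major.
copied_matrix = original_matrix.copy()

# Same type, same values, different memory layout.
print(type(original_matrix), type(copied_matrix))
print(original_matrix.strides, copied_matrix.strides)
print(original_matrix.flags['F_CONTIGUOUS'], copied_matrix.flags['C_CONTIGUOUS'])
```

With 8-byte elements, a Fortran-ordered array of shape (m, n) has strides (8, 8*m) and its C-ordered copy has strides (8*n, 8), so the strides alone show which layout each array uses.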
Yes, it's a dense 2500x2000 matrix.
Loaded strides: (8, 16000)
Copied strides: (20000, 8)
So, matvec is just slower because of strides and where numpy retrieves data? Is there a simple way to do this besides a copy? I can easily afford the copy, just wondering.

On Fri, Aug 25, 2017 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
> If it's a dense matrix, then it's also possible that the original matrix gets Fortran layout, and the copy is C layout. To test that you want: print(original_matrix.strides, copied_matrix.strides)

-- Jonathan Taylor
On Sat, Aug 26, 2017 at 12:09 AM, Jonathan Taylor <jonathan.taylor@stanford.edu> wrote:
> So, matvec is just slower because of strides and where numpy retrieves data? Is there a simple way to do this besides a copy? I can easily afford the copy, just wondering.

No, the only way to change the strides of an array with the same data is to make a copy.
Array operations will always be fastest when the smallest strides are along the axis iterated over in the inner-most (summed) loop. So the existing strides of your matrix are not sub-optimal in general, just for this specific operation. They would be suitable, for example, in a vector-matrix multiply.
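To make the stride argument concrete, here is a rough sketch (not from the thread) that times matvec and vecmat on the same values stored in both layouts. The names X_f, X_c, beta and v and the problem size are hypothetical, and the timings depend on the BLAS that NumPy is linked against, so no particular ordering is asserted. It also shows that transposing swaps the strides without copying, but swaps the axes too, so it does not turn a slow matvec on X into a fast one.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
X_f = np.asfortranarray(rng.standard_normal((2000, 2500)))  # column-major
X_c = np.ascontiguousarray(X_f)                              # row-major copy
beta = rng.standard_normal(X_f.shape[1])
v = rng.standard_normal(X_f.shape[0])

# Transposing only swaps the strides (and the axes); no data is copied.
assert X_f.T.strides == X_f.strides[::-1]
assert np.shares_memory(X_f, X_f.T)

# Same values in both layouts, so the products agree up to rounding.
assert np.allclose(X_f.dot(beta), X_c.dot(beta))
assert np.allclose(v.dot(X_f), v.dot(X_c))

cases = [
    ("matvec, F layout", lambda: X_f.dot(beta)),
    ("matvec, C layout", lambda: X_c.dot(beta)),
    ("vecmat, F layout", lambda: v.dot(X_f)),
    ("vecmat, C layout", lambda: v.dot(X_c)),
]
for name, stmt in cases:
    # Best of 3 repeats, 20 calls each, reported per call.
    seconds = min(timeit.repeat(stmt, number=20, repeat=3)) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms")
```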
Thanks for the help.

What I am actually doing is computing the gradient of a least-squares objective. That is,

    X.T.dot(X.dot(beta) - Y)

If X is such that X.dot(beta) is fast (i.e. matvec is fast), then am I missing a "simple" optimization here at the cost of a copy? Alternatively, if X is such that vecmat is fast, then what is the best way to do this? A copy seems easiest, and possibly applying the previous "simple" optimization.

Based on my understanding of the other replies, I would guess that if X2=X.copy(), then the fastest way would be

    (X2.dot(beta) - Y).dot(X)

This doesn't pan out in my example; the winner is

    X2.T.dot(X2.dot(beta) - Y)

which is about the same as

    (X2.dot(beta)-Y).dot(X2)

I made a small gist: https://gist.github.com/da7b2ef6ef109511af06a9cebbfc8ed1

One difference I see between a numpy array with the same strides and the array loaded from a MAT file is the ALIGNED flag.

On Sat, Aug 26, 2017 at 10:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
> No, the only way to change the strides of an array with the same data is to make a copy.
> Array operations will always be fastest when the smallest strides are along the axis iterated over in the inner-most (summed) loop. So the existing strides of your matrix are not sub-optimal in general, just for this specific operation. They would be suitable, for example, in a vector-matrix multiply.

-- Jonathan Taylor
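The gradient expressions compared above all compute the same vector, since for a 1-D residual r the product X.T.dot(r) equals r.dot(X). The sketch below checks this on random data; the sizes n and p and the random inputs are hypothetical, and it says nothing about which form is fastest on a given machine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 150                                   # hypothetical sizes
X = np.asfortranarray(rng.standard_normal((n, p)))  # stand-in for the loaded array
X2 = X.copy()                                     # C-ordered copy
beta = rng.standard_normal(p)
Y = rng.standard_normal(n)

resid = X2.dot(beta) - Y                          # matvec on the C-ordered copy

grad_matvec = X2.T.dot(resid)                     # X^T (X beta - Y)
grad_vecmat = resid.dot(X2)                       # (X beta - Y)^T X, same copy
grad_mixed = resid.dot(X)                         # vecmat on the original F-ordered array

assert np.allclose(grad_matvec, grad_vecmat)
assert np.allclose(grad_matvec, grad_mixed)
print(grad_matvec.shape)
```

Note that X2.T is a Fortran-ordered view of X2, so X2.T.dot(resid) and resid.dot(X2) read the same memory in the same order, which is consistent with the two timing about the same.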
On Sat, Aug 26, 2017 at 12:09 AM, Jonathan Taylor <jonathan.taylor@stanford.edu> wrote:
> Yes, it's a dense 2500x2000 matrix.
> Loaded strides: (8, 16000)
> Copied strides: (20000, 8)
> So, matvec is just slower because of strides and where numpy retrieves data? Is there a simple way to do this besides a copy? I can easily afford the copy, just wondering.

It's not simpler, but the most efficient and idiomatic way to ensure C-contiguity is to use np.ascontiguousarray(). This will make a copy only if necessary.
-- Robert Kern
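A small sketch of the "copy only if necessary" behaviour of np.ascontiguousarray(), using a tiny hypothetical array rather than the matrix from the thread:

```python
import numpy as np

# Tiny Fortran-ordered array as a stand-in for the loaded matrix.
X_f = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))

X_c = np.ascontiguousarray(X_f)      # layout differs, so the data is copied
X_again = np.ascontiguousarray(X_c)  # already C-contiguous, so nothing is copied

print(X_f.strides, X_c.strides)
print(np.shares_memory(X_f, X_c))      # copy: the buffers are distinct
print(np.shares_memory(X_c, X_again))  # no copy: same buffer
```

Calling it unconditionally at load time is therefore cheap when the array already has C layout.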
Thanks for all the help.

That said, I'm not sure it is an issue of the strides. I can easily recreate the slowdown as in the above gist ( https://gist.github.com/da7b2ef6ef109511af06a9cebbfc8ed1 ). Also, modifying the flags of a user-created ndarray so they agree with the loaded one is still noticeably faster than using the array from `scipy.io.loadmat`.

For my purposes, a copy is just fine, but I think this might be an issue that could be looked into. Perhaps I should file an issue on github?

On Sat, Aug 26, 2017 at 11:06 AM, Robert Kern <robert.kern@gmail.com> wrote:
> It's not simpler, but the most efficient and idiomatic way to ensure C-contiguity is to use np.ascontiguousarray(). This will make a copy only if necessary.

-- Jonathan Taylor
On Mon, Aug 28, 2017 at 10:13 AM, Jonathan Taylor <jonathan.taylor@stanford.edu> wrote:
> Thanks for all the help.
> That said, I'm not sure it is an issue of the strides. I can easily recreate the slowdown as in the above gist ( https://gist.github.com/da7b2ef6ef109511af06a9cebbfc8ed1 ). Also, modifying the flags of a user-created ndarray so they agree with the loaded one is still noticeably faster than using the array from `scipy.io.loadmat`

Ah yeah, if the data is aligned, then that might end up faster. Your optimized BLAS will be able to use certain CPU instructions that require aligned data. Setting the ALIGNED flag to false won't actually make the data unaligned; your BLAS checks the data itself, not the numpy flag.

> For my purposes, a copy is just fine, but I think this might be an issue that could be looked into. Perhaps I should file an issue on github?

It might be worth checking if scipy.io.loadmat() can be made to ensure that it always creates aligned arrays. There isn't anything to be done about .dot(), though.
-- Robert Kern
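The distinction between the ALIGNED flag and the actual data address can be illustrated with a small sketch. Nothing here comes from the gist: the helper address_mod and the way the misaligned array is built (a float64 view starting one byte past an 8-byte boundary inside a byte buffer) are assumptions made purely for illustration.

```python
import numpy as np

def address_mod(arr, boundary):
    """Return the address of the array's first element modulo `boundary` bytes."""
    return arr.ctypes.data % boundary

n = 1000
raw = np.zeros(n * 8 + 8, dtype=np.uint8)

# Choose a start offset so the float64 view begins 1 byte past an 8-byte boundary.
offset = (1 - raw.ctypes.data) % 8
unaligned = raw[offset:offset + n * 8].view(np.float64)
unaligned[:] = np.arange(n)

# np.require copies only when a requested flag is not already satisfied.
aligned = np.require(unaligned, requirements=['ALIGNED'])

# Clearing the flag by hand does not move the data.
flagged = aligned.copy()
address_before = flagged.ctypes.data
flagged.setflags(align=False)

print(unaligned.flags['ALIGNED'], address_mod(unaligned, 8))
print(aligned.flags['ALIGNED'], address_mod(aligned, 8), address_mod(aligned, 64))
print(flagged.flags['ALIGNED'], flagged.ctypes.data == address_before)
```

NumPy's ALIGNED flag reflects the alignment required by the dtype, while vectorized BLAS kernels may prefer wider boundaries such as 16, 32 or 64 bytes, so checking the address directly can be more informative than the flag.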
I tried reproducing the results from your gist using scipy 0.19.0, and I found that Xmat is aligned after loadmat:

In [3]: Xmat = sio.loadmat('data.mat')['X']
In [4]: Xmat.flags
Out[4]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

On Mon, Aug 28, 2017 at 1:26 PM, Robert Kern <robert.kern@gmail.com> wrote:
> Ah yeah, if the data is aligned, then that might end up faster. Your optimized BLAS will be able to use certain CPU instructions that require aligned data. Setting the ALIGNED flag to false won't actually make the data unaligned; your BLAS checks the data itself, not the numpy flag.
> It might be worth checking if scipy.io.loadmat() can be made to ensure that it always creates aligned arrays. There isn't anything to be done about .dot(), though.
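For anyone wanting to compare flags across versions without the original data.mat, the check can be sketched as a round trip through an in-memory buffer. The array contents and size are hypothetical stand-ins, and the flags and address printed will depend on the scipy and numpy versions in use, so no particular values are claimed.

```python
import io
import numpy as np
import scipy
import scipy.io as sio

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 30))   # small hypothetical stand-in for the real matrix

# Write a .MAT file into memory and read it back.
buf = io.BytesIO()
sio.savemat(buf, {'X': X})
buf.seek(0)
Xmat = sio.loadmat(buf)['X']

# Report versions alongside layout and alignment, since results may differ.
print(scipy.__version__, np.__version__)
print(Xmat.flags)
print(Xmat.strides, Xmat.ctypes.data % 64)
```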
Hmm... I am running scipy 0.19.1 (added that line to gist to be sure). When it loaded as aligned, was the timing comparable to `Xnp`?

Just to be sure, I created a new conda ipython environment and installed scipy -- version is 0.19.1. Flags are still the same.

The docs for scipy.io.loadmat don't seem to have an aligned option -- will ping Matthew Brett....

On Mon, Aug 28, 2017 at 10:35 AM, CJ Carey <perimosocordiae@gmail.com> wrote:
> I tried reproducing the results from your gist using scipy 0.19.0, and I found that Xmat is aligned after loadmat:
> In [3]: Xmat = sio.loadmat('data.mat')['X']
> In [4]: Xmat.flags
> Out[4]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False

-- Jonathan Taylor
Participants: CJ Carey, Jonathan Taylor, Nathaniel Smith, Pauli Virtanen, Robert Kern, Stephan Hoyer