when did column_stack become C-contiguous?

>>> np.__version__
'1.9.2rc1'
On my notebook, which has numpy 1.6.1, it is f_contiguous. I was just trying to optimize a loop over variable adjustment in regression, and found out that we lost Fortran contiguity.

I always thought column_stack is for Fortran usage (linalg). What's the alternative? column_stack was one of my favorite commands, and I always assumed we have in statsmodels the right memory layout to call the linalg libraries. ("assumed" means we don't have timing nor unit tests for it.)

Josef
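A quick way to see the behavior being discussed is to inspect the flags of the stacked result directly. This is a minimal sketch (the arrays are made up): on the NumPy versions Josef mentions, the C/F flags of the column_stack output differ, while an explicit Fortran-ordered copy gives the old layout on any version.

```python
import numpy as np

# Inspect the memory layout that np.column_stack returns for 1-d inputs.
a = np.column_stack((np.ones(5), np.arange(5.0)))
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])

# An explicit Fortran-ordered copy restores F-contiguity regardless
# of what column_stack itself returns.
af = np.asfortranarray(a)
print(af.flags['F_CONTIGUOUS'])  # True
```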

On Mon, Oct 19, 2015 at 12:35 AM, <josef.pktd@gmail.com> wrote:
What's the difference between using array and column_stack, except for a transpose and memory order? My current usecase is copying columns on top of each other:

#exog0 = np.column_stack((np.ones(nobs), x0, x0s2))
exog0 = np.array((np.ones(nobs), x0, x0s2)).T
exog_opt = exog0.copy(order='F')

The following part is in a loop, followed by some linear algebra for OLS; res_optim is a scalar parameter:

exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)

Are my assumptions on memory access correct, or is there a better way? (I have quite a bit of code in statsmodels that is optimized for Fortran-ordered memory layout, especially for sequential regression, under the assumption that column_stack provides that Fortran order.)

Also, do I need to start timing and memory benchmarking, or is it obvious that a loop

for k in range(maxi):
    x = arr[:, :k]
    <calculate>

depends on memory order?

Josef
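The pattern described in the message above can be sketched end to end. This is an illustrative reconstruction, not statsmodels code: nobs, x0, x0s2 and res_optim are invented stand-ins, and lstsq stands in for whatever OLS routine sits inside the loop.

```python
import numpy as np

# Hypothetical data standing in for the regression example above.
nobs = 100
rng = np.random.default_rng(0)
x0 = rng.standard_normal(nobs)
x0s2 = x0 ** 2
res_optim = 0.1

# Build the design matrix, then force a Fortran-ordered copy so that
# column slices exog_opt[:, :k] are contiguous blocks of memory.
exog0 = np.array((np.ones(nobs), x0, x0s2)).T
exog_opt = exog0.copy(order='F')
assert exog_opt.flags['F_CONTIGUOUS']

for k in range(exog0.shape[1]):
    # Overwrite the last column in place, as in the original loop,
    # then run some linear algebra for OLS on the F-ordered array.
    exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)
    beta, *_ = np.linalg.lstsq(exog_opt, x0, rcond=None)
```

The in-place column assignment avoids reallocating the design matrix on every iteration, which is the point of taking the Fortran-ordered copy once up front.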

Looking at the git logs, column_stack appears to have been that way (creating a new array with concatenate) since at least NumPy 0.9.2, way back in January 2006: https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271 Stephan

On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
Then it must have been changed somewhere else between 1.6.1 and 1.9.2rc1. I have my notebook and my desktop with different numpy and python versions next to each other, and I don't see a typo in my command. I assume python 2.7 versus python 3.4 doesn't make a difference.

Comparing all flags, owndata has also changed, but I don't think that has any effect.

Josef

On Mo, 2015-10-19 at 01:34 -0400, josef.pktd@gmail.com wrote:
<snip>
It looks like in 1.9 it depends on the order of the 2-d arrays, which it didn't do in 1.6
Yes, it uses concatenate, and concatenate probably changed in 1.7 to use "K" order (since "K" did not really exist before 1.7, IIRC). Not sure what we can do about it; the order is not something that is easily fixed unless explicitly given. It might be optimized (as in this case, I would guess). Whether or not taking the fastest route inside these kinds of functions is faster for the user is of course impossible to know; we can only hope that in most cases it is better. If someone has an idea how to decide, I am all ears, but I think all we can do is put asserts/tests in the downstream code if it relies heavily on the order (or just copy, if the order is wrong) :(

Another example is the change of output order in advanced indexing in some cases: it makes things faster sometimes, and probably slower in others. What is right seems very much non-trivial.

- Sebastian

On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
To understand the reason: is this to have more efficient memory access during copying? AFAIU, column_stack needs to create a new array, which has to be either F or C contiguous, so we always have to pick one of the two. With a large number of 1d arrays, it seemed more "intuitive" to me to copy them by columns. Josef

On Mon, Oct 19, 2015 at 9:00 AM, <josef.pktd@gmail.com> wrote:
Just as background: I was mainly surprised last night about having my long-held beliefs shattered. I skipped numpy 1.7 and 1.8 in my development environment and still need to catch up now that I use 1.9 as my main numpy version. I might have to update my "folk wisdom" a bit, which is not codified anywhere and doesn't have unit tests. For example, the improved iteration for Fortran-contiguous (or neither C nor F contiguous) arrays sounded very useful, but I never checked whether it would affect us. Josef

On Sun, Oct 18, 2015 at 9:35 PM, <josef.pktd@gmail.com> wrote:
In general practice no numpy functions make any guarantee about memory layout, unless that's explicitly a documented part of their contract (e.g. 'ascontiguousarray', or some functions that take an order= argument -- I say "some" b/c there are functions like 'reshape' that take an argument called order= that doesn't actually refer to memory layout). This isn't so much an official policy as just a fact of life -- if no-one has any idea that someone is depending on some memory layout detail, then there's no way to realize that we've broken something. (But it is a good policy IMO.)

If this kind of problem gets caught during a pre-release cycle then we generally do try to fix it, because we try not to break code, but if it's been broken for 2 full releases then there's not much we can do -- we can't go back in time to fix it, so it sounds like you're stuck working around the problem no matter what (unless you want to refuse to support 1.9.0 through 1.10.1, which I assume you don't... worst case, you just have to do a global search-and-replace of np.column_stack with statsmodels.utils.column_stack_f, right?). And the regression issue seems like the only real argument for changing it back -- we'd never guarantee f-contiguity here if starting from a blank slate, I think?

-n
--
Nathaniel J. Smith -- http://vorpus.org
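The statsmodels.utils.column_stack_f helper mentioned above is hypothetical (it is proposed, not an existing function). A minimal sketch of such a compatibility wrapper could look like:

```python
import numpy as np

def column_stack_f(tup):
    """Hypothetical helper: like np.column_stack, but with a
    guaranteed Fortran-contiguous result on any NumPy version.

    asfortranarray is a no-op (no copy) when the concatenated
    result already happens to be F-contiguous.
    """
    return np.asfortranarray(np.column_stack(tup))

out = column_stack_f((np.ones(4), np.arange(4.0)))
print(out.flags['F_CONTIGUOUS'])  # True
```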

On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith <njs@pobox.com> wrote:
I understand that in general. However, I always thought column_stack is an array creation function which has a guaranteed memory layout. And since it's stacking by columns, I thought that order is always Fortran. And the fact that it doesn't have an order keyword yet, I thought, is just a missing extension.
When the cat is out of the bag, the downstream developer writes compatibility code or helper functions. I will do that for at least the parts I know are intentionally designed for F memory order.

statsmodels doesn't really check or consistently optimize the memory order, except in some cython functions. But I thought we should be doing quite well at getting Fortran-ordered arrays. I only paid attention where we have more extensive loops internally.

Nathaniel: does patsy guarantee memory layout (F-contiguous) when creating design matrices?

Josef

On Mon, Oct 19, 2015 at 5:55 AM, <josef.pktd@gmail.com> wrote:
I guess I don't know what to say except that I'm sorry to hear that and sorry that no-one noticed until several releases later.
I never thought about it :-). So: no, it looks like right now patsy usually returns C-order matrices (or really, whatever np.empty or np.repeat returns), and there aren't any particular guarantees that this will continue to be the case in the future. Is returning matrices in F-contiguous layout really important? Should there be a return_type="fortran_matrix" option or something like that? -n -- Nathaniel J. Smith -- http://vorpus.org

On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
Were there more contiguity changes in 1.10? I just saw a large number of test errors and failures in statespace models, which are heavily based on cython code, where it's not just a question of performance. I don't know yet what's going on, but I just saw that we have some explicit tests for Fortran contiguity which just started to fail.
I don't know yet. My intuition was that it would be better because we feed the arrays directly to pinv/SVD or QR, which, I think, prefer Fortran-contiguous input by default. However, my intuition might not be correct, and it might not make much difference in a single OLS estimation. There are a few critical loops in variable selection that I'm planning to investigate to see how much it matters. Memory optimization was never high in our priorities compared to expanding the functionality overall, but reading the Julia mailing list is starting to worry me a bit. :) (I'm even starting to see the reason for multiple dispatch.) Josef

On Mon, Oct 19, 2015 at 9:51 PM, <josef.pktd@gmail.com> wrote:
I did some quick timing checks of pinv and qr, and the Fortran-ordered version is only about 5% to 15% faster and uses about the same amount of memory (watching the Task Manager). So, nothing to get excited about.

I used functions like this, where xf is either C or Fortran contiguous:

def funcl_f():
    for k in range(3, xf.shape[1]):
        np.linalg.pinv(xf[:, :k])

with MKL, which is, to my surprise, multithreaded. (It used 50% CPU on my quad-core; a single processor is 13% CPU.)

BTW: default scipy.linalg.qr is pretty bad: 9GB peak memory instead of 200MB with mode='economic'.

Josef
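A self-contained version of that timing check might look like the following. The array sizes are invented and kept small; the absolute numbers (and whether F order wins at all) depend on the BLAS/LAPACK build and thread count, as the MKL observation above suggests.

```python
import time
import numpy as np

# Time pinv over growing column slices of the same data in C order
# versus Fortran order, mirroring the funcl_f loop above.
rng = np.random.default_rng(0)
xc = rng.standard_normal((500, 20))   # C-contiguous
xf = np.asfortranarray(xc)            # same values, F-contiguous

def run(x):
    t0 = time.perf_counter()
    for k in range(3, x.shape[1]):
        np.linalg.pinv(x[:, :k])
    return time.perf_counter() - t0

t_c = run(xc)
t_f = run(xf)
print(f"C order: {t_c:.4f}s  F order: {t_f:.4f}s")
```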

participants (4):
- josef.pktd@gmail.com
- Nathaniel Smith
- Sebastian Berg
- Stephan Hoyer