[Numpy-discussion] when did column_stack become C-contiguous?

Mon Oct 19 21:15:51 EDT 2015

On Mon, Oct 19, 2015 at 5:55 AM,  <josef.pktd at gmail.com> wrote:
>
>
> On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Sun, Oct 18, 2015 at 9:35 PM,  <josef.pktd at gmail.com> wrote:
>> >>>> np.column_stack((np.ones(10), np.ones(10))).flags
>> >   C_CONTIGUOUS : True
>> >   F_CONTIGUOUS : False
>> >
>> >>>> np.__version__
>> > '1.9.2rc1'
>> >
>> >
>> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >
>> >
>> > I was just trying to optimize a loop over variable adjustment in
>> > regression,
>> > and found out that we lost fortran contiguity.
>> >
>> > I always thought column_stack is for fortran usage (linalg)
>> >
>> > What's the alternative?
>> > column_stack was one of my favorite commands, and I always assumed we
>> > have
>> > in statsmodels the right memory layout to call the linalg libraries.
>> >
>> > ("assumed" means we don't have timing nor unit tests for it.)
>>
>> In general practice no numpy functions make any guarantee about memory
>> layout, unless that's explicitly a documented part of their contract
>> (e.g. 'ascontiguous', or some functions that take an order= argument
>> -- I say "some" b/c there are functions like 'reshape' that take an
>> argument called order= that doesn't actually refer to memory layout).
>> This isn't so much an official policy as just a fact of life -- if
>> no-one has any idea that the someone is depending on some memory
>> layout detail then there's no way to realize that we've broken
>> something. (But it is a good policy IMO.)
>
>
> I understand that in general.
>
> However, I always thought column_stack is a array creation function which
> have guaranteed memory layout. And since it's stacking by columns I thought
> that order is always Fortran.
> And the fact that it doesn't have an order keyword yet, I thought is just a
> missing extension.

I guess I don't know what to say except that I'm sorry to hear that
and sorry that no-one noticed until several releases later.

>> If this kind of problem gets caught during a pre-release cycle then we
>> generally do try to fix it, because we try not to break code, but if
>> it's been broken for 2 full releases then there's no much we can do --
>> we can't go back in time to fix it so it sounds like you're stuck
>> working around the problem no matter what (unless you want to refuse
>> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> case, you just have to do a global search replace of np.column_stack
>> with statsmodels.utils.column_stack_f, right?).
>>
>> And the regression issue seems like the only real argument for
>> changing it back -- we'd never guarantee f-contiguity here if starting
>> from a blank slate, I think?
>
>
> When the cat is out of the bag, the down stream developer writes
> compatibility code or helper functions.
>
> I will do that at at least the parts I know are intentionally designed for F
> memory order.
>
> ---
>
> statsmodels doesn't really check or consistently optimize the memory order,
> except in some cython functions.
> But, I thought we should be doing quite well with getting Fortran ordered
> arrays. I only paid attention where we have more extensive loops internally.
>
> Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
> design matrices?

I never thought about it :-). So: no, it looks like right now patsy
usually returns C-order matrices (or really, whatever np.empty or
np.repeat returns), and there aren't any particular guarantees that
this will continue to be the case in the future.

Is returning matrices in F-contiguous layout really important? Should
there be a return_type="fortran_matrix" option or something like that?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org