[Numpy-discussion] when did column_stack become C-contiguous?

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Oct 19 00:51:33 EDT 2015


On Mon, Oct 19, 2015 at 12:35 AM, <josef.pktd at gmail.com> wrote:

> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
> >>> np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in
> regression, and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)
>

What's the difference between using array and column_stack except for a
transpose and memory order?

my current usecase is copying columns on top of each other

#exog0 = np.column_stack((np.ones(nobs), x0, x0s2))

exog0 = np.array((np.ones(nobs), x0, x0s2)).T

exog_opt = exog0.copy(order='F')


the following part is in a loop, followed by some linear algebra for OLS,
res_optim is a scalar parameter.
exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)

Are my assumption on memory access correct, or is there a better way?

(I have quite a bit code in statsmodels that is optimized for fortran
ordered memory layout especially for sequential regression, under the
assumption that column_stack provides that Fortran order.)

Also, do I need to start timing and memory benchmarking or is it obvious
that a loop

for k in range(maxi):
    x = arr[:, :k]
    <calculate>

depends on memory order?

Josef


>
> Josef
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151019/0bb6a891/attachment.html>


More information about the NumPy-Discussion mailing list