[Numpy-discussion] Unexpected reorganization of internal data

Chris Barker chris.barker at noaa.gov
Tue Jan 31 12:23:58 EST 2012


On Tue, Jan 31, 2012 at 6:14 AM, Malcolm Reynolds
<malcolm.reynolds at gmail.com> wrote:
> Not exactly an answer to your question, but I can highly recommend
> using Boost.python, PyUblas and Ublas for your C++ vectors and
> matrices. It gives you a really good interface on the C++ side to
> numpy arrays and matrices, which can be passed in both directions over
> the language threshold with no copying.

or use Cython...

> If I had to guess I'd say sometimes when transposing numpy simply sets
> a flag internally to avoid copying the data, but in some cases (such
> as perhaps when multiplication needs to take place) the data has to be
> placed in a new object.

good guess:

> V = numpy.dot(R, U.transpose()).transpose()

>>> a
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> b = a.transpose()
>>> b.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

so transpose() simply re-arranges the strides to Fortran order,
rather than changing anything in memory.
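
A quick way to see this for yourself (just a sketch; the int64 dtype is
my choice so the stride values are the same on every platform):

```python
import numpy as np

# Same 3x2 example as above; dtype fixed so each item is 8 bytes.
a = np.array([[1, 2],
              [3, 4],
              [5, 6]], dtype=np.int64)
b = a.transpose()

# The transpose is a view on the same buffer: only the strides are swapped.
shares = np.shares_memory(a, b)
a_strides = a.strides   # (16, 8): 16 bytes to the next row, 8 to the next column
b_strides = b.strides   # (8, 16): the same two numbers, reversed

# Writing through the view therefore changes the original array.
b[0, 1] = 30            # b[0, 1] is the same element as a[1, 0]

print(shares, a_strides, b_strides, a[1, 0])
```

Nothing is moved in memory; the two arrays just index one buffer in
different ways.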

np.dot() produces a new array, so it is C-contiguous; then you
transpose it, so you get a Fortran-ordered array.
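
Roughly what happens in the expression quoted above, as a sketch (the
shapes and values of R and U here are made up, since the original ones
were not posted):

```python
import numpy as np

# Hypothetical stand-ins for the poster's R and U.
R = np.arange(6.0).reshape(2, 3)
U = np.arange(12.0).reshape(4, 3)

prod = np.dot(R, U.transpose())   # freshly allocated (2, 4) result
V = prod.transpose()              # (4, 2) view on prod's buffer

print(prod.flags['C_CONTIGUOUS'])   # the new array from np.dot
print(V.flags['C_CONTIGUOUS'], V.flags['F_CONTIGUOUS'], V.flags['OWNDATA'])
```

So V looks like a (4, 2) array from Python, but the buffer behind it is
still laid out as the (2, 4) product.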

> Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed.

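as mentioned, if you are working with arrays in C++ (or Fortran, or C,
or...) and need to count on the ordering of the data, you need to
check it in your extension code. There are utilities for this (e.g.
np.ascontiguousarray() on the Python side, or the contiguity checks
such as PyArray_IS_C_CONTIGUOUS in the C API).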
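
On the Python side, one possible guard looks like this. It is only a
sketch: as_c_double_matrix is a name I made up, and I am assuming the
C++ function wants a 2-D, row-major array of doubles.

```python
import numpy as np

def as_c_double_matrix(arr):
    """Return arr as a 2-D, C-contiguous float64 array.

    Hypothetical helper: copies only when the layout or dtype
    is not already what the extension expects.
    """
    out = np.ascontiguousarray(arr, dtype=np.float64)
    if out.ndim != 2:
        raise ValueError("expected a 2-D array, got %d-D" % out.ndim)
    return out

# A Fortran-ordered array like the V discussed above (made-up data).
V = np.dot(np.ones((2, 3)), np.ones((4, 3)).transpose()).transpose()

safe = as_c_double_matrix(V)        # copied into C order
already = np.zeros((2, 3))
same = as_c_double_matrix(already)  # already fine, so no copy

print(V.flags['C_CONTIGUOUS'], safe.flags['C_CONTIGUOUS'])
print(np.shares_memory(already, same))
```

Pass `safe` rather than `V` to the extension and the values come out in
the order the C++ loop expects.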

> However, if I do:

> V = numpy.array(U.transpose()).transpose()

right:

In [7]: a.flags
Out[7]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [8]: a.transpose().flags
Out[8]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [9]: np.array( a.transpose() ).flags
Out[9]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


so the np.array call doesn't re-arrange the order if it doesn't need
to. If you want to force it, you can specify the order:

In [10]: np.array( a.transpose(), order='C' ).flags
Out[10]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


(note: this does surprise me a bit, as it is making a copy, but there
you go -- if order matters, specify it)
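
To see what the copy actually does to the buffer, here is a small
sketch. I am assuming a current NumPy, where np.array defaults to
order='K' (keep the input's layout); ravel(order='K') is just a handy
way to read the elements back in memory order.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

f_copy = np.array(a.transpose())             # copy, layout preserved
c_copy = np.array(a.transpose(), order='C')  # copy, forced to C order

# Same values when indexed from Python...
same_values = np.array_equal(f_copy, c_copy)

# ...but the elements sit in memory in a different order.
f_mem = f_copy.ravel(order='K').tolist()   # [1, 2, 3, 4, 5, 6]
c_mem = c_copy.ravel(order='K').tolist()   # [1, 3, 5, 2, 4, 6]

print(same_values, f_mem, c_mem)
```

Code that walks the raw buffer sees the second ordering only when the
order is requested explicitly.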

In general, numpy does a lot of things for the sake of efficiency --
avoiding copies when it can, for instance -- this gives efficiency and
flexibility, but you do need to be careful, particularly when
interfacing with the binary data directly.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov