[Numpy-discussion] How to combine a pair of 1D arrays?

Wed Apr 14 14:01:16 EDT 2010

On 14 April 2010 11:34, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Apr 14, 2010 at 10:25, Peter Shinners <pete at shinners.org> wrote:
>> Is there a way to combine two 1D arrays with the same size into a 2D
>> array? It seems like the internal pointers and strides could be
>> combined. My primary goal is to not make any copies of the data.
>
> There is absolutely no way to get around that, I am afraid.

Well, I'm not quite sure I agree with this.

The best way, of course, is to allocate the original two arrays as
subarrays of one larger array, that is, start with the fused array and
select your two 1D arrays as subarrays. Of course, this depends on how
you're generating the 1D arrays; if they're simply being returned to
you from a black-box function, you're stuck with them. But if it's a
ufunc-like operation, it may have an output argument that lets you
write to a supplied array rather than allocating a new one. If they're
coming from a file, you may be able to map the whole file (or a large
chunk) as an array and select them as subarrays (even if alignment or
data type complicate things).
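Here is a minimal sketch of the "allocate the fused array first" approach. It assumes the producing code can write into a supplied buffer, here a NumPy ufunc's `out=` argument, and that file data can be mapped with `numpy.memmap`. The array contents and the file name `pair.bin` are made up for illustration.

```python
import os
import tempfile

import numpy as np

n = 5

# Allocate the fused (2, n) array first; each row is a view, not a copy.
fused = np.empty((2, n))
x = fused[0]
y = fused[1]

# ufuncs accept an out= argument, so results land directly in the rows.
np.multiply(np.arange(n), 2.0, out=x)
np.add(np.arange(n), 10.0, out=y)

# For data on disk, map the whole file and take the rows as views.
# "pair.bin" is a hypothetical file written here only for the demo.
path = os.path.join(tempfile.mkdtemp(), "pair.bin")
np.arange(2 * n, dtype=np.float64).tofile(path)
mapped = np.memmap(path, dtype=np.float64, mode="r", shape=(2, n))
first, second = mapped[0], mapped[1]

print(np.shares_memory(fused, y), np.shares_memory(mapped, second))
```

Neither `fused` nor `mapped` is ever assembled from separate pieces. The 1D arrays are simply views into the larger block, so no copy is needed.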

You should also keep in mind that allocating arrays and copying data
really isn't very expensive - malloc() is extremely fast, especially
for large arrays which are just grabbed as blocks from the OS - and
copying arrays is also cheap, and can be used to reorder data into a
more cache-friendly order. If the problem is that your arrays almost
fill available memory, you will already have noticed that using numpy
is kind of a pain, because many operations involve copies. But if you
really have to do this, it may be possible.
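If a copy turns out to be acceptable, the straightforward route looks roughly like this. It is a sketch with made-up input arrays. `np.stack` allocates a new (2, n) array and copies both inputs into it, and `np.ascontiguousarray` shows the kind of reordering copy mentioned above, which puts each (a[i], b[i]) pair next to each other in memory.

```python
import numpy as np

a = np.linspace(0.0, 1.0, 5)
b = np.linspace(1.0, 2.0, 5)

# np.stack allocates a fresh (2, n) array and copies a and b into its rows.
combined = np.stack((a, b))

# combined.T is only a strided view; ascontiguousarray copies it into
# C order, so each (a[i], b[i]) pair is adjacent in memory.
pairs = np.ascontiguousarray(combined.T)

print(combined.shape, pairs.shape, pairs.flags["C_CONTIGUOUS"])
```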

numpy arrays are specified by giving a data area and strides into
that data area. The step along each axis must be uniform, but if your
two arrays really have the same dtype and the same stride, you may be
able to use a gruesome hack to make it work. Each of your two arrays
has a data pointer, which essentially points to its first element. So
if you make up your two-row array using the same data pointer as your
first array, give it a stride along the row axis equal to the
difference between the two data pointers, and keep the original
element stride along the other axis, this should work. Of course, you
have to make sure Python doesn't deallocate the second array out from
under you, and you may have to defeat some error checking, but in
principle it should be possible.
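One way to try the hack is a sketch built on `numpy.lib.stride_tricks.as_strided`, which builds a view from an arbitrary shape and byte strides without bounds checking. The helper name `fuse_rows` is hypothetical. The sketch assumes both inputs are 1D with the same length, dtype and stride. Pointer arithmetic between two separate allocations is outside anything NumPy guarantees, so treat this as an experiment, not a recommended technique.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def fuse_rows(a, b):
    """Hypothetical helper: return a (2, n) view whose rows alias a and b.

    No data is copied. The returned view keeps only `a` alive, so the
    caller must hold a reference to `b` for as long as the view is used.
    """
    if a.ndim != 1 or b.ndim != 1 or a.shape != b.shape:
        raise ValueError("need two 1-D arrays of the same length")
    if a.dtype != b.dtype or a.strides != b.strides:
        raise ValueError("need the same dtype and the same stride")
    # Byte distance from a's first element to b's first element
    # (may be negative; NumPy accepts negative strides).
    offset = b.ctypes.data - a.ctypes.data
    # Row stride = pointer difference; column stride = the shared element stride.
    # as_strided does no bounds checking: a wrong offset reads arbitrary memory.
    return as_strided(a, shape=(2, a.shape[0]), strides=(offset, a.strides[0]))


a = np.arange(5.0)
b = np.arange(10.0, 15.0)
fused = fuse_rows(a, b)
print(fused)
```

Writes through either the view or the original arrays are visible on both sides, because all of them refer to the same memory.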

Anne