[Numpy-discussion] array, asarray as contiguous and friends

Tim Hochberg tim.hochberg at cox.net
Fri Mar 24 08:39:03 EST 2006


Colin J. Williams wrote:

> Tim Hochberg wrote:
>
>> Sasha wrote:
>>
>>> On 3/23/06, Travis Oliphant <oliphant at ee.byu.edu> wrote:
>>>  
>>>
>>>> At any rate, if the fortran flag is there, we need to specify the
>>>> contiguous case as well.   So, either propose a better interface (we
>>>> could change it still --- the fortran flag doesn't have that much
>>>> history) to handle the situation or accept what I do ;-)
>>>>   
>>>
>>>
>>>
> Contiguity is separable from fortran:
> [Dbg]>>> b= _n.array([[1, 2, 3], [4, 5, 6]])
> [Dbg]>>> b.flags.contiguous
> True
> [Dbg]>>> c= b.transpose()
> [Dbg]>>> c
> array([[1, 4],
>       [2, 5],
>       [3, 6]])
> [Dbg]>>> c.flags.contiguous
> False
> [Dbg]>>>


This is true, but irrelevant. To the best of my knowledge, the only 
reason to force an array into a specific order is to pass it to a C 
function that expects either FORTRAN- or C-ordered arrays. And, in that 
case, the array also needs to be contiguous. So, for the purpose of 
creating arrays (and for the purposes of ascontiguous), the only cases 
that matter are arrays that are both contiguous and in the specified 
order. Thus, specifying contiguity and order separately to the 
constructor needlessly complicates the interface. Or since I'm feeling 
jargon happy today, YAGNI.
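
To make the point concrete, here is a minimal sketch using present-day 
NumPy names (flags.c_contiguous, flags.f_contiguous and asfortranarray 
are assumptions relative to the interface under discussion in this 
thread). It shows that the transposed array from the example above is 
Fortran-contiguous, that a strided view is contiguous in neither order, 
and that asking for an order yields an array that is contiguous in that 
order:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])   # C-ordered by default
c = b.transpose()                      # a view: same data, swapped strides

# The transpose is not C-contiguous, but it is Fortran-contiguous.
c_flags = (c.flags.c_contiguous, c.flags.f_contiguous)

# Slicing with a step gives a view that is contiguous in neither order.
d = b[:, ::2]
d_flags = (d.flags.c_contiguous, d.flags.f_contiguous)

# Requesting an order also produces contiguity, which is what a
# C or Fortran routine needs.
f = np.asfortranarray(d)
f_flags = (f.flags.c_contiguous, f.flags.f_contiguous)
```

In current NumPy, flags.contiguous is an alias for the C-contiguous 
flag, which is why the transposed array reports False there.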

>
>>> Let me try. I propose to eliminate the fortran flag in favor of a more
>>> general "strides" argument.  This argument can be either a sequence of
>>> integers that becomes the strides, or a callable object that takes
>>> shape and dtype arguments and return a sequence that becomes the
>>> strides.  For fortran and c order functions that generate appropriate
>>> stride sequences should be predefined to enable array(...,
>>> strides=fortran, ...) and array(..., strides=contiguous).
>>>
>>
>> I like the idea of being able to create an array with custom strides. 
>> The applications aren't entirely clear yet, but it does seem like it 
>> could have some interesting and useful consequences. That said, I 
>> don't think this belongs in 'array'. Historically, array has been 
>> used for all sorts of array creation activities, which is why it 
>> always seems to have a wide, somewhat incoherent interface. However, 
>> most uses of array() boil down to one thing: creating a *new* array 
>> from a python object. My preference would be to focus on that 
>> functionality for array() and spin off its other historical uses and 
>> new uses, like this custom strided array stuff, into separate factory 
>> functions. For example (and just for example, I make no great claims 
>> for either this name or interface):
>>    a = array_from_data(a_buffer_object, dtype, dims, strides)      [***]
>>
>> One thing that you do make clear is that contiguous and fortran 
>> should really be two values of the same flag. 
>
>
> Please see the transpose example above.
>
>> If you combine this with one other simplification: array() always 
>> copies, we end up with a nice thin interface:
>>    # Create a new array in 'order' order. Defaults to "C" order.
>>    array(object, dtype=None, order="C"|"FORTRAN")
>
>
> I feel that [***] above is much cleaner than this.  I suggest that 
> string constants be deprecated.

I'm no huge fan of string constants myself, but I think you need to 
think this through more. First off, the interface I tossed off above 
doesn't cover the same ground as array, since it works off an already 
created buffer object. That means you'd have to go through all sorts of 
contortions and do at least one copy to get data into Fortran order. You 
could allow arbitrary 1D Python sequences instead, but that doesn't 
help the common case of converting a 2D Python object into a 2D array. 
You could allow N-D Python objects, but then you have two ways of 
specifying the dims of the object and things become a big crufty mess. 
Compared to that, string constants are great.
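
A rough illustration of the difference, written against present-day 
NumPy rather than the interface proposed in this thread (the "F" order 
string, frombuffer and asfortranarray are assumptions in that sense). 
The first route converts a nested Python object in one call. The second 
starts from an existing buffer and needs a reshape plus an extra copy 
to reach Fortran order:

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

# One call: shape is inferred from the nesting, layout from the order string.
a = np.array(nested, dtype=np.int64, order="F")

# Buffer route: the bytes are already laid out in C order, so reaching
# Fortran order takes a reshape with explicit dims plus a copy.
raw = np.array(nested, dtype=np.int64).tobytes()
flat = np.frombuffer(raw, dtype=np.int64)
b = np.asfortranarray(flat.reshape(2, 3))
```

Both a and b hold the same values in Fortran order; only the second 
route requires the caller to spell out the dims separately from the data.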


>> and
>>    # Returns an array. If object is an array and order is satisfied,
>>    # return object, otherwise a new array.
>>    # If order is set, the returned array will be contiguous and have
>>    # that ordering.
>>    asarray(object, dtype=None, order=None|"C"|"FORTRAN")
>>    # Just the same, but allow subtypes.
>>    asanyarray(object, dtype=None, order=None|"C"|"FORTRAN")
>>
>> You could build asarray, asanyarray, etc on top of the proposed array 
>> without problems by using type(object)==ndarray and isinstance(object, 
>> ndarray) respectively. Stuff like convenience functions for minnd 
>> would also be easy to build on top of that. This looks great to me 
>> (pre-coffee).
>>
>> Embrace simplicity: you have nothing to lose but your clutter;)
>>
>> Regards,
>>
>> -tim
>>
> If [***] above were adopted, it would still be helpful to adopt 
> numarray's iscontiguous method, or better, use a property.


-0.  In my experience, 99% of my use cases would be covered by 
ascontiguous and for the remaining 1% I'm happy to use a.flags.contiguous.
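
For what it's worth, here is a minimal sketch of the asarray-style 
behavior described earlier in this thread, written against present-day 
NumPy. The name as_ordered is hypothetical, and the "C"/"F" order 
strings and flag names are assumptions, not the interface being 
proposed. It hands back the object untouched when the flags already 
satisfy the request and copies otherwise:

```python
import numpy as np

def as_ordered(obj, dtype=None, order=None):
    """Hypothetical helper: return obj itself if it qualifies, else a new array."""
    if type(obj) is np.ndarray and (dtype is None or obj.dtype == np.dtype(dtype)):
        if order is None:
            return obj
        if order == "C" and obj.flags.c_contiguous:
            return obj
        if order == "F" and obj.flags.f_contiguous:
            return obj
    # Fall through: copy into the requested layout (C order when none is given).
    return np.array(obj, dtype=dtype, order=order or "C")

a = np.arange(6).reshape(2, 3)
same = as_ordered(a, order="C")   # already C-contiguous: no copy is made
fort = as_ordered(a, order="F")   # a new, Fortran-ordered array
```

Swapping the exact type check for isinstance(obj, np.ndarray) would give 
the asanyarray flavor, which lets subtypes pass through.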


Regards,

-tim




