[Numpy-discussion] rank-0 arrays

Thu Sep 12 21:17:05 EDT 2002

Before we implement what we agreed on regarding rank-0
arrays in numarray, a couple of new issues have become apparent
that weren't really considered in the first round of
discussion (at least I don't recall that they were).

To restate the issue: there was a question about whether
an index to an array that identified a single element only
(i.e., not a slice, nor an incomplete index, e.g. x[3] where
x is two dimensional) should return a Python scalar or a
rank-0 array. Currently Numeric is inconsistent on this point.
One usually gets scalars, but on some occasions, rank-0 arrays
are returned. There are good arguments for either alternative.

The primary advantage of returning rank-0 arrays is that they
reduce the need for conditional code checking to see if a result
is a scalar or an array. At the end of the discussion it was
decided to have numarray return rank-0 arrays in all instances of
single item indexing. Since then, a couple of potential snags have
arisen. I've already discussed some of these with Paul Dubois
and Eric Jones. I'd like a little wider input before making a
final (or at least experimental) decision.
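
To make the "conditional code" point concrete, here is a minimal
sketch. ToyArray is a hypothetical stand-in written for this
message, not numarray's or Numeric's actual class; it only
carries a shape so the two styles of caller code can be compared.

```python
class ToyArray:
    """Hypothetical stand-in for an array type (not a real numarray class)."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)


def shape_with_check(result):
    # Needed when indexing may hand back either a Python scalar or an array:
    # plain scalars have no .shape, so the caller has to branch.
    if isinstance(result, ToyArray):
        return result.shape
    return ()


def shape_uniform(result):
    # Sufficient if single-element indexing always returns a rank-0 array.
    return result.shape
```

With rank-0 arrays returned everywhere, code in the style of
shape_uniform is all that is needed; otherwise every such helper
needs the branch shown in shape_with_check.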

If we return rank-0 arrays, what should repr return for rank-0
arrays? My initial impression is that the following is highly
undesirable for an interactive session, but maybe it is just me:

>>> x = arange(10)
>>> x[2]
array(2)

We could, of course, arrange for __repr__ to return "2" instead,
in other words print the simple scalar for all cases of rank-0
arrays. This would yield the expected output in the above
example. Nevertheless, wouldn't that violate the intent of repr?
Are there other examples where Python uses repr in a similar,
misleading manner? But perhaps most feel that returning array(2)
is perfectly acceptable and won't annoy users. I am curious
about what people think about this.
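
The two repr choices can be sketched as follows. Rank0 and
Rank0ScalarRepr are hypothetical names invented for this
illustration and do not reflect how numarray would actually
implement either option.

```python
class Rank0:
    """Hypothetical rank-0 wrapper, used only to illustrate the repr question."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Option A: unambiguous; the prompt shows that this is an array.
        return "array(%r)" % (self.value,)

    def __str__(self):
        # str() (and therefore print) can show the bare value either way.
        return str(self.value)


class Rank0ScalarRepr(Rank0):
    def __repr__(self):
        # Option B: looks like a scalar at the prompt, hiding the real type.
        return repr(self.value)
```

Under option B, evaluating the repr string gives back a plain
int rather than a rank-0 array, which is the sense in which it
might be considered misleading.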

The second issue is one of efficiency. Currently numarray uses
Python objects for arrays. If we return rank-0 arrays for
single item indexing, then some naive uses of larger arrays
as sequences may create an enormous number of array objects.
True, there will be equivalent ways of doing the same
operation that won't create so many objects (such as
explicitly converting the array to a list, which would be
much faster). Is this a serious problem?
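
As a rough illustration of the object-count concern, the toy
below counts how many wrapper objects element-by-element access
creates. CountingRank0 and ToySequence are hypothetical classes
made up for this sketch; they say nothing about the real cost in
numarray, only about the number of objects involved.

```python
class CountingRank0:
    """Hypothetical rank-0 wrapper that counts how many instances are made."""

    created = 0

    def __init__(self, value):
        CountingRank0.created += 1
        self.value = value


class ToySequence:
    """Hypothetical 1-D array whose indexing always returns a wrapper."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        # One new wrapper object per element access.
        return CountingRank0(self._values[i])

    def tolist(self):
        # Bulk conversion: no per-element wrapper objects.
        return list(self._values)


x = ToySequence(range(1000))

# Naive use of the array as a sequence: one wrapper per element.
total_slow = sum(x[i].value for i in range(len(x)))
wrappers_made = CountingRank0.created

# Converting to a list first creates no wrappers at all.
total_fast = sum(x.tolist())
```

Both sums agree, but the first creates one wrapper object per
element while the second creates none.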

These two issues led us to question whether we should indeed
return rank-0 arrays. We can live with either solution. But
we do want to make the right choice. We also know that both
functionalities must exist, i.e., indexing for scalars and
indexing for rank-0 arrays, and we will provide both. The issue
is what the indexing syntax returns. One argument is that it is
not a great burden on programmers to use a method (or other
means) to obtain a rank-0 array always if that is important
for the code they are writing and that we should make the
indexing syntax return what most users (especially less expert
ones) intuitively expect (scalars I presume). But others feel
it is just as important for the syntax that a programmer uses
to be as simple as the interactive user expects (instead of
something like x.getindexasarrayalways(2,4,1) [well, with a much
better, and shorter, name]).
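
For concreteness, the "scalars from indexing, rank-0 arrays from
a method" alternative might look roughly like this. The method
name item_as_array and both classes are placeholders invented
for this sketch, not a proposed API.

```python
class Rank0:
    """Hypothetical rank-0 array: a single value with an empty shape."""

    def __init__(self, value):
        self.value = value
        self.shape = ()


class ToyArray1D:
    """Hypothetical 1-D array offering both kinds of single-item access."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, i):
        # Indexing syntax returns a plain Python scalar.
        return self._values[i]

    def item_as_array(self, i):
        # Explicit method (placeholder name) returns a rank-0 array.
        return Rank0(self._values[i])
```

The opposite alternative simply swaps the two bodies: indexing
returns the rank-0 array and a method returns the scalar.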

Does either of these issues change anyone's opinion? If people
still want rank-0 arrays, what should repr do?

Perry Greenfield