[Numpy-discussion] Matching 0-d arrays and NumPy scalars

Thu Feb 21 10:39:15 EST 2008

While we are on the subject of indexing... I use xranges all over the 
place because I tend to loop over big data sets, and I try to avoid 
allocating large chunks of memory unnecessarily with range. While 
I try to be careful not to let xranges propagate to the ndarray's [] 
operator, there have been a few times when I've made a mistake. Is there 
any reason why adding support for xrange indexing would be a bad thing 
to do? All one needs to do is convert the xrange to a slice object in 
__getitem__. I've written some simple code to do this conversion in 
Python (note that in C, one can access the start, end, and step of an 
xrange object very easily.)

from types import XRangeType

def xrange_to_slice(ind):
     """
     Converts an xrange object to a slice object.
     """
     retval = slice(None, None, None)

     if type(ind) == XRangeType:
         # Grab a string representation of the xrange object, which takes
         # any of the forms: xrange(a), xrange(a, b), xrange(a, b, s).
         # Break it apart into a, b, and s.
         sind = str(ind)
         params = sind[(sind.find('(')+1):sind.find(')')]
         xr_params = [int(s) for s in params.split(",")]
         # Caveat: a negative-step xrange that ends at -1, such as
         # xrange(5, -1, -1), becomes slice(5, -1, -1), which selects nothing.
         retval = slice(*xr_params)
     else:
         raise TypeError("Index must be an xrange object!")
     #endif
     return retval
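
For comparison, here is a minimal sketch of the same conversion without any string parsing. It assumes Python 3, where range takes the place of xrange and exposes start, stop and step as attributes; range_to_slice is a hypothetical name, not an existing NumPy function. The subtle case is a negative step that runs down to index 0: the range stops at a negative value, which a slice would read as counting from the end, so the stop has to become None.

```python
def range_to_slice(r):
    """Convert a range of non-negative indices to an equivalent slice."""
    if not isinstance(r, range):
        raise TypeError("Index must be a range object!")
    if len(r) == 0:
        # Any empty slice will do for an empty range.
        return slice(0, 0, 1)
    if r[0] < 0 or r[-1] < 0:
        # A slice would reinterpret negative indices relative to the end.
        raise ValueError("ranges containing negative indices are not supported")
    stop = r.stop
    if stop < 0:
        # Only reachable with a negative step that runs down to index 0.
        stop = None
    return slice(r.start, stop, r.step)


data = list(range(12))
print(data[range_to_slice(range(1, 10, 3))])
print(data[range_to_slice(range(5, -1, -1))])
```

One difference remains even then: a slice silently clips to the length of the array, whereas indexing element by element with an out-of-bounds range would raise, so the two are only equivalent when the range fits inside the array.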

----

On another note, I think it would be great if we added support for a 
find function, which takes a boolean array A, and returns the indices 
corresponding to True, but over A's flat view. In many cases, indexing 
with a boolean array is all one needs, making find unnecessary. However, 
I've encountered cases where computing the boolean array was 
computationally burdensome, the boolean arrays were large, and the 
result was needed many times throughout the broader computation. For 
many of my problems, storing away the flat index array uses a lot less 
memory than storing the boolean index arrays.

I frequently define a function like

import numpy

def find(A):
      # Indices of the True entries, counted over the flattened view of A.
      return numpy.where(numpy.ravel(A))[0]
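
To make the memory argument concrete, here is a small sketch comparing the two representations. It uses numpy.flatnonzero, which returns the indices of the nonzero entries of the flattened input; the array size and the condition are made up for illustration.

```python
import numpy as np

# Made-up data: one million integers, of which exactly 1000 satisfy the test.
data = np.arange(1000000).reshape(1000, 1000)

mask = (data % 1000 == 0)      # boolean array: one byte per element of data
idx = np.flatnonzero(mask)     # flat indices of the True entries only

print(mask.nbytes)             # 1000000
print(idx.size, idx.nbytes)    # 1000 entries, idx.itemsize bytes each

# Both representations select the same elements, in the same order.
assert np.array_equal(data.flat[idx], data[mask])
```

With 8-byte indices, the index array is the smaller of the two whenever fewer than one element in eight is True, so the saving depends on how sparse the condition is.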

Certainly, we'd need a find with more error checking, and one that 
handles the case when a list of booleans is passed (or a list of lists). 
Conceivably, one might try to index a non-flat array with the result of 
find. To deal with this, find could return a placeholder object that 
the index operator checks for. Just an idea.
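
As a sketch of that last point, without any placeholder object: flat indices can already be applied to a non-flat array, either through its .flat attribute or after converting them with numpy.unravel_index. The array contents below are made up for illustration.

```python
import numpy as np

A = np.array([[1, -2, 3],
              [-4, 5, -6]])

# Flat indices of the negative entries.
idx = np.flatnonzero(A < 0)

# Option 1: index through the flat view.
vals = A.flat[idx]

# Option 2: convert flat indices to one index array per axis.
coords = np.unravel_index(idx, A.shape)
assert np.array_equal(A[coords], vals)

# Assignment through the flat view writes back into A itself.
A.flat[idx] = 0
print(A)
```

The stored index array can be reused for reads and writes alike, which is the case where computing the boolean mask once and discarding it pays off.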

----

I also think it'd be really useful to have a function that's like arange 
in that it supports floats/doubles, and also like xrange in that 
elements are only generated on demand.

It could be implemented as a generator as shown below.

import numpy

def axrange(start, stop=None, step=1.0):
     if stop is None:
         stop = start
         start = 0.0
     #endif
     start = numpy.float64(start)
     stop = numpy.float64(stop)
     step = numpy.float64(step)

     # xrange needs an integer count, but numpy.ceil returns a float.
     n = max(int(numpy.ceil((stop-start)/step)), 0)
     for i in xrange(n):
         yield numpy.float64(start + step * i)
     #endfor
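
One design point in the generator is worth spelling out: each element is computed as start + step * i rather than by repeatedly adding step. The sketch below, written for Python 3 and using hypothetical names (lazy_arange, accumulating_arange), contrasts the two approaches; the second function exists only to show the drift.

```python
import math


def lazy_arange(start, stop=None, step=1.0):
    """Yield start, start + step, ... on demand, like a float-valued range."""
    if stop is None:
        start, stop = 0.0, start
    n = max(int(math.ceil((stop - start) / step)), 0)
    for i in range(n):
        # One multiplication and one addition per element, so rounding
        # errors do not build up from one element to the next.
        yield start + step * i


def accumulating_arange(start, stop, step):
    """Same number of elements, built by repeated addition (for contrast)."""
    n = max(int(math.ceil((stop - start) / step)), 0)
    x = start
    for _ in range(n):
        yield x
        x += step  # every addition can add a little rounding error


direct = list(lazy_arange(0.0, 1.0, 0.1))
summed = list(accumulating_arange(0.0, 1.0, 0.1))
print(direct[-1], summed[-1])
```

Both lists have ten elements, but the last elements differ in the final digits because the accumulated version carries the rounding error of every earlier addition.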

Or, as a class,

class axrangeiter:

     def __init__(self, rng):
         "An iterator over an axrange object."
         self.rng = rng
         self.i = 0

     def __iter__(self):
         "Iterators return themselves, as the iterator protocol expects."
         return self

     def next(self):
         "Returns the next float in the sequence."
         if self.i >= len(self.rng):
             raise StopIteration()
         self.i += 1
         return self.rng[self.i-1]

class axrange:

     def __init__(self, *args):
         """
         axrange(stop)
         axrange(start, stop, [step])

         An axrange object is an iterable numerical sequence between
         start and stop. Similar to arange, there are
         n=ceil((stop-start)/step) elements in the sequence. Elements
         are generated on demand, which can be more memory efficient.
         """
         if len(args) == 1:
             self.start = numpy.float64(0.0)
             self.stop = numpy.float64(args[0])
             self.step = numpy.float64(1.0)
         elif len(args) == 2:
             self.start = numpy.float64(args[0])
             self.stop = numpy.float64(args[1])
             self.step = numpy.float64(1.0)
         elif len(args) == 3:
             self.start = numpy.float64(args[0])
             self.stop = numpy.float64(args[1])
             self.step = numpy.float64(args[2])
         else:
             raise TypeError("axrange requires 1 to 3 arguments.")
         #endif
         self.len = max(int(numpy.ceil((self.stop-self.start)/self.step)),0)

     def __len__(self):
         return self.len

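     def __getitem__(self, i):
         # Reject indices past either end instead of extrapolating.
         if i < 0 or i >= self.len:
             raise IndexError("axrange index out of range")
         return numpy.float64(self.start + self.step * i)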

     def __iter__(self):
         return axrangeiter(self)

     def __repr__(self):
         if self.start == 0.0 and self.step == 1.0:
             return "axrange(%s)" % str(self.stop)
         elif self.step == 1.0:
             return "axrange(%s,%s)" % (str(self.start), str(self.stop))
         else:
             return "axrange(%s,%s,%s)" % (str(self.start),
                                           str(self.stop), str(self.step))
         #endif
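
For anyone reading this with Python 3 in mind, here is a more compact sketch of the same idea. It assumes collections.abc.Sequence, whose mixin methods supply iteration, reversed() and the in operator once __len__ and __getitem__ are defined, so no separate iterator class is needed. LazyFloatRange is a hypothetical name, and plain Python floats stand in for numpy.float64.

```python
import math
from collections.abc import Sequence


class LazyFloatRange(Sequence):
    """Hypothetical Python 3 counterpart of axrange; elements on demand."""

    def __init__(self, start, stop=None, step=1.0):
        if stop is None:
            start, stop = 0.0, start
        if step == 0:
            raise ValueError("step must not be zero")
        self.start = float(start)
        self.stop = float(stop)
        self.step = float(step)
        # Same element count as in the docstring above: ceil((stop-start)/step).
        self._len = max(int(math.ceil((self.stop - self.start) / self.step)), 0)

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if isinstance(i, slice):
            raise TypeError("slicing is not supported in this sketch")
        if i < 0:
            i += self._len          # allow negative indices, as range does
        if not 0 <= i < self._len:
            raise IndexError("index out of range")
        return self.start + self.step * i


r = LazyFloatRange(0.0, 2.0, 0.5)
print(len(r), list(r), r[-1])
```

The bounds check in __getitem__ matters here: the iteration supplied by Sequence relies on IndexError to know when to stop.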

Travis E. Oliphant wrote:
> Hi everybody,
> 
> In writing some generic code, I've encountered situations where it would 
> reduce code complexity to allow NumPy scalars to be "indexed" in the 
> same number of limited ways, that 0-d arrays support.
> 
> For example, 0-d arrays can be indexed with
> 
>     * Boolean masks
>     * Ellipses x[...]  and x[..., newaxis]
>     * Empty tuple x[()]
> 
> I think that numpy scalars should also be indexable in these particular 
> cases as well (read-only of course,  i.e. no setting of the value would 
> be possible).
> 
> This is an easy change to implement, and I don't think it would cause 
> any backward compatibility issues.
> 
> Any opinions from the list?
> 
> 
> Best regards,
> 
> -Travis O.
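
For reference, a short sketch of what the indexing forms in the quoted list return for a 0-d array, as I understand the behavior of recent NumPy releases (the value 5.0 is arbitrary, and older releases may differ, particularly for boolean masks):

```python
import numpy as np

x = np.array(5.0)              # a 0-d array

a = x[()]                      # empty tuple: extracts a NumPy scalar
b = x[...]                     # Ellipsis: still a 0-d array
c = x[..., np.newaxis]         # Ellipsis plus newaxis: 1-d array, shape (1,)
d = x[np.array(True)]          # 0-d boolean mask: 1-d array, shape (1,)

print(type(a).__name__, b.shape, c.shape, d.shape)
```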