[Numpy-discussion] Matching 0-d arrays and NumPy scalars
Damian Eads
eads at soe.ucsc.edu
Thu Feb 21 10:39:15 EST 2008
While we are on the subject of indexing... I use xranges all over the
place because I tend to loop over big data sets. Thus I try to avoid
allocating large chunks of memory unnecessarily with range. While
I try to be careful not to let xranges propagate to the ndarray's []
operator, there have been a few times when I've made a mistake. Is there
any reason why adding support for xrange indexing would be a bad thing
to do? All one needs to do is convert the xrange to a slice object in
__getitem__. I've written some simple code to do this conversion in
Python (note that in C, one can access the start, end, and step of an
xrange object very easily.)
from types import XRangeType  # Python 2 only

def xrange_to_slice(ind):
    """
    Converts an xrange object to a slice object.
    """
    retval = slice(None, None, None)
    if type(ind) == XRangeType:
        # Grab a string representation of the xrange object, which takes
        # any of the forms: xrange(a), xrange(a,b), xrange(a,b,s).
        # Break it apart into a, b, and s.
        sind = str(ind)
        xr_params = [int(s) for s in
                     sind[(sind.find('(')+1):sind.find(')')].split(",")]
        # Unpack the 1 to 3 parameters straight into slice().
        retval = slice(*xr_params)
    else:
        raise TypeError("Index must be an xrange object!")
    #endif
    return retval
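
For what it's worth, here is a minimal sketch of the same conversion without the string parsing. It assumes Python 3, where range (the successor to xrange) exposes start, stop, and step as attributes; range_to_slice is a hypothetical name, not a NumPy function. It also guards one case that a direct conversion gets wrong: a descending range that runs down to index 0 has a negative stop, which a slice would read as an offset from the end.

```python
import numpy as np

def range_to_slice(r):
    """Convert a Python 3 range of non-negative indices to a slice."""
    if not isinstance(r, range):
        raise TypeError("Index must be a range object")
    if len(r) == 0:
        return slice(0, 0, 1)          # selects nothing
    if min(r[0], r[-1]) < 0:
        # Negative indices wrap around in fancy indexing but not in a
        # slice, so this sketch refuses them rather than guess.
        raise ValueError("range contains negative indices")
    # A negative stop (e.g. range(5, -1, -1)) means "run down to index 0";
    # as a slice bound that is spelled None.
    stop = r.stop if r.stop >= 0 else None
    return slice(r.start, stop, r.step)

a = np.arange(10)
print(a[range_to_slice(range(2, 8, 2))])    # [2 4 6]
print(a[range_to_slice(range(5, -1, -1))])  # [5 4 3 2 1 0]
```

Two differences from indexing with a list of the same integers are worth keeping in mind: a slice silently clips bounds that run past the end of the array, and it returns a view rather than a copy.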
----
On another note, I think it would be great if we added support for a
find function, which takes a boolean array A, and returns the indices
corresponding to True, but over A's flat view. In many cases, indexing
with a boolean array is all one needs, making find unnecessary. However,
I've encountered cases where computing the boolean array was
computationally burdensome, the boolean arrays were large, and the
result was needed many times throughout the broader computation. For
many of my problems, storing away the flat index array uses a lot less
memory than storing the boolean index arrays.
I frequently define a function like
def find(A):
    return numpy.where(A.flat)[0]
Certainly, we'd need a find with more error checking, and one that
handles the case when a list of booleans is passed (or a list of lists).
Conceivably, one might try to index a non-flat array with the result of
find. To deal with this, find could return a placeholder object that
the index operator checks for. Just an idea.
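
A rough sketch of what such a find might look like using functions NumPy already provides (numpy.flatnonzero and numpy.unravel_index) instead of a placeholder object. The asarray call is my assumption about how a list of booleans, or a list of lists, should be handled.

```python
import numpy as np

def find(A):
    """Return the flat indices at which A is True."""
    # asarray accepts ndarrays, lists, and lists of lists alike.
    return np.flatnonzero(np.asarray(A, dtype=bool))

data = np.arange(12).reshape(3, 4)
idx = find(data % 5 == 0)          # compute the expensive mask once
print(idx)                         # [ 0  5 10]

# Reuse the stored indices, either on the flat view ...
print(data.flat[idx])              # [ 0  5 10]
# ... or on the original shape, by converting to per-axis indices.
print(data[np.unravel_index(idx, data.shape)])   # [ 0  5 10]
```

On a typical 64-bit platform the index array costs 8 bytes per True entry against 1 byte per element for the boolean mask, so storing indices pays off when fewer than roughly one element in eight is True.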
--
I also think it'd be really useful to have a function that's like arange
in that it supports floats/doubles, and also like xrange in that
elements are only generated on demand.
It could be implemented as a generator as shown below.
def axrange(start, stop=None, step=1.0):
    if stop is None:
        stop = start
        start = 0.0
    #endif
    (start, stop, step) = (numpy.float64(start), numpy.float64(stop),
                           numpy.float64(step))
    # xrange needs an integer count, so convert the result of ceil.
    for i in xrange(0, int(numpy.ceil((stop-start)/step))):
        yield numpy.float64(start + step * i)
    #endfor
Or, as a class,
class axrangeiter:
    def __init__(self, rng):
        "An iterator over an axrange object."
        self.rng = rng
        self.i = 0

    def next(self):
        "Returns the next float in the sequence."
        if self.i >= len(self.rng):
            raise StopIteration()
        self.i += 1
        return self.rng[self.i-1]
class axrange:
    def __init__(self, *args):
        """
        axrange(stop)
        axrange(start, stop, [step])

        An axrange object is an iterable numerical sequence between
        start and stop. Similar to arange, there are
        n=ceil((stop-start)/step)
        elements in the sequence. Elements are generated on demand,
        which can be more memory efficient.
        """
        if len(args) == 1:
            self.start = numpy.float64(0.0)
            self.stop = numpy.float64(args[0])
            self.step = numpy.float64(1.0)
        elif len(args) == 2:
            self.start = numpy.float64(args[0])
            self.stop = numpy.float64(args[1])
            self.step = numpy.float64(1.0)
        elif len(args) == 3:
            self.start = numpy.float64(args[0])
            self.stop = numpy.float64(args[1])
            self.step = numpy.float64(args[2])
        else:
            raise TypeError("axrange requires 1 to 3 arguments.")
        #endif
        # Clamp at zero so an empty range has length 0, not a negative one.
        self.len = max(int(numpy.ceil((self.stop-self.start)/self.step)),0)

    def __len__(self):
        return self.len

    def __getitem__(self, i):
        return numpy.float64(self.start + self.step * i)

    def __iter__(self):
        return axrangeiter(self)

    def __repr__(self):
        if self.start == 0.0 and self.step == 1.0:
            return "axrange(%s)" % str(self.stop)
        elif self.step == 1.0:
            return "axrange(%s,%s)" % (str(self.start), str(self.stop))
        else:
            return "axrange(%s,%s,%s)" % (str(self.start),
                                          str(self.stop), str(self.step))
        #endif
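
For readers on Python 3, where xrange and the next method no longer exist, the same idea can be sketched more compactly. This is only an illustration: FloatRange is a hypothetical name, the bounds check in __getitem__ and the zero-step check are my additions, and the element count is subject to floating-point rounding when the step is not exactly representable (0.1, for instance), just as with arange.

```python
import math
import numpy as np

class FloatRange:
    """Lazy float sequence with arange-like semantics (illustrative)."""

    def __init__(self, start, stop=None, step=1.0):
        if stop is None:
            start, stop = 0.0, start
        self.start, self.stop, self.step = float(start), float(stop), float(step)
        if self.step == 0.0:
            raise ValueError("step must not be zero")
        # Element count: n = ceil((stop-start)/step), clamped at zero.
        self._len = max(int(math.ceil((self.stop - self.start) / self.step)), 0)

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if not 0 <= i < self._len:
            raise IndexError("FloatRange index out of range")
        return self.start + self.step * i

    def __iter__(self):
        # Elements are computed one at a time; nothing is stored.
        for i in range(self._len):
            yield self[i]

r = FloatRange(0.0, 1.0, 0.25)
print(len(r), list(r))     # 4 [0.0, 0.25, 0.5, 0.75]
print(np.allclose(list(r), np.arange(0.0, 1.0, 0.25)))   # True
```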
Travis E. Oliphant wrote:
> Hi everybody,
>
> In writing some generic code, I've encountered situations where it would
> reduce code complexity to allow NumPy scalars to be "indexed" in the
> same number of limited ways, that 0-d arrays support.
>
> For example, 0-d arrays can be indexed with
>
> * Boolean masks
> * Ellipses x[...] and x[..., newaxis]
> * Empty tuple x[()]
>
> I think that numpy scalars should also be indexable in these particular
> cases as well (read-only of course, i.e. no setting of the value would
> be possible).
>
> This is an easy change to implement, and I don't think it would cause
> any backward compatibility issues.
>
> Any opinions from the list?
>
>
> Best regards,
>
> -Travis O.
>
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
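
For reference, the three indexing forms listed in the quoted message can be tried on a 0-d array as follows. This is just a small sketch; the shapes in the comments are what I would expect from current NumPy releases.

```python
import numpy as np

x = np.array(5.0)                 # a 0-d array
print(x.shape)                    # ()

print(x[()])                      # 5.0  (empty tuple extracts the scalar)
print(x[...].shape)               # ()   (Ellipsis keeps it 0-d)
print(x[..., np.newaxis].shape)   # (1,)
print(x[np.array(True)].shape)    # (1,) (0-d boolean mask, True)
print(x[np.array(False)].shape)   # (0,) (0-d boolean mask, False)
```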