numarray rank-0 decisions, rationale, and summary
[I posted this almost a week ago, but apparently an email problem prevented it from actually getting posted!]

I think there has been sufficient discussion about rank-0 arrays to make some decisions about how numarray will handle them. [If you don't want to wade through the rationale, jump to the end, where there is a short summary of what we plan to do and what we have questions about.]

********************************************************************

First I'd like to take a stab at summarizing the case made for rank-0 arrays in general, adding some of my own comments on each point.

1) rank-0 arrays are a useful mechanism to avoid having binary operations with scalars cause unintended promotion of other arrays to larger numeric types (e.g., 2*arange(10, typecode=Int16) results in an Int32 result).

*** For numarray this is a non-issue, because the coercion rules prevent scalars from increasing the type of an array if the scalar is the same kind of number (e.g., Int, Float, Complex) as the array. (A brief sketch of this rule follows the list of points below.)

2) rank-0 arrays preserve the type information instead of converting scalars to Python scalars.

*** This seems of limited value. With only a couple of possible exceptions in the future (and none now), Python scalars are effectively the largest type available, so no information is lost. One can convert to and from Python scalars without losing any information. The possible future exceptions are long doubles and UInt32 (supported in Numeric, but not in numarray yet--and frankly, I'm not yet sure how important UInt32 is at the moment). It is possible that Python scalars may move up in size, so this may or may not become an issue in the future. By itself, this does not appear to be a compelling reason.

3) rank-0 arrays allow controlling exceptions (e.g., divide by zero) in a way different from how Python handles them (exception always).

*** This is a valid concern... maybe. I was more impressed by it initially, but it occurred to me that most expressions involving a scalar exception (NaN, divide-by-zero, overflow, etc.) generally corrupt everything, unlike exceptions with arrays, where only a few values may be tainted. Unless one is interested only in ignoring some scalar results in a scalar expression that is part of a larger computation, it seems of very limited use to ignore, or warn on, scalar exceptions. In any event, this is really of no relevance to the use of rank-0 arrays for indexed results or reduction operations.

4) Using rank-0 arrays in place of scalars would promote more generic programming.

This was really the last point of real contention as far as I was concerned. In the end, it came down to seeing good examples of how the lack of rank-0 arrays made code much worse than it could be with them. There are really two cases being discussed: whether indexing a single item ("complete dereferencing" in Scott Gilbert's description) returns rank-0, and whether reduction operations return rank-0.

*** Indexing returning rank-0: amazingly enough, no one was able to provide even one real code example where rank-0 returns from indexing was a problem (this includes MA, as far as I can tell). In the end, this has been much ado about nothing. Henceforth, numarray will return Python scalars when arrays are indexed (as it does currently).

*** Reduction operations: there are good examples where reduction operations returning rank-0 arrays make code simpler. However, the situation is muddied quite a bit by other issues, which I will discuss below. This is an area that deserves a bit more discussion in general.
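Since numarray itself is no longer readily available, here is a minimal sketch of the coercion rule from point 1, using modern NumPy dtypes as a stand-in (an assumption; NumPy applies the same same-kind rule to Python scalar operands):

```python
import numpy as np

# Sketch of point 1 with NumPy dtypes standing in for numarray typecodes
# (numarray's own Int16/Int32 typecodes are assumed unavailable here).
a = np.arange(10, dtype=np.int16)

# A Python int is the same kind of number as the array, so it does not
# promote the result to a larger integer type.
print((2 * a).dtype)      # int16 -- no silent upcast to int32

# A scalar of a different kind (a float) does promote the result to a
# floating-point type (the exact width depends on the NumPy version).
print((2.0 * a).dtype)
```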
But before I tackle that, there is a point about rank-0 arrays that needs to be made, which I think is in some respects obvious but somehow got lost in much of the discussion. Even if it were true that rank-0 arrays made for much simpler, generic code, they are far less useful in simplifying code than they might appear. Why? Because even if array operations (whether indexing, reduction, or other functions) were entirely consistent about never returning scalars, it is a general fact that most Numeric/numarray code must be prepared to handle Python scalars thrown at it in place of arrays by the user. Since Python scalars can come leaking into your code at many points, consistency within Numeric/numarray in avoiding Python scalars really doesn't solve the issue. I would hazard a guess that the great majority of the conditional code that exists today would not be eliminated by it (e.g., this appears to be the case for MA).

Reduction operations:

There is a good case to be made that reduction operations should result in rank-0 arrays rather than scalars (after all, they are reducing dimensions), but not everyone agrees that this is what should be done. Before deciding, some problems with rank-0 arrays should be discussed. I think Konrad and Huaiyu have made very powerful arguments about how certain operations like indexing, attributes, and such should or shouldn't work. In particular, some current Numeric behaviors should be changed: indexing by 0 should not work (instead, Scott Gilbert's suggestion of indexing with an empty tuple sounds right, if a bit syntactically clumsy due to Python not accepting an empty index), len() should not return 1, and so on. So even if we return rank-0 values from reduction operations, this still appears to cause problems with some of the examples given by Eric that depend on len(rank-0) = 1. What should be done about that? One possibility is to provide different numarray functions designed to help write generic code (e.g., an alternate function to len).

But there is one aspect of this that ought to be pointed out. Some have asked for rank-0 arrays that can be indexed with 0 and have a len of 1. There is already an object that does this: a rank-1 len-1 array. One alternative is to have reduction operations use a rank-1 len-1 array as their endpoint rather than a rank-0 array. The rank-0 endpoint is more justified conceptually, but apparently less practical. If a reduction on a 1-d array always produced a 1-d array, then one would always be guaranteed that it can be indexed and that len() works on it. The drawback is that it can never be used directly as a scalar, as rank-0 arrays could be. I think this is a case where you can't have it both ways. If you want a scalar-like object, then some operations that work on higher-rank arrays won't work on it (or shouldn't). If you want something where these operations do work, don't expect to use it where a scalar is expected unless you index it. Is there any interest in this alternate approach to reductions? We plan to have two reduction methods available, one that results in scalars and one that results in arrays. The main questions are which one .reduce maps to, and what the endpoint is for the method that always returns arrays.

***********************************************************************
SUMMARY
***********************************************************************

1) Indexing returns Python scalars in numarray. No rank-0 arrays are ever returned from indexing.
2) rank-0 arrays will be supported, but len() and indexing will not work as they do in Numeric. In particular, to get a scalar one will have to index with an empty tuple (e.g., x[()]; x[0] will raise an exception), and len() will return None.

Questions:

1) Given 2), is there still a desire for .reduce() to return rank-0 arrays? (If not, we have .areduce(), which is intended to always return arrays.)

2) For whichever reduce method always returns arrays, should the endpoint be rank-0 arrays or rank-1 len-1 arrays?
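To make the summary concrete, here is a small sketch of the proposed behaviour, using modern NumPy as a stand-in (an assumption, since numarray itself is no longer available). NumPy's eventual choices differ in two details noted in the comments -- indexing yields NumPy scalar objects rather than Python scalars, and len() of a rank-0 array raises TypeError rather than returning None -- but the overall shape of the design is the same.

```python
import numpy as np

a = np.arange(10, dtype=np.int16)

x = a[3]                          # indexing a single item yields a scalar,
print(isinstance(x, np.ndarray))  # not a rank-0 array -> False

r0 = np.array(7)                  # an explicitly constructed rank-0 array
print(r0[()])                     # empty-tuple indexing dereferences it -> 7

try:
    r0[0]                         # indexing a rank-0 array by 0 is an error
except IndexError as exc:
    print("r0[0] ->", exc)

try:
    len(r0)                       # rank-0 arrays have no length
except TypeError as exc:          # (the plan above had len() return None)
    print("len(r0) ->", exc)
```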
"Perry Greenfield" <perry@stsci.edu> writes:
Questions:
1) Given 2), is there still a desire for .reduce() to return rank-0 arrays? (If not, we have .areduce(), which is intended to always return arrays.)
2) For whichever reduce method always returns arrays, should the endpoint be rank-0 arrays or rank-1 len-1 arrays?
I don't really see an application where a reduction operation yielding rank-1 or higher arrays would be useful. It would be a special case, not useful for generic programming. So my answer to 2) is rank-0.

As for 1), if indexing doesn't return rank-0 arrays, then standard reduction shouldn't either. We would then have a system in which rank-0 arrays are "expert only" stuff; most users would never see them, and they could safely be ignored in tutorials.

Konrad.
--
Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/Nederlands/Francais
France
Konrad Hinsen writes:
"Perry Greenfield" <perry@stsci.edu> writes:
Questions:
1) Given 2), is there still a desire for .reduce() to return rank-0 arrays? (If not, we have .areduce(), which is intended to always return arrays.)
2) For whichever reduce method always returns arrays, should the endpoint be rank-0 arrays or rank-1 len-1 arrays?
I don't really see an application where a reduction operation yielding rank-1 or higher arrays would be useful. It would be a special case, not useful for generic programming. So my answer to 2) is rank-0.
What I am wondering is which behavior suits "generic" programming better. Eric Jones and Paul Dubois have given examples where having to deal with entities that may be scalars or arrays is a pain. Having said that, the behavior that Eric wanted was not consistent with what many thought rank-0 arrays should have (i.e., len(rank-0) = 1, rank-0[0] = value). The proposal to generate rank-1 len-1 arrays was made because len() of such an array is 1, and indexing with [0] does work. So for the kinds of examples he gave, rank-1 len-1 arrays appear to allow for more generic code (a hypothetical sketch of the pattern follows below). But I'm not trying to speak for everyone; that's why I'm asking for opinions. Do you have examples where rank-0 arrays make for more generic code, given that len() and indexing on them would not have the behavior Eric wanted? May I see a couple?
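Eric's actual examples are not reproduced in this thread, so the following is only a hypothetical reconstruction of the pattern, written against modern NumPy (an assumption; np.atleast_1d stands in for the proposed "always arrays" endpoint):

```python
import numpy as np

# Hypothetical reconstruction of the "scalar or array?" conditional that
# generic code ends up writing when a reduction may hand back a bare scalar.
def first_element(result):
    if np.ndim(result) == 0:        # a scalar or rank-0 result: [0] would fail
        return result
    return result[0]

# If the reduction instead always ends at a rank-1 len-1 array, len() and
# [0]-indexing work uniformly and the conditional disappears.
def first_element_uniform(result):
    result = np.atleast_1d(result)  # stand-in for the "always arrays" endpoint
    return result[0]

total = np.arange(4).sum()               # a reduction all the way to a scalar
print(first_element(total))              # 6
print(first_element_uniform(total))      # 6, via the rank-1 len-1 route
```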
As for 1), if indexing doesn't return rank-0 arrays, then standard reduction shouldn't either. We would then have a system in which rank-0 arrays are "expert only" stuff, most users would never see them, and they could safely be ignored in tutorials.
That's my inclination, but I think the question of whether there should be some reduce mechanism that always returns arrays is still a valid one. I can see that there are good uses for that (or at least for a function that casts a scalar to a rank-1 len-1 array if it isn't already an array, much like what array() does, except that array() would now generate rank-0 arrays; a small sketch of such a helper follows below). Perry
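A sketch of the helper Perry alludes to, written against modern NumPy (an assumption; NumPy later provided essentially this utility as np.atleast_1d, and the hand-rolled version below is only illustrative):

```python
import numpy as np

def as_rank1(value):
    """Return `value` as an array of rank >= 1.

    Python scalars and rank-0 arrays become rank-1 len-1 arrays; arrays of
    rank 1 or higher pass through unchanged.  (Illustrative only; NumPy's
    np.atleast_1d does the same job.)
    """
    arr = np.asarray(value)      # scalars and rank-0 inputs become arrays
    if arr.ndim == 0:
        arr = arr.reshape(1)     # promote rank-0 to a rank-1 len-1 array
    return arr

print(as_rank1(3.5))             # [3.5]
print(as_rank1(np.array(3.5)))   # [3.5]
print(as_rank1(np.arange(4)))    # [0 1 2 3], unchanged
```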