Matching 0d arrays and NumPy scalars
Hi everybody,

In writing some generic code, I've encountered situations where it would reduce code complexity to allow NumPy scalars to be "indexed" in the same limited set of ways that 0d arrays support. For example, 0d arrays can be indexed with

* Boolean masks
* Ellipses: x[...] and x[..., newaxis]
* Empty tuple: x[()]

I think that NumPy scalars should also be indexable in these particular cases (read-only of course, i.e. no setting of the value would be possible). This is an easy change to implement, and I don't think it would cause any backward-compatibility issues.

Any opinions from the list?

Best regards,
Travis O.
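The 0d-array behaviours listed above can be checked directly. A minimal sketch, assuming a reasonably recent NumPy (only the 0d-array side is shown; how scalars themselves respond has varied across versions):

```python
import numpy as np

x = np.array(5.0)               # a 0d array: shape (), ndim 0

# Ellipsis returns a 0d array (a view), so shape is preserved
assert x[...].shape == ()

# The empty tuple extracts the element itself as a NumPy scalar
assert x[()] == 5.0

# Ellipsis plus newaxis produces a 1d array of length one
assert x[..., np.newaxis].shape == (1,)

# A 0d boolean mask selects zero or one elements into a 1d result
assert x[np.array(True)].shape == (1,)
assert x[np.array(False)].shape == (0,)
```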
On Thursday 21 February 2008, Travis E. Oliphant wrote:
Hi everybody,
In writing some generic code, I've encountered situations where it would reduce code complexity to allow NumPy scalars to be "indexed" in the same limited set of ways that 0d arrays support.
For example, 0d arrays can be indexed with
* Boolean masks
* Ellipses: x[...] and x[..., newaxis]
* Empty tuple: x[()]
I think that numpy scalars should also be indexable in these particular cases (read-only of course, i.e. no setting of the value would be possible).
This is an easy change to implement, and I don't think it would cause any backward compatibility issues.
Any opinions from the list?
Well, it seems like a nonintrusive modification, but I would like scalars to remain unindexable, mainly because it is useful to raise an error when you try to index them. In fact, I thought that when you want a kind of scalar that is indexable, you should use a 0d array. So, my vote is 0.

Cheers,
Francesc Altet
Cárabos Coop. V. -- http://www.carabos.com/
"Enjoy Data"
On 21.02.2008, at 08:41, Francesc Altet wrote:
Well, it seems like a nonintrusive modification, but I like the scalars to remain unindexable, mainly because it would be useful to raise an error when you are trying to index them. In fact, I thought that when you want a kind of scalar but indexable, you should use a 0d array.
I agree. In fact, I'd rather see NumPy scalars move towards Python scalars rather than towards NumPy arrays in behaviour. In particular, their nasty habit of coercing everything they are combined with into arrays is still my #1 source of compatibility problems with porting code from Numeric to NumPy. I end up converting NumPy scalars to Python scalars explicitly in lots of places. Konrad.
Konrad Hinsen wrote:
On 21.02.2008, at 08:41, Francesc Altet wrote:
Well, it seems like a nonintrusive modification, but I like the scalars to remain unindexable, mainly because it would be useful to raise an error when you are trying to index them. In fact, I thought that when you want a kind of scalar but indexable, you should use a 0d array.
I agree. In fact, I'd rather see NumPy scalars move towards Python scalars rather than towards NumPy arrays in behaviour.

A good balance should be sought. I agree that improvements are needed, especially because much behavior is still just a side-effect of how things were implemented rather than specifically intentional.
In particular, their nasty habit of coercing everything they are combined with into arrays is still my #1 source of compatibility problems with porting code from Numeric to NumPy. I end up converting NumPy scalars to Python scalars explicitly in lots of places.
This bit, for example, comes from the fact that most of the math on scalars still uses ufuncs for their implementation. The numpy scalars could definitely use some improvements. However, I think my proposal for limited indexing capabilities should be considered separately from the coercion behavior of NumPy scalars.

NumPy scalars are intentionally different from Python scalars, and I see this difference growing due to where Python itself is going. For example, the int/long unification is going to change the ability for numpy.int to inherit from int. I could also foresee the Python float being an instance of a Decimal object or some other infinite-precision float at some point, which would prevent inheritance for the numpy.float object. The legitimate question is *how* different they should really be in each specific case.

Travis
On Feb 21, 2008, at 16:03, Travis E. Oliphant wrote:
However, I think my proposal for limited indexing capabilities should be considered separately from coercion behavior of NumPy scalars. NumPy scalars are intentionally different from Python scalars, and I see this difference growing due to where Python itself is going. For example, the int/long unification is going to change the ability for numpy.int to inherit from int.
True, but this is almost an implementation detail. What I see as more fundamental is the behaviour of Python container objects (lists, sets, etc.). If you add an object to a container and then access it as an element of the container, you get the original object (or something that behaves like the original object) without any trace of the container itself. I don't see why arrays should behave differently from all the other Python container objects -- certainly not because it would be rather easy to implement.

NumPy has been inspired a lot by array languages like APL or Matlab. In those languages, everything is an array, and plain numbers that would be scalars elsewhere are considered 0d arrays. Python is not an array language but an OO language with the more general concepts of containers, sequences, iterators, etc. Arrays are just one kind of container object among many others, so they should respect the common behaviours of containers.

Konrad.
On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote:
What I see as more fundamental is the behaviour of Python container objects (lists, sets, etc.). If you add an object to a container and then access it as an element of the container, you get the original object (or something that behaves like the original object) without any trace of the container itself.
I am not a CS type, but your statement seems related to a matrix behavior that I find bothersome and unnatural::

>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])

I do not think anyone has really defended this behavior, *but* the reply to me when I suggested that a matrix contains arrays and we should see that in its behavior was that, no, a matrix is a container of matrices, so this is what you get. So a possible problem with your phrasing of the argument (from a non-CS, user point of view) is that it fails to address what is actually "contained" (as opposed to what you might wish were contained). Apologies if this proves OT.

Cheers,
Alan Isaac
On Feb 21, 2008, at 18:08, Alan G Isaac wrote:
I do not think anyone has really defended this behavior, *but* the reply to me when I suggested that a matrix contains arrays and we should see that in its behavior was that, no, a matrix is a container of matrices so this is what you get.
I can't say much about matrices in NumPy as I never used them, nor tried to understand them. The example you give looks weird to me.
So a possible problem with your phrasing of the argument (from a non-CS, user point of view) is that it fails to address what is actually "contained" (as opposed to what you might wish were contained).
Most Python container objects contain arbitrary objects. Arrays are an exception (the exception being justified by the enormous memory and performance gains) in that all their elements are necessarily of identical type. A float64 array is thus a container of float64 values.

BTW, I am not a CS type either; my background is in physics. I see myself on the "user" side as well.

Konrad.
On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote:
A float64 array is thus a container of float64 values.
Well ... ok::

>>> x = N.array([1,2],dtype='float')
>>> x0 = x[0]
>>> type(x0)
<type 'numpy.float64'>

So a "float64 value" is whatever a numpy.float64 is, and that is part of what is under discussion. So it seems to me. If so, then expected behavior and use cases seem relevant.

Alan

PS I agree that the posted matrix behavior is "weird". For this and other reasons I think it hurts the matrix object, and I have requested that it change ...
On 21.02.2008, at 18:40, Alan G Isaac wrote:
>>> x = N.array([1,2],dtype='float')
>>> x0 = x[0]
>>> type(x0)
<type 'numpy.float64'>
So a "float64 value" is whatever a numpy.float64 is, and that is part of what is under discussion.
numpy.float64 is a very recent invention. During the first decade of numerical arrays in Python (Numeric), type(x0) was the standard Python float type. And even today, what you put into an array (via the array constructor or by assignment) is Python scalar objects, mostly int, float, and complex.

The reason for defining special types for the scalar elements of arrays was efficiency considerations. Python has only a single float type; there is no distinction between single and double precision. Extracting an array element would thus always yield a double-precision float, and adding it to a single-precision array would yield a double-precision result, meaning that it was extremely difficult to maintain single-precision storage across array arithmetic. For huge arrays, that was a serious problem.

However, the intention was always to have numpy's scalar objects behave as similarly as possible to Python scalars. Ideally, application code should not see a difference at all. This was largely successful, with the notable exception of the coercion problem that I mentioned a few mails ago.

Konrad.
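The efficiency rationale above can be seen directly: extracting an element from a single-precision array yields a numpy.float32 rather than a Python (double-precision) float, so arithmetic with the extracted value stays in single precision. A small check, not from the original mail:

```python
import numpy as np

x = np.array([1.0, 2.0], dtype=np.float32)
e = x[0]

# The extracted element is a NumPy scalar that remembers its precision
assert isinstance(e, np.float32)

# Feeding it back into array arithmetic keeps float32 storage, which
# a plain Python float (always double precision) could not guarantee
# under the original Numeric coercion rules.
assert (x + e).dtype == np.float32
```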
On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
On Thu, 21 Feb 2008, Konrad Hinsen apparently wrote:
What I see as more fundamental is the behaviour of Python container objects (lists, sets, etc.). If you add an object to a container and then access it as an element of the container, you get the original object (or something that behaves like the original object) without any trace of the container itself.
I am not a CS type, but your statement seems related to a matrix behavior that I find bothersome and unnatural::
>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])
This is exactly what I would expect for matrices: M[0] is the first row of the matrix. Note that you don't see this behaviour for ndarrays, since those don't insist on having a minimum of two dimensions.

In [2]: x = np.arange(12).reshape((3,4))

In [3]: x
Out[3]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [4]: x[0][0]
Out[4]: 0

In [5]: x[0]
Out[5]: array([0, 1, 2, 3])

Regards
Stefan
On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
a matrix behavior that I find bothersome and unnatural::
>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])
On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
This is exactly what I would expect for matrices: M[0] is the first row of the matrix.
Define what "first row" means! There is no standard definition that says this means the **submatrix** that can be created from the first row. Someone once pointed out on this list that one might consider a matrix to be a container of 1d vectors. For NumPy, however, it is natural that it be a container of 1d arrays. (See the discussion for the distinction.)

Imagine if a 2d array behaved this way. Ugh! Note that it too is 2d; you could have the same "expectation" based on its 2d-ness. Why don't you? You "expect" this matrix behavior only from experience with it, which is why I "expect" it too, while hating it. It is not what new users will expect, and it is also not desirable. As Konrad noted, it is very odd behavior to treat a matrix as a container of matrices. You can only "expect" this behavior by learning to expect it (by use), which is undesirable.

Nobody has objected to returning matrices when getitem is fed multiple arguments: these are naturally interpreted as requests for submatrices. M[0][0] and M[:1,:1] are very different kinds of requests: the first should return the 0,0 element but does not, while M[0,0] does! Bizarre! How to guess?? If you teach, do your students expect this behavior? Mine don't!

This is a wart. The example really speaks for itself. Since Konrad is an extremely experienced user/developer, his reaction should speak volumes.

Cheers,
Alan Isaac
On 22.02.2008, at 01:10, Alan G Isaac wrote:
Someone once pointed out on this list that one might consider a matrix to be a container of 1d vectors. For NumPy, however, it is natural that it be a container of 1d arrays. (See the discussion for the distinction.)
If I were to design a Pythonic implementation of the mathematical concept of a matrix, I'd implement three classes: Matrix, ColumnVector, and RowVector. It would work like this:

m = Matrix([[1, 2], [3, 4]])

m[0, :]  ->  ColumnVector([1, 3])
m[:, 0]  ->  RowVector([1, 2])

m[0, 0]  ->  1  # scalar

m.shape  ->  (2, 2)
m[0].shape  ->  (2,)

However, the matrix implementation in Numeric was inspired by Matlab, where everything is a matrix. But as I said before, Python is not Matlab.

Konrad.
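Such a three-class design can be sketched with ndarray subclasses. The classes below are hypothetical (not a NumPy API), and the row/column labels follow the conventional orientation, i.e. m[i, :] is a row, as corrected later in the thread:

```python
import numpy as np

class RowVector(np.ndarray):
    """1-d view of a matrix row (hypothetical class)."""

class ColumnVector(np.ndarray):
    """1-d view of a matrix column (hypothetical class)."""

class Matrix(np.ndarray):
    def __new__(cls, data):
        return np.asarray(data, dtype=float).view(cls)

    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) == 2:
            i, j = key
            if isinstance(i, int) and isinstance(j, int):
                # m[i, j] -> plain scalar
                return float(np.ndarray.__getitem__(self, key))
            if isinstance(i, int):    # m[i, :] -> row
                return np.ndarray.__getitem__(self, key).view(RowVector)
            if isinstance(j, int):    # m[:, j] -> column
                return np.ndarray.__getitem__(self, key).view(ColumnVector)
            return np.ndarray.__getitem__(self, key)   # submatrix
        if isinstance(key, int):
            return self[key, :]       # m[i] is the same as m[i, :]
        return np.ndarray.__getitem__(self, key)

m = Matrix([[1, 2], [3, 4]])
assert m[0, 0] == 1.0
assert isinstance(m[0, :], RowVector) and m[0, :].shape == (2,)
assert isinstance(m[:, 0], ColumnVector) and m[:, 0].shape == (2,)
assert m.shape == (2, 2) and m[0].shape == (2,)
```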
Konrad Hinsen wrote:
On 22.02.2008, at 01:10, Alan G Isaac wrote:
Someone once pointed out on this list that one might consider a matrix to be a container of 1d vectors. For NumPy, however, it is natural that it be a container of 1d arrays. (See the discussion for the distinction.)
If I were to design a Pythonic implementation of the mathematical concept of a matrix, I'd implement three classes: Matrix, ColumnVector, and RowVector. It would work like this:
m = Matrix([[1, 2], [3, 4]])
m[0, :]  ->  ColumnVector([1, 3])
m[:, 0]  ->  RowVector([1, 2])
These seem backward to me. I would think that m[0,:] would be the RowVector([1,2]) and m[:,0] be the ColumnVector([1,3]).
m[0, 0]  ->  1  # scalar

m.shape  ->  (2, 2)
m[0].shape  ->  (2,)
What is m[0] in this case? The same as m[0, :]?
However, the matrix implementation in Numeric was inspired by Matlab, where everything is a matrix. But as I said before, Python is not Matlab.

It should be kept in mind, however, that Matlab's matrix object is used successfully by a lot of people and should not be dismissed as irrelevant.
I would like to see an improved Matrix object as a builtin type (for 1.1). I am aware of two implementations that could be referred to in creating it: CVXOPT's matrix object and NumPy's matrix object. There may be others as well. If somebody has strong feelings about this sufficient to write a matrix builtin, then the door is wide open. Best, Travis
On Feb 22, 2008, at 15:55, Travis E. Oliphant wrote:
ColumnVector, and RowVector. It would work like this:
m = Matrix([[1, 2], [3, 4]])
m[0, :]  ->  ColumnVector([1, 3])
m[:, 0]  ->  RowVector([1, 2])
These seem backward to me. I would think that m[0,:] would be the RowVector([1,2]) and m[:,0] be the ColumnVector([1,3]).
Right.
What is m[0] in this case? The same as m[0, :]?
Yes.
However, the matrix implementation in Numeric was inspired by Matlab, where everything is a matrix. But as I said before, Python is not Matlab.

It should be kept in mind, however, that Matlab's matrix object is used successfully by a lot of people and should not be dismissed as irrelevant.
Matlab's approach is fine for Matlab, of course. All I am saying is that it is a misfit for Python. Just like 1-based indexing is used successfully by lots of Fortran programmers, but would be an eternal source of confusion if it were introduced for a specific container object in Python.

Konrad.
On Thu, Feb 21, 2008 at 07:10:24PM -0500, Alan G Isaac wrote:
On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
a matrix behavior that I find bothersome and unnatural::

>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])
On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
This is exactly what I would expect for matrices: M[0] is the first row of the matrix.
Define what "first row" means! There is no standard definition that says this is means the **submatrix** that can be created from the first row. Someone once pointed out on this list that one might consider a matrix to be a container of 1d vectors. For NumPy, however, it is natural that it be a container of 1d arrays. (See the discussion for the distinction.)
Could you explain to me how you'd like this to be fixed? If the matrix becomes a container of 1d arrays, then you can no longer expect x[:,0] to return a column vector -- which was one of the reasons the matrix class was created. While not entirely consistent, one workaround would be to detect when a matrix is a "vector", and then do 1d-like indexing on it.
You "expect" this matrix behavior only from experience with it, which is why I "expect" it too, while hating it.
No, really, I don't ever use the matrix class :) But it is not like the behaviour is set in stone, so I would spend less time hating and more time patching.
The example really speaks for itself. Since Konrad is an extremely experienced user/developer, his reaction should speak volumes.
Of course, I meant no disrespect to Konrad. I'm just trying to understand the best way to address your concern.

Regards
Stefan
On Thu, Feb 21, 2008 at 12:08:32PM -0500, Alan G Isaac wrote:
a matrix behavior that I find bothersome and unnatural::

>>> M = N.mat('1 2;3 4')
>>> M[0]
matrix([[1, 2]])
>>> M[0][0]
matrix([[1, 2]])

On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
Could you explain to me how you'd like this to be fixed? If the matrix becomes a container of 1d arrays, then you can no longer expect x[:,0] to return a column vector -- which was one of the reasons the matrix class was created. While not entirely consistent, one workaround would be to detect when a matrix is a "vector", and then do 1d-like indexing on it.
Let M be a matrix, let A = M.A, and let i and j be integers. I would want two principles to be honored.

1. Ordinary Python indexing produces unsurprising results, so that e.g. M[0][0] returns the first element of the matrix.
2. Indexing that produces a 2d array when applied to A will produce the equivalent matrix when applied to M.

There is some tension between these two requirements, and they do not address your specific example. Various reconciliations can be imagined. I believe a nice one can be achieved with a truly minimal change, as follows.

Let M[i] return a 1d array. (Unsurprising!) This is a change: a matrix becomes a container of arrays (e.g., when iterating). Let M[:,i] and M[i,:] behave as now. In addition, as a consistency measure, one might ask that M[i,j] return a 1 x 1 matrix. (This is of secondary importance, but it follows the principle that the use of multiple indexes produces matrices.)

Right now I'm operating on caffeine instead of sleep, but that looks right ...

Alan Isaac
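Assuming numpy.matrix can be subclassed, the proposed minimal change could be prototyped along these lines. This is a sketch, not a proposed patch; M2 is a made-up name:

```python
import numpy as np

class M2(np.matrix):
    """Hypothetical numpy.matrix variant implementing the proposal:
    M[i] -> 1d array, M[i, :] unchanged, M[i, j] -> 1 x 1 matrix."""

    def __getitem__(self, key):
        if isinstance(key, int):
            # Principle 1: ordinary Python indexing is unsurprising,
            # so a single integer index yields a plain 1d array.
            return np.asarray(self)[key]
        out = np.matrix.__getitem__(self, key)
        if isinstance(key, tuple) and all(isinstance(k, int) for k in key):
            # Consistency measure: multiple indexes produce matrices.
            return np.matrix([[out]])
        return out

M = np.matrix([[1, 2], [3, 4]]).view(M2)
assert M[0].shape == (2,) and not isinstance(M[0], np.matrix)
assert M[0, :].shape == (1, 2)      # unchanged: still a matrix row
assert M[0, 0].shape == (1, 1)      # a 1 x 1 matrix instead of a scalar
assert M[0][0] == 1                 # the first element, as proposed
```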
Could you explain to me how you'd like this to be fixed? If the matrix becomes a container of 1d arrays, then you can no longer expect x[:,0] to return a column vector  which was one of the reasons the matrix class was created. While not entirely consistent, one workaround would be to detect when a matrix is a "vector", and then do 1dlike indexing on it.
Letting M be a matrix and A=M.A, and i and j are integers. I would want two principles to be honored.
1. ordinary Python indexing produces unsurprising results, so that e.g. M[0][0] returns the first element of the matrix 2. indexing that produces a 2d array when applied to A will produce the equivalent matrix when applied to M
There is some tension between these two requirements, and they do not address your specific example.
Various reconciliations can be imagined. I believe a nice one can be achieved with a truly minimal change, as follows.
Let M[i] return a 1d array. (Unsurprising!) This is a change: a matrix becomes a container of arrays (e.g., when iterating).
This is a concrete proposal, and I don't immediately have a problem with it (other than that it will break code and so must go into 1.1).
Let M[:,i] and M[i,:] behave as now.
Some would expect M[i,:] and M[i] to be the same thing, but I would be willing to squelch those expectations if many can agree that M[i] should return an array.
In addition, as a consistency measure, one might ask that M[i,j] return a 1 x 1 matrix. (This is of secondary importance, but it follows the principle that the use of multiple indexes produces matrices.)
I'm pretty sure that wasn't the original "principle", but again this is not unreasonable.
Right now I'm operating on caffeine instead of sleep, but that looks right ...
Alan Isaac
On Fri, 22 Feb 2008, Stefan van der Walt apparently wrote:
This is exactly what I would expect for matrices: M[0] is the first row of the matrix.
Define what "first row" means!
Konrad has shown that to "get it right" you really have to introduce three separate things (matrices, row vectors, and column vectors). This is a fine direction to proceed in, but it does complicate things as well. The current implementation has the advantage that row vectors are just 1xN matrices and column vectors are Nx1 matrices, so there is only one kind of thing: matrices.

The expectation that M[0][0] and M[0,0] return the same thing stems from believing that all objects using [] syntax are just containers. (Think of a dictionary with keys '0' and '(0,0)' for an example.) The matrix object is not a "container" object. A NumPy array, however, is. They have different behaviors, on purpose. If you don't like the matrix object, then just use the NumPy array. There are situations, however, when the matrix object is very useful. I use it in limited fashion to make expressions easier to read.
Imagine if a 2d array behaved this way. Ugh! Note that it too is 2d; you could have the same "expectation" based on its 2dness. Why don't you?
The 2dness is not the point. The point is that a matrix object is a matrix object and *not* a generic container.
Nobody has objected to returning matrices when getitem is fed multiple arguments: these are naturally interpreted as requests for submatrices. M[0][0] and M[:1,:1] are very different kinds of requests: the first should return the 0,0 element but does not, while M[0,0] does! Bizarre! How to guess?? If you teach, do your students expect this behavior? Mine don't!
Again, stop believing that M[0][0] and M[0,0] should return the same thing. There is nothing in Python that requires this. As far as I know, the matrix object is consistent. It may not behave as you, or people that you teach, would expect, but it does have reasonable behavior. Expectations are generally "learned" based on previous experience. Our different experiences will always lead to different expectations. What somebody expects for a matrix behavior will depend on how they were taught what it means to "be" a matrix.
This is a wart.
I disagree. It's not a wart, it is intentional.
The example really speaks for itself. Since Konrad is an extremely experienced user/developer, his reaction should speak volumes.
I'm not as convinced by this kind of argument. I respect Konrad a great deal and am always interested to hear his opinion, and make use of all of the code that he shares with us. His example has been an important part of my Python "education." However, we do approach problems differently (probably again based on previous experiences) which leads us to promote different solutions. I also see this in the wider Python community where there is a "diversity" of user/developers who promote different approaches as well (e.g. the PIL vs NumPy concept of Images comes to mind as well). I've heard many differing points of view on the Matrix object. Stefan's comment is most relevant: the Matrix object can be changed (in 1.1), especially because we are keen on merging CVXOPT's matrix object with NumPy's and making it a builtin type. Travis O.
On Fri, 22 Feb 2008, "Travis E. Oliphant" apparently wrote:
The point is that a matrix object is a matrix object and not a generic container.
I see the point a bit differently: there are costs and benefits to the abandonment of a specific and natural behavior of containers. (The kind of behavior that arrays have.) The costs outweigh the benefits.
stop believing that M[0][0] and M[0,0] should return the same thing. There is nothing in Python that requires this.
I never suggested there is. My question "how to guess?" does not imply that. My point is: the matrix object could have more intuitive behavior with no loss of functionality. Or so it seems to me. See my other post. Cheers, Alan
Alan G Isaac wrote:
stop believing that M[0][0] and M[0,0] should return the same thing. There is nothing in Python that requires this.
I never suggested there is. My question "how to guess?" does not imply that.
My point is: the matrix object could have more intuitive behavior with no loss of functionality.
Do I understand correctly that by intuitive you mean based on experience with lists and NumPy arrays? I agree, it is very valuable to be able to use previous understanding to navigate a new thing. That's a big part of why I could see changing the matrix object in 1.1 to behave as you described in your previous post: where M[i] returns a 1d array and matrices are returned by 2d (slice-involved) indexing (I would not mind M[0,0] still returning a scalar, however).

Travis
On Fri, 22 Feb 2008, "Travis E. Oliphant" apparently wrote:
Do I understand correctly, that by intuitive you mean based on experience with lists, and NumPy arrays?
Yes. In particular, array behavior is quite lovely and almost never surprising, so matrices should deviate from it only when there is an adequate payoff and, ideally, an easily stated principle. Thanks! Alan PS If you choose to implement such changes, I would find M[0,0] returning a 1×1 matrix to be more consistent, but to be clear, for me this is *very* much a secondary issue. Not even in the same ballpark.
Travis E. Oliphant wrote:
to behave as you described in your previous post: where M[i] returned a 1d array
My thoughts on this:

As Konrad suggested, row vectors and column vectors are different beasts, and both need to be easily and intuitively available. M[i] returning a 1d array breaks this -- that's what raw numpy arrays do, and I like it, but it's not so natural for linear algebra. If we really want to support matrices, then no, M[i] should not return a 1d array -- what does a 1d array mean in the matrix/linear algebra context? It makes me think that M[i] should not even be possible, as you would always want one of:

row vector: M[i,:]
column vector: M[:,i]
element: M[i,j]

I do like the idea of row/column vectors being different objects than matrices; then you could naturally index the elements from them. If you really want a 1d array, you can always do:

M.A[i]

What if you want to naturally iterate through all the rows, or all the columns? What about:

for row in M.rows
for column in M.columns

M.rows and M.columns would be iterators.

Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
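The rows/columns idea can be sketched with plain generator functions. These are hypothetical helpers, not an existing numpy or matrix API:

```python
import numpy as np

def rows(M):
    """Yield each row of a 2-d array or matrix as a 1-d array."""
    A = np.asarray(M)
    for i in range(A.shape[0]):
        yield A[i, :]

def columns(M):
    """Yield each column of a 2-d array or matrix as a 1-d array."""
    A = np.asarray(M)
    for j in range(A.shape[1]):
        yield A[:, j]

M = np.array([[1, 2], [3, 4]])
assert [list(r) for r in rows(M)] == [[1, 2], [3, 4]]
assert [list(c) for c in columns(M)] == [[1, 3], [2, 4]]
```

An M.rows / M.columns property on a matrix class could simply return these generators.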
On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
It makes me think that M[i] should not even be possible, as you would always want one of:
row vector: M[i,:]
column vector: M[:,i]
element: M[i,j]
I propose that the user-friendly question is: why deviate needlessly from array behavior? (Needlessly means: no increase in functionality.)

Cheers,
Alan Isaac
Alan G Isaac wrote:
I propose that the user-friendly question is: why deviate needlessly from array behavior?
because that's the whole point of a Matrix object in the first place.
(Needlessly means: no increase in functionality.)
Functionally, you can do everything you need to do with numpy arrays. The only reason there is a matrix class is to create a more natural and readable way to do linear algebra. That's why the current version always returns matrices -- people don't want to have to keep converting back to matrices from arrays.

Chris
Alan G Isaac wrote:
I propose that the userfriendly question is: why deviate needlessly from array behavior? (Needlessly means: no increase in functionality.)
On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
because that's the whole point of a Matrix object in the first place.
Do you really believe that? As phrased?? (Out of curiosity: do you use matrices?)

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
Functionally, you can do everything you need to do with numpy arrays.
That is a pretty narrow concept of functionality, which excludes all user-convenience aspects. I do not understand why you are introducing it; it seems irrelevant. If you push this line of reasoning, you should just tell me I can do it all in C.

On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
The only reason there is a matrix class is to create a more natural, and readable way to do linear algebra. That's why the current version always returns matrices  people don't want to have to keep converting back to matrices from arrays.
You are begging the question. Of course we want to be able to conveniently extract submatrices and build new matrices. Nobody has challenged that or proposed otherwise. Or are you complaining that you would have to type M[i,:] instead of M[i]? (No, that cannot be; you were proposing that M[i] be an error...) Alan Isaac
Alan G Isaac wrote:
On Fri, 22 Feb 2008, Christopher Barker apparently wrote:
because that's the whole point of a Matrix object in the first place.
Do you really believe that? As phrased??
Yes -- the matrix object is about style, not functionality -- not that style isn't important.
(Out of curiosity: do you use matrices?)
No. In fact, that's one of the reasons I was overjoyed to find Numeric after using Matlab for a long time -- I hardly ever need linear algebra; what I need is nd arrays. So, yes, I should just shut up and leave the discussion to those that really want to use them.

I will note, however, that in reading this list for years, I haven't found that many people really do want matrices -- they are asked for a lot by Matlab converts, but often the users then find that they can more easily do what they want with arrays after all. Maybe that's because the Matrix API needs improvement, so I guess what we really need is someone that really wants them to champion the cause.

Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
On Thursday 21 February 2008, Konrad Hinsen wrote:
I agree. In fact, I'd rather see NumPy scalars move towards Python scalars rather than towards NumPy arrays in behaviour. In particular, their nasty habit of coercing everything they are combined with into arrays is still my #1 source of compatibility problems with porting code from Numeric to NumPy. I end up converting NumPy scalars to Python scalars explicitly in lots of places.
Yeah, that happened to me too quite frequently, and it is quite uncomfortable. Also, I find this especially unpleasant:

In [87]: numpy.int(1)/numpy.uint64(2)
Out[87]: 0.5

Is this avoidable, or is it a consequence of the coercion rules? I guess this is the same case as:

In [88]: numpy.array([1])/numpy.array([2], 'uint64')
Out[88]: array([ 0.5])

By the way:

In [89]: numpy.array(1)/numpy.array(2, 'uint64')
Out[89]: 0.5

shouldn't this be array(0.5)?

Cheers,
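The uint64 surprise above comes from type promotion: there is no integer dtype that can represent both the int64 and the uint64 range, so NumPy promotes the pair to float64. A quick check (only dtypes are asserted, since the 0d-array-vs-scalar return type has varied across NumPy versions):

```python
import numpy as np

a = np.array([1], dtype=np.int64)
b = np.array([2], dtype=np.uint64)

# No common integer type exists, so the result falls back to float64
assert (a + b).dtype == np.float64

# True division always yields a float, hence the 0.5 seen above
assert (a / b).dtype == np.float64
assert (a / b)[0] == 0.5
```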
Travis E. Oliphant wrote:
Hi everybody,
In writing some generic code, I've encountered situations where it would reduce code complexity to allow NumPy scalars to be "indexed" in the same limited set of ways that 0d arrays support.
For example, 0d arrays can be indexed with
* Boolean masks
* Ellipses: x[...] and x[..., newaxis]
* Empty tuple: x[()]
I think that numpy scalars should also be indexable in these particular cases (read-only of course, i.e. no setting of the value would be possible).
This is an easy change to implement, and I don't think it would cause any backward compatibility issues.
Any opinions from the list?
Best regards,
Travis O.
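For concreteness, the 0d indexing forms listed in the proposal behave like this today (a runnable check against a recent NumPy; the proposal is to extend the same forms, read-only, to numpy scalars):

```python
import numpy as np

x = np.array(5.0)                        # a 0d array

# ellipsis: returns a 0d view
assert x[...].shape == ()

# ellipsis + newaxis: reshapes up to 1d
assert x[..., np.newaxis].shape == (1,)

# empty tuple: extracts the scalar value
assert x[()] == 5.0

# boolean mask: a 0d True mask yields a 1-element array,
# a 0d False mask yields an empty one
assert x[np.array(True)].shape == (1,)
assert x[np.array(False)].shape == (0,)
```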
As for me, I would be glad to see the same behavior for numbers as for arrays, like it's implemented in MATLAB:

a = 80
disp(a)
    80
disp(a(1,1))
    80

OK, for numpy, having at least the possibility to use a = array(80) and print a[0] would be very convenient; now atleast_1d(a) is required very often, and sometimes errors occur only some time later, during execution of user-installed code, when the user usually passes several-variable arrays and suddenly a single-variable array is encountered. I guess it could be implemented via a simple check: if the user asks for a[0] and a is an array of shape () (i.e. like a = array(80)), then return a[()].

D.
In MATLAB, scalars are 1x1 arrays, and thus they can be indexed. There have been situations in my use of NumPy when I would have liked to index scalars to make my code more general. It's not a very pressing issue for me, but it is an interesting one.

Whenever I index an array with a sequence or slice, I'm guaranteed to get another array out. This consistency is nice:

In [1]: A = numpy.random.rand(10)

In [2]: A[range(0,1)]
Out[2]: array([ 0.88109759])

In [3]: A[slice(0,1)]
Out[3]: array([ 0.88109759])

In [4]: A[[0]]
Out[4]: array([ 0.88109759])

However, when I index an array with an integer, I can get either a sequence or a scalar out:

In [5]: c1 = A[0]

In [6]: c1
Out[6]: 0.88109759

In [7]: B = numpy.random.rand(5,5)

In [8]: c2 = B[0]

In [9]: c2
Out[9]: array([ 0.81589633,  0.9762584 ,  0.72666631,  0.12700816,  0.40653243])

Although c1 and c2 were derived by integer-indexing two different arrays of doubles, one is a scalar and the other is a sequence. This lack of consistency might be confusing to some people, and I'd imagine it occasionally results in programming errors.

Damian
While we are on the subject of indexing... I use xranges all over the place because I tend to loop over big data sets, so I try to avoid allocating large chunks of memory unnecessarily with range. While I try to be careful not to let xranges propagate to the ndarray's [] operator, there have been a few times when I've made a mistake. Is there any reason why adding support for xrange indexing would be a bad thing to do? All one needs to do is convert the xrange to a slice object in __getitem__. I've written some simple code to do this conversion in Python (note that in C, one can access the start, end, and step of an xrange object very easily):

    from types import XRangeType

    def xrange_to_slice(ind):
        """Converts an xrange object to a slice object."""
        if type(ind) != XRangeType:
            raise TypeError("Index must be an xrange object!")
        # Grab a string representation of the xrange object, which takes
        # any of the forms: xrange(a), xrange(a,b), xrange(a,b,s).
        # Break it apart into a, b, and s.
        sind = str(ind)
        xr_params = [int(s)
                     for s in sind[sind.find('(') + 1:sind.find(')')].split(",")]
        return slice(*xr_params)

On another note, I think it would be great if we added support for a find function, which takes a boolean array A and returns the indices corresponding to True, but over A's flat view. In many cases, indexing with a boolean array is all one needs, making find unnecessary. However, I've encountered cases where computing the boolean array was computationally burdensome, the boolean arrays were large, and the result was needed many times throughout the broader computation. For many of my problems, storing away the flat index array uses a lot less memory than storing the boolean index arrays. I frequently define a function like:

    def find(A):
        return numpy.where(A.flat)[0]

Certainly, we'd need a find with more error checking, and one that handles the case when a list of booleans is passed (or a list of lists).
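A minimal sketch of such a find, using numpy.flatnonzero (equivalent to where over the flat view) and accepting plain lists via asarray, along the lines Damian suggests:

```python
import numpy as np

def find(A):
    """Return the flat indices of the True entries of A.
    Accepts boolean arrays as well as (nested) lists of booleans."""
    return np.flatnonzero(np.asarray(A))

# flat view of [[False, True], [True, False]] is [F, T, T, F]
assert find([[False, True], [True, False]]).tolist() == [1, 2]

# works the same on a 2-d boolean array
assert find(np.eye(2, dtype=bool)).tolist() == [0, 3]
```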
Conceivably, one might try to index a non-flat array with the result of find. To deal with this, find could return a place-holder object that the index operator checks for. Just an idea.

I also think it'd be really useful to have a function that's like arange in that it supports floats/doubles, and also like xrange in that elements are only generated on demand. It could be implemented as a generator as shown below:

    def axrange(start, stop=None, step=1.0):
        if stop is None:
            stop = start
            start = 0.0
        (start, stop, step) = (numpy.float64(start),
                               numpy.float64(stop),
                               numpy.float64(step))
        for i in xrange(0, int(numpy.ceil((stop - start) / step))):
            yield numpy.float64(start + step * i)

Or, as a class:

    class axrangeiter:

        def __init__(self, rng):
            "An iterator over an axrange object."
            self.rng = rng
            self.i = 0

        def next(self):
            "Returns the next float in the sequence."
            if self.i >= len(self.rng):
                raise StopIteration()
            self.i += 1
            return self.rng[self.i - 1]

    class axrange:

        def __init__(self, *args):
            """
            axrange(stop)
            axrange(start, stop, [step])

            An axrange object is an iterable numerical sequence between
            start and stop. Similar to arange, there are
            n = ceil((stop - start) / step) elements in the sequence.
            Elements are generated on demand, which can be more memory
            efficient.
            """
            if len(args) == 1:
                self.start = numpy.float64(0.0)
                self.stop = numpy.float64(args[0])
                self.step = numpy.float64(1.0)
            elif len(args) == 2:
                self.start = numpy.float64(args[0])
                self.stop = numpy.float64(args[1])
                self.step = numpy.float64(1.0)
            elif len(args) == 3:
                self.start = numpy.float64(args[0])
                self.stop = numpy.float64(args[1])
                self.step = numpy.float64(args[2])
            else:
                raise TypeError("axrange requires 1 to 3 arguments.")
            self.len = max(int(numpy.ceil((self.stop - self.start) / self.step)), 0)

        def __len__(self):
            return self.len

        def __getitem__(self, i):
            return numpy.float64(self.start + self.step * i)

        def __iter__(self):
            return axrangeiter(self)

        def __repr__(self):
            if self.start == 0.0 and self.step == 1.0:
                return "axrange(%s)" % str(self.stop)
            elif self.step == 1.0:
                return "axrange(%s,%s)" % (str(self.start), str(self.stop))
            else:
                return "axrange(%s,%s,%s)" % (str(self.start), str(self.stop),
                                              str(self.step))
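In modern Python the same lazy float-range idea can be written compactly (a sketch only; range replaces xrange, and n = ceil((stop - start) / step) as in the axrange code above):

```python
import math

def frange(start, stop=None, step=1.0):
    """Lazily generate floats from start up to (but excluding) stop."""
    if stop is None:
        start, stop = 0.0, start
    n = max(int(math.ceil((stop - start) / step)), 0)
    for i in range(n):
        # each element is computed on demand, never stored as a whole array
        yield start + step * i

assert list(frange(1.0, 2.0, 0.25)) == [1.0, 1.25, 1.5, 1.75]
assert list(frange(3)) == [0.0, 1.0, 2.0]
```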
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Damian Eads wrote:
While we are on the subject of indexing... I use xranges all over the place because I tend to loop over big data sets, so I try to avoid allocating large chunks of memory unnecessarily with range. While I try to be careful not to let xranges propagate to the ndarray's [] operator, there have been a few times when I've made a mistake. Is there any reason why adding support for xrange indexing would be a bad thing to do? All one needs to do is convert the xrange to a slice object in __getitem__. I've written some simple code to do this conversion in Python (note that in C, one can access the start, end, and step of an xrange object very easily.)
I think something like this could be supported. Basically, interpreting an xrange object as a slice object would be my presumed behavior. Travis O.
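A sketch of the behavior Travis endorses, written against Python 3's range (whose start/stop/step attributes make the conversion trivial); the wrapper class name is hypothetical, and real support would live in ndarray.__getitem__ itself:

```python
import numpy as np

class XIndexArray:
    """Hypothetical wrapper: converts range objects to equivalent
    slices before delegating to ordinary ndarray indexing."""

    def __init__(self, a):
        self.a = np.asarray(a)

    def __getitem__(self, ind):
        if isinstance(ind, range):
            # Python 3 range exposes start/stop/step directly
            ind = slice(ind.start, ind.stop, ind.step)
        return self.a[ind]

xa = XIndexArray(np.arange(10))
assert xa[range(2, 8, 2)].tolist() == [2, 4, 6]
```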
Travis, You have been getting mostly objections so far; maybe it would help if you gave a simple specific example of how your proposal would simplify code. Eric
Travis,
You have been getting mostly objections so far; I wouldn't characterize it that way, but yes, two people have pushed back a bit, although one not directly speaking to the proposed behavior.
The issue is that [] notation does more than just "select from a container" for NumPy arrays. In particular, it is used to reshape an array to more dimensions: [..., newaxis].

A common pattern is to reduce over a dimension and then reshape the result so that it can be combined with the unreduced object. Broadcasting makes this work if the dimension being reduced along is the first dimension. But broadcasting is not enough if you want the reduction dimension to be arbitrary. Thus,

y = add.reduce(x, axis=-1)

produces an N-1 dimensional array if x is 2d and a numpy scalar if x is 1d. Suppose y needs to be subtracted from x. If x is 2d, then
x - y[..., newaxis]
is the needed code. But, if x is 1d, then
x - y[..., newaxis]
raises an error, and a check must be done to handle the case separately. If y[..., newaxis] worked and produced a 1d array when y was a numpy scalar, this could be avoided.

Travis O.
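To make the pattern concrete, here is the case in question, using np.asarray in the 1d branch to get an indexable 0d array (the workaround the proposal would make unnecessary):

```python
import numpy as np

# 2d case: reduce along the last axis, then restore it for broadcasting
x2 = np.arange(6.0).reshape(2, 3)
y2 = np.add.reduce(x2, axis=-1)          # shape (2,)
assert (x2 - y2[..., np.newaxis]).shape == (2, 3)

# 1d case: the reduction yields a scalar; wrapping it as a 0d array
# lets the very same indexing expression work
x1 = np.arange(3.0)                      # [0, 1, 2], sum is 3.0
y1 = np.asarray(np.add.reduce(x1))       # 0d array instead of a scalar
assert (x1 - y1[..., np.newaxis]).shape == (3,)
assert (x1 - y1[..., np.newaxis]).tolist() == [-3.0, -2.0, -1.0]
```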
On Thu, Feb 21, 2008 at 12:30 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Travis,
You have been getting mostly objections so far; I wouldn't characterize it that way, but yes 2 people have pushed back a bit, although one not directly speaking to the proposed behavior.
I need to think about it a lot more, but my initial reaction is also negative. On general principle, I think scalars should be different from arrays. Perhaps you could give some concrete examples of why you want the new behavior? Perhaps there will be other approaches that would achieve the same end. Chuck
Travis E. Oliphant wrote:
Travis,
You have been getting mostly objections so far; I wouldn't characterize it that way, but yes 2 people have pushed back a bit, although one not directly speaking to the proposed behavior.
The issue is that [] notation does more than just "select from a container" for NumPy arrays. In particular, it is used to reshape an array to more dimensions: [..., newaxis]
A common pattern is to reduce over a dimension and then reshape the result so that it can be combined with the unreduced object. Broadcasting makes this work if the dimension being reduced along is the first dimension. But, broadcasting is not enough if you want the reduction dimension to be arbitrary:
Thus,
y = add.reduce(x, axis=-1) produces an N-1 array if x is 2d and a numpy scalar if x is 1d.
Why does it produce a scalar instead of a 0d array? Wouldn't the latter take care of your use case, and be consistent with the action of reduce in removing one dimension? I'm not opposed to your suggested change; I'm just trying to understand it. I'm certainly sympathetic to your use case, below. I dimly recall extensive and confusing (to me) discussions of numpy scalars versus 0d arrays during your heroic push to make numpy gel, and I suspect the answer is somewhere back in those discussions.

Eric
Suppose y needs to be subtracted from x.
If x is 2d, then
x - y[..., newaxis]
is the needed code. But, if x is 1d, then
x - y[..., newaxis]
returns an error and a check must be done to handle the case separately. If y[..., newaxis] worked and produced a 1d array when y was a numpy scalar, this could be avoided.
Travis O.
Hi Travis, On Wed, Feb 20, 2008 at 10:14:07PM 0600, Travis E. Oliphant wrote:
In writing some generic code, I've encountered situations where it would reduce code complexity to allow NumPy scalars to be "indexed" in the same number of limited ways, that 0d arrays support.
For example, 0d arrays can be indexed with
* Boolean masks
I've tried to use this before, but an IndexError (0d arrays can't be indexed) is raised.
* Ellipses x[...] and x[..., newaxis]
This, especially, seems like it could be very useful.
This is an easy change to implement, and I don't think it would cause any backward compatibility issues.
Any opinions from the list?
This is maybe a fairly esoteric use case, but one I can imagine coming across. I'm in favour of implementing the change.

Could I ask that we also consider implementing len() for 0d arrays? numpy.asarray returns those as-is, and I would like to be able to handle them just as I do any other 1-dimensional array. I don't know if a length of 1 would be valid, given a shape of (), but there must be some consistent way of handling them.

Regards
Stefan
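For reference, the behavior Stefan describes, as a runnable check (still current in recent NumPy versions: asarray passes 0d arrays through, and len() on them raises):

```python
import numpy as np

a = np.asarray(5)          # a 0d array comes back from asarray unchanged
assert a.shape == ()
assert a.ndim == 0
assert a.size == 1         # the product of an empty shape tuple is 1

try:
    len(a)                 # TypeError: len() of unsized object
    raised = False
except TypeError:
    raised = True
assert raised
```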
On 21/02/2008, Stefan van der Walt <stefan@sun.ac.za> wrote:
Could I ask that we also consider implementing len() for 0d arrays? numpy.asarray returns those asis, and I would like to be able to handle them just as I do any other 1dimensional array. I don't know if a length of 1 would be valid, given a shape of (), but there must be some consistent way of handling them.
Well, if the length of an array is the product of all its sizes, the product of no things is customarily defined to be one... whether that is actually a useful value is another question. Anne
On Friday 22 February 2008, Stefan van der Walt wrote:
Hi Travis,
On Wed, Feb 20, 2008 at 10:14:07PM 0600, Travis E. Oliphant wrote:
In writing some generic code, I've encountered situations where it would reduce code complexity to allow NumPy scalars to be "indexed" in the same number of limited ways, that 0d arrays support.
For example, 0d arrays can be indexed with
* Boolean masks
I've tried to use this before, but an IndexError (0d arrays can't be indexed) is raised.
Yes, that's true, and what's more, you can't pass a slice to a 0d array, which is certainly problematic. I think this should be fixed.
* Ellipses x[...] and x[..., newaxis]
This, especially, seems like it could be very useful.
Well, if you want to create x[..., newaxis], you can always use array([x]), which also works with scalars (and Python scalars too), although the latter does create a copy :/
Could I ask that we also consider implementing len() for 0d arrays? numpy.asarray returns those asis, and I would like to be able to handle them just as I do any other 1dimensional array. I don't know if a length of 1 would be valid, given a shape of (), but there must be some consistent way of handling them.
If 0d arrays are going to be indexable, then +1 for len(0d) returning 1. Cheers, 
--
Francesc Altet
Cárabos Coop. V. - Enjoy Data
http://www.carabos.com/
Travis, after reading all the posts on this thread, my comments. First of all, I'm definitely +1 on your suggestion. Below, my rationale.

* I believe numpy scalars should provide all possible features needed to smooth the difference between mutable, indexable 0d arrays and immutable, non-indexable builtin Python numeric types.

* Given that in the context of generic multidimensional array processing a 0d array is a more natural and useful concept than a Python 'int' or 'float', I really think that numpy scalars should follow as much as possible the behavior of 0d arrays (of course, retaining immutability).

* Numpy scalars already have (thanks for that!) a very, very similar API to ndarrays. You can ask for 'size', 'shape', etc. (BTW, why does scalar.fill(x) not generate any error?). Why not add indexing as well?

* However, I'm not sure about the proposal of supporting len(); I'm 0 on this point. Anyway, if this is added, then 0d arrays should also have to support it. And then, is len(scalar) or len(0d array) going to return 0 (zero)?

Regards.
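Lisandro's observation about the scalar API can be checked directly (a sketch against a recent NumPy; the fill() no-op is exactly the behavior he questions):

```python
import numpy as np

s = np.float64(3.0)

# NumPy scalars already mirror much of the ndarray attribute API
assert s.shape == ()
assert s.ndim == 0
assert s.size == 1

# fill() exists on scalars too and silently does nothing,
# since scalars are immutable
s.fill(99.0)
assert s == 3.0
```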
--
Lisandro Dalcín
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54(0)342451.1594
participants (12)

* Alan G Isaac
* Anne Archibald
* Charles R Harris
* Christopher Barker
* Damian Eads
* dmitrey
* Eric Firing
* Francesc Altet
* Konrad Hinsen
* Lisandro Dalcin
* Stefan van der Walt
* Travis E. Oliphant