Before we implement what we said we would regarding rank-0 arrays in numarray, a couple of new issues have become apparent that weren't really considered in the first round of discussion (at least I don't recall that they were).
To restate the issue: there was a question about whether an index to an array that identified a single element only (i.e., not a slice, nor an incomplete index, e.g. x[3] where x is two dimensional) should return a Python scalar or a rank-0 array. Currently Numeric is inconsistent on this point. One usually gets scalars, but on some occasions, rank-0 arrays are returned. Good arguments are to be had for either alternative.
The primary advantage of returning rank-0 arrays is that they reduce the need for conditional code checking to see if a result is a scalar or an array. At the end of the discussion it was decided to have numarray return rank-0 arrays in all instances of single item indexing. Since then, a couple potential snags have arisen. I've already discussed some of these with Paul Dubois and Eric Jones. I'd like a little wider input before making a final (or at least experimental) decision.
If we return rank-0 arrays, what should repr return for rank-0 arrays? My initial impression is that the following is highly undesirable for an interactive session, but maybe it is just me:
>>> x = arange(10)
>>> x[2]
array(2)
We, of course, could arrange __repr__ to return "2" instead, in other words print the simple scalar for all cases of rank-0 arrays. This would yield the expected output in the above example. Nevertheless, isn't it violating the intent of repr? Are there other examples where Python uses repr in a similar, misleading manner? But perhaps most feel that returning array(2) is perfectly acceptable and won't annoy users. I am curious about what people think about this.
The second issue is an efficiency one. Currently numarray uses Python objects for arrays. If we return rank-0 arrays for single item indexing, then some naive uses of larger arrays as sequences may lead to an enormous number of array objects to be created. True, there will be equivalent means of doing the same operation that won't result in massive object creations (such as specifically converting an array to a list, which would be done much faster). Is this a serious problem?
These two issues led us to question whether we should indeed return rank-0 arrays. We can live with either solution, but we do want to make the right choice. We also know that both functionalities must exist, i.e., indexing for scalars and indexing for rank-0 arrays, and we will provide both. The issue is what the indexing syntax returns. One argument is that it is not a great burden on programmers to use a method (or other means) to obtain a rank-0 array always, if that is important for the code they are writing, and that we should make the indexing syntax return what most users (especially less expert ones) intuitively expect (scalars, I presume). But others feel it is just as important for the syntax that a programmer uses to be as simple as the interactive user expects (instead of something like x.getindexasarrayalways(2,4,1) [well, with a much better, and shorter, name]).
Do either of these issues change anyone's opinion? If people still want rank-0 arrays, what should repr do?
Perry Greenfield
On Thu, 12 Sep 2002, Perry Greenfield wrote:
If we return rank-0 arrays, what should repr return for rank-0 arrays? My initial impression is that the following is highly undesirable for an interactive session, but maybe it is just me:
>>> x = arange(10)
>>> x[2]
array(2)
We, of course, could arrange __repr__ to return "2" instead, in other words print the simple scalar for all cases of rank-0 arrays. This would yield the expected output in the above example. Nevertheless, isn't it violating the intent of repr? Are there other examples where Python uses repr in a similar, misleading manner? But perhaps most feel that returning array(2) is perfectly acceptable and won't annoy users. I am curious about what people think about this.
I think it would be confusing if the result of repr would be `2' and not `array(2)' because 2 and array(2) are not equivalent in all usages but it should be clear from repr results as a first way to learn more about the objects.
For example, if using array(2) as an index in Python native objects, then TypeError is raised (as expected). In interactive session the quickest way to check the type of a variable is just type its name and press enter:
>>> i
array(2)
Now, if repr(array(2)) returns '2', then one firstly assumes that `i' is an integer. However, this would be very confusing if one sees
>>> some_list[i]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sequence index must be integer
>>> i
2
So, I think that repr(array(2)) should return 'array(2)'. And users can always change this behaviour locally by using sys.displayhook. Though, I would recommend using tools like ipython for interactive sessions.
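A minimal sketch of the displayhook idea, assuming a hypothetical Rank0 class as a stand-in for a rank-0 array (it is not numarray's class, and the function names are made up for illustration). repr() stays honest, while a user who wants bare scalars at the prompt can opt in locally:

```python
import builtins
import sys


class Rank0:
    """Hypothetical stand-in for a rank-0 array (not numarray's class)."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Honest repr: never pretends to be a plain number.
        return "array(%r)" % (self.value,)


def format_for_display(obj):
    """Return the text the interactive prompt should show for obj."""
    if isinstance(obj, Rank0):
        return repr(obj.value)
    return repr(obj)


def scalar_displayhook(obj):
    """Replacement for sys.displayhook that unwraps Rank0 values."""
    if obj is None:
        return              # the default hook also prints nothing for None
    builtins._ = obj        # keep the "_" convenience variable working
    print(format_for_display(obj))


# A user would opt in explicitly, for example in a startup file:
# sys.displayhook = scalar_displayhook
```

Only the interactive echo changes; repr(Rank0(2)) still returns 'array(2)', so logs and debugging output keep showing the real type.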
Btw, note that during an interactive session it would be *seemingly* desired if also repr(string) would return str(string). For example, when viewing documentation in interactive sessions. Wouldn't it be nice to have
>>> sys.displayhook.__doc__
displayhook(object) -> None

Print an object to sys.stdout and also save it in __builtin__._
instead of the current behaviour:
>>> sys.displayhook.__doc__
'displayhook(object) -> None\n\nPrint an object to sys.stdout and also save it in __builtin__._\n'
The second issue is an efficiency one. Currently numarray uses Python objects for arrays. If we return rank-0 arrays for single item indexing, then some naive uses of larger arrays as sequences may lead to an enormous number of array objects to be created. True, there will be equivalent means of doing the same operation that won't result in massive object creations (such as specifically converting an array to a list, which would be done much faster). Is this a serious problem?
Could array.__getitem__ and __getslice__ detect if their argument is an array and skip using Python objects when iterating over indices? If this is technically possible then it is not a good reason to drop returning rank-0 arrays. The actual implementation may come later, though.
If people still want rank-0 arrays, what should repr do?
Always return 'array(...)'.
You could also ask python-dev for advice if numarray is being considered for inclusion in the Python library in the future. I am sure that the repr issue will be brought up if repr==str for rank-0 arrays.
Pearu
Pearu Peterson pearu@cens.ioc.ee writes:
I think it would be confusing if the result of repr would be `2' and not `array(2)' because 2 and array(2) are not equivalent in all usages but it should be clear from repr results as a first way to learn more about the objects.
I agree. There will already be some inevitable confusion with both rank-0 arrays and scalars around, with similar but not identical behaviour. Rank-0 arrays shouldn't make it worse by using camouflage.
The second issue is an efficiency one. Currently numarray uses Python objects for arrays. If we return rank-0 arrays for single item indexing, then some naive uses of larger arrays as sequences may lead to an enormous number of array objects to be created. True, there will be equivalent means of doing the same operation that won't result in massive object creations (such as specifically converting an array to a list, which would be done much faster). Is this a serious problem?
Could array.__getitem__ and __getslice__ detect if their argument is an array and skip using Python objects when iterating over indices?
Of course they know that they are indexing an array, they are defined at the level of the array class/type. However, they cannot detect an iteration as opposed to a single item access.
I don't know if this efficiency problem could be important in practice, probably only practice can tell. I have no idea how many single-item indexing operations into arrays occur in my code, this is not something I worried about when writing it.
If there will be scalar and rank-0 array returning variants of indexing anyway, then I suppose that changing the index syntax to one or the other is not a big effort. So my suggestion is to make a test release and see what the reactions are.
Konrad.
Konrad Hinsen writes:
Pearu Peterson pearu@cens.ioc.ee writes:
I think it would be confusing if the result of repr would be `2' and not `array(2)' because 2 and array(2) are not equivalent in all usages but it should be clear from repr results as a first way to learn more about the objects.
I agree. There will already be some inevitable confusion with both rank-0 arrays and scalars around, with similar but not identical behaviour. Rank-0 arrays shouldn't make it worse by using camouflage.
I also agree that having repr hide the fact that it is an array is a bad thing. But do you find getting "array(2)" when indexing a single item acceptable when working interactively? I agree that displayhook is probably the best way of altering this behavior if a user desires (perhaps we will provide a module function to do so, but not enable it by default).
Could array.__getitem__ and __getslice__ detect if their argument is an array and skip using Python objects when iterating over indices?
Of course they know that they are indexing an array, they are defined at the level of the array class/type. However, they cannot detect an iteration as opposed to a single item access.
I don't know that it is desirable either to have iteration return different kinds of objects than explicit indexing, even if possible. Do we really want
for val in x:
    ...
to use scalars while
for i in range(len(x)):
    val = x[i]
    ...
uses rank-0 arrays? I'm inclined not to have different behavior for what seems like identical iteration.
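To make the two points above concrete, here is a rough sketch using hypothetical Rank0 and Toy1D classes (neither is numarray code). It relies on Python's sequence-protocol fallback: with no __iter__ defined, a for loop calls __getitem__ with 0, 1, 2, ... until IndexError, so __getitem__ sees only an index and cannot tell iteration from explicit indexing. The counter shows how many wrapper objects naive element-by-element access creates:

```python
class Rank0:
    """Hypothetical rank-0 wrapper around a single value."""

    def __init__(self, value):
        self.value = value


class Toy1D:
    """Hypothetical 1-D array that counts the wrappers indexing creates."""

    def __init__(self, data):
        self._data = list(data)
        self.created = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        # Used for x[i] and, since there is no __iter__, for "for v in x".
        value = self._data[i]       # IndexError here ends the iteration
        self.created += 1
        return Rank0(value)

    def tolist(self):
        # Bulk conversion: plain Python values, no per-element wrappers.
        return list(self._data)


x = Toy1D(range(5))
by_iteration = [type(v) for v in x]
by_indexing = [type(x[i]) for i in range(len(x))]
```

Both loops produce the same element type, and each one allocates one wrapper per element, whereas tolist() allocates none.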
I don't know if this efficiency problem could be important in practice, probably only practice can tell. I have no idea how many single-item indexing operations into arrays occur in my code, this is not something I worried about when writing it.
If there will be scalar and rank-0 array returning variants of indexing anyway, then I suppose that changing the index syntax to one or the other is not a big effort. So my suggestion is to make a test release and see what the reactions are.
That is our plan (hence the reference to "experimental"), but I wanted to see if there were strong feelings on this before doing so.
Perry
Two other issues come up trying to implement the "rank-0 experiment":
1. What should be the behavior of subscripting a rank-0 array?
a. Return the scalar value (what numarray does now. seems inconsistent)
b. Raise an exception
c. Return a copy of the rank-0 array
2. What's a decent notation for .asScalar()?
a. a[ <subscript_resulting_in_rank0> ][0] (what numarray does now)
b. a[ <subscript_resulting_in_rank0> ]() (override __call__)
c. a[ <subscript_resulting_in_rank0> ].asScalar()
Any strong opinions?
Todd
Todd Miller jmiller@stsci.edu writes:
Two other issues come up trying to implement the "rank-0 experiment":
- What should be the behavior of subscripting a rank-0 array?
a. Return the scalar value (what numarray does now. seems inconsistent)
b. Raise an exception
c. Return a copy of the rank-0 array
I am for b). A rank-0 array is not a sequence and has no elements, so indexing shouldn't be allowed.
Konrad.
On Fri, 13 Sep 2002, Todd Miller wrote:
Two other issues come up trying to implement the "rank-0 experiment":
- What should be the behavior of subscripting a rank-0 array?
a. Return the scalar value (what numarray does now. seems inconsistent)
-0.5
b. Raise an exception
+1
c. Return a copy of the rank-0 array
-1
- What's a decent notation for .asScalar()?
a. a[ <subscript_resulting_in_rank0> ][0] (what numarray does now)
+-0, just be consistent.
b. a[ <subscript_resulting_in_rank0> ]() (override __call__)
-1
c. a[ <subscript_resulting_in_rank0> ].asScalar()
+0.5
Pearu
--- Todd Miller jmiller@stsci.edu wrote:
Two other issues come up trying to implement the "rank-0 experiment":
- What should be the behavior of subscripting a rank-0 array?
a. Return the scalar value (what numarray does now. seems inconsistent)
If you think of it as "dereferencing" instead of "indexing", I think this is most consistent. It takes 3 elements to dereference a rank-3 array, so it should take 0 elements to dereference a rank-0 array.
- What's a decent notation for .asScalar()?
a. a[ <subscript_resulting_in_rank0> ][0] (what numarray does now)
b. a[ <subscript_resulting_in_rank0> ]() (override __call__)
c. a[ <subscript_resulting_in_rank0> ].asScalar()
Any strong opinions?
Since subscripting a rank-3 array:
a[1, 2, 3]
is very much like
a[(1, 2, 3)]
The __getitem__ receives the tuple (1, 2, 3) in both cases. (Try it!)
So that would imply subscripting (dereferencing) a rank-0 array could be:
a[] could be represented by a[()]
It's just unfortunate that Python doesn't currently recognize the a[] as a valid syntax.
Cheers, -Scott
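The "Try it!" above can be sketched with a small probe class (Probe is a made-up name for illustration) whose __getitem__ simply hands back the key it received. The last part checks that empty brackets are rejected by the parser:

```python
class Probe:
    """Returns whatever key __getitem__ was handed, for inspection."""

    def __getitem__(self, key):
        return key


a = Probe()
key_bare = a[1, 2, 3]       # comma-separated indices arrive as one tuple
key_tuple = a[(1, 2, 3)]    # an explicit tuple arrives unchanged
key_empty = a[()]           # zero indices: the empty tuple

# a[] is not valid syntax, so it can only be checked via compile().
try:
    compile("a[]", "<demo>", "eval")
    empty_brackets_ok = True
except SyntaxError:
    empty_brackets_ok = False
```

So a[()] is the spelling available today for "dereference with zero indices".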
Scott Gilbert xscottg@yahoo.com writes:
a. Return the scalar value (what numarray does now. seems inconsistent)
If you think of it as "dereferencing" instead of "indexing", I think this is most consistent. It takes 3 elements to dereference a rank-3 array, so it should take 0 elements to dereference a rank-0 array.
That would be fine with me, but empty index brackets raise a SyntaxError in Python.
Konrad.
Two other issues come up trying to implement the "rank-0 experiment":
- What should be the behavior of subscripting a rank-0 array?
a. Return the scalar value (what numarray does now. seems inconsistent)
+0.25
b. Raise an exception
+1
c. Return a copy of the rank-0 array
-1
- What's a decent notation for .asScalar()?
a. a[ <subscript_resulting_in_rank0> ][0] (what numarray does now)
-1
Numeric does this now for backward compatibility. I don't think this issue was discussed at length.
b. a[ <subscript_resulting_in_rank0> ]() (override __call__)
-1
c. a[ <subscript_resulting_in_rank0> ].asScalar()
+1 just keep the current .toscalar() notation.
Also int(a) and float(a) and complex(a) would work as well.
If we just implemented Python scalars for the other precisions then much of this rank 0 array business would be solved as we could always return the Python scalar rather than an array for rank 0 arrays.
I believe Konrad made this suggestion many moons ago and I would agree with him.
-Travis Oliphant
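As a rough illustration of the conversions mentioned above, a rank-0 object can support int(), float() and complex() through the standard special methods. Rank0 here is a hypothetical stand-in, and toscalar() only borrows the method name from this thread; nothing below is Numeric's or numarray's implementation:

```python
class Rank0:
    """Hypothetical rank-0 array holding a single numeric value."""

    def __init__(self, value):
        self.value = value

    def toscalar(self):
        # Explicit conversion, named after the notation mentioned above.
        return self.value

    def __int__(self):
        return int(self.value)

    def __float__(self):
        return float(self.value)

    def __complex__(self):
        return complex(self.value)


a = Rank0(2.5)
as_int = int(a)            # truncates like int(2.5)
as_float = float(a)
as_complex = complex(a)
```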
Hi Travis,
refresh my memory about how this proposal would work. I've heard proposals to add new types to Python itself, but that seems out of the question. Are you talking about adding new scalar types as module? E.g.
>>> x = Int8(22)
>>> arr = arange(10, typecode = Int8)
>>> arr[2]
Int8(2)
Or some other approach?
In any case, doesn't this still have the problem that Eric complained about, having to test for whether a result is an array or a scalar (which was one of the drivers to produce rank-0 results).
Thanks, Perry
[Perry wrote about Travis's proposal]
refresh my memory about how this proposal would work. I've heard proposals to add new types to Python itself, but that seems out of the question. Are you talking about adding new scalar types as module? E.g.
[snip]
In any case, doesn't this still have the problem that Eric complained about, having to test for whether a result is an array or a scalar (which was one of the drivers to produce rank-0 results).
Hi Perry,
I like Travis's proposal the best of those I've seen so far, but I don't recall the details of Eric's problem. Could you refresh us as to the basics of it?
-tim
Hey Tim,
Here is a short summary:
Reductions and indexing return different types based on the number of dimensions of the input array:
>>> b = sum(a)
>>> l = len(b) # or whatever
This code works happily if "a" is 2 or more dimensions, but will fail if it is 1d because the sum(a) will return a scalar in this case. To write generic code, you have to put an if/else statement in to check whether b is a scalar or an array:
>>> b = sum(a)
>>> if type(b) is ArrayType:
...     l = len(b)
... else:
...     l = 1 # or whatever you need
or less verbose but still unpleasant:
>>> b = asarray(sum(a))
>>> l = len(b)
eric
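The pattern described above can be reproduced without Numeric. In this sketch nested lists stand in for arrays, and reduce_sum and result_length are hypothetical helpers written only for illustration. The reduction returns a list for 2-D input but a bare scalar for 1-D input, which forces the type check on the caller:

```python
def reduce_sum(a):
    """Sum over the first axis of a (possibly nested) list."""
    if a and isinstance(a[0], list):
        return [sum(col) for col in zip(*a)]    # 2-D input, 1-D result
    return sum(a)                               # 1-D input, plain scalar


def result_length(b):
    """The conditional that generic code ends up writing."""
    if isinstance(b, list):
        return len(b)
    return 1                                    # "or whatever you need"


two_d = reduce_sum([[1, 2], [3, 4]])
one_d = reduce_sum([1, 2, 3])

# Calling len() directly on the scalar result fails.
try:
    len(one_d)
    len_on_scalar_works = True
except TypeError:
    len_on_scalar_works = False
```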
-----Original Message----- From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy- discussion-admin@lists.sourceforge.net] On Behalf Of Tim Hochberg Sent: Friday, September 13, 2002 3:18 PM To: numpy-discussion@lists.sourceforge.net Subject: Re: [Numpy-discussion] rank-0 arrays
This sf.net email is sponsored by:ThinkGeek Welcome to geek heaven. http://thinkgeek.com/sf _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
This is a perfect example of why the thing is so annoying.
-----Original Message----- From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net] On Behalf Of eric jones Sent: Friday, September 13, 2002 1:44 PM To: 'Tim Hochberg'; numpy-discussion@lists.sourceforge.net Subject: RE: [Numpy-discussion] rank-0 arrays
Thanks for the summary Eric.
However, I don't find this example particularly compelling. Probably because I don't think len(b) should work if b is rank zero (or possibly return None). Yeah, I know it's worked in NumPy for years, but that doesn't mean I have to like it. I would favor stripping out as much of the weird special casing of rank-0 arrays as possible in the transition to numarray.
In the particular example Eric gave, isn't the return value meaningless if b is rank-0? Or rather, only meaningful if "or whatever you need" happens to be one. This all smells rather arbitrary to me. However, this isn't a problem that I run into, and I'll concede that it's possible that returning rank-0 arrays and then treating the rank-0 arrays almost like shape (1,) arrays most of the time may solve more problems than it causes, but I'd be interested in seeing more realistic examples. I guess I'll go poke around MA and see what I can see. Any other suggestions of what to look at?
-tim
"eric jones" eric@enthought.com writes:
Reductions and indexing return different types based on the number of dimensions of the input array:
>>> b = sum(a)
>>> l = len(b) # or whatever
This code works happily if "a" is 2 or more dimensions, but will fail if it is 1d because the sum(a) will return a scalar in this case. To write
And it should fail, because a rank-0 array is not a sequence, so it doesn't have a length.
But there are valid examples in which it would be nice if scalars were arrays (but probably if *all* scalars supported array operations, not just those that were generated by indexing from arrays):
- a.shape should return () for a scalar (and (len(a),) for any sequence type)
- a.astype(N.Float) should also work for scalars
Similarly, it would be nice if complex operations (real/imaginary part) would work on integers and floats.
There's one more annoying difference between scalars and arrays of any rank which I think should be removed in numarray:
>>> 3 % -2
-1
>>> array(3) % 2
1
>>> fmod(3, -2)
1.0
I.e. the mod operation uses fmod() for arrays, but different rules for standard Python numbers.
Konrad.
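The two conventions being compared can be checked with the standard library alone. Python's % takes the sign of the divisor, while math.fmod follows the C convention and takes the sign of the dividend. floored_mod is a hypothetical helper showing the rule that % follows for these values:

```python
import math

python_mod = 3 % -2               # sign follows the divisor
c_style_mod = math.fmod(3, -2)    # sign follows the dividend


def floored_mod(x, y):
    """Remainder defined via floor division, matching Python's % here."""
    return x - y * math.floor(x / y)


same_as_percent = floored_mod(3, -2)
```

An array type that applies fmod elementwise therefore disagrees with Python scalars whenever the operands have opposite signs.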
Hey Konrad,
"eric jones" eric@enthought.com writes:
Reductions and indexing return different types based on the number of dimensions of the input array:
>>> b = sum(a)
>>> l = len(b) # or whatever
This code works happily if "a" is 2 or more dimensions, but will fail if it is 1d because the sum(a) will return a scalar in this case. To write
And it should fail, because a rank-0 array is not a sequence, so it doesn't have a length.
I disagree. You should not have to write special code to check for a specific case. It breaks one of the beauties of Numeric -- i.e. you can write generic code that handles arrays of any size and type. Any method that works on a 1 or more d array should also work on 0d arrays. If you ask for its shape, it returns a tuple. If you ask for its size it returns its length along its "first" axis. This will always be 1. It allows for generic code.
On this note: I do not see the benefit of making a scalar type object that is separate for 0d arrays. It seems to remove instead of enhance capabilities. What does a scalar object buy that simply using 0d arrays for that purpose does not?
But there are valid examples in which it would be nice if scalars were arrays (but probably if *all* scalars supported array operations, not just those that were generated by indexing from arrays):
- a.shape should return () for a scalar (and (len(a),) for any sequence type)
- a.astype(N.Float) should also work for scalars
Similarly, it would be nice if complex operations (real/imaginary part) would work on integers and floats.
Yes, this is needed. And I think the argument for it is similar as having len() work on 0d arrays. It allows for generic code.
There's one more annoying difference between scalars and arrays of any rank which I think should be removed in numarray:
>>> 3 % -2
-1
>>> array(3) % 2
1
>>> fmod(3, -2)
1.0
I.e. the mod operation uses fmod() for arrays, but different rules for standard Python numbers.
I think you meant,
>>> array(3) % -2
1
That is unfortunate. It would be nice to clean this up.
eric
And it should fail, because a rank-0 array is not a sequence, so it doesn't have a length.
I disagree. You should not have to write special code to check for a specific case. It breaks one of the beauties of Numeric -- i.e. you can
It is not a specific case, more like a specific value (for the rank). 1/a fails for a == 0, should that be changed as well?
Let's examine some equivalent code pieces:
- len(a) == a.shape[0]
  the second fails for rank 0, so the first one should fail as well
- for i in range(len(a)): print a[i]
  works for all sequences. If len(a) doesn't fail (and I assume it would then return 1), a[0] shouldn't fail either.
- len(a) == len(list(a))
  for all sequences a. Should list(a) return [a] for a rank-0 array? For a scalar it fails.
Actually this might be an argument for not having rank-0 arrays at all. Arrays are multidimensional sequences, but rank-0 arrays aren't.
returns its length along its "first" axis. This will always be 1. It allows for generic code.
Then please give an example where this genericity would be useful.
On this note: I do not see the benefit of making a scalar type object that is separate for 0d arrays. It seems to remove instead of enhance capabilities. What does a scalar object buy that simply using 0d arrays for that purpose does not?
Compatibility, for example the ability to index a sequence with an element of an integer array. Also consistency with other Python sequence types. For example,
[a][0] == a
so one would expect also
array([a])[0] == a
but this would not be fully true if the left-hand side is a rank-0 array.
Konrad.
----- Original Message ----- From: "eric jones" eric@enthought.com
[Konrad]
And it should fail, because a rank-0 array is not a sequence, so it doesn't have a length.
[Eric]
I disagree. You should not have to write special code to check for a specific case. It breaks one of the beauties of Numeric -- i.e. you can write generic code that handles arrays of any size and type. Any method that works on a 1 or more d array should also work on 0d arrays. If you ask for its shape, it returns a tuple.
No problem up to here.
If you ask for its size it returns its length along its "first" axis.
Here's where we part ways. As Konrad already pointed out, it returns anArray.shape[0] except in the case of zero-D arrays, where the arbitrary decision was made that:
This will always be 1.
Why 1 and not 0 or -1 or 42. If you really had to return something, the thing to return would be None.
It allows for generic code.
I don't see how. Even poking around in MA didn't convince me that this would help although I didn't spend enough time with it to get a great feel for it. The one function I did come close to working all the way through looked like it would be about _half_ as long if it didn't have to support the zero-rank cruft.
On this note: I do not see the benefit of making a scalar type object that is separate for 0d arrays. It seems to remove instead of enhance capabilities. What does a scalar object buy that simply using 0d arrays for that purpose does not?
Unless something's changed, you can't index into lists and whatnot with a rank-0 array, so returning ints or some subclass from integer arrays would be convenient. It would be easy to always return subclasses of int, float or complex (or object for those few who use object arrays) so that the results would always play nice with the rest of Python.
However, given that the coercion rules have changed in numarray, I don't really see the point of returning anything other than int, float or complex. However, I have no objection to allowing the creation of rank-0 arrays as long as they behave consistently with other array objects.
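A minimal sketch of the "return a subclass of int" idea, assuming a hypothetical Int16 class (range checking and typed arithmetic are deliberately left out, and this is not a proposed numarray API). Because it is a real int, it can index a list directly:

```python
class Int16(int):
    """Hypothetical typed scalar; only the type name is tracked here."""

    def __repr__(self):
        return "Int16(%d)" % int(self)


some_list = ["a", "b", "c", "d"]
i = Int16(2)
item = some_list[i]        # accepted because Int16 is an int subclass
plain = i + 1              # arithmetic falls back to a plain int
```

The repr still shows where the value came from, while the rest of Python treats it as an ordinary integer. Note that arithmetic results lose the subclass unless the operators are overridden as well.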
[Konrad]
- a.astype(N.Float) should also work for scalars
Similarly, it would be nice if complex operations (real/imaginary part) would work on integers and floats.
If these are needed, and I agree they would be nice, it seems that in order to integrate well with the rest of Python we should supply:
numarray.astype(a, type) -> array or scalar of type type.
numarray.imaginary(a) -> imaginary part of a
numarray.real(a) -> real part of a
I actually had these latter two in JNumeric back when I was working on that, so I kinda thought they were Numeric, but I musta just added them in because I liked them.
[Eric and Konrad agree that the inconsistency between Python and Numeric's mod should be cleaned up]
Me too.
-tim
Hi Perry,
I like Travis's proposal the best of those I've seen so far, but I don't recall the details of Eric's problem. Could you refresh us as to the basics of it?
-tim
Rather than put keystrokes on Eric's fingertips :-) I think it would be best if he (or Paul Dubois, who had similar concerns) gave some examples. But I believe it concerned fairly generic Numeric routines that had expressions that might or might not result in scalars. Testing for both cases resulted in extra code (Paul said that MA had to deal with this a lot) and that having to deal with this made the code error prone.
But I'd rather they explained it since they have very specific experience with this issue. Perhaps they can give a couple common examples.
Perry
On a different front, I'm not sure that this proposal is very useful for numarray because of the new scalar/array coercion rules that numarray has. Since the new scalar literals must still use type functions (e.g., Int8(2)) it doesn't help make expressions any prettier (or do I misunderstand the proposal?). In returning scalars from arrays, I don't see that converting the array type to a Python scalar type helps much. With the exception of long doubles (and the corresponding complex precision), the conversion to the Python scalar doesn't lose any precision and you can restore the exact array type value by assigning that scalar value back to an array element. As mentioned, the only exception is for long doubles; for that case, a new scalar type makes sense.
Is there any other compelling reason for new scalar types?
Perry
"Perry Greenfield" perry@stsci.edu writes:
Is there any other compelling reason for new scalar types?
It solves the problem we are discussing (extracting array elements) without introducing any incompatibility with existing code that uses only the standard Python data types, and very little for code that uses Float32 etc. I'd say it's the best compromise between consistency and backwards compatibility.
Konrad.
Hi Travis,
Refresh my memory about how this proposal would work. I've heard proposals to add new types to Python itself, but that seems out of the question. Are you talking about adding new scalar types as a module? E.g.
x = Int8(22)
arr = arange(10, typecode = Int8)
arr[2]
Int8(2)
Or some other approach?
This is the gist of it. Basically you extend the Python scalars to include the single precision types (all the types Numeric supports).
Ultimately, in Python it would be nice if all of the scalars had the same base class.
In any case, doesn't this still have the problem that Eric complained about, having to test for whether a result is an array or a scalar (which was one of the drivers to produce rank-0 results)?
I would have to understand his reason better. He could be right. My reason for rank-0 results has always been more of a type issue. I'll have to ask him.
Now I remember an issue that makes me question the proposal:
Currently the length of a rank-0 array is 1, while the length of a scalar raises an error --- this is a bad difference.
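To make the difference concrete, here is a small sketch in plain Python. Rank0 and safe_len are hypothetical names made up for this illustration; Rank0 only mimics the length behaviour described above and is not a Numeric or numarray class.

```python
class Rank0:
    """Hypothetical stand-in for a rank-0 array."""

    def __init__(self, value):
        self.value = value

    def __len__(self):
        # Mirrors the behaviour described above: a rank-0 array has length 1.
        return 1


def safe_len(obj):
    """Return len(obj), or None when the object has no length at all."""
    try:
        return len(obj)
    except TypeError:
        return None


rank0_length = safe_len(Rank0(2))   # the rank-0 stand-in reports a length
scalar_length = safe_len(2)         # a Python int has no len(), so this is None
```

Code that calls len() on a result therefore works in one case and fails in the other, which is the bad difference in question.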
We could still implement rank-0 arrays as a separate (optimized) type though (so it doesn't carry around the extra baggage of full-rank arrays).
Now-wanting-rank-0-arrays-all-the-time,
-Travis
Travis Oliphant wrote:
This is the gist of it. Basically you extend the Python scalars to include the single precision types (all the types Numeric supports).
Would they be recognised as scalars by Python? In particular, could you use one as an index? Personally, this is what has bit me in the past: I could use A[3,2] as an index if A was type "Int" but not if it was "Int16" for example.
In any case, the type of A[3,2] should NOT depend on the precision of the numbers stored in A.
Frankly, I have no idea what the implementation details would be, but could we get rid of rank-0 arrays altogether? I have always simply found them strange and confusing... What are they really necessary for (besides holding scalar values of different precision than standard Python scalars)?
-Chris
"Chris Barker" Chris.Barker@noaa.gov writes:
Travis Oliphant wrote:
This is the gist of it. Basically you extend the Python scalars to include the single precision types (all the types Numeric supports).
Would they be recognised as scalars by Python? In particular, could you
There is no "scalar" category in Python. New scalar datatypes would be types with the same behaviour as the existing Python scalar types, but different from them. That means that explicitly type-checking code would not accept them, but everything else would.
use one as an index? Personally, this is what has bit me in the past: I
At least up to Python 1.5, no, indices have to be of integer type. I don't know if that condition was extended in later versions.
could use A[3,2] as an index if A was type "Int" but not if it was "Int16" for example.
Ehmmm... Are you sure that is the right example? The restriction is on the type of the index, not on the type of the array.
Frankly, I have no idea what the implementation details would be, but could we get rid of rank-0 arrays altogether? I have always simply found
If we introduce additional scalars, yes. Whether or not we would want to get rid of them is of course another question.
Konrad.
From: "Konrad Hinsen" hinsen@cnrs-orleans.fr
Travis Oliphant wrote: Would they be recognised as scalars by Python? In particular, could you
There is no "scalar" category in Python. New scalar datatypes would be types with the same behaviour as the existing Python scalar types, but different from them. That means that explicitly type-checking code would not accept them, but everything else would.
use one as an index? Personally, this is what has bit me in the past: I
At least up to Python 1.5, no, indices have to be of integer type. I don't know if that condition was extended in later versions.
In Python 2.2 (I'm not sure about 2.0 or 2.1), you can subclass int and the resultant class can be used as an index. So I think this could be done.
Frankly, I have no idea what the implementation details would be, but could we get rid of rank-0 arrays altogether? I have always simply found
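A minimal sketch of what I mean, assuming nothing beyond plain Python. The Int16 class here is a hypothetical stand-in written for this example, not an existing Numeric or numarray type, and it does not enforce any 16-bit range.

```python
class Int16(int):
    """Hypothetical fixed-precision scalar built by subclassing int."""

    def __repr__(self):
        return "Int16(%d)" % int(self)


values = ["a", "b", "c", "d"]
idx = Int16(2)

# Because Int16 is an int subclass, ordinary sequences accept it as an index.
picked = values[idx]
```

The catch Konrad mentions still applies: code that checks for the exact type rather than using isinstance would not treat idx as a plain int.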
If we introduce additional scalars, yes.
Given that numarray has changed its coercion rules, is this still necessary?
----------start quote----------
*************
Type Coercion
*************
In expressions involving only arrays, the normal coercion rules apply (i.e., the same as Numeric). However, the same rules do not apply to binary operations between arrays and Python scalars in certain cases. If the kind of number is the same for the array and scalar (e.g., both are integer types or both are float types), then the type of the output is determined by the precision of the array, not the scalar. Some examples will best illustrate:
Scalar type *  Array type   Numeric result type   numarray result type
Int            Int16        Int32                 Int16
Int            Int8         Int32                 Int8
Float          Int8         Float64               Float64
Float          Float32      Float64               Float32

The change in the rules was made so that it would be easy to preserve the precision of arrays in expressions involving scalars. Previous solutions with Numeric were either quite awkward (using a function to create a rank-0 array from a scalar with the desired type) or surprising (the savespace attribute, which never allowed type coercion). The problem arises because Python has a limited selection of scalar types. This appears to be the best solution though it admittedly may surprise some who are used to the classical type coercion model.
----------end quote----------
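For anyone who prefers to see the rule as code, here is a rough sketch that reproduces only the four rows of the table quoted above. The function names, the dictionaries, and the single linear ordering of types are all assumptions made for this illustration; the real coercion machinery in Numeric and numarray is more involved.

```python
# Kind of each array type that appears in the quoted table.
KIND = {"Int8": "int", "Int16": "int", "Int32": "int",
        "Float32": "float", "Float64": "float"}

# How a Python scalar is treated under the classical rules.
SCALAR_AS_ARRAY_TYPE = {"Int": "Int32", "Float": "Float64"}

# Assumption: a simplified total ordering, checked only against the table rows.
ORDER = ["Int8", "Int16", "Int32", "Float32", "Float64"]


def classic_result(scalar, array_type):
    """Classical model: the scalar counts as a full-size type and the larger wins."""
    scalar_type = SCALAR_AS_ARRAY_TYPE[scalar]
    return max(scalar_type, array_type, key=ORDER.index)


def numarray_result(scalar, array_type):
    """New rule: if scalar and array are the same kind, the array's type wins."""
    if KIND[SCALAR_AS_ARRAY_TYPE[scalar]] == KIND[array_type]:
        return array_type
    return classic_result(scalar, array_type)


rows = [("Int", "Int16"), ("Int", "Int8"), ("Float", "Int8"), ("Float", "Float32")]
table = [(s, a, classic_result(s, a), numarray_result(s, a)) for s, a in rows]
```

Combinations outside the four rows (for example an Int scalar with a Float32 array) are not covered by the quoted text, so the sketch should not be trusted for them.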
Whether or not we would want to get rid of them is of course another question.
Agreed. It's possible that rank zero arrays would be useful in certain unusual situations, although I can't think of any good examples. Regardless of that choice, if indexing never returns rank-0 arrays most people will never have to deal with them.
-tim
Konrad Hinsen writes:
"Chris Barker" Chris.Barker@noaa.gov writes:
use one as an index? Personally, this is what has bit me in the past: I
At least up to Python 1.5, no, indices have to be of integer type. I don't know if that condition was extended in later versions.
could use A[3,2] as an index if A was type "Int" but not if it was "Int16" for example.
Ehmmm... Are you sure that is the right example? The restriction is on the type of the index, not on the type of the array.
I think Chris is referring to the fact that Numeric returns a rank-0 array for A[3,2] if A is of type Int16, and that value cannot be used as an index in Python sequences (at least for now; nothing technical prevents it from being implemented to accept objects that have an __int__ method). This is the odd inconsistency Numeric has now. If A were a 1-d Int16 array then A[2] would not be a rank-0 array, nor is A[3,2] a rank-0 array if A is of type Int32 (apparently because a Python scalar type suffices). Why it gives rank-0 for 2-d and not 1-d I have no idea.
Travis Oliphant wrote:
This is the gist of it. Basically you extend the Python scalars to include the single precision types (all the types Numeric supports).
Would they be recognised as scalars by Python? In particular, could you use one as an index? Personally, this is what has bit me in the past: I could use A[3,2] as an index if A was type "Int" but not if it was "Int16" for example.
In any case, the type of A[3,2] should NOT depend on the precision of the numbers stored in A.
Frankly, I have no idea what the implementation details would be, but could we get rid of rank-0 arrays altogether? I have always simply found them strange and confusing... What are they really necessary for (besides holding scalar values of different precision than standard Python scalars)?
With new coercion rules this becomes a possibility. Arguments against it are that special rank-0 arrays behave more consistently with the rest of Numeric than Python scalars do. In other words, they have a length and a shape, and one can write N-dimensional code that works the same even when the result is a scalar.
Another advantage of having a Numeric scalar is that we can control the behavior of floating point operations better.
e.g.
if only Python scalars were available and sum(a) returned 0, then
1 / sum(a) would behave as Python behaves (always raises error).
while with our own scalars
1 / sum(a) could potentially behave however the user wanted.
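A rough sketch of the idea in plain Python follows. Float64Scalar and its on_zero_division switch are hypothetical names invented for this example, not a proposed API, and the sketch ignores the sign of zero.

```python
class Float64Scalar(float):
    """Hypothetical package-owned scalar with a configurable 1/0 policy."""

    # Assumed policy switch: "raise" behaves like Python, "inf" returns infinity.
    on_zero_division = "raise"

    def __rtruediv__(self, other):
        # Called for expressions such as 1 / Float64Scalar(0.0).
        if self == 0 and Float64Scalar.on_zero_division == "inf":
            if other == 0:
                return Float64Scalar("nan")
            return Float64Scalar("inf" if other > 0 else "-inf")
        # Otherwise fall back to ordinary float division (may raise).
        return Float64Scalar(float(other) / float(self))


total = Float64Scalar(sum([0.0, 0.0]))  # stand-in for sum(a) returning 0

try:
    1 / total
    python_style_raised = False
except ZeroDivisionError:
    python_style_raised = True

Float64Scalar.on_zero_division = "inf"
user_style_result = 1 / total
```

With plain Python floats there is no place to hang such a switch, which is the point being made.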
-Travis
On Mon, 16 Sep 2002, Travis Oliphant wrote:
Frankly, I have no idea what the implementation details would be, but could we get rid of rank-0 arrays altogether? I have always simply found them strange and confusing... What are they really necessary for (besides holding scalar values of different precision than standard Python scalars)?
With new coercion rules this becomes a possibility. Arguments against it are that special rank-0 arrays behave more consistently with the rest of Numeric than Python scalars do. In other words, they have a length and a shape, and one can write N-dimensional code that works the same even when the result is a scalar.
In addition, rank-0 arrays are mutable while Python scalars are not. Mutability is sometimes useful (e.g. when emulating C or Fortran calls in Python) but often also evil due to its unpythonic side effects.
Pearu
Travis Oliphant oliphant@ee.byu.edu writes:
If we just implemented Python scalars for the other precisions then much of this rank 0 array business would be solved as we could always return the Python scalar rather than an array for rank 0 arrays.
I believe Konrad made this suggestion many moons ago and I would agree
Right. I still think this is the best solution in that it is the least confusing. If I remember correctly, the main objection was that implementing all those types was not as trivial as hoped, but I forgot the details (and it wasn't me who tried).
Konrad.
To restate the issue: there was a question about whether an index to an array that identified a single element only (i.e., not a slice, nor an incomplete index, e.g. x[3] where x is two dimensional) should return a Python scalar or a rank-0 array. Currently Numeric is inconsistent on this point. One usually gets scalars, but on some occasions, rank-0 arrays are returned. Good arguments are to be had for either alternative.
I think we should implement Python scalars for the other types and then eliminate rank-0 arrays completely. I could have a student do this in a few weeks if it was agreed on.
-Travis Oliphant