I'm glad to get the feedback.

1) Types

I like Francesc's suggestion that .typecode return a code and .type return a Python class. What is the attitude and opinion regarding the use of attributes or methods for this kind of thing? It always seems to me so arbitrary as to what is an attribute or what is a method.

There will definitely be support for the numarray-style type specification. Something like that will be how they print (I like the 'i4', 'f4' specification a bit better though). There will also be support for specification in terms of a c-type. The typecodes will still be there, underneath.

One thing has always bothered me though. Why is a double complex type Complex64 and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?

I'm also glad that some recognize the problems with always requiring specification of types in terms of bit-widths or byte-widths, as these are not the same across platforms. For some types (like Int8 or Int16) this is not a problem. But what about long double? On an Intel machine long double is Float96 while on a PowerPC it is Float128. Wouldn't it just be easier to specify LDouble or 'g' than to special-case your code? Problems also exist when you are interfacing with hardware or other C or Fortran code. You know you want single-precision floating point. You don't know or care what the bit-width is. I think with the integer types the bit-width specification is more important than for floating-point types. In sum, I think it is important to have the ability to specify it both ways. When printing the array, it's probably better if it gives bit-width information. I like the way numarray prints arrays.

2) Multidimensional array indexing.

Sometimes it is useful to select out of an array some elements based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ... What I'm proposing would have X[K] essentially equivalent to X.flat[K]. The problem with always requiring the use of X.flat[K] is that X.flat does not work for discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an "indexable iterator" for X, which may be a more Pythonic thing to do anyway. That could solve the problem and solve the discontiguous X.flat problem as well. If we can make X.flat[K] work for discontiguous arrays, then I would be very happy to not special-case the single index array but always treat it as a 1-tuple of integer index arrays.

Capping indexes was proposed because of what numarray does. I can only think that the benefit would be that you don't have to check for and raise an error in the middle of an indexing loop or pre-scan the indexes. But, I suppose this is unavoidable, anyway. Currently Numeric allows specifying indexes that are too high in slices. It just chops them. Python allows this too, for slices. So, I guess I'm just specifying Python behavior. Of course indexing with an integer that is too large or too small will raise errors. In Python:

    a = [1,2,3,4,5]
    a[:20]    # works
    a[20]     # raises an error

3) Always returning rank-0 arrays.

This may be a bit controversial as it is a bit of a change.
But, my experience is that quite a bit of extra code is written to check whether or not a calculation returns a Python scalar (because these don't have the same methods as arrays). In particular, len(a) does not work if a is a scalar, but len(b) works if b is a rank-0 array (numeric scalar). Rank-0 arrays are scalars. When Python needs a scalar it will generally ask the object if it can turn itself into an int or a float. A notable exception is indexing in a list (where Python needs an integer and won't ask the object to convert even if it could). But int(b) always returns a Python integer if the array has only 1 element. I'd like to know what reasons people can think of for ever returning Python scalars unless explicitly asked for. Thanks for the suggestions. -Travis
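[To make the scalar/rank-0 distinction concrete, a minimal sketch using Numeric-era names; the exact behaviour differed between Numeric and numarray at the time:

    from Numeric import array
    s = array([1.5, 2.5])[0]   # in Numeric this returns a Python float
    b = array(1.5)             # a rank-0 array, constructed explicitly
    print b.shape              # () -- no dimensions
    print float(b)             # 1.5 -- conversion works when Python asks
    # Under the proposal, indexing would return rank-0 arrays like b
    # instead of Python scalars like s, so array methods keep working.]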
Just a small comment: Travis Oliphant wrote:
Capping indexes was proposed because of what numarray does. I can only think that the benefit would be that you don't have to check for and raise an error in the middle of an indexing loop or pre-scan the indexes. But, I suppose this is unavoidable, anyway. Currently Numeric allows specifying indexes that are too high in slices. It just chops them. Python allows this too, for slices. So, I guess I'm just specifying Python behavior. Of course indexing with an integer that is too large or too small will raise errors:
In Python:
    a = [1,2,3,4,5]
    a[:20]    # works
    a[20]     # raises an error
This feature is extremely useful. Just yesterday, I needed some code to check whether the first character in a (potentially empty) string was one of a certain list. I couldn't use .startswith(), because I'd have to call it for each test, and I didn't feel like writing a regexp (since I abandoned Perl coding, I've mostly forgotten them and I need to look up the syntax every time). The danger with:

    if mystr[0] in ('a','b','c'): ...

is that if mystr is empty, this blows up. Thanks to Python's acceptance of invalid indices in slices, I used:

    if mystr[0:1] in ('a','b','c'): ...

This is exactly identical to the first case, except that if the string is empty, mystr[0:1] returns ''. No need for extra checks, no need for a try/except block, nothing. Clean, elegant and to the point. Given that Python supports these semantics for all its sliceable containers, I'd very much vote for numerix doing the same. There are cases where it makes sense for numerix arrays to have different semantics from Python lists. The slice-as-view is probably the best example. But unless there is a strong reason to do so, I think numerix arrays should not deviate from Python sequence behavior. Regards, f
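[The same idiom, spelled out as a runnable sketch; plain Python, nothing Numeric-specific:

    for mystr in ('apple', 'banana', 'dog', ''):
        if mystr[0:1] in ('a', 'b', 'c'):   # safe even for the empty string
            print repr(mystr), 'starts with a, b or c'
        else:
            print repr(mystr), 'does not']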
On Thursday 17 February 2005 02:11 pm, Fernando Perez wrote:
The danger with:
if mystr[0] in ('a','b','c'): ...
is that if mystr is empty, this blows up. Thanks to Python's acceptance of invalid indices in slices, I used
if mystr[0:1] in ('a','b','c'): ...
Has Numeric or numarray considered allowing assignment to invalid indices? I sometimes miss being able to do this:

    a = zeros(2)
    a[1,0] = 1

in order to get this:

    [[0,0],
     [1,0]]

I guess it is not consistent with the list operations in python, but it is a really handy shortcut for constructing arrays interactively. Darren
Darren Dale wrote:
Has Numeric or numarray considered allowing assignment to invalid indices? I sometimes miss being able to do this:

    a = zeros(2)
    a[1,0] = 1
in order to get this:

    [[0,0],
     [1,0]]
I guess it is not consistent with the list operations in python, but it is a really handy shortcut for constructing arrays interactively.
I'm not sure what you meant by this. Did you mean being able to expand an array only by assignment? i.e.

    a = zeros(2)    # a is a (2,) array
    a[1,0] = 1      # resizes a behind the scenes to a 2x2 array
                    # and then sets the 1,0 element?

MATLAB does display this behavior. I'm not sure how difficult it would be to support it or if it would be worth it or not. What does IDL do? -Travis
Travis Oliphant wrote:
I'm not sure what you meant by this. Did you mean being able to expand an array only by assignment?
i.e.
a = zeros(2)    # a is a (2,) array
a[1,0] = 1 (resizes a behind the scenes to a 2x2 array and then sets the 1,0 element)?
Mmmh. I'm not sure I like the idea of an assignment triggering a silent resize/reshape event. Explicit is better than implicit and all that... I could see this kind of magical behavior easily causing silent, extremely hard to find bugs in a big program. I may be missing something, but I'd be -1 on this. The 'invalid indices in slices' is basically just syntactic sugar for a try/except block, and it's well-documented behavior in the base language, across all its sequence types:

    In [2]: ll=[]
    In [3]: tt=()
    In [4]: ss=''
    In [5]: ll[0:1]
    Out[5]: []
    In [6]: tt[0:1]
    Out[6]: ()
    In [7]: ss[0:1]
    Out[7]: ''

So in my view at least, this behavior of python isn't a good justification for a silent resize/reshape (which could, I'm sure, be also potentially explosive memory-wise) in numerix arrays. Regards, f
Travis Oliphant wrote:
2) Multidimensional array indexing.
Sometimes it is useful to select out of an array some elements based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
What I'm proposing would have X[K] essentially equivalent to X.flat[K].
Maybe I'm missing something, but in Numeric and Numarray right now
    >>> from Numeric import *
    >>> a = reshape(arange(9),(3,3))
    >>> print a
    [[0 1 2]
     [3 4 5]
     [6 7 8]]
    >>> a[1]
    array([3, 4, 5])
    >>> a.flat[1]
    1
so a[K] and a.flat[K] are very different things for multidimensional arrays. I think it would be a bad idea to change this now - it would certainly break a lot of my code. Or are you only talking about the case when K is a rank-1 index array and not a scalar? -Jeff -- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : Jeffrey.S.Whitaker@noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-124 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg
Jeff Whitaker wrote:
    >>> from Numeric import *
    >>> a = reshape(arange(9),(3,3))
    >>> print a
    [[0 1 2]
     [3 4 5]
     [6 7 8]]
    >>> a[1]
    array([3, 4, 5])
    >>> a.flat[1]
    1
so a[K] and a.flat[K] are very different things for multidimensional arrays. I think it would be a bad idea to change this now - it would certainly break a lot of my code.
Or are you only talking about the case when K is a rank-1 index array and not a scalar?
a[scalar] will behave exactly the same as now. We are just talking about the case where K is an index array (I should clarify that it must be rank-1 or bigger, so that K being a rank-0 array would behave just like the a[scalar] case). This is basically new behavior that numarray has started supporting. I just think numarray missed an important case of flattened indexing that MATLAB supports. My current proposal would distinguish between single-index array cases and tuple-index array cases. I'm still thinking about the X.flat possibility. Basically, I think that direction would require a new "generic array view" or something like that. It may be worth trying, but I'm not sure I want to go that direction right now until I convince more people to come on board with Numeric3. -Travis
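[A sketch of the proposed semantics in terms of existing Numeric functions, namely take over the raveled array; K here is an assumed rank-1 index list:

    from Numeric import reshape, arange, ravel, take
    X = reshape(arange(12), (3, 4))
    K = [0, 5, 11]                # flat (C-order) positions
    print take(ravel(X), K)       # [ 0  5 11] -- what X[K] would return]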
On Feb 17, 2005, at 3:25 PM, Travis Oliphant wrote:
This is basically new behavior that numarray has started supporting. I just think numarray missed an important case of flattened indexing that MATLAB supports. My current proposal would distinguish between single-index array cases and tuple-index array cases. I'm still thinking about the X.flat possibility. Basically, I think that direction would require a new "generic array view" or something like that. It may be worth trying, but I'm not sure I want to go that direction right now until I convince more people to come on board with Numeric3.
It was new behavior in the sense that Numeric didn't support multidimensional array takes and puts at the time. For a long time it was the only kind of array indexing IDL supported (1-d and implicit .flat of multidimensional arrays). Speaking only for myself, I found the .flat semantics unwanted more often than convenient. I'd rather keep the numarray behavior in this case and make the .flat case explicit (but I understand the difficulty of that approach). There is the possibility of a custom index option (but ugly I suppose) X[K, flatten] where flatten is a special object that indexing recognizes as signaling a different interpretation to indexing. Perry
Perry Greenfield wrote:
On Feb 17, 2005, at 3:25 PM, Travis Oliphant wrote:
This is basically new behavior that numarray has started supporting. I just think numarray missed an important case of flattened indexing that MATLAB supports. My current proposal would distinguish between single-index array cases and tuple-index array cases. I'm still thinking about the X.flat possibility. Basically, I think that direction would require a new "generic array view" or something like that. It may be worth trying, but I'm not sure I want to go that direction right now until I convince more people to come on board with Numeric3.
It was new behavior in the sense that Numeric didn't support multidimensional array takes and puts at the time. For a long time it was the only kind of array indexing IDL supported (1-d and implicit .flat of multidimensional arrays). Speaking only for myself, I found the .flat semantics unwanted more often than convenient. I'd rather keep the numarray behavior in this case and make the .flat case explicit (but I understand the difficulty of that approach). There is the possibility of a custom index option (but ugly I suppose)
X[K, flatten]
where flatten is a special object that indexing recognizes as signaling a different interpretation to indexing.
+1 on Travis's suggestion of X.flat[K] behavior, since I think it is explicit and intuitive. -1 on X[K, flatten]. I think in the long term that the X.flat[K] proposal should be pursued, even for non-contiguous arrays. Since we have decided that slice-as-view behavior is the default, I believe the user should not have to worry whether the internal structure of an array is contiguous or not. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
Travis Oliphant <oliphant@ee.byu.edu> writes:
Currently Numeric allows specifying indexes that are too high in slices. It just chops them. Python allows this too, for slices.
Yes, since foo[:] just means foo.__getitem__(slice(sys.maxint)). Slices even have a nice method for normalizing the indices.
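[The method referred to is slice.indices(), added in Python 2.3, which normalizes out-of-range bounds against a sequence length:

    s = slice(None, 20)     # what foo[:20] passes to __getitem__
    print s.indices(5)      # (0, 5, 1) -- start, stop, step clipped to length 5]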
So, I guess I'm just specifying Python behavior.
Not really: your specification causes an element with a different index to be returned, whereas the usual slice behaviour only causes out-of-range indices to be omitted from the result.
3) Always returning rank-0 arrays.
This may be a bit controversial as it is a bit of a change.
Indeed. So you really do intend that if foo=array([1,2]), foo[0] should evaluate to array(1) rather than 1?
But, my experience is that quite a bit of extra code is written to check whether or not a calculation returns a Python-scalar
I suppose this may be necessary for code which operates on arrays of somewhat arbitrary rank and would not know without looking whether, e.g., foo[0] is a scalar or an array of positive rank.
In particular len(a) does not work if a is a scalar,
Depends on what kinds of scalars are supported. What about object arrays?
but len(b) works if b is a rank-0 array
It does? In Numarray len(b) raises ValueError and size(b) returns 1. To me this would seem the correct behaviour.
When Python needs a scalar it will generally ask the object if it can turn itself into an int or a float.
Hence this change might not be as incompatible as it seems, although users of object arrays would be in for some surprises.
I'd like to know what reasons people can think of for ever returning Python scalars unless explicitly asked for.
It would be more consistent with the usual container semantics and less likely to break existing code. -- Timo Korvola <URL:http://www.iki.fi/tkorvola>
Here are a couple of issues that are important to me that might be relevant to the design discussion. I have attached some code that illustrates part of the pain we have experienced developing libraries of algorithms that can handle both arrays and scalars. The attached library is the reusable part. The other part of this problem is that we have lots of logic sprinkled throughout our algorithms to enable them to handle both arrays and scalars. Secondly, I have just been bitten by this declaration, which suggests that the new Numeric might handle default values better:

    _vp_mod = zeros(num_pts)

It would be less surprising to someone developing numeric algorithms if functions like this defaulted to creating a double-precision array rather than integers. Regards, Duncan
3) Always returning rank-0 arrays.
This may be a bit controversial as it is a bit of a change.
Indeed. So you really do intend that if foo=array([1,2]), foo[0] should evaluate to array(1) rather than 1?
import scipy
from scipy import take, amin, amax, arange, asarray, PyObject, mean, \
     product, shape, array, Float64, nonzero

"""
The following safe_ methods were written to handle both arrays and
scalars to save the developer of numerical methods having to clutter
their code with tests to determine the type of the data.
"""

def safe_take(a, indices):
    # Slice the input if it is an array but not if it is a scalar
    try:
        a = take(a, indices)
    except ValueError:
        # a is scalar
        pass
    return a

def safe_copy(a):
    # Return a copy for both scalar and array input
    try:
        b = a.copy()
    except AttributeError:
        # a is a scalar
        b = a
    return b

# Note: if x is a scalar and y = asarray(x), amin(y) FAILS but min(y) works
# Note: BUT IF z=convert(y,frac,frac), THEN min(z) FAILS!!!

def safe_min(a):
    # Return the minimum of the input array or the input if it is a scalar
    try:
        safemin = amin(a)
    except:
        safemin = a
    return safemin

def safe_max(a):
    # Return the maximum of the input array or the input if it is a scalar
    try:
        safemax = amax(a)
    except:
        safemax = a
    return safemax

def safe_mean(a):
    # Return the mean of the input array or the input if it is a scalar
    try:
        safemean = mean(a)
    except:
        safemean = a
    return safemean

def safe_len(a):
    # Return the length of the input array or 1 if it is a scalar
    try:
        safelen = len(a)
    except:
        safelen = 1
    return safelen

def safe_flat(a):
    # Return a flat version of the input array or input if it is a scalar
    try:
        safeflat = a.flat
    except:
        safeflat = a
    return safeflat
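[For reference, a short example of how these helpers get used; a sketch that assumes the scipy imports above succeed:

    from Numeric import array
    print safe_len(array([1.0, 2.0, 3.0]))   # 3
    print safe_len(7.5)                      # 1 -- scalar treated as length-1
    print safe_min(array([4, 2, 9]))         # 2
    print safe_min(3.0)                      # 3.0 -- scalar passed through]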
Duncan Child wrote:
# Note: if x is a scalar and y = asarray(x), amin(y) FAILS but min(y) works
# Note: BUT IF z=convert(y,frac,frac), THEN min(z) FAILS!!
My quick check shows that amin(y), min(y), and amin(x) all work after "from scipy import *". Only min(x) doesn't, and it is a Python built-in which throws an "iteration over non-sequence" exception when called with a scalar. Should Numeric 3 bump amin and similar functions up into itself? How many others need a min-like function which works with both arrays and scalars? In numarray, min is a method. Is this better?
On Feb 17, 2005, at 23:22, Duncan Child wrote:
I have attached some code that illustrates part of the pain we have experienced developing libraries of algorithms that can handle both arrays and scalars. The attached library is the reusable part. The other part of this problem is that we have lots of logic sprinkled throughout our algorithms to enable them to handle both arrays and scalars.
See comments below...
Secondly, I have just been bitten by this declaration, which suggests that the new Numeric might handle default values better:
_vp_mod = zeros(num_pts)
It would be less surprising to someone developing numeric algorithms if functions like this defaulted to creating a double precision array rather than integers.
If you want a function that returns float arrays, it is trivial to write and adds negligible overhead:

    def float_array(array_spec):
        return Numeric.array(array_spec, Numeric.Float)

No need to interfere with Numeric's principle of "smallest usable type", which fits well into the Python type promotion hierarchy. More generally, I don't think defaults should be chosen with a particular application in mind. Arrays are a general and widely useful datatype in many domains. I use integer arrays as much as float arrays, even though my applications qualify as "numeric".
""" The following safe_ methods were written to handle both arrays amd scalars to save the developer of numerical methods having to clutter their code with tests to determine the type of the data. """
def safe_take(a,indices): # Slice the input if it is an array but not if it is a scalar
This is a very bad example. Your function does not interpret scalars as rank-0 arrays (for which take() would fail), but as something completely different.
def safe_copy(a): # Return a copy for both scalar and array input
That is a semantically reasonable application, but also one for which a simple and standard solution already exists: copy.copy().
def safe_min(a): # Return the minimum of the input array or the input if it is a scalar
I would argue that this is not a good example either, as "minimum over an array" implies a reduction operation which is not defined for a scalar. On the other hand, the operation you define certainly makes sense.
def safe_len(a): # Return the length of the input array or 1 if it is a scalar
That implies that a scalar is somehow equivalent to a rank-1 array of length 1, which is not the case. Actually, all of your examples look like an attempt to recreate Matlab behaviour. But Python is not Matlab! Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen@llb.saclay.cea.fr ---------------------------------------------------------------------
konrad.hinsen@laposte.net wrote:
On Feb 17, 2005, at 23:22, Duncan Child wrote:
I have attached some code that illustrates part of the pain we have experienced developing libraries of algorithms that can handle both arrays and scalars. The attached library is the reusable part. The other part of this problem is that we have lots of logic sprinkled throughout our algorithms to enable them to handle both arrays and scalars.
See comments below...
Secondly, I have just been bitten by this declaration, which suggests that the new Numeric might handle default values better:
_vp_mod = zeros(num_pts)
It would be less surprising to someone developing numeric algorithms if functions like this defaulted to creating a double precision array rather than integers.
If you want a function that returns float arrays, it is trivial to write and adds negligible overhead:
    def float_array(array_spec):
        return Numeric.array(array_spec, Numeric.Float)
No need to interfere with Numeric's principle of "smallest usable type", which fits well into the Python type promotion hierarchy.
More generally, I don't think defaults should be chosen with a particular application in mind. Arrays are a general and widely useful datatype in many domains. I use integer arrays as much as float arrays, even though my applications qualify as "numeric".
What I meant was that accidentally omitting the na.Float in the declaration below introduced a hard-to-find bug in my code:

    _vp_mod = na.zeros(num_pts, na.Float)

I had not heard of Numeric's "smallest usable type" principle. Even so, I would argue that for doing signal processing the smallest usable type is floating point :-)
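[The bug in miniature, using Numeric names; a sketch of the failure mode being described:

    from Numeric import zeros, Float
    a = zeros(3)          # integer array by default
    a[0] = 0.5            # silently truncated to 0 -- the hard-to-find bug
    b = zeros(3, Float)   # double-precision array
    b[0] = 0.5            # stored as 0.5, as intended]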
snipped >>
Actually, all of your examples look like an attempt to recreate Matlab behaviour. But Python is not Matlab!
Good point, and this code was actually written by developers who were porting libraries of Matlab code. I thought the examples illustrated a more general problem that was created by Numeric handling scalars differently to arrays. In another post you said: "... as the goal is inclusion into the Python core .... I propose that the PEP should include unification of scalars and arrays such that for all practical purposes scalars *are* rank-0 arrays. " So I think we are in broad agreement. Regards Duncan
Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen@llb.saclay.cea.fr ---------------------------------------------------------------------
Duncan Child wrote:
In another post [Konrad Hinsen] said:
"... as the goal is inclusion into the Python core .... I propose that the PEP should include unification of scalars and arrays such that for all practical purposes scalars *are* rank-0 arrays. "
So I think we are in broad agreement.
It seems to me that actually getting the above behavior is going to require Guido van Rossum et al. to change how scalars behave in the Python core. As I said, right now min(scalar) in stock Python raises a TypeError on the grounds that scalars can't be iterated over.
Stephen Walton <stephen.walton@csun.edu> writes:
As I said, right now min(scalar) in stock Python raises a TypeError on the grounds that scalars can't be iterated over.
This is not very different from Numarray, where min(rank_0_array) raises ValueError because min is applied to an empty sequence. Iteration normally applies to the first dimension but rank 0 arrays have no dimensions. -- Timo Korvola <URL:http://www.iki.fi/tkorvola>
On 18.02.2005, at 23:02, Stephen Walton wrote:
Duncan Child wrote:
In another post [Konrad Hinsen] said:
"... as the goal is inclusion into the Python core .... I propose that the PEP should include unification of scalars and arrays such that for all practical purposes scalars *are* rank-0 arrays. "
So I think we are in broad agreement.
It seems to me that actually getting the above behavior is going to require Guido van Rossum et al. to change how scalars behave in the Python core. As I said, right now min(scalar) in stock Python raises a TypeError on the grounds that scalars can't be iterated over.
That's fine - scalars (like rank-0 arrays) cannot be iterated over. However, there would indeed have to be changes to scalars. For example, 1.shape would have to return (). Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen@cea.fr -------------------------------------------------------------------------------

Timo Korvola wrote: [snip, snip]
3) Always returning rank-0 arrays.
This may be a bit controversial as it is a bit of a change.
Indeed. So you really do intend that if foo=array([1,2]), foo[0] should evaluate to array(1) rather than 1?
But, my experience is that quite a bit of extra code is written to check whether or not a calculation returns a Python-scalar
I suppose this may be necessary for code which operates on arrays of somewhat arbitrary rank and would not know without looking whether, e.g., foo[0] is a scalar or an array of positive rank.
In particular len(a) does not work if a is a scalar,
Depends on what kinds of scalars are supported. What about object arrays?
but len(b) works if b is a rank-0 array
How about proposing a PEP to extend Python's scalar behavior, so that len(a) works for either scalars or arrays? Though I haven't thought this through in great detail, it would appear to be a transparent addition for most Python users, who would never use this. At the same time, we should also consider other behavior that would unify (or smear) the behavior of Python scalars and arrays. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
Paul Barrett wrote:
How about proposing a PEP to extend Python's scalar behavior, so that len(a) works for either scalars or arrays? Though I haven't thought this through in great detail, it would appear to be a transparent addition for most Python users who would never use this. At the same time, we should also consider other behavior that would unify (or smear) the behavior of Python scalars and arrays.
I'm sure there are any number of people who use len(x) as a way to test the sequenceness of x. While it might be okay for rank-0 arrays, extending this to builtin ints and floats may not be a good idea. I'm pretty sure this would get rejected. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
On Feb 18, 2005, at 16:23, Robert Kern wrote:
I'm sure there are any number of people who use len(x) as a way to test the sequenceness of x. While it might be okay for rank-0 arrays, extending this to builtin ints and floats may not be a good idea. I'm pretty sure this would get rejected.
For arrays, len(x) == x.shape[0], so len(x) should fail for rank-0 arrays anyway. As it does in Numeric. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen@llb.saclay.cea.fr ---------------------------------------------------------------------
konrad.hinsen@laposte.net wrote:
On Feb 18, 2005, at 16:23, Robert Kern wrote:
I'm sure there are any number of people who use len(x) as a way to test the sequenceness of x. While it might be okay for rank-0 arrays, extending this to builtin ints and floats may not be a good idea. I'm pretty sure this would get rejected.
For arrays, len(x) == x.shape[0], so len(x) should fail for rank-0 arrays anyway. As it does in Numeric.
I'm not averse to len(x) returning 0 when given a rank-0 array. I see it as giving up one consistency (that scalar-like objects don't have lengths) for another (arrays having a common set of operations that one can expect regardless of rank or shape). My objection was to extending that set of operations to other standard objects where they make less sense. Although the len(x) sequenceness-test is a reasonably common idiom, it's not expected to be foolproof against any input. However, the test shouldn't stop working on core objects that are already there. In any case, len(x) is probably one of the less-common operations one would want to perform seamlessly on scalar and rank-n outputs from Numeric functions. x.typecode() and x.shape would probably top my list. Might our PEP efforts be better spent locating and fixing the places in core Python where rank-0 arrays won't be accepted as core ints and floats? List/tuple indexing is one. Extension code that demands an actual int object where it could be looser might already be considered deficient considering the int/long unification. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
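[The list-indexing case is easy to demonstrate; a sketch of the failure mode as of Python 2.x with Numeric:

    from Numeric import array
    i = array(1)          # rank-0 integer array
    lst = ['a', 'b', 'c']
    print int(i)          # 1 -- explicit conversion is accepted
    try:
        print lst[i]      # implicit conversion is not
    except TypeError, e:
        print 'TypeError:', e]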
Robert Kern <rkern@ucsd.edu> writes:
I'm not averse to len(x) returning 0 when given a rank-0 array. I see it as giving up one consistency (that scalar-like objects don't have lengths) for another (arrays having a common set of operations that one can expect regardless of rank or shape).
That would be akin to making 0/0 return 0. It is possible to create arrays of length zero, e.g., by indexing with an empty slice, and these should not be confused with rank zero arrays.
Might our PEP efforts be better spent locating and fixing the places in core Python where rank-0 arrays won't be accepted as core ints and floats?
Sounds useful.
List/tuple indexing is one.
Rank 0 arrays should also be kept in mind when defining array indexing in the PEP. -- Timo Korvola <URL:http://www.iki.fi/tkorvola>
Timo Korvola wrote:
Robert Kern <rkern@ucsd.edu> writes:
I'm not averse to len(x) returning 0 when given a rank-0 array. I see it as giving up one consistency (that scalar-like objects don't have lengths) for another (arrays having a common set of operations that one can expect regardless of rank or shape).
That would be akin to making 0/0 return 0. It is possible to create arrays of length zero, e.g., by indexing with an empty slice, and these should not be confused with rank zero arrays.
Fair enough. I'm not averse to len(x) raising an exception when x is a rank-0 array either. :-)
Might our PEP efforts be better spent locating and fixing the places in core Python where rank-0 arrays won't be accepted as core ints and floats?
Sounds useful.
List/tuple indexing is one.
Rank 0 arrays should also be kept in mind when defining array indexing in the PEP.
Hopefully, this would drop out cleanly from the regular array-indexing and not require too much in the way of special-casing. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
Robert Kern wrote:
Timo Korvola wrote:
Robert Kern <rkern@ucsd.edu> writes:
I'm not averse to len(x) returning 0 when given a rank-0 array. I see it as giving up one consistency (that scalar-like objects don't have lengths) for another (arrays having a common set of operations that one can expect regardless of rank or shape).
That would be akin to making 0/0 return 0. It is possible to create arrays of length zero, e.g., by indexing with an empty slice, and these should not be confused with rank zero arrays.
Fair enough. I'm not averse to len(x) raising an exception when x is a rank-0 array either. :-)
Or it could return 'None', i.e. undefined. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
I don't think I can help with the discussion, but I would be interested to read this PEP - where is it? TIA, Thomas
On Feb 17, 2005, at 1:52 PM, Travis Oliphant wrote:
I'm glad to get the feedback.
1) Types
[...]
One thing has always bothered me though. Why is a double complex type Complex64 and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?
My recollection is that is how we originally did it until we saw how it was done in Numeric, but maybe my memory is screwed up. I'm happy with real bit widths.
Problems also exist when you are interfacing with hardware or other C or Fortran code. You know you want single-precision floating point. You don't know or care what the bit-width is. I think with the Integer types the bit-width specification is more important than floating point types. In sum, I think it is important to have the ability to specify it both ways.
I'd agree that supporting both is ideal. Sometimes you really want a specific bit-width and don't care what the platform default is, and others you are tied to the platform default for one reason or another.
3) Always returning rank-0 arrays.
This may be a bit controversial as it is a bit of a change. But, my experience is that quite a bit of extra code is written to check whether or not a calculation returns a Python-scalar (because these don't have the same methods as arrays). In particular len(a) does not work if a is a scalar, but len(b) works if b is a rank-0 array (numeric scalar). Rank-0 arrays are scalars. When Python needs a scalar it will generally ask the object if it can turn itself into an int or a float. A notable exception is indexing in a list (where Python needs an integer and won't ask the object to convert if it can). But int(b) always returns a Python integer if the array has only 1 element. I'd like to know what reasons people can think of for ever returning Python scalars unless explicitly asked for.
I'm not sure this is an important issue for us (either way) so long as the overhead for rank-0 arrays is not much higher than for scalars (for numarray it was an issue). But there are those that argue (Konrad as an example if I remember correctly) that the definitions of rank and such mean len(rank-0) should not be 1 and that one should not be able to index rank-0 arrays. I know that the argument has been made that this helps support generic programming (not having to check between scalars and arrays), but every time I ask for specific examples I've found that there are simple alternatives to solve this problem or that type checks are still necessary because there is no control over what users may supply as arguments. If this is the reason, could it be motivated with a couple examples to show why it is the only reasonable alternative? (Then you can use it to slay all subsequent whiners). Perry
Travis Oliphant <oliphant@ee.byu.edu> writes:
I'm glad to get the feedback.
1) Types
I like Francesc's suggestion that .typecode return a code and .type return a Python class. What is the attitude and opinion regarding the use of attributes or methods for this kind of thing? It always seems to me so arbitrary as to what is an attribute or what is a method.
If it's an intrinsic attribute (heh) of the object, I usually try to make it an attribute. So I'd make these attributes.
There will definitely be support for the numarray-style type specification. Something like that will be how they print (I like the 'i4', 'f4' specification a bit better though). There will also be support for specification in terms of a c-type. The typecodes will still be there, underneath.
+1. I think labelling types with their sizes at some level is necessary for cross-platform compatibility (more below).
One thing has always bothered me though. Why is a double complex type Complex64 and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?
Or rename to ComplexFloat32 and ComplexFloat64?
I'm also glad that some recognize the problems with always requiring specification of types in terms of bit-width or byte-widths as these are not the same across platforms. For some types (like Int8 or Int16) this is not a problem. But what about long double? On an intel machine long double is Float96 while on a PowerPC it is Float128. Wouldn't it just be easier to specify LDouble or 'g' then special-case your code?
One problem to consider (and where I first ran into these type of things) is when pickling. A pickle containing an array of Int isn't portable, if the two machines have a different idea of what an Int is (Int32 or Int64, for instance). Another reason to keep the byte-width. LDouble, for instance, should probably be an alias to Float96 on Intel, and Float128 on PPC, and pickle accordingly.
Problems also exist when you are interfacing with hardware or other C or Fortran code. You know you want single-precision floating point. You don't know or care what the bit-width is. I think with the Integer types the bit-width specification is more important than floating point types. In sum, I think it is important to have the ability to specify it both ways. When printing the array, it's probably better if it gives bit-width information. I like the way numarray prints arrays.
Do you mean adding bit-width info to str()? repr() definitely needs it, and it should be included in all cases, I think. You also run into that sizeof(Python integer) isn't necessarily sizeof(C int) (a Python int being a C long), especially on 64-bit systems. I come from a C background, so things like Float64, etc., look wrong. I think more in terms of single- and double-precision, so I think adding some more descriptive types:

    CInt      (would be either Int32 or Int64, depending on the platform)
    CFloat    (can't do Float, for backwards-compatibility reasons)
    CDouble   (could just be Double)
    CLong     (or Long)
    CLongLong (or LongLong)

That could make it easier to match types in Python code to types in C extensions. Oh, and the Python types int and float should be allowed (especially if you want this to go in the core!). And a Fortran integer could be something else, but I think that's more of a SciPy problem than Numeric or numarray. It could add FInteger and FBoolean, for instance.
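[One way such aliases could be computed at import time; a hypothetical sketch using only the standard struct module, with names mirroring the proposal above:

    import struct

    def _bits(fmt):
        # Number of bits the platform uses for the given C type code.
        return 8 * struct.calcsize(fmt)

    CInt      = 'Int%d' % _bits('i')    # e.g. Int32 on most platforms
    CLong     = 'Int%d' % _bits('l')    # Int32 or Int64, platform-dependent
    CLongLong = 'Int%d' % _bits('q')    # typically Int64
    CFloat    = 'Float%d' % _bits('f')  # Float32
    CDouble   = 'Float%d' % _bits('d')  # Float64]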
2) Multidimensional array indexing.
Sometimes it is useful to select out of an array some elements based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
What I'm proposing would have X[K] essentially equivalent to X.flat[K]. The problem with always requiring the use of X.flat[K] is that X.flat does not work for discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an "indexable iterator" for X, which may be a more Pythonic thing to do anyway. That could solve the problem and solve the discontiguous X.flat problem as well.
If we can make X.flat[K] work for discontiguous arrays, then I would be very happy to not special-case the single index array but always treat it as a 1-tuple of integer index arrays.
Right now, I find X.flat to be pretty useless, as you need a contiguous array. I'm +1 on making X.flat work in all cases (contiguous and discontiguous). Either

a) X.flat returns a contiguous 1-dimensional array (like ravel(X)), which may be a copy of X

or

b) X.flat returns a "flat-indexable" view of X

I'd argue for b), as I feel that attributes should operate as views, not as potential copies. To me, attributes "feel like" they do no work, so making a copy by mere dereferencing would be surprising. If a), I'd rather flat() be a method (or have a ravel() method). I think overloading X[K] starts to run into trouble: too many special cases. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca
David M. Cooke wrote:
Right now, I find X.flat to be pretty useless, as you need a contiguous array. I'm +1 on making X.flat work in all cases (contiguous and discontiguous). Either
a) X.flat returns a contiguous 1-dimensional array (like ravel(X)), which may be a copy of X
or
b) X.flat returns a "flat-indexable" view of X
I'd argue for b), as I feel that attributes should operate as views, not as potential copies. To me, attributes "feel like" they do no work, so making a copy by mere dereferencing would be surprising.
+1 on b). I have quite a bit of code doing things like:

    try:
        b = a.flat
    except:
        b = ravel(a)

which feels silly. It would be nice for .flat to be a guaranteed, indexable, no-copy view of the original (even if it's non-contiguous). Granted, the indexing will be costlier for a non-contiguous object, but for the external users this provides a clean API. Regards, f
Fernando Perez wrote:
I have quite a bit of code doing things like
    try:
        b = a.flat
    except:
        b = ravel(a)
which feels silly. It would be nice for .flat to be a guaranteed, indexable, no-copy view of the original (even if it's non-contiguous). Granted, the indexing will be costlier for a non-contiguous object, but for the external users this provides a clean API.
Why not just do b = ravel(a) ? That should work in both cases just fine. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
Robert Kern wrote:
Why not just do
b = ravel(a)
?
That should work in both cases just fine.
Because I thought that ravel would make a copy regardless. But I may be wrong, much of this is very old code, from when I was just picking up Numeric. At the time, I felt it might be best to avoid the unnecessary copies if possible. Best, f
Fernando Perez wrote:
Robert Kern wrote:
Why not just do
b = ravel(a)
?
That should work in both cases just fine.
Because I thought that ravel would make a copy regardless. But I may be wrong, much of this is very old code, from when I was just picking up Numeric. At the time, I felt it might be best to avoid the unnecessary copies if possible.
Nope.

    In [1]: a = arange(8)
    In [2]: a.shape = (2,4)
    In [3]: a
    Out[3]: NumPy array, format: long
    [[0 1 2 3]
     [4 5 6 7]]
    In [4]: ravel(a)[3] = 10
    In [5]: a
    Out[5]: NumPy array, format: long
    [[ 0  1  2 10]
     [ 4  5  6  7]]
Robert Kern wrote:
Because I thouhgt that ravel would make a copy regardless. But I may be wrong, much of this is very old code, when I was just picking up Numeric. At the time, I felt it might be best to avoid the unnecessary copies if possible.
Nope.
[...] OK, thanks for the clarification. I guess if I had a nice, easy to use interactive shell I could have figured it out on my own. You'll have to sell me that shiny one you seem to be using :) Best, f
Robert Kern wrote:
Fernando Perez wrote:
Robert Kern wrote:
Why not just do
b = ravel(a)
?
That should work in both cases just fine.
Because I thought that ravel would make a copy regardless. But I may be wrong, much of this is very old code, from when I was just picking up Numeric. At the time, I felt it might be best to avoid the unnecessary copies if possible.
Nope.
In [1]: a = arange(8)
In [2]: a.shape = (2,4)
In [3]: a
Out[3]: NumPy array, format: long
[[0 1 2 3]
 [4 5 6 7]]
In [4]: ravel(a)[3] = 10
In [5]: a
Out[5]: NumPy array, format: long
[[ 0  1  2 10]
 [ 4  5  6  7]]
Ick, that's horrible. Functions that sometimes copy and sometimes don't are generally bad news IMO. This is just a way to introduce nasty, invisible bugs. The exceptions are things like asarray that are explicit about their variable behaviour. I'd be much happier if flat never made copies, but always worked by some sort of deep juju, while ravel always made copies. -tim
Ick, that's horrible. Functions that sometimes copy and sometimes don't are generally bad news IMO. This is just a way to introduce nasty, invisible bugs. The exceptions are things like asarray that are explicit about their variable behaviour.
I'd be much happier if flat never made copies, but always worked by some sort of deep juju, while ravel always made copies.
I tend to agree, though there is precedent for "return a copy only if you have to", at least on the C level. What I suggest is that attributes never return copies while methods return copies when necessary. In that vein, I am proposing making X.flat an array iterator and allowing array iterators to be indexed and set as if they were 1-d arrays, with the underlying array being changed. This is actually an easy change with the current code base. Will it break any code? There may be some X.flats that need to be changed to ravel. But it seems like a really good idea. -Travis
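[A rough sketch of what such a flat-indexable object might look like at the Python level; hypothetical, since the real thing would live at the C level and handle slices, strides, and error checking:

    class FlatView:
        # Wraps an array and exposes 1-d, C-order indexing over it,
        # without copying, by translating flat indices to index tuples.
        def __init__(self, arr):
            self.arr = arr
        def _unravel(self, k):
            # Convert a flat (C-order) index into a multidimensional one.
            idx = []
            for dim in self.arr.shape[::-1]:
                idx.append(k % dim)
                k = k // dim
            return tuple(idx[::-1])
        def __getitem__(self, k):
            return self.arr[self._unravel(k)]
        def __setitem__(self, k, value):
            self.arr[self._unravel(k)] = value]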
In that vein, I am proposing making X.flat an array iterator and allowing array iterators to be indexed and set as if they were 1-d arrays, with the underlying array being changed. This is actually an easy change with the current code base. Will it break any code? There may be some X.flats that need to be changed to ravel. But it seems like a really good idea.
That iterator would only work on the Python level, I suppose, where it would be effectively indistinguishable from a 1D array. But how would that work when passed to a function implemented in C? The function needs to know somehow that non-contiguous arrays need to be treated as 1D. That means having to code for such a special case (not a good idea), or you are back to making a copy.
Fernando Perez wrote:
It would be nice for .flat to be a guaranteed, indexable, no-copy view of the original (even if it's non-contiguous).
Dare I point out that a.flat works for non-contiguous arrays in numarray?

    In [4]: a.iscontiguous()
    Out[4]: 0
    In [5]: a.flat
    Out[5]: array([ 0,  5, 10, 15, 20,  1,  6, 11, 16, 21,  2,  7, 12, 17, 22,
            3,  8, 13, 18, 23,  4,  9, 14, 19, 24])
On Thu, 17 Feb 2005, David M. Cooke wrote:
I come from a C background, so things like Float64, etc., look wrong. I think more in terms of single- and double-precision, so I think adding some more descriptive types:
    CInt      (would be either Int32 or Int64, depending on the platform)
    CFloat    (can't do Float, for backwards-compatibility reasons)
    CDouble   (could just be Double)
    CLong     (or Long)
    CLongLong (or LongLong)
That could make it easier to match types in Python code to types in C extensions.
Good choice of names.
Oh, and the Python types int and float should be allowed (especially if you want this to go in the core!).
Say, I like that idea. And maybe, like float and int, the numeric types could be callable to construct numeric arrays of that type, e.g., a = numeric3.Int16([1,2,3]) - Rick
Oh, and the Python types int and float should be allowed (especially if you want this to go in the core!).
Say, I like that idea. And maybe, like float and int, the numeric types could be callable to construct numeric arrays of that type, e.g.,
a = numeric3.Int16([1,2,3])
That is a good idea. It seems easy to implement; they would just be aliases.
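[A minimal sketch of such constructor aliases, assuming Numeric's array() and its typecodes; treating 's' as Numeric's Int16 code is an assumption here:

    import Numeric

    def _constructor(typecode):
        def construct(seq):
            return Numeric.array(seq, typecode)
        return construct

    Int16   = _constructor('s')
    Float64 = _constructor('d')

    a = Int16([1, 2, 3])    # same as Numeric.array([1,2,3], 's')]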
David M. Cooke wrote:
Travis Oliphant <oliphant@ee.byu.edu> writes:
I'm glad to get the feedback.
1) Types
I like Francesc's suggestion that .typecode return a code and .type return a Python class. What is the attitude and opinion regarding the use of attributes or methods for this kind of thing? It always seems to me so arbitrary as to what is an attribute or what is a method.
If it's an intrinsic attribute (heh) of the object, I usually try to make it an attribute. So I'd make these attributes.
There will definitely be support for the numarray-style type specification. Something like that will be how they print (I like the 'i4', 'f4' specification a bit better though). There will also be support for specification in terms of a c-type. The typecodes will still be there, underneath.
+1. I think labelling types with their sizes at some level is necessary for cross-platform compatibility (more below).
One thing has always bothered me though. Why is a double complex type Complex64 and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?
Or rename to ComplexFloat32 and ComplexFloat64?
I'm also glad that some recognize the problems with always requiring specification of types in terms of bit-width or byte-widths as these are not the same across platforms. For some types (like Int8 or Int16) this is not a problem. But what about long double? On an intel machine long double is Float96 while on a PowerPC it is Float128. Wouldn't it just be easier to specify LDouble or 'g' then special-case your code?
One problem to consider (and where I first ran into these type of things) is when pickling. A pickle containing an array of Int isn't portable, if the two machines have a different idea of what an Int is (Int32 or Int64, for instance). Another reason to keep the byte-width.
LDouble, for instance, should probably be an alias to Float96 on Intel, and Float128 on PPC, and pickle accordingly.
Problems also exist when you are interfacing with hardware or other C or Fortran code. You know you want single-precision floating point. You don't know or care what the bit-width is. I think with the Integer types the bit-width specification is more important than floating point types. In sum, I think it is important to have the ability to specify it both ways. When printing the array, it's probably better if it gives bit-width information. I like the way numarray prints arrays.
Do you mean adding bit-width info to str()? repr() definitely needs it, and it should be included in all cases, I think.
You also run into that sizeof(Python integer) isn't necessarily sizeof(C int) (a Python int being a C long), especially on 64-bit systems.
I come from a C background, so things like Float64, etc., look wrong. I think more in terms of single- and double-precision, so I think adding some more descriptive types:
    CInt      (would be either Int32 or Int64, depending on the platform)
    CFloat    (can't do Float, for backwards-compatibility reasons)
    CDouble   (could just be Double)
    CLong     (or Long)
    CLongLong (or LongLong)
That could make it easier to match types in Python code to types in C extensions.
I guess the issue revolves around the characteristics of the target users, if most are C aficionados then the above has merit. However, this doesn't provide for the Int8's or the Int16's. Neither does it provide for a bit array, which would be suitable for Booleans. My guess is that most users would not be from a C background and so something along the lines of numerictypes makes sense.
Oh, and the Python types int and float should be allowed (especially if you want this to go in the core!).
And a Fortran integer could be something else, but I think that's more of a SciPy problem than Numeric or numarray. It could add FInteger and FBoolean, for instance.
2) Multidimensional array indexing.
Sometimes it is useful to select out of an array some elements based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
What I'm proposing would have X[K] essentially equivalent to X.flat[K]. The problem with always requiring the use of X.flat[K] is that X.flat does not work for discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an "indexable iterator" for X, which may be a more Pythonic thing to do anyway. That could solve the problem and solve the discontiguous X.flat problem as well.
If we can make X.flat[K] work for discontiguous arrays, then I would be very happy to not special-case the single index array but always treat it as a 1-tuple of integer index arrays.
Right now, I find X.flat to be pretty useless, as you need a contiguous array. I'm +1 on making X.flat work in all cases (contiguous and discontiguous). Either
a) X.flat returns a contiguous 1-dimensional array (like ravel(X)), which may be a copy of X
or
b) X.flat returns a "flat-indexable" view of X
I'd argue for b), as I feel that attributes should operate as views, not as potential copies. To me, attributes "feel like" they do no work, so making a copy by mere dereferencing would be surprising.
If a), I'd rather flat() be a method (or have a ravel() method).
I think overloading X[K] starts to run into trouble: too many special cases.
As someone else said, the draft PEP needs to have a much clearer statement of what datatype K is and just what X[K] would mean. Colin W.
On Thu, Feb 17, 2005 at 08:00:38PM -0500, Colin J. Williams wrote:
David M. Cooke wrote:
I come from a C background, so things like Float64, etc., look wrong. I think more in terms of single- and double-precision, so I think adding some more descriptive types:
    CInt      (would be either Int32 or Int64, depending on the platform)
    CFloat    (can't do Float, for backwards-compatibility reasons)
    CDouble   (could just be Double)
    CLong     (or Long)
    CLongLong (or LongLong)
That could make it easier to match types in Python code to types in C extensions.
I guess the issue revolves around the characteristics of the target users, if most are C aficionados then the above has merit. However, this doesn't provide for the Int8's or the Int16's. Neither does it provide for a bit array, which would be suitable for Booleans.
My guess is that most users would not be from a C background and so something along the lines of numerictypes makes sense.
I'm thinking that CInt, etc., would be aliases for Int32 or Int64 (or whatever makes sense on the platform), at least at the Python level. The idea is if you're writing wrapper code for external routines, you want to use the types used in the routine, which most likely will vary platform-by-platform. In that case you *don't* want to hardcode Int32, etc., because that's not the right type for all platforms. I've run into enough of these bugs, since I'm using a 64-bit Athlon64 Linux system now as my main system (and Numeric still has some problems internally in this respect). It's partly documentation: you're asserting that an array is an array of C ints, whatever a C int is. Passing that to a routine that takes C ints shouldn't require conversion, or fail because of mismatched min-max int ranges. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca
On Feb 17, 2005, at 7:52 PM, Travis Oliphant wrote:
I'm glad to get the feedback.
1) Types
I like Francesc's suggestion that .typecode return a code and .type return a Python class. What is the attitude and opinion regarding the use of attributes or methods for this kind of thing? It always seems to me so arbitrary as to what is an attribute or what is a method.
I don't think it really matters. Attributes seem natural; shape is an attribute, for instance, so why not type? In the end, I don't care.
There will definitely be support for the numarray-style type specification. Something like that will be how they print (I like the 'i4', 'f4' specification a bit better, though). There will also be support for specification in terms of a C type. The typecodes will still be there, underneath.
Sounds fine to me.
One thing has always bothered me, though. Why is a double complex type Complex64, and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?
I actually find the current approach natural: you specify the width of the real and the imaginary components, each of which is some kind of floating-point type. Again, in the end I would not care.
I'm also glad that some recognize the problems with always requiring specification of types in terms of bit-widths or byte-widths, as these are not the same across platforms. For some types (like Int8 or Int16) this is not a problem. But what about long double? On an Intel machine long double is Float96, while on a PowerPC it is Float128. Wouldn't it just be easier to specify LDouble or 'g' than to special-case your code?
long double is a bit of a special case. I guess I would probably not use it anyway. The point is indeed that having things like LDouble is 'a good thing'.
Problems also exist when you are interfacing with hardware or other C or Fortran code. You know you want single-precision floating point. You don't know or care what the bit-width is. I think with the Integer types the bit-width specification is more important than floating point types. In sum, I think it is important to have the ability to specify it both ways.
I completely agree with this. I probably don't care for floating point; it is good enough to distinguish between single and double precision. Integer types are a different story: you want to be a bit more precise then. Having both solves the problem quite well.
When printing the array, it's probably better if it gives bit-width information. I like the way numarray prints arrays.
Agreed.
2) Multidimensional array indexing.
Sometimes it is useful to select some elements out of an array based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
What I'm proposing would have X[K] essentially equivalent to X.flat[K]. The problem with always requiring the use of X.flat[K] is that X.flat does not work for discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an "indexable iterator" for X, which may be a more Pythonic thing to do anyway. That could solve this problem as well as the discontiguous X.flat problem.
But possibly slow, and that we want to avoid.
If we can make X.flat[K] work for discontiguous arrays, then I would be very happy to not special-case the single index array but always treat it as a 1-tuple of integer index arrays.
Speed will be an issue.
Capping indexes was proposed because of what numarray does. I can only think that the benefit would be that you don't have to check for and raise an error in the middle of an indexing loop, or pre-scan the indexes. But, I suppose this is unavoidable, anyway. Currently Numeric allows specifying indexes that are too high in slices. It just chops them. Python allows this too, for slices. So, I guess I'm just specifying Python behavior. Of course, indexing with an integer that is too large or too small will raise errors:
In Python:
    a = [1, 2, 3, 4, 5]
    a[:20]    # works
    a[20]     # raises an IndexError
Probably better to stick to Python behavior.
Peter Verveer wrote:
On Feb 17, 2005, at 7:52 PM, Travis Oliphant wrote:
I'm glad to get the feedback. [snip]
2) Multidimensional array indexing.
Sometimes it is useful to select some elements out of an array based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
What I'm proposing would have X[K] essentially equivalent to X.flat[K]. The problem with always requiring the use of X.flat[K] is that X.flat does not work for discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an "indexable iterator" for X, which may be a more Pythonic thing to do anyway. That could solve this problem as well as the discontiguous X.flat problem.
But possibly slow, and that we want to avoid.
Currently, numarray returns an array with a reduced rank:
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a[1]
array([3, 4, 5])
>>> a[:,1]
array([1, 4, 7])
Is this to be abandoned? Colin W.
On Feb 17, 2005, at 19:52, Travis Oliphant wrote:
I like Francesc's suggestion that .typecode return a code and .type return a Python class. What is the attitude and opinion regarding the use of attributes or methods for this kind of thing? It always seems to me so arbitrary as to what is an attribute or what is a method.
My view of pythonicity is that retrieving a value should be written as attribute access. Methods are more appropriate if there are arguments (no choice then anyway) or side effects. So I'd have .type as an attribute.

BTW, as the goal is inclusion into the Python core, why not

1) Use Python type objects for array creation and as the values of the .type attribute.

2) Implement scalar types for those array element types that currently have no Python scalar equivalent (e.g. UInt16).

3) Implement the same set of attributes and methods for scalar types and arrays.

Then the distinction between scalars and rank-0 arrays would become a minor implementation detail rather than a topic of heated debate. In different words, I propose that the PEP should include a unification of scalars and arrays such that for all practical purposes scalars *are* rank-0 arrays.
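To make 1)-3) a bit more concrete, a tiny sketch of what array creation with type objects might look like; zeros() and all the class names here are hypothetical:

    # Element types as real Python classes, arranged in a hierarchy
    # that works with the usual isinstance/issubclass type testing.
    class AnyType(object): pass
    class Integer(AnyType): pass
    class Float(AnyType): pass

    class Int16(Integer): pass
    class UInt16(Integer): pass
    class Float64(Float): pass

    def zeros(shape, type=Float64):
        pass    # stand-in for the real array constructor

    # a = zeros((3, 3), type=Int16)
    # a.type is Int16               -> .type returns a Python class
    # issubclass(a.type, Integer)   -> True, no typecode strings needed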
One thing has always bothered me, though. Why is a double complex type Complex64, and a float complex type Complex32? This seems to break the idea that the number at the end specifies a bit width. Why don't we just call them Complex64 and Complex128? Can we change this?
+1
PowerPC it is Float128. Wouldn't it just be easier to specify LDouble or 'g' than to special-case your code?
Definitely.
Sometimes it is useful to select some elements out of an array based on their linear (flattened) index in the array. MATLAB, for example, will allow you to take a three-dimensional array and index it with a single integer based on its Fortran order: x(1,1,1), x(2,1,1), ...
Could you give an example where this would be useful? To me this looks like a feature that MATLAB inherited from Fortran, which had it for efficiency reasons at a time when compilers were not so good at optimizing index expressions. I don't like the "special case" status of such a construct either. It could lead to unpleasant bugs that would be hard to find by those who are not aware of the special case. I'd say that special cases need special justifications - and I don't see one here.
discontiguous arrays. It could be made to work if X.flat returned some kind of specially-marked array, which would then have to be checked every time indexing occurred for any array. Or, there may be some way to have X.flat return an
I much prefer that approach, assuming there is a real use for this feature.
Capping indexes was proposed because of what numarray does. I can only think that the benefit would be that you don't have to check for and raise an error in the middle of an indexing loop, or pre-scan the indexes. But, I suppose this is unavoidable, anyway. Currently Numeric allows specifying indexes that are too high in slices. It just chops them. Python allows this too, for slices. So, I guess I'm just specifying Python behavior. Of course, indexing with an integer that is too large or too small will raise errors:
I am all for imitating Python list behaviour in arrays, but we should also watch out for pitfalls. Array index expressions are in general much more complex than list index expressions, so the risk of introducing bugs is also much higher, which might well justify a somewhat incompatible approach.
This may be a bit controversial as it is a bit of a change. But, my experience is that quite a bit of extra code is written to check whether or not a calculation returns a Python-scalar (because these don't have the same methods as arrays). In
The only method I can think of is typecode(). But if more array functionality is migrated to methods, this might become more serious.
When Python needs a scalar it will generally ask the object if it can turn itself into an int or a float. A notable exception is indexing in a list (where Python needs an integer and won't ask the object to convert if it can). But int(b) always returns a Python integer if the array has only 1 element.
Still, this is a major point in practice. There was a Numeric release at some point in history that always returned rank-0 array objects (I don't remember if by design or by mistake), and it broke lots of my code because I was using array elements as indices. Another compatibility issue is routines in C modules that insist on scalar arguments. As I outlined above, I'd prefer a solution in which the distinction disappears from the Python programmer's point of view, even if scalars and rank-0 arrays remain distinct in the implementation (which is reasonable for performance reasons).
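The breakage is easy to reproduce in miniature. Assuming a Numeric-style array() constructor and the list-indexing behaviour of the era (a rank-0 array had no way to pass itself off as an index):

    from Numeric import array    # Numeric-era spelling, for illustration

    lst = ['x', 'y', 'z', 'w']
    i = array(2)                  # a rank-0 integer array
    try:
        lst[i]                    # list indexing demands a real int and
                                  # will not ask i to convert itself
    except TypeError:
        print(lst[int(i)])        # the explicit workaround -> 'z'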
I'd like to know what reasons people can think of for ever returning Python scalars unless explicitly asked for.
Other than the pragmatic ones, consistency: arrays are container structures that store elements of particular types. You should get out what you put in.

Konrad.
--
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25 / Fax: +33-1 69 08 82 61
E-Mail: hinsen@llb.saclay.cea.fr
konrad.hinsen@laposte.net wrote:
My view of pythonicity is that retrieving a value should be written as attribute access. Methods are more appropriate if there are arguments (no choice then anyway) or side effects. So I'd have .type as an attribute.
That's my view as well.
BTW, as the goal is inclusion into the Python core, why not
1) Use Python type objects for array creation and as the values of the .type attribute.
Do you mean to register each of the 21 different types of arrays as a new type object? Hmm. That is an interesting idea. I'm a little worried about the implications for having the arrays behave together, though it would make it more straightforward to define specific mixed operations. This does deserve a little more thought.
2) Implement scalar types for those array element types that currently have no Python scalar equivalent (e.g. UInt16).
Do you think this would fly with the Python folks? Counting the suggestion above, we would be encouraging the addition of 39 new types to the Python core. My current count shows the current number of types as 35, so we would basically double that. This doesn't have to matter, but I'd have to hear how Guido feels about something like that.
3) Implement the same set of attributes and methods for scalar types and arrays.
That would be ideal. But, I'm not sure what kind of chance we have with that. -Travis
On 18.02.2005, at 18:59, Travis Oliphant wrote:
Do you mean to register each of the 21 different types of arrays as a new type object? Hmm. That is an interesting idea. I'm a little worried
Yes. It only costs a bit of memory for the type objects, so why not?
2) Implement scalar types for those array element types that currently have no Python scalar equivalent (e.g. UInt16).
Do you think this would fly with the Python folks? Counting the suggestion above, we would be encouraging the addition of 39 new types to the Python core. My current count shows the current number of types as 35, so
Those new types would (if all goes well) be part of the standard library, but not built-in types. Compared to the number of types and classes in the standard library, the addition is not so big. There wouldn't be literals either. Anyone who doesn't use the array module could thus safely ignore the existence of those types. Implementation-wise, the new types could well be rank-0 arrays internally and thus add nearly no overhead.
we would basically double that. This doesn't have to matter, but I'd have to hear how Guido feels about something like that.
Of course!
3) Implement the same set of attributes and methods for scalar types and arrays.
That would be ideal. But, I'm not sure what kind of chance we have with that.
Me neither. It might depend on clever presentation of the project. Perhaps some bribing could help ;-)

Konrad.
Do you mean to register each of the 21 different types of arrays as a new type object? Hmm. That is an interesting idea. I'm a little worried
Yes. It only costs a bit of memory for the type objects, so why not?
Let me see if I understand you here. The type of the array is basically handled by the PyArray_Descr* structure; are you suggesting turning that into a full-fledged Python object with a new type? I had thought of this before, since it's basically calling for us to do something like that. Or do you have something else in mind? -Travis
On 18.02.2005, at 23:06, Travis Oliphant wrote:
Let me see if I understand you here. The type of the array is basically handled by the PyArray_Descr* structure; are you suggesting turning that into a full-fledged Python object with a new type? I had thought of this before, since it's basically calling for us to do something like that. Or do you have something else in mind?
I was thinking of the scalar types rather than of the arrays. What I am proposing is to have Int16, Int32, UInt16, etc. as Python scalar types (though not built-in types) and to use the type objects to create arrays and to identify the type of array elements. Those types would be part of a type hierarchy for type testing. One could then either have a single array type that stores the type object for the elements internally, or have different Python types for every kind of array (i.e. Int16Array, Int32Array). I don't see any important difference there, so I have no clear opinion on this.

Konrad.
konrad.hinsen@laposte.net wrote:
On 18.02.2005, at 18:59, Travis Oliphant wrote:
Do you think this would fly with the Python folks? Counting the suggestion above, we would be encouraging the addition of 39 new types to the Python core. My current count shows the current number of types as 35, so
Those new types would (if all goes well) be part of the standard library, but not built-in types. Compared to the number of types and classes in the standard library, the addition is not so big. There wouldn't be literals either. Anyone who doesn't use the array module could thus safely ignore the existence of those types.
I don't see the problem that this approach would solve. It doesn't solve the list/tuple indexing problem by itself. Even if the types are part of the standard library, they won't be bona-fide ints, so the indexing code would still have to be modified to check for them.

I *do* like the idea of the typecode objects, however they are implemented, being able to act as constructors.

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter
On 19.02.2005, at 05:37, Robert Kern wrote:
I don't see the problem that this approach would solve. It doesn't solve the list/tuple indexing problem by itself. Even if the types are part of the standard library, they won't be bona-fide ints, so the indexing code would still have to be modified to check for them.
Yes, but it could check for "integer type" (using the type hierarchy) rather than convert everything to an integer with the problem that Guido pointed out. However, my original motivation was consistency of usage. Python has type objects to specify types, so I'd rather use them than introduce another way to specify the type of array elements.
I *do* like the idea of the typecode objects, however they are implemented, to be able to act as constructors.
That is an interesting idea as well, though a slightly different one.

Konrad.
konrad.hinsen@laposte.net wrote:
On 19.02.2005, at 05:37, Robert Kern wrote:
I don't see the problem that this approach would solve. It doesn't solve the list/tuple indexing problem by itself. Even if the types are part of the standard library, they won't be bona-fide ints, so the indexing code would still have to be modified to check for them.
Yes, but it could check for "integer type" (using the type hierarchy) rather than convert everything to an integer with the problem that Guido pointed out.
Except that these types probably can't be derived from the builtin int. The C layouts would have to be compatible. They'd probably have to be a separate hierarchy. At that, rank-0 arrays would have to become a special case because their value will have to be reflected by x->ob_ival. And how that happens is going to depend on their actual C type. We'll be inheriting methods that we can't use, and general arrays, even if the C types are compatible, can't be used in place of a bona fide PyIntObject. I would prefer a single type of array object that can store different kinds of values.
However, my original motivation was consistency of usage. Python has type objects to specify types, so I'd rather use them than introduce another way to specify the type of array elements.
True. However, if we introduce a bona fide TypeObject hierarchy for numerical scalars that *can* be profitably used outside of the array context, it's *going* to be used outside of the array context. If it gets into the standard library, it can't just be a large number hierarchy for our use; it will have to be *the* number hierarchy for Python and include PyLongObjects and decimals and rationals. And that's probably a bit more than we care to deal with to get multiarrays into the core.

On the other hand, the list/tuple indexing issue won't go away until the PEP is accepted and integrated into the core. And it won't be accepted until Numeric3 has had some life of its own outside the standard library. Bugger.

-- Robert Kern
Yes, but it could check for "integer type" (using the type hierarchy) rather than convert everything to an integer with the problem that Guido pointed out.
Except that these types probably can't be derived from the builtin int. The C layouts would have to be compatible. They'd probably have to be a separate hierarchy.
On the python-dev list someone (Bob Ippolito) suggested inheriting rank-0 arrays from the Python Int Type. I don't see how this can even be done for all of the integer types (unless the Python Int Type is changed to hold the largest possible integer (long long)).
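A quick way to see the difficulty, using Python 2 semantics of the era and an invented class name:

    # An int subclass stores its value in PyIntObject's C slot
    # (a C long), so values outside that range simply cannot exist.
    class UInt64Scalar(int):
        pass

    UInt64Scalar(2**64 - 1)   # raises OverflowError on Python 2:
                              # the value does not fit in a C long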
At that, rank-0 arrays would have to become a special case because their value will have to be reflected by x->ob_ival. And how that happens is going to depend on their actual C type. We'll be inheriting methods that we can't use, and general arrays, even if the C types are compatible, can't be used in place of a bona fide PyIntObject.
I would prefer a single type of array object that can store different kinds of values.
I see the same problems that Robert is talking about. Do we really want to special-case all array code to handle rank-0 arrays differently? That seems to be opening up a very big can of worms. Is the only way to solve this problem to handle rank-0 arrays in a separate hierarchy? I have doubts that such a system would even work.
On the other hand, the list/tuple indexing issue won't go away until the PEP is accepted and integrated into the core. And it won't be accepted until Numeric3 has had some life of it's own outside the standard library.
I agree with Robert's assessment --- bugger. I'm really annoyed that something as relatively simple as rank-0 arrays versus Python's already-defined scalars could be such a potential show-stopper. Hoping-it-won't-be -Travis
On 19.02.2005, at 15:28, Robert Kern wrote:
Except that these types probably can't be derived from the builtin int. The C layouts would have to be compatible. They'd probably have to be a separate hierarchy.
They could all derive from a common (yet-to-be-written) base class that has no data layout at all.
True. However, if we introduce a bona fide TypeObject hierarchy for numerical scalars that *can* be profitably used outside of the array context, it's *going* to be used outside of the array context. If it gets into the
True, though I expect its use to be limited to the numeric community.
standard library, it can't just be a large number hierarchy for our use; it will have to be *the* number hierarchy for Python and include PyLongObjects and decimals and rationals.
That would be preferable indeed.
And that's probably a bit more than we care to deal with to get multiarrays into the core.
It all depends on the reaction of the Python developer community. We won't know before asking.

Konrad.
konrad.hinsen@laposte.net wrote:
On 19.02.2005, at 15:28, Robert Kern wrote:
Except that these types probably can't be derived from the builtin int. The C layouts would have to be compatible. They'd probably have to be a separate hierarchy.
They could all derive from a common (yet-to-be-written) base class that has no data layout at all.
We then end up with the same chicken-egg problem as accepting rank-0 integer arrays as indices. It won't work until it's in the core. If I'm understanding your proposition correctly, it also creates another problem: rank-n arrays would then pass this check, although they shouldn't.
True. However, if we introduce a bona fide TypeObject hierarchy for numerical scalars that *can* be profitably used outside of the array context, it's *going* to be used outside of the array context. If it gets into the
True, though I expect its use to be limited to the numeric community.
I expect so, too. However, when considering additions to the standard library, python-dev has to assume otherwise. If it's going to be so limited in application, then something so large shouldn't be in the standard library.
standard library, it can't just be a large number hierarchy for our use; it will have to be *the* number hierarchy for Python and include PyLongObjects and decimals and rationals.
That would be preferable indeed.
And that's probably a bit more than we care to deal with to get multiarrays into the core.
It all depends on the reaction of the Python developer community. We won't know before asking.
I think it would be great to have a more thorough number hierarchy in the standard library. So would some others. See PEPs 228 and 242. However, I think that the issue is orthogonal to getting a multiarray object into the standard library. I'm not convinced that it actually solves the problems with getting multiarrays into the core. Now, we may have different priorities, so we have different thresholds of "problem-ness."

-- Robert Kern
Robert Kern wrote:
konrad.hinsen@laposte.net wrote:
It all depends on the reaction of the Python developer community. We won't know before asking.
I think it would be great to have a more thorough number hierarchy in the standard library. So would some others. See PEPs 228 and 242. However, I think that the issue is orthogonal to getting a multiarray object into the standard library. I'm not convinced that it actually solves the problems with getting multiarrays into the core. Now, we may have different priorities, so we have different thresholds of "problem-ness."
PEP 228 is under consideration (since 2000):

    Numerical Python Issues

    People who use Numerical Python do so for high-performance vector operations. Therefore, NumPy should keep its hardware based numeric model.

    *Unresolved Issues*

    Which number literals will be exact, and which inexact?

    How do we deal with IEEE 754 operations? (probably, isnan/isinf should be methods)

    On 64-bit machines, comparisons between ints and floats may be broken when the comparison involves conversion to float. Ditto for comparisons between longs and floats. This can be dealt with by avoiding the conversion to float. (Due to Andrew Koenig.)

For PEP 242 the status is:

    This PEP has been closed by the author. The kinds module will not be added to the standard library.

    There was no opposition to the proposal but only mild interest in using it, not enough to justify adding the module to the standard library. Instead, it will be made available as a separate distribution item at the Numerical Python site. At the next release of Numerical Python, it will no longer be a part of the Numeric distribution.

It seems to be up to the numerical folk to make proposals.

Colin W.
Colin J. Williams wrote:
Robert Kern wrote:
konrad.hinsen@laposte.net wrote:
It all depends on the reaction of the Python developer community. We won't know before asking.
I think it would be great to have a more thorough number hierarchy in the standard library. So would some others. See PEPs 228 and 242. However, I think that the issue is orthogonal to getting a multiarray object into the standard library. I'm not convinced that it actually solves the problems with getting multiarrays into the core. Now, we may have different priorities, so we have different thresholds of "problem-ness."
PEP 228 is under consideration (since 2000):
Numerical Python Issues
People who use Numerical Python do so for high-performance vector operations. Therefore, NumPy should keep its hardware based numeric model.
Note that the recommendation is that Numeric ignore the PEP's number model. That PEP points *away* from things like Int32 and Float64. [snip]
For PEP 242 the status is:
This PEP has been closed by the author. The kinds module will not be added to the standard library.
There was no opposition to the proposal but only mild interest in using it, not enough to justify adding the module to the standard library. Instead, it will be made available as a separate distribution item at the Numerical Python site. At the next release of Numerical Python, it will no longer be a part of the Numeric distribution.
It seems to be up to the numerical folk to make proposals.
Note also that PEP 242 was retracted before people got really interested (by which I mean "interested enough to implement") in other number types like decimals and rationals. While historically these proposals have come from the NumPy community (which I'm distinguishing from "numerical folk"), in the future they will need to intimately involve a much larger group of people. Of course, the NumPy community is a subset of "numerical folk," so we are naturally interested in how numbers are represented in Python. I'm not saying we shouldn't be, or that such a proposal shouldn't come from this community. In general, such a thing would be of great use to this community. However, I don't see how it would help, specifically, the addition of multiarray objects to the standard library, nor do I think that such an addition should wait upon the acceptance and implementation of such a proposal.

-- Robert Kern
On 20.02.2005, at 00:55, Robert Kern wrote:
They could all derive from a common (yet-to-be-written) base class that has no data layout at all.
We then end up with the same chicken-egg problem as accepting rank-0 integer arrays as indices. It won't work until it's in the core. If I'm
True. We could have a patch to Python for testing purposes before such a decision, but this is not an ideal situation.
understanding your proposition correctly, it also creates another problem: rank-n arrays would then pass this check, although they shouldn't.
No. My proposal is to have new Python scalar types, which are distinct from the array type(s). The new scalar types would use rank-0 arrays as their internal representation, but that would be an implementation detail not visible to users.
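In code, that separation might look something like the following; every name is invented, and a plain Python value stands in for the internal rank-0 array:

    class Int16(object):
        # Hypothetical scalar type whose internal representation would
        # be a rank-0 array -- an implementation detail users never see.
        def __init__(self, value):
            self._rank0 = value          # really a rank-0 array inside

        def __int__(self):
            return int(self._rank0)      # conversions keep working

        def __repr__(self):
            return 'Int16(%d)' % int(self._rank0)

    x = Int16(7)
    print(int(x) + 1)    # behaves like a scalar: prints 8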
I expect so, too. However, when considering additions to the standard library, python-dev has to assume otherwise. If it's going to be so limited in application, then something so large shouldn't be in the standard library.
I am not so sure about this. There is some very domain-specific stuff in the standard library.
I think it would be great to have a more thorough number hierarchy in the standard library. So would some others. See PEPs 228 and 242. However, I think that the issue is orthogonal getting an multiarray object into the standard library. I'm not convinced that it actually solves the problems with
The common point is just the community that is interested in this. However, there might be a wider interest in a re-working of the number hierarchy. There have been additions to the number hierarchy that don't come from the numerics community, such as the decimal number type. I think that a more open number hierarchy, into which modules could add their own types, could make it into the core if it doesn't cause any compatibility problems. In fact, this may be easier to argue for than having array methods on all scalar types.

Konrad.
konrad.hinsen@laposte.net wrote:
On 20.02.2005, at 00:55, Robert Kern wrote:
understanding your proposition correctly, it also creates another problem: rank-n arrays would then pass this check, although they shouldn't.
No. My proposal is to have new Python scalar types, which are distinct from the array type(s). The new scalar types would use rank-0 arrays as their internal representation, but that would be an implementation detail not visible to users.
*Ahh!* Enlightenment dawns. That makes a *hell* of a lot more sense than what I thought you were proposing. Thank you for bearing with my babbling. I withdraw my objections. I do have one comment, though. I still prefer the idea that arrays, including rank-0 arrays, be containers. So I would suggest that there only be a handful of new rank-0 types: int-like, float-like, complex-like, object, and maybe a couple more. The differences between the type objects would be solely for inheritance reasons. Different precisions would be handled like they are for rank-n arrays.
I expect so, too. However, when considering additions to the standard library, python-dev has to assume otherwise. If it's going to be so limited in application, then something so large shouldn't be in the standard library.
I am not so sure about this. There is some very domain-specific stuff in the standard library.
I think my point was that a number hierarchy is something that *ought* to have much wider applicability. If the implementation *doesn't*, then it shouldn't be in the standard library. That's why I prefer the arrangement I described above. It isn't a real number hierarchy and doesn't purport to be one. It's just an implementation detail[1] to make multiarrays play nicer with the rest of the Python universe.
I think it would be great to have a more thorough number hierarchy in the standard library. So would some others. See PEPs 228 and 242. However, I think that the issue is orthogonal getting an multiarray object into the standard library. I'm not convinced that it actually solves the problems with
The common point is just the community that is interested in this. However, there might be a wider interest in a re-working of the number hierarchy. There have been additions to the number hierarchy that don't come from the numerics community, such as the decimal number type. I think that a more open number hierarchy, into which modules could add their own types, could make it into the core if it doesn't cause any compatibility problems. In fact, this may be easier to argue for than having array methods on all scalar types.
Agreed. I think, though, that what Travis proposes in "PEP Updated" is going to be easiest of all to swallow.

[1] Isn't it amazing how many design problems go away by claiming that "it's just an implementation detail?" :-)

-- Robert Kern
When Python needs a scalar it will generally ask the object if it can turn itself into an int or a float. A notable exception is indexing in a list (where Python needs an integer and won't ask the object to convert if it can). But int(b) always returns a Python integer if the array has only 1 element.
Still, this is a major point in practice. There was a Numeric release at some point in history that always returned rank-0 array objects (I don't remember if by design or by mistake), and it broke lots of my code because I was using array elements as indices.
I posted a question to python-dev about changing the internals of Python to support asking objects to convert to ints in slicing. While open to the idea for Python 3000, Guido does not seem favorable to the idea for Python 2.X. The problem, Guido mentions, is that float-like objects can convert to ints by truncation, and he doesn't want to allow floats to be used as indexes. He feels it would break too much code. Using this line of reasoning, then, arrays should not be used as indexes unless they are explicitly converted to integers: int(a).

I have proposed a second solution that asks if a special check could be made for rank-0 array objects (of integer type) if and when they are allowed in the core. I think Konrad's valid point regarding consistency is that to the user it looks like he is making an array of integers:

    a = array([1,2,3,4])

so it is confusing (the first time) if a[0] fails to act like an integer when used for slicing an array. Of course, underneath, a is not an array of Python integers (it is an array of homogeneous C ints converted from the Python integers, so why should a[0] be a Python integer?).

This is the problem. We want different things to act the same all the time when fundamentally they are different. Python allows this in many cases, but doesn't seem to be fully transparent in this regard. When I first started with Python, as a MATLAB user I was confused by the fact that lists, tuples, and arrays were all different things that had some commonality. I was much happier when I just decided to let them be different and write my code accordingly. (Interestingly enough, since that time MATLAB has added other types to their language as well --- which don't always get along.)

Here I think we should just let rank-0 arrays and Python scalars be different things and let people know that, instead of trying to mask the situation, which ultimately confuses things. -Travis
On 18.02.2005, at 23:45, Travis Oliphant wrote:
While open to the idea for Python 3000, Guido does not seem favorable to the idea for Python 2.X. The problem, Guido mentions, is that float-like objects can convert to ints by truncation, and he doesn't want to allow floats to be used as indexes. He feels it would break too much code.
I can agree with that. Basically the problem is that the type hierarchy is too simple (but many if not most programming languages suffer from that problem). Type conversions are all handled in the same way, which doesn't give enough flexibility. But we won't change that, of course.
Of course, underneath, a is not an array of Python integers (it is an array of homogeneous C ints converted from the Python integers, so why should a[0] be a Python integer?).
That would basically mean changing the status of arrays from generalized sequence objects to something different. Why not, but then this should be made clear to the programmer, in particular by having a different printed representation for rank-0 arrays and scalars. It also means adding some conversion functions, e.g. for extracting a Python object from a rank-0 Python object array.

Still, I am worried about two aspects:

1) The amount of confusion this generates among users. The distinction between scalars and rank-0 arrays has no utility for the programmer; it exists only for implementation and political reasons. I am not looking forward to explaining this in my Python courses for beginners.

2) Compatibility with existing code. I am not sure I will convert my code to such conventions any time soon, because it requires inspecting every single indexing operation in its particular context to see if the index could be a rank-0 integer array. There is no way to spot those cases by textual analysis. So this change could reduce acceptance to the point where there is no interest in pursuing the project any more.

Konrad.
konrad.hinsen@laposte.net wrote:
On 18.02.2005, at 23:45, Travis Oliphant wrote:
While open to the idea for Python 3000, Guido does not seem favorable to the idea for Python 2.X. The problem, Guido mentions, is that float-like objects can convert to ints by truncation, and he doesn't want to allow floats to be used as indexes. He feels it would break too much code.
I can agree with that. Basically the problem is that the type hierarchy is too simple (but many if not most programming languages suffer from that problem). Type conversions are all handled in the same way, which doesn't give enough flexibility. But we won't change that, of course.
Of course, underneath, a is not an array of Python integers (it is an array of homogeneous C ints converted from the Python integers, so why should a[0] be a Python integer?).
That would basically mean changing the status of arrays from generalized sequence objects to something different. Why not, but then this should be made clear to the programmer, in particular by having a different printed representation for rank-0 arrays and scalars. It also means adding some conversion functions, e.g. for extracting a Python object from a rank-0 Python object array.
Still, I am worried about two aspects:
1) The amount of confusion this generates among users. The distinction between scalars and rank-0 arrays has no utility for the programmer; it exists only for implementation and political reasons. I am not looking forward to explaining this in my Python courses for beginners.
(+1) If we consider an array as a sequence of objects of a fixed type, numeric or other, it makes sense that when a single object is returned, an object of that type be returned, converted if necessary for Int8 etc. Returning a zero-rank array is an historical pain. It might make sense if all traditional Python objects were of zero rank, but I can see no merit in that.
2) Compatibility with existing code. I am not sure I will convert my code to such conventions any time soon, because it requires inspecting every single indexing operation in its particular context to see if the index could be a rank-0 integer array. There is no way to spot those cases by textual analysis. So this change could reduce acceptance to the point where there is no interest in pursuing the project any more.
I thought that the intent of Numeric 3 was to produce the best - a new start, without being overly concerned about compatibility. I was glad to see the proposal to abandon "ravel" (a hangover from APL?). Words should have a clear, generally accepted meaning. For "ravel" dictionary.com offers:

1. To separate the fibers or threads of (cloth, for example); unravel.
2. To clarify by separating the aspects of.
3. To tangle or complicate.

Colin W.
"Colin J. Williams" <cjw@sympatico.ca> writes:
Returning a zero rank array is an historical pain.
The historical pain is returning a scalar: that is what both Numeric and Numarray currently do. Returning a zero rank array would be a new pain to replace that.
It might make sense if all traditional Python objects were of zero rank, but I can see no merit in that.
Pushing arrays that deep into the core language would be natural for a language intended for numerical linear algebra, but perhaps not for a general-purpose language which people also use for web services and whatnot.
I was glad to see the proposal to abandon "ravel" (a hangover from APL?).
I thought all APL builtins were denoted by weird special characters rather than any readable or pronounceable names. But I don't see ravel actually being abandoned, as the PEP does not discuss functions much. One reason for preferring functions to methods and attributes is that functions can be made to work with scalars and generic sequences.

-- Timo Korvola <URL:http://www.iki.fi/tkorvola>
Timo Korvola wrote:
"Colin J. Williams" <cjw@sympatico.ca> writes:
[snip]
I was glad to see the proposal to abandon "ravel" (a hangover from APL?).
From 05-02-18 16:24 PM
Yeah, I don't like this anymore either. I like X.flatten() better than X.ravel() too. -Travis

I suggest that X.flatten would be even better.
I thought all APL builtins were denoted by weird special characters rather than any readable or pronounceable names. But I don't see ravel actually being abandoned, as the PEP does not discuss functions much. One reason for preferring functions to methods and attributes is that functions can be made to work with scalars and generic sequences.
Yes, most of these weird special operators had names; one of these was ravel. Incidentally, 'shape' is also probably inherited from Iverson's APL. Colin W.
On 19.02.2005, at 14:44, Colin J. Williams wrote:
2) Compatibility with existing code. I am not sure I will convert my code to such conventions any time soon, because it requires inspecting every single indexing operation in its particular context to see if the index could be a rank-0 integer array. There is no way to spot those cases by textual analysis. So this change could reduce acceptance to the point where there is no interest in pursuing the project any more.
I thought that the intent of Numeric 3 was to produce the best - a new start, without being overly concerned about compatibility.
It all depends on where "overly" starts. Let's be pragmatic: the majority of potential Numeric 3 users will be the current users of Numeric and numarray. If they don't accept Numeric 3 because it's a pain, no amount of nice design will help.
I was glad to see the proposal to abandon "ravel" (a hangover from APL?).
Fine, but that's the kind of change that doesn't hurt much: a name change can be made with a text editor. Changing the behaviour of fundamental operations (indexing) is a different thing.

Konrad.
I can agree with that. Basically the problem is that the type hierarchy is too simple (but many if not most programming languages suffer from that problem). Type conversions are all handled in the same way, which doesn't give enough flexibility. But we won't change that, of course.
Guido might in Python 3000....
That would basically mean to change the status of arrays from generalized sequence objects to something different. Why not, but then this should be made clear to the programmer, in particular by having a different printed representation for rank-0 arrays and scalars. It also means adding some conversion functions, e.g. for extracting a Python Object from a rank-0 Python Object array.
I guess I don't consider arrays as generalized sequence objects, except for the Python Object array, which might be. Still, my views on this are really not central, as I want to help get something together that a large majority of us can be satisfied with. I'm really not interested in intentionally making something that is difficult to use, and I am willing to bend on many things (especially if work-arounds are possible). It seems to me that the rank-0-array-versus-scalar problem is fundamental to the situation of trying to craft a numerical environment on top of a general-purpose one. I don't see a "right solution" that would please all parties short of altering all Python scalars --- and that would likely upset an entirely different crowd. I suppose that is why we have the compromise we do now.
Still, I am worried about two aspects:
1) The amount of confusion this generates among users. The distinction between scalars and rank-0 arrays has no utility for the programmer, it exists only for implementation and political reasons. I am not looking forward to explaining this in my Python courses for beginners.
The only utility might be speed, because it is not too hard to see that rank-0 arrays that behave as part of a generic system of multidimensional arrays might carry quite a bit of baggage that is not needed if all you ever want is a scalar. This distinction may not be possible to get rid of. To put it more clearly: is it possible to define a scalar that interacts seamlessly with a system of multidimensional arrays without slowing it down for usage in other contexts? I don't know the answer, but it sure seems to be no. I've looked at PDL (Perl's equivalent of Numeric) and they seem to do exactly the same thing (have a rank-0 array that is very different from the Perl scalar).

Added to our problem is that we do not have much control over the definition of fundamental scalars in Python. Guido has suggested that he may be willing to allow integers to get methods (how many methods I'm not sure --- I didn't push him; he mentioned the possibility of adding a rank-inquiry method, for example). It would pleasantly surprise me if he were willing to give scalars all of the methods and attributes of arrays.
2) Compatibility with existing code. I am not sure I will convert my code to such conventions any time soon, because it requires inspecting every single indexing operation in its particular context to see if the index could be a rank-0 integer array. There is no way to spot those cases by textual analysis. So this change could reduce acceptance to the point where there is no interest in pursuing the project any more.
A healthy use of int() in indexing operations could fix this, but yes, I see compatibility as an issue. I don't want to create incompatibilities if we can avoid it. On the other hand, I don't want to continue with a serious design flaw just for the sake of compatibility either (I'm still trying to figure out if it is a flaw or not; it sure seems like a hack). Thanks for your continued help and advice. -Travis
Travis Oliphant wrote:
I'm still thinking about the X.flat possibility.
There was a discussion about this a while back, but I don't remember what conclusions were reached (if any). The technical details are beyond me, but it seems quite clear to me that A.flat() should be supported for all arrays. Otherwise, it makes it very hard to write generic code. What's the point of having discontiguous arrays if you can't use them everywhere you can use contiguous ones? Fernando Perez wrote:
Granted, the indexing will be costlier for a non-contiguous object, but for the external users this provides a clean API.
Right. A number of things are costlier for non-contiguous arrays. If you need to optimize, you can make sure your arrays are contiguous where it matters. If you don't, it should "just work".
But Python is not Matlab!
Konrad.
Hear, hear! If I wanted Matlab, I'd use Matlab (or Octave, or Scilab). -Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice / (206) 526-6329 fax / (206) 526-6317 main reception
Chris.Barker@noaa.gov
participants (20)

- Chris Barker
- Colin J. Williams
- cookedm@physics.mcmaster.ca
- Darren Dale
- David M. Cooke
- Duncan Child
- Fernando Perez
- Jeff Whitaker
- konrad.hinsen@laposte.net
- Paul Barrett
- Perry Greenfield
- Peter Verveer
- Peter Verveer
- Rick White
- Robert Kern
- Stephen Walton
- Thomas Heller
- Tim Hochberg
- Timo Korvola
- Travis Oliphant