From eric at enthought.com Sat Jun 1 13:20:42 2002
From: eric at enthought.com (eric)
Date: Sat Jun 1 13:20:42 2002
Subject: [Numpy-discussion] bug in negative stride indexing for empty arrays
Message-ID: <020101c209a0$5d2bbcc0$6b01a8c0@ericlaptop>

Hi,

I just ran across a situation where reversing an empty array using a negative stride populates it with a new element. I'm betting this isn't the intended behavior. An example code snippet is below.

eric

C:\home\ej\wrk\chaco>python
Python 2.1.3 (#35, Apr 8 2002, 17:47:50) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> import Numeric
>>> Numeric.__version__
'21.0'
>>> a = array(())
>>> a
zeros((0,), 'l')
>>> len(a)
0
>>> b = a[::-1]
>>> len(b)
1
>>> b
array([0])

--
Eric Jones
Enthought, Inc. [www.enthought.com and www.scipy.org]
(512) 536-1057

From eric at enthought.com Sat Jun 1 13:48:55 2002
From: eric at enthought.com (eric)
Date: Sat Jun 1 13:48:55 2002
Subject: [Numpy-discussion] Bug: extremely misleading array behavior
References:
Message-ID: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop>

----- Original Message -----
From: "Konrad Hinsen"
To: "Pearu Peterson"
Cc:
Sent: Wednesday, May 29, 2002 4:08 AM
Subject: Re: [Numpy-discussion] Bug: extremely misleading array behavior

> Pearu Peterson writes:
>
> > an array with 0 rank. It seems that the Numeric documentation is missing
> > (though, I didn't look too hard) the following rules of thumb:
> >
> > If `a' is rank 1 array, then a[i] is Python scalar or object. [MISSING]
>
> Or rather:
>
> - If `a' is rank 1 array with elements of type Int, Float, or Complex,
>   then a[i] is Python scalar or object. [MISSING]
>
> - If `a' is rank 1 array with elements of type Int16, Int32, Float32, or
>   Complex32, then a[i] is a rank 0 array. [MISSING]
>
> - If `a' is rank > 1 array, then a[i] is a sub-array a[i,...]
>
> The rank-0 arrays are the #1 question topic for users of my netCDF
> interface (for portability reasons, netCDF integer arrays map to
> Int32, not Int, so scalar integers read from a netCDF array are always
> rank-0 arrays), and almost everybody initially claims that it's a bug,
> so some education seems necessary.

I don't think education is the answer here. We need to change Numeric to have uniform behavior across all typecodes. Having alternative behaviors for indexing based on the typecode can lead to very difficult-to-find bugs. Generic routines meant to work with any Numeric type can break a year later when someone passes in an array with a seemingly compatible type. Also, because coercion can silently change typecodes during arithmetic operations, code written expecting one behavior can all of a sudden exhibit the other. That is very dangerous and hard to test.

eric

From jake at edge2.net Mon Jun 3 07:26:02 2002
From: jake at edge2.net (Jake Edge)
Date: Mon Jun 3 07:26:02 2002
Subject: [Numpy-discussion] no 3 arg multiply in MA?
Message-ID: <20020603082021.A30335@magpie>

I was converting a program written for Numeric to use masked arrays and I ran into a problem with multiply ... it would appear that there is no 3 argument version for MA? i.e.

a = array([1, 2, 3])
multiply(a,a,a)

works fine to square the array using Numeric, but I get an exception:

TypeError: __call__() takes exactly 3 arguments (4 given)

when doing it using MA ... it seems clear that that is the problem. Is it an oversight, just as yet unimplemented, or am I missing something?

thanks!
jake

From jake at edge2.net Mon Jun 3 07:45:16 2002
From: jake at edge2.net (Jake Edge)
Date: Mon Jun 3 07:45:16 2002
Subject: [Numpy-discussion] some casting oddness?
Message-ID: <20020603084002.A30375@magpie>

I am using both MA and Numeric in a program that I am writing and ran into some typecasting oddness (at least I thought it was odd). When using only Numeric, adding an array of typecode 'l' and one of typecode '1' produces an array of typecode 'l', whereas using an MA derived array of typecode '1' added to a Numeric array of typecode 'l' produces an array of typecode '1'. Sorry if that is a bit dense; the upshot is that mixing the two causes the output to be the _smaller_ of the two types (Int8 == '1') rather than the larger (Int == 'l') as I would expect ... below is some code that reproduces the problem (it may look contrived (and is), but it comes from the guts of some code I have been playing with):

#!/usr/bin/env python

from Numeric import *
import MA

a = zeros((10,))
print a.typecode()
b = MA.ones((10,),Int8)
b = MA.masked_where(MA.equal(b,1),b,0)
print b.typecode()
print b.mask().typecode()
z = ones((10,),Int8)
print z.typecode()
c = add(a,b.mask())
print c.typecode()
d = add(a,z)
print d.typecode()

I get output like:

l
1
1
1
1
l

any thoughts?

thanks!

jake

From hinsen at cnrs-orleans.fr Mon Jun 3 09:34:04 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Mon Jun 3 09:34:04 2002
Subject: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop>
References: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop>
Message-ID: <200206031630.g53GUKj13666@chinon.cnrs-orleans.fr>

> I don't think education is the answer here. We need to change
> Numeric to have uniform behavior across all typecodes.

I agree that this would be the better solution. But until this is done...

> Having alternative behaviors for indexing based on the typecode can
> lead to very difficult-to-find bugs. Generic routines meant to work

The differences are not that important, in most circumstances rank-0 arrays and scalars behave in the same way. The problems occur mostly with code that does explicit type checking.

The best solution, in my opinion, is to provide scalar objects corresponding to low-precision ints and floats, as part of NumPy.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From paul at pfdubois.com Mon Jun 3 09:56:01 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Mon Jun 3 09:56:01 2002
Subject: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <200206031630.g53GUKj13666@chinon.cnrs-orleans.fr>
Message-ID: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY>

Konrad said:
>
> The best solution, in my opinion, is to provide scalar
> objects corresponding to low-precision ints and floats, as
> part of NumPy.
>
> Konrad.

One of the thoughts I had in mind for the "kinds" proposal was to support this. I was going to do the float32 object as part of it as a demo of how it would work. So I got out the float object from Python, figuring I would just change a few types et voila. Not.
It is very hard to understand, and I don't even understand the reasons it is hard to understand. Perhaps a young person with a high tolerance for pain would look at this?

From oliphant.travis at ieee.org Mon Jun 3 13:02:04 2002
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Jun 3 13:02:04 2002
Subject: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY>
References: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY>
Message-ID: <1023134522.2758.2.camel@travis>

On Mon, 2002-06-03 at 10:54, Paul F Dubois wrote:
>
> Konrad said:
> >
> > The best solution, in my opinion, is to provide scalar
> > objects corresponding to low-precision ints and floats, as
> > part of NumPy.
> >
> > Konrad.
>

This seems like a good idea. It's been an old source of confusion. On a related note, how does the community feel about retrofitting Numeric with unsigned shorts and unsigned ints? I've got the code to do it already written.

-Travis

From oliphant.travis at ieee.org Mon Jun 3 23:40:01 2002
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Jun 3 23:40:01 2002
Subject: [Numpy-discussion] Unsigned shorts and ints
Message-ID: <1023172789.21778.8.camel@travis>

I would like to update the Numeric CVS tree to include support for unsigned shorts and ints. Making the transition will cause some difficulty with binary extensions, as these will need to be recompiled with the new Numeric. As a result, I propose that a new release of Numeric be posted (to include the recent bug fixes), and then the changes made for inclusion in the next version number of Numeric.

Comments?

-Travis

From perry at stsci.edu Tue Jun 4 14:18:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Jun 4 14:18:05 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
Message-ID:

> > I don't think education is the answer here. We need to change
> > Numeric to have uniform behavior across all typecodes.
>
> I agree that this would be the better solution. But until this is
> done...
>
> > Having alternative behaviors for indexing based on the typecode can
> > lead to very difficult-to-find bugs. Generic routines meant to work
>
> The differences are not that important, in most circumstances rank-0
> arrays and scalars behave in the same way. The problems occur mostly
> with code that does explicit type checking.
>
> The best solution, in my opinion, is to provide scalar objects
> corresponding to low-precision ints and floats, as part of NumPy.
>

There is another approach that I think is more sensible.

From what I can tell, the driving force behind rank-0 arrays as scalars is the Numeric coercion rules. One needs to retain the 'lesser' integer and float types so that operations with these pseudo-scalars and other arrays do not coerce arrays to a higher type than would have been done when using the nearest equivalent of Python scalars (if there is some other reason, I'd like to know). For example, if a and b are Int16 1-d arrays, and if indexing an element out of them produced a Python integer value, then a[0]*b becomes an Int32 (or even Int64 on some platforms?) array. Numarray has different coercion rules so that this doesn't happen. Thus one doesn't need c[1,1] to give a rank-0 array.
(Eric Jones has pointed out privately that another reason is to use different error handling, but if I'm not mistaken, so long as one can group all calculations so that no scalar-scalar calculation is done, one doesn't really need rank-0 arrays other than in unusual circumstances.)

So I'd argue that numarray solves this issue. For those that can't wait (because numarray currently lacks a feature or library, it's too slow on small arrays, or whatever) and really must modify Numeric, I think you would be much better off changing the coercion rules and eliminating rank-0 arrays resulting from ordinary indexing rather than making one of the other proposed changes (if that isn't too hard to implement). Of course you get into backward compatibility issues. But really, to get it right, some incompatibility is necessary if you want to eliminate this particular wart.

Perry Greenfield

From perry at stsci.edu Thu Jun 6 13:30:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Jun 6 13:30:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To:
Message-ID:

[I thought I replied yesterday, but somehow that apparently vanished.]

> "Perry Greenfield" writes:
>
> > Numarray has different coercion rules so that this doesn't
> > happen. Thus one doesn't need c[1,1] to give a rank-0 array.
>
> What are those coercion rules?
>

For binary operations between a Python scalar and an array, there is no coercion performed on the array type if the scalar is of the same kind as the array (but not the same size or precision). For example (assuming ints happen to be 32 bit in this case):

Python Int (Int32) * Int16 array --> Int16 array
Python Float (Float64) * Float32 array --> Float32 array

But if the Python scalar is of a higher kind, e.g., a Python float scalar with an Int array, then the array is coerced to the corresponding type of the Python scalar:

Python Float (Float64) * Int16 array --> Float64 array
Python Complex (Complex64) * Float32 array --> Complex64 array

Numarray basically has the same coercion rules as Numeric when two arrays are involved. There are some extra twists, such as

UInt16 array * Int16 array --> Int32 array

since neither input type is a proper subset of the other. (But since Numeric doesn't (or didn't until Travis changed that) have unsigned types, that wouldn't have been an issue with Numeric.)

> > (if that isn't too hard to implement). Of course you get into
> > backward compatibility issues. But really, to get it right, some
> > incompatibility is necessary if you want to eliminate this particular
> > wart.
>
> For a big change such as Numarray, I'd accept some incompatibilities.
> For just a new version of NumPy, no. There is a lot of code out there
> that uses NumPy, and I am sure that a good part of it relies on the
> current coercion rules. Moreover, there is no simple way to detect
> code that depends on coercion rules, so adapting existing code would
> be an enormous amount of work.
>

Certainly. I didn't mean to minimize that. But the current coercion rules have produced a demand for solutions to the problem of upcasting, and I consider those solutions to be less than ideal (savespace and rank-0 arrays). If people really are troubled by these warts, I'm arguing that the real solution is in changing the coercion behavior. (Yes, it would be easiest to deal with if Python had all these types, but I think that will never happen, nor should it happen.)
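To make the scalar-array rules concrete, here is the sort of thing they imply (a sketch rather than a captured session -- the comments state what the rules above predict, and I'm assuming numarray spells the type names the same way Numeric does):

a = array([1, 2], Int16)
b = a * 2                       # Python int, same kind as Int16: b stays Int16
c = a * 2.0                     # Python float, a higher kind: c becomes Float64
d = a * array([1, 2], Int32)    # two arrays: ordinary promotion, d is Int32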
Perry

From hinsen at cnrs-orleans.fr Fri Jun 7 09:01:47 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Jun 7 09:01:47 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To:
References:
Message-ID: <200206071557.g57FvqY26621@chinon.cnrs-orleans.fr>

> For binary operations between a Python scalar and an array, there is
> no coercion performed on the array type if the scalar is of the
> same kind as the array (but not the same size or precision). For example
> (assuming ints happen to be 32 bit in this case)

That solves one problem and creates another... Two, in fact. One is the inconsistency problem: Python type coercion always promotes "smaller" to "bigger" types, and it would be good to make no exceptions to this rule.

Besides, there are still situations in which types, ranks, and indexing operations depend on each other in a strange way. With

a = array([1., 2.], Float)
b = array([3., 4.], Float32)

the result of

a*b

is of type Float, whereas

a[0]*b

is of type Float32 - if and only if a has rank 1.

> (Yes, it would be easiest to deal with if Python had all these types,
> but I think that will never happen, nor should it happen.)

Python doesn't need to have them as standard types; an add-on package can provide them as well. NumPy seems like the obvious one.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From perry at stsci.edu Fri Jun 7 09:43:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Jun 7 09:43:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <200206071557.g57FvqY26621@chinon.cnrs-orleans.fr>
Message-ID:

> > For binary operations between a Python scalar and an array, there is
> > no coercion performed on the array type if the scalar is of the
> > same kind as the array (but not the same size or precision). For example
> > (assuming ints happen to be 32 bit in this case)
>
> That solves one problem and creates another... Two, in fact. One is
> the inconsistency problem: Python type coercion always promotes
> "smaller" to "bigger" types, and it would be good to make no exceptions
> to this rule.
>
> Besides, there are still situations in which types, ranks, and
> indexing operations depend on each other in a strange way. With
>
> a = array([1., 2.], Float)
> b = array([3., 4.], Float32)
>
> the result of
>
> a*b
>
> is of type Float, whereas
>
> a[0]*b
>
> is of type Float32 - if and only if a has rank 1.
>

All this is true. It really comes down to which poison you prefer. Neither choice is perfect. Changing the coercion rules results in the inconsistencies you mention. Not changing them results in the existing inconsistencies recently discussed (and still doesn't remove the difficulties of dealing with scalars in expressions without awkward constructs). We think the inconsistencies you point out are easier to live with than the existing behavior. It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible.
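For readers keeping score, here is Konrad's example annotated with what each rule set produces (a sketch; the comments are predictions from the stated rules, not a transcript):

a = array([1., 2.], Float)      # Float64
b = array([3., 4.], Float32)

r1 = a * b       # array * array: Float64 under either set of rules
r2 = a[0] * b    # a[0] is a Python float (Float64 kind) if a has rank 1;
                 # Numeric's promote-to-bigger rule gives Float64,
                 # while the numarray rule keeps Float32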
Perry

From hinsen at cnrs-orleans.fr Fri Jun 7 13:49:03 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Jun 7 13:49:03 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: (message from Perry Greenfield on Fri, 07 Jun 2002 12:40:40 -0400)
References:
Message-ID: <200206072046.g57KkJZ27511@chinon.cnrs-orleans.fr>

> It would be nice to have a solution that had none of these
> problems, but that doesn't appear to be possible.

I still believe that the best solution is to define scalar data types corresponding to all array element types. As far as I can see, this doesn't have any of the disadvantages of the other solutions that have been proposed until now.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From perry at stsci.edu Fri Jun 7 14:42:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Jun 7 14:42:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <200206072046.g57KkJZ27511@chinon.cnrs-orleans.fr>
Message-ID:

> I still believe that the best solution is to define scalar data types
> corresponding to all array element types. As far as I can see, this
> doesn't have any of the disadvantages of the other solutions that
> have been proposed until now.
>

If x were a Float32 array, how would the following not be promoted to a Float64 array?

y = x + 1.

If you are proposing something like

y = x + Float32(1.)

it would work, but it sure leads to some awkward expressions.

Perry

From hinsen at cnrs-orleans.fr Sat Jun 8 15:41:08 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Sat Jun 8 15:41:08 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: (message from Perry Greenfield on Fri, 07 Jun 2002 17:42:53 -0400)
References:
Message-ID: <200206080757.g587vO428138@chinon.cnrs-orleans.fr>

> If you are proposing something like
>
> y = x + Float32(1.)
>
> it would work, but it sure leads to some awkward expressions.

Yes, that's what I am proposing. It's no worse than what we have now, and if writing Float32 a hundred times is too much effort, an abbreviation like

f = Float32

helps a lot.

Anyway, following the Python credo "explicit is better than implicit", I'd rather write explicit type conversions than have automagical ones surprise me.

Finally, we can always lobby for inclusion of the new scalar types into the core interpreter, with a corresponding syntax for literals, but it would sure help if we could show that the system works and suffers only from the lack of literals.

Konrad.
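P.S. Until such scalar types exist, a rank-0 array already emulates one. A small sketch of the abbreviation idea using nothing but today's Numeric, relying on the rank-0 behavior described earlier in this thread:

from Numeric import array, Float32

def f(value):
    # A rank-0 Float32 "scalar": combining it with a Float32 array
    # does not upcast, which is exactly what rank-0 arrays are for.
    return array(value, Float32)

x = array([3., 4.], Float32)
y = x + f(1.)    # y keeps typecode 'f'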
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From oliphant.travis at ieee.org Sat Jun 8 18:56:02 2002
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Jun 8 18:56:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <200206080757.g587vO428138@chinon.cnrs-orleans.fr>
References: <200206080757.g587vO428138@chinon.cnrs-orleans.fr>
Message-ID: <1023587755.13067.4.camel@travis>

I did not receive any major objections, and so I have released a new Numeric (21.3) incorporating bug fixes. I also tagged the CVS tree with VERSION_21_3, and then I incorporated the unsigned integers and unsigned shorts into the CVS version of Numeric, for inclusion in a tentatively named version 22.0.

I've only uploaded a platform-independent tar file for 21.3. Any binaries need to be updated. If you are interested in testing the new additions, please let me know of any bugs you find.

Thanks,

-Travis O.

From eric at enthought.com Sun Jun 9 17:19:13 2002
From: eric at enthought.com (eric jones)
Date: Sun Jun 9 17:19:13 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <200206080757.g587vO428138@chinon.cnrs-orleans.fr>
Message-ID: <000301c21014$50e991b0$6b01a8c0@ericlaptop>

> > If you are proposing something like
> >
> > y = x + Float32(1.)
> >
> > it would work, but it sure leads to some awkward expressions.
>
> Yes, that's what I am proposing. It's no worse than what we have now,
> and if writing Float32 a hundred times is too much effort, an
> abbreviation like f = Float32 helps a lot.
>
> Anyway, following the Python credo "explicit is better than implicit",
> I'd rather write explicit type conversions than have automagical ones
> surprise me.

How about making indexing (not slicing) arrays *always* return a 0-D array with copy instead of "view" semantics? This is nearly equivalent to creating a new scalar type, but without requiring major changes. I think it is probably even more useful for writing generic code, because the returned value will retain array behavior. Also, the following example

> a = array([1., 2.], Float)
> b = array([3., 4.], Float32)
>
> a[0]*b

would now return a Float array as Konrad desires, because a[0] is a Float array. Using copy semantics would fix the unexpected behavior reported by Larry that kicked off this discussion. Slices are a different animal than indexing and would (and definitely should) continue to return view semantics.

I further believe that all Numeric functions (sum, product, etc.) should return arrays all the time instead of implicitly converting them to Python scalars in special cases such as reductions of 1d arrays. I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing, so that:

>>> a = [1,2,3,4]
>>> a[array(0)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sequence index must be integer

Numeric arrays don't have this problem:

>>> a = array([1,2,3,4])
>>> a[array(0)]
1

I don't think this alone is a strong enough reason for the conversion.
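To see how the special case bites generic code, compare reductions of 2-d and 1-d inputs (a sketch of the behavior described above; the traceback is abbreviated from memory):

>>> sum(array([[1., 2.], [3., 4.]])).shape   # 2-d input: an array comes back
(2,)
>>> sum(array([1., 2.])).shape               # 1-d input: a Python float comes
Traceback (most recent call last):           # back, so .shape blows up
AttributeError: shape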
Getting rid of special cases is more important because it makes behavior predictable to the novice (and expert), and it is easier to write generic functions and be sure they will not break a year from now when one of the special cases occurs.

Are there other reasons why scalars are returned?

On coercion rules:

As for adding the array to a scalar value,

x = array([3., 4.], Float32)
y = x + 1.

Should y be a Float or a Float32? I like numarray's coercion rules better (Float32). I have run into this upcasting too many times to count. Explicit and implicit aren't obvious to me here. The user explicitly cast x to be Float32, but because of the limited numeric types in Python, the result is upcast to a double. Here's another example:

>>> from Numeric import *
>>> a = array((1,2,3,4), UnsignedInt8)
>>> left_shift(a,3)
array([ 8, 16, 24, 32],'i')

I had to stare at this for a while when I first saw it before I realized the integer value 3 upcast the result to be type 'i'. So, I think this is confusing and rarely the desired behavior. The fact that this is inconsistent with Python's "always upcast" rule is minor for me. The array math operations are necessarily a different animal from scalar operations because of the extra types supported. Defining these operations in a way that is most convenient for working with array data seems OK.

On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users don't expect such a major shift in behavior. I do think, though, that the computational speed issue is going to result in numarray and Numeric existing side-by-side for a long time. Perhaps we should consider creating an "interim" Numeric version (maybe starting at 30) that tries to be compatible with the upcoming numarray, in its coercion rules, etc.? Advanced features such as indexing arrays with arrays, memory mapped arrays, floating point exception behavior, etc. won't be there, but it should help people transition their codes to work with numarray, and also offer a speedy alternative. A second choice would be to make SciPy's Numeric implementation the intermediate step. It already produces NaN's during div-by-zero exceptions according to numarray's rules. The coercion modifications could also be incorporated.

> Finally, we can always lobby for inclusion of the new scalar types
> into the core interpreter, with a corresponding syntax for literals,
> but it would sure help if we could show that the system works and
> suffers only from the lack of literals.

There was a seriously considered debate about unifying Python's numeric model into a single type to get rid of the integer-float distinction, at last year's Python conference and in the ensuing months. While it didn't (and won't) happen, I'd be real surprised if the general community would welcome us suggesting stirring yet another type into the brew. Can't we make 0-d arrays work as an alternative?

eric
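P.S. For what it's worth, 0-d arrays already convert where Python asks them to; it's only spots like list indexing that demand a true Python int (a sketch, consistent with the list example above):

>>> a = array(3.0)      # rank-0 array
>>> float(a), int(a)    # converts where Python asks for a number
(3.0, 3)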
From hinsen at cnrs-orleans.fr Mon Jun 10 10:13:05 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Mon Jun 10 10:13:05 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <000301c21014$50e991b0$6b01a8c0@ericlaptop>
References: <000301c21014$50e991b0$6b01a8c0@ericlaptop>
Message-ID:

"eric jones" writes:

> How about making indexing (not slicing) arrays *always* return a 0-D
> array with copy instead of "view" semantics? This is nearly equivalent
> to creating a new scalar type, but without requiring major changes. I
...

I think this was discussed as well a long time ago. For pure Python code, this would be a very good solution. But

> I think the only reason for the silent conversion is that Python lists
> only allow integer values for use in indexing, so that:

There are some more cases where the type matters. If you call C routines that do argument parsing via PyArg_ParseTuple and expect a float argument, a rank-0 float array will raise a TypeError. All the functions from the math module work like that, and of course many in various extension modules.

In the ideal world, there would not be any distinction between scalars and rank-0 arrays. But I don't think we'll get there soon.

> On coercion rules:
>
> As for adding the array to a scalar value,
>
> x = array([3., 4.], Float32)
> y = x + 1.
>
> Should y be a Float or a Float32? I like numarray's coercion rules
> better (Float32). I have run into this upcasting too many times to

Statistically they probably give the desired result in more cases. But they are in contradiction to Python principles, and consistency counts a lot on my value scale.

I propose an experiment: ask a few Python programmers who are not using NumPy what type they would expect for the result. I bet that not a single one would answer "Float32".

> On the other hand, I don't think a jump from 21 to 22 is enough of a
> jump to make such a change. Numeric progresses pretty fast, and users

I don't think any increase in version number is enough for incompatible changes. For many users, NumPy is just a building block; they install it because some other package(s) require it. If a new version breaks those other packages, they won't be happy. The authors of those packages won't be happy either, as they will get the angry letters.

As an author of such packages, I am speaking from experience. I have even considered making my own NumPy distribution under a different name, just to be safe from changes in NumPy that break my code (in the past it was mostly the installation code that was broken when arrayobject.h changed its location).
In my opinion, anything that is not compatible with Numeric should not be called Numeric.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From oliphant.travis at ieee.org Mon Jun 10 11:13:07 2002
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Jun 10 11:13:07 2002
Subject: [Numpy-discussion] 0-D arrays as scalars
In-Reply-To:
References: <000301c21014$50e991b0$6b01a8c0@ericlaptop>
Message-ID: <1023732818.28672.13.camel@travis>

On Mon, 2002-06-10 at 11:08, Konrad Hinsen wrote:
> "eric jones" writes:
>
> > I think the only reason for the silent conversion is that Python lists
> > only allow integer values for use in indexing, so that:
>
> There are some more cases where the type matters. If you call C
> routines that do argument parsing via PyArg_ParseTuple and expect a
> float argument, a rank-0 float array will raise a TypeError. All the
> functions from the math module work like that, and of course many in
> various extension modules.

Actually, the code in PyArg_ParseTuple asks the object it gets if it knows how to be a float. 0-d arrays have known how to be Python floats for some time. So, I do not think this error occurs as you've described. Could you demonstrate this error?

In fact, most of the code in Python itself which needs scalars allows arbitrary objects, provided the object has defined functions which return a Python scalar. The only exception to this that I've seen is the list indexing code (probably for optimization purposes). There could be more places, but I have not found them or heard of them.

Originally, Numeric arrays did not define appropriate functions for 0-d arrays to act like scalars in the right places. They have for quite a while now. I'm quite supportive of never returning Python scalars from Numeric array operations unless specifically requested (e.g. the toscalar method).

> > On coercion rules:
> >
> > As for adding the array to a scalar value,
> >
> > x = array([3., 4.], Float32)
> > y = x + 1.
> >
> > Should y be a Float or a Float32? I like numarray's coercion rules
> > better (Float32). I have run into this upcasting too many times to
>
> Statistically they probably give the desired result in more cases. But
> they are in contradiction to Python principles, and consistency counts
> a lot on my value scale.
>
> I propose an experiment: ask a few Python programmers who are not
> using NumPy what type they would expect for the result. I bet that not
> a single one would answer "Float32".
>

I'm not sure I agree with that at all. On what reasoning is that presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.

From paul at pfdubois.com Mon Jun 10 11:20:06 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Mon Jun 10 11:20:06 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To:
Message-ID: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY>

We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it.

Two points about the x + 1.0 issue:

1. How often this occurs is really a function of what you are doing.
For those using Numeric Python as a kind of MATLAB clone, who are typing interactively, the size issue is of less importance and the easy expression is of more importance. To those writing scripts to batch process or writing steered applications, the size issue is more important and the easy expression less important. I'm using words like less and more here because both issues matter to everyone at some time; it is just a question of relative frequency of concern.

2. Part of what I had in mind with the kinds module proposal PEP 0242 was dealing with the literal issue. There had been some proposals to make literals decimal numbers or rationals, and that got me thinking about how to defend myself if they did it, and also about the fact that Python doesn't have Fortran's kind concept, which you can use to gain a more platform-independent calculation.

From the PEP, this example:

In module myprecision.py:

    import kinds
    tinyint = kinds.int_kind(1)
    single = kinds.float_kind(6, 90)
    double = kinds.float_kind(15, 300)
    csingle = kinds.complex_kind(6, 90)

In the rest of my code:

    from myprecision import tinyint, single, double, csingle
    n = tinyint(3)
    x = double(1.e20)
    z = 1.2
    # builtin float gets you the default float kind, properties unknown
    w = x * float(x)
    # but in the following case we know w has kind "double".
    w = x * double(z)

    u = csingle(x + z * 1.0j)
    u2 = csingle(x+z, 1.0)

Note how that entire code can then be changed to a higher precision by changing the arguments in myprecision.py.

Comment: note that you aren't promised that single != double; but you are promised that double(1.e20) will hold a number with 15 decimal digits of precision and a range up to 10**300, or that the float_kind call will fail.
From perry at stsci.edu Mon Jun 10 12:07:15 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Jun 10 12:07:15 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY>
Message-ID:

> We have certainly beaten this topic to death in the past. It keeps
> coming up because there is no good way around it.
>

Ain't that the truth.

> Two points about the x + 1.0 issue:
>
> 1. How often this occurs is really a function of what you are doing. For
> those using Numeric Python as a kind of MATLAB clone, who are typing
> interactively, the size issue is of less importance and the easy
> expression is of more importance. To those writing scripts to batch
> process or writing steered applications, the size issue is more
> important and the easy expression less important. I'm using words like
> less and more here because both issues matter to everyone at some time;
> it is just a question of relative frequency of concern.
>

We have many in the astronomical community that use IDL (instead of MATLAB), and for them size is an issue for interactive use. They often manipulate very large arrays interactively. Furthermore, many are astronomers who don't generally see themselves as programmers, who may write programs (perhaps not great programs), and who don't want to be bothered by such details even in a script (or they may want to read a "professional" program and not have to deal with such things). But you are right in that there is no solution that doesn't have some problems.
Every array language deals with this in somewhat different ways, I suspect. In IDL, the literals are generally smaller types (ints were (or used to be, I haven't used it myself in a while) 2 bytes, floats single precision), and there were ways of writing literals with higher precision (e.g., 2L, 2.0d-2). Since it was a language specifically intended to deal with numeric processing, supporting many scalar types made sense.

Perry

From perry at stsci.edu Mon Jun 10 13:07:04 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Jun 10 13:07:04 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <000301c21014$50e991b0$6b01a8c0@ericlaptop>
Message-ID:

> I further believe that all Numeric functions (sum, product, etc.) should
> return arrays all the time instead of implicitly converting
> them to Python scalars in special cases such as reductions of 1d arrays.
> I think the only reason for the silent conversion is that Python lists
> only allow integer values for use in indexing, so that:
>
> >>> a = [1,2,3,4]
> >>> a[array(0)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: sequence index must be integer
>
> Numeric arrays don't have this problem:
>
> >>> a = array([1,2,3,4])
> >>> a[array(0)]
> 1
>
> I don't think this alone is a strong enough reason for the conversion.
> Getting rid of special cases is more important because it makes behavior
> predictable to the novice (and expert), and it is easier to write
> generic functions and be sure they will not break a year from now when
> one of the special cases occurs.
>
> Are there other reasons why scalars are returned?
>

Well, sure. It isn't just indexing lists directly; it would be anywhere in Python that you would use a number. In some contexts, the right thing may happen (where the function knows to try to obtain a simple number from an object), but then again, it may not (if calling a function where the number is used directly to index or slice).

Here is another case where good arguments can be made for both sides. It really isn't an issue of functionality (one can write methods or functions to do what is needed); it's what the convenient syntax does. For example, if we really want a Python scalar but rank-0 arrays are always returned, then something like this may be required:

>>> x = arange(10)
>>> a = range(10)
>>> a[scalar(x[2])]   # instead of a[x[2]]

Whereas if simple indexing returns a Python scalar and consistency is desired in always having arrays returned, one may have to do something like this:

>>> y = x.indexAsArray(2)   # instead of y = x[2]

or perhaps

>>> y = x[ArrayAlwaysAsResultIndexObject(2)]
        # :-) with better name, of course

One context or the other is going to be inconvenienced, but not prevented from doing what is needed.

As long as Python scalars are the 'biggest' type of their kind, we strongly lean towards single elements being converted into Python scalars. It's our feeling that there are more surprises and gotchas, particularly for more casual users, on this side than on the uncertainty of an index returning an array or scalar. People writing code that expects to deal with uncertain dimensionality (the only place that this occurs) should be the ones to go the extra distance in more awkward syntax.
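Such code can bury the awkwardness in a tiny helper, for example (a sketch -- 'toscalar' is the method name Travis floated earlier in this thread, so treat it as hypothetical):

def to_scalar(v):
    # Return a Python scalar whether v is one already or is a rank-0 array.
    try:
        return v.toscalar()    # hypothetical rank-0 array method
    except AttributeError:
        return v               # already a Python scalar

a = range(10)
y = a[to_scalar(x[2])]         # works under either indexing behavior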
Perry

From eric at enthought.com Mon Jun 10 13:11:02 2002
From: eric at enthought.com (eric jones)
Date: Mon Jun 10 13:11:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY>
Message-ID: <001301c210ba$d8e03910$6b01a8c0@ericlaptop>

> We have certainly beaten this topic to death in the past. It keeps
> coming up because there is no good way around it.
>
> Two points about the x + 1.0 issue:
>
> 1. How often this occurs is really a function of what you are doing. For
> those using Numeric Python as a kind of MATLAB clone, who are typing
> interactively, the size issue is of less importance and the easy
> expression is of more importance. To those writing scripts to batch
> process or writing steered applications, the size issue is more
> important and the easy expression less important. I'm using words like
> less and more here because both issues matter to everyone at some time;
> it is just a question of relative frequency of concern.
>
> 2. Part of what I had in mind with the kinds module proposal PEP 0242
> was dealing with the literal issue. There had been some proposals to
> make literals decimal numbers or rationals, and that got me thinking
> about how to defend myself if they did it, and also about the fact that
> Python doesn't have Fortran's kind concept, which you can use to gain a
> more platform-independent calculation.
>
> From the PEP, this example:
>
> In module myprecision.py:
>
>     import kinds
>     tinyint = kinds.int_kind(1)
>     single = kinds.float_kind(6, 90)
>     double = kinds.float_kind(15, 300)
>     csingle = kinds.complex_kind(6, 90)
>
> In the rest of my code:
>
>     from myprecision import tinyint, single, double, csingle
>     n = tinyint(3)
>     x = double(1.e20)
>     z = 1.2
>     # builtin float gets you the default float kind, properties unknown
>     w = x * float(x)
>     # but in the following case we know w has kind "double".
>     w = x * double(z)
>
>     u = csingle(x + z * 1.0j)
>     u2 = csingle(x+z, 1.0)
>
> Note how that entire code can then be changed to a higher
> precision by changing the arguments in myprecision.py.
>
> Comment: note that you aren't promised that single != double; but
> you are promised that double(1.e20) will hold a number with 15
> decimal digits of precision and a range up to 10**300, or that the
> float_kind call will fail.
>

I think this is a nice feature, but it's actually heading the opposite direction of where I'd like to see things go for the general use of Numeric. Part of Python's appeal for me is that I don't have to specify types everywhere. I don't want to write explicit casts throughout equations because it munges up their readability. Of course, the casting sometimes can't be helped, but Numeric's current behavior really forces this explicit casting for array types besides double, int, and double complex. I like numarray's fix for this problem. Also, as Perry noted, it's unlikely to be used as an everyday command line tool (like Matlab) if the verbose casting is required.

I'm interested to learn what other drawbacks y'all found with always returning arrays (0-d for scalars) from Numeric functions. Konrad mentioned the tuple parsing issue in some extension libraries that expect floats, but it sounds like Travis thinks this is no longer an issue. Are there others?
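To put the readability complaint in concrete terms, here is the kind of equation I mean, first as I'd like to write it and then with the casts that keeping Float32 data in Float32 currently requires (a made-up snippet; f is just shorthand for an explicit cast, and I may even have missed a spot):

# what I want to write for Float32 x, mu, sigma:
y = exp(-0.5 * ((x - mu) / sigma)**2)

# what avoiding the silent upcast to double forces on me:
f = lambda v: array(v, Float32)
y = exp(f(-0.5) * ((x - mu) / sigma)**f(2))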
eric
From perry at stsci.edu Mon Jun 10 13:37:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Jun 10 13:37:02 2002
Subject: [Numpy-discussion] default axis for numarray
Message-ID:

An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.

To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where elements in the first dimension have been summed over rather than the most rapidly varying dimension.

>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Some feel it is contrary to expectations that the least rapidly varying dimension is operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example,

>>> sum = 0
>>> for subarr in x: sum += subarr

acts on the first axis, in effect. Likewise,

>>> reduce(add, x)

does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation is of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve), then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were always the default.

The question is whether there is a consensus for one approach or the other.
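For concreteness, here are the two candidate defaults applied to the array above (the explicit axis argument itself is standard; only the choice of default is in question):

>>> add.reduce(x)      # current default, axis 0: sums the two rows together
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
>>> add.reduce(x, 1)   # a last-axis default would give sums within each row
array([ 45, 145])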
We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes.

Perry

From eric at enthought.com Mon Jun 10 14:27:04 2002
From: eric at enthought.com (eric jones)
Date: Mon Jun 10 14:27:04 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To:
Message-ID: <001401c210c5$7c1025f0$6b01a8c0@ericlaptop>

> > I further believe that all Numeric functions (sum, product, etc.) should
> > return arrays all the time instead of implicitly converting
> > them to Python scalars in special cases such as reductions of 1d arrays.
> > I think the only reason for the silent conversion is that Python lists
> > only allow integer values for use in indexing, so that:
> >
> > >>> a = [1,2,3,4]
> > >>> a[array(0)]
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > TypeError: sequence index must be integer
> >
> > Numeric arrays don't have this problem:
> >
> > >>> a = array([1,2,3,4])
> > >>> a[array(0)]
> > 1
> >
> > I don't think this alone is a strong enough reason for the conversion.
> > Getting rid of special cases is more important because it makes behavior
> > predictable to the novice (and expert), and it is easier to write
> > generic functions and be sure they will not break a year from now when
> > one of the special cases occurs.
> >
> > Are there other reasons why scalars are returned?
> >
> Well, sure. It isn't just indexing lists directly; it would be
> anywhere in Python that you would use a number.

Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly for most (all?) cases. Python indexing is a little unique because it explicitly requires integers. It's not just 0-d arrays that fail as indexes -- Python floats won't work either.

As for passing arrays to functions expecting numbers, is it that much different than passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types.

> In some contexts,
> the right thing may happen (where the function knows to try to obtain
> a simple number from an object), but then again, it may not (if calling
> a function where the number is used directly to index or slice).
>
> Here is another case where good arguments can be made for both
> sides. It really isn't an issue of functionality (one can write
> methods or functions to do what is needed); it's what the convenient
> syntax does. For example, if we really want a Python scalar but
> rank-0 arrays are always returned, then something like this may
> be required:
>
> >>> x = arange(10)
> >>> a = range(10)
> >>> a[scalar(x[2])]   # instead of a[x[2]]

Yes, this would be required for using them as array indexes. Or actually:

>>> a[int(x[2])]

> Whereas if simple indexing returns a Python scalar and consistency
> is desired in always having arrays returned, one may have to do
> something like this:
>
> >>> y = x.indexAsArray(2)   # instead of y = x[2]
>
> or perhaps
>
> >>> y = x[ArrayAlwaysAsResultIndexObject(2)]
>         # :-) with better name, of course
>
> One context or the other is going to be inconvenienced, but not
> prevented from doing what is needed.

Right.
> > As long as Python scalars are the 'biggest' type of their kind, we > strongly lean towards single elements being converted into Python > scalars. It's our feeling that there are more surprises and gotchas, > particularly for more casual users, on this side than on the uncertainty > of an index returning an array or scalar. People writing code that > expects to deal with uncertain dimensionality (the only place that > this occurs) should be the ones to go the extra distance in more > awkward syntax. Well, I guess I'd like to figure out exactly what breaks before ruling it out because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this either. A search of my site-packages had IntType tests used quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric From paul at pfdubois.com Mon Jun 10 16:06:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 16:06:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <001301c210ba$d8e03910$6b01a8c0@ericlaptop> Message-ID: <000101c210d3$4124cd70$0c01a8c0@NICKLEBY> > Konrad mentioned the tuple parsing issue in some > extension libraries that expects floats, but it sounds like > Travis thinks this is no longer an issue. Are there others? > > eric > Lots of code tries to distinguish cases using isinstance, and these tests will fail if given an array instance when they are testing for a float. From eric at enthought.com Mon Jun 10 16:16:03 2002 From: eric at enthought.com (eric jones) Date: Mon Jun 10 16:16:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> So one contentious issue a day isn't enough, huh? :-) > An issue that has been raised by scipy (most notably Eric Jones > and Travis Oliphant) has been whether the default axis used by > various functions should be changed from the current Numeric > default. This message is not directed at determining whether we > should change the current Numeric behavior for Numeric, but whether > numarray should adopt the same behavior as the current Numeric. > > To be more specific, certain functions and methods, such as > add.reduce(), operate by default on the first axis. For example, > if x is a 2 x 10 array, then add.reduce(x) results in a > 10 element array, where elements in the first dimension has > been summed over rather than the most rapidly varying dimension. > > >>> x = arange(20) > >>> x.shape = (2,10) > >>> x > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > >>> add.reduce(x) > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) The issue here is both consistency across a library and speed. From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1.
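One concrete example from each camp (a minimal Numeric-era session; sum() sits in the axis=0 group, sort() in the axis=-1 group):

>>> x = arange(20)
>>> x.shape = (2,10)
>>> sum(x)           # reduces along the first axis by default
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
>>> sort(x).shape    # sorts along the last axis by default, row by row
(2, 10)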
To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed). SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations. When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric. So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for most cases. There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1. It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type: >>> sum(a,axis=-1) in command line mode is a real pain. Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions and indexing), but the choice made should be applied as consistently as possible. We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float,Float32,Complex, etc.) -- yet another debate to come. Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, the community can grow by 2 orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if the general scientist sees Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community. Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal.
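A rough way to check the cache claim above is a throwaway timing sketch like the following (not a careful benchmark; the numbers depend on machine and Numeric version):

import time
from Numeric import zeros, add, Float

x = zeros((1000, 1000), Float)
t0 = time.clock()
add.reduce(x, 0)    # strided access: walks down each column
t1 = time.clock()
add.reduce(x, 1)    # contiguous access: walks along each row
t2 = time.clock()
print "axis=0: %.3fs   axis=1 (last): %.3fs" % (t1 - t0, t2 - t1)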
> > Some feel that is contrary to expectations that the least rapidly > varying dimension should be operated on by default. There are > good arguments for both sides. For example, Konrad Hinsen has > argued that the current behavior is most compatible for behavior > of other Python sequences. For example, > > >>> sum = 0 > >>> for subarr in x: > sum += subarr > > acts on the first axis in effect. Likewise > > >>> reduce(add, x) > > does likewise. In this sense, Numeric is currently more consistent > with Python behavior. However, there are other functions that > operate on the most rapidly varying dimension. Unfortunately > I cannot currently access my old mail, but I think the rule > that was proposed under this argument was that if the 'reduction' > operation was of a structural kind, the first dimension is used. > If the reduction or processing step is 'time-series' oriented > (e.g., FFT, convolve) then the last dimension is the default. > On the other hand, some feel it would be much simpler to understand > if the last axis was the default always. > > The question is whether there is a consensus for one approach or > the other. We raised this issue at a scientific Birds-of-a-Feather > session at the last Python Conference. The sense I got there was > that most were for the status quo, keeping the behavior as it is > now. Is the same true here? In the absence of consensus or a > convincing majority, we will keep the behavior the same for backward > compatibility purposes. Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions because command line users aren't going to care whether sum() and kurtosis() come from different libraries, they just want them to behave consistently. eric > > Perry From ransom at physics.mcgill.ca Mon Jun 10 18:56:03 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 10 18:56:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> Message-ID: <20020611015544.GC15736@spock.physics.mcgill.ca> I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole). I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)). Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for awhile: >>> arange(10, typecode='d') array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) >>> ones(10, typecode='d') array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) >>> zeros(10, typecode='d') Traceback (most recent call last): File "<stdin>", line 1, in ?
TypeError: an integer is required >>> zeros(10, 'd') array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look. On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation. Cheers, Scott On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote: > So one contentious issue a day isn't enough, huh? :-) > > > An issue that has been raised by scipy (most notably Eric Jones > > and Travis Oliphant) has been whether the default axis used by > > various functions should be changed from the current Numeric > > default. This message is not directed at determining whether we > > should change the current Numeric behavior for Numeric, but whether > > numarray should adopt the same behavior as the current Numeric. > > > > To be more specific, certain functions and methods, such as > > add.reduce(), operate by default on the first axis. For example, > > if x is a 2 x 10 array, then add.reduce(x) results in a > > 10 element array, where elements in the first dimension has > > been summed over rather than the most rapidly varying dimension. > > > > >>> x = arange(20) > > >>> x.shape = (2,10) > > >>> x > > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > > [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > > >>> add.reduce(x) > > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) > > The issue here is both consistency across a library and speed. > > From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which > functions use which and have resorted to explicitly using axis=-1 in my > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the > argument list (but this is a different issue -- it just needs to be > fixed). > > SciPy always uses axis=-1 for operations. There are 60+ functions with > this convention. Choosing -1 offers the best cache use and therefore > should be more efficient. Defaulting to the fastest behavior is > convenient because new users don't need any special knowledge of > Numeric's implementation to get near peak performance. Also, there is > never a question about which axis is used for calculations. > > When using SciPy and Numeric, their function sets are completely > co-mingled. When adding SciPy and Numeric's function counts together, > it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a > standard, it is impossible for the interface to become intuitive because > of the exceptions to the rule from Numeric. > > So here what I think. All functions should default to the same axis so > that the interface to common functions can become second nature for new > users and experts alike. Further, the chosen axis should be the most > efficient for the most cases. > > There are actually a few functions that, taken in isolation, I think > should have axis=0. take() is an example. But, for the sake of > consistency, it too should use axis=-1. > > It has been suggested to recommend that new users always specify axis=?
> as a keyword in functions that require an axis argument. This might be > fine when writing modules, but always having to type: > > >>> sum(a,axis=-1) > > in command line mode is a real pain. > > Just a point about the larger picture here... The changes we're > discussing are intended to clean up the warts on Numeric -- and, as good > as it is overall, these are warts in terms of usability. Interfaces > should be consistent across a library. The return types from functions > should be consistent regardless of input type (or shape). Default > arguments to the same keyword should also be consistent across > functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 > as default, returning arrays or scalars from Numeric functions and > indexing), but the choice made should be applied as consistently as > possible. > > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. > > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, > but I also believe, based on the strength of Python, Numeric, and > libraries such as Scientific and SciPy, the community can grow by 2 > orders of magnitude over the next five years. This kind of growth can't > occur if only savvy developers see the benefits of the elegant language. > It can only occur if the general scientist see Python as a compelling > alternative to Matlab (and IDL) as their day-in/day-out command line > environment for scientific/engineering analysis. Making the interface > consistent is one of several steps to making Python more attractive to > this community. > > Whether the changes made for numarray should be migrated back into > Numeric is an open question. I think they should, but see Konrad's > counterpoint. I'm willing for SciPy to be the intermediate step in the > migration between the two, but also think that is sub-optimal. > > > > > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has > > argued that the current behavior is most compatible for behavior > > of other Python sequences. For example, > > > > >>> sum = 0 > > >>> for subarr in x: > > sum += subarr > > > > acts on the first axis in effect. Likewise > > > > >>> reduce(add, x) > > > > does likewise. In this sense, Numeric is currently more consistent > > with Python behavior. However, there are other functions that > > operate on the most rapidly varying dimension. Unfortunately > > I cannot currently access my old mail, but I think the rule > > that was proposed under this argument was that if the 'reduction' > > operation was of a structural kind, the first dimension is used. > > If the reduction or processing step is 'time-series' oriented > > (e.g., FFT, convolve) then the last dimension is the default. > > On the other hand, some feel it would be much simpler to understand > > if the last axis was the default always. > > > > The question is whether there is a consensus for one approach or > > the other. We raised this issue at a scientific Birds-of-a-Feather > > session at the last Python Conference. The sense I got there was > > that most were for the status quo, keeping the behavior as it is > > now. Is the same true here? 
In the absence of consensus or a > > convincing majority, we will keep the behavior the same for backward > > compatibility purposes. > > Obviously, I'm more opinionated about this now than I was then. I > really urge you to consider using axis=-1 everywhere. SciPy is not the > only scientific library, but I think it adds the most functions with a > similar signature (the stats module is full of them). I very much hope > for a consistent interface across all of Python's scientific functions > because command line users aren't going to care whether sum() and > kurtosis() come from different libraries, they just want them to behave > consistently. > > eric > > > > Perry -- -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From paul at pfdubois.com Mon Jun 10 20:20:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 20:20:02 2002 Subject: [Numpy-discussion] Selection of a new head nummie Message-ID: <000001c210f6$ce0be340$0c01a8c0@NICKLEBY> It is time to choose the next "head nummie", the chair of the set of sourceforge developers for Numerical Python. Now is an apt time since I will be changing assignments at LLNL in August to one which has less daily use of numpy. We have no procedure for doing this other than for us nummies to come to a consensus amongst ourselves, with the input of the Numpy community. After I return from Europython I hope we can make a selection during the first two weeks of July. From oliphant.travis at ieee.org Mon Jun 10 20:52:04 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Jun 10 20:52:04 2002 Subject: [Numpy-discussion] Some missing keyword argument support fixed in CVS In-Reply-To: <20020611015544.GC15736@spock.physics.mcgill.ca> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> <20020611015544.GC15736@spock.physics.mcgill.ca> Message-ID: <1023767534.29865.8.camel@travis> On Mon, 2002-06-10 at 19:55, Scott Ransom wrote: > I have to admit that I agree with all of what Eric has to say > here -- even if it does cause some code breakage (I'm certainly > willing to do some maintenance on my code/modules that are > floating here and there so long as things continue to improve > with the language as a whole). I'm generally of the same opinion. > > I do think consistency is a very important aspect of getting > Numeric/Numarray accepted by a larger user base (and believe > me, my colaborators are probably sick of my Numeric Python > evangelism (but I like to think also a bit jealous of my NumPy > usage as they continue struggling with one-off C and Fortran > routines...)). > Another important factor is the support libraries. I know that something like Simulink (Matlab) is important to many of my colleagues in engineering. Simulink is the Mathworks version of visual programming which lets the user create a circuit visually which is then processed.
I believe there was a good start to this sort of thing presented at the last Python Conference which was very encouraging. Other colleagues require something like a compiler to get C-code which will compile on a DSP board from a script and/or design session. I believe something like this would be very beneficial. > Another example of a glaring inconsistency in the current > implementation is this little number that has been bugging me > for awhile: > > >>> arange(10, typecode='d') > array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) > >>> ones(10, typecode='d') > array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) > >>> zeros(10, typecode='d') > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: an integer is required > >>> zeros(10, 'd') > array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) > This is now fixed in cvs, along with other keyword problems. The ufunc methods reduce and accumulate also now take a keyword argument in CVS. -Travis From paul at pfdubois.com Mon Jun 10 20:57:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 20:57:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <20020611015544.GC15736@spock.physics.mcgill.ca> Message-ID: <000001c210fb$e98098f0$0c01a8c0@NICKLEBY> I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net, you can change my mind). Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility, which defines aliases to everything in Numarray that is the same as Numeric and then for the rest defines functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working or close to working again. Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen asap and eliminated eventually or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing.
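A minimal sketch of what such a NumericCompatibility module might look like (hypothetical code; it assumes numarray keeps Numeric's function names and accepts the axis as a second argument):

# NumericCompatibility.py -- hypothetical compatibility shim.
# Names whose behavior is unchanged are simply re-exported; functions
# whose default axis differs get thin wrappers restoring the old default.
from numarray import *
import numarray

def sum(a, axis=0):
    # Numeric's historical default: reduce along the first axis.
    return numarray.sum(a, axis)

def take(a, indices, axis=0):
    return numarray.take(a, indices, axis)

# ...and so on, one wrapper per function whose default changed.

Old code would then need only its import statement touched, e.g. "import NumericCompatibility as Numeric".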
From hinsen at cnrs-orleans.fr Tue Jun 11 05:57:02 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 05:57:02 2002 Subject: [Numpy-discussion] 0-D arrays as scalars In-Reply-To: <1023732818.28672.13.camel@travis> References: <000301c21014$50e991b0$6b01a8c0@ericlaptop> <1023732818.28672.13.camel@travis> Message-ID: Travis Oliphant writes: > Actually, the code in PyArg_ParseTuple asks the object it gets if it > knows how to be a float. 0-d arrays for some time have known how to be > Python floats. So, I do not think this error occurs as you've > described. Could you demonstrate this error? No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago. > The only exception to this that I've seen is the list indexing code > (probably for optimization purposes). There could be more places, but > I have not found them or heard of them. Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all. > have now. I'm quite supportive of never returning Python scalars from > Numeric array operations unless specifically requested (e.g. the > toscalar method). I suppose this would be easy to implement, right? Then why not do it in a test release and find out empirically how much code it breaks. > presumption based? If I encounter a Python object that I'm unfamiliar > with, I don't presume to know how it will define multiplication. But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen at cnrs-orleans.fr Tue Jun 11 06:17:04 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 06:17:04 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> Message-ID: "eric jones" writes: > The issue here is both consistency across a library and speed. Consistency, fine. But not just within one package, also between that package and the language it is implemented in. Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster. > From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero. > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the That is certainly something that should be fixed, and I suppose no one objects to that. My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message. If a uniformization of the default is desired, I vote for axis=0, for two reasons: 1) Consistency with Python usage. 2) Minimization of code breakage. > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. What needs to be improved in that area? > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely an hypothesis.
> > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There > are good > > arguments for both sides. For example, Konrad Hinsen has Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From paul at pfdubois.com Tue Jun 11 08:29:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Tue Jun 11 08:29:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <000001c2115c$5790dc50$0c01a8c0@NICKLEBY> Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandary. I think in my own coding reduction on the first dimension is the most frequent. > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On > Behalf Of Konrad Hinsen > Sent: Tuesday, June 11, 2002 6:12 AM > To: eric jones > Cc: 'Perry Greenfield'; numpy-discussion at lists.sourceforge.net > Subject: Re: [Numpy-discussion] RE: default axis for numarray > > > "eric jones" writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also > between that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't > replace it by a sum across the last axis just because that is faster. > > > From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, > counting FFT, > > about 10 functions using axis=-1. To this day, I can't > remember which > > If you weight by frequency of usage, the first group gains a > lot in importance. I just scanned through some of my code; > almost all of the calls to Numeric routines are to functions > whose default axis is zero. > > > code. Unfortunately, many of the Numeric functions that > should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I > suppose no one objects to that. > > > My vote is for keeping axis defaults as they are, both > because the choices are reasonable (there was a long > discussion about them in the early days of NumPy, and the > defaults were chosen based on other array languages that had > already been in use for years) and because any change would > cause most existing NumPy code to break in many places, often > giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for > axis=0, for two reasons: > 1) Consistency with Python usage. > 2) Minimization of code breakage.
> > > > We should also strive to make it as easy as possible to > write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? > > > Changes are going to create some backward incompatibilities > and that > > is definitely a bummer. But some changes are also necessary before > > the community gets big. I know the community is already reasonable > > size, > > I'd like to see evidence that changing the current NumPy > behaviour would increase the size of the community. It would > first of all split the current community, because many users > (like myself) do not have enough time to spare to go through > their code line by line in order to check for > incompatibilities. That many others would switch to Python if > only some changes were made is merely an hypothesis. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There > are good > > > arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data > layout is not significant for most Python array operations. > We might for example offer a choice of C style and Fortran > style data layout, enabling users to choose according to > speed, compatibility, or just personal preference. > > Konrad. > -- > ------------------------------------------------------------------------------- > Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > ------------------------------------------------------------------------------- From eric at enthought.com Tue Jun 11 10:45:01 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 10:45:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> > "eric jones" writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also between > that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't replace > it by a sum across the last axis just because that is faster. The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc. which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop.
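A small illustration of that re-ordering cost (a minimal Numeric-era session; iscontiguous() is the Numeric array method for checking memory layout):

>>> from Numeric import arange, transpose, array
>>> x = arange(20)
>>> x.shape = (2,10)
>>> t = transpose(x)    # a strided view; no data is copied yet
>>> t.iscontiguous()
0
>>> c = array(t)        # a wrapper needing contiguous memory must copy
>>> c.iscontiguous()
1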
Of course, axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important as future growth will come in this area more than Fortran. > > > From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > > about 10 functions using axis=-1. To this day, I can't remember which > > If you weight by frequency of usage, the first group gains a lot in > importance. I just scanned through some of my code; almost all of the > calls to Numeric routines are to functions whose default axis > is zero. Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place. > > > code. Unfortunately, many of the Numeric functions that should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I suppose no one > objects to that. Sounds like Travis already did it. Thanks. > > > My vote is for keeping axis defaults as they are, both because the > choices are reasonable (there was a long discussion about them in the > early days of NumPy, and the defaults were chosen based on other array > languages that had already been in use for years) and because any > change would cause most existing NumPy code to break in many places, > often giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for axis=0, > for two reasons: > 1) Consistency with Python usage. I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add,x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a very utilized area of Python because of efficiency. I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change. > 2) Minimization of code breakage. Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for. > > > > We should also strive to make it as easy as possible to write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? Comparisons of complex numbers. But let's save that debate for later. > > > Changes are going to create some backward incompatibilities and that is > > definitely a bummer. But some changes are also necessary before the > > community gets big. I know the community is already reasonable size, > > I'd like to see evidence that changing the current NumPy behaviour > would increase the size of the community.
It would first of all split > the current community, because many users (like myself) do not have > enough time to spare to go through their code line by line in order to > check for incompatibilities. That many others would switch to Python > if only some changes were made is merely an hypothesis. True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent like astronomy, I would bet the every-day user base is less than 5% of the whole. There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interface problems are GUI and debugger related. Others are API related. Inconsistency in a library interface makes it harder to learn and is a wart. Whether it is as important as a graphics library? Probably not. But while we're building the next generation tool, we should fix things that make people wonder "why did they do this?". It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There are > > > good arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data layout > is not significant for most Python array operations. We might > for example offer a choice of C style and Fortran style data layout, > enabling users to choose according to speed, compatibility, or > just personal preference. In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that returned a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be) it sounds fine. I think the default constructor should return a C layout array though, and will be what 99% of the users will use. eric > > Konrad. > -- > ------------------------------------------------------------------------------- > Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > ------------------------------------------------------------------------------- From perry at stsci.edu Tue Jun 11 11:07:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:07:03 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <001401c210c5$7c1025f0$6b01a8c0@ericlaptop> Message-ID: : > Travis seemed to indicate that the Python would convert 0-d arrays to > Python types correctly for most (all?) cases. Python indexing is a > little unique because it explicitly requires integers.
It's not just 0-d > arrays that fail as indexes -- Python floats won't work either. > That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example. > Yes, this would be required for using them as array indexes. Or > actually: > > >>> a[int(x[2])] > Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays then the generation of an int may be presuming too much about what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist). If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but still awkward. Perry From eric at enthought.com Tue Jun 11 11:38:05 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 11:38:05 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: <001f01c21177$01cd3000$6b01a8c0@ericlaptop> > From: Perry Greenfield [mailto:perry at stsci.edu] > : > > > Travis seemed to indicate that the Python would convert 0-d arrays to > > Python types correctly for most (all?) cases. Python indexing is a > > little unique because it explicitly requires integers. It's not just 0-d > > arrays that fail as indexes -- Python floats won't work either. > > > That's right, the primary breakage would be downstream use as > indices. That appeared to be the case with the find() method > of strings for example. > > > Yes, this would be required for using them as array indexes. Or > > actually: > > > > >>> a[int(x[2])] > > > Yes, this would be sufficient for use as indices or slices. I'm not > sure if there is any specific code that checks for float but doesn't > invoke automatic conversion. I suspect that floats are much less of > a problem this way, though will one necessarily know whether to use > int(), float(), or scalar()? If one is writing a generic function that > could accept int or float arrays then the generation of a int may > be overpresuming what the result will be used for. (Though I don't > have a particular example to give, I'll think about whether any > exist). If the only type that could possibly cause problems is int, > then int() should be all that would be necessary, but still awkward. If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev. eric From perry at stsci.edu Tue Jun 11 11:44:10 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:44:10 2002 Subject: [Numpy-discussion] repr for numarray Message-ID: While I'm flooding the mailing list with interface issues, I thought I would air another one (again, for numarray only). We've had some people internally complain that it does not make sense for repr to always generate a string capable of reconstructing the array. We often (usually) deal with multi-megabyte arrays. Typing a variable interactively for one of these arrays is invariably nonsensical.
In such cases the user would be much better served by a message indicating the size, shape, type, etc. of the array than all of its contents. Yet on the other hand, it is undeniably convenient to use repr (by typing a variable) for small arrays interactively rather than using a print statement. This leads to 3 possible proposals for handling repr: 1) Do what is done now, always print a string that when eval'ed will recreate the array. 2) Only give summary information for the array regardless of its size. 3) Print the array if it has fewer than THRESHOLD number of elements, otherwise print a summary. THRESHOLD may be adjusted by the user. The last appears to be the most utilitarian to us, yet 'impure' somehow. Certainly there are many objects for which Python does not attempt to generate a string from repr that could be used with eval to recreate them. On the other hand, we are unaware of cases where repr sometimes does and sometimes does not. For example, strings may also get very large, but there is no threshold for generating the string. What do people think is the most desirable solution? Keep in mind we intend to develop very efficient functions that will convert arrays to and from ascii representations (currently most of that code is in Python and quite slow in numarray at the moment) so it will not be necessary to use repr for this purpose. Only a few more issues to go, hopefully... Perry From perry at stsci.edu Tue Jun 11 11:53:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:53:02 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> Message-ID: : : > > What needs to be improved in that area? > > Comparisons of complex numbers. But lets save that debate for later. > No, no, let's do it now. ;-) We for one would like to know for numarray what should be done. If I might be presumptuous enough to anticipate what Eric would say, it is that complex comparisons should be allowed, and that they use all the information in the complex number (real and imaginary) so that they lead to consistent results in sorting. But the purist argues that comparisons for complex numbers are meaningless. Well, yes, but there are cases in code where you don't wish such comparisons to cause an exception. But even more important, there is at least one case which is practical. It isn't all that uncommon to want to eliminate duplicate values from arrays, and one would like to be able to do that for complex values as well. A common technique is to sort the values and then eliminate all identical adjacent values. A predictable comparison rule would allow that to be easily implemented. Eric, am I missing anything in this? It should be obvious that we agree with his position, but I am wondering if there are any arguments we have not heard yet that outweigh the advantages we see. Perry From ransom at physics.mcgill.ca Tue Jun 11 11:54:07 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Tue Jun 11 11:54:07 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: References: Message-ID: On June 11, 2002 02:43 pm, Perry Greenfield wrote: > Yet on the other hand, it is undeniably convenient to use > repr (by typing a variable) for small arrays interactively > rather than using a print statement. This leads to 3 possible > proposals for handling repr: > > 1) Do what is done now, always print a string that when > eval'ed will recreate the array. > > 2) Only give summary information for the array regardless of > its size.
> > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which I vote for number 3, and have no hang-ups about any real or perceived "impurity". This is an issue that I deal with daily. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From hinsen at cnrs-orleans.fr Tue Jun 11 12:16:08 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 12:16:08 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> Message-ID: <200206111912.g5BJCj209939@chinon.cnrs-orleans.fr> > I think the consistency with Python is less of an issue than it seems. > I wasn't aware that add.reduce(x) would generated the same results as > the Python version of reduce(add,x) until Perry pointed it out to me. It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type. > There are some inconsistencies between Python the language and Numeric > because the needs of the Numeric community. For instance, slices create > views instead of copies as in Python. This was a correct break with True, but this affects far fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter. > I don't see choosing axis=-1 as a break with Python -- multi-dimensional > arrays are inherently different and used differently than lists of lists As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous. > in Python. Further, reduce() is a "corner" of the Python language that > has been superceded by list comprehensions. Choosing an alternative List comprehensions work in exactly the same way, by looping over the outermost index. > > 2) Minimization of code breakage. > > Fixes will be necessary for sure, and I wish that wasn't the case. They > will be necessary if we choose a consistent interface in either case. The current interface is not inconsistent. It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments. > Choosing axis=0 or axis=-1 will not change what needs to be fixed -- > only the function names searched for. I disagree very much here. The fewer calls are affected, the fewer mistakes are made, and the fewer modules have to be modified at all.
Moreover, the functions that currently use axis=1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments. I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc. In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python. Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty. > True. But I can tell you that we're definitely doing something wrong > now. We have a superior language that is easier to integrate with > legacy code and less expensive than the best competing alternatives. > And, though I haven't done a serious market survey, I feel safe in > saying we have significantly less than 1% of the potential user base. I agree with that. But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative for anything), the main reason is inertia. Most people don't want to invest any time to learn any new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language). I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From gball at cfa.harvard.edu Tue Jun 11 12:25:08 2002 From: gball at cfa.harvard.edu (Greg Ball) Date: Tue Jun 11 12:25:08 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: Message-ID: > 1) Do what is done now, always print a string that when > eval'ed will recreate the array. > > 2) Only give summary information for the array regardless of > its size. > > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. I vote for 3) too. Especially annoying is when I mistakenly type a.shape instead of a.shape() interactively. Without the parentheses I get a bound method, the repr of which includes the repr for the whole array, and when this has > 25 million elements it really is a drag to wait for it all to finish spewing out... Getting sidetracked... is this repr of methods a feature? >>> l = [1,2,3,4] >>> l.sort <built-in method sort of list object at 0x...> >>> a = numarray.array(l) >>> a.shape <bound method NumArray.shape of array([1, 2, 3, 4])> It would seem more pythonic to get <bound method NumArray.shape of numarray object at 0x...> or similar?
-- Greg Ball From hinsen at cnrs-orleans.fr Tue Jun 11 12:27:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 12:27:05 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which > Python does not attempt to generate a string from repr that > could be used with eval to recreate them. On the other hand, > we are unaware of cases where repr sometimes does and sometimes I don't see the problem. The documented behaviour would be that it doesn't allow reconstruction. If for some arrays that works nevertheless, who is going to complain? BTW, it would be nice if the summary would contain the values of some elements, to allow a quick identification of NaN arrays and similar problems. > does not. For example, strings may also get very large, but > there is no threshold for generating the string. Right. But in practice strings rarely do get that large. Arrays do. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From tim.hochberg at ieee.org Tue Jun 11 12:30:02 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Tue Jun 11 12:30:02 2002 Subject: [Numpy-discussion] repr for numarray References: Message-ID: <033001c2117e$3cd0e5f0$061a6244@cx781526b> I would also be inclined toward option 3 with the caveat that THRESHOLD=None should print all the values for the purists out there (or if you want to use repr to dump the array to some sort of flat file). -tim > On June 11, 2002 02:43 pm, Perry Greenfield wrote: > > > Yet on the other hand, it is undeniably convenient to use > > repr (by typing a variable) for small arrays interactively > > rather than using a print statement. This leads to 3 possible > > proposals for handling repr: > > > > 1) Do what is done now, always print a string that when > > eval'ed will recreate the array. > > > > 2) Only give summary information for the array regardless of > > its size. > > > > 3) Print the array if it has fewer than THRESHOLD number of > > elements, otherwise print a summary. THRESHOLD may be adjusted > > by the user. > > > > The last appears to be the most utilitarian to us, yet > > 'impure' somehow. Certainly there are may objects for which > > I vote for number 3, and have no hang-ups about any real or perceived > "impurity". This is an issue that I deal with daily. > > Scott > > > -- > Scott M. Ransom Address: McGill Univ. Physics Dept. > Phone: (514) 398-6492 3600 University St., Rm 338 > email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 > GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
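A minimal sketch of option 3 with that caveat, plus Konrad's suggestion of showing a few elements (a hypothetical helper written against Numeric arrays for illustration; numarray's real version would live in its repr machinery):

import Numeric

THRESHOLD = 1000   # user-adjustable; None means "no limit", per Tim's caveat

def smart_repr(a):
    # Count the elements; an empty shape () means a rank-0 array.
    n = 1
    for d in a.shape:
        n = n * d
    if THRESHOLD is None or n <= THRESHOLD:
        return repr(a)   # small array: keep the full eval-able form
    # Large array: summary only, with a few leading elements so that
    # NaN-filled arrays and similar problems are easy to spot.
    head = Numeric.ravel(a)[:3]
    return "array(shape=%s, typecode='%s', starts with %s ...)" % (
        a.shape, a.typecode(), str(head))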
From paul at pfdubois.com Tue Jun 11 13:55:01 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Tue Jun 11 13:55:01 2002
Subject: [Numpy-discussion] repr for numarray
In-Reply-To: <033001c2117e$3cd0e5f0$061a6244@cx781526b>
Message-ID: <001e01c2118a$199a3210$0c01a8c0@NICKLEBY>

MA users seem to all be happy with the facility in MA for limiting printing.

>>> x=MA.arange(20)
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,])
>>> MA.set_print_limit(10)
>>> x
array([0,1,2,3,4,5,6,7,8,9,] + 10 more elements)
>>> print x
[0,1,2,3,4,5,6,7,8,9,] + 10 more elements
>>> MA.set_print_limit(0) # no limit
>>> x
array([ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net
> [mailto:numpy-discussion-admin at lists.sourceforge.net] On
> Behalf Of Tim Hochberg
> Sent: Tuesday, June 11, 2002 12:29 PM
> To: numpy-discussion at lists.sourceforge.net
> Subject: Re: [Numpy-discussion] repr for numarray
>
> I would also be inclined toward option 3 with the caveat that
> THRESHOLD=None should print all the values for the purists
> out there (or if you want to use repr to dump the array to
> some sort of flat file).
>
> -tim
From paul at pfdubois.com Tue Jun 11 13:57:03 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Tue Jun 11 13:57:03 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To:
Message-ID: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>

One can make a case for allowing == and != for complex arrays, but ">" just doesn't make sense and should not be allowed.

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net
> [mailto:numpy-discussion-admin at lists.sourceforge.net] On
> Behalf Of Perry Greenfield
> Sent: Tuesday, June 11, 2002 11:52 AM
> To: eric jones; 'Konrad Hinsen'
> Cc: numpy-discussion at lists.sourceforge.net
> Subject: RE: [Numpy-discussion] RE: default axis for numarray
>
> > > What needs to be improved in that area?
> >
> > Comparisons of complex numbers. But let's save that debate for later.
>
> No, no, let's do it now. ;-) We, for one, would like to know what
> should be done for numarray.
>
> If I might be presumptuous enough to anticipate what Eric
> would say, it is that complex comparisons should be allowed,
> and that they use all the information in the complex number
> (real and imaginary) so that they lead to consistent results
> in sorting.
>
> But the purist argues that comparisons for complex numbers
> are meaningless. Well, yes, but there are cases in code where you
> don't wish such comparisons to cause an exception. But even
> more important, there is at least one case which is
> practical. It isn't all that uncommon to want to eliminate
> duplicate values from arrays, and one would like to be able
> to do that for complex values as well. A common technique is to
> sort the values and then eliminate all identical adjacent values.
> A predictable comparison rule would allow that to be easily
> implemented.
>
> Eric, am I missing anything in this? It should be obvious
> that we agree with his position, but I am wondering if there
> are any arguments we have not heard yet that outweigh the
> advantages we see.
>
> Perry

From ransom at physics.mcgill.ca Tue Jun 11 14:08:05 2002
From: ransom at physics.mcgill.ca (Scott Ransom)
Date: Tue Jun 11 14:08:05 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>
References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>
Message-ID:

On June 11, 2002 04:56 pm, you wrote:
> One can make a case for allowing == and != for complex arrays, but ">"
> just doesn't make sense and should not be allowed.

It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable comparison. You _could_ do the same thing with the phases, except you run into the modulo 2pi thing...

Scott
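Perry's sort-then-dedup use case can be served without blessing any particular ">" for complex values, by spelling out a total order explicitly. A minimal sketch (the helper name and the real-then-imaginary order are arbitrary choices here, not anything Numeric itself provides):

import Numeric

def unique_complex(values):
    # sort with an explicit, deterministic total order: real part
    # first, imaginary part second; then drop adjacent duplicates
    pairs = [(z.real, z.imag) for z in values]
    pairs.sort()
    result = []
    for re, im in pairs:
        z = complex(re, im)
        if not result or z != result[-1]:
            result.append(z)
    return Numeric.array(result)

# equal values end up adjacent, so duplicates collapse:
# unique_complex(Numeric.array([2+1j, 1+2j, 2+1j])) -> 1+2j, 2+1j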
--
Scott M. Ransom Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492 3600 University St., Rm 338
email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From eric at enthought.com Tue Jun 11 15:01:02 2002
From: eric at enthought.com (eric jones)
Date: Tue Jun 11 15:01:02 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <3D063B43.F2095A08@noaa.gov>
Message-ID: <002801c21193$4fbf7f90$6b01a8c0@ericlaptop>

> From: cbarker at localhost.localdomain [mailto:cbarker at localhost.localdomain]
>
> eric jones wrote:
> > The default axis choice influences how people choose to lay out their
> > data in arrays. If the default is to sum down columns, then users lay
> > out their data so that this is the order of computation.
>
> This is absolutely true. I definitely choose my data layout so that the
> various rank reducing operators do what I want.
> Another reason to have consistency. So I don't really care which way is
> the default, so it might as well be the better performing option.
>
> Of course, compatibility with previous versions is helpful too...arrrgg!
>
> What kind of a performance difference are we talking here anyway?

Guess I ought to test instead of just saying it is so... I ran the following test of summing 200 sets of 10000 numbers. I expected a speed-up of about 2... I didn't get it. They are pretty much the same speed on my machine?? (more later)

C:\WINDOWS\system32>python
ActivePython 2.2.1 Build 222 (ActiveState Corp.) based on
Python 2.2.1 (#34, Apr 15 2002, 09:51:39) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> import time
>>> a = ones((10000,200),Float) * arange(10000)[:,NewAxis]
>>> b = ones((200,10000),Float) * arange(10000)[NewAxis,:]
>>> t1 = time.clock();x=sum(a,axis=0);t2 = time.clock();print t2-t1
0.0772411018719
>>> t1 = time.clock();x=sum(b,axis=-1);t2 = time.clock();print t2-t1
0.079615705348

I also tried FFT, and did see a difference -- a speed-up of 1.5+:

>>> q = ones((1024,1024),Float)
>>> t1 = time.clock();x = FFT.fft(q,axis=0);t2 = time.clock();print t2-t1
0.907373143793
>>> t1 = time.clock();x= FFT.fft(q,axis=-1);t2 = time.clock();print t2-t1
0.581641800843
>>> .907/.581
1.5611015490533564

Same in scipy:

>>> from scipy import *
>>> a = ones((1024,1024),Float)
>>> import time
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.870259488287
>>> t1 = time.clock(); q = fft(a,axis=-1); t2 = time.clock();print t2-t1
0.489512214541
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.849266317367
>>> .849/.489
1.7361963190184049

So why is sum() the same speed for both cases? I don't know. I wrote a quick C program that is similar to how Numeric loops work, and I saw about a factor of 4 improvement by summing rows instead of columns:

C:\home\eric\wrk\axis_speed>gcc -O2 axis.c
C:\home\eric\wrk\axis_speed>a
summing rows (sec): 0.040000
summing columns (sec): 0.160000
pass

These numbers are more like what I expected to see in the Numeric tests, but they are strange when compared to the Numeric timings -- the row sum is twice as fast as Numeric while the column sum is twice as slow. Because all the work is done in C and we're summing reasonably long arrays, the Numeric and C versions should be roughly the same speed. I can understand why summing rows is twice as fast in my C routine -- the Numeric loop code is not going to win awards for being optimal. What I don't understand is why the column summation is twice as slow in my C code as in Numeric. This should not be. I've posted it below in case someone can enlighten me.

I think in general you should see a speed-up of 1.5+ when summing over the "faster" axis. This holds true for fft in Python and my sum in C. As to why I don't in Numeric's sum(), I'm not sure. It is certainly true that non-strided access makes the best use of cache and *usually* is faster.
eric

--------------------------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    double *a, *sum1, *sum2;
    int i, j, si, sj, ind, I, J;
    int small=200, big=10000;
    time_t t1, t2;

    I = small; J = big;
    si = big; sj = 1;
    a = (double*)malloc(I*J*sizeof(double));
    sum1 = (double*)malloc(small*sizeof(double));
    sum2 = (double*)malloc(small*sizeof(double));

    //set memory
    for(i = 0; i < I; i++) {
        sum1[i] = 0;
        sum2[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++) {
            a[ind] = (double)j;
            ind += sj;
        }
        ind += si;
    }

    t1 = clock();
    for(i = 0; i < I; i++) {
        sum1[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++) {
            sum1[i] += a[ind];
            ind += sj;
        }
        ind += si;
    }
    t2 = clock();
    printf("summing rows (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    I = big; J = small;
    sj = big; si = 1;

    t1 = clock();
    //set memory
    for(i = 0; i < I; i++) {
        ind = si * i;
        for(j = 0; j < J; j++) {
            a[ind] = (double)i;
            ind += sj;
        }
        ind += si;
    }
    for(j = 0; j < J; j++) {
        sum2[j] = 0;
        ind = sj * j;
        for(i = 0; i < I; i++) {
            sum2[j] += a[ind];
            ind += si;
        }
    }
    t2 = clock();
    printf("summing columns (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    for (i=0; i < small; i++) {
        if(sum1[i] != sum2[i])
            printf("failure %d, %f %f\n", i, sum1[i], sum2[i]);
    }
    printf("pass %f\n", sum1[0]);
    return 0;
}

From a.schmolck at gmx.net Tue Jun 11 16:03:02 2002
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Tue Jun 11 16:03:02 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <001f01c21177$01cd3000$6b01a8c0@ericlaptop>
References: <001f01c21177$01cd3000$6b01a8c0@ericlaptop>
Message-ID:

"eric jones" writes:

> I think the consistency with Python is less of an issue than it seems.
> I wasn't aware that add.reduce(x) would generate the same results as
> the Python version of reduce(add,x) until Perry pointed it out to me.
> There are some inconsistencies between Python the language and Numeric
> because of the needs of the Numeric community. For instance, slices
> create views instead of copies as in Python. This was a correct break
> with consistency in a very utilized area of Python because of
> efficiency.

Ahh, a loaded example ;) I always thought that Numeric's view-slicing is a fairly problematic deviation from standard Python behavior and I'm not entirely sure why it needs to be done that way.

Couldn't one have both consistency *and* efficiency by implementing a copy-on-demand scheme (which is what matlab does, if I'm not entirely mistaken; a real copy gets only created if either the original or the 'copy' is modified)?

The current behavior seems not just problematic because it breaks consistency and hence user expectations, it also breaks code that is written with more pythonic sequences in mind (in a potentially hard-to-track-down manner) and is, IMHO, generally undesirable and error-prone, for pretty much the same reasons that dynamic scope and global variables are generally undesirable and error-prone -- one can unwittingly create intricate interactions between remote parts of a program that can be very difficult to track down.

Obviously there *are* cases where one really wants a (partial) view of an existing array. It would seem to me, however, that these cases are exceedingly rare (in all my Numeric code I'm only aware of one instance where I actually want the aliasing behavior, so that I can manipulate a large array by manipulating its views and vice versa). Thus rather than being the default behavior, I'd rather see those cases accommodated by a special syntax that makes it explicit that an alias is desired and that care must be taken when modifying either the original or the view (e.g. one possible syntax would be ``aliased_vector = m.view[:,1]``). Again I think the current behavior is somewhat analogous to having variables declared in global (or dynamic) scope by default, which is not only error-prone, it also masks those cases where global (or dynamic) scope *is* actually desired and necessary.

It might be that the problems associated with a copy-on-demand scheme outweigh the error-proneness and the interface breakage that the deviation from standard python slicing behavior causes, but otherwise copying on slicing would be a backwards incompatibility in numarray I'd rather like to see (especially since one could easily add a view attribute to Numeric, for forwards-compatibility). I would also suspect that this would make it *a lot* easier to get numarray (or parts of it) into the core, but this is just a guess.

> I don't see choosing axis=-1 as a break with Python -- multi-dimensional
> arrays are inherently different and used differently than lists of lists
> in Python. Further, reduce() is a "corner" of the Python language that
> has been superseded by list comprehensions. Choosing an alternative

Guido might nowadays think that adding reduce was a mistake, so in that sense it might be a "corner" of the python language (although some people, including me, still rather like using reduce), but I can't see how you can generally replace reduce with anything but a loop. Could you give an example?

alex

--
Alexander Schmolck Postgraduate Research Student
Department of Computer Science
University of Exeter
A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
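To make the ``m.view`` proposal above concrete, one possible pure-Python spelling (class and attribute names invented here; this is a sketch of the suggested semantics, not an existing Numeric or numarray interface):

import Numeric

class _ViewIndexer:
    # indexing through this helper aliases, as Numeric slices do today
    def __init__(self, data):
        self.data = data
    def __getitem__(self, index):
        return self.data[index]              # Numeric slice: a view

class CopySliceArray:
    # plain indexing copies (python-list semantics); aliasing must be
    # asked for explicitly through the .view attribute
    def __init__(self, data):
        self.data = Numeric.asarray(data)
        self.view = _ViewIndexer(self.data)
    def __getitem__(self, index):
        return Numeric.array(self.data[index])   # forced copy

m = CopySliceArray(Numeric.zeros((3, 3)))
col_copy = m[:, 1]        # safe, independent copy
col_alias = m.view[:, 1]  # deliberate alias into m's storage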
From oliphant at ee.byu.edu Tue Jun 11 16:26:02 2002
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Jun 11 16:26:02 2002
Subject: [Numpy-discussion] repr for numarray
In-Reply-To:
Message-ID:

> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.

I think this is best. I don't believe the convention of repr is critical to numarray.

-Travis

From reggie at merfinllc.com Tue Jun 11 16:31:02 2002
From: reggie at merfinllc.com (Reggie Dugard)
Date: Tue Jun 11 16:31:02 2002
Subject: [Numpy-discussion] repr for numarray
Message-ID: <1023838218.23968.274.camel@auk>

I vote for number 3 as well. As Paul already noted, his MA module does something similar to this, and I've found that very handy while working interactively.

On Tue, 2002-06-11 at 11:43, Perry Greenfield wrote:
> ...
> Yet on the other hand, it is undeniably convenient to use
> repr (by typing a variable) for small arrays interactively
> rather than using a print statement. This leads to 3 possible
> proposals for handling repr:
>
> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
>
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.
>
> ...
From eric at enthought.com Tue Jun 11 22:28:03 2002
From: eric at enthought.com (eric jones)
Date: Tue Jun 11 22:28:03 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To:
Message-ID: <003201c211d1$d673de30$6b01a8c0@ericlaptop>

> "eric jones" writes:
>
> > I think the consistency with Python is less of an issue than it seems.
> > I wasn't aware that add.reduce(x) would generate the same results as
> > the Python version of reduce(add,x) until Perry pointed it out to me.
> > There are some inconsistencies between Python the language and Numeric
> > because of the needs of the Numeric community. For instance, slices
> > create views instead of copies as in Python. This was a correct break
> > with consistency in a very utilized area of Python because of
> > efficiency.
>
> Ahh, a loaded example ;) I always thought that Numeric's view-slicing
> is a fairly problematic deviation from standard Python behavior and I'm
> not entirely sure why it needs to be done that way.
>
> Couldn't one have both consistency *and* efficiency by implementing a
> copy-on-demand scheme (which is what matlab does, if I'm not entirely
> mistaken; a real copy gets only created if either the original or the
> 'copy' is modified)?

Well, slices creating copies is definitely a bad idea (which is what I have heard proposed before) -- finite difference calculations (and others) would be very slow with this approach. Your copy-on-demand suggestion might work though. Its implementation would be more complex, but I don't think it would require cooperation from the Python core. It could be handled in the ufunc code. It would also require extension modules to make copies before they modified any values.

Copy-on-demand doesn't really fit with python's "assignments are references" approach to things though, does it? Using foo = bar in Python and then changing an element of foo will also change bar. So, I guess there would have to be a distinction made here. This adds a little more complexity.

Personally, I like being able to pass views around because it allows for efficient implementations. The option to pass arrays into extension functions and edit them in-place is very nice. Copy-on-demand might allow for equal efficiency -- I'm not sure.

I haven't found the current behavior very problematic in practice and haven't seen that it is a major stumbling block to new users. I'm happy with the status quo on this. But, if copy-on-demand is truly efficient and didn't make extension writing a nightmare, I wouldn't complain about the change either. I have a feeling the implementers of numarray would though. :-) And talk about having to modify legacy code...

> The current behavior seems not just problematic because it breaks
> consistency and hence user expectations, it also breaks code that is
> written with more pythonic sequences in mind (in a potentially hard to
> track down manner) and is, IMHO, generally undesirable and error-prone,
> for pretty much the same reasons that dynamic scope and global variables
> are generally undesirable and error-prone -- one can unwittingly create
> intricate interactions between remote parts of a program that can be
> very difficult to track down.
>
> Obviously there *are* cases where one really wants a (partial) view of
> an existing array.
> It would seem to me, however, that these cases are exceedingly rare (in
> all my Numeric code I'm only aware of one instance where I actually want
> the aliasing behavior, so that I can manipulate a large array by
> manipulating its views and vice versa). Thus rather than being the
> default behavior, I'd rather see those cases accommodated by a special
> syntax that makes it explicit that an alias is desired and that care
> must be taken when modifying either the original or the view (e.g. one
> possible syntax would be ``aliased_vector = m.view[:,1]``). Again I
> think the current behavior is somewhat analogous to having variables
> declared in global (or dynamic) scope by default, which is not only
> error-prone, it also masks those cases where global (or dynamic) scope
> *is* actually desired and necessary.
>
> It might be that the problems associated with a copy-on-demand scheme
> outweigh the error-proneness and the interface breakage that the
> deviation from standard python slicing behavior causes, but otherwise
> copying on slicing would be a backwards incompatibility in numarray I'd
> rather like to see (especially since one could easily add a view
> attribute to Numeric, for forwards-compatibility). I would also suspect
> that this would make it *a lot* easier to get numarray (or parts of it)
> into the core, but this is just a guess.

I think the two things Guido wants for inclusion of numarray are a consensus from our community on what we want, and (more importantly) a comprehensible code base. :-) If Numeric satisfied this 2nd condition, it might already be slated for inclusion... The 1st is never easy with such varied opinions -- I've about concluded that Konrad and I are anti-particles :-) -- but I hope it will happen.

> > I don't see choosing axis=-1 as a break with Python -- multi-dimensional
> > arrays are inherently different and used differently than lists of
> > lists in Python. Further, reduce() is a "corner" of the Python language
> > that has been superseded by list comprehensions. Choosing an
> > alternative
>
> Guido might nowadays think that adding reduce was a mistake, so in that
> sense it might be a "corner" of the python language (although some
> people, including me, still rather like using reduce), but I can't see
> how you can generally replace reduce with anything but a loop. Could you
> give an example?

You're right. You can't do it without a loop. List comprehensions only supersede filter and map since they always return a list. I think reduce is here to stay. And, like you, I would actually be disappointed to see it go (I like lambda too...)

The point is that I wouldn't choose the definition of sum() or product() based on the behavior of Python's reduce operator. Hmmm. So I guess that is key -- it's really these *function* interfaces that I disagree with.

So, how about add.reduce() keeping axis=0 to match the behavior of Python, but sum() and friends defaulting to axis=-1 to match the rest of the library functions? It does break with consistency across the library, so I think it is sub-optimal. However, the distinction is reasonably clear and much less likely to cause confusion. It also allows FFT and future modules (wavelets or whatever) to operate across the fastest axis by default while conforming to an intuitive standard. take() and friends would also become axis=-1 for consistency with all other functions. Would this be a reasonable compromise?

eric
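A tiny sketch of what that compromise would mean in practice (the sum defined here is purely illustrative, standing in for whatever the library would actually provide):

import Numeric

def sum(a, axis=-1):
    # proposed compromise: the function interface defaults to the
    # fastest-varying axis, while add.reduce keeps axis=0 so that it
    # still matches reduce(add, a) in plain Python
    return Numeric.add.reduce(a, axis)

a = Numeric.reshape(Numeric.arange(6), (2, 3))   # [[0 1 2], [3 4 5]]
print Numeric.add.reduce(a)   # [3 5 7] -- axis=0, like reduce(add, a)
print sum(a)                  # [3 12]  -- axis=-1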
From groma at nucleus.szbk.u-szeged.hu Tue Jun 11 23:30:03 2002
From: groma at nucleus.szbk.u-szeged.hu (Geza Groma)
Date: Tue Jun 11 23:30:03 2002
Subject: [Numpy-discussion] (a and b) != (b and a) ?
Message-ID: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu>

Using Numeric-21.0.win32-py2.2 I found this:

Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> a = array((1, 1), 'b')
>>> b = array((1, 0), 'b')
>>> a and b
array([1, 0],'b')
>>> b and a
array([1, 1],'b')
>>>

It looks like a bug, or at least very weird. a&b and b&a work correctly.

--
Géza Groma
Institute of Biophysics,
Biological Research Center of Hungarian Academy of Sciences
Temesvári krt. 62.
6726 Szeged
Hungary
phone: +36 62 432 232
fax: +36 62 433 133

From hinsen at cnrs-orleans.fr Wed Jun 12 01:36:01 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed Jun 12 01:36:01 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To:
References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>
Message-ID:

Scott Ransom writes:

> On June 11, 2002 04:56 pm, you wrote:
> > One can make a case for allowing == and != for complex arrays, but ">"
> > just doesn't make sense and should not be allowed.
>
> It depends if you think of complex numbers in phasor form or not. In
> phasor form, the amplitude of the complex number is certainly something
> that you could compare with > or < -- and in my opinion, that seems
> like a reasonable

Sure, but that doesn't give a full order relation for complex numbers. Two different numbers with equal magnitude would be neither equal nor would one be larger than the other.

I agree with Paul that complex comparison should not be allowed. On the other hand, Perry's argument about sorting makes sense as well. Is there anything that prevents us from permitting arraysort() on complex arrays but not the comparison operators?

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr Wed Jun 12 01:55:03 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed Jun 12 01:55:03 2002
Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior
In-Reply-To: <003201c211d1$d673de30$6b01a8c0@ericlaptop>
References: <003201c211d1$d673de30$6b01a8c0@ericlaptop>
Message-ID:

"eric jones" writes:

> others) would be very slow with this approach. Your copy-on-demand
> suggestion might work though. Its implementation would be more complex,
> but I don't think it would require cooperation from the Python core.

It wouldn't, and I am not sure the implementation would be much more complex, but then I haven't tried. Having both copy-on-demand and views is difficult, both conceptually and implementation-wise, but with copy-on-demand, views become less important.

> Copy-on-demand doesn't really fit with python's "assignments are
> references" approach to things though, does it?
> Using foo = bar in Python and then changing an element of foo will also
> change bar.

That would be true as well with copy-on-demand arrays, as foo and bar would be the same object. Semantically, copy-on-demand would be equivalent to copying when slicing, which is exactly Python's behaviour for lists.

> So, how about add.reduce() keeping axis=0 to match the behavior of
> Python, but sum() and friends defaulting to axis=-1 to match the rest
> of the

That sounds like the most arbitrary inconsistency - add.reduce and sum are synonyms for me.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------

From ransom at physics.mcgill.ca Wed Jun 12 07:27:02 2002
From: ransom at physics.mcgill.ca (Scott Ransom)
Date: Wed Jun 12 07:27:02 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To:
References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>
Message-ID: <20020612142600.GA28158@spock.physics.mcgill.ca>

On Wed, Jun 12, 2002 at 10:32:12AM +0200, Konrad Hinsen wrote:
> Scott Ransom writes:
>
> > On June 11, 2002 04:56 pm, you wrote:
> > > One can make a case for allowing == and != for complex arrays, but
> > > ">" just doesn't make sense and should not be allowed.
> >
> > It depends if you think of complex numbers in phasor form or not. In
> > phasor form, the amplitude of the complex number is certainly
> > something that you could compare with > or < -- and in my opinion,
> > that seems like a reasonable
>
> Sure, but that doesn't give a full order relation for complex numbers.
> Two different numbers with equal magnitude would be neither equal nor
> would one be larger than the other.

The comparison operators could be defined to operate on the magnitudes only. In this case you would get the kind of ugly result that two complex numbers with the same magnitude but different phases would be equal. Complex comparisons of this type could be quite useful to those (like me) who do lots of Fourier domain signal processing.

> I agree with Paul that complex comparison should not be allowed. On the
> other hand, Perry's argument about sorting makes sense as well. Is there
> anything that prevents us from permitting arraysort() on complex arrays
> but not the comparison operators?

How do you sort an array of complex numbers if you can't compare them?

Scott

--
Scott M. Ransom Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492 3600 University St., Rm 338
email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From hinsen at cnrs-orleans.fr Wed Jun 12 07:56:04 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed Jun 12 07:56:04 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <20020612142600.GA28158@spock.physics.mcgill.ca> (message from Scott Ransom on Wed, 12 Jun 2002 10:26:00 -0400)
References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> <20020612142600.GA28158@spock.physics.mcgill.ca>
Message-ID: <200206121447.g5CElEZ13245@chinon.cnrs-orleans.fr>
In this case you would get the kind of ugly > result that two complex numbers with the same magnitude but > different phases would be equal. If you want to compare magnitudes, you can do that explicitly without much effort. > How do you sort an array of complex numbers if you can't compare them? You could for example sort by real part first and by imaginary part second. That would be a well-defined sort order, but not a useful definition of comparison in the mathematical sense. Konrad -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From a.schmolck at gmx.net Wed Jun 12 08:44:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 08:44:04 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <003201c211d1$d673de30$6b01a8c0@ericlaptop> References: <003201c211d1$d673de30$6b01a8c0@ericlaptop> Message-ID: "eric jones" writes: > > Couldn't one have both consistency *and* efficiency by implementing a > > copy-on-demand scheme (which is what matlab does, if I'm not entirely > > mistaken; a real copy gets only created if either the original or the > > 'copy' > > is modified)? > > Well, slices creating copies is definitely a bad idea (which is what I > have heard proposed before) -- finite difference calculations (and > others) would be very slow with this approach. Your copy-on-demand > suggestion might work though. Its implementation would be more complex, > but I don't think it would require cooperation from the Python core.? > It could be handled in the ufunc code. It would also require extension > modules to make copies before they modified any values. > > Copy-on-demand doesn't really fit with python's 'assignments are > references" approach to things though does it? Using foo = bar in > Python and then changing an element of foo will also change bar. So, I My suggestion wouldn't conflict with any standard python behavior -- indeed the main motivation would be to have numarray conform to standard python behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for other sequences in python. The first one creates an alias to bar and in the second one the indexing operation creates a copy of part of the sequence which is then aliased to foo. Sequences are atomic in python, in the sense that indexing them creates a new object, which I think is not in contradiction to python's nice and consistent 'assignments are references' behavior. > guess there would have to be a distinction made here. This adds a > little more complexity. > > Personally, I like being able to pass views around because it allows for > efficient implementations. The option to pass arrays into extension > function and edit them in-place is very nice. Copy-on-demand might > allow for equal efficiency -- I'm not sure. I don't know how much of a performance drawback copy-on-demand would have when compared to views one -- I'd suspect it would be not significant, the fact that the runtime behavior becomes a bit more difficult to predict might be more of a drawback (but then I haven't heard matlab users complain and one could always force an eager copy). 
Another reason why I think a copy-on-demand scheme for slicing operations might be attractive is that I'd suspect one could gain significant benefits from doing other operations in a lazy fashion too (plus optionally caching some results); transposing, for example, seems to cause in-principle unnecessary copies, at least in some cases at the moment.

> I haven't found the current behavior very problematic in practice and
> haven't seen that it is a major stumbling block to new users. I'm happy

From my experience, not even all people who use Numeric quite a lot are *aware* that the slicing behavior differs from python sequences. You might be right that in practice aliasing doesn't cause too many problems (as long as one sticks to arrays -- it certainly makes it harder to write code that operates on slices of generic sequence types) -- I'd really be interested to know whether there are cases where people have spent a long time tracking down a bug caused by the view behavior.

> with the status quo on this. But, if copy-on-demand is truly efficient
> and didn't make extension writing a nightmare, I wouldn't complain
> about the change either. I have a feeling the implementers of numarray
> would though. :-) And talk about having to modify legacy code...

Since the vast majority of slicing operations are currently not done to create views that are dependently modified, the backward incompatibility might not affect that much code. You are right, though, that if Perry and the other numarray implementors don't think copy-on-demand could be worth the bother then it's unlikely to happen.

> > forwards-compatibility). I would also suspect that this would make it
> > *a lot* easier to get numarray (or parts of it) into the core, but
> > this is just a guess.
>
> I think the two things Guido wants for inclusion of numarray are a
> consensus from our community on what we want, and (more importantly) a
> comprehensible code base. :-) If Numeric satisfied this 2nd condition,
> it might already be slated for inclusion... The 1st is never easy with
> such varied opinions -- I've about concluded that Konrad and I are
> anti-particles :-) -- but I hope it will happen.

As I said, I can only guess about the politics involved, but I would think that before a significant piece of code such as numarray is incorporated into the core, a relevant pep will be discussed in the newsgroup, and many people will feel more comfortable about incorporating something into core-python that doesn't deviate significantly from standard behavior (i.e. doesn't view-slice), especially if it mainly caters to a rather specialized audience. But Guido obviously has the last word on those issues, and if he doesn't have a problem either way then, as long as the community is undivided, it shouldn't be an obstacle for inclusion.

I agree that division of the community might pose the most significant problems -- MA for example *does* create copies on indexing, if I'm not mistaken, and the (desirable) transition process from Numeric to numarray also poses not insignificant difficulties and risks, especially since there are now quite a few important projects (not least of them scipy) that are built on top of Numeric and will have to be incorporated in the transition if numarray is to take over. Everything seems in a bit of a limbo right now.
I'm currently working on a (fully-featured) matrix class that I'd like to work with both Numeric and numarray (and also scipy where available) more or less transparently for the user, which turns out to be much more difficult than I would have thought.

alex

--
Alexander Schmolck Postgraduate Research Student
Department of Computer Science
University of Exeter
A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/

From paul at pfdubois.com Wed Jun 12 08:45:09 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Wed Jun 12 08:45:09 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <20020612142600.GA28158@spock.physics.mcgill.ca>
Message-ID: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY>

Using the term "comparison operators" is too loose and is causing a communication problem here. There are these comparison operators:

== and != (group 1)
<, >, <=, and >= (group 2)

For complex numbers it is easy to define the operators in group 1: x == y iff x.real == y.real and x.imag == y.imag. And, x != y iff (not x == y). I hardly think any other definition would be conceivable. The utility of this definition is questionable, as in most instances one should be making these comparisons with a tolerance, but there at least are cases when it makes sense.

For group 2, there are a variety of possible definitions. Just to name three possible ">" definitions: the greater magnitude, the greater phase mod 2pi, or a radix-type order, e.g., x > y if x.real > y.real or (x.real == y.real and x.imag > y.imag).

A person can always define a function my_greater_than(c1, c2) to embody one of these definitions, and use it as an argument to a sort routine that takes a function argument to tell it how to sort. What you are arguing about is whether some particular version of this comparison should be "blessed" by attaching it to the operator ">". I do not think one of the definitions is such a clear winner that it should be blessed -- it would mean a casual reader could not guess what the operator means, and ">" does not have a doc string. Therefore I oppose doing so.
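Paul's group 1 definition is easy to express elementwise for Numeric complex arrays; a sketch (ceq is an invented helper name, nothing Numeric itself provides):

import Numeric

def ceq(x, y):
    # group 1: x == y iff real and imaginary parts both agree,
    # applied elementwise
    return Numeric.logical_and(Numeric.equal(x.real, y.real),
                               Numeric.equal(x.imag, y.imag))

a = Numeric.array([1+2j, 3+0j])
b = Numeric.array([1+2j, 3+1j])
print ceq(a, b)   # [1 0]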
From pearu at cens.ioc.ee Wed Jun 12 08:55:03 2002
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Wed Jun 12 08:55:03 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <200206121447.g5CElEZ13245@chinon.cnrs-orleans.fr>
Message-ID:

On Wed, 12 Jun 2002, Konrad Hinsen wrote:

> > How do you sort an array of complex numbers if you can't compare them?
>
> You could for example sort by real part first and by imaginary part
> second. That would be a well-defined sort order, but not a useful
> definition of comparison in the mathematical sense.

Related discussion has also been in the scipy list. See the thread starting in

http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000364.html

But here I would like to draw your attention to the suggestion that the sort() function could take an optional argument that specifies the comparison method for complex numbers (for real numbers they are all equivalent). Here follows the relevant fragment of the message:

http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000366.html

... However, in different applications different conventions may be useful or reasonable for ordering complex numbers. Whatever the convention is, its mathematical correctness is irrelevant and cannot be used as an argument for preferring one convention to another.

I would propose providing a number of efficient comparison methods for complex (or any) numbers that users may use in sort functions as an optional argument. For example,

scipy.sort([2,1+2j],cmpmth='abs')      -> [1+2j,2]  # sorts by abs value
scipy.sort([2,1+2j],cmpmth='real')     -> [2,1+2j]  # sorts by real part
scipy.sort([2,1+2j],cmpmth='realimag') # sorts by real, then by imag
scipy.sort([2,1+2j],cmpmth='imagreal') # sorts by imag, then by real
scipy.sort([2,1+2j],cmpmth='absangle') # sorts by abs, then by angle
etc.
scipy.sort([2,1+2j],cmpfunc=<user-defined comparison function>)

Note that

scipy.sort([-1,1],cmpmth='absangle') -> [1,-1]

which also demonstrates the arbitrariness of sorting complex numbers. ...

Regards,
Pearu

From Barrett at stsci.edu Wed Jun 12 08:55:05 2002
From: Barrett at stsci.edu (Paul Barrett)
Date: Wed Jun 12 08:55:05 2002
Subject: [Numpy-discussion] RE: default axis for numarray
References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop>
Message-ID: <3D076EA9.4090209@STScI.Edu>

eric jones wrote:
>
> I think the consistency with Python is less of an issue than it seems.
> I wasn't aware that add.reduce(x) would generate the same results as
> the Python version of reduce(add,x) until Perry pointed it out to me.
> There are some inconsistencies between Python the language and Numeric
> because of the needs of the Numeric community. For instance, slices
> create views instead of copies as in Python. This was a correct break
> with consistency in a very utilized area of Python because of
> efficiency.

I think consistency is an issue, particularly for novices. You cite the issue of slices creating views instead of copies as being the correct choice. But this decision is based solely on the perception that views are 'inherently' more efficient than copies and not on reasons of consistency or usability. I (a seasoned user) find view behavior to be annoying and have been caught out on this several times. For example, reversing in-place the elements of an array using slices, i.e. A[:] = A[::-1], will give the wrong answer, unless you explicitly make a copy before doing the assignment. Whereas copy behavior will do the right thing. I suggest that many novices will be caught out by this and similar examples, as I have been.

Copy behavior for slices can be just as efficient as view behavior, if implemented as copy-on-write. The beauty of Python is that it allows the developer to spend much more time on consistency and usability issues than on implementation issues. Sadly, I think much of Numeric development is based solely on implementation issues to the detriment of consistency and usability.

I don't have enough experience to definitely say whether axis=0 should be preferred over axis=-1 or vice versa. But it does appear that for the most general cases axis=0 is probably preferred. This is the default for the APL and J programming languages, on which Numeric is based. Should we not continue to follow their lead? It might be nice to see a list of examples where axis=0 is the preferred default and the same for axis=-1.

--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Group
FAX: 410-338-4767 Baltimore, MD 21218
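The pitfall is easy to reproduce; a small sketch (the exact garbled values depend on the Numeric version, so the comments only hedge at them):

import Numeric

a = Numeric.arange(5)
a[:] = a[::-1]        # the right-hand side is a *view* of a, so elements
                      # are overwritten while still being read; the
                      # result need not be [4 3 2 1 0]

b = Numeric.arange(5)
b[:] = Numeric.array(b[::-1])   # copy first: reliably gives [4 3 2 1 0]
print b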
From rlw at stsci.edu Wed Jun 12 09:27:03 2002
From: rlw at stsci.edu (Rick White)
Date: Wed Jun 12 09:27:03 2002
Subject: [Numpy-discussion] copy on demand
In-Reply-To:
Message-ID:

Here is what I see as the fundamental problem with implementing slicing in numarray using copy-on-demand instead of views.

Copy-on-demand requires the maintenance of a global list of all the active views associated with a particular array buffer. Here is a simple example:

>>> a = zeros((5000,5000))
>>> b = a[49:51,50]
>>> c = a[51:53,50]
>>> a[50,50] = 1

The assignment to a[50,50] must trigger a copy of the array b; otherwise b also changes. On the other hand, array c does not need to be copied since its view does not include element 50,50. You could instead copy the array a -- but that means copying a 100 Mbyte array while leaving the original around (since b and c are still using it) -- not a good idea!

The bookkeeping can get pretty messy (if you care about memory usage, which we definitely do). Consider this case:

>>> a = zeros((5000,5000))
>>> b = a[0:-10,0:-10]
>>> c = a[49:51,50]
>>> del a
>>> b[50,50] = 1

Now what happens? Either we can copy the array for b (which means two copies of the huge (5000,5000) array exist, one used by c and the new version used by b), or we can be clever and copy c instead.

Even keeping track of the views associated with a buffer doesn't solve the problem of an array that is passed to a C extension and is modified in place. It would seem that passing an array into a C extension would always require all the associated views to be turned into copies. Otherwise we can't guarantee that views won't be modified.

This kind of state information with side effects leads to a system that is hard to develop, hard to debug, and really messes up the behavior of the program (IMHO). It is *highly* desirable to avoid it if possible.

This is not to deny that copy-on-demand (with explicit views available on request) would have some desirable advantages for the behavior of the system. But we've worried these issues to death, and in the end were convinced that slices == views provided the best compromise between the desired behavior and a clean implementation.

Rick

------------------------------------------------------------------
Richard L. White rlw at stsci.edu http://sundog.stsci.edu/rick/
Space Telescope Science Institute Baltimore, MD

From btang at pacific.jpl.nasa.gov Wed Jun 12 09:35:05 2002
From: btang at pacific.jpl.nasa.gov (Benyang Tang)
Date: Wed Jun 12 09:35:05 2002
Subject: [Numpy-discussion] Why the upcasting?
Message-ID: <3D0778D8.D11B9E6@pacific.jpl.nasa.gov>

The sum of an Int32 array and a Float32 array is a Float64 array, as shown by the following code:

>>> a = Numeric.array([1,2,3,4],'i')
>>> a.typecode(), a.itemsize()
('i', 4)
>>> b = Numeric.array([1,2,3,4],'f')
>>> b.typecode(), b.itemsize()
('f', 4)
>>> c=a+b
>>> c.typecode(), c.itemsize()
('d', 8)

Why the upcasting? I am using Linux/Pentium/python2.1/numpy20. Thanks.

Benyang Tang
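A plausible reading of Numeric's coercion rules (offered as an explanation sketch, not chapter and verse of the implementation): Float32 carries only a 24-bit significand, so it cannot represent every Int32 exactly, while Float64 can; Float64 is therefore the smallest type that loses neither operand. For example:

import Numeric

a = Numeric.array([16777217], 'i')   # 2**24 + 1
print Numeric.array(a, 'f')          # Float32 rounds it to 16777216.
print (a + Numeric.array([0.], 'f')).typecode()   # 'd' -- Float64 keeps it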
From perry at stsci.edu Wed Jun 12 09:45:06 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Jun 12 09:45:06 2002
Subject: [Numpy-discussion] copy on demand
In-Reply-To:
Message-ID:

> This kind of state information with side effects leads to a system that
> is hard to develop, hard to debug, and really messes up the behavior of
> the program (IMHO). It is *highly* desirable to avoid it if possible.

Rick beat me to the punch. The requirement for copy-on-demand definitely leads to a far more complex implementation with much more potential for misunderstood memory usage. You could do one small thing and suddenly force a spate of copies (perhaps cascading). There is no way we would have taken on a redesign of Numeric with this requirement with the resources we have available.

> This is not to deny that copy-on-demand (with explicit views available
> on request) would have some desirable advantages for the behavior of
> the system. But we've worried these issues to death, and in the end
> were convinced that slices == views provided the best compromise
> between the desired behavior and a clean implementation.

Rick's explanation doesn't really address the other position, which is that slices should force immediate copies. This isn't a difficult implementation issue by itself, but it does raise some related implementation questions. Supposing one does feel that views are a feature one wants even though they are not the default, it turns out that it isn't all that simple to provide them without sacrificing ordinary slicing syntax. It is simple to obtain copies of view slices, though.

Slicing views may not be important to everyone. It is important to us (and others), and we do see a number of situations where forcing copies in order to operate on array subsets would be a serious performance problem. We did discuss this issue with Guido, and he did not indicate that having different behavior on slicing with arrays would be a show stopper for acceptance into the Standard Library. We are also aware that there is no great consensus on this issue (even internally at STScI :-).

Perry Greenfield

From cookedm at physics.mcmaster.ca Wed Jun 12 10:48:01 2002
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Jun 12 10:48:01 2002
Subject: [Numpy-discussion] (a and b) != (b and a) ?
In-Reply-To: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> (Geza Groma's message of "Wed, 12 Jun 2002 08:27:57 +0200")
References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu>
Message-ID:

At some point, Geza Groma wrote:

> Using Numeric-21.0.win32-py2.2 I found this:
>
> >>> from Numeric import *
> >>> a = array((1, 1), 'b')
> >>> b = array((1, 0), 'b')
> >>> a and b
> array([1, 0],'b')
> >>> b and a
> array([1, 1],'b')
>
> It looks like a bug, or at least very weird. a&b and b&a work correctly.

Nope. From the Python language reference (5.10 Boolean operations):

The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.

Since in your case both a and b are true (they aren't zero-length sequences, etc.), the last value will be returned. It works for other types too, of course:

Python 2.1.3 (#1, May 23 2002, 09:00:41)
[GCC 3.1 (Debian)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> a = 'This is a'
>>> b = 'This is b'
>>> a and b
'This is b'
>>> b and a
'This is a'

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm at mcmaster.ca

From hinsen at cnrs-orleans.fr Wed Jun 12 11:11:15 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Wed Jun 12 11:11:15 2002
Subject: [Numpy-discussion] RE: default axis for numarray
In-Reply-To: <3D076EA9.4090209@STScI.Edu>
References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> <3D076EA9.4090209@STScI.Edu>
Message-ID:

Paul Barrett writes:

> I think consistency is an issue, particularly for novices. You cite ...
Finally a contribution that I can fully agree with :-)

> I don't have enough experience to definitely say whether axis=0 should
> be preferred over axis=-1 or vice versa. But it does appear that for
> the most general cases axis=0 is probably preferred. This is the
> default for the APL and J programming languages, on which Numeric is
> based. Should we not continue to follow their lead? It might be nice
> to see

This is the internal logic I referred to briefly earlier, but I didn't have the time to explain it in more detail. Now I have :-)

The basic idea is that an array is seen as an array of array values. The N dimensions are split into two parts: the first N1 dimensions describe the shape of the "total" array, and the remaining N2=N-N1 dimensions describe the shape of the array-valued elements of the array. I suppose some examples will help:

- A rank-1 array could be seen either as a vector of scalars (N1 = 1) or as a scalar containing a vector (N1 = 0); in practice there is no difference between these views.

- A rank-2 array could be seen as a matrix (N1=2), as a vector of vectors (N1=1), or as a scalar containing a matrix (N1=0). The first and the last come down to the same, but the middle one doesn't.

- A discretized vector field (i.e. one 3D vector value for each point on a 3D grid) is represented by a rank-6 array, with N1=3 and N2=3.

Array operations are divided into two classes, "structural" and "element" operations. Element operations do something on each individual element of an array, returning a new array with the same "outer" shape, although the element shape may be different. Structural operations work on the outer shape, returning a new array with a possibly different outer shape but the same element shape.

The most frequent element operations are addition, multiplication, etc., which work on scalar elements only. They need no axis argument at all. Element operations that work on rank-1 elements have a default axis of -1; I think FFT has been quoted as an example a few times. There are no element operations that work on higher-rank elements, but they are imaginable. A 2D FFT routine would default to axis=-2.

Structural operations, which are by far the most frequent after scalar element operations, default to axis=0. They include reduction and accumulation, sorting, selection (take, repeat, ...) and some others.

I hope this clarifies the choice of default axis arguments in the current NumPy. It is most definitely not arbitrary or accidental. If you follow the data layout principles explained above, you almost never need to specify an explicit axis argument.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------
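A small illustration of this layout convention (the shapes are invented for the example): a 3-vector at each point of a 2x2x2 grid has outer shape (2, 2, 2) and element shape (3,), so structural reductions take the default axis=0 while operations on the vector elements use the last axis:

import Numeric

field = Numeric.ones((2, 2, 2, 3), 'd')   # grid of 3-vectors

# structural operation: reduce over the first grid axis (default axis=0)
total = Numeric.add.reduce(field)
print total.shape                         # (2, 2, 3)

# element operation on the rank-1 elements: act along the last axis
norms = Numeric.sqrt(Numeric.add.reduce(field**2, -1))
print norms.shape                         # (2, 2, 2)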
From reggie at merfinllc.com Wed Jun 12 11:56:05 2002 From: reggie at merfinllc.com (Reggie Dugard) Date: Wed Jun 12 11:56:05 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> Message-ID: <1023908127.25709.80.camel@auk> This is not, in fact, a bug, although I've fallen prey to the same mistake myself. I'm assuming what you really wanted was to use logical_and: Python 2.2.1 (#1, Apr 29 2002, 15:21:53) [GCC 3.0.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> a = array((1,1), 'b') >>> b = array((1,0), 'b') >>> logical_and(a,b) array([1, 0],'b') >>> logical_and(b,a) array([1, 0],'b') >>> From the Python documentation: "The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned." So the "and" is just returning its second argument, since both arguments are considered "True" (containing at least 1 "True" element). On Tue, 2002-06-11 at 23:27, Geza Groma wrote: > Using Numeric-21.0.win32-py2.2 I found this: > > Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Numeric import * > >>> a = array((1, 1), 'b') > >>> b = array((1, 0), 'b') > >>> a and b > array([1, 0],'b') > >>> b and a > array([1, 1],'b') > >>> > > It looks like a bug, or at least very weird. a&b and b&a work correctly. > > -- > Géza Groma > Institute of Biophysics, > Biological Research Center of Hungarian Academy of Sciences > Temesvári krt. 62. > 6726 Szeged > Hungary > phone: +36 62 432 232 > fax: +36 62 433 133 Reggie Dugard Merfin, LLC From oliphant.travis at ieee.org Wed Jun 12 12:03:21 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed Jun 12 12:03:21 2002 Subject: [Numpy-discussion] Complex comparisions In-Reply-To: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY> References: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY> Message-ID: <1023908597.21793.5.camel@travis> I'd be interested to know what IDL does. Does it compare complex numbers? Matlab allows comparisons of complex numbers but just compares the real part. I think this is reasonable. Often during a calculation of limited precision one ends up with a complex number when the result is, in a "mathematically pure sense", real. I guess I trust the user to realize that if they are comparing numbers they know what they mean (only the real parts are compared, so the imaginary part is ignored). -Travis From rlw at stsci.edu Wed Jun 12 13:25:02 2002 From: rlw at stsci.edu (Rick White) Date: Wed Jun 12 13:25:02 2002 Subject: [Numpy-discussion] Complex comparisions In-Reply-To: <1023908597.21793.5.camel@travis> Message-ID: On 12 Jun 2002, Travis Oliphant wrote: > I'd be interested to know what IDL does. Does it compare complex > numbers? Well, that was an interesting question with a surprising answer (at least to me, a long-time IDL user): (1) IDL allows comparisons of complex numbers using equality and inequality, but attempts to compare using GT, LT, etc. are illegal and cause an exception. (2) IDL sorts complex numbers by amplitude. It ignores the phase. Numbers with the same amplitude and different phases are randomly ordered, depending on their positions in the original array. > Matlab allows comparisons of complex numbers but just compares the real > part. I think this is reasonable.
> Often during a calculation of limited precision one ends up with a > complex number when the result is, in a "mathematically pure sense", real. So neither IDL nor Matlab has what I consider the desirable feature that the sort order be unique, at least to the extent that equal values wind up next to each other in the sorted array. (Sorting by real value and then, for equal real values, by imaginary value would accomplish that.) Since complex numbers have no natural ordering, none of the usual comparison operators can be plugged into a standard sort algorithm to give that result -- it would require a comparison function written specially for sorting complex values. I guess if neither of the major array processing systems (that I know about) has this property in their complex sorts, it must not be *that* important. And since I've been using IDL for 13 years without discovering that complex greater-than comparisons are illegal, I guess that must not be an important property either (at least to me :-). My conclusion now is similar to Paul Dubois's suggestion -- we should allow equality comparisons and sorting. Beyond that I guess whatever other people want should carry the day, since it clearly doesn't matter to the sorts of things that I do with Numeric! Rick From Chris.Barker at noaa.gov Wed Jun 12 13:29:02 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Jun 12 13:29:02 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> Message-ID: <3D07AE26.8C3D2829@noaa.gov> Reggie Dugard wrote: > This is not, in fact, a bug, although I've fallen prey to the same > mistake myself. I'm assuming what you really wanted was to use > logical_and: > So the "and" is just returning its second argument, since both arguments > are considered "True" (containing at least 1 "True" element). I imagine there is a compelling reason that "and" and "or" have not been made overridable like the comparison operators, but it sure would be nice! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tim.hochberg at ieee.org Wed Jun 12 13:38:24 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Wed Jun 12 13:38:24 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> <3D07AE26.8C3D2829@noaa.gov> Message-ID: <013801c21250$ea7bf0f0$061a6244@cx781526b> From: "Chris Barker" > I imagine there is a compelling reason that "and" and "or" have not been > made overridable like the comparison operators, but it sure would be nice! Because it's not possible? "and" and "or" operate on the basis of the truth of their arguments, so the only way you can affect them is to override __nonzero__. Since this is a unary operation, there is no way to get the equivalent of logical_and out of it. In practice I haven't found this to be much of a problem. Nearly every time I need to and two arrays together, "&" works just as well as logical_and. I can certainly imagine cases where this isn't true, I just haven't run into them in practice. -tim
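For concreteness, here is what Rick White's proposed ordering above -- by real part, then by imaginary part for ties -- looks like in plain Python (an illustrative sketch; nothing like this is built into Numeric):

def sort_complex(values):
    # Tuples compare lexicographically, so ties on the real part
    # fall back to the imaginary part, putting equal values side by side.
    pairs = [(z.real, z.imag) for z in values]
    pairs.sort()
    return [complex(re, im) for (re, im) in pairs]

print sort_complex([1+1j, 2j, 1-1j, 1+1j])
# [2j, (1-1j), (1+1j), (1+1j)]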
From paul at pfdubois.com Wed Jun 12 15:43:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Wed Jun 12 15:43:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <3D076EA9.4090209@STScI.Edu> Message-ID: <000101c21262$6cab3610$0c01a8c0@NICKLEBY> The users of Numeric at PCMDI found the 'view' semantics so annoying that they insisted their CS staff write a separate version of Numeric just to avoid it. We have since gotten out of that mess, but that is the reason MA has copy semantics. Again, this is another issue where one is fighting over the right to 'own' the operator notation. I believe that copy semantics should win this one because it is a **proven fact** that scientists trip over it, and it is consistent with Python list semantics. People who really need view semantics could get it, as previously suggested by someone, with something like x.sub[10:12, :]. There are now dead horses all over the landscape, and I for one am going to shut up. > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On > Behalf Of Paul Barrett > Sent: Wednesday, June 12, 2002 8:54 AM > To: numpy-discussion > Subject: Re: [Numpy-discussion] RE: default axis for numarray > > eric jones wrote: > > > I think the consistency with Python is less of an issue than it seems. > > I wasn't aware that add.reduce(x) would generate the same results as > > the Python version of reduce(add,x) until Perry pointed it out to me. > > There are some inconsistencies between Python the language and Numeric > > because of the needs of the Numeric community. For instance, slices > > create views instead of copies as in Python. This was a correct break > > with consistency in a heavily utilized area of Python because of efficiency. > > I think consistency is an issue, particularly for novices. You cite the issue > of slices creating views instead of copies as being the correct choice. But > this decision is based solely on the perception that views are 'inherently' more > efficient than copies and not on reasons of consistency or usability. I (a > seasoned user) find view behavior to be annoying and have been caught out on > this several times. For example, reversing in-place the elements of any array > using slices, i.e. A[:] = A[::-1], will give the wrong answer, unless you > explicitly make a copy before doing the assignment. Whereas copy behavior will > do the right thing. I suggest that many novices will be caught out by this and > similar examples, as I have been. Copy behavior for slices can be just as > efficient as view behavior, if implemented as copy-on-write. > > The beauty of Python is that it allows the developer to spend much more time on > consistency and usability issues than on implementation issues. Sadly, I think > much of Numeric development is based solely on implementation issues, to the > detriment of consistency and usability. > > I don't have enough experience to definitely say whether axis=0 should be > preferred over axis=-1 or vice versa. But it does appear that for the most > general cases axis=0 is probably preferred. This is the default for the APL and > J programming languages, on which Numeric is based. Should we not continue to follow their > lead? It might be nice to see a list of examples where axis=0 is the preferred > default and the same for axis=-1. > > -- > Paul Barrett, PhD Space Telescope Science Institute > Phone: 410-338-4475 ESS/Science Software Group > FAX: 410-338-4767 Baltimore, MD 21218
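The in-place reversal pitfall quoted above, sketched as a session. With view semantics the right-hand side aliases the left-hand side, so an element-by-element copy clobbers its own source (assuming no intermediate buffer is used -- the exact garbled result may vary by implementation):

>>> from Numeric import arange, array
>>> A = arange(5)
>>> A[:] = A[::-1]          # RHS is a reversed view of the same buffer
>>> A                       # not the reversal one might expect
array([4, 3, 2, 3, 4])
>>> B = arange(5)
>>> B[:] = array(B[::-1])   # forcing a copy first gives the right answer
>>> B
array([4, 3, 2, 1, 0])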
From a.schmolck at gmx.net Wed Jun 12 15:51:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 15:51:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: Rick White writes: > Here is what I see as the fundamental problem with implementing slicing > in numarray using copy-on-demand instead views. > > Copy-on-demand requires the maintenance of a global list of all the > active views associated with a particular array buffer. Here is a > simple example: > > >>> a = zeros((5000,5000)) > >>> b = a[49:51,50] > >>> c = a[51:53,50] > >>> a[50,50] = 1 > > The assignment to a[50,50] must trigger a copy of the array b; > otherwise b also changes. On the other hand, array c does not need to > be copied since its view does not include element 50,50. You could > instead copy the array a -- but that means copying a 100 Mbyte array > while leaving the original around (since b and c are still using it) -- > not a good idea! Sure, if one wants to perform only the *minimum* amount of copying, things can get rather tricky, but wouldn't it be satisfactory for most cases if attempted modification of the original triggered the delayed copying of the "views" (lazy copies)? In those cases where it isn't satisfactory, the user could still explicitly create real (i.e. alias-only) views. > > The bookkeeping can get pretty messy (if you care about memory usage, > which we definitely do). Consider this case: > > >>> a = zeros((5000,5000)) > >>> b = a[0:-10,0:-10] > >>> c = a[49:51,50] > >>> del a > >>> b[50,50] = 1 > > Now what happens? Either we can copy the array for b (which means two ``b`` and ``c`` are copied and then ``a`` is deleted. What does numarray currently keep of a if I do something like the above or: >>> b = a.flat[::-10000] >>> del a ? > copies of the huge (5000,5000) array exist, one used by c and the new > version used by b), or we can be clever and copy c instead. > > Even keeping track of the views associated with a buffer doesn't solve > the problem of an array that is passed to a C extension and is modified > in place. It would seem that passing an array into a C extension would > always require all the associated views to be turned into copies. > Otherwise we can't guarantee that views won't be modified. Yes -- but only if the C extension is destructive. In that case the user might well be making a mistake in current Numeric if he has views and doesn't want them to be modified by the operation (of course he might know that the inplace operation does not affect the view(s) -- but wouldn't such cases be rather rare?). If he *does* want the views to be modified, he would obviously have to explicitly specify them as such in a copy-on-demand scheme, and in the other case he has most likely been prevented from making an error (and can still explicitly use real views if he knows that the inplace operation on the original will not have undesired effects on the "views").
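To make the lazy-copy idea concrete, here is a toy sketch in plain Python (purely illustrative -- not Numeric or numarray code): a slice shares its parent's buffer until either side is written, at which point the delayed copy happens.

class LazySlice:
    # A deferred copy of parent.data[key]; nothing is copied until the
    # first write to either the parent or the slice itself.
    def __init__(self, parent, key):
        self.parent, self.key = parent, key
        self.data = None                    # None means "still sharing"

    def _materialize(self):
        if self.data is None:
            self.data = self.parent.data[self.key]   # the actual copy

    def __getitem__(self, i):
        if self.data is None:
            return self.parent.data[self.key][i]     # read through, no copy
        return self.data[i]

    def __setitem__(self, i, value):
        self._materialize()                 # writing the slice forces its copy
        self.data[i] = value

class LazyArray:
    def __init__(self, data):
        self.data = list(data)              # a plain list stands in for the buffer
        self.views = []                     # the global bookkeeping Rick describes

    def __getitem__(self, key):
        if isinstance(key, slice):
            view = LazySlice(self, key)
            self.views.append(view)         # slicing itself costs nothing
            return view
        return self.data[key]

    def __setitem__(self, i, value):
        for view in self.views:             # a write to the parent triggers
            view._materialize()             # all the delayed copies at once
        self.views = []
        self.data[i] = value

With this, ``b = a[49:51]`` costs nothing until ``a[50] = 1``, at which point ``b`` quietly becomes an ordinary copy of the old data -- which is exactly the hidden bookkeeping and surprise copying cost being debated here.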
> > This kind of state information with side effects leads to a system that > is hard to develop, hard to debug, and really messes up the behavior of > the program (IMHO). It is *highly* desirable to avoid it if possible. Sure, copy-on-demand is an optimization, and optimizations always mess up things. On the other hand, some optimizations also make "nicer" (e.g. less error-prone) semantics computationally viable, so it's often a question of ease and clarity of the implementation vs. ease and clarity of the code that uses it. I'm not denying that too much complexity in the implementation also adversely affects users in the form of bugs, and that in the particular case of delayed copying the user can also be affected directly by more difficult to understand resource usage behavior (e.g. a[0] = 1 triggering a monstrous copying operation). Just out of curiosity, has someone already asked the octave people how much trouble it has caused them to implement copy on demand, and whether matlab/octave users in practice do experience difficulties because of the harder to predict runtime behavior (I think, like matlab, octave does copy-on-demand)? > > This is not to deny that copy-on-demand (with explicit views available > on request) would have some desirable advantages for the behavior of > the system. But we've worried these issues to death, and in the end > were convinced that slices == views provided the best compromise > between the desired behavior and a clean implementation. If implementing copy-on-demand is too difficult and the resulting code would be too messy, then this is certainly a valid reason to compromise on the current slicing behavior (especially since people like me who'd like to see copy-on-demand are unlikely to volunteer to implement it :) > Rick > > ------------------------------------------------------------------ > Richard L. White rlw at stsci.edu http://sundog.stsci.edu/rick/ > Space Telescope Science Institute > Baltimore, MD > > alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From a.schmolck at gmx.net Wed Jun 12 15:51:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 15:51:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > : > > > This kind of state information with side effects leads to a system that > > is hard to develop, hard to debug, and really messes up the behavior of > > the program (IMHO). It is *highly* desirable to avoid it if possible. > > > Rick beat me to the punch. The requirement for copy-on-demand > definitely leads to a far more complex implementation with > much more potential for misunderstood memory usage. You could > do one small thing and suddenly force a spate of copies (perhaps > cascading). Yes, but I would suspect that cases where a little innocuous a[0] = 3 triggers excessive processing should be rather unusual (matlab or octave users will know). > There is no way we would take on a redesign of > Numeric with this requirement with the resources we have available. Fair enough -- if implementing copy-on-demand is too much work then we'll have to live without it (especially if view-slicing doesn't stand in the way of a future inclusion into the python core).
I guess the best reason to bite the bullet and carry around state information would be if there were significant other cases where one also would want to optimize operations under the hood. If there isn't much else in this direction then the effort involved might not be justified. One thing that bugs me in Numeric (and that might already have been solved in numarray) is that e.g. ``ravel`` (and I think also ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't, but won't work in all cases (viz. when the array is non-contiguous), so I can either have ugly or inefficient code. > > > This is not to deny that copy-on-demand (with explicit views available > > on request) would have some desirable advantages for the behavior of > > the system. But we've worried these issues to death, and in the end > > were convinced that slices == views provided the best compromise > > between the desired behavior and a clean implementation. > > > Rick's explanation doesn't really address the other position, which is > that slices should force immediate copies. This isn't a difficult > implementation issue by itself. But it does raise some related > implementation questions. Supposing one does feel that views are > a feature one wants even though they are not the default, it turns > out that it isn't all that simple to obtain views without sacrificing > ordinary slicing syntax. It is simple to obtain > copies of view slices, though. I'm not sure I understand the above. What is the problem with ``a.view[1:3]`` (or ``a.view()[1:3]``)? > > Slicing views may not be important to everyone. It is important > to us (and others) and we do see a number of situations where > forcing copies to operate on array subsets would be a serious > performance problem. We did discuss this issue with Guido and Sure, no one denies that even with copy-on-demand, (explicitly) aliased views would still be useful. > he did not indicate that having different behavior on slicing > with arrays would be a show stopper for acceptance into the > Standard Library. We are also aware that there is no great > consensus on this issue (even internally at STScI :-). > Yep, I just saw Paul Barrett's post :) > Perry Greenfield > > alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From Chris.Barker at noaa.gov Wed Jun 12 16:21:04 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Jun 12 16:21:04 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> <3D07AE26.8C3D2829@noaa.gov> <013801c21250$ea7bf0f0$061a6244@cx781526b> Message-ID: <3D07D685.5A0E6B5D@noaa.gov> Tim Hochberg wrote: > > I imagine there is a compelling reason that "and" and "or" have not been > > made overridable like the comparison operators, but it sure would be nice! > > Because it's not possible? Well, yes, but it wasn't possible with <,>,== and friends until rich comparisons were added in Python 2.1. So I am still wondering why the same extension wasn't made to "and" and "or". In fact, given that Guido is adding a bool type, this may be a time to re-visit the question, unless there really is a compelling reason not to, which is quite likely. > In practice I haven't found this to be much of a problem. Nearly every time > I need to and two arrays together, "&" works just as well as logical_and.
This has always worked for me, as well, so maybe the answer is that there is no compelling reason to make a change. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From xscottg at yahoo.com Wed Jun 12 16:52:06 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Wed Jun 12 16:52:06 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <3D07D685.5A0E6B5D@noaa.gov> Message-ID: <20020612235115.50726.qmail@web12903.mail.yahoo.com> --- Chris Barker wrote: > > Well, yes, but it wasn't possible with <,>,== and friends until rich > comparisons were added in Python 2.1. So I am still wondering why the > same extension wasn't made to "and" and "or". In fact, given that Guido > is adding a bool type, this may be a time to re-visit the question, > unless there really is a compelling reason not to, which is quite > likely. > The "and" and "or" operators do short-circuit evaluation, so in addition to acting like boolean operations, they are also control flow. For "and", the second expression is not evaluated if the first one is false. For "or", the second expression is not evaluated if the first one is true. I'm not clever enough to figure out how an overloaded and/or operator could implement control flow for its operand expressions. The operands "self" and "other" would already be evaluated by the time your __operator__(self, other) function was called. C++ has overloadable && and || operators, but overloading them is frowned on by many. C++ has the advantage over Python in that it knows the actual types at compile time. From bsder at mail.allcaps.org Wed Jun 12 23:12:02 2002 From: bsder at mail.allcaps.org (Andrew P. Lentvorski) Date: Wed Jun 12 23:12:02 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <20020612235115.50726.qmail@web12903.mail.yahoo.com> Message-ID: <20020612194252.N31527-100000@mail.allcaps.org> On Wed, 12 Jun 2002, Scott Gilbert wrote: > C++ has overloadable && and || operators, but overloading them is frowned > on by many. C++ has the advantage over Python in that it knows the actual > types at compile time. Actually, overloading && and || isn't just frowned upon in C++, it's effectively banned. The reason is that it replaces short-circuit semantics with function call semantics and screws up the standard idioms (if ((a != NULL) && (*a == "a")) { ... } ). See "Effective C++" by Scott Meyers. As far as I know, *none* of the C++ literati hold the opposing view. -a From perry at stsci.edu Thu Jun 13 13:23:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 13:23:04 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <000101c21262$6cab3610$0c01a8c0@NICKLEBY> Message-ID: : > There are now dead horses all over the landscape, and I for one am going > to shut up. > Not enough dead horses for me :-). But seriously, I would like to hear from others about this issue (I already knew what Paul, Paul, Eric, Travis and Konrad felt about this before it started up). You can either post to the mailing list or email directly if you are the shy, retiring type.
Perry From perry at stsci.edu Thu Jun 13 13:40:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 13:40:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: > I guess the best reason to bite the bullet and carry around state > information > would be if there were significant other cases where one also > would want to > optimize operations under the hood. If there isn't much else in > this direction > then the effort involved might not be justified. One thing that bugs me in > Numeric (and that might already have been solved in numarray) is that > e.g. ``ravel`` (and I think also ``transpose``) creates > unnecessary copies, > whereas ``.flat`` doesn't, but won't work in all cases (viz. when > the array is > non-contiguous), so I can either have ugly or inefficient code. > I guess that depends on what you mean by unnecessary copies. If the array is non-contiguous what would you have it do? > > a feature one wants even though they are not the default, it turns > > out that it isn't all that simple to obtain views without sacrificing > > ordinary slicing syntax to obtain a view. It is simple to obtain > > copies of view slices though. > > I'm not sure I understand the above. What is the problem with > ``a.view[1:3]`` > (or``a.view()[1:3])? > I didn't mean to imply it wasn't possible, but that it was not quite as clean. The thing I don't like about this approach (or Paul's suggestion of a.sub) is the creation of an odd object that has as its only purpose being sliced. (Even worse, in my opinion, is making it a different kind of array where slicing behaves differently. That will lead to the problem we have discussed for other kinds of array behavior, namely, how do you keep from being confused about a particular array's slicing behavior). That could lead to confusion as well. Many may be under the impression that x = a.view makes x refer to an array when it doesn't. Users would need to know that a.view without a '[' is usually an error. Sure it's not hard to implement. But I don't view it as that clean a solution. On the other hand, a[1:3].copy() (or alternatively, a[1:3].copy) is another array just like any other. > Perry From perry at stsci.edu Thu Jun 13 14:17:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 14:17:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: > > Copy-on-demand requires the maintenance of a global list of all the > > active views associated with a particular array buffer. Here is a > > simple example: > > > > >>> a = zeros((5000,5000)) > > >>> b = a[49:51,50] > > >>> c = a[51:53,50] > > >>> a[50,50] = 1 > > > > The assignment to a[50,50] must trigger a copy of the array b; > > otherwise b also changes. On the other hand, array c does not need to > > be copied since its view does not include element 50,50. You could > > instead copy the array a -- but that means copying a 100 Mbyte array > > while leaving the original around (since b and c are still using it) -- > > not a good idea! > > Sure, if one wants do perform only the *minimum* amount of > copying, things can > get rather tricky, but wouldn't it be satisfactory for most cases > if attempted > modification of the original triggered the delayed copying of the "views" > (lazy copies)? In those cases were it isn't satisfactory the > user could still > explicitly create real (i.e. alias-only) views. > I'm not sure what you mean. 
Are you saying that if anything in the buffer changes, force all views of the buffer to generate copies (rather than try to determine if the change affected only selected views)? If so, yes, it is easier, but it still is a non-trivial capability to implement. > > > > The bookkeeping can get pretty messy (if you care about memory usage, > > which we definitely do). Consider this case: > > > > >>> a = zeros((5000,5000)) > > >>> b = a[0:-10,0:-10] > > >>> c = a[49:51,50] > > >>> del a > > >>> b[50,50] = 1 > > > > Now what happens? Either we can copy the array for b (which means two > > ``b`` and ``c`` are copied and then ``a`` is deleted. > > What does numarray currently keep of a if I do something like the > above or: > > >>> b = a.flat[::-10000] > >>> del a > > ? > The whole buffer remains in both cases. > > copies of the huge (5000,5000) array exist, one used by c and the new > > version used by b), or we can be clever and copy c instead. > > > > Even keeping track of the views associated with a buffer doesn't solve > > the problem of an array that is passed to a C extension and is modified > > in place. It would seem that passing an array into a C extension would > > always require all the associated views to be turned into copies. > > Otherwise we can't guarantee that views won't be modifed. > > Yes -- but only if the C extension is destructive. In that case > the user might > well be making a mistake in current Numeric if he has views and > doesn't want > them to be modified by the operation (of course he might know > that the inplace > operation does not affect the view(s) -- but wouldn't such cases be rather > rare?). If he *does* want the views to be modified, he would > obviously have to > explictly specify them as such in a copy-on-demand scheme and in the other > case he has been most likely been prevented from making an error (and can > still explicitly use real views if he knows that the inplace > operation on the > original will not have undesired effects on the "views"). > If the point is that views are susceptible to unexpected changes made in place by a C extension, yes, certainly (just as they are for changes made in place in Python). But I'm not sure what that has to do with the implied copy (even if delayed) being broken by extensions written in C. Promising a copy, and not honoring it is not the same as not promising it in the first place. But I may be misunderstanding your point. Perry From perry at stsci.edu Thu Jun 13 14:52:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 14:52:03 2002 Subject: [Numpy-discussion] Some initial thoughts about the past week's discussions Message-ID: Impressions so far on various issues raised regarding numarray interfaces 1) We are mostly persuaded that rank-0 arrays are the way to go. We will pursue the issue of whether it is possible to have Python accept these as indices for sequence objects with python-dev. 2) We are still mulling over the axis order issue. Regardless of which convention we choose, we are almost certainly going to make it consistent (always the same axis as default). A compatibility module will be provided to replicate Numeric defaults. 3) repr. Finally, a consensus! Even unanimity. 4) Complex comparisons. Implement equality, non-equality, predictable sorting. Make >,<,>=,<= illegal. 5) Copy vs view. Open to more input (but no delayed copying or such). 
From a.schmolck at gmx.net Thu Jun 13 17:36:05 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Thu Jun 13 17:36:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > I'm not sure what you mean. Are you saying that if anything in the > buffer changes, force all views of the buffer to generate copies > (rather than try to determine if the change affected only selected Yes (I suspect that this will be sufficient in practice). > views)? If so, yes, it is easier, but it still is a non-trivial > capability to implement. Sure. But since copy-on-demand is only an optimization and as such doesn't affect the semantics, it could also be implemented at a later point if the resources are currently not available. I have little doubt that someone will eventually add copy-on-demand if the option is kept open, and in the meantime one could still get all the performance (and alias behavior) of the current implementation by explicitly using ``.view`` (or ``.sub`` if you prefer) to create aliases. I'm becoming increasingly convinced (see below) that copy-slicing semantics are much to be preferred as the default, so given the above I don't think that performance concerns should sway one towards alias-slicing, if enough people feel that copy semantics as such are preferable. > > > > > > The bookkeeping can get pretty messy (if you care about memory usage, > > > which we definitely do). Consider this case: > > > > > > >>> a = zeros((5000,5000)) > > > >>> b = a[0:-10,0:-10] > > > >>> c = a[49:51,50] > > > >>> del a > > > >>> b[50,50] = 1 > > > > > > Now what happens? Either we can copy the array for b (which means two > > > > ``b`` and ``c`` are copied and then ``a`` is deleted. > > > > What does numarray currently keep of a if I do something like the > > above or: > > > > >>> b = a.flat[::-10000] > > >>> del a > > > > ? > > > The whole buffer remains in both cases. OK, so this is a nice example where even eager copy-slicing behavior would be *significantly* more efficient than the current aliasing behavior -- so copy-on-demand would then on the whole seem to be not just nearly equally but *more* efficient than alias slicing. And as far as difficult-to-understand runtime behavior is concerned, the extra ~100MB of useless baggage carried around by b (second case) is, I'd venture to suspect, less than obvious to the casual observer. In fact I remember one of my fellow PhD students having significant problems with mysterious memory consumption (a couple of arrays taking up more than 1GB rather than a few hundred MB) -- maybe something like the above was involved. That ``A[:] = A[::-1]`` doesn't work (as pointed out by Paul Barrett) will also come as a surprise to most people. If I understand all this correctly, I consider it a rather strong case against alias slicing as default behavior. > > > Even keeping track of the views associated with a buffer doesn't solve > > > the problem of an array that is passed to a C extension and is modified > > > in place. It would seem that passing an array into a C extension would > > > always require all the associated views to be turned into copies. > > > Otherwise we can't guarantee that views won't be modified. > > > > Yes -- but only if the C extension is destructive.
In that case > > the user might > > well be making a mistake in current Numeric if he has views and > > doesn't want > > them to be modified by the operation (of course he might know > > that the inplace > > operation does not affect the view(s) -- but wouldn't such cases be rather > > rare?). If he *does* want the views to be modified, he would > > obviously have to > > explictly specify them as such in a copy-on-demand scheme and in the other > > case he has been most likely been prevented from making an error (and can > > still explicitly use real views if he knows that the inplace > > operation on the > > original will not have undesired effects on the "views"). > > > If the point is that views are susceptible to unexpected changes > made in place by a C extension, yes, certainly (just as they > are for changes made in place in Python). But I'm not sure what > that has to do with the implied copy (even if delayed) being > broken by extensions written in C. Promising a copy, and not > honoring it is not the same as not promising it in the first > place. But I may be misunderstanding your point. > OK, I'll try again, hopefully this is clearer. In a sentence: I don't see any problems with C extensions in particular that would arise from copy-on-demand (I might well be overlooking something, though). Rick was saying that passing an array to a C extension that performs an inplace operation on it means that all copies of all its (lazy) views must be performed. My point was that this is correct, but I can't see any problem with that, neither from the point of extension writer, nor from the point of performance nor from the point of the user, nor indeed from the point of the numarray implementors (obviously the copy-on-demand scheme *as such* will be an effort). All that is needed is a separate interface for (the minority of) C extensions that destructively modify their arguments (they only need to call some function `actualize_views(the_array_or_view)` or whatever at the start -- this function will obviously be necessary regardless of the C extensions). So nothing will break, the promises are kept and no extra work. It won't be any slower than what would happen with current Numeric, either, because either the (Numeric) user intended his (aliased) views to modified as well or it was a bug. If he intended the views to be modified, he would explicitly use alias-views under the new scheme and everything would behave exactly the same. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From a.schmolck at gmx.net Thu Jun 13 17:36:10 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Thu Jun 13 17:36:10 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > > I guess the best reason to bite the bullet and carry around state > > information > > would be if there were significant other cases where one also > > would want to > > optimize operations under the hood. If there isn't much else in > > this direction > > then the effort involved might not be justified. One thing that bugs me in > > Numeric (and that might already have been solved in numarray) is that > > e.g. ``ravel`` (and I think also ``transpose``) creates > > unnecessary copies, > > whereas ``.flat`` doesn't, but won't work in all cases (viz. when > > the array is > > non-contiguous), so I can either have ugly or inefficient code. 
> > > I guess that depends on what you mean by unnecessary copies. In most cases the array of which I desire a flattened representation is contiguous (plus, I usually don't intend to modify it). Consequently, in most cases I don't want to any copies of it to be created (especially not if it is really large -- which is not seldom the case). The fact that you can never really be sure whether you can actually use ``.flat``, without checking beforehand if the array is in fact contiguous (I don't think there are many guarantees about something being contiguous, or are there?) and that ravel will always work but has a huge overhead, suggests to me that something is not quite right. > If the array is non-contiguous what would you have it do? Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently does, create a copy (or alternatively rearrange the memory representation to make it non-contiguous and then create a lazy copy, but I don't know whether this would be a good or even feasible idea). A lazy version of ravel would have the same semantics as ravel but only create an actual copy if necessary-- which means as long as no modification takes place and the array is non-contiguous, it will be sufficient to return the ``.flat`` (for starters). If it is contiguous than the copying can't be helped, but these cases are rare and currently you either have to test for them explicitly or slow everything down and waste memory by just always using ``ravel()``. For example, if bar is contiguous ``foo = ravel(bar)`` would be computationally equivalent to ``bar.flat``, as long as neither of them is modified, but semantically equivalent to the current ``foo = ravel(bar)`` in all cases. Thus you could now write: >>> a = ravel(a)[20:] wherever you've written this boiler-plate code before: >>> if a.iscontiguous(): >>> a = a.flat[20:] >>> else: >>> a = ravel(a)[20:] without any loss of performance. > > > > a feature one wants even though they are not the default, it turns > > > out that it isn't all that simple to obtain views without sacrificing > > > ordinary slicing syntax to obtain a view. It is simple to obtain > > > copies of view slices though. > > > > I'm not sure I understand the above. What is the problem with > > ``a.view[1:3]`` > > (or``a.view()[1:3])? > > > I didn't mean to imply it wasn't possible, but that it was not > quite as clean. The thing I don't like about this approach (or > Paul's suggestion of a.sub) is the creation of an odd object > that has as its only purpose being sliced. (Even worse, in my I personally don't find it messy. And please keep in mind that the ``view`` construct would only very seldomly be used if copy-on-demand is the default -- as I said, I've only needed the aliasing behavior once -- no doubt it was really handy then, but the fact that e.g. matlab doesn't have anything along those lines (AFAIK) suggests that many people will never need it. So even if ``.view`` is messy, I'd rather have something messy that is almost never used, in exchange for (what I perceive as) significantly nicer and cleaner semantics for something that is used all the time (array slicing; alias slicing is messy in at least the respect that it breaks standard usage and generic sequence code as well as causing potentially devious bugs. Unexpected behaviors like phantom buffers kept alive in their entirety by partial views etc. or what ``A = A[::-1]`` does are not exactly pretty either). > opinion, is making it a different kind of array where slicing > behaves differently. 
That will lead to the problem we have > discussed for other kinds of array behavior, namely, how do > you keep from being confused about a particular array's slicing > behavior). That could lead to confusion as well. Many may be I don't see that problem, frankly. The view is *not* an array. It doesn't need (and shouldn't have) anything except a method to access slices (__getitem__). As mentioned before, I also regard it as highly desirable that ``b = a.view[3:10]`` sticks out immediately. This signals "warning -- potentially tricky code ahead". Nothing in ``b = a[3:10]`` tells you that someone intends to modify a and b depedently (because in more than 9 out of 10 cases he won't) -- now *this* is confusing. > under the impression that x = a.view makes x refer to an array > when it doesn't. Users would need to know that a.view without > a '[' is usually an error. Since the ``.view`` shouldn't allow anything except slicing, they'll soon find out ("Error: you can't multiply me, I'm a view and not an array"). And I can't see why that would be harder to figure out (or look up in the docu) than that a[1:3] creates an alias and *not* a copy contrary to *everything* else you've ever heard or read about python sequences (especially since in most cases it will work as intended). Also what exactly is the confused person's notion of the purpose of ``x = a.view`` supposed to be? That ``x = a`` is what ``x = a.copy()`` really does and that to create aliases an alias to ``a`` they would have to use ``x = a.view``? In that case they'd better read the python tutorial before they do any more python programming, because they are in for all kinds of unpleasant surprises (``a = []; b = a; b[1] = 3; print a`` -- oops). alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From perry at stsci.edu Fri Jun 14 08:02:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jun 14 08:02:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: : : > > I guess that depends on what you mean by unnecessary copies. > > In most cases the array of which I desire a flattened representation is > contiguous (plus, I usually don't intend to modify it). > Consequently, in most > cases I don't want to any copies of it to be created (especially > not if it is > really large -- which is not seldom the case). > Numarray already returns a view of the array if it is contiguous. Copies are only produced if it is non-contiguous. I assume that is the behavior you are asking for? > The fact that you can never really be sure whether you can actually use > ``.flat``, without checking beforehand if the array is in fact > contiguous (I > don't think there are many guarantees about something being > contiguous, or are > there?) and that ravel will always work but has a huge overhead, > suggests to > me that something is not quite right. > Not for numarray, at least in this context. > > If the array is non-contiguous what would you have it do? > > Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently > does, create a copy (or alternatively rearrange the memory > representation to > make it non-contiguous and then create a lazy copy, but I don't > know whether > this would be a good or even feasible idea). 
> > A lazy version of ravel would have the same semantics as ravel > but only create > an actual copy if necessary-- which means as long as no modification takes > place and the array is non-contiguous, it will be sufficient to return the > ``.flat`` (for starters). If it is contiguous than the copying can't be > helped, but these cases are rare and currently you either have to test for > them explicitly or slow everything down and waste memory by just > always using > ``ravel()``. > Currently for numarray .flat will fail if it isn't contiguous. It isn't clear if this should change. If .flat is meant to be a view always, then it should always fail it the array is not contiguous. Ravel is not guaranteed to be a view. This is a problematic issue if we decide to switch from view to copy semantics. If slices produce copies, then does .flat? If so, then how does one produce a flattened view? x.view.flat? > For example, if bar is contiguous ``foo = ravel(bar)`` would be > computationally equivalent to ``bar.flat``, as long as neither of them is > modified, but semantically equivalent to the current ``foo = > ravel(bar)`` in > all cases. > > Thus you could now write: > > >>> a = ravel(a)[20:] > > wherever you've written this boiler-plate code before: > > >>> if a.iscontiguous(): > >>> a = a.flat[20:] > >>> else: > >>> a = ravel(a)[20:] > > without any loss of performance. > I believe this is already true in numarray. > > I personally don't find it messy. And please keep in mind that > the ``view`` > construct would only very seldomly be used if copy-on-demand is > the default > -- as I said, I've only needed the aliasing behavior once -- no > doubt it was > really handy then, but the fact that e.g. matlab doesn't have > anything along > those lines (AFAIK) suggests that many people will never need it. > You're kidding, right? Particularly after arguing for aliasing semantics in the previous paragraph for .flat ;-) > > Also what exactly is the confused person's notion of the purpose of ``x = > a.view`` supposed to be? That ``x = a`` is what ``x = a.copy()`` > really does > and that to create aliases an alias to ``a`` they would have to use > ``x = a.view``? In that case they'd better read the python > tutorial before they do > any more python programming, because they are in for all kinds of > unpleasant > surprises (``a = []; b = a; b[1] = 3; print a`` -- oops). > This is basically true, though the confusion may be that a.view is an array object that has different slicing behavior instead of an non-array object that can be sliced to produce a view. I don't view it as a major issue but I do see how may mistakenly infer that. Perry From tim.hochberg at ieee.org Fri Jun 14 09:14:05 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Fri Jun 14 09:14:05 2002 Subject: [Numpy-discussion] copy on demand References: Message-ID: <007601c213be$6dc61fd0$061a6244@cx781526b> <"Perry Greenfield" writes> [SNIP] > Numarray already returns a view of the array if it is contiguous. > Copies are only produced if it is non-contiguous. I assume that > is the behavior you are asking for? This is one horrible aspect of NumPy that I hope you get rid of. I've been burned by this several times -- I expected a view, but silently got a copy because my array was noncontiguous. If you go with copy semantics, this will go away, if you go with view semantics, this should raise an exception instead of silently copying. Ditto with reshape, etc. 
In my experience, this is a source of hard to find bugs (as opposed to axes issues which tend to produce shallow bugs). [SNIP] > Currently for numarray .flat will fail if it isn't contiguous. It isn't > clear if this should change. If .flat is meant to be a view always, then > it should always fail it the array is not contiguous. Ravel is not > guaranteed to be a view. Ravel should either always return a view or always return a copy -- I don't care which > This is a problematic issue if we decide to switch from view to copy > semantics. If slices produce copies, then does .flat? If so, then > how does one produce a flattened view? x.view.flat? Wouldn't that just produce a copy of the view? Unless you did some weird special casing on view? The following would work, although it's a little clunky. flat_x = x.view[:] # Or however "get me a view" would be spelled. flat_x.shape = (-1,) -tim From hinsen at cnrs-orleans.fr Fri Jun 14 10:52:02 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Jun 14 10:52:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > I didn't mean to imply it wasn't possible, but that it was not > quite as clean. The thing I don't like about this approach (or > Paul's suggestion of a.sub) is the creation of an odd object > that has as its only purpose being sliced. (Even worse, in my Not necessarily. We could decide that array.view is a view of the full array object, and that slicing views returns subviews. > opinion, is making it a different kind of array where slicing > behaves differently. That will lead to the problem we have > discussed for other kinds of array behavior, namely, how do A view could be a different type of object, even though much of the implementation would be shared with arrays. This would help to reduce confusion. > behavior). That could lead to confusion as well. Many may be > under the impression that x = a.view makes x refer to an array > when it doesn't. Users would need to know that a.view without > a '[' is usually an error. Why? It would be a full-size view, which might actually be useful in many situations. My main objection to changing the slicing behaviour is, like with some other proposed changes, compatibility. Even though view behaviour is not required by every NumPy program, there are people out there who use it and finding the locations in the code that need to be changed is a very tricky business. It may keep programmers from switching to Numarray in spite of benefits elsewhere. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From perry at stsci.edu Fri Jun 14 12:19:05 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jun 14 12:19:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: : : > > > I didn't mean to imply it wasn't possible, but that it was not > > quite as clean. The thing I don't like about this approach (or > > Paul's suggestion of a.sub) is the creation of an odd object > > that has as its only purpose being sliced. (Even worse, in my > > Not necessarily. 
We could decide that > > array.view > > is a view of the full array object, and that slicing views returns > subviews. > > > opinion, is making it a different kind of array where slicing > > behaves differently. That will lead to the problem we have > > discussed for other kinds of array behavior, namely, how do > > A view could be a different type of object, even though much of the > implementation would be shared with arrays. This would help to > reduce confusion. > I'd be strongly against this. This has the same problem that other customized array objects have (whether regarding slicing behavior, operators, coercion...). In particular, it is clear which kind it is when you create it, but you may pass it to a module that presumes different array behavior. Having different kind of arrays floating around just seems like an invitation for confusion. I'm very much in favor of picking one or the other behaviors and then making some means of explicitly getting the other behavior. > > behavior). That could lead to confusion as well. Many may be > > under the impression that x = a.view makes x refer to an array > > when it doesn't. Users would need to know that a.view without > > a '[' is usually an error. > > Why? It would be a full-size view, which might actually be useful > in many situations. > But one can do that simply by x = a (Though there is the issue that one could do the following which is not the same: x = a.view x.shape = (2,50) so that x is a full array view with a different shape than a) ******** I understand the backward compatibilty issue here, but it is clear that this is an issue that appears to be impossible to get a consensus on. There appear to be significant factions that care passionately about copy vs view and no matter what decision is made many will be unhappy. Perry From jjl at pobox.com Fri Jun 14 12:22:04 2002 From: jjl at pobox.com (John J. Lee) Date: Fri Jun 14 12:22:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: On 14 Jun 2002, Alexander Schmolck wrote: [...] > The fact that you can never really be sure whether you can actually use > ``.flat``, without checking beforehand if the array is in fact > contiguous (I don't think there are many guarantees about something > being contiguous, or are there?) and that ravel will always work but has > a huge overhead, suggests to me that something is not quite right. Why does ravel have a huge overhead? It seems it already doesn't copy unless required: search for 'Chacking' -- including the mis-spelling -- in this thread: http://groups.google.com/groups?hl=en&lr=&threadm=abjbfp%241t9%241%40news5.svr.pol.co.uk&rnum=1&prev=/groups%3Fq%3Diterating%2Bover%2Bthe%2Bcells%2Bgroup:comp.lang.python%26hl%3Den%26lr%3D%26scoring%3Dr%26selm%3Dabjbfp%25241t9%25241%2540news5.svr.pol.co.uk%26rnum%3D1 or start up your Python interpreter, if you're less lazy than me. John From Chris.Barker at noaa.gov Fri Jun 14 16:20:03 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jun 14 16:20:03 2002 Subject: [Numpy-discussion] copy on demand References: Message-ID: <3D0A75D9.4AF344B3@noaa.gov> Konrad Hinsen wrote: > Not necessarily. We could decide that > > array.view > > is a view of the full array object, and that slicing views returns > subviews. Please don't!! Having two types of arrays around in a single program that have the same behaviour except when they are sliced is begging for confusion and hard to find bugs. 
I agree with Perry, that I occasionaly use the view behaviour of slicing, and it is very usefull when I do, but most of the time I would be happier with copy symantics. All I want is a way to get at a view of part of an array, I don't want two different kinds of array around with different slicing behaviour. > My main objection to changing the slicing behaviour is, like with some > other proposed changes, compatibility. The switch from Numeric to Numarray is a substantial change. I think we should view it like the mythical Py3k: an oportunity to make incompatible changes that will really make it better. By the way, as an old MATLAB user, I have to say that being able to get views from a slice is one behaviour of NumPy that I really appreciate, even though I only need it occasionally. MATLAB, howver is a whole different ball of wax in a lot of ways. There has been a lot of discussion about the copy on demand idea in MATLAB, but that is primarily useful because MATLAB has call by value function semantics, so without copy on demand, you would be making copies of large arrays passed to functions that weren't even going to change them. I don't think MATLAB impliments copy on demand for slices anyway, but I could be wrong there. Oh, and no function (ie ravel() ) should return a view in some cases, and a copy in others, that is just asking for bugs! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ransom at physics.mcgill.ca Fri Jun 14 16:27:01 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Fri Jun 14 16:27:01 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <3D0A75D9.4AF344B3@noaa.gov> References: <3D0A75D9.4AF344B3@noaa.gov> Message-ID: I was going to write an almost identical email, but Chris saved me the trouble. These are my feelings as well. Scott On June 14, 2002 07:01 pm, Chris Barker wrote: > Konrad Hinsen wrote: > > Not necessarily. We could decide that > > > > array.view > > > > is a view of the full array object, and that slicing views returns > > subviews. > > Please don't!! Having two types of arrays around in a single program > that have the same behaviour except when they are sliced is begging for > confusion and hard to find bugs. > > I agree with Perry, that I occasionaly use the view behaviour of > slicing, and it is very usefull when I do, but most of the time I would > be happier with copy symantics. All I want is a way to get at a view of > part of an array, I don't want two different kinds of array around with > different slicing behaviour. > > > My main objection to changing the slicing behaviour is, like with some > > other proposed changes, compatibility. > > The switch from Numeric to Numarray is a substantial change. I think we > should view it like the mythical Py3k: an oportunity to make > incompatible changes that will really make it better. > > By the way, as an old MATLAB user, I have to say that being able to get > views from a slice is one behaviour of NumPy that I really appreciate, > even though I only need it occasionally. MATLAB, howver is a whole > different ball of wax in a lot of ways. 
There has been a lot of > discussion about the copy on demand idea in MATLAB, but that is > primarily useful because MATLAB has call by value function semantics, so > without copy on demand, you would be making copies of large arrays > passed to functions that weren't even going to change them. I don't > think MATLAB implements copy on demand for slices anyway, but I could be > wrong there. > > Oh, and no function (i.e. ravel()) should return a view in some cases, > and a copy in others, that is just asking for bugs! > > -Chris -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From hinsen at cnrs-orleans.fr Sat Jun 15 01:56:03 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sat Jun 15 01:56:03 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> > I'd be strongly against this. This has the same problem that other > customized array objects have (whether regarding slicing behavior, > operators, coercion...). In particular, it is clear which kind it > is when you create it, but you may pass it to a module that > presumes different array behavior. Having different kinds of arrays We already have that situation with lists and arrays (and in much of my code netCDF arrays, which have copy semantics), but in my experience this has never caused confusion. Most general code working on sequences doesn't modify elements at all. When it does, it either clearly requires view semantics (a function you call in order to modify (parts of) an array) or clearly requires copy semantics (a function that uses an array argument as an initial value that it then modifies). > floating around just seems like an invitation for confusion. I'm > very much in favor of picking one or the other behavior and then > making some means of explicitly getting the other behavior. Then the only solution I see is the current one: default behaviour is view, and when you want a copy you copy explicitly. The inverse is not possible: once you have made a copy you can't make it behave like a view anymore. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From ransom at physics.mcgill.ca Sat Jun 15 06:13:05 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Sat Jun 15 06:13:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> Message-ID: <20020615131238.GB7948@spock.physics.mcgill.ca> On Sat, Jun 15, 2002 at 10:53:17AM +0200, Konrad Hinsen wrote: > > floating around just seems like an invitation for confusion. I'm > > very much in favor of picking one or the other behavior and then > > making some means of explicitly getting the other behavior. > > Then the only solution I see is the current one: default behaviour is > view, and when you want a copy you copy explicitly. The inverse is not > possible: once you have made a copy you can't make it behave like a view > anymore.
I don't think it is necessary to create the other object _from_ the default one. You could have copy behavior be the default, and if you want a view of some array you simply request one explicitly with .view, .sub, or whatever. Since creating a view is "cheap" compared to creating a copy, there is nothing sacrificed doing things in this manner. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From victor1977 at fazter.com Sat Jun 15 20:17:02 2002 From: victor1977 at fazter.com (victor ichaka nabia) Date: Sat Jun 15 20:17:02 2002 Subject: [Numpy-discussion] Personal Message-ID: Dear Sir, I am the Chairman Contract Review Committee of National Electric Power Authority (NEPA). Although this proposal might come to you as a surprise since it is coming from someone you do not know or ever seen before, but after due deliberation with my colleagues, I decided to contact you based onIntuition. We are soliciting for your humble and confidential assistance to take custody of Seventy One Million, Five Hundred Thousand United StatesDollars.{US$71,500,000.00}. This sum (US$71.5M) is an over invoiced contract sum which is currently in offshore payment account of the Central Bank of Nigeria as an unclaimed contract entitlement which can easily be withdrawn or drafted or paid to any recommended beneficiary by my committee. On this note, you will be presented as a contractor to NEPA who has executed a contract to a tune of the above sum and has not been paid. Proposed Sharing Partern (%): 1. 70% for me and my colleagues. 2. 20% for you as a partner/fronting for us. 3. 10% for expenses that may be incure by both parties during the cause of this transacton. Our law prohibits a civil servant from operating a foreign account, hence we are contacting you. If this proposal satisfies you, do response as soon as possible with the following information: 1. The name you wish to use as the beneficiary of thefund. 2. Your Confidential Phone and Fax Numbers. Further discussion will be centered on how the fund shall be transferred and full details on how to accomplish this great opportunity of ours. Thank you and God bless. Best regards, victor ichaka nabia From a.schmolck at gmx.net Sun Jun 16 15:59:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Sun Jun 16 15:59:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > : > : > > > I guess that depends on what you mean by unnecessary copies. > > > > In most cases the array of which I desire a flattened representation is > > contiguous (plus, I usually don't intend to modify it). > > Consequently, in most > > cases I don't want to any copies of it to be created (especially > > not if it is > > really large -- which is not seldom the case). > > > Numarray already returns a view of the array if it is contiguous. > Copies are only produced if it is non-contiguous. I assume that > is the behavior you are asking for? Not at all -- in fact I was rather shocked when my attention was drawn to the fact that this is also the behavior of Numeric -- I had thought that ravel would *always* create a copy. 
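The behaviour in question is easy to demonstrate; a short session (standard Numeric, and per Perry the same holds for numarray) showing ravel() returning a view for a contiguous argument and a copy for a non-contiguous one:

>>> from Numeric import *
>>> a = zeros((3, 4))
>>> a.iscontiguous()
1
>>> r = ravel(a)        # contiguous argument: r is a view onto a
>>> r[0] = 7
>>> a[0, 0]
7
>>> t = transpose(a)    # transposing makes a non-contiguous array
>>> t.iscontiguous()
0
>>> r2 = ravel(t)       # non-contiguous argument: r2 is a fresh copy
>>> r2[0] = 99
>>> t[0, 0]             # t is unaffected
7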
I absolutely agree with the other posters that remarked that different behavior of ravel (creating a copy vs creating a view, depending on whether the argument is contiguous) is highly undesirable and error-prone (especially since it is not even possible to determine at compile time which behavior will occur, if I'm not mistaken). In fact, I think this behavior is worse than what I incorrectly assumed to be the case. What I was arguing for is a ravel that always has the same semantics (namely creating a copy) but that -- because it would create the copy only on demand -- would be just as efficient as using .flat when a) its argument were contiguous; and b) neither the result nor the argument were modified while both are alive. The reason that I view `.flat` as a hack is that it is an operation that is there exclusively for efficiency reasons and has no well defined semantics -- it will only work stochastically, giving better performance in certain cases. Thus you have to cast lots on whether you actually use it at runtime (calling .iscontiguous()) and always have a fall-back scheme (most likely using ravel) at hand -- there seems to be no way to determine at compile time what's going to happen. I don't think a language or a library should have any such constructs or at least strive to minimize their number. The fact that the current behavior of ravel actually achieves the effect I want in most cases doesn't justify its obscure behavior in my eyes, which translates into a variation of the boiler-plate code previously mentioned (``if a.iscontiguous():...else:``) when you actually want a *single* ravelled copy and it also is a very likely candidate for extremely hard-to-find bugs. One nice thing about python is that there is very little undefined behavior. I'd like to keep it that way. [snipped] > > I personally don't find it messy. And please keep in mind that > > the ``view`` > > construct would only very seldom be used if copy-on-demand is > > the default > > -- as I said, I've only needed the aliasing behavior once -- no > > doubt it was > > really handy then, but the fact that e.g. matlab doesn't have > > anything along > > those lines (AFAIK) suggests that many people will never need it. > > > You're kidding, right? Particularly after arguing for aliasing > semantics in the previous paragraph for .flat ;-) I didn't argue for any semantics of ``.flat`` -- I just pointed out that I found the division of labour that I (incorrectly) assumed to be the case an ugly hack (for the reasons outlined above): ``ravel``: always works, but always creates a copy (which might be undesirable wastage of resources); [this was mistaken; the real semantics are: always works, creates view if contiguous, copy otherwise] ``.flat``: behavior undefined at compile time, a runtime check can be used to ensure that it can be used as a more efficient alternative to ``ravel`` in some cases. If I now understand the behavior of both ``ravel`` and ``.flat`` correctly then I can't currently see *any* raison d'être for a ``.flat`` attribute. If, as I would hope, the behavior of ravel is changed to always create copies (ideally on-demand), then matters might look different. In that case, it might be justifiable to have ``.flat`` as a specialized construct analogous to what I proposed as ``.view``, but only if there is some way to make it work (the same) for both contiguous and non-contiguous arrays. I'm not sure that it would be needed at all (especially with a lazy ravel).
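Written out, the boiler-plate being objected to looks something like the following sketch (the helper name is made up for illustration):

from Numeric import *

def flat_no_copy(a):
    # Sketch of the runtime check described above: use the cheap .flat
    # where it is legal, and fall back to the (possibly copying) ravel()
    # otherwise.  There is no way to know at "compile time" which branch
    # will be taken.
    if a.iscontiguous():
        return a.flat       # a view; only valid for contiguous arrays
    else:
        return ravel(a)     # works for any array, but copies here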
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From hinsen at cnrs-orleans.fr Mon Jun 17 01:46:08 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 17 01:46:08 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: (message from Alexander Schmolck on 17 Jun 2002 00:30:19 +0100) References: Message-ID: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> > Konrad Hinsen writes: > > [did you mean this to be off-list? If not, please just forward it to the > list.] No, I sent the mail to the list as well, but one out of three mails I send to the list never arrives there on the first try... In this case, the copy sent to myself got lost as well, so I don't have any copy left, sorry. > > > > > > I don't know about the others out there, but I have 30000 lines of > > published Python code plus a lot of unpublished code (scripts), all of > > which use NumPy arrays almost everywhere. There are also a few places > > where views are created intentionally, which are then passed around to > > other code and can end up anywhere. The time required to update all > > that to new slicing semantics would be enormous, and I don't see how I > > could justify it to myself or to my employer. I'd also have to stop > > advertising Python as a time-efficient development tool. > > I sympathize with this view. However, I think the solution to this problem > should be a compatibility wrapper rather than a design compromise. > > There are at least 2 reasons why: > > 1. Numarray has quite a few incompatibilities to Numeric anyway, so even > without this change you'd be forced to rewrite all or most of those scripts The question is how much effort it is to update code. If it is easy, most people will do it sooner or later. If it is difficult, they won't. And that will lead to a split in the user community, which I think is highly detrimental to the further development of NumPy and Numarray. A compatibility wrapper won't change this. Assume that I have tons of code that I can't update because it's too much effort. Instead I use the compatibility wrapper. When I add a line or a function to that code, it will of course stick to the old conventions. When I add a new module, I will also prefer the old conventions, for consistency. And other people working with the code will pick up the old conventions as well. At the same time, other people will use the new conventions. There will be two parts of the community that cannot easily read each other's code. So unless we can reach a consensus that will guarantee that 90% of existing code will be adapted to the new interfaces, there will be a split. > (or use the wrapper), but none of the incompatibilities I'm currently aware > of would, in my eyes, buy one as much as introducing copy-indexing > semantics would. So if things get broken anyway, one might as well take I agree, but it also comes at the highest cost. There is absolutely no way to identify automatically the code that needs to be adapted, and there is no run-time error message in case of failure - just a wrong result. None of the other proposed changes is as risky as this one. > this step (especially since intentional views are, on the whole, used > rather sparingly -- although tracking down these uses in retrospect might > admittedly be unpleasant). It is not merely unpleasant, the cost is simply prohibitive. > 2. Numarray is supposed to be incorporated into the core.
Compromising the > consistency of core python (and code that depends on it) is in my eyes > worse than compromising code written for Numeric. I don't see view behaviour as inconsistent with Python. Python has one mutable sequence type, the list, with copy behaviour. One type is hardly enough to establish a rule. > As a third reason I could claim that there is some hope of a much more > widespread adoption of Numeric/numarray as an alternative to matlab etc. in > the next couple of years, so that it might be wise to fix things now, but I'd > understand if you'd remain unimpressed by that :) I'd like to see any supporting evidence. I think this argument is based on the reasoning "I would prefer it to be this way, so many others would certainly also prefer it, so they would start using NumPy if only these changes were made." This is not how decision processes work in real life. On the contrary, people might look at the history of NumPy and decide that it is too unreliable to base a serious project on - if they changed the interface once, they might do it again. This is a particularly important aspect in the OpenSource universe, where there are no contracts that promise anything. If you want people to use your code, you have to demonstrate that it is reliable, and that applies to both the code and the interfaces. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen at cnrs-orleans.fr Mon Jun 17 02:01:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 17 02:01:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <20020615131238.GB7948@spock.physics.mcgill.ca> (message from Scott Ransom on Sat, 15 Jun 2002 09:12:38 -0400) References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> Message-ID: <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> > > Then the only solution I see is the current one: default behaviour is > > view, and when you want a copy yoy copy explicitly. The inverse is not > > possible, once you made a copy you can't make it behave like a view > > anymore. > > I don't think it is necessary to create the other object _from_ > the default one. You could have copy behavior be the default, > and if you want a view of some array you simply request one > explicitly with .view, .sub, or whatever. Let's make this explicit. Given the following four expressions, 1) array 2) array[0] 3) array.view 4) array.view[0] what would the types of each of these objects be according to your proposal? What would the indexing behaviour of those types be? I don't see how you can avoid having either two types or two different behaviours within one type. Konrad. 
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From a.schmolck at gmx.net Mon Jun 17 08:12:03 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Mon Jun 17 08:12:03 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> References: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> Message-ID: Konrad Hinsen writes: [Konrad wants to keep alias-slicing behavior for backward-compatibility] > > I sympathize with this view. However, I think the solution to this problem > > should be a compatibility wrapper rather than a design compromise. > > > > There are at least 2 reasons why: > > > > 1. Numarray has quite a few incompatibilities to Numeric anyway, so even > > without this change you'd be forced to rewrite all or most of those scripts > > The question is how much effort it is to update code. If it is easy, > most people will do it sooner or later. If it is difficult, they won't. > And that will lead to a split in the user community, which I think > is highly detrimental to the further development of NumPy and Numarray. I agree that avoiding a split of the Numeric user community is a crucial issue and that efforts have to be taken to make the transition painless enough to happen (in most cases; maybe it needs to be even 90% or more as you say). > > A compatibility wrapper won't change this. Assume that I have tons of > code that I can't update because it's too much effort. Instead I use > the compatibility wrapper. When I add a line or a function to that > code, it will of course stick to the old conventions. When I add a new > module, I will also prefer the old conventions, for consistency. And > other people working with the code will pick up the old conventions as > well. At the same time, other people will use the new conventions. > There will be two parts of the community that cannot easily read each > other's code. I don't think the situation is quite so bleak. Yes, library code should be converted, and although a compatibility wrapper might be helpful in the process, I agree that it isn't a full solution for the reasons you cite above. But there is plenty of code that is mainly used internally and no longer changes (much), for which I think a compatibility wrapper is a fine solution (and might be preferable to conversion, even if it involves little effort). If I had some matlab (or C) code that fulfills similar criteria, I'd also rather wrap it somehow than convert it to Python. > > So unless we can reach a consensus that will guarantee that 90% of > existing code will be adapted to the new interfaces, there will be a > split. > > > (or use the wrapper), but none of the incompatibilities I'm currently aware > > of would, in my eyes, buy one as much as introducing copy-indexing > > semantics would. So if things get broken anyway, one might as well take > > I agree, but it also comes at the highest cost. There is absolutely no > way to identify automatically the code that needs to be adapted, and > there is no run-time error message in case of failure - just a wrong > result. None of the other proposed changes is as risky as this one.
Wouldn't an (almost) automatic solution be to simply replace (almost) all instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual cases (like if you heavily mix arrays and lists) you could still autoconvert by inserting ``if type(foo) == ArrayType:...``, although this would admittedly be rather messy. The unnecessary ``.view``s can be eliminated over time and even if they aren't, no one would have to learn or switch between two libraries. > > > this step (especially since intentional views are, on the whole, used > > rather sparingly -- although tracking down these uses in retrospect might > > admittedly be unpleasant). > > It is not merely unpleasant, the cost is simply prohibitive. See above. I personally hope that even without resorting to something like the above, converting my code to copy behavior wouldn't be too much of an effort, but my code-base is much smaller than yours and I can't currently recall more than one case of intended aliasing that would require a couple of changes, though my estimate might also prove quite wrong. I have no idea which scenario is typical. > > > 2. Numarray is supposed to be incorporated into the core. Compromising the > > consistency of core python (and code that depends on it) is in my eyes > > worse than compromising code written for Numeric. > > I don't see view behaviour as inconsistent with Python. Python has one > mutable sequence type, the list, with copy behaviour. One type is > hardly enough to establish a rule. Well, AFAIK there are actually three mutable sequence types in python core and all have copy-slicing behavior: list, UserList and array:

>>> import array
>>> aa = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> bb = aa[:]
>>> bb is aa
0

I would suppose that in the grand scheme of things numarray.array is intended as an eventual replacement for array.array, or not? Furthermore list is such a fundamental data type in python that I think it is actually enough to establish a rule (if the vast majority of 3rd-party modules' sequence types don't have the same semantics, I'd regard it as a strong argument for your position, but I haven't checked). > > > As a third reason I could claim that there is some hope of a much more > > widespread adoption of Numeric/numarray as an alternative to matlab etc. in > > the next couple of years, so that it might be wise to fix things now, but I'd > > understand if you'd remain unimpressed by that :) > > I'd like to see any supporting evidence. I think this argument is > based on the reasoning "I would prefer it to be this way, so many > others would certainly also prefer it, so they would start using NumPy > if only these changes were made." This is not how decision processes > work in real life. Sure, but I didn't try to imply this causality anyway :) My argument wasn't so much "let's make it really good (where good is what *I* say) then loads of people will adopt it", it was more: "Numeric has a good chance to grow considerably in popularity over the next years, so it will be much easier to fix things now than later" (for slicing behavior, now is likely to be the last chance). The fact that matlab users are used to copy-on-demand and the fact that many people (including you if I understand you correctly) think that copy-slicing semantics as such (without backward compatibility concerns) are preferable, might have a small influence on people's decision to adopt Numeric, but I perfectly agree that this influence will be minor compared to other issues.
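For concreteness, the kind of inserted check mentioned above might look like the following sketch (the helper name is made up, and the ``.view`` attribute is the construct under discussion, not an existing Numeric feature):

from Numeric import ArrayType

def compat_slice(obj, start, stop):
    # Hypothetical helper a converter could emit in place of
    # obj[start:stop] under copy-slicing semantics: preserve the old
    # view behaviour for arrays while leaving lists and other
    # sequences alone.
    if type(obj) == ArrayType:
        return obj.view[start:stop]   # assumed view-returning attribute
    return obj[start:stop]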
> > On the contrary, people might look at the history of NumPy and decide > that it is too unreliable to base a serious project on - if they > changed the interface once, they might do it again. This is a > particularly important aspect in the OpenSource universe, where there > are no contracts that promise anything. If you want people to use your I don't think matlab or similar alternatives make legally binding promises about backwards compatibility, or do they? I guess it is actually more difficult to *force* incompatible changes on people with an open source project than with commercial software, but I agree that splitting or lighthearted sacrifices of backwards compatibility are more of a temptation with open source, for one thing because there are usually fewer financial stakes involved for the authors. > code, you have to demonstrate that it is reliable, and that applies to > both the code and the interfaces. Yes, this is very important and I very much appreciate that you stress these and similar points in your postings. But reliability to me also includes the ability for growth -- I not only want my old code to work in a couple of years, I also want the tool I wrote it in to remain competitive and this can conflict with backwards-compatibility. I like the balance python strikes here so far -- the language has improved significantly (and in my eyes has remained superior to newer competitors such as ruby) but at the same time for me and most other people transitions between versions haven't caused too much trouble. This increases the value of my code-base to me: I can assume that it will still work (or be adapted without too much effort) in years to come and yet be written in an excellent language for the job. Striking this balance is however quite difficult (as can be seen by the heated discussions in c.l.p), so getting it right will most likely involve considerable effort (and controversy) within the Numeric community. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From ransom at physics.mcgill.ca Mon Jun 17 10:14:48 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 17 10:14:48 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> References: <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> Message-ID: On June 17, 2002 04:57 am, Konrad Hinsen wrote: > > > Then the only solution I see is the current one: default behaviour is > > > view, and when you want a copy you copy explicitly. The inverse is not > > > possible: once you have made a copy you can't make it behave like a view > > > anymore. > > > > I don't think it is necessary to create the other object _from_ > > the default one. You could have copy behavior be the default, > > and if you want a view of some array you simply request one > > explicitly with .view, .sub, or whatever. > > Let's make this explicit. Given the following four expressions, > > 1) array > 2) array[0] > 3) array.view > 4) array.view[0] > > what would the types of each of these objects be according to your > proposal? What would the indexing behaviour of those types be? > I don't see how you can avoid having either two types or two > different behaviours within one type.
If we assume that a slice returns a copy _always_, then I agree that #4 in your list above would not give a user what they would expect: array.view[0] would give the view of a copy of array[0], _not_ a view of array[0], which is probably what is wanted. I _think_ that this could be fixed by making view (or something similar) an option of the slice rather than a method of the object. For example (assuming that a is an array):

Expression:   Returns:            Slicing Behavior:
a or a[:]     Copy of all of a    Returns a copy of the sub-array
a[0]          Copy of a[0]        Returns a copy of the sub-array
a[:,view]     View of all of a    Returns a copy of the sub-array
a[0,view]     View of a[0]        Returns a copy of the sub-array

Notice that it is possible to return a copy of a sub-array from a view since you have access (through a pointer) to the original array data. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From fardal at uvastro.phys.uvic.ca Mon Jun 17 13:49:03 2002 From: fardal at uvastro.phys.uvic.ca (Mark Fardal) Date: Mon Jun 17 13:49:03 2002 Subject: [Numpy-discussion] Re: Personal Message-ID: <200206172047.g5HKlrw09617@mussel.phys.uvic.ca> Dear Numpy-Discussion, It is good to see that Numeric Python inspires such confidence in people all around the world, especially when subjected to due deliberation. I hope that this invoiced contract entitlement will not be set to zero once we obtain a view of it. I would like to propose a further elaboration of the Sharing Partern. Eric, Travis, Konrad, Scott, Paul, and Perry will each get 2% of the total based on their contributions to the mailing list traffic so far (I am blissfully ignorant of who has written actual code), and the rest of the 20% will go to the first individual to deliver a finished working Numarray. With copy semantics only, please. best regards, Mark Fardal > Dear Sir, > I am the Chairman Contract Review Committee of > National Electric Power Authority (NEPA). > Although this proposal might come to you as a surprise > since it is coming from someone you do not know or > ever seen before, but after due deliberation with my > colleagues, I decided to contact you based onIntuition. > We are soliciting for your humble and confidential > assistance to take custody of Seventy One Million, > Five Hundred Thousand United StatesDollars.{US$71,500,000.00}. > This sum (US$71.5M) is an over invoiced contract sum > which is currently in offshore payment account of the > Central Bank of Nigeria as an unclaimed contract > entitlement which can easily be withdrawn or drafted > or paid to any recommended beneficiary by my committee. > On this note, you will be presented as a contractor to > NEPA who has executed a contract to a tune of the > above sum and has not been paid. > Proposed Sharing Partern (%): > 1. 70% for me and my colleagues. > 2. 20% for you as a partner/fronting for us. > 3. 10% for expenses that may be incure by both parties > during the cause of this transacton. > Our law prohibits a civil servant from operating a > foreign account, hence we are contacting you. > If this proposal satisfies you, do response as soon as > possible with the following information: > 1. The name you wish to use as the beneficiary of thefund. > 2. Your Confidential Phone and Fax Numbers.
> Further discussion will be centered on how the fund > shall be transferred and full details on how to accomplish this great opportunity of ours. > Thank you and God bless. > > Best regards, > > victor ichaka nabia > From Chris.Barker at noaa.gov Mon Jun 17 15:49:04 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Mon Jun 17 15:49:04 2002 Subject: [Numpy-discussion] copy on demand References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> Message-ID: <3D0E634F.9B53A102@noaa.gov> Konrad Hinsen wrote: > Let's make this explicit. Given the following four expressions, > 1) array > 2) array[0] > 3) array.view > 4) array.view[0] I thought I had a clear idea of what I wanted here, which was the non-view stuff being the same as Python lists, but I discovered something: Python lists provide slices that are copies, but they are shallow copies, so nested lists, which are sort-of the equivalent of multidimensional arrays, act a lot like the view behavior of NumPy arrays:

make a "2-d" list:
>>> l = [[i, 1+5] for i in range(5)]
>>> l
[[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

make an array that is the same:
>>> a = array(l)
>>> a
array([[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]])

assign a new binding to the first element:
>>> b = a[0]
>>> m = l[0]

change something in it:
>>> b[0] = 30
>>> a
array([[30, 6], [ 1, 6], [ 2, 6], [ 3, 6], [ 4, 6]])

The first array is changed. Change something in the first element of the list:
>>> m[0] = 30
>>> l
[[30, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

The first list is changed too. Now try slices instead:
>>> b = a[2:4]

change an element in the slice:
>>> b[1,0] = 55
>>> a
array([[30, 6], [ 1, 6], [ 2, 6], [55, 6], [ 4, 6]])

The first array is changed. Now with the list:
>>> m = l[2:4]
>>> m
[[2, 6], [3, 6]]

This is a copy, but it is a shallow copy, so change an element:
>>> m[1][0] = 45
>>> l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list is changed, but:
>>> m[0] = [56,65]
>>> l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list doesn't change, whereas:
>>> b[0] = [56,65]
>>> a
array([[30, 6], [ 1, 6], [56, 65], [55, 6], [ 4, 6]])

The array does change. My conclusion is that nested lists and Arrays simply are different beasts so we can't expect complete compatibility. I'm also wondering why lists have that weird behavior of a single index returning a reference, and a slice returning a copy. Perhaps it has something to do with the auto-resizing of lists. That being said, I still like the idea of slices producing copies, so:

> 1) array
An Array like we have now, but slice-is-copy semantics.
> 2) array[0]
An Array of rank one less than array, sharing data with array
> 3) array.view
An object that can do nothing but create other Arrays that share data with array. I don't know if it is possible but I'd be just as happy if array.view returned None, and array.view[slice] returned an Array that shared data with array. Perhaps there is some other notation that could do this.
> 4) array.view[0]
Same as 2)

To add a few:

5) array[0:1]
An Array with a copy of the data in array[0]
6) array.view[0:1]
An Array sharing data with array

As I write this, I am starting to think that this is all a bit strange. Even though lists treat slices and indexes differently, perhaps Arrays should not. They really are different beasts. I also see why it was done the way it was in the first place! -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From a.schmolck at gmx.net Tue Jun 18 15:23:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Tue Jun 18 15:23:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <3D0E634F.9B53A102@noaa.gov> References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> <3D0E634F.9B53A102@noaa.gov> Message-ID: Chris Barker writes: > My conclusion is that nested lists and Arrays simply are different > beasts so we can't expect complete compatibility. I'm also wondering why > lists have that weird behavior of a single index returning a reference, > and a slice returning a copy. Perhaps it has something to do with the This is not weird at all. Slicing and single item indexing are different conceptually and what I think you have in mind wouldn't really work. Think of a real life container, like a box with subcompartments. Obviously you should be able to take out (or put in) an item from the box, which is what single indexing does (and the item may happen to be another box). My understanding is that you'd like the box to return copies of whatever was put into it on indexing, rather than the real thing -- this would not only be counterintuitive and inefficient, it also means that you could only put items with a __copy__ method in lists, which would rather limit their usefulness. Slicing on the other hand creates a whole new box but this box is filled with (references to) the same items (a behavior for which a real life equivalent is more difficult to find :) :

>>> l = ['foobar', 'barfoot']
>>> l2 = l[:]
>>> l2[0] is l[0]
1

Because l and l2 are different boxes, however, assigning new items to l doesn't change l2 and vice versa. It is true, however, that the situation is somewhat different for arrays, because "multidimensional" lists are just nested boxes, whereas multidimensional arrays have a different structure. array[1] indexes some part of itself according to its .shape (which can be modified, thus changing what array[1] indexes, without modifying the actual array contents in memory), whereas list[1] indexes some "real" object. This may mean that the best behavior for ``array[0]`` would be to return a copy and for ``array[:]`` etc. what would be a "deep copy" if it were nested lists. I think this is the behavior Paul Dubois' MA currently has. > auto-resizing of lists. That being said, I still like the idea of slices > producing copies, so: > > > 1) array > An Array like we have now, but slice-is-copy > semantics. > > > 2) array[0] > An Array of rank one less than array, sharing data with array > > > 3) array.view > An object that can do nothing but create other Arrays that share data > with array. I don't know if it is possible but I'd be just as happy if > array.view returned None, and array.view[slice] returned an Array that No it is not possible. > shared data with array. Perhaps there is some other notation that could > do this. > > > 4) array.view[0] > Same as 2) I can't see why single-item indexing views would be needed at all if ``array[0]`` doesn't copy as you suggest above. > > To add a few: > > 5) array[0:1] > An Array with a copy of the data in array[0] (I suppose you'd also want array[0:1] and array[0] to have different shape?)
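For reference, that shape difference in today's Numeric:

>>> from Numeric import *
>>> a = reshape(arange(6), (2, 3))
>>> a[0].shape       # single-item indexing drops a dimension
(3,)
>>> a[0:1].shape     # a length-one slice keeps the rank
(1, 3)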
> > 6) array.view[0:1] > An Array sharing data with array > > As I write this, I am starting to think that this is all a bit strange. > Even though lists treat slices and indexes differently, perhaps Arrays > should not. They really are different beasts. I also see why it was done Yes, arrays and lists are indeed different beasts and a different indexing behavior (creating copies) for arrays might well be preferable (since array indexing doesn't refer to "real" objects). > the way it was in the first place! > > -Chris alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From hinsen at cnrs-orleans.fr Thu Jun 20 09:30:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Jun 20 09:30:05 2002 Subject: [Numpy-discussion] copy on demand Message-ID: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> > Wouldn't an (almost) automatic solution be to simply replace (almost) all > instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual That would convert all slicing operations, even those working on strings, lists, and user-defined sequence-type objects. > cases (like if you heavily mix arrays and lists) you could still I do, and I don't consider it that unusual. Anyway, even if some function gets called only with array arguments, I don't see how a code analyzer could detect that. So it would be... > autoconvert by inserting ``if type(foo) == ArrayType:...``, although typechecks for every slicing or indexing operation (a[0] generates a view as well for a multidimensional array). Guaranteed to render most code unreadable, and of course slow down execution. A further challenge for your code convertor: f(a[0], b[2:3], c[-1, 1]) That makes eight type combination cases. > Well, AFAIK there are actually three mutable sequence types in > python core and all have copy-slicing behavior: list, UserList and > array: UserList is not an independent type, it is merely a subclassable wrapper around lists. As for the array module, I haven't seen any code that uses it. > I would suppose that in the grand scheme of things numarray.array is intended > as an eventual replacement for array.array, or not? In the interest of those who rely on the current array module, I hope not. > much "let's make it really good (where good is what *I* say) then loads of > people will adopt it", it was more: "Numeric has a good chance to grow > considerably in popularity over the next years, so it will be much easier to > fix things now than later" (for slicing behavior, now is likely to be the last > chance). I agree - except that I think it is already too late. > The fact that matlab users are used to copy-on-demand and the fact that many > people (including you if I understand you correctly) think that copy-slicing > semantics as such (without backward compatibility concerns) are preferable, Yes, assuming that views are somehow available. But my preference is not so strong that I consider it a sufficient reason to break lots of code. View semantics is not a catastrophe. All of us continue to use NumPy in spite of it, and I suspect none of us loses any sleep over it. I have spent perhaps a few hours in total (over six years of using NumPy) to track down view-related bugs, which makes it a minor problem on my personal scale. > I don't think matlab or similar alternatives make legally binding promises > about backwards compatibility, or do they?
> I guess it is actually more

Of course not, software providers for the mass market take great care not to promise anything. But if Matlab did anything as drastic as what we are discussing, they would lose lots of paying customers. > But reliability to me also includes the ability for growth -- I not only want > my old code to work in a couple of years, I also want the tool I wrote it in > to remain competitive and this can conflict with backwards-compatibility. I In what way does the current slicing behaviour render your code non-competitive? > like the balance python strikes here so far -- the language has Me too. But there haven't been any incompatible changes in the documented core language, and only very few in the standard library (the to-be-abandoned re module comes to mind - anything else?). For a bad example, see the Python XML package(s). Lots of changes, incompatibilities between parsers, etc. The one decision I really regret is to have chosen an XML-based solution for documentation. Now I spend two days at every new release of my stuff to adapt the XML code to the fashion of the day. It is almost ironic that I appear here as the great anti-change advocate, since on many other occasions I have argued for improvement over excessive compatibility. Basically I favour motivated incompatible changes, but under the condition that updating of existing code is manageable. Changing the semantics of a type is about the worst I can imagine in this respect. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From magnus at hetland.org Fri Jun 21 04:38:03 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Fri Jun 21 04:38:03 2002 Subject: [Numpy-discussion] average Message-ID: <20020621133705.A15296@idi.ntnu.no> One quick question: Why does the MA module have an average function, but not Numeric? And what is the equivalent in numarray? -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From a.schmolck at gmx.net Fri Jun 21 16:42:01 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Fri Jun 21 16:42:01 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> Message-ID: [sorry for replying so late, an almost finished email got lost in a computer accident and I was rather busy.] Konrad Hinsen writes: > > Wouldn't an (almost) automatic solution be to simply replace (almost) all > > instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual > > That would convert all slicing operations, even those working on > strings, lists, and user-defined sequence-type objects. Well, that's where the "(almost)" comes in ;) If you can tell at a glance for most instances in your code whether the ``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that much trouble. Of course this might not be true. But the question really is: to what extent would it be more difficult to tell than what you need to find out already in all the other situations where code needs changing because of the incompatibilities numarray already introduces?
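As a rough illustration of the query replace in question -- a naive, purely textual sketch that cannot tell arrays from other sequences, handles only the simplest case, and would still need hand-checking:

import re

# Naive rewrite of "name[start:stop]" into "name.view[start:stop]".
# Purely illustrative: it does not handle nested subscripts or
# multi-dimensional slices, and the .view attribute is the proposed
# construct, not an existing Numeric feature.
slice_pat = re.compile(r"\b([A-Za-z_]\w*)\[([^][]*:[^][]*)\]")

def to_view_slices(source):
    return slice_pat.sub(r"\1.view[\2]", source)

print to_view_slices("b = a[1:n] + c[::2]")
# prints: b = a.view[1:n] + c.view[::2]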
(I think I have for example already found a slicing-incompatibility -- unfortunately the list of the issues I hit upon so far has disappeared somewhere, so I'll have to try to reconstruct it sometime...) If the answer is "not much", then you would have to regard these incompatibilities as even less acceptable than the introduction of copy-slicing semantics (because as you've already agreed, these incompatibilities don't confer the same benefit) or otherwise it would be difficult to see why copy-slicing shouldn't be introduced as well. View semantics have always bothered me, but if it weren't for the fact that numarray is going to cause me not inconsiderable inconvenience through various incompatibilities anyway, I would have been satisfied with the status quo. As things are, however, I must admit I feel a strong temptation to get this fixed as well, especially as most of the other laudable improvements of numarray wouldn't seem to be of great importance to me personally at the moment (much nicer C code base, better handling of byteswapped data and very large arrays etc.). So I fully admit to a selfish desire for either more gain or less pain (incompatibility) or maybe even a bit of both. Of course I don't think these subjective desires of mine are a good standard to go by, but I am convinced that offering attractive improvements or few compatibility problems (or both) to the widest possible audience of current Numeric users is important in order to replace Numeric, quickly and cleanly, without any splitting. > > > autoconvert by inserting ``if type(foo) == ArrayType:...``, although > > typechecks for every slicing or indexing operation (a[0] generates a > view as well for a multidimensional array). Guaranteed to render most > code unreadable, and of course slow down execution. > > A further challenge for your code convertor: > > f(a[0], b[2:3], c[-1, 1]) > > That makes eight type combination cases. I'd say 4 (since c[-1,1] can't be a list) but that is beside the point. This was mainly intended as a demonstration that you *can* do it automatically, if you really need to. A function call would help the readability but obviously be even more inefficient. If I really had large amounts of code that needed that conversion, I'd be tempted to write such a function with an additional twist: have it monitor the input argument type whenever the program is run and if it isn't an array, the wrapping in this particular line can be discarded (with less confidence, if it always seems to be an array it could be converted into ``a.view[b:c]``, but that might need additional checking). In code that isn't reached, the wrapper just stays forever. I've always been looking for an excuse to write some self-modifying code :) > > > Well, AFAIK there are actually three mutable sequence types in > > python core and all have copy-slicing behavior: list, UserList and > > array: > > UserList is not an independent type, it is merely a subclassable > wrapper around lists. As for the array module, I haven't seen any code > that uses it. It is AFAIK the only way to work efficiently with large strings, so I guess it is important, although I agree that it is not that often used. > > > I would suppose that in the grand scheme of things numarray.array is intended > > as an eventual replacement for array.array, or not?
> > In the interest of those who rely on the current array module, I hope not. As long as array is kept around for backwards-compatibility, why not? [...] > > But reliability to me also includes the ability for growth -- I not only want > > my old code to work in a couple of years, I also want the tool I wrote it in > > to remain competitive and this can conflict with backwards-compatibility. I > > In what way does the current slicing behaviour render your code > non-competitive? A single design decision obviously doesn't have such an immediate huge negative impact that it immediately renders all your code non-competitive; unless it was a *really* bad design decision, it just means more bugs and less clear and general code. But language warts are more like tumours: they grow over the years and become increasingly difficult to excise (just look at the tremendous redesign effort the perl people are going through at the moment). The closer warts come to the core language the worse, and since numarray aims for inclusion I think it must be held to a higher standard than other modules that don't. > > > like the balance python strikes here so far -- the language has > > Me too. But there haven't been any incompatible changes in the > documented core language, and only very few in the standard library > (the to-be-abandoned re module comes to mind - anything else?). I don't think this is true (and the documented core language is not necessarily a good standard to go by as far as python is concerned, because not quite everything one has to rely upon is actually documented (instead one can find things like: "XXX Can't be bothered to spell this out right now...")). Among the incompatible changes that I would strongly assume *were* documented before and after are: exceptions (strings -> classes), automatic conversion of ints to longs (instead of an exception) and the new division rules whose stepwise introduction has already started. There are also quite a few things that used to work for all classes, but that now no longer work with new-style classes, some of which can be quite annoying (you lose quite a bit of introspective and interactive power), but I'm not sure to which extent they were documented. > > For a bad example, see the Python XML package(s). Lots of changes, > incompatibilities between parsers, etc. The one decision I really > regret is to have chosen an XML-based solution for documentation. Now > I spend two days at every new release of my stuff to adapt the XML > code to the fashion of the day. I didn't do much xml processing, but as far as I can remember I was happy with 4suite: http://4suite.org/index.xhtml. > > It is almost ironic that I appear here as the great anti-change > advocate, since on many other occasions I have argued for improvement > over excessive compatibility. Basically I favour motivated incompatible I don't think a particularly conservative character is necessary to fill that role :) You've got a big code base, which automatically reduces the desire for incompatibilities because you have to pay a hefty cost that is difficult to offset by potential advantages for future code. But that side of the argument is clearly important and I think even if you don't like to be an anti-change advocate you still often make valuable points against changes you perceive as uncalled for.
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From hinsen at cnrs-orleans.fr Sun Jun 23 01:24:02 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sun Jun 23 01:24:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: (message from Alexander Schmolck on 22 Jun 2002 00:41:13 +0100) References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> Message-ID: <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> > If you can tell at a glance for most instances in your code whether the ``foo`` > in ``foo[a:b]`` is an array, then running a query replace isn't that much How could I? Moreover, even if I could, that's not enough. I need a program to spot those places for me, as I won't go through 30000 lines of code by hand. > trouble. Of course this might not be true. But the question really > is: to what extent would it be more difficult to tell than what you > need to find out already in all the other situations where code > needs changing because of the incompatibilities numarray already What are those? In general, changes related to NumPy functions or attributes of array objects are relatively easy to deal with, as one can use a text editor to search for the name and thereby capture most locations (not all though). Changes related to generic operations that many other types share are the worst. > If the answer is "not much", then you would have to regard these I am not aware of any other incompatibility in the "worst" category. If there is one, I will probably never use Numarray. > > A further challenge for your code convertor: > > > > f(a[0], b[2:3], c[-1, 1]) > > > > That makes eight type combination cases. > > I'd say 4 (since c[-1,1] can't be a list) but that is beside the point. This c[-1,1] can't be a list, but it needn't be an array. Any class can implement multiple-dimension indexing. My netCDF array objects do, for example. > be even more inefficient. If I really had large amounts of code that needed > that conversion, I'd be tempted to write such a function with an additional > twist: have it monitor the input argument type whenever the program is run and I have large amounts of code that would need conversion. However, it is code that I and about 100 other users rely on for their daily work, so it won't be the subject of empirical fixing of any kind. Either there will be an automatic procedure that is guaranteed to keep the code working, or there won't be any update. > just means more bugs and less clear and general code. But language > warts are more like tumours: they grow over the years and become > increasingly difficult to excise (just look at the tremendous redesign I don't see any evidence for this in NumPy. > now...")). Among the incompatible changes that I would strongly assume *were* > documented before and after are: exceptions (strings -> classes), automatic String exceptions still work. I am not aware of any code that was broken by the fact that the standard exceptions are now classes. > conversion of ints to longs (instead of an exception) and the new division > rules whose stepwise introduction has already started. There are also quite a The division rules are the only case of serious incompatibilities I know of, and I am in fact against them, although I agree that the proposed new rules are much better. On the other hand, the proposed transition procedure provides much more help for updating code than we would get from Numarray.
Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From magnus at hetland.org Mon Jun 24 06:56:04 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Jun 24 06:56:04 2002 Subject: [Numpy-discussion] (K-Mean) Clustering Message-ID: <20020624155508.A15028@idi.ntnu.no> Hi! I've been looking for an implementation of k-means clustering in Python, and haven't really found anything I could use... I believe there is one in SciPy, but I'd rather keep the required number of packages as low as possible (already using Numeric/numarray), and Orange seems a bit hard to install in UNIX... So, I've fiddled with using Numeric/numarray for the purpose. Has anyone else done something like this (or some other clustering algorithm for that matter)? The approach I've been using (but am not completely finished with) is to use a two-dimensional multiarray for the data (i.e. a "set" of vectors) and a one-dimensional array with a cluster assignment for each vector. E.g.

>>> data[42]
array([1, 2, 3, 4, 5])
>>> cluster[42]
10
>>> reps[10]
array([1, 2, 4, 5, 4])

Here reps[10] is the representative of cluster 10. Using argmin it should be relatively easy to assign each vector to the cluster with the closest representative (using sum((x-y)**2) as the distance measure), but how do I calculate the new representatives effectively? (The representative of a cluster, e.g., 10, should be the average of all vectors currently assigned to that cluster.) I could always use a loop and then compress() the data based on cluster number, but I'm looking for a way of calculating all the averages "simultaneously", to avoid using a Python loop... I'm sure there's a simple solution -- I just haven't been able to think of it yet. Any ideas? -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From Aureli.Soria_Frisch at ipk.fhg.de Mon Jun 24 11:12:08 2002 From: Aureli.Soria_Frisch at ipk.fhg.de (Aureli Soria Frisch) Date: Mon Jun 24 11:12:08 2002 Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle In-Reply-To: <20020621133705.A15296@idi.ntnu.no> References: <20020621133705.A15296@idi.ntnu.no> Message-ID: Hi all, I am trying to run a numerical computation (with arrays) on different computers simultaneously (in parallel). The computation is done under Linux. For that purpose a master organizes the process and sends rexec (remote execute) commands to the different slaves via the python command spawnlp. The slaves execute the script specified through rexec. Inside this script the slaves open a file with the arguments of the process, which were serialized via pickle, then perform the numerical computation, and write the result (a NumPy array) again via pickle in a file. This file is opened by the master, which uses the different results. I am having the problem that the master sometimes (the problem does not happen always!!!) opens the result and loads an object of <type 'instance'> instead of the expected object of <type 'array'> (which then produces an error). I have tested the type of the objects in the slaves and it is always 'array'. Has anyone had similar experiences with pickling arrays?
Could it be a problem of the different computers running versions of Python from 2.0 to 2.2.1? Or a problem of different versions of NumPy? Is there any other way of doing such a parallel computation? Thanks for the time... Regards, Aureli -- ################################# Aureli Soria Frisch Fraunhofer IPK Dept. Pattern Recognition post: Pascalstr. 8-9, 10587 Berlin, Germany e-mail: aureli at ipk.fhg.de fon: +49 30 39006-143 fax: +49 30 3917517 web: http://vision.fhg.de/~aureli/web-aureli_en.html #################################

From tchur at optushome.com.au Mon Jun 24 12:15:03 2002 From: tchur at optushome.com.au (Tim Churches) Date: Mon Jun 24 12:15:03 2002 Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle References: <20020621133705.A15296@idi.ntnu.no> Message-ID: <3D176B1A.B7F546FC@optushome.com.au>

Aureli Soria Frisch wrote: > > Hi all, > > I am trying to run a numerical computation (with arrays) on > different computers simultaneously (in parallel). The computation is > done under Linux. > > For that purpose a master organizes the process and sends rexec > (remote execute) commands to the different slaves via the python > command spawnlp. The slaves execute the script specified through > rexec. > > Inside this script the slaves open a file with the arguments of the > process, which were serialized via pickle, then perform the numerical > computation, and write the result (a NumPy array) again via pickle to > a file. This file is opened by the master, which uses the different > results. > > I am having the problem that the master sometimes (the problem does > not happen every time!!!) opens the result and loads an object of > <type 'instance'> instead of the expected object of <type 'array'> > (which then produces an error). I have tested the type of the objects in the > slaves and it is always 'array'. > > Has anyone had similar experiences 'pickling' arrays? Could it > be a problem of the different computers running versions of Python > from 2.0 to 2.2.1? Or a problem of different versions of NumPy? > > Is there any other way of doing such a parallel computation?

I am not sure what is causing the unpickling problem you are seeing, but I suggest that you consider MPI for what you are doing. There are a number of Python MPI interfaces around, but I can personally recommend PyPar by Ole Nielsen at the Australian National University. You can use PyPar with LAM/MPI, which runs in user mode and is very easy to install, and PyPar itself does not require any modifications to the Python interpreter. PyPar will automatically serialise Python objects for you (and deserialise them at the destination) but also has methods to send NumPy arrays directly, which is very efficient. See http://datamining.anu.edu.au/~ole/pypar/ for more details. Tim C

From a.schmolck at gmx.net Mon Jun 24 12:28:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Mon Jun 24 12:28:04 2002 Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle In-Reply-To: References: <20020621133705.A15296@idi.ntnu.no> Message-ID:

Aureli Soria Frisch writes: > Has anyone had similar experiences 'pickling' arrays? Could it be a > problem of the different computers running versions of Python from 2.0 to > 2.2.1? Or a problem of different versions of NumPy? Yes -- pickling isn't meant to work across different python versions (it might to some extent, but I wouldn't try it unless there is no way around it).
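One way to reduce that version sensitivity is to pickle only builtin objects and rebuild the array on arrival. A sketch using Numeric's tostring/fromstring -- the helper names dump_array/load_array are made up, and this still assumes both machines share a byte order:

import pickle
from Numeric import fromstring

def dump_array(a, f):
    # typecode, shape and raw bytes are plain builtin objects, so the
    # pickle no longer depends on how Numeric's array class pickles itself
    pickle.dump((a.typecode(), a.shape, a.tostring()), f, 1)

def load_array(f):
    typecode, shape, data = pickle.load(f)
    a = fromstring(data, typecode)
    a.shape = shape
    return a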
Using netcdf as a data format instead of pickling might also be a solution (if intermediate storage on the disk is not too inefficient, but your original approach involved that anyway). Konrad Hinsen has written a nice wrapper for python that is quite easy to use: http://starship.python.net/crew/hinsen/scientific.html. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/

From ransom at physics.mcgill.ca Mon Jun 24 17:06:11 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 24 17:06:11 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> Message-ID: <20020625000529.GA20926@spock.physics.mcgill.ca>

Hi Konrad, On Sun, Jun 23, 2002 at 10:20:35AM +0200, Konrad Hinsen wrote: > > be even more inefficient. If I really had large amounts of code that needed > > that conversion, I'd be tempted to write such a function with an additional > > twist: have it monitor the input argument type whenever the program is run and > > I have large amounts of code that would need conversion. However, it > is code that myself and about 100 other users rely on for their daily > work, so it won't be the subject of empirical fixing of any kind. > Either there will be an automatic procedure that is guaranteed to keep > the code working, or there won't be any update.

I think you are painting an overly bleak picture -- and one that is certainly more black and white than reality. I am one of those 100 users and I would (will) certainly go through the code that I use on a daily basis (and the other code that I use less frequently) -- just as I have every time there is an update to the Python core or your code. Hell, some of those 30000 lines of "your" code are actually _my_ code. And out of those 100 other users, I'd be willing to bet a beer or three that at least a couple would help to track down incompatibilities as well. Many (perhaps even most) of the problems will be able to be spotted by simply running the test codes provided with the individual modules. By generously releasing your code, you have made it possible for your code to become part of my -- and many others' -- "standard library". And it is a part that I don't want to get rid of. I truly hope that this incompatibility (i.e. copy vs view) and the time that it will take to update older code will not cause many potentially beneficial (or at least requested) features/changes to be dropped. Scott

-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From Janne.Sinkkonen at hut.fi Tue Jun 25 05:04:04 2002 From: Janne.Sinkkonen at hut.fi (Janne Sinkkonen) Date: Tue Jun 25 05:04:04 2002 Subject: [Numpy-discussion] (K-Mean) Clustering In-Reply-To: Magnus Lie Hetland's message of "Mon, 24 Jun 2002 15:55:08 +0200" References: <20020624155508.A15028@idi.ntnu.no> Message-ID: <2b7kkno99g.fsf@james.hut.fi>

> Using argmin it should be relatively easy to assign each vector to the > cluster with the closest representative (using sum((x-y)**2) as the > distance measure), but how do I calculate the new representatives > effectively?
> (The representative of a cluster, e.g., 10, should be the > average of all vectors currently assigned to that cluster.) I could > always use a loop and then compress() the data based on cluster > number, but I'm looking for a way of calculating all the averages > "simultaneously", to avoid using a Python loop... I'm sure there's a > simple solution -- I just haven't been able to think of it yet. Any > ideas?

Maybe this helps (old code, may contain some suboptimal or otherwise weird things):

from Numeric import *
from RandomArray import randint
import sys

def squared_distances(X,Y):
    return add.outer(sum(X*X,-1),sum(Y*Y,-1)) - 2*dot(X,transpose(Y))

def kmeans(data, M, wegstein=0.2, r_convergence=0.001, epsilon=0.001, debug=0, minit=20):
    """Computes kmeans for DATA with M centers until convergence in the
    sense that relative change of the quantization error is less than the
    optional RCONV (3rd param).

    WEGSTEIN (2nd param), by default .2 but always between 0 and 1,
    stabilizes the convergence process. EPSILON is used to guarantee
    centers are initially all different. DEBUG causes some intermediate
    output to appear to stderr.

    Returns centers and the average (squared) quantization error.
    """
    N,D=data.shape
    # Selecting the initial centers has to be done carefully.
    # We have to ensure all of them are different, otherwise the
    # algorithm below will produce empty classes.
    centers=[]
    if debug: sys.stderr.write("kmeans: Picking centers.\n")
    while len(centers)<M:
        candidate=data[randint(0,N)]
        if len(centers)>0:
            d=minimum.reduce(squared_distances(array(centers), candidate))
        else:
            d=2*epsilon
        if d>epsilon:
            centers.append(candidate)
    if debug: sys.stderr.write("kmeans: Iterating.\n")
    centers=array(centers)
    qerror,old_qerror,counter=None,None,0
    while (counter<minit or abs(old_qerror-qerror)/qerror>r_convergence):
        # Initialize
        # Not like this, you get doubles: centers=take(data,randint(0,N,(M,)))
        # Iterate:
        # Squared distances from data to centers (all pairs)
        distances=squared_distances(data,centers)
        # Matrix telling which data item is closest to which center
        x=equal.outer(argmin(distances),
                      arange(centers.shape[0])).astype(Float32)
        # Compute new centers
        centers=(wegstein*(dot(transpose(x),data)/sum(x)[...,NewAxis])
                 + (1.0-wegstein)*centers)
        # Quantization error
        old_qerror=qerror
        qerror=sum(minimum.reduce(distances,1))/N
        counter=counter+1
        if debug:
            try:
                sys.stderr.write("%f %f %i\n" %(qerror,old_qerror,counter))
            except TypeError:
                sys.stderr.write("%f None %i\n" %(qerror,counter))
    return centers, qerror

-- Janne

From magnus at hetland.org Tue Jun 25 06:30:04 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Jun 25 06:30:04 2002 Subject: [Numpy-discussion] (K-Mean) Clustering In-Reply-To: <2b7kkno99g.fsf@james.hut.fi>; from Janne.Sinkkonen@hut.fi on Tue, Jun 25, 2002 at 03:03:39PM +0300 References: <20020624155508.A15028@idi.ntnu.no> <2b7kkno99g.fsf@james.hut.fi> Message-ID: <20020625152918.C1200@idi.ntnu.no>

Janne Sinkkonen : > [snip] > > Maybe this helps (old code, may contain some suboptimal or otherwise > weird things): Thanks :) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org

From hinsen at cnrs-orleans.fr Tue Jun 25 06:43:04 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 25 06:43:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <20020625000529.GA20926@spock.physics.mcgill.ca> (message from Scott Ransom on Mon, 24 Jun 2002 20:05:29 -0400) References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> <20020625000529.GA20926@spock.physics.mcgill.ca> Message-ID:
<200206251339.g5PDdkH04049@chinon.cnrs-orleans.fr>

> that is certainly more black and white than reality. I am one > of those 100 users and I would (will) certainly go through the > code that I use on a daily basis (and the other code that I use

I certainly appreciate any help, but this is not just a matter of amount of time, but also of risk, the risk of introducing bugs. The package that you are using, Scientific Python, is the lesser of my worries, as the individual parts are very independent. My other package, MMTK, is not only bigger, but also consists of many tightly coupled modules. Moreover, I am not aware of any user except for myself who knows the code well enough to be able to work on such an update project. Finally, this is not just my personal problem, there is lots of NumPy code out there, publicly released or not, whose developers would face the same difficulties. Konrad.

-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From travis at enthought.com Tue Jun 25 12:25:07 2002 From: travis at enthought.com (Travis N. Vaught) Date: Tue Jun 25 12:25:07 2002 Subject: [Numpy-discussion] [ANN] SciPy '02 - Python for Scientific Computing Workshop Message-ID:

---------------------------------------- Python for Scientific Computing Workshop ---------------------------------------- CalTech, Pasadena, CA September 5-6, 2002 http://www.scipy.org/site_content/scipy02

This workshop provides a unique opportunity to learn and affect what is happening in the realm of scientific computing with Python. Attendees will have the opportunity to review the available tools and how they apply to specific problems. By providing a forum for developers to share their Python expertise with the wider industrial, academic, and research communities, this workshop will foster collaboration and facilitate the sharing of software components, techniques and a vision for high level language use in scientific computing. The two-day workshop will be a mix of invited talks and training sessions in the morning. The afternoons will be breakout sessions with the intent of standardizing tools and interfaces. The cost of the workshop is $50.00 and includes 2 breakfasts and 2 lunches on Sept. 5th and 6th, one dinner on Sept. 5th, and snacks during breaks. There is a limit of 50 attendees. Should we exceed the limit of 50 registrants, the 50 persons selected to attend will be invited individually by the organizers. Discussion about the conference may be directed to the SciPy-user mailing list: mailto:scipy-user at scipy.org http://www.scipy.org/MailList

------------- Co-Hosted By: -------------

The National Biomedical Computation Resource (NBCR, SDSC, San Diego, CA) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ http://nbcr.sdsc.edu The mission of the National Biomedical Computation Resource at the San Diego Supercomputer Center is to conduct, catalyze, and enable biomedical research by harnessing advanced computational technology.
The Center for Advanced Computing Research (CACR, CalTech, Pasadena, CA) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ http://www.cacr.caltech.edu CACR is dedicated to the pursuit of excellence in the field of high-performance computing, communication, and data engineering. Major activities include carrying out large-scale scientific and engineering applications on parallel supercomputers and coordinating collaborative research projects on high-speed network technologies, distributed computing and database methodologies, and related topics. Our goal is to help further the state of the art in scientific computing.

Enthought, Inc. (Austin, TX) ^^^^^^^^^^^^^^^ http://enthought.com Enthought, Inc. provides business and scientific computing solutions through software development, consulting and training. Enthought also fosters the development of SciPy (http://scipy.org), an open source library of scientific tools for Python.

From magnus at hetland.org Tue Jun 25 14:01:03 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Jun 25 14:01:03 2002 Subject: [Numpy-discussion] Rephrasing the question... Message-ID: <20020625230038.A26576@idi.ntnu.no>

Thanks for the input on k-means clustering, but the main question was actually this... If I have the following: for i in xrange(k): w[i] = average(compress(C == i, V, 0)) ... can that be expressed without the Python for loop? (I.e. without using compress etc.) I want w[i] to be the average of the vectors in V[x] for which C[x] == i...

-- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org

From frankpit at erols.com Wed Jun 26 05:06:05 2002 From: frankpit at erols.com (Bernard Frankpitt) Date: Wed Jun 26 05:06:05 2002 Subject: [Numpy-discussion] Copy/View data point References: Message-ID: <3D19BE44.2060001@erols.com>

My preference would be Copy semantics for a=b View semantics for a=b.view (or some other explicit syntax) Bernie

From a.schmolck at gmx.net Wed Jun 26 06:30:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 26 06:30:04 2002 Subject: [Numpy-discussion] Copy/View data point In-Reply-To: <3D19BE44.2060001@erols.com> References: <3D19BE44.2060001@erols.com> Message-ID:

Bernard Frankpitt writes: > My preference would be > > Copy semantics for a=b > View semantics for a=b.view (or some other explicit syntax) Although I have been arguing for copy semantics for a=b[c:d], what you want is not really possible (a=b creates and always will create an alias in python -- and this is really a good design decision; just compare it to other languages that do different things depending on what you are assigning). alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/

From e.maryniak at pobox.com Wed Jun 26 09:34:04 2002 From: e.maryniak at pobox.com (Eric Maryniak) Date: Wed Jun 26 09:34:04 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) Message-ID: <200206261833.29702.e.maryniak@pobox.com>

Dear crunchers, Please excuse me for dropping a feature request here as I'm new to the list and don't have the 'feel' of this list yet. Should feature requests be submitted to the bug tracker? Anyways, I installed Numarray on a SuSE/Linux box, following the Numarray PDF manual's directions. Having installed Python packages (like, ehm, Numeric) before, here are a few impressions:

1.
When running 'python setup.py' and 'python setup.py --help' I was surprised to see that source generation already took place:

Using EXTRA_COMPILE_ARGS = [] generating new version of Src/_convmodule.c ... generating new version of Src/_ufuncComplex64module.c

Normally, you would expect that at build/install time.

2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like:

/usr/local/bin/python ./setup.py install --prefix=/usr/local

3. After installation, I usually test the success of a library's import by looking at version info (especially with multiple installations, see [2]). However, numarray does not seem to have version info?:

# python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.version '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' >>> sys.version_info (2, 2, 1, 'final', 0) >>> import Numeric >>> Numeric.__version__ '21.3' >>> import numarray >>> numarray.__version__ Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'module' object has no attribute '__version__' >>> numarray.version Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'module' object has no attribute 'version'

The __doc__ string: 'numarray: The big enchilada numeric module\n\n $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' does not seem to give a hint at the version (in this case 0.3.4), either. Well, enough nitpicking for now I guess. Thanks to the Numarray developers for this project, it's much appreciated. Bye-bye, Eric

-- Eric Maryniak WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. An error in the premise will appear in the conclusion.

From perry at stsci.edu Wed Jun 26 10:30:12 2002 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jun 26 10:30:12 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) In-Reply-To: <200206261833.29702.e.maryniak@pobox.com> Message-ID:

Hi Eric, Todd Miller should answer these but he is away for a few days.

> > 1. When running 'python setup.py' and 'python setup.py --help' > I was surprised to see that source generation already > took place: > > Using EXTRA_COMPILE_ARGS = [] > generating new version of Src/_convmodule.c > ... > generating new version of Src/_ufuncComplex64module.c > > Normally, you would expect that at build/install time. >

Yes, it looks like it does the code generation regardless of the option. We should change that.

> 2. Because I'm running two versions of Python (because Zope > and a lot of Zope/C products depend on a particular version) > the 'development' Python is installed in /usr/local/bin > (whereas SuSE's python is in /usr/bin). > It probably wouldn't do any harm if the manual would include > a hint at the '--prefix' option and mention an alternative > Python installation like: > > /usr/local/bin/python ./setup.py install --prefix=/usr/local >

Good idea.

> 3. After installation, I usually test the success of a library's > import by looking at version info (especially with multiple > installations, see [2]). However, numarray does not seem to > have version info?
: > > # python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) > [GCC 2.95.3 20010315 (SuSE)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import sys > >>> sys.version > '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' > >>> sys.version_info > (2, 2, 1, 'final', 0) > > >>> import Numeric > >>> Numeric.__version__ > '21.3' > > >>> import numarray > >>> numarray.__version__ > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'module' object has no attribute '__version__' > >>> numarray.version > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'module' object has no attribute 'version' > > The __doc__ string: > 'numarray: The big enchilada numeric module\n\n > $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' does not seem to give a hint at the version (in this case 0.3.4), either. > Well, I remember putting this on the to do list and thought it had been done, but obviously not. I'm sure Todd will take care of these. Thanks very much for the feedback. Perry

From e.maryniak at pobox.com Wed Jun 26 11:48:01 2002 From: e.maryniak at pobox.com (Eric Maryniak) Date: Wed Jun 26 11:48:01 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) In-Reply-To: References: Message-ID: <200206262047.00731.e.maryniak@pobox.com>

Hello Perry, On Wednesday 26 June 2002 19:29, Perry Greenfield wrote: > ... > > 2. Because I'm running two versions of Python (because Zope > > and a lot of Zope/C products depend on a particular version) > > the 'development' Python is installed in /usr/local/bin > > (whereas SuSE's python is in /usr/bin). > > It probably wouldn't do any harm if the manual would include > > a hint at the '--prefix' option and mention an alternative > > Python installation like: > > > > /usr/local/bin/python ./setup.py install --prefix=/usr/local > > Good idea.

And perhaps another suggestion: no mention is made of the 'setupall.py' script... and setup.py does _not_ install the LinearAlgebra2 (including our favorite SVD ;-), Convolve, RandomArray2 and FFT2 packages. I successfully installed them with: python ./setupall.py install

Other minor notes:

#1: No FFT2.pth file is generated (the others are ok). It should just include the string 'FFT2'.

#2: While RandomArray2 etc. nicely stay away from a concurrently imported Numeric.RandomArray, shouldn't Convolve, for orthogonality, be named Convolve2? (cuz who knows, numarray's Convolve may be backported to Numeric in the future, for comparative testing etc.). Of course in the end, when numarray is to replace Numeric, the '2' could be dropped altogether (breaking some programs then ;-)

#3: LinearAlgebra2, RandomArray2 and Convolve have empty __doc__'s. FFT and these 3 have no __version__ attributes, either (like numarray itself, too). Module sys uses a tuple 'version_info': >>> sys.version_info (2, 2, 1, 'final', 0) allowing fine-grained version testing and e.g. conditional importing etc. based on that. This may be a good idea for numarray, where interfaces may change and you could thus allow your code to support multiple (or rather, evolving) versions of numarray. Btw: imho __versioninfo__ or just __version__ would be a better standard attribute (for all modules) allowing a standard way of testing for major/minor version number, e.g. if __version__[0] >= 2: etc() Ideally, numarray's sub-packages' numbers would be in sync with that of numarray itself.
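As an illustration of the difference, a version_info-style tuple compares directly, while a version string must be parsed first (a sketch; Numeric.__version__ is '21.3' here, and as noted above numarray currently offers neither attribute):

import sys
import Numeric

# a tuple like sys.version_info supports direct comparison:
if sys.version_info >= (2, 2):
    pass    # e.g. safe to use 2.2 features

# a string like Numeric.__version__ has to be parsed first:
numeric_version = tuple(map(int, Numeric.__version__.split('.')))
if numeric_version >= (21, 0):
    pass    # e.g. safe to rely on post-21.0 fixes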
Numeric's __version__ is a string, which is not so handy, either.

#4: It is very helpful that there are a large number of self-tests of the packages, together with expected values. E.g.: Average of 10000 chi squared random numbers with 11 degrees of freedom (should be about 11 ): 11.0404176623 Variance of those random numbers (should be about 22 ): 21.6517761217 Skewness of those random numbers (should be about 0.852802865422 ): 0.718573002875 But sometimes you wonder (e.g. 0.85 / 0.71) if deviations are not too serious. Perhaps a 95% interval or std. dev. could be added?

> >... > Thanks very much for the feedback. > > Perry

You're welcome, they're just minor things one notices in the beginning and tends to ignore later; please say so if this kind of feedback should be postponed for later. Bye-bye, Eric

-- Eric Maryniak WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. Puzzle: what's another word for synonym?

From frankpit at erols.com Wed Jun 26 17:51:03 2002 From: frankpit at erols.com (Bernard Frankpitt) Date: Wed Jun 26 17:51:03 2002 Subject: [Numpy-discussion] Copy/View data point References: <3D19BE44.2060001@erols.com> Message-ID: <3D1A718E.6060300@erols.com>

Bernard Frankpitt writes: >> My preference would be >> >> Copy semantics for a=b >> View semantics for a=b.view (or some other explicit syntax) > > And Alexander Schmolck Replies: > Although I have been arguing for copy semantics for a=b[c:d], what > you want is > not really possible (a=b creates and always will create an alias in > python -- Yes, you are right. In my haste I left out the slice notation. Bernie

From ndavis at spacedata.net Thu Jun 27 14:08:03 2002 From: ndavis at spacedata.net (Norman Davis) Date: Thu Jun 27 14:08:03 2002 Subject: [Numpy-discussion] How are non-contiguous arrays created? Message-ID: <5.1.0.14.0.20020627140032.030b16d0@spacedata.net>

Hi All, In the "Copy on demand" discussion, the differences between ravel and flat were discussed with regards to contiguous/non-contiguous arrays. I want to experiment, but after looking/researching I can't figure it out: How is a non-contiguous array created? Thanks. Norman Davis Space Data Corporation

From Chris.Barker at noaa.gov Thu Jun 27 15:07:04 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jun 27 15:07:04 2002 Subject: [Numpy-discussion] How are non-contiguous arrays created? References: <5.1.0.14.0.20020627140032.030b16d0@spacedata.net> Message-ID: <3D1B8371.49905EA2@noaa.gov>

Norman Davis wrote: > How is a > non-contiguous array created? By slicing an array. Since slicing creates a "view" into the same data, it may not represent a contiguous portion of memory. Example: >>> from Numeric import * >>> a = ones((3,4)) >>> a array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]) >>> a.iscontiguous() 1 # a newly created array will always be contiguous >>> b = a[1:3,:] >>> b.iscontiguous() 1 # sliced this way, you get a contiguous array >>> c = a[:,1:3] >>> c.iscontiguous() 0 # but sliced another way you don't -Chris

-- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From jmiller at stsci.edu Sun Jun 30 06:24:03 2002 From: jmiller at stsci.edu (Todd Miller) Date: Sun Jun 30 06:24:03 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) References: Message-ID: <3D1F0839.2090802@stsci.edu>

Perry Greenfield wrote: >Hi Eric, > >Todd Miller should answer these but he is away for a few days. > >>1. When running 'python setup.py' and 'python setup.py --help' >> I was surprised to see that source generation already >> took place: >> >>Using EXTRA_COMPILE_ARGS = [] >>generating new version of Src/_convmodule.c >>... >>generating new version of Src/_ufuncComplex64module.c >> >> Normally, you would expect that at build/install time. >> >Yes, it looks like it does the code generation regardless of >the option. We should change that.

I'll clean this up.

>>2. Because I'm running two versions of Python (because Zope >> and a lot of Zope/C products depend on a particular version) >> the 'development' Python is installed in /usr/local/bin >> (whereas SuSE's python is in /usr/bin). >> It probably wouldn't do any harm if the manual would include >> a hint at the '--prefix' option and mention an alternative >> Python installation like: >> >> /usr/local/bin/python ./setup.py install --prefix=/usr/local >> >Good idea.

I'm actually surprised that this is necessary. I was under the impression that the distutils pick reasonable defaults simply based on the python that is running. In your case, I would expect numarray to install to /usr/local/lib/pythonX.Y/site-packages without specifying any prefix. What happens on SuSE?

>>3. After installation, I usually test the success of a library's >> import by looking at version info (especially with multiple >> installations, see [2]). However, numarray does not seem to >> have version info? : >> >># python >>Python 2.2.1 (#1, Jun 25 2002, 20:45:02) >>[GCC 2.95.3 20010315 (SuSE)] on linux2 >>Type "help", "copyright", "credits" or "license" for more information. >> >>>>>import sys >>>>>sys.version >>>>> >>'2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' >> >>>>>sys.version_info >>>>> >>(2, 2, 1, 'final', 0) >> >>>>>import Numeric >>>>>Numeric.__version__ >>>>> >>'21.3'

In numarray, this is spelled: >>> import numinclude >>> numinclude.version '0.3.4' I'll add __version__ to numarray as a synonym.

>>>>>import numarray >>>>>numarray.__version__ >>>>> >>Traceback (most recent call last): >> File "<stdin>", line 1, in ? >>AttributeError: 'module' object has no attribute '__version__' >> >>>>>numarray.version >>>>> >>Traceback (most recent call last): >> File "<stdin>", line 1, in ? >>AttributeError: 'module' object has no attribute 'version' >> >> The __doc__ string: >> 'numarray: The big enchilada numeric module\n\n >> $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' >> does not seem to give a hint at the version (in this case 0.3.4), either. >> >Well, I remember putting this on the to do list and thought it >had been done, but obviously not. I'm sure Todd will take care >of these. > >Thanks very much for the feedback. > >Perry

Thanks again, Todd
20-22, Keystone, CO http://www.jabberconf.com/osdn >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From eric at enthought.com Sat Jun 1 13:20:42 2002 From: eric at enthought.com (eric) Date: Sat Jun 1 13:20:42 2002 Subject: [Numpy-discussion] bug in negative stride indexing for empty arrays Message-ID: <020101c209a0$5d2bbcc0$6b01a8c0@ericlaptop> Hi, I just ran across a situation where reversing an empty array using a negative stride populates it with a new element. I'm betting this isn't the intended behavior. An example code snippet is below. eric C:\home\ej\wrk\chaco>python Python 2.1.3 (#35, Apr 8 2002, 17:47:50) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> import Numeric >>> Numeric.__version__ '21.0' >>> a = array(()) >>> a zeros((0,), 'l') >>> len(a) 0 >>> b = a[::-1] >>> len(b) 1 >>> b array([0]) -- Eric Jones Enthought, Inc. [www.enthought.com and www.scipy.org] (512) 536-1057 From eric at enthought.com Sat Jun 1 13:48:55 2002 From: eric at enthought.com (eric) Date: Sat Jun 1 13:48:55 2002 Subject: [Numpy-discussion] Bug: extremely misleading array behavior References: Message-ID: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop> ----- Original Message ----- From: "Konrad Hinsen" To: "Pearu Peterson" Cc: Sent: Wednesday, May 29, 2002 4:08 AM Subject: Re: [Numpy-discussion] Bug: extremely misleading array behavior > Pearu Peterson writes: > > > an array with 0 rank. It seems that the Numeric documentation is missing > > (though, I didn't look too hard) the following rules of thumb: > > > > If `a' is rank 1 array, then a[i] is Python scalar or object. [MISSING] > > Or rather: > > - If `a' is rank 1 array with elements of type Int, Float, or Complex, > then a[i] is Python scalar or object. [MISSING] > > - If `a' is rank 1 array with elements of type Int16, Int32, Float32, or > Complex32, then a[i] is a rank 0 array. [MISSING] > > - If `a' is rank > 1 array, then a[i] is a sub-array a[i,...] > > The rank-0 arrays are the #1 question topic for users of my netCDF > interface (for portability reasons, netCDF integer arrays map to > Int32, not Int, so scalar integers read from a netCDF array are always > rank-0 arrays), and almost everybody initially claims that it's a bug, > so some education seems necessary. I don't think education is the answer here. We need to change Numeric to have uniform behavior across all typecodes. Having alternative behaviors for indexing based on the typecode can lead to very difficult to find bugs. Generic routines meant to work with any Numeric type can brake a year later when someone passes in an array with a seemingly compatible type. Also, because coersion can silently change typecodes during arithmetic operations, code written expecting one behavior can all the sudden exihibit the other. That is very dangerous and hard to test. eric From jake at edge2.net Mon Jun 3 07:26:02 2002 From: jake at edge2.net (Jake Edge) Date: Mon Jun 3 07:26:02 2002 Subject: [Numpy-discussion] no 3 arg multiply in MA? Message-ID: <20020603082021.A30335@magpie> I was converting a program written for Numeric to use masked arrays and I ran into a problem with multiply ... it would appear that there is no 3 argument version for MA? i.e. 
a = array([1, 2, 3]) multiply(a,a,a) works fine to square the array using Numeric, but i get an exception: TypeError: __call__() takes exactly 3 arguments (4 given) when doing it using MA ... it seems clear that that is the problem, is it an oversight or just as yet unimplemented or am I missing something? thanks! jake From jake at edge2.net Mon Jun 3 07:45:16 2002 From: jake at edge2.net (Jake Edge) Date: Mon Jun 3 07:45:16 2002 Subject: [Numpy-discussion] some casting oddness? Message-ID: <20020603084002.A30375@magpie> I am using both MA and Numeric in a program that I am writing and ran into some typecasting oddness (at least I thought it was odd). When using only Numeric, adding an array of typecode 'l' and one of typecode '1' produces an array of typecode 'l' whereas using an MA derived array of typecode '1' added to a Numeric array of typecode 'l' produces an array of typecode '1'. Sorry if that is a bit dense, the upshot is that mixing the two causes the output to be the _smaller_ of the two types (Int8 == '1') rather than the larger (Int == 'l') as I would expect ... below is some code that reproduces the problem (it may look contrived (and is), but it comes from the guts of some code I have been playing with): #!/usr/bin/env python from Numeric import * import MA a = zeros((10,)) print a.typecode() b = MA.ones((10,),Int8) b = MA.masked_where(MA.equal(b,1),b,0) print b.typecode() print b.mask().typecode() z = ones((10,),Int8) print z.typecode() c = add(a,b.mask()) print c.typecode() d = add(a,z) print d.typecode() I get output like: l 1 1 1 1 l any thoughts? thanks! jake From hinsen at cnrs-orleans.fr Mon Jun 3 09:34:04 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 3 09:34:04 2002 Subject: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop> References: <021e01c209a4$2efa7e50$6b01a8c0@ericlaptop> Message-ID: <200206031630.g53GUKj13666@chinon.cnrs-orleans.fr> > I don't think education is the answer here. We need to change > Numeric to have uniform behavior across all typecodes. I agree that this would be the better solution. But until this is done... > Having alternative behaviors for indexing based on the typecode can > lead to very difficult to find bugs. Generic routines meant to work The differences are not that important, in most circumstances rank-0 arrays and scalars behave in the same way. The problems occur mostly with code that does explicit type checking. The best solution, in my opinion, is to provide scalar objects corresponding to low-precision ints and floats, as part of NumPy. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From paul at pfdubois.com Mon Jun 3 09:56:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 3 09:56:01 2002 Subject: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <200206031630.g53GUKj13666@chinon.cnrs-orleans.fr> Message-ID: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY> Konrad said: > > The best solution, in my opinion, is to provide scalar > objects corresponding to low-precision ints and floats, as > part of NumPy. > > Konrad. 
One of the thoughts I had in mind for the "kinds" proposal was to support this. I was going to do the float32 object as part of it as a demo of how it would work. So I got out the float object from Python, figuring I would just change a few types et voila. Not. It is very hard to understand, and I don't even understand the reasons it is hard to understand. Perhaps a young person with a high tolerance for pain would look at this? From oliphant.travis at ieee.org Mon Jun 3 13:02:04 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Jun 3 13:02:04 2002 Subject: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY> References: <000101c20b1f$4f8df3f0$0c01a8c0@NICKLEBY> Message-ID: <1023134522.2758.2.camel@travis> On Mon, 2002-06-03 at 10:54, Paul F Dubois wrote: > > Konrad said: > > > > The best solution, in my opinion, is to provide scalar > > objects corresponding to low-precision ints and floats, as > > part of NumPy. > > > > Konrad. > This seems like a good idea. It's been an old source of confusion. On a related note, how does the community feel about retrofitting Numeric with unsigned shorts and unsigned ints. I've got the code to do it already written. -Travis From oliphant.travis at ieee.org Mon Jun 3 23:40:01 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Jun 3 23:40:01 2002 Subject: [Numpy-discussion] Unsigned shorts and ints Message-ID: <1023172789.21778.8.camel@travis> I would like to update the Numeric CVS tree to include support for unsigned shorts and ints. Making the transition will cause some difficulty with binary extensions as these will need to be recompiled with the new Numeric. As a result, I propose that a new release of Numeric be posted (to include the recent bug fixes), and then the changes made for inclusion in the next version number of Numeric. Comments? -Travis From perry at stsci.edu Tue Jun 4 14:18:05 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 4 14:18:05 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior Message-ID: > > I don't think education is the answer here. We need to change > > Numeric to have uniform behavior across all typecodes. > > I agree that this would be the better solution. But until this is > done... > > > Having alternative behaviors for indexing based on the typecode can > > lead to very difficult to find bugs. Generic routines meant to work > > The differences are not that important, in most circumstances rank-0 > arrays and scalars behave in the same way. The problems occur mostly > with code that does explicit type checking. > > The best solution, in my opinion, is to provide scalar objects > corresponding to low-precision ints and floats, as part of NumPy. > There is another approach that I think is more sensible. >From what I can tell, the driving force behind rank-0 arrays as scalars are the Numeric coercion rules. One needs to retain the 'lesser' integer and float types so that operations with these psuedo-scalars and other arrays does not coerce arrays to a higher type than would have been done when using the nearest equivalent of Python scalars (if there is some other reason, I'd like to know). For example if a and b are Int16 1-d arrays, if indexing an element out of them produced a Python integer value then a[0]*b becomes an Int32 (or even Int64 on some platforms?) array. Numarray has different coercion rules so that this doesn't happen. Thus one doesn't need c[1,1] to give a rank-0 array. 
(Eric Jones has pointed out privately that another reason is to use different error handling, but if I'm not mistaken so long as one can group all calculations so that no scalar-scalar calculation is done, one doesn't really need rank-0 arrays other than in unusual circumstances.) So I'd argue that numarray solves this issue. For those that can't wait (because numarray currently lacks a feature, library, it's too slow on small arrays or whatever) and you really must modify Numeric I think you would be much better off changing the coercion rules and eliminating rank-0 arrays resulting from ordinary indexing rather than one of the other proposed changes (if that isn't too hard to implement). Of course you get into backward compatibility issues. But really, to get it right, some incompatibility is necessary if you want to eliminate this particular wart. Perry Greenfield From perry at stsci.edu Thu Jun 6 13:30:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 6 13:30:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: [I thought I replied yesterday, but somehow that apparently vanished.] : > "Perry Greenfield" writes: > > > Numarray has different coercion rules so that this doesn't > > happen. Thus one doesn't need c[1,1] to give a rank-0 array. > > What are those coercion rules? > For binary operations between a Python scalar and array, there is no coercion performed on the array type if the scalar is of the same kind as the array (but not same size or precision). For example (assuming ints happen to be 32 bit in this case) Python Int (Int32) * Int16 array --> Int16 array Python Float (Float64) * Float32 array --> Float32 array. But if the Python scalar is of a higher kind, e.g., Python float scalar with Int array, then the array is coerced to the corresponding type of the Python scalar. Python Float (Float64) * Int16 array --> Float64 array. Python Complex (Complex64) * Float32 array --> Complex64 array. Numarray basically has the same coercion rules as Numeric when two arrays are involved (there are some extra twists such as: UInt16 array * Int16 array --> Int32 array since neither input type is a proper subset of the other. (But since Numeric doesn't (or didn't until Travis changed that) have unsigned types, that wouldn't have been an issue with Numeric.) > > (if that isn't too hard to implement). Of course you get into > > backward compatibility issues. But really, to get it right, some > > incompatibility is necessary if you want to eliminate this particular > > wart. > > For a big change such as Numarray, I'd accept some incompatibilities. > For just a new version of NumPy, no. There is a lot of code out there > that uses NumPy, and I am sure that a good part of it relies on the > current coercion rules. Moreover, there is no simple way to detect > code that depends on coercion rules, so adapting existing code would > be an enormous amount of work. > Certainly. I didn't mean to minimize that. But the current coercion rules have produced a demand for solutions to the problem of upcasting, and I consider those solutions to be less than ideal (savespace and rank-0 arrays). If people really are troubled by these warts, I'm arguing that the real solution is in changing the coercion behavior. (Yes, it would be easiest to deal with if Python had all these types, but I think that will never happen, nor should it happen.) 
Perry From hinsen at cnrs-orleans.fr Fri Jun 7 09:01:47 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Jun 7 09:01:47 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: References: Message-ID: <200206071557.g57FvqY26621@chinon.cnrs-orleans.fr> > For binary operations between a Python scalar and array, there is > no coercion performed on the array type if the scalar is of the > same kind as the array (but not same size or precision). For example > (assuming ints happen to be 32 bit in this case) That solves one problem and creates another... Two, in fact. One is the inconsistency problem: Python type coercion always promotes "smaller" to "bigger" types, it would be good to make no exceptions from this rule. Besides, there are still situations in which types, ranks, and indexing operations depend on each other in a strange way. With a = array([1., 2.], Float) b = array([3., 4.], Float32) the result of a*b is of type Float, whereas a[0]*b is of type Float32 - if and only if a has rank 1. > (Yes, it would be easiest to deal with if Python had all these types, > but I think that will never happen, nor should it happen.) Python doesn't need to have them as standard types, an add-on package can provide them as well. NumPy seems like the obvious one. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From perry at stsci.edu Fri Jun 7 09:43:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jun 7 09:43:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <200206071557.g57FvqY26621@chinon.cnrs-orleans.fr> Message-ID: > > For binary operations between a Python scalar and array, there is > > no coercion performed on the array type if the scalar is of the > > same kind as the array (but not same size or precision). For example > > (assuming ints happen to be 32 bit in this case) > > That solves one problem and creates another... Two, in fact. One is > the inconsistency problem: Python type coercion always promotes > "smaller" to "bigger" types, it would be good to make no exceptions > from this rule. > > Besides, there are still situations in which types, ranks, and > indexing operations depend on each other in a strange way. With > > a = array([1., 2.], Float) > b = array([3., 4.], Float32) > > the result of > > a*b > > is of type Float, whereas > > a[0]*b > > is of type Float32 - if and only if a has rank 1. > All this is true. It really comes down to which poison you prefer. Neither choice is perfect. Changing the coercion rules results in the inconsistencies you mention. Not changing them results in the existing inconsistencies recently discussed (and still doesn't remove the difficulties of dealing with scalars in expressions without awkward constructs). We think the inconsistencies you point out are easier to live with than the existing behavior. It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible. 
Perry From hinsen at cnrs-orleans.fr Fri Jun 7 13:49:03 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Jun 7 13:49:03 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: (message from Perry Greenfield on Fri, 07 Jun 2002 12:40:40 -0400) References: Message-ID: <200206072046.g57KkJZ27511@chinon.cnrs-orleans.fr> > It would be nice to have a solution that had none of these > problems, but that doesn't appear to be possible. I still believe that the best solution is to define scalar data types corresponding to all array element types. As far as I can see, this doesn't have any of the disadvantages of the other solutions that have been proposed until now. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From perry at stsci.edu Fri Jun 7 14:42:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jun 7 14:42:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <200206072046.g57KkJZ27511@chinon.cnrs-orleans.fr> Message-ID: : > I still believe that the best solution is to define scalar data types > corresponding to all array element types. As far as I can see, this > doesn't have any of the disadvantages of the other solutions that > have been proposed until now. > If x was a Float32 array how would the following not be promoted to a Float64 array y = x + 1. If you are proposing something like y = x + Float32(1.) it would work, but it sure leads to some awkward expressions. Perry From hinsen at cnrs-orleans.fr Sat Jun 8 15:41:08 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sat Jun 8 15:41:08 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: (message from Perry Greenfield on Fri, 07 Jun 2002 17:42:53 -0400) References: Message-ID: <200206080757.g587vO428138@chinon.cnrs-orleans.fr> > If you are proposing something like > > y = x + Float32(1.) > > it would work, but it sure leads to some awkward expressions. Yes, that's what I am proposing. It's no worse than what we have now, and if writing Float32 a hundred times is too much effort, an abbreviation like f = Float32 helps a lot. Anyway, following the Python credo "explicit is better than implicit", I'd rather write explicit type conversions than have automagical ones surprise me. Finally, we can always lobby for inclusion of the new scalar types into the core interpreter, with a corresponding syntax for literals, but it would sure help if we could show that the system works and suffers only from the lack of literals. Konrad. 
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From oliphant.travis at ieee.org Sat Jun 8 18:56:02 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Jun 8 18:56:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <200206080757.g587vO428138@chinon.cnrs-orleans.fr> References: <200206080757.g587vO428138@chinon.cnrs-orleans.fr> Message-ID: <1023587755.13067.4.camel@travis> I did not receive any major objections, and so I have released a new Numeric (21.3) incorporating bug fixes. I also tagged the CVS tree with VERSION_21_3, and then I incorporated the unsigned integers and unsigned shorts into the CVS version of Numeric, for inclusion in a tentatively named version 22.0 I've only uploaded a platform independent tar file for 21.3. Any binaries need to be updated. If you are interested in testing the new additions, please let me know of any bugs you find. Thanks, -Travis O. From eric at enthought.com Sun Jun 9 17:19:13 2002 From: eric at enthought.com (eric jones) Date: Sun Jun 9 17:19:13 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <200206080757.g587vO428138@chinon.cnrs-orleans.fr> Message-ID: <000301c21014$50e991b0$6b01a8c0@ericlaptop> > > If you are proposing something like > > > > y = x + Float32(1.) > > > > it would work, but it sure leads to some awkward expressions. > > Yes, that's what I am proposing. It's no worse than what we have now, > and if writing Float32 a hundred times is too much effort, an > abbreviation like f = Float32 helps a lot. > > Anyway, following the Python credo "explicit is better than implicit", > I'd rather write explicit type conversions than have automagical ones > surprise me. How about making indexing (not slicing) arrays *always* return a 0-D array with copy instead of "view" semantics? This is nearly equivalent to creating a new scalar type, but without requiring major changes. I think it is probably even more useful for writing generic code because the returned value with retain array behavior. Also, the following example > a = array([1., 2.], Float) > b = array([3., 4.], Float32) > > a[0]*b would now return a Float array as Konrad desires because a[0] is a Float array. Using copy semantics would fix the unexpected behavior reported by Larry that kicked off this discussion. Slices are a different animal than indexing that would (and definitely should) continue to return view semantics. I further believe that all Numeric functions (sum, product, etc.) should return arrays all the time instead of converting implicitly converting them to Python scalars in special cases such as reductions of 1d arrays. I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing so that: >>> a = [1,2,3,4] >>> a[array(0)] Traceback (most recent call last): File "", line 1, in ? TypeError: sequence index must be integer Numeric arrays don't have this problem: >>> a = array([1,2,3,4]) >>> a[array(0)] 1 I don't think this alone is a strong enough reason for the conversion. 
Getting rid of special cases is more important because it makes behavior predictable to the novice (and expert), and it is easier to write generic functions and be sure they will not break a year from now when one of the special cases occurs. Are there other reasons why scalars are returned? On coercion rules: As for adding the array to a scalar value, x = array([3., 4.], Float32) y = x + 1. Should y be a Float or a Float32? I like numarray's coercion rules better (Float32). I have run into this upcasting to many times to count. Explicit and implicit aren't obvious to me here. The user explicitly cast x to be Float32, but because of the limited numeric types in Python, the result is upcast to a double. Here's another example, >>> from Numeric import * >>> a = array((1,2,3,4), UnsignedInt8) >>> left_shift(a,3) array([ 8, 16, 24, 32],'i') I had to stare at this for a while when I first saw it before I realized the integer value 3 upcast the result to be type 'i'. So, I think this is confusing and rarely the desired behavior. The fact that this is inconsistent with Python's "always upcast" rule is minor for me. The array math operations are necessarily a different animal from scalar operations because of the extra types supported. Defining these operations in a way that is most convenient for working with array data seems OK. On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users don't expect such a major shift in behavior. I do think, though, that the computational speed issue is going to result in numarray and Numeric existing side-by-side for a long time. Perhaps we should think create an "interim" Numeric version (maybe starting at 30), that tries to be compatible with the upcoming numarray, in its coercion rules, etc? Advanced features such as indexing arrays with arrays, memory mapped arrays, floating point exception behavior, etc. won't be there, but it should help people transition their codes to work with numarray, and also offer a speedy alternative. A second choice would be to make SciPy's Numeric implementation the intermediate step. It already produces NaN's during div-by-zero exceptions according to numarray's rules. The coercion modifications could also be incorporated. > > Finally, we can always lobby for inclusion of the new scalar types > into the core interpreter, with a corresponding syntax for literals, > but it would sure help if we could show that the system works and > suffers only from the lack of literals. There was a seriously considered debate last year about unifying Python's numeric model into a single type to get rid of the integer-float distinction, at last year's Python conference and the ensuing months. While it didn't (and won't) happen, I'd be real surprised if the general community would welcome us suggesting stirring yet another type into the brew. Can't we make 0-d arrays work as an alternative? eric > > Konrad. 
From hinsen at cnrs-orleans.fr Mon Jun 10 10:13:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 10 10:13:05 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <000301c21014$50e991b0$6b01a8c0@ericlaptop> References: <000301c21014$50e991b0$6b01a8c0@ericlaptop> Message-ID:

"eric jones" writes: > How about making indexing (not slicing) arrays *always* return a 0-D > array with copy instead of "view" semantics? This is nearly equivalent > to creating a new scalar type, but without requiring major changes. I ...

I think this was discussed as well a long time ago. For pure Python code, this would be a very good solution. But

> I think the only reason for the silent conversion is that Python lists > only allow integer values for use in indexing, so that:

There are some more cases where the type matters. If you call C routines that do argument parsing via PyArg_ParseTuple and expect a float argument, a rank-0 float array will raise a TypeError. All the functions from the math module work like that, and of course many in various extension modules.

In the ideal world, there would not be any distinction between scalars and rank-0 arrays. But I don't think we'll get there soon.

> On coercion rules: > > As for adding the array to a scalar value, > > x = array([3., 4.], Float32) > y = x + 1. > > Should y be a Float or a Float32? I like numarray's coercion rules > better (Float32). I have run into this upcasting too many times to

Statistically they probably give the desired result in more cases. But they are in contradiction to Python principles, and consistency counts a lot on my value scale.

I propose an experiment: ask a few Python programmers who are not using NumPy what type they would expect for the result. I bet that not a single one would answer "Float32".

> On the other hand, I don't think a jump from 21 to 22 is enough of a > jump to make such a change. Numeric progresses pretty fast, and users

I don't think any increase in version number is enough for incompatible changes. For many users, NumPy is just a building block; they install it because some other package(s) require it. If a new version breaks those other packages, they won't be happy. The authors of those packages won't be happy either, as they will get the angry letters.

As an author of such packages, I am speaking from experience. I have even considered making my own NumPy distribution under a different name, just to be safe from changes in NumPy that break my code (in the past it was mostly the installation code that was broken when arrayobject.h changed its location).
In my opinion, anything that is not compatible with Numeric should not be called Numeric. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From oliphant.travis at ieee.org Mon Jun 10 11:13:07 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Jun 10 11:13:07 2002 Subject: [Numpy-discussion] 0-D arrays as scalars In-Reply-To: References: <000301c21014$50e991b0$6b01a8c0@ericlaptop> Message-ID: <1023732818.28672.13.camel@travis>

On Mon, 2002-06-10 at 11:08, Konrad Hinsen wrote: > "eric jones" writes: > > > > I think the only reason for the silent conversion is that Python lists > > only allow integer values for use in indexing, so that: > > There are some more cases where the type matters. If you call C > routines that do argument parsing via PyArg_ParseTuple and expect a > float argument, a rank-0 float array will raise a TypeError. All the > functions from the math module work like that, and of course many in > various extension modules.

Actually, the code in PyArg_ParseTuple asks the object it gets if it knows how to be a float. 0-d arrays have known how to be Python floats for some time. So, I do not think this error occurs as you've described. Could you demonstrate this error?

In fact most of the code in Python itself which needs scalars allows arbitrary objects, provided the object has defined functions which return a Python scalar. The only exception to this that I've seen is the list indexing code (probably for optimization purposes). There could be more places, but I have not found them or heard of them.

Originally Numeric arrays did not define appropriate functions for 0-d arrays to act like scalars in the right places. They have now for quite a while. I'm quite supportive of never returning Python scalars from Numeric array operations unless specifically requested (e.g. the toscalar method).

> > On coercion rules: > > > > As for adding the array to a scalar value, > > > > x = array([3., 4.], Float32) > > y = x + 1. > > > > Should y be a Float or a Float32? I like numarray's coercion rules > > better (Float32). I have run into this upcasting too many times to > > Statistically they probably give the desired result in more cases. But > they are in contradiction to Python principles, and consistency counts > a lot on my value scale. > > I propose an experiment: ask a few Python programmers who are not > using NumPy what type they would expect for the result. I bet that not > a single one would answer "Float32". >

I'm not sure I agree with that at all. On what reasoning is that presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.

From paul at pfdubois.com Mon Jun 10 11:20:06 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 11:20:06 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY>

We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it.

Two points about the x + 1.0 issue:

1. How often this occurs is really a function of what you are doing.
For those using Numeric Python as a kind of MATLAB clone, who are typing interactively, the size issue is of less importance and the easy expression is of more importance. To those writing scripts to batch process or writing steered applications, the size issue is more important and the easy expression less important. I'm using words like less and more here because both issues matter to everyone at some time; it is just a question of relative frequency of concern.

2. Part of what I had in mind with the kinds module proposal PEP 0242 was dealing with the literal issue. There had been some proposals to make literals decimal numbers or rationals, and that got me thinking about how to defend myself if they did it, and also about the fact that Python doesn't have Fortran's kind concept, which you can use to gain a more platform-independent calculation.

From the PEP, this example:

In module myprecision.py:

import kinds
tinyint = kinds.int_kind(1)
single = kinds.float_kind(6, 90)
double = kinds.float_kind(15, 300)
csingle = kinds.complex_kind(6, 90)

In the rest of my code:

from myprecision import tinyint, single, double, csingle
n = tinyint(3)
x = double(1.e20)
z = 1.2
# builtin float gets you the default float kind, properties unknown
w = x * float(x)
# but in the following case we know w has kind "double".
w = x * double(z)

u = csingle(x + z * 1.0j)
u2 = csingle(x+z, 1.0)

Note how that entire code can then be changed to a higher precision by changing the arguments in myprecision.py.

Comment: note that you aren't promised that single != double; but you are promised that double(1.e20) will hold a number with 15 decimal digits of precision and a range up to 10**300, or that the float_kind call will fail.
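As a usage sketch (hypothetical, of course, since kinds is still only a proposal, and midpoint is just a made-up routine), a function written against the module stays in its declared kind no matter what literals creep into the expression:

import kinds
double = kinds.float_kind(15, 300)

def midpoint(a, b):
    # all arithmetic is carried out in the "double" kind
    return double(0.5) * (double(a) + double(b))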
From perry at stsci.edu Mon Jun 10 12:07:15 2002 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jun 10 12:07:15 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY> Message-ID:

> We have certainly beaten this topic to death in the past. It keeps > coming up because there is no good way around it.

Ain't that the truth.

> Two points about the x + 1.0 issue: > > 1. How often this occurs is really a function of what you are doing. For > those using Numeric Python as a kind of MATLAB clone, who are typing > interactively, the size issue is of less importance and the easy > expression is of more importance. To those writing scripts to batch > process or writing steered applications, the size issue is more > important and the easy expression less important. I'm using words like > less and more here because both issues matter to everyone at some time, > it is just a question of relative frequency of concern.

We have many in the astronomical community that use IDL (instead of MATLAB), and for them size is an issue for interactive use. They often manipulate very large arrays interactively. Furthermore, many are astronomers who don't generally see themselves as programmers; they may write programs (perhaps not great ones), but they don't want to be bothered by such details even in a script (or they may want to read a "professional" program and not have to deal with such things). But you are right in that there is no solution that doesn't have some problems.
Every array language deals with this in somewhat different ways, I suspect. In IDL, the literals are generally smaller types (ints were, or at least used to be, 2 bytes, and floats single precision -- I haven't used it myself in a while), and there were ways of writing literals with higher precision (e.g., 2L, 2.0d-2). Since it was a language specifically intended to deal with numeric processing, supporting many scalar types made sense. Perry

From perry at stsci.edu Mon Jun 10 13:07:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jun 10 13:07:04 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <000301c21014$50e991b0$6b01a8c0@ericlaptop> Message-ID:

> I further believe that all Numeric functions (sum, product, etc.) should > return arrays all the time instead of implicitly converting > them to Python scalars in special cases such as reductions of 1d arrays. > I think the only reason for the silent conversion is that Python lists > only allow integer values for use in indexing, so that: > > >>> a = [1,2,3,4] > >>> a[array(0)] > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: sequence index must be integer > > Numeric arrays don't have this problem: > > >>> a = array([1,2,3,4]) > >>> a[array(0)] > 1 > > I don't think this alone is a strong enough reason for the conversion. > Getting rid of special cases is more important because it makes behavior > predictable to the novice (and expert), and it is easier to write > generic functions and be sure they will not break a year from now when > one of the special cases occurs. > > Are there other reasons why scalars are returned? >

Well, sure. It isn't just indexing lists directly; it would be anywhere in Python that you would use a number. In some contexts the right thing may happen (where the function knows to try to obtain a simple number from an object), but then again, it may not (if calling a function where the number is used directly to index or slice).

Here is another case where good arguments can be made for both sides. It really isn't an issue of functionality (one can write methods or functions to do what is needed); it's what the convenient syntax does. For example, if we really want a Python scalar but rank-0 arrays are always returned, then something like this may be required:

>>> x = arange(10)
>>> a = range(10)
>>> a[scalar(x[2])] # instead of a[x[2]]

Whereas if simple indexing returns a Python scalar and consistency is desired in always having arrays returned, one may have to do something like this:

>>> y = x.indexAsArray(2) # instead of y = x[2]

or perhaps

>>> y = x[ArrayAlwaysAsResultIndexObject(2)] # :-) with better name, of course

One context or the other is going to be inconvenienced, but not prevented from doing what is needed.

As long as Python scalars are the 'biggest' type of their kind, we strongly lean towards single elements being converted into Python scalars. It's our feeling that there are more surprises and gotchas, particularly for more casual users, on this side than on the uncertainty of an index returning an array or scalar. People writing code that expects to deal with uncertain dimensionality (the only place that this occurs) should be the ones to go the extra distance in more awkward syntax.
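To be concrete, the kind of defensive code I have in mind looks something like this (just a sketch; total() is a made-up helper, not anything in Numeric or numarray):

>>> from Numeric import *
>>> def total(a):
...     # keep reducing until a single value remains, whether the
...     # intermediate results come back as arrays or as scalars
...     while type(a) == ArrayType and len(a.shape) > 0:
...         a = add.reduce(a)
...     return float(a)
...
>>> total(array([[1., 2.], [3., 4.]]))
10.0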
Perry

From eric at enthought.com Mon Jun 10 13:11:02 2002 From: eric at enthought.com (eric jones) Date: Mon Jun 10 13:11:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <000001c210ab$47b261c0$0c01a8c0@NICKLEBY> Message-ID: <001301c210ba$d8e03910$6b01a8c0@ericlaptop>

> We have certainly beaten this topic to death in the past. It keeps > coming up because there is no good way around it. > > Two points about the x + 1.0 issue: > > 1. How often this occurs is really a function of what you are doing. For > those using Numeric Python as a kind of MATLAB clone, who are typing > interactively, the size issue is of less importance and the easy > expression is of more importance. To those writing scripts to batch > process or writing steered applications, the size issue is more > important and the easy expression less important. I'm using words like > less and more here because both issues matter to everyone at some time, > it is just a question of relative frequency of concern. > > 2. Part of what I had in mind with the kinds module proposal PEP 0242 > was dealing with the literal issue. There had been some proposals to > make literals decimal numbers or rationals, and that got me thinking > about how to defend myself if they did it, and also about the fact that > Python doesn't have Fortran's kind concept which you can use to gain a > more platform-independent calculation. > > From the PEP, this example: > > In module myprecision.py: > > import kinds > tinyint = kinds.int_kind(1) > single = kinds.float_kind(6, 90) > double = kinds.float_kind(15, 300) > csingle = kinds.complex_kind(6, 90) > > In the rest of my code: > > from myprecision import tinyint, single, double, csingle > n = tinyint(3) > x = double(1.e20) > z = 1.2 > # builtin float gets you the default float kind, properties unknown > w = x * float(x) > # but in the following case we know w has kind "double". > w = x * double(z) > > u = csingle(x + z * 1.0j) > u2 = csingle(x+z, 1.0) > > Note how that entire code can then be changed to a higher > precision by changing the arguments in myprecision.py. > > Comment: note that you aren't promised that single != double; but > you are promised that double(1.e20) will hold a number with 15 > decimal digits of precision and a range up to 10**300 or that the > float_kind call will fail.

I think this is a nice feature, but it's actually heading the opposite direction of where I'd like to see things go for the general use of Numeric. Part of Python's appeal for me is that I don't have to specify types everywhere. I don't want to write explicit casts throughout equations because it munges up their readability. Of course, the casting sometimes can't be helped, but Numeric's current behavior really forces this explicit casting for array types besides double, int, and double complex. I like Numarray's fix for this problem. Also, as Perry noted, it's unlikely Numeric will be used as an everyday command-line tool (like Matlab) if the verbose casting is required.

I'm interested to learn what other drawbacks y'all found with always returning arrays (0-d for scalars) from Numeric functions. Konrad mentioned the tuple parsing issue in some extension libraries that expect floats, but it sounds like Travis thinks this is no longer an issue. Are there others?
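For example, here is the sort of check I would expect to break (an untested sketch):

>>> from Numeric import *
>>> import types
>>> x = array(1.5)                # a rank-0 array
>>> float(x)                      # conversion to a Python float works
1.5
>>> type(x) == types.FloatType    # but an explicit type test fails
0

So code that does type tests against FloatType would need a fallback, while code that simply calls float() on its argument should keep working.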
eric
From perry at stsci.edu Mon Jun 10 13:37:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jun 10 13:37:02 2002 Subject: [Numpy-discussion] default axis for numarray Message-ID:

An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.

To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10-element array, where elements in the first dimension have been summed over, rather than those in the most rapidly varying dimension.

>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example,

>>> sum = 0
>>> for subarr in x:
        sum += subarr

acts on the first axis in effect. Likewise,

>>> reduce(add, x)

does the same. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation is of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve), then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were always the default.

The question is whether there is a consensus for one approach or the other.
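For concreteness, reducing over the last axis instead, with the axis given explicitly, would look like:

>>> add.reduce(x, 1)
array([ 45, 145])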
We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes. Perry

From eric at enthought.com Mon Jun 10 14:27:04 2002 From: eric at enthought.com (eric jones) Date: Mon Jun 10 14:27:04 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: <001401c210c5$7c1025f0$6b01a8c0@ericlaptop>

> > I further believe that all Numeric functions (sum, product, etc.) should > > return arrays all the time instead of implicitly converting > > them to Python scalars in special cases such as reductions of 1d arrays. > > I think the only reason for the silent conversion is that Python lists > > only allow integer values for use in indexing, so that: > > > > >>> a = [1,2,3,4] > > >>> a[array(0)] > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > TypeError: sequence index must be integer > > > > Numeric arrays don't have this problem: > > > > >>> a = array([1,2,3,4]) > > >>> a[array(0)] > > 1 > > > > I don't think this alone is a strong enough reason for the conversion. > > Getting rid of special cases is more important because it makes behavior > > predictable to the novice (and expert), and it is easier to write > > generic functions and be sure they will not break a year from now when > > one of the special cases occurs. > > > > Are there other reasons why scalars are returned? > > > Well, sure. It isn't just indexing lists directly, it would be > anywhere in Python that you would use a number.

Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly for most (all?) cases. Python indexing is a little unique because it explicitly requires integers. It's not just 0-d arrays that fail as indexes -- Python floats won't work either. As for passing arrays to functions expecting numbers, is it that much different from passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types.

> In some contexts, > the right thing may happen (where the function knows to try to obtain > a simple number from an object), but then again, it may not (if calling > a function where the number is used directly to index or slice). > > Here is another case where good arguments can be made for both > sides. It really isn't an issue of functionality (one can write > methods or functions to do what is needed), it's what the convenient > syntax does. For example, if we really want a Python scalar but > rank-0 arrays are always returned then something like this may > be required: > > >>> x = arange(10) > >>> a = range(10) > >>> a[scalar(x[2])] # instead of a[x[2]]

Yes, this would be required for using them as array indexes. Or actually:

>>> a[int(x[2])]

> > Whereas if simple indexing returns a Python scalar and consistency > is desired in always having arrays returned one may have to do > something like this > > >>> y = x.indexAsArray(2) # instead of y = x[2] > > or perhaps > > >>> y = x[ArrayAlwaysAsResultIndexObject(2)] > # :-) with better name, of course > > One context or the other is going to be inconvenienced, but not > prevented from doing what is needed.

Right.
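(And to be fair, the automatic conversion already works in a lot of places today, just as Travis said. From memory, so the exact output may be off:

>>> from Numeric import *
>>> import math
>>> math.sqrt(array(2.0))   # PyArg_ParseTuple asks the object to act as a float
1.4142135623730951

It's really only slots like list indexing that insist on a true integer.)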
> > As long as Python scalars are the 'biggest' type of their kind, we > strongly lean towards single elements being converted into Python > scalars. It's our feeling that there are more surprises and gotchas, > particularly for more casual users, on this side than on the uncertainty > of an index returning an array or scalar. People writing code that > expects to deal with uncertain dimensionality (the only place that > this occurs) should be the ones to go the extra distance in more > awkward syntax.

Well, I guess I'd like to figure out exactly what breaks before ruling it out, because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this either. A search of my site-packages turned up IntType tests quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric

From paul at pfdubois.com Mon Jun 10 16:06:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 16:06:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <001301c210ba$d8e03910$6b01a8c0@ericlaptop> Message-ID: <000101c210d3$4124cd70$0c01a8c0@NICKLEBY>

> Konrad mentioned the tuple parsing issue in some > extension libraries that expect floats, but it sounds like > Travis thinks this is no longer an issue. Are there others? > > eric >

Lots of code tries to distinguish cases using isinstance, and these tests will fail if given an array instance when they are testing for a float.

From eric at enthought.com Mon Jun 10 16:16:03 2002 From: eric at enthought.com (eric jones) Date: Mon Jun 10 16:16:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop>

So one contentious issue a day isn't enough, huh? :-)

> An issue that has been raised by scipy (most notably Eric Jones > and Travis Oliphant) has been whether the default axis used by > various functions should be changed from the current Numeric > default. This message is not directed at determining whether we > should change the current Numeric behavior for Numeric, but whether > numarray should adopt the same behavior as the current Numeric. > > To be more specific, certain functions and methods, such as > add.reduce(), operate by default on the first axis. For example, > if x is a 2 x 10 array, then add.reduce(x) results in a > 10-element array, where elements in the first dimension have > been summed over rather than the most rapidly varying dimension. > > >>> x = arange(20) > >>> x.shape = (2,10) > >>> x > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > >>> add.reduce(x) > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

The issue here is both consistency across a library and speed. From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0, which should really be axis=0) and, counting FFT, about 10 functions using axis=-1.
To this day, I can't remember which functions use which, and I have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed).

SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations.

When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy's and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric.

So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for the most cases.

There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1.

It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type

>>> sum(a,axis=-1)

in command-line mode is a real pain.

Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e., using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions and indexing), but the choice made should be applied as consistently as possible.

We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come.

Changes are going to create some backward incompatibilities, and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, that the community can grow by two orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if general scientists see Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command-line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community.

Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but I also think that is sub-optimal.
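Coming back to the cache argument above, the quick check I have in mind looks something like this (a rough sketch, not run; the numbers will vary by platform and array shape):

>>> from Numeric import *
>>> import time
>>> x = ones((1000, 1000), 'd')
>>> t = time.clock(); r0 = add.reduce(x, 0); print time.clock() - t   # strided
>>> t = time.clock(); r1 = add.reduce(x, 1); print time.clock() - t   # contiguous

The axis=1 reduction walks memory contiguously, which is where the cache win comes from, so it should come out measurably faster on most machines.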
> > Some feel it is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has > > argued that the current behavior is most compatible with the behavior > > of other Python sequences. For example, > > > > >>> sum = 0 > > >>> for subarr in x: > > sum += subarr > > > > acts on the first axis in effect. Likewise, > > > > >>> reduce(add, x) > > > > does the same. In this sense, Numeric is currently more consistent > > with Python behavior. However, there are other functions that > > operate on the most rapidly varying dimension. Unfortunately > > I cannot currently access my old mail, but I think the rule > > that was proposed under this argument was that if the 'reduction' > > operation is of a structural kind, the first dimension is used. > > If the reduction or processing step is 'time-series' oriented > > (e.g., FFT, convolve), then the last dimension is the default. > > On the other hand, some feel it would be much simpler to understand > > if the last axis were always the default. > > > > The question is whether there is a consensus for one approach or > > the other. We raised this issue at a scientific Birds-of-a-Feather > > session at the last Python Conference. The sense I got there was > > that most were for the status quo, keeping the behavior as it is > > now. Is the same true here? In the absence of consensus or a > > convincing majority, we will keep the behavior the same for backward > > compatibility purposes.

Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions, because command-line users aren't going to care whether sum() and kurtosis() come from different libraries; they just want them to behave consistently. eric

From ransom at physics.mcgill.ca Mon Jun 10 18:56:03 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 10 18:56:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> Message-ID: <20020611015544.GC15736@spock.physics.mcgill.ca>

I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole).

I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)).

Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for awhile:

>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Anyway, these little warts that we are discussing probably aren't what has kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look.

On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation.

Cheers, Scott
-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From paul at pfdubois.com Mon Jun 10 20:20:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 20:20:02 2002 Subject: [Numpy-discussion] Selection of a new head nummie Message-ID: <000001c210f6$ce0be340$0c01a8c0@NICKLEBY>

It is time to choose the next "head nummie", the chair of the set of sourceforge developers for Numerical Python. Now is an apt time since I will be changing assignments at LLNL in August to one which has less daily use of numpy. We have no procedure for doing this other than for us nummies to come to a consensus amongst ourselves, with the input of the Numpy community. After I return from Europython I hope we can make a selection during the first two weeks of July.

From oliphant.travis at ieee.org Mon Jun 10 20:52:04 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Jun 10 20:52:04 2002 Subject: [Numpy-discussion] Some missing keyword argument support fixed in CVS In-Reply-To: <20020611015544.GC15736@spock.physics.mcgill.ca> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> <20020611015544.GC15736@spock.physics.mcgill.ca> Message-ID: <1023767534.29865.8.camel@travis>

On Mon, 2002-06-10 at 19:55, Scott Ransom wrote: > I have to admit that I agree with all of what Eric has to say > here -- even if it does cause some code breakage (I'm certainly > willing to do some maintenance on my code/modules that are > floating here and there so long as things continue to improve > with the language as a whole).

I'm generally of the same opinion.

> I do think consistency is a very important aspect of getting > Numeric/Numarray accepted by a larger user base (and believe > me, my collaborators are probably sick of my Numeric Python > evangelism (but I like to think also a bit jealous of my NumPy > usage as they continue struggling with one-off C and Fortran > routines...)).

Another important factor is the support libraries. I know that something like Simulink (Matlab) is important to many of my colleagues in engineering. Simulink is the Mathworks version of visual programming, which lets the user create a circuit visually that is then processed.
I believe there was a good start to this sort of thing presented at the last Python Conference, which was very encouraging. Other colleagues require something like a compiler to get C code that will compile on a DSP board from a script and/or design session. I believe something like this would be very beneficial.

> Another example of a glaring inconsistency in the current > implementation is this little number that has been bugging me > for awhile: > > >>> arange(10, typecode='d') > array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) > >>> ones(10, typecode='d') > array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) > >>> zeros(10, typecode='d') > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: an integer is required > >>> zeros(10, 'd') > array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This is now fixed in cvs, along with other keyword problems. The ufunc methods reduce and accumulate also now take a keyword argument in CVS. -Travis

From paul at pfdubois.com Mon Jun 10 20:57:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Mon Jun 10 20:57:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <20020611015544.GC15736@spock.physics.mcgill.ca> Message-ID: <000001c210fb$e98098f0$0c01a8c0@NICKLEBY>

I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net, you can change my mind.)

Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility. It would define aliases to everything in Numarray that is the same as in Numeric, and for the rest it would define functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working or close to working again.

Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen asap and eliminated eventually, or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing.

From hinsen at cnrs-orleans.fr Tue Jun 11 05:57:02 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 05:57:02 2002 Subject: [Numpy-discussion] 0-D arrays as scalars In-Reply-To: <1023732818.28672.13.camel@travis> References: <000301c21014$50e991b0$6b01a8c0@ericlaptop> <1023732818.28672.13.camel@travis> Message-ID:

Travis Oliphant writes: > Actually, the code in PyArg_ParseTuple asks the object it gets if it > knows how to be a float. 0-d arrays for some time have known how to be > Python floats. So, I do not think this error occurs as you've > described. Could you demonstrate this error?

No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago.

> The only exception to this that I've seen is the list indexing code > (probably for optimization purposes). There could be more places, but > I have not found them or heard of them.

Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all.

> have now. I'm quite supportive of never returning Python scalars from > Numeric array operations unless specifically requested (e.g. the > toscalar method).

I suppose this would be easy to implement, right?
Then why not do it in a test release and find out empirically how much code it breaks.

> presumption based? If I encounter a Python object that I'm unfamiliar > with, I don't presume to know how it will define multiplication.

But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour.

Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr Tue Jun 11 06:17:04 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 06:17:04 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> References: <001501c210d4$b3a2fe70$6b01a8c0@ericlaptop> Message-ID:

"eric jones" writes: > The issue here is both consistency across a library and speed.

Consistency, fine. But not just within one package, also between that package and the language it is implemented in.

Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster.

> From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0, which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which

If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero.
> > Some feel that is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has

Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference.

Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From paul at pfdubois.com Tue Jun 11 08:29:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Tue Jun 11 08:29:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <000001c2115c$5790dc50$0c01a8c0@NICKLEBY>

Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandary. I think in my own coding, reduction on the first dimension is the most frequent.
From eric at enthought.com Tue Jun 11 10:45:01 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 10:45:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop>

> "eric jones" writes:
>
> > The issue here is both consistency across a library and speed.
>
> Consistency, fine. But not just within one package, also between
> that package and the language it is implemented in.
>
> Speed, no. If I need a sum along the first axis, I won't replace
> it by a sum across the last axis just because that is faster.

The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc., which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop.
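To make the layout point concrete, a tiny sketch (illustrative only; for a C, row-major array, axis=-1 reduces over adjacent memory words while axis=0 strides through memory -- though the timings later in this thread show the measured picture is murkier):

from Numeric import ones, sum, Float

data = ones((200, 10000), Float)  # C layout: each row is contiguous
row_totals = sum(data, -1)  # axis=-1: each reduction walks adjacent doubles
col_totals = sum(data, 0)   # axis=0: each reduction strides by 10000 doubles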
Of course, the axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important, as future growth will come in this area more than Fortran.

> > From the numpy.pdf, Numeric looks to have about 16 functions using
> > axis=0 (or index=0 which should really be axis=0) and, counting FFT,
> > about 10 functions using axis=-1. To this day, I can't remember which
>
> If you weight by frequency of usage, the first group gains a lot in
> importance. I just scanned through some of my code; almost all of the
> calls to Numeric routines are to functions whose default axis
> is zero.

Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place.

> > code. Unfortunately, many of the Numeric functions that should still
> > don't take axis as a keyword, so you end up just inserting -1 in the
>
> That is certainly something that should be fixed, and I suppose no one
> objects to that.

Sounds like Travis already did it. Thanks.

> My vote is for keeping axis defaults as they are, both because the
> choices are reasonable (there was a long discussion about them in the
> early days of NumPy, and the defaults were chosen based on other array
> languages that had already been in use for years) and because any
> change would cause most existing NumPy code to break in many places,
> often giving wrong results instead of an error message.
>
> If a uniformization of the default is desired, I vote for axis=0,
> for two reasons:
> 1) Consistency with Python usage.

I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add,x) until Perry pointed it out to me.

There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a heavily used area of Python because of efficiency. I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change.

> 2) Minimization of code breakage.

Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for.

> > We should also strive to make it as easy as possible to write generic
> > functions that work for all array types (Int, Float, Float32, Complex,
> > etc.) -- yet another debate to come.
>
> What needs to be improved in that area?

Comparisons of complex numbers. But let's save that debate for later.

> > Changes are going to create some backward incompatibilities and that is
> > definitely a bummer. But some changes are also necessary before the
> > community gets big. I know the community is already a reasonable size,
>
> I'd like to see evidence that changing the current NumPy behaviour
> would increase the size of the community.
> It would first of all split the current community, because many users
> (like myself) do not have enough time to spare to go through their code
> line by line in order to check for incompatibilities. That many others
> would switch to Python if only some changes were made is merely a
> hypothesis.

True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent, like astronomy, I would bet the every-day user base is less than 5% of the whole. There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interface problems are GUI and debugger related. Others are API related. Inconsistency in a library interface makes it harder to learn and is a wart. Is it as important as a graphics library? Probably not. But while we're building the next generation tool, we should fix things that make people wonder "why did they do this?". It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them.

> > > Some feel that is contrary to expectations that the least rapidly
> > > varying dimension should be operated on by default. There are
> > > good arguments for both sides. For example, Konrad Hinsen has
>
> Actually the argument is not for the least rapidly varying
> dimension, but for the first dimension. The internal data layout
> is not significant for most Python array operations. We might
> for example offer a choice of C style and Fortran style data layout,
> enabling users to choose according to speed, compatibility, or
> just personal preference.

In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that returned a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be) it sounds fine. I think the default constructor should return a C layout array though, and it will be what 99% of the users use.

eric

From perry at stsci.edu Tue Jun 11 11:07:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:07:03 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <001401c210c5$7c1025f0$6b01a8c0@ericlaptop> Message-ID: :

> Travis seemed to indicate that Python would convert 0-d arrays to
> Python types correctly for most (all?) cases. Python indexing is a
> little unique because it explicitly requires integers.
> It's not just 0-d arrays that fail as indexes -- Python floats won't
> work either.

That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example.

> Yes, this would be required for using them as array indexes. Or
> actually:
>
> >>> a[int(x[2])]

Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays, then the generation of an int may be overpresuming what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist.) If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but still awkward.

Perry

From eric at enthought.com Tue Jun 11 11:38:05 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 11:38:05 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: <001f01c21177$01cd3000$6b01a8c0@ericlaptop>

> From: Perry Greenfield [mailto:perry at stsci.edu]
>
> If the only type that could possibly cause problems is int,
> then int() should be all that would be necessary, but still awkward.

If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev.

eric

From perry at stsci.edu Tue Jun 11 11:44:10 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:44:10 2002 Subject: [Numpy-discussion] repr for numarray Message-ID:

While I'm flooding the mailing list with interface issues, I thought I would air another one (again, for numarray only). We've had some people internally complain that it does not make sense for repr to always generate a string capable of reconstructing the array. We often (usually) deal with multi-megabyte arrays. Typing a variable interactively for one of these arrays is invariably nonsensical.
In such cases the user would be much better served by a message indicating the size, shape, type, etc. of the array than by all of its contents. Yet on the other hand, it is undeniably convenient to use repr (by typing a variable) for small arrays interactively rather than using a print statement. This leads to 3 possible proposals for handling repr:

1) Do what is done now, always print a string that when eval'ed will recreate the array.

2) Only give summary information for the array regardless of its size.

3) Print the array if it has fewer than THRESHOLD number of elements, otherwise print a summary. THRESHOLD may be adjusted by the user.

The last appears to be the most utilitarian to us, yet 'impure' somehow. Certainly there are many objects for which Python does not attempt to generate a string from repr that could be used with eval to recreate them. On the other hand, we are unaware of cases where repr sometimes does and sometimes does not. For example, strings may also get very large, but there is no threshold for generating the string. What do people think is the most desirable solution? Keep in mind we intend to develop very efficient functions that will convert arrays to and from ascii representations (currently most of that code is in Python and quite slow in numarray at the moment) so it will not be necessary to use repr for this purpose. Only a few more issues to go, hopefully...

Perry

From perry at stsci.edu Tue Jun 11 11:53:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jun 11 11:53:02 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> Message-ID: : :

> > What needs to be improved in that area?
>
> Comparisons of complex numbers. But let's save that debate for later.

No, no, let's do it now. ;-) We for one would like to know for numarray what should be done.

If I might be presumptuous enough to anticipate what Eric would say, it is that complex comparisons should be allowed, and that they use all the information in the complex number (real and imaginary) so that they lead to consistent results in sorting.

But the purist argues that comparisons for complex numbers are meaningless. Well, yes, but there are cases in code where you don't wish such comparisons to cause an exception. But even more important, there is at least one case which is practical. It isn't all that uncommon to want to eliminate duplicate values from arrays, and one would like to be able to do that for complex values as well. A common technique is to sort the values and then eliminate all identical adjacent values. A predictable comparison rule would allow that to be easily implemented.

Eric, am I missing anything in this? It should be obvious that we agree with his position, but I am wondering if there are any arguments we have not heard yet that outweigh the advantages we see.

Perry

From ransom at physics.mcgill.ca Tue Jun 11 11:54:07 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Tue Jun 11 11:54:07 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: References: Message-ID:

On June 11, 2002 02:43 pm, Perry Greenfield wrote:

> Yet on the other hand, it is undeniably convenient to use
> repr (by typing a variable) for small arrays interactively
> rather than using a print statement. This leads to 3 possible
> proposals for handling repr:
>
> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.
>
> The last appears to be the most utilitarian to us, yet
> 'impure' somehow. Certainly there are many objects for which

I vote for number 3, and have no hang-ups about any real or perceived "impurity". This is an issue that I deal with daily.

Scott

-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From hinsen at cnrs-orleans.fr Tue Jun 11 12:16:08 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 12:16:08 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> Message-ID: <200206111912.g5BJCj209939@chinon.cnrs-orleans.fr>

> I think the consistency with Python is less of an issue than it seems.
> I wasn't aware that add.reduce(x) would generate the same results as
> the Python version of reduce(add,x) until Perry pointed it out to me.

It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type.

> There are some inconsistencies between Python the language and Numeric
> because of the needs of the Numeric community. For instance, slices create
> views instead of copies as in Python. This was a correct break with

True, but this affects far fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter.

> I don't see choosing axis=-1 as a break with Python -- multi-dimensional
> arrays are inherently different and used differently than lists of lists

As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous.

> in Python. Further, reduce() is a "corner" of the Python language that
> has been superseded by list comprehensions. Choosing an alternative

List comprehensions work in exactly the same way, by looping over the outermost index.

> > 2) Minimization of code breakage.
>
> Fixes will be necessary for sure, and I wish that wasn't the case. They
> will be necessary if we choose a consistent interface in either case.

The current interface is not inconsistent. It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments.

> Choosing axis=0 or axis=-1 will not change what needs to be fixed --
> only the function names searched for.

I disagree very much here. The fewer calls are affected, the fewer mistakes will be made, and the fewer modules will have to be modified at all.
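As a rough sketch of how contained such fixes could be kept, a compatibility module along the lines Paul Dubois suggested earlier might look like this (illustrative only -- every name here assumes numarray exports Numeric-style functions, which is an assumption, not a description of the actual numarray API):

# NumericCompatibility.py (hypothetical)
from numarray import *
import numarray

def sum(a, axis=0):
    # preserve Numeric's historical default of reducing the first axis
    return numarray.sum(a, axis)

def take(a, indices, axis=0):
    return numarray.take(a, indices, axis)

Only the handful of functions whose defaults differ would need such wrappers; everything else is a plain re-export.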
Moreover, the functions that currently use axis=1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments.

I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc. In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python.

Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty.

> True. But I can tell you that we're definitely doing something wrong
> now. We have a superior language that is easier to integrate with
> legacy code and less expensive than the best competing alternatives.
> And, though I haven't done a serious market survey, I feel safe in
> saying we have significantly less than 1% of the potential user base.

I agree with that. But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative for anything), the main reason is inertia. Most people don't want to invest any time to learn any new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language). I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details.

Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From gball at cfa.harvard.edu Tue Jun 11 12:25:08 2002 From: gball at cfa.harvard.edu (Greg Ball) Date: Tue Jun 11 12:25:08 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: Message-ID:

> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
>
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.

I vote for 3) too. Especially annoying is when I mistakenly type a.shape instead of a.shape() interactively. Without the parentheses I get a bound method, the repr of which includes the repr for the whole array, and when this has > 25 million elements it really is a drag to wait for it all to finish spewing out...

Getting sidetracked... is this repr of methods a feature?

>>> l = [1,2,3,4]
>>> l.sort
<built-in method sort of list object at 0x...>
>>> a = numarray.array(l)
>>> a.shape
<bound method NumArray.shape of array([1, 2, 3, 4])>

It would seem more pythonic to get <bound method NumArray.shape> or similar?
-- Greg Ball

From hinsen at cnrs-orleans.fr Tue Jun 11 12:27:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Tue Jun 11 12:27:05 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: References: Message-ID:

"Perry Greenfield" writes:

> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.
>
> The last appears to be the most utilitarian to us, yet
> 'impure' somehow. Certainly there are many objects for which
> Python does not attempt to generate a string from repr that
> could be used with eval to recreate them. On the other hand,
> we are unaware of cases where repr sometimes does and sometimes

I don't see the problem. The documented behaviour would be that it doesn't allow reconstruction. If for some arrays that works nevertheless, who is going to complain?

BTW, it would be nice if the summary would contain the values of some elements, to allow a quick identification of NaN arrays and similar problems.

> does not. For example, strings may also get very large, but
> there is no threshold for generating the string.

Right. But in practice strings rarely do get that large. Arrays do.

Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------

From tim.hochberg at ieee.org Tue Jun 11 12:30:02 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Tue Jun 11 12:30:02 2002 Subject: [Numpy-discussion] repr for numarray References: Message-ID: <033001c2117e$3cd0e5f0$061a6244@cx781526b>

I would also be inclined toward option 3 with the caveat that THRESHOLD=None should print all the values for the purists out there (or if you want to use repr to dump the array to some sort of flat file).

-tim
From paul at pfdubois.com Tue Jun 11 13:55:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Tue Jun 11 13:55:01 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: <033001c2117e$3cd0e5f0$061a6244@cx781526b> Message-ID: <001e01c2118a$199a3210$0c01a8c0@NICKLEBY>

MA users seem to all be happy with the facility in MA for limiting printing.

>>> x=MA.arange(20)
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,])
>>> MA.set_print_limit(10)
>>> x
array([0,1,2,3,4,5,6,7,8,9,] + 10 more elements)
>>> print x
[0,1,2,3,4,5,6,7,8,9,] + 10 more elements
>>> MA.set_print_limit(0) # no limit
>>> x
array([ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])
From paul at pfdubois.com Tue Jun 11 13:57:03 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Tue Jun 11 13:57:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: Message-ID: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY>

One can make a case for allowing == and != for complex arrays, but < and > just don't make sense and should not be allowed.

From ransom at physics.mcgill.ca Tue Jun 11 14:08:05 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Tue Jun 11 14:08:05 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> Message-ID:

On June 11, 2002 04:56 pm, you wrote:

> One can make a case for allowing == and != for complex arrays, but
> < and > just don't make sense and should not be allowed.

It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable comparison. You _could_ do the same thing with the phases, except you run into the modulo 2pi thing...
Scott

-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From eric at enthought.com Tue Jun 11 15:01:02 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 15:01:02 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <3D063B43.F2095A08@noaa.gov> Message-ID: <002801c21193$4fbf7f90$6b01a8c0@ericlaptop>

> From: cbarker at localhost.localdomain [mailto:cbarker at localhost.localdomain]
>
> eric jones wrote:
> > The default axis choice influences how people choose to lay out their
> > data in arrays. If the default is to sum down columns, then users lay
> > out their data so that this is the order of computation.
>
> This is absolutely true. I definitely choose my data layout so that the
> various rank reducing operators do what I want. Another reason to have
Another reason to have > consistency. So I don't really care which way is default, so the default > might as well be the better performing option. > > Of course, compatibility with previous versions is helpful too...arrrgg! > > What kind of a performance difference are we talking here anyway? Guess I ought to test instead of just saying it is so... I ran the following test of summing 200 sets of 10000 numbers. I expected a speed-up of about 2... I didn't get it. They are pretty much the same speed on my machine.?? (more later) C:\WINDOWS\system32>python ActivePython 2.2.1 Build 222 (ActiveState Corp.) based on Python 2.2.1 (#34, Apr 15 2002, 09:51:39) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> import time >>> a = ones((10000,200),Float) * arange(10000)[:,NewAxis] >>> b = ones((200,10000),Float) * arange(10000)[NewAxis,:] >>> t1 = time.clock();x=sum(a,axis=0);t2 = time.clock();print t2-t1 0.0772411018719 >>> t1 = time.clock();x=sum(b,axis=-1);t2 = time.clock();print t2-t1 0.079615705348 I also tried FFT, and did see a difference -- a speed up of 1.5+: >>> q = ones((1024,1024),Float) >>> t1 = time.clock();x = FFT.fft(q,axis=0);t2 = time.clock();print t2-t1 0.907373143793 >>> t1 = time.clock();x= FFT.fft(q,axis=-1);t2 = time.clock();print t2-t1 0.581641800843 >>> .907/.581 1.5611015490533564 Same in scipy >>> from scipy import * >>> a = ones((1024,1024),Float) >>> import time >>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1 0.870259488287 >>> t1 = time.clock(); q = fft(a,axis=-1); t2 = time.clock();print t2-t1 0.489512214541 >>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1 0.849266317367 >>> .849/.489 1.7361963190184049 So why is sum() the same speed for both cases? I don't know. I wrote a quick C program that is similar to how Numeric loops work, and I saw about a factor of 4 improvement by summing rows instead columns: C:\home\eric\wrk\axis_speed>gcc -O2 axis.c C:\home\eric\wrk\axis_speed>a summing rows (sec): 0.040000 summing columns (sec): 0.160000 pass These numbers are more like what I expected to see in the Numeric tests, but they are strange when compared to the Numeric timings -- the row sum is twice as fast as Numeric while the column sum is twice as slow. Because all the work is done in C and we're summing reasonably long arrays, the Numeric and C versions should be roughly the same speed. I can understand why summing rows is twice as fast in my C routine -- the Numeric loop code is not going to win awards for being optimal. What I don't understand is why the column summation is twice as slow in my C code as in Numeric. This should not be. I've posted it below in case someone can enlighten me. I think in general, you should see a speed up of 1.5+ when the summing over the "faster" axis. This holds true for fft in Python and my sum in C. As to why I don't in Numeric's sum(), I'm not sure. It is certainly true that non-strided access makes the best use of cache and *usually* is faster. 
eric

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    double *a, *sum1, *sum2;
    int i, j, si, sj, ind, I, J;
    int small=200, big=10000;
    time_t t1, t2;

    I = small; J = big;
    si = big; sj = 1;
    a = (double*)malloc(I*J*sizeof(double));
    sum1 = (double*)malloc(small*sizeof(double));
    sum2 = (double*)malloc(small*sizeof(double));

    // set memory
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        sum2[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)j;
            ind += sj;
        }
        ind += si;
    }

    t1 = clock();
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            sum1[i] += a[ind];
            ind += sj;
        }
        ind += si;
    }
    t2 = clock();
    printf("summing rows (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    I = big; J = small;
    sj = big; si = 1;
    t1 = clock();
    // set memory -- note that t1 is taken before this fill loop, so the
    // "summing columns" time below also includes initializing the array
    for(i = 0; i < I; i++)
    {
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)i;
            ind += sj;
        }
        ind += si;
    }
    for(j = 0; j < J; j++)
    {
        sum2[j] = 0;
        ind = sj * j;
        for(i = 0; i < I; i++)
        {
            sum2[j] += a[ind];
            ind += si;
        }
    }
    t2 = clock();
    printf("summing columns (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    for (i=0; i < small; i++)
    {
        if(sum1[i] != sum2[i])
            printf("failure %d, %f %f\n", i, sum1[i], sum2[i]);
    }
    printf("pass %f\n", sum1[0]);
    return 0;
}

From a.schmolck at gmx.net Tue Jun 11 16:03:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Tue Jun 11 16:03:02 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <001f01c21177$01cd3000$6b01a8c0@ericlaptop> References: <001f01c21177$01cd3000$6b01a8c0@ericlaptop> Message-ID:

"eric jones" writes:

> I think the consistency with Python is less of an issue than it seems.
> I wasn't aware that add.reduce(x) would generate the same results as
> the Python version of reduce(add,x) until Perry pointed it out to me.
> There are some inconsistencies between Python the language and Numeric
> because of the needs of the Numeric community. For instance, slices create
> views instead of copies as in Python. This was a correct break with
> consistency in a heavily used area of Python because of efficiency.

Ahh, a loaded example ;) I always thought that Numeric's view-slicing is a fairly problematic deviation from standard Python behavior and I'm not entirely sure why it needs to be done that way.

Couldn't one have both consistency *and* efficiency by implementing a copy-on-demand scheme (which is what matlab does, if I'm not entirely mistaken; a real copy gets only created if either the original or the 'copy' is modified)? The current behavior seems not just problematic because it breaks consistency and hence user expectations, it also breaks code that is written with more pythonic sequences in mind (in a potentially hard to track down manner) and is, IMHO, generally undesirable and error-prone, for pretty much the same reasons that dynamic scope and global variables are generally undesirable and error-prone -- one can unwittingly create intricate interactions between remote parts of a program that can be very difficult to track down.

Obviously there *are* cases where one really wants a (partial) view of an existing array. It would seem to me, however, that these cases are exceedingly rare (in all my Numeric code I'm only aware of one instance where I actually want the aliasing behavior, so that I can manipulate a large array by manipulating its views and vice versa).
Thus rather than being the default behavior, I'd rather see those cases accommodated by a special syntax that makes it explicit that an alias is desired and that care must be taken when modifying either the original or the view (e.g. one possible syntax would be ``aliased_vector = m.view[:,1]``). Again I think the current behavior is somewhat analogous to having variables declared in global (or dynamic) scope by default, which is not only error-prone, it also masks those cases where global (or dynamic) scope *is* actually desired and necessary.

It might be that the problems associated with a copy-on-demand scheme outweigh the error-proneness and the interface breakage that the deviation from standard python slicing behavior causes, but otherwise copying on slicing would be a backwards incompatibility in numarray I'd rather like to see (especially since one could easily add a view attribute to Numeric, for forwards-compatibility). I would also suspect that this would make it *a lot* easier to get numarray (or parts of it) into the core, but this is just a guess.

> > I don't see choosing axis=-1 as a break with Python -- multi-dimensional
> > arrays are inherently different and used differently than lists of lists
> > in Python. Further, reduce() is a "corner" of the Python language that
> > has been superseded by list comprehensions. Choosing an alternative

Guido might nowadays think that adding reduce was a mistake, so in that sense it might be a "corner" of the python language (although some people, including me, still rather like using reduce), but I can't see how you can generally replace reduce with anything but a loop. Could you give an example?

alex

-- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/

From oliphant at ee.byu.edu Tue Jun 11 16:26:02 2002 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Jun 11 16:26:02 2002 Subject: [Numpy-discussion] repr for numarray In-Reply-To: Message-ID:

> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.

I think this is best. I don't believe the convention of repr is critical to numarray.

-Travis

From reggie at merfinllc.com Tue Jun 11 16:31:02 2002 From: reggie at merfinllc.com (Reggie Dugard) Date: Tue Jun 11 16:31:02 2002 Subject: [Numpy-discussion] repr for numarray Message-ID: <1023838218.23968.274.camel@auk>

I vote for number 3 as well. As Paul already noted, his MA module already does something similar to this and I've found that very handy while working interactively.

On Tue, 2002-06-11 at 11:43, Perry Greenfield wrote:
> ...
> Yet on the other hand, it is undeniably convenient to use
> repr (by typing a variable) for small arrays interactively
> rather than using a print statement. This leads to 3 possible
> proposals for handling repr:
>
> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
>
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.
>
>...
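A minimal sketch of what proposal 3 could look like in practice (illustrative only; PRINT_THRESHOLD and array_repr are made-up names, not actual numarray API):

import Numeric

PRINT_THRESHOLD = 1000   # user-adjustable; None means "always print in full"

def array_repr(a):
    n = Numeric.multiply.reduce(a.shape)   # total number of elements
    if PRINT_THRESHOLD is None or n <= PRINT_THRESHOLD:
        return repr(a)                     # small array: eval-able repr
    return "array(shape=%s, typecode='%s', %d elements)" % (
        str(a.shape), a.typecode(), n)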
From eric at enthought.com Tue Jun 11 22:28:03 2002 From: eric at enthought.com (eric jones) Date: Tue Jun 11 22:28:03 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: Message-ID: <003201c211d1$d673de30$6b01a8c0@ericlaptop>

> "eric jones" writes:
>
> > I think the consistency with Python is less of an issue than it seems.
> > I wasn't aware that add.reduce(x) would generate the same results as
> > the Python version of reduce(add,x) until Perry pointed it out to me.
> > There are some inconsistencies between Python the language and Numeric
> > because of the needs of the Numeric community. For instance, slices create
> > views instead of copies as in Python. This was a correct break with
> > consistency in a heavily used area of Python because of efficiency.
>
> Ahh, a loaded example ;) I always thought that Numeric's view-slicing is a
> fairly problematic deviation from standard Python behavior and I'm not
> entirely sure why it needs to be done that way.
>
> Couldn't one have both consistency *and* efficiency by implementing a
> copy-on-demand scheme (which is what matlab does, if I'm not entirely
> mistaken; a real copy gets only created if either the original or the
> 'copy' is modified)?

Well, slices creating copies is definitely a bad idea (which is what I have heard proposed before) -- finite difference calculations (and others) would be very slow with this approach. Your copy-on-demand suggestion might work though. Its implementation would be more complex, but I don't think it would require cooperation from the Python core. It could be handled in the ufunc code. It would also require extension modules to make copies before they modified any values.

Copy-on-demand doesn't really fit with python's 'assignments are references' approach to things though, does it? Using foo = bar in Python and then changing an element of foo will also change bar. So, I guess there would have to be a distinction made here. This adds a little more complexity.

Personally, I like being able to pass views around because it allows for efficient implementations. The option to pass arrays into extension functions and edit them in-place is very nice. Copy-on-demand might allow for equal efficiency -- I'm not sure.

I haven't found the current behavior very problematic in practice and haven't seen it as a major stumbling block to new users. I'm happy with the status quo on this. But, if copy-on-demand is truly efficient and didn't make extension writing a nightmare, I wouldn't complain about the change either. I have a feeling the implementers of numarray would though. :-) And talk about having to modify legacy code...

> The current behavior seems not just problematic because it
> breaks consistency and hence user expectations, it also breaks code that is
> written with more pythonic sequences in mind (in a potentially hard to track
> down manner) and is, IMHO, generally undesirable and error-prone, for pretty
> much the same reasons that dynamic scope and global variables are generally
> undesirable and error-prone -- one can unwittingly create intricate
> interactions between remote parts of a program that can be very difficult to
> track down.
>
> Obviously there *are* cases where one really wants a (partial) view of an
It would seem to me, however, that these cases are > exceedingly > rare (In all my Numeric code I'm only aware of one instance where I > actually > want the aliasing behavior, so that I can manipulate a large array by > manipulating its views and vice versa). Thus rather than being the > default > behavior, I'd rather see those cases accommodated by a special syntax that > makes it explicit that an alias is desired and that care must be taken > when > modifying either the original or the view (e.g. one possible syntax would > be > ``aliased_vector = m.view[:,1]``). Again I think the current behavior is > somewhat analogous to having variables declared in global (or dynamic) > scope > by default which is not only error-prone, it also masks those cases where > global (or dynamic) scope *is* actually desired and necessary. > > It might be that the problems associated with a copy-on-demand scheme > outweigh the error-proneness, the interface breakage that the deviation > from > standard python slicing behavior causes, but otherwise copying on slicing > would be an backwards incompatibility in numarray I'd rather like to see > (especially since one could easily add a view attribute to Numeric, for > forwards-compatibility). I would also suspect that this would make it *a > lot* > easier to get numarray (or parts of it) into the core, but this is just a > guess. I think the two things Guido wants for inclusion of numarray is a consensus from our community on what we want, and (more importantly) a comprehensible code base. :-) If Numeric satisfied this 2nd condition, it might already be slated for inclusion... The 1st is never easy with such varied opinions -- I've about concluded that Konrad and I are anti-particles :-) -- but I hope it will happen. > > > > > I don't see choosing axis=-1 as a break with Python -- multi-dimensional > > arrays are inherently different and used differently than lists of lists > > in Python. Further, reduce() is a "corner" of the Python language that > > has been superceded by list comprehensions. Choosing an alternative > > Guido might nowadays think that adding reduce was as mistake, so in that > sense > it might be a "corner" of the python language (although some people, > including > me, still rather like using reduce), but I can't see how you can generally > replace reduce with anything but a loop. Could you give an example? Your right. You can't do it without a loop. List comprehensions only supercede filter and map since they always return a list. I think reduce is here to stay. And, like you, I would actually be disappointed to see it go (I like lambda too...) The point is that I wouldn't choose the definition of sum() or product() based on the behavior of Python's reduce operator. Hmmm. So I guess that is key -- its really these *function* interfaces that I disagree with. So, how about add.reduce() keep axis=0 to match the behavior of Python, but sum() and friends defaulted to axis=-1 to match the rest of the library functions? It does break with consistency across the library, so I think it is sub-optimal. However, the distinction is reasonably clear and much less likely to cause confusion. It also allows FFT and future modules (wavelets or whatever) operate across the fastest axis by default while conforming to an intuitive standard. take() and friends would also become axis=-1 for consistency with all other functions. Would this be a reasonable compromise? 
eric > > > alex > -- > Alexander Schmolck Postgraduate Research Student > Department of Computer Science > University of Exeter > A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From groma at nucleus.szbk.u-szeged.hu Tue Jun 11 23:30:03 2002 From: groma at nucleus.szbk.u-szeged.hu (Geza Groma) Date: Tue Jun 11 23:30:03 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? Message-ID: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> Using Numeric-21.0.win32-py2.2 I found this: Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> a = array((1, 1), 'b') >>> b = array((1, 0), 'b') >>> a and b array([1, 0],'b') >>> b and a array([1, 1],'b') >>> It looks like a bug, or at least very weird. a&b and b&a work correctly. -- Géza Groma Institute of Biophysics, Biological Research Center of Hungarian Academy of Sciences Temesvári krt. 62. 6726 Szeged Hungary phone: +36 62 432 232 fax: +36 62 433 133 From hinsen at cnrs-orleans.fr Wed Jun 12 01:36:01 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Jun 12 01:36:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> Message-ID: Scott Ransom writes: > On June 11, 2002 04:56 pm, you wrote: > > One can make a case for allowing == and != for complex arrays, but < and > just doesn't make sense and should not be allowed. > > It depends if you think of complex numbers in phasor form or not. In phasor > form, the amplitude of the complex number is certainly something that you > could compare with > or < -- and in my opinion, that seems like a reasonable Sure, but that doesn't give a full order relation for complex numbers. Two different numbers with equal magnitude would be neither equal nor would one be larger than the other. I agree with Paul that complex comparison should not be allowed. On the other hand, Perry's argument about sorting makes sense as well. Is there anything that prevents us from permitting arraysort() on complex arrays but not the comparison operators? Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen at cnrs-orleans.fr Wed Jun 12 01:55:03 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Jun 12 01:55:03 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <003201c211d1$d673de30$6b01a8c0@ericlaptop> References: <003201c211d1$d673de30$6b01a8c0@ericlaptop> Message-ID: "eric jones" writes: > others) would be very slow with this approach. Your copy-on-demand > suggestion might work though. Its implementation would be more complex, > but I don't think it would require cooperation from the Python core. It wouldn't, and I am not sure the implementation would be much more complex, but then I haven't tried. Having both copy on demand and views is difficult, both conceptually and implementation-wise, but with copy-on-demand, views become less important. > Copy-on-demand doesn't really fit with python's 'assignments are > references' approach to things though, does it?
Using foo = bar in > Python and then changing an element of foo will also change bar. So, I That would be true as well with copy-on-demand arrays, as foo and bar would be the same object. Semantically, copy-on-demand would be equivalent to copying when slicing, which is exactly Python's behaviour for lists. > So, how about add.reduce() keeping axis=0 to match the behavior of Python, > but sum() and friends defaulting to axis=-1 to match the rest of the That sounds like the most arbitrary inconsistency -- add.reduce and sum are synonyms for me. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From ransom at physics.mcgill.ca Wed Jun 12 07:27:02 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Wed Jun 12 07:27:02 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> Message-ID: <20020612142600.GA28158@spock.physics.mcgill.ca> On Wed, Jun 12, 2002 at 10:32:12AM +0200, Konrad Hinsen wrote: > Scott Ransom writes: > > > On June 11, 2002 04:56 pm, you wrote: > > > One can make a case for allowing == and != for complex arrays, but < and > just doesn't make sense and should not be allowed. > > > > It depends if you think of complex numbers in phasor form or not. In phasor > > form, the amplitude of the complex number is certainly something that you > > could compare with > or < -- and in my opinion, that seems like a reasonable > > Sure, but that doesn't give a full order relation for complex numbers. > Two different numbers with equal magnitude would be neither equal nor > would one be larger than the other. The comparison operators could be defined to operate on the magnitudes only.
In this case you would get the kind of ugly result that two complex numbers with the same magnitude but different phases would be equal. Complex comparisons of this type could be quite useful to those (like me) who do lots of Fourier domain signal processing. > I agree with Paul that complex comparison should not be allowed. On the > other hand, Perry's argument about sorting makes sense as well. Is there > anything that prevents us from permitting arraysort() on complex arrays > but not the comparison operators? How do you sort an array of complex numbers if you can't compare them? Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From hinsen at cnrs-orleans.fr Wed Jun 12 07:56:04 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Jun 12 07:56:04 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <20020612142600.GA28158@spock.physics.mcgill.ca> (message from Scott Ransom on Wed, 12 Jun 2002 10:26:00 -0400) References: <001f01c2118a$64ce83d0$0c01a8c0@NICKLEBY> <20020612142600.GA28158@spock.physics.mcgill.ca> Message-ID: <200206121447.g5CElEZ13245@chinon.cnrs-orleans.fr> > The comparison operators could be defined to operate on the > magnitudes only. In this case you would get the kind of ugly > result that two complex numbers with the same magnitude but > different phases would be equal. If you want to compare magnitudes, you can do that explicitly without much effort. > How do you sort an array of complex numbers if you can't compare them? You could for example sort by real part first and by imaginary part second. That would be a well-defined sort order, but not a useful definition of comparison in the mathematical sense. Konrad -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From a.schmolck at gmx.net Wed Jun 12 08:44:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 08:44:04 2002 Subject: FW: [Numpy-discussion] Bug: extremely misleading array behavior In-Reply-To: <003201c211d1$d673de30$6b01a8c0@ericlaptop> References: <003201c211d1$d673de30$6b01a8c0@ericlaptop> Message-ID: "eric jones" writes: > > Couldn't one have both consistency *and* efficiency by implementing a > > copy-on-demand scheme (which is what matlab does, if I'm not entirely > > mistaken; a real copy only gets created if either the original or the > > 'copy' > > is modified)? > > Well, slices creating copies is definitely a bad idea (which is what I > have heard proposed before) -- finite difference calculations (and > others) would be very slow with this approach. Your copy-on-demand > suggestion might work though. Its implementation would be more complex, > but I don't think it would require cooperation from the Python core. > It could be handled in the ufunc code. It would also require extension > modules to make copies before they modified any values. > > Copy-on-demand doesn't really fit with python's 'assignments are > references' approach to things though, does it? Using foo = bar in > Python and then changing an element of foo will also change bar. So, I My suggestion wouldn't conflict with any standard python behavior -- indeed the main motivation would be to have numarray conform to standard python behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for other sequences in python. The first one creates an alias to bar and in the second one the indexing operation creates a copy of part of the sequence which is then aliased to foo. Sequences are atomic in python, in the sense that indexing them creates a new object, which I think is not in contradiction to python's nice and consistent 'assignments are references' behavior. > guess there would have to be a distinction made here. This adds a > little more complexity. > > Personally, I like being able to pass views around because it allows for > efficient implementations. The option to pass arrays into extension > functions and edit them in-place is very nice. Copy-on-demand might > allow for equal efficiency -- I'm not sure. I don't know how much of a performance drawback copy-on-demand would have when compared to the view-based approach -- I'd suspect it would not be significant; the fact that the runtime behavior becomes a bit more difficult to predict might be more of a drawback (but then I haven't heard matlab users complain and one could always force an eager copy).
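To spell out the semantic difference at issue (a minimal interactive sketch; Numeric 21.x view behavior assumed):

>>> l = [0, 1, 2, 3]
>>> s = l[1:3]; s[0] = 99   # a list slice is a copy
>>> l
[0, 1, 2, 3]
>>> from Numeric import *
>>> a = array([0, 1, 2, 3])
>>> v = a[1:3]; v[0] = 99   # an array slice is a view
>>> a
array([ 0, 99,  2,  3])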
Another reason why I think a copy-on-demand scheme for slicing operations might be attractive is that I'd suspect one could gain significant benefits from doing other operations in a lazy fashion (plus optionally caching some results), too (transposing seems to cause in principle unnecessary copies at least in some cases at the moment). > > I haven't found the current behavior very problematic in practice and > haven't seen it as a major stumbling block to new users. I'm happy From my experience not even all people who use Numeric quite a lot are *aware* that the slicing behavior differs from python sequences. You might be right that in practice aliasing doesn't cause too many problems (as long as one sticks to arrays -- it certainly makes it harder to write code that operates on slices of generic sequence types) -- I'd really be interested to know whether there are cases where people have spent a long time tracking down a bug caused by the view behavior. > with the status quo on this. But, if copy-on-demand is truly efficient and > didn't make extension writing a nightmare, I wouldn't complain about the > change either. I have a feeling the implementers of numarray would, > though. :-) And talk about having to modify legacy code... Since the vast majority of slicing operations are currently not done to create views that are dependently modified, the backward incompatibility might not affect that much code. You are right, though, that if Perry and the other numarray implementors don't think that copy-on-demand could be worth the bother then it's unlikely to happen. > > > forwards-compatibility). I would also suspect that this would make it > *a > > lot* > > easier to get numarray (or parts of it) into the core, but this is > just a > > guess. > > I think the two things Guido wants for inclusion of numarray are a > consensus from our community on what we want, and (more importantly) a > comprehensible code base. :-) If Numeric satisfied this 2nd condition, > it might already be slated for inclusion... The 1st is never easy with > such varied opinions -- I've about concluded that Konrad and I are > anti-particles :-) -- but I hope it will happen. As I said I can only guess about the politics involved, but I would think that before a significant piece of code such as numarray is incorporated into the core a relevant PEP will be discussed in the newsgroup, and that many people will feel more comfortable about incorporating something into core-python that doesn't deviate significantly from standard behavior (i.e. doesn't view-slice), especially if it mainly caters to a rather specialized audience. But Guido obviously has the last word on those issues, and if he doesn't have a problem either way, then as long as the community is undivided it shouldn't be an obstacle for inclusion. I agree that division of the community might pose the most significant problems -- MA for example *does* create copies on indexing if I'm not mistaken, and the (desirable) transition process from Numeric to numarray also poses not insignificant difficulties and risks, especially since there are now quite a few important projects (not least of them scipy) that are built on top of Numeric and will have to be incorporated in the transition if numarray is to take over. Everything seems to be in a bit of a limbo right now.
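One quick way to check the MA claim (a sketch only -- MA's copy-on-indexing behavior is asserted above, not verified here, so the m[2] result is what one would *expect* under copy semantics):

>>> from Numeric import *
>>> import MA
>>> a = arange(10)
>>> v = a[2:5]; v[0] = 99        # Numeric slice: a view, so a changes
>>> a[2]
99
>>> m = MA.array(range(10))
>>> n = m[2:5]; n[0] = 99        # MA slice: reportedly a copy
>>> m[2]                         # ... so m should be unchanged
2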
I'm currently working on a (fully-featured) matrix class that I'd like to work with both Numeric and numarray (and also scipy where available) more or less transparently for the user, which turns out to be much more difficult than I would have thought. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From paul at pfdubois.com Wed Jun 12 08:45:09 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Wed Jun 12 08:45:09 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <20020612142600.GA28158@spock.physics.mcgill.ca> Message-ID: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY> Using the term "comparison operators" is too loose and is causing a communication problem here. There are these comparison operators: == and != (group 1); <, >, <=, and >= (group 2). For complex numbers it is easy to define the operators in group 1: x == y iff x.real == y.real and x.imag == y.imag. And, x != y iff (not x == y). I hardly think any other definition would be conceivable. The utility of this definition is questionable, as in most instances one should be making these comparisons with a tolerance, but there at least are cases when it makes sense. For group 2, there are a variety of possible definitions. Just to name three possible ">" definitions: the greater magnitude, the greater phase mod 2pi, or a radix-type order, e.g., x > y if x.real > y.real or (x.real == y.real and x.imag > y.imag). A person can always define a function my_greater_than(c1, c2) to embody one of these definitions, and use it as an argument to a sort routine that takes a function argument to tell it how to sort. What you are arguing about is whether some particular version of this comparison should be "blessed" by attaching it to the operator ">". I do not think one of the definitions is such a clear winner that it should be blessed -- it would mean a casual reader could not guess what the operator means, and ">" does not have a doc string. Therefore I oppose doing so. From pearu at cens.ioc.ee Wed Jun 12 08:55:03 2002 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Wed Jun 12 08:55:03 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <200206121447.g5CElEZ13245@chinon.cnrs-orleans.fr> Message-ID: On Wed, 12 Jun 2002, Konrad Hinsen wrote: > > How do you sort an array of complex numbers if you can't compare them? > > You could for example sort by real part first and by imaginary part > second. That would be a well-defined sort order, but not a useful > definition of comparison in the mathematical sense. Related discussion has also taken place on the scipy list. See the thread starting at http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000364.html But here I would like to draw your attention to the suggestion that the sort() function could take an optional argument that specifies the comparison method for complex numbers (for real numbers they are all equivalent). Here follows the relevant fragment of the message: http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000366.html ... However, in different applications different conventions may be useful or reasonable for ordering complex numbers. Whatever the convention is, its mathematical correctness is irrelevant, and this cannot be used as an argument for preferring one convention to another.
I would propose providing a number of efficient comparison methods for complex (or any) numbers that users may use in sort functions as an optional argument. For example, scipy.sort([2,1+2j],cmpmth='abs') -> [1+2j,2] # sorts by abs value scipy.sort([2,1+2j],cmpmth='real') -> [2,1+2j] # sorts by real part scipy.sort([2,1+2j],cmpmth='realimag') # sorts by real then by imag scipy.sort([2,1+2j],cmpmth='imagreal') # sorts by imag then by real scipy.sort([2,1+2j],cmpmth='absangle') # sorts by abs then by angle etc. scipy.sort([2,1+2j],cmpfunc=) Note that scipy.sort([-1,1],cmpmth='absangle') -> [1,-1] which also demonstrates the arbitrariness of sorting complex numbers. ... Regards, Pearu From Barrett at stsci.edu Wed Jun 12 08:55:05 2002 From: Barrett at stsci.edu (Paul Barrett) Date: Wed Jun 12 08:55:05 2002 Subject: [Numpy-discussion] RE: default axis for numarray References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> Message-ID: <3D076EA9.4090209@STScI.Edu> eric jones wrote: > > I think the consistency with Python is less of an issue than it seems. > I wasn't aware that add.reduce(x) would generate the same results as > the Python version of reduce(add,x) until Perry pointed it out to me. > There are some inconsistencies between Python the language and Numeric > because of the needs of the Numeric community. For instance, slices create > views instead of copies as in Python. This was a correct break with > consistency in a heavily utilized area of Python because of efficiency. I think consistency is an issue, particularly for novices. You cite the issue of slices creating views instead of copies as being the correct choice. But this decision is based solely on the perception that views are 'inherently' more efficient than copies and not on reasons of consistency or usability. I (a seasoned user) find view behavior to be annoying and have been caught out on this several times. For example, reversing in-place the elements of any array using slices, i.e. A = A[::-1], will give the wrong answer, unless you explicitly make a copy before doing the assignment. Whereas copy behavior will do the right thing. I suggest that many novices will be caught out by this and similar examples, as I have been. Copy behavior for slices can be just as efficient as view behavior, if implemented as copy-on-write. The beauty of Python is that it allows the developer to spend much more time on consistency and usability issues than on implementation issues. Sadly, I think much of Numeric development is based solely on implementation issues to the detriment of consistency and usability. I don't have enough experience to definitely say whether axis=0 should be preferred over axis=-1 or vice versa. But it does appear that for the most general cases axis=0 is probably preferred. This is the default for the APL and J programming languages, on which Numeric is based. Should we not continue to follow their lead? It might be nice to see a list of examples where axis=0 is the preferred default and the same for axis=-1. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From rlw at stsci.edu Wed Jun 12 09:27:03 2002 From: rlw at stsci.edu (Rick White) Date: Wed Jun 12 09:27:03 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: Here is what I see as the fundamental problem with implementing slicing in numarray using copy-on-demand instead of views.
Copy-on-demand requires the maintenance of a global list of all the active views associated with a particular array buffer. Here is a simple example: >>> a = zeros((5000,5000)) >>> b = a[49:51,50] >>> c = a[51:53,50] >>> a[50,50] = 1 The assignment to a[50,50] must trigger a copy of the array b; otherwise b also changes. On the other hand, array c does not need to be copied since its view does not include element 50,50. You could instead copy the array a -- but that means copying a 100 Mbyte array while leaving the original around (since b and c are still using it) -- not a good idea! The bookkeeping can get pretty messy (if you care about memory usage, which we definitely do). Consider this case: >>> a = zeros((5000,5000)) >>> b = a[0:-10,0:-10] >>> c = a[49:51,50] >>> del a >>> b[50,50] = 1 Now what happens? Either we can copy the array for b (which means two copies of the huge (5000,5000) array exist, one used by c and the new version used by b), or we can be clever and copy c instead. Even keeping track of the views associated with a buffer doesn't solve the problem of an array that is passed to a C extension and is modified in place. It would seem that passing an array into a C extension would always require all the associated views to be turned into copies. Otherwise we can't guarantee that views won't be modified. This kind of state information with side effects leads to a system that is hard to develop, hard to debug, and really messes up the behavior of the program (IMHO). It is *highly* desirable to avoid it if possible. This is not to deny that copy-on-demand (with explicit views available on request) would have some desirable advantages for the behavior of the system. But we've worried these issues to death, and in the end were convinced that slices == views provided the best compromise between the desired behavior and a clean implementation. Rick ------------------------------------------------------------------ Richard L. White rlw at stsci.edu http://sundog.stsci.edu/rick/ Space Telescope Science Institute Baltimore, MD From btang at pacific.jpl.nasa.gov Wed Jun 12 09:35:05 2002 From: btang at pacific.jpl.nasa.gov (Benyang Tang) Date: Wed Jun 12 09:35:05 2002 Subject: [Numpy-discussion] Why the upcasting? Message-ID: <3D0778D8.D11B9E6@pacific.jpl.nasa.gov> The sum of an Int32 array and a Float32 array is a Float64 array, as shown by the following code: a = Numeric.array([1,2,3,4],'i') a.typecode(), a.itemsize() b = Numeric.array([1,2,3,4],'f') b.typecode(), b.itemsize() c=a+b c.typecode(), c.itemsize() >>> a = Numeric.array([1,2,3,4],'i') >>> a.typecode(), a.itemsize() ('i', 4) >>> >>> b = Numeric.array([1,2,3,4],'f') >>> b.typecode(), b.itemsize() ('f', 4) >>> c=a+b >>> c.typecode(), c.itemsize() ('d', 8) Why the upcasting? I am using Linux/Pentium/python2.1/numpy20. Thanks. Benyang Tang From perry at stsci.edu Wed Jun 12 09:45:06 2002 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jun 12 09:45:06 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: > This kind of state information with side effects leads to a system that > is hard to develop, hard to debug, and really messes up the behavior of > the program (IMHO). It is *highly* desirable to avoid it if possible. > Rick beat me to the punch. The requirement for copy-on-demand definitely leads to a far more complex implementation with much more potential for misunderstood memory usage. You could do one small thing and suddenly force a spate of copies (perhaps cascading).
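To illustrate the cascade concern (a sketch of *hypothetical* copy-on-demand semantics -- this is not how Numeric or numarray actually behave):

>>> a = zeros((5000, 5000))
>>> cols = [a[:, j] for j in range(100)]   # 100 outstanding lazy slices
>>> a[0, 0] = 1
... # under the simple "copy all views on any write" scheme discussed
... # above, this single element store must now materialize every one
... # of the 100 outstanding slices before the assignment can proceed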
There is no way we would have taken on a redesign of Numeric with this requirement with the resources we have available. > This is not to deny that copy-on-demand (with explicit views available > on request) would have some desirable advantages for the behavior of > the system. But we've worried these issues to death, and in the end > were convinced that slices == views provided the best compromise > between the desired behavior and a clean implementation. > Rick's explanation doesn't really address the other position, which is that slices should force immediate copies. This isn't a difficult implementation issue by itself. But it does raise some related implementation questions. Supposing one does feel that views are a feature one wants even though they are not the default, it turns out that it isn't all that simple to obtain views without sacrificing ordinary slicing syntax to obtain a view. It is simple to obtain copies of view slices though. Slicing views may not be important to everyone. It is important to us (and others) and we do see a number of situations where forcing copies to operate on array subsets would be a serious performance problem. We did discuss this issue with Guido and he did not indicate that having different behavior on slicing with arrays would be a show stopper for acceptance into the Standard Library. We are also aware that there is no great consensus on this issue (even internally at STScI :-). Perry Greenfield From cookedm at physics.mcmaster.ca Wed Jun 12 10:48:01 2002 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Jun 12 10:48:01 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> (Geza Groma's message of "Wed, 12 Jun 2002 08:27:57 +0200") References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> Message-ID: At some point, Geza Groma wrote: > Using Numeric-21.0.win32-py2.2 I found this: > > Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. >>>> from Numeric import * >>>> a = array((1, 1), 'b') >>>> b = array((1, 0), 'b') >>>> a and b > array([1, 0],'b') >>>> b and a > array([1, 1],'b') >>>> > > It looks like a bug, or at least very weird. a&b and b&a work correctly. Nope. From the Python language reference (5.10 Boolean operations): The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned. Since in your case both a and b are true (they aren't zero-length sequences, etc.), the last value will be returned. It works for other types too, of course: Python 2.1.3 (#1, May 23 2002, 09:00:41) [GCC 3.1 (Debian)] on linux2 Type "copyright", "credits" or "license" for more information. >>> a = 'This is a' >>> b = 'This is b' >>> a and b 'This is b' >>> b and a 'This is a' >>> -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke |cookedm at mcmaster.ca From hinsen at cnrs-orleans.fr Wed Jun 12 11:11:15 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Wed Jun 12 11:11:15 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <3D076EA9.4090209@STScI.Edu> References: <001e01c2116f$7af083e0$6b01a8c0@ericlaptop> <3D076EA9.4090209@STScI.Edu> Message-ID: Paul Barrett writes: > I think consistency is an issue, particularly for novices. You cite ...
Finally a contribution that I can fully agree with :-) > I don't have enough experience to definitely say whether axis=0 should > be preferred over axis=-1 or vice versa. But it does appear that for > the most general cases axis=0 is probably preferred. This is the > default for the APL and J programming languages, on which Numeric is based. > Should we not continue to follow their lead? It might be nice to see This is the internal logic I referred to briefly earlier, but I didn't have the time to explain it in more detail. Now I have :-) The basic idea is that an array is seen as an array of array values. The N dimensions are split into two parts: the first N1 dimensions describe the shape of the "total" array, and the remaining N2 = N - N1 dimensions describe the shape of the array-valued elements of the array. I suppose some examples will help: - A rank-1 array could be seen either as a vector of scalars (N1 = 1) or as a scalar containing a vector (N1 = 0); in practice there is no difference between these views. - A rank-2 array could be seen as a matrix (N1=2), as a vector of vectors (N1=1), or as a scalar containing a matrix (N1=0). The first and the last come down to the same, but the middle one doesn't. - A discretized vector field (i.e. one 3D vector value for each point on a 3D grid) is represented by a rank-6 array, with N1=3 and N2=3. Array operations are divided into two classes, "structural" and "element" operations. Element operations do something on each individual element of an array, returning a new array with the same "outer" shape, although the element shape may be different. Structural operations work on the outer shape, returning a new array with a possibly different outer shape but the same element shape. The most frequent element operations are addition, multiplication, etc., which work on scalar elements only. They need no axis argument at all. Element operations that work on rank-1 elements have a default axis of -1; I think FFT has been quoted as an example a few times. There are no element operations that work on higher-rank elements, but they are imaginable. A 2D FFT routine would default to axis=-2. Structural operations, which are by far the most frequent after scalar element operations, default to axis=0. They include reduction and accumulation, sorting, selection (take, repeat, ...) and some others. I hope this clarifies the choice of default axis arguments in the current NumPy. It is most definitely not arbitrary or accidental. If you follow the data layout principles explained above, you almost never need to specify an explicit axis argument. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From reggie at merfinllc.com Wed Jun 12 11:56:05 2002 From: reggie at merfinllc.com (Reggie Dugard) Date: Wed Jun 12 11:56:05 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> Message-ID: <1023908127.25709.80.camel@auk> This is not, in fact, a bug, although I've fallen prey to the same mistake myself.
I'm assuming what you really wanted was to use logical_and: Python 2.2.1 (#1, Apr 29 2002, 15:21:53) [GCC 3.0.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> a = array((1,1), 'b') >>> b = array((1,0), 'b') >>> logical_and(a,b) array([1, 0],'b') >>> logical_and(b,a) array([1, 0],'b') >>> From the python documentation: "The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned." So the "and" is just returning its second argument, since both arguments are considered "True" (containing at least 1 "True" element). On Tue, 2002-06-11 at 23:27, Geza Groma wrote: > Using Numeric-21.0.win32-py2.2 I found this: > > Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Numeric import * > >>> a = array((1, 1), 'b') > >>> b = array((1, 0), 'b') > >>> a and b > array([1, 0],'b') > >>> b and a > array([1, 1],'b') > >>> > > It looks like a bug, or at least very weird. a&b and b&a work correctly. > > -- > Géza Groma > Institute of Biophysics, > Biological Research Center of Hungarian Academy of Sciences > Temesvári krt. 62. > 6726 Szeged > Hungary > phone: +36 62 432 232 > fax: +36 62 433 133 Reggie Dugard Merfin, LLC From oliphant.travis at ieee.org Wed Jun 12 12:03:21 2002 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed Jun 12 12:03:21 2002 Subject: [Numpy-discussion] Complex comparisons In-Reply-To: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY> References: <000101c21227$ea8b08c0$0c01a8c0@NICKLEBY> Message-ID: <1023908597.21793.5.camel@travis> I'd be interested to know what IDL does. Does it compare complex numbers? Matlab allows comparisons of complex numbers but just compares the real part. I think this is reasonable. Often during a calculation of limited precision one ends up with a complex number when the result is in a "mathematically pure sense" real. I guess I trust the user to realize that if they are comparing numbers they know what they mean --- (only real numbers are compared, so the complex part is ignored). -Travis From rlw at stsci.edu Wed Jun 12 13:25:02 2002 From: rlw at stsci.edu (Rick White) Date: Wed Jun 12 13:25:02 2002 Subject: [Numpy-discussion] Complex comparisons In-Reply-To: <1023908597.21793.5.camel@travis> Message-ID: On 12 Jun 2002, Travis Oliphant wrote: > I'd be interested to know what IDL does? Does it compare complex > numbers. Well, that was an interesting question with a surprising answer (at least to me, a long-time IDL user): (1) IDL allows comparisons of complex numbers using equality and inequality, but attempts to compare using GT, LT, etc. cause an illegal exception. (2) IDL sorts complex numbers by the amplitude. It ignores the phase. Numbers with the same amplitude and different phases are randomly ordered depending on their positions in the original array. > Matlab allows comparisons of complex numbers but just compares the real > part. I think this is reasonable.
Often during a calculation of > limited precision one ends up with a complex number when the result is > in a "mathematically pure sense" real. So neither IDL nor Matlab has what I consider the desirable feature that the sort order be unique at least to the extent that equal values wind up next to each other in the sorted array. (Sorting by real value and then, for equal real values, by imaginary value would accomplish that.) Since complex numbers can't be fully ordered, there is no single comparison function that can be plugged into a standard sort algorithm and give that result -- it would require a special complex sort algorithm. I guess if neither of the major array processing systems (that I know about) has this property in their complex sorts, it must not be *that* important. And since I've been using IDL for 13 years without discovering that complex greater-than comparisons are illegal, I guess that must not be an important property either (at least to me :-). My conclusion now is similar to Paul Dubois's suggestion -- we should allow equality comparisons and sorting. Beyond that I guess whatever other people want should carry the day, since it clearly doesn't matter to the sorts of things that I do with Numeric! Rick From Chris.Barker at noaa.gov Wed Jun 12 13:29:02 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Jun 12 13:29:02 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> Message-ID: <3D07AE26.8C3D2829@noaa.gov> Reggie Dugard wrote: > This is not, in fact, a bug although I've fallen prey to the same > mistake myself. I'm assuming what you really wanted was to use > logical_and: > So the "and" is just returning its second argument, since both arguments > are considered "True" (containing at least 1 "True" element). I imagine there is a compelling reason that "and" and "or" have not been overridden like the comparison operators, but it sure would be nice! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tim.hochberg at ieee.org Wed Jun 12 13:38:24 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Wed Jun 12 13:38:24 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> <3D07AE26.8C3D2829@noaa.gov> Message-ID: <013801c21250$ea7bf0f0$061a6244@cx781526b> From: "Chris Barker" > I imagine there is a compelling reason that "and" and "or" have not been > overridden like the comparison operators, but it sure would be nice! Because it's not possible? "and" and "or" operate on the basis of the truth of their arguments, so the only way you can affect them is to override __nonzero__. Since this is a unary operation, there is no way to get the equivalent of logical_and out of it. In practice I haven't found this to be much of a problem. Nearly every time I need to and two arrays together, "&" works just as well as logical_and. I can certainly imagine cases where this isn't true, I just haven't run into them in practice.
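For example, with Geza's arrays from above, "&" gives the same (elementwise) answer in either order, unlike "and":

>>> from Numeric import *
>>> a = array((1, 1), 'b')
>>> b = array((1, 0), 'b')
>>> a & b
array([1, 0],'b')
>>> b & a
array([1, 0],'b')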
-tim From paul at pfdubois.com Wed Jun 12 15:43:01 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Wed Jun 12 15:43:01 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <3D076EA9.4090209@STScI.Edu> Message-ID: <000101c21262$6cab3610$0c01a8c0@NICKLEBY> The users of Numeric at PCMDI found the 'view' semantics so annoying that they insisted their CS staff write a separate version of Numeric just to avoid it. We have since gotten out of that mess, but that is the reason MA has copy semantics. Again, this is another issue where one is fighting over the right to 'own' the operator notation. I believe that copy semantics should win this one because it is a **proven fact** that scientists trip over it, and it is consistent with Python list semantics. People who really need view semantics could get it as previously suggested by someone, with something like x.sub[10:12, :]. There are now dead horses all over the landscape, and I for one am going to shut up. > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of Paul Barrett > Sent: Wednesday, June 12, 2002 8:54 AM > To: numpy-discussion > Subject: Re: [Numpy-discussion] RE: default axis for numarray > > eric jones wrote: > > > I think the consistency with Python is less of an issue than it seems. > > I wasn't aware that add.reduce(x) would generate the same results as > > the Python version of reduce(add,x) until Perry pointed it out to me. > > There are some inconsistencies between Python the language and Numeric > > because of the needs of the Numeric community. For instance, slices > > create views instead of copies as in Python. This was a correct break > > with consistency in a heavily utilized area of Python because of efficiency. > > I think consistency is an issue, particularly for novices. You cite the issue > of slices creating views instead of copies as being the correct choice. But > this decision is based solely on the perception that views are 'inherently' > more efficient than copies and not on reasons of consistency or usability. > I (a seasoned user) find view behavior to be annoying and have been caught > out on this several times. For example, reversing in-place the elements of > any array using slices, i.e. A = A[::-1], will give the wrong answer, unless > you explicitly make a copy before doing the assignment. Whereas copy behavior > will do the right thing. I suggest that many novices will be caught out by > this and similar examples, as I have been. Copy behavior for slices can be > just as efficient as view behavior, if implemented as copy-on-write. > > The beauty of Python is that it allows the developer to spend much more time > on consistency and usability issues than on implementation issues. Sadly, I > think much of Numeric development is based solely on implementation issues to > the detriment of consistency and usability. > > I don't have enough experience to definitely say whether axis=0 should be > preferred over axis=-1 or vice versa. But it does appear that for the most > general cases axis=0 is probably preferred. This is the default for the APL > and J programming languages, on which Numeric is based. Should we not continue > to follow their lead? It might be nice to see a list of examples where axis=0 > is the preferred default and the same for axis=-1.
> -- > Paul Barrett, PhD Space Telescope Science Institute > Phone: 410-338-4475 ESS/Science Software Group > FAX: 410-338-4767 Baltimore, MD 21218 From a.schmolck at gmx.net Wed Jun 12 15:51:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 15:51:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: Rick White writes: > Here is what I see as the fundamental problem with implementing slicing > in numarray using copy-on-demand instead of views. > > Copy-on-demand requires the maintenance of a global list of all the > active views associated with a particular array buffer. Here is a > simple example: > > >>> a = zeros((5000,5000)) > >>> b = a[49:51,50] > >>> c = a[51:53,50] > >>> a[50,50] = 1 > > The assignment to a[50,50] must trigger a copy of the array b; > otherwise b also changes. On the other hand, array c does not need to > be copied since its view does not include element 50,50. You could > instead copy the array a -- but that means copying a 100 Mbyte array > while leaving the original around (since b and c are still using it) -- > not a good idea! Sure, if one wants to perform only the *minimum* amount of copying, things can get rather tricky, but wouldn't it be satisfactory for most cases if attempted modification of the original triggered the delayed copying of the "views" (lazy copies)? In those cases where it isn't satisfactory the user could still explicitly create real (i.e. alias-only) views. > > The bookkeeping can get pretty messy (if you care about memory usage, > which we definitely do). Consider this case: > > >>> a = zeros((5000,5000)) > >>> b = a[0:-10,0:-10] > >>> c = a[49:51,50] > >>> del a > >>> b[50,50] = 1 > > Now what happens? Either we can copy the array for b (which means two ``b`` and ``c`` are copied and then ``a`` is deleted. What does numarray currently keep of a if I do something like the above or: >>> b = a.flat[::-10000] >>> del a ? > copies of the huge (5000,5000) array exist, one used by c and the new > version used by b), or we can be clever and copy c instead. > > Even keeping track of the views associated with a buffer doesn't solve > the problem of an array that is passed to a C extension and is modified > in place. It would seem that passing an array into a C extension would > always require all the associated views to be turned into copies. > Otherwise we can't guarantee that views won't be modified. Yes -- but only if the C extension is destructive. In that case the user might well be making a mistake in current Numeric if he has views and doesn't want them to be modified by the operation (of course he might know that the inplace operation does not affect the view(s) -- but wouldn't such cases be rather rare?). If he *does* want the views to be modified, he would obviously have to explicitly specify them as such in a copy-on-demand scheme, and in the other case he has most likely been prevented from making an error (and can still explicitly use real views if he knows that the inplace operation on the original will not have undesired effects on the "views").
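To make the destructive case concrete under *current* Numeric view semantics (a minimal sketch; the in-place doubling uses the ufunc output argument):

>>> from Numeric import *
>>> a = array([1, 2, 3, 4])
>>> v = a[1:3]               # a view into a's buffer
>>> ignore = add(a, a, a)    # in-place doubling: third argument is the output
>>> a
array([2, 4, 6, 8])
>>> v                        # the view silently sees the in-place change
array([4, 6])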
> > This kind of state information with side effects leads to a system that > is hard to develop, hard to debug, and really messes up the behavior of > the program (IMHO). It is *highly* desirable to avoid it if possible. Sure, copy-on-demand is an optimization and optimizations always mess things up. On the other hand, some optimizations also make "nicer" (e.g. less error-prone) semantics computationally viable, so it's often a question between ease and clarity of the implementation vs. ease and clarity of the code that uses it. I'm not denying that too much complexity in the implementation also adversely affects users in the form of bugs, and that in the particular case of delayed copying the user can also be affected directly by more difficult to understand resource usage behavior (e.g. a[0] = 1 triggering a monstrous copying operation). Just out of curiosity, has someone already asked the octave people how much trouble it has caused them to implement copy on demand, and whether matlab/octave users in practice do experience difficulties because of the harder to predict runtime behavior (I think, like matlab, octave does copy-on-demand)? > > This is not to deny that copy-on-demand (with explicit views available > on request) would have some desirable advantages for the behavior of > the system. But we've worried these issues to death, and in the end > were convinced that slices == views provided the best compromise > between the desired behavior and a clean implementation. If implementing copy-on-demand is too difficult and the resulting code would be too messy then this is certainly a valid reason to compromise on the current slicing behavior (especially since people like me who'd like to see copy-on-demand are unlikely to volunteer to implement it :) > Rick > > ------------------------------------------------------------------ > Richard L. White rlw at stsci.edu http://sundog.stsci.edu/rick/ > Space Telescope Science Institute > Baltimore, MD > > alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From a.schmolck at gmx.net Wed Jun 12 15:51:04 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Wed Jun 12 15:51:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > > This kind of state information with side effects leads to a system that > > is hard to develop, hard to debug, and really messes up the behavior of > > the program (IMHO). It is *highly* desirable to avoid it if possible. > > Rick beat me to the punch. The requirement for copy-on-demand > definitely leads to a far more complex implementation with > much more potential for misunderstood memory usage. You could > do one small thing and suddenly force a spate of copies (perhaps > cascading). There is no way we would have taken on a redesign of Yes, but I would suspect that cases where a little innocuous a[0] = 3 triggers excessive processing should be rather unusual (matlab or octave users will know). > Numeric with this requirement with the resources we have available. Fair enough -- if implementing copy-on-demand is too much work then we'll have to live without it (especially if view-slicing doesn't stand in the way of a future inclusion into the python core).
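For reference, a sketch of the explicit-alias spelling floated earlier in this thread (``m.view[:,1]``) -- hypothetical syntax, implemented in neither Numeric nor numarray today:

>>> b = a[1:3]          # default would be copy (eager or lazy) semantics
>>> c = a.view()[1:3]   # hypothetical: explicitly request an alias
>>> c[0] = 1            # would modify a; b would be unaffected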
I guess the best reason to bite the bullet and carry around state information would be if there were significant other cases where one also would want to optimize operations under the hood. If there isn't much else in this direction then the effort involved might not be justified. One thing that bugs me in Numeric (and that might already have been solved in numarray) is that e.g. ``ravel`` (and I think also ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't, but won't work in all cases (viz. when the array is non-contiguous), so I can either have ugly or inefficient code. > > > This is not to deny that copy-on-demand (with explicit views available > > on request) would have some desirable advantages for the behavior of > > the system. But we've worried these issues to death, and in the end > > were convinced that slices == views provided the best compromise > > between the desired behavior and a clean implementation. > > > Rick's explanation doesn't really address the other position, which > is that slices should force immediate copies. This isn't a difficult > implementation issue by itself. But it does raise some related > implementation questions. Supposing one does feel that views are > a feature one wants even though they are not the default, it turns > out that it isn't all that simple to obtain views without sacrificing > ordinary slicing syntax to obtain a view. It is simple to obtain > copies of view slices though. I'm not sure I understand the above. What is the problem with ``a.view[1:3]`` (or ``a.view()[1:3]``)? > > Slicing views may not be important to everyone. It is important > to us (and others) and we do see a number of situations where > forcing copies to operate on array subsets would be a serious > performance problem. We did discuss this issue with Guido and Sure, no one denies that even with copy-on-demand, (explicitly) aliased views would still be useful. > he did not indicate that having different behavior on slicing > with arrays would be a show stopper for acceptance into the > Standard Library. We are also aware that there is no great > consensus on this issue (even internally at STScI :-). > Yep, I just saw Paul Barrett's post :) > Perry Greenfield > > alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From Chris.Barker at noaa.gov Wed Jun 12 16:21:04 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Jun 12 16:21:04 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? References: <3D06E9EC.7E7E7969@nucleus.szbk.u-szeged.hu> <1023908127.25709.80.camel@auk> <3D07AE26.8C3D2829@noaa.gov> <013801c21250$ea7bf0f0$061a6244@cx781526b> Message-ID: <3D07D685.5A0E6B5D@noaa.gov> Tim Hochberg wrote: > > I imagine there is a compelling reason that "and" and "or" have not been > > overridden like the comparison operators, but it sure would be nice! > > Because it's not possible? Well, yes, but it wasn't possible with <,>,== and friends until rich comparisons were added in Python 2.1. So I am still wondering why the same extension wasn't made to "and" and "or". In fact, given that Guido is adding a bool type, this may be a time to re-visit the question, unless there really is a compelling reason not to, which is quite likely. > In practice I haven't found this to be much of a problem. Nearly every time > I need to and two arrays together, "&" works just as well as logical_and.
This has always worked for me, as well, so maybe the answer is that there is no compelling reason to make a change. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From xscottg at yahoo.com Wed Jun 12 16:52:06 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Wed Jun 12 16:52:06 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <3D07D685.5A0E6B5D@noaa.gov> Message-ID: <20020612235115.50726.qmail@web12903.mail.yahoo.com> --- Chris Barker wrote: > > Well, yes, but it wasn't possible with <,>,== and friends until rich > comparisons were added in Python 2.1. So I am still wondering why the > same extension wasn't made to "and" and "or". In fact, given that Guido > is adding a bool type, this may be a time to re-visit the question, > unless there really is a compelling reason not to, which is quite > likely. > The "and" and "or" operators do short-circuit evaluation. So in addition to acting like boolean operations, they are also control flow. For "and", the second expression is not evaluated if the first one is false. For "or", the second expression is not evaluated if the first one is true. I'm not clever enough to figure out how an overloaded and/or operator could implement control flow for the operand expressions. The operands "self" and "other" would already be evaluated by the time your __operator__(self, other) function was called. C++ has overloadable && and || operators, but overloading them is frowned on by many. C++ has the advantage over Python in that it knows the actual types at compile time. From bsder at mail.allcaps.org Wed Jun 12 23:12:02 2002 From: bsder at mail.allcaps.org (Andrew P. Lentvorski) Date: Wed Jun 12 23:12:02 2002 Subject: [Numpy-discussion] (a and b) != (b and a) ? In-Reply-To: <20020612235115.50726.qmail@web12903.mail.yahoo.com> Message-ID: <20020612194252.N31527-100000@mail.allcaps.org> On Wed, 12 Jun 2002, Scott Gilbert wrote: > C++ has overloadable && and || operators, but overloading them is frowned > on by many. C++ has the advantage over Python in that it knows the actual > types at compile time. Actually, overloading && and || isn't just frowned upon in C++, it's effectively banned. The reason is that it replaces short-circuit semantics with function call semantics and screws up the standard idioms (if ((a != NULL) && (*a == "a")) { ... } ). See "Effective C++" by Scott Meyers. As far as I know, *none* of the C++ literati hold the opposing view. -a From perry at stsci.edu Thu Jun 13 13:23:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 13:23:04 2002 Subject: [Numpy-discussion] RE: default axis for numarray In-Reply-To: <000101c21262$6cab3610$0c01a8c0@NICKLEBY> Message-ID: > There are now dead horses all over the landscape, and I for one am going > to shut up. > Not enough dead horses for me :-). But seriously, I would like to hear from others about this issue (I already knew what Paul, Paul, Eric, Travis and Konrad felt about this before it started up). You can either post to the mailing list or email directly if you are the shy, retiring type.
Perry From perry at stsci.edu Thu Jun 13 13:40:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 13:40:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: > I guess the best reason to bite the bullet and carry around state > information > would be if there were significant other cases where one also > would want to > optimize operations under the hood. If there isn't much else in > this direction > then the effort involved might not be justified. One thing that bugs me in > Numeric (and that might already have been solved in numarray) is that > e.g. ``ravel`` (and I think also ``transpose``) creates > unnecessary copies, > whereas ``.flat`` doesn't, but won't work in all cases (viz. when > the array is > non-contiguous), so I can either have ugly or inefficient code. > I guess that depends on what you mean by unnecessary copies. If the array is non-contiguous, what would you have it do? > > a feature one wants even though they are not the default, it turns > > out that it isn't all that simple to obtain views without sacrificing > > ordinary slicing syntax to obtain a view. It is simple to obtain > > copies of view slices though. > > I'm not sure I understand the above. What is the problem with > ``a.view[1:3]`` (or ``a.view()[1:3]``)? > I didn't mean to imply it wasn't possible, but that it was not quite as clean. The thing I don't like about this approach (or Paul's suggestion of a.sub) is the creation of an odd object whose only purpose is to be sliced. (Even worse, in my opinion, is making it a different kind of array where slicing behaves differently. That will lead to the problem we have discussed for other kinds of array behavior, namely, how do you keep from being confused about a particular array's slicing behavior.) That could lead to confusion as well. Many may be under the impression that x = a.view makes x refer to an array when it doesn't. Users would need to know that a.view without a '[' is usually an error. Sure, it's not hard to implement. But I don't view it as that clean a solution. On the other hand, a[1:3].copy() (or alternatively, a[1:3].copy) is another array just like any other. Perry From perry at stsci.edu Thu Jun 13 14:17:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 14:17:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: > > Copy-on-demand requires the maintenance of a global list of all the > > active views associated with a particular array buffer. Here is a > > simple example: > > > > >>> a = zeros((5000,5000)) > > >>> b = a[49:51,50] > > >>> c = a[51:53,50] > > >>> a[50,50] = 1 > > > > The assignment to a[50,50] must trigger a copy of the array b; > > otherwise b also changes. On the other hand, array c does not need to > > be copied since its view does not include element 50,50. You could > > instead copy the array a -- but that means copying a 100 Mbyte array > > while leaving the original around (since b and c are still using it) -- > > not a good idea! > > Sure, if one wants to perform only the *minimum* amount of > copying, things can > get rather tricky, but wouldn't it be satisfactory for most cases > if attempted > modification of the original triggered the delayed copying of the "views" > (lazy copies)? In those cases where it isn't satisfactory the > user could still > explicitly create real (i.e. alias-only) views. > I'm not sure what you mean.
Are you saying that if anything in the buffer changes, force all views of the buffer to generate copies (rather than try to determine if the change affected only selected views)? If so, yes, it is easier, but it still is a non-trivial capability to implement. > > > > The bookkeeping can get pretty messy (if you care about memory usage, > > which we definitely do). Consider this case: > > > > >>> a = zeros((5000,5000)) > > >>> b = a[0:-10,0:-10] > > >>> c = a[49:51,50] > > >>> del a > > >>> b[50,50] = 1 > > > > Now what happens? Either we can copy the array for b (which means two > ``b`` and ``c`` are copied and then ``a`` is deleted. > What does numarray currently keep of a if I do something like the > above or: > > >>> b = a.flat[::-10000] > >>> del a > > ? > The whole buffer remains in both cases. > > copies of the huge (5000,5000) array exist, one used by c and the new > > version used by b), or we can be clever and copy c instead. > > > > Even keeping track of the views associated with a buffer doesn't solve > > the problem of an array that is passed to a C extension and is modified > > in place. It would seem that passing an array into a C extension would > > always require all the associated views to be turned into copies. > > Otherwise we can't guarantee that views won't be modified. > > Yes -- but only if the C extension is destructive. In that case > the user might > well be making a mistake in current Numeric if he has views and > doesn't want > them to be modified by the operation (of course he might know > that the inplace > operation does not affect the view(s) -- but wouldn't such cases be rather > rare?). If he *does* want the views to be modified, he would > obviously have to > explicitly specify them as such in a copy-on-demand scheme and in the other > case he has most likely been prevented from making an error (and can > still explicitly use real views if he knows that the inplace > operation on the > original will not have undesired effects on the "views"). > If the point is that views are susceptible to unexpected changes made in place by a C extension, yes, certainly (just as they are for changes made in place in Python). But I'm not sure what that has to do with the implied copy (even if delayed) being broken by extensions written in C. Promising a copy, and not honoring it is not the same as not promising it in the first place. But I may be misunderstanding your point. Perry From perry at stsci.edu Thu Jun 13 14:52:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jun 13 14:52:03 2002 Subject: [Numpy-discussion] Some initial thoughts about the past week's discussions Message-ID: Impressions so far on various issues raised regarding numarray interfaces

1) We are mostly persuaded that rank-0 arrays are the way to go. We will pursue the issue of whether it is possible to have Python accept these as indices for sequence objects with python-dev.
2) We are still mulling over the axis order issue. Regardless of which convention we choose, we are almost certainly going to make it consistent (always the same axis as default). A compatibility module will be provided to replicate Numeric defaults.
3) repr. Finally, a consensus! Even unanimity.
4) Complex comparisons. Implement equality, non-equality, predictable sorting. Make >,<,>=,<= illegal.
5) Copy vs view. Open to more input (but no delayed copying or such); the difference is sketched below.
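The difference at stake in item 5, in a short session (a sketch assuming classic Numeric's alias slicing, as described elsewhere in this thread; outputs are illustrative):

>>> from Numeric import *
>>> a = zeros((4,))
>>> b = a[1:3]          # under Numeric's current rules, b aliases a's buffer
>>> b[0] = 99
>>> a                   # the "view" behavior: a changes too
array([ 0, 99,  0,  0])
>>> c = array(a[1:3])   # constructing a new array forces a copy
>>> c[0] = -1
>>> a                   # with copy semantics as the default, plain a[1:3]
array([ 0, 99,  0,  0])     # would behave like c rather than like b

The whole debate is over which of the two lines -- ``a[1:3]`` or ``array(a[1:3])`` -- gets the convenient spelling.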
Perry From a.schmolck at gmx.net Thu Jun 13 17:36:05 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Thu Jun 13 17:36:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > I'm not sure what you mean. Are you saying that if anything in the > buffer changes, force all views of the buffer to generate copies > (rather than try to determine if the change affected only selected Yes (I suspect that this will be sufficient in practice). > views)? If so, yes, it is easier, but it still is a non-trivial > capability to implement. Sure. But since copy-on-demand is only an optimization and as such doesn't affect the semantics, it could also be implemented at a later point if the resources are currently not available. I have little doubt that someone will eventually add copy-on-demand, if the option is kept open and in the meantime one could still get all the performance (and alias behavior) of the current implementation by explicitly using ``.view`` (or ``.sub`` if you prefer) to create aliases. I'm becoming increasingly convinced (see below) that copy-slicing-semantics are much to be preferred as the default, so given the above I don't think that performance concerns should sway one towards alias-slicing, if enough people feel that copy semantics as such are preferable. > > > > > > The bookkeeping can get pretty messy (if you care about memory usage, > > > which we definitely do). Consider this case: > > > > > > >>> a = zeros((5000,5000)) > > > >>> b = a[0:-10,0:-10] > > > >>> c = a[49:51,50] > > > >>> del a > > > >>> b[50,50] = 1 > > > > > > Now what happens? Either we can copy the array for b (which means two > > > > ``b`` and ``c`` are copied and then ``a`` is deleted. > > > > What does numarray currently keep of a if I do something like the > > above or: > > > > >>> b = a.flat[::-10000] > > >>> del a > > > > ? > > > The whole buffer remains in both cases. OK, so this is then a nice example where even eager copy slicing behavior would be *significantly* more efficient than the current aliasing behavior -- so copy-on-demand would then on the whole seem to be not just nearly equally but *more* efficient than alias slicing. And as far as difficult to understand runtime behavior is concerned, the extra ~100MB of useless baggage carried around by b (second case) is, I'd venture to suspect, less than obvious to the casual observer. In fact I remember one of my fellow phd-students having significant problems with mysterious memory consumption (a couple of arrays taking up more than 1GB rather than a few hundred MB) -- maybe something like the above was involved. That ``A = A[::-1]`` doesn't work (as pointed out by Paul Barrett) will also come as a surprise to most people. If I understand all this correctly, I consider it a rather strong case against alias slicing as default behavior. > > > Even keeping track of the views associated with a buffer doesn't solve > > > the problem of an array that is passed to a C extension and is modified > > > in place. It would seem that passing an array into a C extension would > > > always require all the associated views to be turned into copies. > > > Otherwise we can't guarantee that views won't be modified. > > > > Yes -- but only if the C extension is destructive.
In that case > > the user might > well be making a mistake in current Numeric if he has views and > doesn't want > them to be modified by the operation (of course he might know > that the inplace > operation does not affect the view(s) -- but wouldn't such cases be rather > rare?). If he *does* want the views to be modified, he would > obviously have to > explicitly specify them as such in a copy-on-demand scheme and in the other > case he has most likely been prevented from making an error (and can > still explicitly use real views if he knows that the inplace > operation on the > original will not have undesired effects on the "views"). > If the point is that views are susceptible to unexpected changes made in place by a C extension, yes, certainly (just as they are for changes made in place in Python). But I'm not sure what that has to do with the implied copy (even if delayed) being broken by extensions written in C. Promising a copy, and not honoring it is not the same as not promising it in the first place. But I may be misunderstanding your point. > OK, I'll try again, hopefully this is clearer. In a sentence: I don't see any problems with C extensions in particular that would arise from copy-on-demand (I might well be overlooking something, though). Rick was saying that passing an array to a C extension that performs an inplace operation on it means that all copies of all its (lazy) views must be performed. My point was that this is correct, but I can't see any problem with that, neither from the point of the extension writer, nor from the point of performance, nor from the point of the user, nor indeed from the point of the numarray implementors (obviously the copy-on-demand scheme *as such* will be an effort). All that is needed is a separate interface for (the minority of) C extensions that destructively modify their arguments (they only need to call some function `actualize_views(the_array_or_view)` or whatever at the start -- this function will obviously be necessary regardless of the C extensions). So nothing will break, the promises are kept and there is no extra work. It won't be any slower than what would happen with current Numeric, either, because either the (Numeric) user intended his (aliased) views to be modified as well or it was a bug. If he intended the views to be modified, he would explicitly use alias-views under the new scheme and everything would behave exactly the same. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From a.schmolck at gmx.net Thu Jun 13 17:36:10 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Thu Jun 13 17:36:10 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > > I guess the best reason to bite the bullet and carry around state > > information > > would be if there were significant other cases where one also > > would want to > > optimize operations under the hood. If there isn't much else in > > this direction > > then the effort involved might not be justified. One thing that bugs me in > > Numeric (and that might already have been solved in numarray) is that > > e.g. ``ravel`` (and I think also ``transpose``) creates > > unnecessary copies, > > whereas ``.flat`` doesn't, but won't work in all cases (viz. when > > the array is > > non-contiguous), so I can either have ugly or inefficient code.
> > > I guess that depends on what you mean by unnecessary copies. In most cases the array of which I desire a flattened representation is contiguous (plus, I usually don't intend to modify it). Consequently, in most cases I don't want any copies of it to be created (especially not if it is really large -- which is not seldom the case). The fact that you can never really be sure whether you can actually use ``.flat``, without checking beforehand if the array is in fact contiguous (I don't think there are many guarantees about something being contiguous, or are there?) and that ravel will always work but has a huge overhead, suggests to me that something is not quite right. > If the array is non-contiguous what would you have it do? Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently does, create a copy (or alternatively rearrange the memory representation to make it contiguous and then create a lazy copy, but I don't know whether this would be a good or even feasible idea). A lazy version of ravel would have the same semantics as ravel but only create an actual copy if necessary -- which means as long as no modification takes place and the array is contiguous, it will be sufficient to return the ``.flat`` (for starters). If it is non-contiguous then the copying can't be helped, but these cases are rare and currently you either have to test for them explicitly or slow everything down and waste memory by just always using ``ravel()``. For example, if bar is contiguous ``foo = ravel(bar)`` would be computationally equivalent to ``bar.flat``, as long as neither of them is modified, but semantically equivalent to the current ``foo = ravel(bar)`` in all cases. Thus you could now write:

>>> a = ravel(a)[20:]

wherever you've written this boiler-plate code before:

>>> if a.iscontiguous():
>>>    a = a.flat[20:]
>>> else:
>>>    a = ravel(a)[20:]

without any loss of performance. > > > > a feature one wants even though they are not the default, it turns > > out that it isn't all that simple to obtain views without sacrificing > > ordinary slicing syntax to obtain a view. It is simple to obtain > > copies of view slices though. > > I'm not sure I understand the above. What is the problem with > ``a.view[1:3]`` > (or ``a.view()[1:3]``)? > I didn't mean to imply it wasn't possible, but that it was not quite as clean. The thing I don't like about this approach (or Paul's suggestion of a.sub) is the creation of an odd object that has as its only purpose being sliced. (Even worse, in my opinion, is making it a different kind of array where slicing behaves differently.
That will lead to the problem we have > discussed for other kinds of array behavior, namely, how do > you keep from being confused about a particular array's slicing > behavior). That could lead to confusion as well. Many may be I don't see that problem, frankly. The view is *not* an array. It doesn't need (and shouldn't have) anything except a method to access slices (__getitem__). As mentioned before, I also regard it as highly desirable that ``b = a.view[3:10]`` sticks out immediately. This signals "warning -- potentially tricky code ahead". Nothing in ``b = a[3:10]`` tells you that someone intends to modify a and b dependently (because in more than 9 out of 10 cases he won't) -- now *this* is confusing. > under the impression that x = a.view makes x refer to an array > when it doesn't. Users would need to know that a.view without > a '[' is usually an error. Since the ``.view`` shouldn't allow anything except slicing, they'll soon find out ("Error: you can't multiply me, I'm a view and not an array"). And I can't see why that would be harder to figure out (or look up in the docu) than that a[1:3] creates an alias and *not* a copy contrary to *everything* else you've ever heard or read about python sequences (especially since in most cases it will work as intended). Also what exactly is the confused person's notion of the purpose of ``x = a.view`` supposed to be? That ``x = a`` is what ``x = a.copy()`` really does and that to create an alias to ``a`` they would have to use ``x = a.view``? In that case they'd better read the python tutorial before they do any more python programming, because they are in for all kinds of unpleasant surprises (``a = [1, 2]; b = a; b[1] = 3; print a`` -- oops). alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From perry at stsci.edu Fri Jun 14 08:02:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jun 14 08:02:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: : : > > I guess that depends on what you mean by unnecessary copies. > > In most cases the array of which I desire a flattened representation is > contiguous (plus, I usually don't intend to modify it). > Consequently, in most > cases I don't want any copies of it to be created (especially > not if it is > really large -- which is not seldom the case). > Numarray already returns a view of the array if it is contiguous. Copies are only produced if it is non-contiguous. I assume that is the behavior you are asking for? > The fact that you can never really be sure whether you can actually use > ``.flat``, without checking beforehand if the array is in fact > contiguous (I > don't think there are many guarantees about something being > contiguous, or are > there?) and that ravel will always work but has a huge overhead, > suggests to > me that something is not quite right. > Not for numarray, at least in this context. > > If the array is non-contiguous what would you have it do? > > Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently > does, create a copy (or alternatively rearrange the memory > representation to > make it contiguous and then create a lazy copy, but I don't > know whether > this would be a good or even feasible idea).
> > A lazy version of ravel would have the same semantics as ravel > but only create > an actual copy if necessary -- which means as long as no modification takes > place and the array is contiguous, it will be sufficient to return the > ``.flat`` (for starters). If it is non-contiguous then the copying can't be > helped, but these cases are rare and currently you either have to test for > them explicitly or slow everything down and waste memory by just > always using > ``ravel()``. > Currently for numarray .flat will fail if it isn't contiguous. It isn't clear if this should change. If .flat is meant to be a view always, then it should always fail if the array is not contiguous. Ravel is not guaranteed to be a view. This is a problematic issue if we decide to switch from view to copy semantics. If slices produce copies, then does .flat? If so, then how does one produce a flattened view? x.view.flat? > For example, if bar is contiguous ``foo = ravel(bar)`` would be > computationally equivalent to ``bar.flat``, as long as neither of them is > modified, but semantically equivalent to the current ``foo = > ravel(bar)`` in > all cases. > > Thus you could now write: > > >>> a = ravel(a)[20:] > > wherever you've written this boiler-plate code before: > > >>> if a.iscontiguous(): > >>> a = a.flat[20:] > >>> else: > >>> a = ravel(a)[20:] > > without any loss of performance. > I believe this is already true in numarray. > > I personally don't find it messy. And please keep in mind that > the ``view`` > construct would only very seldomly be used if copy-on-demand is > the default > -- as I said, I've only needed the aliasing behavior once -- no > doubt it was > really handy then, but the fact that e.g. matlab doesn't have > anything along > those lines (AFAIK) suggests that many people will never need it. > You're kidding, right? Particularly after arguing for aliasing semantics in the previous paragraph for .flat ;-) > > Also what exactly is the confused person's notion of the purpose of ``x = > a.view`` supposed to be? That ``x = a`` is what ``x = a.copy()`` > really does > and that to create an alias to ``a`` they would have to use > ``x = a.view``? In that case they'd better read the python > tutorial before they do > any more python programming, because they are in for all kinds of > unpleasant > surprises (``a = [1, 2]; b = a; b[1] = 3; print a`` -- oops). > This is basically true, though the confusion may be that a.view is an array object that has different slicing behavior instead of a non-array object that can be sliced to produce a view. I don't view it as a major issue but I do see how many may mistakenly infer that. Perry From tim.hochberg at ieee.org Fri Jun 14 09:14:05 2002 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Fri Jun 14 09:14:05 2002 Subject: [Numpy-discussion] copy on demand References: Message-ID: <007601c213be$6dc61fd0$061a6244@cx781526b> <"Perry Greenfield" writes> [SNIP] > Numarray already returns a view of the array if it is contiguous. > Copies are only produced if it is non-contiguous. I assume that > is the behavior you are asking for? This is one horrible aspect of NumPy that I hope you get rid of. I've been burned by this several times -- I expected a view, but silently got a copy because my array was noncontiguous. If you go with copy semantics, this will go away, if you go with view semantics, this should raise an exception instead of silently copying. Ditto with reshape, etc.
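The trap is easy to reproduce (a sketch assuming classic Numeric's behavior as reported in this thread -- ravel aliases contiguous input and silently copies otherwise; outputs are illustrative):

>>> from Numeric import *
>>> a = arange(6)
>>> a.iscontiguous()
1
>>> r = ravel(a)             # contiguous input: r shares a's buffer
>>> r[0] = 99
>>> a[0]                     # ...so the write shows through
99
>>> t = transpose(reshape(arange(6), (2, 3)))
>>> t.iscontiguous()
0
>>> r2 = ravel(t)            # non-contiguous input: silently a copy
>>> r2[0] = 99
>>> t[0,0]                   # ...so this time the original is untouched
0

Two calls that look identical at the call site do different things depending on a property of the argument that the caller usually never checks.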
In my experience, this is a source of hard to find bugs (as opposed to axes issues which tend to produce shallow bugs). [SNIP] > Currently for numarray .flat will fail if it isn't contiguous. It isn't > clear if this should change. If .flat is meant to be a view always, then > it should always fail if the array is not contiguous. Ravel is not > guaranteed to be a view. Ravel should either always return a view or always return a copy -- I don't care which. > This is a problematic issue if we decide to switch from view to copy > semantics. If slices produce copies, then does .flat? If so, then > how does one produce a flattened view? x.view.flat? Wouldn't that just produce a copy of the view? Unless you did some weird special casing on view? The following would work, although it's a little clunky.

flat_x = x.view[:]   # Or however "get me a view" would be spelled.
flat_x.shape = (-1,)

-tim From hinsen at cnrs-orleans.fr Fri Jun 14 10:52:02 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Jun 14 10:52:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > I didn't mean to imply it wasn't possible, but that it was not > quite as clean. The thing I don't like about this approach (or > Paul's suggestion of a.sub) is the creation of an odd object > that has as its only purpose being sliced. (Even worse, in my > > Not necessarily.
We could decide that > > array.view > > is a view of the full array object, and that slicing views returns > subviews. > > > opinion, is making it a different kind of array where slicing > > behaves differently. That will lead to the problem we have > > discussed for other kinds of array behavior, namely, how do > > A view could be a different type of object, even though much of the > implementation would be shared with arrays. This would help to > reduce confusion. > I'd be strongly against this. This has the same problem that other customized array objects have (whether regarding slicing behavior, operators, coercion...). In particular, it is clear which kind it is when you create it, but you may pass it to a module that presumes different array behavior. Having different kinds of arrays floating around just seems like an invitation for confusion. I'm very much in favor of picking one or the other behaviors and then making some means of explicitly getting the other behavior. > > behavior). That could lead to confusion as well. Many may be > > under the impression that x = a.view makes x refer to an array > > when it doesn't. Users would need to know that a.view without > > a '[' is usually an error. > > Why? It would be a full-size view, which might actually be useful > in many situations. > But one can do that simply by x = a (Though there is the issue that one could do the following, which is not the same:

x = a.view
x.shape = (2,50)

so that x is a full array view with a different shape than a) ******** I understand the backward compatibility issue here, but it is clear that this is an issue that appears to be impossible to get a consensus on. There appear to be significant factions that care passionately about copy vs view and no matter what decision is made many will be unhappy. Perry From jjl at pobox.com Fri Jun 14 12:22:04 2002 From: jjl at pobox.com (John J. Lee) Date: Fri Jun 14 12:22:04 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: Message-ID: On 14 Jun 2002, Alexander Schmolck wrote: [...] > The fact that you can never really be sure whether you can actually use > ``.flat``, without checking beforehand if the array is in fact > contiguous (I don't think there are many guarantees about something > being contiguous, or are there?) and that ravel will always work but has > a huge overhead, suggests to me that something is not quite right. Why does ravel have a huge overhead? It seems it already doesn't copy unless required: search for 'Chacking' -- including the mis-spelling -- in this thread: http://groups.google.com/groups?hl=en&lr=&threadm=abjbfp%241t9%241%40news5.svr.pol.co.uk&rnum=1&prev=/groups%3Fq%3Diterating%2Bover%2Bthe%2Bcells%2Bgroup:comp.lang.python%26hl%3Den%26lr%3D%26scoring%3Dr%26selm%3Dabjbfp%25241t9%25241%2540news5.svr.pol.co.uk%26rnum%3D1 or start up your Python interpreter, if you're less lazy than me. John From Chris.Barker at noaa.gov Fri Jun 14 16:20:03 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jun 14 16:20:03 2002 Subject: [Numpy-discussion] copy on demand References: Message-ID: <3D0A75D9.4AF344B3@noaa.gov> Konrad Hinsen wrote: > Not necessarily. We could decide that > > array.view > > is a view of the full array object, and that slicing views returns > subviews. Please don't!! Having two types of arrays around in a single program that have the same behaviour except when they are sliced is begging for confusion and hard to find bugs.
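To make the hazard concrete, consider generic code that is handed "some kind of array" (a sketch; scale_tail is a made-up name):

def scale_tail(a):
    t = a[len(a)/2:]    # a copy or an alias, depending on which kind
                        # of array the caller happened to pass in
    t[0] = 0            # may or may not clobber the caller's data
    return t

Whether the caller's data survives depends entirely on the type of a -- exactly the per-object uncertainty being objected to here.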
I agree with Perry, that I occasionally use the view behaviour of slicing, and it is very useful when I do, but most of the time I would be happier with copy semantics. All I want is a way to get at a view of part of an array, I don't want two different kinds of array around with different slicing behaviour. > My main objection to changing the slicing behaviour is, like with some > other proposed changes, compatibility. The switch from Numeric to Numarray is a substantial change. I think we should view it like the mythical Py3k: an opportunity to make incompatible changes that will really make it better. By the way, as an old MATLAB user, I have to say that being able to get views from a slice is one behaviour of NumPy that I really appreciate, even though I only need it occasionally. MATLAB, however, is a whole different ball of wax in a lot of ways. There has been a lot of discussion about the copy on demand idea in MATLAB, but that is primarily useful because MATLAB has call by value function semantics, so without copy on demand, you would be making copies of large arrays passed to functions that weren't even going to change them. I don't think MATLAB implements copy on demand for slices anyway, but I could be wrong there. Oh, and no function (ie ravel() ) should return a view in some cases, and a copy in others, that is just asking for bugs! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ransom at physics.mcgill.ca Fri Jun 14 16:27:01 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Fri Jun 14 16:27:01 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <3D0A75D9.4AF344B3@noaa.gov> References: <3D0A75D9.4AF344B3@noaa.gov> Message-ID: I was going to write an almost identical email, but Chris saved me the trouble. These are my feelings as well. Scott
-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From hinsen at cnrs-orleans.fr Sat Jun 15 01:56:03 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Sat Jun 15 01:56:03 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> > I'd be strongly against this. This has the same problem that other > customized array objects have (whether regarding slicing behavior, > operators, coercion...). In particular, it is clear which kind it > is when you create it, but you may pass it to a module that > presumes different array behavior. Having different kind of arrays We already have that situation with lists and arrays (and in much of my code netCDF arrays, which have copy semantics), but in my experience this has never caused confusion. Most general code working on sequences doesn't modify elements at all. When it does, it either clearly requires view semantics (a function you call in order to modify (parts of) an array) or clearly requires copy semantics (a function that uses an array argument as an initial value that it then modifies). > floating around just seems like an invitation for confusion. I'm > very much in favor of picking one or the other behaviors and then > making some means of explicitly getting the other behavior. Then the only solution I see is the current one: default behaviour is view, and when you want a copy you copy explicitly. The inverse is not possible, once you made a copy you can't make it behave like a view anymore. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From ransom at physics.mcgill.ca Sat Jun 15 06:13:05 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Sat Jun 15 06:13:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> Message-ID: <20020615131238.GB7948@spock.physics.mcgill.ca> On Sat, Jun 15, 2002 at 10:53:17AM +0200, Konrad Hinsen wrote: > > floating around just seems like an invitation for confusion. I'm > > very much in favor of picking one or the other behaviors and then > > making some means of explicitly getting the other behavior. > > Then the only solution I see is the current one: default behaviour is > view, and when you want a copy you copy explicitly. The inverse is not > possible, once you made a copy you can't make it behave like a view > anymore.
I don't think it is necessary to create the other object _from_ the default one. You could have copy behavior be the default, and if you want a view of some array you simply request one explicitly with .view, .sub, or whatever. Since creating a view is "cheap" compared to creating a copy, there is nothing sacrificed doing things in this manner. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From a.schmolck at gmx.net Sun Jun 16 15:59:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Sun Jun 16 15:59:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: References: Message-ID: "Perry Greenfield" writes: > : > : > > > I guess that depends on what you mean by unnecessary copies. > > > > In most cases the array of which I desire a flattened representation is > > contiguous (plus, I usually don't intend to modify it). > > Consequently, in most > > cases I don't want any copies of it to be created (especially > > not if it is > > really large -- which is not seldom the case). > > > Numarray already returns a view of the array if it is contiguous. > Copies are only produced if it is non-contiguous. I assume that > is the behavior you are asking for? Not at all -- in fact I was rather shocked when my attention was drawn to the fact that this is also the behavior of Numeric -- I had thought that ravel would *always* create a copy.
I absolutely agree with the other posters that remarked that different behavior of ravel (creating a copy vs creating a view, depending on whether the argument is contiguous) is highly undesirable and error-prone (especially since it is not even possible to determine at compile time which behavior will occur, if I'm not mistaken). In fact, I think this behavior is worse than what I incorrectly assumed to be the case. What I was arguing for is a ravel that always has the same semantics (namely creating a copy) but that -- because it would create the copy only on demand -- would be just as efficient as using .flat when a) its argument were contiguous; and b) neither the result nor the argument were modified while both are alive. The reason that I view `.flat` as a hack is that it is an operation that is there exclusively for efficiency reasons and has no well defined semantics -- it will only work stochastically, giving better performance in certain cases. Thus you have to determine at runtime (by calling .iscontiguous) whether you can actually use it, and always have a fall-back scheme (most likely using ravel) at hand -- there seems to be no way to determine at compile time what's going to happen. I don't think a language or a library should have any such constructs or at least strive to minimize their number. The fact that the current behavior of ravel actually achieves the effect I want in most cases doesn't justify its obscure behavior in my eyes, which translates into a variation of the boiler-plate code previously mentioned (``if a.iscontiguous():...else:``) when you actually want a *single* ravelled copy and it also is a very likely candidate for extremely hard to find bugs. One nice thing about python is that there is very little undefined behavior. I'd like to keep it that way. [snipped] > > I personally don't find it messy. And please keep in mind that > > the ``view`` > > construct would only very seldomly be used if copy-on-demand is > > the default > > -- as I said, I've only needed the aliasing behavior once -- no > > doubt it was > > really handy then, but the fact that e.g. matlab doesn't have > > anything along > > those lines (AFAIK) suggests that many people will never need it. > > > You're kidding, right? Particularly after arguing for aliasing > semantics in the previous paragraph for .flat ;-) I didn't argue for any semantics of ``.flat`` -- I just pointed out that I found the division of labour that I (incorrectly) assumed to be the case an ugly hack (for the reasons outlined above): ``ravel``: always works, but always creates copy (which might be undesirable wastage of resources); [this was mistaken; the real semantics are: always works, creates view if contiguous, copy otherwise] ``.flat``: behavior undefined at compile time, a runtime-check can be used to ensure that it can be used as a more efficient alternative to ``ravel`` in some cases. If I now understand the behavior of both ``ravel`` and ``.flat`` correctly then I can't currently see *any* raison d'être for a ``.flat`` attribute. If, as I would hope, the behavior of ravel is changed to always create copies (ideally on-demand), then matters might look different. In that case, it might be justifiable to have ``.flat`` as a specialized construct analogous to what I proposed as ``.view``, but only if there is some way to make it work (the same) for both contiguous and non-contiguous arrays. I'm not sure that it would be needed at all (especially with a lazy ravel).
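One way such a lazy ravel might be sketched (a hypothetical toy class over classic Numeric's array and ravel -- it deliberately ignores writes to the *source* array, which is exactly the bookkeeping burden discussed earlier in the thread):

from Numeric import array, ravel

class LazyRavel:
    """Copy-semantics ravel whose copy is deferred until first write."""
    def __init__(self, src):
        self._src = src
        self._flat = None        # nothing materialized yet
        self._owns = 0           # have we made our private copy?
    def __getitem__(self, i):
        if self._flat is None:
            self._flat = ravel(self._src)  # may alias if contiguous
        return self._flat[i]
    def __setitem__(self, i, value):
        if not self._owns:
            self._flat = array(ravel(self._src))  # the deferred copy
            self._owns = 1
        self._flat[i] = value

Reads on a contiguous source cost nothing extra; the copy is paid for only by code that actually writes through the result.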
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From hinsen at cnrs-orleans.fr Mon Jun 17 01:46:08 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 17 01:46:08 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: (message from Alexander Schmolck on 17 Jun 2002 00:30:19 +0100) References: Message-ID: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> > Konrad Hinsen writes: > > [did you mean this to be off-list? If not, please just forward it to the > list.] No, I sent the mail to the list as well, but one out of three mails I send to the list never arrive there at first try... In this case, the copy sent to myself got lost as well, so I don't have any copy left, sorry. > > > > > > I don't know about the others out there, but I have 30000 lines of > > published Python code plus a lot of unpublished code (scripts), all of > > which use NumPy arrays almost everywhere. There are also a few places > > where views are created intentionally, which are then passed around to > > other code and can end up anywhere. The time required to update all > > that to new slicing semantics would be enormous, and I don't see how I > > could justify it to myself or to my employer. I'd also have to stop > > advertising Python as a time-efficient development tool. > > I sympathize with this view. However, I think the solution to this problem > should be a compatibility wrapper rather than a design compromise. > > There are at least 2 reasons why: > > 1. Numarray has quite a few incompatibilities to Numeric anyway, so even > without this change you'd be forced to rewrite all or most of those scripts The question is how much effort it is to update code. If it is easy, most people will do it sooner or later. If it is difficult, they won't. And that will lead to a split in the user community, which I think is highly detrimental to the further development of NumPy and Numarray. A compatibility wrapper won't change this. Assume that I have tons of code that I can't update because it's too much effort. Instead I use the compatibility wrapper. When I add a line or a function to that code, it will of course stick to the old conventions. When I add a new module, I will also prefer the old conventions, for consistency. And other people working with the code will pick up the old conventions as well. At the same time, other people will use the new conventions. There will be two parts of the community that cannot easily read each other's code. So unless we can reach a consensus that will guarantee that 90% of existing code will be adapted to the new interfaces, there will be a split. > (or use the wrapper), but none of the incompatibilities I'm currently aware > of would, in my eyes, buy one as much as introducing copy-indexing > semantics would. So if things get broken anyway, one might as well take I agree, but it also comes at the highest cost. There is absolutely no way to identify automatically the code that needs to be adapted, and there is no run-time error message in case of failure - just a wrong result. None of the other proposed changes is as risky as this one. > this step (especially since intentional views are, on the whole, used > rather sparingly -- although tracking down these uses in retrospect might > admittedly be unpleasant). It is not merely unpleasant, the cost is simply prohibitive. > 2. Numarray is supposed to be incorporated into the core.
Compromising the > consistency of core python (and code that depends on it) is in my eyes > worse than compromising code written for Numeric. I don't see view behaviour as inconsistent with Python. Python has one mutable sequence type, the list, with copy behaviour. One type is hardly enough to establish a rule. > As a third reason I could claim that there is some hope of a much more > widespread adoption of Numeric/numarray as an alternative to matlab etc. in > the next couple of years, so that it might be wise to fix things now, but I'd > understand if you'd remain unimpressed by that :) I'd like to see any supporting evidence. I think this argument is based on the reasoning "I would prefer it to be this way, so many others would certainly also prefer it, so they would start using NumPy if only these changes were made." This is not how decision processes work in real life. On the contrary, people might look at the history of NumPy and decide that it is too unreliable to base a serious project on - if they changed the interface once, they might do it again. This is a particularly important aspect in the OpenSource universe, where there are no contracts that promise anything. If you want people to use your code, you have to demonstrate that it is reliable, and that applies to both the code and the interfaces. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hinsen at cnrs-orleans.fr Mon Jun 17 02:01:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Jun 17 02:01:05 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <20020615131238.GB7948@spock.physics.mcgill.ca> (message from Scott Ransom on Sat, 15 Jun 2002 09:12:38 -0400) References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> Message-ID: <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> > > Then the only solution I see is the current one: default behaviour is > > view, and when you want a copy you copy explicitly. The inverse is not > > possible, once you made a copy you can't make it behave like a view > > anymore. > > I don't think it is necessary to create the other object _from_ > the default one. You could have copy behavior be the default, > and if you want a view of some array you simply request one > explicitly with .view, .sub, or whatever. Let's make this explicit. Given the following four expressions,

1) array
2) array[0]
3) array.view
4) array.view[0]

what would the types of each of these objects be according to your proposal? What would the indexing behaviour of those types be? I don't see how you can avoid having either two types or two different behaviours within one type. Konrad.
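One concrete answer in code (a toy sketch with hypothetical classes layered over classic Numeric -- not anyone's actual proposal): make 1) and 2) plain copies, 3) a small non-array helper whose only job is slicing, and 4) an aliasing array.

from Numeric import array

class _ViewHandle:
    """Not an array: its only job is to hand out aliasing slices."""
    def __init__(self, a):
        self._a = a
    def __getitem__(self, index):
        return self._a[index]        # Numeric's native slice: an alias

class CopyArray:
    """Array-like wrapper whose ordinary slices are copies."""
    def __init__(self, data):
        self._a = array(data)
        self.view = _ViewHandle(self._a)
    def __getitem__(self, index):
        return array(self._a[index]) # constructing a new array forces a copy

Usage, matching the four expressions above:

>>> a = CopyArray([0, 1, 2, 3])
>>> b = a[1:3]        # 2): a copy -- writing b leaves a alone
>>> a.view            # 3): a helper object, not an array
>>> v = a.view[1:3]   # 4): an alias -- writing v shows through to a

Under this sketch there is only one *array* slicing behaviour; the second behaviour lives in a distinct, deliberately limited type.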
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From a.schmolck at gmx.net Mon Jun 17 08:12:03 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Mon Jun 17 08:12:03 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> References: <200206170843.g5H8h4u08627@chinon.cnrs-orleans.fr> Message-ID: Konrad Hinsen writes: [Konrad wants to keep alias-slicing behavior for backward-compatibility] > > I sympathize with this view. However, I think the solution to this problem > > should be a compatibility wrapper rather than a design compromise. > > > > There are at least 2 reasons why: > > > > 1. Numarray has quite a few incompatibilities to Numeric anyway, so even > > without this change you'd be forced to rewrite all or most of those scripts > > The question is how much effort it is to update code. If it is easy, > most people will do it sooner or later. If it is difficult, they won't. > And that will lead to a split in the user community, which I think > is highly detrimental to the further development of NumPy and Numarray. I agree that avoiding a split of the Numeric user community is a crucial issue and that efforts have to be taken to make transition painless enough to happen (in most cases; maybe it needs to be even 90% or more as you say). > > A compatibility wrapper won't change this. Assume that I have tons of > code that I can't update because it's too much effort. Instead I use > the compatbility wrapper. When I add a line or a function to that > code, it will of course stick to the old conventions. When I add a new > module, I will also prefer the old conventions, for consistency. And > other people working with the code will pick up the old conventions as > well. At the same time, other people will use the new conventions. > There will be two parts of the community that cannot easily read each > other's code. I don't think the situation is quite so bleak. Yes, library code should be converted, and although a compatibility wrapper might be helpful in the process, I agree that it isn't a full solution for the reasons you cite above. But there is plenty of code that is mainly used internally and no longer changes (much), for which I think a compatibility wrapper is a fine solution (and might be preferable to conversion, even if it involves little effort). If I had some matlab (or C) code that fulfills similar criteria, I'd also rather wrap it somehow rather than to convert it to python. > > So unless we can reach a concensus that will guarantee that 90% of > existing code will be adapted to the new interfaces, there will be a > split. > > > (or use the wrapper), but none of the incompatibilities I'm currently aware > > of would, in my eyes, buy one as much as introducing copy-indexing > > semantics would. So if things get broken anyway, one might as well take > > I agree, but it also comes at the highest cost. There is absolute no > way to identify automatically the code that needs to be adapted, and > there is no run-time error message in case of failure - just a wrong > result. None of the other proposed changes is as risky as this one. 
Wouldn't an (almost) automatic solution be to simply replace (almost) all instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual cases (like if you heavily mix arrays and lists) you could still autoconvert by inserting ``if type(foo) == ArrayType:...``, although this would admittedly be rather messy. The unnecessary ``.view``s can be eliminated over time and even if they aren't, no one would have to learn or switch between two libraries. > > > this step (especially since intentional views are, on the whole, used > > rather sparingly -- although tracking down these uses in retrospect might > > admittedly be unpleasant). > > It is not merely unpleasant, the cost is simply prohibitive. See above. I personally hope that even without resorting to something like the above, converting my code to copy behavior wouldn't be too much of an effort, but my code-base is much smaller than yours and I can't currently recall more than one case of intended aliasing that would require a couple of changes and my estimate might also prove quite wrong. I have no idea which scenario is typical. > > > 2. Numarray is supposed to be incorporated into the core. Compromising the > > consistency of core python (and code that depends on it) is in my eyes > > worse than compromising code written for Numeric. > > I don't see view behaviour as inconsistent with Python. Python has one > mutable sequence type, the list, with copy behaviour. One type is > hardly enough to establish a rule. Well, AFAIK there are actually three mutable sequence types in python core and all have copy-slicing behavior: list, UserList and array: >>> import array >>> aa = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) >>> bb = aa[:] >>> bb is aa 0 I would suppose that in the grand scheme of things numarray.array is intended as an eventual replacement for array.array, or not? Furthermore list is such a fundamental data type in python that I think it is actually enough to establish a rule (if the vast majority of 3rd party modules sequence types don't have the same semantics, I'd regard it as a strong argument for your position, but I haven't checked). > > > As a third reason I could claim that there is some hope of a much more > > widespread adoption of Numeric/numarray as an alternative to matlab etc. in > > the next couple of years, so that it might be wise to fix things now, but I'd > > understand if you'd remain unimpressed by that :) > > I'd like to see any supporting evidence. I think this argument is > based on the reasoning "I would prefer it to be this way, so many > others would certainly also prefer it, so they would start using NumPy > if only these changes were made." This is not how decision processes > work in real life. Sure, but I didn't try to imply this causality anyway:) My argument wasn't so much "lets make it really good (where good is what *I* say) then loads of people will adopt it", it was more: "Numeric has a good chance to grow considerably in popularity over the next years, so it will be much easier to fix things now than later" (for slicing behavior, now is likely to be the last chance). The fact that matlab users are used to copy-on-demand and the fact that many people, (including you if I understand you correctly) think that copy-slicing semantics as such (without backward compatibility concerns) are preferable, might have a small influence on people's decision to adopt Numeric, but I perfectly agree that this influence will be minor compared to other issues. 
> > On the contrary, people might look at the history of NumPy and decide > that it is too unreliable to base a serious project on - if they > changed the interface once, they might do it again. This is a > particularly important aspect in the OpenSource universe, where there > are no contracts that promise anything. If you want people to use your I don't think matlab or similar alternatives make legally binding promises about backwards compatibility, or do they? I guess it is actually more difficult to *force* incompatible changes on people with an open source project than with commercial software, but I agree that splitting or lighthearted sacrifices of backwards compatibility are more of a temptation with open source, for one thing because there are usually fewer financial stakes involved for the authors. > code, you have to demonstrate that it is reliable, and that applies to > both the code and the interfaces. Yes, this is very important and I very much appreciate that you stress these and similar points in your postings. But reliability to me also includes the ability for growth -- I not only want my old code to work in a couple of years, I also want the tool I wrote it in to remain competitive and this can conflict with backwards-compatibility. I like the balance python strikes here so far -- the language has improved significantly (and in my eyes has remained superior to newer competitors such as ruby) but at the same time for me and most other people transitions between versions haven't caused too much trouble. This increases the value of my code-base to me: I can assume that it will still work (or be adapted without too much effort) in years to come and yet be written in an excellent language for the job. Striking this balance is however quite difficult (as can be seen by the heated discussions in c.l.p), so getting it right will most likely involve considerable effort (and controversy) within the Numeric community. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From ransom at physics.mcgill.ca Mon Jun 17 10:14:48 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 17 10:14:48 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> References: <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> Message-ID: On June 17, 2002 04:57 am, Konrad Hinsen wrote: > > > Then the only solution I see is the current one: default behaviour is > > > view, and when you want a copy you copy explicitly. The inverse is not > > > possible, once you made a copy you can't make it behave like a view > > > anymore. > > > > I don't think it is necessary to create the other object _from_ > > the default one. You could have copy behavior be the default, > > and if you want a view of some array you simply request one > > explicitly with .view, .sub, or whatever. > > Let's make this explicit. Given the following four expressions, > > 1) array > 2) array[0] > 3) array.view > 4) array.view[0] > > what would the types of each of these objects be according to your > proposal? What would the indexing behaviour of those types be? > I don't see how you can avoid having either two types or two > different behaviours within one type.
If we assume that a slice returns a copy _always_, then I agree that #4 in your list above would not give a user what they would expect: array.view[0] would give the view of a copy of array[0], _not_ a view of array[0] which is probably what is wanted. I _think_ that this could be fixed by making view (or something similar) an option of the slice rather than a method of the object. For example (assuming that a is an array):

Expression:   Returns:           Slicing Behavior:
a or a[:]     Copy of all of a   Returns a copy of the sub-array
a[0]          Copy of a[0]       Returns a copy of the sub-array
a[:,view]     View of all of a   Returns a copy of the sub-array
a[0,view]     View of a[0]       Returns a copy of the sub-array

Notice that it is possible to return a copy of a sub-array from a view since you have access (through a pointer) to the original array data.

Scott

-- 
Scott M. Ransom            Address:  McGill Univ. Physics Dept.
Phone:  (514) 398-6492               3600 University St., Rm 338
email:  ransom at physics.mcgill.ca     Montreal, QC  Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From fardal at uvastro.phys.uvic.ca Mon Jun 17 13:49:03 2002
From: fardal at uvastro.phys.uvic.ca (Mark Fardal)
Date: Mon Jun 17 13:49:03 2002
Subject: [Numpy-discussion] Re: Personal
Message-ID: <200206172047.g5HKlrw09617@mussel.phys.uvic.ca>

Dear Numpy-Discussion,

It is good to see that Numeric Python inspires such confidence in people all around the world, especially when subjected to due deliberation. I hope that this invoiced contract entitlement will not be set to zero once we obtain a view of it.

I would like to propose a further elaboration of the Sharing Partern. Eric, Travis, Konrad, Scott, Paul, and Perry will each get 2% of the total based on their contributions to the mailing list traffic so far (I am blissfully ignorant of who has written actual code), and the rest of the 20% will go to the first individual to deliver a finished working Numarray. With copy semantics only, please.

best regards,
Mark Fardal

> Dear Sir,
> I am the Chairman Contract Review Committee of
> National Electric Power Authority (NEPA).
> Although this proposal might come to you as a surprise
> since it is coming from someone you do not know or
> ever seen before, but after due deliberation with my
> colleagues, I decided to contact you based onIntuition.
> We are soliciting for your humble and confidential
> assistance to take custody of Seventy One Million,
> Five Hundred Thousand United StatesDollars.{US$71,500,000.00}.
> This sum (US$71.5M) is an over invoiced contract sum
> which is currently in offshore payment account of the
> Central Bank of Nigeria as an unclaimed contract
> entitlement which can easily be withdrawn or drafted
> or paid to any recommended beneficiary by my committee.
> On this note, you will be presented as a contractor to
> NEPA who has executed a contract to a tune of the
> above sum and has not been paid.
> Proposed Sharing Partern (%):
> 1. 70% for me and my colleagues.
> 2. 20% for you as a partner/fronting for us.
> 3. 10% for expenses that may be incure by both parties
> during the cause of this transacton.
> Our law prohibits a civil servant from operating a
> foreign account, hence we are contacting you.
> If this proposal satisfies you, do response as soon as
> possible with the following information:
> 1. The name you wish to use as the beneficiary of thefund.
> 2. Your Confidential Phone and Fax Numbers.
> Further discussion will be centered on how the fund
> shall be transferred and full details on how to
> accomplish this great opportunity of ours.
> Thank you and God bless.
>
> Best regards,
>
> victor ichaka nabia

From Chris.Barker at noaa.gov Mon Jun 17 15:49:04 2002
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Mon Jun 17 15:49:04 2002
Subject: [Numpy-discussion] copy on demand
References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr>
Message-ID: <3D0E634F.9B53A102@noaa.gov>

Konrad Hinsen wrote:
> Let's make this explicit. Given the following four expressions,
> 1) array
> 2) array[0]
> 3) array.view
> 4) array.view[0]

I thought I had a clear idea of what I wanted here, which was the non-view stuff being the same as Python lists, but I discovered something: Python lists provide slices that are copies, but they are shallow copies, so nested lists, which are sort-of the equivalent of multidimensional arrays, act a lot like the view behavior of NumPy arrays:

make a "2-d" list:

>>> l = [[i, 1+5] for i in range(5)]
>>> l
[[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

make an array that is the same:

>>> a = array(l)
array([[0, 6],
       [1, 6],
       [2, 6],
       [3, 6],
       [4, 6]])

assign a new binding to the first element:

>>> b = a[0]
>>> m = l[0]

change something in it:

>>> b[0] = 30
>>> a
array([[30, 6],
       [ 1, 6],
       [ 2, 6],
       [ 3, 6],
       [ 4, 6]])

The first array is changed

Change something in the first element of the list:

>>> m[0] = 30
>>> l
[[30, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

The first list is changed too.

Now try slices instead:

>>> b = a[2:4]

change an element in the slice:

>>> b[1,0] = 55
>>> a
array([[30, 6],
       [ 1, 6],
       [ 2, 6],
       [55, 6],
       [ 4, 6]])

The first array is changed

Now with the list:

>>> m = l[2:4]
>>> m
[[2, 6], [3, 6]]

This is a copy, but it is a shallow copy, so:

>>> m[1][0] = 45

Change an element

>>> l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list is changed, but:

>>> m[0] = [56,65]
>>> l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list doesn't change, where:

>>> b[0] = [56,65]
>>> a
array([[30,  6],
       [ 1,  6],
       [56, 65],
       [55,  6],
       [ 4,  6]])

The array does change

My conclusion is that nested lists and Arrays simply are different beasts so we can't expect complete compatibility. I'm also wondering why lists have that weird behavior of a single index returning a reference, and a slice returning a copy. Perhaps it has something to do with the auto-resizing of lists. That being said, I still like the idea of slices producing copies, so:

> 1) array
An Array like we have now, but slice-is-copy semantics.

> 2) array[0]
An Array of rank one less than array, sharing data with array

> 3) array.view
An object that can do nothing but create other Arrays that share data with array. I don't know if it is possible but I'd be just as happy if array.view returned None, and array.view[slice] returned an Array that shared data with array. Perhaps there is some other notation that could do this.

> 4) array.view[0]
Same as 2)

To add a few:

5) array[0:1]
An Array with a copy of the data in array[0]

6) array.view[0:1]
An Array sharing data with array

As I write this, I am starting to think that this is all a bit strange. Even though lists treat slices and indexes differently, perhaps Arrays should not. They really are different beasts. I also see why it was done the way it was in the first place!

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From a.schmolck at gmx.net Tue Jun 18 15:23:02 2002 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Tue Jun 18 15:23:02 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <3D0E634F.9B53A102@noaa.gov> References: <200206150853.g5F8rHC31759@chinon.cnrs-orleans.fr> <20020615131238.GB7948@spock.physics.mcgill.ca> <200206170857.g5H8vsr08849@chinon.cnrs-orleans.fr> <3D0E634F.9B53A102@noaa.gov> Message-ID: Chris Barker writes: > My conclusion is that nested lists and Arrays simply are different > beasts so we can't expect complete compatibility. I'm also wondering why > lists have that weird behavior of a single index returning a reference, > and a slice returning a copy. Perhaps it has something to so with the This is not weird at all. Slicing and single item indexing are different conceptually and what I think you have in mind wouldn't really work. Think of a real life container, like box with subcompartments. Obviously you should be able to take out (or put in) an item from the box, which is what single indexing does (and the item may happen to be another box). My understanding is that you'd like the box to return copies of whatever was put into it on indexing, rather than the real thing -- this would not only be counterintuitive and inefficient, it also means that you could exclusively put items with a __copy__-method in lists, which would rather limit their usefulness. Slicing on the other hand creates a whole new box but this box is filled with (references to) the same items (a behavior for which a real life equivalent is more difficult to find :) : >>> l = 'foobar' >>> l = ['foobar', 'barfoot'] >>> l2 = l[:] >>> l2[0] is l[0] 1 Because the l and l2 are different boxes, however, assigning new items to l1 doesn't change l2 and vice versa. It is true, however that the situation is somewhat different for arrays, because "multidimensional" lists are just nested boxed, whereas multidimensional arrays have a different structure. array[1] indexes some part of itself according to its .shape (which can be modified, thus changing what array[1] indexes, without modifying the actual array contents in memory), whereas list[1] indexes some "real" object. This may mean that the best behavior for ``array[0]`` would be to return a copy and ``array[:]`` etc. what would be a "deep copy" if it where nested lists. I think this is the behavior Paul Dubois MA currently has. > auto-resizing of lists. That being said, I still like the idea of slices > producing copies, so: > > > 1) array > An Array like we have now, but slice-is-copy > semantics. > > > 2) array[0] > An Array of rank one less than array, sharing data with array > > > 3) array.view > An object that can do nothing but create other Arrays that share data > with array. I don't know if is possible but I'd be just as happy if > array.view returned None, and array.view[slice] returned an Array that No it is not possible. > shared data with array. Perhaps there is some other notation that could > do this. > > > 4) array.view[0] > Same as 2) I can't see why single-item indexing views would be needed at all if ``array[0]`` doesn't copy as you suggest above. > > To add a few: > > 5) array[0:1] > An Array with a copy of the data in array[0] (I suppose you'd also want array[0:1] and array[0] to have different shape?) 
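(For reference, the two do already differ in shape in current Numeric -- a minimal check:)

>>> from Numeric import *
>>> a = zeros((3,4))
>>> a[0].shape        # a single index drops a dimension
(4,)
>>> a[0:1].shape      # a length-1 slice keeps it
(1, 4)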
> > 6) array.view[0:1] > An Array sharing data with array > > As I write this, I am starting to think that this is all a bit strange. > Even though lists treat slices and indexes differently, perhaps Arrays > should not. They really are different beasts. I also see why it was done Yes, arrays and lists are indeed different beasts and a different indexing behavior (creating copies) for arrays might well be preferable (since array indexing doesn't refer to "real" objects). > the way it was in the first place! > > -Chris alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From hinsen at cnrs-orleans.fr Thu Jun 20 09:30:05 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Jun 20 09:30:05 2002 Subject: [Numpy-discussion] copy on demand Message-ID: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> > Wouldn't an (almost) automatic solution be to simply replace (almost) all > instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual That would convert all slicing operations, even those working on strings, lists, and user-defined sequence-type objects. > cases (like if you heavily mix arrays and lists) you could still I do, and I don't consider it that unusual. Anyway, even if some function gets called only with array arguments, I don't see how a code analyzer could detect that. So it would be... > autoconvert by inserting ``if type(foo) == ArrayType:...``, although typechecks for every slicing or indexing operation (a[0] generates a view as well for a multidimensional array). Guaranteed to render most code unreadable, and of course slow down execution. A further challenge for your code convertor: f(a[0], b[2:3], c[-1, 1]) That makes eight type combination cases. > Well, AFAIK there are actually three mutable sequence types in > python core and all have copy-slicing behavior: list, UserList and > array: UserList is not an independent type, it is merely a subclassable wrapper around lists. As for the array module, I haven't seen any code that uses it. > I would suppose that in the grand scheme of things numarray.array is intended > as an eventual replacement for array.array, or not? In the interest of those who rely on the current array module, I hope not. > much "lets make it really good (where good is what *I* say) then loads of > people will adopt it", it was more: "Numeric has a good chance to grow > considerably in popularity over the next years, so it will be much easier to > fix things now than later" (for slicing behavior, now is likely to be the last > chance). I agree - except that I think it is already too late. > The fact that matlab users are used to copy-on-demand and the fact that many > people, (including you if I understand you correctly) think that copy-slicing > semantics as such (without backward compatibility concerns) are preferable, Yes, assuming that views are somehow available. But my preference is not so strong that I consider it a sufficient reason to break lots of code. View semantics is not a catastrophe. All of us continue to use NumPy in spite of it, and I suspect none of use loses any sleep over it. I have spent perhaps a few hours in total (over six years of using NumPy) to track down view-related bugs, which makes it a minor problem on my personal scale. > I don't think matlab or similar alternatives make legally binding promises > about backwards compatibility, or do they? 
> It guess it is actually more

Of course not, software providers for the mass market take great care not to promise anything. But if Matlab did anything as drastic as what we are discussing, they would lose lots of paying customers.

> But reliability to me also includes the ability for growth -- I not only want
> my old code to work in a couple of years, I also want the tool I wrote it in
> to remain competitive and this can conflict with backwards-compatibility. I

In what way does the current slicing behaviour render your code non-competitive?

> like the balance python strikes here so far -- the language has

Me too. But there haven't been any incompatible changes in the documented core language, and only very few in the standard library (the to-be-abandoned re module comes to mind - anything else?).

For a bad example, see the Python XML package(s). Lots of changes, incompatibilities between parsers, etc. The one decision I really regret is to have chosen an XML-based solution for documentation. Now I spend two days at every new release of my stuff to adapt the XML code to the fashion of the day.

It is almost ironic that I appear here as the great anti-change advocate, since on many other occasions I have argued for improvement over excessive compatibility. Basically I favour motivated incompatible changes, but under the condition that updating of existing code is manageable. Changing the semantics of a type is about the worst I can imagine in this respect.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From magnus at hetland.org Fri Jun 21 04:38:03 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Fri Jun 21 04:38:03 2002
Subject: [Numpy-discussion] average
Message-ID: <20020621133705.A15296@idi.ntnu.no>

One quick question: Why does the MA module have an average function, but not Numeric? And what is the equivalent in numarray?

-- 
Magnus Lie Hetland      The Anygui Project
http://hetland.org      http://anygui.org

From a.schmolck at gmx.net Fri Jun 21 16:42:01 2002
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Fri Jun 21 16:42:01 2002
Subject: [Numpy-discussion] copy on demand
In-Reply-To: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr>
References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr>
Message-ID: 

[sorry for replying so late, an almost finished email got lost in a computer accident and I was rather busy.]

Konrad Hinsen writes:

> > Wouldn't an (almost) automatic solution be to simply replace (almost) all
> > instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual
>
> That would convert all slicing operations, even those working on
> strings, lists, and user-defined sequence-type objects.

Well that's where the "(almost)" comes in ;) If you can tell at a glance for most instances in your code whether the ``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that much trouble. Of course this might not be true. But the question really is: to what extent would it be more difficult to tell than what you need to find out already in all the other situations where code needs changing because of the incompatibilities numarray already introduces?
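(Just to make the "query replace" idea concrete, a toy sketch -- hypothetical throughout, since it presumes you can already list which names hold arrays, which is exactly the hard part being debated:)

import re

def slices_to_views(source, array_names):
    # rewrite foo[...] as foo.view[...] for every known array variable;
    # a real converter would need genuine type analysis, not a regexp
    for name in array_names:
        source = re.sub(r'\b' + name + r'\[', name + '.view[', source)
    return source

print slices_to_views("c = a[2:4] + b[0] + l[1:2]", ["a", "b"])
# prints: c = a.view[2:4] + b.view[0] + l[1:2]

(Note that it also rewrites single-item indexing like b[0], which for multidimensional arrays is a view as well.)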
(I think I have for example already found a slicing-incompatibility -- unfortunately the list of the issues I hit upon so far has disappeared somewhere, so I'll have to try to reconstruct it sometime...) If the answer is "not much", then you would have to regard these incompatibilities as even less acceptable than the introduction of copy-slicing semantics (because as you've already agreed, these incompatibilities don't confer the same benefit) or otherwise it would be difficult to see why copy-slicing shouldn't be introduced as well (just as an example, I'm sure I've already come across a slicing incompatibility -- unfortunately I've lost my compilation of this and similar problems, but I'll try to reconstruct it). View semantics have always bothered me, but if it weren't for the fact that numarray is going to cause me not inconsiderable inconvenience through various incompatibilities anyway, I would have been satisfied with the status quo. As things are, however I must admit I feel a strong temptation to get this fixed as well, especially as most of the other laudable improvements of numarray wouldn't seem to be of great importance to me personally at the moment (much nicer C code base, better handling of byteswapped data and very large arrays etc.). So I fully admit to a selfish desire for either more gain or less pain (incompatibility) or maybe even a bit of both. Of course I don't think these subjective desires of mine are a good standard to go by, but I am convinced that offering attractive improvements or few compatibility problems (or both) to the widest possible audience of current Numeric users is important in order to replace Numeric, quickly and cleanly, without any splitting. > > > autoconvert by inserting ``if type(foo) == ArrayType:...``, although > > typechecks for every slicing or indexing operation (a[0] generates a > view as well for a multidimensional array). Guaranteed to render most > code unreadable, and of course slow down execution. > > A further challenge for your code convertor: > > f(a[0], b[2:3], c[-1, 1]) > > That makes eight type combination cases. I'd say 4 (since c[-1,1] can't be a list) but that is beside the point. This was mainly intended as a demonstration that you *can* do it automatically, if you really need to. A function call would help the readability but obviously be even more inefficient. If I really had large amounts of code that needed that conversion, I'd be tempted to write such a function with an additional twist: have it monitor the input argument type whenever the program is run and if it isn't an array, the wrapping in this particular line can be discarded (with less confidence, if it always seems to be an array it could be converted into ``a.view[b:c]``, but that might need additional checking). In code that isn't reached, the wrapper just stays forever. I've always been looking for an excuse to write some self-modifying code :) > > > Well, AFAIK there are actually three mutable sequence types in > > python core and all have copy-slicing behavior: list, UserList and > > array: > > UserList is not an independent type, it is merely a subclassable > wrapper around lists. As for the array module, I haven't seen any code > that uses it. It is AFAIK the only way to work efficiently with large strings, so I guess it is important also I agree that it is not that often used. > > > I would suppose that in the grand scheme of things numarray.array is intended > > as an eventual replacement for array.array, or not? 
> > In the interest of those who rely on the current array module, I hope not. As long as array is kept around for backwards-compatibility, why not? [...] > > But reliability to me also includes the ability for growth -- I not only want > > my old code to work in a couple of years, I also want the tool I wrote it in > > to remain competitive and this can conflict with backwards-compatibility. I > > In what way does the current slicing behaviour render your code > non-competitive? A single design decision obviously doesn't have such an immediate huge negative impact that it immediately renders all your code-noncompetive, unless it was a *really* bad design decision it just means more bugs and less clear and general code. But language warts are more like tumours, they grow over the years and become increasingly difficult to excise (just look what tremendous redesign effort the perl people go through at the moment). The closer warts come to the core language the worse, and since numarray aims for inclusion I think it must be measured to a higher standard than other modules that don't. > > > like the balance python strikes here so far -- the language has > > Me too. But there haven't been any incompatible changes in the > documented core language, and only very few in the standard library > (the to-be-abandoned re module comes to mind - anything else?). I don't think this is true (and the documented core language is not necessarily a good standard to go by as far as python is concerned, because not quite everything one has to rely upon is actually documented (instead one can find things like: "XXX Can't be bothered to spell this out right now...")). Among the incompatible changes that I would strongly assume *were* documented before and after are: exceptions (strings -> classes), automatic conversion of ints to longs (instead of an exception) and the new division rules whose stepwise introduction has already started. There are also quite a few things that used to work for all classes, but that now no longer work with new-style classes, some of which can be quite annoying (you loose quite a bit of introspective and interactive power), but I'm not sure to which extent they were documented. > > For a bad example, see the Python XML package(s). Lots of changes, > incompatibilities between parsers, etc. The one decision I really > regret is to have chosen an XML-based solution for documentation. Now > I spend two days at every new release of my stuff to adapt the XML > code to the fashion of the day. I didn't do much xml processing, but as far as I can remember I was happy with 4suite: http://4suite.org/index.xhtml. > > It is almost ironic that I appear here as the great anti-change > advocate, since in many other occasions I have argued for improvement > over excessive compatiblity. Basically I favour motivated incompatible I don't think a particularly conservative character is necessary to fill that role :) You've got a big code base, which automatically reduces the desire for incompatibilities because you have to pay a hefty cost that is difficult to offset by potential advantages for future code. But that side of the argument is clearly important and I think even if you don't like to be an anti-change advocate you still often make valuable points against changes you perceive as uncalled for. 
alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/

From hinsen at cnrs-orleans.fr Sun Jun 23 01:24:02 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Sun Jun 23 01:24:02 2002
Subject: [Numpy-discussion] copy on demand
In-Reply-To: (message from Alexander Schmolck on 22 Jun 2002 00:41:13 +0100)
References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr>
Message-ID: <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr>

> If you can tell at a glance for most instances in your code whether the
> ``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that much

How could I? Moreover, even if I could, that's not enough. I need a program to spot those places for me, as I won't go through 30000 lines of code by hand.

> trouble. Of course this might not be true. But the question really
> is: to what extent would it be more difficult to tell than what you
> need to find out already in all the other situations where code
> needs changing because of the incompatibilities numarray already

What are those? In general, changes related to NumPy functions or attributes of array objects are relatively easy to deal with, as one can use a text editor to search for the name and thereby capture most locations (not all though). Changes related to generic operations that many other types share are the worst.

> If the answer is "not much", then you would have to regard these

I am not aware of any other incompatibility in the "worst" category. If there is one, I will probably never use Numarray.

> > A further challenge for your code convertor:
> >
> > f(a[0], b[2:3], c[-1, 1])
> >
> > That makes eight type combination cases.
>
> I'd say 4 (since c[-1,1] can't be a list) but that is beside the point. This

c[-1,1] can't be a list, but it needn't be an array. Any class can implement multiple-dimension indexing. My netCDF array objects do, for example.

> be even more inefficient. If I really had large amounts of code that needed
> that conversion, I'd be tempted to write such a function with an additional
> twist: have it monitor the input argument type whenever the program is run and

I have large amounts of code that would need conversion. However, it is code that myself and about 100 other users rely on for their daily work, so it won't be the subject of empirical fixing of any kind. Either there will be an automatic procedure that is guaranteed to keep the code working, or there won't be any update.

> just means more bugs and less clear and general code. But language
> warts are more like tumours, they grow over the years and become
> increasingly difficult to excise (just look what tremendous redesign

I don't see any evidence for this in NumPy.

> now...")). Among the incompatible changes that I would strongly assume *were*
> documented before and after are: exceptions (strings -> classes), automatic

String exceptions still work. I am not aware of any code that was broken by the fact that the standard exceptions are now classes.

> conversion of ints to longs (instead of an exception) and the new division
> rules whose stepwise introduction has already started. There are also quite a

The division rules are the only case of serious incompatibilities I know of, and I am in fact against them; although I agree that the proposed new rules are much better. On the other hand, the proposed transition procedure provides much more help for updating code than we would get from Numarray.
Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From magnus at hetland.org Mon Jun 24 06:56:04 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Mon Jun 24 06:56:04 2002
Subject: [Numpy-discussion] (K-Mean) Clustering
Message-ID: <20020624155508.A15028@idi.ntnu.no>

Hi!

I've been looking for an implementation of k-means clustering in Python, and haven't really found anything I could use... I believe there is one in SciPy, but I'd rather keep the required number of packages as low as possible (already using Numeric/numarray), and Orange seems a bit hard to install in UNIX... So, I've fiddled with using Numeric/numarray for the purpose. Has anyone else done something like this (or some other clustering algorithm for that matter)?

The approach I've been using (but am not completely finished with) is to use a two-dimensional multiarray for the data (i.e. a "set" of vectors) and a one-dimensional array with a cluster assignment for each vector. E.g.

>>> data[42]
array([1, 2, 3, 4, 5])
>>> cluster[42]
10
>>> reps[10]
array([1, 2, 4, 5, 4])

Here reps is the representative of the cluster. Using argmin it should be relatively easy to assign each vector to the cluster with the closest representative (using sum((x-y)**2) as the distance measure), but how do I calculate the new representatives effectively? (The representative of a cluster, e.g., 10, should be the average of all vectors currently assigned to that cluster.) I could always use a loop and then compress() the data based on cluster number, but I'm looking for a way of calculating all the averages "simultaneously", to avoid using a Python loop... I'm sure there's a simple solution -- I just haven't been able to think of it yet. Any ideas?

-- 
Magnus Lie Hetland      The Anygui Project
http://hetland.org      http://anygui.org

From Aureli.Soria_Frisch at ipk.fhg.de Mon Jun 24 11:12:08 2002
From: Aureli.Soria_Frisch at ipk.fhg.de (Aureli Soria Frisch)
Date: Mon Jun 24 11:12:08 2002
Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle
In-Reply-To: <20020621133705.A15296@idi.ntnu.no>
References: <20020621133705.A15296@idi.ntnu.no>
Message-ID: 

Hi all,

I am trying to run a numerical computation (with arrays) on different computers simultaneously (in parallel). The computation is done under Linux.

For that purpose a master organizes the process and sends rexec (remote execute) commands to the different slaves via the python command spawnlp. The slaves execute the script specified through rexec.

Inside this script the slaves open a file with the arguments of the process, which were serialized via pickle, then make the numerical computation, and write the result (a NumPy array) again via pickle in a file. This file is opened by the master, which uses the different results.

I am having the problem that the master sometimes (the problem does not happen always!!!) opens the result and loads an object of <type 'instance'> instead of the expected object of <type 'array'> (which then produces an error). I have tested the type of the objects in the slaves and it is always 'array'.

Has someone had similar experiences when 'pickling' arrays?
Could it be a problem of the different computers running versions of Python from 2.0 to 2.2.1? Or a problem of different versions of NumPy?

Is there any other way for doing such a parallel computation?

Thanks for the time... Regards,

Aureli
-- 
#################################
Aureli Soria Frisch
Fraunhofer IPK
Dept. Pattern Recognition
post: Pascalstr. 8-9, 10587 Berlin, Germany
e-mail: aureli at ipk.fhg.de
fon: +49 30 39006-143
fax: +49 30 3917517
web: http://vision.fhg.de/~aureli/web-aureli_en.html
#################################

From tchur at optushome.com.au Mon Jun 24 12:15:03 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Jun 24 12:15:03 2002
Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle
References: <20020621133705.A15296@idi.ntnu.no>
Message-ID: <3D176B1A.B7F546FC@optushome.com.au>

Aureli Soria Frisch wrote:
>
> Hi all,
>
> I am trying to run a numerical computation (with arrays) on different
> computers simultaneously (in parallel). The computation is done under
> Linux.
>
> For that purpose a master organizes the process and sends rexec
> (remote execute) commands to the different slaves via the python
> command spawnlp. The slaves execute the script specified through
> rexec.
>
> Inside this script the slaves open a file with the arguments of the
> process, which were serialized via pickle, then make the numerical
> computation, and write the result (a NumPy array) again via pickle in
> a file. This file is opened by the master, which uses the different
> results.
>
> I am having the problem that the master sometimes (the problem does
> not happen always!!!) opens the result and loads an object of <type
> 'instance'> instead of the expected object of <type 'array'> (which
> then produces an error). I have tested the type of the objects in the
> slaves and it is always 'array'.
>
> Has someone had similar experiences when 'pickling' arrays? Could it
> be a problem of the different computers running versions of Python
> from 2.0 to 2.2.1? Or a problem of different versions of NumPy?
>
> Is there any other way for doing such a parallel computation?

I am not sure what is causing the unpickling problem you are seeing, but I suggest that you consider MPI for what you are doing. There are a number of Python MPI interfaces around, but I can personally recommend PyPar by Ole Nielsen at the Australian National University. You can use PyPar with LAM/MPI, which runs in user mode and is very easy to install, and PyPar itself does not require any modifications to the Python interpreter. PyPar will automatically serialise Python objects for you (and deserialise them at the destination) but also has methods to send NumPy arrays directly which is very efficient. See http://datamining.anu.edu.au/~ole/pypar/ for more details.

Tim C

From a.schmolck at gmx.net Mon Jun 24 12:28:04 2002
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Mon Jun 24 12:28:04 2002
Subject: [Numpy-discussion] Numeric objects, os.spawnlp and pickle
In-Reply-To: 
References: <20020621133705.A15296@idi.ntnu.no>
Message-ID: 

Aureli Soria Frisch writes:

> Has someone had similar experiences when 'pickling' arrays? Could it be a
> problem of the different computers running versions of Python from 2.0 to
> 2.2.1? Or a problem of different versions of NumPy?

Yes -- pickling isn't meant to work across different python versions (it might to some extent, but I wouldn't try it unless there is no way around it).
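(For what it's worth, a sketch of a pickle-free exchange format -- not from the thread; it uses only Numeric's tostring/fromstring, assumes files opened in binary mode, and assumes all machines share the same byte order, which the pickle approach needed anyway:)

from Numeric import fromstring

def dump_array(a, f):
    # one header line with typecode and shape, then the raw data bytes
    f.write("%s %s\n" % (a.typecode(), " ".join(map(str, a.shape))))
    f.write(a.tostring())

def load_array(f):
    # parse the header line, then rebuild the array from the raw bytes
    fields = f.readline().split()
    a = fromstring(f.read(), fields[0])
    a.shape = tuple(map(int, fields[1:]))
    return a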
Using netcdf as a data format instead of pickling might also be a solution (if intermediate storage on the disk is not too inefficient, but your original approach involved that anyway). Konrad Hinsen has written a nice wrapper for python that is quite easy to use: http://starship.python.net/crew/hinsen/scientific.html. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck at gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/ From ransom at physics.mcgill.ca Mon Jun 24 17:06:11 2002 From: ransom at physics.mcgill.ca (Scott Ransom) Date: Mon Jun 24 17:06:11 2002 Subject: [Numpy-discussion] copy on demand In-Reply-To: <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> Message-ID: <20020625000529.GA20926@spock.physics.mcgill.ca> Hi Konrad, On Sun, Jun 23, 2002 at 10:20:35AM +0200, Konrad Hinsen wrote: > > be even more inefficient. If I really had large amounts of code that needed > > that conversion, I'd be tempted to write such a function with an additional > > twist: have it monitor the input argument type whenever the program is run and > > I have large amounts of code that would need conversion. However, it > is code that myself and about 100 other users rely on for their daily > work, so it won't be the subject of empirical fixing of any kind. > Either there will be an automatic procedure that is guaranteed to keep > the code working, or there won't be any update. I think you are painting an overly bleak picture -- and one that is certainly more black and white than reality. I am one of those 100 users and I would (will) certainly go through the code that I use on a daily basis (and the other code that I use less frequently) -- just as I have every time there is an update to the Python core or your code. Hell, some of those 30000 line of "your" code are actually _my_ code. And out of those 100 other users, I'd be willing to bet a beer or three that at least a couple would help to track down incompatibilities as well. Many (perhaps even most) of the problems will be able to be spotted by simply running the test codes provided with the individual modules. By generously releasing your code, you have made it possible for your code to become part of my -- and many others -- "standard library". And it is a part that I don't want to get rid of. I truly hope that this incompatibility (i.e. copy vs view) and the time that it will take to update older code will not cause many potentially beneficial (or at least requested) features/changes to be dropped. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From Janne.Sinkkonen at hut.fi Tue Jun 25 05:04:04 2002 From: Janne.Sinkkonen at hut.fi (Janne Sinkkonen) Date: Tue Jun 25 05:04:04 2002 Subject: [Numpy-discussion] (K-Mean) Clustering In-Reply-To: Magnus Lie Hetland's message of "Mon, 24 Jun 2002 15:55:08 +0200" References: <20020624155508.A15028@idi.ntnu.no> Message-ID: <2b7kkno99g.fsf@james.hut.fi> > Using argmin it should be relatively easy to assign each vector to the > cluster with the closest representative (using sum((x-y)**2) as the > distance measure), but how do I calculate the new representatives > effectively? 
(The representative of a cluster, e.g., 10, should be the
> average of all vectors currently assigned to that cluster.) I could
> always use a loop and then compress() the data based on cluster
> number, but I'm looking for a way of calculating all the averages
> "simultaneously", to avoid using a Python loop... I'm sure there's a
> simple solution -- I just haven't been able to think of it yet. Any
> ideas?

Maybe this helps (old code, may contain some suboptimal or otherwise weird things):

from Numeric import *
from RandomArray import randint
import sys

def squared_distances(X,Y):
    return add.outer(sum(X*X,-1),sum(Y*Y,-1)) - 2*dot(X,transpose(Y))

def kmeans(data, M, wegstein=0.2, r_convergence=0.001,
           epsilon=0.001, debug=0, minit=20):
    """Computes kmeans for DATA with M centers until convergence in
    the sense that relative change of the quantization error is less
    than the optional RCONV (3rd param).

    WEGSTEIN (2nd param), by default .2 but always between 0 and 1,
    stabilizes the convergence process. EPSILON is used to guarantee
    centers are initially all different. DEBUG causes some
    intermediate output to appear to stderr.

    Returns centers and the average (squared) quantization error.
    """
    N,D=data.shape
    # Selecting the initial centers has to be done carefully.
    # We have to ensure all of them are different, otherwise the
    # algorithm below will produce empty classes.
    centers=[]
    if debug: sys.stderr.write("kmeans: Picking centers.\n")
    while len(centers)<M:
        candidate=data[randint(0,N)]
        if len(centers)>0:
            d=minimum.reduce(squared_distances(array(centers), candidate))
        else:
            d=2*epsilon
        if d>epsilon:
            centers.append(candidate)
    if debug: sys.stderr.write("kmeans: Iterating.\n")
    centers=array(centers)
    qerror,old_qerror,counter=None,None,0
    while (counter<minit or
           abs(old_qerror-qerror)/qerror>r_convergence):
        # Initialize
        # Not like this, you get doubles: centers=take(data,randint(0,N,(M,)))
        # Iterate:
        # Squared distances from data to centers (all pairs)
        distances=squared_distances(data,centers)
        # Matrix telling which data item is closest to which center
        x=equal.outer(argmin(distances),
                      arange(centers.shape[0])).astype(Float32)
        # Compute new centers
        centers=( (    wegstein)*(dot(transpose(x),data)/sum(x)[...,NewAxis])
                 +(1.0-wegstein)*centers)
        # Quantization error
        old_qerror=qerror
        qerror=sum(minimum.reduce(distances,1))/N
        counter=counter+1
        if debug:
            try:
                sys.stderr.write("%f %f %i\n" % (qerror,old_qerror,counter))
            except TypeError:
                sys.stderr.write("%f None %i\n" % (qerror,counter))
    return centers, qerror

-- 
Janne
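(A side note: the ``equal.outer``/``dot`` combination in the code above is also the loop-free way to get the per-cluster averages asked about directly -- a sketch, assuming clusters numbered 0..k-1 and no empty clusters; the sample V, C and k here are made up for illustration:)

from Numeric import *

V = array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])  # data vectors
C = array([0, 1, 0, 2])                              # cluster index of each vector
k = 3

# ind[i,x] is 1.0 exactly when vector x is assigned to cluster i
ind = equal.outer(arange(k), C).astype(Float)
# per-cluster sums via one matrix product, divided by per-cluster counts
w = dot(ind, V) / sum(ind, 1)[:, NewAxis]
# w[0] is now [3., 4.], the mean of V[0] and V[2]

(An empty cluster would divide by zero here, so real code would have to guard against that case.)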
From magnus at hetland.org Tue Jun 25 06:30:04 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Jun 25 06:30:04 2002
Subject: [Numpy-discussion] (K-Mean) Clustering
In-Reply-To: <2b7kkno99g.fsf@james.hut.fi>; from Janne.Sinkkonen@hut.fi on Tue, Jun 25, 2002 at 03:03:39PM +0300
References: <20020624155508.A15028@idi.ntnu.no> <2b7kkno99g.fsf@james.hut.fi>
Message-ID: <20020625152918.C1200@idi.ntnu.no>

Janne Sinkkonen :
> [snip]
>
> Maybe this helps (old code, may contain some suboptimal or otherwise
> weird things):

Thanks :)

-- 
Magnus Lie Hetland      The Anygui Project
http://hetland.org      http://anygui.org

From hinsen at cnrs-orleans.fr Tue Jun 25 06:43:04 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Tue Jun 25 06:43:04 2002
Subject: [Numpy-discussion] copy on demand
In-Reply-To: <20020625000529.GA20926@spock.physics.mcgill.ca> (message from Scott Ransom on Mon, 24 Jun 2002 20:05:29 -0400)
References: <200206201626.g5KGQSl20827@chinon.cnrs-orleans.fr> <200206230820.g5N8KZB31745@chinon.cnrs-orleans.fr> <20020625000529.GA20926@spock.physics.mcgill.ca>
Message-ID: <200206251339.g5PDdkH04049@chinon.cnrs-orleans.fr>

> that is certainly more black and white than reality. I am one
> of those 100 users and I would (will) certainly go through the
> code that I use on a daily basis (and the other code that I use

I certainly appreciate any help, but this is not just a matter of amount of time, but also of risk, the risk of introducing bugs. The package that you are using, Scientific Python, is the lesser of my worries, as the individual parts are very independent. My other package, MMTK, is not only bigger, but also consists of many tightly coupled modules. Moreover, I am not aware of any user except for myself who knows the code well enough to be able to work on such an update project.

Finally, this is not just my personal problem, there is lots of NumPy code out there, publicly released or not, whose developers would face the same difficulties.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From travis at enthought.com Tue Jun 25 12:25:07 2002
From: travis at enthought.com (Travis N. Vaught)
Date: Tue Jun 25 12:25:07 2002
Subject: [Numpy-discussion] [ANN] SciPy '02 - Python for Scientific Computing Workshop
Message-ID: 

----------------------------------------
Python for Scientific Computing Workshop
----------------------------------------

CalTech, Pasadena, CA
September 5-6, 2002
http://www.scipy.org/site_content/scipy02

This workshop provides a unique opportunity to learn and affect what is happening in the realm of scientific computing with Python. Attendees will have the opportunity to review the available tools and how they apply to specific problems. By providing a forum for developers to share their Python expertise with the wider industrial, academic, and research communities, this workshop will foster collaboration and facilitate the sharing of software components, techniques and a vision for high level language use in scientific computing.

The two-day workshop will be a mix of invited talks and training sessions in the morning. The afternoons will be breakout sessions with the intent of getting standardization of tools and interfaces. The cost of the workshop is $50.00 and includes 2 breakfasts and 2 lunches on Sept. 5th and 6th, one dinner on Sept. 5th, and snacks during breaks. There is a limit of 50 attendees. Should we exceed the limit of 50 registrants, the 50 persons selected to attend will be invited individually by the organizers.

Discussion about the conference may be directed to the SciPy-user mailing list:
mailto:scipy-user at scipy.org
http://www.scipy.org/MailList

-------------
Co-Hosted By:
-------------

The National Biomedical Computation Resource (NBCR, SDSC, San Diego, CA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
http://nbcr.sdsc.edu

The mission of the National Biomedical Computation Resource at the San Diego Supercomputer Center is to conduct, catalyze, and enable biomedical research by harnessing advanced computational technology.
The Center for Advanced Computing Research (CACR, CalTech, Pasadena, CA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
http://nbcr.sdsc.edu

CACR is dedicated to the pursuit of excellence in the field of high-performance computing, communication, and data engineering. Major activities include carrying out large-scale scientific and engineering applications on parallel supercomputers and coordinating collaborative research projects on high-speed network technologies, distributed computing and database methodologies, and related topics. Our goal is to help further the state of the art in scientific computing.

Enthought, Inc. (Austin, TX)
^^^^^^^^^^^^^^^
http://enthought.com

Enthought, Inc. provides business and scientific computing solutions through software development, consulting and training. Enthought also fosters the development of SciPy (http://scipy.org), an open source library of scientific tools for Python.

From magnus at hetland.org Tue Jun 25 14:01:03 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Jun 25 14:01:03 2002
Subject: [Numpy-discussion] Rephrasing the question...
Message-ID: <20020625230038.A26576@idi.ntnu.no>

Thanks for the input on k-means clustering, but the main question was actually this... If I have the following:

for i in xrange(k):
    w[i] = average(compress(C == i, V, 0))

... can that be expressed without the Python for loop? (I.e. without using compress etc.) I want w[i] to be the average of the vectors in V[x] for which C[x] == i...

-- 
Magnus Lie Hetland      The Anygui Project
http://hetland.org      http://anygui.org

From frankpit at erols.com Wed Jun 26 05:06:05 2002
From: frankpit at erols.com (Bernard Frankpitt)
Date: Wed Jun 26 05:06:05 2002
Subject: [Numpy-discussion] Copy/View data point
References: 
Message-ID: <3D19BE44.2060001@erols.com>

My preference would be

Copy semantics for a=b
View semantics for a=b.view (or some other explicit syntax)

Bernie

From a.schmolck at gmx.net Wed Jun 26 06:30:04 2002
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Wed Jun 26 06:30:04 2002
Subject: [Numpy-discussion] Copy/View data point
In-Reply-To: <3D19BE44.2060001@erols.com>
References: <3D19BE44.2060001@erols.com>
Message-ID: 

Bernard Frankpitt writes:

> My preference would be
>
> Copy semantics for a=b
> View semantics for a=b.view (or some other explicit syntax)

Although I have been arguing for copy semantics for a=b[c:d], what you want is not really possible (a=b creates and always will create an alias in python -- and this is really a good design decision; just compare it to other languages that do different things depending on what you are assigning).

alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/

From e.maryniak at pobox.com Wed Jun 26 09:34:04 2002
From: e.maryniak at pobox.com (Eric Maryniak)
Date: Wed Jun 26 09:34:04 2002
Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info)
Message-ID: <200206261833.29702.e.maryniak@pobox.com>

Dear crunchers,

Please excuse me for dropping a feature request here as I'm new to the list and don't have the 'feel' of this list yet. Should feature requests be submitted to the bug tracker?

Anyways, I installed Numarray on a SuSE/Linux box, following the Numarray PDF manual's directions. Having installed Python packages (like, ehm, Numeric) before, here are a few impressions:

1.
When running 'python setup.py' and 'python setup.py --help' I was surprised to see that already source generation took place: Using EXTRA_COMPILE_ARGS = [] generating new version of Src/_convmodule.c ... generating new version of Src/_ufuncComplex64module.c Normally, you would expect that at build/install time. 2. Because I'm running two versions of Python (because Zope and a lot of Zope/C products depend on a particular version) the 'development' Python is installed in /usr/local/bin (whereas SuSE's python is in /usr/bin). It probably wouldn't do any harm if the manual would include a hint at the '--prefix' option and mention an alternative Python installation like: /usr/local/bin/python ./setup.py install --prefix=/usr/local 3. After installation, I usually test the success of a library's import by looking at version info (especially with multiple installations, see [2]). However, numarray does not seem to have version info? : # python Python 2.2.1 (#1, Jun 25 2002, 20:45:02) [GCC 2.95.3 20010315 (SuSE)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.version '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' >>> sys.version_info (2, 2, 1, 'final', 0) >>> import Numeric >>> Numeric.__version__ '21.3' >>> import numarray >>> numarray.__version__ Traceback (most recent call last): File "", line 1, in ? AttributeError: 'module' object has no attribute '__version__' >>> numarray.version Traceback (most recent call last): File "", line 1, in ? AttributeError: 'module' object has no attribute 'version' The __doc__ string: 'numarray: The big enchilada numeric module\n\n $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' does not seem to give a hint at the version (i.c. 0.3.4), either. Well, enough nitpicking for now I guess. Thanks to the Numarray developers for this project, it's much appreciated. Bye-bye, Eric -- Eric Maryniak WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. An error in the premise will appear in the conclusion. From perry at stsci.edu Wed Jun 26 10:30:12 2002 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jun 26 10:30:12 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) In-Reply-To: <200206261833.29702.e.maryniak@pobox.com> Message-ID: Hi Eric, Todd Miller should answer these but he is away for a few days. > > 1. When running 'python setup.py' and 'python setup.py --help' > I was surprised to see that already source generation > took place: > > Using EXTRA_COMPILE_ARGS = [] > generating new version of Src/_convmodule.c > ... > generating new version of Src/_ufuncComplex64module.c > > Normally, you would expect that at build/install time. > Yes, it looks like it does the code generation regardless of the option. We should change that. > 2. Because I'm running two versions of Python (because Zope > and a lot of Zope/C products depend on a particular version) > the 'development' Python is installed in /usr/local/bin > (whereas SuSE's python is in /usr/bin). > It probably wouldn't do any harm if the manual would include > a hint at the '--prefix' option and mention an alternative > Python installation like: > > /usr/local/bin/python ./setup.py install --prefix=/usr/local > Good idea. > 3. After installation, I usually test the success of a library's > import by looking at version info (especially with multiple > installations, see [2]). However, numarray does not seem to > have version info? 
: > > # python > Python 2.2.1 (#1, Jun 25 2002, 20:45:02) > [GCC 2.95.3 20010315 (SuSE)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import sys > >>> sys.version > '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]' > >>> sys.version_info > (2, 2, 1, 'final', 0) > > >>> import Numeric > >>> Numeric.__version__ > '21.3' > > >>> import numarray > >>> numarray.__version__ > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: 'module' object has no attribute '__version__' > >>> numarray.version > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: 'module' object has no attribute 'version' > > The __doc__ string: > 'numarray: The big enchilada numeric module\n\n > $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n' > does not seem to give a hint at the version (i.c. 0.3.4), either. > Well, I remember putting this on the to do list and thought it had been done, but obviously not. I'm sure Todd will take care of these. Thanks very much for the feedback. Perry From e.maryniak at pobox.com Wed Jun 26 11:48:01 2002 From: e.maryniak at pobox.com (Eric Maryniak) Date: Wed Jun 26 11:48:01 2002 Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info) In-Reply-To: References: Message-ID: <200206262047.00731.e.maryniak@pobox.com> Hello Perry, On Wednesday 26 June 2002 19:29, Perry Greenfield wrote: > ... > > 2. Because I'm running two versions of Python (because Zope > > and a lot of Zope/C products depend on a particular version) > > the 'development' Python is installed in /usr/local/bin > > (whereas SuSE's python is in /usr/bin). > > It probably wouldn't do any harm if the manual would include > > a hint at the '--prefix' option and mention an alternative > > Python installation like: > > > > /usr/local/bin/python ./setup.py install --prefix=/usr/local > > Good idea. And perhaps another suggestion: no mention is made of the 'setupall.py' script... and setup.py does _not_ install the LinearAlgebra2 (including our favorite SVD ;-), Convolve, RandomArray2 and FFT2 packages. I successfully installed them with: python ./setupall.py install Other minor notes: #1: No FFT2.pth file is generated (the others are ok). It should just include the string 'FFT2'. #2: While RandomArray2 etc. nicely stay away from a concurrently imported Numeric.RandomArray, shouldn't Convolve, for orthogonality, be named Convolve2? (cuz who knows, numarray's Convolve may be backported to Numeric in the future, for comparative testing etc.). Of course in the end, when numarray is to replace Numeric, the '2' could be dropped altogether (breaking some programs then ;-) #3: LinearAlgebra2, RandomArray2 and Convolve have empty __doc__ 's. FFT and these 3 have no __version__ attributes, either (like numarray itself, too). Module sys uses a tuple 'version_info': >>> sys.version_info (2, 2, 1, 'final', 0) allowing fine-grained version testing and e.g. conditional importing etc. based on that. This may be a good idea for numarray, where interfaces may change and you could thus allow your code to support multiple (or rather, evolving) versions of numarray. Btw: imho __versioninfo__ or just __version__ would be a better standard attribute (for all modules) allowing a standard way of testing for major/minor version number, if __version__[0] >= 2: etc() Ideally, numarray's sub-packages' numbers would be in sync with that of numarray itself. 
Numeric's __version__ is a string, which is not so handy, either. #4: It is very helpful that there are a large number of self-tests of the packages, together with expected values. E.g.: Average of 10000 chi squared random numbers with 11 degrees of freedom (should be about 11 ): 11.0404176623 Variance of those random numbers (should be about 22 ): 21.6517761217 Skewness of those random numbers (should be about 0.852802865422 ): 0.718573002875 But sometimes you wonder (e.g. 0.85 / 0.71) if deviations are not too serious. Perhaps a 95%-int or std.dev. could be added? > >... > Thanks very much for the feedback. > > Perry You're welcome, they're just minor things one notices in the beginning and tends to ignore later; please say so if this kind of feedback should be postponed for later. Bye-bye, Eric -- Eric Maryniak WWW homepage: http://pobox.com/~e.maryniak/ Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL. Puzzle: what's another word for synonym? From frankpit at erols.com Wed Jun 26 17:51:03 2002 From: frankpit at erols.com (Bernard Frankpitt) Date: Wed Jun 26 17:51:03 2002 Subject: [Numpy-discussion] Copy/View data point References: <3D19BE44.2060001@erols.com> Message-ID: <3D1A718E.6060300@erols.com> Bernard Frankpitt writes: >> My preference would be >> >> Copy semantics for a=b >> View semantics for a=b.view (or some other explicit syntax) > > And Alexander Schmolck Replies: > Although I have been arguing for copy semantics for a=b[c:d], what > you want is > not really possible (a=b creates and always will create an alias in > python -- Yes, you are right. In my haste I left out the slice notation Bernie From ndavis at spacedata.net Thu Jun 27 14:08:03 2002 From: ndavis at spacedata.net (Norman Davis) Date: Thu Jun 27 14:08:03 2002 Subject: [Numpy-discussion] How are non-contiguous arrays created? Message-ID: <5.1.0.14.0.20020627140032.030b16d0@spacedata.net> Hi All, In the "Copy on demand" discussion, the differences between ravel and flat were discussed with regards to contiguous/non-contiguous arrays. I want to experiment, but after looking/researching I can't figure it out: How is a non-contiguous array created? Thanks. Norman Davis Space Data Corporation From Chris.Barker at noaa.gov Thu Jun 27 15:07:04 2002 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jun 27 15:07:04 2002 Subject: [Numpy-discussion] How are non-contiguous arrays created? References: <5.1.0.14.0.20020627140032.030b16d0@spacedata.net> Message-ID: <3D1B8371.49905EA2@noaa.gov> Norman Davis wrote: > How is a > non-contiguous array created? By slicing an array. Since slicing created a "view" into the same data, it may not represent a contiguous portion of memory. Example: >>> from Numeric import * >>> a = ones((3,4)) >>> a array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]) >>> a.iscontiguous() 1 # a newly created array will always be contiguous >>> b = a[3:3,:] >>> b.iscontiguous() 1 # sliced this way, you get a contiguous array >>> c = a[:,3:3] >>> c.iscontiguous() 0 #but sliced another way you don't -Chris -- Christopher Barker, Ph.D. 
From jmiller at stsci.edu  Sun Jun 30 06:24:03 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Sun Jun 30 06:24:03 2002
Subject: [Numpy-discussion] Numarray: minor feature requests (setup.py and version info)
References: 
Message-ID: <3D1F0839.2090802@stsci.edu>

Perry Greenfield wrote:

> Hi Eric,
>
> Todd Miller should answer these but he is away for a few days.
>
>> 1. When running 'python setup.py' and 'python setup.py --help'
>>    I was surprised to see that source generation already
>>    took place:
>>
>>    Using EXTRA_COMPILE_ARGS = []
>>    generating new version of Src/_convmodule.c
>>    ...
>>    generating new version of Src/_ufuncComplex64module.c
>>
>>    Normally, you would expect that at build/install time.
>>
> Yes, it looks like it does the code generation regardless of
> the option. We should change that.

I'll clean this up.

>> 2. Because I'm running two versions of Python (because Zope
>>    and a lot of Zope/C products depend on a particular version)
>>    the 'development' Python is installed in /usr/local/bin
>>    (whereas SuSE's python is in /usr/bin).
>>    It probably wouldn't do any harm if the manual would include
>>    a hint at the '--prefix' option and mention an alternative
>>    Python installation like:
>>
>>    /usr/local/bin/python ./setup.py install --prefix=/usr/local
>>
> Good idea.

I'm actually surprised that this is necessary. I was under the
impression that the distutils pick reasonable defaults simply based on
the python that is running. In your case, I would expect numarray to
install to /usr/local/lib/pythonX.Y/site-packages without specifying
any prefix. What happens on SuSE?

>> 3. After installation, I usually test the success of a library's
>>    import by looking at version info (especially with multiple
>>    installations, see [2]). However, numarray does not seem to
>>    have version info?:
>>
>>    # python
>>    Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
>>    [GCC 2.95.3 20010315 (SuSE)] on linux2
>>    Type "help", "copyright", "credits" or "license" for more information.
>>    >>> import sys
>>    >>> sys.version
>>    '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
>>    >>> sys.version_info
>>    (2, 2, 1, 'final', 0)
>>    >>> import Numeric
>>    >>> Numeric.__version__
>>    '21.3'

In numarray, this is spelled:

    >>> import numinclude
    >>> numinclude.version
    '0.3.4'

I'll add __version__ to numarray as a synonym.

>>    >>> import numarray
>>    >>> numarray.__version__
>>    Traceback (most recent call last):
>>      File "", line 1, in ?
>>    AttributeError: 'module' object has no attribute '__version__'
>>    >>> numarray.version
>>    Traceback (most recent call last):
>>      File "", line 1, in ?
>>    AttributeError: 'module' object has no attribute 'version'
>>
>>    The __doc__ string:
>>    'numarray: The big enchilada numeric module\n\n
>>     $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n'
>>    does not seem to give a hint at the version (in this case, 0.3.4),
>>    either.
>>
> Well, I remember putting this on the to-do list and thought it
> had been done, but obviously not. I'm sure Todd will take care
> of these.
>
> Thanks very much for the feedback.
>
> Perry

Thanks again,
Todd
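To illustrate the transition Todd describes, a small defensive lookup
(just a sketch: numinclude.version is the current spelling quoted above,
while numarray.__version__ is only promised for a future release):

    def numarray_version():
        # Prefer the promised __version__ attribute; fall back to the
        # numinclude.version spelling that works in numarray 0.3.4.
        import numarray
        version = getattr(numarray, '__version__', None)
        if version is None:
            import numinclude
            version = numinclude.version
        return version

    print numarray_version()    # e.g. '0.3.4'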