From haase at msg.ucsf.edu Tue Mar 1 09:43:31 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue Mar 1 09:43:31 2005 Subject: [Numpy-discussion] bug in pyfits w/ numarray 1.2 Message-ID: <200503010942.41026.haase@msg.ucsf.edu> Hi, After upgrading to the latest numarray we get this error from pyfits: >>> a = U.loadFits(fn) Traceback (most recent call last): File "<stdin>", line 1, in ? File "/jws30/haase/PrLin/Priithon/useful.py", line 1069, in loadFits return ff[ slot ].data File "/jws30/haase/PrLin/pyfits.py", line 1874, in __getattr__ raw_data = num.fromfile(self._file, type=code, shape=dims) File "/jws30/haase/PrLin0/numarray/numarraycore.py", line 517, in fromfile bytesleft=type.bytes*_gen.product(shape) AttributeError: 'str' object has no attribute 'bytes' >>>pyfits.__version__ '0.9.3 (June 30, 2004)' Looks like pyfits uses a typecode-string 'code' in this line 1874: raw_data = num.fromfile(self._file, type=code, shape=dims) Is this supposed to still work in numarray? Or should pyfits be updated? I tried num.fromfile(self._file, typecode=code, shape=dims) but 'typecode' doesn't seem to be an allowed keyword for fromfile(). Thanks, Sebastian Haase From cjw at sympatico.ca Tue Mar 1 11:10:17 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 1 11:10:17 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <4224BDB2.5010203@sympatico.ca> An HTML attachment was scrubbed... URL: From rkern at ucsd.edu Tue Mar 1 12:09:17 2005 From: rkern at ucsd.edu (Robert Kern) Date: Tue Mar 1 12:09:17 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4224BDB2.5010203@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> Message-ID: <4224CB47.6030802@ucsd.edu> Colin J. Williams wrote: > I suggest that Numeric3 offers the opportunity to drop the word /rank/ > from its lexicon. "rank" has an established usage long before digital > computers. See: http://mathworld.wolfram.com/Rank.html It also has a well-established usage with multi-arrays. http://mathworld.wolfram.com/TensorRank.html > Perhaps some abbreviation for "Dimensions" would be acceptable. It is also reasonable to say that array([1., 2., 3.]) has 3 dimensions. > Matrix Class > > "A default Matrix class will either inherit from or contain the Python > class". Surely, almost all of the objects above are to be rooted in > "new" style classes. See PEPs 252 and 253 or > http://www.python.org/2.2.2/descrintro.html Sure, but just because inheritance is possible does not entail that it is a good idea. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From konrad.hinsen at laposte.net Wed Mar 2 00:03:16 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 2 00:03:16 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4224BDB2.5010203@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> Message-ID: On 01.03.2005, at 20:08, Colin J. Williams wrote: > Basic Types > These are, presumably, intended as the types of the data elements > contained in an Array instance. I would see them as sub-types of > Array. Element types as subtypes??? > I wonder why there is a need for 30 new types. Python itself has > about 30 distinct types.
Wouldn't it be more saleable to think in > terms of an Array The Python standard library has hundreds of types, considering that the difference between C types and classes is an implementation detail. > Suppose one has: > import numarray.numerictypes as _nt > > Then, the editor (PythonWin for example) responds to the entry of > "_nt." with a drop down menu offering the available types from which > the user can select one. That sounds interesting, but it looks like this would require specific support from the editor. > I suggest that Numeric3 offers the opportunity to drop the word rank > from its lexicon. "rank" has an established usage long before digital > computers. See: http://mathworld.wolfram.com/Rank.html The meaning of "tensor rank" comes very close and was probably the inspiration for the use of this terminology in array systems. > Perhaps some abbreviation for "Dimensions" would be acceptable. The equivalent of "rank" is "number of dimensions", which is a bit long for my taste. > len() seems to be treated as a synonym for the number of dimensions. > Currently, in numarray, it follows the usual sequence of sequences > approach of Python and returns the number of rows in a two dimensional > array. As it should. The rank is given by len(array.shape), which is pretty much a standard idiom in Numeric code. But I don't see any place in the PEP that proposes something different! > Rank-0 arrays and Python Scalars > > Regarding Rank-0 Question 2. I've already, in effect, answered > "yes". I'm sure that a more compelling "Pro" could be written Three "pro" arguments to be added are: - No risk of user confusion by having two types that are nearly but not exactly the same and whose separate existence can only be explained by the history of Python and NumPy development. - No problems with code that does explicit typechecks (isinstance(x, float) or type(x) == types.FloatType). Although explicit typechecks are considered bad practice in general, there are a couple of valid reasons to use them. - No creation of a dependency on Numeric in pickle files (though this could also be done by a special case in the pickling code for arrays) > The "Con" case is valid but, I suggest, of no great consequence. In > my view, the important considerations are (a) the complexity of > training the newcomer and (b) whether the added work should be imposed > on the generic code writer or the end user. I suggest that the aim > should be to make things as easy as possible for the end user. That is indeed a valid argument. > Mapping Iterator > An example could help here. I am puzzled by "slicing syntax does not > work in constructors.". Python allows the colon syntax only inside square brackets. x[a:b] and x[a:b:c] are fine but it is not possible to write iterator(a:b). One could use iterator[a:b] instead, but this is a bit confusing, as it is not the iterator that is being sliced. Konrad. From cjw at sympatico.ca Wed Mar 2 09:22:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Wed Mar 2 09:22:16 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: References: <4224BDB2.5010203@sympatico.ca> Message-ID: <4225F634.1040305@sympatico.ca> konrad.hinsen at laposte.net wrote: > On 01.03.2005, at 20:08, Colin J. Williams wrote: > >> Basic Types >> These are, presumably, intended as the types of the data elements >> contained in an Array instance. I would see them as sub-types of Array. > > > Element types as subtypes???
Sub-types in the sense that, given an instance a of Array, a.elementType gives us the type of the data elements contained in a. > >> I wonder why there is a need for 30 new types. Python itself has >> about 30 distinct types. Wouldn't it be more saleable to think in >> terms of an Array > > > The Python standard library has hundreds of types, considering that > the difference between C types and classes is an implementation detail. > I was thinking of the objects in the types module. >> Suppose one has: >> import numarray.numerictypes as _nt >> >> Then, the editor (PythonWin for example) responds to the entry of >> "_nt." with a drop down menu offering the available types from which >> the user can select one. > > > That sounds interesting, but it looks like this would require specific > support from the editor. > Yes, it is built into Mark Hammond's PythonWin and is a valuable tool. Unfortunately, it is not available for Linux. However, I believe that SciTE and boa-constructor are intended to have the "completion" facility. These open source projects are available both with Linux and Windows. >> I suggest that Numeric3 offers the opportunity to drop the word rank >> from its lexicon. "rank" has an established usage long before >> digital computers. See: http://mathworld.wolfram.com/Rank.html > > > The meaning of "tensor rank" comes very close and was probably the > inspiration for the use of this terminology in array system. Yes: The total number of contravariant and covariant indices of a tensor . The rank of a tensor is independent of the number of dimensions of the space . I was thinking in terms of linear independence, as with Matrix Rank: The rank of a matrix or a linear map is the dimension of the range of the matrix or the linear map , corresponding to the number of linearly independent rows or columns of the matrix, or to the number of nonzero singular values of the map. I guess there has been a tussle between the tensor users and the matrix users for some time. > >> Perhaps some abbreviation for "Dimensions" would be acceptable. > > > The equivalent of "rank" is "number of dimensions", which is a bit > long for my taste. Perhaps nDim, numDim or dim would be acceptable. > >> len() seems to be treated as a synonym for the number of >> dimensions. Currently, in numarray, it follows the usual sequence of >> sequences approach of Python and returns the number of rows in a two >> dimensional array. > > > As it should. The rank is given by len(array.shape), which is pretty > much a standard idiom in Numeric code. But I don't see any place in > the PEP that proposes something different! This was probably my misreading of len(T). > >> Rank-0 arrays and Python Scalars >> >> Regarding Rank-0 Question 2. I've already, in effect, answered >> "yes". I'm sure that a more compelling "Pro" could be written > > > Three "pro" argument to be added are: > > - No risk of user confusion by having two types that are nearly but not > exactly the same and whose separate existence can only be explained > by the history of Python and NumPy development. Thanks, history has a pull in favour of retaining the current approach. > > - No problems with code that does explicit typechecks (isinstance(x, > float) > or type(x) == types.FloatType). Although explicit typechecks are > considered > bad practice in general, there are a couple of valid reasons to use > them. > I would see this as supporting the conversion to a scalar. 
For example: >>> type(type(x)) >>> isinstance(x, float) True >>> isinstance(x, types.FloatType) True >>> > - No creation of a dependency on Numeric in pickle files (though this > could > also be done by a special case in the pickling code for arrays) > >> The "Con" case is valid but, I suggest, of no great consequence. In >> my view, the important considerations are (a) the complexity of >> training the newcomer and (b) whether the added work should be >> imposed on the generic code writer or the end user. I suggest that >> the aim should be to make things as easy as possible for the end user. > > > That is indeed a valid argument. > >> Mapping Iterator >> An example could help here. I am puzzled by "slicing syntax does >> not work in constructors.". > > > Python allows the colon syntax only inside square brackets. x[a:b] and > x[a:b:c] are fine but it is not possible to write iterator(a:b). One > could use iterator[a:b] instead, but this is a bit confusing, as it is > not the iterator that is being sliced. Thanks. It would be nice if a:b or a:b:c could return a slice object. > > Konrad. > Colin W. From stephen.walton at csun.edu Wed Mar 2 09:26:27 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Mar 2 09:26:27 2005 Subject: [Numpy-discussion] bug in pyfits w/ numarray 1.2 In-Reply-To: <200503010942.41026.haase@msg.ucsf.edu> References: <200503010942.41026.haase@msg.ucsf.edu> Message-ID: <4225F6A1.4020901@csun.edu> Sebastian Haase wrote: >Hi, >After upgrading to the latest numarray we get this error from pyfits: > > >>>>a = U.loadFits(fn) >>>> >>>> >Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin/Priithon/useful.py", line 1069, in loadFits > return ff[ slot ].data > Are you sure the value of 'slot' and 'ff' in your code are correct. pyfits 0.9.3 and numarray 1.2.2 seem to work fine for me: In [5]: f=pyfits.open(file) In [6]: v=f[0].data In [7]: v? Type: NumArray Base Class: String Form: [[ 221 171 67 ..., 112 -136 12] [ 125 78 159 ..., 249 -345 -260] [ 346 47 250 ..., <...> ..., 206 -106 -127] [ 187 16 218 ..., 342 -243 -59] [ 156 200 279 ..., 138 -209 -230]] Namespace: Interactive Length: 1024 Docstring: Fundamental Numeric Array type The type of each data element, e.g. Int32 byteorder The actual ordering of bytes in buffer: "big" or "little". In [8]: pyfits.__version__ Out[8]: '0.9.3 (June 30, 2004)' In [9]: numarray.__version__ Out[9]: '1.2.2' From southey at uiuc.edu Wed Mar 2 12:15:24 2005 From: southey at uiuc.edu (Bruce Southey) Date: Wed Mar 2 12:15:24 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <245dddb2.d6e3d971.8a87b00@expms6.cites.uiuc.edu> Hi, >>> I suggest that Numeric3 offers the opportunity to drop the word rank >>> from its lexicon. "rank" has an established usage long before >>> digital computers. See: http://mathworld.wolfram.com/Rank.html >> >> >> The meaning of "tensor rank" comes very close and was probably the >> inspiration for the use of this terminology in array system. > >Yes: The total number of contravariant > and covariant > indices of a tensor >. The rank of a tensor > is independent of the number >of dimensions of the space >. > >I was thinking in terms of linear independence, as with Matrix Rank: The >rank of a matrix or a linear >map is the dimension > of the range > of the matrix > or the linear map >, corresponding to the >number of linearly independent > rows or columns >of the matrix, or to the number of nonzero singular values > of the map. 
> >I guess there has been a tussle between the tensor >users and the matrix >users for some time. > If you come from linear algebra, rank is the dimension of the column or row space, which is not the current usage in numarray but is the Matlab usage. The matrix rank doesn't exist in numarray (as such, but can be computed), so the only problem for us is remembering what rank provides and avoiding it in numarray. >> >>> Perhaps some abbreviation for "Dimensions" would be acceptable. >> >> >> The equivalent of "rank" is "number of dimensions", which is a bit >> long for my taste. > >Perhaps nDim, numDim or dim would be acceptable. > There needs to be a clarification that by dimensions, one does not mean the number of rows and columns etc. However, taking directly from the numarray manual: "The rank of an array A is always equal to len(A.getshape())." So I would guess the best solution is to find out how people actually use the term 'rank' in Numerical Python applications. Regards Bruce From gc238 at cornell.edu Wed Mar 2 13:11:18 2005 From: gc238 at cornell.edu (Garnet Chan) Date: Wed Mar 2 13:11:18 2005 Subject: [Numpy-discussion] PyObject arrays Message-ID: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> Hi All, Do PyObject arrays work; more specifically, Numeric arrays of Numeric arrays? I've tried: from Numeric import * mat = zeros([2, 2], PyObject) mat[0, 0] = zeros([2, 2]) which gives ValueError: array too large for destination. It seems to be calling PyArray_CopyObject; I noticed that there was some special code to make arrays of strings work, but not for other objects. This is on Python 2.3.4 and Numeric 23.3. Thanks, Garnet Chan From oliphant at ee.byu.edu Wed Mar 2 14:20:28 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 2 14:20:28 2005 Subject: [Numpy-discussion] PyObject arrays In-Reply-To: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> References: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> Message-ID: <42263BDD.3010503@ee.byu.edu> Garnet Chan wrote: >Hi All, >Do PyObject arrays work; more specifically, Numeric arrays of Numeric arrays? > > They probably don't work when the objects are Numeric arrays. It would be nice if they did, but this could take some effort. -Travis From Sebastien.deMentendeHorne at electrabel.com Wed Mar 2 15:24:11 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Wed Mar 2 15:24:11 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> > It might be useful to have a Table type where there is a header of some sort to keep track, > for each column of the column name and the datatype in that column, so that the user > could, optionally, specify validity checks. Another useful type for arrays representing physical values would be an array that keeps vectors for each dimension with index values. For instance, an object representing temperature at a given time in a given location would consist of data = N x M array of Float64 = [ [ 23, 34, 23], [ 31, 28,29] ] first_axis = N array of time = [ "01/01/2004", "02/01/2004" ] second_axis = M array of location = [ "Paris", "New York" ] All slicing operations would equivalently slice the corresponding axis. Assignment between arrays would be axis coherent (assigning "Paris" in one array to "Paris" in another while putting NaN or 0 if there is no correspondence).
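To make this concrete, here is a rough and very incomplete sketch of such a class (all names are hypothetical, only slicing along the first axis is handled, and numarray is used for storage):

import numarray as na

class LabeledArray:
    # An array that carries index values ("labels") for each axis - a sketch only.
    def __init__(self, data, axes):
        self.data = na.array(data)
        assert len(axes) == len(self.data.shape)   # one label sequence per dimension
        self.axes = [list(a) for a in axes]
    def __getitem__(self, key):
        if isinstance(key, slice):
            # slicing the data slices the labels of the first axis as well
            return LabeledArray(self.data[key],
                                [self.axes[0][key]] + self.axes[1:])
        return self.data[key]

t = LabeledArray([[23, 34, 23], [31, 28, 29]],
                 [["01/01/2004", "02/01/2004"],
                  ["Paris", "New York", "London"]])
print t[0:1].axes[0]    # -> ['01/01/2004']: the time axis followed the slice
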
If indexing could also be done via components of *_axis, it would also be useful. Several fields of application could benefit from this (econometrics, Monte Carlo simulation, physical simulation, time series, ...). In fact, most real data consists of values for tuples of general indices (e.g. temperature@("01/01/2004","Paris")). Hmmm, I think I was just thinking aloud :-) ======================================================= This message is confidential. It may also be privileged or otherwise protected by work product immunity or other legal rules. If you have received it by mistake please let us know by reply and then delete it from your system; you should not copy it or disclose its contents to anyone. All messages sent to and from Electrabel may be monitored to ensure compliance with internal policies and to protect our business. Emails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, lost or destroyed, or contain viruses. Anyone who communicates with us by email is taken to accept these risks. http://www.electrabel.be/homepage/general/disclaimer_EN.asp ======================================================= From konrad.hinsen at laposte.net Thu Mar 3 00:27:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 00:27:19 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> References: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> Message-ID: <6c12559b2562b43c7df9ae564df5443e@laposte.net> On 03.03.2005, at 00:23, Sebastien.deMentendeHorne at electrabel.com wrote: > Another useful type for arrays representing physical values would be an > array that keeps vectors for each dimension with index values. For > instance, > an object representing temperature at a given time in a given location > would > consist of > data = N x M array of Float64 = [ [ 23, 34, 23], [ 31, 28,29] ] > first_axis = N array of time = [ "01/01/2004", "02/01/2004" ] > second_axis = M array of location = [ "Paris", "New York" ] > > All slicing operations would equivalently slice the corresponding axis. That is indeed useful, but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives the most flexibility in use. Konrad. From konrad.hinsen at laposte.net Thu Mar 3 00:34:18 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 00:34:18 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4225F634.1040305@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> Message-ID: <72b45bee60a00e5e61b9538359b98e59@laposte.net> On 02.03.2005, at 18:21, Colin J. Williams wrote: > Sub-types in the sense that, given an instance a of Array, > a.elementType gives us the type of the data elements contained in a. Ah, I see, it's just about how to access the type object. That's not my first worry in design questions. Once you can get the object somehow, you can make it accessible in nearly any way you like. >> The Python standard library has hundreds of types, considering that >> the difference between C types and classes is an implementation >> detail. >> > I was thinking of the objects in the types module. Those are just the built-in types.
There are no plans to increase their number. > Yes, it is built into Mark Hammond's PythonWin and is a valuable tool. > Unfortunately, it is not available for Linux. However, I believe > that SciTE and boa-constructor are intended to have the "completion" > facility. These open source projects are available both with Linux > and Windows. The number of Python IDEs seems to be growing all the time - I haven't even heard of those. And I am still using Emacs... >> The equivalent of "rank" is "number of dimensions", which is a bit >> long for my taste. > > Perhaps nDim, numDim or dim would be acceptable. As a variable name, fine. As a pseudo-word in normal language, no. Not for me at least. I like sentences to use real, pronouncable words. >> - No problems with code that does explicit typechecks (isinstance(x, >> float) >> or type(x) == types.FloatType). Although explicit typechecks are >> considered >> bad practice in general, there are a couple of valid reasons to use >> them. >> > I would see this as supporting the conversion to a scalar. For > example: But technically it isn't, so some code would cease to work. > Thanks. It would be nice if a:b or a:b:c could return a slice object. That would be difficult to reconcile with Python syntax because of the use of colons in the block structure of the code. The parser (and the programmers' brains) would have to handle stuff like if slice == 1:: pass correctly. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cjw at sympatico.ca Thu Mar 3 08:47:34 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 3 08:47:34 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <72b45bee60a00e5e61b9538359b98e59@laposte.net> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> Message-ID: <42273F71.9060005@sympatico.ca> konrad.hinsen at laposte.net wrote: > On 02.03.2005, at 18:21, Colin J. Williams wrote: > [snip] > >>> The Python standard library has hundreds of types, considering that >>> the difference between C types and classes is an implementation >>> detail. >>> >> I was thinking of the objects in the types module. > > > Those are just the built-in types. There are no plans to increase > their number. My understanding was that there was to be a new builtin multiarray/Array class/type which eventually would replace the existing array.ArrayType. Thus, for a time at least, there would be at least one new class/type. In addition, it seemed to be proposed that the new class/type would not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm not too clear on this latter point but Konrad says that there would not be this multiplicity of basic class/type's. > >> Yes, it is built into Mark Hammond's PythonWin and is a valuable >> tool. Unfortunately, it is not available for Linux. However, I >> believe that SciTE and boa-constructor are intended to have the >> "completion" facility. These open source projects are available >> both with Linux and Windows. > > > The number of Python IDEs seems to be growing all the time - I > haven't even heard of those. And I am still using Emacs... Having spent little time with Unices, I'm not familiar with emacs. 
Another useful facility with PythonWin is that when one enters a class, function or method, followed by "(", the docstring is presented. This is often helpful. Finally, the PythonWin debug facility provides useful context information. Suppose that f1 calls f2 which calls ... fn and that we have a breakpoint in fn; then the current values in each of these contexts are available in a PythonWin panel. > [snip] > >> Thanks. It would be nice if a:b or a:b:c could return a slice object. > > > That would be difficult to reconcile with Python syntax because of > the use of colons in the block structure of the code. The parser (and > the programmers' brains) would have to handle stuff like > > if slice == 1:: > pass > > correctly. > > Konrad. Yes, that is a problem which is not well resolved by requiring that a slice be terminated with a ")", "]", "}" or a space. One of the difficulties is that the slice is not recognized in the current syntax. We have a "slicing" which ties a slice with a primary, but no "slice". Your earlier suggestion that a slice be [a:b:c] is probably better. Then a slicing would be: primary slice which no doubt creates parsing problems. Thomas Wouters proposed a similar structure for a range in PEP204 (http://python.fyxm.net/peps/pep-0204.html), which was rejected. Colin W. From jmiller at stsci.edu Thu Mar 3 09:08:19 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 09:08:19 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 Message-ID: <1109869619.19608.16.camel@halloween.stsci.edu> numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a problem with universal function setup caching which noticeably impaired 1.2.2 small array performance. Get it if you are new to numarray, haven't upgraded to 1.2.2 yet, or use a lot of small arrays. numarray-1.2.3 is here: http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 Thanks to Ralf Juengling for quietly reporting this and working with me to identify and fix the problem. From konrad.hinsen at laposte.net Thu Mar 3 09:33:17 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 09:33:17 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue Message-ID: Following a bug report concerning ScientificPython with numarray, I noticed an incompatibility between Numeric and numarray, and I am wondering if this is intentional. In Numeric, the result of a comparison operation is an integer array. In numarray, it is a Bool array. Bool arrays seem to behave like Int8 arrays when arithmetic operations are applied. The net result is that print n.add.reduce(n.greater(n.arange(128), -1)) yields -128, which is not what I would expect. I can see two logically coherent points of view: 1) The Numeric view: comparisons yield integer arrays, which may be used freely in arithmetic. 2) The "logician's" view: comparisons yield arrays of boolean values, on which no arithmetic is allowed at all, only logical operations. The first approach is a lot more pragmatic, because there are a lot of useful idioms that use the result of comparisons in arithmetic, whereas an array of boolean values cannot be used for much else than logical operations. And now for my pragmatic question: can anyone come up with a solution that will work under both Numeric and numarray, won't introduce a speed penalty under Numeric, and won't leave the impression that the programmer had had too many beers?
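To see the difference side by side, here is a minimal session (n is numarray; the Numeric line shows what the same idiom has always returned there):

>>> import numarray as n
>>> r = n.greater(n.arange(128), -1)
>>> r.type()
Bool
>>> n.add.reduce(r)    # 128 true values, summed with Int8-style wrap-around
-128
>>> import Numeric
>>> Numeric.add.reduce(Numeric.greater(Numeric.arange(128), -1))
128
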
There is the quick hack print n.add.reduce(1*n.greater(n.arange(128), -1)) but it doesn't satisfy the last two criteria. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From haase at msg.ucsf.edu Thu Mar 3 09:49:21 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 09:49:21 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <1109869619.19608.16.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> Message-ID: <200503030948.17122.haase@msg.ucsf.edu> Hi, what is the cvs command to update to the exact same 1.2.3 version using cvs? Also I'm wondering if numarray.__version__ could be more informative about e.g. "1.2" vs. "1.2.2" vs. "1.2.3" ? (What does the 'a' stand for in na.__version__ == '1.2a' ? Does that mean I got it from CVS ? ) Thanks, Sebastian Haase On Thursday 03 March 2005 09:07, Todd Miller wrote: > numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a > problem with universal function setup caching which noticeably impaired > 1.2.2 small array performance. Get it if you are new to numarray, > haven't upgraded to 1.2.2 yet, or use a lot of small arrays. > > numarray-1.2.3 is here: > > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 > > Thanks to Ralf Juengling for quietly reporting this and working with me > to identify and fix the problem. > > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jmiller at stsci.edu Thu Mar 3 10:34:26 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 10:34:26 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <200503030948.17122.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> Message-ID: <1109874753.19608.24.camel@halloween.stsci.edu> On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > Hi, > what is the cvs command to update to the exact same 1.2.3 version using cvs? % cvs update -r v1_2_3 > Also I'm wondering if numarray.__version__ could be more informative about > e.g. "1.2" vs. "1.2.2" vs. "1.2.3" ? They're already OK I think, just like you're showing above. Do you want something else? > (What does the 'a' stand for in na.__version__ == '1.2a' ? Does that mean I > got it from CVS ? ) The 'a' in 1.2a stands for "optimism". It actually took 1.2, 1.2.1, 1.2.2 to get to 1.2.3. My original plan was 1.2a, pass go, 1.2... it just didn't work out that way. Regards, Todd > Thanks, > Sebastian Haase > > > > On Thursday 03 March 2005 09:07, Todd Miller wrote: > > numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a > > problem with universal function setup caching which noticeably impaired > > 1.2.2 small array performance. 
Get it if you are new to numarray, > > haven't upgraded to 1.2.2 yet, or use a lot of small arrays. > > > > numarray-1.2.3 is here: > > > > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 > > > > Thanks to Ralf Juengling for quietly reporting this and working with me > > to identify and fix the problem. > > > > > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real users. > > Discover which products truly live up to the hype. Start reading now. > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From haase at msg.ucsf.edu Thu Mar 3 11:14:26 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 11:14:26 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <1109874753.19608.24.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> <1109874753.19608.24.camel@halloween.stsci.edu> Message-ID: <200503031113.15222.haase@msg.ucsf.edu> On Thursday 03 March 2005 10:32, Todd Miller wrote: > On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > > Hi, > > what is the cvs command to update to the exact same 1.2.3 version using > > cvs? > > % cvs update -r v1_2_3 I just did this - but comparing with the 1.2.3 from sourceforge I have some files, e.g. Examples/ufunc/Src/airy.h only in the CVS version !? Thanks, Sebastian Haase From jmiller at stsci.edu Thu Mar 3 11:40:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 11:40:22 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <200503031113.15222.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> Message-ID: <1109878692.19608.97.camel@halloween.stsci.edu> On Thu, 2005-03-03 at 14:13, Sebastian Haase wrote: > On Thursday 03 March 2005 10:32, Todd Miller wrote: > > On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > > > Hi, > > > what is the cvs command to update to the exact same 1.2.3 version using > > > cvs? > > > > % cvs update -r v1_2_3 > > I just did this - but comparing with the 1.2.3 from sourceforge I have some > files, e.g. > Examples/ufunc/Src/airy.h > only in the CVS version !? airy.h exists now for me on both the CVS head and 1.2.3. airy.h did not always exist throughout the entire pre-release lifespan of version 1.2 so if you did a checkout (cvs checkout numarray or cvs update numarray) and saw 1.2, there's no guarantee what the state of airy.h would have been. CVS versions just tend to be stale. I tag CVS and change the numarray version only when I do a tarball or semi-formal tests involving other people. Also note that CVS can be used with dates rather than version numbers or tags, so there is some recourse even when numarray.__version__ isn't telling the whole story. 
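For example, an update pinned to a date rather than a tag looks like this (the date shown is purely illustrative):

% cvs update -D "2005-03-01"
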
Regards, Todd From oliphant at ee.byu.edu Thu Mar 3 11:41:24 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 3 11:41:24 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42273F71.9060005@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> Message-ID: <42276807.8020007@ee.byu.edu> Colin J. Williams wrote: > > My understanding was that there was to be a new builtin > multiarray/Array class/type which eventually would replace the > existing array.ArrayType. Thus, for a time at least, there would be > at least one new class/type. The new type will actually be in the standard library. For backwards compatibility we will not be replacing the existing array.ArrayType but providing an additional ndarray.ndarray (or some such name -- the name hasn't been finalized yet). > > In addition, it seemed to be proposed that the new class/type would > not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm > not too clear on this latter point but Konrad says that there would > not be this multiplicity of basic class/type's. The arrays have always been homogeneous collections of "something". This 'something' has been indicated by typecodes characters (Numeric) or Python classes (numarray). The proposal is that the "something" that identifies what the homogeneous arrays are collections of will be actual type objects. Some of these type objects are just "organizational types" which help to classify the different kinds of homogeneous arrays. The "leaf-node" types are also the types of new Python scalars that act as a transition layer between ndarrays with their variety of objects and traditional Python bool, int, float, complex, string, and unicode objects which do not "understand" that they could be considered as 0-dimensional arrays. -Travis From haase at msg.ucsf.edu Thu Mar 3 11:41:32 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 11:41:32 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in in my C program In-Reply-To: <200503031113.15222.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> Message-ID: <200503031140.21522.haase@msg.ucsf.edu> Hi, After upgrading from numarray 1.1 (now 1.2.3) We get a Segmentation fault in our C++ program on Linux (python2.2,gcc2.95) , gdb says this: Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 1087498336 (LWP 8279)] 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 (gdb) where #0 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 #1 0x410f905e in deferred_libnumarray_init () at Src/libnumarraymodule.c:149 #2 0x410f98a8 in NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, byteoffset=0, bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ libnumarraymodule.c:636 #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ libwx_gtk-2.4.so #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 To initialize libnumarray I was using this: { // import_libnumarray(); { PyObject *module = PyImport_ImportModule("numarray.libnumarray"); if (!module) Py_FatalError("Can't import module 'numarray.libnumarray'"); if (module != NULL) { PyObject *module_dict = PyModule_GetDict(module); PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); if (PyCObject_Check(c_api_object)) { libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); } else { Py_FatalError("Can't get API for module 'numarray.libnumarray'"); } } } } Any idea ? Thanks, Sebastian Haase From cjw at sympatico.ca Thu Mar 3 12:14:19 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 3 12:14:19 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42276807.8020007@ee.byu.edu> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> <42276807.8020007@ee.byu.edu> Message-ID: <42276FE5.5000303@sympatico.ca> Travis Oliphant wrote: > Colin J. Williams wrote: > >> >> My understanding was that there was to be a new builtin >> multiarray/Array class/type which eventually would replace the >> existing array.ArrayType. Thus, for a time at least, there would be >> at least one new class/type. > > > The new type will actually be in the standard library. For backwards > compatibility we will not be replacing the existing array.ArrayType > but providing an additional ndarray.ndarray (or some such name -- the > name hasn't been finalized yet). > >> >> In addition, it seemed to be proposed that the new class/type would >> not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm >> not too clear on this latter point but Konrad says that there would >> not be this multiplicity of basic class/type's. > > > The arrays have always been homogeneous collections of "something". > This 'something' has been indicated by typecodes characters (Numeric) > or Python classes (numarray). The proposal is that the "something" > that identifies what the homogeneous arrays are collections of will be > actual type objects. Some of these type objects are just > "organizational types" which help to classify the different kinds of > homogeneous arrays. The "leaf-node" types are also the types of new > Python scalars that act as a transition layer between ndarrays with > their variety of objects and traditional Python bool, int, float, > complex, string, and unicode objects which do not "understand" that > they could be considered as 0-dimensional arrays. > Thanks. This clarifies things. These 'somethingTypes' would presumably not be in the standard library but in some module like Numeric3.numerictypes. Colin W. 
From stephen.walton at csun.edu Thu Mar 3 17:00:05 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 3 17:00:05 2005 Subject: [Numpy-discussion] bdist-rpm problem Message-ID: <4227B2AC.1080200@csun.edu> Hi, All, A week or so ago, I posted to matplotlib-users about a problem with bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. It turns out there are two problems. One is that even if one has python2.3 and python2.2 installed, bdist_rpm always calls the interpreter named 'python', which is 2.2 on FC1. The other problem is that in bdist_rpm.py there is a set of lines near line 307 which tests if the number of generated RPM files is 1. This fails because all of matplotlib, numeric, numarray and scipy generate a debuginfo RPM when one does 'python setup.py bdist_rpm'. (Why the RPM count doesn't fail with Python 2.3 on FC3 is beyond me, but nevermind.) The patch is at http://opensvn.csie.org/pyvault/rpms/trunk/python23/python-2.3.4-distutils-bdist-rpm.patch and I have verified that after applying this patch to /usr/lib/python2.2/distutils/command/bdist_rpm.py on FC1 that 'python setup.py bdist_rpm' works for numarray 1.2.2, scipy current CVS, and matplotlib 0.72 (after changing setup.py for python2.2 as documented in the latter). It still fails with Numeric 23.6 however for reasons I'm still checking into; the failed "setup.py bdist_rpm" claims that arraytypes.c doesn't exist. Steve Walton From stephen.walton at csun.edu Thu Mar 3 17:02:42 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 3 17:02:42 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <4227B375.8000200@csun.edu> Stephen Walton wrote: > [bdist_rpm] still fails with Numeric 23.6 however for reasons I'm > still checking into; Posted too soon; this problem is fixed at Numeric 23.7. From pearu at scipy.org Thu Mar 3 17:05:26 2005 From: pearu at scipy.org (Pearu Peterson) Date: Thu Mar 3 17:05:26 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: On Thu, 3 Mar 2005, Stephen Walton wrote: > Hi, All, > > A week or so ago, I posted to matplotlib-users about a problem with > bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. > > It turns out there are two problems. One is that even if one has python2.3 > and python2.2 installed, bdist_rpm always calls the interpreter named > 'python', which is 2.2 on FC1. Using `bdist_rpm --fix-python` should take care of this issue. Pearu From oliphant at ee.byu.edu Thu Mar 3 17:24:41 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 3 17:24:41 2005 Subject: [Numpy-discussion] CVS version of Numeric3 compiles again Message-ID: <4227B88C.1060400@ee.byu.edu> For any that tried to check out the CVS version of numeric3 while it was in a transition state of adding the new Python Scalar Objects, you can now try again and help me test it. The current CVS version of numeric3 builds on linux (there is some magic in the setup.py file to do some autoconfiguration stuff that I would like to see if it works on other platforms). The arrayobject is nearing completion. There are only a couple of things left to do before I can start tackling the ufuncobject (which is part-way transitioned but needs more numarray-inspired fixes). 
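Building it is the usual distutils routine, something like the following (the checkout directory name may differ on your system):

% cd Numeric3
% python setup.py build
% python setup.py install
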
If anyone would like to help, now is a good time, since at least the codebase should compile and you can play with it. Best regards, -Travis From Fernando.Perez at colorado.edu Thu Mar 3 17:25:01 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Mar 3 17:25:01 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <4227B86E.6070108@colorado.edu> Stephen Walton wrote: > Hi, All, > > A week or so ago, I posted to matplotlib-users about a problem with > bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. > > It turns out there are two problems. One is that even if one has > python2.3 and python2.2 installed, bdist_rpm always calls the > interpreter named 'python', which is 2.2 on FC1. The other problem is You need to 'fix' the python version to be called inside the actual rpm build. From the ipython release script: # A 2.4-specific RPM, where we must use the --fix-python option to ensure that # the resulting RPM is really built with 2.4 (so things go to # lib/python2.4/...) python2.4 ./setup.py bdist_rpm --release=py24 --fix-python > that in bdist_rpm.py there is a set of lines near line 307 which tests > if the number of generated RPM files is 1. This fails because all of > matplotlib, numeric, numarray and scipy generate a debuginfo RPM when > one does 'python setup.py bdist_rpm'. (Why the RPM count doesn't fail > with Python 2.3 on FC3 is beyond me, but nevermind.) The patch is at This problem has been fixed in recent 2.3 and 2.4. 2.2 still has it. Best, f From nwagner at mecha.uni-stuttgart.de Fri Mar 4 00:40:27 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Fri Mar 4 00:40:27 2005 Subject: [Numpy-discussion] PyTrilinos - Python interface to Trilinos libraries Message-ID: <42281E8C.60800@mecha.uni-stuttgart.de> Hi all, A new release of Trilinos is available. It includes a Python interface to Trilinos libraries. http://software.sandia.gov/trilinos/release_5.0_notes.html Regards, Nils From konrad.hinsen at laposte.net Fri Mar 4 02:10:21 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 02:10:21 2005 Subject: [Numpy-discussion] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <034e251d7331d04a59b2c6785a094eb5@laposte.net> On Mar 4, 2005, at 1:58, Stephen Walton wrote: > It turns out there are two problems. One is that even if one has > python2.3 and python2.2 installed, bdist_rpm always calls the > interpreter named 'python', which is That can be changed with the option "--python". > 2.2 on FC1. The other problem is that in bdist_rpm.py there is a set > of lines near line 307 which tests if the number of generated RPM > files is 1. This fails because all of matplotlib, numeric, numarray > and scipy generate a debuginfo RPM when one does 'python setup.py > bdist_rpm'. (Why the RPM count doesn't fail with This is a common problem, but it is safe to ignore the error message, the RPMs are fine. Konrad. 
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From konrad.hinsen at laposte.net Fri Mar 4 05:39:15 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 05:39:15 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42273F71.9060005@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> Message-ID: <51fd185dec4aa52157a8d2257c895e7d@laposte.net> On Mar 3, 2005, at 17:46, Colin J. Williams wrote: >> Those are just the built-in types. There are no plans to increase >> their number. > > My understanding was that there was to be a new builtin > multiarray/Array class/type which eventually would replace the > existing array.ArrayType. Thus, for a time Neither the current array type nor the proposed multiarray type are builtin types. They are types defined in modules belonging to the standard library. > Yes, that it a problem which is not well resolved by requiring that a > slice be terminated with a ")", "]", "}" or a space. One of the > difficulties is that the slice is not recognized in the current > syntax. We have a "slicing" which ties a slice with a It is, but in the form of a standard constructor: slice(a, b, c). > Thomas Wouters proposed a similar structure for a range in PEP204 > (http://python.fyxm.net/peps/pep-0204.html), which was rejected. > We would probably face the same problem: a syntax change must matter to many people to have a chance of being accepted. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cosbys at yahoo.com Fri Mar 4 06:53:49 2005 From: cosbys at yahoo.com (kristen kaasbjerg) Date: Fri Mar 4 06:53:49 2005 Subject: [Numpy-discussion] Problem with dashes and savefig in 0.72.1 In-Reply-To: 6667 Message-ID: <20050304144913.46367.qmail@web52903.mail.yahoo.com> 1) Running the dash_control.py example I get the following error message (the problem is present on both linux and windows installations): Traceback (most recent call last): File "/usr/lib/python2.3/lib-tk/Tkinter.py", line 1345, in __call__ return self.func(*args) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 140, in resize self.show() File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 143, in draw FigureCanvasAgg.draw(self) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line319, in draw self.figure.draw(self.renderer) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 338, in draw for a in self.axes: a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/axes.py", line 1296, in draw a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 283, in draw lineFunc(renderer, gc, xt, yt) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 543, in _draw_dashed renderer.draw_lines(gc, xt, yt, self._transform) TypeError: CXX: type error. 
2) And when using savefig I get : Traceback (most recent call last): File "dash_control.py", line 13, in ? savefig('dash_control') File "/home/camp/s991416/lib/python/matplotlib/pylab.py", line 763, in savefig try: ret = fig.savefig(*args, **kwargs) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 455, in savefig self.canvas.print_figure(*args, **kwargs) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 161, in print_figure agg.print_figure(filename, dpi, facecolor, edgecolor, orientation) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line370, in print_figure self.draw() File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line319, in draw self.figure.draw(self.renderer) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 338, in draw for a in self.axes: a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/axes.py", line 1296, in draw a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 283, in draw lineFunc(renderer, gc, xt, yt) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 543, in _draw_dashed renderer.draw_lines(gc, xt, yt, self._transform) __________________________________ Celebrate Yahoo!'s 10th Birthday! Yahoo! Netrospective: 100 Moments of the Web http://birthday.yahoo.com/netrospective/ From jmiller at stsci.edu Fri Mar 4 07:04:31 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Mar 4 07:04:31 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in in my C program In-Reply-To: <200503031140.21522.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> <200503031140.21522.haase@msg.ucsf.edu> Message-ID: <1109948612.19608.210.camel@halloween.stsci.edu> >From what you're showing me, it looks like libnumarray initialization is failing which makes me suspect a corrupted numarray installation. Here are some things to try: 1. Completely delete your existing site-packages/numarray. Also delete numarray/build then re-install numarray. 2. Delete and re-install your extensions. In principle, numarray-1.2.3 is supposed to be binary compatible with numarray-1.1.1 but maybe I'm mistaken. 3. Hopefully you won't get this far but... a python which works well with gdb can be built from source using ./configure --with-pydebug. So a debug scenario is something like: % tar zxf Python-2.2.3.tar.gz % cd Python-2.2.3 % ./configure --with-pydebug --prefix=$HOME % make % make install % cd .. % tar zxf numarray-1.2.3.tar.gz % cd numarray-1.2.3 % python setup.py install % cd .. % tar zxf your_stuff.tar.gz % cd your_stuff % python setup.py install This makes a debug Python installed in $HOME/bin, $HOME/lib, and $HOME/include. This process is useful for compiling Python itself and extensions with "-g -O0" and hence gdb works better. Besides appropriate compiler switches, debug Python also has more robust object memory management and better tracked reference counting. 
Debug like this: % setenv PATH $HOME/bin:$PATH # export if you use bash % rehash % gdb python (gdb) run >>> (gdb) l , # to see some code (gdb) p (gdb) up # Move up the stack frame to see where the bogus value came from Regards, Todd On Thu, 2005-03-03 at 14:40, Sebastian Haase wrote: > Hi, > After upgrading from numarray 1.1 (now 1.2.3) > We get a Segmentation fault in our C++ program on Linux (python2.2,gcc2.95) , > gdb says this: > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 1087498336 (LWP 8279)] > 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > (gdb) where > #0 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > #1 0x410f905e in deferred_libnumarray_init () at Src/libnumarraymodule.c:149 > #2 0x410f98a8 in NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, > type=tFloat32, bufferObject=0x8a03988, byteoffset=0, > bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ > libnumarraymodule.c:636 > #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 > #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ > libwx_gtk-2.4.so > #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 > > > To initialize libnumarray I was using this: > { > // import_libnumarray(); > { > PyObject *module = PyImport_ImportModule("numarray.libnumarray"); > if (!module) > Py_FatalError("Can't import module 'numarray.libnumarray'"); > if (module != NULL) { > PyObject *module_dict = PyModule_GetDict(module); > PyObject *c_api_object = > PyDict_GetItemString(module_dict, "_C_API"); > if (PyCObject_Check(c_api_object)) { > libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); > } else { > Py_FatalError("Can't get API for module 'numarray.libnumarray'"); > } > } > } > } > > Any idea ? > > Thanks, > Sebastian Haase > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From perry at stsci.edu Fri Mar 4 07:49:14 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 4 07:49:14 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On Mar 3, 2005, at 12:31 PM, konrad.hinsen at laposte.net wrote: > Following a bug report concerning ScientificPython with numarray, I > noticed an incompatibility between Numeric and numarray, and I am > wondering if this is intentional. > > In Numeric, the result of a comparison operation is an integer array. > In numarray, it is a Bool array. Bool arrays seem to behave like Int8 > arrays when arithmetic operations are applied. The net result is that > > print n.add.reduce(n.greater(n.arange(128), -1)) > > yields -128, which is not what I would expect. > > I can see two logically coherent points of views: > > 1) The Numeric view: comparisons yield integer arrays, which may be > used freely in arithmetic. > > 2) The "logician's" view: comparisons yield arrays of boolean values, > on which no arithmetic is allowed at all, only logical operations. 
> > The first approach is a lot more pragmatic, because there are a lot of > useful idioms that use the result of comparisons in arithmetic, > whereas an array of boolean values cannot be used for much else than > logical operations. > > And now for my pragmatic question: can anyone come up with a solution > that will work under both Numeric an numarray, won't introduce a speed > penalty under Numeric, and won't leave the impression that the > programmer had had too many beers? There is the quick hack > > print n.add.reduce(1*n.greater(n.arange(128), -1)) > > but it doesn't satisfy the last two criteria. First of all, isn't the current behavior a little similar to Python in that Python Booleans aren't pure either (for backward compatibility purposes)? I think this has come up in the past, and I thought that one possible solution was to automatically coerce all integer reductions and accumulations to Int32 to avoid overflow issues. That had been discussed before and apparently many preferred avoiding automatic promotion (the reductions allow specifying a new type for the reduction, but I don't believe that helps your specific example for code that works for both). Using .astype(Int32) should work for both, right? (or is that too much of a speed hit?) But it is a fair question to ask if arithmetic operations should be allowed on booleans without explicit casts. Perry From stephen.walton at csun.edu Fri Mar 4 09:29:18 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Mar 4 09:29:18 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <42289A75.8090907@csun.edu> Perry Greenfield wrote: > On Mar 3, 2005, at 12:31 PM, konrad.hinsen at laposte.net wrote: > >> print n.add.reduce(n.greater(n.arange(128), -1)) >> >> yields -128, which is not what I would expect. >> >> > > I think this has come up in the past, It has. I think I commented on it some time back, and the consensus was that, as Perry suggested, using .astype(Int32) is the best fix. I think the fact that arithmetic is allowed on booleans without casts is an oversight; standard Python 2.3 allows you to do True+False. Fortran would never let you do .TRUE.+.FALSE. :-) . From konrad.hinsen at laposte.net Fri Mar 4 10:25:43 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 10:25:43 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On 04.03.2005, at 16:44, Perry Greenfield wrote: > First of all, isn't the current behavior a little similar to Python in > that Python Booleans aren't pure either (for backward compatibility > purposes)? Possibly, but the use of boolean scalars and boolean arrays is very different, so that's not necessarily the model to follow. > apparently many preferred avoiding automatic promotion (the reductions > allow specifying a new type for the reduction, but I don't believe > that helps your specific example for code that works for both). Using > .astype(Int32) Right, because it doesn't work with Numeric. > should work for both, right? (or is that too much of a speed hit?) > But it is a Yes, but it costs both time and memory. I am more worried about the memory, since this is one of the few operations that I do mostly with big arrays. Under Numeric, this doubles memory use, costs time, and makes no difference for the result. I am not sure that numarray compatibility is worth that much for me (OK, there is a dose of laziness in that argument as well). 
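To make the failure mode concrete, here is a minimal sketch of the overflow and of the astype() workaround (assuming numarray imported as n, as in the example above; with import Numeric as n instead, both print lines give 128, since Numeric comparisons return an ordinary integer array):

import numarray as n

mask = n.greater(n.arange(128), -1)        # Bool array containing 128 ones
print n.add.reduce(mask)                   # Int8-style arithmetic wraps: prints -128
print n.add.reduce(mask.astype(n.Int32))   # prints 128 under numarray and Numeric alike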
> fair question to ask if arithmetic operations should be allowed on > booleans without explicit casts. > What is actually the difference between Bool and Int8? On 04.03.2005, at 18:27, Stephen Walton wrote: > It has. I think I commented on it some time back, and the consensus > was that, as Perry suggested, using .astype(Int32) is the best fix. I > think the fact that arithmetic is allowed on booleans without casts is > an oversight; standard Python 2.3 allows you to do True+False. > Fortran would never let you do .TRUE.+.FALSE. :-) . I am in fact not convinced that adding booleans to Python was a very good idea, for exactly that reason: they try to be both booleans and compatible with integers. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From perry at stsci.edu Fri Mar 4 10:51:33 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 4 10:51:33 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On Mar 4, 2005, at 1:24 PM, konrad.hinsen at laposte.net wrote: > On 04.03.2005, at 16:44, Perry Greenfield wrote: > >> First of all, isn't the current behavior a little similar to Python >> in that Python Booleans aren't pure either (for backward >> compatibility purposes)? > > Possibly, but the use of boolean scalars and boolean arrays is very > different, so that's not necessarily the model to follow. > No, but that some people know that arithmetic can be done with Python Booleans may lead them to think the same should be possible with Boolean arrays (not that should be the sole criteria). >> for both). Using .astype(Int32) > > Right, because it doesn't work with Numeric. > >> should work for both, right? (or is that too much of a speed hit?) >> But it is a > > Yes, but it costs both time and memory. I am more worried about the > memory, since this is one of the few operations that I do mostly with > big arrays. Under Numeric, this doubles memory use, costs time, and > makes no difference for the result. I am not sure that numarray > compatibility is worth that much for me (OK, there is a dose of > laziness in that argument as well). > Hmmm, I'm a little confused here. If the overflow issue is what you are worried about, then use of Int8 for boolean results would still be a problem here. Since Numeric is already likely generating Int32 from logical ufuncs (Int actually), the use of astype(Int) is little different than many of the temporaries that Numeric creates in expressions. I find it hard to believe that this is a make or break issue for Numeric users since it typically generates more temporaries than does numarray. >> fair question to ask if arithmetic operations should be allowed on >> booleans without explicit casts. >> > What is actually the difference between Bool and Int8? > I'm not sure I remember all the differences (Todd can add to this if he remembers better). Booleans are treated differently as array indices than Int8 arrays are. The machinery of generating Boolean results is different in that it forces results to be either 0 or 1. In other words, Boolean arrays should only have 0 or 1 values in those bytes (not that it isn't possible for someone to break this in C code or though undiscovered bugs. 
Ufuncs that generate different values such as arithmetic operators result in a different type. Perry From jmiller at stsci.edu Fri Mar 4 11:44:40 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Mar 4 11:44:40 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <1109965365.21215.324.camel@halloween.stsci.edu> On Fri, 2005-03-04 at 13:50, Perry Greenfield wrote: > On Mar 4, 2005, at 1:24 PM, konrad.hinsen at laposte.net wrote: > > > What is actually the difference between Bool and Int8? > > > I'm not sure I remember all the differences (Todd can add to this if he > remembers better). Booleans are treated differently as array indices > than Int8 arrays are. The machinery of generating Boolean results is > different in that it forces results to be either 0 or 1. Conversions to Bool, logical operations, and (implicitly) comparisons constrain values to 0 or 1. > In other > words, Boolean arrays should only have 0 or 1 values in those bytes > (not that it isn't possible for someone to break this in C code or > through undiscovered bugs). Ufuncs that generate different values such as > arithmetic operators result in a different type. More general arithmetic appears to have unconstrained results. From haase at msg.ucsf.edu Fri Mar 4 11:48:32 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Mar 4 11:48:32 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in my C program In-Reply-To: <1109948612.19608.210.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503031140.21522.haase@msg.ucsf.edu> <1109948612.19608.210.camel@halloween.stsci.edu> Message-ID: <200503041146.50434.haase@msg.ucsf.edu> On Friday 04 March 2005 07:03, Todd Miller wrote: > From what you're showing me, it looks like libnumarray initialization > is failing which makes me suspect a corrupted numarray installation. I understood it as saying that it fails in MyApp::OnInit omx_app.cpp:519 while doing: NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, byteoffset=0, bytestride=0, byteorder=0, aligned=1, writeable=1) the "initialize libnumarray"-stuff is in the 20 lines above that. Do you use NA_NewAllFromBuffer anywhere? Thanks, Sebastian Haase > Here are some things to try: > > 1. Completely delete your existing site-packages/numarray. Also delete > numarray/build then re-install numarray. > > 2. Delete and re-install your extensions. In principle, > numarray-1.2.3 is supposed to be binary compatible with numarray-1.1.1 > but maybe I'm mistaken. > > 3. Hopefully you won't get this far but... a python which works well > with gdb can be built from source using ./configure --with-pydebug. So > a debug scenario is something like: > > % tar zxf Python-2.2.3.tar.gz > % cd Python-2.2.3 > % ./configure --with-pydebug --prefix=$HOME > % make > % make install > > % cd .. > % tar zxf numarray-1.2.3.tar.gz > % cd numarray-1.2.3 > % python setup.py install > > % cd .. > % tar zxf your_stuff.tar.gz > % cd your_stuff > % python setup.py install > > This makes a debug Python installed in $HOME/bin, $HOME/lib, and > $HOME/include. This process is useful for compiling Python itself and > extensions with "-g -O0" and hence gdb works better. Besides > appropriate compiler switches, debug Python also has more robust object > memory management and better tracked reference counting.
> > Debug like this: > > % setenv PATH $HOME/bin:$PATH # export if you use bash > % rehash > > % gdb python > (gdb) run > > >>> > > > (gdb) l , # to see some code > (gdb) p > (gdb) up # Move up the stack frame to see where the bogus value came > from > > Regards, > Todd > > On Thu, 2005-03-03 at 14:40, Sebastian Haase wrote: > > Hi, > > After upgrading from numarray 1.1 (now 1.2.3) > > We get a Segmentation fault in our C++ program on Linux > > (python2.2,gcc2.95) , gdb says this: > > > > Program received signal SIGSEGV, Segmentation fault. > > [Switching to Thread 1087498336 (LWP 8279)] > > 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > > (gdb) where > > #0 0x406d68d5 in PyObject_GetAttrString () from > > /usr/lib/libpython2.2.so.0.0 #1 0x410f905e in deferred_libnumarray_init > > () at Src/libnumarraymodule.c:149 #2 0x410f98a8 in NA_NewAllFromBuffer > > (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, > > byteoffset=0, > > bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ > > libnumarraymodule.c:636 > > #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 > > #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ > > libwx_gtk-2.4.so > > #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 > > > > > > To initialize libnumarray I was using this: > > { > > // import_libnumarray(); > > { > > PyObject *module = PyImport_ImportModule("numarray.libnumarray"); > > if (!module) > > Py_FatalError("Can't import module > > 'numarray.libnumarray'"); if (module != NULL) { > > PyObject *module_dict = PyModule_GetDict(module); > > PyObject *c_api_object = > > PyDict_GetItemString(module_dict, "_C_API"); > > if (PyCObject_Check(c_api_object)) { > > libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); > > } else { > > Py_FatalError("Can't get API for module > > 'numarray.libnumarray'"); } > > } > > } > > } > > > > Any idea ? > > > > Thanks, > > Sebastian Haase > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real users. > > Discover which products truly live up to the hype. Start reading now. > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From konrad.hinsen at laposte.net Fri Mar 4 12:20:23 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 12:20:23 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> On 04.03.2005, at 19:50, Perry Greenfield wrote: > Hmmm, I'm a little confused here. If the overflow issue is what you > are worried about, then use of Int8 for boolean results would still be > a problem Yes. The question about the difference was just out of curiosity. > here. Since Numeric is already likely generating Int32 from logical > ufuncs (Int actually), the use of astype(Int) is little different than > many of the temporaries that Numeric creates in expressions. I find it > hard to believe It's the same, but it's one more. The only one is some of my large-array code, as I have carefully used the three-argument forms of the binary operators to avoid intermediate results. 
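As a minimal sketch of the three-argument idiom just mentioned (assuming Numeric imported as n; the array size is arbitrary): each binary ufunc accepts an optional output array, so the result is written in place rather than into a freshly allocated temporary:

import Numeric as n

a = n.ones((100000,), n.Float)
b = n.ones((100000,), n.Float)
result = n.zeros((100000,), n.Float)

n.add(a, b, result)   # a + b is written directly into result; no temporary is created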
I can't do that for comparisons between float arrays. After some consideration, I think the best solution is a special "sum integer array" function in my Numeric/numarray adaptor module (the one that chooses which module to import). The numarray version can then use the type specifier in the reduction. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From juenglin at cs.pdx.edu Fri Mar 4 17:14:46 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Fri Mar 4 17:14:46 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> References: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> Message-ID: <1109985212.22526.46.camel@alpspitze.cs.pdx.edu> On Fri, 2005-03-04 at 12:18, konrad.hinsen at laposte.net wrote: > On 04.03.2005, at 19:50, Perry Greenfield wrote: > After some consideration, I think the best solution is a special "sum > integer array" function in my Numeric/numarray adaptor module (the one > that chooses which module to import). The numarray version can then use > the type specifier in the reduction. That ufunc.reduce takes an optional type specifier was news to me. Neither the manual nor the on-line help mentions it. ralf From cmeesters at ucdavis.edu Mon Mar 7 00:10:18 2005 From: cmeesters at ucdavis.edu (Christian Meesters) Date: Mon Mar 7 00:10:18 2005 Subject: [Numpy-discussion] Gaussian fits? Sum of Gaussians? Message-ID: <200503070808.j2788wbL029902@diometes.ucdavis.edu> Hi I was wondering whether there are scripts or modules around which can calculate, on a given 1D-Numarray or Numeric array, a sum of Gaussians? E.g. something like: x = sumofgaussians(input_array[, other_parameters]) where x would contain a list of arrays representing Gaussian curves, which, when added together, would result in (a good approximation of) the input_array. It would be nice, of course, if information like the standarddeviation and peak height would be associated with that data. Perhaps I am hoping for too much, but in this case you guys at least had a good laugh when reading these lines ;-) - and I'd had to write something myself or find it in other software ... Thanks a lot in advance, Christian From brendansimons at yahoo.ca Tue Mar 8 08:38:49 2005 From: brendansimons at yahoo.ca (Brendan Simons) Date: Tue Mar 8 08:38:49 2005 Subject: [Numpy-discussion] Re: Gaussian fits? Sum of Gaussians In-Reply-To: 6667 Message-ID: <20050308163749.15581.qmail@web31106.mail.mud.yahoo.com> I am not familiar enough with Stats to say exactly what you need, but you might try looking at the scipy statistics module: http://www.scipy.org/documentation/apidocs/scipy/scipy.stats.html If nothing else, you can use that to generate numeric arrays with gaussian (normal) distributions and add them together. -Brendan > Message: 1 > Date: Mon, 7 Mar 2005 00:08:58 -0800 (PST) > To: numpy-discussion at lists.sourceforge.net > From: "Christian Meesters" > Subject: [Numpy-discussion] Gaussian fits? Sum of > Gaussians? > > > Hi > > I was wondering whether there are scripts or modules > around which can calculate, on a given > 1D-Numarray or Numeric array, a sum of Gaussians? > E.g. 
something like: x = > sumofgaussians(input_array[, other_parameters]) > where x would contain a list of arrays > representing Gaussian curves, which, when added > together, would result in (a good > approximation of) the input_array. It would be nice, > of course, if information like the > standarddeviation and peak height would be > associated with that data. > > Perhaps I am hoping for too much, but in this case > you guys at least had a good laugh when > reading these lines ;-) - and I'd had to write > something myself or find it in other software ... > > Thanks a lot in advance, > Christian > > > > --__--__-- > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > End of Numpy-discussion Digest > ______________________________________________________________________ Post your free ad now! http://personals.yahoo.ca From konrad.hinsen at laposte.net Tue Mar 8 09:53:31 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 8 09:53:31 2005 Subject: [Numpy-discussion] Gaussian fits? Sum of Gaussians? In-Reply-To: <200503070808.j2788wbL029902@diometes.ucdavis.edu> References: <200503070808.j2788wbL029902@diometes.ucdavis.edu> Message-ID: <02ff5d028fe5a68d4bbe8f5f31cf33e8@laposte.net> On Mar 7, 2005, at 9:08, Christian Meesters wrote: > I was wondering whether there are scripts or modules around which can > calculate, on a given 1D-Numarray or Numeric array, a sum of > Gaussians? E.g. something That doesn't look like a well-defined problem. At the very least, you will have to provide the code with the number of Gaussians that you want to fit. But even then, unless your data has particular properties, this is likely to result in an ill-defined fit problem. While the sum of several Gaussians is not a Gaussian itself, it can look very very similar, depending on the parameter combinations. I don't think that the kind of black-box function you are looking for exists, and I think that in the long run this is good for you. There is code for the tough part of the task, nonlinear curve fitting (both in my ScientificPython library and in SciPy, with different strong and weak points). If you make the effort to formulate your problem in terms that such routines can handle, you can be reasonably sure that you have understood your problem and the solution approach, i.e. you know what you are doing. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Tue Mar 8 14:59:09 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 8 14:59:09 2005 Subject: [Numpy-discussion] Announcement: PyMatrix-0.0.1a Released Message-ID: <422E2E28.9050009@sympatico.ca> PyMatrix is a package to provide access to the functionality of matrix algebra. This package is currently based on numarray. It includes a statistics module which includes a basic analysis of variance. In the future it is hoped to enhance the generality of the divide operation, to add the transcendental functions as methods of the matrix class and to improve the documentation. 
The expectation is that Numeric3 will eventually replace numarray, and that this will necessitate some changes to PyMatrix. Downloads in the form of a Windows Installer (Inno) and a zip file are available at: http://www3.sympatico.ca/cjw/PyMatrix An /Introduction to PyMatrix/ is available: http://www3.sympatico.ca/cjw/PyMatrix/IntroToPyMatrix.pdf Information on the functions and methods of the matrix module is given at: http://www3.sympatico.ca/cjw/PyMatrix/Doc/matrix-summary.html Colin W. From oliphant at ee.byu.edu Tue Mar 8 23:34:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 8 23:34:07 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley Message-ID: <422EA691.9080404@ee.byu.edu> I wanted to send an update to this list regarding the meeting at Berkeley that I attended. A lot of good discussions took place at the meeting that should stimulate larger feedback. Personally, I had far more to discuss before I had to leave, and so I hope that the discussions can continue. I was looking to try and understand why, with an increasing number of scientific users of Python, relatively few people actually seem to want to contribute to scipy regularly, even becoming active developers. There are lots of people who seem to identify problems (though very often vague ones), but not many who seem able (either through time or interest constraints) to actually contribute to code, documentation, or infrastructure. Scipy is an open-source project and relies on the self-selection process of open-source contributors. It would seem that while the scipy conference demonstrates a continuing and even increasing use of Python for scientific computing, not as many of these users are scipy devotees. Why? I think the answers come down to a few issues which I will attempt to answer with proposals. 1) Plotting -- scipy's plotting wasn't good enough (we knew that) and the promised solution (chaco) took too long to emerge as a simple replacement. While the elements were all there for chaco to work, very few people knew that, and nobody stepped up to take chaco to the level that matplotlib, for example, has reached in terms of cross-gui applicability and user-interface usability. Proposal: Incorporate matplotlib as part of the scipy framework (replacing plt). Chaco is not there anymore, and the other two plotting solutions could stay as backward-compatible but not progressing solutions. I have not talked to John about this, though I would like to. I think if some other packaging issues are addressed we might be able to get John to agree. 2) Installation problems -- I'm not completely clear on what the "installation problems" really are. I hear people talk about them, but Pearu has made significant strides to improve installation, so I'm not sure what precise issues remain. Yes, installing ATLAS can be a pain, but scipy doesn't require it. Yes, Fortran support can be a pain, but if you use g77 then it isn't a big deal. The reality, though, is that there is this perception of installation trouble, and it must be based on something. Let's find out what it is. Please speak up, users of the world!!!! Proposal (just an idea to start discussion): Subdivide scipy into several super packages that install cleanly but can also be installed separately. Implement a CPAN-or-yum-like repository and query system for installing scientific packages. Base package: scipy_core -- this super package should be easy to install (no Fortran) and should essentially be old Numeric.
It was discussed at Berkeley that very likely Numeric3 should just be included here. I think this package should also include plotting, weave, scipy_distutils, and even f2py. Some of these could live in dual namespaces (i.e. both weave and scipy.weave are available on install).

scipy.traits
scipy.weave (weave)
scipy.plt (matplotlib)
scipy.numeric (Numeric3 -- uses atlas when installed later)
scipy.f2py
scipy.distutils
scipy.fft
scipy.linalg? (include something like lapack-lite for basic but slow functionality; installation of an improved package replaces this with atlas usage)
scipy.stats
scipy.util (everything else currently in scipy_core)
scipy.testing (testing facilities)

Each of these should be a separate package, installable and distributable separately (though there may be co-dependencies, so that scipy.plt would have to be distributed with scipy).

Libraries (each separately installable):

scipy.lib -- there should be several sub-packages that could live under here. This is simply raw code with basic wrappers (kind of like a /usr/lib)
scipy.lib.lapack -- installation also updates narray and linalg (hooks to do that)
scipy.lib.blas -- installation updates narray and linalg
scipy.lib.quadpack
etc...

Extra sub-packages: named in a hierarchy to be determined and probably each dependent on a variety of scipy-sub-packages. I haven't fleshed this thing out yet, as you can tell. I'm mainly talking publicly to spur discussion. The basic idea is that we should force ourselves to distribute scipy in separate packages. This would force us to implement a yum-or-CPAN-like package repository, so that we define the interface as to how an additional module could be developed by someone, even maintained separately (with a different license), and simply inserted into an intelligent point under the scipy infrastructure. It would also allow installation/compilation issues to be handled on a more per-module basis, so that difficult ones could be noted. I think this would also help interested people get some of the enthought stuff put into the scipy hierarchy as well. Thoughts and comments (and even half-working code) welcomed and encouraged... -Travis O. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 00:33:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 00:33:07 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EB48F.30808@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > It would seem that while the scipy conference demonstrates a continuing > and even increasing use of Python for scientific computing, not as many > of these users are scipy devotees. Why? > > I think the answers come down to a few issues which I will attempt to > answer with proposals. > > 1) Plotting While plotting is important, I don't think that SciPy needs to offer plotting capabilities in order to become successful. Numerical Python doesn't include plotting, and it's hugely popular. I would think that installing Scipy-lite + (selection of SciPy-lib sub-packages) + (your favorite plotting package) separately is acceptable. > 2) Installation problems This is the real problem. I'm one of the maintainers of Biopython (python and C code for computational biology), which relies on Numerical Python. Now that Numerical Python is not being actively maintained, I'd love to be able to direct our users to SciPy instead.
But as long as SciPy doesn't install out of the box with a python setup.py install, it's not viable as a replacement for Numerical Python. I'd spend the whole day dealing with installation problems from Biopython users. There are three other reasons why I have not become a SciPy devotee, although I use Python for scientific computing all the time: 3) Numerical Python already does the job very well. There are few packages in SciPy that I actually need. Special functions would be nice, but it's easier to write your own module than to install SciPy. 4) SciPy looks bloated. It seems to try to do too many things, so that it becomes impossible to maintain SciPy well. 5) Uncertain future. With Numerical Python, we know what we get. I don't know what SciPy will look like in a few years (numarray? Numeric3? Numeric2?) and if it still has a trouble-free installation. So it's too risky for Biopython to go over to SciPy. It's really unfortunate, because my impression is that the SciPy developers are smart people who write good code, which currently is not used as much as it could because of these problems. I hope my comments will be helpful. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 00:52:29 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 00:52:29 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EB917.4090402@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > 1) Plotting -- scipy's plotting wasn't good enough (we knew that) and > the promised solution (chaco) took too long to emerge as a simple > replacement. While the elements were all there for chaco to work, very > few people knew that and nobody stepped up to take chaco to the level > that matplotlib, for example, has reached in terms of cross-gui > applicability and user-interface usability. > I actually looked at Chaco before I started working on pygist (which is now also included in SciPy, I think). My impression was that Chaco was under active development by enthought, and that they were not looking for developers to join in. When Chaco didn't come through, I tried several plotting packages for python that were around at the time, some of which were farther along than Chaco. In the end, I decided to work on pygist instead because it was already working (on unix/linux, at least) and seemed to be a better starting point for a cross-platform plotting package, which pygist is today. The other point is that different plotting packages have different advantages and disadvantages, so you may not be able to find a plotting package that suits everybody's needs. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 01:07:44 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 01:07:44 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EBC9C.1060503@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Proposal (just an idea to start discussion): > > Subdivide scipy into several super packages that install cleanly but can > also be installed separately. Implement a CPAN-or-yum-like repository > and query system for installing scientific packages. Yes! If SciPy could become a kind of scientific CPAN for python from which users can download the packages they need, it would be a real improvement. 
In the end, the meaning of SciPy may evolve into "the website where you can download scientific packages for python" rather than "a python package for scientific computing", and the SciPy developers might not feel OK with that. > Base package: > > scipy_core -- this super package should be easy to install (no Fortran) > and should essentially be old Numeric. It was discussed at Berkeley, > that very likely Numeric3 should just be included here. +1. > I think this > package should also include plotting, weave, scipy_distutils, and even > f2py. I think you are underestimating the complexity of plotting software. Matplotlib relies on a number of other packages, which breaks the "easy to install" rule. Pygist doesn't rely on other packages, but (being the pygist maintainer) I know that in practice users can still run into trouble installing pygist (it's a little bit harder than installing Numerical Python). And if you do include pygist with scipy_core anyway, you may find out that some users want matplotlib after all. Since both pygist and matplotlib exist as separate packages, it's better to leave them out of scipy_core, I'd say. --Michiel. From pearu at cens.ioc.ee Wed Mar 9 01:51:13 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Mar 9 01:51:13 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EBC9C.1060503@ims.u-tokyo.ac.jp> Message-ID: On Wed, 9 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > > Proposal (just an idea to start discussion): > > > > Subdivide scipy into several super packages that install cleanly but can > > also be installed separately. Implement a CPAN-or-yum-like repository > > and query system for installing scientific packages. > > Yes! If SciPy could become a kind of scientific CPAN for python from > which users can download the packages they need, it would be a real > improvement. In the end, the meaning of SciPy may evolve into "the website > where you can download scientific packages for python" rather than "a > python package for scientific computing", and the SciPy developers might > not feel OK with that. Personally, I would be OK with that. SciPy as a "download site" does not exclude it from also providing a "scipy package" as it does now. I am all in favor of refactoring current scipy modules as much as possible. Pearu From konrad.hinsen at laposte.net Wed Mar 9 02:00:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 9 02:00:28 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EB48F.30808@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422EB48F.30808@ims.u-tokyo.ac.jp> Message-ID: <20d1eba7b3e99120c39fa25ad3ae0aa9@laposte.net> On Mar 9, 2005, at 9:32, Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: >> It would seem that while the scipy conference demonstrates a >> continuing and even increasing use of Python for scientific >> computing, not as many of these users are scipy devotees. Why? >> I think the answers come down to a few issues which I will attempt to >> answer with proposals. >> 1) Plotting > While plotting is important, I don't think that SciPy needs to offer > plotting capabilities in order to become successful. Numerical Python > doesn't include plotting, and it's ... Thanks for your three comments, they reflect exactly my views as well, so I'll just add a "+1" to them.
There is only one aspect I would like to add: predictability of development. Python has become my #1 tool in my everyday research over the last few years. I haven't done any scientific computation for at least five years that did not involve some Python code. Which means that I am very much dependent on Python and some Python packages. Moreover, I publish computational methods that I develop in the form of Python code that is used by a community large enough to make support an important consideration. There are only two kinds of computational tools on which I can accept being dependent: those that are supported by a sufficiently big and stable community that I don't need to worry about their disappearance or sudden mutation into something different, and those small enough that I can maintain them in a usable state myself if necessary. Python is in the first category, Numeric in the second. SciPy is not in either one. The proposed division of SciPy into separately installable, maintainable subpackages could make a big difference there. The core could actually be both easily maintainable and supported by a big enough community. So I am all for it, and I expect to contribute to such a looser package collection as well. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Wed Mar 9 02:07:30 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 9 02:07:30 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: On Mar 9, 2005, at 8:32, Travis Oliphant wrote: > 2) Installation problems -- I'm not completely clear on what the > "installation problems" really are. I hear people talk about them, > but Pearu has made significant strides to improve installation, so I'm > not sure what precise issues remain. Yes, installing ATLAS can be a > pain, but scipy doesn't require it. Yes, fortran support can be a > pain, but if you use g77 then it isn't a big deal. The reality, > though, is that there is this perception of installation trouble and > it must be based on something. Let's find out what it is. Please > speak up users of the world!!!! One more comment on this: Ease of installation depends a lot on the technical expertise of the people doing it. If you see SciPy as a package aimed at computational scientists and engineers, then you can indeed expect them to be able to handle some difficulties (though that doesn't mean that they are willing to if the quantity of trouble is too high). But for me, scientific Python packages are not only modules used by me in my own scripts, but also building blocks in the assembly of end-user applications aimed at non-experts in computation. For example, my DomainFinder tool (http://dirac.cnrs-orleans.fr/DomainFinder) is used mostly by structural biologists. Most people in that community don't even know what a compiler is, so how can I expect them to install g77? Konrad.
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From verveer at embl-heidelberg.de Wed Mar 9 03:00:44 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Wed Mar 9 03:00:44 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <6c9c1490f05fc0812a640a1897574857@embl-heidelberg.de> > Proposal (just an idea to start discussion): > > Subdivide scipy into several super packages that install cleanly but > can also be installed separately. Implement a CPAN-or-yum-like > repository and query system for installing scientific packages. +1, I would be far more inclined to contribute if we could agree on such a structure. > Extra sub-packages: named in a hierarchy to be determined and probably > each dependent on a variety of scipy-sub-packages. > > I haven't fleshed this thing out yet as you can tell. I'm mainly > talking publicly to spur discussion. The basic idea is that we should > force ourselves to distribute scipy in separate packages. This would > force us to implement a yum-or-CPAN-like package repository, so that > we define the interface as to how an additional module could be > developed by someone, even maintained separately (with a different > license), and simply inserted into an intelligent point under the > scipy infrastructure. Two comments: 1) We should consider the issue of licenses. For instance: the python wrappers for GSL and FFTW probably need to be GPL-licensed. These packages definitely need to be part of a repository. There needs to be some kind of a category for such packages, as their license is more restrictive. 2) If there is going to be a repository structure it should provide for packages that can be installed independently of a scipy hierarchy. Packages that only require a dependency on the Numeric core should not require scipy_core. That makes sense if Numeric3 ever gets into the core Python. Such packages could (and probably should) also live in a dual scipy namespace. Peter From prabhu_r at users.sf.net Wed Mar 9 03:25:30 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Wed Mar 9 03:25:30 2005 Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <16942.56570.375971.565270@monster.linux.in> Hi Travis, >>>>> "TO" == Travis Oliphant writes: TO> I was looking to try and understand why with an increasing TO> number of Scientific users of Python, relatively few people TO> actually seem to want to contribute to scipy, regularly, even TO> becoming active developers. There are lots of people who seem TO> to identify problems (though very often vague ones), but not TO> many who seem able (either through time or interest TO> contraints) to actually contribute to code, documentation, or TO> infrastructure. I think there are two issues here. 1. Finding developers. Unfortunately, I'm as clueless as anyone else. It looks to me that most folks who are capable of contributing are already occupied with other projects. The rest use scipy and are quite happy with it (except for the occasional problem). 
Others are either heavily invested in other solutions, or don't have the skill or time to contribute. I also think that there are a fair number of users who use scipy at some level or another but are quiet about it and don't have a chance to contribute. From what I can tell, the intersection of the set of people who possess good computing skills and also persue numerical work from Python is still a very small number compared to other fields. 2. Packaging issues. More on this later. [...] TO> I think the answers come down to a few issues which I will TO> attempt to answer with proposals. TO> 1) Plotting -- scipy's plotting wasn't good enough (we knew I am not sure what this has to do with scipy's utility? Do you mean to say that you'd like to have people starting to use scipy to plot things and then hope that they contribute back to scipy's numeric algorithms? If all they did was to use scipy for plotting, the only contributions would be towards plotting. If you only mean this as a convenience, then this seems like a packaging issue and not related to scipy. Plotting is one part of the puzzle. You don't seem to mention any deficiencies with respect to numerical algorithms. This seems to suggest that apart from things like packaging and docs, the numeric side is pretty solid. Let me take this to an extreme, if plotting be deemed a part of scipy's core then how about f2py? It is definitely core functionality. So why not make f2py part of scipy? How about g77, g95, and gcc. The only direction this looks to be headed is to make a SciPy OS (== Enthon?). I think we are mixing packaging along with other issues here. To make it clear, I am not against incorporating matplotlib in scipy. I just think that the argument for its inclusion does not seem clear to me. [...] TO> 2) Installation problems -- I'm not completely clear on what TO> the TO> "installation problems" really are. I hear people talk about [...] TO> Proposal (just an idea to start discussion): TO> Subdivide scipy into several super packages that install TO> cleanly but can also be installed separately. Implement a TO> CPAN-or-yum-like repository and query system for installing TO> scientific packages. What does this have to do with scipy per se? This is more like a user convenience issue. [scipy-sub-packages] TO> I haven't fleshed this thing out yet as you can tell. I'm TO> mainly talking publicly to spur discussion. The basic idea is TO> that we should force ourselves to distribute scipy in separate TO> packages. This would force us to implement a yum-or-CPAN-like TO> package repository, so that we define the interface as to how TO> an additional module could be developed by someone, even TO> maintained separately (with a different license), and simply TO> inserted into an intelligent point under the scipy TO> infrastructure. This is in general a good idea but one that goes far beyond scipy itself. Joe Cooper mentioned that he had ideas on how to really do this in a cross-platform way. Many of us eagerly await his solution. 
:) regards, prabhu From aisaac at american.edu Wed Mar 9 05:50:32 2005 From: aisaac at american.edu (Alan G Isaac) Date: Wed Mar 9 05:50:32 2005 Subject: [Numpy-discussion] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <16942.56570.375971.565270@monster.linux.in> References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> Message-ID: On Wed, 9 Mar 2005, Prabhu Ramachandran apparently wrote: > What does this have to do with scipy per se? This is more > like a user convenience issue. I think the proposal is: development effort is a function of community size, and community size is a function of convenience as well as functionality. This seems right to me. Cheers, Alan Isaac From cdavis at staffmail.ed.ac.uk Wed Mar 9 06:25:25 2005 From: cdavis at staffmail.ed.ac.uk (Cory Davis) Date: Wed Mar 9 06:25:25 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> Message-ID: <1110378257.10146.28.camel@fog> Hi All > I think the proposal is: > development effort is a function of community size, Undeniably true! > and community size is a function of convenience as > well as functionality. > This is only partly true. I think the main barriers to more people using scipy are... 1. Not that many people actually know about it 2. People aren't easily convinced to change from what they were taught to use as an under-graduate (e.g. Matlab, IDL, Mathematica) As it stands, I don't think scipy is particularly inconvenient to install or use. On the two suggested improvements: I think incorporating matplotlib is an excellent idea. But I think the second suggestion of separating Scipy into independent packages will prove to be counter-productive. It might put people off even before they start, because instead of installing one package, they have a bewildering choice of many. And it could prove to be annoying to people using scipy who want to share or distribute code, with the requirement that both parties have scipy becoming a requirement that both parties have a specific combination of scipy packages. Also, another reason why there might be a lack of developers is that there are people like me who find that scipy and matplotlib already do everything that they need. Which is good, right? Cheers, Cory. > This seems right to me. > > Cheers, > Alan Isaac -- )))))))))))))))))))))))))))))))))))))))))))) Cory Davis Meteorology School of GeoSciences University of Edinburgh King's Buildings EDINBURGH EH9 3JZ ph: +44(0)131 6505092 fax +44(0)131 6505780 cdavis at staffmail.ed.ac.uk cory at met.ed.ac.uk http://www.geos.ed.ac.uk/contacts/homes/cdavis )))))))))))))))))))))))))))))))))))))))))))) From southey at uiuc.edu Wed Mar 9 07:24:29 2005 From: southey at uiuc.edu (Bruce Southey) Date: Wed Mar 9 07:24:29 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley Message-ID: Hi, I fully agree with these comments, but I think there is a user experience aspect as well. This is my little rant (if you want) as a different view, because I really do appreciate the scientific python community.
Please understand that these are issues that I see as problems and do not reflect any negative view of what is available. The basics of Python and numarray (and Numeric almost to the same extent) already provide what most users need, basically the implementation of matrix algorithms. I have not tried SciPy for some time so I really will not address it. So in one sense, what more is there to achieve? :-) For a user to contribute material there are some issues that I tend to think about. As you know, it is usually easier (and quicker with Python) to write your own code than try to adapt existing code (and the bloat issue with code that is unnecessary to the user needs). The second aspect is being able to contribute that code back into a package - usually this is too hard (coding styles etc.), may not have high programming experience to be able to achieve this and may not know how to contribute it in the first place. This also gets problematic when items are passed to C or Fortran. My 'job' is not to develop packages but to get results (mainly statistics and bioinformatics). Any free time to do development is usually nonexistant (one has to write papers for example). I would guess that this is not uncommon for the scientific python users. A related issue is missing (or at least not obvious) and inflexible features. For example, I do statistics and missing (unobserved) values are a problem (cannot mix types or missing value code may actually occur). But I can use masked arrays (which really means numarray) to handle this rather nicely. I fully agree with others on directions. From a Python view, if "python setup.py install" doesn't work 'out of the box' then there are big problems. Regards Bruce ---- Original message ---- >Date: Wed, 09 Mar 2005 17:32:15 +0900 >From: Michiel Jan Laurens de Hoon >Subject: Re: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley >To: Travis Oliphant >Cc: SciPy Developers List , scipy-user at scipy.net, numpy-discussion > >Travis Oliphant wrote: >> It would seem that while the scipy conference demonstrates a continuing >> and even increasing use of Python for scientific computing, not as many >> of these users are scipy devotees. Why? >> >> I think the answers come down to a few issues which I will attempt to >> answer with proposals. >> >> 1) Plotting >While plotting is important, I don't think that SciPy needs to offer >plotting capabilities in order to become successful. Numerical Python >doesn't include plotting, and it's hugely popular. I would think that >installing Scipy-lite + (selection of SciPy-lib sub-packages) + (your >favorite plotting package) separately is acceptable. > >> 2) Installation problems >This is the real problem. I'm one of the maintainers of Biopython >(python and C code for computational biology), which relies on Numerical >Python. Now that Numerical Python is not being actively maintained, I'd >love to be able to direct our users to SciPy instead. But as long as >SciPy doesn't install out of the box with a python setup.py install, >it's not viable as a replacement for Numerical Python. I'd spend the >whole day dealing with installation problems from Biopython users. > >There are three other reasons why I have not become a SciPy devotee, >although I use Python for scientific computing all the time: > >3) Numerical Python already does the job very well. There are few >packages in SciPy that I actually need. Special functions would be nice, >but it's easier to write your own module than to install SciPy. 
> >4) SciPy looks bloated. It seems to try to do too many things, so that >it becomes impossible to maintain SciPy well. > >5) Uncertain future. With Numerical Python, we know what we get. I don't >know what SciPy will look like in a few years (numarray? Numeric3? >Numeric2?) and if it still has a trouble-free installation. So it's too >risky for Biopython to go over to SciPy. > >It's really unfortunate, because my impression is that the SciPy >developers are smart people who write good code, which currently is not >used as much as it could because of these problems. I hope my comments >will be helpful. > >--Michiel. > > >------------------------------------------------------- >SF email is sponsored by - The IT Product Guide >Read honest & candid reviews on hundreds of IT Products from real users. >Discover which products truly live up to the hype. Start reading now. >http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jh at oobleck.astro.cornell.edu Wed Mar 9 07:43:13 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Wed Mar 9 07:43:13 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> Message-ID: <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu> These were exactly the issues we addressed at SciPy04, and which led to the ASP project. All of the issues brought up in the current discussion have already been discussed there, and with largely the same conclusions. The basic gist is this: THERE ARE THOUSANDS OF PEOPLE WAITING FOR SCIPY TO REACH CRITICAL MASS! SciPy will reach the open-source jumping-off point when an outsider has the following experience: They google, find us, visit us, learn what they'll be getting, install it trivially, and read a tutorial that in less than 15 minutes has them plotting their own data. In that process, which will take less than 45 minutes total, they must also gain confidence in the solidity and longevity of the software and find a supportive community. We don't meet all the elements of this test now. Once we do, people will be ready to jump on and work the open-source magic. The goal of ASP (Accessible SciPy) is to meet that test. Some of what we need is being done already, but by a very small number of people. We need everyone's help to reach a meaningful rate of progress. The main points and their status: 1. Resolve the numeric/numarray split and get at least stubs for the basic routines in the Python core. Nothing scares new users more than instability and uncertainty. Travis O. is now attempting to incorporate numarray's added features (including much of the code that implements them) into numeric, and has made a lot of headway. Perry G. has said that he would switch back to numeric if it did the things numarray does. I think we can forsee a resolution to this split in the calendar year IF that effort stays the course. 2. Package it so that it's straightforward to install on all the popular architectures. Joe Cooper has done a lot here, as have others. 
The basic stuff installs trivially on Red Hat versions of Linux, Windows, and several others (including Debian, I think, and Mac, modulo the inherent problems people report with the Mac package managers, which we can do nothing about). Optimized installs are also available and not all that difficult, particularly if you're willing to issue a one-line command to rebuild a source package. For Linux, it was decided to stick with a core and add-on packages, and to offer umbrella packages that install common groups of packages through the dependency mechanism (e.g., for astronomy or biology). The main issue here is not the packaging, but the documentation, which is trivial to write at this point. I was able to do a "yum install scipy" at SciPy04, once I knew where the repository was. It's: http://www.enthought.com/python/fedora/$releasever We need someone to write installation notes for each package manager. We also need umbrella packages. 3. Document it thoroughly for both new and experienced users. Right now what we have doesn't scratch the surface. I mean no offense to those who have written what we do have. We need to update that and to write a lot more and a lot else. Janet Swisher and several others are ready to dig into this, but we're waiting for the numeric/numarray split to resolve. A list of needed documents is in the ASP proposal. 4. Focus new users on a single selection of packages. The variety of packages available to do a particular task is both a strength and a weakness. While experienced people will want choice, new users need simplicity. We will select a single package each application (like plotting), and will mainly describe those in the tutorial-level docs. We will not be afraid to change the selection of packages. You're only a new user once, so it will not affect you if we switch the docs after you've become experienced. For example, Matplotlib was selected at the SciPy04 BoF, but if Chaco ever reaches that level of new-user friendliness, we might switch. Both packages will of course always be available. Neither needs to be in the core on Linux and other systems that have package management. New users will be steered to the "starter" umbrella package, which will pull in any components that are not in the core. Enthon will continue to include all the packages in the world, I'm sure! 5. Provide a web site that is easy to use and that communicates to each client audience. We (me, Perry, Janet, Jon-Eric) were actually gearing up to solicit proposals for improving the site and making it the go-to place for all things numerical in python when Travis started his work on problem #1. This is the next step, but we're waiting for item 1 to finish so that we don't distract everyone's attention from its resolution. Many developers are interested in contributing here, too. If people feel it's time, we can begin this process. I just don't want to slow Travis and his helpers one tiny bit! 6. Catalog all the add-ons and external web sites so that scipy.org becomes the portal for all things numeric in python. This, at least, is done, thanks to Fernando Perez. See: http://www.scipy.org/wikis/topical_software/TopicalSoftware I'll add one more issue: 7. Do something so people who use SciPy, numeric, and numarray remember that these issues are being worked, and where, and how to contribute. To that end, all I can do is post periodically about ASP and encourage you to remember it whenever someone wonders why we haven't hit critical mass yet. Please visit the ASP wiki. 
Read the ASP proposal if you haven't, sign up to do something, and do it! Right now, a paltry 6 people have signed up to help out.

http://www.scipy.org/wikis/accessible_scipy/AccessibleSciPy

The ASP proposal is linked in the first paragraph of the wiki. After giving it some thought, we decided to use scipy-dev at scipy.net as our mailing list, to avoid cross-posted discussions on the 4 mailing lists. Please carry on any further discussion there.

Thanks,

--jh--

From gr at grrrr.org Wed Mar 9 08:00:34 2005
From: gr at grrrr.org (Thomas Grill)
Date: Wed Mar 9 08:00:34 2005
Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu>
References: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu>
Message-ID: <422F1D58.1070907@grrrr.org>

Hi all, I'd like to introduce myself as a new member to this list. I'm reading about how to gain new users for SciPy - well, let me be an example of one. I'm personally using numarray for audio work in real-time systems such as Pure Data and Max/MSP. I'm the author of an extension object connecting Python scriptability to these systems (http://grrrr.org/ext/py) - numarray support for audio processing is thus just a logical thing.

As an inexperienced user I'm currently concerned about two things:

- the dilemma/convergence of Numeric and numarray: when writing new ufuncs for numarray, will I be able to use them without much work in Numeric3, in case that's the future?

- the lack of SIMD support for ufuncs: I'm used to the power of SSE and Altivec, and browsing through the ufunc code of numarray I don't see any implementation of that. For my applications this is pretty much a must - is it possible to implement SIMD support in the current system design, and how will custom-made ufuncs be able to profit from that?

best greetings, Thomas

From chodgins at predict.com Wed Mar 9 08:51:36 2005
From: chodgins at predict.com (Cindy Hodgins Burian)
Date: Wed Mar 9 08:51:36 2005
Subject: [Numpy-discussion] Numeric and ATLAS
Message-ID: <422F291C.3010600@predict.com>

Matt Hyclak and Stephen Walton posted about this very problem about a month ago, and I hope they're still reading this forum. I'm having the exact same problem when trying to install Numeric-23.7:

gcc -pthread -shared -L/usr/local/lib -I/usr/local/include build/temp.linux-x86_64-2.4/Src/lapack_litemodule.o -L/usr/local/atlas/lib/Linux_HAMMER64SSE2_2 -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-x86_64-2.4/lapack_lite.so
/usr/bin/ld: /usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a(dgesv.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
/usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

I did indeed compile ATLAS with -fPIC. I am new to linux so I'm not sure how to do as Matt said:

Sorry for replying to myself. Just for the archives, the problem seems to be that the static lapack/blas libraries provided with RHEL3 are not compiled with -fPIC. I ripped open the rpm and rebuilt it with an -fPIC added in and all at least compiles now. I'll leave it up to my faculty to tell me whether or not it works :-)

Thanks, Matt

So I did what Stephen said:

Common problem which I just posted to scipy-devel about.
In most variants of RH and Fedora, the RH-provided lapack RPM is only there to satisfy Octave's dependency. If you're not using Octave, you'll be happiest uninstalling both it and the RedHat-provided lapack (rpm -e lapack octave)

But I'm still having the same problem. Any insight is greatly appreciated. Thanks.

Cindy

From stephen.walton at csun.edu Wed Mar 9 09:18:16 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Wed Mar 9 09:18:16 2005
Subject: [Numpy-discussion] Numeric and ATLAS
In-Reply-To: <422F291C.3010600@predict.com>
References: <422F291C.3010600@predict.com>
Message-ID: <422F2F85.5060709@csun.edu>

Hi, Cindy, Well, I'm still reading this forum.

> /usr/bin/ld:
> /usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a(dgesv.o):
> relocation R_X86_64_32 can not be used when making a shared object;
> recompile with -fPIC

Unfortunately I'm not on a 64-bit architecture, and this problem may be peculiar to it. I guess to be of more help I'd have to see some outputs from your compilation of both lapack and atlas. Are you using g77? After hassles with Absoft, and finding that g77 was actually better in most instances as measured by the LAPACK timing tests, I used g77 throughout for my LAPACK and ATLAS compiles. For LAPACK, I used the supplied make.inc.LINUX file as make.inc, with the exception that I removed the '-fno-f2c' switch. A simple 'make config' then worked fine for me with ATLAS. The LAPACK compile doesn't seem to use -fPIC, which shouldn't be needed anyway; neither LAPACK nor ATLAS can easily be installed as shared libraries, which is why Fernando Perez has his scripts to build separate versions of Numeric/numarray/scipy/etc. statically linked against various hardware-specific versions of ATLAS.

This is a bit of a ramble; hope some of it helps.

Stephen
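Once a rebuilt LAPACK links cleanly, a quick sanity check from Python is to exercise the same dgesv path named in the relocation error above. A minimal smoke test using Numeric's LinearAlgebra module (assuming a standard Numeric install):

import Numeric
import LinearAlgebra

# solve_linear_equations calls dgesv through lapack_lite -- the very
# object file named in the link error -- so a correct answer here
# means the LAPACK link is good.
a = Numeric.array([[3.0, 1.0],
                   [1.0, 2.0]])
b = Numeric.array([9.0, 8.0])
print LinearAlgebra.solve_linear_equations(a, b)   # expect [ 2.  3.]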
From stephen.walton at csun.edu Wed Mar 9 09:34:29 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Wed Mar 9 09:34:29 2005
Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <422EA691.9080404@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu>
Message-ID: <422F335F.8060107@csun.edu>

I only have a little to contribute at this point:

> Proposal:
> Incorporate matplotlib as part of the scipy framework (replacing plt).

While this is an admirable goal, I personally find scipy and matplotlib easy to install separately. The only difficulty (of course!) is the numarray/numeric split, so I have to be sure that I select numerix as Numeric in my .matplotlibrc file before typing 'ipython -pylab -p scipy', which actually works really well.

> 2) Installation problems -- I'm not completely clear on what the
> "installation problems" really are.

scipy and matplotlib are both very easy to install. Using ATLAS is the biggest pain, as Travis says, and one can do without it. Now that a simple 'python setup.py bdist_rpm' seems to work reliably, I for one am happy.

I think splitting scipy up into multiple subpackages isn't such a good idea. Perhaps I'm in the minority, but I find CPAN counter-intuitive, hard to use, and hard to keep track of in an RPM-based environment. Any large package is going to include a lot of stuff most people don't need, but like a NY Times ad used to say, "You might not read it all, but isn't it nice to know it's all there?"

I can tell you why I'm not contributing much code to the effort, at least in one recent instance. Since I'm still getting core dumps when I try to use optimize.leastsq with a defined Jacobian function, I dove into _minpackmodule.c and its associated routines last night. I'm at sea. I know enough Python to be dangerous, used LMDER from Fortran extensively while doing my Ph.D., and am pretty good at C, but am completely unfamiliar with the Python-C API. So I don't even know how to begin tracking the problem down.

Finally, as I mentioned at SciPy04, our particular physics department is at an undergraduate institution (no Ph.D. program), so we mainly produce majors who stop at the B.S. or M.S. degree. Their job market seems to want MATLAB skills, not Python, at the moment, so that's what the faculty are learning and teaching to their students. Many of them/us simply don't have the time to learn Python on top of that. Though, when I showed some colleagues how trivial it was to trim some unwanted bits out of data files they had using Python, I think I converted them.

From faltet at carabos.com Wed Mar 9 10:47:19 2005
From: faltet at carabos.com (Francesc Altet)
Date: Wed Mar 9 10:47:19 2005
Subject: [Numpy-discussion] Reversing RecArrays
Message-ID: <200503091946.11956.faltet@carabos.com>

Hi,

I would be interested in having a fast way to reverse RecArrays. Regrettably, the most straightforward way to reverse them does not work properly:

>>> from numarray import records
>>> r = records.array([('Smith', 1234),\
...                    ('Johnson', 1001),\
...                    ('Williams', 1357),\
...                    ('Miller', 2468)], \
...                   names='Last_name, phone_number')
>>> r[::-1]
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.3/site-packages/numarray/records.py", line 749, in __repr__
    outlist.append(Record.__str__(i))
  File "/usr/lib/python2.3/site-packages/numarray/records.py", line 797, in __str__
    outlist.append(`self.array.field(i)[self.row]`)
  File "/usr/lib/python2.3/site-packages/numarray/records.py", line 736, in field
    self._fields = self._get_fields()  # Refresh the cache
  File "/usr/lib/python2.3/site-packages/numarray/records.py", line 705, in _get_fields
    bytestride=_stride)
  File "/usr/lib/python2.3/site-packages/numarray/strings.py", line 112, in __init__
    raise ValueError("Inconsistent string and array parameters.")
ValueError: Inconsistent string and array parameters.

Anyway, anybody knows if there is some way to achieve this (until this bug is eventually fixed, of course)?

Thanks,

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""
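A minimal workaround sketch for readers hitting the same bug: rebuild the recarray from rows copied out in reverse order, so that no negative-stride view of the record array is ever created. The _names attribute and the list-of-tuples constructor follow the numarray records module of this era; verify against your version:

from numarray import records

def reverse_recarray(r):
    # r.field(name) returns a whole column; pulling elements out one
    # row at a time (back to front) avoids reversed views entirely.
    names = r._names
    rows = [tuple([r.field(n)[i] for n in names])
            for i in range(len(r) - 1, -1, -1)]
    return records.array(rows, names=','.join(names))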
File "/usr/lib/python2.3/site-packages/numarray/records.py", line 749, in __repr__ outlist.append(Record.__str__(i)) File "/usr/lib/python2.3/site-packages/numarray/records.py", line 797, in __str__ outlist.append(`self.array.field(i)[self.row]`) File "/usr/lib/python2.3/site-packages/numarray/records.py", line 736, in field self._fields = self._get_fields() # Refresh the cache File "/usr/lib/python2.3/site-packages/numarray/records.py", line 705, in _get_fields bytestride=_stride) File "/usr/lib/python2.3/site-packages/numarray/strings.py", line 112, in __init__ raise ValueError("Inconsistent string and array parameters.") ValueError: Inconsistent string and array parameters. Anyway, anybody knows if there is some way to achieve this (until this bug would be eventually fixed, of course)? Thanks, -- >qo< Francesc Altet ? ? http://www.carabos.com/ V ?V C?rabos Coop. V. ??Enjoy Data "" From jdhunter at ace.bsd.uchicago.edu Wed Mar 9 10:56:30 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Wed Mar 9 10:56:30 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> (Travis Oliphant's message of "Wed, 09 Mar 2005 00:32:33 -0700") References: <422EA691.9080404@ee.byu.edu> Message-ID: >>>>> "Travis" == Travis Oliphant writes: Travis> It would seem that while the scipy conference demonstrates Travis> a continuing and even increasing use of Python for Travis> scientific computing, not as many of these users are scipy Travis> devotees. Why? Hi Travis, I like a lot of your proposal, and I want to throw a couple of additional ideas into the mix. There are two ideas about what scipy is: a collection of scientific algorithms and a general purpose scientific computing environment. On the first front, scipy has been a great success; on the second, less so. I think the following would be crucial to make such an effort a success (some of these are just restatements of your ideas with additional comments) * Easy to install: - it would be probably be important to have a fault-tolerant install so that even if a component fails, the parts that don't depend on that can continue. Matthew Knepley's build system might be an important tool to make this work right for source installs, rather than trying to push distutils too hard. * A package repository and a way of specifying dependencies between the packages and allow automated recursive downloads ala apt-get, yum, etc.... So basically we have to come up with a package manager, and probably one that supports src as well as binary installs. Everyone knows this is a significant problem in python, and we're in a good place to tackle it in that we have experience distributing complex packages across platforms which are a mixture of python/C/C++/FORTRAN, so if we can make it work, it will probably work for all of python. I think we would want contributions from people who do packaging on OSX and win32, eg Bob Ippolito, Joe Cooper, Robert Kern, and others. * Transparent support for Numeric, numarray and Numeric3 built into a compatibility layer, eg something like matplotlib.numerix which enables the user to be shielded from past and future changes in the array package. If you and the numarray developers can agree on that interface, that is an important start, because no matter how much success you have with Numeric3, Numeric 23.x and numarray will be in the wild for some time to come. 
Having all the major players come together and agree on a core interface layer would be a win. In practice, it works well in matplotlib.numerix. * Buy-in from the developers of all the major packages that people want and need to have the CVS / SVN live on a single site which also has mailing lists etc. I think this is a possibility, actually; I'm open to it at least. * Good tutorial, printable documentation, perhaps following a "dive into python" model with a "just-in-time" model of teaching the language; ie, task oriented. A question I think should be addressed is whether scipy is the right vehicle for this aggregation. I know this has been a long-standing goal of yours and appreciate your efforts to continue to make it happen. But there is a lot of residual belief that scipy is hard to install, and this is founded in an old memory that refuses, sometimes irrationally, to die, and in part from people's continued difficulties. If we make a grand effort to unify into a coherent whole, we might be better off with a new name that doesn't carry the difficult-to-install connotation. And easy-to-install should be our #1 priority. Another reason to consider a neutral name is that it wouldn't scare off a lot of people who want to use these tools but don't consider themselves to be scientists. In matplotlib, there are people who just want to make bar and pie charts, and in talks I've given many people are very happy when I tell them that I am interested in providing plotting capabilities outside the realm of scientific plotting. This is obviously a lot to bite off but it could be made viable with some dedicated effort; python is like that. Another concern I have, though, is that it seems to duplicate a lot of the enthought effort to build a scientific python bundle -- they do a great job already for win32 and I think an enthought edition for linux and OSX are in the works. The advantage of your approach is that it is modular rather than monolithic. To really make this work, I think enthought would need to be on board with it. Eg mayavi2 and traits2 are both natural candidates for inclusion into this beast, but both live in the enthought subversion tree. Much of what you describe seems to be parallel to the enthought python, which also provides scipy, numeric, ipython, mayavi, plotting, and so on. I am hesitant to get too involved in the packaging game -- it's really hard and would take a lot of work. We might be better off each making little focused pieces, and let packagers (pythonmac, fink, yum, debian, enthought, ...) do what they do well. Not totally opposed, mind you, just hesitant.... JDH From matthew.brett at gmail.com Wed Mar 9 10:58:29 2005 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed Mar 9 10:58:29 2005 Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422F335F.8060107@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> Message-ID: <1e2af89e050309105740b7d5e6@mail.gmail.com> Hi, Thanks for the excellent discussion - and this has really been said already, but just for clarity: It seems that SciPy has two intended markets. The first is as a competitor to languages like Matlab and IDL. Here the ideal is that a Matlab IDL user can just google, look, download, install and have something with all the features they are used to sitting and looking at them saying "aren't I beautiful". I guess this is the point of ASP. Such a package will definitely need very good default plotting. 
The second is open-source developers. Until we reach the ideal above, developers will need flexibility and independence of install options to minimize the support they have to offer for SciPy install issues. So, aren't we suggesting providing a solution for both types of users? If cleverly done, can't we have nicely parsed separate packages for developers to use, which can also be downloaded as one big SciPy install? Over time, we can expect that individual installs will improve until we reach the necessary stability of the full install. In the meantime, we also have a problem of the perception that efforts in numerical python are widely spread across developers and websites; this makes new users googling for Python and Matlab or IDL nervous. It would be a great help if those writing scientific projects for python could try to use the SciPy home as a base, even if at first the project is rather independent of SciPy itself - IPython being a good example. Best, Matthew From jmiller at stsci.edu Wed Mar 9 13:55:09 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Mar 9 13:55:09 2005 Subject: [Numpy-discussion] Reversing RecArrays In-Reply-To: <200503091946.11956.faltet@carabos.com> References: <200503091946.11956.faltet@carabos.com> Message-ID: <1110405239.524.546.camel@halloween.stsci.edu> On Wed, 2005-03-09 at 13:46, Francesc Altet wrote: > Hi, > > I would be interested in having a fast way to reverse RecArrays. > Regrettably, the most straightforward way to reverse them does not > work properly: > > >>> from numarray import records > >>> r = records.array([('Smith', 1234),\ > ... ('Johnson', 1001),\ > ... ('Williams', 1357),\ > ... ('Miller', 2468)], \ > ... names='Last_name, phone_number') > >>> r[::-1] > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/lib/python2.3/site-packages/numarray/records.py", line 749, in > __repr__ > outlist.append(Record.__str__(i)) > File "/usr/lib/python2.3/site-packages/numarray/records.py", line 797, in > __str__ > outlist.append(`self.array.field(i)[self.row]`) > File "/usr/lib/python2.3/site-packages/numarray/records.py", line 736, in > field > self._fields = self._get_fields() # Refresh the cache > File "/usr/lib/python2.3/site-packages/numarray/records.py", line 705, in > _get_fields > bytestride=_stride) > File "/usr/lib/python2.3/site-packages/numarray/strings.py", line 112, in > __init__ > raise ValueError("Inconsistent string and array parameters.") > ValueError: Inconsistent string and array parameters. > > Anyway, anybody knows if there is some way to achieve this (until this > bug would be eventually fixed, of course)? This now works in CVS. The attached reverse() also works against CVS and should work against 1.2.2. -------------- next part -------------- A non-text attachment was scrubbed... Name: revrec.py Type: text/x-python Size: 708 bytes Desc: not available URL: From konrad.hinsen at laposte.net Wed Mar 9 14:31:15 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 9 14:31:15 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <16943.10714.85815.666793@monster.linux.in> References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> <16943.10714.85815.666793@monster.linux.in> Message-ID: On 09.03.2005, at 17:52, Prabhu Ramachandran wrote: > To put it bluntly, I don't believe that someone who can't install > scipy today is really capable of contributing code to scipy. 
I

True, but not quite to the point. I can install SciPy, but given that most of my code is written with the ultimate goal of being published and used by people with less technical experience, I need to take those people into account when choosing packages to build on.

> seriously doubt claims that scipy is scary or hard to install today.

I get support questions from people who are not aware that they need root permissions to do "python setup.py install" on a standard Linux system. On that scale of expertise, scipy *is* scary.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
-------------------------------------------------------------------------------

From Chris.Barker at noaa.gov Wed Mar 9 15:33:28 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Wed Mar 9 15:33:28 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <422EA691.9080404@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu>
Message-ID: <422F8793.60900@noaa.gov>

John Hunter wrote:
> I think we would want
> contributions from people who do packaging on OSX and win32, eg
> Bob Ippolito, Joe Cooper, Robert Kern, and others.

Just a note about this. For OS-X, Jack Jansen developed PIMP, and the Package Manager app to go with it. Someone even made a wxPython-based Package Manager app also. It was designed to be platform independent from the start. I think part of the idea was that if it caught on on the Mac, maybe it would be adopted elsewhere. I think it's worth looking at. However... The PIMP database maintenance has not been going very well. In fact, to some extent it's been abandoned, and replaced with a set of native OS-X .mpkg files. These are easy to install, and familiar to Mac users. This supports my idea from long ago: what we need are simply a set of packages in a platform-native format: Windows installers, rpms, .debs, .mpkg, etc. Whenever this comes up, it seems like people focus on nifty technological solutions for a package repository, which makes sense as we're all a bunch of programmers, but I'm not sure it gets the job done. A simple web site where you can download all the installers you need is fine.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT           (206) 526-6959 voice
7600 Sand Point Way NE     (206) 526-6329 fax
Seattle, WA 98115          (206) 526-6317 main reception
Chris.Barker at noaa.gov

From juenglin at cs.pdx.edu Wed Mar 9 16:36:26 2005
From: juenglin at cs.pdx.edu (Ralf Juengling)
Date: Wed Mar 9 16:36:26 2005
Subject: [Numpy-discussion] comments on array iteration behavior as described in current PEP draft
Message-ID: <1110414917.24560.80.camel@alpspitze.cs.pdx.edu>

From the current PEP draft:

    1-d Iterator

    A 1-d iterator will be defined that will walk through any array,
    returning a Python scalar at each step. Order of the iteration is
    the same for contiguous and discontiguous arrays. The last index
    always varies the fastest. These 1-d iterators can also be indexed
    and set. In which case the underlying array will be considered 1-d
    (but does not have to be contiguous in memory).

    Mapping Iterator

    ...

    (2) if the index object contains only standard slicing (no index
    arrays or boolean mask arrays), then a view is returned when using
    a[index], while a copy is returned using the iterator
    intermediary. The rule is that iterator slicing always produces a
    copy.
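For comparison, the view/copy split that already exists for plain Numeric arrays -- which the draft extends to iterator slicing -- can be seen in a short session (only the existing, non-iterator behavior is shown; the iterator half is the proposal itself):

>>> import Numeric
>>> a = Numeric.arange(6)
>>> v = a[1:4]                       # standard slice: a view
>>> v[0] = 99
>>> a[1]                             # the original sees the change
99
>>> c = Numeric.take(a, [1, 2, 3])   # index-array selection: a copy
>>> c[0] = -1
>>> a[1]                             # the original is untouched
99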
(1) In Python parlance, an iterator is not indexable (cf. the iterator protocol). You probably meant to say "sequence"? (2) Why should the mapiter object not return views under the same circumstances that array indexing operations return views? I can see this being useful and would favor this behavior. Ralf From bob at redivi.com Wed Mar 9 16:52:07 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Mar 9 16:52:07 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422F335F.8060107@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> Message-ID: <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> On Mar 9, 2005, at 12:33 PM, Stephen Walton wrote: >> 2) Installation problems -- I'm not completely clear on what the >> "installation problems" really are. > > scipy and matplotlib are both very easy to install. Using ATLAS is > the biggest pain, as Travis says, and one can do without it. Now that > a simple 'scipy setup.py bdist_rpm' seems to work reliably, I for one > am happy. On Mac OS X, using ATLAS should be pretty trivial because the OS already ships with an optimized implementation! The patch I created for Numeric was very short, and I'm pretty sure it's on the trunk (though last I packaged it, I had to make a trivial fix or two, which I reported on sourceforge). I haven't delved into SciPy's source in a really long time, so I'm not sure where changes would need to be made, but I think someone else should be fine to look at Numeric's setup.py and do what needs to be done to SciPy. FYI, matplotlib, the optimized Numeric, and several other Mac OS X packages are available in binary form here: http://pythonmac.org/packages/ > I think splitting scipy up into multiple subpackages isn't such a good > idea. Perhaps I'm in the minority, but I find CPAN counter-intuitive, > hard to use, and hard to keep track of in an RPM-based environment. > Any large package is going to include a lot of stuff most people don't > need, but like a NY Times ad used to say, "You might not read it all, > but isn't it nice to know it's all there?" I also think that a monolithic package is a pretty good idea until it begins to cause problems with the release cycle. Twisted had this problem at 1.3, and went through a major refactoring between then and 2.0 (which is almost out the door). Though Twisted 2.0 is technically many different packages, they still plan on maintaining a "sumo" package that includes all of the Twisted components, plus zope.interface (the only required dependency). There are still several optional dependencies not included, though (such as PyCrypto). SciPy could go this route, and simply market the "sumo" package to anyone who doesn't already know what they're doing. An experienced SciPy user may want to upgrade one particular component of SciPy as early as possible, but leave the rest be, for example. -bob From daishi at egcrc.net Wed Mar 9 17:09:04 2005 From: daishi at egcrc.net (Daishi Harada) Date: Wed Mar 9 17:09:04 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> <16943.10714.85815.666793@monster.linux.in> Message-ID: <29f97aa647609886b2bfbd27cb66761d@egcrc.net> I'd like to second Konrad's point and restate what I tried to articulate (probably poorly) at SciPy 04. 
How easy it is for me, as a developer, to install SciPy on my particular development platform (in my case OS X and Linux) is not the same as how easy it is to *deploy* an application which uses SciPy as a library to end-user clients (in my case on Windows). I had originally hoped that having the client simply install Enthon would suffice, but I wanted to use some features from wxPython 2.5.x (perhaps that's what I should have reconsidered). I tried combinations of having the client install packages separately and me using py2exe, but in the end my dependency on SciPy was small enough that it was easiest to just dump SciPy altogether. Just my 2c. (and I hope that it's clear that I do appreciate all the work that people have done and that I mean no offense by my comments).

On Mar 9, 2005, at 2:30 PM, konrad.hinsen at laposte.net wrote:

> On 09.03.2005, at 17:52, Prabhu Ramachandran wrote:
>
>> To put it bluntly, I don't believe that someone who can't install
>> scipy today is really capable of contributing code to scipy. I
>
> True, but not quite to the point. I can install SciPy, but given that
> most of my code is written with the ultimate goal of being published
> and used by people with less technical experience, I need to take
> those people into account when choosing packages to build on.
>
>> seriously doubt claims that scipy is scary or hard to install today.
>
> I get support questions from people who are not aware that they need
> root permissions to do "python setup.py install" on a standard Linux
> system. On that scale of expertise, scipy *is* scary.
>
> Konrad.

From brendansimons at yahoo.ca Wed Mar 9 18:46:13 2005
From: brendansimons at yahoo.ca (Brendan Simons)
Date: Wed Mar 9 18:46:13 2005
Subject: [Numpy-discussion] Re: Packaging Scipy (was Future directions for SciPy in light of meeting at Berkeley )
In-Reply-To: <20050310011059.559EBF54B@sc8-sf-spam2.sourceforge.net>
References: <20050310011059.559EBF54B@sc8-sf-spam2.sourceforge.net>
Message-ID: <4c92afecf47730e9ec45bbbbc35a32d6@yahoo.ca>

Hear hear. Every time I see a pathname I groan out loud. Does that mean I'm too wussy to be a programmer? Maybe ;) but there are plenty of potential users (call us the matlab crowd) who feel the same way. I'd much rather just grab a binary installer from a website than manage some giant registry. The appearance of python .mpkg bundles on the mac has been a blessing.
-Brendan

On 9-Mar-05, at 8:09 PM, numpy-discussion-request at lists.sourceforge.net wrote:

> Whenever this comes up, it seems like people focus on
> nifty technological solutions for a package repository, which makes
> sense as we're all a bunch of programmers, but I'm not sure it gets the
> job done. A simple web site where you can download all the installers
> you need is fine.
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer

From oliphant at ee.byu.edu Wed Mar 9 19:23:17 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 9 19:23:17 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <6916ec732f2e70d1789cc0f480f82e7f@redivi.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com>
Message-ID: <422FBD4A.3030708@ee.byu.edu>

I had a lengthy discussion with Eric today and clarified some things in my mind about the future directions of scipy. The following is basically what we have decided. We are still interested in input so don't think the issues are closed, but I'm just giving people an idea of my (and Eric's, as far as I understand it) thinking on scipy.

1) There will be a scipy_core package which will be essentially what Numeric has always been (plus a few easy-to-install extras already in current scipy_core). It will likely contain the following functionality (the names and placements will be similar to current scipy_core):

   Numeric3 (actually called ndarray or narray or numstar or numerix or something....)
   fft (based on c-only code -- no fortran dependency)
   linalg (a lite version -- no fortran or ATLAS dependency)
   stats (a lite version --- no fortran dependency)
   special (only c-code --- no fortran dependency)
   weave
   f2py? (still need to ask Pearu about this)
   scipy_distutils and testing
   matrix and polynomial classes

   ...others...?

We will push to make this an easy-to-install effective replacement for Numeric and hopefully for numarray users as well. Therefore community input and assistance will be particularly important.

2) The rest of scipy will be a package (or a series of packages) of algorithms. We will not try to do plotting as part of scipy. The current plotting in scipy will be supported for a time, but users will be weaned off to other packages: matplotlib, pygist (for xplt -- and I will work to get any improvements for xplt into pygist itself), gnuplot, etc.

3) Having everything under a scipy namespace is not necessary, nor worth worrying about at this point.

My scipy-related focus over the next 5-6 months will be to get scipy_core to the point that most can agree it effectively replaces the basic tools of Numeric and numarray.

-Travis

From eric at enthought.com Wed Mar 9 20:42:18 2005
From: eric at enthought.com (eric jones)
Date: Wed Mar 9 20:42:18 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FBD4A.3030708@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu>
Message-ID: <422FD009.4020706@enthought.com>

Hey Travis,

It sounds like the Berkeley meeting went well. I am glad that the Numeric3 project is going well and that it looks like it has a good chance to unify the Numeric/numarray communities. I really appreciate you putting so much effort into its implementation. I also appreciate all the work Perry, Todd, and the others at StSci have done building numarray. We've all learned a ton from it.
Most of the plans sound right to me (several questions/comments below). Much of SciPy has been structured in this way already, but we have never really worked to make the core useful as a stand-alone package. Supporting lite and full versions of fft, linalg, and stats sounds potentially painful, but also worthwhile given the circumstances.

Now:

1. How much of stats do we lose from removing fortran dependencies?

2. I do question whether weave really belongs in this core. I think it was in scipy_core before because it was needed to build some of scipy.

3. Now that I think about it, I also wonder if f2py should really be there -- especially since we are explicitly removing any fortran dependencies from the core.

4. I think keeping scipy an algorithms library and leaving plotting to other libraries is a good plan. At one point, the setup_xplt.py file was more than 1000 lines. It is much cleaner now, but dealing with X11, etc. does take maintenance work. Removing these libraries from scipy would decrease the maintenance effort and leave the plotting to matplotlib, chaco, and others.

5. I think having all the generic algorithm packages (signal, ga, stats, etc. -- basically all the packages that are there now) under the scipy namespace is a good idea. It prevents worry about colliding with other people's packages. However, I think domain-specific libraries (such as astropy) should be in their own namespace and shouldn't be in scipy.

thanks,
eric

Travis Oliphant wrote:

> I had a lengthy discussion with Eric today and clarified some things
> in my mind about the future directions of scipy. The following is
> basically what we have decided. We are still interested in input so
> don't think the issues are closed, but I'm just giving people an idea
> of my (and Eric's as far as I understand it) thinking on scipy.
>
> 1) There will be a scipy_core package which will be essentially what
> Numeric has always been (plus a few easy to install extras already in
> current scipy_core). It will likely contain the functionality of
> (the names and placements will be similar to current scipy_core).
> Numeric3 (actually called ndarray or narray or numstar or numerix or
> something....)
> fft (based on c-only code -- no fortran dependency)
> linalg (a lite version -- no fortran or ATLAS dependency)
> stats (a lite version --- no fortran dependency)
> special (only c-code --- no fortran dependency)
> weave
> f2py? (still need to ask Pearu about this)
> scipy_distutils and testing
> matrix and polynomial classes
>
> ...others...?
>
> We will push to make this an easy-to-install effective replacement for
> Numeric and hopefully for numarray users as well. Therefore
> community input and assistance will be particularly important.
>
> 2) The rest of scipy will be a package (or a series of packages) of
> algorithms. We will not try to do plotting as part of scipy. The
> current plotting in scipy will be supported for a time, but users will
> be weaned off to other packages: matplotlib, pygist (for xplt -- and
> I will work to get any improvements for xplt into pygist itself),
> gnuplot, etc.
>
> 3) Having everything under a scipy namespace is not necessary, nor
> worth worrying about at this point.
>
> My scipy-related focus over the next 5-6 months will be to get
> scipy_core to the point that most can agree it effectively replaces
> the basic tools of Numeric and numarray.
> -Travis

From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 23:30:15 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar 9 23:30:15 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <422FBD4A.3030708@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu>
Message-ID: <422FF74F.8000001@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> 1) There will be a scipy_core package which will be essentially what
> Numeric has always been (plus a few easy to install extras already in
> current scipy_core). It will likely contain the functionality of (the
> names and placements will be similar to current scipy_core).
> Numeric3 (actually called ndarray or narray or numstar or numerix or
> something....)
> fft (based on c-only code -- no fortran dependency)
> linalg (a lite version -- no fortran or ATLAS dependency)
> stats (a lite version --- no fortran dependency)
> special (only c-code --- no fortran dependency)

That would be great! If it can be installed as easily as Numerical Python (and I have no reason to believe it won't be), I will certainly point users to this package instead of the older Numerical Python. I'd be happy to help out here, but I guess most of this code is working fine already.

> 2) The rest of scipy will be a package (or a series of packages) of
> algorithms. We will not try to do plotting as part of scipy. The
> current plotting in scipy will be supported for a time, but users will
> be weaned off to other packages: matplotlib, pygist (for xplt -- and I
> will work to get any improvements for xplt into pygist itself),
> gnuplot, etc.

Let me know which improvements from xplt you want to include in pygist. It might also be a good idea to move the pygist web pages to scipy.org.

--Michiel.

From pearu at scipy.org Thu Mar 10 00:50:16 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Thu Mar 10 00:50:16 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FD009.4020706@enthought.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID: 

Hi,

To clarify a few technical details:

On Wed, 9 Mar 2005, eric jones wrote:

> 1. How much of stats do we lose from removing fortran dependencies?
> 2. I do question whether weave really belongs in this core. I think it was in
> scipy_core before because it was needed to build some of scipy.

At the moment scipy does not contain modules that need weave.

> 3. Now that I think about it, I also wonder if f2py should really be there --
> especially since we are explicitly removing any fortran dependencies from the
> core.

f2py is not a fortran-only tool. In scipy it has been used to wrap C codes as well (fft, atlas), and imho f2py should be used more whenever possible.

> Travis Oliphant wrote:
>
>> 1) There will be a scipy_core package which will be essentially what
>> Numeric has always been (plus a few easy to install extras already in
>> current scipy_core). It will likely contain the functionality of (the
>> names and placements will be similar to current scipy_core).
>> Numeric3 (actually called ndarray or narray or numstar or numerix or
>> something....)
>> fft (based on c-only code -- no fortran dependency)

Hmm, what would be the default underlying fft library here? Currently in scipy it is the Fortran fftpack. And when fftw is available, it is used instead.

>> linalg (a lite version -- no fortran or ATLAS dependency)

Again, what would be the underlying linear algebra library here? Numeric uses an f2c version of a lite lapack library. Shall we do the same, but wrap the c codes with f2py rather than by hand? f2c might be useful also in other cases to reduce fortran dependency, but only when it is critical to ease the scipy_core installation.

>> stats (a lite version --- no fortran dependency)
>> special (only c-code --- no fortran dependency)
>> weave
>> f2py? (still need to ask Pearu about this)

I am not against it; it actually would simplify many things (for scipy users it provides one less dependency to worry about, f2py bug fixes and new features are immediately available, etc). And I can always ship f2py as standalone for non-scipy users.

>> scipy_distutils and testing
>> matrix and polynomial classes
>>
>> ...others...?

There are a few pure python modules (ppimport, machar, pexec, ..) in scipy_base that I have heard are used as very useful standalone modules.

>> We will push to make this an easy-to-install effective replacement for
>> Numeric and hopefully for numarray users as well. Therefore community
>> input and assistance will be particularly important.
>>
>> 2) The rest of scipy will be a package (or a series of packages) of
>> algorithms. We will not try to do plotting as part of scipy. The current
>> plotting in scipy will be supported for a time, but users will be weaned
>> off to other packages: matplotlib, pygist (for xplt -- and I will work to
>> get any improvements for xplt into pygist itself), gnuplot, etc.

+1 for not doing plotting in scipy.

Pearu

From konrad.hinsen at laposte.net Thu Mar 10 01:09:20 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Thu Mar 10 01:09:20 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: 
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID: <0e6de2eb91964aab6be56def725b0b4a@laposte.net>

On 10.03.2005, at 09:49, Pearu Peterson wrote:

> f2py is not a fortran-only tool. In scipy it has been used to wrap
> also C codes (fft, atlas) and imho f2py should be used more so
> whenever possible.

Good to know. I never looked at f2py because I don't use Fortran any more.

> Hmm, what would be the default underlying fft library here? Currently
> in scipy it is Fortran fftpack. And when fftw is available, it is used
> instead.

How about an f2c version of FFTPACK? Plus keeping the option of using fftw if installed, of course.

> Again, what would be the underlying linear algebra library here?
> Numeric uses f2c version of lite lapack library. Shall we do the same
> but wrapping the c codes with f2py rather than by hand? f2c might be
> useful

I like the idea of the f2c versions because they can easily be replaced by the original Fortran code for more speed. It might even be a good idea to have scipy_core include the Fortran version as well and use it optionally during installation.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
-------------------------------------------------------------------------------
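The install-time choice described here could also be made lazily at import time. A hypothetical sketch (both backend module names are invented for illustration) of how a lite core might prefer an optimized library when one is present:

# fftw_backend stands for an optional optimized wrapper; fftpack_lite
# for an always-present f2c fallback.  Neither is a real module name.
try:
    import fftw_backend as _fft
except ImportError:
    import fftpack_lite as _fft

def fft(a):
    # Callers never need to know which implementation they got.
    return _fft.fft(a)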
From faltet at carabos.com Thu Mar 10 01:45:24 2005
From: faltet at carabos.com (Francesc Altet)
Date: Thu Mar 10 01:45:24 2005
Subject: [Numpy-discussion] Reversing RecArrays
In-Reply-To: <1110405239.524.546.camel@halloween.stsci.edu>
References: <200503091946.11956.faltet@carabos.com> <1110405239.524.546.camel@halloween.stsci.edu>
Message-ID: <200503101044.34038.faltet@carabos.com>

On Wednesday, 9 March 2005 at 22:53, Todd Miller wrote:
> > I would be interested in having a fast way to reverse RecArrays.
> > Regrettably, the most straightforward way to reverse them does not
>
> This now works in CVS. The attached reverse() also works against CVS
> and should work against 1.2.2.

Todd, your workaround works pretty well. Many thanks!

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From nico at logilab.fr Thu Mar 10 04:01:30 2005
From: nico at logilab.fr (Nicolas Chauvat)
Date: Thu Mar 10 04:01:30 2005
Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <422EA691.9080404@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu>
Message-ID: <20050310120035.GI27725@crater.logilab.fr>

Hello,

On Wed, Mar 09, 2005 at 12:32:33AM -0700, Travis Oliphant wrote:
> Subdivide scipy into several super packages that install cleanly but can
> also be installed separately. Implement a CPAN-or-yum-like repository
> and query system for installing scientific packages.

Please don't try to reinvent a repository and installation system specific to scipy. Under unix, distribution and package systems are already solving this problem. Python folks already reinvented part of the wheel with a PythonPackageIndex that can be updated in one command using distutils. If your goal is to have a unique reference for scientific tools, I think it would be better to set up a Python Scientific Package Index, or just use the existing one at http://www.python.org/pypi/

Packaging/installation/querying/upgrading is a complex task better left to dedicated existing tools, namely apt-get/yum/urpmi/portage/etc. Regarding subdividing scipy into several packages installable separately under the same scipy base namespace umbrella, you should be aware that PyXML has had many problems doing the same (but PyXML has also been shadowing existing parts of the standard library, which may feel a bit too weird).

--
Nicolas Chauvat

logilab.fr - services en informatique avancée et gestion de connaissances

From perry at stsci.edu Thu Mar 10 07:02:13 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Mar 10 07:02:13 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FD009.4020706@enthought.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID: 

On Mar 9, 2005, at 11:41 PM, eric jones wrote:

> 2. I do question whether weave really belongs in this core. I think it
> was in scipy_core before because it was needed to build some of scipy.
> 3. Now that I think about it, I also wonder if f2py should really be
> there -- especially since we are explicitly removing any fortran
> dependencies from the core.
It would seem to me that so long as: 1) both these tools have very general usefulness (and I think they do), and 2) they are not installation problems (I don't believe they are, since they themselves don't require any compilation of Fortran, C++ or whatever -- am I wrong on that?), they are perfectly fine to go into the core. In fact, if they are used by any of the extra packages, they should be in the core to eliminate the extra step in the installation of those packages.

Perry

From perry at stsci.edu Thu Mar 10 07:30:30 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Mar 10 07:30:30 2005
Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core
Message-ID: 

On March 7th Travis Oliphant and Perry Greenfield met with Guido and Paul Dubois to discuss some issues regarding the inclusion of an array package within core Python. The following represents thoughts and conclusions regarding our meeting with Guido. They in no way represent the order of discussion with Guido, and some of the points we raise weren't actually mentioned during the meeting, but instead were spurred by subsequent discussion after the meeting with Guido.

1) Including an array package in the Python core.

To start, before the meeting we both agreed that we did not think that this was a high priority in itself. Rather, we both felt that the most important issue was making arrays an acceptable and widely supported interchange format (it may not be apparent to some that this does not require arrays to be in the core; more on that later). In discussing the desirability of including arrays in the core with Guido, we quickly came to the conclusion that not only was it not important, but that in the near term (the next couple years and possibly much longer) it would be a bad thing to do. This is primarily because it would mean that updates to the array package would wait on Python releases, potentially greatly delaying important bug fixes, performance enhancements, or new capabilities. Neither of us envisions any scenario regarding array packages, whether that be Numeric3 or numarray, where we would consider it to be something that would not *greatly* benefit from decoupling its release needs from those of Python (it's also true that it possibly introduces complications for Python releases if they need to synch with array schedules, but being inconsiderate louts, we don't care much about that). And when one considers that the move to multicore and 64-bit processors will introduce the need for significant changes in the internals to take advantage of these capabilities, it is unlikely we will see a quiescent, maintenance-level state for an array package for some time. In short, this issue is a distraction at the moment and will only sap energy from what needs to be done to unify the array packages.

So what about supporting arrays as an interchange format? There are a number of possibilities to consider, none of which require inclusion of arrays into the core. It is possible for 3rd party extensions to optionally support arrays as an interchange format through one of the following mechanisms:

a) So long as the extension package has access to the necessary array include files, it can build the extension to use arrays as a format without actually having the array package installed.
The include files alone could be included into the core (Guido has previously been receptive to doing this, though at this meeting he didn't seem quite as receptive, instead suggesting the next option) or could be packaged with the extension (we would prefer the former, to reduce the possibilities of many copies of include files). The extension could then be successfully compiled without actually having the array package present. The extension, when requested to use arrays, would see if it could import the array package; if not, all use of arrays would result in exceptions. The advantage of this approach is that it does not require that arrays be installed before the extension is built for arrays to be supported. It could be built, and then later the array package could be installed and no rebuilding would be necessary.

b) One could modify the extension build process to see if the package is installed and the include files are available; if so, it is built with the support, otherwise not. The advantage of this approach is that it doesn't require the include files to be included with the core or be bundled with the extension, thus avoiding any potential version mismatches. The disadvantage is that later adding the array package requires the extension to be rebuilt, and it results in a more complex build process (more things to go wrong).

c) One could provide the support at the Python level by instead relying on the use of buffer objects by the extension at the C level, thus avoiding any dependence on the array C api. So long as the extension has the ability to return buffer objects containing the putative array data to the Python level, along with the necessary meta information (in this case the shape, type, and other info, e.g., byteswapping, necessary to properly interpret the array), the extension can provide its own functions or methods to convert these buffer objects into arrays without copying the data in the buffer object. The extension can try to import the array package, and if it is present, provide arrays as a data format using this scheme. In many respects this is the most attractive approach. It has no dependencies on include files, build order, etc.

This approach led to the suggestion that Python develop a buffer object that could contain meta information, and a way of supporting community conventions (e.g., a name attribute indicating which convention was being used) to facilitate the interchange of any sort of binary data, not just arrays. We also concluded that it would be nice to be able to create buffer objects from Python with malloced memory (currently one can only create buffer objects from other objects that already have memory allocated; there is no way of creating newly allocated, writable memory from Python within a buffer object; one can create a buffer object from a string, but it is not writable). Nevertheless, if an extension is written in C, none of these changes are necessary to make use of this mechanism for interchange purposes now. This is the approach we recommend trying. The obvious case to apply it to is PIL, as a test case. We should do this ourselves and offer it as a patch to PIL. Other obvious cases are to support image interchange for GUIs (e.g., wxPython) and OpenGL.
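Mechanism (c) in Python terms, as a rough sketch: the helper name is invented, and the buffer= constructor keyword follows numarray's NumArray of this era (verify against your version):

def as_array_if_possible(buf, shape, typecode):
    # The extension hands back a raw buffer plus the metadata needed
    # to interpret it; only if an array package is importable do we
    # wrap it, and the wrap itself does not copy the data.
    try:
        import numarray
    except ImportError:
        return buf                    # caller still gets the raw bytes
    return numarray.NumArray(shape=shape, type=typecode, buffer=buf)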
2) Scalar support, rank-0 and related.

Travis and I agreed (we certainly seek comments on this conclusion; we may have forgotten about key arguments arguing for one of the different approaches) that the desirability of using rank-0 arrays as return values from single-element indexing depends on other factors, most importantly Python's support for scalars in various aspects. This is a multifaceted issue that will need to be determined by considering all the facets simultaneously. The following tries to list the pros and cons previously discussed for returning scalars (two cases previously discussed) or rank-0 arrays (input welcomed).

a) return only existing Python scalar types (cast upwards except for long long and long double based types)

Pros:
- What users probably expect (except matlab users!)
- No performance hit in subsequent scalar expressions
- faster indexing performance (?)

Cons:
- Doesn't support array attributes, numeric behaviors
- What do you return for long long and long double? No matter what is done, you will either lose precision or lose consistency. Or you create a few new Python scalar types for the unrepresentable types? But, with subclassing in C, the effort to create a few scalar types is very close to the effort to create many.

b) create new Python scalar types and return those (one for each basic array type)

Pros:
- Exactly what numeric users expect in representation
- No performance hit in subsequent scalar expressions
- faster indexing performance
- Scalars have the same methods and attributes as arrays

Cons:
- Might require great political energy to eventually get the arraytype with all of its scalartype-children into the Python core. This is really an unknown, though, since if the arrayobject is in the standard module and not in the types module, then people may not care (a new type is essentially a new-style class, and there are many, many classes in the Python standard library). A good scientific-packaging solution that decreases the desirability of putting the arrayobject into the core would help alleviate this problem as well.
- By itself it doesn't address different numeric behaviors for the "still-present" Python scalars throughout Python.

c) return rank-0 array

Pros:
- supports all array behaviors, particularly with regard to numerical processing and ieee exception handling (a matter of some controversy; some would like it also to be len()=1 and support [0] indexing, which strictly speaking rank-0 arrays should not support)

Cons:
- Performance hit on all scalar operations (e.g., if one then does many loops over what appears to be a pure scalar expression, use of rank-0 will be much slower than Python scalars, since use of arrays incurs significant overhead)
- Doesn't eliminate the fact that one can still run into different numerical behavior involving operations between Python scalars
- Still necessary to write code that must deal with Python scalars "leaking" into code as inputs to functions
- Can't currently be used to index sequences (so not completely usable in place of scalars)

Out of this came two potential needs (the first isn't strictly necessary if approach a is taken, but could help smooth the use of all integer types as indexes if approach b is taken):

If rank-0 arrays are returned, then Guido was very receptive to supporting a special method, __index__, which would allow any Python object to be used as an index to a sequence or mapping object. Calling this would return a value that would be suitable as an index if the object was not itself suitable directly. Thus rank-0 arrays would have this method called to convert their internal integer value into a Python integer. There are some details about how this would work at the C level that need to be worked out. This would allow rank-0 integer arrays to be used as indices. To be useful, it would be necessary to get this into the core as quickly as possible (if there are lingering C API issues that won't be solved right away, then a greatly delayed implementation would make this less than useful).
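The proposed protocol in miniature, with a toy stand-in for a rank-0 integer array (nothing honors __index__ at the time of this meeting, so this shows what the change would enable):

class Rank0Int:
    def __init__(self, value):
        self.value = value
    def __index__(self):
        # The proposed special method: hand back a real Python int
        # whenever the object is used where an index is required.
        return self.value

seq = ['a', 'b', 'c', 'd']
print seq[Rank0Int(2)]    # prints 'c' once the interpreter calls __index__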
Thus rank-0 integer arrays would have this method called to convert their internal integer value into a Python integer. There are some details about how this would work at the C level that need to be worked out. This would allow rank-0 integer arrays to be used as indices. To be useful, it would be necessary to get this into the core as quickly as possible (if there are lingering C API issues that won't be solved right away, then a greatly delayed implementation would make this less than useful).

We talked at some length about whether it was possible to change Python's numeric behavior for scalars, namely support for configurable handling of numeric exceptions in the way numarray does it (and Numeric3 as well). In short, not much was resolved. Guido didn't much like the stack approach to the exception handling mode. His argument (a reasonable one) was that even if the stack allowed pushing and popping modes, it was fragile for two reasons. If one called functions in other modules that were previously written without knowledge that the mode could be changed, those functions presumed the previous behavior and thus could be broken by a mode change (though we suppose that just puts the burden on the caller to guard all external calls with restores to default behavior; even so, many won't do that, leading to spurious bug reports that may annoy maintainers to no end through no fault of their own). He also felt that some termination conditions may cause missed pops, leading to incorrect modes. He suggested studying decimal's use of context to see if it could be used as a model. Overall he seemed to think that setting the mode on a module basis was a better approach. Travis and I wondered about how that could be implemented (it seems to imply that the exception handling needs to know what module or namespace is being executed in order to determine the mode). So some more thought is needed regarding this. The difficulty of proposing such changes and getting them accepted is likely to be considerable.

But Travis had a brilliant idea (some may see this as evil but I think it has great merit). Nothing prevents a C extension from hijacking the existing Python scalar objects' behaviors. Once a reference is obtained to an integer, float or complex value, one can replace the table of operations on those objects with whatever code one wishes. In this way an array package could (optionally) change the behavior of Python scalars. In this way we could test the behavior of proposed changes quite easily, distribute that behavior quite easily in the community, and ultimately see if there are really any problems, without expending any political energy to get it accepted. Seeing if it really worked (without "forking" Python either) would place us in a much stronger position to have the new behaviors incorporated into the core. Even then, it may never prove necessary if scalar behavior can be so customized by the array package. This holds out the potential of making scalar/array behavior much more consistent. Doing this may allow option a) as the ultimate solution, i.e., no changes needed to Python at all (as such), and no rank-0 arrays. This will be studied further. One possible issue is that adding the necessary machinery to make numeric scalar processing consistent with that of the array package may introduce significant performance penalties (what is negligible overhead for arrays may not be for scalars).
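To make two of the proposals above concrete: first, a toy sketch of how the proposed __index__ hook would behave. The stand-in class is hypothetical, and today's Python does not yet call __index__, so the last line describes the intended behavior rather than something that works now.

    class FakeRank0Int:
        # Stand-in for a rank-0 integer array (hypothetical class).
        def __init__(self, value):
            self._value = value
        def __index__(self):
            # The proposed hook: produce a true Python int on demand.
            return self._value

    seq = ['a', 'b', 'c']
    print(seq[FakeRank0Int(1)])   # -> 'b', once Python honors __index__

And the decimal context machinery that Guido suggested studying as a model for configurable numeric error modes can already be exercised today; this is standard library behavior, not a proposal:

    import decimal

    ctx = decimal.getcontext()
    ctx.traps[decimal.DivisionByZero] = False   # switch mode: don't raise
    print(decimal.Decimal(1) / decimal.Decimal(0))   # prints Infinity

The open question is how to scope such a context per module rather than per thread.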
One last comment is that it is unlikely that any choice in this area prevents the need for added helper functions to the array package to assist in writing code that works well with scalars and arrays. There are likely a number of such issues. A common approach is to wrap all unknown objects with "asarray". This works reasonably well but doesn't handle the following case: if you wish to write a function that will accept arrays or scalars, in principle it would be nice to return scalars if all that was supplied were scalars. So functions to help determine what the output type should be, based on the inputs, would be helpful: for example, to distinguish between someone providing a rank-0 array (or rank-1, length-1 array) as input and an actual scalar, if asarray happens to map both to the same thing, so that the function can properly return a scalar if that is what was originally input. Other such tools may help in writing code that allows the main body to treat all objects as arrays without needing checks for scalars.

Other miscellaneous comments: The old use of where() may be deprecated and only the "nonzero" interpretation will be kept. A new function will be defined to replace the old usage of where (we deem that regular expression search and replaces should work pretty well to make changes in almost all old code). With the use of buffer objects, tostring methods are likely to be deprecated.

Python PEPs needed
===================

From the discussions it was clear that at least two Python PEPs need to be written and implemented, but that these needed to wait until the unification of the arrayobject takes place.

PEP 1: Insertion of an __index__ special method and an as_index slot (perhaps in the as_sequence methods) in the C-level typeobject into Python.

PEP 2: Improvements on the buffer object and buffer builtin method so that buffer objects can be Python-tracked wrappers around allocated memory that extension packages can use and share. Two extensions are considered so far: 1) the buffer objects have a meta attribute so that meta information can be passed around in a unified manner, and 2) the buffer builtin should take an integer giving the size of the writeable buffer object to create.

From jh at oobleck.astro.cornell.edu Thu Mar 10 08:46:24 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Thu Mar 10 08:46:24 2005 Subject: [Numpy-discussion] Re: Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <20050310153125.D7F6088827@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20050310153125.D7F6088827@sc8-sf-spam1.sourceforge.net> Message-ID: <200503101645.j2AGjopf019350@oobleck.astro.cornell.edu>

It never rains, but it pours! Thanks for talking with Guido and hammering out these issues and options. You are of course right that the release schedule issue is enough to keep us out of the Python core for the time being (and matplotlib out of scipy, according to JDH at SciPy04, for the same reason). However, I think we should still strongly work to put it there eventually. For now, this means keeping it "acceptable", and communicating with Guido often to get his feedback and let him know what we are doing. There are three reasons I see for this. First, having it core-acceptable makes it clear to potential users that this is standard, stable, well-thought-out stuff. Second, it will mean that numerical behavior and plain python behavior will be as close as possible, so it will be easiest to switch between the two.
Third, if we don't strive for acceptability, we will likely run into a problem in the future when something we depend on is deprecated or changed. No doubt this will happen anyway, but it will be worse if we aren't tight with Guido. Conversely, if we *are* tight with Guido, he is likely to be aware of our concerns and take them into account when making decisions about Python core.

--jh--

From stephen.walton at csun.edu Thu Mar 10 09:35:01 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 10 09:35:01 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <422FBD4A.3030708@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> Message-ID: <423084F6.7020804@csun.edu>

Can I put in a good word for Fortran? Not the language itself, but the available packages for it. I've always thought that one of the really good things about Scipy was the effort put into getting all those powerful, well tested, robust Fortran routines from Netlib inside Scipy. Without them, it seems to me that folks who just install the new scipy_base are going to re-invent a lot of wheels. Is it really that hard to install g77 on non-Linux platforms?

Steve Walton

From konrad.hinsen at laposte.net Thu Mar 10 10:47:01 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 10 10:47:01 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <423084F6.7020804@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID:

On Mar 10, 2005, at 18:33, Stephen Walton wrote:
> Can I put in a good word for Fortran? Not the language itself, but
> the available packages for it. I've always thought that one of the
> really good things about Scipy was the effort put into getting all
> those powerful, well tested, robust Fortran routines from Netlib
> inside Scipy. Without them, it seems to me that folks who just
> install the new scipy_base are going to re-invent a lot of wheels.
>
> Is it really that hard to install g77 on non-Linux platforms?

It takes some careful reading of the instructions, which in turn requires a good command of the English language, including some peculiar technical terms, and either some experience in software installation or a high intimidation threshold. It also takes a significant amount of time and disk space.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr
---------------------------------------------------------------------

From Chris.Barker at noaa.gov Thu Mar 10 11:26:29 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Mar 10 11:26:29 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: References: Message-ID: <4230D60A.9050108@noaa.gov>

Perry Greenfield wrote:
> So what about supporting arrays as an interchange format?

I'd like to see some kind of definition of what this means, or maybe a set of examples, to help clarify this discussion. I'll start with my personal example: wxPython has a number of methods that can potentially deal with large datasets being passed between Python and C++.
My personal example is drawing routines. For instance, drawing a large polyline or set of many points. When I need these, I invariably use NumPy arrays to store and manipulate the data in Python, then pass it in to wxPython to draw or whatever. Robin has created a set of functions like "wxPointListHelper" that convert between Python sequences and the wxList of wxPoints that are required by wx.

Early on, only lists of tuples (for this example) could be used. At some point, the Helper functions were extended (thanks to Tim Hochberg, I think) to use the generic sequence access methods so that Numeric arrays and other data structures could be used. This was fabulous, but at the moment it is faster to pass in a list of tuples than it is to pass in an Nx2 Numeric array, and numarrays are much slower still.

A long time ago I suggested that Robin add (with help from me and others) a Numeric-specific version of wxPointListHelper and friends. Robin declined, as he (quite reasonably) doesn't want a dependency on Numeric in wxPython. However, I still very much want wxPython to be able to work efficiently with numerix arrays. I'm going to comment on the following in light of this example.

> a) So long as the extension package has access to the necessary array
> include files, it can build the extension to use the arrays as a format
> without actually having the array package installed.
> The extension, when requested to use arrays, would see if it could
> import the array package; if not, all use of arrays would result in
> exceptions.

I'm not sure this is even necessary. In fact, in the above example, what would most likely happen is that the **Helper functions would check to see if the input object was an array, and then fork the code if it were. An array couldn't be passed in unless the package were there, so there would be no need for checking imports or raising exceptions.

> It could be built, and then later the array package could be
> installed and no rebuilding would be necessary.

That is a great feature. I'm concerned about the inclusion of all the headers in either the core or with the package, as that would lock you to a different upgrade cycle than the main numerix upgrade cycle. It's my experience that Numeric has not been binary compatible across versions.

> b) One could modify the extension build process to see if the package is
> installed and the include files are available, if so, it is built with
> the support, otherwise not. The disadvantage is that later adding the
> array package requires the extension to be rebuilt

This is a very big deal, as most users on Windows and OS-X (and maybe even Linux) don't build packages themselves. A while back this was discussed on this very list, and it seemed like there was some idea about including not the whole numerix header package, but just the code for PyArray_Check or an equivalent. This would allow code to check if an input object was an array, and do something special if it was. That array-specific code would only get run if an array was passed in, so you'd know numerix was installed at run time. This would require Numerix to be installed at build time, but it would be optional at run time. I like this, because anyone capable of building wxPython (it can be tricky) is capable of installing Numeric, but folks that are using binaries don't need to know anything about it.

This would only really work for extensions that use arrays, but don't create them. We'd still have the version mismatch problem too.
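For the record, a rough Python-level sketch of the kind of type-based dispatch the Helper functions do. Everything here is hypothetical and illustrative: the real helpers are C++, and the duck-typing test for "array" is just one plausible check, not wxPython's actual code.

    def point_list_helper(seq):
        # Hypothetical sketch of wxPointListHelper-style dispatch.
        if isinstance(seq, list):
            # Fast path wx already has: a list of (x, y) tuples.
            return [(x, y) for x, y in seq]
        if hasattr(seq, 'shape') and hasattr(seq, 'typecode'):
            # Array-specific path: at the C level this would be one call
            # (e.g. PyArray_ContiguousFromObject) plus a walk over the
            # raw data, instead of per-element sequence-protocol calls.
            return [tuple(row) for row in seq]
        # Generic fallback: the sequence protocol, correct but slow.
        return [tuple(point) for point in seq]

    print(point_list_helper([(1, 2), (3, 4)]))   # [(1, 2), (3, 4)]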
> c) One could provide the support at the Python level by instead relying
> on the use of buffer objects by the extension at the C level, thus
> avoiding any dependence on the array C api.

This sounds great, but is a little beyond me technically.

> c) return rank-0 array
>
> Particularly with regard to ieee exception handling

major pro here for me!

> Guido was very receptive to
> supporting a special method, __index__ which would allow any Python
> object to be used as an index to a sequence or mapping object.

yeah!

-Chris
-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From perry at stsci.edu Thu Mar 10 12:21:20 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 10 12:21:20 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <4230D60A.9050108@noaa.gov> References: <4230D60A.9050108@noaa.gov> Message-ID: <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu>

On Mar 10, 2005, at 6:19 PM, Chris Barker wrote:
>
>> a) So long as the extension package has access to the necessary array
>> include files, it can build the extension to use the arrays as a
>> format without actually having the array package installed.
>> The extension, when requested to use arrays, would see if it could
>> import the array package; if not, all use of arrays would result in
>> exceptions.
>
> I'm not sure this is even necessary. In fact, in the above example,
> what would most likely happen is that the **Helper functions would
> check to see if the input object was an array, and then fork the code
> if it were. An array couldn't be passed in unless the package were
> there, so there would be no need for checking imports or raising
> exceptions.

So what would the helper function do if the argument was an array? You mean use the sequence protocol? Yes, I suppose that is always a fallback (but presumes that the original code to deal with such things is present; figuring out that a sequence satisfies array constraints can be a bit involved, especially at the C level)

>> It could be built, and then later the array package could be
>> installed and no rebuilding would be necessary.
>
> That is a great feature.
>
> I'm concerned about the inclusion of all the headers in either the
> core or with the package, as that would lock you to a different
> upgrade cycle than the main numerix upgrade cycle. It's my experience
> that Numeric has not been binary compatible across versions.

Hmmm, I thought it had been. It does make it much harder to change the api and structure layouts once in, but I thought that had been pretty stable.

>> b) One could modify the extension build process to see if the package
>> is installed and the include files are available, if so, it is built
>> with the support, otherwise not. The disadvantage is that later adding
>> the array package requires the extension to be rebuilt
>
> This is a very big deal as most users on Windows and OS-X (and maybe
> even Linux) don't build packages themselves.
>
> A while back this was discussed on this very list, and it seemed like
> there was some idea about including not the whole numerix header
> package, but just the code for PyArray_Check or an equivalent. This
> would allow code to check if an input object was an array, and do
> something special if it was.
> That array-specific code would only get run if an array was passed in,
> so you'd know numerix was installed at run time. This would require
> Numerix to be installed at build time, but it would be optional at run
> time. I like this, because anyone capable of building wxPython (it can
> be tricky) is capable of installing Numeric, but folks that are using
> binaries don't need to know anything about it.
>
> This would only really work for extensions that use arrays, but don't
> create them. We'd still have the version mismatch problem too.

Yes, at the binary level.

Perry

From cookedm at physics.mcmaster.ca Thu Mar 10 12:45:28 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Mar 10 12:45:28 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: (konrad hinsen's message of "Thu, 10 Mar 2005 19:48:11 +0100") References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID:

konrad.hinsen at laposte.net writes:
> On Mar 10, 2005, at 18:33, Stephen Walton wrote:
>> Can I put in a good word for Fortran? Not the language itself, but
>> the available packages for it. I've always thought that one of the
>> really good things about Scipy was the effort put into getting all
>> those powerful, well tested, robust Fortran routines from Netlib
>> inside Scipy. Without them, it seems to me that folks who just
>> install the new scipy_base are going to re-invent a lot of wheels.
>>
>> Is it really that hard to install g77 on non-Linux platforms?
>
> It takes some careful reading of the instructions, which in turn
> requires a good command of the English language, including some
> peculiar technical terms, and either some experience in software
> installation or a high intimidation threshold.
>
> It also takes a significant amount of time and disk space.
>
> Konrad.

I don't know about Windows, but on OS X it involves going to http://hpc.sourceforge.net/ and following the one paragraph of instructions. That could even be simplified if a .pkg were made... In fact, it's so easy to make a .pkg with PackageMaker that I've done it :-)

I've put a .pkg of g77 3.4 for OS X (using the above binaries) at http://arbutus.mcmaster.ca/dmc/osx/ [Warning: unsupported and lightly-tested. I'll email Gaurav Khanna about making packages of his other binaries.] It'll run, install into /usr/local/g77v3.4, and make a symlink at /usr/local/bin/g77 to the right binary. (To compile SciPy with this, I have to add -lcc_dynamic to the libraries to link with. I've got a patch which I'll submit to the SciPy bug tracker for that, soonish.)

-- |>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From pf_moore at yahoo.co.uk Thu Mar 10 13:02:34 2005 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Thu Mar 10 13:02:34 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley References: <422EA691.9080404@ee.byu.edu> Message-ID:

Travis Oliphant writes:
> 2) Installation problems -- I'm not completely clear on what the
> "installation problems" really are. I hear people talk about them, but
> Pearu has made significant strides to improve installation, so I'm not
> sure what precise issues remain. Yes, installing ATLAS can be a pain,
> but scipy doesn't require it.
> Yes, fortran support can be a pain, but if you use g77 then it isn't a
> big deal. The reality, though, is that there is this perception of
> installation trouble and it must be based on something. Let's find out
> what it is. Please speak up users of the world!!!!

While I am not a scientific user, I occasionally have a need for something like stats, linear algebra, or other such functions. I'm happy to install something (I'm using Python on Windows, so when I say "install", I mean "download and run a binary installer") but I'm a casual user, so I am not going to go to too much trouble.

First problem - no scipy Windows binaries for Python 2.4. I'm not going to downgrade my Python installation for the sake of scipy. Even assuming there were such binaries, I can't tell from the installer page whether I need to have Numeric, or whether it is included. Assuming I need to install it, the binaries say Numeric 23.5, with 23.1 available. But the latest Numeric is 23.8, and only 23.8 and 23.7 have Python 2.4 compatible Windows binaries. Stuck again. As for the PIII/P4SSE2 binaries, I don't know which of those I'd need, but that's OK, I'd go for "Generic", on the basis that speed isn't relevant to me...

There's no way on Windows that I'd even consider building scipy from source - my need for it simply isn't sufficient to justify the cost. As I say, this is from someone who is clearly not in the target audience of scipy, but maybe it is of use...

Paul.
-- A little inaccuracy sometimes saves tons of explanation -- Saki

From Chris.Barker at noaa.gov Thu Mar 10 14:31:33 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Mar 10 14:31:33 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> Message-ID: <423101A0.8000804@noaa.gov>

Perry Greenfield wrote:
> On Mar 10, 2005, at 6:19 PM, Chris Barker wrote:
>>> a) So long as the extension package has access to the necessary array
>>> include files, it can build the extension to use the arrays as a
>>> format without actually having the array package installed.
>>> The extension, when requested to use arrays, would see if it could
>>> import the array package; if not, all use of arrays would result
>>> in exceptions.
>>
>> I'm not sure this is even necessary. In fact, in the above example,
>> what would most likely happen is that the **Helper functions would
>> check to see if the input object was an array, and then fork the code
>> if it were. An array couldn't be passed in unless the package were
>> there, so there would be no need for checking imports or raising
>> exceptions.
>>
> So what would the helper function do if the argument was an array? You
> mean use the sequence protocol?

Sorry I wasn't clear. The present Helper functions check to see if the sequence is a list, and use list-specific code if it is; otherwise, it falls back on the sequence protocol, which is why it's slow for Numeric arrays. I'm proposing that if the input is an array, it will then use array-specific code (perhaps PyArray_ContiguousFromObject, then accessing *data directly)

> (but presumes that the original code to deal with such things is
> present; figuring out that a sequence satisfies array constraints can be
> a bit involved, especially at the C level)

yes, involved, and kind of slow.
If it were me (and for my custom extensions it is), I'd just require Numeric, then always call PyArray_ContiguousFromObject and access the data array. Now that I've written that, I have a new idea: use the approach mentioned, and check if Numeric can be imported. If so, go straight to PyArray_ContiguousFromObject every time.

>> It's my experience
>> that Numeric has not been binary compatible across versions.
>
> Hmmm, I thought it had been. It does make it much harder to change the
> api and structure layouts once in, but I thought that had been pretty
> stable.

I know that at least once I tried a Numeric extension (Konrad's netcdf one) that had been built with another version of Numeric, and weird results occurred. Nothing so obvious as a crash or error, however. You've got to love C!

-Chris
-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Thu Mar 10 15:16:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 10 15:16:27 2005 Subject: [Numpy-discussion] Matlab is a tool for doing numerical computations with matrices and vectors. In-Reply-To: <421A26A5.7070306@sympatico.ca> References: <4218FAD8.6060804@sympatico.ca> <421908B0.90406@ee.byu.edu> <421A26A5.7070306@sympatico.ca> Message-ID: <4230D4FA.6010505@ee.byu.edu>

>> I remember his work. I really liked many of his suggestions, though
>> it took him a while to recognize that a Matrix class has been
>> distributed with Numeric from very early on.
>
> numpy.pdf dated 03-07-18 has
>
> "For those users, the Matrix class provides a more intuitive
> interface. We defer discussion of the Matrix class until later."
[snip]
> On the same page there is:
>
> "Matrix.py
> The Matrix.py python module defines a class Matrix which is a
> subclass of UserArray. The only differences
> between Matrix instances and UserArray instances is that the *
> operator on Matrix performs a matrix multiplication, as opposed to
> element-wise multiplication, and that the power operator ** is
> disallowed for Matrix instances."
>
> In view of the above, I can understand why Huaiyu Zhu took a while.
> His proposal was much more ambitious.

There is always a lag between documentation and implementation. I would be interested to understand what "more ambitious" elements are still not in Numeric's Matrix object (besides the addition of a language operator of course).

> Yes, I know that the power operator is implemented and that there is a
> random matrix but I hope that some attention is given to the
> functionality PyMatrix. I recognize that the implementation has some
> weaknesses.

Which aspects are you most interested in? I would be happy if you would consider placing something like PyMatrix under scipy_core instead of developing it separately.

>> Yes, it needed work, and a few of his ideas were picked up on and
>> included in Numeric's Matrix object.
>
> I suggest that this overstates what was picked up.

I disagree. I was the one who picked them up and I spent a bit of time doing it. I implemented the power method, the ability to build matrices in blocks, the string processing for building matrices, and a lot of the special attribute names for transpose, hermitian transpose, and so forth. There may be some attributes that weren't picked up, and a discussion of which attributes are most important is warranted.

> Good, on both scores. I hope that the PEP will set out these ideas.
You are probably in a better position time-wise to outline what you think belongs in a Matrix class. I look forward to borrowing your ideas for inclusion in scipy_core.

-Travis

From konrad.hinsen at laposte.net Thu Mar 10 16:17:27 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 10 16:17:27 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID: <0a038c9f9c74e4f11854c882f0d100d5@laposte.net>

On 10.03.2005, at 21:44, David M. Cooke wrote:
> I don't know about Windows, but on OS X it involves going to
> http://hpc.sourceforge.net/
> and following the one paragraph of instructions. That could even be
> simplified if a .pkg were made...

I wasn't thinking of Windows and OS X, but of the less common Unices. I did my last gcc/g77 installation three years ago on an Alpha station running whatever Compaq's Unix is called. It worked without any problems, but it still took me about two hours, and I am pretty experienced at installation work.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr
-------------------------------------------------------------------------------

From rkern at ucsd.edu Thu Mar 10 17:19:18 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Mar 10 17:19:18 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <423101A0.8000804@noaa.gov> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> <423101A0.8000804@noaa.gov> Message-ID: <4230DEE7.2020802@ucsd.edu>

Chris Barker wrote:
> Perry Greenfield wrote:
>
>> On Mar 10, 2005, at 6:19 PM, Chris Barker wrote:
>>
>>>> a) So long as the extension package has access to the necessary
>>>> array include files, it can build the extension to use the arrays as
>>>> a format without actually having the array package installed.
>>>> The extension, when requested to use arrays, would see if it could
>>>> import the array package; if not, all use of arrays would
>>>> result in exceptions.
>>>
>>> I'm not sure this is even necessary. In fact, in the above example,
>>> what would most likely happen is that the **Helper functions would
>>> check to see if the input object was an array, and then fork the code
>>> if it were. An array couldn't be passed in unless the package were
>>> there, so there would be no need for checking imports or raising
>>> exceptions.
>>>
>> So what would the helper function do if the argument was an array? You
>> mean use the sequence protocol?
>
> Sorry I wasn't clear. The present Helper functions check to see if the
> sequence is a list, and use list-specific code if it is; otherwise, it
> falls back on the sequence protocol, which is why it's slow for Numeric
> arrays.
> I'm proposing that if the input is an array, it will then use
> array-specific code (perhaps PyArray_ContiguousFromObject, then
> accessing *data directly)

If the über-buffer object (item 1c in Perry's notes) gets implemented in the standard library, then the Helper functions could test PyUberBuffer_Check() (or perhaps test for the presence of the extra Numeric information, whatever), dispatch on the typecode, and iterate through the data as appropriate. wx's C code doesn't need to know about the Numeric array struct (and thus doesn't need to include any headers); it just needs to know how to interpret the metadata provided by the über-buffer.

What's more, other packages could nearly seamlessly provide data in the same way. For example, suppose your wx function plopped a pixel image onto a canvas. It could take one of these buffers as the pixel source. PIL could be a source. A Numeric array could be a source. A string could be a source. A Quartz CGBitmapContext could be a source. As long as each could be adapted to include the conventional metadata, they could all be sources for the wx function, and none of the packages need to know about each other, much less be compiled against one another or depend on their existence at runtime. I say "nearly seamlessly" only because there might be an inevitable adaptation layer that adds or modifies the metadata.

The buffer approach seems like the most Pythonic way to go. It encourages loose coupling and flexibility. It also encourages object adaptation, a la PyProtocols[1], which I like to push now and again.

[1] http://peak.telecommunity.com/PyProtocols.html

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From cjw at sympatico.ca Thu Mar 10 17:54:20 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 10 17:54:20 2005 Subject: [Numpy-discussion] Matlab is a tool for doing numerical computations with matrices and vectors. In-Reply-To: <4230D4FA.6010505@ee.byu.edu> References: <4218FAD8.6060804@sympatico.ca> <421908B0.90406@ee.byu.edu> <421A26A5.7070306@sympatico.ca> <4230D4FA.6010505@ee.byu.edu> Message-ID: <4230FA13.4010202@sympatico.ca>

Travis Oliphant wrote:
>
>>> I remember his work. I really liked many of his suggestions, though
>>> it took him a while to recognize that a Matrix class has been
>>> distributed with Numeric from very early on.
>>
>> numpy.pdf dated 03-07-18 has
>>
>> "For those users, the Matrix class provides a more intuitive
>> interface. We defer discussion of the Matrix class until later."
>>
> [snip]
>
>> On the same page there is:
>>
>> "Matrix.py
>> The Matrix.py python module defines a class Matrix which is a
>> subclass of UserArray. The only differences
>> between Matrix instances and UserArray instances is that the *
>> operator on Matrix performs a
>> matrix multiplication, as opposed to element-wise multiplication,
>> and that the power operator ** is disallowed
>> for Matrix instances."
>>
>> In view of the above, I can understand why Huaiyu Zhu took a while.
>> His proposal was much more ambitious.
>
> There is always a lag between documentation and implementation. I
> would be interested to understand what "more ambitious" elements are
> still not in Numeric's Matrix object (besides the addition of a
> language operator of course).
>
>> Yes, I know that the power operator is implemented and that there is
>> a random matrix but I hope that some attention is given to the
>> functionality PyMatrix.
>> I recognize that the implementation has some weaknesses.
>
> Which aspects are you most interested in? I would be happy if you
> would consider placing something like PyMatrix under scipy_core
> instead of developing it separately.

Yes, after the dust of the current activity settles, I would certainly be interested in exploring this, although I would see a closer association with Numeric3 than with scipy.

>>> Yes, it needed work, and a few of his ideas were picked up on and
>>> included in Numeric's Matrix object.
>>
>> I suggest that this overstates what was picked up.
>
> I disagree. I was the one who picked them up and I spent a bit of
> time doing it. I implemented the power method, the ability to build
> matrices in blocks, the string processing for building matrices, and a
> lot of the special attribute names for transpose, hermitian transpose,
> and so forth.
> There may be some attributes that weren't picked up, and a discussion
> of which attributes are most important is warranted.
>
>> Good, on both scores. I hope that the PEP will set out these ideas.
>
> You are probably in a better position time-wise to outline what you
> think belongs in a Matrix class. I look forward to borrowing your
> ideas for inclusion in scipy_core.

My thoughts are largely in the current implementation of PyMatrix. Below is an extract from the most recent announcement. I propose to explore the changes needed to use Numeric3 with the new ufuncs. Do you have any feel for when Alpha binary versions will likely be available?

Colin W.

------------------------------------------------------------------------
Downloads in the form of a Windows Installer (Inno) and a zip file are available at: http://www3.sympatico.ca/cjw/PyMatrix
An /Introduction to PyMatrix/ is available: http://www3.sympatico.ca/cjw/PyMatrix/IntroToPyMatrix.pdf
Information on the functions and methods of the matrix module is given at: http://www3.sympatico.ca/cjw/PyMatrix/Doc/matrix-summary.html

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 18:51:16 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 18:51:16 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <423084F6.7020804@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID: <4231076E.6090507@ims.u-tokyo.ac.jp>

Stephen Walton wrote:
> Can I put in a good word for Fortran? Not the language itself, but the
> available packages for it. I've always thought that one of the really
> good things about Scipy was the effort put into getting all those
> powerful, well tested, robust Fortran routines from Netlib inside
> Scipy. Without them, it seems to me that folks who just install the new
> scipy_base are going to re-invent a lot of wheels.
>
> Is it really that hard to install g77 on non-Linux platforms?

I agree that Netlib should be in SciPy. But why should Netlib be in scipy_base? If SciPy evolves into a website of scientific packages for python, I presume Netlib will be in one of those packages, maybe even a package by itself. Such a package, together with a couple of binary installers for common platforms, will be appreciated by users and developers who need Netlib. But if Netlib is in scipy_base, you're effectively forcing most users to waste time on Fortran only to install something they don't need.
In turn, those users will ask their developers for help if something goes wrong (or give up altogether). And those developers, also not willing to waste time on something they don't need, will tell their users to use Numerical Python instead of SciPy.

--Michiel.

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 18:56:14 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 18:56:14 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> Message-ID: <42310899.1090302@ims.u-tokyo.ac.jp>

Paul Moore wrote:
> Travis Oliphant writes
>> 2) Installation problems -- I'm not completely clear on what the
>> "installation problems" really are.
>
> While I am not a scientific user, I occasionally have a need for
> something like stats, linear algebra, or other such functions. I'm
> happy to install something (I'm using Python on Windows, so when I
> say "install", I mean "download and run a binary installer") but I'm
> a casual user, so I am not going to go to too much trouble.
...
> There's no way on Windows that I'd even consider building scipy from
> source - my need for it simply isn't sufficient to justify the cost.
>
> As I say, this is from someone who is clearly not in the target
> audience of scipy, but maybe it is of use...

I think you perfectly described the experience of a typical Biopython user. So as far as I'm concerned, you're squarely in the target audience of SciPy, if it intends to replace Numeric.

--michiel.

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 19:18:11 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 19:18:11 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> Message-ID: <42310D7D.3000009@ims.u-tokyo.ac.jp>

Perry Greenfield wrote:
> On Mar 9, 2005, at 11:41 PM, eric jones wrote:
>> 2. I do question whether weave really be in this core? I think it was
>> in scipy_core before because it was needed to build some of scipy.
>> 3. Now that I think about it, I also wonder if f2py should really be
>> there -- especially since we are explicitly removing any fortran
>> dependencies from the core.
>
> It would seem to me that so long as:
>
> 1) both these tools have very general usefulness (and I think they do), and
> 2) are not installation problems (I don't believe they are since they
> themselves don't require any compilation of Fortran, C++ or whatever--am
> I wrong on that?)
>
> That they are perfectly fine to go into the core. In fact, if they are
> used by any of the extra packages, they should be in the core to
> eliminate the extra step in the installation of those packages.

-0.
1) In der Beschraenkung zeigt sich der Meister. In other words, avoid software bloat.
2) f2py is a Fortran-Python interface generator; once the interface is created, there is no need for the generator.
3) I'm sure f2py is useful, but I doubt that it has very general usefulness. There are lots of other useful Python packages, but we're not including them in scipy-core either.
4) f2py and weave don't fit in well with the rest of scipy-core, which is mainly standard numerical algorithms.

--Michiel.
From oliphant at ee.byu.edu Thu Mar 10 19:55:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 10 19:55:34 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <42310D7D.3000009@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> Message-ID: <4231165F.1040908@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:
> Perry Greenfield wrote:
>> On Mar 9, 2005, at 11:41 PM, eric jones wrote:
>>> 2. I do question whether weave really be in this core? I think it
>>> was in scipy_core before because it was needed to build some of scipy.
>>> 3. Now that I think about it, I also wonder if f2py should really be
>>> there -- especially since we are explicitly removing any fortran
>>> dependencies from the core.
>>
>> It would seem to me that so long as:
>>
>> 1) both these tools have very general usefulness (and I think they
>> do), and
>> 2) are not installation problems (I don't believe they are since they
>> themselves don't require any compilation of Fortran, C++ or
>> whatever--am I wrong on that?)
>>
>> That they are perfectly fine to go into the core. In fact, if they
>> are used by any of the extra packages, they should be in the core to
>> eliminate the extra step in the installation of those packages.
>
> -0.
> 1) In der Beschraenkung zeigt sich der Meister. In other words, avoid
> software bloat.
> 2) f2py is a Fortran-Python interface generator, once the interface is
> created there is no need for the generator.
> 3) I'm sure f2py is useful, but I doubt that it has very general
> usefulness. There are lots of other useful Python packages, but we're
> not including them in scipy-core either.
> 4) f2py and weave don't fit in well with the rest of scipy-core, which
> is mainly standard numerical algorithms.

I'm of the opinion that f2py and weave should go into the core.

1) Neither one requires Fortran and both install very, very easily.
2) These packages are fairly small but provide huge utility --- inlining fortran or C code is an easy way to speed up Python. People who don't "need it" will never realize it's there.
3) Building the rest of scipy will need at least f2py already installed and it would simplify the process.
4) Enthought packages (to be released in the future and of interest to scientists) rely on weave. Why not make that process easier with a single initial install.
5) It would encourage improvements of weave and f2py from the entire community.
6) The developers of f2py and weave are both scipy developers and so it would make sense for their code that forms a foundation for other work to go into scipy_core.

-Travis

From prabhu_r at users.sf.net Fri Mar 11 00:30:16 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Mar 11 00:30:16 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <16945.22219.772480.154332@monster.linux.in>

>>>>> "TO" == Travis Oliphant writes:

TO> I'm of the opinion that f2py and weave should go into the
TO> core.
If you are looking for feedback, I'd say +2 for that.

regards, prabhu

From oliphant at ee.byu.edu Fri Mar 11 01:07:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 11 01:07:05 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <4230DEE7.2020802@ucsd.edu> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> <423101A0.8000804@noaa.gov> <4230DEE7.2020802@ucsd.edu> Message-ID: <42315EEE.2090304@ee.byu.edu>

>> Sorry I wasn't clear. The present Helper functions check to see if
>> the sequence is a list, and use list-specific code if it is;
>> otherwise, it falls back on the sequence protocol, which is why it's
>> slow for Numeric arrays. I'm proposing that if the input is an array,
>> it will then use array-specific code (perhaps
>> PyArray_ContiguousFromObject, then accessing *data directly)
>
> If the über-buffer object (item 1c in Perry's notes) gets implemented
> in the standard library, then the Helper functions could test
> PyUberBuffer_Check() (or perhaps test for the presence of the extra
> Numeric information, whatever), dispatch on the typecode, and iterate
> through the data as appropriate. wx's C code doesn't need to know
> about the Numeric array struct (and thus doesn't need to include any
> headers), it just needs to know how to interpret the metadata provided
> by the über-buffer.
>
> What's more, other packages could nearly seamlessly provide data in
> the same way. For example, suppose your wx function plopped a pixel
> image onto a canvas. It could take one of these buffers as the pixel
> source. PIL could be a source. A Numeric array could be a source. A
> string could be a source. A Quartz CGBitmapContext could be a source.
> As long as each could be adapted to include the conventional metadata,
> they could all be sources for the wx function, and none of the packages
> need to know about each other, much less be compiled against one
> another or depend on their existence at runtime. I say "nearly
> seamlessly" only because there might be an inevitable adaptation layer
> that adds or modifies the metadata.
>
> The buffer approach seems like the most Pythonic way to go. It
> encourages loose coupling and flexibility. It also encourages object
> adaptation, a la PyProtocols[1], which I like to push now and again.

I really, really like this direction. Todd's memoryobject in numarray should be merged with the buffer object in Python to be this new buffer type, and the appropriate meta-data added. We should then start encouraging this sort of buffer-mediated duck-typing for all raw memory-like objects, and the buffer protocol should be expanded to encourage the specification of metadata (or classes of metadata). We should do a lot more of this....(a la namespaces...)

-Travis

From konrad.hinsen at laposte.net Fri Mar 11 02:34:14 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 11 02:34:14 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: References: Message-ID: <17b4f8747178ed5df4e3ab152ee69fb7@laposte.net>

On Mar 10, 2005, at 16:28, Perry Greenfield wrote:
> On March 7th Travis Oliphant and Perry Greenfield met Guido and Paul
> Dubois to discuss some issues regarding the inclusion of an array
> package within core Python.

A good initiative - and thanks for the report!

> So what about supporting arrays as an interchange format?
> There are a number of possibilities to consider, none of which require
> inclusion of arrays into the core. It is possible for 3rd party
> extensions to optionally support arrays as an interchange format
> through one of the following mechanisms:

True, but any of these options requires a much bigger effort than relying on a module in the standard library. Pointing out these methods is not exactly a way of encouraging people to use arrays as an interchange format; it's more a way of telling them that if they need a compact interchange format badly, there is a solution.

> a) So long as the extension package has access to the necessary array
> include files, it can build the extension to use the arrays as a
> format without actually having the array package installed. The
> include files alone could be included into the core

True, but this implies nearly the same restrictions on the evolution of the array code as having it in the core. The Numeric headers have changed frequently in the past.

> seem quite as receptive instead suggesting the next option) or could
> be packaged with the extension (we would prefer the former to reduce
> the possibilities of many copies of include files). The extension
> could then be successfully compiled without

Having the header files in all client extensions is a sure recipe to block Numeric development. Any header change would imply non-acceptance by the end-user community. If C were a language with implementation-independent interface descriptions, such approaches would be reasonable, but C is... well, C.

> b) One could modify the extension build process to see if the package
> is installed and the include files are available, if so, it is built
> with the support, otherwise not.

This is already possible today, and probably used by some extension modules. I use a similar test to build the netCDF interface selectively (if netCDF is available), and I can tell from experience that this causes quite some confusion for some users who install ScientificPython before netCDF (although the instructions point this out - but nobody seems to read instructions). But the main problem with this approach is that it doesn't work for pre-built binary distributions, i.e. in particular the Windows world.

> c) One could provide the support at the Python level by instead
> relying on the use of buffer objects by the extension at the C level,
> thus avoiding any dependence on the array C api. So long as the
> extension has the ability to return buffer objects

That's certainly the cleanest solution, but it also requires a serious effort from the extension module writer: one more API to learn and use, and conversion between buffers and arrays in all modules that definitely need array functions.

> We talked at some length about whether it was possible to change
> Python's numeric behavior for scalars, namely support for configurable
> handling of numeric exceptions in the way numarray does it (and
> Numeric3 as well). In short, not much was resolved. Guido didn't much
> like the stack approach to the exception handling mode. His argument
> (a reasonable one) was that even if the stack allowed pushing

I agree with Guido there. It looks like a hack.

> the decimal's use of context to see if it could be used as a model.
> Overall he seemed to think that setting the mode on a module basis was
> a better approach. Travis and I wondered about how that could be
> implemented (it seems to imply that the exception handling needs to
> know what module or namespace is being executed in order to determine
> the mode.

That doesn't look simple. How about making error handling a characteristic of the type itself? That would double the number of float element types, but that doesn't seem a big deal to me. Handling the conversions and coercions is probably a bigger headache.

> So some more thought is needed regarding this. The difficulty of
> proposing such changes and getting them accepted is likely to be
> considerable. But Travis had a brilliant idea (some may see this as
> evil but I think it has great merit). Nothing prevents a C extension
> from hijacking the existing Python scalar objects' behaviors.

True, and I like that idea a lot for testing and demonstrating concepts. Whether it's a good idea for production code is another question, and one to be discussed with Guido and the Python team in my opinion.

> Python at all (as such), no rank-0 arrays. This will be studied
> further. One possible issue is that adding the necessary machinery to
> make numeric scalar processing consistent with that of the array
> package may introduce significant performance penalties (what is
> negligible overhead for arrays may not be for scalars).

Adding a couple of methods should not cause any overhead at all. Where do you see the origin of the overhead?

> One last comment is that it is unlikely that any choice in this area
> prevents the need for added helper functions to the array package to
> assist in writing code that works well with scalars and arrays. There
> are likely a number of such issues. A common

That remains to be seen. I must admit that I am personally a bit surprised by the importance this problem seems to have for many. I have a single spot in a single module that checks for scalar vs. array, which is negligible considering the amount of numerical code that I have.

> approach is to wrap all unknown objects with "asarray". This works
> reasonably well but doesn't handle the following case: if you wish to
> write a function that will accept arrays or scalars, in principle it
> would be nice to return scalars if all that was supplied were scalars.
> So functions to help determine what the output type should
Travis and I wondered about how that could be > implemented (it seems to imply that the exception handling needs to > know what module or namespace is being executed in order to determine > the mode. That doesn't look simple. How about making error handling a characteristic of the type itself? That would double the number of float element types, but that doesn't seem a big deal to me. Handling the conversions and coercions is probably a bigger headache. > So some more thought is needed regarding this. The difficulty of > proposing such changes and getting them accepted is likely to be > considerable. But Travis had a brilliant idea (some may see this as > evil but I think it has great merit). Nothing prevents a C extension > from hijacking the existing Python scalar objects behaviors. True, and I like that idea a lot for testing and demonstrating concepts. Whether it's a good idea for production code is another question, and one to be discussed with Guido and the Python team in my opinion. > Python at all (as such), no rank-0 arrays. This will be studied > further. One possible issue is that adding the necessary machinery to > make numeric scalar processing consistent with that of the array > package may introduce significant performance penalties (what is > negligible overhead for arrays may not be for scalars). Adding a couple of methods should not cause any overhead at all. Where do you see the origin of the overhead? > One last comment is that it is unlikely that any choice in this area > prevents the need for added helper functions to the array package to > assist in writing code that works well with scalars and arrays. There > are likely a number of such issues. A common That remains to be seen. I must admit that I am personally a bit surprised by the importance this problem seems to have for many. I have a single spot on a single module that checks for scalar vs. array, which is negligible considering the amount of numerical code that I have. > approach is to wrap all unknown objects with "asarray". This works > reasonably well but doesn't handle the following case: If you wish to > write a function that will accept arrays or scalars, in principal it > would be nice to return scalars if all that was supplied were scalars. > So functions to help determine what the output type should That happens automatically is you use asarray() only when you definitely need an array. I would expect this to be the case for list arguments rather than for scalar arguments. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From Chris.Barker at noaa.gov Fri Mar 11 08:59:06 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 11 08:59:06 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <4232051A.1010701@noaa.gov> Travis Oliphant wrote: > I'm of the opinion that f2py and weave should go into the core. <(6 good points)> The act of putting something into the core will encourage people to use it. 
My understanding of the idea of the core is that it is a minimal set of packages that various developers can use as a basis for their domain-specific stuff. One barrier to entry for people currently using the whole of SciPy is the ease of installation issue, and f2py and weave are easy to install, so that's not a problem. However, if I understand it correctly, neither weave nor f2py is the least bit useful without a compiler. If they are in the core, you are encouraging people to use them in their larger packages, which will then impose a dependency on compilers. This seems to me not to fit in with the purpose of the core, which is to be a SINGLE, robust, easy-to-install dependency that others can build on. I suggest that weave and f2py go into a "devel" or "high-performance" package instead.

-Chris
-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Fri Mar 11 09:08:28 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 11 09:08:28 2005 Subject: [Numpy-discussion] Another thought on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <4232076B.1030200@noaa.gov>

I've got one more issue that might bear thinking about at this juncture: versioning control.

One issue that has been brought up in the discussion of using ndarrays as an interchange format with other packages is that those packages might well become dependent on a particular version of SciPy. For me, this brings up the issue that I might well want (or need) to have more than one version of SciPy installed at once, and be able to select which one is used at run time. If nothing else, it facilitates testing as new versions come out. I suggest a system similar to that recently added to wxPython:

import wxversion
wxversion.select("2.5")
import wx

See: http://wiki.wxpython.org/index.cgi/MultiVersionInstalls for more details. Between the wxPython list and others, a lot of pros and cons to doing this have been laid out. Honestly, there never really was a consensus among the wxPython community, but Robin decided to go for it, and I, for one, am very happy with it.

-Chris
-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Fri Mar 11 18:27:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 11 18:27:31 2005 Subject: [Numpy-discussion] Slightly altered multidimensional slicing behavior Message-ID: <42325338.3020709@ee.byu.edu>

Hi all,

I've updated the PEP on the numeric web page to reflect an improved (I think) usage of Ellipsis and slice objects when mixed with integer indexing arrays. Basically, since partial indexing already assumes an ending ellipsis, the presence of ellipsis or slice objects in the tuple allows the user to move the position of the partial indexing. It does get a little mind blowing, but is actually not "too" bad using the mapiter object to implement.
-Travis From juenglin at cs.pdx.edu Sat Mar 12 20:17:30 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sat Mar 12 20:17:30 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <1110686482.18704.23.camel@localhost.localdomain> On Thu, 2005-03-10 at 19:54, Travis Oliphant wrote: > I'm of the opinion that f2py and weave should go into the core. > +1 ralf From mdehoon at ims.u-tokyo.ac.jp Sun Mar 13 04:50:28 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 13 04:50:28 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> Message-ID: <423436E5.1070407@ims.u-tokyo.ac.jp> Pearu Peterson wrote: >> Travis Oliphant wrote: >> >>> 1) There will be a scipy_core package which will be essentially what >>> Numeric has always been (plus a few easy to install extras already in >>> current scipy_core). ... >>> linalg (a lite version -- no fortran or ATLAS dependency) > > Again, what would be the underlying linear algebra library here? > Numeric uses f2c version of lite lapack library. Shall we do the same > but wrapping the c codes with f2py rather than by hand? f2c might be > useful also in other cases to reduce fortran dependency, but only when > it is critical to ease the scipy_core installation. > If I understand Travis correctly, the idea is to use Numeric as the basis for scipy_core, allowing current Numerical Python users to switch to scipy_core with a minimum of trouble.
So why not use Numeric's lite lapack library > directly? What is the advantage of repeating the c code wrapping (by f2py or > by hand)? First, I wouldn't repeat wrapping c codes by hand. But using f2py wrappers has the following advantages:
(i) maintaining the wrappers is easier (as the wrappers are generated);
(ii) one can easily link linalg_lite against optimized lapack. This is certainly possible with current Numeric but for a smaller set of Fortran compilers than when using f2py generated wrappers (for example, if a compiler produces uppercased symbol names then Numeric wrappers won't work);
(iii) scipy provides wrappers to a larger set of lapack subroutines than Numeric, and with f2py it is easier and less error-prone to add new wrappers to lapack functions than wrapping them by hand, i.e. extending f2py generated linalg_lite is much easier than extending the current Numeric lapack_lite;
(iv) and finally, f2py generated wrappers tend to be more efficient than Numeric hand coded wrappers.
Here are some benchmark results comparing scipy and Numeric linalg functions:

Finding matrix determinant
==================================
     |   contiguous    | non-contiguous
----------------------------------------------
size | scipy | Numeric | scipy | Numeric
  20 | 0.16  |  0.22   | 0.17  |  0.26    (secs for 2000 calls)
 100 | 0.29  |  0.41   | 0.28  |  0.56    (secs for 300 calls)
 500 | 0.31  |  0.36   | 0.33  |  0.45    (secs for 4 calls)

Finding matrix inverse
==================================
     |   contiguous    | non-contiguous
----------------------------------------------
size | scipy | Numeric | scipy | Numeric
  20 | 0.28  |  0.33   | 0.27  |  0.37    (secs for 2000 calls)
 100 | 0.64  |  1.06   | 0.64  |  1.24    (secs for 300 calls)
 500 | 0.83  |  1.10   | 0.84  |  1.18    (secs for 4 calls)

Solving system of linear equations
==================================
     |   contiguous    | non-contiguous
----------------------------------------------
size | scipy | Numeric | scipy | Numeric
  20 | 0.26  |  0.18   | 0.26  |  0.21    (secs for 2000 calls)
 100 | 0.31  |  0.35   | 0.31  |  0.52    (secs for 300 calls)
 500 | 0.33  |  0.34   | 0.35  |  0.41    (secs for 4 calls)

Remark: both scipy and Numeric are linked against the same ATLAS/Lapack library. Pearu From mdehoon at ims.u-tokyo.ac.jp Sun Mar 13 18:07:03 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 13 18:07:03 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <423436E5.1070407@ims.u-tokyo.ac.jp> Message-ID: <4234F187.2030506@ims.u-tokyo.ac.jp> Pearu Peterson wrote: >> If I understand Travis correctly, the idea is to use Numeric as the >> basis for scipy_core, allowing current Numerical Python users to >> switch to scipy_core with a minimum of trouble. So why not use >> Numeric's lite lapack library directly? What is the advantage of >> repeating the c code wrapping (by f2py or by hand)? > > > First, I wouldn't repeat wrapping c codes by hand. > But using f2py wrappers has the following advantages: OK, I'm convinced. From a user perspective, it's important that the scipy_core linear algebra looks and feels like the Numerical Python linear algebra package. So if a user does
>>> from LinearAlgebra import myfavoritefunction
s/he should not note any difference other than "hey, my favorite function seems to be running faster now!" --Michiel.
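Pearu's numbers are machine- and build-specific, but the harness is simple to recreate. A rough sketch using the standard timeit module; numpy.linalg here is only a stand-in for whichever linalg implementation is being timed, and the size and call count mirror one row of the determinant table:

    import timeit

    setup = "import numpy as np; a = np.random.rand(100, 100)"
    t = timeit.Timer("np.linalg.det(a)", setup=setup)
    print(min(t.repeat(repeat=3, number=300)))  # secs for 300 calls, best of 3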
From stephen.walton at csun.edu Mon Mar 14 15:13:10 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Mar 14 15:13:10 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <4231076E.6090507@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> <4231076E.6090507@ims.u-tokyo.ac.jp> Message-ID: <42361A2D.2030708@csun.edu> Michiel Jan Laurens de Hoon wrote: > I agree that Netlib should be in SciPy. But why should Netlib be in > scipy_base? It should not, and I'm sorry if my original message made it sound like I was advocating for that. I was mainly advocating for f2py to be in scipy_base. From juenglin at cs.pdx.edu Mon Mar 14 17:51:19 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Mon Mar 14 17:51:19 2005 Subject: [Numpy-discussion] Half baked C API? Message-ID: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> I recently took a closer look at Numeric's and numarray's C APIs for the first time and was surprised not to find the counterparts for all the array functions that are available in the Python API. Did I overlook anything, or do I really have to re-implement things like 'sum', 'argmax', 'convolve', 'cos' in C? Ralf From nwagner at mecha.uni-stuttgart.de Mon Mar 14 23:55:39 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Mar 14 23:55:39 2005 Subject: [Numpy-discussion] cvs access is broken Message-ID: <423694C7.4020506@mecha.uni-stuttgart.de> cvs access is broken:

cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -P Numerical
cvs [checkout aborted]: unrecognized auth response from cvs.sourceforge.net: M PserverBackend::PserverBackend() Connect (Connection refused)

From konrad.hinsen at laposte.net Tue Mar 15 00:23:39 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 00:23:39 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> Message-ID: <0112f75f1a55b092a59733925faf2056@laposte.net> On 15.03.2005, at 02:50, Ralf Juengling wrote: > I recently took a closer look at Numeric's and numarray's C APIs for > the first time and was surprised not to find the counterparts > for all the array functions that are available in the Python API. > > Did I overlook anything, or do I really have to re-implement > things like 'sum', 'argmax', 'convolve', 'cos' in C? Can you think of a real-life situation where you would want to call these from C? Usually, C modules using arrays are written to add functionality that can not be expressed efficiently in terms of existing array operations. If you want to compose things like sum and argmax, you can do that in Python. Note also that if you do need to call these routines from your C code, you can always do so via the generic Python API for calling Python functions. However, reimplementing them in C may often turn out to be simpler - doing sum() in C is really a trivial piece of work. Konrad.
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cjw at sympatico.ca Tue Mar 15 04:47:24 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 15 04:47:24 2005 Subject: [Numpy-discussion] SourceForge.net: A04. Site Status (en) Message-ID: <4236D8D8.4010900@sympatico.ca> Travis, In view of the difficulty at Sourceforge (see below), would it make sense to make the draft PEP available on the Python site? I haven't been able to read the most recent update. Colin W. http://sourceforge.net/docman/display_doc.php?group_id=1&docid=2352#1107968334 From juenglin at cs.pdx.edu Tue Mar 15 09:13:30 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 09:13:30 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <0112f75f1a55b092a59733925faf2056@laposte.net> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> Message-ID: <423715A6.40601@cs.pdx.edu> konrad.hinsen at laposte.net wrote: >> Did I overlook anything, or do I really have to re-implement >> things like 'sum', 'argmax', 'convolve', 'cos' in C? > > > Can you think of a real-life situation where you would want to call > these from C? Usually, C modules using arrays are written to add > functionality that can not be expressed efficiently in terms of > existing array operations. If you want to compose things like sum and > argmax, you can do that in Python. Yes. Think of dynamic programming algorithms like forward, backward, and viterbi for Hidden Markov Models. In this case you cannot avoid a loop over one axis, yet the code in the loop can be expressed in a few lines by matrix operations like 'dot', 'sum', 'outerproduct', 'argmax', elementwise multiplication, etc. As an example, the forward algorithm can be written as

alpha[0] = P_YcS[y[0]]*P_S0
gamma[0] = sum(alpha[0])
alpha[0] /= gamma[0]
for t in xrange(1, T):
    P_ScY_1_prev = dot(P_ScS, alpha[t-1])
    P_SYcY_1_prev = P_YcS[y[t]]*P_ScY_1_prev
    gamma[t] = sum(P_SYcY_1_prev)
    alpha[t] = P_SYcY_1_prev/gamma[t]

> Note also that if you do need to call these routines from your C code, > you can always do so via the generic Python API for calling Python > functions. However, reimplementing them in C may often turn out to be > simpler - doing sum() in C is really a trivial piece of work. Sure, many array functions like 'sum' are easy to implement in C, but I don't want to, if I don't have to for performance reasons. In an ideal world, I'd spend most of my time prototyping the algorithms in Python, and then, if performance is a problem, translate parts to C (or hand them to weave.blitz) with only minor changes to the prototype code. And if that's still not fast enough, then I'd go and rethink the problem in C. Referring to the example above, I'd also want an optimized BLAS implementation of 'dot' to be used if available, and the scipy_core substitute version if not. So yeah, I claim that, to make weave a truly useful tool, all array functions of the Python API should also be available in the C API. Maybe having a few specialized versions, e.g., for contiguous arrays of double floats, would be a good idea, too. Ralf > > Konrad.
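Ralf's fragment assumes model arrays defined elsewhere; a self-contained version, with small hypothetical parameters and today's NumPy spellings of dot and sum, might look like this (P_ScS is the state-transition matrix with columns summing to one, P_YcS holds one row of emission probabilities per symbol, P_S0 is the initial state distribution):

    import numpy as np

    P_ScS = np.array([[0.9, 0.2],    # P(s_t = i | s_{t-1} = j); columns sum to 1
                      [0.1, 0.8]])
    P_YcS = np.array([[0.7, 0.1],    # row y: P(y | s) for each state s
                      [0.3, 0.9]])
    P_S0 = np.array([0.5, 0.5])      # initial state distribution
    y = [0, 1, 1, 0]                 # an observed symbol sequence
    T = len(y)

    alpha = np.zeros((T, 2))
    gamma = np.zeros(T)
    alpha[0] = P_YcS[y[0]] * P_S0
    gamma[0] = alpha[0].sum()
    alpha[0] /= gamma[0]
    for t in range(1, T):
        P_ScY_1_prev = np.dot(P_ScS, alpha[t - 1])
        P_SYcY_1_prev = P_YcS[y[t]] * P_ScY_1_prev
        gamma[t] = P_SYcY_1_prev.sum()
        alpha[t] = P_SYcY_1_prev / gamma[t]
    # np.log(gamma).sum() is then the log-likelihood of the sequence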
> -- > ------------------------------------------------------------------------ > ------- > Konrad Hinsen > Laboratoire Leon Brillouin, CEA Saclay, > 91191 Gif-sur-Yvette Cedex, France > Tel.: +33-1 69 08 79 25 > Fax: +33-1 69 08 82 61 > E-Mail: khinsen at cea.fr > ------------------------------------------------------------------------ > ------- From konrad.hinsen at laposte.net Tue Mar 15 09:53:52 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 09:53:52 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <423715A6.40601@cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> <423715A6.40601@cs.pdx.edu> Message-ID: <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> On Mar 15, 2005, at 18:04, Ralf Juengling wrote: > Yes. Think of dynamic programming algorithms like forward, backward, > and viterbi for Hidden Markov Models. In this case you cannot avoid > a loop over one axis, yet the code in the loop can be expressed in > a few lines by matrix operations like 'dot', 'sum', 'outerproduct', How much do you expect to gain compared to a Python loop in such a case? > I don't want to, if I don't have to for performance reasons. In an ideal > world, I'd spend most of my time prototyping the algorithms in Python, > and then, if performance is a problem, translate parts to C (or hand > them to weave.blitz) with only minor changes to the prototype code. Did you consider Pyrex? It lets you move from pure Python to pure C with Python syntax, mixing both within a single function. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From oliphant at ee.byu.edu Tue Mar 15 09:54:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 15 09:54:35 2005 Subject: [Numpy-discussion] SourceForge.net: A04. Site Status (en) In-Reply-To: <4236D8D8.4010900@sympatico.ca> References: <4236D8D8.4010900@sympatico.ca> Message-ID: <423720EB.5070604@ee.byu.edu> Colin J. Williams wrote: > Travis, > > In view of the difficulty at Sourceforge (see below), would it make > sense to make the draft PEP available on the Python site? > > I haven't been able to read the most recent update. > A recent copy of the PEP is always here: http://numeric.scipy.org/PEP.txt From juenglin at cs.pdx.edu Tue Mar 15 10:26:56 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 10:26:56 2005 Subject: [Numpy-discussion] Half baked C API? Message-ID: <423726FC.5040709@cs.pdx.edu> konrad.hinsen at laposte.net wrote: > > How much do you expect to gain compared to a Python loop in such > a case? I'd expect a factor 5 to 10. > > Did you consider Pyrex? It lets you move from pure Python to pure C > with Python syntax, mixing both within a single function. I looked at it, but haven't tried it out yet. As far as I understand it, if I'd give Pyrex the example code in my previous posting to translate it to C, the result would contain calls to the Python interpreter to have it evaluate unknown functions like 'dot', 'sum' etc. That would be quite slow. So besides having counterparts in the C API, the tool that does the translation also needs to know about those.
Ralf From oliphant at ee.byu.edu Tue Mar 15 10:30:53 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 15 10:30:53 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> Message-ID: <42372930.8060301@ee.byu.edu> Ralf Juengling wrote: >I recently took a closer look at Numeric's and numarray's C APIs for >the first time and was surprised not to find the counterparts >for all the array functions that are available in the Python API. > > > How much to support on the C-API level is a question I am interested in right now. I have mixed feelings. On the one hand, it is much simpler for the array package developer (which from my perspective seems to be the short end of the current stick) to have a reduced C-API and require individuals who want to access the functionality to go through the PyObject_CallMethod approach. We could perhaps provide a single function that made this a little simpler for arrays. On the other hand, a parallel API that made available everything that was present in Python might "look nicer," be a little faster, and make it easier on the extension writer. I'm interested in opinions, -Travis From juenglin at cs.pdx.edu Tue Mar 15 10:34:19 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 10:34:19 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> <423715A6.40601@cs.pdx.edu> <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> Message-ID: <423728E0.7050500@cs.pdx.edu> konrad.hinsen at laposte.net wrote: > Did you consider Pyrex? It lets you move from pure Python to pure C with > Python syntax, mixing both within a single function. Forgot to say, I very much like the Pyrex approach, though. Ralf From perry at stsci.edu Tue Mar 15 11:03:57 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Mar 15 11:03:57 2005 Subject: [Numpy-discussion] Half baked C API?
In-Reply-To: <423726FC.5040709@cs.pdx.edu> References: <423726FC.5040709@cs.pdx.edu> Message-ID: <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote: > konrad.hinsen at laposte.net wrote: > > > > How much do you expect to gain compared to a Python loop in such > > a case? > > I'd expect a factor 5 to 10. > How did you come to that conclusion? It's not at all clear to me that the overhead of the Python operation (i.e., calling the appropriate Python method or function from C) will add appreciably to the time it takes to call it from C. Remember, the speed of the C version of the Python function may have much more overhead than what you envision for an equivalent C function that you would write. So it isn't good enough to compare the speed of a python loop to the C code to do sum and dot that you would write. Adding these to the API is extra work, and worse, it perhaps risks making it harder to change the internals since so much more of what is in C is exposed. The current API is essentially centered around exposing the data and means of converting and copying the data, and to a lesser extent, building new UFuncs (for use at the Python level). Perry From juenglin at cs.pdx.edu Tue Mar 15 14:34:55 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 14:34:55 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> Message-ID: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> On Tue, 2005-03-15 at 11:03, Perry Greenfield wrote: > On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote: > > konrad.hinsen at laposte.net wrote: > > > > > > How much do you expect to gain compared to a Python loop in such > > > a case? > > > > I'd expect a factor 5 to 10. > > > > How did you come to that conclusion? It's not at all clear to me that > the overhead of the Python operation (i.e., calling the appropriate > Python method or function from C) will add appreciably to the time it > takes to call it from C. Good question. Per experiment and profiling I found that I could speed up the code by redefining a few functions, e.g., by setting

dot = multiarray.matrixmultiply
sum = add.reduce

and rewriting outerproduct as an array multiplication (using appropriately reshaped arrays; outerproduct does not occur in forward but in another HMM function). I got a speedup close to 3 over my prototype implementation for the Baum-Welch algorithm (which calls forward). The idea is to specialize a function and avoid dispatching code in the loop. I guess that a factor of 5 to 10 is reasonable to achieve by specializing other functions in the loop, too. > Remember, the speed of the C version of the > Python function may have much more overhead than what you envision for > an equivalent C function that you would write. Yes, because of argument checking and dispatching code. I have not studied the implementation of Numeric, but I assume that there are different specialized implementations (for performance reasons) of array functions. To have an example, let's say that there are three special implementations for '*', for the special cases
a) both arguments contiguous and of same shape
b) both arguments contiguous but of different shape
c) otherwise
The __mul__ method then has to examine its arguments and dispatch to one of the specialized implementations a), b) or use the generic one c). If I know in advance that both arguments are contiguous and of same shape, then, in a C implementation, I could call a) directly and avoid calling the dispatching code 10000 times in a row. Since the specialized implementations are already there (presumably), the real work in extending the C API is design, i.e., to expose them in a principled way. Please don't get me wrong, I'm not saying that this is an easy thing to do. If you think that this idea is too far off, consider Pyrex. The idea behind Pyrex is essentially the same: you take advantage of special cases by annotating variables. So far this only concerns the type of object, but it is conceivable to extend it to array properties like contiguity. > Adding these to the API is extra work, and worse, > it perhaps risks making it harder to change the internals since so much > more of what is in C is exposed. That's a good point. > The current API is essentially > centered around exposing the data and means of converting and copying > the data, and to a lesser extent, building new UFuncs (for use at the > Python level). Yes. The question is whether it should be more than just that. I believe that, currently, when somebody decides to move a significant portion of numerical code from Python to C, he or she will likely end up writing (specialized versions of) things like 'sum' and 'dot'.
But shouldn't those things be provided by a programming environment for scientific computing? Does Scipy have, for instance, a documented C interface to blas and lapack functions? You answer, "Well, there is CBLAS and CLAPACK already." Yes, but by the same argument that pushes Travis to reconsider what should go into scipy_core: it would be nice to be able to use the blas_lite and lapack_lite functions if they cover my needs, and to tell my client, "All else you need to have installed is Python and scipy_core." Ralf From mdehoon at ims.u-tokyo.ac.jp Tue Mar 15 17:37:16 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 15 17:37:16 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> Message-ID: <42378D2E.9010905@ims.u-tokyo.ac.jp> Ralf Juengling wrote: > I believe that, currently, when somebody decides to move a > significant portion of numerical code from Python to C, he or > she will likely end up writing (specialized versions of) things > like 'sum', and 'dot'. But shouldn't those things be provided by > a programming environment for scientific computing? > > Does Scipy have, for instance, a documented C interface to blas > and lapack functions? You answer, "Well, there is CBLAS and > CLAPACK already." Yes, but by the same argument that pushes > Travis to reconsider what should go into scipy_core: it would be > nice to be able to use the blas_lite and lapack_lite functions > if they cover my needs, and to tell my client, "All else you > need to have installed is Python and scipy_core." > I am not sure about the particular case Ralf is considering, but in the past I have been in the situation that I wanted to access algorithms in Numerical Python (such as blas or lapack) at the C level and I couldn't find a way to do it. Note that for ranlib, the header files are actually installed as Numeric/ranlib.h, but as far as I know it is not possible to link a C extension module to Numerical Python's ranlib at the C level. So I would welcome what Ralf is suggesting. --Michiel From Fernando.Perez at colorado.edu Tue Mar 15 17:49:23 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Tue Mar 15 17:49:23 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <42310D7D.3000009@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> Message-ID: <42379006.4070501@colorado.edu> Michiel Jan Laurens de Hoon wrote: > Perry Greenfield wrote: [weave & f2py in the core] >>That they are perfectly fine to go into the core. In fact, if they are >>used by any of the extra packages, they should be in the core to >>eliminate the extra step in the installation of those packages. >> > > -0. > 1) "In der Beschraenkung zeigt sich der Meister" -- the master shows himself in limitation. In other words, avoid > software bloat. > 2) f2py is a Fortran-Python interface generator, once the interface is > created there is no need for the generator. > 3) I'm sure f2py is useful, but I doubt that it has very general > usefulness. There are lots of other useful Python packages, but we're > not including them in scipy-core either.
> 4) f2py and weave don't fit in well with the rest of scipy-core, which > is mainly standard numerical algorithms. I'd like to argue that these two tools are actually critically important in the core of a python-for-scientific-computing toolkit, at its most basic layer. The reason is that python's dynamic runtime type checking makes it impossible to write efficient loop-based code, as we all know. And it is not always feasible to write all algorithms in terms of Numeric vector operations: sometimes you just need to write an indexed loop. At this point, the standard python answer is 'go write an extension module'. While writing extension modules by hand, from scratch, is not all that hard, it certainly presents a significant barrier for less experienced programmers. And yet both weave and f2py make it incredibly easy to get working compiled array code in no time at all. I say this from direct experience, having pointed colleagues to weave and f2py for this very problem. After handing them some notes I have to get started, they've come back saying "I can't believe it was that easy: in a few minutes I had sped up the loop I needed with a bit of C, and now I can continue working on the problem I'm interested in". I know for a fact that if I'd told them to write a full extension module by hand, the result would have been quite different. The reality is that, in scientific work, you are likely to run into this problem at a very early stage, much more so than for other kinds of python usage. For this reason, it is important that the basic toolset provides a clean solution from the start. At least that's been my experience. Regards, f From Alexandre.Fayolle at logilab.fr Tue Mar 15 23:04:26 2005 From: Alexandre.Fayolle at logilab.fr (Alexandre) Date: Tue Mar 15 23:04:26 2005 Subject: [Numpy-discussion] Python implementation of HMM In-Reply-To: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> Message-ID: <20050316070254.GC21421@crater.logilab.fr> On Tue, Mar 15, 2005 at 02:32:06PM -0800, Ralf Juengling wrote: > dot = multiarray.matrixmultiply > sum = add.reduce > and rewriting outerproduct as an array multiplication (using > appropriately reshaped arrays; outerproduct does not occur in > forward but in another HMM function) > > I got a speedup close to 3 over my prototype implementation for the > Baum-Welch algorithm (which calls forward). The idea is to specialize > a function and avoid dispatching code in the loop. I guess that a > factor of 5 to 10 is reasonable to achieve by specializing other > functions in the loop, too. Hi, this is only side-related to your problem, but are you aware of the existence of http://www.logilab.org/projects/hmm/ ? It may not be very fast (we mainly looked for clarity in the code, and ended with something "fast enough" for our needs), but maybe it will match yours. Or it may provide a starting point for your implementation. -- Alexandre Fayolle LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From konrad.hinsen at laposte.net Tue Mar 15 23:50:51 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 23:50:51 2005 Subject: [Numpy-discussion] Half baked C API?
In-Reply-To: <42378D2E.9010905@ims.u-tokyo.ac.jp> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp> Message-ID: On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote: > do it. Note that for ranlib, the header files are actually installed > as Numeric/ranlib.h, but as far as I know it is not possible to link a > C extension module to Numerical Python's ranlib at the C level. So I > would welcome what Ralf is suggesting. > That's not possible in a portable way, right. For those reasons I usually propose a C API in my C extension modules (Scientific.IO.NetCDF and Scientific.MPI for example) that is accessible through C pointer objects in Python. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Tue Mar 15 23:53:48 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 23:53:48 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <42379006.4070501@colorado.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <42379006.4070501@colorado.edu> Message-ID: <3ea05ed9734066a936c970baabb327ef@laposte.net> On 16.03.2005, at 02:46, Fernando Perez wrote: > The reality is that, in scientific work, you are likely to run into > this problem at a very early stage, much more so than for other kinds > of python usage. For this reason, it is important that the basic > toolset provides a clean solution from the start. One can in fact argue that f2py, weave, and other tools (Pyrex comes to mind) are the logical extensions of Distutils, which is part of the Python core. As long as they can be installed without additional requirements (in particular, the compilers that they need to work), I don't mind having them in the core distribution, though I would still have them as logically separate packages (i.e. not scipy.core.f2py but scipy.f2py). Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From antti.korvenoja at helsinki.fi Wed Mar 16 00:15:21 2005 From: antti.korvenoja at helsinki.fi (Antti Korvenoja) Date: Wed Mar 16 00:15:21 2005 Subject: [Numpy-discussion] Record array field dimensions Message-ID: <1110960798.4237ea9eb7e69@www2.helsinki.fi> Hi! I was surprised to find out that when I read a field from file into a record array with format string '8f4' the corresponding field has dimension (1,8) and not (8,) that I would intuitively expect. Am I possibly specifying the format incorrectly? If not, why is there an extra dimension?
Antti Korvenoja From hsu at stsci.edu Wed Mar 16 08:41:18 2005 From: hsu at stsci.edu (Jin-chung Hsu) Date: Wed Mar 16 08:41:18 2005 Subject: [Numpy-discussion] Re: Record array dimension Message-ID: <42386135.9090305@stsci.edu> > I was surprised to find out that when I read a field from file into a record > array with format string "8f4" the corresponding field has dimension (1,8) > and not (8,) that I would intuitively expect. Am I possibly specifying the > format incorrectly? If not, why is there an extra dimension? The first dimension(s) in a record array always refer to the number of "rows". So, if you have:

>>> import numarray.records as rec
>>> r = rec.array(formats='f4', shape=8)
>>> r.field(0).shape
(8,)

which will be what you might have expected. But if

>>> r = rec.array(formats='8f4', shape=10)
>>> r.field(0).shape
(10, 8)

I assume you have shape=1, and that's why you get (1, 8) for the field shape. You can have complicated tables like:

--> r = rec.array(formats='(4,5)f4', shape=(2,3))
--> r.field(0).shape
(2, 3, 4, 5)

So, the field shape always has the record array's shape as first dimension(s). JC Hsu From konrad.hinsen at laposte.net Wed Mar 16 13:26:37 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 16 13:26:37 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <42388FBF.4000004@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> <42361B39.6090300@csun.edu> <42388FBF.4000004@csun.edu> Message-ID: On 16.03.2005, at 20:57, Stephen Walton wrote: > Well, how much is their time worth is then the question? If they can > afford Not exactly. The first question they ask is how much is the code worth that requires them to buy a Fortran license or to install g77. They might then well choose my competitor's code that doesn't have such requirements. It doesn't help if I tell them that my code doesn't require Fortran at all, but that it relies on a library that can't be installed without a Fortran compiler. > an SGI workstation, I'd think they could afford a copy of SGI's > compiler, From memory, a compiler license is about half the price of a workstation. Depending on particular circumstances (special offers, campus licenses, etc.) the prices can be lower. > I might also point out, not to be argumentative, that the real > difficulty is installing gcc. The extra effort to install g77 once > gcc is working is very small. Are the users you mention doing without > C as well? I guess some do. On all the workstations I have ever used, there was a minimal C compiler for recompiling the kernel, which was also good enough for installing software, though not necessarily a pleasure for development. > Back when I was administering HP-UX, I found a community supported > archive with contributed pre-compiled software in HP's package format. > Is there a similar thing in the SGI community? I don't know. I have used SGI machines for only two years (ten years ago), and in an environment where compilers and development tools were considered necessary. Konrad.
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From mdehoon at ims.u-tokyo.ac.jp Wed Mar 16 16:53:57 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 16 16:53:57 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp> Message-ID: <4238D495.3030402@ims.u-tokyo.ac.jp> konrad.hinsen at laposte.net wrote: > On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote: > >> do it. Note that for ranlib, the header files are actually installed >> as Numeric/ranlib.h, but as far as I know it is not possible to link >> a C extension module to Numerical Python's ranlib at the C level. So >> I would welcome what Ralf is suggesting. >> > That's not possible in a portable way, right. I'm not sure why that wouldn't be portable, since we wouldn't be distributing binaries. The idea is that both a ranlib/blas/lapack library and the extension module are compiled when installing Numerical Python, installing the library in /usr/local/lib/python2.4/Numeric (and the module as usual in /usr/local/lib/python2.4/site-packages/Numeric). Extension modules that want to use ranlib/blas/lapack at the C level can then use the include file from /usr/local/include/python2.4/Numeric and link to the library in /usr/local/lib/python2.4/Numeric. Well maybe I'm missing something basic here ... --Michiel. From oliphant at ee.byu.edu Wed Mar 16 20:27:29 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 16 20:27:29 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays Message-ID: <423906B5.9080501@ee.byu.edu> One item I have not received a lot of feedback on is the new proposal for a greatly increased number of methods on the ndarray. The current PEP has a listing of all the proposed methods and attributes (some more were added after consulting current numarray in more detail and looking at all the functions in current Numeric.py). If a function call essentially involved an arrayobject with some other parameters then it was turned into a method. If it involved two "equal" arrays then it was left as a function. This is a somewhat arbitrary convention, and so I am asking for suggestions as to what should be methods. Should all the ufuncs be methods as well? I think Konrad suggested this. What is the opinion of others? The move from functions to methods will mean that some of the function calls currently in Numeric.py will be redundant, but I think they should stay there for backwards compatibility (perhaps with a deprecation warning...) A final question: I think we need to think carefully about multidimensional indexing so that it replaces current usage of take, put, putmask. For example, how, in numarray, would you replace take(a,[1,5,10],axis=-2) if a is a 10x20x30 array? Note that in this case take returns a 10x3x30 array (call it g) with

g[:,0,:] = a[:,1,:]
g[:,1,:] = a[:,5,:]
g[:,2,:] = a[:,10,:]

I submit that a[...,[1,5,10],:] would be an appropriate syntax. This would mean changing the current PEP a bit.
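Travis's proposed equivalence can be checked against today's NumPy, which ended up implementing it; the array and shapes below are the ones from his example:

    import numpy as np

    a = np.arange(10 * 20 * 30).reshape(10, 20, 30)
    g = a[..., [1, 5, 10], :]
    print(g.shape)                                              # (10, 3, 30)
    print(np.array_equal(g, np.take(a, [1, 5, 10], axis=-2)))   # True
    print(np.array_equal(g[:, 0, :], a[:, 1, :]))               # True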
-Travis From mdehoon at ims.u-tokyo.ac.jp Wed Mar 16 20:48:16 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 16 20:48:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <42390B9E.2070203@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? > Just to make sure I understand the proposal correctly. Does this mean that

>>> anotherarray = sin(myarray)

becomes

>>> anotherarray = myarray.sin()

? --Michiel. From rkern at ucsd.edu Wed Mar 16 21:10:00 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Mar 16 21:10:00 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <423910D8.8050109@ucsd.edu> Travis Oliphant wrote: > > One item I have not received a lot of feedback on is the new proposal > for a greatly increased number of methods on the ndarray. > > The current PEP has a listing of all the proposed methods and attributes > (some more were added after consulting current numarray in more detail > and looking at all the functions in current Numeric.py) > > If a function call essentially involved an arrayobject with some other > parameters then it was turned into a method. If it involved two "equal" > arrays then it was left as a function. This is a somewhat arbitrary > convention, and so I am asking for suggestions as to what should be > methods. > > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? I'm too lazy to search right now, but I'm pretty sure that Konrad suggested the opposite: that x.sin(), while possibly "cleaner" in an OO-fetishistic sense, jars too much against the expectation of sin(x) that all of us got accustomed to in math class. Maybe I should let him speak for himself, though. :-) I think the division you have listed in the PEP is a reasonable one. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Fernando.Perez at colorado.edu Wed Mar 16 21:17:16 2005 From: Fernando.Perez at colorado.edu (Fernando.Perez at colorado.edu) Date: Wed Mar 16 21:17:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423910D8.8050109@ucsd.edu> References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: <1111036570.4239129a4fe43@webmail.colorado.edu> Quoting Robert Kern : > Travis Oliphant wrote: > > Should all the ufuncs be methods as well? I think Konrad suggested > > this. What is the opinion of others? > > I'm too lazy to search right now, but I'm pretty sure that Konrad > suggested the opposite: that x.sin(), while possibly "cleaner" in an > OO-fetishistic sense, jars too much against the expectation of sin(x) > that all of us got accustomed to in math class. Maybe I should let him > speak for himself, though. :-) I certainly cringe at the sight of its_a.sin(). One of the advantages of python is that it doesn't impose any one methodology for software development: while an OO approach may be great for allowing arbitrary function-like objects to be callable, for example, it doesn't mean that everything under the sun has to become a method call.
And, though I'm as lazy as Robert (in fact, I've proven to be lazier than him in the past), my memory also tells me that Konrad's mathematical sensibilities lean in the direction of sin(x). Best, f From oliphant at ee.byu.edu Wed Mar 16 21:56:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 16 21:56:16 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core Message-ID: <42391B6E.8060709@ee.byu.edu> I wanted to let people who may be waiting know that now is a good time to help with numeric3. The CVS version builds (although I'm sure there are still bugs), but more eyes could help me track them down. Currently, all that remains for the arrayobject is to implement the newly defined methods (really it's just a re-organization and re-inspection of the code in multiarraymodule.c to call it using methods). I also need to check the multidimensional slicing syntax when mixed with ellipses and slice objects so that take can be (functionally) replaced with multidimensional slicing. Any input on this would be appreciated. I'm referring to the fact that I think that a[...,ind,:] should be equivalent to take(a,ind,axis=-2). But, this necessitates some re-thinking about what partial indexing returns. What should a[:,ind1,:,ind2,:] return if a is a five-dimensional array? Currently, the proposed PEP for partial indexing always has the result as the broadcasted shape of ind1 and ind2 + the dimensionality of the un-indexed subspace. In other words, the unindexed subspace shape is always appended to the end of the result shape. I think this is wrong at least for the case of 1 indexing array because it does not let a[...,ind,:] be a replacement for take(a,ind,axis=-2). Is it wrong for more than 1 indexing array? To clarify the situation: Suppose X has shape (10,20,30,40,50) and suppose ind1 and ind2 are both broadcastable to the shape (2,3,4). Note for reference that take(X,ind1,axis=-2).shape returns (10,20,30,2,3,4,50). Now, according to the current proposal:

X[..., ind1, :] will return a (2,3,4,10,20,30,50) --- I think this should be changed to return the same as take....
X[ind1, ind1, ind1, ind1, ind1] will return a (2,3,4) array (all dimensions are indexed) --- O.K.
X[ind1, ind1, ind1, ind1] will return a (2,3,4,50) array
X[ind1, ind1, :, ind1, ind1] will return a (2,3,4,30) array
X[...,ind1,ind1,ind1] returns a (2,3,4,10,20) array --- is this right?
X[:,ind1,:,ind2,:] returns a (2,3,4,10,30,50) array
result[i,j,k,:,:,:] = X[:,ind1[i,j,k],:,ind2[i,j,k],:]

So, here's the issue (if you are not familiar with the concept of subspace you can replace the word subspace with "shape tuple" in the following):
- indexing with multidimensional index arrays under the numarray-introduced scheme (which seems reasonable to me) creates a single "global" subspace for all of the index arrays provided (i.e. there is no implied outer-product).
- When there is a single index array it is unambiguous to replace the single-axis subspace with the index array subspace: i.e. X[...,ind1,:] can replace the second-to-last axis shape with the ind1.shape to get a (10,20,30,2,3,4,50) array.
- Where there is more than one index array, what should replace the single-axis subspaces that the indexes are referencing? Remember, all of the single-axis subspaces are being replaced with one "global" subspace. The current proposal states that this indexing subspace should be placed first and the "remaining subspaces" pasted in at the end.
Is this acceptable, or can someone see a problem?
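For what it's worth, modern NumPy adopted exactly the placement rule described here, so the proposal can be exercised directly; ind1 and ind2 are arbitrary (2,3,4) integer arrays, not anything from the PEP:

    import numpy as np

    X = np.zeros((10, 20, 30, 40, 50))
    ind1 = np.zeros((2, 3, 4), dtype=int)
    ind2 = np.zeros((2, 3, 4), dtype=int)
    # non-consecutive index arrays: broadcast subspace first, rest appended
    print(X[:, ind1, :, ind2, :].shape)   # (2, 3, 4, 10, 30, 50)
    # consecutive index arrays (taken up in the follow-up below): in-place subspace
    print(X[:, ind1, ind2, :, :].shape)   # (10, 2, 3, 4, 40, 50)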
Best regards, -Travis From oliphant at ee.byu.edu Wed Mar 16 22:33:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 16 22:33:50 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <42391B6E.8060709@ee.byu.edu> References: <42391B6E.8060709@ee.byu.edu> Message-ID: <42392481.1010701@ee.byu.edu> Travis Oliphant wrote: > > - Where there is more than one index array, what should replace the > single-axis subspaces that the indexes are referencing? Remember, > all of the single-axis subspaces are being replaced with one "global" > subspace. The current proposal states that this indexing subspace > should be placed first and the "remaining subspaces" pasted in at the > end. > > Is this acceptable, or can someone see a problem?? Answering my own question... I think that it makes sense to do a direct subspace replacement whenever the indexing arrays are right next to each other. In other words, I would just extend the "one-index array" rule to "all-consecutive-index-arrays", where of course one index array satisfies the all-consecutive requirement. Hence in the previous example: X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) array with the (20,30)-subspace being replaced by the (2,3,4) indexing subspace. result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] Any other thoughts? (I think I will implement this initially by just using swapaxes on the current implementation...) -Travis From cjw at sympatico.ca Thu Mar 17 04:10:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 17 04:10:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <4239734B.6060701@sympatico.ca> Travis Oliphant wrote: > > One item I have not received a lot of feedback on is the new proposal > for a greatly increased number of methods on the ndarray. > > The current PEP has a listing of all the proposed methods and > attributes (some more were added after consulting current numarray in > more detail and looking at all the functions in current Numeric.py) > > If a function call essentially involved an arrayobject with some other > parameters then it was turned into a method. This seems a good idea. I would suggest going a step further. If a method has no parameters then make it a property and adopt the naming convention that the property names start with an upper case character, e.g. Cos. In other words, drop the redundant parentheses. > If it involved two "equal" arrays then it was left as a function. > This is a somewhat arbitrary convention, and so I am asking for > suggestions as to what should be methods. Why change from the above proposal? > > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? > Yes, modified as suggested above. > > > The move from functions to methods will mean that some of the function > calls currently in Numeric.py will be redundant, but I think they > should stay there for backwards compatibility, (perhaps with a > deprecation warning...) Yes, the same for numarray. Deprecation, with dropping from some future version. > > > A final question: > I think we need to think carefully about multidimensional indexing so > that it replaces current usage of take, put, putmask. > > For example, how, in numarray would you replace > take(a,[1,5,10],axis=-2) if a is a 10x20x30 array?
> Note that in this case take returns a 10x3x30 array (call it g) with
> g[:,0,:] = a[:,1,:]
> g[:,1,:] = a[:,5,:]
> g[:,2,:] = a[:,10,:]
> I submit that a[...,[1,5,10],:] would be an appropriate syntax. > This would mean changing the current PEP a bit. What is the benefit of the ellipsis here? It seems to serve the same purpose as the colon. Colin W. > > -Travis From cjw at sympatico.ca Thu Mar 17 04:12:15 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 17 04:12:15 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <42390B9E.2070203@ims.u-tokyo.ac.jp> References: <423906B5.9080501@ee.byu.edu> <42390B9E.2070203@ims.u-tokyo.ac.jp> Message-ID: <423973C0.1090901@sympatico.ca> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Should all the ufuncs be methods as well? I think Konrad suggested >> this. What is the opinion of others? >> > Just to make sure I understand the proposal correctly. Does this mean > that
> >>> anotherarray = sin(myarray)
> becomes
> >>> anotherarray = myarray.sin()
> ? Or even myarray.Sin? > > --Michiel. From tkorvola at welho.com Thu Mar 17 05:41:52 2005 From: tkorvola at welho.com (Timo Korvola) Date: Thu Mar 17 05:41:52 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <42391B6E.8060709@ee.byu.edu> (Travis Oliphant's message of "Wed, 16 Mar 2005 22:53:50 -0700") References: <42391B6E.8060709@ee.byu.edu> Message-ID: <878y4mxuf7.fsf@welho.com> Travis Oliphant writes: > - indexing with multidimensional index arrays under the > numarray-introduced scheme (which seems reasonable to me) It is powerful but likely to confuse Matlab and Fortran users because a[[0,1], [1,2]] is different from a[0:2, 1:3]. I suspect that the most commonly used case of index arrays is a single vector as the first index, which has an intuitively clear meaning regardless of the current ordering issue.
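Timo's distinction can be shown in two lines; the small array is an arbitrary example, using today's NumPy:

    import numpy as np

    a = np.arange(12).reshape(3, 4)
    print(a[[0, 1], [1, 2]])   # pairwise: elements (0,1) and (1,2) -> [1 6]
    print(a[0:2, 1:3])         # block: rows 0-1 x columns 1-2 -> [[1 2] [5 6]]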
-- Timo Korvola From konrad.hinsen at laposte.net Thu Mar 17 09:08:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 17 09:08:07 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423910D8.8050109@ucsd.edu> References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: On 17.03.2005, at 06:08, Robert Kern wrote: > I'm too lazy to search right now, but I'm pretty sure that Konrad > suggested the opposite: that x.sin(), while possibly "cleaner" in an > OO-fetishistic sense, jars too much against the expectation of sin(x) > that all of us got accustomed to in math class. Maybe I should let him > speak for himself, though. :-) I agree. What I suggested is that there should be methods as well as functions, and that the ufuncs should call the methods, such that

Numeric.sin(x)

would simply become syntactic sugar for

x.sin()

whatever the type of x. But I don't expect to see x.sin() in application code, it's just a convenient way of implementing sin() in new classes and subclasses. Actually, x.__sin__() would be a more pythonic choice of method name. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Mar 17 09:13:08 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 17 09:13:08 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <4238D495.3030402@ims.u-tokyo.ac.jp> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp> <4238D495.3030402@ims.u-tokyo.ac.jp> Message-ID: On 17.03.2005, at 01:51, Michiel Jan Laurens de Hoon wrote: > I'm not sure why that wouldn't be portable, since we wouldn't be > distributing binaries. The idea is that both a ranlib/blas/lapack > library and the extension In general, shared library A cannot rely on having access to the symbols of shared library B. So if shared library A (NumPy) wants to make symbols that it got from ranlib or BLAS available to other modules, it must make them available through C objects. > ranlib/blas/lapack at the C level can then use the include file from > /usr/local/include/python2.4/Numeric and link to the library in > /usr/local/lib/python2.4/Numeric. If it is placed there as a standard linkable library, that would of course work, but that would be an additional step in NumPy installation. I am not sure it's a good idea in the long run. I'd rather have libraries of general interest in /usr/local/lib or /usr/lib. Konrad.
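A minimal sketch of the syntactic-sugar dispatch Konrad describes earlier in this batch -- purely illustrative of the idea, not Numeric3's actual mechanism:

    import math

    def sin(x):
        try:
            return x.sin()       # objects that implement their own sine
        except AttributeError:
            return math.sin(x)   # plain Python numbers

With this pattern a new array class only has to define a sin() method and the familiar functional form keeps working for it.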
-- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr -------------------------------------------------------------------------------
From perry at stsci.edu Thu Mar 17 09:50:20 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 09:50:20 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <42392481.1010701@ee.byu.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: Before I delve too deeply into what you are suggesting (or asking), has the idea to have a slice be equivalent to an index array been changed? For example, I recall seeing (I forget where), the suggestion that X[:,ind] is the same as X[arange(X.shape[0]), ind] The following seems to be at odds with this. The confusion of mixing slices with index arrays led me to just not deal with them in numarray. I thought index arrays were getting complicated enough. I suppose it may be useful, but it would be good to give some motivating, realistic examples of why they are useful. For example, I can think of lots of motivating examples for: using more than one index array (e.g., X[ind1, ind2]) allowing index arrays to have arbitrary shape allowing partial indexing with index arrays Though I'm not sure I can think of good examples of arbitrary combinations of these capabilities (though the machinery allows it). So one question: is there a good motivating example for X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not sure I know where that would be commonly used (it would suggest that all the sizes of the sliced dimensions must have consistent lengths, which doesn't seem typical). Anyone have good examples? Perry On Mar 17, 2005, at 1:32 AM, Travis Oliphant wrote: > Travis Oliphant wrote: > >> >> - Where there is more than one index array, what should replace the >> single-axis subspaces that the indexes are referencing? Remember, >> all of the single-axis subspaces are being replaced with one "global" >> subspace. The current proposal states that this indexing subspace >> should be placed first and the "remaining subspaces" pasted in at the >> end. >> >> Is this acceptable, or can someone see a problem?? > > > Answering my own question... > > I think that it makes sense to do a direct subspace replacement > whenever the indexing arrays are right next to each other. In other > words, I would just extend the "one-index array" rule to > "all-consecutive-index-arrays" where of course one index array > satisfies the all-consecutive requirement. > > Hence in the previous example: > > X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) with the > (20,30)-subspace being replaced by the (2,3,4) indexing subspace. > > result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] > > > Any other thoughts? (I think I will implement this initially by just > using swapaxes on the current implementation...)
> > -Travis
From perry at stsci.edu Thu Mar 17 09:54:33 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 09:54:33 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: <351c9d129c25693a82abff95b11e9ed6@stsci.edu> On Mar 17, 2005, at 12:05 PM, konrad.hinsen at laposte.net wrote: > On 17.03.2005, at 06:08, Robert Kern wrote: > >> I'm too lazy to search right now, but I'm pretty sure that Konrad >> suggested the opposite: that x.sin(), while possibly "cleaner" in an >> OO-fetishistic sense, jars too much against the expectation of sin(x) >> that all of us got accustomed to in math class. Maybe I should let >> him speak for himself, though. :-) > > I agree. What I suggested is that there should be methods as well as > functions, and that the ufuncs should call the methods, such that > > Numeric.sin(x) > > would simply become syntactic sugar for > > x.sin() > > whatever the type of x. But I don't expect to see x.sin() in > application code, it's just a convenient way of implementing sin() in > new classes and subclasses. Actually, x.__sin__() would be a more > pythonic choice of method name. > > Konrad. > It would be hard to imagine not allowing the functional form. Users would think we were crazy. (And they'd be right ;-) Perry
From rlw at stsci.edu Thu Mar 17 10:31:23 2005 From: rlw at stsci.edu (Rick White) Date: Thu Mar 17 10:31:23 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <351c9d129c25693a82abff95b11e9ed6@stsci.edu> Message-ID: On Thu, 17 Mar 2005, Perry Greenfield wrote: > On Mar 17, 2005, at 12:05 PM, konrad.hinsen at laposte.net wrote: > > > I agree. What I suggested is that there should be methods as well as > > functions, and that the ufuncs should call the methods, such that > > > > Numeric.sin(x) > > > > would simply become syntactic sugar for > > > > x.sin() > > > > whatever the type of x. But I don't expect to see x.sin() in > > application code, it's just a convenient way of implementing sin() in > > new classes and subclasses. Actually, x.__sin__() would be a more > > pythonic choice of method name. > > > > Konrad. > > It would be hard to imagine not allowing the functional form. Users > would think we were crazy. (And they'd be right ;-) I think the suggestion that ufuncs should call methods behind the scenes is a bad idea. It just doesn't make much sense to me. Doesn't this imply that you have to decorate array objects with another method every time someone adds another 1-argument ufunc? Even if you argue that you only want the methods for some standard set of ufuncs, it seems like a lot of baggage to pile into the array objects. I like the arc hyperbolic sine function, but I can't see why I would expect an array to have either a method x.asinh() or, worse, x.__asinh__()! Maybe I'm misunderstanding something here, but this just sounds like a way to bloat the interface to arrays. Rick
From jh at oobleck.astro.cornell.edu Thu Mar 17 11:00:12 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Thu Mar 17 11:00:12 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays Message-ID: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> I'll start by saying something positive: I am very encouraged by all the work that's going into resolving the small/big array split!
That said, I view a.Sin as a potentially devastating change, if traditional functional notation is not guaranteed to be preserved forever. From a learning user's perspective, having to say a.sin() or a.Sin rather than sin(a) will be enough to make most people stay away from numerical python. I say this from experience: most of what Python does well, guile also did well. After the IDAE BoF at the 1996 ADASS meeting, we considered whether guile would be a better platform than Python. It had a lot of development force behind it, had all the needed features, etc. We asked our audience whether they would use a lisp-based language. There was laughter. The CS people here know that lisp is a "beautiful" language. But nobody uses it, because the syntax is so different from the normal flow of human thought. It is written to make writing an interpreter for it easy, not to be easy to learn and use. I've tried it about 8 times and have given up. Apparently others agree, as guile has been ripped out of many applications, such as Gimp, or at least augmented as an extension language by perl or python, which now get most of the new code. Normal people don't want to warp their brains in order to code. Consider the following:

math: x = (-b +- sqrt(b^2 - 4ac)) / (2a)
IDL: x = (-b + [1.,-1] * sqrt(b*b-4*a*c)) / (2*a)
lisp: (let x (/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a)))

You can verify that you have coded the IDL correctly at a glance. The lisp takes longer, even if you're a good lisp programmer. Now consider the following common astronomical equation:

math: cos(a) = (sin(dec) - sin(alt) sin(lat)) / (cos(alt) cos(lat))
IDL: a = acos((sin(dec) - sin(alt) * sin(lat)) / (cos(alt) * cos(lat)))
proposal: a = ((dec.Sin - alt.Sin * lat.Sin) / (alt.Cos * lat.Cos)).Acos

Readable, but we start to see the problem with the moved .Acos. Now try this:

math: a = e^(sin^2(x) + cos(tan(x + sin(x))))
IDL: a = exp((sin(x))^2 + cos(tan(x + sin(x))))
proposal: a = (x.Sin**2 + (x + x.Sin).tan.cos).Exp

Half of it you read from left to right. The other half from right to left. Again, the IDL is much easier to write and to read, given that we started from traditional math notation. In the proposal version, it's easy to overlook that this is an exponential. So, I don't object to making functions into methods, but if there's even a hint of deprecating the traditional functional notation, that will relegate us to oblivion. If you still don't believe it, take the last equation to a few non-CS types and ask them whether they would consider using a language that required coding math in the proposed manner versus in the standard manner. Then consider how much time it would take to port your existing code to this new syntax, and verify that you didn't misplace a paren or sign along the way. A statement that traditional functional notation is guaranteed always to be part of Numeric should be in the PEP. Even calling it syntactic sugar is dangerous. It is the fundamental thing, and the methods are sugar for the CS types out there.
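For concreteness, the last contrast in runnable form (x is an arbitrary test array; the .Sin/.Exp method spellings are only the hypothetical proposal under discussion, not an existing API):

from Numeric import array, exp, sin, cos, tan

x = array([0.1, 0.2, 0.3])

# Functional form: reads like the math, and works today.
a = exp(sin(x)**2 + cos(tan(x + sin(x))))

# The method form would read (hypothetical, never implemented):
# a = (x.Sin**2 + (x + x.Sin).tan.cos).Exp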
--jh--
From stephen.walton at csun.edu Thu Mar 17 11:50:00 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 17 11:50:00 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> Message-ID: <4239DEE9.1050802@csun.edu> Joe Harrington wrote: >I'll start by saying something positive: I am very encouraged by all >the work that's going into resolving the small/big array split! > > +1 from me. >That said, I view a.Sin as a potentially devastating change, if >traditional functional notation is not guaranteed to be preserved >forever. > > +2 or more on Joe's cogent comments. Four centuries of traditional mathematics notation should not be overthrown for the sake of OO nirvana. Believe me that astronomers know what it's like to be prisoners of history; ask one to explain the stellar magnitude scale to you sometime! Incidentally, one can read here why the Chandra X-Ray Observatory chose S-Lang instead of Python for its data analysis software: http://cxc.harvard.edu/ciao/why/slang.html
From perry at stsci.edu Thu Mar 17 12:01:31 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 12:01:31 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <4239DEE9.1050802@csun.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239DEE9.1050802@csun.edu> Message-ID: <5a583fe32cc42cc47b659f2af44c5113@stsci.edu> On Mar 17, 2005, at 2:47 PM, Stephen Walton wrote: > Incidentally, one can read here why the Chandra X-Ray Observatory > chose S-Lang instead of Python for its data analysis software: > > http://cxc.harvard.edu/ciao/why/slang.html > Though I'll note that I think their conclusion wasn't really correct then. It does illustrate the aversion to OO though.
From boomberschloss at yahoo.com Thu Mar 17 12:09:18 2005 From: boomberschloss at yahoo.com (Joachim Boomberschloss) Date: Thu Mar 17 12:09:18 2005 Subject: [Numpy-discussion] casting in numarray Message-ID: <20050317200747.73860.qmail@web53108.mail.yahoo.com> Hi, I'm using numarray for an audio-related application as a buffer in an audio-processing pipeline. I would like to be able to allocate the buffer in advance and later regard it as a buffer of 8bit or 16bit samples as appropriate, but in numarray, casting always produces a new array, which I don't want. How difficult should it be to make it possible to create an array using an existing pre-allocated buffer to act as an interface to that buffer? Also, if others consider it useful, is there anyone willing to guide me through the code in doing so? Thanks, Joe
From tim.hochberg at cox.net Thu Mar 17 12:20:22 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Mar 17 12:20:22 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: <4239E5AC.4040901@cox.net> Perry Greenfield wrote: > Before I delve too deeply into what you are suggesting (or asking), > has the idea to have a slice be equivalent to an index array been > changed?
> For example, I recall seeing (I forget where), the suggestion > that > > X[:,ind] is the same as X[arange(X.shape[0]), ind] > > The following seems to be at odds with this. The confusion of mixing > slices with index arrays led me to just not deal with them in > numarray. I thought index arrays were getting complicated enough. Yes! Not index arrays by themselves, but the indexing system as a whole is already on the verge of being overly complex in numarray. Adding anything more to it is foolish. > I suppose it may be useful, but it would be good to give some > motivating, realistic examples of why they are useful. For example, I > can think of lots of motivating examples for: > > using more than one index array (e.g., X[ind1, ind2]) > allowing index arrays to have arbitrary shape > allowing partial indexing with index arrays My take is that having even one type of index array overloaded onto the current indexing scheme is questionable. In fact, even numarray's current scheme is too complicated for my taste. I particularly don't like the distinction that has to be made between lists and arrays on one side and tuples on the other. I understand why it's there, but I don't like it. Is it really necessary to pile these indexing schemes directly onto the main array object? It seems that it would be clearer, and more flexible, to use a separate, attached adapter object. For instance (please excuse the names as I don't have good ideas for those): X.rows[ind0, ind1, ..., ind2, :] would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That is, it would select the rows given by ind0 along the 0th axis, the rows given by ind1 along the 1st axis (aka the columns) and the rows given by ind2 along the -2nd axis. X.atindex[indices] would give numarray's current indexarray behaviour. Etc, etc for any other indexing scheme that's deemed useful. As I think about it more I'm more convinced that basic indexing should not support index arrays at all. Any indexarray behaviour should be implemented using helper/adapter objects. Keep basic indexing simple. This also gives an opportunity to have multiple different types of index arrays behaviour. -tim > > Though I'm not sure I can think of good examples of arbitrary > combinations of these capabilities (though the machinery allows it). > So one question: is there a good motivating example for > X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not > sure I know where that would be commonly used (it would suggest that > all the sizes of the sliced dimensions must have consistent lengths, > which doesn't seem typical). Anyone have good examples? > > Perry > > On Mar 17, 2005, at 1:32 AM, Travis Oliphant wrote: > >> Travis Oliphant wrote: >> >>> >>> - Where there is more than one index array, what should replace the >>> single-axis subspaces that the indexes are referencing? Remember, >>> all of the single-axis subspaces are being replaced with one >>> "global" subspace. The current proposal states that this indexing >>> subspace should be placed first and the "remaining subspaces" pasted >>> in at the end. >>> >>> Is this acceptable, or can someone see a problem?? >> >> >> >> Answering my own question... >> >> I think that it makes sense to do a direct subspace replacement >> whenever the indexing arrays are right next to each other. In other >> words, I would just extend the "one-index array" rule to >> "all-consecutive-index-arrays" where of course one index array >> satisfies the all-consecutive requirement.
>> >> Hence in the previous example: >> >> X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) with the >> (20,30)-subspace being replaced by the (2,3,4) indexing subspace. >> >> result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] >> >> >> Any other thoughts? (I think I will implement this initially by just >> using swapaxes on the current implementation...) >> >> -Travis
From gr at grrrr.org Thu Mar 17 12:46:14 2005 From: gr at grrrr.org (Thomas Grill) Date: Thu Mar 17 12:46:14 2005 Subject: [Numpy-discussion] casting in numarray In-Reply-To: <20050317200747.73860.qmail@web53108.mail.yahoo.com> References: <20050317200747.73860.qmail@web53108.mail.yahoo.com> Message-ID: <4239EBFD.1000404@grrrr.org> Hi Joachim, this is what I do in my Python extension of the Pure Data realtime modular system. You have to create a Python buffer object pointing to your memory location and then create a numarray from that. It's quite easy. See the code in http://cvs.sourceforge.net/viewcvs.py/pure-data/externals/grill/py/source/ files pybuffer.h and pybuffer.cpp best greetings, Thomas Joachim Boomberschloss schrieb: >Hi, > >I'm using numarray for an audio-related application as >a buffer in an audio-processing pipeline. I would like >to be able to allocate the buffer in advance and later >regard it as a buffer of 8bit or 16bit samples as >appropriate, but in numarray, casting always produces >a new array, which I don't want. How difficult should >it be to make it possible to create an array using an >existing pre-allocated buffer to act as an interface >to that buffer? Also, if others consider it useful, is >there anyone willing to guide me through the code in >doing so? > >Thanks, > >Joe -- --->----->->----->-- Thomas Grill gr at grrrr.org +43 699 19715543
From sdhyok at gmail.com Thu Mar 17 12:46:19 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Thu Mar 17 12:46:19 2005 Subject: [Numpy-discussion] A pray from an end user of numeric python. Message-ID: <371840ef0503171244573f487e@mail.gmail.com> As an end user of Numeric and then numarray, recently I was quite frustrated by the move to Numeric3. Shortly after I became familiar with Numeric several years ago, I jumped to numarray mainly because of its more flexible indexing scheme.
And after quite an investment in learning the new package, recently I heard a distant echo that a new library called Numeric3 would replace Numeric sooner or later, and then I saw a lot of discussion about the better design of the new package on this mailing list. Because I think some feedback from end users is needed in the discussion, I dare to send this email, in spite of my meager knowledge about programming. To gods in numeric programming of Python, I want to make clear one basic fact. Like me, numerous scientific and engineering end users of Python need a SINGLE standard data model for numeric operations. Even in the case that the data model has some defects in its design, many end users may not care about them, if many numeric packages using the data model are available. In this aspect, in my opinion, the replacement of the current standard array types in Python must gain the top priority. How about building a very small array package with only a MINIMUM set of functions first? The functions of the current array type in the standard Python library may define what that minimal set is. All advanced mathematical or other functions may be added later as separate packages. So, I pray to gods in numeric programming of Python. Please give us a single numeric array model. Please save a flock of sheep like me from wandering among Numeric, numarray, Numeric3, or maybe in the future Numeric4, 5, 6... Hopefully, this prayer will get some attention. Thanks for reading this humble email. -- Daehyok Shin (Peter) Geography Department University of North Carolina-Chapel Hill USA
From oliphant at ee.byu.edu Thu Mar 17 13:39:57 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 17 13:39:57 2005 Subject: [Numpy-discussion] A pray from an end user of numeric python. In-Reply-To: <371840ef0503171244573f487e@mail.gmail.com> References: <371840ef0503171244573f487e@mail.gmail.com> Message-ID: <4239F8B3.7080105@ee.byu.edu> Daehyok Shin wrote: >As an end user of Numeric and then numarray, recently I was quite >frustrated by the move to Numeric3. Shortly after I became familiar >with Numeric several years ago, I jumped to numarray mainly because of >its more flexible indexing scheme. And after quite an investment in >learning the new package, recently I heard a distant echo that a new >library called Numeric3 would replace Numeric sooner or later, and then >I saw a lot of discussion about the better design of the new package on >this mailing list. Because I think some feedback from end users is >needed in the discussion, I dare to send this email, in spite of my >meager knowledge about programming. > > Thank you, thank you for speaking up. I am very interested in hearing from end users. In fact, I'm an "end-user" myself. My real purpose in life is not to endlessly write array packages. I want to get back to the many problems I'm working on that require real usage. All of us had meager knowledge at one point or another. Besides that, our supposed knowledge today may turn out to be useless tomorrow, so feel free to chime in any time with your opinions. Lack of knowledge doesn't seem to stop me from voicing an opinion :-) In my opinion, the more use-cases of arrays we see, the better design decisions can be made. Ultimately, the reason numarray split off from Numeric was that some people wanted some new features in Numeric and wanted to try some new design ideas. Their efforts have led to a better understanding of what a good array object should be. > >To gods in numeric programming of Python, I want to make clear one >basic fact. Like me, numerous scientific and engineering end users >of Python need a SINGLE standard data model for numeric >operations. Even in the case that the data model has some defects in >its design, many end users may not care about them, if many numeric >packages using the data model are available. In this aspect, in my >opinion, the replacement of the current standard array types in Python >must gain the top priority. > > I think we are all on the same page here. The ONLY reason I'm spending any time on "Numeric3" at all is because currently we have two development directions. One group of people is building on top of Numeric (scipy, for example), while another group of people is building on top of Numarray (nd_image for example). We are doing a lot of the same work and I hate to see the few resources we have wasted on split efforts. Replacing the standard array type in Python is a longer-term problem. We need to put our own house in order to make that happen. Many of us want to see a single array type be standard in Python as long as we are satisfied with it. But, that is the problem currently. The people that wrote numarray were not satisfied with Numeric. Unfortunately, some of us that are long-time users of Numeric have never been satisfied with numarray either (it has not been even close to a "drop-in" replacement for Numeric). I think that most people who use arrays would be quite satisfied with Numeric today (with a few warts removed). But, I do think that the numarray folks have identified some scalability issues that, if we address them, will mean the array type we come up with can have a longer future. >So, I pray to gods in numeric programming of Python. Please give us a >single numeric array model. Please save a flock of sheep like me from >wandering among Numeric, numarray, Numeric3, or maybe in the future >Numeric4, 5, 6... Hopefully, this prayer will get some attention. > > I think everybody involved wants this too. I'm giving up a great deal of my time to make it happen, largely because I see a great need and a way for me to contribute to help it. I am very interested in recruiting others to assist me. So far, I've received a lot of supportive comments, but not much supporting code. We have the momentum. I think we can get this done, so that come June, there is no "split" aside from backward compatibility layers.... In my estimation the fastest way to bring the two development directions together is to merge the numarray features back into Numeric. As I'm doing that, I want to make sure that the new design extensions are done correctly and not just a bad idea that nobody ever complained about. This has led to quite a few "side-trips" into dusty closets where we kept the wart-shavings of current usage. Those "side-trips" have extended the effort some, but I have not lost sight of the goal. I don't want "yet-another-implementation". I want everybody involved to agree that a single package provides the basis for what people need. Thanks again for your comments. If you can help in any way (e.g. writing test scripts) then please chip in.
-Travis
From oliphant at ee.byu.edu Thu Mar 17 13:54:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 17 13:54:07 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> Message-ID: <4239FBC6.3010808@ee.byu.edu> Joe Harrington wrote: >I'll start by saying something positive: I am very encouraged by all >the work that's going into resolving the small/big array split! > > Thanks, more-hands makes less work... >That said, I view a.Sin as a potentially devastating change, if >traditional functional notation is not guaranteed to be preserved >forever. > > Hold on, everybody. I'm the last person that would move from sin(x) to x.Sin as a "requirement". I don't believe this was ever suggested. I was just remembering that someone thought it would be useful if x.sin() were allowed, and noticed that the PEP had not mentioned that as a possibility. I'm inclined now to NOT add such computational methods and *require* ufuncs to be called as is currently done. It's interesting to see so many responses to something that in my mind was not the big issue, and to hear very little about the multidimensional indexing proposal. -Travis
From oliphant at ee.byu.edu Thu Mar 17 14:03:32 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 17 14:03:32 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: <4239FDD6.6020409@ee.byu.edu> Perry Greenfield wrote: > Before I delve too deeply into what you are suggesting (or asking), > has the idea to have a slice be equivalent to an index array been > changed? For example, I recall seeing (I forget where), the suggestion > that > > X[:,ind] is the same as X[arange(X.shape[0]), ind] > This was in the PEP originally. But, after talking with you and better understanding the "broadcasting" issues of the numarray indexing scheme, it seemed less like a good idea. Then, during implementation it was easier to interpret slices differently. A very natural usage fell out as I thought more about partial indexing in Numeric: X[ind] where X has more than 1 dimension returns in numarray something like

result[i,j,k,...] = X[ind[i,j,k],...]

It seems rather singular to have this Ellipsis-like character only useful for the ending dimensions of X. Thus, I decided that X[...,ind] ought to be valid as well and return something like

result[...,i,j,k] = X[...,ind[i,j,k]]

So, yes, I've changed my mind (I sent an email about this when I woke up and realized a better solution). > The following seems to be at odds with this. The confusion of mixing > slices with index arrays led me to just not deal with them in > numarray. I thought index arrays were getting complicated enough. I > suppose it may be useful, but it would be good to give some motivating, > realistic examples of why they are useful. For example, I can think of > lots of motivating examples for: > > using more than one index array (e.g., X[ind1, ind2]) > allowing index arrays to have arbitrary shape > allowing partial indexing with index arrays Give me the reason for allowing partial indexing with index arrays, and I bet I can come up with a reason why you should allow X[...,ind] as well (because there is an implied ... at the end when you are using partial indexing anyway).
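A small loop-based sketch of the partial-indexing rule described above, written with plain element indexing only (the shapes and the 2x2 index array are made up for illustration):

from numarray import arange, array, reshape, zeros

X = reshape(arange(24), (4, 6))   # a 4x6 array
ind = array([[0, 3], [1, 2]])     # a 2x2 index array

# result[i,j,...] = X[ind[i,j],...]  -->  result has shape (2, 2, 6)
result = zeros((2, 2, 6))
for i in range(2):
    for j in range(2):
        result[i, j] = X[ind[i, j]]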
> > Though I'm not sure I can think of good examples of arbitrary > combinations of these capabilities (though the machinery allows it). > So one question: is there a good motivating example for > X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not > sure I know where that would be commonly used (it would suggest that > all the sizes of the sliced dimensions must have consistent lengths, > which doesn't seem typical). Anyone have good examples? So, I've scaled back my "intermingling" of index arrays with other types of arrays (you'll also notice in the current PEP that I've gotten rid of mixing boolean and index arrays). I think the usage I define in the PEP for mixing slices, Ellipses, and index arrays is reasonable (and not difficult to implement) (it's basically done --- minus bug fixes). -Travis
From oliphant at ee.byu.edu Thu Mar 17 14:47:55 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 17 14:47:55 2005 Subject: [Fwd: Re: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core] Message-ID: <423A08A0.8080006@ee.byu.edu> I originally sent the attached just to Tim. It was meant for the entire list. -------------- next part -------------- An embedded message was scrubbed... From: unknown sender Subject: no subject Date: no date Size: 38 URL:
From oliphant at ee.byu.edu Thu Mar 17 17:17:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu, 17 Mar 2005 15:17:33 -0700 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <4239E5AC.4040901@cox.net> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net> Message-ID: <423A01FD.7070900@ee.byu.edu> Tim Hochberg wrote: > Perry Greenfield wrote: > > > Yes! Not index arrays by themselves, but the indexing system as a > whole is already on the verge of being overly complex in numarray. > Adding anything more to it is foolish. I think the solution given in the PEP is not especially complex. In fact, I think it clarifies what numarray does so that it does not appear "mind-blowing" and can actually be implemented in a reasonable way. > > My take is that having even one type of index array overloaded onto > the current indexing scheme is questionable. In fact, even numarray's > current scheme is too complicated for my taste. I particularly don't > like the distinction that has to be made between lists and arrays on > one side and tuples on the other. I understand why it's there, but I > don't like it. > > Is it really necessary to pile these indexing schemes directly onto > the main array object? It seems that it would be clearer, and more > flexible, to use a separate, attached adapter object. For instance > (please excuse the names as I don't have good ideas for those): This is an interesting idea!!! I'm not one to criticize naming (I can't seem to come up with good names myself...) > > X.rows[ind0, ind1, ..., ind2, :] > > would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That > is, it would select the rows given by ind0 along the 0th axis, the rows > given by ind1 along the 1st axis (aka the columns) and the rows given > by ind2 along the -2nd axis. > > X.atindex[indices] would give numarray's current indexarray behaviour. > > Etc, etc for any other indexing scheme that's deemed useful. > > As I think about it more I'm more convinced that basic indexing should > not support index arrays at all. Any indexarray behaviour should be > implemented using helper/adapter objects. Keep basic indexing simple.
> This also gives an opportunity to have multiple different types of > index arrays behaviour. I think adapter objects will be useful in the long run. We've already got X.flat, right? It's too bad you couldn't have thought of this earlier (we could have added this to Numeric years ago and alleviated one of the most disparaged things about Numeric). But, I'm afraid that some form of index arrays is already with us, so we really won't be able to get rid of them entirely. I'm just trying to make them reasonable. The addition to the Numarray behavior I've added is not difficult (in fact I think it clarifies the numarray behavior --- at least for me). -Travis
From perry at stsci.edu Thu Mar 17 14:57:11 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 14:57:11 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <4239FDD6.6020409@ee.byu.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239FDD6.6020409@ee.byu.edu> Message-ID: <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu> On Mar 17, 2005, at 4:59 PM, Travis Oliphant wrote: > Perry Greenfield wrote: > >> Before I delve too deeply into what you are suggesting (or asking), >> has the idea to have a slice be equivalent to an index array been >> changed? For example, I recall seeing (I forget where), the >> suggestion that >> >> X[:,ind] is the same as X[arange(X.shape[0]), ind] >> > This was in the PEP originally. But, after talking with you and > better understanding the "broadcasting" issues of the numarray > indexing scheme, it seemed less like a good idea. Then, during > implementation it was easier to interpret slices differently. A very > natural usage fell out as I thought more about partial indexing in > Numeric: X[ind] where X has more than 1 dimension returns in numarray > something like > > result[i,j,k,...] = X[ind[i,j,k],...] > > It seems rather singular to have this Ellipsis-like character only > useful for the ending dimensions of X. Thus, I decided that > X[...,ind] ought to be valid as well and return something like > > result[...,i,j,k] = X[...,ind[i,j,k]] > > So, yes, I've changed my mind (I sent an email about this when I woke > up and realized a better solution). > Sorry if I missed that. Now that is cleared up, the use of slices you propose is essentially as an index placeholder for an index not to be indexed by index arrays (akin to what is implied by partial indexing). In that vein it makes sense. Identical functionality could be had by reordering the indices, doing partial indexing and then reordering to the original order. That's clumsy for sure, but it probably isn't going to be done that often. If you've already done it, great. Let me look it over in a bit more detail tonight. Perry
From oliphant at ee.byu.edu Thu Mar 17 15:02:24 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 17 15:02:24 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239FDD6.6020409@ee.byu.edu> <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu> Message-ID: <423A0C24.5050605@ee.byu.edu> Perry Greenfield wrote: > > On Mar 17, 2005, at 4:59 PM, Travis Oliphant wrote: > >> Perry Greenfield wrote: >> >>> Before I delve too deeply into what you are suggesting (or asking), >>> has the idea to have a slice be equivalent to an index array been >>> changed? For example, I recall seeing (I forget where), the >>> suggestion that >>> >>> X[:,ind] is the same as X[arange(X.shape[0]), ind] >>> >> This was in the PEP originally. But, after talking with you and >> better understanding the "broadcasting" issues of the numarray >> indexing scheme, it seemed less like a good idea. Then, during >> implementation it was easier to interpret slices differently. A very >> natural usage fell out as I thought more about partial indexing in >> Numeric: X[ind] where X has more than 1 dimension returns in >> numarray something like >> >> result[i,j,k,...] = X[ind[i,j,k],...] >> >> It seems rather singular to have this Ellipsis-like character only >> useful for the ending dimensions of X. Thus, I decided that >> X[...,ind] ought to be valid as well and return something like >> >> result[...,i,j,k] = X[...,ind[i,j,k]] >> >> So, yes, I've changed my mind (I sent an email about this when I woke >> up and realized a better solution). >> > Sorry if I missed that. > > Now that is cleared up, the use of slices you propose is essentially > as an index placeholder for an index not to be indexed by index arrays > (akin to what is implied by partial indexing). In that vein it makes > sense. Identical functionality could be had by reordering the indices, > doing partial indexing and then reordering to the original order. > That's clumsy for sure, but it probably isn't going to be done that > often. If you've already done it, great. Let me look it over in a bit > more detail tonight. Yes, re-ordering could accomplish the same thing. I should warn you. When I say "done" -- I mean I'm in the bug-fixing phase. So, expect segfaults... I'm cleaning up as we speak. I may not finish. So, don't look at it unless you are interested in the implementation... because you may not get it to actually work for a day or two. -Travis
From mdehoon at ims.u-tokyo.ac.jp Thu Mar 17 22:02:32 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 17 22:02:32 2005 Subject: [Numpy-discussion] Trying out Numeric3 Message-ID: <423A6F69.8020803@ims.u-tokyo.ac.jp> First of all, thanks to the Numerical Python developers for releasing version 23.8 of Numerical Python. It compiles out of the box and avoids the blas/lapack compilation problems in earlier versions, which makes my life as a developer a lot easier. Thanks! Travis Oliphant wrote: > I wanted to let people who may be waiting know that now is a good time to > help with numeric3. The CVS version builds (although I'm sure there are > still bugs), but more eyes could help me track them down. > > Currently, all that remains for the arrayobject is to implement the > newly defined methods (really it's just a re-organization and > re-inspection of the code in multiarraymodule.c to call it using methods). > I downloaded Numeric3 today and installed it. The compilation and installation run fine. There are still some warnings from the compiler here and there, but I guess they will be fixed some other time. During compilation, I noticed that some test program is run, presumably for configuration. The test program is compiled by a different compiler than the one used in the build process. Note that "python setup.py config" is available in the standard distutils, so it may be better to use that instead of a self-defined configuration tool. For one thing, it'll make sure that the compiler used for configuration is the same as the one used for compilation. To use Numeric3, I did "from ndarray import *".
I guess for the final version, this will be "from Numeric import *"? When using ndarray, I got a core dump using "zeros":

$ python
Python 2.5a0 (#1, Mar 2 2005, 12:15:06) [GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> zeros(5)
creating data 0xa0c03d0 associated with 0xa0d52c0
array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
Segmentation fault (core dumped)

With Python 2.4, the segmentation fault occurs slightly later:

$ python2.4
Python 2.4 (#1, Dec 5 2004, 20:47:03) [GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> zeros(5)
creating data 0xa0a07f8 associated with 0xa0d6230
array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
>>>
>>> ^D
freeing 0xa0a07f8 associated with array 0xa0d6230
freeing 0xa123b88 associated with array 0xa0d6230
Segmentation fault (core dumped)

Finally, I tried to compile a C extension module that uses Numerical Python (by replacing #include <Numeric/arrayobject.h> with #include <ndarray/arrayobject.h>):

$ python setup.py build
running build
running build_py
running build_ext
building 'Pycluster.cluster' extension
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -Isrc -Iranlib/src -I/usr/local/include/python2.5 -c python/clustermodule.c -o build/temp.cygwin-1.5.12-i686-2.5/python/clustermodule.o
In file included from python/clustermodule.c:2:
/usr/local/include/python2.5/ndarray/arrayobject.h:76: warning: redefinition of `ushort'
/usr/include/sys/types.h:85: warning: `ushort' previously declared here
/usr/local/include/python2.5/ndarray/arrayobject.h:77: warning: redefinition of `uint'
/usr/include/sys/types.h:86: warning: `uint' previously declared here

These two warnings are probably not so serious, but it would be better to get rid of them anyway.

python/clustermodule.c: In function `parse_data':
python/clustermodule.c:38: warning: passing arg 1 of pointer to function from incompatible pointer type

The offending line 38 is: { PyArrayObject* av = (PyArrayObject*) PyArray_Cast(*array, PyArray_DOUBLE); where array is a PyArrayObject**. Another warning was that PyArrayObject's "dimensions" doesn't seem to be an int array any more. Finally, when linking I get an undefined reference to _PyArray_API. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From konrad.hinsen at laposte.net Thu Mar 17 23:53:15 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 17 23:53:15 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> Message-ID: On 17.03.2005, at 19:58, Joe Harrington wrote: > That said, I view a.Sin as a potentially devastating change, if > traditional functional notation is not guaranteed to be preserved > forever. No one made that proposition, so there is no need to worry. The recent discussion was about 1) a misunderstanding. 2) internal implementation details. Konrad.
-- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr -------------------------------------------------------------------------------
From mdehoon at ims.u-tokyo.ac.jp Fri Mar 18 02:00:52 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Fri Mar 18 02:00:52 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <423A744F.8070007@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> Message-ID: <423AA7AB.8030409@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> During compilation, I noticed that some test program is run, >> presumably for configuration. The test program is compiled by a >> different compiler than the one used in the build process. Note that >> "python setup.py config" is available in the standard distutils, so it >> may be better to use that instead of a self-defined configuration >> tool. For one thing, it'll make sure that the compiler used for >> configuration is the same as the one used for compilation. > > What does python setup.py config do? I have been unable to figure out > how to get the configuration I need. If you have suggestions, that > would be great... > I submitted a patch to sourceforge that modifies setup.py such that it uses distutils' stuff for the configuration. See patch #1165840. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From barrett at stsci.edu Fri Mar 18 06:19:55 2005 From: barrett at stsci.edu (Paul Barrett) Date: Fri Mar 18 06:19:55 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <5a583fe32cc42cc47b659f2af44c5113@stsci.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239DEE9.1050802@csun.edu> <5a583fe32cc42cc47b659f2af44c5113@stsci.edu> Message-ID: <423AE30F.3070604@stsci.edu> Perry Greenfield wrote: > > On Mar 17, 2005, at 2:47 PM, Stephen Walton wrote: > >> Incidentally, one can read here why the Chandra X-Ray Observatory >> chose S-Lang instead of Python for its data analysis software: >> >> http://cxc.harvard.edu/ciao/why/slang.html >> > Though I'll note that I think their conclusion wasn't really correct > then. It does illustrate the aversion to OO though. And look how far this back-assward approach has gotten them over the past 10 years. Not very far, in my opinion. It has also resulted in several user interface changes during this time. This decision to write most code in a compiled language and then embed a scripting language in the code is counter to the way Python development is done. However, lately they seem to see the error of their ways by doing more development in S-lang. I don't think the CIAO data analysis environment is a good example of software design and development.
--- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
From perry at stsci.edu Fri Mar 18 06:43:00 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 18 06:43:00 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> Message-ID: <3380248ec14bd94bddc082525a316552@stsci.edu> On Mar 18, 2005, at 2:51 AM, konrad.hinsen at laposte.net wrote: > On 17.03.2005, at 19:58, Joe Harrington wrote: > >> That said, I view a.Sin as a potentially devastating change, if >> traditional functional notation is not guaranteed to be preserved >> forever. > > No one made that proposition, so there is no need to worry. The recent > discussion was about > > 1) a misunderstanding. > 2) internal implementation details. That it was a misunderstanding is apparently the case. But if you look at the original text, it is easy to see how people could draw that conclusion. So the responses that drew that conclusion had the desirable effect in making that point clear. Specifically, what was said was: > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? > > > > The move from functions to methods will mean that some of the function > calls currently in Numeric.py will be redundant, but I think they > should stay there for backwards compatibility, (perhaps with a > deprecation warning...) > So the mention of a deprecation warning juxtaposed with the suggestion that ufuncs be methods suggested that it was possible that ufuncs would eventually be only methods. It is good to clear up that this isn't the case. Perry
From barrett at stsci.edu Fri Mar 18 07:10:29 2005 From: barrett at stsci.edu (Paul Barrett) Date: Fri Mar 18 07:10:29 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: <423AEEFE.1050107@stsci.edu> Perry Greenfield wrote: > Before I delve too deeply into what you are suggesting (or asking), > has the idea to have a slice be equivalent to an index array been > changed? For example, I recall seeing (I forget where), the suggestion > that > > X[:,ind] is the same as X[arange(X.shape[0]), ind] > > The following seems to be at odds with this. The confusion of mixing > slices with index arrays led me to just not deal with them in > numarray. I thought index arrays were getting complicated enough. I > suppose it may be useful, but it would be good to give some motivating, > realistic examples of why they are useful. For example, I can think of > lots of motivating examples for: > > using more than one index array (e.g., X[ind1, ind2]) > allowing index arrays to have arbitrary shape > allowing partial indexing with index arrays Can you give a few then? Say one or two for each of the three scenarios.
-- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
From barrett at stsci.edu Fri Mar 18 07:23:08 2005 From: barrett at stsci.edu (Paul Barrett) Date: Fri Mar 18 07:23:08 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <4239E5AC.4040901@cox.net> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net> Message-ID: <423AF1E2.1060200@stsci.edu> Tim Hochberg wrote: > > My take is that having even one type of index array overloaded onto > the current indexing scheme is questionable. In fact, even numarray's > current scheme is too complicated for my taste. I particularly don't > like the distinction that has to be made between lists and arrays on > one side and tuples on the other. I understand why it's there, but I > don't like it. > > Is it really necessary to pile these indexing schemes directly onto > the main array object? It seems that it would be clearer, and more > flexible, to use a separate, attached adapter object. For instance > (please excuse the names as I don't have good ideas for those): > > X.rows[ind0, ind1, ..., ind2, :] > > would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That > is, it would select the rows given by ind0 along the 0th axis, the rows > given by ind1 along the 1st axis (aka the columns) and the rows given > by ind2 along the -2nd axis. > > X.atindex[indices] would give numarray's current indexarray behaviour. > > Etc, etc for any other indexing scheme that's deemed useful. > > As I think about it more I'm more convinced that basic indexing should > not support index arrays at all. Any indexarray behaviour should be > implemented using helper/adapter objects. Keep basic indexing simple. > This also gives an opportunity to have multiple different types of > index arrays behaviour. So you're saying that 1-D indexing arrays (or vectors) should not be allowed? As Perry said earlier, 'slice(1,9,2)' is equivalent to 'range(1, 9, 2)'. I just consider slices to be a shorthand for _regular_ indexing, whereas indexed arrays also allow for _irregular_ indexing. Or am I missing something? -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218
From cjw at sympatico.ca Fri Mar 18 07:29:21 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Mar 18 07:29:21 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <4239FBC6.3010808@ee.byu.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239FBC6.3010808@ee.byu.edu> Message-ID: <423AF35A.7060304@sympatico.ca> Travis Oliphant wrote: > Joe Harrington wrote: > >> I'll start by saying something positive: I am very encouraged by all >> the work that's going into resolving the small/big array split! >> >> > Thanks, more-hands makes less work... > >> That said, I view a.Sin as a potentially devastating change, if >> traditional functional notation is not guaranteed to be preserved >> forever. >> >> > > Hold on, everybody. I'm the last person that would move from sin(x) > to x.Sin as a "requirement". I don't believe this was ever > suggested. I was just remembering that someone thought it would be > useful if x.sin() were allowed, and noticed that the PEP had not > mentioned that as a possibility. My suggestion was that x.Sin be available as a method.
It was challenged because all the maths books use sin(x). True, since the books in the main are dealing with scalar x. Two things were suggested: (1) with a method, one can drop the redundant parentheses, and (2) capitalize the first letter of the method to make it clear that the operation applies to the whole of an array structure and not just to a single value. It was also pointed out that the Sin style focuses on order of evaluation and so the expression looks different than a nested expression. Some, who prefer nesting, can use the function. As Konrad Hinsen has pointed out, this is implementation detail stuff but, I suggest, of some importance as it gives the face which is presented to the world. > > I'm inclined now to NOT add such computational methods and *require* > ufuncs to be called as is currently done. Presumably, nothing would be done to inhibit the use of properties in sub-classes. > > It's interesting to see so many responses to something that in my mind > was not the big issue, and to hear very little about the > multidimensional indexing proposal. > My problem here is that I don't really understand just what the current proposal envisages. Some example would help. (1) How does a[..., 3] differ from a[:,3]? (2) How does this differ from numarray's take/put? Setting array elements using advanced indexing will be similar to getting. The object used for setting will be force-cast to the array's type if needed. This type must be "broadcastable" to the required shape specified by the indexing, where "broadcastable" is more fully explained below. Alternatively, the object can be an array iterator. This will repeatedly iterate over the object until the desired elements are set. The shape of X is never changed. (3) is this a typo? selects a 1-d array filled with the elements of A corresponding to the non-zero values of B. The search order will be C-style (last-index varies the fastest). (4) I can see value in nonZero(X) or even nonZero(X, tolerance), which presumably delivers an Array with a Boolean element type, but I wonder about the need for nonZero with a Boolean array as an argument. Shouldn't X[B] do the job? (5) Mention is made of indexing objects. These are Arrays of some sort? Colin W.
From tim.hochberg at cox.net Fri Mar 18 09:19:21 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Mar 18 09:19:21 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <423AF1E2.1060200@stsci.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net> <423AF1E2.1060200@stsci.edu> Message-ID: <423B0D07.3090609@cox.net> Paul Barrett wrote: > Tim Hochberg wrote: > >> >> My take is that having even one type of index array overloaded onto >> the current indexing scheme is questionable. In fact, even numarray's >> current scheme is too complicated for my taste. I particularly don't >> like the distinction that has to be made between lists and arrays on >> one side and tuples on the other. I understand why it's there, but I >> don't like it. >> >> Is it really necessary to pile these indexing schemes directly onto >> the main array object? It seems that it would be clearer, and more >> flexible, to use a separate, attached adapter object. For instance >> (please excuse the names as I don't have good ideas for those): >> >> X.rows[ind0, ind1, ..., ind2, :] >> >> would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1).
That >> is, it would select the rows given by ind0 along the 0th axis, the >> rows given by ind1 along the 1st axis (aka the columns) and the rows >> given by ind2 along the -2nd axis. >> >> X.atindex[indices] would give numarray's current indexarray behaviour. >> >> Etc, etc for any other indexing scheme that's deemed useful. >> >> As I think about it more I'm more convinced that basic indexing >> should not support index arrays at all. Any indexarray behaviour >> should be implemented using helper/adapter objects. Keep basic >> indexing simple. This also gives an opportunity to have multiple >> different types of index array behaviour. > > > So you're saying that 1-D indexing arrays (or vectors) should not be > allowed? As Perry said earlier, 'slice(1,9,2)' is equivalent to > 'range(1, 9, 2)'. I just consider slices to be a shorthand for > _regular_ indexing, whereas indexed arrays also allow for _irregular_ > indexing. Or am I missing something? I'm saying that irregular indexing should be spelled differently than regular indexing. Consider this little (contrived) example: X[(2,3,5,7,11)] = Y[[2,4,8,16,32]] Quick! What's that mean using numarray's indexing rules? (which I believe are close enough to the proposed rules to not make a difference for this case.) Oddly, it means: X[2,3,5,7,11] = take(Y, [2,4,8,16,32]) That's not entirely numarray's fault. For historical reasons X[a,b,c,d] is treated by Python exactly the same as X[(a,b,c,d)]. And the above case is not going to get programmed on purpose, except by the pathological, but it could crop up as a bug fairly easily since in most other circumstances tuples and lists are equivalent. Even the more standard: X[2,3,5,7,11] = Y[[2,4,8,16,32]] is not exactly easy to decipher. Contrast this to the proposed: X[2,3,5,7,11] = Y.atindex[2,4,8,16,32] Where it's immediately apparent that one indexing operation is irregular and one is regular. Note that you still need to use indexing notation on atindex, and thus it needs to be some sort of helper object vaguely similar to the new flat. This is because you also want: Y.atindex[2,4,8,16,32] = X[2,3,5,7,11] to work and it wouldn't work with function call syntax. This doesn't entirely insulate one from the weirdness of using tuples as indexes described above, but it should be a big improvement in this regard. Irrespective of that, it's much clearer what's going on with the spelling of the two types of indexing differentiated. It also opens the door for other types of irregular indexing, since it may turn out that there is more than one type of irregular indexing that may be useful as mentioned by Perry (?) earlier. -tim From perry at stsci.edu Fri Mar 18 10:58:14 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 18 10:58:14 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <423AEEFE.1050107@stsci.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <423AEEFE.1050107@stsci.edu> Message-ID: On Mar 18, 2005, at 10:08 AM, Paul Barrett wrote: > Perry Greenfield wrote: > >> Before I delve too deeply into what you are suggesting (or asking), >> has the idea to have a slice be equivalent to an index array been >> changed? For example, I recall seeing (I forget where), the >> suggestion that >> >> X[:,ind] is the same as X[arange(X.shape[0]), ind] >> >> The following seems to be at odds with this. The confusion of mixing >> slices with index arrays led me to just not deal with them in >> numarray.
I thought index arrays were getting complicated enough. I >> suppose it may be useful, but it would be good to give some >> motivating, realistic examples of why they are useful. For example, I >> can think of lots of motivating examples for: >> >> using more than one index array (e.g., X[ind1, ind2]) A common task is to obtain a list of values from an image based on a list (array) of i,j locations in the image. These index arrays may have come from some other source (say a catalog of known star positions) or from a function that obtained the positions of local maxima found in a corresponding (but different) image for the purposes of comparing the image objects with another image's objects. >> allowing index arrays to have arbitrary shape A classic example is using the array to be indexed as a lookup table. If I have a byte image and wish to transform it to a different greyscale using a lookup table, I can use the byte image as an index array for the lookup table array: transformedimage = lookuptable[image] >> allowing partial indexing with index arrays > Here I'll go one better, a combination of the previous and this one using a similar mechanism, except to generate rgb values. The lookup table now is a 256x3 array representing how each of the 256 possible byte values is to be mapped to an rgb value: rgbimage = lookuptable[image] Here the rgbimage has shape (image.shape[0],image.shape[1],3) But partial indexing can be used for other things such as selecting from a set of weighting functions or images to be used against a stack of 1-d arrays or images, respectively, for subsequent processing (e.g., reduction). > Can you give a few then? Say one or two for each of the three > scenarios. Others may be able to come up with better or alternative examples. From oliphant at ee.byu.edu Fri Mar 18 11:35:15 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 18 11:35:15 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <423AA7AB.8030409@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <423AA7AB.8030409@ims.u-tokyo.ac.jp> Message-ID: <423B2CFA.2010603@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > I submitted a patch to sourceforge that modifies setup.py such that it > uses distutils' stuff for the configuration. See patch #1165840. > > --Michiel. Thank you so much. I just had to modify it so that "." was added to the path prior to trying config so it would work on my Linux box. I'm not an expert with distutils and so I appreciate this help greatly. Eventually, we should probably put the other defines described in the setup.py file in the config.h file as well. -Travis From stephen.walton at csun.edu Fri Mar 18 14:32:16 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Mar 18 14:32:16 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <878y4mxuf7.fsf@welho.com> References: <42391B6E.8060709@ee.byu.edu> <878y4mxuf7.fsf@welho.com> Message-ID: <423B5667.9010004@csun.edu> Timo Korvola wrote: >Travis Oliphant writes: > > >>- indexing with multidimensional index arrays under the >>numarray-introduced scheme (which seems reasonable to me) >> >> > >It is powerful but likely to confuse Matlab and Fortran users because >a[[0,1], [1,2]] is different from a[0:2, 1:3]. > > Ack. And let's not even talk about what take(a,((0,1),(1,2))) returns when shape(a)==(3,3).
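[A small illustration of the distinction being complained about, assuming numarray's index-array semantics; a sketch only, not output from any particular version:

    import numarray

    a = numarray.reshape(numarray.arange(9), (3, 3))

    # Index arrays pick out individual elements pairwise:
    # a[0,1] and a[1,2], giving a shape-(2,) result.
    pairs = a[[0, 1], [1, 2]]

    # Slices pick out a regular subblock: rows 0..1, columns 1..2,
    # giving a shape-(2, 2) result.
    block = a[0:2, 1:3]

The two spellings differ by only a pair of brackets, which is exactly the Matlab/Fortran trap described above.]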
As I noted in my lengthy comments, index arrays and takes were far and away the most confusing part of the current numarray docs to me. From sdhyok at gmail.com Sat Mar 19 18:21:15 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Sat Mar 19 18:21:15 2005 Subject: [Numpy-discussion] A pray from an end user of numeric python. In-Reply-To: <4239F8B3.7080105@ee.byu.edu> References: <371840ef0503171244573f487e@mail.gmail.com> <4239F8B3.7080105@ee.byu.edu> Message-ID: <371840ef050319182071952653@mail.gmail.com> > Thank you, thank you for speaking up. I am very interested in hearing > from end users. In fact, I'm an "end-user" myself. My real purpose in > life is not to endlessly write array packages. I want to get back to > the many problems I'm working on that require real usage. Travis. I am really happy to hear your encouragement. And relieved to see you are not going to create another lib to define numeric arrays. > In my opinion, the more use-cases of arrays we see, the better design > decisions can be made. Ultimately, the fact that numarray split off from > Numeric was that some people wanted some new features in Numeric and > wanted to try some new design ideas. Their efforts have led to a > better understanding of what a good array object should be. It is true that an open source community should always be open to new ideas or designs. However, considering the situation that there is no solid standard numeric library for Python, I don't think it is time for renovation. MATLAB gives us a good example. Even though it has terrible data structures for matrices, particularly sparse matrices, its plentiful libraries around those data structures made it the most popular software for numerical programming. Who wants to build his/her house on continuously shaking ground? To gain wide support from users, a program may need some balance between renovation and stabilization. My concern came from the feeling that our community is losing that balance. > Replacing the standard array type in Python is a longer-term problem. > We need to put our own house in order in order to make that happen. > Many of us want to see a single array type be standard in Python as long > as we are satisfied with it. We may agree that if a package succeeds in gaining the support of Guido, it will be the standard for numeric arrays in Python, no matter what limitations the package has. And, I can bet Guido will like a simple and small package -- like the new package for sets. In this context, I think we have to shift our focus from "What new fancy functions are needed?" to "Is this function really necessary in the standard array package of Python?" > I think everybody involved wants this too. I'm giving up a great deal > of my time to make it happen, largely because I see a great need and a > way for me to contribute to help it. I am very interested in recruiting > others to assist me. So far, I've received a lot of supportive > comments, but not much supporting code. We have the momentum. I think > we can get this done, so that come June, there is no "split" aside from > backward compatibility layers.... Sorry. I was among the users who always complain but never contribute anything. To see what I can do, I am checking out your repository. > > In my estimation the fastest way to bring the two development directions > together is to merge the numarray features back into Numeric. I agree.
For me, if I can write x[x>0] and create new classes easily by inheriting from existing arrays with Numeric, I will come back to Numeric. Will Numeric3 solve the limitations? > extended the effort some, but I have not lost sight of the goal. I > don't want "yet-another-implementation". I want everybody involved to > agree that a single package provides the basis for what people need. Yes. No more "yet-another-implementation". So we will use the same command to import Numeric3 as for Numeric, right? from Numeric import * If true, I am wondering why a new name, rather than just Numeric, is used for the package. > Thanks again for your comments. If you can help in any way (e.g. > writing test scripts) then please chip in. I appreciate your kind reply to my humble mail. And I would like to remind you that so many Python users are praying that you succeed with the Numeric3 project, in the earnest hope of having a standard numeric array type in Python. -- Daehyok Shin (Peter) Geography Department University of North Carolina-Chapel Hill USA From sdhyok at gmail.com Sat Mar 19 18:57:06 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Sat Mar 19 18:57:06 2005 Subject: [Numpy-discussion] The first try on Numeric3. Message-ID: <371840ef05031918566113287b@mail.gmail.com> Dear Travis. I found no problem in installing Numeric3 and running tests in Mandrake 10.1. Good job. One question I have: I found you do not use UnitTest for the test files. Will you eventually change all the tests to use UnitTest? If so, I think I can contribute something because I have some experience with it. For the test files, please let me know what kind of support you need. Thanks for your effort. -- Daehyok Shin Geography Department University of North Carolina-Chapel Hill USA From juenglin at cs.pdx.edu Sat Mar 19 21:09:14 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sat Mar 19 21:09:14 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars Message-ID: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> Travis, Discussing zero dimensional arrays, the PEP says at one point: ... When ndarray is imported, it will alter the numeric table for python int, float, and complex to behave the same as array objects. Thus, in the proposed solution, 0-dim arrays would never be returned from calculation, but instead, the equivalent Python Array Scalar Type. Internally, these ArrayScalars can be quickly converted to 0-dim arrays when needed. Each scalar would also have a method to convert to a "standard" Python Type upon request (though this shouldn't be needed often). I'm not sure I understand this. Does it mean that, after having imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather than "int"? If so, I think this is a dangerous idea. There is one important difference between zero dimensional arrays and Python scalar types, which is not discussed in the PEP: arrays are mutable, Python scalars are immutable. When Guido introduced in-place operators in Python (+=, *=, etc.), he decided that "i += 1" should be allowed for Python scalars and should mean "i = i + 1". Here you have it, it means something different when i is a mutable zero dimensional array. So, I suspect a tacit re-definition of Python scalars on ndarray import will break some code out there (code that does not deal with arrays at all). Facing this important difference between arrays and Python scalars, I'm also not sure anymore that advertising zero dimensional arrays as essentially the same as Python scalars is such a good idea.
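[A short sketch of the aliasing difference described above, assuming zero-dimensional arrays that support in-place arithmetic; numarray is used here purely for illustration:

    import numarray

    i = 1
    j = i
    i += 1                  # rebinds i to a new int; j still reads 1

    a = numarray.array(1)   # a zero-dimensional (rank-0) array
    b = a
    a += 1                  # mutates the array in place; b now also reads 2

With immutable scalars the two names diverge; with a mutable rank-0 array they stay aliased, which is the code-breaking hazard being raised.]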
Perhaps it would be better not to try to inherit from Python's number types and all that. Perhaps it would be easier to just say that indexing an array always results in an array and that zero dimensional arrays can be converted into Python scalars. Period. Ralf PS: You wrote two questions about zero dimensional arrays vs Python scalars into the PEP. What are your plans for deciding these? From juenglin at cs.pdx.edu Sat Mar 19 21:49:05 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sat Mar 19 21:49:05 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> Message-ID: <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> I just read the section about "Array Scalars" again and am not sure anymore that I understood the whole idea. When you say "Array Scalar", do you mean a zero dimensional array or is an "Array Scalar" yet another animal? Ralf On Sat, 2005-03-19 at 21:06, Ralf Juengling wrote: > Travis, > > Discussing zero dimensional arrays, the PEP says at one point: > > ... When ndarray is imported, it will alter the numeric table > for python int, float, and complex to behave the same as > array objects. > > Thus, in the proposed solution, 0-dim arrays would never be > returned from calculation, but instead, the equivalent Python > Array Scalar Type. Internally, these ArrayScalars can > be quickly converted to 0-dim arrays when needed. Each scalar > would also have a method to convert to a "standard" Python Type > upon request (though this shouldn't be needed often). > > > I'm not sure I understand this. Does it mean that, after having > imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather than > "int"? > > If so, I think this is a dangerous idea. There is one important > difference between zero dimensional arrays and Python scalar > types, which is not discussed in the PEP: arrays are mutable, > Python scalars are immutable. > > When Guido introduced in-place operators in Python (+=, *=, > etc.), he decided that "i += 1" should be allowed for Python > scalars and should mean "i = i + 1". Here you have it, it > means something different when i is a mutable zero dimensional > array. So, I suspect a tacit re-definition of Python scalars > on ndarray import will break some code out there (code that > does not deal with arrays at all). > > Facing this important difference between arrays and Python > scalars, I'm also not sure anymore that advertising zero > dimensional arrays as essentially the same as Python scalars > is such a good idea. Perhaps it would be better not to try to > inherit from Python's number types and all that. Perhaps it > would be easier to just say that indexing an array always > results in an array and that zero dimensional arrays can be > converted into Python scalars. Period. > > Ralf > > > PS: You wrote two questions about zero dimensional arrays > vs Python scalars into the PEP. What are your plans for > deciding these?
From cjw at sympatico.ca Sun Mar 20 08:42:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 08:42:16 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> Message-ID: <423DA7B3.8090906@sympatico.ca> Ralf Juengling wrote: >Travis, > >Discussing zero dimensional arrays, the PEP says at one point: > > ... When ndarray is imported, it will alter the numeric table > for python int, float, and complex to behave the same as > array objects. > > Thus, in the proposed solution, 0-dim arrays would never be > returned from calculation, but instead, the equivalent Python > Array Scalar Type. Internally, these ArrayScalars can > be quickly converted to 0-dim arrays when needed. Each scalar > would also have a method to convert to a "standard" Python Type > upon request (though this shouldn't be needed often). > > >I'm not sure I understand this. Does it mean that, after having >imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather than >"int"? > >If so, I think this is a dangerous idea. There is one important >difference between zero dimensional arrays and Python scalar >types, which is not discussed in the PEP: arrays are mutable, >Python scalars are immutable. > >When Guido introduced in-place operators in Python (+=, *=, >etc.), he decided that "i += 1" should be allowed for Python >scalars and should mean "i = i + 1". Here you have it, it >means something different when i is a mutable zero dimensional >array. So, I suspect a tacit re-definition of Python scalars >on ndarray import will break some code out there (code that >does not deal with arrays at all). > >Facing this important difference between arrays and Python >scalars, I'm also not sure anymore that advertising zero >dimensional arrays as essentially the same as Python scalars >is such a good idea. Perhaps it would be better not to try to >inherit from Python's number types and all that. Perhaps it >would be easier to just say that indexing an array always >results in an array and that zero dimensional arrays can be >converted into Python scalars. Period. > >Ralf > > >PS: You wrote two questions about zero dimensional arrays >vs Python scalars into the PEP. What are your plans for >deciding these? > > > > It looks as though a decision has been made. I was among those who favoured abandoning rank-0 arrays; we lost. To my mind rank-0 arrays add complexity for little benefit and make explanation more difficult. I don't spot any discussion in the PEP of the pros and cons of the nd == 0 case. Colin W. From cjw at sympatico.ca Sun Mar 20 08:52:36 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 08:52:36 2005 Subject: [Numpy-discussion] Additions to stdlib Message-ID: <423DAA27.60402@sympatico.ca> Here are some thoughts from Martin Löwis on the requirements and the process to enhance the standard Python library. Colin W. -------- Original Message -------- Subject: Re: survey of modules to be added to stdlib Date: Fri, 18 Mar 2005 23:26:31 +0100 From: "Martin v.
Löwis" To: Alia Khouri Newsgroups: comp.lang.python References: <1111184161.122375.227250 at l41g2000cwc.googlegroups.com> Alia Khouri wrote: > BTW is there an official set of conditions that have to be met before a > module can be accepted into the stdlib? Yes - although this has never been followed to date: In PEP 2, http://www.python.org/peps/pep-0002.html a procedure is defined for how new modules can be added. Essentially, we need a document stating its intended purpose, and a commitment by the authors to maintain the code. This may rule out inclusion of some modules in your list, e.g. if nobody steps forward to offer ongoing maintenance. Just that users want to see the code in the library is not sufficient; we also need somebody to do the actual work. If none of the core developers respond favourably to requests for inclusion, a library PEP can be seen as a last resort to trigger a BDFL pronouncement. Depending on the module, I personally would actively object to inclusion if I have doubts whether the module is going to be properly maintained; I will, of course, obey any BDFL pronouncement. Furthermore, and more recently, we also started requiring that code be *formally* contributed to the PSF, through the contrib forms, http://www.python.org/psf/contrib.html This may rule out further modules: the authors of the code have to agree to its inclusion in the library; somebody else contributing the modules for the authors will not be acceptable. However, the authors don't have to offer ongoing support for the copy in Python - any other volunteer could step in instead. Regards, Martin From cjw at sympatico.ca Sun Mar 20 10:36:20 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 10:36:20 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DA7B3.8090906@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> Message-ID: <423DC23C.80907@sympatico.ca> Colin J. Williams wrote: > Ralf Juengling wrote: > >> Travis, >> >> Discussing zero dimensional arrays, the PEP says at one point: >> >> ... When ndarray is imported, it will alter the numeric table >> for python int, float, and complex to behave the same as array >> objects. >> Thus, in the proposed solution, 0-dim arrays would never be >> returned from calculation, but instead, the equivalent Python >> Array Scalar Type. Internally, these ArrayScalars can >> be quickly converted to 0-dim arrays when needed. Each scalar >> would also have a method to convert to a "standard" Python Type >> upon request (though this shouldn't be needed often). >> >> >> I'm not sure I understand this. Does it mean that, after having >> imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather than "int"? >> >> If so, I think this is a dangerous idea. There is one important >> difference between zero dimensional arrays and Python scalar types, >> which is not discussed in the PEP: arrays are mutable, Python scalars >> are immutable. >> >> When Guido introduced in-place operators in Python (+=, *=, etc.), he >> decided that "i += 1" should be allowed for Python >> scalars and should mean "i = i + 1". Here you have it, it means >> something different when i is a mutable zero dimensional >> array. So, I suspect a tacit re-definition of Python scalars >> on ndarray import will break some code out there (code that >> does not deal with arrays at all).
>> Facing this important difference between arrays and Python >> scalars, I'm also not sure anymore that advertising zero >> dimensional arrays as essentially the same as Python scalars >> is such a good idea. Perhaps it would be better not to try to >> inherit from Python's number types and all that. Perhaps it >> would be easier to just say that indexing an array always results in >> an array and that zero dimensional arrays can be converted into >> Python scalars. Period. >> >> Ralf >> >> >> PS: You wrote two questions about zero dimensional arrays vs Python >> scalars into the PEP. What are your plans for deciding these? >> >> >> >> > It looks as though a decision has been made. I was among those who > favoured abandoning rank-0 arrays; we lost. > > To my mind rank-0 arrays add complexity for little benefit and make > explanation more difficult. > > I don't spot any discussion in the PEP of the pros and cons of the nd > == 0 case. A correction! There is, in the PEP:

Questions

1) should sequence behavior (i.e. some combination of slicing, indexing, and len) be supported for 0-dim arrays?

Pros: It means that len(a) always works and returns the size of the array. Slicing code and indexing code will work for any dimension (the 0-dim array is an identity element for the operation of slicing).

Cons: 0-dim arrays are really scalars. They should behave like Python scalars, which do not allow sequence behavior.

2) should array operations that result in a 0-dim array that is the same basic type as one of the Python scalars, return the Python scalar instead?

Pros: 1) Some cases when Python expects an integer (the most dramatic is when slicing and indexing a sequence: _PyEval_SliceIndex in ceval.c) it will not try to convert it to an integer first before raising an error. Therefore it is convenient to have 0-dim arrays that are integers converted for you by the array object. 2) No risk of user confusion by having two types that are nearly but not exactly the same and whose separate existence can only be explained by the history of Python and NumPy development. 3) No problems with code that does explicit typechecks (isinstance(x, float) or type(x) == types.FloatType). Although explicit typechecks are considered bad practice in general, there are a couple of valid reasons to use them. 4) No creation of a dependency on Numeric in pickle files (though this could also be done by a special case in the pickling code for arrays)

Cons: It is difficult to write generic code because scalars do not have the same methods and attributes as arrays (such as .type or .shape). Python scalars have different numeric behavior as well. This results in special-case checking that is not pleasant. Fundamentally it lets the user believe that somehow multidimensional homogeneous arrays are something like Python lists (which except for Object arrays they are not).

For me and for the end user, the (2) Pros win. Colin W. From cjw at sympatico.ca Sun Mar 20 11:05:25 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 11:05:25 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DB7B8.5050004@cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> <423DB7B8.5050004@cs.pdx.edu> Message-ID: <423DC958.3010608@sympatico.ca> Ralf Juengling wrote: >>> >>> >> It looks as though a decision has been made. I was among those who >> favoured abandoning rank-0 arrays; we lost.
>> To my mind rank-0 arrays add complexity for little benefit and make >> explanation more difficult. > > > What the current PEP describes is perhaps close to what you want, > though: It says that indexing an array never results in a zero > dimensional array but in "Array Scalars", which are basically > Python scalars, but there are just more of them to support the variety > of numeric types. > > You could still create zero dimensional arrays by reshaping single > element arrays though. > >> >> I don't spot any discussion in the PEP of the pros and cons of the nd >> == 0 case. > > > I don't remember your idea--getting rid of zero dimensional arrays > altogether--being voiced and discussed on this list. What would be > the bad consequences of getting rid of zero dimensional arrays? The argument made in the PEP against returning Python scalars is: Cons: It is difficult to write generic code because scalars do not have the same methods and attributes as arrays (such as .type or .shape). Python scalars have different numeric behavior as well. This results in special-case checking that is not pleasant. Fundamentally it lets the user believe that somehow multidimensional homogeneous arrays are something like Python lists (which except for Object arrays they are not). I suggest that, in striking the balance between the developer or generic writer and the end user, the greater design consideration should go to the ease and convenience of the end user. Colin W. From rkern at ucsd.edu Sun Mar 20 15:29:20 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sun Mar 20 15:29:20 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DC958.3010608@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> <423DB7B8.5050004@cs.pdx.edu> <423DC958.3010608@sympatico.ca> Message-ID: <423E0714.103@ucsd.edu> Colin J. Williams wrote: > The argument made in the PEP against returning Python scalars is: > > Cons: It is difficult to write generic code because scalars > do not have the same methods and attributes as arrays > (such as .type or .shape). Python scalars have > different numeric behavior as well. > This results in special-case checking that is not > pleasant. Fundamentally it lets the user believe that > somehow multidimensional homogeneous arrays > are something like Python lists (which except for > Object arrays they are not). > > I suggest that, in striking the balance between the developer or generic > writer and the end user, > the greater design consideration should go to the ease and convenience > of the end user. How are you defining "end user"? By my definition, an end user will neither know nor care whether rank-0 arrays or Python ints, longs, floats, or complexes are returned. They will be at a GUI seeing graphs or reading output. They won't see a bit of code. The "generic code" being talked about in the PEP isn't code inside Numeric itself. It's all of the stuff written *using* Numeric. Now, if you are defining "end user" to be the people using Numeric to write code, then we can argue about which choice is simpler or more convenient. There are some situations in which the rank-0 approach is more convenient and some in which the Python scalar is preferred. I'm not sure that we can reliably enumerate them. I would suggest that Option 2, returning Python types when the typecode allows and rank-0 arrays otherwise, is an inconsistency that we could do without.
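[A sketch of the inconsistency being pointed out, under a hypothetical Option-2 reading; the behavior in the comments is the proposal as discussed here, not that of any released package:

    import Numeric

    f = Numeric.array([1.5], 'f')   # Float32: no exact Python scalar type
    d = Numeric.array([1.5], 'd')   # Float64: matches Python's float

    x = f[0]   # under Option 2: would stay a rank-0 array
    y = d[0]   # under Option 2: would become a plain Python float

    # Generic code would then need type checks to treat x and y uniformly.
]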
-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Sun Mar 20 22:07:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:07:07 2005 Subject: [Numpy-discussion] The first try on Numeric3. In-Reply-To: <371840ef05031918566113287b@mail.gmail.com> References: <371840ef05031918566113287b@mail.gmail.com> Message-ID: <423E644E.8060902@ee.byu.edu> Daehyok Shin wrote: >Dear Travis. >I found no problem in installing Numeric3 and running tests in Mandrake 10.1. >Good job. >One question I have: >I found you do not use UnitTest for the test files. >Will you eventually change all the tests to use UnitTest? >If so, I think I can contribute something >because I have some experience with it. > >For the test files, please let me know what kind of support you need. > > Absolutely, there will be UnitTests (scipy has them now) and it's what I'm used to, I just have not worried about them yet. Thank you for your comments. The only reason I have named it Numeric3 is so that I can continue using Numeric on my system until Numeric3 is ready to replace it and because I chose that as the name for the CVS project --- it really is a branch of Numeric, though. I just didn't want to learn how to use CVS branching... When it is nearing completion, it will go into scipy_core. I'm pretty sure you will be able to say import Numeric when done. -Travis From oliphant at ee.byu.edu Sun Mar 20 22:18:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:18:05 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> Message-ID: <423E66D3.2030006@ee.byu.edu> Ralf Juengling wrote: >I just read the section about "Array Scalars" again and >am not sure anymore that I understood the whole idea. When >you say "Array Scalar", do you mean a zero dimensional >array or is an "Array Scalar" yet another animal? > >Ralf > > It is another type object that is a scalar but "quacks" like an array (has the same methods and attributes) -Travis From oliphant at ee.byu.edu Sun Mar 20 22:28:42 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:28:42 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DA7B3.8090906@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> Message-ID: <423E6922.6050001@ee.byu.edu> Colin J. Williams wrote: > It looks as though a decision has been made. I was among those who > favoured abandoning rank-0 arrays; we lost. > I don't understand how you can say this. In what way have rank-0 arrays not been abandoned for the new Array Scalar objects? By the way, these array scalar objects can easily be explained as equivalent to the type hierarchy of current numarray (it is essentially identical --- it's just in C). > To my mind rank-0 arrays add complexity for little benefit and make > explanation more difficult. I don't know what you mean. rank-0 arrays are built into the arrayobject type. Removing them is actually difficult. The easiest thing to do is to return rank-0 arrays whenever the operation allows it.
It is the confusion with desiring to use items in an array (which are logically rank-0 arrays) as equivalent to Python scalars that requires the Array Scalars that "bridge the gap" between rank-0 arrays and "regular" Python scalars. Perhaps you mean that "Array Scalars" add complexity for "little benefit" and not "rank-0 arrays". To address that question: It may add complexity, but it does add benefit (future optimization, array type hierarchy, and a better bridge between the problem of current Python scalars and array-conscious scalars). This rank-0 problem has been a wart with Numeric for a long time. Most of us long-time users work around it, but heavy users are definitely aware of the problem and a bit annoyed. I think we have finally found a reasonable "compromise" solution in the Array Scalars. Yes, it did take more work to implement (and will take a little more work to maintain --- you need to add methods to the GenericScalar class when you add them to the Array Class), but I can actually see it working. -Travis From juenglin at cs.pdx.edu Sun Mar 20 23:25:17 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sun Mar 20 23:25:17 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423E66D3.2030006@ee.byu.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> <423E66D3.2030006@ee.byu.edu> Message-ID: <423E750A.3030403@cs.pdx.edu> Travis Oliphant wrote: > Ralf Juengling wrote: > >> I just read the section about "Array Scalars" again and >> am not sure anymore that I understood the whole idea. When >> you say "Array Scalar", do you mean a zero dimensional array or is an >> "Array Scalar" yet another animal? >> >> Ralf >> >> > It is another type object that is a scalar but "quacks" like an array > (has the same methods and attributes) ... but is, unlike arrays, an immutable type (just like the existing Python scalars). ralf From boomberschloss at yahoo.com Mon Mar 21 01:59:06 2005 From: boomberschloss at yahoo.com (Joachim Boomberschloss) Date: Mon Mar 21 01:59:06 2005 Subject: [Numpy-discussion] casting in numarray In-Reply-To: 6667 Message-ID: <20050321095815.83981.qmail@web53109.mail.yahoo.com> Thanks, that's exactly what I needed! --- Thomas Grill wrote: > Hi Joachim, > this is what I do in my Python extension of the Pure > Data realtime > modular system. You have to create a Python buffer > object pointing to > your memory location and then create a numarray from > that. It's quite easy. > See the code in > http://cvs.sourceforge.net/viewcvs.py/pure-data/externals/grill/py/source/ > files pybuffer.h and pybuffer.cpp > > best greetings, > Thomas > > Joachim Boomberschloss schrieb: > > >Hi, > > > >I'm using numarray for an audio-related application > as > >a buffer in an audio-processing pipeline. I would > like > >to be able to allocate the buffer in advance and > later > >regard it as a buffer of 8bit or 16bit samples as > >appropriate, but in numarray, casting always > produces > >a new array, which I don't want. How difficult > should > >it be to make it possible to create an array using > an > >existing pre-allocated buffer to act as an > interface > >to that buffer? Also, if others consider it useful, > is > >there anyone willing to guide me through the code > in > >doing so? > > > >Thanks, > > > >Joe
> -- > --->----->->----->-- > Thomas Grill > gr at grrrr.org > +43 699 19715543 From jdgleeson at mac.com Mon Mar 21 07:40:29 2005 From: jdgleeson at mac.com (John Gleeson) Date: Mon Mar 21 07:40:29 2005 Subject: [Numpy-discussion] Numeric3 compilation errors on OS X Message-ID: <59990ca88950649336c742f006c8e725@mac.com> I get the following errors when building the extensions for Numeric3 on OS X 10.3.8 (with PantherPythonFix installed):

Src/arrayobject.c:2538: error: conflicting types for `_swap_axes'
Src/arrayobject.c:1170: error: previous declaration of `_swap_axes'

This is for arrayobject.c v. 1.61. Any ideas? thanks, John From oliphant at ee.byu.edu Mon Mar 21 16:17:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 21 16:17:04 2005 Subject: [Numpy-discussion] Numeric3 compilation errors on OS X In-Reply-To: <59990ca88950649336c742f006c8e725@mac.com> References: <59990ca88950649336c742f006c8e725@mac.com> Message-ID: <423F63BF.4050501@ee.byu.edu> John Gleeson wrote: > I get the following errors when building the extensions for Numeric3 > on OS X 10.3.8 (with PantherPythonFix installed): > > Src/arrayobject.c:2538: error: conflicting types for `_swap_axes' > Src/arrayobject.c:1170: error: previous declaration of `_swap_axes' > > This is for arrayobject.c v. 1.61. > > Any ideas? The current code base (as of Saturday) is in flux as I add the new methods to the array type. If you want something more stable, check out the version that was available Friday night. The CVS code is not guaranteed to compile all the time. This will be true for at least another week or so. I use CVS to store incremental changes during times like this, so it can be in a state that cannot be compiled for a few days. -Travis From mdehoon at ims.u-tokyo.ac.jp Tue Mar 22 05:47:19 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 22 05:47:19 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <423A744F.8070007@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> Message-ID: <424022C1.1030204@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> Another warning was that PyArrayObject's "dimensions" doesn't seem to >> be an int array any more. > > Yes. To allow for dimensions that are bigger than 32-bits, dimensions > and strides are (intp *). intp is a signed integer with sizeof(intp) == > sizeof(void *). On 32-bit systems, the warning will not cause > problems. We could worry about fixing it by typedefing intp to int > (instead of the current long for 32-bit systems). > Do 4 gigabyte 1D numerical python arrays occur in practice? If I understand correctly, the current implementation gives dimensions a different pointer type on different platforms.
This will break extension modules on platforms other than 32-bit, as the extension module expects dimensions to be a pointer to int. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Mar 22 13:39:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 13:39:33 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <424022C1.1030204@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> Message-ID: <42409026.8080808@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Michiel Jan Laurens de Hoon wrote: >> >>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>> to be an int array any more. >> >> >> Yes. To allow for dimensions that are bigger than 32-bits, >> dimensions and strides are (intp *). intp is a signed integer with >> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning will >> not cause problems. We could worry about fixing it by typedefing >> intp to int (instead of the current long for 32-bit systems). >> > Do 4 gigabyte 1D numerical python arrays occur in practice? If I > understand correctly, the current implementation gives dimensions a > different pointer type on different platforms. This will break > extension modules on platforms other than 32-bit, as the extension > module expects dimensions to be a pointer to int. This is a must have. Yes, extension modules will have to be recompiled and pointers changed on 64-bit platforms, but this has to be done. If you see a better solution, I'd love to hear it. The earlier the better. -Travis From rowen at cesmail.net Tue Mar 22 13:39:38 2005 From: rowen at cesmail.net (Russell E. Owen) Date: Tue Mar 22 13:39:38 2005 Subject: [Numpy-discussion] Current state of performance? Message-ID: I'm curious as to the current state of numarray vs. Numeric performance. My code is a mix at the moment: - Numeric: coordinate conversion code that was written before numarray was very solid and makes heavy use of small matrices. - numarray: some image processing stuff that uses PyFits (which uses numarray). I'd like to settle on one package. At one time numarray was at a clear disadvantage for small arrays, but I was wondering if that was still true. Any advice? -- Russell From cookedm at physics.mcmaster.ca Tue Mar 22 14:51:44 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Mar 22 14:51:44 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <42409026.8080808@ee.byu.edu> (Travis Oliphant's message of "Tue, 22 Mar 2005 14:37:42 -0700") References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: Travis Oliphant writes: > Michiel Jan Laurens de Hoon wrote: > >> Travis Oliphant wrote: >> >>> Michiel Jan Laurens de Hoon wrote: >>> >>>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>>> to be an int array any more. >>> >>> >>> Yes. To allow for dimensions that are bigger than 32-bits, >>> dimensions and strides are (intp *). intp is a signed integer with >>> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>> will not cause problems.
We could worry about fixing it by >>> typedefing intp to int (instead of the current long for 32-bit >>> systems). Why not use Py_intptr_t? It's defined by the Python C API already (in pyport.h). >> Do 4 gigabyte 1D numerical python arrays occur in practice? If I >> understand correctly, the current implementation gives dimensions a >> different pointer type on different platforms. This will break >> extension modules on platforms other than 32-bit, as the extension >> module expects dimensions to be a pointer to int. > > This is a must have. Yes, extension modules will have to be > recompiled and pointers changed on 64-bit platforms, but this has to > be done. If you see a better solution, I'd love to hear it. The > earlier the better. An array of longs would seem to be the best solution. On the two 64-bit platforms I have access to (an Athlon 64 and some Alphas), sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and PowerPC) have sizeof(long) == 4. For comparison, here's a list of sizes for various platforms

                  32-bit    32-bit    64-bit    64-bit
                  x86       PPC       Athlon64  Alpha
                  (Linux)   (OS X)    (Linux)   (Tru64)
char              1         1         1         1
short             2         2         2         2
int               4         4         4         4
long              4         4         8         8
long long         8         8         8         8
size_t            4         4         8         8

float             4         4         4         4
double            8         8         8         8
long double       12        8         16        16

void *            4         4         8         8
function pointer  4         4         8         8

Note the three different sizes of long double (oh, fun). Also note that size_t (which is the return type of sizeof()) is not int in general (although lots of programs treat it like that). Using long for the dimensions also means that converting to and from Python ints for indices is transparent, and won't fail, as Python ints are C longs. This is the cause of several of the 64-bit bugs I fixed in the latest Numeric release (23.8). [I'd help with Numeric3, but not until it compiles with fewer than several hundred warnings -- I *really* don't want to wade through all that.] I've attached the program I used to generate the above numbers, if someone wants to run it on other platforms, so we have a better idea of what's what. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca -------------- next part -------------- A non-text attachment was scrubbed... Name: csizes.c Type: text/x-csrc Size: 522 bytes Desc: print C type sizes URL: From mdehoon at ims.u-tokyo.ac.jp Tue Mar 22 17:08:01 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 22 17:08:01 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <42409026.8080808@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <4240C1DA.8090501@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> Travis Oliphant wrote: >>> Michiel Jan Laurens de Hoon wrote: >>>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>>> to be an int array any more. >>> >>> Yes. To allow for dimensions that are bigger than 32-bits, >>> dimensions and strides are (intp *). intp is a signed integer with >>> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning will >>> not cause problems. We could worry about fixing it by typedefing >>> intp to int (instead of the current long for 32-bit systems). >>> >> Do 4 gigabyte 1D numerical python arrays occur in practice?
If I >> understand correctly, the current implementation gives dimensions a >> different pointer type on different platforms. This will break >> extension modules on platforms other than 32-bit, as the extension >> module expects dimensions to be a pointer to int. > > This is a must have. Yes, extension modules will have to be recompiled > and pointers changed on 64-bit platforms, but this has to be done. Why? There needs to be a good reason to break compatibility. Who needs this? --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Mar 22 23:14:44 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:14:44 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <424116B0.2040106@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > >>Michiel Jan Laurens de Hoon wrote: >> >> >>>Travis Oliphant wrote: >>> >>> >>>>Michiel Jan Laurens de Hoon wrote: >>>> >>>> >>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem >>>>>to be an int array any more. >>>>> >>>>> >>>>Yes. To allow for dimensions that are bigger than 32-bits, >>>>dimensions and strides are (intp *). intp is a signed integer with >>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>>>will not cause problems. We could worry about fixing it by >>>>typedefing intp to int (instead of the current long for 32-bit >>>>systems). >>>> >>>> > >Why not use Py_intptr_t? It's defined by the Python C API already (in >pyport.h). > > Sounds good to me. I wasn't aware of it (intp or intptr is shorter though). >An array of longs would seem to be the best solution. On the two >64-bit platforms I have access to (an Athlon 64 and some Alphas), >sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and >PowerPC) have sizeof(long) == 4. > > I thought about this, but what about the MS Windows compilers where long is still 4 bytes (even on a 64-bit system), so that long long is the size of a pointer on that system. I think we should just create an integer that will be big enough and start using it. >For comparison, here's a list of sizes for various platforms
>
>                   32-bit    32-bit    64-bit    64-bit
>                   x86       PPC       Athlon64  Alpha
>                   (Linux)   (OS X)    (Linux)   (Tru64)
> char              1         1         1         1
> short             2         2         2         2
> int               4         4         4         4
> long              4         4         8         8
> long long         8         8         8         8
> size_t            4         4         8         8
>
> float             4         4         4         4
> double            8         8         8         8
> long double       12        8         16        16
>
> void *            4         4         8         8
> function pointer  4         4         8         8
>
Nice table, thanks... >Note the three different sizes of long double (oh, fun). > Yeah, I know, I figure people who use long doubles will >Also note >that size_t (which is the return type of sizeof()) is not int in >general (although lots of programs treat it like that). > >Using long for the dimensions also means that converting to and from >Python ints for indices is transparent, and won't fail, as Python ints >are C longs. This is the cause of several of the 64-bit bugs I fixed >in the latest Numeric release (23.8). > > The conversion code has been updated so that it won't fail if the sizes are actually the same for your platform. >[I'd help with Numeric3, but not until it compiles with fewer than >several hundred warnings -- I *really* don't want to wade through all >that.]
> > Do the warnings really worry you that much? Most are insignificant. You could help implement a method or two pretty easily. Or help with the ufunc module. -Travis From oliphant at ee.byu.edu Tue Mar 22 23:18:58 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:18:58 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4240C1DA.8090501@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> Message-ID: <424117BD.9020509@ee.byu.edu> >>> Do 4 gigabyte 1D numerical python arrays occur in practice? If I >>> understand correctly, the current implementation gives dimensions a >>> different pointer type on different platforms. This will break >>> extension modules on platforms other than 32-bit, as the extension >>> module expects dimensions to be a pointer to int. >> >> >> This is a must have. Yes, extension modules will have to be >> recompiled and pointers changed on 64-bit platforms, but this has to >> be done. > > > Why? There needs to be a good reason to break compatibility. Who needs > this? > > --Michiel. > The "break compatibility argument" is not strong for me here. We are going to break compatibility in a few places. I'm trying to minimize them, but I don't want to chain ourselves to bad designs forever just for the sake of compatibility. For 32-bit systems there will be no problem: unchanged extension code will work fine. Unchanged extension code will not work on 64-bit systems. The change is not difficult (search and replace). I submit that there are fewer 64-bit users out there currently, but their numbers are going to grow, and they will eventually find Numeric a toy if the dimensions are limited to 32-bits even on 64-bit systems. The biggest problem is the 1 dimensional array. Here the 32-bit limit will byte you quickly. -Travis From oliphant at ee.byu.edu Tue Mar 22 23:35:44 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:35:44 2005 Subject: [Numpy-discussion] Specific plea for help with pickling In-Reply-To: <424117BD.9020509@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> <424117BD.9020509@ee.byu.edu> Message-ID: <42411BE4.10203@ee.byu.edu> If there is anyone out there with pickling experience who would like to help bring the new Numeric up to date with protocol 2 of the pickling protocol, that would help me immensely. Even a document that describes what should be done would save me time, and right now time is very important, as I don't want to delay the new Numeric and scipy_core beyond June. Thanks, -Travis From jmiller at stsci.edu Wed Mar 23 02:55:55 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Mar 23 02:55:55 2005 Subject: [Numpy-discussion] Current state of performance? In-Reply-To: References: Message-ID: <1111575188.5028.40.camel@jaytmiller.comcast.net> On Tue, 2005-03-22 at 13:27 -0800, Russell E. Owen wrote: > I'm curious as to the current state of numarray vs. Numeric performance. > My code is a mix at the moment: > - Numeric: coordinate conversion code that was written before numarray > was very solid and makes heavy use of small matrices. > - numarray: some image processing stuff that uses PyFits (which uses > numarray). > > I'd like to settle on one package.
At one time numarray was at a clear > disadvantage for small arrays, but I was wondering if that was still true. It is still true that numarray is at a disadvantage for small arrays. > Any advice? I don't think there is a single array package that provides both PyFITS and good small array performance. Consider porting your conversion code to numarray and then profiling to get a better idea of the overall performance costs of your application. If you find a specific hot spot we can try to address it. Regards, Todd From xscottg at yahoo.com Wed Mar 23 03:00:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Wed Mar 23 03:00:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050323105807.59603.qmail@web50208.mail.yahoo.com> --- Michiel Jan Laurens de Hoon wrote: > >>> > >> Do 4 gigabyte 1D numerical python arrays occur in practice? > > Why? There needs to be a good reason to break compatibility. Who needs > this? > I (and others I work with) routinely deal with 1D datasets that are multiple gigabytes in length. Working with terabyte datasets is on my near horizon. For lots of reasons, I don't/can't use Python/Numeric for very much of this sort of thing, but it would be nice if I could. The "32 bits is enough for anyone" design has bitten me with lots of tools (not just Python). The Python core will fix its int/intp problem eventually; I can't see why Numeric3 wouldn't avoid the problem now. As a concrete case that I'm sure has been done, consider memory mapped file arrays. 64 bit platforms can mmap huge files without using huge amounts of real memory. I try to remain a lurker on this list, but since I've already broken my silence, let me add a few other notes and then I'll go back to being silent... I'll try to sort them by priority. Pickling performance is important to us at my work. We use pickling to pass data across Unix pipes, through shared memory, across sockets on Gig-E, etc... Typically we'll have a dictionary containing some metadata, and a few large chunks (1-100 MBytes would be common) of Numeric array data. We'd like to transfer 100s of these per second. Currently, we pickle the array into a string in memory, then pickle the string across the conduit (pipe or socket or shared memory). For some reason, pickling a Numeric array directly to the file object is slower than the two stage process... If the new Numeric3 didn't break too much compatibility with the original Numeric but pickled much faster, we'd probably be in a hurry to upgrade based on this feature alone. The new pickling protocol that allows a generator to be used to copy small chunks at a time instead of an entire binary string copy could potentially save the cost of duplicating a 100 MByte array into a 100 MByte string. The reason we use pickling like we do is to pass data between processes. Almost all of our work machines have multiple processors (typically 4). A lot of times, the multi-process design is cleaner and less buggy, but there are also times when we'd prefer to use multiple threads in a single process. It's unfortunate that the GIL prevents much real concurrency with multiple threads. It would be nice if the ufuncs and other numerical algorithms released the GIL when possible. I know the limitations of the Python buffer protocol add significant headache in this area, but it's something to think about. We have a wide group of smart engineering folks using Python/Numeric, but most of them are not computer scientists or software engineers.
That is, they spend all day writing software, but know just enough about programming to solve their problems, and almost none of them have any knowledge about the internals of Python or Numeric. Complicated rules about whether something returns a scalar-versus-array, or a copy-versus-view, add frustration and hard-to-find bugs. This has been beaten up on this list quite a bit, and there is probably too much momentum behind the case-by-case strategy that is now in place, but please count my vote for always getting an array copy (copy on write) from subscripting unless you explicitly ask for a view, and always returning a rank-0 array instead of a scalar.

I agree with the other guy who pointed out that arrays are mutable and that likewise, rank-0 arrays should be mutable. I know it's unlikely to happen, but it would also be nice to see the Python parser change slightly to treat a[] as a[()]. Then the mutability of rank-0 could fit elegantly with the rank-(n > 1) arrays. It's a syntax error now, so there wouldn't be a backwards compatibility issue.

We commonly use data types that aren't in Numeric. The most prevalent example at my work is complex-short. It looks like I can wrap the new "Void" type to handle this to some extent. Will indexing (subscripting) a class derived from a Numeric3 array return the derived class?

    class Derived(Numeric3.ArrayType):
        pass

    d = Derived(shape=(200, 200, 2), typecode='s')
    if isinstance(d[0], Derived):
        print "This is what I mean"

I don't really expect Numeric3 to add all of the possible oddball types, but I think it's important to remember that other types are out there (fixed point for DSP, mu-law for audio, 16-bit floats for graphics, IBM's decimal64/decimal128 types, double-double and quad-double for increased precision, quaternions of standard types, ....). It's one thing to treat these like "record arrays"; it's another thing for them to have overloaded arithmetic operators.

Since Numeric3 can't support every type under the sun, it would be nice if, when the final version goes into the Python core, the C-API and Python library functions used "duck typing" so that other array implementations could work to whatever extent possible. In other words, it would be better if users were not required to derive from the Numeric3 type in order to create new kinds of arrays that can be used with sufficiently generic Numeric3 routines. Simply having the required attributes (shape, strides, itemsize, ...) of a Numeric3 array should be enough to be treated like a Numeric3 array.

This last one is definitely pie-in-the-sky, but I thought I'd mention it. Since the 64-bit Alphas are expensive and pretty much on the way out of production, we've stepped back to 32-bit versions of x86/Linux. The Linux boxes are cheaper, faster, and smaller, but not 64-bit. It would be really great to be able to use Numeric to directly manipulate huge (greater than 2**32 byte) files on a 32-bit platform. This would require a smarter paging scheme than simply mmapping the whole thing, and I don't think any of the Python array packages has proposed a good solution for this... I realize it adds considerable complexity to switch from a single buffer object pointing to the entire block of data to having multiple buffers pinning down pieces of the data at a time, but the result would be pretty useful.

I realize this is a lot of commentary from someone who doesn't contribute much of anything back to the Numeric/Numarray/SciPy community. If you got this far, thanks for your time reading it.
I appreciate the work you're doing. Cheers, -Scott From cookedm at physics.mcmaster.ca Wed Mar 23 03:24:54 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Mar 23 03:24:54 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <424116B0.2040106@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <424116B0.2040106@ee.byu.edu> Message-ID: <20050323112205.GA1350@arbutus.physics.mcmaster.ca> On Wed, Mar 23, 2005 at 12:11:44AM -0700, Travis Oliphant wrote: > David M. Cooke wrote: > >Travis Oliphant writes: > >>Michiel Jan Laurens de Hoon wrote: > >>>Travis Oliphant wrote: > >>>>Michiel Jan Laurens de Hoon wrote: > >>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem > >>>>>to be an int array any more. > >>>>Yes. To allow for dimensions that are bigger than 32-bits, > >>>>dimensions and strides are (intp *). intp is a signed integer with > >>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning > >>>>will not cause problems. We could worry about fixing it by > >>>>typedefing intp to int (instead of the current long for 32-bit > >>>>systems). > >Why not use Py_intptr_t? It's defined by the Python C API already (in > >pyport.h). > Sounds good to me. I wasn't aware of it (intp or intptr is shorter > though). Some reasons not to use those two: 1) intp is too short for an API. The user might be using it already. 2) the C99 type for this is intptr_t. Py_intptr_t is defined to be the same thing. But let's step back a moment: PyArrayObject is defined like this: typedef struct PyArrayObject { PyObject_HEAD char *data; int nd; intp *dimensions; intp *strides; ... Thinking about it, I would say that dimensions should have the type of size_t *. size_t is the unsigned integer type used to represent the sizes of objects (it's the type of the result of sizeof()). Thus, it's guaranteed that an element of size_t should be large enough to contain any number that we could use as an array dimension. size_t is also unsigned. Also, since the elements of strides are byte offsets into the array, strides should be of type ptrdiff_t *. The elements are used by adding them to a pointer. Is there a good reason why data is not of type void *? If it's char *, it's quite easy to make the mistake of using data[0], which is probably *not* what you want. With void *, you would have to cast it, as you should be doing anyways, or else the compiler complains. Also, assigning to the right pointer, like double *A = array->data, doesn't need casts like it does with data being a char *. In Numeric, char * is probably a holdover when Numeric had to compile with K&R-style C. But, we know we have ANSI C89 ('cause that's what Python requires). So I figure it should look like this: typedef struct PyArrayObject { PyObject_HEAD void *data; int nd; size_t *dimensions; ptrdiff_t *strides; ... I've really started to appreciate size_t when trying to make programs work correctly on my 64-bit machine :-) It's not just another pretty face. > >An array of longs would seem to be the best solution. On the two > >64-bit platforms I have access to (an Athlon 64 and some Alphas), > >sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and > >PowerPC) have sizeof(long) == 4. > > > I thought about this, but what about the MS Window compilers where long > is still 4 byte (even on a 64-bit system), so that long long is the > size of a pointer on that system. 
> I just think we should create an integer that will be big enough and
> start using it.

I don't know about ptrdiff_t, but sizeof(size_t) *should* be 8 on 64-bit Windows.

>
>For comparison, here's a list of sizes for various platforms
>...
> Nice table, thanks...

There's another one (for all sorts of Linux systems) at http://www.xml.com/ldd/chapter/book/ch10.html#t1

>
>Also note
>that size_t (which is the return type of sizeof()) is not int in
>general (although lots of programs treat it like that).
>
>Using long for the dimensions also means that converting to and from
>Python ints for indices is transparent, and won't fail, as Python ints
>are C longs. This is the cause of several of the 64-bit bugs I fixed
>in the latest Numeric release (23.8).
>
> The conversion code has been updated so that it won't fail if the sizes
> are actually the same for your platform.
>
>[I'd help with Numeric3, but not until it compiles with fewer than
>several hundred warnings -- I *really* don't want to wade through all
>that.]
> Do the warnings really worry you that much? Most are insignificant.
> You could help implement a method or two pretty easily. Or help with
> the ufunc module.

They really obscure significant warnings, though. And most look like they can be dealt with. Right now, it doesn't compile for me.

I'll just list a few general cases:
- arrayobject.h redefines ushort, uint, ulong (they're defined in
  sys/types.h already for legacy reasons)
- functions taking no arguments should be defined like
  void function(void)
  not
  void function()
  (which is an old style that actually means the argument list isn't
  specified, not that it takes no arguments)
- then a bunch of errors with typos, and things not defined.

I might get some time to track some down, but it's limited also :-)

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From Fernando.Perez at colorado.edu Wed Mar 23 04:13:06 2005
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Wed Mar 23 04:13:06 2005
Subject: [Numpy-discussion] Specific plea for help with pickling
In-Reply-To: <42411BE4.10203@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> <424117BD.9020509@ee.byu.edu> <42411BE4.10203@ee.byu.edu>
Message-ID: <42415CD7.2020508@colorado.edu>

Travis Oliphant wrote:
> If there is anyone out there with pickling experience who would like to
> help bring the new Numeric up to date with protocol 2 of the pickling
> protocol, that would help me immensely.

I don't have much to offer, since I don't have much pickle experience myself. But keep this note in mind, which flew by in the enthought-dev list yesterday. It might be a good idea to at least keep this in the back of your mind.

best, f.

################

Protocol 2 caused some issues with Traits classes in the past, so we decided to go with 1.

Robert Kern wrote:
>> Lowell Vaughn wrote:
>
>>>> So, I'm checking in a change to naming that may hose current
>>>> projects. Specifically, we're now using binary pickling instead of
>>>> ascii pickling (should make everything smaller and faster, which is a
>>>> win). In theory we should be fine, but since I had to change the
>>>> file(path, 'r') to a file(path, 'rb'), there may be some issues with
>>>> the old pickle files.
> >> >> >> I notice that you are using protocol 1. There's a protocol 2 that's >> even faster and smaller, especially with new-style classes. >> >> http://www.python.org/doc/2.3.5/lib/node63.html From rkern at ucsd.edu Wed Mar 23 08:36:35 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Mar 23 08:36:35 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <424198CF.4090004@ucsd.edu> David M. Cooke wrote: > An array of longs would seem to be the best solution. On the two > 64-bit platforms I have access to (an Athlon 64 and some Alphas), > sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and > PowerPC) have sizeof(long) == 4. I'm not terribly caught up on 64-bit computing, but I believe that 64-bit Windows doesn't (won't? I haven't paid attention) make longs 64-bit. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/win64/win64/abstract_data_models.asp -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Chris.Barker at noaa.gov Wed Mar 23 10:18:04 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Mar 23 10:18:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> Message-ID: <4241B0AB.2020303@noaa.gov> Scott Gilbert wrote: > Since the 64 bit Alphas are expensive and pretty much on the way out of > production, we've stepped back to 32 bit versions of x86/Linux. The Linux > boxes are cheaper, faster, and smaller, but not 64 bit. Kind of OT, but why not use AMD64 or PPC64? Both give you very good price/performance. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Mar 23 11:41:28 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 23 11:41:28 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050323112205.GA1350@arbutus.physics.mcmaster.ca> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <424116B0.2040106@ee.byu.edu> <20050323112205.GA1350@arbutus.physics.mcmaster.ca> Message-ID: <4241C525.507@ee.byu.edu> David M. Cooke wrote: >On Wed, Mar 23, 2005 at 12:11:44AM -0700, Travis Oliphant wrote: > > >>David M. Cooke wrote: >> >> >>>Travis Oliphant writes: >>> >>> >>>>Michiel Jan Laurens de Hoon wrote: >>>> >>>> >>>>>Travis Oliphant wrote: >>>>> >>>>> >>>>>>Michiel Jan Laurens de Hoon wrote: >>>>>> >>>>>> >>>>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem >>>>>>>to be an int array any more. >>>>>>> >>>>>>> >>>>>>Yes. To allow for dimensions that are bigger than 32-bits, >>>>>>dimensions and strides are (intp *). intp is a signed integer with >>>>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>>>>>will not cause problems. We could worry about fixing it by >>>>>>typedefing intp to int (instead of the current long for 32-bit >>>>>>systems). >>>>>> >>>>>> >>>Why not use Py_intptr_t? It's defined by the Python C API already (in >>>pyport.h). >>> >>> >>Sounds good to me. 
I wasn't aware of it (intp or intptr is shorter though).

>Some reasons not to use those two:
>1) intp is too short for an API. The user might be using it already.
>2) the C99 type for this is intptr_t. Py_intptr_t is defined to be
> the same thing.
>
>But let's step back a moment: PyArrayObject is defined like this:
>
>typedef struct PyArrayObject {
> PyObject_HEAD
> char *data;
> int nd;
> intp *dimensions;
> intp *strides;
> ...
>
>Thinking about it, I would say that dimensions should have the type of
>size_t *. size_t is the unsigned integer type used to represent the
>sizes of objects (it's the type of the result of sizeof()). Thus, it's
>guaranteed that an element of size_t should be large enough to contain
>any number that we could use as an array dimension. size_t is also
>unsigned.
>

Because axis arguments can be negative, making dimensions unsigned would require a lot of changes to the type checking. It's just easier to make them signed. So, what is the signed equivalent? Is ssize_t available everywhere?

>Also, since the elements of strides are byte offsets into the array,
>strides should be of type ptrdiff_t *. The elements are used by adding
>them to a pointer.
>

Is this an available type on all systems? What does it mean?

>Is there a good reason why data is not of type void *? If it's char *,
>it's quite easy to make the mistake of using data[0], which is probably
>*not* what you want. With void *, you would have to cast it, as you
>should be doing anyways, or else the compiler complains. Also, assigning
>to the right pointer, like double *A = array->data, doesn't need
>casts like it does with data being a char *. In Numeric, char * is
>probably a holdover when Numeric had to compile with K&R-style C. But,
>we know we have ANSI C89 ('cause that's what Python requires).
>

Only real reason is backward compatibility. I have no problem with making it void *.

>So I figure it should look like this:
>
>typedef struct PyArrayObject {
> PyObject_HEAD
> void *data;
> int nd;
> size_t *dimensions;
> ptrdiff_t *strides;
> ...
>
>I've really started to appreciate size_t when trying to make programs
>work correctly on my 64-bit machine :-) It's not just another pretty
>face.
>

Good suggestions. Any other comments?

>They really obscure significant warnings, though. And most look like
>they can be dealt with. Right now, it doesn't compile for me.
>
>I'll just list a few general cases:
>- arrayobject.h redefines ushort, uint, ulong (they're defined in
> sys/types.h already for legacy reasons)
>

I don't think they are defined on all systems (I don't get a warning on my system). This is another thing configure needs to check if we are really concerned about the warnings.

>- functions taking no arguments should be defined like
>void function(void)
>not
>void function()
>

Ah, thanks for that!

>I might get some time to track some down, but it's limited also :-)
>

The errors right now are mainly due to the fact that I'm adding the new methods (so some functions are currently left undefined). I have not compiled the code for several days. Adding methods is an easy thing that most anyone could help with. The help would be appreciated.
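(For anyone following along at home, here is a minimal sketch of how to check these sizes on a given platform. It assumes the ctypes package is available, which is not something everyone has installed, so treat it as illustration only:)

    import ctypes

    # compare the C type sizes this thread is arguing about; a 64-bit
    # Linux box typically reports int=4, long=8, size_t=8, void*=8,
    # while 64-bit Windows reports long=4 (the LLP64 model)
    for name, t in [("int", ctypes.c_int), ("long", ctypes.c_long),
                    ("size_t", ctypes.c_size_t), ("void *", ctypes.c_void_p)]:
        print("%-8s %d bytes" % (name, ctypes.sizeof(t)))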
-Travis

From oliphant at ee.byu.edu Wed Mar 23 11:48:12 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 23 11:48:12 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
References: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
Message-ID: <4241C781.8080001@ee.byu.edu>

Scott Gilbert wrote:

>--- Michiel Jan Laurens de Hoon wrote:
>
>>>>Do 4 gigabyte 1D numerical python arrays occur in practice?
>>
>>Why? There needs to be a good reason to break compatibility. Who needs
>>this?
>>
>
>I (and others I work with) routinely deal with 1D datasets that are
>multiple gigabytes in length. Working with terabyte datasets is on my near
>horizon. For lots of reasons, I don't/can't use Python/Numeric for very
>much of this sort of thing, but it would be nice if I could. The "32 bits
>is enough for anyone" design has bitten me with lots of tools (not just
>Python). The Python core will fix its int/intp problem eventually; I
>can't see why Numeric3 wouldn't avoid the problem now.
>

Thanks for your comments, Scott. These are exactly the kinds of comments I'm looking for. I want to hear the experiences of real users (I know there are a lot of silent-busy types out there). It really helps in figuring out what the most important issues are.

>If the new Numeric3 didn't break too much compatibility with the original
>Numeric but pickled much faster, we'd probably be in a hurry to upgrade
>based on this feature alone.
>

I'm hoping we can do this, so stay tuned.

>I agree with the other guy who pointed out that arrays are mutable and that
>likewise, rank-0 arrays should be mutable. I know it's unlikely to happen,
>but it would also be nice to see the Python parser change slightly to treat
>a[] as a[()]. Then the mutability of rank-0 could fit elegantly with the
>rank-(n > 1) arrays. It's a syntax error now, so there wouldn't be a
>backwards compatibility issue.
>

Well, rank-0 arrays are and forever will be mutable. But, Python scalars (and the new Array-like Scalars) are not mutable. I know this is not ideal. But making it ideal means fundamental changes to Python scalars. So far the current scheme is the best idea I've heard. I'm always open to better ones.

>We commonly use data types that aren't in Numeric. The most prevalent
>example at my work is complex-short. It looks like I can wrap the new
>"Void" type to handle this to some extent. Will indexing (subscripting) a
>class derived from a Numeric3 array return the derived class?
>
> class Derived(Numeric3.ArrayType):
>     pass
>
> d = Derived(shape=(200, 200, 2), typecode='s')
> if isinstance(d[0], Derived):
>     print "This is what I mean"
>

Yes, indexing will return a derived type currently. There are probably going to be some issues here, but it can be made to work. I'm glad you are noticing that the VOID * type is for more than just record arrays. I've got ideas for hooks that allow new types to be defined, but I could definitely use examples.

>I don't really expect Numeric3 to add all of the possible oddball types,
>but I think it's important to remember that other types are out there
>(fixed point for DSP, mu-law for audio, 16-bit floats for graphics, IBM's
>decimal64/decimal128 types, double-double and quad-double for increased
>precision, quaternions of standard types, ....). It's one thing to treat
>these like "record arrays", it's another thing for them to have overloaded
>arithmetic operators.
>

I think using standard Python overloading of arithmetic operators (i.e., letting such types define their own) may be the way to go.

>Since Numeric3 can't support every type under the sun, it would be nice if
>when the final version goes into the Python core that the C-API and Python
>library functions used "duck typing" so that other array implementations
>could work to whatever extent possible. In other words, it would be better
>if users were not required to derive from the Numeric3 type in order to
>create new kinds of arrays that can be used with sufficiently generic
>Numeric3 routines. Simply having the required attributes (shape, strides,
>itemsize, ...) of a Numeric3 array should be enough to be treated like a
>Numeric3 array.
>

I would really like to see this eventually too. We need examples, though, to make it work right. One idea is to have classes define "coercion" routines that the ufunc machinery uses, and create an API wherein the ufunc can be made to call the right function.

-Travis

From mdehoon at ims.u-tokyo.ac.jp Wed Mar 23 18:20:42 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar 23 18:20:42 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
References: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
Message-ID: <424224D0.9030006@ims.u-tokyo.ac.jp>

Scott Gilbert wrote:
> --- Michiel Jan Laurens de Hoon wrote:
>
>>>>Do 4 gigabyte 1D numerical python arrays occur in practice?
>>
>>Why? There needs to be a good reason to break compatibility. Who needs
>>this?
>>
>
> I (and others I work with) routinely deal with 1D datasets that are
> multiple gigabytes in length. Working with terabyte datasets is on my near
> horizon.

I see. Then I agree, we need to fix the dimensions and strides in PyArrayObject. Thanks, Scott.

--Michiel.

-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From perry at stsci.edu Wed Mar 23 19:28:49 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Mar 23 19:28:49 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <424224D0.9030006@ims.u-tokyo.ac.jp>
Message-ID:

> Scott Gilbert wrote:
>
> > --- Michiel Jan Laurens de Hoon wrote:
> >
> >>>>Do 4 gigabyte 1D numerical python arrays occur in practice?
> >>
> >>Why? There needs to be a good reason to break compatibility. Who needs
> >>this?
> >>
> >
> > I (and others I work with) routinely deal with 1D datasets that are
> > multiple gigabytes in length. Working with terabyte datasets
> is on my near
> > horizon.
>
> I see. Then I agree, we need to fix the dimensions and strides in
> PyArrayObject.
> Thanks, Scott.
>

I'll also add that we've already had internal requests to deal with files that large, as well as external queries about supporting large files. Believe me, files of this size are becoming much more common than you realize.
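(To make the large-file case concrete, here is a minimal sketch using the standard mmap module; the file name is hypothetical. The point is that a 32-bit process simply cannot map a file once it outgrows the 32-bit address space, while a 64-bit one can -- but only if the array code then uses pointer-sized dimensions and strides to address the result:)

    import mmap, os

    fn = "huge.dat"   # hypothetical multi-gigabyte data file
    f = open(fn, "rb")
    n = os.path.getsize(fn)
    # fails on a 32-bit platform once n approaches 2**32; succeeds on a
    # 64-bit platform, where an array viewing this memory then needs
    # 64-bit dimensions/strides to reach all of it
    m = mmap.mmap(f.fileno(), n, access=mmap.ACCESS_READ)
    print("%d bytes mapped" % m.size())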
Perry

From arnd.baecker at web.de Thu Mar 24 01:12:52 2005
From: arnd.baecker at web.de (Arnd Baecker)
Date: Thu Mar 24 01:12:52 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <423A6F69.8020803@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp>
Message-ID:

Hi Travis,

I just had a quick look at Numeric3, checked out with
cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -D 2005-03-18 -P Numeric3
(as you already warned, the current CVS does not compile for me). After that I saw Michiel's mail, so my results below just add another "data point"...

On Fri, 18 Mar 2005, Michiel Jan Laurens de Hoon wrote:

> Travis Oliphant wrote:
> > I wanted to let people who may be waiting know that now is a good time to
> > help with numeric3. The CVS version builds (although I'm sure there are
> > still bugs), but more eyes could help me track them down.
> >
> > Currently, all that remains for the arrayobject is to implement the
> > newly defined methods (really it's just a re-organization and
> > re-inspection of the code in multiarraymodule.c to call it using methods).
> [...]
> When using ndarray, I got a core dump using "zeros":
>
> $ python
> Python 2.5a0 (#1, Mar 2 2005, 12:15:06)
> [GCC 3.3.3] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from ndarray import *
> >>> zeros(5)
> creating data 0xa0c03d0 associated with 0xa0d52c0
> array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
> Segmentation fault (core dumped)
>
> With Python 2.4, the segmentation fault occurs slightly later:
> $ python2.4
> Python 2.4 (#1, Dec 5 2004, 20:47:03)
> [GCC 3.3.3] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from ndarray import *
> >>> zeros(5)
> creating data 0xa0a07f8 associated with 0xa0d6230
> array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
> >>>
> >>> ^D
> freeing 0xa0a07f8 associated with array 0xa0d6230
> freeing 0xa123b88 associated with array 0xa0d6230
> Segmentation fault (core dumped)

Python 2.3.5 (#1, Mar 22 2005, 11:11:34)
Type "copyright", "credits" or "license" for more information.

IPython 0.6.13_cvs -- An enhanced Interactive Python.
? -> Introduction to IPython's features.
%magic -> Information about IPython's 'magic' % functions.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]:from ndarray import *
In [2]:arange(10)
Out[2]:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'l')
In [3]:arange(10.0)
Out[3]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd')
In [4]:
In [4]:arange(10.0)
zsh: 7191 segmentation fault ipython

Without ipython the segfault is even earlier:

Python 2.3.5 (#1, Mar 22 2005, 11:11:34)
[GCC 3.3.5 (Debian 1:3.3.5-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'l')
>>> arange(10.0)
zsh: 7192 segmentation fault python

Have you already found the origin of this? If so, which version should I download for further testing? If not, if you need help in debugging this one, just let me know (plus some hints on how to tackle this).
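(A sketch of the usual recipe for segfaults like these, assuming gdb and a debug build of the extension are available -- the backtrace normally localizes the bad pointer in one step:)

    $ gdb python
    (gdb) run
    >>> from ndarray import *
    >>> arange(10.0)
    Program received signal SIGSEGV, Segmentation fault.
    (gdb) bt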
Best, Arnd

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 04:58:15 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Mar 24 04:58:15 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp>
Message-ID: <4242BA03.5050204@ims.u-tokyo.ac.jp>

Arnd's comment raises the question of how to try out or contribute to Numeric3 if the code base is changing from day to day. It may be a good idea to set up some division of labor, so we can contribute to Numeric3 without getting in each other's way. For example, I'd be interested in working on setup.py and putting different parts of Numeric3/scipy_base together.

--Michiel.

Arnd Baecker wrote:
> Hi Travis,
>
> I just had a quick look at Numeric3, checked
> out with
> cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -D
> 2005-03-18 -P Numeric3
> (as you already warned, the current CVS does not compile for me).
>

-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Thu Mar 24 16:09:08 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 24 16:09:08 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4242BA03.5050204@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp>
Message-ID: <42435632.5080304@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:

> Arnd's comment raises the question of how to try out or contribute to
> Numeric3 if the code base is changing from day to day. It may be a
> good idea to set up some division of labor, so we can contribute to
> Numeric3 without getting in each other's way. For example, I'd be
> interested in working on setup.py and putting different parts of
> Numeric3/scipy_base together.
>

Well, CVS makes that somewhat easy if we just commit changes regularly, and update regularly. But, I understand that people may want to know what kinds of things they could work on right now. I'm working on finishing adding methods. I'd like to create the new core distribution on an SVN server. Enthought is willing to host the SVN server as far as I know. SVN is easy to use and is supposed to be easier to manage than CVS.

Current needs:

- The PEP for the __index__ method added to Python needs to be written and the code implemented --- this is not that hard for the budding Python contributor.

- The PEP for a good "buffer" object (this has been called by others a "byte" array, which might be a good name). Essentially, it needs to be a lightweight object around a chunk of memory -- i.e., a way to allocate memory through Python. We would like to standardize on a set of meta information that could be used to "understand" this memory as a numeric array. Then, other objects which used this buffer as a memory block would just have to expose the meta information in order to make the transfer of data from one application to another seamless. We need to be vocal about the value of the buffer object. This PEP is one way to do that. There are some people who think buffer objects were a "bad idea." This is primarily because of a fatal flaw in some objects that both expose a memory pointer through the buffer protocol AND allow the object's memory to be reallocated (using realloc) --- Numeric does not do this.
This problem could actually be easily fixed by a good Python memory allocator that returns a simple memory object. If people who wanted memory went through its C-API (instead of using malloc and realloc), many of the problems would be alleviated. This is what the new "byte" object should be. I think it is also wise to expect the "byte" object to have an attribute called "meta" that would just be a dictionary of "other information" you might want to pass to something using the buffer protocol.

- A record array class. This should be adapted from the numarray record array class and probably inherit from the ndarray type.

- Ufunc modifications. This is where I'm headed after the array methods task is done. If people have ideas about how ufuncs should be handled, now is the time to voice them. If somebody could help me here, it would be great. But, in a couple of days, I will be spending the next chunk of my (spare) time on ufunc modifications.

-Travis

From oliphant at ee.byu.edu Thu Mar 24 16:38:40 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 24 16:38:40 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4242BA03.5050204@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp>
Message-ID: <42435D18.809@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:

> Arnd's comment raises the question of how to try out or contribute to
> Numeric3 if the code base is changing from day to day. It may be a
> good idea to set up some division of labor, so we can contribute to
> Numeric3 without getting in each other's way. For example, I'd be
> interested in working on setup.py and putting different parts of
> Numeric3/scipy_base together.
>

Michiel, you are free to work on setup.py all you want :-)

Putting the parts of scipy_base together is a good idea. Exactly how to structure this is going to require some thought and needs to be coordinated with current scipy.

I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries. There should be a way to replace functionality that is clean and does not require editing setup.py files. Anybody with good ideas about how to do this well is welcome to speak up.

Perhaps the easiest thing to do is to keep the basic Numeric structure (with C-based easy-to-install additions) and call it scipylite (with backwards compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab names). This also installs the namespace scipy, which has a little intelligence in it to determine whether you have atlas and fortran capabilities installed or not.

Then, provide a scipyatlas package that can be installed to take advantage of atlas and vendor-supplied lapack/blas.

Then, a scipyfortran package that can be installed if you have a fortran compiler, providing the functionality that comes from fortran libraries.

So, there are three divisions here. Feedback and criticisms encouraged and welcomed.....
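(A minimal sketch of the kind of "intelligence" the scipy namespace could carry; the package names scipyatlas and scipyfortran here are only the hypothetical ones from this proposal, nothing implemented:)

    # hypothetical contents of scipy/__init__.py under this scheme
    try:
        import scipyatlas as _linalg    # ATLAS / vendor-supplied lapack
        has_atlas = True
    except ImportError:
        import lapack_lite as _linalg   # bundled C fallback
        has_atlas = False

    try:
        import scipyfortran             # fortran-compiled extras
        has_fortran = True
    except ImportError:
        has_fortran = False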
-Travis From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 18:51:10 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 24 18:51:10 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42435D18.809@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> Message-ID: <42437D45.5090608@ims.u-tokyo.ac.jp> While I basically agree with your setup, I think that there is no need to call it scipylite. Sticking to the Numeric structure and names is to the advantage of both current SciPy and current Numerical Python users. The advantage to current Numerical Python users is obvious -- and there are many more of them than of SciPy users. For SciPy users, it is in their best interest that as many people as possible go over to Numeric3, in order to avoid another split in the Numerics community. Now, if I talk with the other pygist or biopython developers and tell them there is a new Numerical Python package which solves some of the issues with the older versions, I have a good chance to convince them to update pygist/biopython to the Numeric3 API. If I tell them that there is a scipylite package that intends to replace Numerical Python: Forget it. It will be ignored. You may not care about pygist or biopython in particular, but developers of other packages will make the same consideration, so you may end up with some numerical / graphics packages working with scipylite and others with Numerical Python 23.8. It's better to get everybody on board. Secondly, we have confused users more than enough with the Numerical Python / numarray / Numeric3 split. We should not add one more new name to the equation. Third, there is lots of code out there that imports LinearAlgebra or RandomArray etcetera. Why force our users to go through the trouble of changing those imports? I don't see the benefit to the users. Finally, the word scipylite has no meaning. As SciPy evolves into a website where scientific software for Python can be downloaded, there will not be a scipy-full nor a scipy-lite. --Michiel. Travis Oliphant wrote: > Putting the parts of scipy_base together is a good idea. Exactly how > to structure this is going to require some thought and need to be > coordinated with current scipy. > > I want a package that is as easy to install as current Numeric (so the > default will have something like lapack_lite). > But, this should not handicap nor ignore a speed-conscious user who > wants to install ATLAS or take advantage of vendor-supplied libraries. > > There should be a way to replace functionality that is clean and does > not require editing setup.py files. > > Anybody with good ideas about how to do this well is welcome to speak up. > Perhaps, the easiest thing to do is to keep the basic Numeric structure > (with C-based easy-to-install additions) and call it scipylite (with > backwards compatibility provided for Numeric, LinearAlgebra, > RandomArray, and MLab names). This also installs the namespace scipy > which has a little intelligence in it to determine if you have altas and > fortran capabilities installed or not. > > Then, provide a scipyatlas package that can be installed to take > advantage of atlas and vendor-supplied lapack/blas. > > Then, a scipyfortran package that can be installed if you have a fortran > compiler which provides the functionality provided by fortran libraries. > So, there are three divisions here. > Feedback and criticisms encouraged and welcomed..... 
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From cjw at sympatico.ca Thu Mar 24 19:20:46 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 24 19:20:46 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42437D45.5090608@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <42437D45.5090608@ims.u-tokyo.ac.jp> Message-ID: <42438340.50507@sympatico.ca> Michiel Jan Laurens de Hoon wrote: > While I basically agree with your setup, I think that there is no need > to call it scipylite. Sticking to the Numeric structure and names is > to the advantage of both current SciPy and current Numerical Python > users. The advantage to current Numerical Python users is obvious -- > and there are many more of them than of SciPy users. For SciPy users, > it is in their best interest that as many people as possible go over > to Numeric3, in order to avoid another split in the Numerics > community. Now, if I talk with the other pygist or biopython > developers and tell them there is a new Numerical Python package which > solves some of the issues with the older versions, I have a good > chance to convince them to update pygist/biopython to the Numeric3 > API. If I tell them that there is a scipylite package that intends to > replace Numerical Python: Forget it. It will be ignored. You may not > care about pygist or biopython in particular, but developers of other > packages will make the same consideration, so you may end up with some > numerical / graphics packages working with scipylite and others with > Numerical Python 23.8. It's better to get everybody on board. > > Secondly, we have confused users more than enough with the Numerical > Python / numarray / Numeric3 split. We should not add one more new > name to the equation. > > Third, there is lots of code out there that imports LinearAlgebra or > RandomArray etcetera. Why force our users to go through the trouble > of changing those imports? I don't see the benefit to the users. > > Finally, the word scipylite has no meaning. As SciPy evolves into a > website where scientific software for Python can be downloaded, there > will not be a scipy-full nor a scipy-lite. > > --Michiel. > It looks to me as though getting numarray/Numeric sorted out, and getting it right, will be sufficient work for now. It's far better to concentrate the limited resources on that and to leave the complexities of SciPy for another day. I wonder about introducing another distribution system (SVN?) when some of us have barely mastered CVS. Colin W. > Travis Oliphant wrote: > >> Putting the parts of scipy_base together is a good idea. Exactly >> how to structure this is going to require some thought and need to be >> coordinated with current scipy. >> >> I want a package that is as easy to install as current Numeric (so >> the default will have something like lapack_lite). >> But, this should not handicap nor ignore a speed-conscious user who >> wants to install ATLAS or take advantage of vendor-supplied libraries. >> >> There should be a way to replace functionality that is clean and does >> not require editing setup.py files. >> >> Anybody with good ideas about how to do this well is welcome to speak >> up. 
Perhaps, the easiest thing to do is to keep the basic Numeric >> structure (with C-based easy-to-install additions) and call it >> scipylite (with backwards compatibility provided for Numeric, >> LinearAlgebra, RandomArray, and MLab names). This also installs the >> namespace scipy which has a little intelligence in it to determine if >> you have altas and fortran capabilities installed or not. >> >> Then, provide a scipyatlas package that can be installed to take >> advantage of atlas and vendor-supplied lapack/blas. >> >> Then, a scipyfortran package that can be installed if you have a >> fortran compiler which provides the functionality provided by fortran >> libraries. >> So, there are three divisions here. >> Feedback and criticisms encouraged and welcomed..... > > > From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 19:43:53 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 24 19:43:53 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42435D18.809@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> Message-ID: <424389C8.2010000@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > I want a package that is as easy to install as current Numeric (so the > default will have something like lapack_lite). > But, this should not handicap nor ignore a speed-conscious user who > wants to install ATLAS or take advantage of vendor-supplied libraries. > > There should be a way to replace functionality that is clean and does > not require editing setup.py files. > > Anybody with good ideas about how to do this well is welcome to speak up. Doing this automatically without editing setup.py may be too complicated. Quoting from the Numerical Python manual: 'A frequent request is that somehow the maintainers of Numerical Python invent a procedure which will automatically find and use the "best" available versions of these libraries. This is not going to happen.' "these libraries" being BLAS and LAPACK. However, what we can do is to put some frequently encountered options in setup.py commented out, and say "uncomment this line if you have BLAS and LAPACK preinstalled on your Mac" etcetera. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From pearu at scipy.org Thu Mar 24 23:58:43 2005 From: pearu at scipy.org (Pearu Peterson) Date: Thu Mar 24 23:58:43 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42435D18.809@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> Message-ID: On Thu, 24 Mar 2005, Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: > >> Arnd's comment raises the question of how to try out or contribute to >> Numeric3 if the code base is changing from day to day. It may be a good >> idea to set up some division of labor, so we can contribute to Numeric3 >> without getting in each other's way. For example, I'd be interested in >> working on setup.py and putting different parts of Numeric3/scipy_base >> together. >> > > Michiel, you are free to work on setup.py all you want :-) > > Putting the parts of scipy_base together is a good idea. Exactly how to > structure this is going to require some thought and need to be coordinated > with current scipy. 
>
> I want a package that is as easy to install as current Numeric (so the
> default will have something like lapack_lite).
> But, this should not handicap nor ignore a speed-conscious user who wants to
> install ATLAS or take advantage of vendor-supplied libraries.
>
> There should be a way to replace functionality that is clean and does not
> require editing setup.py files.
>
> Anybody with good ideas about how to do this well is welcome to speak up.
> Perhaps the easiest thing to do is to keep the basic Numeric structure (with
> C-based easy-to-install additions) and call it scipylite (with backwards
> compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab
> names). This also installs the namespace scipy, which has a little
> intelligence in it to determine whether you have atlas and fortran
> capabilities installed or not.
>
> Then, provide a scipyatlas package that can be installed to take advantage of
> atlas and vendor-supplied lapack/blas.
>
> Then, a scipyfortran package that can be installed if you have a fortran
> compiler, providing the functionality that comes from fortran libraries.
> So, there are three divisions here.

Hmm, introducing scipylite, scipyatlas, and scipyfortran packages does not sound like a good idea to me. The usage of atlas or fortran blas/lapack or vendor-based blas/lapack libraries is an implementation detail and should not be reflected in the scipy_base package structure. This is because such an approach is not suitable for writing portable Numeric3-based applications or packages. For example, if a developer uses the scipyfortran package in a package, it immediately reduces the number of potential users for this package.

I got an impression from earlier threads that scipy_distutils will be included in scipy_base. So, I am proposing to use scipy_distutils tools and our scipy experience for dealing with this issue; scipy.lib.lapack would be a good working prototype here.

Ideally, scipy_base should provide a complete interface to LAPACK routines, but not immediately, of course. Now, depending on the availability of compilers and resources on a particular computer, the following would happen:

1) No Fortran compiler, no lapack libraries in the system, only a C compiler is available --- f2c-generated lite-lapack C sources are used to build the lapack extension module; wrappers to lapack routines for which there are no f2c-generated sources are disabled by the f2py `only:` feature. The lite-lapack C sources come with the scipy_base sources.

2) No Fortran compiler, the system has lapack libraries (atlas or Accelerate or vecLib), a C compiler is available --- the system lapack library will be used and a complete lapack extension module can be built.

3) Fortran and C compilers are available, no lapack libraries in the system --- Fortran lite-lapack sources are used to build the lapack extension module; the lite-lapack Fortran sources come with the scipy_base sources. Similar to case (1), some wrappers are disabled.
In this way users that only need Numeric3 array capabilities will avoid all possible troubles that may show up when using all possible resources for speed on an arbitrary computer. Btw, I would suggest using `scipy ` instead of `scipy ` or `scipy ` for naming packages. Pearu From xscottg at yahoo.com Fri Mar 25 00:15:03 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 00:15:03 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050325081346.24717.qmail@web50210.mail.yahoo.com> --- Travis Oliphant wrote: > > - the PEP for a good "buffer" object (this has been called by others > a "byte" array which might be a good name. Essentially, it needs to be > a light-weight object around a chunk of memory -- i.e. a way to allocate > memory through Python. We would like to standardize on a set of meta > information that could be used to "understand" this memory as a numeric > array. Then, other objects which used this buffer as a memory block > would just have to expose the meta information in order to make seamless > the transfer of data from one application to another. We need to be > vocal about the value of the buffer object. This PEP is one way to do > that. There are some people who think buffer objects were a "bad > idea." This is primarily because of a fatal flaw in some objects that > both expose a memory pointer through the buffer protocol AND allow the > object's memory to be reallocated (using realloc) --- Numeric does not > do this. This problem could actually be easily fixed by a good > Python memory allocator that returns a simple memory object. If people > who wanted memory went through it's C-API (instead of using malloc and > realloc), much of the problems would be alleviated. This is what the > new "byte" object should be. I think it also wise to expect the "byte" > object to have an attribute called "meta" that would just be a > dictionary of "other information" you might want to pass to something > using the buffer protocol. > Hi Travis. I'm curious if you find PEP-296 sufficient: http://www.python.org/peps/pep-0296.html It is marked as "withdrawn by the author", but that is not really true. A more accurate statement would be "the author spent his allotted time defending and revising the PEP on the Python mailing list and was not left with sufficient time to finish the implementation on his corporate dollar". :-) It's a good PEP, and while my company uses Python quite extensively, after two weeks I had to get back to more direct goals... Regardless, I think PEP-296 meets your needs (and several other groups in the Python community), and it might save someone the time recreating a new PEP from scratch. More importantly, it might save someone some of the time required to defend and argue the PEP on the Python mailing list. When the discussion cleared, Guido was very positive toward the PEP - I just never got it implemented... The "meta" attribute would be a small change. It's possible to do that with composition or inheritance instead, but that's really a just matter of taste. When I wrote the PEP, I had high hopes of creating a Python only "ndarray" class out of bytes and the struct module, so it was definitely targeted at needs similar to what I believe yours to be. Obviously you should do what is best for you, but I would be pleased if my wasted effort was revived and completed to actually be useful. 
Cheers, -Scott

From oliphant at ee.byu.edu Fri Mar 25 00:24:50 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:24:50 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
References: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
Message-ID: <4243CA86.50301@ee.byu.edu>

>Hi Travis.
>
>I'm curious if you find PEP-296 sufficient:
>
> http://www.python.org/peps/pep-0296.html
>
>It is marked as "withdrawn by the author", but that is not really true. A
>more accurate statement would be "the author spent his allotted time
>defending and revising the PEP on the Python mailing list and was not left
>with sufficient time to finish the implementation on his corporate dollar".
> :-) It's a good PEP, and while my company uses Python quite extensively,
>after two weeks I had to get back to more direct goals...
>

Great to hear from you, Scott. Yes, I looked at this PEP (though I haven't studied it sufficiently to say if it's perfect for our needs or not), but it is very close. I did not know what "withdrawn by author" meant; thanks for clarifying. How would somebody change the status of that and re-open the PEP? I think it is a great place to start.

Also, numarray has a memory object implemented that is a good start on the implementation. So, this wouldn't be a huge job at this point.

>Regardless, I think PEP-296 meets your needs (and those of several other
>groups in the Python community), and it might save someone the time of
>recreating a new PEP from scratch. More importantly, it might save someone
>some of the time required to defend and argue the PEP on the Python mailing
>list. When the discussion cleared, Guido was very positive toward the PEP -
>I just never got it implemented...
>

Good to hear.

>The "meta" attribute would be a small change. It's possible to do that
>with composition or inheritance instead, but that's really just a matter of
>taste.
>

I don't think I fully understand what you mean by "composition" --- like a mixin class? Or how inheritance solves the problem on a C-API level? I'm mainly thinking of extension modules that want to use each other's memory at the C level. That would be the main use of the meta information.

>When I wrote the PEP, I had high hopes of creating a Python-only "ndarray"
>class out of bytes and the struct module, so it was definitely targeted at
>needs similar to what I believe yours to be. Obviously you should do what
>is best for you, but I would be pleased if my wasted effort was revived and
>completed to actually be useful.
>

Numarray essentially did this. I think we still need a C-type object for arrays. But, it's great to hear you still believe in the byte object. I wasn't sure.

-Travis

From oliphant at ee.byu.edu Fri Mar 25 00:40:08 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:40:08 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
References: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
Message-ID: <4243CE20.1070509@ee.byu.edu>

Scott Gilbert wrote:

>Hi Travis.
>
>I'm curious if you find PEP-296 sufficient:
>
> http://www.python.org/peps/pep-0296.html
>
>It is marked as "withdrawn by the author", but that is not really true. A
>more accurate statement would be "the author spent his allotted time
>defending and revising the PEP on the Python mailing list and was not left
>with sufficient time to finish the implementation on his corporate dollar".
> :-) It's a good PEP, and while my company uses Python quite extensively,
>after two weeks I had to get back to more direct goals...
>

I read the PEP again, and agree with Scott that it is quite good and would fit what we need quite well. I say let's resurrect it and push it forward. Scott, do you have any left-over code you could contribute?

-Travis

From oliphant at ee.byu.edu Fri Mar 25 00:40:35 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:40:35 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: <4243CE29.80304@ee.byu.edu>

>For example, if a developer uses the scipyfortran package in a package, it
>immediately reduces the number of potential users for this package.

While I'm not in love with my suggestion and would prefer to see better ones put forward, wouldn't any system that uses routines that are unavailable unless you have a fortran-compiled package installed be a problem? I was just proposing not "hiding" this from the developer but making it explicit.

What do you propose to do for those situations? I was just proposing putting them in a separate hierarchy so the developer is aware he is using something that requires fortran. I actually think that it's somewhat of a non-issue myself, and feel that people who don't have fortran compilers will look for binaries anyway.

> I got an impression from earlier threads that scipy_distutils will be
> included in scipy_base. So, I am proposing to use scipy_distutils
> tools and our scipy experience for dealing with this issue;
> scipy.lib.lapack
> would be a good working prototype here.
>
> Ideally, scipy_base should provide a complete interface to LAPACK
> routines, but not immediately, of course. Now, depending on the
> availability of compilers and resources on a particular computer, the
> following would happen:
> 1) No Fortran compiler, no lapack libraries in the system, only a C
> compiler is available --- f2c-generated lite-lapack C sources are used
> to build the lapack extension module; wrappers to lapack routines for
> which there are no f2c-generated sources are disabled by the f2py
> `only:` feature.
> The lite-lapack C sources come with the scipy_base sources.
> 2) No Fortran compiler, the system has lapack libraries (atlas or
> Accelerate or vecLib), a C compiler is available --- the system lapack
> library will be used and a complete lapack extension module can be built.
> 3) Fortran and C compilers are available, no lapack libraries in the
> system --- Fortran lite-lapack sources are used to build the lapack
> extension module;
> the lite-lapack Fortran sources come with the scipy_base sources.
> Similar to case (1), some wrappers are disabled.
> 4-..) Other combinations are possible and users can choose their
> favorite approach.

Great. Sounds like Pearu has some good ideas here. I nominate Pearu to take the lead on this.

Michiel sounds like he wants to keep the Numeric, RandomArray, LinearAlgebra naming conventions forever. I want them to be more coordinated, like scipy is doing with scipy.linalg, scipy.stats, and scipy_base (I agree scipy.base is better). What are the opinions of others on this point? Of course the names Numeric, RandomArray, and LinearAlgebra will still work, but I think they should be deprecated in favor of a better overall design for numerical packages. What do others think?
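(For concreteness, a minimal sketch of how the old names could keep working while steering people to the new layout; the module names here are only the proposed ones, nothing final:)

    # hypothetical contents of a backwards-compatibility LinearAlgebra.py
    import warnings
    warnings.warn("LinearAlgebra is deprecated; use scipy.linalg instead",
                  DeprecationWarning)
    from scipy.linalg import *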
From mdehoon at ims.u-tokyo.ac.jp Fri Mar 25 01:03:43 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Fri Mar 25 01:03:43 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> Message-ID: <4243D4A5.9050004@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > I got an impression from earlier threads that scipy_distutils will be > included in scipy_base. So, I am proposing to use scipy_distutils tools > and our scipy experience for dealing with this issue, scipy.lib.lapack > would be a good working prototype here. Have you tried integrating scipy_distutils with Python's distutils? My guess is that Python's distutils can benefit from what is in scipy_distutils, particularly the parts dealing with C compilers. A clean integration will also prevent duplicated code, avoid Pearu having to keep scipy_distutils up to date with Python's distutils, and enlarge the number of potential users. Having two distutils packages seems to be too much of a good thing. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From pearu at scipy.org Fri Mar 25 01:22:33 2005 From: pearu at scipy.org (Pearu Peterson) Date: Fri Mar 25 01:22:33 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4243CE29.80304@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243CE29.80304@ee.byu.edu> Message-ID: On Fri, 25 Mar 2005, Travis Oliphant wrote: >> For example, if a developer uses the scipy.fortran package in a package, it > immediately reduces the number of > potential users for this package. > > While I'm not in love with my suggestion and would prefer to see better ones > put forward, wouldn't any system that uses routines not available unless you > have a fortran-compiled package installed be a problem? I was just proposing > not "hiding" this from the developer but making it explicit. > > What do you propose to do for those situations? I was just proposing putting > them in a separate hierarchy so the developer is aware he is using something > that requires fortran. I actually think that it's somewhat of a non-issue > myself, and feel that people who don't have fortran compilers will look for > binaries anyway. Such a situation can be avoided if a package is extended with new wrappers in parallel for all backend cases. For example, when adding a new interface to a lapack routine, both the Fortran and f2c versions of the corresponding routine must be added to the scipy_base sources. Pearu
So, I am proposing to use scipy_distutils tools and >> our scipy experience for dealing with this issue, scipy.lib.lapack >> would be a good working prototype here. > > Have you tried integrating scipy_distutils with Python's distutils? My guess > is that Python's distutils can benefit from what is in scipy_distutils, > particularly the parts dealing with C compilers. A clean integration will > also prevent duplicated code, avoid Pearu having to keep scipy_distutils up > to date with Python's distutils, and enlarge the number of potential > users. Having two distutils packages seems to be too much of a good thing. No, I have not. Though a year or so ago there was a discussion about this on the distutils list, mainly for adding Fortran compiler support to distutils. At the time I didn't have resources to push scipy_distutils features to distutils and even less so for now. So, one can think that scipy_distutils is an extension to distutils, though it also includes a few bug fixes for older distutils. On the other hand, since Scipy supports Python starting at 2.2, it cannot rely much on new features added to distutils of later Python versions. Instead, if these features happen to be useful for Scipy then they are backported for Python 2.2 by implementing them in scipy_distutils. "Luckily", there are not many such features, as scipy_distutils has evolved with new very useful features much quicker than distutils. But, for Numeric3, scipy.distutils would be a perfect place to clean up scipy_distutils a bit, e.g. removing some obsolete features and assuming that Numeric3 will support Python 2.3 and up. Based on that, integrating scipy_distutils features into standard distutils can be made less painful if someone decides to do that. Pearu From xscottg at yahoo.com Fri Mar 25 03:35:35 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 03:35:35 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050325113426.58485.qmail@web50203.mail.yahoo.com> --- Travis Oliphant wrote: > How would somebody change the status of that > and re-open the PEP? I believe all it would take is a note to the python-dev mailing list by the new champion who is willing to implement and defend it. The text is public domain, so there's no copyright silliness if you need to make changes. I'm curious to see how this flies as this has always been one of their pet peeve topics. Talking about buffer objects/protocols in general draws ire from some and dead silence from the rest. :-) > > Also, numarray has a memory object implemented that is a good start on > the implementation. So, this wouldn't be a huge job at this point. > The memory object is a very good start. I don't know if it tries to be usable when the GIL is released, or if it handles the slice semantics the same way. I think doing pickling really well is a non-trivial issue - at least if this object is going into the core of Python. Implementing the new pickling protocol is not terribly difficult, and any object can do it, but that only solves the space half of the problem. The new pickling protocol allows one to serialize large data without making one large copy of the binary data as a string, but one still has to make a lot of little copies of the data a piece at a time. The multitude of little parts cost time allocating and memcpy-ing just to be written to a file and discarded.
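To make that last point concrete, the piecewise style of serialization looks something like the sketch below (CHUNK and iter_pieces are made-up names, not part of any existing pickling API). Each slice avoids ever building one giant string, but it is still a freshly allocated little copy that exists only long enough to be written out:

    CHUNK = 1 << 16

    def iter_pieces(buf):
        # hand the pickler the buffer a piece at a time; every slice
        # below allocates and memcpy's a new small string
        for start in range(0, len(buf), CHUNK):
            yield buf[start:start + CHUNK]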
It would be great if the Python core libraries (cPickle) could be "taught" about the new type and serialize directly from the memory that is already there without creating any new string copies. > > > The "meta" attribute would be a small change. It's possible > > to do that with composition or inheritance instead, but that's > > really just a matter of taste. > > > I don't think I fully understand what you mean by "composition" > --- like a mixin class? or how inheritance solves the problem > on a C-API level? > > I'm mainly thinking of extension modules that want to use each other's > memory on a C-level. That would be the main use of the meta information. > It would be a lot like putting a similar meta dictionary on the builtin "list" object. Many people wouldn't use it and would consider it a tiny wart just taking up space, while others would use it pretty differently from the way Numeric3 did and store completely different keys. The result would be that Numeric3 would have to check for the keys that it wanted in the meta dictionary. Since I think you're going to allow folks to pass in their own buffer objects to some of the array constructors (mmap for instance), the underlying Numeric3 code can't really assume that the "meta" attribute is there on all buffer objects. If you wanted to annotate all buffers that were passed inside of Numeric, something like the following would work with "memory" and "mmap" alike:

    # Composition of a memory buffer and meta data
    class NumericStorage(object):
        def __init__(self, buf, **meta):
            self.buf = buf
            self.meta = meta.copy()

Of course at the C-Level it could just be a lightweight struct with two PyObject pointers. If you really wanted to add a meta attribute to the new generic memory object, you could do:

    # Inheritance to add metadata to a memory buffer
    class NumericBytes(memory):
        def __init__(self, *args, **kwds):
            memory.__init__(self, *args, **kwds)
            self.meta = {}

It's a minor pain, but obviously inheritance like this can be done at the C level too... I don't know what particular meta data you plan to store with the buffer itself, and I'm going to resist the urge to guess. You probably have some very good use cases. What are you planning? If you have a list of meta keys that many if not all users would agree on, then it would be worth considering just building them efficiently into the proposed type and not wasting the overhead of a dictionary. That would also standardize their usage to some extent. As I said before, this is all just a matter of taste. I apologize for using so much text to try and explain what I meant. When all is said and done, I think whether the C API code is required to check for keys in the meta dictionary or attributes of the object itself, it's probably a pretty similar task. It would be PyDict_GetItem(...) versus PyObject_GetAttr(...). > > > When I wrote the PEP, I had high hopes of creating a > > Python-only "ndarray" class out of bytes and the struct > > module > > Numarray essentially did this. I think we still need a C-type object > for arrays. > Yup. I understand and appreciate your attention to performance. For small arrays, it's tough to argue that a C implementation won't win. At the time, all I really needed was something to store and casually inspect/manipulate my oddball data (large arrays of complex short) without converting to a larger representation. We have something very similar to weave.inline that I used when it came time to go fast.
> > I read the PEP again, and agree with Scott that it > is quite good and would fit what we need quite well. > > I say let's resurrect it and push it forward. > Very cool. I hope it does what you need and makes it into the core. With your enthusiasm, I wish I had time to finish or at least help with the implementation. Unfortunately, I'm more swamped at work now than I was when I dropped the ball on this the first time. > > Scott, do you have any left-over code you could contribute? > I'll try and find what I had, but I probably don't have too much that you'll find much more valuable than the memory object from Numarray. I remember I went through a bit of pain to implement the "new style classes" correctly, but the pickling stuff in the core of the Python library is where the real challenge is, and I never got going on the TeX docs or unit tests that would be necessary for acceptance. Cheers, -Scott From Chris.Barker at noaa.gov Fri Mar 25 09:53:57 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 25 09:53:57 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <20050325113426.58485.qmail@web50203.mail.yahoo.com> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> Message-ID: <42445030.5090503@noaa.gov> Scott Gilbert wrote: > I don't know what particular meta data you plan to store with the buffer > itself, and I'm going to resist the urge to guess. You probably have some > very good use cases. What are you planning? I don't know what Travis has in mind, but I thought I'd bring up a use case that I think provides some of the motivation for this. There are any number of third-party extensions that could benefit from being able to directly read the data in Numeric* arrays: PIL, wxPython, etc., etc. My personal example is wxPython: At the moment, you can pass a Numeric or numarray array into wxPython, and it will be converted to a wxList of wxPoints (for instance), but that is done by using the generic sequence protocol, and a lot of type checking. As you can imagine, that is pretty darn slow, compared to just typecasting the data pointer and looping through it. Robin Dunn, quite reasonably, doesn't want wxPython to depend on Numeric, so that's what we've got. My understanding of this memory object is that an extension like wxPython wouldn't need to know about Numeric, but could simply get the memory object, and there would be enough meta-data with it to typecast and loop through the data. I'm a bit skeptical about how this would work. It seems that the metadata required would be the full set of stuff in an array object already:

    type
    dimensions
    strides

This could be made a bit simpler by allowing only contiguous arrays, but then there would need to be a contiguous flag. To make use of this, wxPython would have to know a fair bit about Numeric Arrays anyway, so that it can check to see if the data is appropriate. I guess the advantage is that while the wxPython code would have to know about Numeric arrays, it wouldn't have to include Numeric headers or code. -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Fri Mar 25 11:33:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 25 11:33:13 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42445030.5090503@noaa.gov> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> Message-ID: <4244676A.1050006@ee.byu.edu> Chris Barker wrote: > Scott Gilbert wrote: > >> I don't know what particular meta data you plan to store with the buffer >> itself, and I'm going to resist the urge to guess. You probably have >> some >> very good use cases. What are you planning? > > > I don't know what Travis has in mind, but I thought I'd bring up a use > case that I think provides some of the motivation for this. > This is exactly the kind of thing I mean. One of the reasons for putting Numeric in the core is so other extension writers can use it reliably. But, really, a better solution is to create ways to deal with each other's memory reliably. I think the bytes object Scott proposed is really very close to what is needed. Extension writers will likely need some additional information in order to use somebody else's byte object well. What information is needed can vary (but it should be standard for different types of objects). You can already get at the memory directly through the buffer protocol. But, I think pickling could be handled efficiently with a single new opcode very similar to a string but instead creating a bytes object on unpickling. Then, an array could use the memory of that bytes object (instead of creating its own and copying). This would be very easy to handle. I really believe we need to push forward the bytes object. This attitude towards the buffer interface (due to an easily fixed problem) is really disquieting. I could spend time pushing the bytes object, but it would take away from the work I'm currently doing. This is something that we really need some help with. Scott's PEP is quite good and gives an outline for how to do this, so except for pickling you don't even need to be an expert to get something done. You could start with the outline of numarray's memory object (getting rid of the Int64 stuff in it), and proceed from there. It would probably take a week to get it done. > > My understanding of this memory object is that an extension like
> wxPython wouldn't need to know about Numeric, but could simply get
> the memory object, and there would be enough meta-data with it to
> typecast and loop through the data. I'm a bit skeptical about how this
> would work. It seems that the metadata required would be the full set
> of stuff in an array object already:
>
> type
> dimensions
> strides
>
> This could be made a bit simpler by allowing only contiguous arrays,
> but then there would need to be a contiguous flag.
I'm thinking just contiguous arrays would be passed. While Numeric does support the multi-segment buffer interface, I doubt extension writers want to try and understand how to deal with it. I think it would be too much of a burden to other extensions if the array they saw was not contiguous. Even internal to Numeric, discontiguous arrays are made contiguous all the time (although the new iterator in Numeric3 makes it much easier for a programmer to deal with discontiguous arrays).
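Whether a given shape/strides/itemsize combination is in fact contiguous is cheap to check, for what it's worth. A sketch in Python (the function name is made up) of the test a consumer could apply before assuming contiguous data:

    def is_c_contiguous(shape, strides, itemsize):
        # walking from the fastest-varying dimension outward, each stride
        # must equal the number of bytes spanned by the dimensions inside it
        expected = itemsize
        for extent, stride in zip(reversed(shape), reversed(strides)):
            if extent > 1 and stride != expected:
                return False
            expected *= extent
        return True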
> > To make use of this, wxPython would have to know a fair bit about > Numeric Arrays anyway, so that it can check to see if the data is > appropriate. I guess the advantage is that while the wxPython code > would have to know about Numeric arrays, it wouldn't have to include > Numeric headers or code. It only has to know the shape and type (typechar and itemsize) of the array if we stick to contiguous arrays (this is where the typechar becomes very valuable). I still think the bytes object is a really, really good idea. Python needs it very badly. If every extension module that allocated memory went through a bytes object instead, then a lot of unnecessary copying could be minimized. -Travis From verveer at embl.de Fri Mar 25 11:54:14 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Mar 25 11:54:14 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4244676A.1050006@ee.byu.edu> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> Message-ID: >> This could be made a bit simpler by allowing only contiguous arrays, >> but then there would need to be a contiguous flag. > > I'm thinking just contiguous arrays would be passed. While Numeric > does support the multi-segment buffer interface. I doubt extension > writers want to try and understand how to deal with it. I think it > would be too much of a burden to other extensions if the array they > saw was not contiguous. Even internal to Numeric, discontiguous > arrays are made contiguous all the time (although the new iterator in > Numeric3 makes it much easier for a programmer to deal with > discontiguous arrays). It think it would be a real shame not to support non-contiguous data. It would be great if such a byte object could be used instead of Numeric/numarray arrays when writing extensions. Then I could write C extensions that could be made available very easily/efficiently to any package supporting it without having to worry about the specific C api of those packages. If only contiguous byte objects are supported that byte object is not a good option anymore for implementing extensions for Numeric unless I am prepared to live with a lot of copying of non-contiguous arrays. Peter From oliphant at ee.byu.edu Fri Mar 25 12:42:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 25 12:42:08 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> Message-ID: <4244770C.4050007@ee.byu.edu> Peter Verveer wrote: >>> This could be made a bit simpler by allowing only contiguous arrays, >>> but then there would need to be a contiguous flag. >> >> >> I'm thinking just contiguous arrays would be passed. While Numeric >> does support the multi-segment buffer interface. I doubt extension >> writers want to try and understand how to deal with it. I think it >> would be too much of a burden to other extensions if the array they >> saw was not contiguous. Even internal to Numeric, discontiguous >> arrays are made contiguous all the time (although the new iterator in >> Numeric3 makes it much easier for a programmer to deal with >> discontiguous arrays). > > > It think it would be a real shame not to support non-contiguous data. > It would be great if such a byte object could be used instead of > Numeric/numarray arrays when writing extensions. 
Then I could write C > extensions that could be made available very easily/efficiently to any > package supporting it without having to worry about the specific C api > of those packages. If only contiguous byte objects are supported that > byte object is not a good option anymore for implementing extensions > for Numeric unless I am prepared to live with a lot of copying of > non-contiguous arrays. How would you support "non-contiguous" data with the bytes object? Or do you mean just passing the strides information around as meta data? With the bytes object pointing to the start? The latter would not be hard to support (it's just a matter of defining an additional piece of meta information and making people aware of it) but not every extension writer would try and deal with that, I'm sure. But, that would be o.k. -Travis From stephen.walton at csun.edu Fri Mar 25 14:54:04 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Mar 25 14:54:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4241C781.8080001@ee.byu.edu> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> <4241C781.8080001@ee.byu.edu> Message-ID: <4244963B.5010103@csun.edu> Travis Oliphant wrote: > Well, rank-0 arrays are and forever will be mutable. But, Python > scalars (and the new Array-like Scalars) are not mutable. This is a really minor point, and only slightly relevant to the discussion, and perhaps I'm just revealing my Python ignorance again, but: what does it mean for a scalar to be mutable? I can understand that one wants a[0]=7 to be allowed when a is a rank-0 array, and I also understand that str[k]='b' where str is a string is not allowed because strings are immutable. But if I type "b=7" followed by "b=3", do I really care whether the 3 gets stuck in the same memory location previously occupied by the 7 (mutable) or the symbol b points to a new location containing a 3 (immutable)? What are some circumstances where this might matter? From verveer at embl.de Fri Mar 25 15:16:06 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Mar 25 15:16:06 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4244770C.4050007@ee.byu.edu> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> <4244770C.4050007@ee.byu.edu> Message-ID: <1ecedca417f6fb76ab3c611785b77f53@embl.de> On Mar 25, 2005, at 9:39 PM, Travis Oliphant wrote: > Peter Verveer wrote: > >>>> This could be made a bit simpler by allowing only contiguous >>>> arrays, but then there would need to be a contiguous flag. >>> >>> >>> I'm thinking just contiguous arrays would be passed. While Numeric >>> does support the multi-segment buffer interface. I doubt extension >>> writers want to try and understand how to deal with it. I think >>> it would be too much of a burden to other extensions if the array >>> they saw was not contiguous. Even internal to Numeric, >>> discontiguous arrays are made contiguous all the time (although the >>> new iterator in Numeric3 makes it much easier for a programmer to >>> deal with discontiguous arrays). >> >> >> It think it would be a real shame not to support non-contiguous data. >> It would be great if such a byte object could be used instead of >> Numeric/numarray arrays when writing extensions. Then I could write C >> extensions that could be made available very easily/efficiently to >> any package supporting it without having to worry about the specific >> C api of those packages. 
If only contiguous byte objects are >> supported that byte object is not a good option anymore for >> implementing extensions for Numeric unless I am prepared to live with >> a lot of copying of non-contiguous arrays. > > > How would you support "non-contiguous" data with the bytes object? > Or do you mean just passing the strides information around as meta > data? With the bytes object pointing to the start? Exactly. > The latter would not be hard to support (it's just a matter of > defining an additional piece of meta information and making people > aware of it) but not every extension writer would try and deal with > that, I'm sure. But, that would be o.k. There needs to be a way to treat such objects as contiguous for people who do not want to deal with strides, which means copying data if needed. It would need some thought to make that transparent, and the question is if it is worth the trouble. I have not really followed the discussion about the byte object, and maybe I have got the wrong idea about its function. But if you see it as a generic data model for homogeneous array data, then it would provide a basis for writing C extensions that could work with different packages. For example to write a C extension of image processing routines that would work with both Numeric arrays and PIL images. Peter From Chris.Barker at noaa.gov Fri Mar 25 15:44:45 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 25 15:44:45 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4244770C.4050007@ee.byu.edu> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> <4244770C.4050007@ee.byu.edu> Message-ID: <4244A208.3090108@noaa.gov> Travis Oliphant wrote: > do you mean just passing the strides information around as meta data? > With the bytes object pointing to the start? That works for me. It would require extension writers to know to at least check for the contiguous flag, but that's not too heavy a burden. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Fri Mar 25 16:44:01 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 25 16:44:01 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now Message-ID: <4244AFB6.40601@ee.byu.edu> To all who were waiting: I've finished adding the methods to the array object so that Numeric3 in CVS now compiles (at least for me on Linux). I will be away for at least a day so it is a good time to play... -Travis From xscottg at yahoo.com Fri Mar 25 22:59:01 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 22:59:01 2005 Subject: [Numpy-discussion] Bytes Object and Pickling In-Reply-To: <4244676A.1050006@ee.byu.edu> Message-ID: <20050326065756.19141.qmail@web50208.mail.yahoo.com> --- Travis Oliphant wrote: > > I think pickling could be handled efficiently with a single new opcode > very similar to a string but instead creating a bytes object on > unpickling. Then, an array could use the memory of that bytes object > (instead of creating it's own and copying). This would be very easy to > handle. > I agree that this is the easy and *right* way to add pickling of the "bytes object". However, several years ago, Guido imposed an additional constraint. 
He wanted the "bytes objects" to unpickle as "string objects" on older versions of Python that didn't know about the new "bytes type". His reasoning was that pickling is used to communicate between different versions of Python, and the older ones would die ungracefully when exposed to the new type. Perhaps his position would be different now. He has added a new pickling protocol since then, and that had to break backwards compatibility somewhere. Cheers, -Scott From xscottg at yahoo.com Fri Mar 25 22:59:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 22:59:04 2005 Subject: [Numpy-discussion] Bytes Object and Metadata In-Reply-To: 6667 Message-ID: <20050326065814.27019.qmail@web50204.mail.yahoo.com> Adding metadata at the buffer object level causes problems for "view" semantics. Let's say that everyone agreed what "itemsize" and "itemtype" meant:

    real_view = complex_array.real

The real_view will have to use a new buffer since they can't share the old one. The buffer used in complex_array would have a typecode like ComplexDouble and an itemsize of 16. The buffer in real_view would need a typecode of Double and an itemsize of 8. If metadata is stored with the buffer object, it can't be the same buffer object in both places. Another case would be treating a 512x512 image of 4 byte pixels as a 512x512x4 image of 1 byte RGBA elements. Or even coercing from Signed to Unsigned. The bytes object as proposed does allow new views to be created from other bytes objects (sharing the same memory underneath), and these views could each have separate metadata, but then you wouldn't be able to have arrays that used other types of buffers. Having arrays use mmap buffers is very useful. The bytes object shouldn't create views from arbitrary other buffer objects because it can't rely on the general semantics of the PyBufferProcs interface. The foreign buffer object might realloc and invalidate the pointer for instance... The current Python "buffer" builtin does this, and the results are bad. So creating a bytes object as a view on the mmap object doesn't work in the general case. Actually, now that I think about it, the mmap object might be safe. I don't believe the current implementation of mmap does any reallocing behind the scenes and I think the pointer stays valid for the lifetime of the object. If we verified that mmap is safe enough, bytes could make a special case out of it, but then you would be locked into bytes and mmap only. Maybe that's acceptable... Still, I think keeping the metadata at a different level, and having the bytes object just be the Python way to spell a call to C's malloc will avoid a lot of problems. Read below for how I think the metadata stuff could be handled. --- Chris Barker wrote: > > There are any number of third-party extensions that could benefit from > being able to directly read the data in Numeric* arrays: PIL, wxPython, > etc., etc. My personal example is wxPython: > > At the moment, you can pass a Numeric or numarray array into wxPython, > and it will be converted to a wxList of wxPoints (for instance), but > that is done by using the generic sequence protocol, and a lot of type > checking. As you can imagine, that is pretty darn slow, compared to just > typecasting the data pointer and looping through it. Robin Dunn, quite > reasonably, doesn't want wxPython to depend on Numeric, so that's what > we've got.
> > My understanding of this memory object is that an extension like
> wxPython wouldn't need to know about Numeric, but could simply get
> the memory object, and there would be enough meta-data with it to
> typecast and loop through the data. I'm a bit skeptical about how this
> would work. It seems that the metadata required would be the full set of
> stuff in an array object already:
>
> type
> dimensions
> strides
>
> This could be made a bit simpler by allowing only contiguous arrays, but
> then there would need to be a contiguous flag.
>
> To make use of this, wxPython would have to know a fair bit about
> Numeric Arrays anyway, so that it can check to see if the data is
> appropriate. I guess the advantage is that while the wxPython code would
> have to know about Numeric arrays, it wouldn't have to include Numeric
> headers or code.
>
I think being able to traffic in N-Dimensional arrays without requiring linking against the libraries is a good thing. Several years ago, I proposed a solution to this problem. Actually I did a really poor job of proposing it and irritated a lot of people in the process. I'm embarrassed to post a link to the following thread, but here it is anyway: http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/1166013 Accept my apologies if you read the whole thing just now. :-) Accept my sincere apologies if you read it at the time. I think the proposal is still relevant today, but I might revise it a bit as follows. A bare-minimum N-Dimensional array for interchanging data across libraries could get by with the following attributes:

    # Create a simple record type for storing attributes
    class BearMin: pass
    bm = BearMin()

    # Set the attributes sufficient to describe a simple ndarray
    bm.buffer = <something supporting the buffer protocol>
    bm.shape = <tuple of ints giving the dimensions>
    bm.itemtype = <typecode string describing the elements>

The bm.buffer and bm.shape attributes are pretty obvious. I would suggest that the bm.itemtype borrow its typecodes from the Python struct module, but anything that everyone agreed on would work. (The struct module is nice because it is already documented and supports native and portable types of many sizes in both endians. It also supports composite struct types.) Those attributes are sufficient for someone to *produce* an N-Dimensional array that could be understood by many libraries. Someone who *consumes* the data would need to know a few more:

    bm.offset = <int byte offset into the buffer>
    bm.strides = <tuple of ints giving the strides>

The value of bm.offset would default to zero if it wasn't present, and the tuple bm.strides could be generated from the shape assuming it was a C style array. Subscripting operations that returned non-contiguous views of shared data could change bm.offset to non-zero. Subscripting would also affect the bm.strides, and creating a Fortran style array would require bm.strides to be present. You might also choose to add bm.itemsize in addition to bm.itemtype when you can describe how big elements are, but you can't sufficiently describe what the data is using the agreed-upon typecodes. This would be uncommon. The default for bm.itemsize would come from struct.calcsize(bm.itemtype). You might also choose to add bm.complicated for when the array layout can't be described by the shape/offset/stride combination. For instance bm.complicated might get used when creating views from more sophisticated subscripting operations like index arrays or mask arrays. Although it looks like Numeric3 plans on making new contiguous copies in those cases. The C implementations of arrays would only have to add getattr-like methods, and the data could be stored very compactly.
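The "generated from the shape" rule above is simple to spell out. A sketch (default_strides is a made-up helper name; itemsize is in bytes):

    def default_strides(shape, itemsize):
        # C style: the last index varies fastest
        strides = []
        stride = itemsize
        for extent in reversed(shape):
            strides.append(stride)
            stride *= extent
        return tuple(reversed(strides))

    # e.g. default_strides((512, 512), 4) == (2048, 4)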
From those minimum 5-7 attributes (metadata), an N-Dimensional array consumer could determine most everything it needed to know about the data. Simple routines could determine things like iscontiguous(bm), iscarray(bm) or isfortran(bm). I expect libraries like wxPython or PIL could punt (raise an exception) when the water gets too deep. It also doesn't prohibit other attributes from being added. Just because an N-Dimensional array described its itemtype using the struct module typecodes doesn't mean that it couldn't implement more sophisticated typing hierarchies with a different attribute. There are a few commonly used types like "long double" which are not supported by the struct module, but this could be addressed with a little discussion. Also you might want a "bit" or "Object" typecode for tightly packed mask arrays and Object arrays. The names could be argued about, and something like:

    bm.__array_buffer__
    bm.__array_shape__
    bm.__array_itemtype__
    bm.__array_offset__
    bm.__array_strides__
    bm.__array_itemsize__
    bm.__array_complicated__

would really bring home the notion that the attributes are a description of what it means to participate in an N-Dimensional array protocol. Plus names this long and ugly are unlikely to step on the existing attributes already in use by Numeric3 and Numarray. :-) Anyway, I proposed this a long time ago, but the belief was that one of the standard array packages would make it into the core very soon. With a standard array library in the core, there wouldn't be as much need for general interoperability like this. Everyone could just use the standard. Maybe that position would change now that Numeric3 and Numarray both look to have long futures. Even if one package made it in, the other is likely to live on. I personally think the competition is a good thing. We don't need to have only one array package to get interoperability. I would definitely like to see the Python core acquire a full-fledged array package like Numeric3 or Numarray. When I log onto a new Linux or MacOS machine, the array package would just be there. No installs, no hassle. But I still think a simple, community-agreed-upon set of attributes like this would be a good idea. --- Peter Verveer wrote: > > It think it would be a real shame not to support non-contiguous data. > It would be great if such a byte object could be used instead of > Numeric/numarray arrays when writing extensions. Then I could write C > extensions that could be made available very easily/efficiently to any > package supporting it without having to worry about the specific C api > of those packages. If only contiguous byte objects are supported that > byte object is not a good option anymore for implementing extensions > for Numeric unless I am prepared to live with a lot of copying of > non-contiguous arrays. > I'm hoping I made a good case for a slightly different strategy above. But even if the metadata did go into the bytes object itself, the metadata could describe a non-contiguous layout on top of the contiguous chunk of memory. There is another really valid argument for using the strategy above to describe metadata instead of wedging it into the bytes object: The Numeric community could agree on the metadata attributes and start using it *today*. If you wait until someone commits the bytes object into the core, it won't be generally available until Python version 2.5 at the earliest, and any libraries that depended on using bytes-stored metadata would not work with older versions of Python.
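A consumer of the double-underscore spelling would then look something like this sketch (describe_array is a made-up name; the defaults follow the rules given above):

    import struct

    def describe_array(obj):
        # required attributes
        buf = obj.__array_buffer__
        shape = obj.__array_shape__
        itemtype = obj.__array_itemtype__
        # optional attributes, with the agreed-upon defaults
        itemsize = getattr(obj, '__array_itemsize__',
                           struct.calcsize(itemtype))
        offset = getattr(obj, '__array_offset__', 0)
        if hasattr(obj, '__array_strides__'):
            strides = obj.__array_strides__
        else:
            # assume a C style layout: last index varies fastest
            strides, stride = [], itemsize
            for extent in reversed(shape):
                strides.insert(0, stride)
                stride *= extent
            strides = tuple(strides)
        return buf, shape, itemtype, itemsize, offset, strides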
Cheers, -Scott From xscottg at yahoo.com Fri Mar 25 23:16:08 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 23:16:08 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050326071530.56285.qmail@web50202.mail.yahoo.com> --- Stephen Walton wrote: > Travis Oliphant wrote: > > > Well, rank-0 arrays are and forever will be mutable. But, Python > > scalars (and the new Array-like Scalars) are not mutable. > > This is a really minor point, and only slightly relevant to the > discussion, and perhaps I'm just revealing my Python ignorance again, > but: what does it mean for a scalar to be mutable? I can understand > that one wants a[0]=7 to be allowed when a is a rank-0 array, and I also > understand that str[k]='b' where str is a string is not allowed because > strings are immutable. But if I type "b=7" followed by "b=3", do I > really care whether the 3 gets stuck in the same memory location > previously occupied by the 7 (mutable) or the symbol b points to a new > location containing a 3 (immutable)? What are some circumstances where > this might matter? > It's nice because it fits with the rest of the array semantics and creates a consistent system:

    Array3D = zeros((1, 1, 1))

    Array2D = Array3D[0]
    Array1D = Array2D[0]
    Array0D = Array1D[0]

That each is mutable is shown by:

    Array3D[0, 0, 0] = 1
    Array2D[0, 0] = 1
    Array1D[0] = 1
    Array0D[] = 1    # whoops!

Unfortunately that last one, while it follows the pattern, doesn't work for Python's parser so you're stuck with:

    Array0D[()] = 1

This becomes useful when you start writing generic routines that want to work with *any* dimensional arrays:

    zero_all_elements(ArrayND)

Python's immutable scalar types could not change in this case. More complicated examples are more interesting, but a simple implementation of the above would be:

    def zero_all_elements(any):
        any[...] = 0

Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 02:49:23 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 02:49:23 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42453F1D.6030305@ims.u-tokyo.ac.jp> I've downloaded the latest Numeric3 and tried to compile it. I found one error in setup.py that I put in myself a couple of days ago; I fixed this in CVS. On Cygwin, the compilation and installation runs without problems, other than the warning messages. However, when I try to compile Numeric3 for Windows, an error window pops up with the following message:

    16 bit MS-DOS Subsystem
    ~/Numeric3
    The NTVDM CPU has encountered an illegal instruction.
    CS:071d IP:210f OP:63 69 66 69 65
    Choose 'Close' to terminate the application

It seems that this message is due to the section labeled "#Generate code" in setup.py, where python is being run with a call to os.system. What does this do? Is there a need to generate code automatically in setup.py, rather than include the generated code with the Numeric3 source code? When using Numeric3, I found that the zeros function now returns a float array by default, whereas Numeric returns an integer array:

$ python2.4
Python 2.4 (#1, Dec 5 2004, 20:47:03) [GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> zeros(5)
array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
>>> array([1,2,3])
array([1, 2, 3], 'l')
>>>

mdehoon at ginseng ~
$ python2.4
Python 2.4 (#1, Dec 5 2004, 20:47:03) [GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> zeros(5)
array([0, 0, 0, 0, 0])
>>> array([1,2,3])
array([1, 2, 3])
>>> array([1,2,3]).typecode()
'l'
>>>

Is there a reason to change the default behavior of zeros? Existing code may assume that zeros returns an integer array, and may behave incorrectly with Numeric3. Such bugs would be very hard to find. Finally, I tried to compile an extension module with Numeric3. The warning messages concerning the type of PyArrayObject->dimensions no longer occur, now that intp is typedef'd as an int instead of a long. But I agree with David Cooke that using Py_intptr_t in pyport.h would be better: > Why not use Py_intptr_t? It's defined by the Python C API already (in > pyport.h). When compiling the extension module, I get warning messages about PyArray_Cast, which now takes a PyObject* instead of a PyArrayObject* as in Numerical Python. Is this a typo in Src/arrayobject.c? Also, the PyArray_Cast function has the comment /* For backward compatibility */ in Src/arrayobject.c. Why is that? Linking the extension module fails due to the undefined reference to _PyArray_API. --Michiel. Travis Oliphant wrote: > > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). > > I will be away for at least a day so it is a good time to play... > -Travis -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From stephen.walton at csun.edu Sat Mar 26 12:19:21 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Sat Mar 26 12:19:21 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050326071530.56285.qmail@web50202.mail.yahoo.com> References: <20050326071530.56285.qmail@web50202.mail.yahoo.com> Message-ID: <4245C390.1090907@csun.edu> Scott Gilbert wrote:
>It's nice because it fits with the rest of the array semantics and creates
>a consistent system:
>
>    Array3D = zeros((1, 1, 1))
>
>    Array2D = Array3D[0]
>    Array1D = Array2D[0]
>    Array0D = Array1D[0]
>
Hmm...in both Numeric3 and numarray, the last line creates a Python scalar. Array2D and Array1D by contrast are not only arrays, but they are views of Array3D. Are you saying that you want Array0D to be a rank-0 array after the above?

>    Array0D[()] = 1
>
Of course, this generates an error at present: "TypeError: object does not support item assignment" since it is a Python int. Moreover, it isn't a view, so that Array0D doesn't change after the assignment to Array3D. Is this also slated to be changed/fixed using rank 0 arrays? Would Array0D.shape be () in that case?
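One concrete circumstance where the difference matters is aliasing. A small sketch, assuming rank-0 assignment behaves as proposed above (zeros(()) standing in for whatever constructor yields a rank-0 array):

    a = zeros(())   # a mutable rank-0 array
    b = a           # a second name for the very same object
    b[()] = 7       # mutation through one name...
    # ...is visible through the other: a[()] is now 7 as well.
    # With an immutable Python scalar, b = 7 would merely rebind the
    # name b, leaving whatever a refers to untouched.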
From stephen.walton at csun.edu Sat Mar 26 12:25:53 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Sat Mar 26 12:25:53 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 Message-ID: <4245C512.1030701@csun.edu> zeros() in Numeric3 defaults to typecode='d' while in numarray it defaults to typecode=None, which in practice means 'i' by default. Is this deliberate? Is this desirable? I'd vote for zeros(), ones() and the like to default to 'i' or 'f' rather than 'd' in the interest of space and speed. From arnd.baecker at web.de Sat Mar 26 13:31:14 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Sat Mar 26 13:31:14 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: Hi Travis, On Fri, 25 Mar 2005, Travis Oliphant wrote: > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). Compilation is fine for me as well (linux). I played around a bit -- obviously, addition of arrays, multiplication etc. don't work yet (as expected, if I remember your mails correctly). One thing which confused me is the following:

In [1]:from ndarray import *
In [2]:x=arange(10.0)
In [3]:scalar=x[3]
In [4]:print scalar+1
---------------------------------------------------------------------------
exceptions.TypeError    Traceback (most recent call last)
TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int'
In [5]:print type(scalar)
<type 'double_arrtype'>

OK, this has been discussed up-and-down on this mailing list. At the moment I don't know how this will affect my Numeric(3) life, so I will wait until the other operations are implemented and see if there are any consequences for my programs at all ... ;-) A couple of things seem to be a bit unusual, e.g.:

In [9]:x=arange(10.0)
In [10]:x
Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd')
In [11]:x.argmax()
---------------------------------------------------------------------------
exceptions.TypeError    Traceback (most recent call last)
/home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/
TypeError: function takes exactly 1 argument (0 given)
In [12]:x.argmax(None)
Out[12]:9
In [13]:t=x.argmax(None)
In [14]:type(t)
Out[14]:

So argmax also returns an array type, but I would have really thought that this is an integer index?! Also, a couple of attributes (e.g. x.sum()) are not yet implemented or lack documentation (I know this comes last ;-). Best, Arnd From xscottg at yahoo.com Sat Mar 26 13:37:13 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Mar 26 13:37:13 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050326213554.93877.qmail@web50201.mail.yahoo.com> --- Stephen Walton wrote:
> Scott Gilbert wrote:
>
> >It's nice because it fits with the rest of the array semantics and creates
> >a consistent system:
> >
> >    Array3D = zeros((1, 1, 1))
> >
> >    Array2D = Array3D[0]
> >    Array1D = Array2D[0]
> >    Array0D = Array1D[0]
> >
> Hmm...in both Numeric3 and numarray, the last line creates a Python
> scalar. Array2D and Array1D by contrast are not only arrays, but they
> are views of Array3D.
I should have been clear that I wasn't describing what any of the array packages do today. The views thing is a different can of worms (I think they should be copy-on-write copies by default and views only when explicitly asked for).
> Are you saying that you want Array0D to
> be a rank-0 array after the above?
>
Yes. I think it fits the pattern and is consistent. There are also cases where it is useful. Array operations form a nice little calculus. Returning non-mutable scalars in place of rank-0 arrays is like having (2 - 1 == 1) while (2 - 1 - 1 == a donut). 0 looks like a donut, but it's a different food group. :-) > > Array0D[()] = 1 > > Of course, this generates an error at present: "TypeError: object does > not support item assignment" since it is a Python int. Moreover, it > isn't a view, so that Array0D doesn't change after the assignment to > Array3D. Is this also slated to be changed/fixed using rank 0 arrays? > Would Array0D.shape be () in that case? > Array0D.shape would be an empty tuple () in that case. I can't say what either Numeric3 or Numarray will do. The last time I read the Numeric3 PEP, it looked like there were going to be several special types of scalars. From cjw at sympatico.ca Sat Mar 26 17:57:59 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Mar 26 17:57:59 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4245C512.1030701@csun.edu> References: <4245C512.1030701@csun.edu> Message-ID: <424612E1.2020908@sympatico.ca> Stephen Walton wrote: > zeros() in Numeric3 defaults to typecode='d' while in numarray it > defaults to typecode=None, which in practice means 'i' by default. Is > this deliberate? Is this desirable? I'd vote for zeros(), ones() and > the like to default to 'i' or 'f' rather than 'd' in the interest of > space and speed. > The following seems to show that the default data type for the numarray elements is Int32:

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numarray as _n
>>> a= _n.zeros(shape=(3, 3))
>>> a._type
Int32
>>>

I don't use the typecodes as the numerictypes are much more explicit. Colin W. From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:23:29 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:23:29 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> Message-ID: <42464443.8050402@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > Michiel Jan Laurens de Hoon wrote: >> Have you tried integrating scipy_distutils with Python's distutils? My >> guess is that Python's distutils can benefit from what is in >> scipy_distutils, particularly the parts dealing with C compilers. A >> clean integration will also prevent duplicated code, avoid Pearu >> having to keep scipy_distutils up to date with Python's distutils, and >> enlarge the number of potential users. Having two distutils >> packages seems to be too much of a good thing. > > > No, I have not. Though a year or so ago there was a discussion about > this on the distutils list, mainly for adding Fortran compiler support to > distutils. At the time I didn't have resources to push scipy_distutils > features to distutils and even less so for now. So, one can think that > scipy_distutils is an extension to distutils, though it also includes > a few bug fixes for older distutils. Having a separate scipy_distutils that fixes some bugs in Python's distutils is a design mistake in SciPy that we should not repeat in Numeric3.
Not that I don't think the code in scipy_distutils is not useful -- I think it would be very useful. But the fact that it is not integrated with the existing Python distutils makes me wonder if this package really has been thought out that well. As far as I can tell, scipy_distutils now fulfills four functions:

1) Bug fixes for Python's distutils for older Python versions. As Numeric3 will require Python 2.3 or up, these are no longer relevant.

2) Bug fixes for the current Python's distutils. These should be integrated with Python's distutils. Writing your own package instead of contributing to Python gives you bad karma.

3) Fortran support. Very useful, and I'd like to see it in Python's distutils. Another option would be to put this in SciPy.fortran or something similar. But since Python's distutils already has a language= option for C++ and Objective-C, the cleanest way would be to add this to Python's distutils and enable language="fortran".

4) Stuff particular to SciPy, for example finding Atlas/Lapack/Blas libraries. We can decide on a case-by-case basis whether these are useful for Numeric3.

--Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:27:24 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:27:24 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <424644FC.6070003@ims.u-tokyo.ac.jp> Michiel Jan Laurens de Hoon wrote: > Pearu Peterson wrote: > > Michiel Jan Laurens de Hoon wrote: > ..... Not that I don't think the code in scipy_distutils is not > useful -- I think it would be very useful. One negation too many in this sentence -- sorry. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:33:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:33:04 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42464639.6050207@ims.u-tokyo.ac.jp> I'm a bit confused about where Numeric3 is heading. Originally, the idea was that Numeric3 should go in the Python core. Are we still aiming for that? More recently, another goal was to integrate Numeric and numarray, which I fully support. However, from looking at the Numeric3 source code, and from looking at the errors found by Arnd, it looks like Numeric3 is a complete rewrite of Numeric. This goes well beyond integrating Numeric and numarray. Now I realize that sometimes it is necessary to rethink and rewrite some code. However, Numerical Python has served us very well over the years, and I'm worried that rewriting the whole thing will break more than it fixes. So where is Numeric3 going? --Michiel.
Arnd Baecker wrote:
> Hi Travis,
>
> On Fri, 25 Mar 2005, Travis Oliphant wrote:
>
>>To all who were waiting:
>>
>>I've finished adding the methods to the array object so that Numeric3 in
>>CVS now compiles (at least for me on Linux).
>
> Compilation is fine for me as well (linux).
> I played around a bit -- obviously, addition of arrays, multiplication etc.
> don't work yet (as expected, if I remember your mails correctly).
> One thing which confused me is the following:
>
> In [1]:from ndarray import *
> In [2]:x=arange(10.0)
> In [3]:scalar=x[3]
> In [4]:print scalar+1
> ---------------------------------------------------------------------------
> exceptions.TypeError    Traceback (most recent call last)
> TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int'
> In [5]:print type(scalar)
> <type 'double_arrtype'>
>
> OK, this has been discussed up-and-down on this mailing
> list. At the moment I don't know how this
> will affect my Numeric(3) life, so I will wait
> until the other operations are implemented
> and see if there are any consequences
> for my programs at all ... ;-)
>
> A couple of things seem to be a bit unusual, e.g.:
>
> In [9]:x=arange(10.0)
> In [10]:x
> Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd')
> In [11]:x.argmax()
> ---------------------------------------------------------------------------
> exceptions.TypeError    Traceback (most recent call last)
> /home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/
> TypeError: function takes exactly 1 argument (0 given)
> In [12]:x.argmax(None)
> Out[12]:9
> In [13]:t=x.argmax(None)
> In [14]:type(t)
> Out[14]:
>
> So argmax also returns an array type, but I would
> have really thought that this is an integer index?!
>
> Also, a couple of attributes (e.g. x.sum()) are not yet
> implemented or lack documentation (I know this comes last ;-).
>
> Best,
>
> Arnd
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Sat Mar 26 22:43:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 22:43:34 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <42464639.6050207@ims.u-tokyo.ac.jp> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> Message-ID: <424655B1.4000503@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > I'm a bit confused about where Numeric3 is heading. Originally, the > idea was that Numeric3 should go in the Python core. Are we still aiming > for that? More recently, another goal was to integrate Numeric and > numarray, which I fully support. I would prefer to re-integrate the numarray people "back" into the Numeric community, by adding the features to Numeric that they need. > However, from looking at the Numeric3 source code, and from looking at > the errors found by Arnd These errors are all due to the fact that umath functions are not available yet.
Please don't judge too hastily. At this point, I'm really just looking for people who are going to pitch in and help with coding or to offer suggestions about where the code base should go. I have done this completely in the open as best I can. I have no other "hidden" plans. All of the scalar types will get their math from the umath functions too. So, of course they don't work either yet (except for those few that inherit from the basic Python types). > it looks like Numeric3 is a complete rewrite of Numeric. I really don't understand how you can make this claim. All I've done is add significantly to Numeric's code base and re-inserted some code-generators (code-generators are critical for maintainability --- in answer to your previous question). > This goes well beyond integrating Numeric and numarray. Now I realize > that sometimes it is necessary to rethink and rewrite some code. > However, Numerical Python has served us very well over the years, and > I'm worried that rewriting the whole thing will break more than it > fixes. So where is Numeric3 going? You really need to justify this idea you are creating that I am "re-writing the whole thing." I disagree wholeheartedly with your assessment. Certain changes were made to accommodate numarray features, new types were added, indexing was enhanced. When possible, I've deferred to Numeric's behavior. But, you can't bring two groups back to a common array type by not changing the array type at all. Numeric3 is going wherever the community takes it. It is completely open. Going into the Python core is going to have to wait until things settle down in our own community. It's not totally abandoned, just put on hold (again). That was the decision thought best by Guido, Perry, myself, and Paul at our lunch together. We thought it best to instead suggest interoperability strategies. Numeric3 is NOT a re-write of Numeric. I've re-used a great deal of the Numeric code. The infrastructure is exactly the same. I started the project from the Numeric code base. I've just added a great deal more following the discussions on this list. At worst, I've just expanded quite a few things and moved a lot into C. Little incompatibilities are just a sign of alpha code, not a "direction." Some have thought that zeros returning ints was a bug in the past, which is why the change occurred. But, it is an easy change, and I tend to agree that it should stay returning the Numeric default of Intp. >> >> In [1]:from ndarray import * >> In [2]:x=arange(10.0) >> In [3]:scalar=x[3] >> In [4]:print scalar+1 >> --------------------------------------------------------------------------- >> >> exceptions.TypeError Traceback (most recent call last) >> TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int' >> In [5]:print type(scalar) >> > All scalar types by default will do math operations as rank-0 arrays (which haven't been brought back in yet). That is why you see the error. I believe that right now I am inheriting first from the Generic Scalar Type instead of the double type (so it will use Scalar Arithmetic first). Later, special methods for each scalar type could be used (for optimization). >> A couple of things seem to be a bit unusual, e.g.: >> >> In [9]:x=arange(10.0) >> In [10]:x >> Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd') >> In [11]:x.argmax() >> --------------------------------------------------------------------------- >> >> exceptions.TypeError Traceback (most >> recent call last) >> This is an error. I'll look into it. 
Notice all method calls that take an axis argument default to None (which means ravel the whole array). The function calls won't change, for backward compatibility. >> /home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/ >> >> TypeError: function takes exactly 1 argument (0 given) >> In [12]:x.argmax(None) >> Out[12]:9 >> In [13]:t=x.argmax(None) >> In [14]:type(t) >> Out[14]: >> >> So argmax also returns an array type, but I would >> have really thought that this is an integer index?! > Remember, array operations always return array scalars! >> >> Also a couple of attributes (e.g. x.sum()) are not yet >> implemented or lack documentation (I know this comes last ;-). > Hmm. x.sum should be there. Believe me, it was a bit of a pain to convert all the Python code to C. Have to check this. -Travis From oliphant at ee.byu.edu Sat Mar 26 22:51:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 22:51:25 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <424655B1.4000503@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> Message-ID: <424657A4.2050203@ee.byu.edu> >>> >>> In [1]:from ndarray import * >>> In [2]:x=arange(10.0) >>> In [3]:scalar=x[3] >>> In [4]:print scalar+1 >>> --------------------------------------------------------------------------- >>> >>> exceptions.TypeError Traceback (most recent call last) >>> TypeError: unsupported operand type(s) for +: 'double_arrtype' and >>> 'int' >>> In [5]:print type(scalar) >>> >> >> > > All scalar types by default will do math operations as rank-0 arrays > (which haven't been brought back in yet). That is why you see the error. I > believe that right now I am inheriting first from the Generic Scalar > Type instead of the double type (so it will use Scalar Arithmetic > first). Later, > special methods for each scalar type could be used (for optimization). To clarify, it is the umath operations that have not been 'brought back in', or activated, yet. The rank-0 arrays are of course there as they have always been. -Travis From oliphant at ee.byu.edu Sat Mar 26 23:23:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 23:23:30 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050326065814.27019.qmail@web50204.mail.yahoo.com> References: <20050326065814.27019.qmail@web50204.mail.yahoo.com> Message-ID: <42465F29.208@ee.byu.edu> Scott Gilbert wrote: >Adding metadata at the buffer object level causes problems for "view" >semantics. Let's say that everyone agreed what "itemsize" and "itemtype" >meant: > > real_view = complex_array.real > >The real_view will have to use a new buffer since they can't share the old >one. The buffer used in complex_array would have a typecode like >ComplexDouble and an itemsize of 16. The buffer in real_view would need a >typecode of Double and an itemsize of 8. If metadata is stored with the >buffer object, it can't be the same buffer object in both places. > > This is where having "strides" metadata becomes very useful. Then, real_view would not have to be a copy at all, unless the coder didn't want to deal with it. >Another case would be treating a 512x512 image of 4 byte pixels as a >512x512x4 image of 1 byte RGBA elements. Or even coercing from Signed to >Unsigned. > > Why not? 
A different bytes object could point to the same memory but the different metadata would say "treat this data differently." > >The bytes object as proposed does allow new views to be created from other >bytes objects (sharing the same memory underneath), and these views could >each have separate metadata, but then you wouldn't be able to have arrays >that used other types of buffers. > > I don't see why not. Your argument is not clear to me. >The bytes object shouldn't create views from arbitrary other buffer objects >because it can't rely on the general semantics of the PyBufferProcs >interface. The foreign buffer object might realloc and invalidate the >pointer for instance... The current Python "buffer" builtin does this, and >the results are bad. So creating a bytes object as a view on the mmap >object doesn't work in the general case. > > This is a problem with the objects that expose the buffer interface. The C-API could be more clear that you should not "reallocate" memory if another array is referencing you. See the arrayobject's resize method for an example of how Numeric does not allow reallocation of the memory space if another object is referencing it. I suppose you could keep track separately in the object of when another object is using your memory, but the REFCOUNT works for this also (though it is not so specific, and so you would miss cases where you "could" reallocate but this is rarely used in arrayobject's anyway). Another idea is to fix the bytes object so it always regrabs the pointer to memory from the object instead of relying on the held pointer in view situations. >Still, I think keeping the metadata at a different level, and having the >bytes object just be the Python way to spell a call to C's malloc will >avoid a lot of problems. Read below for how I think the metadata stuff >could be handled. > > Metadata is such a light-weight "interface-based" solution. It could be as simple as attributes on the bytes object. I don't see why you resist it so much. Imagine defining a jpeg file by a single bytes object with a simple EXIF header metadata string. If the bytes object allowed the "bearmin" attributes you are describing then that would be one way to describe an array that any third-party application could support as much as they wanted. In short, I think we are thinking along similar lines. It really comes down to being accepted by everybody as a standard. One of the things I want for Numeric3 is to be able to create an array from anything that exports the buffer interface. The problem, of course, is with badly-written extension modules that rudely reallocate their memory even after they've shared it with someone else. Yes, Python could be improved so that this were handled better, but it does work right now, as long as buffer interface exporters play nice. This is the way to advertise the buffer interface (and buffer object). Rather than vague references to buffer objects being a "bad-design" and a blight we should say: objects wanting to export the buffer interface currently have restrictions on their ability to reallocate their buffers. >> >> > >I think being able to traffic in N-Dimensional arrays without requiring >linking against the libraries is a good thing. > > Several of us are just catching on to the idea. Thanks for your patience. >I think the proposal is still relevant today, but I might revise it a bit >as follows. 
A bear minimum N-Dimensional array for interchanging data >across libraries could get by with the following attributes: > > # Create a simple record type for storing attributes > class BearMin: pass > bm = BearMin() > > # Set the attributes sufficient to describe a simple ndarray > bm.buffer = > bm.shape = > bm.itemtype = > >The bm.buffer and bm.shape attributes are pretty obvious. I would suggest >that the bm.itemtype borrow its typecodes from the Python struct module, >but anything that everyone agreed on would work. > > I've actually tried to do this if you'll notice, and I'm sure I'll take some heat for that decision at some point too. The only difference currently I think are long types (q and Q), I could be easily persuaded to change these typecodes too. I agree that the typecode characters are very simple and useful for interchanging information about type. That is a big reason why I am not "abandoning them" >Those attributes are sufficient for someone to *produce* an N-Dimensional >array that could be understood by many libraries. Someone who *consumes* >the data would need to know a few more: > > bm.offset = > > I don't like this offset parameter. Why doesn't the buffer just start where it needs to? > bm.strides = > > Things are moving in this direction (notice that Numeric3 has attributes much like you describe), except we use the word .data (instead of .buffer). It would be an easy thing to return an ArrayObject from an object that exposes those attributes (and a good idea). So, I pretty much agree with what you are saying. I just don't see how this is at odds with attaching metadata to a bytes object. We could start supporting this convention today, and also handle bytes objects with metadata in the future. >There is another really valid argument for using the strategy above to >describe metadata instead of wedging it into the bytes object: The Numeric >community could agree on the metadata attributes and start using it >*today*. > > Yes, but this does not mean we should not encourage the addition of metadata to bytes objects (as this has larger uses than just Numeric arrays). It is not a difficult thing to support both concepts. >If you wait until someone commits the bytes object into the core, it won't >be generally available until Python version 2.5 at the earliest, and any >libraries that depended on using bytes stored metadata would not work with >older versions of Python. > > > I think we should just start advertising now, that with the new methods of numarray and Numeric3, extension writers can right now deal with Numeric arrays (and anything else that exposes the same interface) very easily by using attribute access (or the buffer protocol together with attribute access). They can do this because Numeric arrays (and I suspect numarrays as well) use the buffer interface responsibly (we could start a political campaign encouraging responsible buffer usage everywhere :-) ). -Travis From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 00:45:33 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 00:45:33 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <424655B1.4000503@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> Message-ID: <4246732D.2080908@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> I'm a bit confused about where Numeric3 is heading. Originally, the idea >> was that Numeric3 should go in Python core. 
Are we still aiming for that? >> More recently, another goal was to integrate Numeric and numarray, which I >> fully support. > > I would prefer to re-integrate the numarray people "back" into the Numeric > community, by adding the features to Numeric that they need. > >> However, from looking at the Numeric3 source code, and from looking at the >> errors found by Arnd > > These errors are all due to the fact that umath functions are not available > yet. Please don't judge too hastily. At this point, I'm OK. Fair enough. Maybe I did judge too hastily. From your comments, it looks like we agree on where Numeric3 is going. >> It seems that this message is due to the section labeled "#Generate code" >> in setup.py, where python is being run with a call to os.system. What does >> this do? Is there a need to generate code automatically in setup.py, rather >> than include the generated code with the Numeric3 source code? > > (code-generators are critical for maintainability --- in answer to your > previous question). As far as I can tell from their setup.py, neither Numerical Python nor numarray currently does code generation on the fly from setup.py. (This was one of the reasons that I started to worry if Numeric3 is more than Numerical Python + numarray). --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 02:29:11 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 02:29:11 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42468BB3.4060503@ims.u-tokyo.ac.jp> I have made one change to setup.py and Include/ndarray/arrayobject.h to make the definition of intp, uintp consistent with what is in pyport.h (which gets #included via Python.h). This shouldn't change anything, but since this affects the compilation almost everywhere, I thought I should let everybody know. If it causes any problems, feel free to change it back. --Michiel. Travis Oliphant wrote: > > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). > > I will be away for at least a day so it is a good time to play... > -Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From dd55 at cornell.edu Sun Mar 27 15:22:08 2005 From: dd55 at cornell.edu (Darren Dale) Date: Sun Mar 27 15:22:08 2005 Subject: [Numpy-discussion] searching a list of arrays Message-ID: <200503271820.51307.dd55@cornell.edu> Hi, I have a list of numeric-23.8 arrays: a = [array([0,1]), array([0,1]), array([1,0]), array([1,0])] b = [array([0,1,0]), array([0,1,0]), array([1,0,0]), array([1,0,0])] and I want to make a new list out of b: c = [array([0,1,2]), array([1,0,2])] where the last index in each array is the result of b.count([0,1,0]) # or [1,0,0] The problem is that the result of b.count(array([1,0,0])) is 4, not 2, and b.remove(array([1,0,0])) indiscriminately removes arrays from the list. a.count and a.remove work the way I expected. Does anyone know why 1x2 arrays work, but 1x3 or larger arrays do not? Thanks, Darren From xscottg at yahoo.com Sun Mar 27 18:08:13 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sun Mar 27 18:08:13 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <42465F29.208@ee.byu.edu> Message-ID: <20050328020731.85506.qmail@web50202.mail.yahoo.com> Hi Travis. I'm quite possibly misunderstanding how you want to incorporate the metadata into the bytes object, so I'm going to try and restate both of our positions from the point of view of a third party who will be using ndarrays. Let's take Chris Barker's point of view with regards to wxPython... We all roughly agree which pieces of metadata are needed for arrays. There are a few persnicketies, and the names could vary. I'll use your given names: .data (could be .buffer or .__array_buffer__) .shape (could be .dimensions or .__array_shape__) .strides (maybe .__array_strides__) .itemtype (could be .typecode or .__array_itemtype__) Several other attributes can be derived (calculated) from those (isfortran, iscontiguous, etc...), and we might need a few more, but we'll ignore those for now. In my proposal, Chris would write a routine like such: def version_one(a): data = a.data shape = a.shape strides = a.strides itemtype = a.itemtype # Cool code goes here I believe you are suggesting Chris would write: def version_two(a): data = a shape = a.shape strides = a.strides itemtype = a.itemtype # Cool code goes here Or if you have the .meta dictionary, Chris would write: def version_three(a): data = a shape = a.meta["shape"] strides = a.meta["strides"] itemtype = a.meta["itemtype"] # Cool code goes here Of course Chris could save one line of code with: def version_two_point_one(data): shape = data.shape strides = data.strides itemtype = data.itemtype # Cool code goes here If I'm mistaken about your proposal, please let me know. However if I'm not mistaken, I think there are limitations with version_two and version_three. First, most of the existing buffer objects do not allow attributes to be added to them. With version_one, Chris could have data of type array.array, Numarray.memory, mmap.mmap, __builtins__.str, the new __builtins__.bytes type as well as any other PyBufferProcs supporting object (and possibly sequence objects like __builtins__.list). 
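To make that concrete, here is a minimal sketch of a version_one producer (the Producer class and its attribute choices are hypothetical, purely for illustration): an object that is not a buffer itself hands out an internal array.array through its .data attribute, and version_one can consume it unchanged.

    import array

    class Producer:
        # Hypothetical object that owns an array.array but is not a buffer itself.
        def __init__(self, shape):
            n = 1
            for d in shape:
                n *= d
            self.data = array.array('d', [0.0] * n)   # the buffer handed to consumers
            self.shape = shape
            self.itemtype = 'd'
            strides = []
            step = self.data.itemsize
            for d in shape[::-1]:                     # build C-contiguous strides, in bytes
                strides.insert(0, step)
                step *= d
            self.strides = tuple(strides)

    version_one(Producer((3, 4)))    # works; Producer never needed PyBufferProcs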
With version_two and version_three, something more is required. In a few cases like the __builtins__.str type you could add the necessary attributes by inheritance. In other cases like the mmap.mmap, you could wrap it with a __builtins__.bytes object. (That's assuming that __builtins__.bytes knows how to wrap mmap.mmap objects...) However, other PyBufferProcs objects like array.array will never allow themselves to be wrapped by a __builtins__.bytes since they realloc their memory and violate the promises that the __builtins__.bytes object makes. I think you disagree with me on this part, so more on that later in this message. For now I'll take your side, let's pretend that all PyBufferProcs supporting objects could be made well enough behaved to wrap up in a __builtins__.bytes object. Do you really want to require that only __builtins__.bytes objects are suitable for data interchange across libraries? This isn't explicitly stated by you, but since the __builtins__.bytes object is the only common PyBufferProcs supporting object that could define the metadata attributes, it would be the rule in practice. I think you're losing flexibility if you do it this way. From Chris's point of view it's basically the same amount of code for all three versions above. Another consideration that might sway you is that the existing N-Dimensional array packages could easily add attribute methods to implement the interface, and they could do this without changing any part of their implementation. The .data attribute when requested would call a "get method" that returns a buffer. This allows user defined objects which do not implement the PyBufferProcs protocol themselves, but which contain a buffer inside of them to participate in the "ndarray protocol". Both version_two and version_three do not allow this - the object being passed must *be* a buffer. > > > The bytes object shouldn't create views from arbitrary other buffer > > objects because it can't rely on the general semantics of the > > PyBufferProcs interface. The foreign buffer object might realloc > > and invalidate the pointer for instance... The current Python > > "buffer" builtin does this, and the results are bad. So creating > > a bytes object as a view on the mmap object doesn't work in the > > general case. > > > > > This is a problem with the objects that expose the buffer interface. > The C-API could be more clear that you should not "reallocate" memory if > another array is referencing you. See the arrayobject's resize method > for an example of how Numeric does not allow reallocation of the memory > space if another object is referencing it. I suppose you could keep > track separately in the object of when another object is using your > memory, but the REFCOUNT works for this also (though it is not so > specific, and so you would miss cases where you "could" reallocate but > this is rarely used in arrayobject's anyway). > The reference count on the PyObject pointer is different than the number of users using the memory. In Python you could have: import array a = array.array('d', [1]) b = a The reference count on the array.array object is 2, but there are 0 users working with the memory. Given the existing semantics of the array.array object, it really should be allowed to resize in this case. Storing the object in a dictionary would be another common situation that would increase its refcount but shouldn't lock down the memory. A good solution to this problem was presented with PEP-298, but no progress seems to have been made on it. 
http://www.python.org/peps/pep-0298.html To my memory, PEP-298 was in response to PEP-296. I proposed PEP-296 to create a good working buffer (bytes) object that avoided the problems of the other buffer objects. Several folks wanted to fix the other (non bytes) objects where possible, and PEP-298 was the result. A strategy like this could be used to make array.array safe outside of the GIL. Bummer that it didn't get implemented. > > Another idea is to fix the bytes object so it always regrabs the pointer > to memory from the object instead of relying on the held pointer in view > situations. > A while back, I submitted a patch [552438] like this to fix the __builtins__.buffer object: http://sourceforge.net/tracker/index.php?func=detail&aid=552438&group_id=5470&atid=305470 It was ignored for a bit, and during the quiet time I came to realize that even if the __builtins__.buffer object was fixed, it still wouldn't meet my needs. So I proposed the bytes object, and this patch fell on the floor (the __builtins__.buffer object is still broken). The downside to this approach is that it only solves the problem for code running with possession of the GIL. It does solve the stale pointer problem that is exposed by the __builtins__.buffer object, but if you release the GIL in C code, all bets are off - the pointer can become stale again. The promises that bytes tries to make about the lifetime of the pointer can only be guaranteed by the object itself. Just because bytes could wrap the other object and grab the latest pointer when you need it doesn't mean that the other object won't invalidate the pointer a split second later when the GIL is released. It is mere chance that the mmap object is well behaved enough. And even the mmap object can release its memory if someone closes the object - again leading to a stale pointer. > > Metadata is such a light-weight "interface-based" solution. It could be > as simple as attributes on the bytes object. I don't see why you resist > it so much. Imagine defining a jpeg file by a single bytes object > with a simple EXIF header metadata string. If the bytes object > allowed the "bearmin" attributes you are describing then that would be > one way to describe an array that any third-party application could > support as much as they wanted. > Please don't think I'm offering you resistance. I'm only trying to point out some things that I think you might have overlooked. Lots of people ignore my suggestions all the time. You'd be in good company if you did too, and I wouldn't even hold a grudge against you. Now let me be argumentative. :-) I've listed what I consider the disadvantages above, but I guess I don't see any advantages of putting the metadata on the bytes object. In what way is: jpeg = bytes() jpeg.exif = better than: class record: pass jpeg = record() jpeg.data = jpeg.exif = The only advantage I see is that yours is a little shorter, but in any real application, you were probably going to define an object of some sort to add all the methods needed. And as I showed above in version_one, version_two, and version_three, it's basically the same number of lines for the consumer of the data. There is nothing stopping a PyBufferProcs object like bytes from supporting version_one above: jpeg = bytes() jpeg.data = jpeg jpeg.exif = But non PyBufferProcs objects can't play with version_two or version_three. Incidentally, being able to add attributes to bytes means that it needs to play nicely with the garbage collection system. 
At that point, bytes is basically a container for arbitrary Python objects. That's an additional implementation headache. > > It really comes down to being accepted by everybody as a standard. > This I completely agree with. I think the community will roll with whatever you and Perry come to agree on. Even the array.array object in the core could be made to work either way. If the decision you come up with makes it easy to add the interface to existing array objects then everyone would probably adopt it and it would become a standard. This is the main reason I like the double underscore __*meta*__ names. It matches the similar pattern all over Python, and existing array packages could add those without interfering with their existing implementation: class Numarray: # # lots of array implementing code # # Down here at the end, add the "well-known" interface # (I haven't embraced the @property decorator syntax yet.) def __get_shape(self): return self._shape __array_shape__ = property(__get_shape) def __get_data(self): # Note that they use a different name internally return self._buffer __array_data__ = property(__get_data) def __get_itemtype(self): # Perform an on the fly conversion from the class # hierarchy type to the struct module typecode that # closest matches return self._type._to_typecode() __array_itemtype__ = property(__get_itemtype) Changing class Numarray to a PyBufferProcs supporting object would be harder. The C version for Numeric3 arrays would be similar, and there is no wasted space on a per instance basis in either case. > > One of the things I want for Numeric3 is to be able to create an array from anything that exports the buffer interface. The problem, of course, > is with badly-written extension modules that rudely reallocate their > memory even after they've shared it with someone else. Yes, Python > could be improved so that this were handled better, but it does work > right now, as long as buffer interface exporters play nice. > I think the behavior of the array.array objects are pretty defensible. It is useful that you can extend those arrays to new sizes. For all I know, it was written that way before there was a GIL. I think PEP-298 is a good way to make the dynamic buffers more GIL friendly. > > This is the way to advertise the buffer interface (and buffer > object). Rather than vague references to buffer objects being a > "bad-design" and a blight we should say: objects wanting to export the > buffer interface currently have restrictions on their ability to > reallocate their buffers. > I agree. The "bad-design" type of comments about "the buffer problem" on python-dev have always annoyed me. It's not that hard of a problem to solve technically. > > > I would suggest that the bm.itemtype borrow its typecodes from > > the Python struct module, but anything that everyone agreed on > > would work. > > I've actually tried to do this if you'll notice, and I'm sure I'll take > some heat for that decision at some point too. The only difference > currently I think are long types (q and Q), I could be easily persuaded > to change these typecodes too. I agree that the typecode characters are > very simple and useful for interchanging information about type. That > is a big reason why I am not "abandoning them" > The real advantage to the struct module typecodes comes in two forms. First and most important is that it's already documented and in place - a de facto standard. 
Second is that Python script code could use those typecodes directly with the struct module to pull apart pieces of data. The disadvantage is that a few new typecodes would be needed... I would even go as far as to recommend their '>' '<' prefix codes for big-endian and little-endian for just this reason... > > I don't like this offset parameter. Why doesn't the buffer just start > where it needs to? > Well if you stick with using the bytes object, you could probably get away with this. Effectively, the offset is encoded in the bytes object. At this point, I don't know if anything I said above was persuasive, but I think there are other cases where you would really want this. Does anyone plan to support tightly packed (8 bits to a byte) bitmask arrays? Object arrays could be implemented on top of shared __builtins__.list objects, and there is no easy way to create offset views into lists. > > It would be an easy thing to return an ArrayObject from an object that > exposes those attributes (and a good idea). > This would be wonderful. Third party libraries could produce data that is sufficiently ndarray like without hassle, and users of that library could promote it to a Numeric3 array with no headaches. > > So, I pretty much agree with what you are saying. I just don't see how > this is at odds with attaching metadata to a bytes object. > > We could start supporting this convention today, and also handle bytes > objects with metadata in the future. > Unfortunately, I don't think any buffer objects exist today which have the ability to dynamically add attributes. If my arguments above are unpersuasive, I believe bytes (once it is written) will be the only buffer object with this support. By the way, it looks like the "bytes" concept has been revisited recently. There is a new PEP dated Aug 11, 2004: http://www.python.org/peps/pep-0332.html > > > There is another really valid argument for using the strategy above to > > describe metadata instead of wedging it into the bytes object: The > > Numeric community could agree on the metadata attributes and start > > using it *today*. > > I think we should just start advertising now, that with the new methods > of numarray and Numeric3, extension writers can right now deal with > Numeric arrays (and anything else that exposes the same interface) very > easily by using attribute access (or the buffer protocol together with > attribute access). They can do this because Numeric arrays (and I > suspect numarrays as well) use the buffer interface responsibly (we > could start a political campaign encouraging responsible buffer usage > everywhere :-) ). > I can just imagine the horrible mascot that would be involved in the PR campaign. Thanks for your attention and patience with me on this. I really appreciate the work you are doing. I wish I could explain my understanding of things more clearly. Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 19:01:13 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 19:01:13 2005 Subject: [Numpy-discussion] searching a list of arrays In-Reply-To: <200503271820.51307.dd55@cornell.edu> References: <200503271820.51307.dd55@cornell.edu> Message-ID: <42477475.4040406@ims.u-tokyo.ac.jp> This is because of how "==" is defined for arrays. 
For lists, list1==list2 if all elements are the same; a boolean value is returned: >>> x = [0,1,0] >>> x==[0,1,0] True >>> x==[1,0,0] False For arrays, "==" does an element-wise comparison: >>> from Numeric import * >>> x = array([0,1,0]) >>> x==array([0,1,0]) array([1, 1, 1]) >>> x==array([1,0,0]) array([0, 0, 1]) >>> Now, when you count how often array([0,1,0]) appears in b, you actually evaluate element==array([0,1,0]) for each element in b, and count how often you get a True, with every array other than array([0,0,0]) regarded as True. For list a, this happens to work because array([0,1]) and array([1,0]) have no elements in common. But in this case: >>> a = [array([0,0]),array([0,0]),array([0,1]),array([0,1])] >>> a [array([0, 0]), array([0, 0]), array([0, 1]), array([0, 1])] >>> a.count(array([0,0])) 4 you also get the non-intuitive answer 4. An easy way to get this to work is to use lists instead of arrays: >>> b = [[0,1,0], [0,1,0], [1,0,0], [1,0,0]] >>> b.count([0,1,0]) 2 But I don't know if this solution is suitable for your application. --Michiel. Darren Dale wrote: > Hi, > > I have a list of numeric-23.8 arrays: > > a = [array([0,1]), > array([0,1]), > array([1,0]), > array([1,0])] > > b = [array([0,1,0]), > array([0,1,0]), > array([1,0,0]), > array([1,0,0])] > > and I want to make a new list out of b: > > c = [array([0,1,2]), > array([1,0,2])] > > where the last index in each array is the result of > > b.count([0,1,0]) # or [1,0,0] > > > The problem is that the result of b.count(array([1,0,0])) is 4, not 2, and > b.remove(array([1,0,0])) indiscriminately removes arrays from the list. > a.count and a.remove work the way I expected. > > Does anyone know why 1x2 arrays work, but 1x3 or larger arrays do not? > > Thanks, > Darren -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From stephen.walton at csun.edu Sun Mar 27 21:00:15 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Sun Mar 27 21:00:15 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <424612E1.2020908@sympatico.ca> References: <4245C512.1030701@csun.edu> <424612E1.2020908@sympatico.ca> Message-ID: <42478F28.7020501@csun.edu> Colin J. Williams wrote: > The following seems to show that the default data type for the > numarray elements is Int32: It is, and I thought my original message said that. I was talking about Numeric3, where the default type for zeros() is 'd' (Float64 in numarray parlance). 
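Side by side, the difference looks roughly like this (a sketch rather than a verified session: the ndarray module name follows the Numeric3 CVS examples posted earlier, and the exact spelling of the type queries may differ between the two packages):

    >>> import ndarray                    # Numeric3 from CVS
    >>> ndarray.zeros((3,)).typecode()
    'd'
    >>> import numarray
    >>> numarray.zeros((3,)).type()
    Int32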
From pearu at scipy.org Mon Mar 28 01:24:09 2005 From: pearu at scipy.org (Pearu Peterson) Date: Mon Mar 28 01:24:09 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: On Sun, 27 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Having a separate scipy_distutils that fixes some bugs in Python's distutils > is a design mistake in SciPy that we should not repeat in Numeric3. Not that > I don't think the code in scipy_distutils is not useful -- I think it would > be very useful. But the fact that it is not integrated with the existing > Python distutils makes me wonder if this package really has been thought out > that well. I don't think that part of scipy_distutils design was to fix Python's distutils bugs. As we found a bug, its fix was added to scipy_distutils as well as reported to distutils bug tracker. The main reason for adding bug fixes to scipy_distutils was to continue the work with scipy instead of waiting for the next distutils release (i.e. Python release), nor could we expect that SciPy users would use the CVS version of Python's distutils. Also, SciPy was meant to support Python 2.1 and up, so the bug fixes remained relevant even when the bugs were fixed in Python 2.2 or 2.3 distutils. So much for the history.. > As far as I can tell, scipy_distutils now fulfills four functions: > 1) Bug fixes for Python's distutils for older Python versions. As Numeric3 > will require Python 2.3 or up, these are no longer relevant. > 2) Bug fixes for current Python's distutils. These should be integrated with > Python's distutils. Writing your own package instead of contributing to > Python gives you bad karma. > 3) Fortran support. Very useful, and I'd like to see it in Python's > distutils. Another option would be to put this in SciPy.fortran or something > similar. But since Python's distutils already has a language= option for C++ > and Objective-C, the cleanest way would be to add this to Python's distutils > and enable language="fortran". > 4) Stuff particular to SciPy, for example finding Atlas/Lapack/Blas > libraries. These we can decide on a case-by-case basis if they're useful for > Numeric3. Plus I would add the scipy_distutils ability to build sources on the fly (the build_src command). That's a very fundamental feature useful whenever swig or f2py is used, or when building sources from templates or dynamically during a build process. Btw, I have started the scipy_core cleanup. The plan is to create the following package tree under Numeric3 source tree: scipy.distutils - contains cpuinfo, exec_command, system_info, etc scipy.distutils.fcompiler - contains Fortran compiler support scipy.distutils.command - contains build_src and config_compiler commands plus few enhancements to build_ext, build_clib, etc commands scipy.base - useful modules from scipy_base scipy.testing - enhancements to unittest module, actually current scipy_test contains one useful module (testing.py) that could also go under scipy.base and so getting rid of scipy.testing scipy.weave - scipy.f2py - not sure yet how to incorporate f2py2e or weave sources here. As a first instance people are assumed to download them to Numeric3/scipy/ directory but in future their sources could be added to Numeric3 repository. For Numeric3 f2py and weave are optional. 
scipy.lib.lapack - wrappers to Atlas/Lapack libraries, by default f2c generated wrappers are used as in current Numeric. For backwards compatibility, there will be Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ and Lib/{LinearAlgebra,..}.py under Numeric3 that will use modules from scipy. Pearu From oliphant at ee.byu.edu Mon Mar 28 01:32:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 01:32:12 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050328020731.85506.qmail@web50202.mail.yahoo.com> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> Message-ID: <4247CEC9.1030903@ee.byu.edu> Scott, Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things. I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there). >However, other PyBufferProcs objects like array.array will never allow >themselves to be wrapped by a __builtins__.bytes since they realloc their >memory and violate the promises that the __builtins__.bytes object makes. >I think you disagree with me on this part, so more on that later in this >message. > > I think I agree with you: array.array shouldn't allow itself to be wrapped by a bytes object because it reallocates without tracking what it has shared. >Another consideration that might sway you is that the existing >N-Dimensional array packages could easily add attribute methods to >implement the interface, and they could do this without changing any part >of their implementation. The .data attribute when requested would call a >"get method" that returns a buffer. This allows user defined objects which >do not implement the PyBufferProcs protocol themselves, but which contain a >buffer inside of them to participate in the "ndarray protocol". Both >version_two and version_three do not allow this - the object being passed >must *be* a buffer. > > I am not at all against the ndarray protocol you describe. In fact, I'm quite a fan. I think we should start doing it, now. I was just wondering if adding attributes to the bytes object was useful in any case. Your arguments have persuaded me that it is not worth the trouble. Underscore names are a good idea. We already have __array__ which is a protocol for returning an array object; currently, Numeric3 already implements this protocol minus name differences. So, let's come up with names. I'm happy with __array__XXXXX type names as they dovetail nicely with the already established __array__ name which Numeric3 expects will return an actual array object. As I've already said, it would be easy to check for the more specialized attributes at object creation time to boot-strap an array from an arbitrary object. In addition to what you state, why not also have the protocol look at the object itself to expose the PyBufferProcs protocol if it doesn't expose a .__array__data method? >The reference count on the PyObject pointer is different than the number of >users using the memory. In Python you could have: > > Your examples explaining this are good, but I did realize this, that's why I stated that the check in arr.resize is overkill and will disallow situations that could actually work. 
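Scott's point is easy to see with nothing but the standard library (a quick sketch; the exact count printed can vary with context):

    import array, sys

    a = array.array('d', [1.0])
    b = a                        # a second reference, but nobody holds the raw pointer
    print sys.getrefcount(a)     # typically 3: a, b, and the temporary call argument
    # A resize check based only on the reference count would wrongly refuse
    # to let this array grow, even though no one is using its memory.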
Do you think the Numeric3 arrayobject should have a "memory pointer count" added to the PyArrayObject structure? >Please don't think I'm offering you resistance. I'm only trying to point >out some things that I think you might have overlooked. Lots of people >ignore my suggestions all the time. You'd be in good company if you did >too, and I wouldn't even hold a grudge against you. > > I very much appreciate the pointers. I had overlooked some things and I believe your suggestions are better. > class Numarray: > # > # lots of array implementing code > # > > # Down here at the end, add the "well-known" interface > # (I haven't embraced the @property decorator syntax yet.) > > def __get_shape(self): > return self._shape > __array_shape__ = property(__get_shape) > > def __get_data(self): > # Note that they use a different name internally > return self._buffer > __array_data__ = property(__get_data) > > def __get_itemtype(self): > # Perform an on the fly conversion from the class > # hierarchy type to the struct module typecode that > # closest matches > return self._type._to_typecode() > __array_itemtype__ = property(__get_itemtype) > > >Changing class Numarray to a PyBufferProcs supporting object would be >harder. > > I think they just did this, though... >The C version for Numeric3 arrays would be similar, and there is no wasted >space on a per instance basis in either case. > > Doing this in C would be extremely easy: a simple binding of a name to an already available function (and disallowing any set attribute). >The real advantage to the struct module typecodes comes in two forms. >First and most important is that it's already documented and in place - a >de facto standard. Second is that Python script code could use those >typecodes directly with the struct module to pull apart pieces of data. >The disadvantage is that a few new typecodes would be needed... > >I would even go as far as to recommend their '>' '<' prefix codes for >big-endian and little-endian for just this reason... > > Hmm.. an interesting idea. I don't know if I agree or not. >This would be wonderful. Third party libraries could produce data that is >sufficiently ndarray like without hassle, and users of that library could >promote it to a Numeric3 array with no headaches. > > >By the way, it looks like the "bytes" concept has been revisited recently. >There is a new PEP dated Aug 11, 2004: > > http://www.python.org/peps/pep-0332.html > > Thanks for the pointer. >Thanks for your attention and patience with me on this. I really >appreciate the work you are doing. I wish I could explain my understanding >of things more clearly. > > As I said before, you do a really good job of explaining. I'm pretty much on your side now :-) Let's go ahead and get some __array__XXXXX attribute names decided on. I'll put them in the Numeric3 code base (I could also put them in old Numeric and make a 24.0 release as well --- I need to do that because of a horrible bug in the new empty method: Numeric.empty(, 'O')). 
-Travis From oliphant at ee.byu.edu Mon Mar 28 01:39:02 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 01:39:02 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <4247D072.3090406@ee.byu.edu> > Plus I would add the scipy_distutils ability to build sources on the fly > (the build_src command). That's a very fundamental feature useful > whenever swig or f2py is used, or when building sources from templates > or dynamically during a build process. I'd like to use this feature in Numeric3 (which has code-generation). > > Btw, I have started the scipy_core cleanup. The plan is to create the > following package tree under Numeric3 source tree: This is great news. I'm thrilled to have Pearu's help in doing this. He understands a lot of these issues very well. I'm sure he will be open to suggestions. > > scipy.distutils - contains cpuinfo, exec_command, system_info, etc > scipy.distutils.fcompiler - contains Fortran compiler support > scipy.distutils.command - contains build_src and config_compiler > commands plus few enhancements to build_ext, build_clib, etc > commands > scipy.base - useful modules from scipy_base > scipy.testing - enhancements to unittest module, actually > current scipy_test contains one useful module (testing.py) that > could also go under scipy.base and so getting rid of scipy.testing > scipy.weave - > scipy.f2py - not sure yet how to incorporate f2py2e or weave sources > here. As a first instance people are assumed to download them to > Numeric3/scipy/ directory but in future their sources could be added > to Numeric3 repository. For Numeric3 f2py and weave are optional. > scipy.lib.lapack - wrappers to Atlas/Lapack libraries, by default > f2c generated wrappers are used as in current Numeric. > > For backwards compatibility, there will be > Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ > and > Lib/{LinearAlgebra,..}.py > under Numeric3 that will use modules from scipy. This looks like a good breakdown. Where will the ndarray object and the ufunc code go in this breakdown? In scipy.base? -Travis From konrad.hinsen at laposte.net Mon Mar 28 02:25:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:25:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4244963B.5010103@csun.edu> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> <4241C781.8080001@ee.byu.edu> <4244963B.5010103@csun.edu> Message-ID: <79bf1f715a3fa3de61ca8ebf45cd6c0f@laposte.net> On 25.03.2005, at 23:52, Stephen Walton wrote: > where str is a string is not allowed because strings are immutable. > But if I type "b=7" followed by "b=3", do I really care whether the 3 > gets stuck in the same memory location previously occupied by the 7 > (mutable) or the symbol b points to a new location containing a 3 > (immutable)? What are some circumstances where this might matter? > The most important one in practice is a = some_array[0] b = a a += 3 If the indexing operation returns a scalar (immutable), then "a" and "b" will have different values. If it returns a rank-0 (mutable), then "a" and "b" will be the same. This matters for code that is written with scalars in mind and which then gets fed rank-0 arrays as arguments. Konrad. 
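P.S. Spelled out as code (a sketch; it assumes "+=" on a rank-0 array modifies the array in place, which is exactly the behaviour under discussion):

    a = some_array[0]
    b = a
    a += 3
    # scalar case: "a" is rebound to a new immutable object, so "b" keeps the old value
    # rank-0 case: "a" and "b" are the same mutable object, so "b" sees the new value too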
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Mon Mar 28 02:30:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:30:07 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4245C512.1030701@csun.edu> References: <4245C512.1030701@csun.edu> Message-ID: <57e36543b3cad1b8680667aa61f5166c@laposte.net> On 26.03.2005, at 21:24, Stephen Walton wrote: > zeros() in Numeric3 defaults to typecode='d' while in numarray it > defaults to typecode=None, which in practice means 'i' by default. Is > this deliberate? Is this desirable? I'd vote for zeros(), ones() and > the like to default to 'i' or 'f' rather than 'd' in the interest of > space and speed. My main argument is a different one: consistency. I see zeros() as an array constructor, a shorthand for calling array() with an explicit list argument. From that point of view, zeros((n,)) should return the same value as array(n*[0]) i.e. an integer array. If people feel a need for a compact float-array generator, I'd rather have an additional function "fzeros()" than a modification of zeros(), whose behaviour in current Numeric and numarray is both consistent and well established. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Mon Mar 28 02:32:13 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:32:13 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: On 27.03.2005, at 07:27, Michiel Jan Laurens de Hoon wrote: > 3) Fortran support. Very useful, and I'd like to see them in Python's > distutils. Another option would be to put this in SciPy.fortran or > something similar. But since Python's distutils already has a > language= option for C++ and Objective-C, the cleanest way would be to > add this to Python's distutils and enable language="fortran". I agree in principle, but I wonder how stable the Fortran support in SciPy distutils is. If it contains compiler-specific data, then it might not be a good idea to restrict modifications and additions to new Python releases. Konrad. 
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From mdehoon at ims.u-tokyo.ac.jp Mon Mar 28 02:58:17 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 28 02:58:17 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <4247E431.5050006@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > Btw, I have started the scipy_core cleanup. The plan is to create the > following package tree under Numeric3 source tree: > ... > > For backwards compatibility, there will be > Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ > and > Lib/{LinearAlgebra,..}.py > under Numeric3 that will use modules from scipy. Just for clarification: Is this scipy_core or Numeric3 that you're working on? Or are they the same? --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From pearu at scipy.org Mon Mar 28 05:21:18 2005 From: pearu at scipy.org (Pearu Peterson) Date: Mon Mar 28 05:21:18 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4247E431.5050006@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> Message-ID: On Mon, 28 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Pearu Peterson wrote: >> Btw, I have started the scipy_core cleanup. The plan is to create the >> following package tree under Numeric3 source tree: >> ... >> >> For backwards compatibility, there will be >> Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ >> and >> Lib/{LinearAlgebra,..}.py >> under Numeric3 that will use modules from scipy. > > Just for clarification: Is this scipy_core or Numeric3 that you're working > on? Or are they the same? The idea was to merge tools from scipy_core (that basically contains scipy_distutils and scipy_base) to Numeric3. The features of scipy_distutils have been stated in previous messages, some of these features will be used to build Numeric3. scipy_base contains enhancements to Numeric (now to be a natural part of Numeric3) plus a few useful Python modules. Which scipy_core modules exactly should be included in Numeric3 or left out of it depends on how crucial they are for building/maintaining Numeric3 and whether they are useful in general for Numeric3 users. This is completely open for discussion. No part of scipy_core should be blindly copied to the Numeric3 project. 
Pearu From mdehoon at ims.u-tokyo.ac.jp Mon Mar 28 05:31:25 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 28 05:31:25 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> Message-ID: <424807D9.7090800@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > > The idea was to merge tools from scipy_core (that basically contains > scipy_distutils and scipy_base) to Numeric3. The features of > scipy_distutils have been stated in previous messages, some of these > features will be used to build Numeric3. scipy_base contains > enhancements to Numeric (now to be a natural part of Numeric3) plus a few > useful Python modules. Which scipy_core modules exactly should be > included in Numeric3 or left out of it depends on how crucial they are > for building/maintaining Numeric3 and whether they are useful in general > for Numeric3 users. This is completely open for discussion. No part of > scipy_core should be blindly copied to the Numeric3 project. > Sounds good to me. Thanks, Pearu. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From faltet at carabos.com Mon Mar 28 07:17:10 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Mar 28 07:17:10 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <4247CEC9.1030903@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID: <200503281713.33850.faltet@carabos.com> Hi Travis, Scott, I've been following your discussions and I'm very happy that Travis has finally decided to go with adopting the bytes object in Numeric3. It's also very important that from the discussions, you finally reached an almost complete agreement on how to support the __array__ protocol. I do think that this idea is both very simple and powerful. I do hope this would be a *major* step towards interchanging data between different applications and packages and, perhaps, this would render the final goal of including a specific ndarray object in the Python standard library almost moot: it simply should not be necessary at all! On Monday 28 March 2005 11:30, Travis Oliphant wrote: [snip] > As I've already said, it would be easy to check for the more specialized > attributes at object creation time to boot-strap an array from an > arbitrary object. [snip] > Let's go ahead and get some __array__XXXXX attribute names decided on. > I'll put them in the Numeric3 code base (I could also put them in old > Numeric and make a 24.0 release as well --- I need to do that because of > a horrible bug in the new empty method: Numeric.empty(, 'O')). Very nice! From what you stated above I deduce that you will be including a case in the Numeric.array constructor so that it can create a properly defined array if the sequence that is passed to it fulfils the __array__ protocol. 
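Something along these lines, I imagine (a sketch only: the attribute names below are placeholders until the __array__XXXXX names are settled, and the commented-out constructor call is what the new case in Numeric.array would make possible, not something that works today):

    import Numeric

    class Wrapped:
        # Hypothetical third-party object exposing the proposed protocol.
        def __init__(self):
            self.__array_data__ = Numeric.arange(6).tostring()   # any buffer-like object
            self.__array_shape__ = (2, 3)
            self.__array_itemtype__ = 'l'

    # What the new constructor case would allow:
    #     a = Numeric.array(Wrapped())
    # and which, for this object, would amount to something like:
    w = Wrapped()
    a = Numeric.reshape(Numeric.fromstring(w.__array_data__,
                                           typecode=w.__array_itemtype__),
                        w.__array_shape__)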
In addition, if the numarray people are willing to do the same thing, I envision a very easy (and very efficient) way to convert from Numeric to numarray and vice versa (until Numeric3 is ready for production), something like:

NumericArray = Numeric.array(numarrayArray)
numarrayArray = numarray.array(NumericArray)

Internally, one should decide which is the optimal way to convert from one object to the other. Based on suggestions from Todd Miller on how to do this as efficiently as possible, I have arrived at the conclusion that the following conversions are the most efficient ones:

In [69]:na = numarray.arange(100*1000,shape=(100,1000))
In [70]:num = Numeric.arange(100*1000);num=num.resize((100,1000))

In [72]:t1=time();num2=Numeric.fromstring(na._data, typecode=na.typecode());num2=num2.resize(na.shape);time()-t1
Out[72]:0.0017759799957275391
In [73]:t1=time();na2=numarray.fromstring(num.tostring(),type=num.typecode(),shape=num.shape);time()-t1
Out[73]:0.0039050579071044922

Both ways, although very efficient, still copy the data area in the conversion process. In the future, when Numeric3 supports the bytes object, there will be no memory copy at all when interchanging data with another package (e.g. numarray). Until then, the __array__ protocol may contribute to sharing data (well, at least contiguous data) efficiently between applications right now.

A big thanks to Scott for suggesting and wholeheartedly defending the bytes object, and to Travis for becoming a convert. We, the developers of extensions, will be grateful forever :-)

Cheers,

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From rkern at ucsd.edu Mon Mar 28 07:44:18 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Mar 28 07:44:18 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4245C512.1030701@csun.edu> References: <4245C512.1030701@csun.edu> Message-ID: <4248262D.5060407@ucsd.edu>

Stephen Walton wrote:
> zeros() in Numeric3 defaults to typecode='d' while in numarray it defaults to typecode=None, which in practice means 'i' by default. Is this deliberate? Is this desirable? I'd vote for zeros(), ones() and the like to default to 'i' or 'f' rather than 'd' in the interest of space and speed.

For zeros() and ones(), I don't think space and speed are going to be affected by the default typecode. In my use of these functions, I almost always need a specific typecode. If I use the default, it's because I actually need the default typecode. Unfortunately, I almost always want Float and not Int, so all of my code is littered with

zeros(shape, Float)

I'll bet Travis's code looks the same.

I would *love* to be able to spell these things like

Float.zeros(shape)
UInt8.ones(shape)
Complex32.array(other)
...

Then we could leave the zeros() and ones() defaults as they are for backwards compatibility, and deprecate the functions.

--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From perry at stsci.edu Mon Mar 28 08:18:16 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Mar 28 08:18:16 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4248262D.5060407@ucsd.edu> References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> Message-ID: <09347872476f2c45aaa5d80d2c856088@stsci.edu>

On Mar 28, 2005, at 10:43 AM, Robert Kern wrote:
> Stephen Walton wrote:
>> zeros() in Numeric3 defaults to typecode='d' while in numarray it defaults to typecode=None, which in practice means 'i' by default. Is this deliberate? Is this desirable? I'd vote for zeros(), ones() and the like to default to 'i' or 'f' rather than 'd' in the interest of space and speed.
>
> For zeros() and ones(), I don't think space and speed are going to be affected by the default typecode. In my use of these functions, I almost always need a specific typecode. If I use the default, it's because I actually need the default typecode. Unfortunately, I almost always want Float and not Int, so all of my code is littered with
>
> zeros(shape, Float)
>
> I'll bet Travis's code looks the same.
>
> I would *love* to be able to spell these things like
>
> Float.zeros(shape)
> UInt8.ones(shape)
> Complex32.array(other)
> ...

This is an odd thought, but why not:

Float(shape) # defaults to 0
UInt(shape, value=1)

I forget if it was proposed to make the type object a constructor for arrays, in which case this may conflict with the usage of converting the argument of the callable form to an array, i.e.,

Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name of the type parameter becomes

From faltet at carabos.com Mon Mar 28 08:52:18 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Mar 28 08:52:18 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <200503281713.33850.faltet@carabos.com> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <200503281713.33850.faltet@carabos.com> Message-ID: <200503281847.15157.faltet@carabos.com>

On Monday 28 March 2005 17:13, Francesc Altet wrote:
[snip]
> Based on suggestions from Todd Miller on how to do this as efficiently as possible, I have arrived at the conclusion that the following conversions are the most efficient ones:
>
> In [69]:na = numarray.arange(100*1000,shape=(100,1000))
> In [70]:num = Numeric.arange(100*1000);num=num.resize((100,1000))
>
> In [72]:t1=time();num2=Numeric.fromstring(na._data, typecode=na.typecode());num2=num2.resize(na.shape);time()-t1
> Out[72]:0.0017759799957275391
> In [73]:t1=time();na2=numarray.fromstring(num.tostring(),type=num.typecode(),shape=num.shape);time()-t1
> Out[73]:0.0039050579071044922

Er, sorry, there is in fact a more efficient way to convert from a Numeric object to a numarray object that doesn't require any data copy at all. This is:

In [212]:num=Numeric.arange(100*1000, typecode="i");num=num.resize((100,1000))
In [213]:num[0,:5]
Out[213]:array([0, 1, 2, 3, 4],'i')
In [214]:t1=time();na2=numarray.array(numarray.memory.writeable_buffer(num),type=num.typecode(),shape=num.shape);time()-t1
Out[214]:0.0001010894775390625  # takes just 100 us!
In [215]:na2[0,4] = 1  # modify a cell
In [216]:num[0,:5]
Out[216]:array([0, 1, 2, 3, 1],'i')
In [217]:na2[0,:5]
Out[217]:array([0, 1, 2, 3, 1])  # na2 has been modified as well, so the data area is shared between num and na2

In fact, its speed is independent of the array size (as it should be for a non-data-copying procedure):

# Create a Numeric object 10x larger
In [218]:num=Numeric.arange(1000*1000, typecode="i");num=num.resize((1000,1000))
In [219]:t1=time();na2=numarray.array(numarray.memory.writeable_buffer(num),type=num.typecode(),shape=num.shape);time()-t1
Out[219]:0.00010204315185546875  # 100 us again!

This is because numarray has chosen to use a buffer object internally, and the Numeric object can be wrapped by a buffer object without any actual data copy. That drives me to think that, if the bytes object (that seems to be planned for Numeric3) could wrap the buffer object where numarray objects hold their data, the conversion between Numeric3 <--> numarray (or, in general, between packages that deal with bytes objects and packages that deal with buffer objects) could be done with a cost of 1 (that is, independent of the data size). If this cannot be done (I mean, getting a safe bytes object from a buffer object and vice versa), well, it would be a pity. Do you think that would be possible at all?

Cheers,

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From faltet at carabos.com Mon Mar 28 09:03:47 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Mar 28 09:03:47 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <09347872476f2c45aaa5d80d2c856088@stsci.edu> References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> Message-ID: <200503281902.42770.faltet@carabos.com>

On Monday 28 March 2005 18:18, Perry Greenfield wrote:
> This is an odd thought, but why not:
>
> Float(shape) # defaults to 0
> UInt(shape, value=1)
>
> I forget if it was proposed to make the type object a constructor for arrays, in which case this may conflict with the usage of converting the argument of the callable form to an array, i.e.,
>
> Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name of the type parameter becomes

Well, why not:

Array(shape, type=Float, defvalue=None)

In the end, all three parameters are used to uniquely determine the Array object. Moreover, "defvalue = None" would be a synonym for the recently introduced "empty" factory. However, this looks suspiciously similar to the "array" factory. Perhaps it would be nice to add this "defvalue" or "value" parameter to the "array" factory, and that's all.

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From rkern at ucsd.edu Mon Mar 28 09:38:22 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Mar 28 09:38:22 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <200503281902.42770.faltet@carabos.com> References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com> Message-ID: <42483F5C.7050002@ucsd.edu>

Francesc Altet wrote:
> On Monday 28 March 2005 18:18, Perry Greenfield wrote:
>
>> This is an odd thought, but why not:
>>
>> Float(shape) # defaults to 0
>> UInt(shape, value=1)
>>
>> I forget if it was proposed to make the type object a constructor for arrays, in which case this may conflict with the usage of converting the argument of the callable form to an array, i.e.,
>>
>> Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name of the type parameter becomes
>
> Well, why not:
>
> Array(shape, type=Float, defvalue=None)
>
> In the end, all three parameters are used to uniquely determine the Array object. Moreover, "defvalue = None" would be a synonym for the recently introduced "empty" factory.

My thought was to not deal with typecode keywords at all and put more responsibility on the typecode objects for general constructor-type operations. In this vein, though, I suggest the spelling

Float.new(shape, value=None) # empty
Float.new(shape, value=0)    # zeros
Float.new(shape, value=1)    # ones

value defaults to None.

--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter

From stephen.walton at csun.edu Mon Mar 28 09:39:29 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Mar 28 09:39:29 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <57e36543b3cad1b8680667aa61f5166c@laposte.net> References: <4245C512.1030701@csun.edu> <57e36543b3cad1b8680667aa61f5166c@laposte.net> Message-ID: <424840F1.4000500@csun.edu>

konrad.hinsen at laposte.net wrote:
> My main argument is a different one: consistency.
>
> I see zeros() as an array constructor, a shorthand for calling array() with an explicit list argument.

Ah, but array(10*[0]) returns an integer array, and array(10*[0.]) returns a double. Which should zeros() be equivalent to?

From xscottg at yahoo.com Mon Mar 28 10:30:22 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Mar 28 10:30:22 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <4247CEC9.1030903@ee.byu.edu> Message-ID: <20050328182929.50411.qmail@web50205.mail.yahoo.com>

--- Travis Oliphant wrote:
>
> Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things.
>
> I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there).
>

Very cool! You just made my day.

I wish I had time to do a good writeup, but I need to catch a flight in a couple hours, and I won't be back behind my computer until Wednesday night. Here is an initial stab:

__array_shape__
    Required, a sequence (typically tuple) of non-negative int/longs

__array_storage__
    Required, a buffer or possibly sequence object (list)

    (Required unless the object supports PyBufferProcs directly? I don't have a strong opinion on that one...)

    A slightly different name to indicate it could be a buffer or sequence object (like a list). Typically buffer.

__array_itemtype__
    Suggested, but Optional if __array_itemsize__ is present.

    This attribute probably warrants some discussion...

    A struct module format string or one of the additional ones that need to be added. Need to discuss "long double" and "Object".
    (Capital 'O' for Object, Capital 'D' for long double, Capital 'X' for bit?)

    If not present or the empty string '', indicates that the array elements can only be treated as blobs and the real data representation must be gotten by some other means.

    I think doubling the typecode as a convention to denote complex numbers makes some sense (for instance 'ff' is complex float).

    The struct module convention for denoting native, portable big endian, and portable little endian is concise and documented.

__array_itemsize__
    Optional if __array_itemtype__ is present and the value can be calculated from struct.calcsize(__array_itemtype__)

__array_strides__
    Optional if the array data is in a contiguous C layout. Required otherwise. Same length as __array_shape__. Indicates how much to multiply subscripts by to get to the desired position in the storage.

    A sequence (typically tuple) of ints/longs. These are byte offsets (not element_size offsets) for most arrays. Special exceptions are made for:

        Tightly packed (8 bits to a byte) bitmask arrays, where the offsets are bit indexes

        PyObject arrays (lists), where the offsets are indexes

    They should be byte offsets to handle non-aligned data or data with odd packing.

    Fortran arrays might be common enough to warrant special casing. We could discuss whether a __array_fortran__ attribute indicates that the array is in contiguous Fortran layout.

__array_offset__
    Optional and defaults to zero. An int/long indicating the offset to treat as the zeroth element

__array_complicated__
    Optional and defaults to zero/false. This is a kluge to indicate that while, yes, the data is an array, the storage layout can not be easily described by the shape/strides/offset combination alone.

    This could warrant some discussion.

__array_fortran__
    Optional and defaults to zero/false. If you want to represent Fortran arrays without creating strides for them, this would be necessary. I'd vote to leave it out and stick with strides...

These are all just suggestions. Is something important missing? Predicates like iscontiguous(a) and isfortran(a) can all be easily determined from the above. The ndims or rank is simply len(a.__array_shape__).

I wish I had more time to respond to some of the other things in your message, but I'm gone until Wednesday night...

Cheers,
    -Scott

From oliphant at ee.byu.edu Mon Mar 28 12:05:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 12:05:08 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> Message-ID: <42486323.9080801@ee.byu.edu>

>> Just for clarification: Is this scipy_core or Numeric3 that you're working on? Or are they the same?
>
> The idea was to merge tools from scipy_core (which basically contains scipy_distutils and scipy_base) into Numeric3. The features of scipy_distutils have been stated in previous messages; some of these features will be used to build Numeric3. scipy_base contains enhancements to Numeric (now to be a natural part of Numeric3) plus a few useful Python modules. Which scipy_core modules exactly should be included in Numeric3 or left out of it depends on how crucial they are for building/maintaining Numeric3 and whether they are useful in general for Numeric3 users. This is completely open for discussion. No part of scipy_core should be blindly copied to the Numeric3 project.

My understanding is that scipy_core and Numeric3 are the same thing. I'm using the terminology Numeric3 in emails to avoid confusion, but I would rather see one package emerge from this, like scipy_core. I would prefer not to have a "Numeric3" package and a separate "scipy_core" package, unless there is a good reason to have two packages.

-Travis

From perry at stsci.edu Mon Mar 28 13:54:10 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Mar 28 13:54:10 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <4247CEC9.1030903@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID:

On Mar 28, 2005, at 4:30 AM, Travis Oliphant wrote:
> Scott,
>
> Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things.
> I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there).
>

Just to add my two cents, I don't think I ever thought it was necessary to bundle the metadata with the memory object, for the reasons Scott outlined. It isn't needed functionally, and there are cases where the same memory may be used in different contexts (as is done with our record arrays).

Numarray, when it uses the buffer object, always gets a fresh pointer for the buffer object for every data access. But Scott is right that that pointer is good so long as there isn't a chance for something else to change it. In practice, I don't think that ever happens with the buffers that numarray happens to use, but it's still a flaw of the current buffer object that there is no way to ensure it won't change.

I'm not sure how the support for large data sets should be handled. I generally think that it will be very awkward to handle these until Python does as well. Speaking of which...

I had been in occasional contact with Martin von Loewis about his work to update Python to handle 64-bit addressing. We weren't planning to handle this in numarray (nor Numeric3, right Travis, or do I have that wrong?) until Python did. A few months ago Martin said he was mostly done. I had a chance to talk to him at Pycon about where that work stood. Unfortunately, it is not turning out to be as easy as he hoped. This is too bad. I have a feeling that this work is going to stall without help on our (numpy community) part to help make the changes or drum beating to make it a higher priority. At the moment the Numeric3 effort should be the most important focus, but I think that after that, this should become a high priority.

Perry

From perry at stsci.edu Mon Mar 28 14:04:24 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Mar 28 14:04:24 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <4246732D.2080908@ims.u-tokyo.ac.jp> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> <4246732D.2080908@ims.u-tokyo.ac.jp> Message-ID: <5d690db672d4a406a82e3b8b6c0da541@stsci.edu>

On Mar 27, 2005, at 3:47 AM, Michiel Jan Laurens de Hoon wrote:
> As far as I can tell from their setup.py, neither Numerical Python nor numarray currently does code generation on the fly from setup.py.
> (This was one of the reasons that I started to worry if Numeric3 is more than Numerical Python + numarray).

Numarray definitely does code generation (and so did Numeric originally; eventually the generated code was hand-edited). Code generation is the way to go (with C anyway).

Perry

From rkern at ucsd.edu Mon Mar 28 14:18:09 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Mar 28 14:18:09 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <42488268.3030505@ucsd.edu>

konrad.hinsen at laposte.net wrote:
> On 27.03.2005, at 07:27, Michiel Jan Laurens de Hoon wrote:
>> 3) Fortran support. Very useful, and I'd like to see it in Python's distutils. Another option would be to put this in SciPy.fortran or something similar. But since Python's distutils already has a language= option for C++ and Objective-C, the cleanest way would be to add this to Python's distutils and enable language="fortran".
>
> I agree in principle, but I wonder how stable the Fortran support in SciPy distutils is. If it contains compiler-specific data, then it might not be a good idea to restrict modifications and additions to new Python releases.

Case in point: Pearu just added g95 support last week.

--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter

From pearu at scipy.org Mon Mar 28 14:47:13 2005 From: pearu at scipy.org (Pearu Peterson) Date: Mon Mar 28 14:47:13 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42486323.9080801@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu> Message-ID:

On Mon, 28 Mar 2005, Travis Oliphant wrote:
>>> Just for clarification: Is this scipy_core or Numeric3 that you're working on? Or are they the same?
>>
>> The idea was to merge tools from scipy_core (which basically contains scipy_distutils and scipy_base) into Numeric3. The features of scipy_distutils have been stated in previous messages; some of these features will be used to build Numeric3. scipy_base contains enhancements to Numeric (now to be a natural part of Numeric3) plus a few useful Python modules. Which scipy_core modules exactly should be included in Numeric3 or left out of it depends on how crucial they are for building/maintaining Numeric3 and whether they are useful in general for Numeric3 users. This is completely open for discussion. No part of scipy_core should be blindly copied to the Numeric3 project.
>
> My understanding is that scipy_core and Numeric3 are the same thing. I'm using the terminology Numeric3 in emails to avoid confusion, but I would rather see one package emerge from this, like scipy_core. I would prefer not to have a "Numeric3" package and a separate "scipy_core" package, unless there is a good reason to have two packages.

In that case the ndarray object and ufunc code should go under scipy.base. We can postpone this move until scipy.distutils is ready.
And if I understand you correctly, then

from scipy.base import *

will replace

from Numeric import *

or

from numarray import *

roughly speaking.

Pearu

From oliphant at ee.byu.edu Mon Mar 28 15:16:20 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 15:16:20 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu> Message-ID: <42489027.8030903@ee.byu.edu>

Pearu Peterson wrote:
>
> from scipy.base import *
>
> will replace
>
> from Numeric import *
>
> or
>
> from numarray import *
>
> roughly speaking.
>
> Pearu

This is exactly what I would like to see. We will need, however, to provide that import Numeric and friends still work for backward compatibility, but they should be deprecated.

Best,

-Travis

From oliphant at ee.byu.edu Mon Mar 28 15:26:32 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 15:26:32 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID: <42489275.7060600@ee.byu.edu>

>
> Just to add my two cents, I don't think I ever thought it was necessary to bundle the metadata with the memory object, for the reasons Scott outlined. It isn't needed functionally, and there are cases where the same memory may be used in different contexts (as is done with our record arrays).

I'm glad we've worked that one out.

>
> Numarray, when it uses the buffer object, always gets a fresh pointer for the buffer object for every data access. But Scott is right that that pointer is good so long as there isn't a chance for something else to change it. In practice, I don't think that ever happens with the buffers that numarray happens to use, but it's still a flaw of the current buffer object that there is no way to ensure it won't change.

One could see it as a "flaw" in the buffer object, but I prefer to see it as a problem with objects that use the PyBufferProcs protocol. It is, at worst, a "limitation" of the buffer interface that should be advertised (in my mind the problem lies with the objects that make use of the buffer protocol and also reallocate memory willy-nilly, since Python does not allow for this). To me, an analogous situation occurs when an extension module writes into memory it does not own and causes a seg-fault. I suppose a casual observer could say this is a Python flaw, but clearly the problem is with the extension object.

It certainly does not mean at all that something like a buffer object should never exist or that the buffer protocol should not be used. I get the feeling sometimes that some naive (to Numeric and numarray) people on python-dev feel that way.

>
> I'm not sure how the support for large data sets should be handled. I generally think that it will be very awkward to handle these until Python does as well. Speaking of which...
>
> I had been in occasional contact with Martin von Loewis about his work to update Python to handle 64-bit addressing. We weren't planning to handle this in numarray (nor Numeric3, right Travis, or do I have that wrong?) until Python did. A few months ago Martin said he was mostly done. I had a chance to talk to him at Pycon about where that work stood.
> Unfortunately, it is not turning out to be as easy as he hoped. This is too bad. I have a feeling that this work is going to stall without help on our (numpy community) part to help make the changes or drum beating to make it a higher priority. At the moment the Numeric3 effort should be the most important focus, but I think that after that, this should become a high priority.
>

I would be interested to hear what the problems are. Why can't you just change the protocol, replacing all ints with Py_intptr_t? Is backward compatibility the problem? This seems like it's on the extension code level (and then only on 64-bit systems), and so it would be easier to force through the change in Python 2.5.

Numeric3 will suffer limitations whenever the sequence protocol is used. We can work around it as much as possible (by not using the sequence protocol whenever possible), but the limitation lies firmly in the Python sequence protocol.

-Travis

From stephen.walton at csun.edu Mon Mar 28 15:40:17 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Mar 28 15:40:17 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <42483F5C.7050002@ucsd.edu> References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com> <42483F5C.7050002@ucsd.edu> Message-ID: <424895BC.7030504@csun.edu>

Robert Kern wrote:
> Float.new(shape, value=None) # empty
> Float.new(shape, value=0)    # zeros
> Float.new(shape, value=1)    # ones

Uhm, my first reaction to this kind of thing is "ugh," but maybe I'm just not thinking in the correct OO mode. Is this any better than zeros() and ones()? For that matter, is it any better than

x=zeros(shape)
x=any_old_scalar

Having said that, the main reason I use zeros() in MATLAB is to preallocate space. MATLAB can dynamically grow arrays, so the following is legal:

x=[];
for k=1:100
    x(:,k)=a_vector_of_100_values;

and produces a 100 by 100 array. While legal, it is much faster to preallocate x by changing the first line to "x=zeros(100,100);". Since NumPy arrays can't grow dynamically, perhaps this is a small issue.

From oliphant at ee.byu.edu Mon Mar 28 16:00:14 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 16:00:14 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com> References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> Message-ID: <42489A65.2030201@ee.byu.edu>

> I wish I had time to do a good writeup, but I need to catch a flight in a couple hours, and I won't be back behind my computer until Wednesday night. Here is an initial stab:
>
> __array_shape__
>     Required, a sequence (typically tuple) of non-negative int/longs
>

Great. I agree.

> __array_storage__
>     Required, a buffer or possibly sequence object (list)
>
>     (Required unless the object supports PyBufferProcs directly? I don't have a strong opinion on that one...)
>
>     A slightly different name to indicate it could be a buffer or sequence object (like a list). Typically buffer.
>

I prefer __array_data__ (it's a common name for Numeric and numarray; it can be interpreted as a sequence object if desired).

> __array_itemtype__
>     Suggested, but Optional if __array_itemsize__ is present.
>

I say this one defaults to "V" for void * if not present, and __array_itemsize__ is necessary if it is "S" (string), "U" (unicode), or "V". I also like __array_typestr__ or __array_typechar__ better as a name.
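[For reference, the struct module conventions mentioned in the quote below already encode both itemsize and byte order, so either could be derived from the format string; a quick interactive check:]

>>> import struct
>>> struct.calcsize('d')                          # itemsize of a C double
8
>>> struct.calcsize('>i'), struct.calcsize('<i')  # standard big/little endian int
(4, 4)
>>> struct.pack('>h', 1)                          # big-endian bytes, independent of platform
'\x00\x01'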
> A struct module format string or one of the additional ones that need to be added. Need to discuss "long double" and "Object". (Capital 'O' for Object, Capital 'D' for long double, Capital 'X' for bit?)
>

Don't like 'D' for long double. Complex floats are already using it. I'm not sure I like the idea of moving to two-character typecodes at this point, because it indicates more internal changes to Numeric3 (otherwise we have two typecharacter standards, which is not a good thing). What is wrong with 'g' and 'G' for long double and complex long double, respectively?

> If not present or the empty string '', indicates that the array elements can only be treated as blobs and the real data representation must be gotten by some other means.
>

Again, a void * type handles this well.

> The struct module convention for denoting native, portable big endian, and portable little endian is concise and documented.
>

So, you think we should put the byte-order in the typecharacter interface. Don't know.... could be persuaded.

> __array_itemsize__
>     Optional if __array_itemtype__ is present and the value can be calculated from struct.calcsize(__array_itemtype__)
>

I think it is only optional if the typechar is not 'S', 'U', or 'V'.

> __array_strides__
>     Optional if the array data is in a contiguous C layout. Required otherwise. Same length as __array_shape__. Indicates how much to multiply subscripts by to get to the desired position in the storage.
>
>     A sequence (typically tuple) of ints/longs. These are byte offsets (not element_size offsets) for most arrays. Special exceptions are made for:
>
>         Tightly packed (8 bits to a byte) bitmask arrays, where the offsets are bit indexes
>
>         PyObject arrays (lists), where the offsets are indexes
>
>     They should be byte offsets to handle non-aligned data or data with odd packing.
>
>     Fortran arrays might be common enough to warrant special casing. We could discuss whether a __array_fortran__ attribute indicates that the array is in contiguous Fortran layout.
>

I don't think it is necessary in the interface.

> __array_offset__
>     Optional and defaults to zero. An int/long indicating the offset to treat as the zeroth element
>
> __array_complicated__
>     Optional and defaults to zero/false. This is a kluge to indicate that while, yes, the data is an array, the storage layout can not be easily described by the shape/strides/offset combination alone.
>
>     This could warrant some discussion.
>

I guess I don't see the utility here. If it can't be described by a shape/strides combination, then how can it participate in the protocol?

> __array_fortran__
>     Optional and defaults to zero/false. If you want to represent Fortran arrays without creating strides for them, this would be necessary. I'd vote to leave it out and stick with strides...
>

Me too. We should make the interface as minimal as possible, initially.

My proposal:

__array_data__ (optional object that exposes the PyBuffer protocol or a sequence object; if not present, the object itself is used)
__array_shape__ (required tuple of int/longs that gives the shape of the array)
__array_strides__ (optional; provides how to step through the memory in bytes (or bits if a bit-array), default is C-contiguous)
__array_typestr__ (optional struct-like string showing the type --- optional endianness indicator + Numeric3 typechars, default is 'V')
__array_itemsize__ (required if the above is 'S', 'U', or 'V')
__array_offset__ (optional offset to the start of the buffer, defaults to 0)

So, you could define an array interface with only two additional attributes if your object exposed the buffer or sequence protocol.

We should figure out a way to work around the 32-bit limitations of the sequence and buffer protocols as well.

-Travis

From oliphant at ee.byu.edu Mon Mar 28 16:07:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 16:07:08 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com> References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> Message-ID: <42489BF7.4040401@ee.byu.edu>

Scott Gilbert wrote:
> __array_itemtype__
>     Suggested, but Optional if __array_itemsize__ is present.
>
>     This attribute probably warrants some discussion...
>
>     A struct module format string or one of the additional ones that need to be added. Need to discuss "long double" and "Object". (Capital 'O' for Object, Capital 'D' for long double, Capital 'X' for bit?)
>
>     If not present or the empty string '', indicates that the array elements can only be treated as blobs and the real data representation must be gotten by some other means.
>
>     I think doubling the typecode as a convention to denote complex numbers makes some sense (for instance 'ff' is complex float).
>
>     The struct module convention for denoting native, portable big endian, and portable little endian is concise and documented.
>

After more thought, I think here we need to also allow the "c-type"-independent way of describing an array (i.e. the numarray-introduced 'c4' for a complex-valued 4-byte-itemsize array). So, perhaps __array_ctypestr__ and __array_typestr__ should be two ways to get the information (or overload the __array_typestr__ interface and require consumers to accept either style).

-Travis

From mdehoon at ims.u-tokyo.ac.jp Mon Mar 28 18:18:44 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 28 18:18:44 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42486323.9080801@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu> Message-ID: <4248BBC3.9080102@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> My understanding is that scipy_core and Numeric3 are the same thing. I'm using the terminology Numeric3 in emails to avoid confusion, but I would rather see one package emerge from this, like scipy_core. I would prefer not to have a "Numeric3" package and a separate "scipy_core" package, unless there is a good reason to have two packages.
>

Right now, I think it's probably better to call it scipy_core instead of Numeric3, since we'll be doing

>>> from scipy.base import *

instead of

>>> from Numeric import *

--Michiel.
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From rkern at ucsd.edu Mon Mar 28 23:37:16 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Mar 28 23:37:16 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <424895BC.7030504@csun.edu> References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com> <42483F5C.7050002@ucsd.edu> <424895BC.7030504@csun.edu> Message-ID: <42490437.6050002@ucsd.edu>

Stephen Walton wrote:
> Robert Kern wrote:
>
>> Float.new(shape, value=None) # empty
>> Float.new(shape, value=0)    # zeros
>> Float.new(shape, value=1)    # ones
>
> Uhm, my first reaction to this kind of thing is "ugh," but maybe I'm just not thinking in the correct OO mode. Is this any better than zeros() and ones()? For that matter, is it any better than
>
> x=zeros(shape)
> x=any_old_scalar

x[:] = any_old_scalar

you mean? Perhaps not, *if* I need the default type, which I rarely do. And when I do need the default type, and I'm coding carefully, I will add the type anyway to be explicit. I *do* think that

x = CFloat.new(shape, 2j*pi)

is better than

x = empty(shape, type=CFloat)
x[:] = 2j*pi

I don't think there's much OO in it. The implementations won't change, really. It's more a matter of the aesthetics of the API. I like it for much the same reasons that transpose(array) et al. were folded into methods of arrays. Also, with Perry's and Francesc's suggestions, it collapses three very similar functions into one.

> Having said that, the main reason I use zeros() in MATLAB is to preallocate space.

I use it the same way in Python. Sometimes, I'm going to be replacing all of the values (in which case I would use empty()), but often I only need to sparsely replace values. Usually, the "background" value ought to be 0, but occasionally, things get weirder. But, this isn't a particularly important issue.

--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter

From rkern at ucsd.edu Tue Mar 29 03:46:19 2005 From: rkern at ucsd.edu (Robert Kern) Date: Tue Mar 29 03:46:19 2005 Subject: [Numpy-discussion] searching a list of arrays In-Reply-To: <200503271820.51307.dd55@cornell.edu> References: <200503271820.51307.dd55@cornell.edu> Message-ID: <42493FBD.1060002@ucsd.edu>

Darren Dale wrote:
> Hi,
>
> I have a list of numeric-23.8 arrays:
>
> a = [array([0,1]),
>      array([0,1]),
>      array([1,0]),
>      array([1,0])]
>
> b = [array([0,1,0]),
>      array([0,1,0]),
>      array([1,0,0]),
>      array([1,0,0])]
>
> and I want to make a new list out of b:
>
> c = [array([0,1,2]),
>      array([1,0,2])]
>
> where the last index in each array is the result of
>
> b.count([0,1,0]) # or [1,0,0]
>
> The problem is that the result of b.count(array([1,0,0])) is 4, not 2, and b.remove(array([1,0,0])) indiscriminately removes arrays from the list. a.count and a.remove work the way I expected.

This is a result of rich comparisons. (array1 == array2) yields an array, not a boolean.
In [1]:a = [array([0,1]),
   ...:     array([0,1]),
   ...:     array([1,0]),
   ...:     array([1,0])]

In [2]:b = [array([0,1,0]),
   ...:     array([0,1,0]),
   ...:     array([1,0,0]),
   ...:     array([1,0,0])]

In [3]:b.count(array([0,1,0]))
Out[3]:4

In [4]:[x == array([0,1,0]) for x in b]
Out[4]:
[array([1, 1, 1],'b'),
 array([1, 1, 1],'b'),
 array([0, 0, 1],'b'),
 array([0, 0, 1],'b')]

To replace b.count(), you can do

In [12]:sum(alltrue(equal(b, array([0,1,0])), axis=-1))
Out[12]:2

To replace b.remove(), you can do

In [14]:[x for x in b if not alltrue(x == array([0,1,0]))]
Out[14]:[array([1, 0, 0]), array([1, 0, 0])]

--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter

From faltet at carabos.com Tue Mar 29 05:24:24 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Mar 29 05:24:24 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID: <200503291523.18309.faltet@carabos.com>

On Monday 28 March 2005 23:54, Perry Greenfield wrote:
> Numarray, when it uses the buffer object, always gets a fresh pointer for the buffer object for every data access. But Scott is right that that pointer is good so long as there isn't a chance for something else to change it. In practice, I don't think that ever happens with the buffers that numarray happens to use, but it's still a flaw of the current buffer object that there is no way to ensure it won't change.

However, having to update the pointer for the buffer object on every data access does impact performance quite a lot. This issue was brought up on this list some months ago (see [1]). I, for one, have given up calling NA_updateDataPtr() during table reads in PyTables, and this sped up the reading process by 70%, which is no joke. And this speed-up could theoretically be achieved in every piece of code that reads like:

for i in range(n):
    a = numarrayobject[i]

that is, whenever a single array element is accessed.

If the bytes object suggested by Scott makes the call to NA_updateDataPtr() unnecessary, then this is an added advantage of bytes over buffer.

[1] http://sourceforge.net/mailarchive/message.php?msg_id=8848962

Cheers,

--
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From magnus at hetland.org Tue Mar 29 07:21:39 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Mar 29 07:21:39 2005 Subject: [Numpy-discussion] Linear programming Message-ID: <20050329151958.GA28688@idi.ntnu.no>

Is there some standard Python (i.e., numarray/Numeric) mapping for some linear programming package out there? Might be rather useful...

-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]

From perry at stsci.edu Tue Mar 29 07:46:47 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Mar 29 07:46:47 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <42489275.7060600@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> Message-ID: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>

On Mar 28, 2005, at 6:25 PM, Travis Oliphant wrote:
>
> One could see it as a "flaw" in the buffer object, but I prefer to see it as a problem with objects that use the PyBufferProcs protocol.
A compatibility API and a new one that supports extended indices?) It would be nice if there were some way of handling that gracefully without requiring all extensions to have to change to match this. I imagine that this is going to be the biggest objection to making any changes unless the old API is supported for a while. Perhaps someone has thought this all out already. I haven't thought about it at all. Perry From perry at stsci.edu Tue Mar 29 07:53:23 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Mar 29 07:53:23 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <42489A65.2030201@ee.byu.edu> References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> Message-ID: <7805a039fbad32679dcc101cde7b9be8@stsci.edu> On Mar 28, 2005, at 6:59 PM, Travis Oliphant wrote: >> The struct module convention for denoting native, portable >> big endian, and portable little endian is concise and >> documented. >> > So, you think we should put the byte-order in the typecharacter > interface. Don't know.... could be persuaded. > I think we need to think about what the typecharacter is supposed to represent. Is it the value as the user will see it or to indicate what the internal representation is? These are two different things. Then again, I'm not sure how this info is exposed to the user; if it is appropriately handled by intermediate code it may not matter. For example, if this corresponds to what the user will see for the type, I think it is bad. Most of the time they don't care what the internal representation is, they just want to know if it is Int16 or whatever; with the two combined, they have to test for both variants. Perry From rkern at ucsd.edu Tue Mar 29 07:58:22 2005 From: rkern at ucsd.edu (Robert Kern) Date: Tue Mar 29 07:58:22 2005 Subject: [Numpy-discussion] Linear programming In-Reply-To: <20050329151958.GA28688@idi.ntnu.no> References: <20050329151958.GA28688@idi.ntnu.no> Message-ID: <42497AD6.2030700@ucsd.edu> Magnus Lie Hetland wrote: > Is there some standard Python (i.e., numarray/Numeric) mapping for > some linear programming package out there? Might be rather useful... My Google-fu does not reveal an obvious one. There does seem to be a recent one in which the authors wrote their own matrix object! http://www.ee.ucla.edu/~vandenbe/cvxopt/ -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From pearu at cens.ioc.ee Tue Mar 29 12:08:27 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Tue Mar 29 12:08:27 2005 Subject: [Numpy-discussion] Linear programming In-Reply-To: <20050329151958.GA28688@idi.ntnu.no> Message-ID: On Tue, 29 Mar 2005, Magnus Lie Hetland wrote: > Is there some standard Python (i.e., numarray/Numeric) mapping for > some linear programming package out there? Might be rather useful... It is certainly not a standard one but some years ago I wrote a wrapper to cddlib: http://cens.ioc.ee/projects/polyhedron/ I haven't used it with recent versions of Numeric or Python though. 
Pearu From magnus at hetland.org Tue Mar 29 13:07:20 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Mar 29 13:07:20 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com> References: <4247CEC9.1030903@ee.byu.edu> <20050328182929.50411.qmail@web50205.mail.yahoo.com> Message-ID: <20050329210615.GA4743@idi.ntnu.no> > __array_storage__ How about __array_data__? -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Mar 29 13:11:20 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Mar 29 13:11:20 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <42489A65.2030201@ee.byu.edu> References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> Message-ID: <20050329211013.GB4743@idi.ntnu.no> Travis Oliphant : [snip] > My proposal: > > __array_data__ (optional object that exposes the PyBuffer protocol or a > sequence object, if not present, the object itself is used). > __array_shape__ (required tuple of int/longs that gives the shape of the > array) > __array_strides__ (optional provides how to step through the memory in > bytes (or bits if a bit-array), default is C-contiguous) > __array_typestr__ (optional struct-like string showing the type --- > optional endianness indicater + Numeric3 typechars, default is 'V') > __array_itemsize__ (required if above is 'S', 'U', or 'V') > __array_offset__ (optional offset to start of buffer, defaults to 0) > > So, you could define an array interface with only two additional > attributes if your object exposed the buffer or sequence protocol. Wohoo! Niiice :) (Okay, a bit "me too"-ish, but I just wanted to contribute some enthusiasm ;) -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Mar 29 13:15:19 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Mar 29 13:15:19 2005 Subject: [Numpy-discussion] Linear programming In-Reply-To: <42497AD6.2030700@ucsd.edu> References: <20050329151958.GA28688@idi.ntnu.no> <42497AD6.2030700@ucsd.edu> Message-ID: <20050329211417.GC4743@idi.ntnu.no> Robert Kern : > > Magnus Lie Hetland wrote: > >Is there some standard Python (i.e., numarray/Numeric) mapping for > >some linear programming package out there? Might be rather useful... > > My Google-fu does not reveal an obvious one. Neither did mine ;) I did find pysimplex, though... But that's not really what I'm after, I guess. > There does seem to be a recent one in which the authors wrote their own > matrix object! Oh, no! 8-| Hm. Maybe this is a use-case for the new buffer stuff? Exposing the bytes of their arrays shouldn't be so hard... Easier than introducing numpy arrays of some sort, I should think ;) > http://www.ee.ucla.edu/~vandenbe/cvxopt/ Hm. They support sparse matrices too. Interesting. Thanks for the tip! -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From faltet at carabos.com Tue Mar 29 13:55:37 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Mar 29 13:55:37 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <200503291523.18309.faltet@carabos.com> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <200503291523.18309.faltet@carabos.com> Message-ID: <200503292354.20352.faltet@carabos.com> A Dimarts 29 Mar? 
2005 15:23, Francesc Altet va escriure: > This issue has been brought up to this list some months ago (see [1]). > I, as for one, have renounced to call NA_updateDataPtr() during table > reads in PyTables and this speeded up the reading process by 70%, > which is not a joke. And this speed-up could be theoretically achieved > in every piece of code that reads like: > > for i range(n): > a = numarrayobject[i] > > that is, whenever a single element in array is accessed. Well, the statement above is not exactly true. The overhead introduced by NA_updateDataPtr (and other functions related with the buffer object) is mainly important when you call the __getitem__ method from *extensions* and less important (but yet significant!) when you are in pure Python. This evening I wanted to evaluate how much would be the acceleration if it would be not necessary to call NA_updateDataPtr and companions (i.e. getting rid of the buffer object), found some interesting results and ended doing a quite long report that took this sunny Spring evening away from me :( Despite its rather serious format, please, don't look at it as a serious demonstration of nothing. It was made basically because I need maximum performance on __getitem__ operations and was curious on what Numeric/numarray/Numeric3 can offer in that regard. If I'm publishing it here is because it could of help for somebody. Cheers, -- >qo< Francesc Altet ? ? http://www.carabos.com/ V ?V C?rabos Coop. V. ??Enjoy Data "" A note on __getitem__ performance on Numeric/numarray on Python extensions (with an small follow-up on Numeric3) ========================================================================== Francesc Altet 2005-03-29 Abstract ======== Numeric [1] and numarray [2] are Python packages that provide very convenient containers to deal with large amounts of data in memory in an efficient way. The fact that they have quite different implementations lends naturally to areas where one package is better suited than the other, and vice-versa. In fact, it is a luck to have such a duality because competence is basic on every software (sane) ecosystem. The best way of determining which package is better adapted to do a certain task is benchmarking. In this report, I have made use of Pyrex [3] and oprofile [4] in order to decide which is the best candidate to be used for accessing the data in the containers from C extensions. In the appendix, some attention has been dedicated as well to Numeric3, a new-born contender for Numeric and numarray. Motivation ========== I need peak performance when accessing to data belonging to Numeric/numarray objects in my extensions, so I decided to do some profiling on the next code, which is representative of my own needs: niter = 5 N = 1000*1000 def matrix_loop(object): for j in xrange(niter): for i in xrange(N): p = object[i] This basically exercises the __getitem__ special method in Numeric/numarray objects. The benchmark ============= In order to get some comparisons done, I've made a small script (getitem-numarrayVSNumeric.py) that checks the speed for both kinds of objects: Numeric and numarray. Also, and in order to reduce the Python overhead, I've used psyco [3] so that the results may get as close as possible as if these tests were running inside a Python extension (made in C). Moreover, I've used the oprofile [4] so as to get an idea of where the CPU is wasted in this loop. 
First of all, I've made a calibration test to measure the time of the empty loop, that is:

def null_loop():
    for j in xrange(niter):
        for i in xrange(N):
            pass

This time is almost negligible when running with psyco (and the same happens inside a C extension), but it takes a *significant* time if psyco is not active. Once this time has been measured, it is subtracted from the loops that actually exercise __getitem__.

First (naive) timings
=====================

Now, let's see some of the timings that I've done. My platform is a Pentium4 @ 2 GHz laptop, using Debian GNU/Linux with kernel 2.6.9 and gcc 3.3.5. First of all, I'll list the results without psyco:

$ python2.3 bench/getitem-numarrayVSNumeric.py
Psyco not active
Numeric version: 23.8
numarray version: 1.2.3
Calibration loop: 0.11173081398
Time for numarray(getitem)/iter: 3.82528972626e-07
Time for Numeric(getitem)/iter: 2.51150989532e-07
getitem in Numeric is 1.52310358537 times faster

We can see how the time per iteration for numarray is 380 ns while for Numeric it is 250 ns, which accounts for a 1.5x speed-up of Numeric vs numarray.

Using psyco to reduce Python overhead
=====================================

However, even though we have subtracted the time for the calibration loop, there may remain other places where time is wasted in Python space. Psyco is a good way to optimize loops and make them go almost as fast as in C. Now, the figures using psyco:

$ python2.3 bench/getitem-numarrayVSNumeric.py
Psyco active
Numeric version: 23.8
numarray version: 1.2.3
Calibration loop: 0.0015878200531
Time for numarray(getitem)/iter: 2.4246096611e-07
Time for Numeric(getitem)/iter: 1.19336557388e-07
getitem in Numeric is 2.0317409134 times faster

We can see how the time for the calibration loop has been improved by a factor of 100x. Not too bad for a silly loop. Also, the time per iteration for numarray has dropped to 242 ns and to 119 ns for Numeric. This accounts for a 2x speedup. The first conclusion is that numarray is considerably slower than Numeric when accessing its data. Besides, when using psyco, part of the Python overhead evaporates, making the gap between the Numeric and numarray loops grow.

Introducing oprofile: getting a broad view of what's going on
=============================================================

In order to measure the exact difference in the __getitem__ method without the Python overhead (in an extension, for example) I've used oprofile against the psyco version of the benchmark. Here is the result for the run with psyco and profiled with oprofile:

# opreport /usr/bin/python2.3
samples|      %|
------------------
    586 34.1293 libnumarray.so
    454 26.4415 python2.3
    331 19.2778 _numpy.so
    206 11.9977 _ndarray.so
    102  5.9406 memory.so
     22  1.2813 libc-2.3.2.so
      9  0.5242 ld-2.3.2.so
      4  0.2330 multiarray.so
      2  0.1165 _sort.so
      1  0.0582 _psyco.so

The libnumarray.so, _ndarray.so, memory.so and _sort.so shared libraries all belong to the numarray package. The _numpy.so and multiarray.so belong to Numeric. The time spent in Python space is very little (just 26%, in great part thanks to psyco acceleration). The libc-2.3.2.so and ld-2.3.2.so belong to the C runtime library, and it is not possible to decide whether this time has been used by numarray, Numeric or Python itself, but as the time consumed is very little, we can safely ignore it.
So, if we sum the samples when the CPU was in C space (the shared libs) in numarray, and compare against the time in C space in Numeric, we get 894 against 331, which means that Numeric is 2.7x faster than numarray for __getitem__. Of course, this is more than the 1.5x and 2x factors that we got earlier, because of the time spent in Python space. However, the 2.7x factor is probably the more accurate one when one wants to exercise __getitem__ in C extensions.

Most CPU-intensive functions using oprofile
===========================================

If we want to look at the most CPU-consuming functions in numarray:

# opstack -t 1 /usr/bin/python2.3 | sort -nr | head -10
454 26.6432 python2.3 (no symbols)
331 19.4249 _numpy.so (no symbols)
145  8.5094 libnumarray.so  NA_getPythonScalar
115  6.7488 libnumarray.so  NA_getByteOffset
101  5.9272 libnumarray.so  isBufferWriteable
 98  5.7512 _ndarray.so     _ndarray_subscript
 91  5.3404 _ndarray.so     _simpleIndexingCore
 73  4.2840 libnumarray.so  NA_updateDataPtr
 64  3.7559 memory.so       memory_getbuf
 60  3.5211 libnumarray.so  getReadBufferDataPtr

The _numpy.so was stripped of debugging info, so we can't see where the time was spent in Numeric. However, we can estimate the cost of getting a fresh pointer to the data buffer on every data access in numarray: isBufferWriteable + NA_updateDataPtr + memory_getbuf + getReadBufferDataPtr gives a total of 298 samples, which is almost as much as all the time spent by the Numeric shared library (331). So we can conclude that having a buffer object in our array object can be a serious drawback if we want to get maximum performance for accessing the data. Another point worth looking at is NA_getByteOffset, which takes 115 samples by itself. This is perhaps a little too much.

Conclusions
===========

To sum up, we can expect that the __getitem__ method in Numeric would be 1.5x faster than numarray in pure Python code, 2x faster when using psyco, and 2.7x faster when used in C extensions. One factor that (partially) explains why numarray is slower in this area is that it is based on the buffer interface to keep its data. This feature, while very convenient for certain tasks (like sharing data with other Python packages or extensions), has a limitation that makes an extension crash if the memory buffer is reallocated. Other solutions (like the "bytes" object [5]) have been proposed to overcome this limitation (and others) of the buffer interface. Numeric3 might choose this to avoid these kinds of problems created by the buffer interface. Finally, we have seen how using oprofile can be of invaluable help for determining where the hot spots are, not only in our extensions, but also in other shared libraries in our system. If the shared libraries also have debugging info in them, then it is possible to track down even the most expensive routines in our application.

Appendix
========

Even though it is in the very early stages of existence, I was curious about how Numeric3 would perform in comparison with Numeric. By slightly changing getitem-numarrayVSNumeric.py, I've come up with getitem-NumericVSNumeric3.py, which does the comparison I wanted. When running without psyco, I got:

$ python2.3 bench/getitem-NumericVSNumeric3.py
Psyco not active
Numeric version: 23.8
Numeric3 version: Very early alpha release...!
Calibration loop: 0.107951593399
Time for Numeric3(getitem)/iter: 1.18472018242e-06
Time for Numeric(getitem)/iter: 2.45458602905e-07
getitem in Numeric is 4.82655799551 times faster

Oops, Numeric3 is almost 5 times slower than Numeric. It really seems to be still in a very alpha state (you know, premature optimization is the root of all evil). Never mind, this is just an exercise. So, let's continue with the psyco version:

$ python2.3 bench/getitem-NumericVSNumeric3.py
Psyco active
Numeric version: 23.8
Numeric3 version: Very early alpha release...!
Calibration loop: 0.00171356201172
Time for Numeric3(getitem)/iter: 1.04013824463e-06
Time for Numeric(getitem)/iter: 1.19578647614e-07
getitem in Numeric is 8.69836099828 times faster

The gap has increased to 8.6x, as expected. Let's have a look at the most CPU-consuming shared libs by using oprofile:

# opreport /usr/bin/python2.3
samples|      %|
------------------
   1841 33.7365 multiarray.so
   1701 31.1710 libc-2.3.2.so
   1586 29.0636 python2.3
    318  5.8274 _numpy.so
      6  0.1100 ld-2.3.2.so
      3  0.0550 multiarray.so
      2  0.0367 _psyco.so

God! Two libraries alone are getting more than half of the CPU: multiarray.so and libc-2.3.2.so. As we already know that Numeric3's __getitem__ takes much more time than its counterpart in Numeric, we can conclude that the multiarray.so here is Numeric3's own, and that it is responsible for one third (33.7%) of the time. Moreover, multiarray.so is probably responsible for calling the libc routines so much, because in our previous benchmarks the libc calls never took more than 5% of the time, while here they take more than 30%. To conclude, let's see which are the most CPU-consuming routines in Numeric3 for this exercise:

# opstack -t 1 /usr/bin/python2.3 | sort -nr | head -20
1586 30.1750 python2.3 (no symbols)
 669 12.7283 libc-2.3.2.so __GI___strcasecmp
 618 11.7580 multiarray.so PyArray_MapIterNew
 374  7.1157 multiarray.so array_subscript
 318  6.0502 _numpy.so (no symbols)
 260  4.9467 libc-2.3.2.so __realloc
 190  3.6149 libc-2.3.2.so _int_malloc
 172  3.2725 multiarray.so PyArray_New
 152  2.8919 libc-2.3.2.so __strncasecmp
 123  2.3402 libc-2.3.2.so malloc_consolidate
 121  2.3021 libc-2.3.2.so __memalign_internal
 118  2.2451 multiarray.so array_dealloc
 102  1.9406 libc-2.3.2.so _int_realloc
  93  1.7694 multiarray.so fancy_indexing_check
  86  1.6362 multiarray.so arraymapiter_dealloc
  79  1.5030 multiarray.so PyArray_Scalar
  76  1.4460 multiarray.so LONG_copyswapn
  62  1.1796 multiarray.so PyArray_UpdateFlags
  57  1.0845 multiarray.so PyArray_DescrFromType

While we can see that a lot of time is spent inside Numeric3's multiarray.so, it also catches our attention that a lot of time is spent in the __GI___strcasecmp libc call. This is very strange, because our arrays are made of integers, and calling strcasecmp on each iteration seems quite unnecessary. In order to know who is calling strcasecmp (i.e. to get the call tree), oprofile needs a specially patched version of the Linux kernel. But this is material for another story.

References
==========
[1] http://numpy.sourceforge.net/
[2] http://stsdas.stsci.edu/numarray/
[3] http://psyco.sourceforge.net/
[4] http://oprofile.sourceforge.net/
[5] http://www.python.org/peps/pep-0296.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: getitem-numarrayVSNumeric.py
Type: application/x-python
Size: 1280 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getitem-NumericVSNumeric3.py
Type: application/x-python
Size: 1281 bytes
Desc: not available
URL:

From oliphant at ee.byu.edu Tue Mar 29 18:13:35 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Mar 29 18:13:35 2005
Subject: [Numpy-discussion] large file and array support
In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>
Message-ID: <424A0AE6.9090209@ee.byu.edu>

There are two distinct issues with regard to large arrays.

1) How do you support > 2Gb memory mapped arrays on 32 bit systems and other large-object arrays only a part of which are in memory at any given time (there is an equivalent problem for > 8 Eb (exabytes) on 64 bit systems; an exabyte is 2^60 bytes, or a giga-giga-byte).

2) Supporting the sequence protocol for in-memory objects on 64-bit systems.

Part 2 can be fixed using the recommendations Martin is making, which will likely happen (though it could definitely be done faster). Handling part 1 is more difficult. One idea is to define some kind of "super object" that mediates between the large file and the in-memory portion. In other words, the ndarray is an in-memory object, while the super object handles interfacing it with a larger structure.

Thoughts?

-Travis

From perry at stsci.edu Tue Mar 29 18:26:35 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Mar 29 18:26:35 2005
Subject: [Numpy-discussion] Re: large file and array support
In-Reply-To: <424A0AE6.9090209@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <424A0AE6.9090209@ee.byu.edu>
Message-ID: <776140fae84b09d015d2508955611c5b@stsci.edu>

On Mar 29, 2005, at 9:11 PM, Travis Oliphant wrote:

> There are two distinct issues with regard to large arrays.
>
> 1) How do you support > 2Gb memory mapped arrays on 32 bit systems and
> other large-object arrays only a part of which are in memory at any
> given time (there is an equivalent problem for > 8 Eb (exabytes) on 64
> bit systems; an exabyte is 2^60 bytes, or a giga-giga-byte).
>
> 2) Supporting the sequence protocol for in-memory objects on 64-bit
> systems.
>
> Part 2 can be fixed using the recommendations Martin is making, which
> will likely happen (though it could definitely be done faster).
> Handling part 1 is more difficult.
>
> One idea is to define some kind of "super object" that mediates
> between the large file and the in-memory portion. In other words, the
> ndarray is an in-memory object, while the super object handles
> interfacing it with a larger structure.
>
> Thoughts?

Maybe I'm missing something, but isn't it possible to mmap part of a large file? In that case one just limits the memory maps to what can be handled on a 32 bit system, leaving it up to the user software to determine which part of the file to mmap. Did you have something more automatic in mind? As for other large-object arrays, I'm not sure what examples there are other than memory mapping. Do you have any?

Perry

From pjssilva at ime.usp.br Tue Mar 29 18:45:31 2005
From: pjssilva at ime.usp.br (Paulo J. S.
Silva)
Date: Tue Mar 29 18:45:31 2005
Subject: [Numpy-discussion] Linear programming
In-Reply-To: <20050329151958.GA28688@idi.ntnu.no>
References: <20050329151958.GA28688@idi.ntnu.no>
Message-ID: <1112150674.8038.10.camel@localhost.localdomain>

On Tue, 2005-03-29 at 17:19 +0200, Magnus Lie Hetland wrote:
> Is there some standard Python (i.e., numarray/Numeric) mapping for
> some linear programming package out there? Might be rather useful...

Hello, I have written a very simple wrapper to COIN/CLP (www.coin-or.org) based on SWIG. I am using this code in my own research. It is simple, but it is good enough for me. I will clean it up a little and "release" it this week. Please get in touch with me by Friday. Here is a sample code using the wrapper:

--- Sample code ---
from numarray import *
import Coin

s = Coin.OsiSolver()

# Define objective and variables bounds
ncols = 2
obj = array([-1.0, -1.0])
col_lb = array([0.0, 0.0])
col_ub = s.getInfinity()*array([1.0, 1.0])

# Define constraints
nrows = 2
row_lb = -s.getInfinity()*array([1.0, 1.0])
row_ub = array([3.0, 3.0])
matrix = Coin.CoinPackedMatrix(0, 0, 0)
matrix.setDimensions(0, ncols)
row1 = Coin.CoinPackedVector()
row1.insert(0, 1.0)
row1.insert(1, 2.0)
matrix.appendRow(row1)
row2 = Coin.CoinPackedVector()
row2.insert(0, 2.0)
row2.insert(1, 1.0)
matrix.appendRow(row2)

# Load problem
s.loadProblem(matrix, col_lb, col_ub, obj, row_lb, row_ub)

# Write mps model.
s.writeMps('example')

# Solve problem
s.initialSolve()

# Print optimal value.
print 'Optimal value: ', s.getObjValue()
print 'Solution: ', s.getColSolution()
--- End sample ---

Note that I am using COIN's sparse matrix and vector so as to exploit sparsity in the CLP solver.

Best, Paulo

-- 
Paulo José da Silva e Silva
Professor Assistente do Dep. de Ciência da Computação
(Assistant Professor of the Computer Science Dept.)
Universidade de São Paulo - Brazil
e-mail: pjssilva at ime.usp.br    Web: http://www.ime.usp.br/~pjssilva
Teoria é o que não entendemos o suficiente para chamar de prática.
(Theory is something we don't understand well enough to call practice.)

From faltet at carabos.com Wed Mar 30 02:45:00 2005
From: faltet at carabos.com (Francesc Altet)
Date: Wed Mar 30 02:45:00 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <42489A65.2030201@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu>
Message-ID: <200503301240.55483.faltet@carabos.com>

On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
> My proposal:
>
> __array_data__ (optional object that exposes the PyBuffer protocol or a
> sequence object, if not present, the object itself is used).
> __array_shape__ (required tuple of int/longs that gives the shape of the
> array)
> __array_strides__ (optional provides how to step through the memory in
> bytes (or bits if a bit-array), default is C-contiguous)
> __array_typestr__ (optional struct-like string showing the type ---
> optional endianness indicator + Numeric3 typechars, default is 'V')
> __array_itemsize__ (required if above is 'S', 'U', or 'V')
> __array_offset__ (optional offset to start of buffer, defaults to 0)

Considering that heterogeneous data is to be supported as well, and there is some tradition of assigning names to the different fields, I wonder if it would not be good to add something like:

__array_names__ (optional comma-separated names for record fields)

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.
Enjoy Data ""

From oliphant at ee.byu.edu Wed Mar 30 11:39:02 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 11:39:02 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503301240.55483.faltet@carabos.com>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com>
Message-ID: <424AFFE9.40300@ee.byu.edu>

>On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>
>>My proposal:
>>
>>__array_data__ (optional object that exposes the PyBuffer protocol or a
>>sequence object, if not present, the object itself is used).
>>__array_shape__ (required tuple of int/longs that gives the shape of the
>>array)
>>__array_strides__ (optional provides how to step through the memory in
>>bytes (or bits if a bit-array), default is C-contiguous)
>>__array_typestr__ (optional struct-like string showing the type ---
>>optional endianness indicator + Numeric3 typechars, default is 'V')
>>__array_itemsize__ (required if above is 'S', 'U', or 'V')
>>__array_offset__ (optional offset to start of buffer, defaults to 0)
>
>Considering that heterogeneous data is to be supported as well, and
>there is some tradition of assigning names to the different fields, I
>wonder if it would not be good to add something like:
>
>__array_names__ (optional comma-separated names for record fields)

I'm O.K. with that.

After more thought, I think using the struct-like typecharacters is not a good idea for the array protocol. I think that the character codes used by the numarray record array (kind_character + byte_width) are better. Commas can separate heterogeneous data. The problem is that if the data buffer originally came from a different machine or was saved with a different compiler (e.g. a mmap'ed file), then the struct-like typecodes only tell you the c-type that machine thought the data was. It does not tell you how to interpret the data on this machine. So, I think we should use the __array_typestr__ attribute to pass type information using the kind_character + byte_width approach. I'm also going to use this type information for pickles, so that arrays pickled on one machine type will be able to be interpreted on another with ease.

Bool -- "b%d" % sizeof(bool)
Signed Integer -- "i%d" % sizeof()
Unsigned Integer -- "u%d" % sizeof()
Float -- "f%d" % sizeof()
Complex -- "c%d" % sizeof()
Object -- "O%d" % sizeof(PyObject *) --- this would only be useful on shared memory
String -- "S%d" % itemsize
Unicode -- "U%d" % itemsize
Void -- "V%d" % itemsize

I also think that rather than attach < or > to the start of the string it would be easier to have another protocol for endianness. Perhaps something like:

__array_endian__ (optional Python integer with the value 1 in it). If it is not 1, then a byteswap is necessary.

-Travis

From oliphant at ee.byu.edu Wed Mar 30 11:49:03 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 11:49:03 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <424AFFE9.40300@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> <424AFFE9.40300@ee.byu.edu>
Message-ID: <424B022B.3040004@ee.byu.edu>

> After more thought, I think using the struct-like typecharacters is
> not a good idea for the array protocol. I think that the character
> codes used by the numarray record array (kind_character + byte_width)
> are better.
> Commas can separate heterogeneous data. The problem is
> that if the data buffer originally came from a different machine or
> was saved with a different compiler (e.g. a mmap'ed file), then the
> struct-like typecodes only tell you the c-type that machine thought
> the data was. It does not tell you how to interpret the data on this
> machine.
> So, I think we should use the __array_typestr__ attribute to pass type
> information using the kind_character + byte_width approach. I'm also
> going to use this type information for pickles, so that arrays pickled
> on one machine type will be able to be interpreted on another with ease.
>
> Bool -- "b%d" % sizeof(bool)
> Signed Integer -- "i%d" % sizeof()
> Unsigned Integer -- "u%d" % sizeof()
> Float -- "f%d" % sizeof()
> Complex -- "c%d" % sizeof()
> Object -- "O%d" % sizeof(PyObject *) --- this would only be useful on shared memory
> String -- "S%d" % itemsize
> Unicode -- "U%d" % itemsize
> Void -- "V%d" % itemsize

Of course, with this protocol for the typestr, __array_itemsize__ is redundant and can disappear. Another reason to like it.

> I also think that rather than attach < or > to the start of the string
> it would be easier to have another protocol for endianness. Perhaps
> something like:
> __array_endian__ (optional Python integer with the value 1 in it).
> If it is not 1, then a byteswap is necessary.

I'm mixed on this; I could be persuaded either way.

-Travis

From cookedm at physics.mcmaster.ca Wed Mar 30 13:06:42 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Mar 30 13:06:42 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503301240.55483.faltet@carabos.com> (Francesc Altet's message of "Wed, 30 Mar 2005 12:40:55 +0200")
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com>
Message-ID: 

Francesc Altet writes:
> On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>> My proposal:
>>
>> __array_data__ (optional object that exposes the PyBuffer protocol or a
>> sequence object, if not present, the object itself is used).
>> __array_shape__ (required tuple of int/longs that gives the shape of the
>> array)
>> __array_strides__ (optional provides how to step through the memory in
>> bytes (or bits if a bit-array), default is C-contiguous)
>> __array_typestr__ (optional struct-like string showing the type ---
>> optional endianness indicator + Numeric3 typechars, default is 'V')
>> __array_itemsize__ (required if above is 'S', 'U', or 'V')
>> __array_offset__ (optional offset to start of buffer, defaults to 0)
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields, I
> wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)

A sequence (list or tuple) of strings would be preferable. That removes all worrying about using commas in the names.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Mar 30 15:34:45 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 15:34:45 2005
Subject: [Numpy-discussion] Pickle complete (new ideas for Python arrays)
Message-ID: <424B3730.4040408@ee.byu.edu>

Hi all,

Pickling is now implemented for scipy.base (what we were calling Numeric3). Anybody wanting to tackle a function to read old Numeric and/or numarray pickles is welcome. I think this could be all in Python. Ideally, we should be able to read these pickles without having those packages installed. I think the PEP for Python should be converted to a bare-bones protocol (e.g. the one that is emerging). Optionally, we could create a very simple default array object for Python that just has a default pickle implementation and knows how to get data through the buffer interface from other objects. That way any array implementation just has to talk the Python array protocol to be interoperable with any other array implementation.

-Travis

From oliphant at ee.byu.edu Thu Mar 31 15:53:01 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 31 15:53:01 2005
Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core
Message-ID: <424C8D05.7030006@ee.byu.edu>

To all interested in the future of arrays...

I'm still very committed to Numeric3 as I want to bring the numarray and Numeric people together behind a single array object for scientific computing. But, I've been thinking about the array protocol and thinking that it would be a good thing if this became universal. One of the ways to make it universal is by having something that follows it in the Python core.

So, what if we proposed for the Python core not something like Numeric3 (which would still exist in scipy.base and be everybody's favorite array :-) ), but a very minimal array object (scaled back even from Numeric) that followed the array protocol and had some C-API associated with it.

This minimal array object would support 5 basic types ('bool', 'integer', 'float', 'complex', 'Object'). (Maybe a void type could be defined and a void "scalar" introduced (which would be the bytes object)). These types correspond to scalars already available in Python, and so the whole 0-dim array vs. Python scalar argument could be ignored. Math could be done without ufuncs initially (people really needing speed would use scipy.base anyway). But, more people in the Python community would be able to use arrays and get used to them. And we would have a reference array_protocol object so that extension writers could write to it.

I would not try a project like this until after scipy_core is out, but it's an interesting thing to think about. I mainly wanted feedback on the basic concept. An alternative would be to "add" multidimensionality to the array object already part of Python, fix its reallocating-with-an-exposed-buffer problem, and add the array protocol.

-Travis

From xscottg at yahoo.com Thu Mar 31 20:14:15 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Mar 31 20:14:15 2005
Subject: [Numpy-discussion] Array Metadata
Message-ID: <20050401041204.18335.qmail@web50208.mail.yahoo.com>

I got back late last night, and there were lots of things I wanted to comment on. I've put parts of several threads into this one message since they're all dealing with the same general topic:

Perry Greenfield wrote:
>
> I'm not sure how the support for large data sets should be handled.
> I generally think that it will be very awkward to handle these
> until Python does as well. Speaking of which...
>

I agree that it's going to be difficult to have general support for large PyBufferProcs objects until the Python core is made 64 bit clean. But specific support can be added for buffer types that are known in advance. For instance, the bytes object PEP proposes an alternate way to get a 64 bit length, and similar support could easily be added to Numarray.memory, mmap.mmap, and whatever else on a case by case basis. So you could get a 64 bit pointer from some types of buffers before the rest of Python becomes 64 bit clean. If the ndarray consumer (wxWindows for instance) doesn't recognize the particular implementation, it has to stick with the limitations of the standard PyBufferProcs and assume a 32 bit length would suffice.

Travis Oliphant wrote:
>
> I prefer __array_data__ (it's a common name for Numeric and
> numarray, It can be interpreted as a sequence object if desired).
> So long as everyone agrees it doesn't matter what name it is.

Sounds like __array_data__ works for everyone.

>
> I also like __array_typestr__ or __array_typechar__ better as a name.
> A name is a name as far as I'm concerned.

The name __array_typestr__ works for me. The name __array_typechar__ implies a single character, and that won't be true.

>
> Don't like 'D' for long double. Complex floats is already
> using it. I'm not sure I like the idea of moving to two
> character typecodes at this point because it indicates more
> internal changes to Numeric3 (otherwise we have two typecharacter
> standards which is not a good thing). What is wrong with 'g'
> and 'G' for long double and complex long double respectively.
>

Nothing in this array protocol should *require* internal changes to either Numeric3 or Numarray. I suspect Numarray is going to keep its type hierarchy, and Numeric3 can use single character codes for its representation if it wants. However, both Numeric3 and Numarray might (probably would) have to translate their internal array type specifiers into the agreed upon "type code string" when reporting out this attribute. The important qualities __array_typestr__ should have are:

1) Everyone should agree on the interpretation. It needs to be documented somewhere. Third party libraries should get the same __array_typestr__ from Numarray as they do from Numeric3.

2) It should be sufficiently general in its capabilities to describe a wide category of array types. Simple things should be simple, and harder things should be possible. An ndarray of double should have a simple common well recognized value for __array_typestr__. An ndarray of multi-field structs should be representable too.

>
> > __array_complicated__
>
> I don't see the utility here I guess, If it can't be described by a
> shape/strides combination then how can it participate in the protocol?
>

I'm not married to this one. I don't know if Numarray or Numeric3 will ever do such a thing, but I can imagine more complicated schemes of arranging the data than offset/shape/strides are capable of representing. So this is forward compatibility with "Numarric4" :-). Pretty hypothetical, but imagine that typically Numarric4 can represent its data with offset/shape/strides, but for more advanced operations that falls apart. I could bore you with a detailed example...
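To make the kind_character + byte_width strings under discussion concrete, here is one possible parser for them. This is a hypothetical helper written for illustration only; the set of kind characters is taken from Travis's list above, but the function itself is not part of any proposal:

import re

_field_pat = re.compile(r'^([biufcOSUV])([0-9]+)$')

def parse_typestr(typestr):
    # "f8" -> [('f', 8)]; "i2,f4" -> [('i', 2), ('f', 4)] for
    # comma-separated heterogeneous records
    fields = []
    for part in typestr.split(','):
        m = _field_pat.match(part)
        if m is None:
            raise ValueError("unrecognized type string: %r" % (typestr,))
        fields.append((m.group(1), int(m.group(2))))
    return fields

A consumer library could use the resulting (kind, width) pairs to compute field offsets and pick the matching C type, without caring whether the producer was Numeric, numarray, or Numeric3.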
The idea is that if array consumers like wxPython were aware that more complicated implementations can occur in the future, they can gracefully bow out and raise an exception instead of incorrectly interpreting the data. If you need it later, you can't easily add it after the fact. Take it or leave it I guess - it's possibly a YAGNI.

>
> After more thought, I think here we need to also allow the
> "c-type" independent way of describing an array (i.e. numarray
> introduced 'c4' for a complex-valued 4 byte itemsize array).
> So, perhaps __array_ctypestr__ and __array_typestr__ should be
> two ways to get the information (or overload the __array_typestr__
> interface and require consumers to accept either style).
>

I don't understand what you are proposing here. Why would you want to represent the same information two different ways?

Perry Greenfield wrote:
>
> I think we need to think about what the typecharacter is supposed
> to represent. Is it the value as the user will see it or to indicate
> what the internal representation is? These are two different things.
>

I think __array_typestr__ should accurately represent the internal representation. It is not intended for typical end users. The whole of the __array_*metadata*__ stuff is intended for third party libraries like wxPython or PIL to be able to grab a pointer to the data, calculate offsets, and cast it to the appropriate type without writing lots of special case code to handle the differences between Numeric, Numarray, Numeric3, and whatever else.

>
> Then again, I'm not sure how this info is exposed to the user; if it
> is appropriately handled by intermediate code it may not matter. For
> example, if this corresponds to what the user will see for the type,
> I think it is bad. Most of the time they don't care what the internal
> representation is, they just want to know if it is Int16 or whatever;
> with the two combined, they have to test for both variants.
>

Typical users would call whatever attribute or method you prefer (.type() or .typecode() for instance), and the type representation could be classes or typecodes or whatever you think is best. The __array_typestr__ attribute is not for typical users (unless they start to care about the details under the hood). It's for libraries that need to know what's going on in a generic fashion. You don't have to store this attribute as separate data; it can be a property-style attribute that calculates its value dynamically from your own internal representation.

Francesc Altet wrote:
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields,
> I wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)
>

I really like this idea. Although I agree with David M. Cooke that it should be a tuple of names. Unless there is a use case I'm not considering, it would be preferable if the names were restricted to valid Python identifiers.

Travis Oliphant wrote:
>
> After more thought, I think using the struct-like typecharacters
> is not a good idea for the array protocol. I think that the
> character codes used by the numarray record array (kind_character
> + byte_width) are better. Commas can separate heterogeneous data.
> The problem is that if the data buffer originally came from a
> different machine or was saved with a different compiler (e.g.
> a mmap'ed
> file), then the struct-like typecodes only tell you the c-type that
> machine thought the data was. It does not tell you how to interpret
> the data on this machine.
>

The struct module has a portable set of typecodes. They call it "standard", but it's the same thing. The struct module lets you specify either standard or native. For instance, the typecode for "standard long" ("=l") is always 4 bytes while a "native long" ("@l") is likely to be 4 or 8 bytes depending on the platform. The __array_typestr__ codes should require the "standard" sizes. There is a table at the bottom of the documentation that goes into detail: http://docs.python.org/lib/module-struct.html

The only problem with the struct module is that it's missing a few types... (long double, PyObject, unicode, bit).

>
> I also think that rather than attach < or > to the start of the
> string it would be easier to have another protocol for endianness.
> Perhaps something like:
>
> __array_endian__ (optional Python integer with the value 1 in it).
> If it is not 1, then a byteswap is necessary.
>

This has the problem you were just describing. Specifying "byteswapped" like this only tells you if the data was reversed on the machine it came from. It doesn't tell you what is correct for the current machine. Assuming you represented little endian as 0 and big endian as 1, you could always figure out whether to byteswap like this:

byteswap = data_endian ^ host_endian

Do you want to have an __array_endian__ where 0 indicates "little endian", 1 indicates "big endian", and the default is whatever the current host machine uses? I think this would work for a lot of cases. A limitation of this approach is that it can't adequately represent struct/record arrays where some fields are big endian and others are little endian.

>
> Bool -- "b%d" % sizeof(bool)
> Signed Integer -- "i%d" % sizeof()
> Unsigned Integer -- "u%d" % sizeof()
> Float -- "f%d" % sizeof()
> Complex -- "c%d" % sizeof()
> Object -- "O%d" % sizeof(PyObject *)
> --- this would only be useful on shared memory
> String -- "S%d" % itemsize
> Unicode -- "U%d" % itemsize
> Void -- "V%d" % itemsize
>

The above is a nice start at reinventing the struct module typecodes. If you and Perry agree to it, that would be great. A few additions though: I think you're proposing that "struct" or "record" arrays would be a concatenation of the above strings. If so, you'll need an indicator for padding bytes. (You probably know this, but structs in C frequently have wasted bytes inserted by the compiler to make sure data is aligned on the machine addressable boundaries.) I also assume that you intend the ("c%d" % itemsize) to always represent complex floating point numbers. That leaves my favorite example of complex short integer data with no way to be represented... I guess I could get by with "i2i2". How about not having a complex type explicitly, but representing complex data as something like:

__array_typestr__ = "f4f4"
__array_names__ = ("real", "imag")

Just a thought... I do like it though. I think that both Numarray and Numeric3 are planning on storing booleans in a full byte. A typecode for tightly packed bits wouldn't go unused however...

>
> 1) How do you support > 2Gb memory mapped arrays on 32 bit systems
> and other large-object arrays only a part of which are in memory at
> any given time
>

Doing this well is a lot like implementing mmap in user space. I think this is a modification to the buffer protocol, not the array protocol. It would add a bit of complexity if you want to deal with it, but it is doable. Instead of just grabbing a pointer to the whole thing, you need to ask the object to "page in" ranges of the data and give you a pointer that is only valid in that range. Then when you're done with the pointer, you need to explicitly tell the object so that it can write back if necessary and release the memory for other requests. Do you think Numeric3 or Numarray would support this? I think it would be very cool functionality to have.
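As a rough illustration of the partial-mapping idea Perry raised, a hypothetical helper could look like the sketch below. Note that the offset argument to mmap.mmap was only added to Python well after this discussion, so this is a sketch of the idea rather than something that would have run at the time; the helper name is invented:

import mmap

def map_window(fileobj, start, size):
    # Map only the bytes [start, start+size) of a large file opened
    # for update. The offset handed to mmap must be a multiple of the
    # allocation granularity, so round down and report the padding.
    gran = mmap.ALLOCATIONGRANULARITY
    base = (start // gran) * gran
    pad = start - base
    m = mmap.mmap(fileobj.fileno(), size + pad, offset=base)
    return m, pad   # the requested range begins at m[pad:]

A consumer would work within one window at a time and close it before asking for the next, which is essentially the "page in"/"page out" discipline described above.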
>
> (there is an equivalent problem for > 8 Eb (exabytes) on 64 bit
> systems; an exabyte is 2^60 bytes, or a giga-giga-byte).
>

I think it will be at least 10-20 years before we could realistically exceed a 64 bit address space. Probably a lot longer. That's a billion times more RAM than any machine I've ever worked on, and it's a million times more bytes than any RAID set I've worked with. Are there any super computers approaching this level? Even at Moore's law rates, I'm not worried about that one just yet.

>
> But, I've been thinking about the array protocol and thinking that
> it would be a good thing if this became universal. One of the ways
> to make it universal is by having something that follows it in the
> Python core.
>
> So, what if we proposed for the Python core not something like
> Numeric3 (which would still exist in scipy.base and be everybody's
> favorite array :-) ), but a very minimal array object (scaled back
> even from Numeric) that followed the array protocol and had some
> C-API associated with it.
>
> This minimal array object would support 5 basic types ('bool',
> 'integer', 'float', 'complex', 'Object'). (Maybe a void type
> could be defined and a void "scalar" introduced (which would be
> the bytes object)). These types correspond to scalars already
> available in Python, and so the whole 0-dim array vs. Python scalar
> argument could be ignored.
>

I really like this idea. It could easily be implemented in C or Python script. Since half its purpose is for documentation, the Python script implementation might make more sense. Additionally, a module that understood the defaults and did the right thing with the metadata attributes would be useful:

def get_ndims(a):
    return len(a.__array_shape__)

def get_offset(a):
    if hasattr(a, "__array_offset__"):
        return a.__array_offset__
    return 0

def get_strides(a):
    if hasattr(a, "__array_strides__"):
        return a.__array_strides__
    # build the default strides from the shape

def is_c_contiguous(a):
    shape = a.__array_shape__
    strides = get_strides(a)
    # determine if the strides indicate it is contiguous

def is_fortran_contiguous(a):
    # similar to is_c_contiguous

etc...

These functions could be useful for third party libraries to work with *any* of the array packages.

>
> An alternative would be to "add" multidimensionality to the array
> object already part of Python, fix its reallocating-with-an-exposed-buffer
> problem, and add the array protocol.
>

I'd recommend not breaking backward compatibility on the array.array object, but adding the __array_*metadata*__ attributes wouldn't hurt anything. (The __array_shape__ would always be a tuple of length one, but that's allowed...).

Magnus Lie Hetland wrote:
>
> Wohoo! Niiice :)
>
> (Okay, a bit "me too"-ish, but I just wanted to contribute some
> enthusiasm ;)
>

I completely agree!
:-) Cheers, -Scott From konrad.hinsen at laposte.net Thu Mar 31 23:23:01 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 31 23:23:01 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: <324ad11b79f2594d6589ce4dec7ee1e4@laposte.net> On 01.04.2005, at 01:51, Travis Oliphant wrote: > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. What would that minimal array object have in common with the full-size one? A subset of both the Python API and the C API? The data layout? Would the full one be a subtype of the minimal one? I like the idea in principle but I would like to be sure that it doesn't create additional overhead in the full array or in extension modules that use arrays, in the form of additional typecheck and compatibility criteria. Once there is a minimal array type in the core, objects of that type will be circulating and must somehow be handled. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From haase at msg.ucsf.edu Tue Mar 1 09:43:31 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue Mar 1 09:43:31 2005 Subject: [Numpy-discussion] bug in pyfits w/ numarray 1.2 Message-ID: <200503010942.41026.haase@msg.ucsf.edu> Hi, After upgrading to the latest numarray we get this error from pyfits: >>> a = U.loadFits(fn) Traceback (most recent call last): File "", line 1, in ? File "/jws30/haase/PrLin/Priithon/useful.py", line 1069, in loadFits return ff[ slot ].data File "/jws30/haase/PrLin/pyfits.py", line 1874, in __getattr__ raw_data = num.fromfile(self._file, type=code, shape=dims) File "/jws30/haase/PrLin0/numarray/numarraycore.py", line 517, in fromfile bytesleft=type.bytes*_gen.product(shape) AttributeError: 'str' object has no attribute 'bytes' >>>pyfits.__version__ '0.9.3 (June 30, 2004)' Looks like pyfits uses a typecode-string 'code' in this line 1874: raw_data = num.fromfile(self._file, type=code, shape=dims) I this supposed to still work in numarray ? Or should pyfits be updated ? I tried num.fromfile(self._file, typecode=code, shape=dims) but 'typecode' doesn't seem an allowed keyword for fromfile() Thanks, Sebastian Haase From cjw at sympatico.ca Tue Mar 1 11:10:17 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 1 11:10:17 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <4224BDB2.5010203@sympatico.ca> An HTML attachment was scrubbed... URL: From rkern at ucsd.edu Tue Mar 1 12:09:17 2005 From: rkern at ucsd.edu (Robert Kern) Date: Tue Mar 1 12:09:17 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4224BDB2.5010203@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> Message-ID: <4224CB47.6030802@ucsd.edu> Colin J. Williams wrote: > I suggest that Numeric3 offers the opportunity to drop the word /rank/ > from its lexicon. 
"rank" has an established usage long before digital > computers. See: http://mathworld.wolfram.com/Rank.html It also has a well-established usage with multi-arrays. http://mathworld.wolfram.com/TensorRank.html > Perhaps some abbreviation for "Dimensions" would be acceptable. It is also reasonable to say that array([1., 2., 3.]) has 3 dimensions. > Matrix Class > > " A default Matrix class will either inherit from or contain the Python > class". Surely, almost all of the objects above are to be rooted in > "new" style classes. See PEP's 252 and 253 or > http://www.python.org/2.2.2/descrintro.html Sure, but just because inheritance is possible does not entail that it is a good idea. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From konrad.hinsen at laposte.net Wed Mar 2 00:03:16 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 2 00:03:16 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4224BDB2.5010203@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> Message-ID: On 01.03.2005, at 20:08, Colin J. Williams wrote: > Basic Types > These are, presumably,? intended as the types of the data elements > contained in an Array instance.? I would see then as sub-types of > Array. Element types as subtypes??? > I wonder why there is a need for 30 new types.? Python itself has > about 30 distinct types.? Wouldn't it be more saleable to think in > terms of an Array The Python standard library has hundreds of types, considering that the difference between C types and classes is an implementation detail. > Suppose one has: > import numarray.numerictypes as _nt > > Then, the editor (PythonWin for example) responds to the entry of > "_nt." with a drop down menu offering the available types from which > the user can select one. That sounds interesting, but it looks like this would require specific support from the editor. > I suggest that Numeric3 offers the opportunity to drop the word rank > from its lexicon.? "rank" has an established usage long before digital > computers.? See: http://mathworld.wolfram.com/Rank.html The meaning of "tensor rank" comes very close and was probably the inspiration for the use of this terminology in array system. > Perhaps some abbreviation for "Dimensions" would be acceptable. The equivalent of "rank" is "number of dimensions", which is a bit long for my taste. > len() seems to be treated as a synonym for the number of dimensions.? > Currently, in numarray, it follows the usual sequence of sequences > approach of Python and returns the number of rows in a two dimensional > array. As it should. The rank is given by len(array.shape), which is pretty much a standard idiom in Numeric code. But I don't see any place in the PEP that proposes something different! > Rank-0 arrays and Python Scalars > > Regarding Rank-0 Question 2.? I've already, in effect, answered > "yes".? I'm sure that a more compelling "Pro" could be written Three "pro" argument to be added are: - No risk of user confusion by having two types that are nearly but not exactly the same and whose separate existence can only be explained by the history of Python and NumPy development. - No problems with code that does explicit typechecks (isinstance(x, float) or type(x) == types.FloatType). Although explicit typechecks are considered bad practice in general, there are a couple of valid reasons to use them. 
- No creation of a dependency on Numeric in pickle files (though this could also be done by a special case in the pickling code for arrays) > The "Con" case is valid but, I suggest, of no great consequence.? In > my view, the important considerations are (a) the complexity of > training the newcomer and (b) whether the added work should be imposed > on the generic code writer or the end user.? I suggest that the aim > should be to make things as easy as possible for the end user. That is indeed a valid argument. > Mapping Iterator > An example could help here.? I am puzzled by "slicing syntax does not > work in constructors.". Python allows the colon syntax only inside square brackets. x[a:b] and x[a:b:c] are fine but it is not possible to write iterator(a:b). One could use iterator[a:b] instead, but this is a bit confusing, as it is not the iterator that is being sliced. Konrad. From cjw at sympatico.ca Wed Mar 2 09:22:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Wed Mar 2 09:22:16 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: References: <4224BDB2.5010203@sympatico.ca> Message-ID: <4225F634.1040305@sympatico.ca> konrad.hinsen at laposte.net wrote: > On 01.03.2005, at 20:08, Colin J. Williams wrote: > >> Basic Types >> These are, presumably, intended as the types of the data elements >> contained in an Array instance. I would see then as sub-types of Array. > > > Element types as subtypes??? Sub-types in the sense that, given an instance a of Array, a.elementType gives us the type of the data elements contained in a. > >> I wonder why there is a need for 30 new types. Python itself has >> about 30 distinct types. Wouldn't it be more saleable to think in >> terms of an Array > > > The Python standard library has hundreds of types, considering that > the difference between C types and classes is an implementation detail. > I was thinking of the objects in the types module. >> Suppose one has: >> import numarray.numerictypes as _nt >> >> Then, the editor (PythonWin for example) responds to the entry of >> "_nt." with a drop down menu offering the available types from which >> the user can select one. > > > That sounds interesting, but it looks like this would require specific > support from the editor. > Yes, it is built into Mark Hammond's PythonWin and is a valuable tool. Unfortunately, it is not available for Linux. However, I believe that SciTE and boa-constructor are intended to have the "completion" facility. These open source projects are available both with Linux and Windows. >> I suggest that Numeric3 offers the opportunity to drop the word rank >> from its lexicon. "rank" has an established usage long before >> digital computers. See: http://mathworld.wolfram.com/Rank.html > > > The meaning of "tensor rank" comes very close and was probably the > inspiration for the use of this terminology in array system. Yes: The total number of contravariant and covariant indices of a tensor . The rank of a tensor is independent of the number of dimensions of the space . I was thinking in terms of linear independence, as with Matrix Rank: The rank of a matrix or a linear map is the dimension of the range of the matrix or the linear map , corresponding to the number of linearly independent rows or columns of the matrix, or to the number of nonzero singular values of the map. I guess there has been a tussle between the tensor users and the matrix users for some time. > >> Perhaps some abbreviation for "Dimensions" would be acceptable. 
> > > The equivalent of "rank" is "number of dimensions", which is a bit > long for my taste. Perhaps nDim, numDim or dim would be acceptable. > >> len() seems to be treated as a synonym for the number of >> dimensions. Currently, in numarray, it follows the usual sequence of >> sequences approach of Python and returns the number of rows in a two >> dimensional array. > > > As it should. The rank is given by len(array.shape), which is pretty > much a standard idiom in Numeric code. But I don't see any place in > the PEP that proposes something different! This was probably my misreading of len(T). > >> Rank-0 arrays and Python Scalars >> >> Regarding Rank-0 Question 2. I've already, in effect, answered >> "yes". I'm sure that a more compelling "Pro" could be written > > > Three "pro" argument to be added are: > > - No risk of user confusion by having two types that are nearly but not > exactly the same and whose separate existence can only be explained > by the history of Python and NumPy development. Thanks, history has a pull in favour of retaining the current approach. > > - No problems with code that does explicit typechecks (isinstance(x, > float) > or type(x) == types.FloatType). Although explicit typechecks are > considered > bad practice in general, there are a couple of valid reasons to use > them. > I would see this as supporting the conversion to a scalar. For example: >>> type(type(x)) >>> isinstance(x, float) True >>> isinstance(x, types.FloatType) True >>> > - No creation of a dependency on Numeric in pickle files (though this > could > also be done by a special case in the pickling code for arrays) > >> The "Con" case is valid but, I suggest, of no great consequence. In >> my view, the important considerations are (a) the complexity of >> training the newcomer and (b) whether the added work should be >> imposed on the generic code writer or the end user. I suggest that >> the aim should be to make things as easy as possible for the end user. > > > That is indeed a valid argument. > >> Mapping Iterator >> An example could help here. I am puzzled by "slicing syntax does >> not work in constructors.". > > > Python allows the colon syntax only inside square brackets. x[a:b] and > x[a:b:c] are fine but it is not possible to write iterator(a:b). One > could use iterator[a:b] instead, but this is a bit confusing, as it is > not the iterator that is being sliced. Thanks. It would be nice if a:b or a:b:c could return a slice object. > > Konrad. > Colin W. From stephen.walton at csun.edu Wed Mar 2 09:26:27 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Mar 2 09:26:27 2005 Subject: [Numpy-discussion] bug in pyfits w/ numarray 1.2 In-Reply-To: <200503010942.41026.haase@msg.ucsf.edu> References: <200503010942.41026.haase@msg.ucsf.edu> Message-ID: <4225F6A1.4020901@csun.edu> Sebastian Haase wrote: >Hi, >After upgrading to the latest numarray we get this error from pyfits: > > >>>>a = U.loadFits(fn) >>>> >>>> >Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin/Priithon/useful.py", line 1069, in loadFits > return ff[ slot ].data > Are you sure the value of 'slot' and 'ff' in your code are correct. pyfits 0.9.3 and numarray 1.2.2 seem to work fine for me: In [5]: f=pyfits.open(file) In [6]: v=f[0].data In [7]: v? 
Type: NumArray Base Class: String Form: [[ 221 171 67 ..., 112 -136 12] [ 125 78 159 ..., 249 -345 -260] [ 346 47 250 ..., <...> ..., 206 -106 -127] [ 187 16 218 ..., 342 -243 -59] [ 156 200 279 ..., 138 -209 -230]] Namespace: Interactive Length: 1024 Docstring: Fundamental Numeric Array type The type of each data element, e.g. Int32 byteorder The actual ordering of bytes in buffer: "big" or "little". In [8]: pyfits.__version__ Out[8]: '0.9.3 (June 30, 2004)' In [9]: numarray.__version__ Out[9]: '1.2.2' From southey at uiuc.edu Wed Mar 2 12:15:24 2005 From: southey at uiuc.edu (Bruce Southey) Date: Wed Mar 2 12:15:24 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <245dddb2.d6e3d971.8a87b00@expms6.cites.uiuc.edu> Hi, >>> I suggest that Numeric3 offers the opportunity to drop the word rank >>> from its lexicon. "rank" has an established usage long before >>> digital computers. See: http://mathworld.wolfram.com/Rank.html >> >> >> The meaning of "tensor rank" comes very close and was probably the >> inspiration for the use of this terminology in array system. > >Yes: The total number of contravariant > and covariant > indices of a tensor >. The rank of a tensor > is independent of the number >of dimensions of the space >. > >I was thinking in terms of linear independence, as with Matrix Rank: The >rank of a matrix or a linear >map is the dimension > of the range > of the matrix > or the linear map >, corresponding to the >number of linearly independent > rows or columns >of the matrix, or to the number of nonzero singular values > of the map. > >I guess there has been a tussle between the tensor users and the matrix >users for some time. > If you come from the linear algebra, rank is the column or row space which is not the current usage in numarray but this is the Matlab usage. The matrix rank doesn't exist in numarray (as such, but can be computed) so the only problem for is remembering what rank provides and avoiding it in numarray. >> >>> Perhaps some abbreviation for "Dimensions" would be acceptable. >> >> >> The equivalent of "rank" is "number of dimensions", which is a bit >> long for my taste. > >Perhaps nDim, numDim or dim would be acceptable. > There needs to be a clarification that by dimensions, one does not mean the number of rows and columns etc. However, taking directly from the numarray manual: "The rank of an array A is always equal to len(A.getshape())." So I would guess the best solution is to find out how people actually use the term 'rank' in Numerical Python applications. Regards Bruce From gc238 at cornell.edu Wed Mar 2 13:11:18 2005 From: gc238 at cornell.edu (Garnet Chan) Date: Wed Mar 2 13:11:18 2005 Subject: [Numpy-discussion] PyObject arrays Message-ID: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> Hi All, Do PyObject arrays works, more specifically Numeric arrays of Numeric arrays? I've tried: from Numeric import * mat = zeros([2, 2], PyObject) mat[0, 0] = zeros([2, 2]) which gives ValueError: array too large for destination. It seems to be calling PyArray_CopyObject; I noticed that there was some special code to make arrays of strings work, but not for other objects. 
This is on Python 2.3.4 and Numeric 23.3. Thanks, Garnet Chan From oliphant at ee.byu.edu Wed Mar 2 14:20:28 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 2 14:20:28 2005 Subject: [Numpy-discussion] PyObject arrays In-Reply-To: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> References: <33471.128.253.229.184.1109797814.squirrel@128.253.229.184> Message-ID: <42263BDD.3010503@ee.byu.edu> Garnet Chan wrote: >Hi All, >Do PyObject arrays work, more specifically Numeric arrays of Numeric arrays? > > They probably don't work when the objects are Numeric arrays. It would be nice if they did, but this could take some effort. -Travis From Sebastien.deMentendeHorne at electrabel.com Wed Mar 2 15:24:11 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Wed Mar 2 15:24:11 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 Message-ID: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> > It might be useful to have a Table type where there is a header of some sort to keep track, > for each column, of the column name and the datatype in that column, so that the user > could, optionally, specify validity checks. Another useful type for arrays representing physical values would be an array that keeps, for each dimension, a vector of index values. For instance, an object representing temperature at a given time in a given location would consist of data = N x M array of Float64 = [ [ 23, 34, 23], [ 31, 28, 29] ] first_axis = N array of time = [ "01/01/2004", "02/01/2004" ] second_axis = M array of location = [ "Paris", "New York" ] All slicing operations would equivalently slice the corresponding axis. Assignment between arrays would be axis-coherent (assigning "Paris" in one array to "Paris" in another, while putting NaN or 0 where there is no correspondence). If indexing could also be done via components of *_axis, that would also be useful. Several fields of application could benefit from this (econometrics, Monte Carlo simulation, physical simulation, time series, ...). In fact, most real data consists of values for tuples of general indices (e.g. temperature@("01/01/2004","Paris")). Hmmm, I think I was just thinking aloud :-)
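To keep thinking aloud, here is a bare-bones sketch of such a wrapper class (numarray flavour; only slicing along the first axis is handled, and every name is invented for illustration):

import numarray as na

class LabeledArray:
    """An array plus one sequence of labels per axis (minimal sketch)."""
    def __init__(self, data, axes):
        self.data = na.array(data)
        self.axes = axes                 # e.g. [times, locations]
    def __getitem__(self, s):
        # s is assumed to be a slice; the axis-0 labels are sliced in step
        return LabeledArray(self.data[s],
                            [self.axes[0][s]] + self.axes[1:])

temp = LabeledArray([[23, 34, 23], [31, 28, 29]],
                    [["01/01/2004", "02/01/2004"], ["Paris", "New York"]])
jan = temp[0:1]   # the "01/01/2004" row, with its time label kept alongside

The axis-coherent assignment and the indexing by label would go on top of this, which is where the real work would be.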
From konrad.hinsen at laposte.net Thu Mar 3 00:27:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 00:27:19 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> References: <035965348644D511A38C00508BF7EAEB145CB35D@seacex03.eib.electrabel.be> Message-ID: <6c12559b2562b43c7df9ae564df5443e@laposte.net> On 03.03.2005, at 00:23, Sebastien.deMentendeHorne at electrabel.com wrote: > Another useful type for arrays representing physical values would be an > array that keeps, for each dimension, a vector of index values. For > instance, > an object representing temperature at a given time in a given location > would > consist of > data = N x M array of Float64 = [ [ 23, 34, 23], [ 31, 28, 29] ] > first_axis = N array of time = [ "01/01/2004", "02/01/2004" ] > second_axis = M array of location = [ "Paris", "New York" ] > > All slicing operations would equivalently slice the corresponding axis. That is indeed useful, but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives most flexibility in use. Konrad. From konrad.hinsen at laposte.net Thu Mar 3 00:34:18 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 00:34:18 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <4225F634.1040305@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> Message-ID: <72b45bee60a00e5e61b9538359b98e59@laposte.net> On 02.03.2005, at 18:21, Colin J. Williams wrote: > Sub-types in the sense that, given an instance a of Array, > a.elementType gives us the type of the data elements contained in a. Ah, I see, it's just about how to access the type object. That's not my first worry in design questions. Once you can get the object somehow, you can make it accessible in nearly any way you like. >> The Python standard library has hundreds of types, considering that >> the difference between C types and classes is an implementation >> detail. >> > I was thinking of the objects in the types module. Those are just the built-in types. There are no plans to increase their number. > Yes, it is built into Mark Hammond's PythonWin and is a valuable tool. > Unfortunately, it is not available for Linux. However, I believe > that SciTE and boa-constructor are intended to have the "completion" > facility. These open source projects are available both with Linux > and Windows. The number of Python IDEs seems to be growing all the time - I haven't even heard of those. And I am still using Emacs... >> The equivalent of "rank" is "number of dimensions", which is a bit >> long for my taste. > > Perhaps nDim, numDim or dim would be acceptable. As a variable name, fine. As a pseudo-word in normal language, no. Not for me at least. I like sentences to use real, pronounceable words. >> - No problems with code that does explicit typechecks (isinstance(x, >> float) >> or type(x) == types.FloatType). Although explicit typechecks are >> considered >> bad practice in general, there are a couple of valid reasons to use >> them. >> > I would see this as supporting the conversion to a scalar.
For > example: But technically it isn't, so some code would cease to work. > Thanks. It would be nice if a:b or a:b:c could return a slice object. That would be difficult to reconcile with Python syntax because of the use of colons in the block structure of the code. The parser (and the programmers' brains) would have to handle stuff like if slice == 1:: pass correctly. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cjw at sympatico.ca Thu Mar 3 08:47:34 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 3 08:47:34 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <72b45bee60a00e5e61b9538359b98e59@laposte.net> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> Message-ID: <42273F71.9060005@sympatico.ca> konrad.hinsen at laposte.net wrote: > On 02.03.2005, at 18:21, Colin J. Williams wrote: > [snip] > >>> The Python standard library has hundreds of types, considering that >>> the difference between C types and classes is an implementation >>> detail. >>> >> I was thinking of the objects in the types module. > > > Those are just the built-in types. There are no plans to increase > their number. My understanding was that there was to be a new builtin multiarray/Array class/type which eventually would replace the existing array.ArrayType. Thus, for a time at least, there would be at least one new class/type. In addition, it seemed to be proposed that the new class/type would not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm not too clear on this latter point but Konrad says that there would not be this multiplicity of basic class/type's. > >> Yes, it is built into Mark Hammond's PythonWin and is a valuable >> tool. Unfortunately, it is not available for Linux. However, I >> believe that SciTE and boa-constructor are intended to have the >> "completion" facility. These open source projects are available >> both with Linux and Windows. > > > The number of Python IDEs seems to be growing all the time - I > haven't even heard of those. And I am still using Emacs... Having spent little time with Unices, I'm not familiar with emacs. Another useful facility with PythonWin is that when one enters a class, function or method, followed by "(", the docstring is presented. This is often helpful. Finally, the PythonWin debug facility provides useful context information. Suppose that f1 calls f2 which calls ... fn and that we have a breakpoint in fn, then the current values in each of these contexts is available in a PythonWin panel. > [snip] > >> Thanks. It would be nice if a:b or a:b:c could return a slice object. > > > That would be difficult to reconcile with Python syntax because of > the use of colons in the block structure of the code. The parser (and > the programmers' brains) would have to handle stuff like > > if slice == 1:: > pass > > correctly. > > Konrad. Yes, that it a problem which is not well resolved by requiring that a slice be terminated with a ")", "]", "}" or a space. One of the difficulties is that the slice is not recognized in the current syntax. We have a "slicing" which ties a slice with a primary, but no "slice". 
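(There is at least a spelled-out form already: the builtin slice() constructor returns the same object the colon notation produces inside brackets, e.g. in Python 2.3

>>> s = slice(1, 10, 2)
>>> range(20)[s]
[1, 3, 5, 7, 9]

so it is only the colon syntax itself that is confined to square brackets.)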
Your earlier suggestion that a slice be [a:b:c] is probably better. Then a slicing would be: primary slice which no doubt creates parsing problems. Thomas Wouters proposed a similar structure for a range in PEP204 (http://python.fyxm.net/peps/pep-0204.html), which was rejected. Colin W. From jmiller at stsci.edu Thu Mar 3 09:08:19 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 09:08:19 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 Message-ID: <1109869619.19608.16.camel@halloween.stsci.edu> numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a problem with universal function setup caching which noticeably impaired 1.2.2 small array performance. Get it if you are new to numarray, haven't upgraded to 1.2.2 yet, or use a lot of small arrays. numarray-1.2.3 is here: http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 Thanks to Ralf Juengling for quietly reporting this and working with me to identify and fix the problem. From konrad.hinsen at laposte.net Thu Mar 3 09:33:17 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 3 09:33:17 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue Message-ID: Following a bug report concerning ScientificPython with numarray, I noticed an incompatibility between Numeric and numarray, and I am wondering if this is intentional. In Numeric, the result of a comparison operation is an integer array. In numarray, it is a Bool array. Bool arrays seem to behave like Int8 arrays when arithmetic operations are applied. The net result is that print n.add.reduce(n.greater(n.arange(128), -1)) yields -128, which is not what I would expect. I can see two logically coherent points of views: 1) The Numeric view: comparisons yield integer arrays, which may be used freely in arithmetic. 2) The "logician's" view: comparisons yield arrays of boolean values, on which no arithmetic is allowed at all, only logical operations. The first approach is a lot more pragmatic, because there are a lot of useful idioms that use the result of comparisons in arithmetic, whereas an array of boolean values cannot be used for much else than logical operations. And now for my pragmatic question: can anyone come up with a solution that will work under both Numeric an numarray, won't introduce a speed penalty under Numeric, and won't leave the impression that the programmer had had too many beers? There is the quick hack print n.add.reduce(1*n.greater(n.arange(128), -1)) but it doesn't satisfy the last two criteria. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From haase at msg.ucsf.edu Thu Mar 3 09:49:21 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 09:49:21 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <1109869619.19608.16.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> Message-ID: <200503030948.17122.haase@msg.ucsf.edu> Hi, what is the cvs command to update to the exact same 1.2.3 version using cvs? Also I'm wondering if numarray.__version__ could be more informative about e.g. "1.2" vs. "1.2.2" vs. "1.2.3" ? (What does the 'a' stand for in na.__version__ == '1.2a' ? Does that mean I got it from CVS ? 
) Thanks, Sebastian Haase On Thursday 03 March 2005 09:07, Todd Miller wrote: > numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a > problem with universal function setup caching which noticeably impaired > 1.2.2 small array performance. Get it if you are new to numarray, > haven't upgraded to 1.2.2 yet, or use a lot of small arrays. > > numarray-1.2.3 is here: > > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 > > Thanks to Ralf Juengling for quietly reporting this and working with me > to identify and fix the problem. > > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jmiller at stsci.edu Thu Mar 3 10:34:26 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 10:34:26 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <200503030948.17122.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> Message-ID: <1109874753.19608.24.camel@halloween.stsci.edu> On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > Hi, > what is the cvs command to update to the exact same 1.2.3 version using cvs? % cvs update -r v1_2_3 > Also I'm wondering if numarray.__version__ could be more informative about > e.g. "1.2" vs. "1.2.2" vs. "1.2.3" ? They're already OK I think, just like you're showing above. Do you want something else? > (What does the 'a' stand for in na.__version__ == '1.2a' ? Does that mean I > got it from CVS ? ) The 'a' in 1.2a stands for "optimism". It actually took 1.2, 1.2.1, 1.2.2 to get to 1.2.3. My original plan was 1.2a, pass go, 1.2... it just didn't work out that way. Regards, Todd > Thanks, > Sebastian Haase > > > > On Thursday 03 March 2005 09:07, Todd Miller wrote: > > numarray-1.2.3 is a bugfix release for numarray-1.2.2 which fixes a > > problem with universal function setup caching which noticeably impaired > > 1.2.2 small array performance. Get it if you are new to numarray, > > haven't upgraded to 1.2.2 yet, or use a lot of small arrays. > > > > numarray-1.2.3 is here: > > > > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=32367 > > > > Thanks to Ralf Juengling for quietly reporting this and working with me > > to identify and fix the problem. > > > > > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real users. > > Discover which products truly live up to the hype. Start reading now. 
> > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From haase at msg.ucsf.edu Thu Mar 3 11:14:26 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 11:14:26 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <1109874753.19608.24.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> <1109874753.19608.24.camel@halloween.stsci.edu> Message-ID: <200503031113.15222.haase@msg.ucsf.edu> On Thursday 03 March 2005 10:32, Todd Miller wrote: > On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > > Hi, > > what is the cvs command to update to the exact same 1.2.3 version using > > cvs? > > % cvs update -r v1_2_3 I just did this - but comparing with the 1.2.3 from sourceforge I have some files, e.g. Examples/ufunc/Src/airy.h only in the CVS version !? Thanks, Sebastian Haase From jmiller at stsci.edu Thu Mar 3 11:40:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Mar 3 11:40:22 2005 Subject: [Numpy-discussion] ANN: numarray-1.2.3 In-Reply-To: <200503031113.15222.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503030948.17122.haase@msg.ucsf.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> Message-ID: <1109878692.19608.97.camel@halloween.stsci.edu> On Thu, 2005-03-03 at 14:13, Sebastian Haase wrote: > On Thursday 03 March 2005 10:32, Todd Miller wrote: > > On Thu, 2005-03-03 at 12:48, Sebastian Haase wrote: > > > Hi, > > > what is the cvs command to update to the exact same 1.2.3 version using > > > cvs? > > > > % cvs update -r v1_2_3 > > I just did this - but comparing with the 1.2.3 from sourceforge I have some > files, e.g. > Examples/ufunc/Src/airy.h > only in the CVS version !? airy.h exists now for me on both the CVS head and 1.2.3. airy.h did not always exist throughout the entire pre-release lifespan of version 1.2 so if you did a checkout (cvs checkout numarray or cvs update numarray) and saw 1.2, there's no guarantee what the state of airy.h would have been. CVS versions just tend to be stale. I tag CVS and change the numarray version only when I do a tarball or semi-formal tests involving other people. Also note that CVS can be used with dates rather than version numbers or tags, so there is some recourse even when numarray.__version__ isn't telling the whole story. Regards, Todd From oliphant at ee.byu.edu Thu Mar 3 11:41:24 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 3 11:41:24 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42273F71.9060005@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> Message-ID: <42276807.8020007@ee.byu.edu> Colin J. Williams wrote: > > My understanding was that there was to be a new builtin > multiarray/Array class/type which eventually would replace the > existing array.ArrayType. Thus, for a time at least, there would be > at least one new class/type. The new type will actually be in the standard library. 
For backwards compatibility we will not be replacing the existing array.ArrayType but providing an additional ndarray.ndarray (or some such name -- the name hasn't been finalized yet). > > In addition, it seemed to be proposed that the new class/type would > not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm > not too clear on this latter point but Konrad says that there would > not be this multiplicity of basic class/type's. The arrays have always been homogeneous collections of "something". This 'something' has been indicated by typecodes characters (Numeric) or Python classes (numarray). The proposal is that the "something" that identifies what the homogeneous arrays are collections of will be actual type objects. Some of these type objects are just "organizational types" which help to classify the different kinds of homogeneous arrays. The "leaf-node" types are also the types of new Python scalars that act as a transition layer between ndarrays with their variety of objects and traditional Python bool, int, float, complex, string, and unicode objects which do not "understand" that they could be considered as 0-dimensional arrays. -Travis From haase at msg.ucsf.edu Thu Mar 3 11:41:32 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Mar 3 11:41:32 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in in my C program In-Reply-To: <200503031113.15222.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> Message-ID: <200503031140.21522.haase@msg.ucsf.edu> Hi, After upgrading from numarray 1.1 (now 1.2.3) We get a Segmentation fault in our C++ program on Linux (python2.2,gcc2.95) , gdb says this: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1087498336 (LWP 8279)] 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 (gdb) where #0 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 #1 0x410f905e in deferred_libnumarray_init () at Src/libnumarraymodule.c:149 #2 0x410f98a8 in NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, byteoffset=0, bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ libnumarraymodule.c:636 #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ libwx_gtk-2.4.so #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 To initialize libnumarray I was using this: { // import_libnumarray(); { PyObject *module = PyImport_ImportModule("numarray.libnumarray"); if (!module) Py_FatalError("Can't import module 'numarray.libnumarray'"); if (module != NULL) { PyObject *module_dict = PyModule_GetDict(module); PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); if (PyCObject_Check(c_api_object)) { libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); } else { Py_FatalError("Can't get API for module 'numarray.libnumarray'"); } } } } Any idea ? Thanks, Sebastian Haase From cjw at sympatico.ca Thu Mar 3 12:14:19 2005 From: cjw at sympatico.ca (Colin J. 
Williams) Date: Thu Mar 3 12:14:19 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42276807.8020007@ee.byu.edu> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> <42276807.8020007@ee.byu.edu> Message-ID: <42276FE5.5000303@sympatico.ca> Travis Oliphant wrote: > Colin J. Williams wrote: > >> >> My understanding was that there was to be a new builtin >> multiarray/Array class/type which eventually would replace the >> existing array.ArrayType. Thus, for a time at least, there would be >> at least one new class/type. > > > The new type will actually be in the standard library. For backwards > compatibility we will not be replacing the existing array.ArrayType > but providing an additional ndarray.ndarray (or some such name -- the > name hasn't been finalized yet). > >> >> In addition, it seemed to be proposed that the new class/type would >> not just be Array but Array_with_Int32, Array_with_Float64 etc.. I'm >> not too clear on this latter point but Konrad says that there would >> not be this multiplicity of basic class/type's. > > > The arrays have always been homogeneous collections of "something". > This 'something' has been indicated by typecodes characters (Numeric) > or Python classes (numarray). The proposal is that the "something" > that identifies what the homogeneous arrays are collections of will be > actual type objects. Some of these type objects are just > "organizational types" which help to classify the different kinds of > homogeneous arrays. The "leaf-node" types are also the types of new > Python scalars that act as a transition layer between ndarrays with > their variety of objects and traditional Python bool, int, float, > complex, string, and unicode objects which do not "understand" that > they could be considered as 0-dimensional arrays. > Thanks. This clarifies things. These 'somethingTypes' would presumably not be in the standard library but in some module like Numeric3.numerictypes. Colin W. From stephen.walton at csun.edu Thu Mar 3 17:00:05 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 3 17:00:05 2005 Subject: [Numpy-discussion] bdist-rpm problem Message-ID: <4227B2AC.1080200@csun.edu> Hi, All, A week or so ago, I posted to matplotlib-users about a problem with bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. It turns out there are two problems. One is that even if one has python2.3 and python2.2 installed, bdist_rpm always calls the interpreter named 'python', which is 2.2 on FC1. The other problem is that in bdist_rpm.py there is a set of lines near line 307 which tests if the number of generated RPM files is 1. This fails because all of matplotlib, numeric, numarray and scipy generate a debuginfo RPM when one does 'python setup.py bdist_rpm'. (Why the RPM count doesn't fail with Python 2.3 on FC3 is beyond me, but nevermind.) The patch is at http://opensvn.csie.org/pyvault/rpms/trunk/python23/python-2.3.4-distutils-bdist-rpm.patch and I have verified that after applying this patch to /usr/lib/python2.2/distutils/command/bdist_rpm.py on FC1 that 'python setup.py bdist_rpm' works for numarray 1.2.2, scipy current CVS, and matplotlib 0.72 (after changing setup.py for python2.2 as documented in the latter). It still fails with Numeric 23.6 however for reasons I'm still checking into; the failed "setup.py bdist_rpm" claims that arraytypes.c doesn't exist. 
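For the record, the check in question has roughly this shape -- paraphrased from memory, not the literal distutils source, so take the exact spelling with a grain of salt:

# inside distutils/command/bdist_rpm.py, near line 307 (paraphrase;
# glob and os are imported at the top of the module)
rpms = glob.glob(os.path.join(rpm_dir, "RPMS", "*", "*.rpm"))
if len(rpms) != 1:
    raise DistutilsExecError(
        "unexpected number of RPM files found: %s" % rpms)

The extra -debuginfo package makes the count two and trips the error; the patch above presumably just accounts for that extra file.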
Steve Walton From stephen.walton at csun.edu Thu Mar 3 17:02:42 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 3 17:02:42 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <4227B375.8000200@csun.edu> Stephen Walton wrote: > [bdist_rpm] still fails with Numeric 23.6 however for reasons I'm > still checking into; Posted too soon; this problem is fixed at Numeric 23.7. From pearu at scipy.org Thu Mar 3 17:05:26 2005 From: pearu at scipy.org (Pearu Peterson) Date: Thu Mar 3 17:05:26 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: On Thu, 3 Mar 2005, Stephen Walton wrote: > Hi, All, > > A week or so ago, I posted to matplotlib-users about a problem with > bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. > > It turns out there are two problems. One is that even if one has python2.3 > and python2.2 installed, bdist_rpm always calls the interpreter named > 'python', which is 2.2 on FC1. Using `bdist_rpm --fix-python` should take care of this issue. Pearu From oliphant at ee.byu.edu Thu Mar 3 17:24:41 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 3 17:24:41 2005 Subject: [Numpy-discussion] CVS version of Numeric3 compiles again Message-ID: <4227B88C.1060400@ee.byu.edu> For any that tried to check out the CVS version of numeric3 while it was in a transition state of adding the new Python Scalar Objects, you can now try again and help me test it. The current CVS version of numeric3 builds on linux (there is some magic in the setup.py file to do some autoconfiguration stuff that I would like to see if it works on other platforms). The arrayobject is nearing completion. There are only a couple of things left to do before I can start tackling the ufuncobject (which is part-way transitioned but needs more numarray-inspired fixes). If anyone would like to help, now is a good time, since at least the codebase should compile and you can play with it. Best regards, -Travis From Fernando.Perez at colorado.edu Thu Mar 3 17:25:01 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Mar 3 17:25:01 2005 Subject: [Numpy-discussion] Re: [SciPy-user] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <4227B86E.6070108@colorado.edu> Stephen Walton wrote: > Hi, All, > > A week or so ago, I posted to matplotlib-users about a problem with > bdist_rpm. I'd asked about python 2.3 on Fedora Core 1. > > It turns out there are two problems. One is that even if one has > python2.3 and python2.2 installed, bdist_rpm always calls the > interpreter named 'python', which is 2.2 on FC1. The other problem is You need to 'fix' the python version to be called inside the actual rpm build. From the ipython release script: # A 2.4-specific RPM, where we must use the --fix-python option to ensure that # the resulting RPM is really built with 2.4 (so things go to # lib/python2.4/...) python2.4 ./setup.py bdist_rpm --release=py24 --fix-python > that in bdist_rpm.py there is a set of lines near line 307 which tests > if the number of generated RPM files is 1. This fails because all of > matplotlib, numeric, numarray and scipy generate a debuginfo RPM when > one does 'python setup.py bdist_rpm'. (Why the RPM count doesn't fail > with Python 2.3 on FC3 is beyond me, but nevermind.) 
The patch is at This problem has been fixed in recent 2.3 and 2.4. 2.2 still has it. Best, f From nwagner at mecha.uni-stuttgart.de Fri Mar 4 00:40:27 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Fri Mar 4 00:40:27 2005 Subject: [Numpy-discussion] PyTrilinos - Python interface to Trilinos libraries Message-ID: <42281E8C.60800@mecha.uni-stuttgart.de> Hi all, A new release of Trilinos is available. It includes a Python interface to Trilinos libraries. http://software.sandia.gov/trilinos/release_5.0_notes.html Regards, Nils From konrad.hinsen at laposte.net Fri Mar 4 02:10:21 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 02:10:21 2005 Subject: [Numpy-discussion] bdist-rpm problem In-Reply-To: <4227B2AC.1080200@csun.edu> References: <4227B2AC.1080200@csun.edu> Message-ID: <034e251d7331d04a59b2c6785a094eb5@laposte.net> On Mar 4, 2005, at 1:58, Stephen Walton wrote: > It turns out there are two problems. One is that even if one has > python2.3 and python2.2 installed, bdist_rpm always calls the > interpreter named 'python', which is That can be changed with the option "--python". > 2.2 on FC1. The other problem is that in bdist_rpm.py there is a set > of lines near line 307 which tests if the number of generated RPM > files is 1. This fails because all of matplotlib, numeric, numarray > and scipy generate a debuginfo RPM when one does 'python setup.py > bdist_rpm'. (Why the RPM count doesn't fail with This is a common problem, but it is safe to ignore the error message, the RPMs are fine. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From konrad.hinsen at laposte.net Fri Mar 4 05:39:15 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 05:39:15 2005 Subject: [Numpy-discussion] Some comments on the Numeric3 Draft of 1-Mar-05 In-Reply-To: <42273F71.9060005@sympatico.ca> References: <4224BDB2.5010203@sympatico.ca> <4225F634.1040305@sympatico.ca> <72b45bee60a00e5e61b9538359b98e59@laposte.net> <42273F71.9060005@sympatico.ca> Message-ID: <51fd185dec4aa52157a8d2257c895e7d@laposte.net> On Mar 3, 2005, at 17:46, Colin J. Williams wrote: >> Those are just the built-in types. There are no plans to increase >> their number. > > My understanding was that there was to be a new builtin > multiarray/Array class/type which eventually would replace the > existing array.ArrayType. Thus, for a time Neither the current array type nor the proposed multiarray type are builtin types. They are types defined in modules belonging to the standard library. > Yes, that it a problem which is not well resolved by requiring that a > slice be terminated with a ")", "]", "}" or a space. One of the > difficulties is that the slice is not recognized in the current > syntax. We have a "slicing" which ties a slice with a It is, but in the form of a standard constructor: slice(a, b, c). > Thomas Wouters proposed a similar structure for a range in PEP204 > (http://python.fyxm.net/peps/pep-0204.html), which was rejected. > We would probably face the same problem: a syntax change must matter to many people to have a chance of being accepted. Konrad. 
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cosbys at yahoo.com Fri Mar 4 06:53:49 2005 From: cosbys at yahoo.com (kristen kaasbjerg) Date: Fri Mar 4 06:53:49 2005 Subject: [Numpy-discussion] Problem with dashes and savefig in 0.72.1 In-Reply-To: 6667 Message-ID: <20050304144913.46367.qmail@web52903.mail.yahoo.com> 1) Running the dash_control.py example I get the following error message (the problem is present on both linux and windows installations): Traceback (most recent call last): File "/usr/lib/python2.3/lib-tk/Tkinter.py", line 1345, in __call__ return self.func(*args) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 140, in resize self.show() File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 143, in draw FigureCanvasAgg.draw(self) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line 319, in draw self.figure.draw(self.renderer) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 338, in draw for a in self.axes: a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/axes.py", line 1296, in draw a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 283, in draw lineFunc(renderer, gc, xt, yt) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 543, in _draw_dashed renderer.draw_lines(gc, xt, yt, self._transform) TypeError: CXX: type error. 2) And when using savefig I get: Traceback (most recent call last): File "dash_control.py", line 13, in ? savefig('dash_control') File "/home/camp/s991416/lib/python/matplotlib/pylab.py", line 763, in savefig try: ret = fig.savefig(*args, **kwargs) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 455, in savefig self.canvas.print_figure(*args, **kwargs) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_tkagg.py", line 161, in print_figure agg.print_figure(filename, dpi, facecolor, edgecolor, orientation) File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line 370, in print_figure self.draw() File "/home/camp/s991416/lib/python/matplotlib/backends/backend_agg.py", line 319, in draw self.figure.draw(self.renderer) File "/home/camp/s991416/lib/python/matplotlib/figure.py", line 338, in draw for a in self.axes: a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/axes.py", line 1296, in draw a.draw(renderer) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 283, in draw lineFunc(renderer, gc, xt, yt) File "/home/camp/s991416/lib/python/matplotlib/lines.py", line 543, in _draw_dashed renderer.draw_lines(gc, xt, yt, self._transform)
From jmiller at stsci.edu Fri Mar 4 07:04:31 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Mar 4 07:04:31 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in in my C program In-Reply-To: <200503031140.21522.haase@msg.ucsf.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <1109874753.19608.24.camel@halloween.stsci.edu> <200503031113.15222.haase@msg.ucsf.edu> <200503031140.21522.haase@msg.ucsf.edu> Message-ID: <1109948612.19608.210.camel@halloween.stsci.edu> >From what you're showing me, it looks like libnumarray initialization is failing which makes me suspect a corrupted numarray installation. Here are some things to try: 1. Completely delete your existing site-packages/numarray. Also delete numarray/build then re-install numarray. 2. Delete and re-install your extensions. In principle, numarray-1.2.3 is supposed to be binary compatible with numarray-1.1.1 but maybe I'm mistaken. 3. Hopefully you won't get this far but... a python which works well with gdb can be built from source using ./configure --with-pydebug. So a debug scenario is something like: % tar zxf Python-2.2.3.tar.gz % cd Python-2.2.3 % ./configure --with-pydebug --prefix=$HOME % make % make install % cd .. % tar zxf numarray-1.2.3.tar.gz % cd numarray-1.2.3 % python setup.py install % cd .. % tar zxf your_stuff.tar.gz % cd your_stuff % python setup.py install This makes a debug Python installed in $HOME/bin, $HOME/lib, and $HOME/include. This process is useful for compiling Python itself and extensions with "-g -O0" and hence gdb works better. Besides appropriate compiler switches, debug Python also has more robust object memory management and better tracked reference counting. Debug like this: % setenv PATH $HOME/bin:$PATH # export if you use bash % rehash % gdb python (gdb) run >>> (gdb) l , # to see some code (gdb) p (gdb) up # Move up the stack frame to see where the bogus value came from Regards, Todd On Thu, 2005-03-03 at 14:40, Sebastian Haase wrote: > Hi, > After upgrading from numarray 1.1 (now 1.2.3) > We get a Segmentation fault in our C++ program on Linux > (python2.2,gcc2.95) , gdb says this: > > Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1087498336 (LWP 8279)] > 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > (gdb) where > #0 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > #1 0x410f905e in deferred_libnumarray_init () at Src/libnumarraymodule.c:149 > #2 0x410f98a8 in NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, > type=tFloat32, bufferObject=0x8a03988, byteoffset=0, > bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ > libnumarraymodule.c:636 > #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 > #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ > libwx_gtk-2.4.so > #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 > > > To initialize libnumarray I was using this: > { > // import_libnumarray(); > { > PyObject *module = PyImport_ImportModule("numarray.libnumarray"); > if (!module) > Py_FatalError("Can't import module 'numarray.libnumarray'"); > if (module != NULL) { > PyObject *module_dict = PyModule_GetDict(module); > PyObject *c_api_object = > PyDict_GetItemString(module_dict, "_C_API"); > if (PyCObject_Check(c_api_object)) { > libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); > } else { > Py_FatalError("Can't get API for module 'numarray.libnumarray'"); > } > } > } > } > > Any idea ? > > Thanks, > Sebastian Haase > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From perry at stsci.edu Fri Mar 4 07:49:14 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 4 07:49:14 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On Mar 3, 2005, at 12:31 PM, konrad.hinsen at laposte.net wrote: > Following a bug report concerning ScientificPython with numarray, I > noticed an incompatibility between Numeric and numarray, and I am > wondering if this is intentional. > > In Numeric, the result of a comparison operation is an integer array. > In numarray, it is a Bool array. Bool arrays seem to behave like Int8 > arrays when arithmetic operations are applied. The net result is that > > print n.add.reduce(n.greater(n.arange(128), -1)) > > yields -128, which is not what I would expect. > > I can see two logically coherent points of views: > > 1) The Numeric view: comparisons yield integer arrays, which may be > used freely in arithmetic. > > 2) The "logician's" view: comparisons yield arrays of boolean values, > on which no arithmetic is allowed at all, only logical operations. > > The first approach is a lot more pragmatic, because there are a lot of > useful idioms that use the result of comparisons in arithmetic, > whereas an array of boolean values cannot be used for much else than > logical operations. > > And now for my pragmatic question: can anyone come up with a solution > that will work under both Numeric an numarray, won't introduce a speed > penalty under Numeric, and won't leave the impression that the > programmer had had too many beers? 
There is the quick hack > > print n.add.reduce(1*n.greater(n.arange(128), -1)) > > but it doesn't satisfy the last two criteria. First of all, isn't the current behavior a little similar to Python in that Python Booleans aren't pure either (for backward compatibility purposes)? I think this has come up in the past, and I thought that one possible solution was to automatically coerce all integer reductions and accumulations to Int32 to avoid overflow issues. That had been discussed before and apparently many preferred avoiding automatic promotion (the reductions allow specifying a new type for the reduction, but I don't believe that helps your specific example for code that works for both). Using .astype(Int32) should work for both, right? (or is that too much of a speed hit?) But it is a fair question to ask if arithmetic operations should be allowed on booleans without explicit casts. Perry From stephen.walton at csun.edu Fri Mar 4 09:29:18 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Mar 4 09:29:18 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <42289A75.8090907@csun.edu> Perry Greenfield wrote: > On Mar 3, 2005, at 12:31 PM, konrad.hinsen at laposte.net wrote: > >> print n.add.reduce(n.greater(n.arange(128), -1)) >> >> yields -128, which is not what I would expect. >> >> > > I think this has come up in the past, It has. I think I commented on it some time back, and the consensus was that, as Perry suggested, using .astype(Int32) is the best fix. I think the fact that arithmetic is allowed on booleans without casts is an oversight; standard Python 2.3 allows you to do True+False. Fortran would never let you do .TRUE.+.FALSE. :-) . From konrad.hinsen at laposte.net Fri Mar 4 10:25:43 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 10:25:43 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On 04.03.2005, at 16:44, Perry Greenfield wrote: > First of all, isn't the current behavior a little similar to Python in > that Python Booleans aren't pure either (for backward compatibility > purposes)? Possibly, but the use of boolean scalars and boolean arrays is very different, so that's not necessarily the model to follow. > apparently many preferred avoiding automatic promotion (the reductions > allow specifying a new type for the reduction, but I don't believe > that helps your specific example for code that works for both). Using > .astype(Int32) Right, because it doesn't work with Numeric. > should work for both, right? (or is that too much of a speed hit?) > But it is a Yes, but it costs both time and memory. I am more worried about the memory, since this is one of the few operations that I do mostly with big arrays. Under Numeric, this doubles memory use, costs time, and makes no difference for the result. I am not sure that numarray compatibility is worth that much for me (OK, there is a dose of laziness in that argument as well). > fair question to ask if arithmetic operations should be allowed on > booleans without explicit casts. > What is actually the difference between Bool and Int8? On 04.03.2005, at 18:27, Stephen Walton wrote: > It has. I think I commented on it some time back, and the consensus > was that, as Perry suggested, using .astype(Int32) is the best fix. I > think the fact that arithmetic is allowed on booleans without casts is > an oversight; standard Python 2.3 allows you to do True+False. 
> Fortran would never let you do .TRUE.+.FALSE. :-) . I am in fact not convinced that adding booleans to Python was a very good idea, for exactly that reason: they try to be both booleans and compatible with integers. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From perry at stsci.edu Fri Mar 4 10:51:33 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Mar 4 10:51:33 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: On Mar 4, 2005, at 1:24 PM, konrad.hinsen at laposte.net wrote: > On 04.03.2005, at 16:44, Perry Greenfield wrote: > >> First of all, isn't the current behavior a little similar to Python >> in that Python Booleans aren't pure either (for backward >> compatibility purposes)? > > Possibly, but the use of boolean scalars and boolean arrays is very > different, so that's not necessarily the model to follow. > No, but that some people know that arithmetic can be done with Python Booleans may lead them to think the same should be possible with Boolean arrays (not that should be the sole criteria). >> for both). Using .astype(Int32) > > Right, because it doesn't work with Numeric. > >> should work for both, right? (or is that too much of a speed hit?) >> But it is a > > Yes, but it costs both time and memory. I am more worried about the > memory, since this is one of the few operations that I do mostly with > big arrays. Under Numeric, this doubles memory use, costs time, and > makes no difference for the result. I am not sure that numarray > compatibility is worth that much for me (OK, there is a dose of > laziness in that argument as well). > Hmmm, I'm a little confused here. If the overflow issue is what you are worried about, then use of Int8 for boolean results would still be a problem here. Since Numeric is already likely generating Int32 from logical ufuncs (Int actually), the use of astype(Int) is little different than many of the temporaries that Numeric creates in expressions. I find it hard to believe that this is a make or break issue for Numeric users since it typically generates more temporaries than does numarray. >> fair question to ask if arithmetic operations should be allowed on >> booleans without explicit casts. >> > What is actually the difference between Bool and Int8? > I'm not sure I remember all the differences (Todd can add to this if he remembers better). Booleans are treated differently as array indices than Int8 arrays are. The machinery of generating Boolean results is different in that it forces results to be either 0 or 1. In other words, Boolean arrays should only have 0 or 1 values in those bytes (not that it isn't possible for someone to break this in C code or though undiscovered bugs. Ufuncs that generate different values such as arithmetic operators result in a different type. 
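To pin the behavior down, here is the example from this thread as one session (reconstructed from the reports above rather than re-run, and the spelling of the reduce keyword is from memory):

>>> import numarray as na
>>> a = na.greater(na.arange(128), -1)
>>> a.type()
Bool
>>> na.add.reduce(a)                   # Bool accumulates like Int8: overflow
-128
>>> na.add.reduce(a.astype(na.Int32))  # explicit cast: portable, costs a temporary
128
>>> na.add.reduce(a, type=na.Int32)    # numarray only; no Numeric equivalent
128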
Perry From jmiller at stsci.edu Fri Mar 4 11:44:40 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Mar 4 11:44:40 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <1109965365.21215.324.camel@halloween.stsci.edu> On Fri, 2005-03-04 at 13:50, Perry Greenfield wrote: > On Mar 4, 2005, at 1:24 PM, konrad.hinsen at laposte.net wrote: > > > What is actually the difference between Bool and Int8? > > > I'm not sure I remember all the differences (Todd can add to this if he > remembers better). Booleans are treated differently as array indices > than Int8 arrays are. The machinery of generating Boolean results is > different in that it forces results to be either 0 or 1. Conversions to Bool, logical operations, and (implicitly) comparisons constrain values to 0 or 1. > In other > words, Boolean arrays should only have 0 or 1 values in those bytes > (not that it isn't possible for someone to break this in C code or > though undiscovered bugs. Ufuncs that generate different values such as > arithmetic operators result in a different type. More general arithmetic appears to have unconstrained results. From haase at msg.ucsf.edu Fri Mar 4 11:48:32 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Mar 4 11:48:32 2005 Subject: [Numpy-discussion] Re: ANN: numarray-1.2.3 -- segfault in in my C program In-Reply-To: <1109948612.19608.210.camel@halloween.stsci.edu> References: <1109869619.19608.16.camel@halloween.stsci.edu> <200503031140.21522.haase@msg.ucsf.edu> <1109948612.19608.210.camel@halloween.stsci.edu> Message-ID: <200503041146.50434.haase@msg.ucsf.edu> On Friday 04 March 2005 07:03, Todd Miller wrote: >From what you're showing me, it looks like libnumarray initialization > > is failing which makes me suspect a corrupted numarray installation. It understood it saying it fails in MyApp::OnInit omx_app.cpp:519 while doing: NA_NewAllFromBuffer (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, byteoffset=0, bytestride=0, byteorder=0, aligned=1, writeable=1) the "initialize libnumarray"-stuff is in the 20 lines above that. Do you use NA_NewAllFromBuffer anywhere ? Thanks, Sebastian Haase > Here are some things to try: > > 1. Completely delete your existing site-packages/numarray. Also delete > numarray/build then re-install numarray. > > 2. Delete and re-install your extensions. In principle, > numarray-1.2.3 is supposed to be binary compatible with numarray-1.1.1 > but maybe I'm mistaken. > > 3. Hopefully you won't get this far but... a python which works well > with gdb can be built from source using ./configure --with-pydebug. So > a debug scenario is something like: > > % tar zxf Python-2.2.3.tar.gz > % cd Python-2.2.3 > % ./configure --with-pydebug --prefix=$HOME > % make > % make install > > % cd .. > % tar zxf numarray-1.2.3.tar.gz > % cd numarray-1.2.3 > % python setup.py install > > % cd .. > % tar zxf your_stuff.tar.gz > % cd your_stuff > % python setup.py install > > This makes a debug Python installed in $HOME/bin, $HOME/lib, and > $HOME/include. This process is useful for compiling Python itself and > extensions with "-g -O0" and hence gdb works better. Besides > appropriate compiler switches, debug Python also has more robust object > memory management and better tracked reference counting. 
> > Debug like this: > > % setenv PATH $HOME/bin:$PATH # export if you use bash > % rehash > > % gdb python > (gdb) run > > >>> > > > (gdb) l , # to see some code > (gdb) p > (gdb) up # Move up the stack frame to see where the bogus value came > from > > Regards, > Todd > > On Thu, 2005-03-03 at 14:40, Sebastian Haase wrote: > > Hi, > > After upgrading from numarray 1.1 (now 1.2.3) > > We get a Segmentation fault in our C++ program on Linux > > (python2.2,gcc2.95) , gdb says this: > > > > Program received signal SIGSEGV, Segmentation fault. > > [Switching to Thread 1087498336 (LWP 8279)] > > 0x406d68d5 in PyObject_GetAttrString () from /usr/lib/libpython2.2.so.0.0 > > (gdb) where > > #0 0x406d68d5 in PyObject_GetAttrString () from > > /usr/lib/libpython2.2.so.0.0 #1 0x410f905e in deferred_libnumarray_init > > () at Src/libnumarraymodule.c:149 #2 0x410f98a8 in NA_NewAllFromBuffer > > (ndim=3, shape=0xbffff2e4, type=tFloat32, bufferObject=0x8a03988, > > byteoffset=0, > > bytestride=0, byteorder=0, aligned=1, writeable=1) at Src/ > > libnumarraymodule.c:636 > > #3 0x0805b159 in MyApp::OnInit (this=0x8108f50) at omx_app.cpp:519 > > #4 0x4026f616 in wxEntry () from /jws30/haase/PrLin0/wxGtkLibs/ > > libwx_gtk-2.4.so > > #5 0x0805a91a in main (argc=1, argv=0xbffff414) at omx_app.cpp:247 > > > > > > To initialize libnumarray I was using this: > > { > > // import_libnumarray(); > > { > > PyObject *module = PyImport_ImportModule("numarray.libnumarray"); > > if (!module) > > Py_FatalError("Can't import module > > 'numarray.libnumarray'"); if (module != NULL) { > > PyObject *module_dict = PyModule_GetDict(module); > > PyObject *c_api_object = > > PyDict_GetItemString(module_dict, "_C_API"); > > if (PyCObject_Check(c_api_object)) { > > libnumarray_API = (void **)PyCObject_AsVoidPtr(c_api_object); > > } else { > > Py_FatalError("Can't get API for module > > 'numarray.libnumarray'"); } > > } > > } > > } > > > > Any idea ? > > > > Thanks, > > Sebastian Haase > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real users. > > Discover which products truly live up to the hype. Start reading now. > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From konrad.hinsen at laposte.net Fri Mar 4 12:20:23 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 4 12:20:23 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: References: Message-ID: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> On 04.03.2005, at 19:50, Perry Greenfield wrote: > Hmmm, I'm a little confused here. If the overflow issue is what you > are worried about, then use of Int8 for boolean results would still be > a problem Yes. The question about the difference was just out of curiosity. > here. Since Numeric is already likely generating Int32 from logical > ufuncs (Int actually), the use of astype(Int) is little different than > many of the temporaries that Numeric creates in expressions. I find it > hard to believe It's the same, but it's one more. The only one is some of my large-array code, as I have carefully used the three-argument forms of the binary operators to avoid intermediate results. 
I can't do that for comparisons between float arrays. After some consideration, I think the best solution is a special "sum integer array" function in my Numeric/numarray adaptor module (the one that chooses which module to import). The numarray version can then use the type specifier in the reduction. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From juenglin at cs.pdx.edu Fri Mar 4 17:14:46 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Fri Mar 4 17:14:46 2005 Subject: [Numpy-discussion] Numeric/numarray compatibility issue In-Reply-To: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> References: <9bf6ec57f1072de4cf3cb95052bda9ed@laposte.net> Message-ID: <1109985212.22526.46.camel@alpspitze.cs.pdx.edu> On Fri, 2005-03-04 at 12:18, konrad.hinsen at laposte.net wrote: > On 04.03.2005, at 19:50, Perry Greenfield wrote: > After some consideration, I think the best solution is a special "sum > integer array" function in my Numeric/numarray adaptor module (the one > that chooses which module to import). The numarray version can then use > the type specifier in the reduction. That ufunc.reduce takes an optional type specifier was news to me. Neither the manual nor the on-line help mentions it. ralf From cmeesters at ucdavis.edu Mon Mar 7 00:10:18 2005 From: cmeesters at ucdavis.edu (Christian Meesters) Date: Mon Mar 7 00:10:18 2005 Subject: [Numpy-discussion] Gaussian fits? Sum of Gaussians? Message-ID: <200503070808.j2788wbL029902@diometes.ucdavis.edu> Hi I was wondering whether there are scripts or modules around which can calculate, on a given 1D-Numarray or Numeric array, a sum of Gaussians? E.g. something like: x = sumofgaussians(input_array[, other_parameters]) where x would contain a list of arrays representing Gaussian curves, which, when added together, would result in (a good approximation of) the input_array. It would be nice, of course, if information like the standarddeviation and peak height would be associated with that data. Perhaps I am hoping for too much, but in this case you guys at least had a good laugh when reading these lines ;-) - and I'd had to write something myself or find it in other software ... Thanks a lot in advance, Christian From brendansimons at yahoo.ca Tue Mar 8 08:38:49 2005 From: brendansimons at yahoo.ca (Brendan Simons) Date: Tue Mar 8 08:38:49 2005 Subject: [Numpy-discussion] Re: Gaussian fits? Sum of Gaussians In-Reply-To: 6667 Message-ID: <20050308163749.15581.qmail@web31106.mail.mud.yahoo.com> I am not familiar enough with Stats to say exactly what you need, but you might try looking at the scipy statistics module: http://www.scipy.org/documentation/apidocs/scipy/scipy.stats.html If nothing else, you can use that to generate numeric arrays with gaussian (normal) distributions and add them together. -Brendan > Message: 1 > Date: Mon, 7 Mar 2005 00:08:58 -0800 (PST) > To: numpy-discussion at lists.sourceforge.net > From: "Christian Meesters" > Subject: [Numpy-discussion] Gaussian fits? Sum of > Gaussians? > > > Hi > > I was wondering whether there are scripts or modules > around which can calculate, on a given > 1D-Numarray or Numeric array, a sum of Gaussians? > E.g. 
something like: x = > sumofgaussians(input_array[, other_parameters]) > where x would contain a list of arrays > representing Gaussian curves, which, when added > together, would result in (a good > approximation of) the input_array. It would be nice, > of course, if information like the > standard deviation and peak height would be > associated with that data. > > Perhaps I am hoping for too much, but in this case > you guys at least had a good laugh when > reading these lines ;-) - and I'd have to write > something myself or find it in other software ... > > Thanks a lot in advance, > Christian > > > > --__--__-- > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > End of Numpy-discussion Digest From konrad.hinsen at laposte.net Tue Mar 8 09:53:31 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 8 09:53:31 2005 Subject: [Numpy-discussion] Gaussian fits? Sum of Gaussians? In-Reply-To: <200503070808.j2788wbL029902@diometes.ucdavis.edu> References: <200503070808.j2788wbL029902@diometes.ucdavis.edu> Message-ID: <02ff5d028fe5a68d4bbe8f5f31cf33e8@laposte.net> On Mar 7, 2005, at 9:08, Christian Meesters wrote: > I was wondering whether there are scripts or modules around which can > calculate, on a given 1D-Numarray or Numeric array, a sum of > Gaussians? E.g. something That doesn't look like a well-defined problem. At the very least, you will have to provide the code with the number of Gaussians that you want to fit. But even then, unless your data has particular properties, this is likely to result in an ill-defined fit problem. While the sum of several Gaussians is not a Gaussian itself, it can look very, very similar, depending on the parameter combinations. I don't think that the kind of black-box function you are looking for exists, and I think that in the long run this is good for you. There is code for the tough part of the task, nonlinear curve fitting (both in my ScientificPython library and in SciPy, with different strong and weak points). If you make the effort to formulate your problem in terms that such routines can handle, you can be reasonably sure that you have understood your problem and the solution approach, i.e. you know what you are doing. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Tue Mar 8 14:59:09 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 8 14:59:09 2005 Subject: [Numpy-discussion] Announcement: PyMatrix-0.0.1a Released Message-ID: <422E2E28.9050009@sympatico.ca> PyMatrix is a package to provide access to the functionality of matrix algebra. This package is currently based on numarray. It includes a statistics module which includes a basic analysis of variance. In the future it is hoped to enhance the generality of the divide operation, to add the transcendental functions as methods of the matrix class and to improve the documentation.
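Returning to the Gaussian-fitting thread above: a rough sketch of the approach Konrad recommends, using scipy.optimize.leastsq. Everything here -- the choice of two Gaussians, the parameter layout, and the starting values -- is an assumption the user has to supply, and x and y stand for the data arrays:

    from Numeric import exp, array
    from scipy.optimize import leastsq

    def model(p, x):
        # p holds (height, center, width) triples, one per Gaussian
        total = 0.0 * x
        for i in range(0, len(p), 3):
            height, center, width = p[i], p[i+1], p[i+2]
            total = total + height * exp(-0.5 * ((x - center) / width) ** 2)
        return total

    def residuals(p, x, y):
        return y - model(p, x)

    p0 = array([1.0, 0.0, 1.0,  0.5, 2.0, 1.0])  # starting guesses, two Gaussians
    pfit, ier = leastsq(residuals, p0, args=(x, y))

As Konrad warns, whether the fit converges to anything meaningful depends entirely on the data and on those starting guesses.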
The expectation is that Numeric3 will eventually replace numarray and that this will necessitate some changes to PyMatrix. Downloads in the form of a Windows Installer (Inno) and a zip file are available at: http://www3.sympatico.ca/cjw/PyMatrix An /Introduction to PyMatrix/ is available: http://www3.sympatico.ca/cjw/PyMatrix/IntroToPyMatrix.pdf Information on the functions and methods of the matrix module is given at: http://www3.sympatico.ca/cjw/PyMatrix/Doc/matrix-summary.html Colin W. From oliphant at ee.byu.edu Tue Mar 8 23:34:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 8 23:34:07 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley Message-ID: <422EA691.9080404@ee.byu.edu> I wanted to send an update to this list regarding the meeting at Berkeley that I attended. A lot of good discussions took place at the meeting that should stimulate larger feedback. Personally, I had far more to discuss before I had to leave, and so I hope that the discussions can continue. I was looking to try and understand why, with an increasing number of scientific users of Python, relatively few people actually seem to want to contribute to scipy regularly, even becoming active developers. There are lots of people who seem to identify problems (though very often vague ones), but not many who seem able (either through time or interest constraints) to actually contribute to code, documentation, or infrastructure. Scipy is an open source project and relies on the self-selection process of open source contributors. It would seem that while the scipy conference demonstrates a continuing and even increasing use of Python for scientific computing, not as many of these users are scipy devotees. Why? I think the answers come down to a few issues which I will attempt to answer with proposals. 1) Plotting -- scipy's plotting wasn't good enough (we knew that) and the promised solution (chaco) took too long to emerge as a simple replacement. While the elements were all there for chaco to work, very few people knew that, and nobody stepped up to take chaco to the level that matplotlib, for example, has reached in terms of cross-gui applicability and user-interface usability. Proposal: Incorporate matplotlib as part of the scipy framework (replacing plt). Chaco is not there anymore, and the other two plotting solutions could stay as backward-compatible but non-progressing solutions. I have not talked to John about this, though I would like to. I think if some other packaging issues are addressed we might be able to get John to agree. 2) Installation problems -- I'm not completely clear on what the "installation problems" really are. I hear people talk about them, but Pearu has made significant strides to improve installation, so I'm not sure what precise issues remain. Yes, installing ATLAS can be a pain, but scipy doesn't require it. Yes, fortran support can be a pain, but if you use g77 then it isn't a big deal. The reality, though, is that there is this perception of installation trouble and it must be based on something. Let's find out what it is. Please speak up, users of the world!!!! Proposal (just an idea to start discussion): Subdivide scipy into several super packages that install cleanly but can also be installed separately. Implement a CPAN-or-yum-like repository and query system for installing scientific packages. Base package: scipy_core -- this super package should be easy to install (no Fortran) and should essentially be old Numeric.
It was discussed at Berkeley that very likely Numeric3 should just be included here. I think this package should also include plotting, weave, scipy_distutils, and even f2py. Some of these could live in dual namespaces (i.e. both weave and scipy.weave are available on install).

scipy.traits
scipy.weave (weave)
scipy.plt (matplotlib)
scipy.numeric (Numeric3 -- uses atlas when installed later)
scipy.f2py
scipy.distutils
scipy.fft
scipy.linalg? (include something like lapack-lite for basic but slow functionality; installation of an improved package replaces this with atlas usage)
scipy.stats
scipy.util (everything else currently in scipy_core)
scipy.testing (testing facilities)

Each of these should be a separate package, installable and distributable separately (though there may be co-dependencies, so that scipy.plt would have to be distributed with scipy). Libraries (each separately installable): scipy.lib -- there should be several sub-packages that could live under here. This is simply raw code with basic wrappers (kind of like a /usr/lib):

scipy.lib.lapack -- installation also updates narray and linalg (hooks to do that)
scipy.lib.blas -- installation updates narray and linalg
scipy.lib.quadpack
etc...

Extra sub-packages: named in a hierarchy to be determined and probably each dependent on a variety of scipy-sub-packages. I haven't fleshed this thing out yet, as you can tell. I'm mainly talking publicly to spur discussion. The basic idea is that we should force ourselves to distribute scipy in separate packages. This would force us to implement a yum-or-CPAN-like package repository, so that we define the interface as to how an additional module could be developed by someone, even maintained separately (with a different license), and simply inserted into an intelligent point under the scipy infrastructure. It would also allow installation/compilation issues to be handled on a more per-module basis, so that difficult ones could be noted. I think this would also help interested people get some of the enthought stuff put into the scipy hierarchy as well. Thoughts and comments (and even half-working code) welcomed and encouraged... -Travis O. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 00:33:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 00:33:07 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EB48F.30808@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > It would seem that while the scipy conference demonstrates a continuing > and even increasing use of Python for scientific computing, not as many > of these users are scipy devotees. Why? > > I think the answers come down to a few issues which I will attempt to > answer with proposals. > > 1) Plotting While plotting is important, I don't think that SciPy needs to offer plotting capabilities in order to become successful. Numerical Python doesn't include plotting, and it's hugely popular. I would think that installing Scipy-lite + (selection of SciPy-lib sub-packages) + (your favorite plotting package) separately is acceptable. > 2) Installation problems This is the real problem. I'm one of the maintainers of Biopython (python and C code for computational biology), which relies on Numerical Python. Now that Numerical Python is not being actively maintained, I'd love to be able to direct our users to SciPy instead.
But as long as SciPy doesn't install out of the box with a python setup.py install, it's not viable as a replacement for Numerical Python. I'd spend the whole day dealing with installation problems from Biopython users. There are three other reasons why I have not become a SciPy devotee, although I use Python for scientific computing all the time: 3) Numerical Python already does the job very well. There are few packages in SciPy that I actually need. Special functions would be nice, but it's easier to write your own module than to install SciPy. 4) SciPy looks bloated. It seems to try to do too many things, so that it becomes impossible to maintain SciPy well. 5) Uncertain future. With Numerical Python, we know what we get. I don't know what SciPy will look like in a few years (numarray? Numeric3? Numeric2?) and if it still has a trouble-free installation. So it's too risky for Biopython to go over to SciPy. It's really unfortunate, because my impression is that the SciPy developers are smart people who write good code, which currently is not used as much as it could because of these problems. I hope my comments will be helpful. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 00:52:29 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 00:52:29 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EB917.4090402@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > 1) Plotting -- scipy's plotting wasn't good enough (we knew that) and > the promised solution (chaco) took too long to emerge as a simple > replacement. While the elements were all there for chaco to work, very > few people knew that and nobody stepped up to take chaco to the level > that matplotlib, for example, has reached in terms of cross-gui > applicability and user-interface usability. > I actually looked at Chaco before I started working on pygist (which is now also included in SciPy, I think). My impression was that Chaco was under active development by enthought, and that they were not looking for developers to join in. When Chaco didn't come through, I tried several plotting packages for python that were around at the time, some of which were farther along than Chaco. In the end, I decided to work on pygist instead because it was already working (on unix/linux, at least) and seemed to be a better starting point for a cross-platform plotting package, which pygist is today. The other point is that different plotting packages have different advantages and disadvantages, so you may not be able to find a plotting package that suits everybody's needs. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 01:07:44 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 9 01:07:44 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422EBC9C.1060503@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Proposal (just an idea to start discussion): > > Subdivide scipy into several super packages that install cleanly but can > also be installed separately. Implement a CPAN-or-yum-like repository > and query system for installing scientific packages. Yes! If SciPy could become a kind of scientific CPAN for python from which users can download the packages they need, it would be a real improvement. 
In the end, the meaning of SciPy would evolve into "the website where you can download scientific packages for python" rather than "a python package for scientific computing", and the SciPy developers might not feel OK with that. > Base package: > > scipy_core -- this super package should be easy to install (no Fortran) > and should essentially be old Numeric. It was discussed at Berkeley > that very likely Numeric3 should just be included here. +1. > I think this > package should also include plotting, weave, scipy_distutils, and even > f2py. I think you are underestimating the complexity of plotting software. Matplotlib relies on a number of other packages, which breaks the "easy to install" rule. Pygist doesn't rely on other packages, but (being the pygist maintainer) I know that in practice users can still run into trouble installing pygist (it's a little bit harder than installing Numerical Python). And if you do include pygist with scipy_core anyway, you may find out that some users want matplotlib after all. Since both pygist and matplotlib exist as separate packages, it's better to leave them out of scipy_core, I'd say. --Michiel. From pearu at cens.ioc.ee Wed Mar 9 01:51:13 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Mar 9 01:51:13 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EBC9C.1060503@ims.u-tokyo.ac.jp> Message-ID: On Wed, 9 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > > Proposal (just an idea to start discussion): > > > > Subdivide scipy into several super packages that install cleanly but can > > also be installed separately. Implement a CPAN-or-yum-like repository > > and query system for installing scientific packages. > > Yes! If SciPy could become a kind of scientific CPAN for python from > which users can download the packages they need, it would be a real > improvement. In the end, the meaning of SciPy would evolve into "the website > where you can download scientific packages for python" rather than "a > python package for scientific computing", and the SciPy developers might > not feel OK with that. Personally, I would be OK with that. SciPy as a "download site" does not exclude it also providing a "scipy package" as it is now. I am all in favor of refactoring current scipy modules as much as possible. Pearu From konrad.hinsen at laposte.net Wed Mar 9 02:00:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 9 02:00:28 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EB48F.30808@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422EB48F.30808@ims.u-tokyo.ac.jp> Message-ID: <20d1eba7b3e99120c39fa25ad3ae0aa9@laposte.net> On Mar 9, 2005, at 9:32, Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: >> It would seem that while the scipy conference demonstrates a >> continuing and even increasing use of Python for scientific >> computing, not as many of these users are scipy devotees. Why? >> I think the answers come down to a few issues which I will attempt to >> answer with proposals. >> 1) Plotting > While plotting is important, I don't think that SciPy needs to offer > plotting capabilities in order to become successful. Numerical Python > doesn't include plotting, and it's ... Thanks for your three comments, they reflect exactly my views as well, so I'll just add a "+1" to them.
There is only one aspect I would like to add: predictability of development. Python has become my #1 tool in my everyday research over the last years. I haven't done any scientific computation for at least five years that did not involve some Python code. Which means that I am very much dependent on Python and some Python packages. Moreover, I publish computational methods that I develop in the form of Python code that is used by a community large enough to make support an important consideration. There are only two kinds of computational tools on which I can accept being dependent: those that are supported by a sufficiently big and stable community that I don't need to worry about their disappearance or sudden mutation into something different, and those small enough that I can maintain them in a usable state myself if necessary. Python is in the first category, Numeric in the second. SciPy is not in either one. The proposed division of SciPy into separately installable, maintainable subpackages could make a big difference there. The core could actually be both easily maintainable and supported by a big enough community. So I am all for it, and I expect to contribute to such a looser package collection as well. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From konrad.hinsen at laposte.net Wed Mar 9 02:07:30 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Mar 9 02:07:30 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: On Mar 9, 2005, at 8:32, Travis Oliphant wrote: > 2) Installation problems -- I'm not completely clear on what the > "installation problems" really are. I hear people talk about them, > but Pearu has made significant strides to improve installation, so I'm > not sure what precise issues remain. Yes, installing ATLAS can be a > pain, but scipy doesn't require it. Yes, fortran support can be a > pain, but if you use g77 then it isn't a big deal. The reality, > though, is that there is this perception of installation trouble and > it must be based on something. Let's find out what it is. Please > speak up, users of the world!!!! One more comment on this: Ease of installation depends a lot on the technical expertise of the people doing it. If you see SciPy as a package aimed at computational scientists and engineers, then you can indeed expect them to be able to handle some difficulties (though that doesn't mean that they are willing to if the quantity of trouble is too high). But for me, scientific Python packages are not only modules used by me in my own scripts, but also building blocks in the assembly of end-user applications aimed at non-experts in computation. For example, my DomainFinder tool (http://dirac.cnrs-orleans.fr/DomainFinder) is used mostly by structural biologists. Most people in that community don't even know what a compiler is, so how can I expect them to install g77? Konrad.
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From verveer at embl-heidelberg.de Wed Mar 9 03:00:44 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Wed Mar 9 03:00:44 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <6c9c1490f05fc0812a640a1897574857@embl-heidelberg.de> > Proposal (just an idea to start discussion): > > Subdivide scipy into several super packages that install cleanly but > can also be installed separately. Implement a CPAN-or-yum-like > repository and query system for installing scientific packages. +1, I would be far more inclined to contribute if we could agree on such a structure. > Extra sub-packages: named in a hierarchy to be determined and probably > each dependent on a variety of scipy-sub-packages. > > I haven't fleshed this thing out yet as you can tell. I'm mainly > talking publicly to spur discussion. The basic idea is that we should > force ourselves to distribute scipy in separate packages. This would > force us to implement a yum-or-CPAN-like package repository, so that > we define the interface as to how an additional module could be > developed by someone, even maintained separately (with a different > license), and simply inserted into an intelligent point under the > scipy infrastructure. Two comments: 1) We should consider the issue of licenses. For instance: the python wrappers for GSL and FFTW probably need to be GPL-licensed. These packages definitely need to be part of a repository. There needs to be some kind of a category for such packages, as their license is more restrictive. 2) If there is going to be a repository structure, it should provide for packages that can be installed independently of a scipy hierarchy. Packages that only require a dependency on the Numeric core should not require scipy_core. That makes sense if Numeric3 ever gets into the core Python. Such packages could (and probably should) also live in a dual scipy namespace. Peter From prabhu_r at users.sf.net Wed Mar 9 03:25:30 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Wed Mar 9 03:25:30 2005 Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <16942.56570.375971.565270@monster.linux.in> Hi Travis, >>>>> "TO" == Travis Oliphant writes: TO> I was looking to try and understand why, with an increasing TO> number of scientific users of Python, relatively few people TO> actually seem to want to contribute to scipy regularly, even TO> becoming active developers. There are lots of people who seem TO> to identify problems (though very often vague ones), but not TO> many who seem able (either through time or interest TO> constraints) to actually contribute to code, documentation, or TO> infrastructure. I think there are two issues here. 1. Finding developers. Unfortunately, I'm as clueless as anyone else. It looks to me that most folks who are capable of contributing are already occupied with other projects. The rest use scipy and are quite happy with it (except for the occasional problem).
Others are either heavily invested in other solutions, or don't have the skill or time to contribute. I also think that there are a fair number of users who use scipy at some level or another but are quiet about it and don't have a chance to contribute. From what I can tell, the intersection of the set of people who possess good computing skills and also pursue numerical work from Python is still a very small number compared to other fields. 2. Packaging issues. More on this later. [...] TO> I think the answers come down to a few issues which I will TO> attempt to answer with proposals. TO> 1) Plotting -- scipy's plotting wasn't good enough (we knew I am not sure what this has to do with scipy's utility? Do you mean to say that you'd like to have people starting to use scipy to plot things and then hope that they contribute back to scipy's numeric algorithms? If all they did was to use scipy for plotting, the only contributions would be towards plotting. If you only mean this as a convenience, then this seems like a packaging issue and not related to scipy. Plotting is one part of the puzzle. You don't seem to mention any deficiencies with respect to numerical algorithms. This seems to suggest that apart from things like packaging and docs, the numeric side is pretty solid. Let me take this to an extreme: if plotting is deemed a part of scipy's core, then how about f2py? It is definitely core functionality. So why not make f2py part of scipy? How about g77, g95, and gcc? The only direction this looks to be headed is to make a SciPy OS (== Enthon?). I think we are mixing packaging along with other issues here. To make it clear, I am not against incorporating matplotlib in scipy. I just think that the argument for its inclusion does not seem clear to me. [...] TO> 2) Installation problems -- I'm not completely clear on what TO> the TO> "installation problems" really are. I hear people talk about [...] TO> Proposal (just an idea to start discussion): TO> Subdivide scipy into several super packages that install TO> cleanly but can also be installed separately. Implement a TO> CPAN-or-yum-like repository and query system for installing TO> scientific packages. What does this have to do with scipy per se? This is more like a user convenience issue. [scipy-sub-packages] TO> I haven't fleshed this thing out yet as you can tell. I'm TO> mainly talking publicly to spur discussion. The basic idea is TO> that we should force ourselves to distribute scipy in separate TO> packages. This would force us to implement a yum-or-CPAN-like TO> package repository, so that we define the interface as to how TO> an additional module could be developed by someone, even TO> maintained separately (with a different license), and simply TO> inserted into an intelligent point under the scipy TO> infrastructure. This is in general a good idea but one that goes far beyond scipy itself. Joe Cooper mentioned that he had ideas on how to really do this in a cross-platform way. Many of us eagerly await his solution.
:) regards, prabhu From aisaac at american.edu Wed Mar 9 05:50:32 2005 From: aisaac at american.edu (Alan G Isaac) Date: Wed Mar 9 05:50:32 2005 Subject: [Numpy-discussion] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <16942.56570.375971.565270@monster.linux.in> References: <422EA691.9080404@ee.byu.edu><16942.56570.375971.565270@monster.linux.in> Message-ID: On Wed, 9 Mar 2005, Prabhu Ramachandran apparently wrote: > What does this have to do with scipy per se? This is more > like a user convenience issue. I think the proposal is: development effort is a function of community size, and community size is a function of convenience as well as functionality. This seems right to me. Cheers, Alan Isaac From cdavis at staffmail.ed.ac.uk Wed Mar 9 06:25:25 2005 From: cdavis at staffmail.ed.ac.uk (Cory Davis) Date: Wed Mar 9 06:25:25 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> Message-ID: <1110378257.10146.28.camel@fog> Hi All > I think the proposal is: > development effort is a function of community size, Undeniably true! > and community size is a function of convenience as > well as functionality. > This is only partly true. I think the main barriers to more people using scipy are... 1. Not that many people actually know about it 2. People aren't easily convinced to change from what they were taught to use as an under-graduate (e.g. Matlab, IDL, Mathematica) As it stands, I don't think scipy is particularly inconvenient to install or use. On the two suggested improvements: I think incorporating matplotlib is an excellent idea. But I think the second suggestion of separating Scipy into independent packages will prove to be counter-productive. It might put people off even before they start, because instead of installing one package, they have a bewildering choice of many. And it could prove to be annoying to people using scipy who want to share or distribute code, with "the requirement that both parties have scipy" becoming "a requirement that both parties have a specific combination of scipy packages". Also, another reason why there might be a lack of developers is that there are people like me who find that scipy and matplotlib already do everything that they need. Which is good, right? Cheers, Cory. > This seems right to me. > > Cheers, > Alan Isaac > > > > > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.net > http://www.scipy.net/mailman/listinfo/scipy-user -- )))))))))))))))))))))))))))))))))))))))))))) Cory Davis Meteorology School of GeoSciences University of Edinburgh King's Buildings EDINBURGH EH9 3JZ ph: +44(0)131 6505092 fax +44(0)131 6505780 cdavis at staffmail.ed.ac.uk cory at met.ed.ac.uk http://www.geos.ed.ac.uk/contacts/homes/cdavis )))))))))))))))))))))))))))))))))))))))))))) From southey at uiuc.edu Wed Mar 9 07:24:29 2005 From: southey at uiuc.edu (Bruce Southey) Date: Wed Mar 9 07:24:29 2005 Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley Message-ID: Hi, I fully agree with these comments, but I think there is a user experience aspect as well. This is my little rant (if you want) as a different view, because I really do appreciate the scientific python community.
Please understand that these are issues that I see as problems and do not reflect any negative view of what is available. The basics of Python and numarray (and Numeric almost to the same extent) already provide what most users need, basically the implementation of matrix algorithms. I have not tried SciPy for some time so I really will not address it. So in one sense, what more is there to achieve? :-) For a user to contribute material there are some issues that I tend to think about. As you know, it is usually easier (and quicker with Python) to write your own code than to try to adapt existing code (and there is the bloat issue with code that is unnecessary to the user's needs). The second aspect is being able to contribute that code back into a package - usually this is too hard (coding styles etc.), the user may not have enough programming experience to be able to achieve this, and may not know how to contribute it in the first place. This also gets problematic when items are passed to C or Fortran. My 'job' is not to develop packages but to get results (mainly statistics and bioinformatics). Any free time to do development is usually nonexistent (one has to write papers, for example). I would guess that this is not uncommon for scientific python users. A related issue is missing (or at least not obvious) and inflexible features. For example, I do statistics, and missing (unobserved) values are a problem (cannot mix types, or the missing value code may actually occur). But I can use masked arrays (which really means numarray) to handle this rather nicely. I fully agree with others on directions. From a Python view, if "python setup.py install" doesn't work 'out of the box' then there are big problems. Regards Bruce ---- Original message ---- >Date: Wed, 09 Mar 2005 17:32:15 +0900 >From: Michiel Jan Laurens de Hoon >Subject: Re: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley >To: Travis Oliphant >Cc: SciPy Developers List , scipy-user at scipy.net, numpy-discussion > >Travis Oliphant wrote: >> It would seem that while the scipy conference demonstrates a continuing >> and even increasing use of Python for scientific computing, not as many >> of these users are scipy devotees. Why? >> >> I think the answers come down to a few issues which I will attempt to >> answer with proposals. >> >> 1) Plotting >While plotting is important, I don't think that SciPy needs to offer >plotting capabilities in order to become successful. Numerical Python >doesn't include plotting, and it's hugely popular. I would think that >installing Scipy-lite + (selection of SciPy-lib sub-packages) + (your >favorite plotting package) separately is acceptable. > >> 2) Installation problems >This is the real problem. I'm one of the maintainers of Biopython >(python and C code for computational biology), which relies on Numerical >Python. Now that Numerical Python is not being actively maintained, I'd >love to be able to direct our users to SciPy instead. But as long as >SciPy doesn't install out of the box with a python setup.py install, >it's not viable as a replacement for Numerical Python. I'd spend the >whole day dealing with installation problems from Biopython users. > >There are three other reasons why I have not become a SciPy devotee, >although I use Python for scientific computing all the time: > >3) Numerical Python already does the job very well. There are few >packages in SciPy that I actually need. Special functions would be nice, >but it's easier to write your own module than to install SciPy.
> >4) SciPy looks bloated. It seems to try to do too many things, so that >it becomes impossible to maintain SciPy well. > >5) Uncertain future. With Numerical Python, we know what we get. I don't >know what SciPy will look like in a few years (numarray? Numeric3? >Numeric2?) and if it still has a trouble-free installation. So it's too >risky for Biopython to go over to SciPy. > >It's really unfortunate, because my impression is that the SciPy >developers are smart people who write good code, which currently is not >used as much as it could because of these problems. I hope my comments >will be helpful. > >--Michiel. > >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jh at oobleck.astro.cornell.edu Wed Mar 9 07:43:13 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Wed Mar 9 07:43:13 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> Message-ID: <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu> These were exactly the issues we addressed at SciPy04, and which led to the ASP project. All of the issues brought up in the current discussion have already been discussed there, and with largely the same conclusions. The basic gist is this: THERE ARE THOUSANDS OF PEOPLE WAITING FOR SCIPY TO REACH CRITICAL MASS! SciPy will reach the open-source jumping-off point when an outsider has the following experience: they google, find us, visit us, learn what they'll be getting, install it trivially, and read a tutorial that in less than 15 minutes has them plotting their own data. In that process, which will take less than 45 minutes total, they must also gain confidence in the solidity and longevity of the software and find a supportive community. We don't meet all the elements of this test now. Once we do, people will be ready to jump on and work the open-source magic. The goal of ASP (Accessible SciPy) is to meet that test. Some of what we need is being done already, but by a very small number of people. We need everyone's help to reach a meaningful rate of progress. The main points and their status: 1. Resolve the numeric/numarray split and get at least stubs for the basic routines in the Python core. Nothing scares new users more than instability and uncertainty. Travis O. is now attempting to incorporate numarray's added features (including much of the code that implements them) into numeric, and has made a lot of headway. Perry G. has said that he would switch back to numeric if it did the things numarray does. I think we can foresee a resolution to this split in the calendar year IF that effort stays the course. 2. Package it so that it's straightforward to install on all the popular architectures. Joe Cooper has done a lot here, as have others.
The basic stuff installs trivially on Red Hat versions of Linux, Windows, and several others (including Debian, I think, and Mac, modulo the inherent problems people report with the Mac package managers, which we can do nothing about). Optimized installs are also available and not all that difficult, particularly if you're willing to issue a one-line command to rebuild a source package. For Linux, it was decided to stick with a core and add-on packages, and to offer umbrella packages that install common groups of packages through the dependency mechanism (e.g., for astronomy or biology). The main issue here is not the packaging, but the documentation, which is trivial to write at this point. I was able to do a "yum install scipy" at SciPy04, once I knew where the repository was. It's: http://www.enthought.com/python/fedora/$releasever We need someone to write installation notes for each package manager. We also need umbrella packages. 3. Document it thoroughly for both new and experienced users. Right now what we have doesn't scratch the surface. I mean no offense to those who have written what we do have. We need to update that and to write a lot more and a lot else. Janet Swisher and several others are ready to dig into this, but we're waiting for the numeric/numarray split to resolve. A list of needed documents is in the ASP proposal. 4. Focus new users on a single selection of packages. The variety of packages available to do a particular task is both a strength and a weakness. While experienced people will want choice, new users need simplicity. We will select a single package each application (like plotting), and will mainly describe those in the tutorial-level docs. We will not be afraid to change the selection of packages. You're only a new user once, so it will not affect you if we switch the docs after you've become experienced. For example, Matplotlib was selected at the SciPy04 BoF, but if Chaco ever reaches that level of new-user friendliness, we might switch. Both packages will of course always be available. Neither needs to be in the core on Linux and other systems that have package management. New users will be steered to the "starter" umbrella package, which will pull in any components that are not in the core. Enthon will continue to include all the packages in the world, I'm sure! 5. Provide a web site that is easy to use and that communicates to each client audience. We (me, Perry, Janet, Jon-Eric) were actually gearing up to solicit proposals for improving the site and making it the go-to place for all things numerical in python when Travis started his work on problem #1. This is the next step, but we're waiting for item 1 to finish so that we don't distract everyone's attention from its resolution. Many developers are interested in contributing here, too. If people feel it's time, we can begin this process. I just don't want to slow Travis and his helpers one tiny bit! 6. Catalog all the add-ons and external web sites so that scipy.org becomes the portal for all things numeric in python. This, at least, is done, thanks to Fernando Perez. See: http://www.scipy.org/wikis/topical_software/TopicalSoftware I'll add one more issue: 7. Do something so people who use SciPy, numeric, and numarray remember that these issues are being worked, and where, and how to contribute. To that end, all I can do is post periodically about ASP and encourage you to remember it whenever someone wonders why we haven't hit critical mass yet. Please visit the ASP wiki. 
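For anyone following Joe's "yum install scipy" recipe under point 2 above, the repository URL translates into a stanza along these lines (a sketch: the file name and every setting except the baseurl are assumptions, not taken from Joe's message):

    # /etc/yum.repos.d/scipy.repo (hypothetical file name)
    [scipy]
    name=SciPy packages for Fedora
    baseurl=http://www.enthought.com/python/fedora/$releasever
    enabled=1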
Read the ASP proposal if you haven't, sign up to do something, and do it! Right now, a paltry 6 people have signed up to help out. http://www.scipy.org/wikis/accessible_scipy/AccessibleSciPy The ASP proposal is linked in the first paragraph of the wiki. After giving it some thought, we decided to use scipy-dev at scipy.net as our mailing list, to avoid cross-posted discussions on the 4 mailing lists. Please carry on any further discussion there. Thanks, --jh-- From gr at grrrr.org Wed Mar 9 08:00:34 2005 From: gr at grrrr.org (Thomas Grill) Date: Wed Mar 9 08:00:34 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu> References: <20050309112636.24F99334FE@sc8-sf-spam1.sourceforge.net> <200503091542.j29Fg7nX021779@oobleck.astro.cornell.edu> Message-ID: <422F1D58.1070907@grrrr.org> Hi all, i'd like to introduce myself as a new member to this list. I'm reading about how to gain new users of Scipy - well, let me be the example of one. I'm personally using numarrays for audio work in real-time systems such as Pure Data and Max/MSP. I'm the author of an extension object connecting Python scriptability to these systems (http://grrrr.org/ext/py) - numarray support for audio processing is thus just a logical thing. As an inexperienced user i'm currently concerned about two things: - the dilemma/convergence of Numeric and numarrays: when writing new ufuncs for numarrays, will i be able to use them without much work in Numeric3, in case that's the future? - the lack of SIMD support for ufuncs: i'm used to the power of SSE and Altivec, and browsing through the ufunc code of numarrays, i don't see any implementation of that. For my applications this is pretty much a must - is it possible to implement SIMD support in the current system design, and how will custom-made ufuncs be able to profit from that? best greetings, Thomas From chodgins at predict.com Wed Mar 9 08:51:36 2005 From: chodgins at predict.com (Cindy Hodgins Burian) Date: Wed Mar 9 08:51:36 2005 Subject: [Numpy-discussion] Numeric and ATLAS Message-ID: <422F291C.3010600@predict.com> Matt Hyclak and Stephen Walton posted about this very problem about a month ago, and I hope they're still reading this forum. I'm having the exact same problem when trying to install Numeric-23.7: gcc -pthread -shared -L/usr/local/lib -I/usr/local/include build/temp.linux-x86_64-2.4/Src/lapack_litemodule.o -L/usr/local/atlas/lib/Linux_HAMMER64SSE2_2 -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-x86_64-2.4/lapack_lite.so /usr/bin/ld: /usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a(dgesv.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC /usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a: could not read symbols: Bad value collect2: ld returned 1 exit status error: command 'gcc' failed with exit status 1 I did indeed compile ATLAS with -fPIC. I am new to linux so I'm not sure how to do as Matt said: Sorry for replying to myself. Just for the archives, the problem seems to be that the static lapack/blas libraries provided with RHEL3 are not compiled with -fPIC. I ripped open the rpm and rebuilt it with an -fPIC added in, and all at least compiles now. I'll leave it up to my faculty to tell me whether or not it works :-) Thanks, Matt So I did what Stephen said: Common problem which I just posted to scipy-devel about.
In most variants of RH and Fedora, the RH-provided lapack RPM is only there to satisfy Octave's dependency. If you're not using Octave, you'll be happiest uninstalling both it and the RedHat-provided lapack (rpm -e lapack octave). But I'm still having the same problem. Any insight is greatly appreciated. Thanks. Cindy From prabhu_r at users.sf.net Wed Mar 9 08:55:34 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Wed Mar 9 08:55:34 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> Message-ID: <16943.10714.85815.666793@monster.linux.in> >>>>> "AI" == Alan G Isaac writes: AI> On Wed, 9 Mar 2005, Prabhu Ramachandran apparently wrote: >> What does this have to do with scipy per se? This is more like >> a user convenience issue. AI> I think the proposal is: development effort is a function of AI> community size, and community size is a function of AI> convenience as well as functionality. To put it bluntly, I don't believe that someone who can't install scipy today is really capable of contributing code to scipy. I seriously doubt claims that scipy is scary or hard to install today. Therefore, the real problem does not appear to be convenience, and IMHO neither is functionality the problem. My only point is this. I think Travis and Pearu have been doing a great job! I'd rather see them working on things like Numeric3 and core scipy functionality rather than spend time worrying about packaging, including other new packages, and making things more comfortable for the user (especially when these things are already taken care of). Anyway, Joe's post about ASP's role is spot on! Thanks Joe. More on that thread. cheers, prabhu From stephen.walton at csun.edu Wed Mar 9 09:18:16 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Mar 9 09:18:16 2005 Subject: [Numpy-discussion] Numeric and ATLAS In-Reply-To: <422F291C.3010600@predict.com> References: <422F291C.3010600@predict.com> Message-ID: <422F2F85.5060709@csun.edu> Hi, Cindy, Well, I'm still reading this forum. > /usr/bin/ld: > /usr/local/atlas/lib/Linux_HAMMER64SSE2_2/liblapack.a(dgesv.o): > relocation R_X86_64_32 can not be used when making a shared object; > recompile with -fPIC Unfortunately I'm not on a 64-bit architecture, and this problem may be peculiar to it. I guess to be of more help I'd have to see some outputs from your compilation of both lapack and atlas. Are you using g77? After hassles with Absoft, and finding that g77 was actually better in most instances as measured by the LAPACK timing tests, I used g77 throughout for my LAPACK and ATLAS compiles. For LAPACK, I used the supplied make.inc.LINUX file as make.inc, with the exception that I removed the '-fno-f2c' switch. A simple 'make config' then worked fine for me with ATLAS. The LAPACK compile doesn't seem to use -fPIC, which shouldn't be needed anyway; neither LAPACK nor ATLAS can easily be installed as shared libraries anyway, which is why Fernando Perez has his scripts to build separate versions of Numeric/numarray/scipy/etc. statically linked against various hardware-specific versions of ATLAS. This is a bit of a ramble; hope some of it helps.
Stephen From stephen.walton at csun.edu Wed Mar 9 09:34:29 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Mar 9 09:34:29 2005 Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> Message-ID: <422F335F.8060107@csun.edu> I only have a little to contribute at this point: > Proposal: > Incorporate matplotlib as part of the scipy framework (replacing plt). While this is an admirable goal, I personally find scipy and matplotlib easy to install separately. The only difficulty (of course!) is the numarray/numeric split, so I have to be sure that I select numerix as Numeric in my .matplotlibrc file before typing 'ipython -pylab -p scipy', which actually works really well. > 2) Installation problems -- I'm not completely clear on what the > "installation problems" really are. scipy and matplotlib are both very easy to install. Using ATLAS is the biggest pain, as Travis says, and one can do without it. Now that a simple 'scipy setup.py bdist_rpm' seems to work reliably, I for one am happy. I think splitting scipy up into multiple subpackages isn't such a good idea. Perhaps I'm in the minority, but I find CPAN counter-intuitive, hard to use, and hard to keep track of in an RPM-based environment. Any large package is going to include a lot of stuff most people don't need, but like a NY Times ad used to say, "You might not read it all, but isn't it nice to know it's all there?" I can tell you why I'm not contributing much code to the effort at least in one recent instance. Since I'm still getting core dumps when I try to use optimize.leastsq with a defined Jacobian function, I dove into _minpackmodule.c and its associated routines last night. I'm at sea. I know enough Python to be dangerous, used LMDER from Fortran extensively while doing my Ph.D., and am pretty good at C, but am completely unfamiliar with the Python-C API. So I don't even know how to begin tracking the problem down. Finally, as I mentioned at SciPy04, our particular physics department is at an undergraduate institution (no Ph.D. program), so we mainly produce majors who stop at the B.S. or M.S. degree. Their job market seems to want MATLAB skills, not Python, at the moment, so that's what the faculty are learning and teaching to their students. Many of them/us simply don't have the time to learn Python on top of that. Though, when I showed some colleagues how trivial it was to trim some unwanted bits out of data files they had using Python, I think I converted them. From faltet at carabos.com Wed Mar 9 10:47:19 2005 From: faltet at carabos.com (Francesc Altet) Date: Wed Mar 9 10:47:19 2005 Subject: [Numpy-discussion] Reversing RecArrays Message-ID: <200503091946.11956.faltet@carabos.com> Hi, I would be interested in having a fast way to reverse RecArrays. Regrettably, the most straightforward way to reverse them does not work properly: >>> from numarray import records >>> r = records.array([('Smith', 1234),\ ... ('Johnson', 1001),\ ... ('Williams', 1357),\ ... ('Miller', 2468)], \ ... names='Last_name, phone_number') >>> r[::-1] Traceback (most recent call last): File "", line 1, in ? 
File "/usr/lib/python2.3/site-packages/numarray/records.py", line 749, in __repr__ outlist.append(Record.__str__(i)) File "/usr/lib/python2.3/site-packages/numarray/records.py", line 797, in __str__ outlist.append(`self.array.field(i)[self.row]`) File "/usr/lib/python2.3/site-packages/numarray/records.py", line 736, in field self._fields = self._get_fields() # Refresh the cache File "/usr/lib/python2.3/site-packages/numarray/records.py", line 705, in _get_fields bytestride=_stride) File "/usr/lib/python2.3/site-packages/numarray/strings.py", line 112, in __init__ raise ValueError("Inconsistent string and array parameters.") ValueError: Inconsistent string and array parameters. Anyway, anybody knows if there is some way to achieve this (until this bug would be eventually fixed, of course)? Thanks, -- >qo< Francesc Altet ? ? http://www.carabos.com/ V ?V C?rabos Coop. V. ??Enjoy Data "" From jdhunter at ace.bsd.uchicago.edu Wed Mar 9 10:56:30 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Wed Mar 9 10:56:30 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422EA691.9080404@ee.byu.edu> (Travis Oliphant's message of "Wed, 09 Mar 2005 00:32:33 -0700") References: <422EA691.9080404@ee.byu.edu> Message-ID: >>>>> "Travis" == Travis Oliphant writes: Travis> It would seem that while the scipy conference demonstrates Travis> a continuing and even increasing use of Python for Travis> scientific computing, not as many of these users are scipy Travis> devotees. Why? Hi Travis, I like a lot of your proposal, and I want to throw a couple of additional ideas into the mix. There are two ideas about what scipy is: a collection of scientific algorithms and a general purpose scientific computing environment. On the first front, scipy has been a great success; on the second, less so. I think the following would be crucial to make such an effort a success (some of these are just restatements of your ideas with additional comments) * Easy to install: - it would be probably be important to have a fault-tolerant install so that even if a component fails, the parts that don't depend on that can continue. Matthew Knepley's build system might be an important tool to make this work right for source installs, rather than trying to push distutils too hard. * A package repository and a way of specifying dependencies between the packages and allow automated recursive downloads ala apt-get, yum, etc.... So basically we have to come up with a package manager, and probably one that supports src as well as binary installs. Everyone knows this is a significant problem in python, and we're in a good place to tackle it in that we have experience distributing complex packages across platforms which are a mixture of python/C/C++/FORTRAN, so if we can make it work, it will probably work for all of python. I think we would want contributions from people who do packaging on OSX and win32, eg Bob Ippolito, Joe Cooper, Robert Kern, and others. * Transparent support for Numeric, numarray and Numeric3 built into a compatibility layer, eg something like matplotlib.numerix which enables the user to be shielded from past and future changes in the array package. If you and the numarray developers can agree on that interface, that is an important start, because no matter how much success you have with Numeric3, Numeric 23.x and numarray will be in the wild for some time to come. 
Having all the major players come together and agree on a core interface layer would be a win. In practice, it works well in matplotlib.numerix. * Buy-in from the developers of all the major packages that people want and need to have the CVS / SVN live on a single site which also has mailing lists etc. I think this is a possibility, actually; I'm open to it at least. * Good tutorial, printable documentation, perhaps following a "dive into python" model with a "just-in-time" model of teaching the language; ie, task oriented. A question I think should be addressed is whether scipy is the right vehicle for this aggregation. I know this has been a long-standing goal of yours and appreciate your efforts to continue to make it happen. But there is a lot of residual belief that scipy is hard to install, and this is founded in an old memory that refuses, sometimes irrationally, to die, and in part from people's continued difficulties. If we make a grand effort to unify into a coherent whole, we might be better off with a new name that doesn't carry the difficult-to-install connotation. And easy-to-install should be our #1 priority. Another reason to consider a neutral name is that it wouldn't scare off a lot of people who want to use these tools but don't consider themselves to be scientists. In matplotlib, there are people who just want to make bar and pie charts, and in talks I've given many people are very happy when I tell them that I am interested in providing plotting capabilities outside the realm of scientific plotting. This is obviously a lot to bite off but it could be made viable with some dedicated effort; python is like that. Another concern I have, though, is that it seems to duplicate a lot of the enthought effort to build a scientific python bundle -- they do a great job already for win32 and I think an enthought edition for linux and OSX are in the works. The advantage of your approach is that it is modular rather than monolithic. To really make this work, I think enthought would need to be on board with it. Eg mayavi2 and traits2 are both natural candidates for inclusion into this beast, but both live in the enthought subversion tree. Much of what you describe seems to be parallel to the enthought python, which also provides scipy, numeric, ipython, mayavi, plotting, and so on. I am hesitant to get too involved in the packaging game -- it's really hard and would take a lot of work. We might be better off each making little focused pieces, and let packagers (pythonmac, fink, yum, debian, enthought, ...) do what they do well. Not totally opposed, mind you, just hesitant.... JDH From matthew.brett at gmail.com Wed Mar 9 10:58:29 2005 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed Mar 9 10:58:29 2005 Subject: [Numpy-discussion] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422F335F.8060107@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> Message-ID: <1e2af89e050309105740b7d5e6@mail.gmail.com> Hi, Thanks for the excellent discussion - and this has really been said already, but just for clarity: It seems that SciPy has two intended markets. The first is as a competitor to languages like Matlab and IDL. Here the ideal is that a Matlab IDL user can just google, look, download, install and have something with all the features they are used to sitting and looking at them saying "aren't I beautiful". I guess this is the point of ASP. Such a package will definitely need very good default plotting. 
The second is open-source developers. Until we reach the ideal above, developers will need flexibility and independence of install options to minimize the support they have to offer for SciPy install issues.

So, aren't we suggesting providing a solution for both types of users? If cleverly done, can't we have nicely parsed separate packages for developers to use, which can also be downloaded as one big SciPy install? Over time, we can expect that individual installs will improve until we reach the necessary stability of the full install.

In the meantime, we also have a problem of the perception that efforts in numerical python are widely spread across developers and websites; this makes new users googling for Python and Matlab or IDL nervous. It would be a great help if those writing scientific projects for python could try to use the SciPy home as a base, even if at first the project is rather independent of SciPy itself - IPython being a good example.

Best,

Matthew

From jmiller at stsci.edu Wed Mar 9 13:55:09 2005
From: jmiller at stsci.edu (Todd Miller)
Date: Wed Mar 9 13:55:09 2005
Subject: [Numpy-discussion] Reversing RecArrays
In-Reply-To: <200503091946.11956.faltet@carabos.com>
References: <200503091946.11956.faltet@carabos.com>
Message-ID: <1110405239.524.546.camel@halloween.stsci.edu>

On Wed, 2005-03-09 at 13:46, Francesc Altet wrote:
> Hi,
>
> I would be interested in having a fast way to reverse RecArrays.
> Regrettably, the most straightforward way to reverse them does not
> work properly:
>
> >>> from numarray import records
> >>> r = records.array([('Smith', 1234),\
> ...                    ('Johnson', 1001),\
> ...                    ('Williams', 1357),\
> ...                    ('Miller', 2468)], \
> ...                   names='Last_name, phone_number')
> >>> r[::-1]
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/usr/lib/python2.3/site-packages/numarray/records.py", line 749, in __repr__
>     outlist.append(Record.__str__(i))
>   File "/usr/lib/python2.3/site-packages/numarray/records.py", line 797, in __str__
>     outlist.append(`self.array.field(i)[self.row]`)
>   File "/usr/lib/python2.3/site-packages/numarray/records.py", line 736, in field
>     self._fields = self._get_fields() # Refresh the cache
>   File "/usr/lib/python2.3/site-packages/numarray/records.py", line 705, in _get_fields
>     bytestride=_stride)
>   File "/usr/lib/python2.3/site-packages/numarray/strings.py", line 112, in __init__
>     raise ValueError("Inconsistent string and array parameters.")
> ValueError: Inconsistent string and array parameters.
>
> Anyway, anybody knows if there is some way to achieve this (until this
> bug would be eventually fixed, of course)?

This now works in CVS. The attached reverse() also works against CVS and should work against 1.2.2.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: revrec.py
Type: text/x-python
Size: 708 bytes
Desc: not available
URL:

From konrad.hinsen at laposte.net Wed Mar 9 14:31:15 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Mar 9 14:31:15 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <16943.10714.85815.666793@monster.linux.in>
References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> <16943.10714.85815.666793@monster.linux.in>
Message-ID:

On 09.03.2005, at 17:52, Prabhu Ramachandran wrote:
> To put it bluntly, I don't believe that someone who can't install
> scipy today is really capable of contributing code to scipy. I
True, but not quite to the point. I can install SciPy, but given that most of my code is written with the ultimate goal of being published and used by people with less technical experience, I need to take those people into account when choosing packages to build on.

> I seriously doubt claims that scipy is scary or hard to install today.

I get support questions from people who are not aware that they need root permissions to do "python setup.py install" on a standard Linux system. On that scale of expertise, scipy *is* scary.

Konrad.
-- 
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
---------------------------------------------------------------------

From Chris.Barker at noaa.gov Wed Mar 9 15:33:28 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Wed Mar 9 15:33:28 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To:
References: <422EA691.9080404@ee.byu.edu>
Message-ID: <422F8793.60900@noaa.gov>

John Hunter wrote:
> I think we would want contributions from people who do packaging on
> OSX and win32, eg Bob Ippolito, Joe Cooper, Robert Kern, and others.

Just a note about this. For OS-X, Jack Jansen developed PIMP, and the Package Manager App to go with it. Someone even made a wxPython-based Package Manager app as well. It was designed to be platform independent from the start; I think part of the idea was that if it caught on on the Mac, maybe it would be adopted elsewhere. I think it's worth looking at. However...

The PIMP database maintenance has not been going very well. In fact, to some extent it's been abandoned, and replaced with a set of native OS-X .mpkg files. These are easy to install, and familiar to Mac users. This supports my idea from long ago: what we need is simply a set of packages in a platform-native format: Windows installers, rpms, .debs, .mpkg, etc.

Whenever this comes up, it seems like people focus on nifty technological solutions for a package repository, which makes sense as we're all a bunch of programmers, but I'm not sure it gets the job done. A simple web site where you can download all the installers you need is fine.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
(206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker at noaa.gov

From juenglin at cs.pdx.edu Wed Mar 9 16:36:26 2005
From: juenglin at cs.pdx.edu (Ralf Juengling)
Date: Wed Mar 9 16:36:26 2005
Subject: [Numpy-discussion] comments on array iteration behavior as described in current PEP draft
Message-ID: <1110414917.24560.80.camel@alpspitze.cs.pdx.edu>

>From the current PEP draft:

    1-d Iterator

    A 1-d iterator will be defined that will walk through any array,
    returning a Python scalar at each step. Order of the iteration is
    the same for contiguous and discontiguous arrays. The last index
    always varies the fastest. These 1-d iterators can also be indexed
    and set, in which case the underlying array will be considered 1-d
    (but does not have to be contiguous in memory).

    Mapping Iterator

    ...
    (2) if contains only standard slicing (no index arrays or boolean
    mask arrays), then a view is returned if using a[] while a copy is
    returned using the iterator intermediary. The rule is that
    iterator slicing always produces a copy.
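(For illustration, the quoted rule amounts to the following, sketched with a hypothetical array a; the .flat spelling for "the iterator intermediary" is an assumption here, not settled API:)

    >>> a = zeros((4, 4))
    >>> v = a[1:3]       # standard slicing: v is a view, writes show up in a
    >>> c = a.flat[1:3]  # slicing via the iterator intermediary: c is a copy
    >>> v[0] = 1         # modifies a
    >>> c[0] = 1         # leaves a untouched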
(1) In Python parlance, an iterator is not indexable (cf. the iterator protocol). You probably meant to say "sequence"? (2) Why should the mapiter object not return views under the same circumstances that array indexing operations return views? I can see this being useful and would favor this behavior. Ralf From bob at redivi.com Wed Mar 9 16:52:07 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Mar 9 16:52:07 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: <422F335F.8060107@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> Message-ID: <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> On Mar 9, 2005, at 12:33 PM, Stephen Walton wrote: >> 2) Installation problems -- I'm not completely clear on what the >> "installation problems" really are. > > scipy and matplotlib are both very easy to install. Using ATLAS is > the biggest pain, as Travis says, and one can do without it. Now that > a simple 'scipy setup.py bdist_rpm' seems to work reliably, I for one > am happy. On Mac OS X, using ATLAS should be pretty trivial because the OS already ships with an optimized implementation! The patch I created for Numeric was very short, and I'm pretty sure it's on the trunk (though last I packaged it, I had to make a trivial fix or two, which I reported on sourceforge). I haven't delved into SciPy's source in a really long time, so I'm not sure where changes would need to be made, but I think someone else should be fine to look at Numeric's setup.py and do what needs to be done to SciPy. FYI, matplotlib, the optimized Numeric, and several other Mac OS X packages are available in binary form here: http://pythonmac.org/packages/ > I think splitting scipy up into multiple subpackages isn't such a good > idea. Perhaps I'm in the minority, but I find CPAN counter-intuitive, > hard to use, and hard to keep track of in an RPM-based environment. > Any large package is going to include a lot of stuff most people don't > need, but like a NY Times ad used to say, "You might not read it all, > but isn't it nice to know it's all there?" I also think that a monolithic package is a pretty good idea until it begins to cause problems with the release cycle. Twisted had this problem at 1.3, and went through a major refactoring between then and 2.0 (which is almost out the door). Though Twisted 2.0 is technically many different packages, they still plan on maintaining a "sumo" package that includes all of the Twisted components, plus zope.interface (the only required dependency). There are still several optional dependencies not included, though (such as PyCrypto). SciPy could go this route, and simply market the "sumo" package to anyone who doesn't already know what they're doing. An experienced SciPy user may want to upgrade one particular component of SciPy as early as possible, but leave the rest be, for example. -bob From daishi at egcrc.net Wed Mar 9 17:09:04 2005 From: daishi at egcrc.net (Daishi Harada) Date: Wed Mar 9 17:09:04 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Re[2]: [SciPy-dev] Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <16942.56570.375971.565270@monster.linux.in> <16943.10714.85815.666793@monster.linux.in> Message-ID: <29f97aa647609886b2bfbd27cb66761d@egcrc.net> I'd like to second Konrad's point and restate what I tried to articulate (probably poorly) at SciPy 04. 
How easy it is for me, as a developer, to install SciPy on my particular development platform (in my case OS X and Linux) is not the same as how easy it is to *deploy* an application which uses SciPy as a library to end-user clients (in my case on Windows).

I had originally hoped that having the client simply install Enthon would suffice, but I wanted to use some features from wxPython 2.5.x (perhaps that's what I should have reconsidered). I tried combinations of having the client install packages separately and me using py2exe, but in the end my dependency on SciPy was small enough that it was easiest to just dump SciPy altogether.

Just my 2c. (and I hope that it's clear that I do appreciate all the work that people have done and that I mean no offense by my comments).

On Mar 9, 2005, at 2:30 PM, konrad.hinsen at laposte.net wrote:
> On 09.03.2005, at 17:52, Prabhu Ramachandran wrote:
>
>> To put it bluntly, I don't believe that someone who can't install
>> scipy today is really capable of contributing code to scipy. I
>
> True, but not quite to the point. I can install SciPy, but given that
> most of my code is written with the ultimate goal of being published
> and used by people with less technical experience, I need to take
> those people into account when choosing packages to build on.
>
>> seriously doubt claims that scipy is scary or hard to install today.
>
> I get support questions from people who are not aware that they need
> root permissions to do "python setup.py install" on a standard Linux
> system. On that scale of expertise, scipy *is* scary.
>
> Konrad.

From brendansimons at yahoo.ca Wed Mar 9 18:46:13 2005
From: brendansimons at yahoo.ca (Brendan Simons)
Date: Wed Mar 9 18:46:13 2005
Subject: [Numpy-discussion] Re: Packaging Scipy (was Future directions for SciPy in light of meeting at Berkeley )
In-Reply-To: <20050310011059.559EBF54B@sc8-sf-spam2.sourceforge.net>
References: <20050310011059.559EBF54B@sc8-sf-spam2.sourceforge.net>
Message-ID: <4c92afecf47730e9ec45bbbbc35a32d6@yahoo.ca>

Hear hear. Every time I see a pathname I groan out loud. Does that mean I'm too wussy to be a programmer? Maybe ;) but there are plenty of potential users (call us the matlab crowd) who feel the same way. I'd much rather just grab a binary installer from a website than manage some giant registry. The appearance of python .mpkg bundles on the mac has been a blessing.
-Brendan

On 9-Mar-05, at 8:09 PM, numpy-discussion-request at lists.sourceforge.net wrote:
> Whenever this comes up, it seems like people focus on
> nifty technological solutions for a package repository, which makes
> sense as we're all a bunch of programmers, but I'm not sure it gets the
> job done. a simple web site you can download all the installers you
> need is fine.
>
> -Chris
>
> -- 
> Christopher Barker, Ph.D.
> Oceanographer

From oliphant at ee.byu.edu Wed Mar 9 19:23:17 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 9 19:23:17 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <6916ec732f2e70d1789cc0f480f82e7f@redivi.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com>
Message-ID: <422FBD4A.3030708@ee.byu.edu>

I had a lengthy discussion with Eric today and clarified some things in my mind about the future directions of scipy. The following is basically what we have decided. We are still interested in input, so don't think the issues are closed, but I'm just giving people an idea of my (and Eric's, as far as I understand it) thinking on scipy.

1) There will be a scipy_core package which will be essentially what Numeric has always been (plus a few easy-to-install extras already in current scipy_core). It will likely contain the functionality of the following (the names and placements will be similar to current scipy_core):

   Numeric3 (actually called ndarray or narray or numstar or numerix or something....)
   fft (based on c-only code -- no fortran dependency)
   linalg (a lite version -- no fortran or ATLAS dependency)
   stats (a lite version --- no fortran dependency)
   special (only c-code --- no fortran dependency)
   weave
   f2py? (still need to ask Pearu about this)
   scipy_distutils and testing
   matrix and polynomial classes

   ...others...?

We will push to make this an easy-to-install, effective replacement for Numeric, and hopefully for numarray users as well. Therefore community input and assistance will be particularly important.

2) The rest of scipy will be a package (or a series of packages) of algorithms. We will not try to do plotting as part of scipy. The current plotting in scipy will be supported for a time, but users will be weaned off to other packages: matplotlib, pygist (for xplt -- and I will work to get any improvements for xplt into pygist itself), gnuplot, etc.

3) Having everything under a scipy namespace is not necessary, nor worth worrying about at this point.

My scipy-related focus over the next 5-6 months will be to get scipy_core to the point that most can agree it effectively replaces the basic tools of Numeric and numarray.

-Travis

From eric at enthought.com Wed Mar 9 20:42:18 2005
From: eric at enthought.com (eric jones)
Date: Wed Mar 9 20:42:18 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FBD4A.3030708@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu>
Message-ID: <422FD009.4020706@enthought.com>

Hey Travis,

It sounds like the Berkeley meeting went well. I am glad that the Numeric3 project is going well and looks like it has a good chance to unify the Numeric/Numarray communities. I really appreciate you putting so much effort into its implementation. I also appreciate all the work that Perry, Todd, and the others at StSci have done building Numarray. We've all learned a ton from it.
Most of the plans sound right to me (several questions/comments below). Much of SciPy has been structured in this way already, but we have really never worked to make the core useful as a stand-alone package. Supporting lite and full versions of fft, linalg, and stats sounds potentially painful, but also worthwhile given the circumstances. Now:

1. How much of stats do we lose from removing fortran dependencies?

2. I do question whether weave should really be in this core. I think it was in scipy_core before because it was needed to build some of scipy.

3. Now that I think about it, I also wonder if f2py should really be there -- especially since we are explicitly removing any fortran dependencies from the core.

4. I think keeping scipy an algorithms library and leaving plotting to other libraries is a good plan. At one point, the setup_xplt.py file was more than 1000 lines. It is much cleaner now, but dealing with X11, etc. does take maintenance work. Removing these libraries from scipy would decrease the maintenance effort and leave the plotting to matplotlib, chaco, and others.

5. I think having all the generic algorithm packages (signal, ga, stats, etc. -- basically all the packages that are there now) under the scipy namespace is a good idea. It prevents worry about colliding with other people's packages. However, I think domain-specific libraries (such as astropy) should be in their own namespace and shouldn't be in scipy.

thanks,
eric

Travis Oliphant wrote:
> I had a lengthy discussion with Eric today and clarified some things
> in my mind about the future directions of scipy. The following is
> basically what we have decided. We are still interested in input so
> don't think the issues are closed, but I'm just giving people an idea
> of my (and Eric's as far as I understand it) thinking on scipy.
>
> 1) There will be a scipy_core package which will be essentially what
> Numeric has always been (plus a few easy to install extras already in
> current scipy_core). It will likely contain the functionality of
> (the names and placements will be similar to current scipy_core).
>    Numeric3 (actually called ndarray or narray or numstar or
>    numerix or something....)
>    fft (based on c-only code -- no fortran dependency)
>    linalg (a lite version -- no fortran or ATLAS dependency)
>    stats (a lite version --- no fortran dependency)
>    special (only c-code --- no fortran dependency)
>    weave
>    f2py? (still need to ask Pearu about this)
>    scipy_distutils and testing
>    matrix and polynomial classes
>
>    ...others...?
>
> We will push to make this an easy-to-install effective replacement for
> Numeric and hopefully for numarray users as well. Therefore
> community input and assistance will be particularly important.
>
> 2) The rest of scipy will be a package (or a series of packages) of
> algorithms. We will not try to do plotting as part of scipy. The
> current plotting in scipy will be supported for a time, but users will
> be weaned off to other packages: matplotlib, pygist (for xplt -- and
> I will work to get any improvements for xplt into pygist itself),
> gnuplot, etc.
>
> 3) Having everything under a scipy namespace is not necessary, nor
> worth worrying about at this point.
>
> My scipy-related focus over the next 5-6 months will be to get
> scipy_core to the point that most can agree it effectively replaces
> the basic tools of Numeric and numarray.
>
> -Travis

From mdehoon at ims.u-tokyo.ac.jp Wed Mar 9 23:30:15 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar 9 23:30:15 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <422FBD4A.3030708@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu>
Message-ID: <422FF74F.8000001@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> 1) There will be a scipy_core package which will be essentially what
> Numeric has always been (plus a few easy to install extras already in
> current scipy_core). It will likely contain the functionality of (the
> names and placements will be similar to current scipy_core).
>    Numeric3 (actually called ndarray or narray or numstar or numerix
>    or something....)
>    fft (based on c-only code -- no fortran dependency)
>    linalg (a lite version -- no fortran or ATLAS dependency)
>    stats (a lite version --- no fortran dependency)
>    special (only c-code --- no fortran dependency)

That would be great! If it can be installed as easily as Numerical Python (and I have no reason to believe it won't be), I will certainly point users to this package instead of the older Numerical Python. I'd be happy to help out here, but I guess most of this code is working fine already.

> 2) The rest of scipy will be a package (or a series of packages) of
> algorithms. We will not try to do plotting as part of scipy. The
> current plotting in scipy will be supported for a time, but users will
> be weaned off to other packages: matplotlib, pygist (for xplt -- and I
> will work to get any improvements for xplt into pygist itself),
> gnuplot, etc.

Let me know which improvements from xplt you want to include into pygist. It might also be a good idea to move the pygist web pages to scipy.org.

--Michiel.

From pearu at scipy.org Thu Mar 10 00:50:16 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Thu Mar 10 00:50:16 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FD009.4020706@enthought.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID:

Hi,

To clarify a few technical details:

On Wed, 9 Mar 2005, eric jones wrote:
> 1. How much of stats do we loose from removing fortran dependencies?
> 2. I do question whether weave really be in this core? I think it was in
> scipy_core before because it was needed to build some of scipy.

At the moment scipy does not contain modules that need weave.

> 3. Now that I think about it, I also wonder if f2py should really be there --
> especially since we are explicitly removing any fortran dependencies from the
> core.

f2py is not a fortran-only tool. In scipy it has also been used to wrap C codes (fft, atlas), and imho f2py should be used more whenever possible.

> Travis Oliphant wrote:
>
>> 1) There will be a scipy_core package which will be essentially what
>> Numeric has always been (plus a few easy to install extras already in
>> current scipy_core). It will likely contain the functionality of (the
>> names and placements will be similar to current scipy_core).
>>    Numeric3 (actually called ndarray or narray or numstar or numerix
>>    or something....)
>>    fft (based on c-only code -- no fortran dependency)

Hmm, what would be the default underlying fft library here? Currently in scipy it is Fortran fftpack. And when fftw is available, it is used instead.

>>    linalg (a lite version -- no fortran or ATLAS dependency)

Again, what would be the underlying linear algebra library here? Numeric uses an f2c version of the lite lapack library. Shall we do the same, but wrapping the c codes with f2py rather than by hand? f2c might be useful also in other cases to reduce fortran dependency, but only when it is critical to ease the scipy_core installation.

>>    stats (a lite version --- no fortran dependency)
>>    special (only c-code --- no fortran dependency)
>>    weave
>>    f2py? (still need to ask Pearu about this)

I am not against it; it actually would simplify many things (for scipy users it provides one less dependency to worry about, f2py bug fixes and new features are immediately available, etc). And I can always ship f2py as standalone for non-scipy users.

>>    scipy_distutils and testing
>>    matrix and polynomial classes
>>
>>    ...others...?

There are a few pure python modules (ppimport, machar, pexec, ..) in scipy_base that I have heard to be used as very useful standalone modules.

>> We will push to make this an easy-to-install effective replacement for
>> Numeric and hopefully for numarray users as well. Therefore community
>> input and assistance will be particularly important.
>>
>> 2) The rest of scipy will be a package (or a series of packages) of
>> algorithms. We will not try to do plotting as part of scipy. The current
>> plotting in scipy will be supported for a time, but users will be weaned
>> off to other packages: matplotlib, pygist (for xplt -- and I will work to
>> get any improvements for xplt into pygist itself), gnuplot, etc.

+1 for not doing plotting in scipy.

Pearu

From konrad.hinsen at laposte.net Thu Mar 10 01:09:20 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Thu Mar 10 01:09:20 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To:
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID: <0e6de2eb91964aab6be56def725b0b4a@laposte.net>

On 10.03.2005, at 09:49, Pearu Peterson wrote:
> f2py is not a fortran-only tool. In scipy it has been used to wrap
> also C codes (fft, atlas) and imho f2py should be used more so
> whenever possible.

Good to know. I never looked at f2py because I don't use Fortran any more.

> Hmm, what would be the default underlying fft library here? Currently
> in scipy it is Fortran fftpack. And when fftw is available, it is used
> instead.

How about an f2c version of FFTPACK? Plus keeping the option of using fftw if installed, of course.

> Again, what would be the underlying linear algebra library here?
> Numeric uses f2c version of lite lapack library. Shall we do the same
> but wrapping the c codes with f2py rather than by hand? f2c might be
> useful

I like the idea of the f2c versions because they can easily be replaced by the original Fortran code for more speed. It might even be good to have scipy_core include the Fortran version as well and use it optionally during installation.

Konrad.
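(For illustration, the optional-Fortran arrangement Konrad describes might look roughly like this at the distutils level; the directory names and the g77 test are assumptions for the sketch, not an actual scipy_distutils recipe:)

    import glob
    from distutils.spawn import find_executable

    if find_executable('g77'):
        # A Fortran compiler is available: build the original Fortran sources.
        fftpack_sources = glob.glob('fftpack/*.f')
    else:
        # Fall back to the f2c-translated C sources; no Fortran needed.
        fftpack_sources = glob.glob('fftpack_f2c/*.c')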
-- 
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
---------------------------------------------------------------------

From faltet at carabos.com Thu Mar 10 01:45:24 2005
From: faltet at carabos.com (Francesc Altet)
Date: Thu Mar 10 01:45:24 2005
Subject: [Numpy-discussion] Reversing RecArrays
In-Reply-To: <1110405239.524.546.camel@halloween.stsci.edu>
References: <200503091946.11956.faltet@carabos.com> <1110405239.524.546.camel@halloween.stsci.edu>
Message-ID: <200503101044.34038.faltet@carabos.com>

On Wednesday 09 March 2005 22:53, Todd Miller wrote:
> > I would be interested in having a fast way to reverse RecArrays.
> > Regrettably, the most straightforward way to reverse them does not
>
> This now works in CVS. The attached reverse() also works against CVS
> and should work against 1.2.2.

Todd, your workaround works pretty well. Many thanks!

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From nico at logilab.fr Thu Mar 10 04:01:30 2005
From: nico at logilab.fr (Nicolas Chauvat)
Date: Thu Mar 10 04:01:30 2005
Subject: [Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley
In-Reply-To: <422EA691.9080404@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu>
Message-ID: <20050310120035.GI27725@crater.logilab.fr>

Hello,

On Wed, Mar 09, 2005 at 12:32:33AM -0700, Travis Oliphant wrote:
> Subdivide scipy into several super packages that install cleanly but can
> also be installed separately. Implement a CPAN-or-yum-like repository
> and query system for installing scientific packages.

Please don't try to reinvent a repository and installation system specific to scipy. Under unix, distribution and package systems are already solving this problem. Python folks have already reinvented part of the wheel with a PythonPackageIndex that can be updated in one command using distutils. If your goal is to have a unique reference for scientific tools, I think it would be better to set up a Python Scientific Package Index, or just use the existing one at http://www.python.org/pypi/

Packaging/Installation/Querying/Upgrading is a complex task better left to dedicated existing tools, namely apt-get/yum/urpmi/portage/etc.

Regarding subdividing scipy into several packages installable separately under the same scipy base namespace umbrella, you should be aware that PyXML has had many problems doing the same (but PyXML has also been occulting existing parts of the standard library, which may feel a bit too weird).

-- 
Nicolas Chauvat

logilab.fr - advanced computing services and knowledge management

From perry at stsci.edu Thu Mar 10 07:02:13 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Mar 10 07:02:13 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <422FD009.4020706@enthought.com>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com>
Message-ID:

On Mar 9, 2005, at 11:41 PM, eric jones wrote:
>
> 2. I do question whether weave really be in this core? I think it was
> in scipy_core before because it was needed to build some of scipy.
> 3.
> Now that I think about it, I also wonder if f2py should really be
> there -- especially since we are explicitly removing any fortran
> dependencies from the core.

It would seem to me that so long as: 1) both these tools have very general usefulness (and I think they do), and 2) they are not installation problems (I don't believe they are, since they themselves don't require any compilation of Fortran, C++ or whatever -- am I wrong on that?), they are perfectly fine to go into the core. In fact, if they are used by any of the extra packages, they should be in the core, to eliminate the extra step in the installation of those packages.

Perry

From perry at stsci.edu Thu Mar 10 07:30:30 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Mar 10 07:30:30 2005
Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core
Message-ID:

On March 7th Travis Oliphant and Perry Greenfield met Guido and Paul Dubois to discuss some issues regarding the inclusion of an array package within core Python. The following represents thoughts and conclusions regarding our meeting with Guido. They in no way represent the order of discussion with Guido, and some of the points we raise weren't actually mentioned during the meeting, but instead were spurred by subsequent discussion after the meeting with Guido.

1) Including an array package in the Python core.

To start, before the meeting we both agreed that we did not think this was itself a high priority. Rather, we both felt that the most important issue was making arrays an acceptable and widely supported interchange format (it may not be apparent to some that this does not require arrays be in the core; more on that later). In discussing the desirability of including arrays in the core with Guido, we quickly came to the conclusion that not only was it not important, but that in the near term (the next couple of years and possibly much longer) it would be a bad thing to do. This is primarily because it would mean that updates to the array package would wait on Python releases, potentially delaying important bug fixes, performance enhancements, or new capabilities greatly. Neither of us envisions any scenario regarding array packages, whether that be Numeric3 or numarray, where we would consider it to be something that would not *greatly* benefit from decoupling its release needs from those of Python (it's also true that it possibly introduces complications for Python releases if they need to synch with array schedules, but being inconsiderate louts, we don't care much about that). And when one considers that the move to multicore and 64-bit processors will introduce the need for significant changes in the internals to take advantage of these capabilities, it is unlikely we will see a quiescent, maintenance-level state for an array package for some time. In short, this issue is a distraction at the moment and will only sap energy from what needs to be done to unify the array packages.

So what about supporting arrays as an interchange format? There are a number of possibilities to consider, none of which require inclusion of arrays into the core. It is possible for 3rd-party extensions to optionally support arrays as an interchange format through one of the following mechanisms:

a) So long as the extension package has access to the necessary array include files, it can build the extension to use the arrays as a format without actually having the array package installed.
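(At the Python level, the runtime side of this scheme -- spelled out in the next few sentences -- is just the familiar optional-import pattern; a minimal sketch, with the helper name invented for illustration:)

    try:
        import Numeric
    except ImportError:
        Numeric = None

    def _require_array_package():
        # Called only by the code paths that were asked to produce arrays.
        if Numeric is None:
            raise ImportError("array support was requested, but no array "
                              "package is installed")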
The include files alone could be included into the core (Guido has previously been receptive to doing this, though at this meeting he didn't seem quite as receptive, instead suggesting the next option) or could be packaged with the extension (we would prefer the former, to reduce the possibility of many copies of include files). The extension could then be successfully compiled without actually having the array package present. The extension, when requested to use arrays, would see if it could import the array package; if not, then all use of arrays would result in exceptions. The advantage of this approach is that it does not require that arrays be installed before the extension is built for arrays to be supported. It could be built, and then later the array package could be installed and no rebuilding would be necessary.

b) One could modify the extension build process to see if the package is installed and the include files are available; if so, it is built with the support, otherwise not. The advantage of this approach is that it doesn't require the include files to be included with the core or be bundled with the extension, thus avoiding any potential version mismatches. The disadvantage is that later adding the array package requires the extension to be rebuilt, and it results in a more complex build process (more things to go wrong).

c) One could provide the support at the Python level by instead relying on the use of buffer objects by the extension at the C level, thus avoiding any dependence on the array C api. So long as the extension has the ability to return buffer objects containing the putative array data to the Python level, along with the necessary meta information (in this case, the shape, type, and other info, e.g., byteswapping, necessary to properly interpret the array), the extension can provide its own functions or methods to convert these buffer objects into arrays without copying of the data in the buffer object. The extension can try to import the array package, and if it is present, provide arrays as a data format using this scheme. In many respects this is the most attractive approach. It has no dependencies on include files, build order, etc.

This approach led to the suggestion that Python develop a buffer object that could contain meta information, and a way of supporting community conventions (e.g., a name attribute indicating which convention was being used) to facilitate the interchange of any sort of binary data, not just arrays. We also concluded that it would be nice to be able to create buffer objects from Python with malloced memory (currently one can only create buffer objects from other objects that already have memory allocated; there is no way of creating newly allocated, writable memory from Python within a buffer object; one can create a buffer object from a string, but it is not writable). Nevertheless, if an extension is written in C, none of these changes are necessary to make use of this mechanism for interchange purposes now. This is the approach we recommend trying. The obvious test case to apply it to is PIL. We should do this ourselves and offer it as a patch to PIL. Other obvious cases are to support image interchange for GUIs (e.g., wxPython) and OpenGL.

2) Scalar support, rank-0 and related.
Travis and I agreed (we certainly seek comments on this conclusion; we may have forgotten about key arguments for one of the different approaches) that the desirability of using rank-0 arrays as return values from single-element indexing depends on other factors, most importantly Python's support for scalars in various respects. This is a multifaceted issue that will need to be determined by considering all the facets simultaneously. The following tries to list the pros and cons previously discussed for returning scalars (two cases previously discussed) or rank-0 arrays (input welcomed).

a) return only existing Python scalar types (cast upwards except for long long and long double based types)

   Pros:
   - What users probably expect (except matlab users!)
   - No performance hit in subsequent scalar expressions
   - faster indexing performance (?)

   Cons:
   - Doesn't support array attributes, numeric behaviors
   - What do you return for long long and long double? No matter what
     is done, you will either lose precision or lose consistency. Or
     you create a few new Python scalar types for the unrepresentable
     types? But, with subclassing in C, the effort to create a few
     scalar types is very close to the effort to create many.

b) create new Python scalar types and return those (one for each basic array type)

   Pros:
   - Exactly what numeric users expect in representation
   - No performance hit in subsequent scalar expressions
   - faster indexing performance
   - Scalars have the same methods and attributes as arrays

   Cons:
   - Might require great political energy to eventually get the
     arraytype with all of its scalartype-children into the Python
     core. This is really an unknown, though, since if the arrayobject
     is in the standard module and not in the types module, then people
     may not care (a new type is essentially a new-style class, and
     there are many, many classes in the Python standard library). A
     good scientific-packaging solution that decreases the desirability
     of putting the arrayobject into the core would help alleviate this
     problem as well.
   - By itself it doesn't address different numeric behaviors for the
     "still-present" Python scalars throughout Python.

c) return rank-0 arrays

   Pros:
   - supports all array behaviors, particularly with regard to
     numerical processing, particularly with regard to ieee exception
     handling (a matter of some controversy; some would like it also to
     be len()=1 and support [0] indexing, which strictly speaking
     rank-0 arrays should not support)

   Cons:
   - Performance hit on all scalar operations (e.g., if one then does
     many loops over what appears to be a pure scalar expression, use
     of rank-0 will be much slower than Python scalars, since use of
     arrays incurs significant overhead).
   - Doesn't eliminate the fact that one can still run into different
     numerical behavior involving operations between Python scalars.
   - Still necessary to write code that must deal with Python scalars
     "leaking" into code as inputs to functions.
   - Can't currently be used to index sequences (so not completely
     usable in place of scalars).

Out of this came two potential needs (the first isn't strictly necessary if approach a is taken, but could help smooth the use of all integer types as indexes if approach b is taken):

If rank-0 arrays are returned, then Guido was very receptive to supporting a special method, __index__, which would allow any Python object to be used as an index to a sequence or mapping object. Calling this would return a value suitable as an index if the object was not itself suitable directly.
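(A sketch of the proposed protocol; the class here is a toy stand-in, but the __index__ hook itself is essentially what later entered Python 2.5 via PEP 357:)

    class Rank0Int:
        """Toy stand-in for a rank-0 integer array (illustrative only)."""
        def __init__(self, value):
            self.value = value
        def __index__(self):
            # The interpreter would call this to obtain a real Python integer
            # whenever the object is used as a sequence index.
            return int(self.value)

    seq = ['a', 'b', 'c', 'd']
    i = Rank0Int(2)
    # With core support for __index__, seq[i] would yield 'c' instead of
    # raising TypeError.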
Thus rank-0 arrays would have this method called to convert their internal integer value into a Python integer. There are some details about how this would work at the C level that need to be worked out. This would allow rank-0 integer arrays to be used as indices. To be useful, it would be necessary to get this into the core as quickly as possible (if there are lingering C API issues that won't be solved right away, then a greatly delayed implementation in Python would make this less than useful).

We talked at some length about whether it was possible to change Python's numeric behavior for scalars, namely support for configurable handling of numeric exceptions in the way numarray does it (and Numeric3 as well). In short, not much was resolved. Guido didn't much like the stack approach to the exception-handling mode. His argument (a reasonable one) was that even if the stack allowed pushing and popping modes, it was fragile for two reasons. If one called functions in other modules that were written without knowledge that the mode could be changed, those functions presumed the previous behavior and thus could be broken by a mode change (though we suppose that just puts the burden on the caller to guard all external calls with restores to default behavior; even so, many won't do that, leading to spurious bug reports that may annoy maintainers to no end through no fault of their own). He also felt that some termination conditions may cause missed pops, leading to incorrect modes. He suggested studying decimal's use of context to see if it could be used as a model. Overall he seemed to think that setting the mode on a per-module basis was a better approach. Travis and I wondered about how that could be implemented (it seems to imply that the exception handling needs to know what module or namespace is being executed in order to determine the mode), so some more thought is needed regarding this.

The difficulty of proposing such changes and getting them accepted is likely to be considerable. But Travis had a brilliant idea (some may see this as evil but I think it has great merit). Nothing prevents a C extension from hijacking the existing Python scalar objects' behaviors. Once a reference is obtained to an integer, float or complex value, one can replace the table of operations on those objects with whatever code one wishes. In this way an array package could (optionally) change the behavior of Python scalars. In this way we could test the behavior of proposed changes quite easily, distribute that behavior quite easily in the community, and ultimately see if there are really any problems, without expending any political energy to get it accepted. Seeing that it really worked (without "forking" Python, either) would place us in a much stronger position to have the new behaviors incorporated into the core. Even then, it may never prove necessary, if the behavior can be customized this way by the array package. This holds out the potential of making scalar/array behavior much more consistent. Doing this may allow option a) as the ultimate solution, i.e., no changes needed to Python at all (as such), and no rank-0 arrays. This will be studied further. One possible issue is that adding the necessary machinery to make numeric scalar processing consistent with that of the array package may introduce significant performance penalties (what is negligible overhead for arrays may not be for scalars).
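(The stack approach in question is essentially what numarray already exposes; from memory, and with the exact keyword names therefore approximate, the fragile pattern Guido worried about looks like this:)

    import numarray

    numarray.Error.pushMode(dividebyzero="warn", overflow="raise")
    try:
        pass  # ... code that assumes these settings ...
    finally:
        # A missed popMode() -- e.g., in code not written with mode changes
        # in mind -- is exactly the failure mode discussed above.
        numarray.Error.popMode()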
One last comment is that it is unlikely that any choice in this area removes the need for added helper functions in the array package to assist in writing code that works well with both scalars and arrays. There are likely a number of such issues. A common approach is to wrap all unknown objects with "asarray". This works reasonably well, but doesn't handle the following case: if you wish to write a function that will accept arrays or scalars, in principle it would be nice to return scalars if all that was supplied were scalars. So functions to help determine what the output type should be, based on the inputs, would be helpful -- for example, to distinguish between someone providing a rank-0 array (or rank-1 len-1 array) as an input and an actual scalar, if asarray happens to map these to the same thing, so that the return can properly be a scalar if that is what was originally input. Other such tools may help in writing code that allows the main body to treat all objects as arrays without needing checks for scalars.

Other miscellaneous comments: the old use of where() may be deprecated and only the "nonzero" interpretation will be kept. A new function will be defined to replace the old usage of where() (we deem that regular-expression search and replaces should work pretty well to make changes in almost all old code). With the use of buffer objects, tostring methods are likely to be deprecated.

Python PEPs needed
==================

From the discussions it was clear that at least two Python PEPs need to be written and implemented, but that these need to wait until the unification of the arrayobject takes place.

PEP 1: Insertion of an __index__ special method, and an as_index slot (perhaps in the as_sequence methods) in the C-level typeobject, into Python.

PEP 2: Improvements on the buffer object and the buffer builtin so that buffer objects can be Python-tracked wrappers around allocated memory that extension packages can use and share. Two extensions are considered so far: 1) the buffer objects have a meta attribute so that meta information can be passed around in a unified manner, and 2) the buffer builtin should take an integer giving the size of the writeable buffer object to create.

From jh at oobleck.astro.cornell.edu Thu Mar 10 08:46:24 2005
From: jh at oobleck.astro.cornell.edu (Joe Harrington)
Date: Thu Mar 10 08:46:24 2005
Subject: [Numpy-discussion] Re: Notes from meeting with Guido regarding inclusion of array package in Python core
In-Reply-To: <20050310153125.D7F6088827@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net)
References: <20050310153125.D7F6088827@sc8-sf-spam1.sourceforge.net>
Message-ID: <200503101645.j2AGjopf019350@oobleck.astro.cornell.edu>

It never rains, but it pours! Thanks for talking with Guido and hammering out these issues and options.

You are of course right that the release-schedule issue is enough to keep us out of the Python core for the time being (and matplotlib out of scipy, according to JDH at SciPy04, for the same reason). However, I think we should still work strongly to put it there eventually. For now, this means keeping it "acceptable", and communicating with Guido often to get his feedback and let him know what we are doing. There are three reasons I see for this. First, having it core-acceptable makes it clear to potential users that this is standard, stable, well-thought-out stuff. Second, it will mean that numerical behavior and plain python behavior will be as close as possible, so it will be easiest to switch between the two.
Third, if we don't strive for acceptability, we will likely run into a problem in the future when something we depend on is deprecated or changed. No doubt this will happen anyway, but it will be worse if we aren't tight with Guido. Conversely, if we *are* tight with Guido, he is likely to be aware of our concerns and take them into account when making decisions about the Python core.

--jh--

From stephen.walton at csun.edu Thu Mar 10 09:35:01 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Thu Mar 10 09:35:01 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <422FBD4A.3030708@ee.byu.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu>
Message-ID: <423084F6.7020804@csun.edu>

Can I put in a good word for Fortran? Not the language itself, but the available packages for it. I've always thought that one of the really good things about Scipy was the effort put into getting all those powerful, well tested, robust Fortran routines from Netlib inside Scipy. Without them, it seems to me that folks who just install the new scipy_base are going to re-invent a lot of wheels.

Is it really that hard to install g77 on non-Linux platforms?

Steve Walton

From konrad.hinsen at laposte.net Thu Mar 10 10:47:01 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Thu Mar 10 10:47:01 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <423084F6.7020804@csun.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu>
Message-ID:

On Mar 10, 2005, at 18:33, Stephen Walton wrote:
> Can I put in a good word for Fortran? Not the language itself, but
> the available packages for it. I've always thought that one of the
> really good things about Scipy was the effort put into getting all
> those powerful, well tested, robust Fortran routines from Netlib
> inside Scipy. Without them, it seems to me that folks who just
> install the new scipy_base are going to re-invent a lot of wheels.
>
> Is it really that hard to install g77 on non-Linux platforms?

It takes some careful reading of the instructions, which in turn requires a good command of the English language, including some peculiar technical terms, and either some experience in software installation or a high intimidation threshold.

It also takes a significant amount of time and disk space.

Konrad.
-- 
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
---------------------------------------------------------------------

From Chris.Barker at noaa.gov Thu Mar 10 11:26:29 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Mar 10 11:26:29 2005
Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core
In-Reply-To:
References:
Message-ID: <4230D60A.9050108@noaa.gov>

Perry Greenfield wrote:
> So what about supporting arrays as an interchange format?

I'd like to see some kind of definition of what this means, or maybe a set of examples, to help clarify this discussion. I'll start with my personal example: wxPython has a number of methods that can potentially deal with large datasets being passed between Python and C++.
My personal example is drawing routines. For instance, drawing a large polyline or set of many points. When I need these, I invariably use NumPy arrays to store and manipulate the data in Python, then pass it in to wxPython to draw or whatever. Robin has created a set of functions like: "wxPointListHelper" that convert between Python sequences and the wxList of wxPoints that are required by wx. Early on, only lists of tuples (for this example) could be used. At some point, the Helper functions were extended (thanks to Tim Hochberg, I think) to use the generic sequence access methods so that Numeric arrays and other data structures could be used. This was fabulous, but at the moment, it is faster to pass in a list of tuples than it is to pass in a NX2 Numeric array, and numarrays are much slower still. A long time ago I suggested that Robin add (with help from me and others), Numeric-specific version of wxPointListHelper and friends. Robin declined, as he (quite reasonably) doesn't want a dependency on Numeric in wxPython. However, I still very much want wxPython to be able to work efficiently with numerix arrays. I'm going to comment on the following in light of this example. > a) So long as the extension package has access to the necessary array > include files, it can build the extension to use the arrays as a format > without actually having the array package installed. > The > extension would, when requested to use arrays would see if it could > import the array package, if not, then all use of arrays would result in > exceptions. I'm not sure this is even necessary. In fact, in the above example, what would most likely happen is that the **Helper functions would check to see if the input object was an array, and then fork the code if it were. An array couldn't be passed in unless the package were there, so there would be no need for checking imports or raising exceptions. > It could be built, and then later the array package could be > installed and no rebuilding would be necessary. That is a great feature. I'm concerned about the inclusion of all the headers in either the core or with the package, as that would lock you to a different upgrade cycle than the main numerix upgrade cycle. It's my experience that Numeric has not been binary compatible across versions. > b) One could modify the extension build process to see if the package is > installed and the include files are available, if so, it is built with > the support, otherwise not.The disadvantage is that later adding the array package > require the extension to be rebuilt This is a very big deal as most users on Windows and OS-X (and maybe even Linux) don't build packages themselves. A while back this was discussed on this very list, and it seemed like there was some idea about including not the whole numerix header package, but just the code for PyArray_Check or an equivalent. This would allow code to check if an input object was an array, and do something special if it was. That array-specific code would only get run if an array was passed in, so you'd know numerix was installed at run time. This would require Numerix to be installed at build time, but it would be optional at run time. I like this, because anyone capable of building wxPython (it can be tricky) is capable of installing Numeric, but folks that are using binaries don't need to know anything about it. This would only really work for extensions that use arrays, but don't create them. We'd still have the version mismatch problem too. 
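(One cheap, purely illustrative guard against the mismatch Chris describes: record the version an extension was built against and compare at import time. The constant here is a hypothetical build-time record, not anything wxPython actually does:)

    import warnings
    import Numeric

    BUILT_AGAINST = "23.8"  # hypothetical: stamped in when the extension was built
    if Numeric.__version__ != BUILT_AGAINST:
        warnings.warn("extension built against Numeric %s, but %s is installed; "
                      "binary incompatibility is possible"
                      % (BUILT_AGAINST, Numeric.__version__))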
> c) One could provide the support at the Python level by instead relying > on the use of buffer objects by the extension at the C level, thus > avoiding any dependence on the array C api. This sounds great, but is a little beyond me technically. > c) return rank-0 array > > Particularly with regard to ieee exception handling major pro here for me! > Guido was very receptive to > supporting a special method, __index__ which would allow any Python > object to be used as an index to a sequence or mapping object. yeah! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From perry at stsci.edu Thu Mar 10 12:21:20 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 10 12:21:20 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <4230D60A.9050108@noaa.gov> References: <4230D60A.9050108@noaa.gov> Message-ID: <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> On Mar 10, 2005, at 6:19 PM, Chris Barker wrote: > >> a) So long as the extension package has access to the necessary array >> include files, it can build the extension to use the arrays as a >> format without actually having the array package installed. > > The >> extension would, when requested to use arrays would see if it could >> import the array package, if not, then all use of arrays would result >> in exceptions. > > I'm not sure this is even necessary. In fact, in the above example, > what would most likely happen is that the **Helper functions would > check to see if the input object was an array, and then fork the code > if it were. An array couldn't be passed in unless the package were > there, so there would be no need for checking imports or raising > exceptions. > So what would the helper function do if the argument was an array? You mean use the sequence protocol? Yes, I suppose that is always a fallback (but presumes that the original code to deal with such things is present; figuring out that a sequence satisfies array constraints can be a bit involved, especially at the C level) >> It could be built, and then later the array package could be >> installed and no rebuilding would be necessary. > > That is a great feature. > > I'm concerned about the inclusion of all the headers in either the > core or with the package, as that would lock you to a different > upgrade cycle than the main numerix upgrade cycle. It's my experience > that Numeric has not been binary compatible across versions. Hmmm, I thought it had been. It does make it much harder to change the api and structure layouts once in, but I thought that had been pretty stable. >> b) One could modify the extension build process to see if the package >> is installed and the include files are available, if so, it is built >> with the support, otherwise not.The disadvantage is that later adding >> the array package >> require the extension to be rebuilt > > This is a very big deal as most users on Windows and OS-X (and maybe > even Linux) don't build packages themselves. > > A while back this was discussed on this very list, and it seemed like > there was some idea about including not the whole numerix header > package, but just the code for PyArray_Check or an equivalent. This > would allow code to check if an input object was an array, and do > something special if it was. 
That array-specific code would only get > run if an array was passed in, so you'd know numerix was installed at > run time. This would require Numerix to be installed at build time, > but it would be optional at run time. I like this, because anyone > capable of building wxPython (it can be tricky) is capable of > installing Numeric, but folks that are using binaries don't need to > know anything about it. > > This would only really work for extensions that use arrays, but don't > create them. We'd still have the version mismatch problem too. > Yes, at the binary level. Perry From cookedm at physics.mcmaster.ca Thu Mar 10 12:45:28 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Mar 10 12:45:28 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: (konrad hinsen's message of "Thu, 10 Mar 2005 19:48:11 +0100") References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID: konrad.hinsen at laposte.net writes: > On Mar 10, 2005, at 18:33, Stephen Walton wrote: > >> Can I put in a good word for Fortran? Not the language itself, but >> the available packages for it. I've always thought that one of the >> really good things about Scipy was the effort put into getting all >> those powerful, well tested, robust Fortran routines from Netlib >> inside Scipy. Without them, it seems to me that folks who just >> install the new scipy_base are going to re-invent a lot of wheels. >> >> Is it really that hard to install g77 on non-Linux platforms? > > It takes some careful reading of the instructions, which in turn > requires a good command of the English language, including some > peculiar technical terms, and either some experience in software > installation or a high intimidation threshold. > > It also takes a significant amount of time and disk space. > > Konrad. I don't know about Windows, but on OS X it involves going to http://hpc.sourceforge.net/ and following the one paragraph of instructions. That could even be simplified if a .pkg were made... In fact, it's so easy to make a .pkg with PackageMaker that I've done it :-) I've put a .pkg of g77 3.4 for OS X (using the above binaries) at http://arbutus.mcmaster.ca/dmc/osx/ [Warning: unsupported and lightly-tested. I'll email Gaurav Khanna about making packages of his other binaries.] It'll run, install into /usr/local/g77v3.4, and make a symlink at /usr/local/bin/g77 to the right binary. (To compile SciPy with this, I have to add -lcc_dynamic to the libraries to link with. I've got a patch which I'll submit to the SciPy bug tracker for that, soonish.) -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From pf_moore at yahoo.co.uk Thu Mar 10 13:02:34 2005 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Thu Mar 10 13:02:34 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley References: <422EA691.9080404@ee.byu.edu> Message-ID: Travis Oliphant writes: > 2) Installation problems -- I'm not completely clear on what the > "installation problems" really are. I hear people talk about them, but > Pearu has made significant strides to improve installation, so I'm not > sure what precise issues remain. Yes, installing ATLAS can be a pain, > but scipy doesn't require it.
Yes, fortran support can be a pain, but > if you use g77 then it isn't a big deal. The reality, though, is that > there is this perception of installation trouble and it must be based on > something. Let's find out what it is. Please speak up users of the > world!!!! While I am not a scientific user, I occasionally have a need for something like stats, linear algebra, or other such functions. I'm happy to install something (I'm using Python on Windows, so when I say "install", I mean "download and run a binary installer") but I'm a casual user, so I am not going to go to too much trouble. First problem - no scipy Windows binaries for Python 2.4. I'm not going to downgrade my Python installation for the sake of scipy. Even assuming there were such binaries, I can't tell from the installer page whether I need to have Numeric, or is it included. Assuming I need to install it, the binaries say Numeric 23.5, with 23.1 available. But the latest Numeric is 23.8, and only 23.8 and 23.7 have Python 2.4 compatible Windows binaries. Stuck again. As for the PIII/P4SSE2 binaries, I don't know which of those I'd need, but that's OK, I'd go for "Generic", on the basis that speed isn't relevant to me... There's no way on Windows that I'd even consider building scipy from source - my need for it simply isn't sufficient to justify the cost. As I say, this is from someone who is clearly not in the target audience of scipy, but maybe it is of use... Paul. -- A little inaccuracy sometimes saves tons of explanation -- Saki From Chris.Barker at noaa.gov Thu Mar 10 14:31:33 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Mar 10 14:31:33 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> Message-ID: <423101A0.8000804@noaa.gov> Perry Greenfield wrote: > On Mar 10, 2005, at 6:19 PM, Chris Barker wrote: >>> a) So long as the extension package has access to the necessary array >>> include files, it can build the extension to use the arrays as a >>> format without actually having the array package installed. >>> extension would, when requested to use arrays would see if it could >>> import the array package, if not, then all use of arrays would result >>> in exceptions. >> >> I'm not sure this is even necessary. In fact, in the above example, >> what would most likely happen is that the **Helper functions would >> check to see if the input object was an array, and then fork the code >> if it were. An array couldn't be passed in unless the package were >> there, so there would be no need for checking imports or raising >> exceptions. >> > So what would the helper function do if the argument was an array? You > mean use the sequence protocol? Sorry I wasn't clear. The present Helper functions check to see if the sequence is a list, and use list specific code if it is, otherwise, it falls back the sequence protocol, which is why it's slow for Numeric arrays. I'm proposing that if the input is an array, it will then use array-specific code (perhaps PyArray_ContiguousFromObject, then accessing *data directly) > (but presumes that the original code to deal with such things is > present; figuring out that a sequence satisfies array constraints can be > a bit involved, especially at the C level) yes, involved, and kind of slow. 
If it were me (and for my custom extensions is it), I'd just require Numeric, then always call PyArray_ContiguousFromObject and access the data array. Now that I've written that, I have a new idea: use the approach mentioned, and check if Numeric can be imported. If so go straight to PyArray_ContiguousFromObject every time. >> It's my experience >> that Numeric has not been binary compatible across versions. > > Hmmm, I thought it had been. It does make it much harder to change the > api and structure layouts once in, but I thought that had been pretty > stable. I know that at least once I tried a Numeric extension (Konrad's netcdf one) that had been built with another version of Numeric, and weird results occurred. Nothing so obvious as a crash or error, however. You've got to love C! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Thu Mar 10 15:16:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 10 15:16:27 2005 Subject: [Numpy-discussion] Matlab is a tool for doing numerical computations with matrices and vectors. In-Reply-To: <421A26A5.7070306@sympatico.ca> References: <4218FAD8.6060804@sympatico.ca> <421908B0.90406@ee.byu.edu> <421A26A5.7070306@sympatico.ca> Message-ID: <4230D4FA.6010505@ee.byu.edu> >> I remember his work. I really liked many of his suggestions, though >> it took him a while to recognize that a Matrix class has been >> distributed with Numeric from very early on. > > > numpy.pdf dated 03-07-18 has > > "For those users, the Matrix class provides a more intuitive > interface. We defer discussion of the Matrix class until later." > [snip] > On the same page there is: > > "Matrix.py > The Matrix.py python module defines a class Matrix which is a > subclass of UserArray. The only differences > between Matrix instances and UserArray instances is that the * > operator on Matrix performs a > matrix multiplication, as opposed to element-wise multiplication, > and that the power operator ** is disallowed > for Matrix instances." > > In view of the above, I can understand why Huaiyu Zhu took a while. > His proposal was much more ambitious. There is always a lag between documentation and implementation. I would be interested to understand what "more ambitious" elements are still not in Numeric's Matrix object (besides the addition of a language operator of course). > > Yes, I know that the power operator is implemented and that there is a > random matrix but I hope that some attention is given to the > functionality PyMatrix. I recognize that the implementation has some > weakneses. Which aspects are you most interested in? I would be happy if you would consider placing something like PyMatrix under scipy_core instead of developing it separately. > >> Yes, it needed work, and a few of his ideas were picked up on and >> included in Numeric's Matrix object. > > > I suggest that this overstates what was picked up. I disagree. I was the one who picked them up and I spent a bit of time doing it. I implemented the power method, the ability to build matrices in blocks, the string processing for building matrices, and a lot of the special attribute names for transpose, hermitian transpose, and so forth. There may be some attributes that weren't picked up, and a discussion of which attributes are most important is warranted. > > Good, on both scores. I hope that the PEP will set out these ideas.
You are probably in a better position time-wise to outline what you think belongs in a Matrix class. I look forward to borrowing your ideas for inclusion in scipy_core. -Travis From konrad.hinsen at laposte.net Thu Mar 10 16:17:27 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 10 16:17:27 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID: <0a038c9f9c74e4f11854c882f0d100d5@laposte.net> On 10.03.2005, at 21:44, David M. Cooke wrote: > I don't know about Windows, but on OS X it involves going to > http://hpc.sourceforge.net/ > and following the one paragraph of instructions. That could be > even be simplified if an .pkg were made... I wasn't thinking of Windows and OS X, but of the less common Unices. I did my last gcc/g77 installation three years ago on an Alpha station running whatever Compaq's Unix is called. It worked without any problems, but it still took me about two hours, and I am pretty experienced at installation work. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From rkern at ucsd.edu Thu Mar 10 17:19:18 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Mar 10 17:19:18 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <423101A0.8000804@noaa.gov> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> <423101A0.8000804@noaa.gov> Message-ID: <4230DEE7.2020802@ucsd.edu> Chris Barker wrote: > Perry Greenfield wrote: > >> On Mar 10, 2005, at 6:19 PM, Chris Barker wrote: >> >>>> a) So long as the extension package has access to the necessary >>>> array include files, it can build the extension to use the arrays as >>>> a format without actually having the array package installed. > > >>>> extension would, when requested to use arrays would see if it could >>>> import the array package, if not, then all use of arrays would >>>> result in exceptions. >>> >>> >>> I'm not sure this is even necessary. In fact, in the above example, >>> what would most likely happen is that the **Helper functions would >>> check to see if the input object was an array, and then fork the code >>> if it were. An array couldn't be passed in unless the package were >>> there, so there would be no need for checking imports or raising >>> exceptions. >>> >> So what would the helper function do if the argument was an array? You >> mean use the sequence protocol? > > > Sorry I wasn't clear. The present Helper functions check to see if the > sequence is a list, and use list specific code if it is, otherwise, it > falls back the sequence protocol, which is why it's slow for Numeric > arrays. 
I'm proposing that if the input is an array, it will then use > array-specific code (perhaps PyArray_ContiguousFromObject, then > accessing *data directly) If the über-buffer object (item 1c in Perry's notes) gets implemented in the standard library, then the Helper functions could test PyUberBuffer_Check() (or perhaps test for the presence of the extra Numeric information, whatever), dispatch on the typecode, and iterate through the data as appropriate. wx's C code doesn't need to know about the Numeric array struct (and thus doesn't need to include any headers), it just needs to know how to interpret the metadata provided by the über-buffer. What's more, other packages could nearly seamlessly provide data in the same way. For example, suppose your wx function plopped a pixel image onto a canvas. It could take one of these buffers as the pixel source. PIL could be a source. A Numeric array could be a source. A string could be a source. A Quartz CGBitmapContext could be a source. As long as each could be adapted to include the conventional metadata, they could all be source for the wx function, and none of the packages need to know about each other much less be compiled against one another or depend on their existence at runtime. I say "nearly seamlessly" only because there might be an inevitable adaptation layer that adds or modifies the metadata. The buffer approach seems like the most Pythonic way to go. It encourages loose coupling and flexibility. It also encourages object adaptation, a la PyProtocols[1], which I like to push now and again. [1] http://peak.telecommunity.com/PyProtocols.html -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From cjw at sympatico.ca Thu Mar 10 17:54:20 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 10 17:54:20 2005 Subject: [Numpy-discussion] Matlab is a tool for doing numerical computations with matrices and vectors. In-Reply-To: <4230D4FA.6010505@ee.byu.edu> References: <4218FAD8.6060804@sympatico.ca> <421908B0.90406@ee.byu.edu> <421A26A5.7070306@sympatico.ca> <4230D4FA.6010505@ee.byu.edu> Message-ID: <4230FA13.4010202@sympatico.ca> Travis Oliphant wrote: > >>> I remember his work. I really liked many of his suggestions, though >>> it took him a while to recognize that a Matrix class has been >>> distributed with Numeric from very early on. >> >> >> >> numpy.pdf dated 03-07-18 has >> >> "For those users, the Matrix class provides a more intuitive >> interface. We defer discussion of the Matrix class until later." >> > [snip] > >> On the same page there is: >> >> "Matrix.py >> The Matrix.py python module defines a class Matrix which is a >> subclass of UserArray. The only differences >> between Matrix instances and UserArray instances is that the * >> operator on Matrix performs a >> matrix multiplication, as opposed to element-wise multiplication, >> and that the power operator ** is disallowed >> for Matrix instances." >> >> In view of the above, I can understand why Huaiyu Zhu took a while. >> His proposal was much more ambitious. > > > There is always a lag between documentation and implementation. I > would be interested to understand what "more ambitious" elements are > still not in Numeric's Matrix object (besides the addition of a > language operator of course). > >> >> Yes, I know that the power operator is implemented and that there is >> a random matrix but I hope that some attention is given to the >> functionality PyMatrix.
I recognize that the implementation has some >> weakneses. > > > Which aspects are you most interested in? I would be happy if you > would consider placing something like PyMatrix under scipy_core > instead of developing it separately. Yes, after the dust of the current activity settles, I would certainly be interested in exploring this although I would see a closer association with Numeric3 than with scipy. > >> >>> Yes, it needed work, and a few of his ideas were picked up on and >>> included in Numeric's Matrix object. >> >> >> >> I suggest that this overstates what was picked up. > > > I disagree. I was the one who picked them up and I spent a bit of > time doing it. I implemented the power method, the ability to build > matrices in blocks, the string processing for building matrices, and a > lot of the special attribute names for transpose, hermitian transpose, > and so forth. > There may be some attributes that weren't picked up, and a discussion > of which attributes are most important is warranted. > >> >> Good, on both scores. I hope that the PEP will set out these ideas. > > > You are probably in a better position time-wise to outline what you > think belongs in a Matrix class. I look forward to borrowing your > ideas for inclusion in scipy_core. My thoughts are largely in the current implementation of PyMatrix. Below is an extract from the most recent announcement. I propose to explore the changes needed to use Numeric3 with the new ufuncs. Do you have any feel for when Alpha binary versions will likely be available? Colin W. ------------------------------------------------------------------------ Downloads in the form of a Windows Installer (Inno) and a zip file are available at: http://www3.sympatico.ca/cjw/PyMatrix An /Introduction to PyMatrix/ is available: http://www3.sympatico.ca/cjw/PyMatrix/IntroToPyMatrix.pdf Information on the functions and methods of the matrix module is given at: http://www3.sympatico.ca/cjw/PyMatrix/Doc/matrix-summary.html From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 18:51:16 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 18:51:16 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <423084F6.7020804@csun.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> Message-ID: <4231076E.6090507@ims.u-tokyo.ac.jp> Stephen Walton wrote: > Can I put in a good word for Fortran? Not the language itself, but the > available packages for it. I've always thought that one of the really > good things about Scipy was the effort put into getting all those > powerful, well tested, robust Fortran routines from Netlib inside > Scipy. Without them, it seems to me that folks who just install the new > scipy_base are going to re-invent a lot of wheels. > > Is it really that hard to install g77 on non-Linux platforms? > I agree that Netlib should be in SciPy. But why should Netlib be in scipy_base? If SciPy evolves into a website of scientific packages for python, I presume Netcdf will be in one of those packages, maybe even a package by itself. Such a package, together with a couple of binary installers for common platforms, will be appreciated by users and developers who need Netcdf. But if Netcdf is in scipy_base, you're effectively forcing most users to waste time on Fortran only to install something they don't need. 
In turn, those users will ask their developers for help if something goes wrong (or give up altogether). And those developers, also not willing to waste time on something they don't need, will tell their users to use Numerical Python instead of SciPy. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 18:56:14 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 18:56:14 2005 Subject: [Numpy-discussion] Re: Future directions for SciPy in light of meeting at Berkeley In-Reply-To: References: <422EA691.9080404@ee.byu.edu> Message-ID: <42310899.1090302@ims.u-tokyo.ac.jp> Paul Moore wrote: > Travis Oliphant writes >>2) Installation problems -- I'm not completely clear on what the >>"installation problems" really are. > > While I am not a scientific user, I occasionally have a need for > something like stats, linear algebra, or other such functions. I'm > happy to install something (I'm using Python on Windows, so when I > say "install", I mean "download and run a binary installer") but I'm > a casual user, so I am not going to go to too much trouble. ... > There's no way on Windows that I'd even consider building scipy from > source - my need for it simply isn't sufficient to justify the cost. > > As I say, this is from someone who is clearly not in the target > audience of scipy, but maybe it is of use... > I think you perfectly described the experience of a typical Biopython user. So as far as I'm concerned, you're squarely in the target audience of SciPy, if it intends to replace Numeric. --michiel. From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 19:18:11 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 19:18:11 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> Message-ID: <42310D7D.3000009@ims.u-tokyo.ac.jp> Perry Greenfield wrote: > On Mar 9, 2005, at 11:41 PM, eric jones wrote: >> 2. I do question whether weave really be in this core? I think it was >> in scipy_core before because it was needed to build some of scipy. >> 3. Now that I think about it, I also wonder if f2py should really be >> there -- especially since we are explicitly removing any fortran >> dependencies from the core. > > > It would seem to me that so long as: > > 1) both these tools have very general usefulness (and I think they do), and > 2) are not installation problems (I don't believe they are since they > themselves don't require any compilation of Fortran, C++ or whatever--am > I wrong on that?) > > That they are perfectly fine to go into the core. In fact, if they are > used by any of the extra packages, they should be in the core to > eliminate the extra step in the installation of those packages. > -0. 1) In der Beschraenkung zeigt sich der Meister. In other words, avoid software bloat. 2) f2py is a Fortran-Python interface generator, once the interface is created there is no need for the generator. 3) I'm sure f2py is useful, but I doubt that it has very general usefulness. There are lots of other useful Python packages, but we're not including them in scipy-core either. 4) f2py and weave don't fit in well with the rest of scipy-core, which is mainly standard numerical algorithms. --Michiel.
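P.S. To illustrate point 2: the generator is only needed once, at build time, by the packager. Something like the following (the module and routine names are invented here, and a Fortran compiler is assumed to be present at build time only):

    # Build step, run once when the package is made:
    #     f2py -c -m fexample example.f
    # After that, users never see f2py; they just use the generated module:
    import fexample                           # hypothetical f2py-generated module
    result = fexample.example_routine(data)   # hypothetical wrapped subroutine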
From oliphant at ee.byu.edu Thu Mar 10 19:55:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Mar 10 19:55:34 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <42310D7D.3000009@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> Message-ID: <4231165F.1040908@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Perry Greenfield wrote: > >> On Mar 9, 2005, at 11:41 PM, eric jones wrote: >> >>> 2. I do question whether weave really be in this core? I think it >>> was in scipy_core before because it was needed to build some of scipy. >>> 3. Now that I think about it, I also wonder if f2py should really be >>> there -- especially since we are explicitly removing any fortran >>> dependencies from the core. >> >> >> >> It would seem to me that so long as: >> >> 1) both these tools have very general usefulness (and I think they >> do), and >> 2) are not installation problems (I don't believe they are since they >> themselves don't require any compilation of Fortran, C++ or >> whatever--am I wrong on that?) >> >> That they are perfectly fine to go into the core. In fact, if they >> are used by any of the extra packages, they should be in the core to >> eliminate the extra step in the installation of those packages. >> > -0. > 1) In der Beschraenkung zeigt sich der Meister. In other words, avoid > software bloat. > 2) f2py is a Fortran-Python interface generator, once the interface is > created there is no need for the generator. > 3) I'm sure f2py is useful, but I doubt that it has very general > usefulness. There are lots of other useful Python packages, but we're > not including them in scipy-core either. > 4) f2py and weave don't fit in well with the rest of scipy-core, which > is mainly standard numerical algorithms. I'm of the opinion that f2py and weave should go into the core. 1) Neither one requires Fortran and both install very, very easily. 2) These packages are fairly small but provide huge utility --- inlining fortran or C code is an easy way to speed up Python. People who don't "need it" will never realize it's there 3) Building the rest of scipy will need at least f2py already installed and it would simplify the process. 4) Enthought packages (to be released in the future and of interest to scientists) rely on weave. Why not make that process easier with a single initial install. 5) It would encourage improvements of weave and f2py from the entire community. 6) The developers of f2py and weave are both scipy developers and so it would make sense for their code that forms a foundation for other work to go into scipy_core. -Travis From prabhu_r at users.sf.net Fri Mar 11 00:30:16 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Mar 11 00:30:16 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <16945.22219.772480.154332@monster.linux.in> >>>>> "TO" == Travis Oliphant writes: TO> I'm of the opinion that f2py and weave should go into the TO> core. 
If you are looking for feedback, I'd say +2 for that. regards, prabhu From oliphant at ee.byu.edu Fri Mar 11 01:07:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 11 01:07:05 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: <4230DEE7.2020802@ucsd.edu> References: <4230D60A.9050108@noaa.gov> <747d91fa83ebcbcfc71fb15dc54bce5b@stsci.edu> <423101A0.8000804@noaa.gov> <4230DEE7.2020802@ucsd.edu> Message-ID: <42315EEE.2090304@ee.byu.edu> >> Sorry I wasn't clear. The present Helper functions check to see if >> the sequence is a list, and use list specific code if it is, >> otherwise, it falls back the sequence protocol, which is why it's >> slow for Numeric arrays. I'm proposing that if the input is an array, >> it will then use array-specific code (perhaps >> PyArray_ContiguousFromObject, then accessing *data directly) > > > If the über-buffer object (item 1c in Perry's notes) gets implemented > in the standard library, then the Helper functions could test > PyUberBuffer_Check() (or perhaps test for the presence of the extra > Numeric information, whatever), dispatch on the typecode, and iterate > through the data as appropriate. wx's C code doesn't need to know > about the Numeric array struct (and thus doesn't need to include any > headers), it just needs to know how to interpret the metadata provided > by the über-buffer. > > What's more, other packages could nearly seamlessly provide data in > the same way. For example, suppose your wx function plopped a pixel > image onto a canvas. It could take one of these buffers as the pixel > source. PIL could be a source. A Numeric array could be a source. A > string could be a source. A Quartz CGBitmapContext could be a source. > As long as each could be adapted to include the conventional metadata, > they could all be source for the wx function, and none of the packages > need to know about each other much less be compiled against one > another or depend on their existence at runtime. I say "nearly > seamlessly" only because there might be an inevitable adaptation layer > that adds or modifies the metadata. > > The buffer approach seems like the most Pythonic way to go. It > encourages loose coupling and flexibility. It also encourages object > adaptation, a la PyProtocols[1], which I like to push now and again. I really, really like this direction. Todd's memoryobject in numarray should be merged with the buffer object in Python to be this new buffer type and the appropriate meta-data added. We should then start encouraging this sort of buffer-mediated duck-typing for all raw memory-like objects and the buffer protocol expanded to encourage the specification of metadata (or classes of metadata). We should do a lot more of this....(a la namespaces...) -Travis From konrad.hinsen at laposte.net Fri Mar 11 02:34:14 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Mar 11 02:34:14 2005 Subject: [Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core In-Reply-To: References: Message-ID: <17b4f8747178ed5df4e3ab152ee69fb7@laposte.net> On Mar 10, 2005, at 16:28, Perry Greenfield wrote: > On March 7th Travis Oliphant and Perry Greenfield met Guido and Paul > Dubois to discuss some issues regarding the inclusion of an array > package within core Python. A good initiative - and thanks for the report! > So what about supporting arrays as an interchange format?
There are a > number of possibilities to consider, none of which require inclusion > of arrays into the core. It is possible for 3rd party extensions to > optionally support arrays as an interchange format through one of the > following mechanisms: True, but any of these options requires a much bigger effort than relying on a module in the standard library. Pointing out these methods is not exactly a way of encouraging people to use arrays as an interchange format, it's more a way of telling them that if they need a compact interchange format badly, there is a solution. > a) So long as the extension package has access to the necessary array > include files, it can build the extension to use the arrays as a > format without actually having the array package installed. The > include files alone could be included into the core True, but this implies nearly the same restrictions to evolution of the array code as having it in the core. The Numeric headers have changed frequently in the past. > seem quite as receptive instead suggesting the next option) or could > be packaged with extension (we would prefer the former to reduce the > possibilities of many copies of include files). The extension could > then be successfully compiled without Having the header files in all client extensions is a sure recipe to block Numeric development. Any header change would imply non-acceptance by the end-user community. If C were a language with implementation-independent interface descriptions, such approaches would be reasonable, but C is... well, C. > b) One could modify the extension build process to see if the package > is installed and the include files are available, if so, it is built > with the support, otherwise not. This is already possible today, and probably used by some extension modules. I use a similar test to build the netCDF interface selectively (if netCDF is available), and I can tell from experience that this causes quite some confusion for some users who install ScientificPython before netCDF (although the instructions point this out - but nobody seems to read instructions). But the main problem with this approach is that it doesn't work for pre-built binary distributions, i.e. in particular the Windows world. > c) One could provide the support at the Python level by instead > relying on the use of buffer objects by the extension at the C level, > thus avoiding any dependence on the array C api. So long as the > extension has the ability to return buffer objects That's certainly the cleanest solution, but it also requires a serious effort from the extension module writer: one more API to learn and use, and conversion between buffers and arrays in all modules that definitely need array functions. > We talked at some length about whether it was possible to change > Python's numeric behavior for scalars, namely support for configurable > handling of numeric exceptions in the way numarray does it (and > Numeric3 as well). In short, not much was resolved. Guido didn't much > like the stack approach to the exception handling mode. His argument > (a reasonable one) was that even if the stack allowed pushing I agree with Guido there. It looks like a hack. > the decimal's use of context to see if it could used as a model. > Overall he seemed to think that setting mode on a module basis was a > better approach. 
Travis and I wondered about how that could be > implemented (it seems to imply that the exception handling needs to > know what module or namespace is being executed in order to determine > the mode. That doesn't look simple. How about making error handling a characteristic of the type itself? That would double the number of float element types, but that doesn't seem a big deal to me. Handling the conversions and coercions is probably a bigger headache. > So some more thought is needed regarding this. The difficulty of > proposing such changes and getting them accepted is likely to be > considerable. But Travis had a brilliant idea (some may see this as > evil but I think it has great merit). Nothing prevents a C extension > from hijacking the existing Python scalar objects behaviors. True, and I like that idea a lot for testing and demonstrating concepts. Whether it's a good idea for production code is another question, and one to be discussed with Guido and the Python team in my opinion. > Python at all (as such), no rank-0 arrays. This will be studied > further. One possible issue is that adding the necessary machinery to > make numeric scalar processing consistent with that of the array > package may introduce significant performance penalties (what is > negligible overhead for arrays may not be for scalars). Adding a couple of methods should not cause any overhead at all. Where do you see the origin of the overhead? > One last comment is that it is unlikely that any choice in this area > prevents the need for added helper functions to the array package to > assist in writing code that works well with scalars and arrays. There > are likely a number of such issues. A common That remains to be seen. I must admit that I am personally a bit surprised by the importance this problem seems to have for many. I have a single spot in a single module that checks for scalar vs. array, which is negligible considering the amount of numerical code that I have. > approach is to wrap all unknown objects with "asarray". This works > reasonably well but doesn't handle the following case: If you wish to > write a function that will accept arrays or scalars, in principal it > would be nice to return scalars if all that was supplied were scalars. > So functions to help determine what the output type should That happens automatically if you use asarray() only when you definitely need an array. I would expect this to be the case for list arguments rather than for scalar arguments. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From Chris.Barker at noaa.gov Fri Mar 11 08:59:06 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 11 08:59:06 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <4232051A.1010701@noaa.gov> Travis Oliphant wrote: > I'm of the opinion that f2py and weave should go into the core. <(6 good points)> The act of putting something into the core will encourage people to use it.
My understanding of the idea of the core is that it is minimal set of packages that various developers can use as a basis for their domain specific stuff. One barrier to entry for people currently using the whole of SciPy is the ease of installation issue, and f2py and weave are easy to install, so that's not a problem. However, if I understand it correctly, neither weave nor f2py is the least bit useful without a compiler. If they are in the core, you are encouraging people to use them in their larger packages, which will then impose a dependency on compilers. This seems to me not to fit in with the purpose of the core, which is to be a SINGLE, robust, easy-to-install dependency that others can build on. I suggest that weave and f2py go into a "devel" or "high-performance" package instead. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Fri Mar 11 09:08:28 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 11 09:08:28 2005 Subject: [Numpy-discussion] Another thought on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <4232076B.1030200@noaa.gov> I've got one more issue that might bear thinking about at this juncture: Versioning control One issue that has been brought up in the discussion of using ndarrays as an interchange format with other packages is that those packages might well become dependent on a particular version of SciPy. For me, this brings up the issue that I might well want (or need) to have more than one version of SciPy installed at once, and be able to select which one is used at run time. If nothing else, it facilitates testing as new versions come out. I suggest a system similar to that recently added to wxPython: import wxversion wxversion.select("2.5") import wx See: http://wiki.wxpython.org/index.cgi/MultiVersionInstalls for more details. Between the wxPython list and others, a lot of pros and cons to doing this have been laid out. Honestly, there never really was a consensus among the wxPython community, but Robin decided to go for it, and I, for one, am very happy with it. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Fri Mar 11 18:27:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 11 18:27:31 2005 Subject: [Numpy-discussion] Slightly altered multidimensional slicing behavior Message-ID: <42325338.3020709@ee.byu.edu> Hi all, I've updated the PEP on the numeric web page to reflect an improved (I think) usage of Ellipsis and slice objects when mixed with integer indexing arrays. Basically, since partial indexing already assumed an ending ellipsis. The presence of ellipsis or slice objects in the tuple, allow the user to move the position of the partial indexing. It does get a little mind blowing, but is actually not "too" bad using the mapiter object to implement. 
-Travis From juenglin at cs.pdx.edu Sat Mar 12 20:17:30 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sat Mar 12 20:17:30 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <4231165F.1040908@ee.byu.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <4231165F.1040908@ee.byu.edu> Message-ID: <1110686482.18704.23.camel@localhost.localdomain> On Thu, 2005-03-10 at 19:54, Travis Oliphant wrote: > I'm of the opinion that f2py and weave should go into the core. > +1 ralf From mdehoon at ims.u-tokyo.ac.jp Sun Mar 13 04:50:28 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 13 04:50:28 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> Message-ID: <423436E5.1070407@ims.u-tokyo.ac.jp> Pearu Peterson wrote: >> Travis Oliphant wrote: >> >>> 1) There will be a scipy_core package which will be essentially what >>> Numeric has always been (plus a few easy to install extras already in >>> current scipy_core). ... >>> linalg (a lite version -- no fortran or ATLAS dependency) > > Again, what would be the underlying linear algebra library here? > Numeric uses f2c version of lite lapack library. Shall we do the same > but wrapping the c codes with f2py rather than by hand? f2c might be > useful also in other cases to reduce fortran dependency, but only when > it is critical to ease the scipy_core installation. > If I understand Travis correctly, the idea is to use Numeric as the basis for scipy_core, allowing current Numerical Python users to switch to scipy_core with a minimum of trouble. So why not use Numeric's lite lapack library directly? What is the advantage of repeating the c code wrapping (by f2py or by hand)? --Michiel. From pearu at scipy.org Sun Mar 13 11:34:33 2005 From: pearu at scipy.org (Pearu Peterson) Date: Sun Mar 13 11:34:33 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <423436E5.1070407@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <423436E5.1070407@ims.u-tokyo.ac.jp> Message-ID: On Sun, 13 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Pearu Peterson wrote: >>> Travis Oliphant wrote: >>> >>>> 1) There will be a scipy_core package which will be essentially what >>>> Numeric has always been (plus a few easy to install extras already in >>>> current scipy_core). > ... >>>> linalg (a lite version -- no fortran or ATLAS dependency) >> >> Again, what would be the underlying linear algebra library here? >> Numeric uses f2c version of lite lapack library. Shall we do the same but >> wrapping the c codes with f2py rather than by hand? f2c might be useful >> also in other cases to reduce fortran dependency, but only when it is >> critical to ease the scipy_core installation. >> > If I understand Travis correctly, the idea is to use Numeric as the basis for > scipy_core, allowing current Numerical Python users to switch to scipy_core > with a minimum of trouble. 
So why not use Numeric's lite lapack library > directly? What is the advantage of repeating the c code wrapping (by f2py or > by hand)? First, I wouldn't repeat wrapping c codes by hand. But using f2py wrappers has the following advantages:

(i) maintaining the wrappers is easier (as the wrappers are generated)

(ii) one can easily link linalg_lite against optimized lapack. This is certainly possible with current Numeric but for a smaller set of Fortran compilers than when using f2py generated wrappers (for example, if a compiler produces uppercased symbol names then Numeric wrappers won't work)

(iii) scipy provides wrappers to a larger set of lapack subroutines than Numeric, and with f2py it is easier and less error-prone to add new wrappers to lapack functions than wrapping them by hand, i.e. extending f2py generated linalg_lite is much easier than extending the current Numeric lapack_lite.

(iv) and finally, f2py generated wrappers tend to be more efficient than Numeric hand coded wrappers.

Here are some benchmark results comparing scipy and Numeric linalg functions:

Finding matrix determinant
==================================
      |   contiguous    | non-contiguous
----------------------------------------------
 size | scipy | Numeric | scipy | Numeric
  20  | 0.16  |  0.22   | 0.17  |  0.26    (secs for 2000 calls)
 100  | 0.29  |  0.41   | 0.28  |  0.56    (secs for 300 calls)
 500  | 0.31  |  0.36   | 0.33  |  0.45    (secs for 4 calls)

Finding matrix inverse
==================================
      |   contiguous    | non-contiguous
----------------------------------------------
 size | scipy | Numeric | scipy | Numeric
  20  | 0.28  |  0.33   | 0.27  |  0.37    (secs for 2000 calls)
 100  | 0.64  |  1.06   | 0.64  |  1.24    (secs for 300 calls)
 500  | 0.83  |  1.10   | 0.84  |  1.18    (secs for 4 calls)

Solving system of linear equations
==================================
      |   contiguous    | non-contiguous
----------------------------------------------
 size | scipy | Numeric | scipy | Numeric
  20  | 0.26  |  0.18   | 0.26  |  0.21    (secs for 2000 calls)
 100  | 0.31  |  0.35   | 0.31  |  0.52    (secs for 300 calls)
 500  | 0.33  |  0.34   | 0.35  |  0.41    (secs for 4 calls)

Remark: both scipy and Numeric are linked against the same ATLAS/Lapack library. Pearu From mdehoon at ims.u-tokyo.ac.jp Sun Mar 13 18:07:03 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 13 18:07:03 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <423436E5.1070407@ims.u-tokyo.ac.jp> Message-ID: <4234F187.2030506@ims.u-tokyo.ac.jp> Pearu Peterson wrote: >> If I understand Travis correctly, the idea is to use Numeric as the >> basis for scipy_core, allowing current Numerical Python users to >> switch to scipy_core with a minimum of trouble. So why not use >> Numeric's lite lapack library directly? What is the advantage of >> repeating the c code wrapping (by f2py or by hand)? > > > First, I wouldn't repeat wrapping c codes by hand. > But using f2py wrappers has the following advantages: OK I'm convinced. From a user perspective, it's important that the scipy_core linear algebra looks and feels like the Numerical Python linear algebra package. So if a user does

>>> from LinearAlgebra import myfavoritefunction

s/he should not note any difference other than "hey, my favorite function seems to be running faster now!" --Michiel.
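P.S. Guaranteeing that could be as simple as shipping a tiny compatibility module under the old name. A sketch of the idea only -- the scipy-side names on the right are my assumption; the names on the left are the established Numeric ones:

    # LinearAlgebra.py -- backwards-compatibility shim (sketch)
    # Re-export the new implementations under the old Numeric names, so
    # that "from LinearAlgebra import determinant" keeps working unchanged.
    from scipy.linalg import det, inv, solve, eigvals

    determinant            = det
    inverse                = inv
    solve_linear_equations = solve
    eigenvalues            = eigvals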
From stephen.walton at csun.edu Mon Mar 14 15:13:10 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Mar 14 15:13:10 2005 Subject: [Numpy-discussion] Current thoughts on future directions In-Reply-To: <4231076E.6090507@ims.u-tokyo.ac.jp> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> <4231076E.6090507@ims.u-tokyo.ac.jp> Message-ID: <42361A2D.2030708@csun.edu> Michiel Jan Laurens de Hoon wrote: > I agree that Netlib should be in SciPy. But why should Netlib be in > scipy_base? It should not, and I'm sorry if my original message made it sound like I was advocating for that. I was mainly advocating for f2py to be in scipy_base. From juenglin at cs.pdx.edu Mon Mar 14 17:51:19 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Mon Mar 14 17:51:19 2005 Subject: [Numpy-discussion] Half baked C API? Message-ID: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> I recently took a closer look at Numeric's and numarray's C APIs for the first time and was surprised not to find the counterparts for all the array functions that are available in the Python API. Did I overlook anything, or do I really have to re-implement things like 'sum', 'argmax', 'convolve', 'cos' in C? Ralf From nwagner at mecha.uni-stuttgart.de Mon Mar 14 23:55:39 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Mon Mar 14 23:55:39 2005 Subject: [Numpy-discussion] cvs access is broken Message-ID: <423694C7.4020506@mecha.uni-stuttgart.de> cvs access is broken cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -P Numerical cvs [checkout aborted]: unrecognized auth response from cvs.sourceforge.net: M PserverBackend::PserverBackend() Connect (Connection refused) From konrad.hinsen at laposte.net Tue Mar 15 00:23:39 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 00:23:39 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> Message-ID: <0112f75f1a55b092a59733925faf2056@laposte.net> On 15.03.2005, at 02:50, Ralf Juengling wrote: > I recently took a closer look at Numeric's and numarray's C APIs for > the first time and was surprised not to find the counterparts > for all the array functions that are available in the Python API. > > Did I overlook anything, or do I really have to re-implement > things like 'sum', 'argmax', 'convolve', 'cos' in C? Can you think of a real-life situation where you would want to call these from C? Usually, C modules using arrays are written to add functionality that can not be expressed efficiently in terms of existing array operations. If you want to compose things like sum and argmax, you can do that in Python. Note also that if you do need to call these routines from your C code, you can always do so via the generic Python API for calling Python functions. However, reimplementing them in C may often turn out to be simpler - doing sum() in C is really a trivial piece of work. Konrad.
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cjw at sympatico.ca Tue Mar 15 04:47:24 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Mar 15 04:47:24 2005 Subject: [Numpy-discussion] SourceForge.net: A04. Site Status (en) Message-ID: <4236D8D8.4010900@sympatico.ca> Travis, In view of the difficulty at Sourceforge (see below), would it make sense to make the draft PEP available on the Python site? I haven't been able to read the most recent update. Colin W. http://sourceforge.net/docman/display_doc.php?group_id=1&docid=2352#1107968334 From juenglin at cs.pdx.edu Tue Mar 15 09:13:30 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 09:13:30 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <0112f75f1a55b092a59733925faf2056@laposte.net> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> Message-ID: <423715A6.40601@cs.pdx.edu> konrad.hinsen at laposte.net wrote: >> Did I overlook anything, or do I really have to re-implement >> things like 'sum', 'argmax', 'convolve', 'cos' in C? > > Can you think of a real-life situation where you would want to call > these from C? Usually, C modules using arrays are written to add > functionality that can not be expressed efficiently in terms of > existing array operations. If you want to compose things like sum and > argmax, you can do that in Python. Yes. Think of dynamic programming algorithms like forward, backward, and viterbi for Hidden Markov Models. In this case you cannot avoid a loop over one axis, yet the code in the loop can be expressed in a few lines by matrix operations like 'dot', 'sum', 'outerproduct', elementwise multiplication, etc. As an example, the forward algorithm can be written as

    alpha[0] = P_YcS[y[0]]*P_S0
    gamma[0] = sum(alpha[0])
    alpha[0] /= gamma[0]

    for t in xrange(1, T):
        P_ScY_1_prev = dot(P_ScS, alpha[t-1])
        P_SYcY_1_prev = P_YcS[y[t]]*P_ScY_1_prev
        gamma[t] = sum(P_SYcY_1_prev)
        alpha[t] = P_SYcY_1_prev/gamma[t]

> Note also that if you do need to call these routines from your C code, > you can always do so via the generic Python API for calling Python > functions. However, reimplementing them in C may often turn out to be > simpler - doing sum() in C is really a trivial piece of work. Sure, many array functions like 'sum' are easy to implement in C, but I don't want to, if I don't have to for performance reasons. In an ideal world, I'd spend most of my time prototyping the algorithms in Python, and then, if performance is a problem, translate parts to C (or hand them to weave.blitz) with only minor changes to the prototype code. And if that's still not fast enough, then I'd go and rethink the problem in C. Referring to the example above, I'd also want an optimized BLAS implementation of 'dot' to be used if available, and the scipy_core substitute version if not. So yeah, I claim that, to make weave a truly useful tool, all array functions of the Python API should also be available in the C API. Maybe having a few specialized versions, e.g., for contiguous arrays of double floats, would be a good idea, too. Ralf > > Konrad.
> -- > ------------------------------------------------------------------------ > ------- > Konrad Hinsen > Laboratoire Leon Brillouin, CEA Saclay, > 91191 Gif-sur-Yvette Cedex, France > Tel.: +33-1 69 08 79 25 > Fax: +33-1 69 08 82 61 > E-Mail: khinsen at cea.fr > ------------------------------------------------------------------------ > ------- From konrad.hinsen at laposte.net Tue Mar 15 09:53:52 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 09:53:52 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <423715A6.40601@cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> <423715A6.40601@cs.pdx.edu> Message-ID: <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> On Mar 15, 2005, at 18:04, Ralf Juengling wrote: > Yes. Think of dynamic programming algorithms like forward, backward, > and viterbi for Hidden Markov Models. In this case you cannot avoid > a loop over one axis, yet the code in the loop can be expressed in > a few lines by matrix operations like 'dot', 'sum', 'outerproduct', How much do you expect to gain compared to a Python loop in such a case? > I don't want to, if I don't have to for performance reasons. In an ideal > world, I'd spend most of my time prototyping the algorithms in Python, > and then, if performance is a problem, translate parts to C (or hand > them to weave.blitz) with only minor changes to the prototype code. Did you consider Pyrex? It lets you move from pure Python to pure C with Python syntax, mixing both within a single function. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From oliphant at ee.byu.edu Tue Mar 15 09:54:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 15 09:54:35 2005 Subject: [Numpy-discussion] SourceForge.net: A04. Site Status (en) In-Reply-To: <4236D8D8.4010900@sympatico.ca> References: <4236D8D8.4010900@sympatico.ca> Message-ID: <423720EB.5070604@ee.byu.edu> Colin J. Williams wrote: > Travis, > > In view of the difficulty at Sourceforge (see below), would it make > sense to make the draft PEP available on the Python site? > > I haven't been able to read the most recent update. > A recent copy of the PEP is always here: http://numeric.scipy.org/PEP.txt From juenglin at cs.pdx.edu Tue Mar 15 10:26:56 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 10:26:56 2005 Subject: [Numpy-discussion] Half baked C API? Message-ID: <423726FC.5040709@cs.pdx.edu> konrad.hinsen at laposte.net wrote: > > How much do you expect to gain compared to a Python loop in such > a case? I'd expect a factor 5 to 10. > > Did you consider Pyrex? It lets you move from pure Python to pure C > with Python syntax, mixing both within a single function. I looked at it, but haven't tried it out yet. As far as I understand it, if I'd give Pyrex the example code in my previous posting to translate it to C, the result would contain calls to the Python interpreter to have it evaluate unknown functions like 'dot', 'sum' etc. That would be quite slow. So besides having counterparts in the C API, the tool that does the translation also needs to know about those.
Ralf From oliphant at ee.byu.edu Tue Mar 15 10:30:53 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 15 10:30:53 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> Message-ID: <42372930.8060301@ee.byu.edu> Ralf Juengling wrote: >I recently took a closer at Numeric's and numarray's C APIs for >the first time and was surprised not to find the counterparts >for all the array functions that are available in the Python API. > > > How much to support on the C-API level is a question I am interested in right now. I have mixed feelings. On the one hand, it is much simpler on the array package developer (which from my perspective seems to be the short-end of the current stick) to have a reduced C-API and require individuals who want to access the functionality to go through the PyObject_CallMethod approach. We could perhaps provide a single function that made this a little simpler for arrays. On the other hand, a parallel API that made available everything that was present in Python might "look nicer," be a little faster, and make it easier on the extension writer. I'm interested in opinions, -Travis From juenglin at cs.pdx.edu Tue Mar 15 10:34:19 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 10:34:19 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> References: <1110851415.27984.44.camel@alpspitze.cs.pdx.edu> <0112f75f1a55b092a59733925faf2056@laposte.net> <423715A6.40601@cs.pdx.edu> <6bd62f3f856891896b1a630cbb8b3aa1@laposte.net> Message-ID: <423728E0.7050500@cs.pdx.edu> konrad.hinsen at laposte.net wrote: > Did you consider Pyrex? It lets you move from pure Python to pure C with > Python syntax, mixing both within a single function. Forgot to say, I very much like the Pyrex approach, though. Ralf From perry at stsci.edu Tue Mar 15 11:03:57 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Mar 15 11:03:57 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <423726FC.5040709@cs.pdx.edu> References: <423726FC.5040709@cs.pdx.edu> Message-ID: <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote: > konrad.hinsen at laposte.net wrote: > > > > How much do you expect to gain compared to a Python loop in such > > a case? > > I'd expect a factor 5 to 10. > How did you come to that conclusion? It's not at all clear to me that the overhead of the Python operation (i.e., calling the appropriate Python method or function from C) will add appreciably to the time it takes to call it from C. Remember, the speed of the C version of the Python function may have much more overhead than what you envision for an equivalent C function that you would write. So it isn't good enough to compare the speed of a python loop to the C code to do sum and dot that you would write. Adding these to the API is extra work, and worse, it perhaps risks making it harder to change the internals since so much more of what is in C is exposed. The current API is essentially centered around exposing the data and means of converting and copying the data, and to a lesser extent, building new UFuncs (for use at the Python level). Perry From juenglin at cs.pdx.edu Tue Mar 15 14:34:55 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Tue Mar 15 14:34:55 2005 Subject: [Numpy-discussion] Half baked C API? 
In-Reply-To: <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu>
References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu>
Message-ID: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu>

On Tue, 2005-03-15 at 11:03, Perry Greenfield wrote:
> On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote:
> > konrad.hinsen at laposte.net wrote:
> > >
> > > How much do you expect to gain compared to a Python loop in such
> > > a case?
> >
> > I'd expect a factor of 5 to 10.
>
> How did you come to that conclusion? It's not at all clear to me that
> the overhead of the Python operation (i.e., calling the appropriate
> Python method or function from C) will add appreciably to the time it
> takes to call it from C.

Good question. Through experiment and profiling I found that I could speed up the code by redefining a few functions, e.g., by setting

dot = multiarray.matrixmultiply
sum = add.reduce

and rewriting outerproduct as an array multiplication (using appropriately reshaped arrays; outerproduct does not occur in forward but in another HMM function).

I got a speedup close to 3 over my prototype implementation for the Baum-Welch algorithm (which calls forward). The idea is to specialize a function and avoid dispatching code in the loop. I guess that a factor of 5 to 10 is reasonable to achieve by specializing other functions in the loop, too.

> Remember, the speed of the C version of the
> Python function may have much more overhead than what you envision for
> an equivalent C function that you would write.

Yes, because of argument checking and dispatching code. I have not studied the implementation of Numeric, but I assume that there are different specialized implementations (for performance reasons) of array functions. To have an example, let's say that there are three special implementations for '*', for the special cases

a) both arguments contiguous and of the same shape
b) both arguments contiguous but of different shapes
c) otherwise

The __mul__ method then has to examine its arguments and dispatch to one of the specialized implementations a), b), or use the generic one c). If I know in advance that both arguments are contiguous and of the same shape, then, in a C implementation, I could call a) directly and avoid calling the dispatching code 10000 times in a row. Since the specialized implementations are already there (presumably), the real work in extending the C API is design, i.e., to expose them in a principled way. Please don't get me wrong, I'm not saying that this is an easy thing to do.

If you think that this idea is too far off, consider Pyrex. The idea behind Pyrex is essentially the same: you take advantage of special cases by annotating variables. So far this only concerns the type of object, but it is conceivable to extend it to array properties like contiguity.

> Adding these to the API is extra work, and worse,
> it perhaps risks making it harder to change the internals since so much
> more of what is in C is exposed.

That's a good point.

> The current API is essentially
> centered around exposing the data and means of converting and copying
> the data, and to a lesser extent, building new UFuncs (for use at the
> Python level).

Yes. The question is whether it should be more than just that. I believe that, currently, when somebody decides to move a significant portion of numerical code from Python to C, he or she will likely end up writing (specialized versions of) things like 'sum' and 'dot'.
But shouldn't those things be provided by a programming environment for scientific computing?

Does Scipy have, for instance, a documented C interface to blas and lapack functions? You answer, "Well, there is CBLAS and CLAPACK already." Yes, but by the same argument that pushes Travis to reconsider what should go into scipy_core: it would be nice to be able to use the blas_lite and lapack_lite functions if they cover my needs, and to tell my client, "All else you need to have installed is Python and scipy_core."

Ralf

From mdehoon at ims.u-tokyo.ac.jp Tue Mar 15 17:37:16 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Tue Mar 15 17:37:16 2005
Subject: [Numpy-discussion] Half baked C API?
In-Reply-To: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu>
References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu>
Message-ID: <42378D2E.9010905@ims.u-tokyo.ac.jp>

Ralf Juengling wrote:
> I believe that, currently, when somebody decides to move a
> significant portion of numerical code from Python to C, he or
> she will likely end up writing (specialized versions of) things
> like 'sum' and 'dot'. But shouldn't those things be provided by
> a programming environment for scientific computing?
>
> Does Scipy have, for instance, a documented C interface to blas
> and lapack functions? You answer, "Well, there is CBLAS and
> CLAPACK already." Yes, but by the same argument that pushes
> Travis to reconsider what should go into scipy_core: it would be
> nice to be able to use the blas_lite and lapack_lite functions
> if they cover my needs, and to tell my client, "All else you
> need to have installed is Python and scipy_core."

I am not sure about the particular case Ralf is considering, but in the past I have been in the situation that I wanted to access algorithms in Numerical Python (such as blas or lapack) at the C level and I couldn't find a way to do it. Note that for ranlib, the header files are actually installed as Numeric/ranlib.h, but as far as I know it is not possible to link a C extension module to Numerical Python's ranlib at the C level. So I would welcome what Ralf is suggesting.

--Michiel

From Fernando.Perez at colorado.edu Tue Mar 15 17:49:23 2005
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Tue Mar 15 17:49:23 2005
Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions
In-Reply-To: <42310D7D.3000009@ims.u-tokyo.ac.jp>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp>
Message-ID: <42379006.4070501@colorado.edu>

Michiel Jan Laurens de Hoon wrote:
> Perry Greenfield wrote:
[weave & f2py in the core]
>> That they are perfectly fine to go into the core. In fact, if they are
>> used by any of the extra packages, they should be in the core to
>> eliminate the extra step in the installation of those packages.
>
> -0.
> 1) In der Beschraenkung zeigt sich der Meister. In other words, avoid
> software bloat.
> 2) f2py is a Fortran-Python interface generator; once the interface is
> created there is no need for the generator.
> 3) I'm sure f2py is useful, but I doubt that it has very general
> usefulness. There are lots of other useful Python packages, but we're
> not including them in scipy-core either.
> 4) f2py and weave don't fit in well with the rest of scipy-core, which
> is mainly standard numerical algorithms.

I'd like to argue that these two tools are actually critically important in the core of a python-for-scientific-computing toolkit, at its most basic layer.

The reason is that python's dynamic runtime type checking makes it impossible to write efficient loop-based code, as we all know. And it is not always feasible to write all algorithms in terms of Numeric vector operations: sometimes you just need to write an indexed loop.

At this point, the standard python answer is 'go write an extension module'. While writing extension modules by hand, from scratch, is not all that hard, it certainly presents a significant barrier for less experienced programmers. And yet both weave and f2py make it incredibly easy to get working compiled array code in no time at all.

I say this from direct experience, having pointed colleagues to weave and f2py for this very problem. After handing them some notes I have to get started, they've come back saying "I can't believe it was that easy: in a few minutes I had sped up the loop I needed with a bit of C, and now I can continue working on the problem I'm interested in". I know for a fact that if I'd told them to write a full extension module by hand, the result would have been quite different.

The reality is that, in scientific work, you are likely to run into this problem at a very early stage, much more so than for other kinds of python usage. For this reason, it is important that the basic toolset provides a clean solution from the start.

At least that's been my experience.

Regards,

f

From Alexandre.Fayolle at logilab.fr Tue Mar 15 23:04:26 2005
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Tue Mar 15 23:04:26 2005
Subject: [Numpy-discussion] Python implementation of HMM
In-Reply-To: <1110925926.6533.160.camel@alpspitze.cs.pdx.edu>
References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu>
Message-ID: <20050316070254.GC21421@crater.logilab.fr>

On Tue, Mar 15, 2005 at 02:32:06PM -0800, Ralf Juengling wrote:
> dot = multiarray.matrixmultiply
> sum = add.reduce
> and rewriting outerproduct as an array multiplication (using
> appropriately reshaped arrays; outerproduct does not occur in
> forward but in another HMM function)
>
> I got a speedup close to 3 over my prototype implementation for the
> Baum-Welch algorithm (which calls forward). The idea is to specialize
> a function and avoid dispatching code in the loop. I guess that a
> factor of 5 to 10 is reasonable to achieve by specializing other
> functions in the loop, too.

Hi, this is only side-related to your problem, but are you aware of the existence of http://www.logilab.org/projects/hmm/ ? It may not be very fast (we mainly looked for clarity in the code, and ended with something "fast enough" for our needs), but maybe it will match yours. Or it may provide a starting point for your implementation.

-- 
Alexandre Fayolle LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org

From konrad.hinsen at laposte.net Tue Mar 15 23:50:51 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Tue Mar 15 23:50:51 2005
Subject: [Numpy-discussion] Half baked C API?
In-Reply-To: <42378D2E.9010905@ims.u-tokyo.ac.jp> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp> Message-ID: On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote: > do it. Note that for ranlib, the header files are actually installed > as Numeric/ranlib.h, but as far as I know it is not possible to link a > C extension module to Numerical Python's ranlib at the C level. So I > would welcome what Ralf is suggesting. > That's not possible in a portable way, right. For those reasons I usually propose a C API in my C extension modules (Scientific.IO.NetCDF and Scientific.MPI for example) that is accessible through C pointer objects in Python. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Tue Mar 15 23:53:48 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Mar 15 23:53:48 2005 Subject: [Numpy-discussion] Re: [SciPy-user] Current thoughts on future directions In-Reply-To: <42379006.4070501@colorado.edu> References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <422FD009.4020706@enthought.com> <42310D7D.3000009@ims.u-tokyo.ac.jp> <42379006.4070501@colorado.edu> Message-ID: <3ea05ed9734066a936c970baabb327ef@laposte.net> On 16.03.2005, at 02:46, Fernando Perez wrote: > The reality is that, in scientific work, you are likely to run into > this problem at a very early stage, much more so than for other kinds > of python usage. For this reason, it is important that the basic > toolset provides a clean solution from the start. One can in fact argue that f2py, weave, and other tools (Pyrex comes to mind) are the logical extensions of Distutils, which is part of the Python core. As long as they can be installed without additional requirements (in particular requiring the compilers that they need to work), I don't mind having them in the core distribution, though I would still have them as logically separate packages (i.e. not scipy.core.f2py but scipy.f2py) . Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From antti.korvenoja at helsinki.fi Wed Mar 16 00:15:21 2005 From: antti.korvenoja at helsinki.fi (Antti Korvenoja) Date: Wed Mar 16 00:15:21 2005 Subject: [Numpy-discussion] Record array field dimensions Message-ID: <1110960798.4237ea9eb7e69@www2.helsinki.fi> Hi! I was surprised to find out that when I read a field from file into a record array with format string '8f4' the corresponding field has dimension (1,8) and not (8,) that I would intuitively expect. Am I possibly spesifying the format incorrectly? If not, why is there an extra dimension? 
Antti Korvenoja

From hsu at stsci.edu Wed Mar 16 08:41:18 2005
From: hsu at stsci.edu (Jin-chung Hsu)
Date: Wed Mar 16 08:41:18 2005
Subject: [Numpy-discussion] Re: Record array dimension
Message-ID: <42386135.9090305@stsci.edu>

> I was surprised to find out that when I read a field from file into a record
> array with format string "8f4" the corresponding field has dimension (1,8)
> and not (8,) that I would intuitively expect. Am I possibly specifying the
> format incorrectly? If not, why is there an extra dimension?

The first dimension(s) in a record array always refer to the number of "rows". So, if you have:

>>> import numarray.records as rec
>>> r = rec.array(formats='f4', shape=8)
>>> r.field(0).shape
(8,)

which will be what you might have expected. But if

>>> r = rec.array(formats='8f4', shape=10)
>>> r.field(0).shape
(10, 8)

I assume you have shape=1, and that's why you get (1, 8) for the field shape. You can have complicated tables like:

>>> r = rec.array(formats='(4,5)f4', shape=(2,3))
>>> r.field(0).shape
(2, 3, 4, 5)

So, the field shape always has the record array's shape as its first dimension(s).

JC Hsu

From konrad.hinsen at laposte.net Wed Mar 16 13:26:37 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Mar 16 13:26:37 2005
Subject: [Numpy-discussion] Current thoughts on future directions
In-Reply-To: <42388FBF.4000004@csun.edu>
References: <422EA691.9080404@ee.byu.edu> <422F335F.8060107@csun.edu> <6916ec732f2e70d1789cc0f480f82e7f@redivi.com> <422FBD4A.3030708@ee.byu.edu> <423084F6.7020804@csun.edu> <42361B39.6090300@csun.edu> <42388FBF.4000004@csun.edu>
Message-ID: 

On 16.03.2005, at 20:57, Stephen Walton wrote:

> Well, how much their time is worth is then the question? If they can
> afford

Not exactly. The first question they ask is how much the code is worth that requires them to buy a Fortran license or to install g77. They might then well choose my competitor's code that doesn't have such requirements. It doesn't help if I tell them that my code doesn't require Fortran at all, but that it relies on a library that can't be installed without a Fortran compiler.

> an SGI workstation, I'd think they could afford a copy of SGI's
> compiler,

From memory, a compiler license is about half the price of a workstation. Depending on particular circumstances (special offers, campus licenses, etc.) the prices can be lower.

> I might also point out, not to be argumentative, that the real
> difficulty is installing gcc. The extra effort to install g77 once
> gcc is working is very small. Are the users you mention doing without
> C as well?

I guess some do. On all the workstations I have ever used, there was a minimal C compiler for recompiling the kernel, which was also good enough for installing software, though not necessarily a pleasure for development.

> Back when I was administering HP-UX, I found a community supported
> archive with contributed pre-compiled software in HP's package format.
> Is there a similar thing in the SGI community?

I don't know. I have used SGI machines for only two years (ten years ago), and in an environment where compilers and development tools were considered necessary.

Konrad.
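Returning to the record-array question above: following JC Hsu's examples, a short (untested) session suggests both the shape=1 case Antti is seeing and one way to get at the flat (8,) view; the only lines here that are not from his message are the shape=1 call and the final indexing step.

>>> import numarray.records as rec
>>> r = rec.array(formats='8f4', shape=1)   # one row of eight Float32s
>>> r.field(0).shape
(1, 8)
>>> r.field(0)[0].shape                     # index away the row axis
(8,)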
-- 
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
---------------------------------------------------------------------

From mdehoon at ims.u-tokyo.ac.jp Wed Mar 16 16:53:57 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar 16 16:53:57 2005
Subject: [Numpy-discussion] Half baked C API?
In-Reply-To: 
References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp>
Message-ID: <4238D495.3030402@ims.u-tokyo.ac.jp>

konrad.hinsen at laposte.net wrote:
> On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote:
>
>> do it. Note that for ranlib, the header files are actually installed
>> as Numeric/ranlib.h, but as far as I know it is not possible to link
>> a C extension module to Numerical Python's ranlib at the C level. So
>> I would welcome what Ralf is suggesting.
>>
> That's not possible in a portable way, right.

I'm not sure why that wouldn't be portable, since we wouldn't be distributing binaries. The idea is that both a ranlib/blas/lapack library and the extension module are compiled when installing Numerical Python, installing the library in /usr/local/lib/python2.4/Numeric (and the module as usual in /usr/local/lib/python2.4/site-packages/Numeric). Extension modules that want to use ranlib/blas/lapack at the C level can then use the include file from /usr/local/include/python2.4/Numeric and link to the library in /usr/local/lib/python2.4/Numeric.

Well, maybe I'm missing something basic here ...

--Michiel.

From oliphant at ee.byu.edu Wed Mar 16 20:27:29 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 16 20:27:29 2005
Subject: [Numpy-discussion] Please chime in on proposed methods for arrays
Message-ID: <423906B5.9080501@ee.byu.edu>

One item I have not received a lot of feedback on is the new proposal for a greatly increased number of methods on the ndarray.

The current PEP has a listing of all the proposed methods and attributes (some more were added after consulting current numarray in more detail and looking at all the functions in current Numeric.py).

If a function call essentially involved an arrayobject with some other parameters, then it was turned into a method. If it involved two "equal" arrays, then it was left as a function. This is a somewhat arbitrary convention, and so I am asking for suggestions as to what should be methods.

Should all the ufuncs be methods as well? I think Konrad suggested this. What is the opinion of others?

The move from functions to methods will mean that some of the function calls currently in Numeric.py will be redundant, but I think they should stay there for backwards compatibility (perhaps with a deprecation warning...).

A final question: I think we need to think carefully about multidimensional indexing so that it replaces current usage of take, put, putmask.

For example, how, in numarray, would you replace take(a,[1,5,10],axis=-2) if a is a 10x20x30 array? Note that in this case take returns a 10x3x30 array (call it g) with

g[:,0,:] = a[:,1,:]
g[:,1,:] = a[:,5,:]
g[:,2,:] = a[:,10,:]

I submit that a[...,[1,5,10],:] would be an appropriate syntax. This would mean changing the current PEP a bit.
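For reference, the take() behaviour described in this message can be checked with a short Numeric session. The array contents below are arbitrary; the only assumption beyond what the message states is that this Numeric build accepts the axis keyword (including a negative axis) as used above.

from Numeric import arange, reshape, take, alltrue, ravel

a = reshape(arange(10*20*30), (10, 20, 30))
g = take(a, [1, 5, 10], axis=-2)

print g.shape                                    # (10, 3, 30)
print alltrue(ravel(g[:, 0, :] == a[:, 1, :]))   # 1
print alltrue(ravel(g[:, 2, :] == a[:, 10, :]))  # 1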
-Travis From mdehoon at ims.u-tokyo.ac.jp Wed Mar 16 20:48:16 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Mar 16 20:48:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <42390B9E.2070203@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? > Just to make sure I understand the proposal correctly. Does this mean that >>> anotherarray = sin(myarray) becomes >>> anotherarray = myarray.sin() ? --Michiel. From rkern at ucsd.edu Wed Mar 16 21:10:00 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Mar 16 21:10:00 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <423910D8.8050109@ucsd.edu> Travis Oliphant wrote: > > One item I have not received a lot of feedback on is the new proposal > for a greatly increased number of methods on the ndarray. > > The current PEP has a listing of all the proposed methods and attributes > (some more were added after consulting current numarray in more detail > and looking at all the functions in current Numeric.py) > > If a function call essentially involved an arrayobject with some other > parameters then it was turned into a method. If it involved two "equal" > arrays then it was left as a function. This is a somewhat arbitrary > convention, and so I am asking for suggestions as to what should be > methods. > > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? I'm too lazy to search right now, but I'm pretty sure that Konrad suggested the opposite: that x.sin(), while possibly "cleaner" in an OO-fetishistic sense, jars too much against the expectation of sin(x) that all of us got accustomed to in math class. Maybe I should let him speak for himself, though. :-) I think the division you have listed in the PEP is a reasonable one. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Fernando.Perez at colorado.edu Wed Mar 16 21:17:16 2005 From: Fernando.Perez at colorado.edu (Fernando.Perez at colorado.edu) Date: Wed Mar 16 21:17:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423910D8.8050109@ucsd.edu> References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: <1111036570.4239129a4fe43@webmail.colorado.edu> Quoting Robert Kern : > Travis Oliphant wrote: > > Should all the ufuncs be methods as well? I think Konrad suggested > > this. What is the opinion of others? > > I'm too lazy to search right now, but I'm pretty sure that Konrad > suggested the opposite: that x.sin(), while possibly "cleaner" in an > OO-fetishistic sense, jars too much against the expectation of sin(x) > that all of us got accustomed to in math class. Maybe I should let him > speak for himself, though. :-) I certainly cringe at the sight of its_a.sin(). One of the advantages of python is that it doesn't impose any one methodology for software development: while an OO approach may be great for allowing arbitrary function-like objects to be callable, for example, it doesn't mean that everything under the sun has to become a method call. 
And, though I'm as lazy as Robert (in fact, I've proven to be lazier than him in the past), my memory also tells me that Konrad's mathematical sensibilities lean in the direction of sin(x). Best, f From oliphant at ee.byu.edu Wed Mar 16 21:56:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 16 21:56:16 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core Message-ID: <42391B6E.8060709@ee.byu.edu> I wanted to let people who may be waiting, that now is a good time to help with numeric3. The CVS version builds (although I'm sure there are still bugs), but more eyes could help me track them down. Currently, all that remains for the arrayobject is to implement the newly defined methods (really it's just a re-organization and re-inspection of the code in multiarraymodule.c to call it using methods). I also need to check the multidimensional slicing syntax when mixed with ellipses and slice objects so that take can be (functionally) replaced with multidimensional slicing. Any input on this would be appreciated. I'm referring to the fact that I think that a[...,ind,:] should be equivalent to take(a,ind,axis=-2). But, this necessitates some re-thinking about what partial indexing returns. What should a[:,ind1,:,ind2,:] return if a is a five-dimensional array? Currently, the proposed PEP for partial indexing always has the result as the broadcasted shape of ind1 and ind2 + the dimensionality of the un-indexed subspace. In otherwords, the unindexed subspace shape is always appended to the end of the result shape. I think this is wrong at least for the case of 1 indexing array because it does not let a[...,ind,:] be a replacement for take(a,ind,axis=-2). Is it wrong for more than 1 indexing array? To clarify the situation: Suppose X has shape (10,20,30,40,50) and suppose ind1 and ind2 are both broadcastable to the shape (2,3,4). Note for reference that take(X,ind1,axis=-2).shape returns (10,20,30,2,3,4,50) Now, according to the current proposal: X[..., ind1, :] will return a (2,3,4,10,20,30,50) --- I think this should be changed to return the same as take.... X[ind1, ind1, ind1, ind1, ind1] will return a (2,3,4) array (all dimensions are indexed) --- O.K. X[ind1, ind1, ind1, ind1] will return a (2,3,4,50) array X[ind1, ind1, :, ind1, ind1] will return a (2,3,4,30) array X[...,ind1,ind1,ind1] returns a (2,3,4,10,20) array --- is this right? X[:,ind1,:,ind2,:] returns a (2,3,4,10,30,50) array result[i,j,k,:,:,:] = X[:,ind1[i,j,k],:,ind2[i,j,k],:] So, here's the issue (if you are not familiar with the concept of subspace you can replace the word subspace with "shape tuple" in the following): - indexing with multidimensional index arrays under the numarray-introduced scheme (which seems reasonable to me) creates a single "global" subspace for all of the index arrays provided (i.e. there is no implied outer-product). - When there is a single index array it is unambiguous to replace the single-axis subspace with the index array subspace: i.e. X[...,ind1,:] can replace the second-to-last axis shape with the ind1.shape to get a (10,20,30,2,3,4,50) array. - Where there is more than one index array, what should replace the single-axis subspaces that the indexes are referencing? Remember, all of the single-axis subspaces are being replaced with one "global" subspace. The current proposal states that this indexing subspace should be placed first and the "remaining subspaces" pasted in at the end. Is this acceptable, or can someone see a problem?? 
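A pure-Python sketch may make the proposed indexing semantics above concrete. This is not an implementation, just the loop form of the rule Travis states for X[:,ind1,:,ind2,:]; the function name and the Float result type are chosen here for illustration.

from Numeric import zeros, Float

def fancy_select(X, ind1, ind2):
    # X has shape (d0, d1, d2, d3, d4); ind1 and ind2 both have shape (I, J, K).
    # Implements: result[i,j,k,:,:,:] = X[:, ind1[i,j,k], :, ind2[i,j,k], :]
    # so the indexing subspace (I, J, K) comes first, then the unindexed
    # subspace (d0, d2, d4), as in the current proposal.
    d0, d1, d2, d3, d4 = X.shape
    I, J, K = ind1.shape
    result = zeros((I, J, K, d0, d2, d4), Float)
    for i in xrange(I):
        for j in xrange(J):
            for k in xrange(K):
                result[i, j, k] = X[:, ind1[i, j, k], :, ind2[i, j, k], :]
    return result

With Travis's shapes, X of shape (10,20,30,40,50) and ind1, ind2 of shape (2,3,4), this returns the (2,3,4,10,30,50) array he describes.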
Best regards, -Travis From oliphant at ee.byu.edu Wed Mar 16 22:33:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 16 22:33:50 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <42391B6E.8060709@ee.byu.edu> References: <42391B6E.8060709@ee.byu.edu> Message-ID: <42392481.1010701@ee.byu.edu> Travis Oliphant wrote: > > - Where there is more than one index array, what should replace the > single-axis subspaces that the indexes are referencing? Remember, > all of the single-axis subspaces are being replaced with one "global" > subspace. The current proposal states that this indexing subspace > should be placed first and the "remaining subspaces" pasted in at the > end. > > Is this acceptable, or can someone see a problem?? Answering my own question... I think that it makes sense to do a direct subspace replacement whenever the indexing arrays are right next to each other. In other words, I would just extend the "one-index array" rule to "all-consecutive-index-arrays" where of course one index array satisfies the all-consecutive requirement. Hence in the previous example: X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) with the (20,30)-subspace being replaced by the (2,3,4) indexing subspace. result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] Any other thoughts. (I think I will implement this initially by just using swapaxes on the current implementation...) -Travis From cjw at sympatico.ca Thu Mar 17 04:10:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Mar 17 04:10:16 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423906B5.9080501@ee.byu.edu> References: <423906B5.9080501@ee.byu.edu> Message-ID: <4239734B.6060701@sympatico.ca> Travis Oliphant wrote: > > One item I have not received a lot of feedback on is the new proposal > for a greatly increased number of methods on the ndarray. > > The current PEP has a listing of all the proposed methods and > attributes (some more were added after consulting current numarray in > more detail and looking at all the functions in current Numeric.py) > > If a function call essentially involved an arrayobject with some other > parameters then it was turned into a method. This seems a good idea. I would suggest going a step further. If a method has no parameters then make it a property and adopt the naming convention that the property names start with an upper case character, eg. Cos. In other words, drop the redundant parentheses. > If it involved two "equal" arrays then it was left as a function. > This is a somewhat arbitrary convention, and so I am asking for > suggestions as to what should be methods. Why change from the above proposal? > > Should all the ufuncs be methods as well? I think Konrad suggested > this. What is the opinion of others? > Yes, modified as suggested above. > > > The move from functions to methods will mean that some of the function > calls currently in Numeric.py will be redundant, but I think they > should stay there for backwards compatibility, (perhaps with a > deprecation warning...) Yes, the same for numarray. Deprecation, with dropping from some future version. > > > A final question: > I think we need to think carefully about multidimensional indexing so > that it replaces current usage of take, put, putmask. > > For example, how, in numarray would you replace > take(a,[1,5,10],axis=-2) if a is a 10x20x30 array? 
> Note that in this case take returns a 10x3x30 array (call it g) with
>
> g[:,0,:] = a[:,1,:]
> g[:,1,:] = a[:,5,:]
> g[:,2,:] = a[:,10,:]
>
> I submit that a[...,[1,5,10],:] would be an appropriate syntax.
> This would mean changing the current PEP a bit.

What is the benefit of the ellipsis here? It seems to serve the same purpose as the colon.

Colin W.

> -Travis

From cjw at sympatico.ca Thu Mar 17 04:12:15 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Thu Mar 17 04:12:15 2005
Subject: [Numpy-discussion] Please chime in on proposed methods for arrays
In-Reply-To: <42390B9E.2070203@ims.u-tokyo.ac.jp>
References: <423906B5.9080501@ee.byu.edu> <42390B9E.2070203@ims.u-tokyo.ac.jp>
Message-ID: <423973C0.1090901@sympatico.ca>

Michiel Jan Laurens de Hoon wrote:

> Travis Oliphant wrote:
>
>> Should all the ufuncs be methods as well? I think Konrad suggested
>> this. What is the opinion of others?
>
> Just to make sure I understand the proposal correctly. Does this mean
> that
> >>> anotherarray = sin(myarray)
> becomes
> >>> anotherarray = myarray.sin()
> ?

Or even myarray.Sin?

> --Michiel.

From tkorvola at welho.com Thu Mar 17 05:41:52 2005
From: tkorvola at welho.com (Timo Korvola)
Date: Thu Mar 17 05:41:52 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <42391B6E.8060709@ee.byu.edu> (Travis Oliphant's message of "Wed, 16 Mar 2005 22:53:50 -0700")
References: <42391B6E.8060709@ee.byu.edu>
Message-ID: <878y4mxuf7.fsf@welho.com>

Travis Oliphant writes:
> - indexing with multidimensional index arrays under the
> numarray-introduced scheme (which seems reasonable to me)

It is powerful but likely to confuse Matlab and Fortran users because a[[0,1], [1,2]] is different from a[0:2, 1:3]. I suspect that the most commonly used case of index arrays is a single vector as the first index, which has an intuitively clear meaning regardless of the current ordering issue.
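Timo's distinction is easy to see in a toy numarray session; the array here is invented for illustration, but the indexing semantics are numarray's.

>>> from numarray import arange, reshape
>>> a = reshape(arange(12), (3, 4))
>>> print a[0:2, 1:3]        # slice: a 2x2 block
[[1 2]
 [5 6]]
>>> print a[[0, 1], [1, 2]]  # index arrays: elements (0,1) and (1,2)
[1 6]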
-- Timo Korvola From konrad.hinsen at laposte.net Thu Mar 17 09:08:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 17 09:08:07 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <423910D8.8050109@ucsd.edu> References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: On 17.03.2005, at 06:08, Robert Kern wrote: > I'm too lazy to search right now, but I'm pretty sure that Konrad > suggested the opposite: that x.sin(), while possibly "cleaner" in an > OO-fetishistic sense, jars too much against the expectation of sin(x) > that all of us got accustomed to in math class. Maybe I should let him > speak for himself, though. :-) I agree. What I suggested is that there should be methods as well as functions, and that the ufuncs should call the methods, such that Numeric.sin(x) would simply become syntactic sugar for x.sin() whatever the type of x. But I don't expect to see x.sin() in application code, it's just a convenient way of implementing sin() in new classes and subclasses. Actually, x.__sin__() would be a more pythonic choice of method name. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Mar 17 09:13:08 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Mar 17 09:13:08 2005 Subject: [Numpy-discussion] Half baked C API? In-Reply-To: <4238D495.3030402@ims.u-tokyo.ac.jp> References: <423726FC.5040709@cs.pdx.edu> <56d7ebae25ec84351c0bddb7e68c86e1@stsci.edu> <1110925926.6533.160.camel@alpspitze.cs.pdx.edu> <42378D2E.9010905@ims.u-tokyo.ac.jp> <4238D495.3030402@ims.u-tokyo.ac.jp> Message-ID: On 17.03.2005, at 01:51, Michiel Jan Laurens de Hoon wrote: > I'm not sure why that wouldn't be portable, since we wouldn't be > distributing binaries. The idea is that both a ranlib/blas/lapack > library and the extension In general, shared library A cannot rely on having access to the symbols of shared library B. So if shared library A (NumPy) wants to make symbols that it got from ranlib or BLAS available to other modules, it must make them available through C objects. > ranlib/blas/lapack at the C level can then use the include file from > /usr/local/include/python2.4/Numeric and link to the library in > /usr/local/lib/python2.4/Numeric. If it placed there as a standard linkable library, that would of course work, but that would be an additional step in NumPy installation. I am not sure it's a good idea in the long run. I'd rather have libraries of general interests in /usr/local/lib or /usr/lib. Konrad. 
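A minimal sketch of the sugar Konrad describes above, with the method name __sin__ following his suggestion; none of this is existing API, it only illustrates the dispatch he has in mind.

import math

def sin(x):
    # Defer to the object's own implementation when it provides one;
    # plain Python scalars fall back to the math module.
    if hasattr(x, '__sin__'):
        return x.__sin__()
    return math.sin(x)

Any array class (or user subclass) defining __sin__ would then work with the module-level sin() without further changes, while application code keeps writing sin(x).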
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From perry at stsci.edu Thu Mar 17 09:50:20 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 09:50:20 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: <42392481.1010701@ee.byu.edu> References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: Before I delve too deeply into what you are suggesting (or asking), has the idea to have a slice be equivalent to an index array been changed. For example, I recall seeing (I forget where), the suggestion that X[:,ind] is the same as X[arange(X.shape[0]), ind] The following seems to be at odds with this. The confusion of mixing slices with index arrays led me to just not deal with them in numarray. I thought index arrays were getting complicated enough. I suppose it may be useful, but I would be good to give some motivating, realistic examples of why they are useful. For example, I can think of lots of motivating examples for: using more than one index array (e.g., X[ind1, ind2]) allowing index arrays to have arbitrary shape allowing partial indexing with index arrays Though I'm not sure I can think of good examples of arbitrary combinations of these capabilities (though the machinery allows it). So one question is there a good motivating example for X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not sure I know where that would be commonly used (it would suggest that all the sizes of the sliced dimensions must have consistent lengths which doesn't seem typical. Any one have good examples? Perry On Mar 17, 2005, at 1:32 AM, Travis Oliphant wrote: > Travis Oliphant wrote: > >> >> - Where there is more than one index array, what should replace the >> single-axis subspaces that the indexes are referencing? Remember, >> all of the single-axis subspaces are being replaced with one "global" >> subspace. The current proposal states that this indexing subspace >> should be placed first and the "remaining subspaces" pasted in at the >> end. >> >> Is this acceptable, or can someone see a problem?? > > > Answering my own question... > > I think that it makes sense to do a direct subspace replacement > whenever the indexing arrays are right next to each other. In other > words, I would just extend the "one-index array" rule to > "all-consecutive-index-arrays" where of course one index array > satisfies the all-consecutive requirement. > > Hence in the previous example: > > X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) with the > (20,30)-subspace being replaced by the (2,3,4) indexing subspace. > > result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] > > > Any other thoughts. (I think I will implement this initially by just > using swapaxes on the current implementation...) 
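For concreteness, the fully explicit form Perry recalls already works in numarray; whether X[:, ind] should abbreviate it is exactly his question. The toy data below is invented.

>>> from numarray import arange, reshape, array
>>> X = reshape(arange(12), (3, 4))
>>> ind = array([1, 3, 0])
>>> print X[arange(3), ind]   # picks X[0,1], X[1,3], X[2,0]
[1 7 8]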
> > -Travis From perry at stsci.edu Thu Mar 17 09:54:33 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 09:54:33 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: References: <423906B5.9080501@ee.byu.edu> <423910D8.8050109@ucsd.edu> Message-ID: <351c9d129c25693a82abff95b11e9ed6@stsci.edu> On Mar 17, 2005, at 12:05 PM, konrad.hinsen at laposte.net wrote: > On 17.03.2005, at 06:08, Robert Kern wrote: > >> I'm too lazy to search right now, but I'm pretty sure that Konrad >> suggested the opposite: that x.sin(), while possibly "cleaner" in an >> OO-fetishistic sense, jars too much against the expectation of sin(x) >> that all of us got accustomed to in math class. Maybe I should let >> him speak for himself, though. :-) > > I agree. What I suggested is that there should be methods as well as > functions, and that the ufuncs should call the methods, such that > > Numeric.sin(x) > > would simply become syntactic sugar for > > x.sin() > > whatever the type of x. But I don't expect to see x.sin() in > application code, it's just a convenient way of implementing sin() in > new classes and subclasses. Actually, x.__sin__() would be a more > pythonic choice of method name. > > Konrad. > It would be hard to imagine not allowing the functional form. Users would think we were crazy. (And they'd be right ;-) Perry From rlw at stsci.edu Thu Mar 17 10:31:23 2005 From: rlw at stsci.edu (Rick White) Date: Thu Mar 17 10:31:23 2005 Subject: [Numpy-discussion] Please chime in on proposed methods for arrays In-Reply-To: <351c9d129c25693a82abff95b11e9ed6@stsci.edu> Message-ID: On Thu, 17 Mar 2005, Perry Greenfield wrote: > On Mar 17, 2005, at 12:05 PM, konrad.hinsen at laposte.net wrote: > > > I agree. What I suggested is that there should be methods as well as > > functions, and that the ufuncs should call the methods, such that > > > > Numeric.sin(x) > > > > would simply become syntactic sugar for > > > > x.sin() > > > > whatever the type of x. But I don't expect to see x.sin() in > > application code, it's just a convenient way of implementing sin() in > > new classes and subclasses. Actually, x.__sin__() would be a more > > pythonic choice of method name. > > > > Konrad. > > It would be hard to imagine not allowing the functional form. Users > would think we were crazy. (And they'd be right ;-) I think the suggestion that ufuncs should call methods behind the scenes is a bad idea. It just doesn't makes much sense to me. Doesn't this imply that you have to decorate array objects with another method every time someone adds another 1-argument ufunc? Even if you argue that you only want the methods for some standard set of ufuncs, it seems like a lot of baggage to pile into the array objects. I like the arc hyperbolic sine function, but I can't see why I would expect an array to have either a method x.asinh() or, worse, x.__asinh__()! Maybe I'm misunderstanding something here, but this just sounds like a way to bloat the interface to arrays. Rick From jh at oobleck.astro.cornell.edu Thu Mar 17 11:00:12 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Thu Mar 17 11:00:12 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays Message-ID: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> I'll start by saying something positive: I am very encouraged by all the work that's going into resolving the small/big array split! 
That said, I view a.Sin as a potentially devastating change, if traditional functional notation is not guaranteed to be preserved forever.

From a learning user's perspective, having to say a.sin() or a.Sin rather than sin(a) will be enough to make most people stay away from numerical python. I say this from experience: most of what Python does well, guile also did well. After the IDAE BoF at the 1996 ADASS meeting, we considered whether guile would be a better platform than Python. It had a lot of development force behind it, had all the needed features, etc. We asked our audience whether they would use a lisp-based language. There was laughter.

The CS people here know that lisp is a "beautiful" language. But nobody uses it, because the syntax is so different from the normal flow of human thought. It is written to make writing an interpreter for it easy, not to be easy to learn and use. I've tried it about 8 times and have given up. Apparently others agree, as guile has been ripped out of many applications, such as Gimp, or at least augmented as an extension language by perl or python, which now get most of the new code. Normal people don't want to warp their brains in order to code. Consider the following:

math:  x = (-b +- sqrt(b^2 - 4ac)) / (2a)
IDL:   x = (-b + [1.,-1] * sqrt(b*b-4*a*c)) / (2*a)
lisp:  (let x (/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a)))

You can verify that you have coded the IDL correctly at a glance. The lisp takes longer, even if you're a good lisp programmer. Now consider the following common astronomical equation:

math:     cos(a) = (sin(dec) - sin(alt) sin(lat)) / (cos(alt) cos(lat))
IDL:      a = acos((sin(dec) - sin(alt) * sin(lat)) / (cos(alt)*cos(lat)))
proposal: a = ((dec.Sin - alt.Sin * lat.Sin) / (alt.Cos * lat.Cos)).Acos

Readable, but we start to see the problem with the moved .Acos. Now try this:

math:     a = e^(sin(x)^2 + cos(tan(x + sin(x))))
IDL:      a = exp((sin(x))^2 + cos(tan(x + sin(x))))
proposal: a = (x.Sin**2 + (x + x.Sin).tan.cos).Exp

Half of it you read from left to right. The other half from right to left. Again, the IDL is much easier to write and to read, given that we started from traditional math notation. In the proposal version, it's easy to overlook that this is an exponential.

So, I don't object to making functions into methods, but if there's even a hint of deprecating the traditional functional notation, that will relegate us to oblivion. If you don't believe it still, take the last equation to a few non-CS types and ask them whether they would consider using a language that required coding math in the proposed manner versus in the standard manner. Then consider how much time it would take to port your existing code to this new syntax, and verify that you didn't misplace a paren or sign along the way.

A statement that traditional functional notation is guaranteed always to be part of Numeric should be in the PEP. Even calling it syntactic sugar is dangerous. It is the fundamental thing, and the methods are sugar for the CS types out there.
--jh-- From stephen.walton at csun.edu Thu Mar 17 11:50:00 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Mar 17 11:50:00 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> Message-ID: <4239DEE9.1050802@csun.edu> Joe Harrington wrote: >I'll start by saying something positive: I am very encouraged by all >the work that's going into resolving the small/big array split! > > +1 from me. >That said, I view a.Sin as a potentially devastating change, if >traditional functional notation is not guarranteed to be preserved >forever. > > +2 or more on Joe's cogent comments. Four centuries of traditional mathematics notation should not be overthrown for the sake of OO nirvana. Believe me that astronomers know what it's like to be prisoners of history; ask one to explain the stellar magnitude scale to you sometime! Incidentally, one can read here why the Chandra X-Ray Observatory chose S-Lang instead of Python for its data analysis software: http://cxc.harvard.edu/ciao/why/slang.html From perry at stsci.edu Thu Mar 17 12:01:31 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Mar 17 12:01:31 2005 Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays In-Reply-To: <4239DEE9.1050802@csun.edu> References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239DEE9.1050802@csun.edu> Message-ID: <5a583fe32cc42cc47b659f2af44c5113@stsci.edu> On Mar 17, 2005, at 2:47 PM, Stephen Walton wrote: > Incidentally, one can read here why the Chandra X-Ray Observatory > chose S-Lang instead of Python for its data analysis software: > > http://cxc.harvard.edu/ciao/why/slang.html > Though I'll note that I think their conclusion wasn't really correct then. It does illustrate the aversion to OO though. From boomberschloss at yahoo.com Thu Mar 17 12:09:18 2005 From: boomberschloss at yahoo.com (Joachim Boomberschloss) Date: Thu Mar 17 12:09:18 2005 Subject: [Numpy-discussion] casting in numarray Message-ID: <20050317200747.73860.qmail@web53108.mail.yahoo.com> Hi, I'm using numarray for an audio-related application as a buffer in an audio-processing pipeline. I would like to be able to allocate the buffer in advance and later regard it as a buffer of 8bit or 16bit samples as appropriate, but in numarray, casting always produces a new array, which I don't want. How difficult should it be to make it possible to create an array using an exsisting pre-allocated buffer to act as an interface to that buffer? Also, if others consider it useful, is there anyone willing to guide me through the code in doing so? Thanks, Joe __________________________________ Do you Yahoo!? Yahoo! Small Business - Try our new resources site! http://smallbusiness.yahoo.com/resources/ From tim.hochberg at cox.net Thu Mar 17 12:20:22 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Mar 17 12:20:22 2005 Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> Message-ID: <4239E5AC.4040901@cox.net> Perry Greenfield wrote: > Before I delve too deeply into what you are suggesting (or asking), > has the idea to have a slice be equivalent to an index array been > changed. 
For example, I recall seeing (I forget where), the suggestion > that > > X[:,ind] is the same as X[arange(X.shape[0]), ind] > > The following seems to be at odds with this. The confusion of mixing > slices with index arrays led me to just not deal with them in > numarray. I thought index arrays were getting complicated enough. Yes! Not index arrays by themselves, but the indexing system as a whole is already on the verge of being overly complex in numarray. Adding anything more to it is foolish. > I suppose it may be useful, but I would be good to give some > motivating, realistic examples of why they are useful. For example, I > can think of lots of motivating examples for: > > using more than one index array (e.g., X[ind1, ind2]) > allowing index arrays to have arbitrary shape > allowing partial indexing with index arrays My take is that having even one type of index array overloaded onto the current indexing scheme is questionable. In fact, even numarray's current scheme is too complicated for my taste. I particularly don't like the distinction that has to be made between lists and arrays on one side and tuples on the other. I understand why it's there, but I don't like it. Is it really necessary to pile these indexing schemes directly onto the main array object. It seems that it would be clearer, and more flexible, to use a separate, attached adapter object. For instance (please excuse the names as I don't have good ideas for those): X.rows[ind0, ind1, ..., ind2, :] would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1)). That is it would select the rows given by ind0 along the 0th axis, the rows given by ind1 along the 1st axis (aka the columns) and the rows given by ind2 along the -2nd axis. X.atindex[indices] would give numarray's current indexarray behaviour. Etc, etc for any other indexing scheme that's deemed useful. As I think about it more I'm more convinced that basic indexing should not support index arrays at all. Any indexarray behaviour should be impleented using helper/adapter objects. Keep basic indexing simple. This also gives an opportunity to have multiple different types of index arrays behaviour. -tim > > Though I'm not sure I can think of good examples of arbitrary > combinations of these capabilities (though the machinery allows it). > So one question is there a good motivating example for > X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not > sure I know where that would be commonly used (it would suggest that > all the sizes of the sliced dimensions must have consistent lengths > which doesn't seem typical. Any one have good examples? > > Perry > > On Mar 17, 2005, at 1:32 AM, Travis Oliphant wrote: > >> Travis Oliphant wrote: >> >>> >>> - Where there is more than one index array, what should replace the >>> single-axis subspaces that the indexes are referencing? Remember, >>> all of the single-axis subspaces are being replaced with one >>> "global" subspace. The current proposal states that this indexing >>> subspace should be placed first and the "remaining subspaces" pasted >>> in at the end. >>> >>> Is this acceptable, or can someone see a problem?? >> >> >> >> Answering my own question... >> >> I think that it makes sense to do a direct subspace replacement >> whenever the indexing arrays are right next to each other. In other >> words, I would just extend the "one-index array" rule to >> "all-consecutive-index-arrays" where of course one index array >> satisfies the all-consecutive requirement. 
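A toy sketch of the adapter idea Tim floats above: the class below is hypothetical (nothing like it exists in numarray), handles only slices and index sequences, and omits Ellipsis, but it shows how .rows-style selection could be layered on top of take() without complicating basic indexing.

from numarray import take

class RowSelector:
    # Hypothetical adapter: RowSelector(X)[ind0, :, ind1] applies take()
    # along successive axes; a bare ':' leaves that axis untouched.
    def __init__(self, arr):
        self.arr = arr
    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        result = self.arr
        for axis in range(len(key)):
            if isinstance(key[axis], slice):
                continue
            result = take(result, key[axis], axis=axis)
        return result

With this, RowSelector(a)[[1, 5, 10]] behaves like take(a, [1, 5, 10], axis=0), and an array's .rows attribute could simply return RowSelector(self).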
>> >> Hence in the previous example: >> >> X[:,ind1,ind2,:,:] would result in a (10,2,3,4,40,50) with the >> (20,30)-subspace being replaced by the (2,3,4) indexing subspace. >> >> result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:] >> >> >> Any other thoughts. (I think I will implement this initially by just >> using swapaxes on the current implementation...) >> >> -Travis > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From gr at grrrr.org Thu Mar 17 12:46:14 2005 From: gr at grrrr.org (Thomas Grill) Date: Thu Mar 17 12:46:14 2005 Subject: [Numpy-discussion] casting in numarray In-Reply-To: <20050317200747.73860.qmail@web53108.mail.yahoo.com> References: <20050317200747.73860.qmail@web53108.mail.yahoo.com> Message-ID: <4239EBFD.1000404@grrrr.org> Hi Joachim, this is what i do in my Python extension of the Pure Data realtime modular system. You have to create a Python buffer object pointing to your memory location and then create a numarray from that. It's quite easy. See the code in http://cvs.sourceforge.net/viewcvs.py/pure-data/externals/grill/py/source/ files pybuffer.h and pybuffer.cpp best greetings, Thomas Joachim Boomberschloss schrieb: >Hi, > >I'm using numarray for an audio-related application as >a buffer in an audio-processing pipeline. I would like >to be able to allocate the buffer in advance and later >regard it as a buffer of 8bit or 16bit samples as >appropriate, but in numarray, casting always produces >a new array, which I don't want. How difficult should >it be to make it possible to create an array using an >exsisting pre-allocated buffer to act as an interface >to that buffer? Also, if others consider it useful, is >there anyone willing to guide me through the code in >doing so? > >Thanks, > >Joe > > > >__________________________________ >Do you Yahoo!? >Yahoo! Small Business - Try our new resources site! >http://smallbusiness.yahoo.com/resources/ > > >------------------------------------------------------- >SF email is sponsored by - The IT Product Guide >Read honest & candid reviews on hundreds of IT Products from real users. >Discover which products truly live up to the hype. Start reading now. >http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > -- --->----->->----->-- Thomas Grill gr at grrrr.org +43 699 19715543 From sdhyok at gmail.com Thu Mar 17 12:46:19 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Thu Mar 17 12:46:19 2005 Subject: [Numpy-discussion] A pray from an end user of numeric python. Message-ID: <371840ef0503171244573f487e@mail.gmail.com> As an end user for Numeric and then numarray, recently I was quite frustrated by the move for Numeric3. Shortly after I became familiar with Numeric several years ago, I jumped to numarray mainly because of its more flexible indexing scheme. 
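Returning to Joachim's buffer question answered above: in pure Python, the approach Thomas describes looks roughly like the sketch below. The NumArray constructor arguments (shape, type, buffer) and the _data attribute are recalled from numarray's internals and should be checked against the pybuffer code Thomas points to.

import numarray
from numarray import NumArray, UInt8, Int16

raw = numarray.zeros(16, UInt8)    # stands in for the pre-allocated audio buffer

# Two views of the same memory, no copying: 16 bytes seen either as
# sixteen 8-bit samples or as eight 16-bit samples.
as8  = NumArray(shape=(16,), type=UInt8, buffer=raw._data)
as16 = NumArray(shape=(8,),  type=Int16, buffer=raw._data)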
From sdhyok at gmail.com Thu Mar 17 12:46:19 2005
From: sdhyok at gmail.com (Daehyok Shin)
Date: Thu Mar 17 12:46:19 2005
Subject: [Numpy-discussion] A pray from an end user of numeric python.
Message-ID: <371840ef0503171244573f487e@mail.gmail.com>

As an end user for Numeric and then numarray, recently I was quite
frustrated by the move for Numeric3. Shortly after I became familiar
with Numeric several years ago, I jumped to numarray mainly because of
its more flexible indexing scheme. And after quite an investment to
learn the new package, recently I heard a distant echo that a new
library called Numeric3 will replace Numeric sooner or later, and then
I see a lot of discussion in this mailing list about a better design
for the new package. Because I think some feedback from end users is
needed in the discussion, I dare to send this email, in spite of my
meager knowledge about programming.

To gods in numeric programming of Python, I want to make clear one
basic fact. Like me, numerous scientific and engineering end users of
Python need a SINGLE standard data model for numeric operations. Even
in the case that the data model has some defects in its design, many
end users may not care about them, if many numeric packages using the
data model are available. In this aspect, in my opinion, the
replacement of the current standard array types in Python must gain the
top priority.

How about building a very small array package with only a MINIMUM set
of functions first? The functions of the current array type in the
standard Python library may define what that minimal set is. All
advanced mathematical or other functions may be added later as separate
packages.

So, I pray to gods in numeric programming of Python. Please give us a
single numeric array model. Please save a flock of sheep like me from
wandering among Numeric, numarray, Numeric3, or maybe in future
Numeric4, 5, 6... Hopefully, this prayer will get some attention.
Thanks for reading this humble email.

--
Daehyok Shin (Peter)
Geography Department
University of North Carolina-Chapel Hill
USA

From oliphant at ee.byu.edu Thu Mar 17 13:39:57 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 17 13:39:57 2005
Subject: [Numpy-discussion] A pray from an end user of numeric python.
In-Reply-To: <371840ef0503171244573f487e@mail.gmail.com>
References: <371840ef0503171244573f487e@mail.gmail.com>
Message-ID: <4239F8B3.7080105@ee.byu.edu>

Daehyok Shin wrote:

>As an end user for Numeric and then numarray, recently I was quite
>frustrated by the move for Numeric3. Shortly after I became familiar
>with Numeric several years ago, I jumped to numarray mainly because of
>its more flexible indexing scheme. And after quite an investment to
>learn the new package, recently I heard a distant echo that a new
>library called Numeric3 will replace Numeric sooner or later, and then
>I see a lot of discussion in this mailing list about a better design
>for the new package. Because I think some feedback from end users is
>needed in the discussion, I dare to send this email, in spite of my
>meager knowledge about programming.
>

Thank you, thank you for speaking up. I am very interested in hearing
from end users. In fact, I'm an "end-user" myself. My real purpose in
life is not to endlessly write array packages. I want to get back to
the many problems I'm working on that require real usage. All of us had
meager knowledge at one point or another. Besides that, our supposed
knowledge today may turn out to be useless tomorrow, so feel free to
chime in any time with your opinions. Lack of knowledge doesn't seem to
stop me from voicing an opinion :-)

In my opinion, the more use-cases of arrays we see, the better design
decisions can be made. Ultimately, numarray split off from Numeric
because some people wanted new features and wanted to try some new
design ideas. Their efforts have led to a better understanding of what
a good array object should be.
>To gods in numeric programming of Python, I want to make clear one
>basic fact. Like me, numerous scientific and engineering end users of
>Python need a SINGLE standard data model for numeric operations. Even
>in the case that the data model has some defects in its design, many
>end users may not care about them, if many numeric packages using the
>data model are available. In this aspect, in my opinion, the
>replacement of the current standard array types in Python must gain the
>top priority.
>

I think we are all on the same page here. The ONLY reason I'm spending
any time on "Numeric3" at all is because currently we have two
development directions. One group of people is building on top of
Numeric (scipy, for example), while another group of people is building
on top of Numarray (nd_image for example). We are doing a lot of the
same work and I hate to see the few resources we have wasted on split
efforts.

Replacing the standard array type in Python is a longer-term problem.
We need to put our own house in order before that can happen. Many of
us want to see a single array type be standard in Python as long as we
are satisfied with it. But, that is the problem currently. The people
that wrote numarray were not satisfied with Numeric. Unfortunately,
some of us that are long-time users of Numeric have never been
satisfied with numarray either (it has not been even close to a
"drop-in" replacement for Numeric). I think that most people who use
arrays would be quite satisfied with Numeric today (with a few warts
removed). But, I do think that the numarray folks have identified some
scalability issues that, if we address them, will mean the arraytype we
come up with can have a longer future.

>So, I pray to gods in numeric programming of Python. Please give us a
>single numeric array model. Please save a flock of sheep like me from
>wandering among Numeric, numarray, Numeric3, or maybe in future
>Numeric4, 5, 6... Hopefully, this prayer will get some attention.
>

I think everybody involved wants this too. I'm giving up a great deal
of my time to make it happen, largely because I see a great need and a
way for me to contribute to help it. I am very interested in recruiting
others to assist me. So far, I've received a lot of supportive
comments, but not much supporting code. We have the momentum. I think
we can get this done, so that come June, there is no "split" aside from
backward compatibility layers....

In my estimation the fastest way to bring the two development
directions together is to merge the numarray features back into
Numeric. As I'm doing that, I want to make sure that the new design
extensions are done correctly and not just a bad idea that nobody ever
complained about. This has led to quite a few "side-trips" into dusty
closets where we kept the wart-shavings of current usage. Those
"side-trips" have extended the effort some, but I have not lost sight
of the goal. I don't want "yet-another-implementation". I want
everybody involved to agree that a single package provides the basis
for what people need.

Thanks again for your comments. If you can help in any way (e.g.
writing test scripts) then please chip in.
-Travis

From oliphant at ee.byu.edu Thu Mar 17 13:54:07 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 17 13:54:07 2005
Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays
In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu>
References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu>
Message-ID: <4239FBC6.3010808@ee.byu.edu>

Joe Harrington wrote:

>I'll start by saying something positive: I am very encouraged by all
>the work that's going into resolving the small/big array split!
>

Thanks, more hands make less work...

>That said, I view a.Sin as a potentially devastating change, if
>traditional functional notation is not guaranteed to be preserved
>forever.
>

Hold on, everybody. I'm the last person that would move from sin(x) to
x.Sin as a "requirement". I don't believe this was ever suggested. I
was just remembering that someone thought it would be useful if x.sin()
were allowed, and noticed that the PEP had not mentioned that as a
possibility.

I'm inclined now to NOT add such computational methods and *require*
ufuncs to be called as is currently done.

It's interesting to see so many responses to something that in my mind
was not the big issue, and to hear very little about the
multidimensional indexing proposal.

-Travis

From oliphant at ee.byu.edu Thu Mar 17 14:03:32 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 17 14:03:32 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu>
Message-ID: <4239FDD6.6020409@ee.byu.edu>

Perry Greenfield wrote:

> Before I delve too deeply into what you are suggesting (or asking),
> has the idea to have a slice be equivalent to an index array been
> changed. For example, I recall seeing (I forget where), the suggestion
> that
>
> X[:,ind] is the same as X[arange(X.shape[0]), ind]
>

This was in the PEP originally. But, after talking with you and better
understanding the "broadcasting" issues of the numarray indexing
scheme, it seemed less like a good idea. Then, during implementation it
was easier to interpret slices differently. A very natural usage fell
out as I thought more about partial indexing in Numeric: X[ind] where X
has more than 1 dimension returns in numarray something like

result[i,j,k,...] = X[ind[i,j,k],...]

It seems rather singular to have this Ellipsis-like character only
useful for the ending dimensions of X. Thus, I decided that X[...,ind]
ought to be valid as well and return something like

result[...,i,j,k] = X[...,ind[i,j,k]]

So, yes, I've changed my mind (I sent an email about this when I woke
up and realized a better solution).

> The following seems to be at odds with this. The confusion of mixing
> slices with index arrays led me to just not deal with them in
> numarray. I thought index arrays were getting complicated enough. I
> suppose it may be useful, but it would be good to give some motivating,
> realistic examples of why they are useful. For example, I can think of
> lots of motivating examples for:
>
> using more than one index array (e.g., X[ind1, ind2])
> allowing index arrays to have arbitrary shape
> allowing partial indexing with index arrays

Give me the reason for allowing partial indexing with index arrays, and
I bet I can come up with a reason why you should allow X[...,ind] as
well (because there is an implied ... at the end when you are using
partial indexing anyway).

> Though I'm not sure I can think of good examples of arbitrary
> combinations of these capabilities (though the machinery allows it).
> So one question is: is there a good motivating example for
> X[:, ind]? By the interpretation I remember (maybe wrongly), I'm not
> sure I know where that would be commonly used (it would suggest that
> all the sizes of the sliced dimensions must have consistent lengths,
> which doesn't seem typical). Anyone have good examples?

So, I've scaled back my "intermingling" of index arrays with other
types of arrays (you'll also notice in the current PEP that I've gotten
rid of mixing boolean and index arrays). I think the usage I define in
the PEP for mixing slices, Ellipses, and index arrays is reasonable
(and not difficult to implement) (it's basically done --- minus bug
fixes).

-Travis
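A small runnable illustration of the partial indexing just described,
using numarray's existing behaviour (the shapes in the comments are the
point of the example):

    import numarray

    X = numarray.reshape(numarray.arange(24), (4, 6))   # X.shape == (4, 6)
    ind = numarray.array([[0, 1], [3, 2]])              # ind.shape == (2, 2)

    # only axis 0 is indexed; the trailing axis comes along whole:
    #   result[i, j, :] = X[ind[i, j], :]
    result = X[ind]                                     # result.shape == (2, 2, 6)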
From oliphant at ee.byu.edu Thu Mar 17 14:47:55 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 17 14:47:55 2005
Subject: [Fwd: Re: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core]
Message-ID: <423A08A0.8080006@ee.byu.edu>

I originally sent the attached just to Tim. It was meant for the entire
list.

From oliphant at ee.byu.edu Thu Mar 17 17:17:33 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu, 17 Mar 2005 15:17:33 -0700
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <4239E5AC.4040901@cox.net>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net>
Message-ID: <423A01FD.7070900@ee.byu.edu>

Tim Hochberg wrote:

> Perry Greenfield wrote:
>
> Yes! Not index arrays by themselves, but the indexing system as a
> whole is already on the verge of being overly complex in numarray.
> Adding anything more to it is foolish.

I think the solution given in the PEP is not especially complex. In
fact, I think it clarifies what numarray does so that it does not
appear "mind-blowing" and can actually be implemented in a reasonable
way.

> My take is that having even one type of index array overloaded onto
> the current indexing scheme is questionable. In fact, even numarray's
> current scheme is too complicated for my taste. I particularly don't
> like the distinction that has to be made between lists and arrays on
> one side and tuples on the other. I understand why it's there, but I
> don't like it.
>
> Is it really necessary to pile these indexing schemes directly onto
> the main array object? It seems that it would be clearer, and more
> flexible, to use a separate, attached adapter object. For instance
> (please excuse the names as I don't have good ideas for those):

This is an interesting idea!!! I'm not one to criticize naming (I can't
seem to come up with good names myself...)

> X.rows[ind0, ind1, ..., ind2, :]
>
> would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That
> is, it would select the rows given by ind0 along the 0th axis, the rows
> given by ind1 along the 1st axis (aka the columns) and the rows given
> by ind2 along the -2nd axis.
>
> X.atindex[indices] would give numarray's current indexarray behaviour.
>
> Etc, etc for any other indexing scheme that's deemed useful.
>
> As I think about it more I'm more convinced that basic indexing should
> not support index arrays at all. Any indexarray behaviour should be
> implemented using helper/adapter objects. Keep basic indexing simple.
> This also gives an opportunity to have multiple different types of
> index array behaviour.

I think adapter objects will be useful in the long run. We've already
got X.flat right? It's too bad you couldn't have thought of this
earlier (we could have added this to Numeric years ago and alleviated
one of the most disparaged things about Numeric). But, I'm afraid that
some form of index arrays is already with us, so we really won't be
able to get rid of them entirely. I'm just trying to make them
reasonable. The addition to the Numarray behavior I've added is not
difficult (in fact I think it clarifies the numarray behavior --- at
least for me).

-Travis

From perry at stsci.edu Thu Mar 17 14:57:11 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Mar 17 14:57:11 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <4239FDD6.6020409@ee.byu.edu>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239FDD6.6020409@ee.byu.edu>
Message-ID: <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu>

On Mar 17, 2005, at 4:59 PM, Travis Oliphant wrote:

> Perry Greenfield wrote:
>
>> Before I delve too deeply into what you are suggesting (or asking),
>> has the idea to have a slice be equivalent to an index array been
>> changed. For example, I recall seeing (I forget where), the
>> suggestion that
>>
>> X[:,ind] is the same as X[arange(X.shape[0]), ind]
>>
> This was in the PEP originally. But, after talking with you and
> better understanding the "broadcasting" issues of the numarray
> indexing scheme, it seemed less like a good idea. Then, during
> implementation it was easier to interpret slices differently. A very
> natural usage fell out as I thought more about partial indexing in
> Numeric: X[ind] where X has more than 1 dimension returns in numarray
> something like
>
> result[i,j,k,...] = X[ind[i,j,k],...]
>
> It seems rather singular to have this Ellipsis-like character only
> useful for the ending dimensions of X. Thus, I decided that
> X[...,ind] ought to be valid as well and return something like
>
> result[...,i,j,k] = X[...,ind[i,j,k]]
>
> So, yes, I've changed my mind (I sent an email about this when I woke
> up and realized a better solution).
>

Sorry if I missed that.

Now that is cleared up, the use of slices you propose is essentially as
an index placeholder for an axis not to be indexed by index arrays
(akin to what is implied by partial indexing). In that vein it makes
sense. Identical functionality could be had by reordering the indices,
doing partial indexing and then reordering to the original order.
That's clumsy for sure, but it probably isn't going to be done that
often. If you've already done it, great. Let me look it over in a bit
more detail tonight.

Perry
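Perry's reordering equivalence can be written out with numarray's
existing functions; a sketch for a 1-d index array applied to the last
axis:

    import numarray

    X = numarray.reshape(numarray.arange(12), (3, 4))
    ind = numarray.array([2, 0])

    # emulate the proposed X[..., ind]: bring the target axis to the
    # front, partially index it, then transpose back
    Xt = numarray.transpose(X)             # shape (4, 3)
    picked = Xt[ind]                       # shape (2, 3)
    result = numarray.transpose(picked)    # shape (3, 2); result[..., i] = X[..., ind[i]]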
From oliphant at ee.byu.edu Thu Mar 17 15:02:24 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 17 15:02:24 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239FDD6.6020409@ee.byu.edu> <3c1aabf51ac58b4ad8512f2150067ecc@stsci.edu>
Message-ID: <423A0C24.5050605@ee.byu.edu>

Perry Greenfield wrote:

> On Mar 17, 2005, at 4:59 PM, Travis Oliphant wrote:
>
>> Perry Greenfield wrote:
>>
>>> Before I delve too deeply into what you are suggesting (or asking),
>>> has the idea to have a slice be equivalent to an index array been
>>> changed. For example, I recall seeing (I forget where), the
>>> suggestion that
>>>
>>> X[:,ind] is the same as X[arange(X.shape[0]), ind]
>>>
>> This was in the PEP originally. But, after talking with you and
>> better understanding the "broadcasting" issues of the numarray
>> indexing scheme, it seemed less like a good idea. Then, during
>> implementation it was easier to interpret slices differently. A very
>> natural usage fell out as I thought more about partial indexing in
>> Numeric: X[ind] where X has more than 1 dimension returns in
>> numarray something like
>>
>> result[i,j,k,...] = X[ind[i,j,k],...]
>>
>> It seems rather singular to have this Ellipsis-like character only
>> useful for the ending dimensions of X. Thus, I decided that
>> X[...,ind] ought to be valid as well and return something like
>>
>> result[...,i,j,k] = X[...,ind[i,j,k]]
>>
>> So, yes, I've changed my mind (I sent an email about this when I woke
>> up and realized a better solution).
>>
> Sorry if I missed that.
>
> Now that is cleared up, the use of slices you propose is essentially
> as an index placeholder for an axis not to be indexed by index arrays
> (akin to what is implied by partial indexing). In that vein it makes
> sense. Identical functionality could be had by reordering the indices,
> doing partial indexing and then reordering to the original order.
> That's clumsy for sure, but it probably isn't going to be done that
> often. If you've already done it, great. Let me look it over in a bit
> more detail tonight.

Yes, re-ordering could accomplish the same thing.

I should warn you. When I say "done" -- I mean I'm in the bug-fixing
phase. So, expect segfaults... I'm cleaning up as we speak. I may not
finish. So, don't look at it unless you are interested in the
implementation... because you may not get it to actually work for a day
or two.

-Travis

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 17 22:02:32 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Mar 17 22:02:32 2005
Subject: [Numpy-discussion] Trying out Numeric3
Message-ID: <423A6F69.8020803@ims.u-tokyo.ac.jp>

First of all, thanks to the Numerical Python developers for releasing
version 23.8 of Numerical Python. It compiles out of the box and avoids
the blas/lapack compilation problems in earlier versions, which makes
my life as a developer a lot easier. Thanks!

Travis Oliphant wrote:
> I wanted to let people who may be waiting know that now is a good time
> to help with numeric3. The CVS version builds (although I'm sure there
> are still bugs), but more eyes could help me track them down.
>
> Currently, all that remains for the arrayobject is to implement the
> newly defined methods (really it's just a re-organization and
> re-inspection of the code in multiarraymodule.c to call it using methods).
>

I downloaded Numeric3 today and installed it. The compilation and
installation run fine. There are still some warnings from the compiler
here and there, but I guess they will be fixed some other time.

During compilation, I noticed that some test program is run, presumably
for configuration. The test program is compiled by a different compiler
than the one used in the build process. Note that "python setup.py
config" is available in the standard distutils, so it may be better to
use that instead of a self-defined configuration tool. For one thing,
it'll make sure that the compiler used for configuration is the same as
the one used for compilation.

To use Numeric3, I did "from ndarray import *".
I guess for the final version, this will be "from Numeric import *"?
When using ndarray, I got a core dump using "zeros":

$ python
Python 2.5a0 (#1, Mar 2 2005, 12:15:06)
[GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> zeros(5)
creating data 0xa0c03d0 associated with 0xa0d52c0
array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
Segmentation fault (core dumped)

With Python 2.4, the segmentation fault occurs slightly later:

$ python2.4
Python 2.4 (#1, Dec 5 2004, 20:47:03)
[GCC 3.3.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> zeros(5)
creating data 0xa0a07f8 associated with 0xa0d6230
array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
>>>
>>> ^D
freeing 0xa0a07f8 associated with array 0xa0d6230
freeing 0xa123b88 associated with array 0xa0d6230
Segmentation fault (core dumped)

Finally, I tried to compile a C extension module that uses Numerical
Python (by replacing #include <Numeric/arrayobject.h> by #include
<ndarray/arrayobject.h>):

$ python setup.py build
running build
running build_py
running build_ext
building 'Pycluster.cluster' extension
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -Isrc -Iranlib/src -I/usr/local/include/python2.5 -c python/clustermodule.c -o build/temp.cygwin-1.5.12-i686-2.5/python/clustermodule.o
In file included from python/clustermodule.c:2:
/usr/local/include/python2.5/ndarray/arrayobject.h:76: warning: redefinition of `ushort'
/usr/include/sys/types.h:85: warning: `ushort' previously declared here
/usr/local/include/python2.5/ndarray/arrayobject.h:77: warning: redefinition of `uint'
/usr/include/sys/types.h:86: warning: `uint' previously declared here

These two warnings are probably not so serious, but it would be better
to get rid of them anyway.

python/clustermodule.c: In function `parse_data':
python/clustermodule.c:38: warning: passing arg 1 of pointer to function from incompatible pointer type

The offending line 38 is:
{ PyArrayObject* av = (PyArrayObject*) PyArray_Cast(*array, PyArray_DOUBLE);
where array is a PyArrayObject**.

Another warning was that PyArrayObject's "dimensions" doesn't seem to
be an int array any more. Finally, when linking I get an undefined
reference to _PyArray_API.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From konrad.hinsen at laposte.net Thu Mar 17 23:53:15 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Thu Mar 17 23:53:15 2005
Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays
In-Reply-To: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu>
References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu>
Message-ID:

On 17.03.2005, at 19:58, Joe Harrington wrote:

> That said, I view a.Sin as a potentially devastating change, if
> traditional functional notation is not guaranteed to be preserved
> forever.

No one made that proposition, so there is no need to worry. The recent
discussion was about

1) a misunderstanding.
2) internal implementation details.

Konrad.

--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
-------------------------------------------------------------------------------
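As background for Michiel's "python setup.py config" suggestion, a
minimal sketch of a distutils config command, which runs its checks
with the same compiler distutils later uses for the build (the header
name is only illustrative):

    from distutils.core import setup
    from distutils.command.config import config

    class configure(config):
        def run(self):
            # check_header() compiles a tiny test program with the
            # build compiler, unlike a hand-rolled configuration step
            if self.check_header('ndarray/arrayobject.h'):
                print 'header found'

    setup(name='example', cmdclass={'config': configure})
    # invoked as: python setup.py config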
From mdehoon at ims.u-tokyo.ac.jp Fri Mar 18 02:00:52 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Fri Mar 18 02:00:52 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <423A744F.8070007@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu>
Message-ID: <423AA7AB.8030409@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> Michiel Jan Laurens de Hoon wrote:
>> During compilation, I noticed that some test program is run,
>> presumably for configuration. The test program is compiled by a
>> different compiler than the one used in the build process. Note that
>> "python setup.py config" is available in the standard distutils, so it
>> may be better to use that instead of a self-defined configuration
>> tool. For one thing, it'll make sure that the compiler used for
>> configuration is the same as the one used for compilation.
>
> What does python setup.py config do? I have been unable to figure out
> how to get the configuration I need. If you have suggestions, that
> would be great...
>

I submitted a patch to sourceforge that modifies setup.py such that it
uses distutils' stuff for the configuration. See patch #1165840.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From barrett at stsci.edu Fri Mar 18 06:19:55 2005
From: barrett at stsci.edu (Paul Barrett)
Date: Fri Mar 18 06:19:55 2005
Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays
In-Reply-To: <5a583fe32cc42cc47b659f2af44c5113@stsci.edu>
References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239DEE9.1050802@csun.edu> <5a583fe32cc42cc47b659f2af44c5113@stsci.edu>
Message-ID: <423AE30F.3070604@stsci.edu>

Perry Greenfield wrote:
>
> On Mar 17, 2005, at 2:47 PM, Stephen Walton wrote:
>
>> Incidentally, one can read here why the Chandra X-Ray Observatory
>> chose S-Lang instead of Python for its data analysis software:
>>
>> http://cxc.harvard.edu/ciao/why/slang.html
>>
> Though I'll note that I think their conclusion wasn't really correct
> then. It does illustrate the aversion to OO though.

And look how far this back-assward approach has gotten them over the
past 10 years. Not very far, in my opinion. It has also resulted in
several user interface changes during this time. This decision to write
most code in a compiled language and then embed a scripting language in
the code is counter to the way Python development is done. However,
lately they seem to see the error of their ways by doing more
development in S-lang.

I don't think the CIAO data analysis environment is a good example of
software design and development.
--- Paul

--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Branch
FAX: 410-338-4767 Baltimore, MD 21218

From perry at stsci.edu Fri Mar 18 06:43:00 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Mar 18 06:43:00 2005
Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays
In-Reply-To: References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu>
Message-ID: <3380248ec14bd94bddc082525a316552@stsci.edu>

On Mar 18, 2005, at 2:51 AM, konrad.hinsen at laposte.net wrote:

> On 17.03.2005, at 19:58, Joe Harrington wrote:
>
>> That said, I view a.Sin as a potentially devastating change, if
>> traditional functional notation is not guaranteed to be preserved
>> forever.
>
> No one made that proposition, so there is no need to worry. The recent
> discussion was about
>
> 1) a misunderstanding.
> 2) internal implementation details.

That it was a misunderstanding is apparently the case. But if you look
at the original text, it is easy to see how people could draw that
conclusion. So the responses that drew that conclusion had the
desirable effect of making that point clear. Specifically, what was
said was:

> Should all the ufuncs be methods as well? I think Konrad suggested
> this. What is the opinion of others?
>
> The move from functions to methods will mean that some of the function
> calls currently in Numeric.py will be redundant, but I think they
> should stay there for backwards compatibility (perhaps with a
> deprecation warning...)
>

So the mention of a deprecation warning juxtaposed with the suggestion
that ufuncs be methods suggested that it was possible that ufuncs would
eventually be only methods. It is good to clear up that this isn't the
case.

Perry

From barrett at stsci.edu Fri Mar 18 07:10:29 2005
From: barrett at stsci.edu (Paul Barrett)
Date: Fri Mar 18 07:10:29 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu>
Message-ID: <423AEEFE.1050107@stsci.edu>

Perry Greenfield wrote:

> Before I delve too deeply into what you are suggesting (or asking),
> has the idea to have a slice be equivalent to an index array been
> changed. For example, I recall seeing (I forget where), the suggestion
> that
>
> X[:,ind] is the same as X[arange(X.shape[0]), ind]
>
> The following seems to be at odds with this. The confusion of mixing
> slices with index arrays led me to just not deal with them in
> numarray. I thought index arrays were getting complicated enough. I
> suppose it may be useful, but it would be good to give some motivating,
> realistic examples of why they are useful. For example, I can think of
> lots of motivating examples for:
>
> using more than one index array (e.g., X[ind1, ind2])
> allowing index arrays to have arbitrary shape
> allowing partial indexing with index arrays

Can you give a few then? Say one or two for each of the three
scenarios.
--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Branch
FAX: 410-338-4767 Baltimore, MD 21218

From barrett at stsci.edu Fri Mar 18 07:23:08 2005
From: barrett at stsci.edu (Paul Barrett)
Date: Fri Mar 18 07:23:08 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <4239E5AC.4040901@cox.net>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net>
Message-ID: <423AF1E2.1060200@stsci.edu>

Tim Hochberg wrote:

> My take is that having even one type of index array overloaded onto
> the current indexing scheme is questionable. In fact, even numarray's
> current scheme is too complicated for my taste. I particularly don't
> like the distinction that has to be made between lists and arrays on
> one side and tuples on the other. I understand why it's there, but I
> don't like it.
>
> Is it really necessary to pile these indexing schemes directly onto
> the main array object? It seems that it would be clearer, and more
> flexible, to use a separate, attached adapter object. For instance
> (please excuse the names as I don't have good ideas for those):
>
> X.rows[ind0, ind1, ..., ind2, :]
>
> would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That
> is, it would select the rows given by ind0 along the 0th axis, the rows
> given by ind1 along the 1st axis (aka the columns) and the rows given
> by ind2 along the -2nd axis.
>
> X.atindex[indices] would give numarray's current indexarray behaviour.
>
> Etc, etc for any other indexing scheme that's deemed useful.
>
> As I think about it more I'm more convinced that basic indexing should
> not support index arrays at all. Any indexarray behaviour should be
> implemented using helper/adapter objects. Keep basic indexing simple.
> This also gives an opportunity to have multiple different types of
> index array behaviour.

So you're saying that 1-D indexing arrays (or vectors) should not be
allowed? As Perry said earlier, 'slice(1,9,2)' is equivalent to
'range(1, 9, 2)'. I just consider slices to be a shorthand for
_regular_ indexing, whereas indexed arrays also allow for _irregular_
indexing. Or am I missing something?

-- Paul

--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Branch
FAX: 410-338-4767 Baltimore, MD 21218

From cjw at sympatico.ca Fri Mar 18 07:29:21 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Fri Mar 18 07:29:21 2005
Subject: [Numpy-discussion] Re: Please chime in on proposed methods for arrays
In-Reply-To: <4239FBC6.3010808@ee.byu.edu>
References: <200503171858.j2HIwHDA028013@oobleck.astro.cornell.edu> <4239FBC6.3010808@ee.byu.edu>
Message-ID: <423AF35A.7060304@sympatico.ca>

Travis Oliphant wrote:

> Joe Harrington wrote:
>
>> I'll start by saying something positive: I am very encouraged by all
>> the work that's going into resolving the small/big array split!
>>
> Thanks, more hands make less work...
>
>> That said, I view a.Sin as a potentially devastating change, if
>> traditional functional notation is not guaranteed to be preserved
>> forever.
>>
>
> Hold on, everybody. I'm the last person that would move from sin(x)
> to x.Sin as a "requirement". I don't believe this was ever
> suggested. I was just remembering that someone thought it would be
> useful if x.sin() were allowed, and noticed that the PEP had not
> mentioned that as a possibility.

My suggestion was that x.Sin be available as a method.
It was challenged because all the maths books use sin(x). True, since
the books in the main are dealing with scalar x. Two things were
suggested: (1) with a method, one can drop the redundant parentheses,
and (2) capitalize the first letter of the method to make it clear that
the operation applies to the whole of an array structure and not just
to a single value. It was also pointed out that the Sin style focuses
on order of evaluation and so the expression looks different than a
nested expression. Some, who prefer nesting, can use the function.

As Konrad Hinsen has pointed out, this is implementation detail stuff
but, I suggest, of some importance as it gives the face which is
presented to the world.

> I'm inclined now to NOT add such computational methods and *require*
> ufuncs to be called as is currently done.

Presumably, nothing would be done to inhibit the use of properties in
sub-classes.

> It's interesting to see so many responses to something that in my mind
> was not the big issue, and to hear very little about the
> multidimensional indexing proposal.
>

My problem here is that I don't really understand just what the current
proposal envisages. Some examples would help.

(1) How does a[..., 3] differ from a[:,3]?

(2) How does this differ from numarray's take/put?

Setting array elements using advanced indexing will be similar to
getting. The object used for setting will be force-cast to the array's
type if needed. This type must be "broadcastable" to the required shape
specified by the indexing, where "broadcastable" is more fully
explained below. Alternatively, the object can be an array iterator.
This will repeatedly iterate over the object until the desired elements
are set. The shape of X is never changed.

(3) is this a typo? selects a 1-d array filled with the elements of A??
corresponding to the non-zero values of B. The search order will be
C-style (last-index varies the fastest).

(4) I can see value in nonZero(X) or even nonZero(X, tolerance), which
presumably delivers an Array with a Boolean element type, but I wonder
about the need for nonZero with a Boolean array as an argument.
Shouldn't X[B] do the job?

(5) Mention is made of indexing objects. These are Arrays of some sort?

Colin W.
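On question (1): for a two dimensional array the two spellings agree,
but for higher rank the Ellipsis expands to all of the leading axes. A
small illustration in numarray syntax:

    import numarray

    a2 = numarray.reshape(numarray.arange(12), (3, 4))
    # a2[..., 3] and a2[:, 3] are the same 3-element column here

    a3 = numarray.reshape(numarray.arange(24), (2, 3, 4))
    # a3[..., 1] indexes the LAST axis:   shape (2, 3)
    # a3[:, 1]   indexes the SECOND axis: shape (2, 4)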
From tim.hochberg at cox.net Fri Mar 18 09:19:21 2005
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Fri Mar 18 09:19:21 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <423AF1E2.1060200@stsci.edu>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <4239E5AC.4040901@cox.net> <423AF1E2.1060200@stsci.edu>
Message-ID: <423B0D07.3090609@cox.net>

Paul Barrett wrote:

> Tim Hochberg wrote:
>
>> My take is that having even one type of index array overloaded onto
>> the current indexing scheme is questionable. In fact, even numarray's
>> current scheme is too complicated for my taste. I particularly don't
>> like the distinction that has to be made between lists and arrays on
>> one side and tuples on the other. I understand why it's there, but I
>> don't like it.
>>
>> Is it really necessary to pile these indexing schemes directly onto
>> the main array object? It seems that it would be clearer, and more
>> flexible, to use a separate, attached adapter object. For instance
>> (please excuse the names as I don't have good ideas for those):
>>
>> X.rows[ind0, ind1, ..., ind2, :]
>>
>> would act like take(take(take(X, ind0, 0), ind1, 1), ind2, -1). That
>> is, it would select the rows given by ind0 along the 0th axis, the
>> rows given by ind1 along the 1st axis (aka the columns) and the rows
>> given by ind2 along the -2nd axis.
>>
>> X.atindex[indices] would give numarray's current indexarray behaviour.
>>
>> Etc, etc for any other indexing scheme that's deemed useful.
>>
>> As I think about it more I'm more convinced that basic indexing
>> should not support index arrays at all. Any indexarray behaviour
>> should be implemented using helper/adapter objects. Keep basic
>> indexing simple. This also gives an opportunity to have multiple
>> different types of index array behaviour.
>
> So you're saying that 1-D indexing arrays (or vectors) should not be
> allowed? As Perry said earlier, 'slice(1,9,2)' is equivalent to
> 'range(1, 9, 2)'. I just consider slices to be a shorthand for
> _regular_ indexing, whereas indexed arrays also allow for _irregular_
> indexing. Or am I missing something?

I'm saying that irregular indexing should be spelled differently than
regular indexing. Consider this little (contrived) example:

X[(2,3,5,7,11)] = Y[[2,4,8,16,32]]

Quick! What's that mean using numarray's indexing rules? (which I
believe are close enough to the proposed rules to not make a difference
for this case.) Oddly, it means:

X[2,3,5,7,11] = take(Y, [2,4,8,16,32])

That's not entirely numarray's fault. For historical reasons X[a,b,c,d]
is treated by Python exactly the same as X[(a,b,c,d)]. And the above
case is not going to get programmed on purpose, except by the
pathological, but it could crop up as a bug fairly easily since in most
other circumstances tuples and lists are equivalent. Even the more
standard:

X[2,3,5,7,11] = Y[[2,4,8,16,32]]

is not exactly easy to decipher. Contrast this to the proposed:

X[2,3,5,7,11] = Y.atindex[2,4,8,16,32]

where it's immediately apparent that one indexing operation is
irregular and one is regular. Note that you still need to use indexing
notation on atindex, and thus it needs to be some sort of helper object
vaguely similar to the new flat. This is because you also want:

Y.atindex[2,4,8,16,32] = X[2,3,5,7,11]

to work, and it wouldn't work with function call syntax. This doesn't
entirely insulate one from the weirdness of using tuples as indexes
described above, but it should be a big improvement in this regard.
Irrespective of that, it's much clearer what's going on with the
spelling of the two types of indexing differentiated. It also opens the
door for other types of irregular indexing, since it may turn out that
there is more than one type of irregular indexing that may be useful as
mentioned by Perry (?) earlier.

-tim

From perry at stsci.edu Fri Mar 18 10:58:14 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Mar 18 10:58:14 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <423AEEFE.1050107@stsci.edu>
References: <42391B6E.8060709@ee.byu.edu> <42392481.1010701@ee.byu.edu> <423AEEFE.1050107@stsci.edu>
Message-ID:

On Mar 18, 2005, at 10:08 AM, Paul Barrett wrote:

> Perry Greenfield wrote:
>
>> Before I delve too deeply into what you are suggesting (or asking),
>> has the idea to have a slice be equivalent to an index array been
>> changed. For example, I recall seeing (I forget where), the
>> suggestion that
>>
>> X[:,ind] is the same as X[arange(X.shape[0]), ind]
>>
>> The following seems to be at odds with this. The confusion of mixing
>> slices with index arrays led me to just not deal with them in
>> numarray. I thought index arrays were getting complicated enough. I
>> suppose it may be useful, but it would be good to give some
>> motivating, realistic examples of why they are useful. For example, I
>> can think of lots of motivating examples for:
>>
>> using more than one index array (e.g., X[ind1, ind2])

A common task is to obtain a list of values from an image based on a
list (array) of i,j locations in the image. These index arrays may have
come from some other source (say a catalog of known star positions) or
from a function that obtained the positions of local maxima found in a
corresponding (but different) image, for the purposes of comparing the
image objects with another image's objects.

>> allowing index arrays to have arbitrary shape

A classic example is using the array to be indexed as a lookup table.
If I have a byte image and wish to transform it to a different
greyscale using a lookup table, I can use the byte image as an index
array for the lookup table array.

transformedimage = lookuptable[image]

>> allowing partial indexing with index arrays
>

Here I'll go one better, a combination of the previous and this one
using a similar mechanism, except to generate rgb values. The lookup
table now is a 256x3 array representing how each of the 256 possible
byte values are to be mapped to an rgb value:

rgbimage = lookuptable[image]

Here the rgbimage has shape (image.shape[0], image.shape[1], 3). But
partial indexing can be used for other things such as selecting from a
set of weighting functions or images to be used against a stack of 1-d
arrays or images respectively for subsequent processing (e.g.,
reduction).

> Can you give a few then? Say one or two for each of the three
> scenarios.

Others may be able to come up with better or alternative examples.
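A minimal runnable version of the two lookup table examples above, in
numarray syntax (the image values are made up):

    import numarray

    image = numarray.array([[0, 1], [2, 3]], type=numarray.UInt8)

    # greyscale transform: a 256-entry table indexed by the whole image
    lookuptable = numarray.arange(256, type=numarray.UInt8)
    transformedimage = lookuptable[image]   # same shape as image

    # rgb transform: a 256x3 table; partial indexing supplies the last axis
    rgbtable = numarray.zeros((256, 3), type=numarray.UInt8)
    rgbimage = rgbtable[image]              # shape (2, 2, 3)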
From oliphant at ee.byu.edu Fri Mar 18 11:35:15 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 18 11:35:15 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <423AA7AB.8030409@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <423AA7AB.8030409@ims.u-tokyo.ac.jp>
Message-ID: <423B2CFA.2010603@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:

> I submitted a patch to sourceforge that modifies setup.py such that it
> uses distutils' stuff for the configuration. See patch #1165840.
>
> --Michiel.

Thank you so much. I just had to modify it so that "." was added to the
path prior to trying config so it would work on my Linux box. I'm not
an expert with distutils and so I appreciate this help greatly.

Eventually, we should probably put the other defines described in the
setup.py file in the config.h file as well.

-Travis

From stephen.walton at csun.edu Fri Mar 18 14:32:16 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Fri Mar 18 14:32:16 2005
Subject: [Numpy-discussion] Status of numeric3 / scipylite / scipy_core
In-Reply-To: <878y4mxuf7.fsf@welho.com>
References: <42391B6E.8060709@ee.byu.edu> <878y4mxuf7.fsf@welho.com>
Message-ID: <423B5667.9010004@csun.edu>

Timo Korvola wrote:

>Travis Oliphant writes:
>
>>- indexing with multidimensional index arrays under the
>>numarray-introduced scheme (which seems reasonable to me)
>>
>It is powerful but likely to confuse Matlab and Fortran users because
>a[[0,1], [1,2]] is different from a[0:2, 1:3].
>

Ack. And let's not even talk about what take(a,((0,1),(1,2))) returns
when shape(a)==(3,3).
As I noted in my lengthy comments, index arrays and takes were far and
away the most confusing part of the current numarray docs to me.

From sdhyok at gmail.com Sat Mar 19 18:21:15 2005
From: sdhyok at gmail.com (Daehyok Shin)
Date: Sat Mar 19 18:21:15 2005
Subject: [Numpy-discussion] A pray from an end user of numeric python.
In-Reply-To: <4239F8B3.7080105@ee.byu.edu>
References: <371840ef0503171244573f487e@mail.gmail.com> <4239F8B3.7080105@ee.byu.edu>
Message-ID: <371840ef050319182071952653@mail.gmail.com>

> Thank you, thank you for speaking up. I am very interested in hearing
> from end users. In fact, I'm an "end-user" myself. My real purpose in
> life is not to endlessly write array packages. I want to get back to
> the many problems I'm working on that require real usage.

Travis. I am really happy to hear your encouragement. And relieved to
see you are not going to create another lib to define numeric arrays.

> In my opinion, the more use-cases of arrays we see, the better design
> decisions can be made. Ultimately, numarray split off from Numeric
> because some people wanted new features and wanted to try some new
> design ideas. Their efforts have led to a better understanding of what
> a good array object should be.

It is true that an open source community should always be open to new
ideas or designs. However, considering the situation that there is no
solid standard numeric library for Python, I don't think it is time for
renovation. MATLAB gives us a good example. Even though it has terrible
data structures for matrices, particularly sparse matrices, the
plentiful libraries built around those data structures have made it the
most popular software for numerical programming. Who wants to build
his/her house on continuously shaking ground? To gain wide support from
users, a program may need some balance between renovation and
stabilization. My concern came from the feeling that our community is
losing that balance.

> Replacing the standard array type in Python is a longer-term problem.
> We need to put our own house in order before that can happen.
> Many of us want to see a single array type be standard in Python as long
> as we are satisfied with it.

We may agree that if a package succeeds in gaining the support from
Guido, it will be the standard numeric array for Python, no matter what
limitations the package has. And, I can bet Guido will like a simple
and small package, like the new package for sets. In this context, I
think we have to shift our focus from "What new fancy functions are
needed?" to "Is this function really necessary in the standard array
package of Python?"

> I think everybody involved wants this too. I'm giving up a great deal
> of my time to make it happen, largely because I see a great need and a
> way for me to contribute to help it. I am very interested in recruiting
> others to assist me. So far, I've received a lot of supportive
> comments, but not much supporting code. We have the momentum. I think
> we can get this done, so that come June, there is no "split" aside from
> backward compatibility layers....

Sorry. I was among the users who always complain but never contribute
anything. To see what I can do, I am checking out your repository.

> In my estimation the fastest way to bring the two development directions
> together is to merge the numarray features back into Numeric.

I agree. For me, if I can write x[x>0] and create new classes easily by
inheriting existing arrays with Numeric, I will come back to Numeric.
Will Numeric3 solve the limitations?

> extended the effort some, but I have not lost sight of the goal. I
> don't want "yet-another-implementation". I want everybody involved to
> agree that a single package provides the basis for what people need.

Yes. No more "yet-another-implementation". So, we will use the same
command to import Numeric3 as for Numeric, right?

from Numeric import *

If true, I am wondering why a new name, rather than just Numeric, is
used for the package.

> Thanks again for your comments. If you can help in any way (e.g.
> writing test scripts) then please chip in.

I appreciate your kind reply to my humble mail. And I'd like to remind
you that so many Python users are praying that you succeed in the
Numeric3 project, with the earnest hope to have a standard numeric
array type in Python.

--
Daehyok Shin (Peter)
Geography Department
University of North Carolina-Chapel Hill
USA
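For reference, the x[x>0] idiom Daehyok asks about already works in
numarray, while the closest Numeric spelling goes through compress; a
quick sketch of both:

    import numarray
    x = numarray.array([-2.0, 1.0, -0.5, 3.0])
    positives = x[x > 0]       # numarray mask indexing: [1.0, 3.0]

    import Numeric
    y = Numeric.array([-2.0, 1.0, -0.5, 3.0])
    positives = Numeric.compress(Numeric.greater(y, 0), y)   # Numeric spelling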
From sdhyok at gmail.com Sat Mar 19 18:57:06 2005
From: sdhyok at gmail.com (Daehyok Shin)
Date: Sat Mar 19 18:57:06 2005
Subject: [Numpy-discussion] The first try on Numeric3.
Message-ID: <371840ef05031918566113287b@mail.gmail.com>

Dear Travis. I found no problem in installing Numeric3 and running
tests in Mandrake 10.1. Good job. One question: I found you do not use
unittest for the test files. Will you eventually change all tests to
use unittest? If so, I think I can contribute something because I have
some experience with it. For the test files, please let me know what
kind of support you need. Thanks for your effort.

--
Daehyok Shin
Geography Department
University of North Carolina-Chapel Hill
USA
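For illustration, a minimal unittest-style test of the kind Daehyok
offers to write; the zeros() behaviour is taken from Michiel's session
above, and the class and method names are made up:

    import unittest
    import ndarray

    class ZerosTest(unittest.TestCase):
        def testZeros(self):
            a = ndarray.zeros(5)
            self.assertEqual(len(a), 5)
            self.assertEqual(a[0], 0)

    if __name__ == '__main__':
        unittest.main()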
From juenglin at cs.pdx.edu Sat Mar 19 21:09:14 2005
From: juenglin at cs.pdx.edu (Ralf Juengling)
Date: Sat Mar 19 21:09:14 2005
Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars
Message-ID: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu>

Travis,

Discussing zero dimensional arrays, the PEP says at one point:

    ... When ndarray is imported, it will alter the numeric table
    for python int, float, and complex to behave the same as
    array objects.

    Thus, in the proposed solution, 0-dim arrays would never be
    returned from calculation, but instead, the equivalent Python
    Array Scalar Type. Internally, these ArrayScalars can
    be quickly converted to 0-dim arrays when needed. Each scalar
    would also have a method to convert to a "standard" Python Type
    upon request (though this shouldn't be needed often).

I'm not sure I understand this. Does it mean that, after having
imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather
than "int"?

If so, I think this is a dangerous idea. There is one important
difference between zero dimensional arrays and Python scalar types,
which is not discussed in the PEP: arrays are mutable, Python scalars
are immutable.

When Guido introduced in-place operators in Python, (+=, *=, etc.) he
decided that "i += 1" should be allowed for Python scalars and should
mean "i = i + 1". Here you have it, it means something different when i
is a mutable zero dimensional array. So, I suspect a tacit
re-definition of Python scalars on ndarray import will break some code
out there (code, that does not deal with arrays at all).

Facing this important difference between arrays and Python scalars, I'm
also not sure anymore that advertising zero dimensional arrays as
essentially the same as Python scalars is such a good idea. Perhaps it
would be better not to try to inherit from Python's number types and
all that. Perhaps it would be easier to just say that indexing an array
always results in an array and that zero dimensional arrays can be
converted into Python scalars. Period.

Ralf

PS: You wrote two questions about zero dimensional arrays vs Python
scalars into the PEP. What are your plans for deciding these?

From juenglin at cs.pdx.edu Sat Mar 19 21:49:05 2005
From: juenglin at cs.pdx.edu (Ralf Juengling)
Date: Sat Mar 19 21:49:05 2005
Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars
In-Reply-To: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu>
References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu>
Message-ID: <1111297725.21849.69.camel@alpspitze.cs.pdx.edu>

I just read the section about "Array Scalars" again and am not sure
anymore that I understood the whole idea. When you say "Array Scalar",
do you mean a zero dimensional array or is an "Array Scalar" yet
another animal?

Ralf

On Sat, 2005-03-19 at 21:06, Ralf Juengling wrote:
> Travis,
>
> Discussing zero dimensional arrays, the PEP says at one point:
>
> ... When ndarray is imported, it will alter the numeric table
> for python int, float, and complex to behave the same as
> array objects.
>
> Thus, in the proposed solution, 0-dim arrays would never be
> returned from calculation, but instead, the equivalent Python
> Array Scalar Type. Internally, these ArrayScalars can
> be quickly converted to 0-dim arrays when needed. Each scalar
> would also have a method to convert to a "standard" Python Type
> upon request (though this shouldn't be needed often).
>
> I'm not sure I understand this. Does it mean that, after having
> imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather
> than "int"?
>
> If so, I think this is a dangerous idea. There is one important
> difference between zero dimensional arrays and Python scalar
> types, which is not discussed in the PEP: arrays are mutable,
> Python scalars are immutable.
>
> When Guido introduced in-place operators in Python, (+=, *=,
> etc.) he decided that "i += 1" should be allowed for Python
> scalars and should mean "i = i + 1". Here you have it, it
> means something different when i is a mutable zero dimensional
> array. So, I suspect a tacit re-definition of Python scalars
> on ndarray import will break some code out there (code, that
> does not deal with arrays at all).
>
> Facing this important difference between arrays and Python
> scalars, I'm also not sure anymore that advertising zero
> dimensional arrays as essentially the same as Python scalars
> is such a good idea. Perhaps it would be better not to try to
> inherit from Python's number types and all that. Perhaps it
> would be easier to just say that indexing an array always
> results in an array and that zero dimensional arrays can be
> converted into Python scalars. Period.
>
> Ralf
>
> PS: You wrote two questions about zero dimensional arrays
> vs Python scalars into the PEP. What are your plans for
> deciding these?
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
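The aliasing difference Ralf points to can be made concrete with a
short sketch (the rank-0 constructor is hypothetical here, written the
way the proposal suggests):

    i = 1
    j = i
    i += 1          # rebinds i; ints are immutable, so j is untouched
    assert j == 1

    a = array(1)    # a mutable rank-0 array, per the proposal
    b = a           # b is another name for the SAME object
    a += 1          # in-place: modifies the shared array
    # int(b) would now be 2 -- aliasing that immutable scalars never show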
From cjw at sympatico.ca Sun Mar 20 08:42:16 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Sun Mar 20 08:42:16 2005
Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars
In-Reply-To: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu>
References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu>
Message-ID: <423DA7B3.8090906@sympatico.ca>

Ralf Juengling wrote:

>Travis,
>
>Discussing zero dimensional arrays, the PEP says at one point:
>
> ... When ndarray is imported, it will alter the numeric table
> for python int, float, and complex to behave the same as
> array objects.
>
> Thus, in the proposed solution, 0-dim arrays would never be
> returned from calculation, but instead, the equivalent Python
> Array Scalar Type. Internally, these ArrayScalars can
> be quickly converted to 0-dim arrays when needed. Each scalar
> would also have a method to convert to a "standard" Python Type
> upon request (though this shouldn't be needed often).
>
>I'm not sure I understand this. Does it mean that, after having
>imported ndarray, "type(1)" evaluates to "ndarray.IntArrType" rather
>than "int"?
>
>If so, I think this is a dangerous idea. There is one important
>difference between zero dimensional arrays and Python scalar
>types, which is not discussed in the PEP: arrays are mutable,
>Python scalars are immutable.
>
>When Guido introduced in-place operators in Python, (+=, *=,
>etc.) he decided that "i += 1" should be allowed for Python
>scalars and should mean "i = i + 1". Here you have it, it
>means something different when i is a mutable zero dimensional
>array. So, I suspect a tacit re-definition of Python scalars
>on ndarray import will break some code out there (code, that
>does not deal with arrays at all).
>
>Facing this important difference between arrays and Python
>scalars, I'm also not sure anymore that advertising zero
>dimensional arrays as essentially the same as Python scalars
>is such a good idea. Perhaps it would be better not to try to
>inherit from Python's number types and all that. Perhaps it
>would be easier to just say that indexing an array always
>results in an array and that zero dimensional arrays can be
>converted into Python scalars. Period.
>
>Ralf
>
>PS: You wrote two questions about zero dimensional arrays
>vs Python scalars into the PEP. What are your plans for
>deciding these?
>

It looks as though a decision has been made. I was among those who
favoured abandoning rank-0 arrays, we lost.

To my mind rank-0 arrays add complexity for little benefit and make
explanation more difficult.

I don't spot any discussion in the PEP of the pros and cons of the nd
== 0 case.

Colin W.

From cjw at sympatico.ca Sun Mar 20 08:52:36 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Sun Mar 20 08:52:36 2005
Subject: [Numpy-discussion] Additions to stdlib
Message-ID: <423DAA27.60402@sympatico.ca>

Here are some thoughts from Martin Löwis on the requirements and the
process to enhance the standard Python library.

Colin W.

-------- Original Message --------
Subject: Re: survey of modules to be added to stdlib
Date: Fri, 18 Mar 2005 23:26:31 +0100
From: "Martin v. Löwis"
L?wis" To: Alia Khouri Newsgroups: comp.lang.python References: <1111184161.122375.227250 at l41g2000cwc.googlegroups.com> Alia Khouri wrote: > BTW is there an official set of conditions that have to be met before a > module can be accepted into the stdlib? Yes - although this has never been followed to date: In PEP 2, http://www.python.org/peps/pep-0002.html a procedure is defined how new modules can be added. Essentially, we need a document stating its intended purpose, and a commitment by the authors to maintain the code. This may rule out inclusion of some modules in your list, e.g. if nobody steps forward to offer ongoing maintenance. Just that users want to see the code in the library is not sufficient, we also need somebody to do the actual work. If none of the core developers respond favourably to requests for inclusion, a library PEP can be seen as a last resort to trigger a BDFL pronouncement. Depending on the module, I personally would actively object inclusion if I have doubts whether the module is going to be properly maintained; I will, of course, obey to any BDFL pronouncement. Furthermore, and more recently, we also started requiring that code is *formally* contributed to the PSF, through the contrib forms, http://www.python.org/psf/contrib.html This may rule out further modules: the authors of the code have to agree to its inclusion in the library; somebody else contributing the modules for the authors will not be acceptable. However, the authors don't have to offer ongoing support for the copy in Python - any other volunteer could step in instead. Regards, Martin From cjw at sympatico.ca Sun Mar 20 10:36:20 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 10:36:20 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DA7B3.8090906@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> Message-ID: <423DC23C.80907@sympatico.ca> Colin J. Williams wrote: > Ralf Juengling wrote: > >> Travis, >> >> Discussing zero dimensional arrays, the PEP says at one point: >> >> ... When ndarray is imported, it will alter the numeric table >> for python int, float, and complex to behave the same as array >> objects. >> Thus, in the proposed solution, 0-dim arrays would never be >> returned from calculation, but instead, the equivalent Python >> Array Scalar Type. Internally, these ArrayScalars can >> be quickly converted to 0-dim arrays when needed. Each scalar >> would also have a method to convert to a "standard" Python Type >> upon request (though this shouldn't be needed often). >> >> >> I'm not sure I understand this. Does it mean that, after having >> imported ndarray, "type(1)" to "ndarray.IntArrType" rather than "int"? >> >> If so, I think this is a dangerous idea. There is one important >> difference between zero dimensional arrays and Python scalar types, >> which is not discussed in the PEP: arrays are mutable, Python scalars >> are immutable. >> >> When Guido introduced in-place operators in Python, (+=, *=, etc.) he >> decided that "i += 1" should be allowed for Python >> scalars and should mean "i = i + 1". Here you have it, it means >> something different when i is a mutable zero dimensional >> array. So, I suspect a tacit re-definition of Python scalars >> on ndarray import will break some code out there (code, that >> does not deal with arrays at all). 
>> Facing this important difference between arrays and Python >> scalars, I'm also not sure anymore that advertising zero >> dimensional arrays as essentially the same as Python scalars >> is such a good idea. Perhaps it would be better not to try to >> inherit from Python's number types and all that. Perhaps it >> would be easier to just say that indexing an array always results in >> an array and that zero dimensional arrays can be converted into >> Python scalars. Period. >> >> Ralf >> >> >> PS: You wrote two questions about zero dimensional arrays vs Python >> scalars into the PEP. What are your plans for deciding these? >> >> >> >> > It looks as though a decision has been made. I was among those who > favoured abandoning rank-0 arrays; we lost. > > To my mind rank-0 arrays add complexity for little benefit and make > explanation more difficult. > > I don't spot any discussion in the PEP of the pros and cons of the nd > == 0 case. A correction! There is, in the PEP: Questions 1) should sequence behavior (i.e. some combination of slicing, indexing, and len) be supported for 0-dim arrays? Pros: It means that len(a) always works and returns the size of the array. Slicing code and indexing code will work for any dimension (the 0-dim array is an identity element for the operation of slicing) Cons: 0-dim arrays are really scalars. They should behave like Python scalars, which do not allow sequence behavior 2) should array operations that result in a 0-dim array that is the same basic type as one of the Python scalars, return the Python scalar instead? Pros: 1) Some cases when Python expects an integer (the most dramatic is when slicing and indexing a sequence: _PyEval_SliceIndex in ceval.c) it will not try to convert it to an integer first before raising an error. Therefore it is convenient to have 0-dim arrays that are integers converted for you by the array object. 2) No risk of user confusion by having two types that are nearly but not exactly the same and whose separate existence can only be explained by the history of Python and NumPy development. 3) No problems with code that does explicit typechecks (isinstance(x, float) or type(x) == types.FloatType). Although explicit typechecks are considered bad practice in general, there are a couple of valid reasons to use them. 4) No creation of a dependency on Numeric in pickle files (though this could also be done by a special case in the pickling code for arrays) Cons: It is difficult to write generic code because scalars do not have the same methods and attributes as arrays (such as .type or .shape). Also, Python scalars have different numeric behavior as well. This results in special-case checking that is not pleasant. Fundamentally it lets the user believe that somehow multidimensional homogeneous arrays are something like Python lists (which, except for Object arrays, they are not). For me and for the end user, the (2) Pros win. Colin W. From cjw at sympatico.ca Sun Mar 20 11:05:25 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Mar 20 11:05:25 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DB7B8.5050004@cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> <423DB7B8.5050004@cs.pdx.edu> Message-ID: <423DC958.3010608@sympatico.ca> Ralf Juengling wrote: >>> >>> >> It looks as though a decision has been made. I was among those who >> favoured abandoning rank-0 arrays; we lost. >> >> To my mind rank-0 arrays add complexity for little benefit and make >> explanation more difficult. > > > What the current PEP describes is perhaps close to what you want, > though: It says that indexing an array never results in a zero > dimensional array but results in "Array Scalars", which are basically > Python scalars, but there are just more of them, to support the variety > of numeric types. > > You could still create zero dimensional arrays by reshaping single > element arrays though. > >> >> I don't spot any discussion in the PEP of the pros and cons of the nd >> == 0 case. > > > I don't remember your idea--getting rid of zero dimensional arrays > altogether--being voiced and discussed on this list. What would be > the bad consequences of getting rid of zero dimensional arrays? The argument made in the PEP against returning Python scalars is: Cons: It is difficult to write generic code because scalars do not have the same methods and attributes as arrays (such as .type or .shape). Also, Python scalars have different numeric behavior as well. This results in special-case checking that is not pleasant. Fundamentally it lets the user believe that somehow multidimensional homogeneous arrays are something like Python lists (which, except for Object arrays, they are not). I suggest that, in striking the balance between the developer or generic writer and the end user, the greater design consideration should go to the ease and convenience of the end user. Colin W. From rkern at ucsd.edu Sun Mar 20 15:29:20 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sun Mar 20 15:29:20 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DC958.3010608@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> <423DB7B8.5050004@cs.pdx.edu> <423DC958.3010608@sympatico.ca> Message-ID: <423E0714.103@ucsd.edu> Colin J. Williams wrote: > The argument made in the PEP against returning Python scalars is: > > Cons: It is difficult to write generic code because scalars > do not have the same methods and attributes as arrays > (such as .type or .shape). Also, Python scalars have > different numeric behavior as well. > This results in special-case checking that is not > pleasant. Fundamentally it lets the user believe that > somehow multidimensional homogeneous arrays > are something like Python lists (which, except for > Object arrays, they are not). > > I suggest that, in striking the balance between the developer or generic > writer and the end user, > the greater design consideration should go to the ease and convenience > of the end user. How are you defining "end user"? By my definition, an end user will neither know nor care whether rank-0 arrays or Python ints, longs, floats, or complexes are returned. They will be at a GUI seeing graphs or reading output. They won't see a bit of code. The "generic code" being talked about in the PEP isn't code inside Numeric itself. It's all of the stuff written *using* Numeric. Now, if you are defining "end user" to be the people using Numeric to write code, then we can argue about which choice is simpler or more convenient. There are some situations in which the rank-0 approach is more convenient and some in which the Python scalar is preferred. I'm not sure that we can reliably enumerate them. I would suggest that Option 2, returning Python types when the typecode allows and rank-0 arrays otherwise, is an inconsistency that we could do without.
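The special-case checking that the quoted "Cons" paragraph has in mind looks roughly like this in generic code (a sketch; .shape is the draft API's attribute, and the function name is illustrative):

    def size_of(x):
        # Python scalars carry none of the array attributes,
        # so generic code has to branch on the type.
        if isinstance(x, (int, long, float, complex)):
            return 1
        n = 1
        for dim in x.shape:
            n *= dim
        return n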
-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Sun Mar 20 22:07:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:07:07 2005 Subject: [Numpy-discussion] The first try on Numeric3. In-Reply-To: <371840ef05031918566113287b@mail.gmail.com> References: <371840ef05031918566113287b@mail.gmail.com> Message-ID: <423E644E.8060902@ee.byu.edu> Daehyok Shin wrote: >Dear Travis. >I found no problem in installing Numeric3 and running tests in Mandrake 10.1. >Good job. >One question I have. >I found you do not use UnitTest for test files. >Will you eventually change all tests to use UnitTest? >If so, I think I can contribute something >because I have some experience with it. > >For test files, please let me know what kind of support you need. > > Absolutely, there will be UnitTests (scipy has them now) and it's what I'm used to; I just have not worried about them yet. Thank you for your comments. The only reason I have named it Numeric3 is so that I can continue using Numeric on my system until Numeric3 is ready to replace it and because I chose that as the name for the CVS project --- it really is a branch of Numeric, though. I just didn't want to learn how to use CVS branching... When it is nearing completion, it will go into scipy_core. I'm pretty sure you will be able to say "import Numeric" when it is done. -Travis From oliphant at ee.byu.edu Sun Mar 20 22:18:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:18:05 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> Message-ID: <423E66D3.2030006@ee.byu.edu> Ralf Juengling wrote: >I just read the section about "Array Scalars" again and >am not sure anymore that I understood the whole idea. When >you say "Array Scalar", do you mean a zero dimensional >array or is an "Array Scalar" yet another animal? > >Ralf > > It is another type object that is a scalar but "quacks" like an array (it has the same methods and attributes). -Travis From oliphant at ee.byu.edu Sun Mar 20 22:28:42 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Mar 20 22:28:42 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423DA7B3.8090906@sympatico.ca> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <423DA7B3.8090906@sympatico.ca> Message-ID: <423E6922.6050001@ee.byu.edu> Colin J. Williams wrote: > It looks as though a decision has been made. I was among those who > favoured abandoning rank-0 arrays; we lost. > I don't understand how you can say this. In what way have rank-0 arrays not been abandoned for the new Array Scalar objects? By the way, these array scalar objects can easily be explained as equivalent to the type hierarchy of current numarray (it is essentially identical --- it's just in C). > To my mind rank-0 arrays add complexity for little benefit and make > explanation more difficult. I don't know what you mean. rank-0 arrays are built into the arrayobject type. Removing them is actually difficult. The easiest thing to do is to return rank-0 arrays whenever the operation allows it.
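A sketch of what that duck typing would look like in practice (a hypothetical session against the draft design, not output from any released code):

    >>> x = array([1.5, 2.5])[0]  # an Array Scalar, not a plain float
    >>> x.shape                   # it answers array questions...
    ()
    >>> float(x)                  # ...and converts to a standard Python type on request
    1.5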
It is the confusion with desiring to use items in an array (which are logically rank-0 arrays) as equivalent to Python scalars that requires the Array Scalars that "bridge the gap" between rank-0 arrays and "regular" Python scalars. Perhaps you mean that "Array Scalars" add complexity for "little beneift" and not "rank-0 arrays". To address that question: It may add complexity, but it does add benefit (future optimization, array type hierarchy, and a better bridge between the problem of current Python scalars and array-conscious scalars). This rank-0 problem has been a wart with Numeric for a long time. Most of us long-time users work around it, but heavy users are definitely aware of the problem and a bit annoyed. I think we have finally found a reasonable "compromise" solution in the Array Scalars. Yes, it did take more work to implement (and will take a little more work to maintain --- you need to add methods to the GenericScalar class when you add them to the Array Class), but I can actually see it working. -Travis From juenglin at cs.pdx.edu Sun Mar 20 23:25:17 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Sun Mar 20 23:25:17 2005 Subject: [Numpy-discussion] Thoughts about zero dimensional arrays vs Python scalars In-Reply-To: <423E66D3.2030006@ee.byu.edu> References: <1111295212.21849.35.camel@alpspitze.cs.pdx.edu> <1111297725.21849.69.camel@alpspitze.cs.pdx.edu> <423E66D3.2030006@ee.byu.edu> Message-ID: <423E750A.3030403@cs.pdx.edu> Travis Oliphant wrote: > Ralf Juengling wrote: > >> I just read the section about "Array Scalars" again and >> am not sure anymore that I understood the whole idea. When >> you say "Array Scalar", do you mean a zero dimensional array or is an >> "Array Scalar" yet another animal? >> >> Ralf >> >> > It is another type object that is a scalar but "quacks" like an array > (has the same methods and attributes) ... but is, unlike arrays, an immutable type (just like the existing Python scalars). ralf From boomberschloss at yahoo.com Mon Mar 21 01:59:06 2005 From: boomberschloss at yahoo.com (Joachim Boomberschloss) Date: Mon Mar 21 01:59:06 2005 Subject: [Numpy-discussion] casting in numarray In-Reply-To: 6667 Message-ID: <20050321095815.83981.qmail@web53109.mail.yahoo.com> Thanks, that's exactly what I needed! --- Thomas Grill wrote: > Hi Joachim, > this is what i do in my Python extension of the Pure > Data realtime > modular system. You have to create a Python buffer > object pointing to > your memory location and then create a numarray from > that. It's quite easy. > See the code in > http://cvs.sourceforge.net/viewcvs.py/pure-data/externals/grill/py/source/ > files pybuffer.h and pybuffer.cpp > > best greetings, > Thomas > > Joachim Boomberschloss schrieb: > > >Hi, > > > >I'm using numarray for an audio-related application > as > >a buffer in an audio-processing pipeline. I would > like > >to be able to allocate the buffer in advance and > later > >regard it as a buffer of 8bit or 16bit samples as > >appropriate, but in numarray, casting always > produces > >a new array, which I don't want. How difficult > should > >it be to make it possible to create an array using > an > >exsisting pre-allocated buffer to act as an > interface > >to that buffer? Also, if others consider it useful, > is > >there anyone willing to guide me through the code > in > >doing so? > > > >Thanks, > > > >Joe > > > > > > > >__________________________________ > >Do you Yahoo!? > >Yahoo! Small Business - Try our new resources site! 
> >http://smallbusiness.yahoo.com/resources/ > > > > > >------------------------------------------------------- > >SF email is sponsored by - The IT Product Guide > >Read honest & candid reviews on hundreds of IT > Products from real users. > >Discover which products truly live up to the hype. > Start reading now. > >http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > >_______________________________________________ > >Numpy-discussion mailing list > >Numpy-discussion at lists.sourceforge.net > >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > > > > -- > --->----->->----->-- > Thomas Grill > gr at grrrr.org > +43 699 19715543 > > __________________________________ Do you Yahoo!? Yahoo! Small Business - Try our new resources site! http://smallbusiness.yahoo.com/resources/ From jdgleeson at mac.com Mon Mar 21 07:40:29 2005 From: jdgleeson at mac.com (John Gleeson) Date: Mon Mar 21 07:40:29 2005 Subject: [Numpy-discussion] Numeric3 compilation errors on OS X Message-ID: <59990ca88950649336c742f006c8e725@mac.com> I get the following errors when building the extensions for Numeric3 on OS X 10.3.8 (with PatherPythonFix installed): Src/arrayobject.c:2538: error: conflicting types for `_swap_axes' Src/arrayobject.c:1170: error: previous declaration of `_swap_axes' This is for arrayobject.c v. 1.61. Any ideas? thanks, John From oliphant at ee.byu.edu Mon Mar 21 16:17:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 21 16:17:04 2005 Subject: [Numpy-discussion] Numeric3 compilation errors on OS X In-Reply-To: <59990ca88950649336c742f006c8e725@mac.com> References: <59990ca88950649336c742f006c8e725@mac.com> Message-ID: <423F63BF.4050501@ee.byu.edu> John Gleeson wrote: > I get the following errors when building the extensions for Numeric3 > on OS X 10.3.8 (with PatherPythonFix installed): > > Src/arrayobject.c:2538: error: conflicting types for `_swap_axes' > Src/arrayobject.c:1170: error: previous declaration of `_swap_axes' > > This is for arrayobject.c v. 1.61. > > Any ideas? The current code base (as of Saturday) is in flux as I add the new methods to the array type. If you want something more stable check out the version that was available Friday night. The CVS code is not guaranteed to compile all the time. This will be true for at least another week or so. I use CVS to store incremental changes during times like this so it can be in a state that can not be compiled for a few days. -Travis From mdehoon at ims.u-tokyo.ac.jp Tue Mar 22 05:47:19 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 22 05:47:19 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <423A744F.8070007@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> Message-ID: <424022C1.1030204@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> Another warning was that PyArrayObject's "dimensions" doesn't seem to >> be an int array any more. > > Yes. To allow for dimensions that are bigger than 32-bits, dimensions > and strides are (intp *). intp is a signed integer with sizeof(intp) == > sizeof(void *). On 32-bit systems, the warning will not cause > problems. We could worry about fixing it by typedefing intp to int > (instead of the current long for 32-bit systems). > Do 4 gigabyte 1D numerical python arrays occur in practice? If I understand correctly, the current implementation gives dimensions a different pointer type on different platforms. 
This will break extension modules on platforms other than 32-bits, as the extension module expects dimensions to be a pointer to int. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Mar 22 13:39:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 13:39:33 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <424022C1.1030204@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> Message-ID: <42409026.8080808@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Michiel Jan Laurens de Hoon wrote: >> >>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>> to be an int array any more. >> >> >> Yes. To allow for dimensions that are bigger than 32-bits, >> dimensions and strides are (intp *). intp is a signed integer with >> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning will >> not cause problems. We could worry about fixing it by typedefing >> intp to int (instead of the current long for 32-bit systems). >> > Do 4 gigabyte 1D numerical python arrays occur in practice? If I > understand correctly, the current implementation gives dimensions a > different pointer type on different platforms. This will break > extension modules on platforms other than 32-bits, as the extension > module expects dimensions to be a pointer to int. This is a must have. Yes, extension modules will have to be recompiled and pointers changed on 64-bit platforms, but this has to be done. If you see a better solution, I'd love to hear it. The earlier the better. -Travis From rowen at cesmail.net Tue Mar 22 13:39:38 2005 From: rowen at cesmail.net (Russell E. Owen) Date: Tue Mar 22 13:39:38 2005 Subject: [Numpy-discussion] Current state of performance? Message-ID: I'm curious as to the current state of numarray vs. Numeric performance. My code is a mix at the moment: - Numeric: coordinate conversion code that was written before numarray was very solid and makes heavy use of small matrices. - numarray: some image processing stuff that uses PyFits (which uses numarray). I'd like to settle on one package. At one time numarray was at a clear disadvantage for small arrays, but was wondering if that was still true. Any advice? -- Russell From cookedm at physics.mcmaster.ca Tue Mar 22 14:51:44 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Mar 22 14:51:44 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <42409026.8080808@ee.byu.edu> (Travis Oliphant's message of "Tue, 22 Mar 2005 14:37:42 -0700") References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: Travis Oliphant writes: > Michiel Jan Laurens de Hoon wrote: > >> Travis Oliphant wrote: >> >>> Michiel Jan Laurens de Hoon wrote: >>> >>>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>>> to be an int array any more. >>> >>> >>> Yes. To allow for dimensions that are bigger than 32-bits, >>> dimensions and strides are (intp *). intp is a signed integer with >>> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>> will not cause problems. 
We could worry about fixing it by >>> typedefing intp to int (instead of the current long for 32-bit >>> systems). Why not use Py_intptr_t? It's defined by the Python C API already (in pyport.h). >> Do 4 gigabyte 1D numerical python arrays occur in practice? If I >> understand correctly, the current implementation gives dimensions a >> different pointer type on different platforms. This will break >> extension modules on platforms other than 32-bits, as the extension >> module expects dimensions to be a pointer to int. > > This is a must have. Yes, extension modules will have to be > recompiled and pointers changed on 64-bit platforms, but this has to > be done. If you see a better solution, I'd love to hear it. The > earlier the better. An array of longs would seem to be the best solution. On the two 64-bit platforms I have access to (an Athlon 64 and some Alphas), sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and PowerPC) have sizeof(long) == 4. For comparison, here's a list of sizes for various platforms 32-bit 32-bit 64-bit 64-bit x86 PPC Athlon64 Alpha (Linux) (OS X) (Linux) (Tru64) char 1 1 1 1 short 2 2 2 2 int 4 4 4 4 long 4 4 8 8 long long 8 8 8 8 size_t 4 4 8 8 float 4 4 4 4 double 8 8 8 8 long double 12 8 16 16 void * 4 4 8 8 function pointer 4 4 8 8 Note the three different sizes of long double (oh, fun). Also note that size_t (which is the return type of sizeof()) is not int in general (although lots of programs treat it like that). Using long for the dimensions also means that converting to and from Python ints for indices is transparent, and won't fail, as Python ints are C longs. This is the cause of several of the 64-bit bugs I fixed in the latest Numeric release (23.8). [I'd help with Numeric3, but not until it compiles with fewer than several hundred warnings -- I *really* don't want to wade through all that.] I've attached the program I used to generate the above numbers, if someone wants to run it on other platforms, so we have a better idea of what's what. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca -------------- next part -------------- A non-text attachment was scrubbed... Name: csizes.c Type: text/x-csrc Size: 522 bytes Desc: print C type sizes URL: From mdehoon at ims.u-tokyo.ac.jp Tue Mar 22 17:08:01 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 22 17:08:01 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <42409026.8080808@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <4240C1DA.8090501@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> Travis Oliphant wrote: >>> Michiel Jan Laurens de Hoon wrote: >>>> Another warning was that PyArrayObject's "dimensions" doesn't seem >>>> to be an int array any more. >>> >>> Yes. To allow for dimensions that are bigger than 32-bits, >>> dimensions and strides are (intp *). intp is a signed integer with >>> sizeof(intp) == sizeof(void *). On 32-bit systems, the warning will >>> not cause problems. We could worry about fixing it by typedefing >>> intp to int (instead of the current long for 32-bit systems). >>> >> Do 4 gigabyte 1D numerical python arrays occur in practice? 
If I >> understand correctly, the current implementation gives dimensions a >> different pointer type on different platforms. This will break >> extension modules on platforms other than 32-bits, as the extension >> module expects dimensions to be a pointer to int. > > This is a must have. Yes, extension modules will have to be recompiled > and pointers changed on 64-bit platforms, but this has to be done. Why? There needs to be a good reason to break compatibility. Who needs this? --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Mar 22 23:14:44 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:14:44 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <424116B0.2040106@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > > >>Michiel Jan Laurens de Hoon wrote: >> >> >> >>>Travis Oliphant wrote: >>> >>> >>> >>>>Michiel Jan Laurens de Hoon wrote: >>>> >>>> >>>> >>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem >>>>>to be an int array any more. >>>>> >>>>> >>>>Yes. To allow for dimensions that are bigger than 32-bits, >>>>dimensions and strides are (intp *). intp is a signed integer with >>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>>>will not cause problems. We could worry about fixing it by >>>>typedefing intp to int (instead of the current long for 32-bit >>>>systems). >>>> >>>> > >Why not use Py_intptr_t? It's defined by the Python C API already (in >pyport.h). > > Sounds good to me. I wasn't aware of it (intp or intptr is shorter though). >An array of longs would seem to be the best solution. On the two >64-bit platforms I have access to (an Athlon 64 and some Alphas), >sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and >PowerPC) have sizeof(long) == 4. > > I thought about this, but what about the MS Window compilers where long is still 4 byte (even on a 64-bit system), so that long long is the size of a pointer on that system. I just think we should just create an integer that will be big enough and start using it. >For comparison, here's a list of sizes for various platforms > > 32-bit 32-bit 64-bit 64-bit > x86 PPC Athlon64 Alpha > (Linux) (OS X) (Linux) (Tru64) >char 1 1 1 1 >short 2 2 2 2 >int 4 4 4 4 >long 4 4 8 8 >long long 8 8 8 8 >size_t 4 4 8 8 > >float 4 4 4 4 >double 8 8 8 8 >long double 12 8 16 16 > >void * 4 4 8 8 >function pointer 4 4 8 8 > > Nice table, thanks... >Note the three different sizes of long double (oh, fun). > Yeah, I know, I figure people who use long doubles will >Also note >that size_t (which is the return type of sizeof()) is not int in >general (although lots of programs treat it like that). > >Using long for the dimensions also means that converting to and from >Python ints for indices is transparent, and won't fail, as Python ints >are C longs. This is the cause of several of the 64-bit bugs I fixed >in the latest Numeric release (23.8). > > The conversion code has been updated so that it won't fail if the sizes are actually the same for your platform. >[I'd help with Numeric3, but not until it compiles with fewer than >several hundred warnings -- I *really* don't want to wade through all >that.] 
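The csizes.c attachment was scrubbed from the archive, but a rough Python stand-in for it (using the ctypes module -- an assumption for illustration, not the original program) prints the same kind of table for whatever platform it runs on:

    import ctypes

    types = [("char", ctypes.c_char), ("short", ctypes.c_short),
             ("int", ctypes.c_int), ("long", ctypes.c_long),
             ("long long", ctypes.c_longlong), ("size_t", ctypes.c_size_t),
             ("float", ctypes.c_float), ("double", ctypes.c_double),
             ("long double", ctypes.c_longdouble), ("void *", ctypes.c_void_p)]
    for name, ctype in types:
        print "%-16s %d" % (name, ctypes.sizeof(ctype))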
> > Do the warnings really worry you that much? Most are insignificant. You could help implement a method or two pretty easily. Or help with the ufunc module. -Travis From oliphant at ee.byu.edu Tue Mar 22 23:18:58 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:18:58 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4240C1DA.8090501@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> Message-ID: <424117BD.9020509@ee.byu.edu> >>> Do 4 gigabyte 1D numerical python arrays occur in practice? If I >>> understand correctly, the current implementation gives dimensions a >>> different pointer type on different platforms. This will break >>> extension modules on platforms other than 32-bits, as the extension >>> module expects dimensions to be a pointer to int. >> >> >> This is a must have. Yes, extension modules will have to be >> recompiled and pointers changed on 64-bit platforms, but this has to >> be done. > > > Why? There needs to be a good reason to break compatibility. Who needs > this? > > --Michiel. > The "break compatibility argument" is not strong for me here. We are going to break compatibility in a few places. I'm trying to minimize them, but I don't want to chain ourselves to bad designs forever just for the sake of compatibility. For 32-bit systems there will be no problem: unchanged extension code will work fine. Unchanged extension code will not work on 64-bit systems. The change is not difficult (search and replace). I submit that there are fewer 64-bit users out there currently, but their numbers are going to grow, and they will eventually find Numeric a toy if the dimensions are limited to 32-bits even on 64-bit systems. The biggest problem is the 1 dimensional array. Here the 32-bit limit will bite you quickly. -Travis From oliphant at ee.byu.edu Tue Mar 22 23:35:44 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Mar 22 23:35:44 2005 Subject: [Numpy-discussion] Specific plea for help with pickling In-Reply-To: <424117BD.9020509@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> <424117BD.9020509@ee.byu.edu> Message-ID: <42411BE4.10203@ee.byu.edu> If there is anyone out there with pickling experience who would like to help bring the new Numeric up to date with protocol 2 of the pickling protocol, that would help me immensely. Even a document that describes what should be done would save me time, and right now time is very important, as I don't want to delay the new Numeric and scipy_core past June. Thanks, -Travis From jmiller at stsci.edu Wed Mar 23 02:55:55 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Mar 23 02:55:55 2005 Subject: [Numpy-discussion] Current state of performance? In-Reply-To: References: Message-ID: <1111575188.5028.40.camel@jaytmiller.comcast.net> On Tue, 2005-03-22 at 13:27 -0800, Russell E. Owen wrote: > I'm curious as to the current state of numarray vs. Numeric performance. > My code is a mix at the moment: > - Numeric: coordinate conversion code that was written before numarray > was very solid and makes heavy use of small matrices. > - numarray: some image processing stuff that uses PyFits (which uses > numarray). > > I'd like to settle on one package.
> At one time numarray was at a clear > disadvantage for small arrays, but I was wondering if that was still true. It is still true that numarray is at a disadvantage for small arrays. > Any advice? I don't think there is a single array package that provides both PyFITS and good small array performance. Consider porting your conversion code to numarray and then profiling to get a better idea of the overall performance costs of your application. If you find a specific hot spot we can try to address it. Regards, Todd From xscottg at yahoo.com Wed Mar 23 03:00:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Wed Mar 23 03:00:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050323105807.59603.qmail@web50208.mail.yahoo.com> --- Michiel Jan Laurens de Hoon wrote: > >>> > >> Do 4 gigabyte 1D numerical python arrays occur in practice? > > Why? There needs to be a good reason to break compatibility. Who needs > this? > I (and others I work with) routinely deal with 1D datasets that are multiple gigabytes in length. Working with terabyte datasets is on my near horizon. For lots of reasons, I don't/can't use Python/Numeric for very much of this sort of thing, but it would be nice if I could. The "32 bits is enough for anyone" design has bitten me with lots of tools (not just Python). The Python core will fix its int/intp problem eventually; I can't see why Numeric3 wouldn't avoid the problem now. As a concrete case that I'm sure has been done, consider memory mapped file arrays. 64 bit platforms can mmap huge files without using huge amounts of real memory. I try to remain a lurker on this list, but since I've already broken my silence, let me add a few other notes and then I'll go back to being silent... I'll try to sort them by priority. Pickling performance is important to us at my work. We use pickling to pass data across Unix pipes, through shared memory, across sockets on Gig-E, etc... Typically we'll have a dictionary containing some metadata, and a few large chunks (1-100 MBytes would be common) of Numeric array data. We'd like to transfer hundreds of these per second. Currently, we pickle the array into a string in memory, then pickle the string across the conduit (pipe or socket or shared memory). For some reason, pickling a Numeric array directly to the file object is slower than the two stage process... If the new Numeric3 didn't break too much compatibility with the original Numeric but pickled much faster, we'd probably be in a hurry to upgrade based on this feature alone. The new pickling protocol that allows a generator to be used to copy small chunks at a time instead of an entire binary string copy could potentially save the cost of duplicating a 100 MByte array into a 100 MByte string. The reason we use pickling like we do is to pass data between processes. Almost all of our work machines have multiple processors (typically 4). A lot of times, the multi-process design is cleaner and less buggy, but there are also times when we'd prefer to use multiple threads in a single process. It's unfortunate that the GIL prohibits too much real concurrency with multiple threads. It would be nice if the ufuncs and other numerical algorithms released the GIL when possible. I know the limitations of the Python buffer protocol add significant headache in this area, but it's something to think about. We have a wide group of smart engineering folks using Python/Numeric, but most of them are not computer scientists or software engineers.
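The two-stage pattern Scott describes takes only a few lines (a sketch with illustrative names; conduit stands for any file-like pipe, socket file, or shared-memory wrapper):

    import cPickle

    def send(arr, conduit):
        data = cPickle.dumps(arr, 1)    # stage 1: array -> string in memory
        cPickle.dump(data, conduit, 1)  # stage 2: string -> conduit

    def recv(conduit):
        return cPickle.loads(cPickle.load(conduit))

Pickling the intermediate string is what currently outruns pickling the array straight to the file object.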
Meaning they spend all day writing software, but know just enough about programming to solve their problems, and almost none of them have any knowledge about the internals of Python or Numeric. Complicated rules about whether something returns a scalar-versus-array, or a copy-versus-view add frustration and hard to find bugs. This has been beaten up on this list quite a bit, and there is probably too much momentum behind the case by case strategy that is now in place, but please count my vote for always getting an array copy (copy on write) from subscripting unless you explicitly ask for a view, and always returning a rank-0 array instead of a scalar. I agree with the other guy who pointed out that arrays are mutable and that likewise, rank-0 arrays should be mutable. I know it's unlikely to happen, but it would also be nice to see the Python parser change slightly to treat a[] as a[()]. Then the mutability of rank-0 could fit elegantly with the rank-(n > 1) arrays. It's a syntax error now, so there wouldn't be a backwards compatibility issue. We commonly use data types that aren't in Numeric. The most prevalent example at my work is complex-short. It looks like I can wrap the new "Void" type to handle this to some extent. Will indexing (subscripting) a class derived from a Numeric3 array return the derived class? class Derived(Numeric3.ArrayType): pass d = Derived(shape=(200, 200, 2), typecode='s') if isinstance(d[0], Derived): print "This is what I mean" I don't really expect Numeric3 to add all of the possible oddball types, but I think it's important to remember that other types are out there (fixed point for DSP, mu-law for audio, 16 bit floats for graphics, IBMs decimal64 decimal128 types, double-double and quad-double for increased precision, quaternions of standard types, ....). It's one thing to treat these like "record arrays", it's another thing for them to have overloaded arithmetic operators. Since Numeric3 can't support every type under the sun, it would be nice if when the final version goes into the Python core that the C-API and Python library functions used "duck typing" so that other array implementations could work to whatever extent possible. In other words, it would be better if users were not required to derive from the Numeric3 type in order to create new kinds of arrays that can be used with sufficiently generic Numeric3 routines. Simply having the required attributes (shape, strides, itemsize, ...) of a Numeric3 array should be enough to be treated like a Numeric3 array. This last one is definitely pie-in-the-sky, but I thought I'd mention it. Since the 64 bit Alphas are expensive and pretty much on the way out of production, we've stepped back to 32 bit versions of x86/Linux. The Linux boxes are cheaper, faster, and smaller, but not 64 bit. It would be really great using Numeric to directly manipulate huge (greater than 2**32 byte length) files on a 32 bit platform. This would require a smarter paging scheme than simply mmapping the whole thing, and I don't think any of the Python Array packages has proposed a good solution for this... I realize it adds considerable complexity to switch from a single buffer object pointing to the entire block of data to having multiple buffers pinning down pieces of the data at a time, but the result would be pretty useful. I realize this is a lot of commentary from someone who doesn't contribute much of anything back to the Numeric/Numarray/SciPy community. If you got this far, thanks for your time reading it. 
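The generic routines that such duck typing would enable could be as plain as this (a sketch; any object exposing the named attributes would qualify, with no requirement to derive from the Numeric3 type):

    def nbytes(arr):
        # Works for anything with .shape and .itemsize, not just
        # instances of the Numeric3 array type.
        n = arr.itemsize
        for dim in arr.shape:
            n *= dim
        return n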
I appreciate the work you're doing. Cheers, -Scott From cookedm at physics.mcmaster.ca Wed Mar 23 03:24:54 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Mar 23 03:24:54 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <424116B0.2040106@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <424116B0.2040106@ee.byu.edu> Message-ID: <20050323112205.GA1350@arbutus.physics.mcmaster.ca> On Wed, Mar 23, 2005 at 12:11:44AM -0700, Travis Oliphant wrote: > David M. Cooke wrote: > >Travis Oliphant writes: > >>Michiel Jan Laurens de Hoon wrote: > >>>Travis Oliphant wrote: > >>>>Michiel Jan Laurens de Hoon wrote: > >>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem > >>>>>to be an int array any more. > >>>>Yes. To allow for dimensions that are bigger than 32-bits, > >>>>dimensions and strides are (intp *). intp is a signed integer with > >>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning > >>>>will not cause problems. We could worry about fixing it by > >>>>typedefing intp to int (instead of the current long for 32-bit > >>>>systems). > >Why not use Py_intptr_t? It's defined by the Python C API already (in > >pyport.h). > Sounds good to me. I wasn't aware of it (intp or intptr is shorter > though). Some reasons not to use those two: 1) intp is too short for an API. The user might be using it already. 2) the C99 type for this is intptr_t. Py_intptr_t is defined to be the same thing. But let's step back a moment: PyArrayObject is defined like this: typedef struct PyArrayObject { PyObject_HEAD char *data; int nd; intp *dimensions; intp *strides; ... Thinking about it, I would say that dimensions should have the type of size_t *. size_t is the unsigned integer type used to represent the sizes of objects (it's the type of the result of sizeof()). Thus, it's guaranteed that an element of size_t should be large enough to contain any number that we could use as an array dimension. size_t is also unsigned. Also, since the elements of strides are byte offsets into the array, strides should be of type ptrdiff_t *. The elements are used by adding them to a pointer. Is there a good reason why data is not of type void *? If it's char *, it's quite easy to make the mistake of using data[0], which is probably *not* what you want. With void *, you would have to cast it, as you should be doing anyways, or else the compiler complains. Also, assigning to the right pointer, like double *A = array->data, doesn't need casts like it does with data being a char *. In Numeric, char * is probably a holdover when Numeric had to compile with K&R-style C. But, we know we have ANSI C89 ('cause that's what Python requires). So I figure it should look like this: typedef struct PyArrayObject { PyObject_HEAD void *data; int nd; size_t *dimensions; ptrdiff_t *strides; ... I've really started to appreciate size_t when trying to make programs work correctly on my 64-bit machine :-) It's not just another pretty face. > >An array of longs would seem to be the best solution. On the two > >64-bit platforms I have access to (an Athlon 64 and some Alphas), > >sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and > >PowerPC) have sizeof(long) == 4. > > > I thought about this, but what about the MS Window compilers where long > is still 4 byte (even on a 64-bit system), so that long long is the > size of a pointer on that system. 
I just think we should just create > an integer that will be big enough and start using it. I don't know about ptrdiff_t, but sizeof(size_t) *should* be 8 on 64-bit Windows. > >For comparison, here's a list of sizes for various platforms >... > Nice table, thanks... There's a another one (for all sorts of Linux systems) at http://www.xml.com/ldd/chapter/book/ch10.html#t1 > >Also note > >that size_t (which is the return type of sizeof()) is not int in > >general (although lots of programs treat it like that). > > > >Using long for the dimensions also means that converting to and from > >Python ints for indices is transparent, and won't fail, as Python ints > >are C longs. This is the cause of several of the 64-bit bugs I fixed > >in the latest Numeric release (23.8). > > > > > The conversion code has been updated so that it won't fail if the sizes > are actually the same for your platform. > > >[I'd help with Numeric3, but not until it compiles with fewer than > >several hundred warnings -- I *really* don't want to wade through all > >that.] > Do the warnings really worry you that much? Most are insignificant. > You could help implement a method or two pretty easily. Or help with > the ufunc module. They really obscure significant warnings, though. And most look like they can be dealt with. Right now, it doesn't compile for me. I'll just list a few general cases: - arrayobject.h redefines ushort, uint, ulong (they're defined in sys/types.h already for legacy reasons) - functions taking no arguments should be defined like void function(void) not void function() (which is an old style that actually means the argument list isn't specified, not that it takes no arguments) - then a bunch of errors with typos, and things not defined. I might get some time to track some down, but it's limited also :-) -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Fernando.Perez at colorado.edu Wed Mar 23 04:13:06 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Wed Mar 23 04:13:06 2005 Subject: [Numpy-discussion] Specific plea for help with pickling In-Reply-To: <42411BE4.10203@ee.byu.edu> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <4240C1DA.8090501@ims.u-tokyo.ac.jp> <424117BD.9020509@ee.byu.edu> <42411BE4.10203@ee.byu.edu> Message-ID: <42415CD7.2020508@colorado.edu> Travis Oliphant wrote: > If there is anyone out there with pickling experience who would like to > help bring the new Numeric up to date with protocol 2 of the pickling > protocol that would help me immensely. I don't have much to offer, since I don't have much pickle experience myself. But keep this note in mind, which flew by in the enthought-dev list yesterday. It might be a good idea to at least keep this in the back of your mind. best, f. ################ Protocol 2 caused some issues with Traits classes in the past, so we decided to go with 1. Robert Kern wrote: >> Lowell Vaughn wrote: >> > >>>> So, I'm checking in a change to naming that may hose current >>>> projects. Specifically, we're now using binary pickling instead of >>>> ascii picking (should make everything smaller and faster, which is a >>>> win). In theory we should be fine, but since I had to change the >>>> file(path, 'r') to an file(path, 'rb'), there may be some issues with >>>> the the old pickle files. 
> >> >> >> I notice that you are using protocol 1. There's a protocol 2 that's >> even faster and smaller, especially with new-style classes. >> >> http://www.python.org/doc/2.3.5/lib/node63.html From rkern at ucsd.edu Wed Mar 23 08:36:35 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Mar 23 08:36:35 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> Message-ID: <424198CF.4090004@ucsd.edu> David M. Cooke wrote: > An array of longs would seem to be the best solution. On the two > 64-bit platforms I have access to (an Athlon 64 and some Alphas), > sizeof(long) == 8, while my two 32-bit platforms (Intel x86 and > PowerPC) have sizeof(long) == 4. I'm not terribly caught up on 64-bit computing, but I believe that 64-bit Windows doesn't (won't? I haven't paid attention) make longs 64-bit. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/win64/win64/abstract_data_models.asp -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Chris.Barker at noaa.gov Wed Mar 23 10:18:04 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Mar 23 10:18:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> Message-ID: <4241B0AB.2020303@noaa.gov> Scott Gilbert wrote: > Since the 64 bit Alphas are expensive and pretty much on the way out of > production, we've stepped back to 32 bit versions of x86/Linux. The Linux > boxes are cheaper, faster, and smaller, but not 64 bit. Kind of OT, but why not use AMD64 or PPC64? Both give you very good price/performance. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Mar 23 11:41:28 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 23 11:41:28 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050323112205.GA1350@arbutus.physics.mcmaster.ca> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <423A744F.8070007@ee.byu.edu> <424022C1.1030204@ims.u-tokyo.ac.jp> <42409026.8080808@ee.byu.edu> <424116B0.2040106@ee.byu.edu> <20050323112205.GA1350@arbutus.physics.mcmaster.ca> Message-ID: <4241C525.507@ee.byu.edu> David M. Cooke wrote: >On Wed, Mar 23, 2005 at 12:11:44AM -0700, Travis Oliphant wrote: > > >>David M. Cooke wrote: >> >> >>>Travis Oliphant writes: >>> >>> >>>>Michiel Jan Laurens de Hoon wrote: >>>> >>>> >>>>>Travis Oliphant wrote: >>>>> >>>>> >>>>>>Michiel Jan Laurens de Hoon wrote: >>>>>> >>>>>> >>>>>>>Another warning was that PyArrayObject's "dimensions" doesn't seem >>>>>>>to be an int array any more. >>>>>>> >>>>>>> >>>>>>Yes. To allow for dimensions that are bigger than 32-bits, >>>>>>dimensions and strides are (intp *). intp is a signed integer with >>>>>>sizeof(intp) == sizeof(void *). On 32-bit systems, the warning >>>>>>will not cause problems. We could worry about fixing it by >>>>>>typedefing intp to int (instead of the current long for 32-bit >>>>>>systems). >>>>>> >>>>>> >>>Why not use Py_intptr_t? It's defined by the Python C API already (in >>>pyport.h). >>> >>> >>Sounds good to me. 
I wasn't aware of it (intp or intptr is shorter >>though). >> >> > >Some reasons not to use those two: >1) intp is too short for an API. The user might be using it already. >2) the C99 type for this is intptr_t. Py_intptr_t is defined to be > the same thing. > >But let's step back a moment: PyArrayObject is defined like this: > >typedef struct PyArrayObject { > PyObject_HEAD > char *data; > int nd; > intp *dimensions; > intp *strides; > ... > >Thinking about it, I would say that dimensions should have the type of >size_t *. size_t is the unsigned integer type used to represent the >sizes of objects (it's the type of the result of sizeof()). Thus, it's >guaranteed that an element of size_t should be large enough to contain >any number that we could use as an array dimension. size_t is also >unsigned > > Because axis arguments can be negative it would require a lot of changes to check for typing to make dimensions unsigned. It's just easier to make them signed. So, what is the signed equivalent? Is ssize_t available everywhere? >Also, since the elements of strides are byte offsets into the array, >strides should be of type ptrdiff_t *. The elements are used by adding >them to a pointer. > > Is this an available type on all systems? What does it mean? >Is there a good reason why data is not of type void *? If it's char *, >it's quite easy to make the mistake of using data[0], which is probably >*not* what you want. With void *, you would have to cast it, as you >should be doing anyways, or else the compiler complains. Also, assigning >to the right pointer, like double *A = array->data, doesn't need >casts like it does with data being a char *. In Numeric, char * is >probably a holdover when Numeric had to compile with K&R-style C. But, >we know we have ANSI C89 ('cause that's what Python requires). > > Only real reason is backward compatibility. I have no problem with making it void *. >So I figure it should look like this: > >typedef struct PyArrayObject { > PyObject_HEAD > void *data; > int nd; > size_t *dimensions; > ptrdiff_t *strides; > ... > >I've really started to appreciate size_t when trying to make programs >work correctly on my 64-bit machine :-) It's not just another pretty >face. > > > Good suggestions? Any other comments. >They really obscure significant warnings, though. And most look like >they can be dealt with. Right now, it doesn't compile for me. > >I'll just list a few general cases: >- arrayobject.h redefines ushort, uint, ulong (they're defined in > sys/types.h already for legacy reasons) > > I don't think they are defined on all systems (I don't get a warning on mysystem). This is another thing configure needs to check if we are really concerned about the warnings. >- functions taking no arguments should be defined like >void function(void) >not >void function() > > Ah, thanks for that! >I might get some time to track some down, but it's limited also :-) > > > The errors right now are mainly due to the fact that I'm adding the new methods (and some functions things are left undefined). I have not compiled the code for several days. Adding methods is an easy thing that most anyone could help with. The help would be appreciated. 
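A toy illustration of why the stride entries have to stay signed, with plain Python lists standing in for the C pointer arithmetic:

    data = [10, 20, 30, 40]
    start, stride = 3, -1   # a reversed view walks backwards
    view = [data[start + i * stride] for i in range(4)]
    print view              # [40, 30, 20, 10]

An unsigned type such as size_t could not hold the -1, which is why a signed type in the spirit of ptrdiff_t fits strides even if the dimensions themselves are never negative.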
-Travis From oliphant at ee.byu.edu Wed Mar 23 11:48:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Mar 23 11:48:12 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> Message-ID: <4241C781.8080001@ee.byu.edu> Scott Gilbert wrote: >--- Michiel Jan Laurens de Hoon wrote: > > >>>>Do 4 gigabyte 1D numerical python arrays occur in practice? >>>> >>>> >>Why? There needs to be a good reason to break compatibility. Who needs >>this? >> >> >> > >I (and others I work with) routinely deal with 1D datasets that are >multiple gigabytes in length. Working with terabyte datasets is on my near >horizon. For lots of reasons, I don't/can't use Python/Numeric for very >much of this sort of thing, but it would be nice it I could. The "32 bits >is enough for anyone" design has bitten me with lots of tools (not just >Python). The Python core will fix it's int/intp problem eventually, I >can't see why Numeric3 wouldn't avoid the problem now. > > Thanks for your comments Scott. This is exactly the kind of comments I'm looking for. I want to hear the experiences of real users (I know there are a lot of silent-busy types out there). It really helps in figuring out what are the most important issues. >If the new Numeric3 didn't break too much compatibility with the original >Numeric but pickled much faster, we'd probably be in a hurry to upgrade >based on this feature alone. > > I'm hoping we can do this, so stay tuned. >I agree with the other guy who pointed out that arrays are mutable and that >likewise, rank-0 arrays should be mutable. I know it's unlikely to happen, >but it would also be nice to see the Python parser change slightly to treat >a[] as a[()]. Then the mutability of rank-0 could fit elegantly with the >rank-(n > 1) arrays. It's a syntax error now, so there wouldn't be a >backwards compatibility issue. > > Well, rank-0 arrays are and forever will be mutable. But, Python scalars (and the new Array-like Scalars) are not mutable. I know this is not ideal. But making it ideal means fundamental changes to Python scalars. So far the current scheme is the best idea I've heard. I'm always open to better ones. >We commonly use data types that aren't in Numeric. The most prevalent >example at my work is complex-short. It looks like I can wrap the new >"Void" type to handle this to some extent. Will indexing (subscripting) a >class derived from a Numeric3 array return the derived class? > > class Derived(Numeric3.ArrayType): > pass > > d = Derived(shape=(200, 200, 2), typecode='s') > if isinstance(d[0], Derived): > print "This is what I mean" > > Yes, indexing will return a derived type currently. There are probably going to be some issues here, but it can be made to work. I'm glad you are noticing that the VOID * type is for more than just record arrays. I've got ideas for hooks that allow new types to be defined, but I could definitely use examples. >I don't really expect Numeric3 to add all of the possible oddball types, >but I think it's important to remember that other types are out there >(fixed point for DSP, mu-law for audio, 16 bit floats for graphics, IBMs >decimal64 decimal128 types, double-double and quad-double for increased >precision, quaternions of standard types, ....). It's one thing to treat >these like "record arrays", it's another thing for them to have overloaded >arithmetic operators. 
I think using standard Python overloading of arithmetic operators (i.e., letting such types define their own) may be the way to go.

>Since Numeric3 can't support every type under the sun, it would be nice if, when the final version goes into the Python core, the C-API and Python library functions used "duck typing" so that other array implementations could work to whatever extent possible. In other words, it would be better if users were not required to derive from the Numeric3 type in order to create new kinds of arrays that can be used with sufficiently generic Numeric3 routines. Simply having the required attributes (shape, strides, itemsize, ...) of a Numeric3 array should be enough to be treated like a Numeric3 array.
>

I would really like to see this eventually too. We need examples, though, to make it work right. One idea is to have classes define "coercion" routines that the ufunc machinery uses, and create an API wherein the ufunc can be made to call the right function.

-Travis

From mdehoon at ims.u-tokyo.ac.jp Wed Mar 23 18:20:42 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar 23 18:20:42 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
References: <20050323105807.59603.qmail@web50208.mail.yahoo.com>
Message-ID: <424224D0.9030006@ims.u-tokyo.ac.jp>

Scott Gilbert wrote:
> --- Michiel Jan Laurens de Hoon wrote:
>
>>>>Do 4 gigabyte 1D numerical python arrays occur in practice?
>>
>>Why? There needs to be a good reason to break compatibility. Who needs this?
>
> I (and others I work with) routinely deal with 1D datasets that are multiple gigabytes in length. Working with terabyte datasets is on my near horizon.

I see. Then I agree, we need to fix the dimensions and strides in PyArrayObject. Thanks, Scott.

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From perry at stsci.edu Wed Mar 23 19:28:49 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Mar 23 19:28:49 2005
Subject: [Numpy-discussion] Re: Trying out Numeric3
In-Reply-To: <424224D0.9030006@ims.u-tokyo.ac.jp>
Message-ID: 

> Scott Gilbert wrote:
>
> > --- Michiel Jan Laurens de Hoon wrote:
> >
> >>>>Do 4 gigabyte 1D numerical python arrays occur in practice?
> >>
> >>Why? There needs to be a good reason to break compatibility. Who needs this?
> >
> > I (and others I work with) routinely deal with 1D datasets that are multiple gigabytes in length. Working with terabyte datasets is on my near horizon.
>
> I see. Then I agree, we need to fix the dimensions and strides in PyArrayObject.
> Thanks, Scott.
>

I'll also add that we've already had internal requests to deal with files that large, as well as external queries about supporting large files. Believe me, files of this size are becoming much more common than you realize.
Perry

From arnd.baecker at web.de Thu Mar 24 01:12:52 2005
From: arnd.baecker at web.de (Arnd Baecker)
Date: Thu Mar 24 01:12:52 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <423A6F69.8020803@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp>
Message-ID: 

Hi Travis,

I just had a quick look at Numeric3, checked out with

    cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -D 2005-03-18 -P Numeric3

(as you already warned, the current CVS does not compile for me). After that I saw Michiel's mail, so my results below just add another "data-point"...

On Fri, 18 Mar 2005, Michiel Jan Laurens de Hoon wrote:

> Travis Oliphant wrote:
>> I wanted to let people who may be waiting know that now is a good time to help with numeric3. The CVS version builds (although I'm sure there are still bugs), but more eyes could help me track them down.
>>
>> Currently, all that remains for the arrayobject is to implement the newly defined methods (really it's just a re-organization and re-inspection of the code in multiarraymodule.c to call it using methods).
> [...]
> When using ndarray, I got a core dump using "zeros":
>
> $ python
> Python 2.5a0 (#1, Mar 2 2005, 12:15:06)
> [GCC 3.3.3] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from ndarray import *
> >>> zeros(5)
> creating data 0xa0c03d0 associated with 0xa0d52c0
> array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
> Segmentation fault (core dumped)
>
> With Python 2.4, the segmentation fault occurs slightly later:
>
> $ python2.4
> Python 2.4 (#1, Dec 5 2004, 20:47:03)
> [GCC 3.3.3] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from ndarray import *
> >>> zeros(5)
> creating data 0xa0a07f8 associated with 0xa0d6230
> array([0.0, 0.0, 0.0, 0.0, 0.0], 'd')
> >>>
> >>> ^D
> freeing 0xa0a07f8 associated with array 0xa0d6230
> freeing 0xa123b88 associated with array 0xa0d6230
> Segmentation fault (core dumped)

Python 2.3.5 (#1, Mar 22 2005, 11:11:34)
Type "copyright", "credits" or "license" for more information.

IPython 0.6.13_cvs -- An enhanced Interactive Python.
?       -> Introduction to IPython's features.
%magic  -> Information about IPython's 'magic' % functions.
help    -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]:from ndarray import *
In [2]:arange(10)
Out[2]:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'l')
In [3]:arange(10.0)
Out[3]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd')
In [4]:
In [4]:arange(10.0)
zsh: 7191 segmentation fault  ipython

Without ipython the segfault is even earlier:

Python 2.3.5 (#1, Mar 22 2005, 11:11:34)
[GCC 3.3.5 (Debian 1:3.3.5-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ndarray import *
>>> arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'l')
>>> arange(10.0)
zsh: 7192 segmentation fault  python

Have you already found the origin of this? If so, which version should I download for further testing? If not, and if you need help in debugging this one, just let me know (+ some hints how to tackle this).
Best, Arnd

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 04:58:15 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Mar 24 04:58:15 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 
References: <423A6F69.8020803@ims.u-tokyo.ac.jp>
Message-ID: <4242BA03.5050204@ims.u-tokyo.ac.jp>

Arnd's comment raises the question of how to try out or contribute to Numeric3 if the code base is changing from day to day. It may be a good idea to set up some division of labor, so we can contribute to Numeric3 without getting in each other's way. For example, I'd be interested in working on setup.py and putting different parts of Numeric3/scipy_base together.

--Michiel.

Arnd Baecker wrote:
> Hi Travis,
>
> I just had a quick look at Numeric3, checked out with
>     cvs -z3 -d:pserver:anonymous at cvs.sourceforge.net:/cvsroot/numpy co -D 2005-03-18 -P Numeric3
> (as you already warned, the current CVS does not compile for me).

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Thu Mar 24 16:09:08 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 24 16:09:08 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4242BA03.5050204@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp>
Message-ID: <42435632.5080304@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:

> Arnd's comment raises the question of how to try out or contribute to Numeric3 if the code base is changing from day to day. It may be a good idea to set up some division of labor, so we can contribute to Numeric3 without getting in each other's way. For example, I'd be interested in working on setup.py and putting different parts of Numeric3/scipy_base together.
>

Well, CVS makes that somewhat easy if we just commit changes regularly and update regularly. But, I understand that people may want to know what kinds of things they could work on right now. I'm working on finishing adding methods. I'd like to create the new core distribution on an SVN server. Enthought is willing to host the SVN server as far as I know. SVN is easy to use and is supposed to be easier to manage than CVS.

Current needs:

- The PEP for the __index__ method added to Python needs to be written and the code implemented --- this is not that hard for the budding Python contributor.

- The PEP for a good "buffer" object (this has been called by others a "byte" array, which might be a good name). Essentially, it needs to be a light-weight object around a chunk of memory -- i.e. a way to allocate memory through Python. We would like to standardize on a set of meta information that could be used to "understand" this memory as a numeric array. Then, other objects which used this buffer as a memory block would just have to expose the meta information in order to make seamless the transfer of data from one application to another. We need to be vocal about the value of the buffer object. This PEP is one way to do that. There are some people who think buffer objects were a "bad idea." This is primarily because of a fatal flaw in some objects that both expose a memory pointer through the buffer protocol AND allow the object's memory to be reallocated (using realloc) --- Numeric does not do this.
This problem could actually be easily fixed by a good Python memory allocator that returns a simple memory object. If people who wanted memory went through its C-API (instead of using malloc and realloc), much of the problem would be alleviated. This is what the new "byte" object should be. I think it also wise to expect the "byte" object to have an attribute called "meta" that would just be a dictionary of "other information" you might want to pass to something using the buffer protocol.

- A record array class. This should be adapted from the numarray record array class and probably inherit from the ndarray type.

- Ufunc modifications. This is where I'm headed after the array methods task is done. If people have ideas about how ufuncs should be handled, now is the time to voice them. If somebody could help me here, it would be great. But, in a couple of days, I will be spending the next chunk of my (spare) time on ufunc modifications.

-Travis

From oliphant at ee.byu.edu Thu Mar 24 16:38:40 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 24 16:38:40 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4242BA03.5050204@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp>
Message-ID: <42435D18.809@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:

> Arnd's comment raises the question of how to try out or contribute to Numeric3 if the code base is changing from day to day. It may be a good idea to set up some division of labor, so we can contribute to Numeric3 without getting in each other's way. For example, I'd be interested in working on setup.py and putting different parts of Numeric3/scipy_base together.
>

Michiel, you are free to work on setup.py all you want :-)

Putting the parts of scipy_base together is a good idea. Exactly how to structure this is going to require some thought and needs to be coordinated with current scipy.

I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries.

There should be a way to replace functionality that is clean and does not require editing setup.py files.

Anybody with good ideas about how to do this well is welcome to speak up. Perhaps the easiest thing to do is to keep the basic Numeric structure (with C-based easy-to-install additions) and call it scipylite (with backwards compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab names). This also installs the namespace scipy, which has a little intelligence in it to determine whether you have atlas and fortran capabilities installed or not.

Then, provide a scipyatlas package that can be installed to take advantage of atlas and vendor-supplied lapack/blas.

Then, a scipyfortran package that can be installed if you have a fortran compiler, which provides the functionality supplied by fortran libraries.

So, there are three divisions here. Feedback and criticisms encouraged and welcomed.....
-Travis

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 18:51:10 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Mar 24 18:51:10 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42435D18.809@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: <42437D45.5090608@ims.u-tokyo.ac.jp>

While I basically agree with your setup, I think that there is no need to call it scipylite. Sticking to the Numeric structure and names is to the advantage of both current SciPy and current Numerical Python users. The advantage to current Numerical Python users is obvious -- and there are many more of them than of SciPy users. For SciPy users, it is in their best interest that as many people as possible go over to Numeric3, in order to avoid another split in the Numerics community. Now, if I talk with the other pygist or biopython developers and tell them there is a new Numerical Python package which solves some of the issues with the older versions, I have a good chance to convince them to update pygist/biopython to the Numeric3 API. If I tell them that there is a scipylite package that intends to replace Numerical Python: Forget it. It will be ignored. You may not care about pygist or biopython in particular, but developers of other packages will make the same consideration, so you may end up with some numerical / graphics packages working with scipylite and others with Numerical Python 23.8. It's better to get everybody on board.

Secondly, we have confused users more than enough with the Numerical Python / numarray / Numeric3 split. We should not add one more new name to the equation.

Third, there is lots of code out there that imports LinearAlgebra or RandomArray etcetera. Why force our users to go through the trouble of changing those imports? I don't see the benefit to the users.

Finally, the word scipylite has no meaning. As SciPy evolves into a website where scientific software for Python can be downloaded, there will not be a scipy-full nor a scipy-lite.

--Michiel.

Travis Oliphant wrote:
> Putting the parts of scipy_base together is a good idea. Exactly how to structure this is going to require some thought and needs to be coordinated with current scipy.
>
> I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries.
>
> There should be a way to replace functionality that is clean and does not require editing setup.py files.
>
> Anybody with good ideas about how to do this well is welcome to speak up. Perhaps the easiest thing to do is to keep the basic Numeric structure (with C-based easy-to-install additions) and call it scipylite (with backwards compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab names). This also installs the namespace scipy, which has a little intelligence in it to determine whether you have atlas and fortran capabilities installed or not.
>
> Then, provide a scipyatlas package that can be installed to take advantage of atlas and vendor-supplied lapack/blas.
>
> Then, a scipyfortran package that can be installed if you have a fortran compiler, which provides the functionality supplied by fortran libraries.
>
> So, there are three divisions here. Feedback and criticisms encouraged and welcomed.....
-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From cjw at sympatico.ca Thu Mar 24 19:20:46 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Thu Mar 24 19:20:46 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42437D45.5090608@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <42437D45.5090608@ims.u-tokyo.ac.jp>
Message-ID: <42438340.50507@sympatico.ca>

Michiel Jan Laurens de Hoon wrote:

> While I basically agree with your setup, I think that there is no need to call it scipylite. Sticking to the Numeric structure and names is to the advantage of both current SciPy and current Numerical Python users. The advantage to current Numerical Python users is obvious -- and there are many more of them than of SciPy users. For SciPy users, it is in their best interest that as many people as possible go over to Numeric3, in order to avoid another split in the Numerics community. Now, if I talk with the other pygist or biopython developers and tell them there is a new Numerical Python package which solves some of the issues with the older versions, I have a good chance to convince them to update pygist/biopython to the Numeric3 API. If I tell them that there is a scipylite package that intends to replace Numerical Python: Forget it. It will be ignored. You may not care about pygist or biopython in particular, but developers of other packages will make the same consideration, so you may end up with some numerical / graphics packages working with scipylite and others with Numerical Python 23.8. It's better to get everybody on board.
>
> Secondly, we have confused users more than enough with the Numerical Python / numarray / Numeric3 split. We should not add one more new name to the equation.
>
> Third, there is lots of code out there that imports LinearAlgebra or RandomArray etcetera. Why force our users to go through the trouble of changing those imports? I don't see the benefit to the users.
>
> Finally, the word scipylite has no meaning. As SciPy evolves into a website where scientific software for Python can be downloaded, there will not be a scipy-full nor a scipy-lite.
>
> --Michiel.
>

It looks to me as though getting numarray/Numeric sorted out, and getting it right, will be sufficient work for now. It's far better to concentrate the limited resources on that and to leave the complexities of SciPy for another day.

I wonder about introducing another version control system (SVN?) when some of us have barely mastered CVS.

Colin W.

> Travis Oliphant wrote:
>
>> Putting the parts of scipy_base together is a good idea. Exactly how to structure this is going to require some thought and needs to be coordinated with current scipy.
>>
>> I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries.
>>
>> There should be a way to replace functionality that is clean and does not require editing setup.py files.
>>
>> Anybody with good ideas about how to do this well is welcome to speak up.
>> Perhaps the easiest thing to do is to keep the basic Numeric structure (with C-based easy-to-install additions) and call it scipylite (with backwards compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab names). This also installs the namespace scipy, which has a little intelligence in it to determine whether you have atlas and fortran capabilities installed or not.
>>
>> Then, provide a scipyatlas package that can be installed to take advantage of atlas and vendor-supplied lapack/blas.
>>
>> Then, a scipyfortran package that can be installed if you have a fortran compiler, which provides the functionality supplied by fortran libraries.
>>
>> So, there are three divisions here.
>> Feedback and criticisms encouraged and welcomed.....

From mdehoon at ims.u-tokyo.ac.jp Thu Mar 24 19:43:53 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Mar 24 19:43:53 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42435D18.809@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: <424389C8.2010000@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries.
>
> There should be a way to replace functionality that is clean and does not require editing setup.py files.
>
> Anybody with good ideas about how to do this well is welcome to speak up.

Doing this automatically without editing setup.py may be too complicated. Quoting from the Numerical Python manual: 'A frequent request is that somehow the maintainers of Numerical Python invent a procedure which will automatically find and use the "best" available versions of these libraries. This is not going to happen.' "These libraries" being BLAS and LAPACK. However, what we can do is to put some frequently encountered options in setup.py commented out, and say "uncomment this line if you have BLAS and LAPACK preinstalled on your Mac" etcetera.

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From pearu at scipy.org Thu Mar 24 23:58:43 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Thu Mar 24 23:58:43 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42435D18.809@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: 

On Thu, 24 Mar 2005, Travis Oliphant wrote:

> Michiel Jan Laurens de Hoon wrote:
>
>> Arnd's comment raises the question of how to try out or contribute to Numeric3 if the code base is changing from day to day. It may be a good idea to set up some division of labor, so we can contribute to Numeric3 without getting in each other's way. For example, I'd be interested in working on setup.py and putting different parts of Numeric3/scipy_base together.
>>
>
> Michiel, you are free to work on setup.py all you want :-)
>
> Putting the parts of scipy_base together is a good idea. Exactly how to structure this is going to require some thought and needs to be coordinated with current scipy.
>
> I want a package that is as easy to install as current Numeric (so the default will have something like lapack_lite). But, this should not handicap nor ignore a speed-conscious user who wants to install ATLAS or take advantage of vendor-supplied libraries.
>
> There should be a way to replace functionality that is clean and does not require editing setup.py files.
>
> Anybody with good ideas about how to do this well is welcome to speak up. Perhaps the easiest thing to do is to keep the basic Numeric structure (with C-based easy-to-install additions) and call it scipylite (with backwards compatibility provided for the Numeric, LinearAlgebra, RandomArray, and MLab names). This also installs the namespace scipy, which has a little intelligence in it to determine whether you have atlas and fortran capabilities installed or not.
>
> Then, provide a scipyatlas package that can be installed to take advantage of atlas and vendor-supplied lapack/blas.
>
> Then, a scipyfortran package that can be installed if you have a fortran compiler, which provides the functionality supplied by fortran libraries.
>
> So, there are three divisions here.

Hmm, the idea of introducing scipylite, scipyatlas, and scipyfortran packages does not sound like a good one. The usage of atlas, fortran blas/lapack, or vendor-based blas/lapack libraries is an implementation detail and should not be reflected in the scipy_base package structure. This is because such an approach is not suitable for writing portable Numeric3-based applications or packages. For example, if a developer uses the scipyfortran package in a package, it immediately reduces the number of potential users for this package.

I got an impression from earlier threads that scipy_distutils will be included in scipy_base. So, I am proposing to use scipy_distutils tools and our scipy experience for dealing with this issue; scipy.lib.lapack would be a good working prototype here.

Ideally, scipy_base should provide a complete interface to LAPACK routines, but not immediately, of course. Now, depending on the availability of compilers and resources on a particular computer, the following would happen:

1) No Fortran compiler, no lapack libraries in the system, only a C compiler is available --- f2c-generated lite-lapack C sources are used to build the lapack extension module; wrappers to lapack routines for which there are no f2c-generated sources are disabled by the f2py `only:` feature. The lite-lapack C sources come with the scipy_base sources.

2) No Fortran compiler, the system has lapack libraries (atlas or Accelerate or vecLib), a C compiler is available --- the system lapack library will be used and a complete lapack extension module can be built.

3) Fortran and C compilers are available, no lapack libraries in the system --- Fortran lite-lapack sources are used to build the lapack extension module; the lite-lapack Fortran sources come with the scipy_base sources. Similar to case (1), some wrappers are disabled.

4-..) Other combinations are possible and users can choose their favorite approach.

The availability of system resources can be checked using scipy_distutils.system_info.get_info. Checking the availability of a Fortran compiler should be done in a configuration step, and only when a user specifically asks for it; by default we should assume that a Fortran compiler is not available. The same should apply also to atlas/lapack/blas libraries; by default, f2c-generated lite-lapack C sources will be used.
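To make the configuration step concrete, here is a minimal sketch of the backend selection described above. scipy_distutils.system_info.get_info is the real probing helper; the 'lapack' section name and the fallback source file names are illustrative assumptions, not a settled layout:

    # Hedged sketch of the backend selection Pearu outlines above; the
    # 'lapack' section name and fallback file names are assumptions.
    from scipy_distutils.system_info import get_info

    def configure_lapack():
        info = get_info('lapack')          # probe for a system LAPACK
        if info:
            # Case (2): C compiler plus system library (atlas,
            # Accelerate, vecLib, ...) -> build the full wrapper set.
            return {'libraries': info.get('libraries', []),
                    'library_dirs': info.get('library_dirs', [])}
        # Case (1): no Fortran compiler and no system LAPACK; fall back
        # to bundled f2c-generated lite-lapack C sources, with wrappers
        # lacking an f2c equivalent disabled via the f2py "only:" feature.
        return {'sources': ['Src/flapack_lite.c', 'Src/zlapack_lite.c']}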
In this way, users who only need Numeric3 array capabilities will avoid all the possible troubles that may show up when using all possible resources for speed on an arbitrary computer.

Btw, I would suggest using `scipy.<name>` instead of `scipy<name>` or `scipy_<name>` for naming packages.

Pearu

From xscottg at yahoo.com Fri Mar 25 00:15:03 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Fri Mar 25 00:15:03 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 6667
Message-ID: <20050325081346.24717.qmail@web50210.mail.yahoo.com>

--- Travis Oliphant wrote:
>
> - The PEP for a good "buffer" object (this has been called by others a "byte" array, which might be a good name). Essentially, it needs to be a light-weight object around a chunk of memory -- i.e. a way to allocate memory through Python. We would like to standardize on a set of meta information that could be used to "understand" this memory as a numeric array. Then, other objects which used this buffer as a memory block would just have to expose the meta information in order to make seamless the transfer of data from one application to another. We need to be vocal about the value of the buffer object. This PEP is one way to do that. There are some people who think buffer objects were a "bad idea." This is primarily because of a fatal flaw in some objects that both expose a memory pointer through the buffer protocol AND allow the object's memory to be reallocated (using realloc) --- Numeric does not do this. This problem could actually be easily fixed by a good Python memory allocator that returns a simple memory object. If people who wanted memory went through its C-API (instead of using malloc and realloc), much of the problem would be alleviated. This is what the new "byte" object should be. I think it also wise to expect the "byte" object to have an attribute called "meta" that would just be a dictionary of "other information" you might want to pass to something using the buffer protocol.
>

Hi Travis.

I'm curious if you find PEP-296 sufficient:

    http://www.python.org/peps/pep-0296.html

It is marked as "withdrawn by the author", but that is not really true. A more accurate statement would be "the author spent his allotted time defending and revising the PEP on the Python mailing list and was not left with sufficient time to finish the implementation on his corporate dollar". :-) It's a good PEP, and while my company uses Python quite extensively, after two weeks I had to get back to more direct goals...

Regardless, I think PEP-296 meets your needs (and those of several other groups in the Python community), and it might save someone the time of recreating a new PEP from scratch. More importantly, it might save someone some of the time required to defend and argue the PEP on the Python mailing list. When the discussion cleared, Guido was very positive toward the PEP - I just never got it implemented...

The "meta" attribute would be a small change. It's possible to do that with composition or inheritance instead, but that's really just a matter of taste.

When I wrote the PEP, I had high hopes of creating a Python-only "ndarray" class out of bytes and the struct module, so it was definitely targeted at needs similar to what I believe yours to be. Obviously you should do what is best for you, but I would be pleased if my wasted effort was revived and completed to actually be useful.
Cheers, -Scott

From oliphant at ee.byu.edu Fri Mar 25 00:24:50 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:24:50 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
References: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
Message-ID: <4243CA86.50301@ee.byu.edu>

>Hi Travis.
>
>I'm curious if you find PEP-296 sufficient:
>
>    http://www.python.org/peps/pep-0296.html
>
>It is marked as "withdrawn by the author", but that is not really true. A more accurate statement would be "the author spent his allotted time defending and revising the PEP on the Python mailing list and was not left with sufficient time to finish the implementation on his corporate dollar". :-) It's a good PEP, and while my company uses Python quite extensively, after two weeks I had to get back to more direct goals...
>

Great to hear from you Scott. Yes, I looked at this PEP (though I haven't studied it sufficiently to say if it's perfect for our needs or not), but it is very close. I did not know what "withdrawn by author" meant; thanks for clarifying. How would somebody change the status of that and re-open the PEP? I think it is a great place to start. Also, numarray has a memory object implemented that is a good start on the implementation. So, this wouldn't be a huge job at this point.

>Regardless, I think PEP-296 meets your needs (and those of several other groups in the Python community), and it might save someone the time of recreating a new PEP from scratch. More importantly, it might save someone some of the time required to defend and argue the PEP on the Python mailing list. When the discussion cleared, Guido was very positive toward the PEP - I just never got it implemented...
>

Good to hear.

>The "meta" attribute would be a small change. It's possible to do that with composition or inheritance instead, but that's really just a matter of taste.
>

I don't think I fully understand what you mean by "composition" --- like a mixin class? Or how inheritance solves the problem on a C-API level? I'm mainly thinking of extension modules that want to use each other's memory on a C level. That would be the main use of the meta information.

>When I wrote the PEP, I had high hopes of creating a Python-only "ndarray" class out of bytes and the struct module, so it was definitely targeted at needs similar to what I believe yours to be. Obviously you should do what is best for you, but I would be pleased if my wasted effort was revived and completed to actually be useful.
>

Numarray essentially did this. I think we still need a C-type object for arrays. But, it's great to hear you still believe in the byte object. I wasn't sure.

-Travis

From oliphant at ee.byu.edu Fri Mar 25 00:40:08 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:40:08 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
References: <20050325081346.24717.qmail@web50210.mail.yahoo.com>
Message-ID: <4243CE20.1070509@ee.byu.edu>

Scott Gilbert wrote:

>Hi Travis.
>
>I'm curious if you find PEP-296 sufficient:
>
>    http://www.python.org/peps/pep-0296.html
>
>It is marked as "withdrawn by the author", but that is not really true. A more accurate statement would be "the author spent his allotted time defending and revising the PEP on the Python mailing list and was not left with sufficient time to finish the implementation on his corporate dollar".
> :-) It's a good PEP, and while my company uses Python quite extensively, after two weeks I had to get back to more direct goals...
>

I read the PEP again, and agree with Scott that it is quite good and would fit what we need quite well. I say let's resurrect it and push it forward. Scott, do you have any left-over code you could contribute?

-Travis

From oliphant at ee.byu.edu Fri Mar 25 00:40:35 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 00:40:35 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: <4243CE29.80304@ee.byu.edu>

>For example, if a developer uses the scipyfortran package in a package, it immediately reduces the number of potential users for this package.

While I'm not in love with my suggestion and would prefer to see better ones put forward, wouldn't any system that uses routines that are only available when a fortran-compiled package is installed run into the same problem? I was just proposing not "hiding" this from the developer but making it explicit.

What do you propose to do for those situations? I was just proposing putting them in a separate hierarchy so the developer is aware he is using something that requires fortran. I actually think that it's somewhat of a non-issue myself, and feel that people who don't have fortran compilers will look for binaries anyway.

> I got an impression from earlier threads that scipy_distutils will be included in scipy_base. So, I am proposing to use scipy_distutils tools and our scipy experience for dealing with this issue; scipy.lib.lapack would be a good working prototype here.
>
> Ideally, scipy_base should provide a complete interface to LAPACK routines, but not immediately, of course. Now, depending on the availability of compilers and resources on a particular computer, the following would happen:
>
> 1) No Fortran compiler, no lapack libraries in the system, only a C compiler is available --- f2c-generated lite-lapack C sources are used to build the lapack extension module; wrappers to lapack routines for which there are no f2c-generated sources are disabled by the f2py `only:` feature. The lite-lapack C sources come with the scipy_base sources.
>
> 2) No Fortran compiler, the system has lapack libraries (atlas or Accelerate or vecLib), a C compiler is available --- the system lapack library will be used and a complete lapack extension module can be built.
>
> 3) Fortran and C compilers are available, no lapack libraries in the system --- Fortran lite-lapack sources are used to build the lapack extension module; the lite-lapack Fortran sources come with the scipy_base sources. Similar to case (1), some wrappers are disabled.
>
> 4-..) Other combinations are possible and users can choose their favorite approach.

Great. Sounds like Pearu has some good ideas here. I nominate Pearu to take the lead here. Michiel sounds like he wants to keep the Numeric, RandomArray, LinearAlgebra naming conventions forever. I want them to be more coordinated, like scipy is doing with scipy.linalg, scipy.stats, and scipy_base (I agree scipy.base is better). What are the opinions of others on this point? Of course the names Numeric, RandomArray, and LinearAlgebra will still work, but I think they should be deprecated in favor of a better overall design for numerical packages. What do others think?
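One low-cost way to keep the old names importable while steering users toward the coordinated layout is a thin compatibility module. The following is only a sketch of that idea under the names from this discussion; scipy.linalg is a real scipy package, but the shim itself and the warning text are assumptions rather than anything agreed on in the thread:

    # Hedged sketch: a stand-in LinearAlgebra.py that keeps old code
    # running while nudging users toward the new package layout.
    import warnings
    warnings.warn("LinearAlgebra is deprecated; use scipy.linalg instead",
                  DeprecationWarning)
    from scipy.linalg import *    # re-export under the historical name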
From mdehoon at ims.u-tokyo.ac.jp Fri Mar 25 01:03:43 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Fri Mar 25 01:03:43 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu>
Message-ID: <4243D4A5.9050004@ims.u-tokyo.ac.jp>

Pearu Peterson wrote:
> I got an impression from earlier threads that scipy_distutils will be included in scipy_base. So, I am proposing to use scipy_distutils tools and our scipy experience for dealing with this issue; scipy.lib.lapack would be a good working prototype here.

Have you tried integrating scipy_distutils with Python's distutils? My guess is that Python's distutils can benefit from what is in scipy_distutils, particularly the parts dealing with C compilers. A clean integration will also prevent duplicated code, avoid Pearu having to keep scipy_distutils up to date with Python's distutils, and enlarge the number of potential users. Having two distutils packages seems to be too much of a good thing.

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From pearu at scipy.org Fri Mar 25 01:22:33 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Fri Mar 25 01:22:33 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4243CE29.80304@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243CE29.80304@ee.byu.edu>
Message-ID: 

On Fri, 25 Mar 2005, Travis Oliphant wrote:

>> For example, if a developer uses the scipyfortran package in a package, it immediately reduces the number of potential users for this package.
>
> While I'm not in love with my suggestion and would prefer to see better ones put forward, wouldn't any system that uses routines that are only available when a fortran-compiled package is installed run into the same problem? I was just proposing not "hiding" this from the developer but making it explicit.
>
> What do you propose to do for those situations? I was just proposing putting them in a separate hierarchy so the developer is aware he is using something that requires fortran. I actually think that it's somewhat of a non-issue myself, and feel that people who don't have fortran compilers will look for binaries anyway.

Such a situation can be avoided if a package is extended with new wrappers in parallel for all backend cases. For example, when adding a new interface to a lapack routine, both Fortran and f2c versions of the corresponding routine must be added to the scipy_base sources.

Pearu

From pearu at scipy.org Fri Mar 25 01:59:47 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Fri Mar 25 01:59:47 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4243D4A5.9050004@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp>
Message-ID: 

On Fri, 25 Mar 2005, Michiel Jan Laurens de Hoon wrote:

> Pearu Peterson wrote:
>> I got an impression from earlier threads that scipy_distutils will be included in scipy_base.
>> So, I am proposing to use scipy_distutils tools and our scipy experience for dealing with this issue; scipy.lib.lapack would be a good working prototype here.
>
> Have you tried integrating scipy_distutils with Python's distutils? My guess is that Python's distutils can benefit from what is in scipy_distutils, particularly the parts dealing with C compilers. A clean integration will also prevent duplicated code, avoid Pearu having to keep scipy_distutils up to date with Python's distutils, and enlarge the number of potential users. Having two distutils packages seems to be too much of a good thing.

No, I have not. Though a year or so ago there was a discussion about this on the distutils list, mainly about adding Fortran compiler support to distutils. At the time I didn't have the resources to push scipy_distutils features into distutils, and I have even fewer now. So, one can think of scipy_distutils as an extension to distutils, though it also includes a few bug fixes for older distutils. On the other hand, since Scipy supports Python starting at 2.2, it cannot rely much on new features added to distutils in later Python versions. Instead, if these features happen to be useful for Scipy, they are backported to Python 2.2 by implementing them in scipy_distutils. "Luckily", there are not many such features, as scipy_distutils has evolved with new very useful features much quicker than distutils.

But, for Numeric3, scipy.distutils would be a perfect place to clean up scipy_distutils a bit, e.g. removing some obsolete features and assuming that Numeric3 will support Python 2.3 and up. Based on that, integrating scipy_distutils features into standard distutils can be made less painful if someone decides to do that.

Pearu

From xscottg at yahoo.com Fri Mar 25 03:35:35 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Fri Mar 25 03:35:35 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 6667
Message-ID: <20050325113426.58485.qmail@web50203.mail.yahoo.com>

--- Travis Oliphant wrote:
> How would somebody change the status of that and re-open the PEP?

I believe all it would take is a note to the python-dev mailing list by the new champion who is willing to implement and defend it. The text is public domain, so there's no copyright silliness if you need to make changes.

I'm curious to see how this flies, as this has always been one of their pet peeve topics. Talking about buffer objects/protocols in general draws ire from some and dead silence from the rest. :-)

>
> Also, numarray has a memory object implemented that is a good start on the implementation. So, this wouldn't be a huge job at this point.
>

The memory object is a very good start. I don't know if it tries to be usable when the GIL is released, or if it handles the slice semantics the same way.

I think doing pickling really well is a non-trivial issue - at least if this object is going into the core of Python. Implementing the new pickling protocol is not terribly difficult, and any object can do it, but that only solves the space half of the problem. The new pickling protocol allows one to serialize large data without making one large copy of the binary data as a string, but one still has to make a lot of little copies of the data a piece at a time. The multitude of little parts costs time allocating and memcpy-ing just to be written to a file and discarded.
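A minimal sketch of the trade-off being described, assuming a simple buffer-like class; the chunking scheme and all names here are illustrative, not numarray's actual implementation:

    # Hedged sketch: protocol-2 style pickling can avoid one giant
    # string, but every chunk below is still a temporary copy that is
    # written to the stream and thrown away.
    import array

    CHUNK = 1 << 16

    def _rebuild(chunks):
        # joining makes yet another full copy on unpickling
        return Memory(''.join(chunks))

    class Memory(object):
        def __init__(self, data=''):
            self._buf = array.array('B', data)
        def __reduce__(self):
            # each tostring() call allocates and memcpy's a small
            # string copy -- the "multitude of little parts" above
            chunks = [self._buf[i:i + CHUNK].tostring()
                      for i in range(0, len(self._buf), CHUNK)]
            return (_rebuild, (chunks,))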
It would be great if the Python core libraries (cPickle) could be "taught" about the new type and serialize directly from the memory that is already there without creating any new string copies.

>> The "meta" attribute would be a small change. It's possible to do that with composition or inheritance instead, but that's really just a matter of taste.
>
> I don't think I fully understand what you mean by "composition" --- like a mixin class? Or how inheritance solves the problem on a C-API level?
>
> I'm mainly thinking of extension modules that want to use each other's memory on a C level. That would be the main use of the meta information.
>

It would be a lot like putting a similar meta dictionary on the builtin "list" object. Many people wouldn't use it and would consider it a tiny wart just taking up space, while others would use it pretty differently from the way Numeric3 did and store completely different keys. The result would be that Numeric3 would have to check for the keys that it wanted in the meta dictionary. Since I think you're going to allow folks to pass in their own buffer objects to some of the array constructors (mmap for instance), the underlying Numeric3 code can't really assume that the "meta" attribute is there on all buffer objects.

If you wanted to annotate all buffers that were passed inside of Numeric, something like the following would work with "memory" and "mmap" alike:

    # Composition of a memory buffer and meta data
    class NumericStorage(object):
        def __init__(self, buf, **meta):
            self.buf = buf
            self.meta = meta.copy()

Of course, at the C level it could just be a lightweight struct with two PyObject pointers. If you really wanted to add a meta attribute to the new generic memory object, you could do:

    # Inheritance to add metadata to a memory buffer
    class NumericBytes(memory):
        def __init__(self, *args, **kwds):
            memory.__init__(self, *args, **kwds)
            self.meta = {}

It's a minor pain, but obviously inheritance like this can be done at the C level too...

I don't know what particular meta data you plan to store with the buffer itself, and I'm going to resist the urge to guess. You probably have some very good use cases. What are you planning?

If you have a list of meta keys that many if not all users would agree on, then it would be worth considering just building them efficiently into the proposed type and not wasting the overhead of a dictionary. That would also standardize their usage to some extent. As I said before, this is all just a matter of taste. I apologize for using so much text to try and explain what I meant. When all is said and done, whether the C API code is required to check for keys in the meta dictionary or for attributes of the object itself, it's a pretty similar task: PyDict_GetItem(...) versus PyObject_GetAttr(...).

>> When I wrote the PEP, I had high hopes of creating a Python-only "ndarray" class out of bytes and the struct module
>
> Numarray essentially did this. I think we still need a C-type object for arrays.
>

Yup. I understand and appreciate your attention to performance. For small arrays, it's tough to argue that a C implementation won't win. At the time, all I really needed was something to store and casually inspect/manipulate my oddball data (large arrays of complex short) without converting to a larger representation. We have something very similar to weave.inline that I used when it came time to go fast.
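As a usage sketch of the composition flavor above (the buffer here is a plain string standing in for a real memory object, and the meta keys shown are plausible candidates from this discussion rather than an agreed standard):

    # Hedged usage of the NumericStorage sketch above; 'shape',
    # 'typecode' and 'strides' are candidate meta keys, not a standard.
    buf = 128 * '\x00'                         # stand-in for 16 doubles
    stored = NumericStorage(buf, shape=(16,), typecode='d', strides=(8,))
    print stored.meta['shape']                 # consumers read what they need
    print stored.meta.get('units', 'unknown')  # absent keys need a default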
> > I read the PEP again, and agree with Scott that it is quite good and would fit what we need quite well.
> >
> > I say let's resurrect it and push it forward.
>

Very cool. I hope it does what you need and makes it into the core. With your enthusiasm, I wish I had time to finish or at least help with the implementation. Unfortunately, I'm more swamped at work now than I was when I dropped the ball on this the first time.

> > Scott, do you have any left-over code you could contribute?
>

I'll try and find what I had, but I probably don't have too much that you'll find much more valuable than the memory object from Numarray. I remember I went through a bit of pain to implement the "new style classes" correctly, but the pickling stuff in the core of the Python library is where the real challenge is, and I never got going on the TeX docs or unit tests that would be necessary for acceptance.

Cheers, -Scott

From Chris.Barker at noaa.gov Fri Mar 25 09:53:57 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri Mar 25 09:53:57 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <20050325113426.58485.qmail@web50203.mail.yahoo.com>
References: <20050325113426.58485.qmail@web50203.mail.yahoo.com>
Message-ID: <42445030.5090503@noaa.gov>

Scott Gilbert wrote:
> I don't know what particular meta data you plan to store with the buffer itself, and I'm going to resist the urge to guess. You probably have some very good use cases. What are you planning?

I don't know what Travis has in mind, but I thought I'd bring up a use case that I think provides some of the motivation for this.

There are any number of third-party extensions that could benefit from being able to directly read the data in Numeric* arrays: PIL, wxPython, etc. My personal example is wxPython: at the moment, you can pass a Numeric or numarray array into wxPython, and it will be converted to a wxList of wxPoints (for instance), but that is done by using the generic sequence protocol and a lot of type checking. As you can imagine, that is pretty darn slow compared to just typecasting the data pointer and looping through it. Robin Dunn, quite reasonably, doesn't want wxPython to depend on Numeric, so that's what we've got.

My understanding of this memory object is that an extension like wxPython wouldn't need to know about Numeric, but could simply get the memory object, and there would be enough meta-data with it to typecast and loop through the data. I'm a bit skeptical about how this would work. It seems that the metadata required would be the full set of stuff in an array object already:

    type
    dimensions
    strides

This could be made a bit simpler by allowing only contiguous arrays, but then there would need to be a contiguous flag.

To make use of this, wxPython would have to know a fair bit about Numeric arrays anyway, so that it can check to see if the data is appropriate. I guess the advantage is that while the wxPython code would have to know about Numeric arrays, it wouldn't have to include Numeric headers or code.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Fri Mar 25 11:33:13 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Mar 25 11:33:13 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42445030.5090503@noaa.gov>
References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov>
Message-ID: <4244676A.1050006@ee.byu.edu>

Chris Barker wrote:

> Scott Gilbert wrote:
>
>> I don't know what particular meta data you plan to store with the buffer itself, and I'm going to resist the urge to guess. You probably have some very good use cases. What are you planning?
>
> I don't know what Travis has in mind, but I thought I'd bring up a use case that I think provides some of the motivation for this.
>

This is exactly the kind of thing I mean. One of the reasons for putting Numeric in the core is so other extension writers can use it reliably. But, really, a better solution is to create ways to deal with each other's memory reliably. I think the bytes object Scott proposed is really very close to what is needed. Extension writers will likely need some additional information in order to use somebody else's byte object well. What information is needed can vary (but it should be standard for different types of objects). You can already get at the memory directly through the buffer protocol. But, I think pickling could be handled efficiently with a single new opcode very similar to a string, but instead creating a bytes object on unpickling. Then, an array could use the memory of that bytes object (instead of creating its own and copying). This would be very easy to handle.

I really believe we need to push forward the bytes object. The current attitude towards the buffer interface (due to an easily fixed problem) is really disquieting. I could spend time pushing the bytes object, but it would take away from the work I'm currently doing. This is something that we really need some help with. Scott's PEP is quite good and gives an outline for how to do this, so except for pickling you don't even need to be an expert to get something done. You could start with the outline of numarray's memory object (getting rid of the Int64 stuff in it) and proceed from there. It would probably take a week to get it done.

> My understanding of this memory object is that an extension like wxPython wouldn't need to know about Numeric, but could simply get the memory object, and there would be enough meta-data with it to typecast and loop through the data. I'm a bit skeptical about how this would work. It seems that the metadata required would be the full set of stuff in an array object already:
>
>     type
>     dimensions
>     strides
>
> This could be made a bit simpler by allowing only contiguous arrays, but then there would need to be a contiguous flag.

I'm thinking just contiguous arrays would be passed. While Numeric does support the multi-segment buffer interface, I doubt extension writers want to try to understand how to deal with it. I think it would be too much of a burden on other extensions if the array they saw was not contiguous. Even internal to Numeric, discontiguous arrays are made contiguous all the time (although the new iterator in Numeric3 makes it much easier for a programmer to deal with discontiguous arrays).
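The checks such a consumer would perform are small. Here is a minimal sketch in terms of the Numeric-style array API; typecode(), iscontiguous(), shape, and flat are real Numeric spellings, while the function itself and the copy step are illustrative assumptions:

    # Hedged sketch of a wxPython-style consumer: verify the meta
    # information, force a contiguous layout, then walk the flat data.
    def as_point_list(arr):
        if arr.typecode() != 'd' or len(arr.shape) != 2 or arr.shape[1] != 2:
            raise TypeError("expected an Nx2 array of doubles")
        if not arr.iscontiguous():
            arr = arr.copy()              # the one unavoidable copy
        flat = arr.flat                   # safe now: data is contiguous
        return [(flat[2 * i], flat[2 * i + 1])
                for i in range(arr.shape[0])]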
> To make use of this, wxPython would have to know a fair bit about Numeric arrays anyway, so that it can check to see if the data is appropriate. I guess the advantage is that while the wxPython code would have to know about Numeric arrays, it wouldn't have to include Numeric headers or code.

It only has to know the shape and type (typechar and itemsize) of the array if we stick to contiguous arrays (this is where the typechar becomes very valuable).

I still think the bytes object is a really, really good idea. Python needs it very badly. If every extension module that allocated memory went through a bytes object instead, then a lot of unnecessary copying could be minimized.

-Travis

From verveer at embl.de Fri Mar 25 11:54:14 2005
From: verveer at embl.de (Peter Verveer)
Date: Fri Mar 25 11:54:14 2005
Subject: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4244676A.1050006@ee.byu.edu>
References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu>
Message-ID: 

>>> This could be made a bit simpler by allowing only contiguous arrays, but then there would need to be a contiguous flag.
>>
>> I'm thinking just contiguous arrays would be passed. While Numeric does support the multi-segment buffer interface, I doubt extension writers want to try to understand how to deal with it. I think it would be too much of a burden on other extensions if the array they saw was not contiguous. Even internal to Numeric, discontiguous arrays are made contiguous all the time (although the new iterator in Numeric3 makes it much easier for a programmer to deal with discontiguous arrays).

I think it would be a real shame not to support non-contiguous data. It would be great if such a byte object could be used instead of Numeric/numarray arrays when writing extensions.
Then I could write C > extensions that could be made available very easily/efficiently to any > package supporting it without having to worry about the specific C api > of those packages. If only contiguous byte objects are supported that > byte object is not a good option anymore for implementing extensions > for Numeric unless I am prepared to live with a lot of copying of > non-contiguous arrays. How would you support "non-contiguous" data with the bytes object? Or do you mean just passing the strides information around as meta data? With the bytes object pointing to the start? The latter would not be hard to support (it's just a matter of defining an additional piece of meta information and making people aware of it) but not every extension writer would try and deal with that, I'm sure. But, that would be o.k. -Travis From stephen.walton at csun.edu Fri Mar 25 14:54:04 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Mar 25 14:54:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4241C781.8080001@ee.byu.edu> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> <4241C781.8080001@ee.byu.edu> Message-ID: <4244963B.5010103@csun.edu> Travis Oliphant wrote: > Well, rank-0 arrays are and forever will be mutable. But, Python > scalars (and the new Array-like Scalars) are not mutable. This is a really minor point, and only slightly relevant to the discussion, and perhaps I'm just revealing my Python ignorance again, but: what does it mean for a scalar to be mutable? I can understand that one wants a[0]=7 to be allowed when a is a rank-0 array, and I also understand that str[k]='b' where str is a string is not allowed because strings are immutable. But if I type "b=7" followed by "b=3", do I really care whether the 3 gets stuck in the same memory location previously occupied by the 7 (mutable) or the symbol b points to a new location containing a 3 (immutable)? What are some circumstances where this might matter? From verveer at embl.de Fri Mar 25 15:16:06 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Mar 25 15:16:06 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4244770C.4050007@ee.byu.edu> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> <4244770C.4050007@ee.byu.edu> Message-ID: <1ecedca417f6fb76ab3c611785b77f53@embl.de> On Mar 25, 2005, at 9:39 PM, Travis Oliphant wrote: > Peter Verveer wrote: > >>>> This could be made a bit simpler by allowing only contiguous >>>> arrays, but then there would need to be a contiguous flag. >>> >>> >>> I'm thinking just contiguous arrays would be passed. While Numeric >>> does support the multi-segment buffer interface. I doubt extension >>> writers want to try and understand how to deal with it. I think >>> it would be too much of a burden to other extensions if the array >>> they saw was not contiguous. Even internal to Numeric, >>> discontiguous arrays are made contiguous all the time (although the >>> new iterator in Numeric3 makes it much easier for a programmer to >>> deal with discontiguous arrays). >> >> >> It think it would be a real shame not to support non-contiguous data. >> It would be great if such a byte object could be used instead of >> Numeric/numarray arrays when writing extensions. Then I could write C >> extensions that could be made available very easily/efficiently to >> any package supporting it without having to worry about the specific >> C api of those packages. 
If only contiguous byte objects are >> supported that byte object is not a good option anymore for >> implementing extensions for Numeric unless I am prepared to live with >> a lot of copying of non-contiguous arrays. > > > How would you support "non-contiguous" data with the bytes object? > Or do you mean just passing the strides information around as meta > data? With the bytes object pointing to the start? Exactly. > The latter would not be hard to support (it's just a matter of > defining an additional piece of meta information and making people > aware of it) but not every extension writer would try and deal with > that, I'm sure. But, that would be o.k. There needs to be a way to treat such objects as contiguous for people who do not want to deal with strides, which means copying data if needed. It would need some thought to make that transparent, and the question is if it is worth the trouble. I have not really followed the discussion about the byte object, and maybe I have got the wrong idea about its function. But if you see it as a generic data model for homogeneous array data, then it would provide a basis for writing C extensions that could work with different packages. For example to write a C extension of image processing routines that would work with both Numeric arrays and PIL images. Peter From Chris.Barker at noaa.gov Fri Mar 25 15:44:45 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Mar 25 15:44:45 2005 Subject: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <4244770C.4050007@ee.byu.edu> References: <20050325113426.58485.qmail@web50203.mail.yahoo.com> <42445030.5090503@noaa.gov> <4244676A.1050006@ee.byu.edu> <4244770C.4050007@ee.byu.edu> Message-ID: <4244A208.3090108@noaa.gov> Travis Oliphant wrote: > do you mean just passing the strides information around as meta data? > With the bytes object pointing to the start? That works for me. It would require extension writers to know to at least check for the contiguous flag, but that's not too heavy a burden. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Fri Mar 25 16:44:01 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Mar 25 16:44:01 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now Message-ID: <4244AFB6.40601@ee.byu.edu> To all who were waiting: I've finished adding the methods to the array object so that Numeric3 in CVS now compiles (at least for me on Linux). I will be away for at least a day so it is a good time to play... -Travis From xscottg at yahoo.com Fri Mar 25 22:59:01 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 22:59:01 2005 Subject: [Numpy-discussion] Bytes Object and Pickling In-Reply-To: <4244676A.1050006@ee.byu.edu> Message-ID: <20050326065756.19141.qmail@web50208.mail.yahoo.com> --- Travis Oliphant wrote: > > I think pickling could be handled efficiently with a single new opcode > very similar to a string but instead creating a bytes object on > unpickling. Then, an array could use the memory of that bytes object > (instead of creating it's own and copying). This would be very easy to > handle. > I agree that this is the easy and *right* way to add pickling of the "bytes object". However, several years ago, Guido imposed an additional constraint. 
He wanted the "bytes objects" to unpickle as "string objects" on older versions of Python that didn't know about the new "bytes type". His reasoning was that pickling is used to communicate between different versions of Python, and the older ones would die ungracefully when exposed to the new type. Perhaps his position would be different now. He has made added a new Pickling protocol since then, and that had to break backwards compatibility somewhere. Cheers, -Scott From xscottg at yahoo.com Fri Mar 25 22:59:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 22:59:04 2005 Subject: [Numpy-discussion] Bytes Object and Metadata In-Reply-To: 6667 Message-ID: <20050326065814.27019.qmail@web50204.mail.yahoo.com> Adding metadata at the buffer object level causes problems for "view" semantics. Let's say that everyone agreed what "itemsize" and "itemtype" meant: real_view = complex_array.real The real_view will have to use a new buffer since they can't share the old one. The buffer used in complex_array would have a typecode like ComplexDouble and an itemsize of 16. The buffer in real_view would need a typecode of Double and an itemsize of 8. If metadata is stored with the buffer object, it can't be the same buffer object in both places. Another case would be treating a 512x512 image of 4 byte pixels as a 512x512x4 image of 1 byte RGBA elements. Or even coercing from Signed to Unsigned. The bytes object as proposed does allow new views to be created from other bytes objects (sharing the same memory underneath), and these views could each have separate metadata, but then you wouldn't be able to have arrays that used other types of buffers. Having arrays use mmap buffers is very useful. The bytes object shouldn't create views from arbitrary other buffer objects because it can't rely on the general semantics of the PyBufferProcs interface. The foreign buffer object might realloc and invalidate the pointer for instance... The current Python "buffer" builtin does this, and the results are bad. So creating a bytes object as a view on the mmap object doesn't work in the general case. Actually, now that I think about it, the mmap object might be safe. I don't believe the current implementation of mmap does any reallocing under the scenes and I think the pointer stays valid for the lifetime of the object. If we verified that mmap is safe enough, bytes could make a special case out of it, but then you would be locked into bytes and mmap only. Maybe that's acceptable... Still, I think keeping the metadata at a different level, and having the bytes object just be the Python way to spell a call to C's malloc will avoid a lot of problems. Read below for how I think the metadata stuff could be handled. --- Chris Barker wrote: > > There are any number of Third party extensions that could benefit from > being able to directly read the data in Numeric* arrays: PIL, wxPython, > etc. Etc. My personal example is wxPython: > > At the moment, you can pass a Numeric or numarray array into wxPython, > and it will be converted to a wxList of wxPoints (for instance), but > that is done by using the generic sequence protocol, and a lot of type > checking. As you can imagine, that is pretty darn slow, compared to just > typecasting the data pointer and looping through it. Robin Dunn, quite > reasonably, doesn't want wxPython to depend on Numeric, so that's what > we've got. 
>
> My understanding of this memory object is that an extension like
> wxPython wouldn't not need to know about Numeric, but could simply get
> the memory Object, and there would be enough meta-data with it to
> typecast and loop through the data. I'm a bit skeptical about how this
> would work. It seems that the metadata required would be the full set of
> stuff in an array Object already:
>
> type
> dimensions
> strides
>
> This could be made a bit simpler by allowing only contiguous arrays, but
> then there would need to be a contiguous flag.
>
> To make use of this, wxPython would have to know a fair bit about
> Numeric Arrays anyway, so that it can check to see if the data is
> appropriate. I guess the advantage is that while the wxPython code would
> have to know about Numeric arrays, it wouldn't have to include Numeric
> headers or code.
>

I think being able to traffic in N-Dimensional arrays without requiring
linking against the libraries is a good thing.

Several years ago, I proposed a solution to this problem. Actually I did a
really poor job of proposing it and irritated a lot of people in the
process. I'm embarrassed to post a link to the following thread, but here
it is anyway:

http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/1166013

Accept my apologies if you read the whole thing just now. :-) Accept my
sincere apologies if you read it at the time.

I think the proposal is still relevant today, but I might revise it a bit
as follows. A bear minimum N-Dimensional array for interchanging data
across libraries could get by with the following attributes:

    # Create a simple record type for storing attributes
    class BearMin: pass
    bm = BearMin()

    # Set the attributes sufficient to describe a simple ndarray
    bm.buffer = <...>
    bm.shape = <...>
    bm.itemtype = <...>

The bm.buffer and bm.shape attributes are pretty obvious. I would suggest
that the bm.itemtype borrow its typecodes from the Python struct module,
but anything that everyone agreed on would work. (The struct module is
nice because it is already documented and supports native and portable
types of many sizes in both endians. It also supports composite struct
types.)

Those attributes are sufficient for someone to *produce* an N-Dimensional
array that could be understood by many libraries. Someone who *consumes*
the data would need to know a few more:

    bm.offset = <...>
    bm.strides = <...>

The value of bm.offset would default to zero if it wasn't present, and the
tuple bm.strides could be generated from the shape assuming it was a C
style array. Subscripting operations that returned non-contiguous views of
shared data could change bm.offset to non-zero. Subscripting would also
affect the bm.strides, and creating a Fortran style array would require
bm.strides to be present.

You might also choose to add bm.itemsize in addition to bm.itemtype when
you can describe how big elements are, but you can't sufficiently describe
what the data is using the agreed upon typecodes. This would be uncommon.
The default for bm.itemsize would come from struct.calcsize(bm.itemtype).

You might also choose to add bm.complicated for when the array layout
can't be described by the shape/offset/stride combination. For instance
bm.complicated might get used when creating views from more sophisticated
subscripting operations like index arrays or mask arrays. (Although it
looks like Numeric3 plans on making new contiguous copies in those cases.)

The C implementations of arrays would only have to add getattr like
methods, and the data could be stored very compactly.
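For instance, a consumer might apply the stated defaults like this (a sketch under the assumptions above, not part of the original proposal):

    import struct

    def c_strides(shape, itemsize):
        # Default strides for a C style array: the last index varies fastest.
        strides, step = [], itemsize
        for n in reversed(shape):
            strides.insert(0, step)
            step *= n
        return tuple(strides)

    def normalize(bm):
        # Fill in the defaults described above for the optional attributes.
        itemsize = getattr(bm, 'itemsize', struct.calcsize(bm.itemtype))
        offset = getattr(bm, 'offset', 0)
        strides = getattr(bm, 'strides', c_strides(bm.shape, itemsize))
        return bm.buffer, bm.shape, bm.itemtype, itemsize, offset, strides

    class BearMin: pass
    bm = BearMin()
    bm.buffer, bm.shape, bm.itemtype = '\x00' * 48, (2, 3), 'd'
    print normalize(bm)[5]   # default strides: (24, 8)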
>From those minimum 5-7 attributes (metadata), an N-Dimensional array consumer could determine most everything it needed to know about the data. Simple routines could determine things like iscontiguous(bm), iscarray(bm) or isfortran(bm). I expect libraries like wxPython or PIL could punt (raise an exception) when the water gets too deep. It also doesn't prohibit other attributes from being added. Just because an N-Dimensional array described it's itemtype using the struct module typecodes doesn't mean that it couldn't implement more sophisticated typing hierarchies with a different attribute. There are a few commonly used types like "long double" which are not supported by the struct module, but this could be addressed with a little discussion. Also you might want a "bit" or "Object" typecode for tightly packed mask arrays and Object arrays. The names could be argued about, and something like: bm.__array_buffer__ bm.__array_shape__ bm.__array_itemtype__ bm.__array_offset__ bm.__array_strides__ bm.__array_itemsize__ bm.__array_complicated__ would really bring home the notion that the attributes are a description of what it means to participate in an N-Dimensional array protocol. Plus names this long and ugly are unlikely to step on the existing attributes already in use by Numeric3 and Numarray. :-) Anyway, I proposed this a long time ago, but the belief was that one of the standard array packages would make it into the core very soon. With a standard array library in the core, there wouldn't be as much need for general interoperability like this. Everyone could just use the standard. Maybe that position would change now that Numeric3 and Numarray both look to have long futures. Even if one package made it in, the other is likely to live on. I personally think the competition is a good thing. We don't need to have only one array package to get interoperability. I would definitely like to see the Python core acquire a full fledged array package like Numeric3 or Numarray. When I log onto a new Linux or MacOS machine, the array package would just be there. No installs, no hassle. But I still think a simple community agreed upon set of attributes like this would be a good idea. --- Peter Verveer wrote: > > It think it would be a real shame not to support non-contiguous data. > It would be great if such a byte object could be used instead of > Numeric/numarray arrays when writing extensions. Then I could write C > extensions that could be made available very easily/efficiently to any > package supporting it without having to worry about the specific C api > of those packages. If only contiguous byte objects are supported that > byte object is not a good option anymore for implementing extensions > for Numeric unless I am prepared to live with a lot of copying of > non-contiguous arrays. > I'm hoping I made a good case for a slightly different strategy above. But even if the metadata did go into the bytes object itself, the metadata could describe a non-contiguous layout on top of the contiguous chunk of memory. There is another really valid argument for using the strategy above to describe metadata instead of wedging it into the bytes object: The Numeric community could agree on the metadata attributes and start using it *today*. If you wait until someone commits the bytes object into the core, it won't be generally available until Python version 2.5 at the earliest, and any libraries that depended on using bytes stored metadata would not work with older versions of Python. 
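One of the derived predicates mentioned above might be sketched like so, using the same defaults (again illustration only):

    import struct

    def iscontiguous(bm):
        # C-contiguous means offset 0, nothing "complicated", and strides
        # (if present) matching the default C layout for shape/itemsize.
        if getattr(bm, 'offset', 0) != 0 or getattr(bm, 'complicated', False):
            return False
        strides = getattr(bm, 'strides', None)
        if strides is None:
            return True
        itemsize = getattr(bm, 'itemsize', struct.calcsize(bm.itemtype))
        expected, step = [], itemsize
        for n in reversed(bm.shape):
            expected.insert(0, step)
            step *= n
        return tuple(strides) == tuple(expected)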
Cheers, -Scott From xscottg at yahoo.com Fri Mar 25 23:16:08 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Mar 25 23:16:08 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050326071530.56285.qmail@web50202.mail.yahoo.com> --- Stephen Walton wrote: > Travis Oliphant wrote: > > > Well, rank-0 arrays are and forever will be mutable. But, Python > > scalars (and the new Array-like Scalars) are not mutable. > > This is a really minor point, and only slightly relevant to the > discussion, and perhaps I'm just revealing my Python ignorance again, > but: what does it mean for a scalar to be mutable? I can understand > that one wants a[0]=7 to be allowed when a is a rank-0 array, and I also > understand that str[k]='b' where str is a string is not allowed because > strings are immutable. But if I type "b=7" followed by "b=3", do I > really care whether the 3 gets stuck in the same memory location > previously occupied by the 7 (mutable) or the symbol b points to a new > location containing a 3 (immutable)? What are some circumstances where > this might matter? > It's nice because it fits with the rest of the array semantics and creates a consistant system: Array3D = zeros((1, 1, 1)) Array2D = Array3D[0] Array1D = Array2D[0] Array0D = Array1D[0] That each is mutable is shown by: Array3D[0, 0, 0] = 1 Array2D[0, 0] = 1 Array1D[0] = 1 Array0D[] = 1 # whoops! Unfortunately that last one, while it follows the pattern, doesn't work for Python's parser so you're stuck with: Array0D[()] = 1 This becomes useful when you start writing generic routines that want to work with *any* dimensional arrays: zero_all_elements(ArrayND) Python's immutable scalar types could not change in this case. More complicated examples are more interesting, but a simple implementation of the above would be: def zero_all_elements(any): any[...] = 0 Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 02:49:23 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 02:49:23 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42453F1D.6030305@ims.u-tokyo.ac.jp> I've downloaded the latest Numeric3 and tried to compile it. I found one error in setup.py that I put in myself a couple of days ago; I fixed this in CVS. On Cygwin, the compilation and installation runs without problems, other than the warning messages. However, when I try to compile Numeric3 for Windows, an error windows pops up with the following message: 16 bit MS-DOS Subsystem ~/Numeric3 The NTVDM CPU has encountered an illegal instruction. CS:071d IP:210f OP:63 69 66 69 65 Choose 'Close' to terminate the application It seems that this message is due to the section labeled "#Generate code" in setup.py, where python is being run with a call to os.system. What does this do? Is there a need to generate code automatically in setup.py, rather than include the generated code with the Numeric3 source code? When using Numeric3, I found that the zeros function now returns a float array by default, whereas Numeric returns an integer array: $ python2.4 Python 2.4 (#1, Dec 5 2004, 20:47:03) [GCC 3.3.3] on cygwin Type "help", "copyright", "credits" or "license" for more information. 
>>> from ndarray import * >>> zeros(5) array([0.0, 0.0, 0.0, 0.0, 0.0], 'd') >>> array([1,2,3]) array([1, 2, 3], 'l') >>> mdehoon at ginseng ~ $ python2.4 Python 2.4 (#1, Dec 5 2004, 20:47:03) [GCC 3.3.3] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> from Numeric import * >>> zeros(5) array([0, 0, 0, 0, 0]) >>> array([1,2,3]) array([1, 2, 3]) >>> array([1,2,3]).typecode() 'l' >>> Is there a reason to change the default behavior of zeros? Existing code may assume that zeros returns an integer array, and may behave incorrectly with Numeric3. Such bugs would be very hard to find. Finally, I tried to compile an extension module with Numeric3. The warning messages concerning the type of PyArrayObject->dimensions no longer occur, now that intp is typedef'd as an int instead of a long. But I agree with David Cooke that using Py_intptr_t in pyport.h would be better: > Why not use Py_intptr_t? It's defined by the Python C API already (in > pyport.h). When compiling the extension module, I get warning messages about PyArray_Cast, which now takes a PyObject* instead of a PyArrayObject* as in Numerical Python. Is this a typo in Src/arrayobject.c? Also, the PyArray_Cast function has the comment /* For backward compatibility */ in Src/arrayobject.c. Why is that? Linking the extension module fails due to the undefined reference to _PyArray_API. --Michiel. Travis Oliphant wrote: > > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). > > I will be away for at least a day so it is a good time to play... > -Travis > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From stephen.walton at csun.edu Sat Mar 26 12:19:21 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Sat Mar 26 12:19:21 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <20050326071530.56285.qmail@web50202.mail.yahoo.com> References: <20050326071530.56285.qmail@web50202.mail.yahoo.com> Message-ID: <4245C390.1090907@csun.edu> Scott Gilbert wrote: >It's nice because it fits with the rest of the array semantics and creates >a consistant system: > > Array3D = zeros((1, 1, 1)) > > Array2D = Array3D[0] > Array1D = Array2D[0] > Array0D = Array1D[0] > > Hmm...in both Numeric3 and numarray, the last line creates a Python scalar. Array2D and Array1D by contrast are not only arrays, but they are views of Array3D. Is what you're saying is that you want Array0D to be a rank-0 array after the above? > Array0D[()] = 1 > > Of course, this generates an error at present: "TypeError: object does not support item assignment" since it is a Python int. Moreover, it isn't a view, so that Array0D doesn't change after the assignment to Array3D. Is this also slated to be changed/fixed using rank 0 arrays? Would Array0D.shape be () in that case? 
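Spelled out as a hypothetical session, the semantics under discussion would presumably look like this (neither package behaves exactly this way today; zeros here is whichever package's constructor):

    >>> Array3D = zeros((1, 1, 1))
    >>> Array0D = Array3D[0][0][0]
    >>> Array0D.shape
    ()
    >>> Array0D[()] = 1
    >>> Array3D[0, 0, 0]
    1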
From stephen.walton at csun.edu Sat Mar 26 12:25:53 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Sat Mar 26 12:25:53 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 Message-ID: <4245C512.1030701@csun.edu> zeros() in Numeric3 defaults to typecode='d' while in numarray it defaults to typecode=None, which in practice means 'i' by default. Is this deliberate? Is this desirable? I'd vote for zeros(), ones() and the like to default to 'i' or 'f' rather than 'd' in the interest of space and speed. From arnd.baecker at web.de Sat Mar 26 13:31:14 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Sat Mar 26 13:31:14 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: Hi Travis, On Fri, 25 Mar 2005, Travis Oliphant wrote: > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). Compilation is fine for me as well (linux). I played around a bit - Obviously, addition of arrays, multiplication etc. don't work yet (as expected, if I remember your mails correctly). One thing which confused me is the following In [1]:from ndarray import * In [2]:x=arange(10.0) In [3]:scalar=x[3] In [4]:print scalar+1 --------------------------------------------------------------------------- exceptions.TypeError Traceback (most recent call last) TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int' In [5]:print type(scalar) OK, this has been discussed up-and-down on this mailing list. At the moment I don't know how this will affect my Numeric(3) life, so I will wait until the other operations are implemented and see if there are any consequences for my programs at all ... ;-) A couple of things seem to be a bit unusual, e.g.: In [9]:x=arange(10.0) In [10]:x Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd') In [11]:x.argmax() --------------------------------------------------------------------------- exceptions.TypeError Traceback (most recent call last) /home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/ TypeError: function takes exactly 1 argument (0 given) In [12]:x.argmax(None) Out[12]:9 In [13]:t=x.argmax(None) In [14]:type(t) Out[14]: So argmax also returns an array type, but I would have really thought that this is an integer index?! Also a couple of attributes (E.g. x.sum() are not yet implemented) or lack documention (I know this comes last ;-). Best, Arnd From xscottg at yahoo.com Sat Mar 26 13:37:13 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Mar 26 13:37:13 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: 6667 Message-ID: <20050326213554.93877.qmail@web50201.mail.yahoo.com> --- Stephen Walton wrote: > Scott Gilbert wrote: > > >It's nice because it fits with the rest of the array semantics and > creates > >a consistant system: > > > > Array3D = zeros((1, 1, 1)) > > > > Array2D = Array3D[0] > > Array1D = Array2D[0] > > Array0D = Array1D[0] > > > > > Hmm...in both Numeric3 and numarray, the last line creates a Python > scalar. Array2D and Array1D by contrast are not only arrays, but they > are views of Array3D. I should have been clear that I wasn't describing what any of the array packages do today. The views thing is a different can of worms (I think they should be copy-on-write copies by default and views only when explicitly asked for). 
> Is what you're saying is that you want Array0D to > be a rank-0 array after the above? > Yes. I think it fits the pattern and is consistant. There are also cases where it is useful. Array operations form a nice little calculus. Returning non-mutable scalars in place of rank-0 arrays is like having (2 - 1 == 1) while (2 - 1 - 1 == a donut). 0 looks like a donut, but it's a different food group. :-) > > > Array0D[()] = 1 > > > Of course, this generates an error at present: "TypeError: object does > not support item assignment" since it is a Python int. Moreover, it > isn't a view, so that Array0D doesn't change after the assignment to > Array3D. Is this also slated to be changed/fixed using rank 0 arrays? > Would Array0D.shape be () in that case? > Array0D.shape would be an empty tuple () in that case. I can't say what either Numeric3 or Numarray will do. The last time I read the Numeric3 PEP, it looked like there were going to be several special types of scalars. From cjw at sympatico.ca Sat Mar 26 17:57:59 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Mar 26 17:57:59 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4245C512.1030701@csun.edu> References: <4245C512.1030701@csun.edu> Message-ID: <424612E1.2020908@sympatico.ca> Stephen Walton wrote: > zeros() in Numeric3 defaults to typecode='d' while in numarray it > defaults to typecode=None, which in practice means 'i' by default. Is > this deliberate? Is this desirable? I'd vote for zeros(), ones() and > the like to default to 'i' or 'f' rather than 'd' in the interest of > space and speed. > The following seems to show that the default data type for the numarray elements is Int32: Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numarray as _n >>> a= _n.zeros(shape=(3, 3)) >>> a._type Int32 >>> I don't use the typecodes as the numerictypes are much more explicit. Colin W. From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:23:29 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:23:29 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> Message-ID: <42464443.8050402@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > Michiel Jan Laurens de Hoon wrote: >> Have you tried integrating scipy_distutils with Python's distutils? My >> guess is that Python's distutils can benefit from what is in >> scipy_distutils, particularly the parts dealing with C compilers. A >> clean integration will also prevent duplicated code, avoids Pearu >> having to keep scipy_distutils up to date with Python's distutils, and >> will enlarge the number of potential users. Having two distutils >> packages seems to be too much of a good thing. > > > No, I have not. Though a year or so ago there was a discussion about > this in distutils list, mainly for adding Fortran compiler support to > distutils. At the time I didn't have resources to push scipy_distutils > features to distutils and even less so for now. So, one can think that > scipy_distutils is an extension to distutils, though it also includes > few bug fixes for older distutils. Having a separate scipy_distutils that fixes some bugs in Python's distutils is a design mistake in SciPy that we should not repeat in Numeric3. 
Not that I don't think the code in scipy_distutils is not useful -- I think it would be very useful. But the fact that it is not integrated with the existing Python distutils makes me wonder if this package really has been thought out that well. As far as I can tell, scipy_distutils now fulfills four functions: 1) Bug fixes for Python's distutils for older Python versions. As Numeric3 will require Python 2.3 or up, these are no longer relevant. 2) Bug fixes for current Python's distutils. These should be integrated with Python's distutils. Writing your own package instead of contributing to Python gives you bad karma. 3) Fortran support. Very useful, and I'd like to see them in Python's distutils. Another option would be to put this in SciPy.fortran or something similar. But since Python's distutils already has a language= option for C++ and Objective-C, the cleanest way would be to add this to Python's distutils and enable language="fortran". 4) Stuff particular to SciPy, for example finding Atlas/Lapack/Blas libraries. These we can decide on a case-by-case basis if it's useful for Numeric3. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:27:24 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:27:24 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <424644FC.6070003@ims.u-tokyo.ac.jp> Michiel Jan Laurens de Hoon wrote: > Pearu Peterson wrote: > > Michiel Jan Laurens de Hoon wrote: > ..... Not that I don't think the code in scipy_distutils is not > useful -- I think it would be very useful. One negation too many in this sentence -- sorry. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sat Mar 26 21:33:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 26 21:33:04 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42464639.6050207@ims.u-tokyo.ac.jp> I'm a bit confused about where Numeric3 is heading. Originally, the idea was that Numeric3 should go in Python core. Are we still aiming for that? More recently, another goal was to integrate Numeric and numarray, which I fully support. However, from looking at the Numeric3 source code, and from looking at the errors found by Arnd, it looks like that Numeric3 is a complete rewrite of Numeric. This goes well beyond integrating Numeric and numarray. Now I realize that sometimes it is necessary to rethink and rewrite some code. However, Numerical Python has served us very well over the years, and I'm worried that rewriting the whole thing will break more than it fixes. So where is Numeric3 going? --Michiel. 
Arnd Baecker wrote: > Hi Travis, > > On Fri, 25 Mar 2005, Travis Oliphant wrote: > > >>To all who were waiting: >> >>I've finished adding the methods to the array object so that Numeric3 in >>CVS now compiles (at least for me on Linux). > > > Compilation is fine for me as well (linux). > I played around a bit - > Obviously, addition of arrays, multiplication etc. don't work > yet (as expected, if I remember your mails correctly). > One thing which confused me is the following > > In [1]:from ndarray import * > In [2]:x=arange(10.0) > In [3]:scalar=x[3] > In [4]:print scalar+1 > --------------------------------------------------------------------------- > exceptions.TypeError Traceback (most recent call last) > TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int' > In [5]:print type(scalar) > > > OK, this has been discussed up-and-down on this mailing > list. At the moment I don't know how this > will affect my Numeric(3) life, so I will wait > until the other operations are implemented > and see if there are any consequences > for my programs at all ... ;-) > > A couple of things seem to be a bit unusual, e.g.: > > In [9]:x=arange(10.0) > In [10]:x > Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd') > In [11]:x.argmax() > --------------------------------------------------------------------------- > exceptions.TypeError Traceback (most > recent call last) > > /home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/ > > TypeError: function takes exactly 1 argument (0 given) > In [12]:x.argmax(None) > Out[12]:9 > In [13]:t=x.argmax(None) > In [14]:type(t) > Out[14]: > > So argmax also returns an array type, but I would > have really thought that this is an integer index?! > > Also a couple of attributes (E.g. x.sum() are not yet > implemented) or lack documention (I know this comes last ;-). > > Best, > > Arnd > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Sat Mar 26 22:43:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 22:43:34 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <42464639.6050207@ims.u-tokyo.ac.jp> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> Message-ID: <424655B1.4000503@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > I'm a bit confused about where Numeric3 is heading. Originally, the > idea was that Numeric3 should go in Python core. Are we still aiming > for that? More recently, another goal was to integrate Numeric and > numarray, which I fully support. I would prefer to re-integrate the numarray people "back" into the Numeric community, by adding the features to Numeric that they need. > However, from looking at the Numeric3 source code, and from looking at > the errors found by Arnd These errors are all due to the fact that umath functions are not available yet. 
Please don't judge too hastily. At this point, I'm really just looking for
people who are going to pitch in and help with coding or to offer
suggestions about where the code base should go. I have done this
completely in the open as best I can. I have no other "hidden" plans.

All of the scalar types will get their math from the umath functions too.
So, of course they don't work either yet (except for those few that
inherit from the basic Python types).

> it looks like that Numeric3 is a complete rewrite of Numeric.

I really don't understand how you can make this claim. All I've done is
add significantly to Numeric's code base and re-inserted some
code-generators (code-generators are critical for maintainability --- in
answer to your previous question).

> This goes well beyond integrating Numeric and numarray. Now I realize
> that sometimes it is necessary to rethink and rewrite some code.
> However, Numerical Python has served us very well over the years, and
> I'm worried that rewriting the whole thing will break more than it
> fixes. So where is Numeric3 going?

You really need to justify this idea you are creating that I am
"re-writing the whole thing." I disagree wholeheartedly with your
assessment. Certain changes were made to accommodate numarray features,
new types were added, indexing was enhanced. When possible, I've deferred
to Numeric's behavior. But, you can't bring two groups back to a common
array type by not changing the array type at all.

Numeric3 is going wherever the community takes it. It is completely open.
Going into the Python core is going to have to wait until things settle
down in our own community. It's not totally abandoned, just put on hold
(again). That was the decision thought best by Guido, Perry, myself, and
Paul at our lunch together. We thought it best to instead suggest
interoperability strategies.

Numeric3 is NOT a re-write of Numeric. I've re-used a great deal of the
Numeric code. The infrastructure is exactly the same. I started the
project from the Numeric code base. I've just added a great deal more
following the discussions on this list. At worst, I've just expanded quite
a few things and moved a lot into C. Little incompatibilities are just a
sign of alpha code, not a "direction." Some have thought that zeros
returning ints was a bug in the past, which is why the change occurred.
But, it is an easy change, and I tend to agree that it should stay
returning the Numeric default of Intp.

>> In [1]:from ndarray import *
>> In [2]:x=arange(10.0)
>> In [3]:scalar=x[3]
>> In [4]:print scalar+1
>> ---------------------------------------------------------------------------
>> exceptions.TypeError                Traceback (most recent call last)
>> TypeError: unsupported operand type(s) for +: 'double_arrtype' and 'int'
>> In [5]:print type(scalar)

All scalar types by default will do math operations as rank-0 arrays
(which haven't been brought back in yet). That is why the error. I believe
that right now I am inheriting first from the Generic Scalar Type instead
of the double type (so it will use Scalar Arithmetic first). Later,
special methods for each scalar type could be used (for optimization).

>> A couple of things seem to be a bit unusual, e.g.:
>>
>> In [9]:x=arange(10.0)
>> In [10]:x
>> Out[10]:array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 'd')
>> In [11]:x.argmax()
>> ---------------------------------------------------------------------------
>> exceptions.TypeError                Traceback (most recent call last)

This is an error. I'll look into it.
Notice all method calls that take an axis argument are defaulting to None (which means ravel the whole array). The function calls won't change for backward compatibility. >> /home/scratch/abaecker/INSTALL_SOFT/TEST_NUMERIC3/ >> >> TypeError: function takes exactly 1 argument (0 given) >> In [12]:x.argmax(None) >> Out[12]:9 >> In [13]:t=x.argmax(None) >> In [14]:type(t) >> Out[14]: >> >> So argmax also returns an array type, but I would >> have really thought that this is an integer index?! > Remember, array operations always return array scalars! >> >> Also a couple of attributes (E.g. x.sum() are not yet >> implemented) or lack documention (I know this comes last ;-). > Hmm. x.sum should be there. Believe me, it was a bit of a pain to convert all the Python code to C. Have to check this. -Travis From oliphant at ee.byu.edu Sat Mar 26 22:51:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 22:51:25 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <424655B1.4000503@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> Message-ID: <424657A4.2050203@ee.byu.edu> >>> >>> In [1]:from ndarray import * >>> In [2]:x=arange(10.0) >>> In [3]:scalar=x[3] >>> In [4]:print scalar+1 >>> --------------------------------------------------------------------------- >>> >>> exceptions.TypeError Traceback (most recent call last) >>> TypeError: unsupported operand type(s) for +: 'double_arrtype' and >>> 'int' >>> In [5]:print type(scalar) >>> >> >> > > All scalar types by default will do math operations as rank-0 arrays > (which haven't been brought back in yet). That is why the error. I > believe that right now I am inheriting first from the Generic Scalar > Type instead of the double type (so it will use Scalar Arithmetic > first). Later, > special methods for each scalar type could be used (for optimization). To clarify, it is the umath operations that have not been 'brought back in', or activated, yet. The rank-0 arrays are of course there as they have always been. -Travis From oliphant at ee.byu.edu Sat Mar 26 23:23:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Mar 26 23:23:30 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050326065814.27019.qmail@web50204.mail.yahoo.com> References: <20050326065814.27019.qmail@web50204.mail.yahoo.com> Message-ID: <42465F29.208@ee.byu.edu> Scott Gilbert wrote: >Adding metadata at the buffer object level causes problems for "view" >semantics. Let's say that everyone agreed what "itemsize" and "itemtype" >meant: > > real_view = complex_array.real > >The real_view will have to use a new buffer since they can't share the old >one. The buffer used in complex_array would have a typecode like >ComplexDouble and an itemsize of 16. The buffer in real_view would need a >typecode of Double and an itemsize of 8. If metadata is stored with the >buffer object, it can't be the same buffer object in both places. > > This is where having "strides" metadata becomes very useful. Then, real_view would not have to be a copy at all, unless the coder didn't want to deal with it. >Another case would be treating a 512x512 image of 4 byte pixels as a >512x512x4 image of 1 byte RGBA elements. Or even coercing from Signed to >Unsigned. > > Why not? 
A different bytes object could point to the same memory but the different
metadata would say "treat this data differently."

> The bytes object as proposed does allow new views to be created from
> other bytes objects (sharing the same memory underneath), and these
> views could each have separate metadata, but then you wouldn't be able
> to have arrays that used other types of buffers.

I don't see why not. Your argument is not clear to me.

> The bytes object shouldn't create views from arbitrary other buffer
> objects because it can't rely on the general semantics of the
> PyBufferProcs interface. The foreign buffer object might realloc and
> invalidate the pointer for instance... The current Python "buffer"
> builtin does this, and the results are bad. So creating a bytes object
> as a view on the mmap object doesn't work in the general case.

This is a problem with the objects that expose the buffer interface. The
C-API could be more clear that you should not "reallocate" memory if
another array is referencing you. See the arrayobject's resize method for
an example of how Numeric does not allow reallocation of the memory space
if another object is referencing it. I suppose you could keep track
separately in the object of when another object is using your memory, but
the REFCOUNT works for this also (though it is not so specific, and so you
would miss cases where you "could" reallocate --- but this is rarely used
in arrayobjects anyway).

Another idea is to fix the bytes object so it always regrabs the pointer
to memory from the object instead of relying on the held pointer in view
situations.

> Still, I think keeping the metadata at a different level, and having the
> bytes object just be the Python way to spell a call to C's malloc will
> avoid a lot of problems. Read below for how I think the metadata stuff
> could be handled.

Metadata is such a light-weight "interface-based" solution. It could be as
simple as attributes on the bytes object. I don't see why you resist it so
much. Imagine defining a jpeg file by a single bytes object with a simple
EXIF header metadata string. If the bytes object allowed the "bearmin"
attributes you are describing then that would be one way to describe an
array that any third-party application could support as much as they
wanted.

In short, I think we are thinking along similar lines. It really comes
down to being accepted by everybody as a standard.

One of the things I want for Numeric3 is to be able to create an array
from anything that exports the buffer interface. The problem, of course,
is with badly-written extension modules that rudely reallocate their
memory even after they've shared it with someone else. Yes, Python could
be improved so that this were handled better, but it does work right now,
as long as buffer interface exporters play nice.

This is the way to advertise the buffer interface (and buffer object).
Rather than vague references to buffer objects being a "bad design" and a
blight, we should say: objects wanting to export the buffer interface
currently have restrictions on their ability to reallocate their buffers.

> I think being able to traffic in N-Dimensional arrays without requiring
> linking against the libraries is a good thing.

Several of us are just catching on to the idea. Thanks for your patience.
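To make the views-from-metadata point concrete, here is a plain-Python sketch (dicts stand in for whatever object ends up carrying the attributes; this is illustration, not either package's API):

    import struct

    # One flat buffer holding four complex doubles as (real, imag) pairs.
    # The "real" and "imag" views share it and differ only in metadata:
    # same shape and strides (16 bytes per element), different offset.
    buf = struct.pack('8d', 1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 4.0, -4.0)

    real_view = {'buffer': buf, 'shape': (4,), 'itemtype': 'd',
                 'offset': 0, 'strides': (16,)}
    imag_view = {'buffer': buf, 'shape': (4,), 'itemtype': 'd',
                 'offset': 8, 'strides': (16,)}

    def getitem(view, i):
        # The address arithmetic a consumer would do (1-d case).
        start = view['offset'] + i * view['strides'][0]
        size = struct.calcsize(view['itemtype'])
        return struct.unpack(view['itemtype'],
                             view['buffer'][start:start + size])[0]

    print getitem(real_view, 2)   # 3.0
    print getitem(imag_view, 2)   # -3.0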
A bear minimum N-Dimensional array for interchanging data >across libraries could get by with following attributes: > > # Create a simple record type for storing attributes > class BearMin: pass > bm = BearMin() > > # Set the attributes sufficient to describe a simple ndarray > bm.buffer = > bm.shape = > bm.itemtype = > >The bm.buffer and bm.shape attributes are pretty obvious. I would suggest >that the bm.itemtype borrow it's typecodes from the Python struct module, >but anything that everyone agreed on would work. > > I've actually tried to do this if you'll notice, and I'm sure I'll take some heat for that decision at some point too. The only difference currently I think are long types (q and Q), I could be easily persuaded to change thes typecodes too. I agree that the typecode characters are very simple and useful for interchanging information about type. That is a big reason why I am not "abandoning them" >Those attributes are sufficient for someone to *produce* an N-Dimensional >array that could be understood by many libraries. Someone who *consumes* >the data would need to know a few more: > > bm.offset = > > I don't like this offset parameter. Why doesn't the buffer just start where it needs too? > bm.strides = > > Things are moving this direction (notice that Numeric3 has attributes much like you describe), except we use the word .data (instead of .buffer) It would be an easy thing to return an ArrayObject from an object that exposes those attributes (and a good idea). So, I pretty much agree with what you are saying. I just don't see how this is at odds with attaching metadata to a bytes object. We could start supporting this convention today, and also handle bytes objects with metadata in the future. >There is another really valid argument for using the strategy above to >describe metadata instead of wedging it into the bytes object: The Numeric >community could agree on the metadata attributes and start using it >*today*. > > Yes, but this does not mean we should not encourage the addition of metadata to bytes objects (as this has larger uses than just Numeric arrays). It is not a difficult thing to support both concepts. >If you wait until someone commits the bytes object into the core, it won't >be generally available until Python version 2.5 at the earliest, and any >libraries that depended on using bytes stored metadata would not work with >older versions of Python. > > > I think we should just start advertising now, that with the new methods of numarray and Numeric3, extension writers can right now deal with Numeric arrays (and anything else that exposes the same interface) very easily by using attribute access (or the buffer protocol together with attribute access). They can do this because Numeric arrays (and I suspect numarrays as well) use the buffer interface responsibly (we could start a political campaign encouraging responsible buffer usage everywhere :-) ). -Travis From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 00:45:33 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 00:45:33 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <424655B1.4000503@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> Message-ID: <4246732D.2080908@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >> I'm a bit confused about where Numeric3 is heading. Originally, the idea >> was that Numeric3 should go in Python core. 
Are we still aiming for that? >> More recently, another goal was to integrate Numeric and numarray, which I >> fully support. > > I would prefer to re-integrate the numarray people "back" into the Numeric > community, by adding the features to Numeric that they need. > >> However, from looking at the Numeric3 source code, and from looking at the >> errors found by Arnd > > These errors are all due to the fact that umath functions are not available > yet. Please don't judge too hastily. At this point, I'm OK. Fair enough. Maybe I did judge too hastily. From your comments, it looks like we agree on where Numeric3 is going. >> It seems that this message is due to the section labeled "#Generate code" >> in setup.py, where python is being run with a call to os.system. What does >> this do? Is there a need to generate code automatically in setup.py, rather >> than include the generated code with the Numeric3 source code? > > (code-generators are critical for maintainability --- in answer to your > previous question). As far as I can tell from their setup.py, neither Numerical Python nor numarray currently does code generation on the fly from setup.py. (This was one of the reasons that I started to worry if Numeric3 is more than Numerical Python + numarray). --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 02:29:11 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 02:29:11 2005 Subject: [Numpy-discussion] Numeric3 CVS compiles now In-Reply-To: <4244AFB6.40601@ee.byu.edu> References: <4244AFB6.40601@ee.byu.edu> Message-ID: <42468BB3.4060503@ims.u-tokyo.ac.jp> I have made one change to setup.py and Include/ndarray/arrayobject.h to make the definition of intp, uintp consistent with what is in pyport.h (which gets #included via Python.h). This shouldn't change anything, but since this affects the compilation almost everywhere, I thought I should let everybody know. If it causes any problems, feel free to change it back. --Michiel. Travis Oliphant wrote: > > To all who were waiting: > > I've finished adding the methods to the array object so that Numeric3 in > CVS now compiles (at least for me on Linux). > > I will be away for at least a day so it is a good time to play... > -Travis > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. 
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From dd55 at cornell.edu Sun Mar 27 15:22:08 2005 From: dd55 at cornell.edu (Darren Dale) Date: Sun Mar 27 15:22:08 2005 Subject: [Numpy-discussion] searching a list of arrays Message-ID: <200503271820.51307.dd55@cornell.edu> Hi, I have a list of numeric-23.8 arrays: a = [array([0,1]), array([0,1]), array([1,0]), array([1,0])] b = [array([0,1,0]), array([0,1,0]), array([1,0,0]), array([1,0,0])] and I want to make a new list out of b: c = [array([0,1,2]), array([1,0,2])] where the last index in each array is the result of b.count([0,1,0]) # or [1,0,0] The problem is that the result of b.count(array([1,0,0])) is 4, not 2, and b.remove(array([1,0,0])) indescriminantly removes arrays from the list. a.count and a.remove work the way I expected. Does anyone know why 1x2 arrays work, but 1x3 or larger arrays do not? Thanks, Darren From xscottg at yahoo.com Sun Mar 27 18:08:13 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sun Mar 27 18:08:13 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <42465F29.208@ee.byu.edu> Message-ID: <20050328020731.85506.qmail@web50202.mail.yahoo.com> Hi Travis. I'm quite possibly misunderstanding how you want to incorporate the metadata into the bytes object, so I'm going to try and restate both of our positions from the point of view of a third party who will be using ndarrays. Let's take Chris Barker's point of view with regards to wxPython... We all roughly agree which pieces of metadata are needed for arrays. There are a few persnicketies, and the names could vary. I'll use your given names: .data (could be .buffer or .__array_buffer__) .shape (could be .dimensions or .__array_shape__) .strides (maybe .__array_strides__) .itemtype (coulb be .typecode or .__array_itemtype__) Several other attributes can be derived (calculated) from those (isfortran, iscontiguous, etc...), and we might need a few more, but we'll ignore those for now. In my proposal, Chris would write a routine like such: def version_one(a): data = a.data shape = a.shape strides = a.strides itemtype = a.itemtype # Cool code goes here I believe you are suggesting Chris would write: def version_two(a): data = a shape = a.shape strides = a.strides itemtype = a.itemtype # Cool code goes here Of if you have the .meta dictionary, Chris would write: def version_three(a): data = a shape = a.meta["shape"] strides = a.meta["strides"] itemtype = a.meta["itemtype"] # Cool code goes here Of course Chris could save one line of code with: def version_two_point_one(data): shape = a.shape strides = a.strides itemtype = a.itemtype # Cool code goes here If I'm mistaken about your proposal, please let me know. However if I'm not mistaken, I think there are limitations with version_two and version_three. First, most of the existing buffer objects do not allow attributes to be added to them. With version_one, Chris could have data of type array.array, Numarray.memory, mmap.mmap, __builtins__.str, the new __builtins__.bytes type as well as any other PyBufferProcs supporting object (and possibly sequence objects like __builtins__.list). 
With version_two and version_three, something more is required. In a few cases like the __builtins__.str type you could add the necessary attributes by inheritance. In other cases like the mmap.mmap, you could wrap it with a __builtins__.bytes object. (That's assuming that __builtins__.bytes knows how to wrap mmap.mmap objects...) However, other PyBufferProcs objects like array.array will never allow themselves to be wrapped by a __builtins__.bytes since they realloc their memory and violate the promises that the __builtins__.bytes object makes. I think you disagree with me on this part, so more on that later in this message. For now I'll take your side, let's pretend that all PyBufferProcs supporting objects could be made well enough behaved to wrap up in a __builtins__.bytes object. Do you really want to require that only __builtins__.bytes objects are suitable for data interchange across libraries? This isn't explicitly stated by you, but since the __builtins__.bytes object is the only common PyBufferProcs supporting object that could define the metadata attributes, it would be the rule in practice. I think you're losing flexibility if you do it this way. From Chris's point of view it's basically the same amount of code for all three versions above. Another consideration that might sway you is that the existing N-Dimensional array packages could easily add attribute methods to implement the interface, and they could do this without changing any part of their implementation. The .data attribute when requested would call a "get method" that returns a buffer. This allows user defined objects which do not implement the PyBufferProcs protocol themselves, but which contain a buffer inside of them to participate in the "ndarray protocol". Both version_two and version_three do not allow this - the object being passed must *be* a buffer. > > > The bytes object shouldn't create views from arbitrary other buffer > > objects because it can't rely on the general semantics of the > > PyBufferProcs interface. The foreign buffer object might realloc > > and invalidate the pointer for instance... The current Python > > "buffer" builtin does this, and the results are bad. So creating > > a bytes object as a view on the mmap object doesn't work in the > > general case. > > > > > This is a problem with the objects that expose the buffer interface. > The C-API could be more clear that you should not "reallocate" memory if > another array is referencing you. See the arrayobject's resize method > for an example of how Numeric does not allow reallocation of the memory > space if another object is referencing it. I suppose you could keep > track separately in the object of when another object is using your > memory, but the REFCOUNT works for this also (though it is not so > specific, and so you would miss cases where you "could" reallocate but > this is rarely used in arrayobject's anyway). > The reference count on the PyObject pointer is different than the number of users using the memory. In Python you could have: import array a = array.array('d', [1]) b = a The reference count on the array.array object is 2, but there are 0 users working with the memory. Given the existing semantics of the array.array object, it really should be allowed to resize in this case. Storing the object in a dictionary would be another common situation that would increase it's refcount but shouldn't lock down the memory. A good solution to this problem was presented with PEP-298, but no progress seems to have been made on it. 
http://www.python.org/peps/pep-0298.html

To my memory, PEP-298 was in response to PEP-296. I proposed PEP-296 to create a good working buffer (bytes) object that avoided the problems of the other buffer objects. Several folks wanted to fix the other (non bytes) objects where possible, and PEP-298 was the result. A strategy like this could be used to make array.array safe outside of the GIL. Bummer that it didn't get implemented.

> Another idea is to fix the bytes object so it always regrabs the pointer
> to memory from the object instead of relying on the held pointer in view
> situations.

A while back, I submitted a patch [552438] like this to fix the __builtins__.buffer object:

http://sourceforge.net/tracker/index.php?func=detail&aid=552438&group_id=5470&atid=305470

It was ignored for a bit, and during the quiet time I came to realize that even if the __builtins__.buffer object was fixed, it still wouldn't meet my needs. So I proposed the bytes object, and this patch fell on the floor (the __builtins__.buffer object is still broken).

The downside to this approach is that it only solves the problem for code running with possession of the GIL. It does solve the stale pointer problem that is exposed by the __builtins__.buffer object, but if you release the GIL in C code, all bets are off - the pointer can become stale again. The promises that bytes tries to make about the lifetime of the pointer can only be guaranteed by the object itself. Just because bytes could wrap the other object and grab the latest pointer when you need it doesn't mean that the other object won't invalidate the pointer a split second later when the GIL is released. It is mere chance that the mmap object is well behaved enough. And even the mmap object can release its memory if someone closes the object - again leading to a stale pointer.

> Metadata is such a light-weight "interface-based" solution. It could be
> as simple as attributes on the bytes object. I don't see why you resist
> it so much. Imagine defining a jpeg file by a single bytes object
> with a simple EXIF header metadata string. If the bytes object
> allowed the "bearmin" attributes you are describing then that would be
> one way to describe an array that any third-party application could
> support as much as they wanted.

Please don't think I'm offering you resistance. I'm only trying to point out some things that I think you might have overlooked. Lots of people ignore my suggestions all the time. You'd be in good company if you did too, and I wouldn't even hold a grudge against you.

Now let me be argumentative. :-) I've listed what I consider the disadvantages above, but I guess I don't see any advantages of putting the metadata on the bytes object. In what way is:

    jpeg = bytes()
    jpeg.exif = 

better than:

    class record: pass
    jpeg = record()
    jpeg.data = 
    jpeg.exif = 

The only advantage I see is that yours is a little shorter, but in any real application, you were probably going to define an object of some sort to add all the methods needed. And as I showed in version_one, version_two, and version_three above, it's basically the same number of lines for the consumer of the data.

There is nothing stopping a PyBufferProcs object like bytes from supporting version_one above:

    jpeg = bytes()
    jpeg.data = jpeg
    jpeg.exif = 

But non PyBufferProcs objects can't play with version_two or version_three.

Incidentally, being able to add attributes to bytes means that it needs to play nicely with the garbage collection system.
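For what it's worth, a tiny pure-Python stand-in for that concern - the class here is hypothetical, only mimicking a bytes object that permits attributes:

    import gc

    class AttributedBytes(object):
        pass    # stand-in for a buffer-like object that allows attributes

    b = AttributedBytes()
    b.exif = {'owner': b}    # an attribute slot makes reference cycles possible
    del b
    gc.collect()             # only the cycle collector can reclaim it now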
At that point, bytes is basically a container for arbitrary Python objects. That's additional implementation headache.

> It really comes down to being accepted by everybody as a standard.

This I completely agree with. I think the community will roll with whatever you and Perry come to agree on. Even the array.array object in the core could be made to work either way. If the decision you come up with makes it easy to add the interface to existing array objects then everyone would probably adopt it and it would become a standard.

This is the main reason I like the double underscore __*meta*__ names. It matches the similar pattern all over Python, and existing array packages could add those without interfering with their existing implementation:

    class Numarray:
        #
        # lots of array implementing code
        #

        # Down here at the end, add the "well-known" interface
        # (I haven't embraced the @property decorator syntax yet.)

        def __get_shape(self):
            return self._shape
        __array_shape__ = property(__get_shape)

        def __get_data(self):
            # Note that they use a different name internally
            return self._buffer
        __array_data__ = property(__get_data)

        def __get_itemtype(self):
            # Perform an on the fly conversion from the class
            # hierarchy type to the struct module typecode that
            # closest matches
            return self._type._to_typecode()
        __array_itemtype__ = property(__get_itemtype)

Changing class Numarray to a PyBufferProcs supporting object would be harder. The C version for Numeric3 arrays would be similar, and there is no wasted space on a per instance basis in either case.

> One of the things I want for Numeric3 is to be able to create an array
> from anything that exports the buffer interface. The problem, of course,
> is with badly-written extension modules that rudely reallocate their
> memory even after they've shared it with someone else. Yes, Python
> could be improved so that this were handled better, but it does work
> right now, as long as buffer interface exporters play nice.

I think the behavior of the array.array objects is pretty defensible. It is useful that you can extend those arrays to new sizes. For all I know, it was written that way before there was a GIL. I think PEP-298 is a good way to make the dynamic buffers more GIL friendly.

> This is the way to advertise the buffer interface (and buffer
> object). Rather than vague references to buffer objects being a
> "bad-design" and a blight we should say: objects wanting to export the
> buffer interface currently have restrictions on their ability to
> reallocate their buffers.

I agree. The "bad-design" type of comments about "the buffer problem" on python-dev have always annoyed me. It's not that hard of a problem to solve technically.

> > I would suggest that the bm.itemtype borrow its typecodes from
> > the Python struct module, but anything that everyone agreed on
> > would work.
>
> I've actually tried to do this if you'll notice, and I'm sure I'll take
> some heat for that decision at some point too. The only difference
> currently I think are long types (q and Q), I could be easily persuaded
> to change these typecodes too. I agree that the typecode characters are
> very simple and useful for interchanging information about type. That
> is a big reason why I am not "abandoning them".

The real advantage to the struct module typecodes comes in two forms. First and most important is that it's already documented and in place - a de facto standard.
Second is that Python script code could use those typecodes directly with the struct module to pull apart pieces of data. The disadvantage is that a few new typecodes would be needed... I would even go as far as to recommend their '>' '<' prefix codes for big-endian and little-endian for just this reason... > > I don't like this offset parameter. Why doesn't the buffer just start > where it needs too? > Well if you stick with using the bytes object, you could probably get away with this. Effectively, the offset is encoded in the bytes object. At this point, I don't know if anything I said above was pursuasive, but I think there are other cases where you would really want this. Does anyone plan to support tightly packed (8 bits to a byte) bitmask arrays? Object arrays could be implemented on top of shared __builtins__.list objects, and there is no easy way to create offset views into lists. > > It would be an easy thing to return an ArrayObject from an object that > exposes those attributes (and a good idea). > This would be wonderful. Third party libraries could produce data that is sufficiently ndarray like without hassle, and users of that library could promote it to a Numeric3 array with no headaches. > > So, I pretty much agree with what you are saying. I just don't see how > this is at odds with attaching metadata to a bytes object. > > We could start supporting this convention today, and also handle bytes > objects with metadata in the future. > Unfortunately, I don't think any buffer objects exist today which have the ability to dynamically add attributes. If my arguments above are unpursuasive, I believe bytes (once it is written) will be the only buffer object with this support. By the way, it looks like the "bytes" concept has been revisited recently. there is a new PEP dated Aug 11, 2004: http://www.python.org/peps/pep-0332.html > > > > There is another really valid argument for using the strategy above to > > describe metadata instead of wedging it into the bytes object: The > > Numeric community could agree on the metadata attributes and start > > using it *today*. > > I think we should just start advertising now, that with the new methods > of numarray and Numeric3, extension writers can right now deal with > Numeric arrays (and anything else that exposes the same interface) very > easily by using attribute access (or the buffer protocol together with > attribute access). They can do this because Numeric arrays (and I > suspect numarrays as well) use the buffer interface responsibly (we > could start a political campaign encouraging responsible buffer usage > everywhere :-) ). > I can just imagine the horrible mascot that would be involved in the PR campaign. Thanks for your attention and patience with me on this. I really appreciate the work you are doing. I wish I could explain my understanding of things more clearly. Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Sun Mar 27 19:01:13 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Mar 27 19:01:13 2005 Subject: [Numpy-discussion] searching a list of arrays In-Reply-To: <200503271820.51307.dd55@cornell.edu> References: <200503271820.51307.dd55@cornell.edu> Message-ID: <42477475.4040406@ims.u-tokyo.ac.jp> This is because of how "==" is defined for arrays. 
For lists, list1==list2 if all elements are the same; a boolean value is returned:

>>> x = [0,1,0]
>>> x==[0,1,0]
True
>>> x==[1,0,0]
False

For arrays, "==" does an element-wise comparison:

>>> from Numeric import *
>>> x = array([0,1,0])
>>> x==array([0,1,0])
array([1, 1, 1])
>>> x==array([1,0,0])
array([0, 0, 1])
>>>

Now, when you count how often array([0,1,0]) appears in b, actually you evaluate element==array([0,1,0]) for each element in b, and count how often you get a True, with every array other than array([0,0,0]) regarded as True. For list a, this happens to work because array([0,1]) and array([1,0]) have no elements in common. But in this case:

>>> a = [array([0,0]),array([0,0]),array([0,1]),array([0,1])]
>>> a
[array([0, 0]), array([0, 0]), array([0, 1]), array([0, 1])]
>>> a.count(array([0,0]))
4

you also get the non-intuitive answer 4. An easy way to get this to work is to use lists instead of arrays:

>>> b = [[0,1,0], [0,1,0], [1,0,0], [1,0,0]]
>>> b.count([0,1,0])
2

But I don't know if this solution is suitable for your application.

--Michiel.

Darren Dale wrote:
> Hi,
>
> I have a list of numeric-23.8 arrays:
>
> a = [array([0,1]),
> array([0,1]),
> array([1,0]),
> array([1,0])]
>
> b = [array([0,1,0]),
> array([0,1,0]),
> array([1,0,0]),
> array([1,0,0])]
>
> and I want to make a new list out of b:
>
> c = [array([0,1,2]),
> array([1,0,2])]
>
> where the last index in each array is the result of
>
> b.count([0,1,0]) # or [1,0,0]
>
> The problem is that the result of b.count(array([1,0,0])) is 4, not 2, and
> b.remove(array([1,0,0])) indiscriminately removes arrays from the list.
> a.count and a.remove work the way I expected.
>
> Does anyone know why 1x2 arrays work, but 1x3 or larger arrays do not?
>
> Thanks,
> Darren
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From stephen.walton at csun.edu Sun Mar 27 21:00:15 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Sun Mar 27 21:00:15 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <424612E1.2020908@sympatico.ca>
References: <4245C512.1030701@csun.edu> <424612E1.2020908@sympatico.ca>
Message-ID: <42478F28.7020501@csun.edu>

Colin J. Williams wrote:
> The following seems to show that the default data type for the
> numarray elements is Int32:

It is, and I thought my original message said that. I was talking about Numeric3, where the default type for zeros() is 'd' (Float64 in numarray parlance).
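Coming back to the searching-a-list question above, a sketch of one way to count whole-array matches without converting to lists; the helper name is invented here, and it relies on Numeric's element-wise comparison:

    import Numeric

    def count_arrays(seq, target):
        # An array matches only if shapes agree and every element is equal.
        n = 0
        for item in seq:
            if item.shape == target.shape and \
               Numeric.alltrue(Numeric.equal(item, target)):
                n = n + 1
        return n

    b = [Numeric.array([0,1,0]), Numeric.array([0,1,0]),
         Numeric.array([1,0,0]), Numeric.array([1,0,0])]
    print count_arrays(b, Numeric.array([0,1,0]))   # 2, as intended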
From pearu at scipy.org Mon Mar 28 01:24:09 2005 From: pearu at scipy.org (Pearu Peterson) Date: Mon Mar 28 01:24:09 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: On Sun, 27 Mar 2005, Michiel Jan Laurens de Hoon wrote: > Having a separate scipy_distutils that fixes some bugs in Python's distutils > is a design mistake in SciPy that we should not repeat in Numeric3. Not that > I don't think the code in scipy_distutils is not useful -- I think it would > be very useful. But the fact that it is not integrated with the existing > Python distutils makes me wonder if this package really has been thought out > that well. I don't think that part of scipy_distutils design was to fix Python's distutils bugs. As we found a bug, its fix was added to scipy_distutils as well as reported to distutils bug tracker. The main reason for adding bug fixes to scipy_distutils was to continue the work with scipy instead of waiting for the next distutils release (i.e. Python release), nor we could expect that SciPy users would use CVS version of Python's distutils. Also, SciPy was meant to support Python 2.1 and up, so the bug fixes remained relevant even when the bugs were fixed in Python 2.2 or 2.3 distutils. So much of history.. > As far as I can tell, scipy_distutils now fulfills four functions: > 1) Bug fixes for Python's distutils for older Python versions. As Numeric3 > will require Python 2.3 or up, these are no longer relevant. > 2) Bug fixes for current Python's distutils. These should be integrated with > Python's distutils. Writing your own package instead of contributing to > Python gives you bad karma. > 3) Fortran support. Very useful, and I'd like to see them in Python's > distutils. Another option would be to put this in SciPy.fortran or something > similar. But since Python's distutils already has a language= option for C++ > and Objective-C, the cleanest way would be to add this to Python's distutils > and enable language="fortran". > 4) Stuff particular to SciPy, for example finding Atlas/Lapack/Blas > libraries. These we can decide on a case-by-case basis if it's useful for > Numeric3. Plus I would add the scipy_distutils ability to build sources on-fly feature (build_src command). That's a very fundamental feature useful whenever swig or f2py is used, or when building sources from templates or dynamically during a build process. Btw, I have started scipy_core clean up. The plan is to create the following package tree under Numeric3 source tree: scipy.distutils - contains cpuinfo, exec_command, system_info, etc scipy.distutils.fcompiler - contains Fortran compiler support scipy.distutils.command - contains build_src and config_compiler commands plus few enhancements to build_ext, build_clib, etc commands scipy.base - useful modules from scipy_base scipy.testing - enhancements to unittest module, actually current scipy_test contains one useful module (testing.py) that could also go under scipy.base and so getting rid of scipy.testing scipy.weave - scipy.f2py - not sure yet how to incorporate f2py2e or weave sources here. As a first instance people are assumed to download them to Numeric3/scipy/ directory but in future their sources could be added to Numeric3 repository. For Numeric3 f2py and weave are optional. 
scipy.lib.lapack - wrappers to Atlas/Lapack libraries, by default f2c generated wrappers are used as in current Numeric. For backwards compatibility, there will be Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ and Lib/{LinearAlgebra,..}.py under Numeric3 that will use modules from scipy. Pearu From oliphant at ee.byu.edu Mon Mar 28 01:32:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 01:32:12 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <20050328020731.85506.qmail@web50202.mail.yahoo.com> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> Message-ID: <4247CEC9.1030903@ee.byu.edu> Scott, Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things. I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there). >However, other PyBufferProcs objects like array.array will never allow >themselves to be wrapped by a __builtins__.bytes since they realloc their >memory and violate the promises that the __builtins__.bytes object makes. >I think you disagree with me on this part, so more on that later in this >message. > > I think I agree with you: array.array shouldn't allow itself to by wrapped by a bytes object because it reallocates without tracking what it's shared. >Another consideration that might sway you is that the existing >N-Dimensional array packages could easily add attribute methods to >implement the interface, and they could do this without changing any part >of their implementation. The .data attribute when requested would call a >"get method" that returns a buffer. This allows user defined objects which >do not implement the PyBufferProcs protocol themselves, but which contain a >buffer inside of them to participate in the "ndarray protocol". Both >version_two and version_three do not allow this - the object being passed >must *be* a buffer. > > I am not at all against the ndarray protocol you describe. In fact, I'm quite a fan. I think we should start doing it, now. I was just wondering if adding attributes to the bytes object was useful in any case. Your arguments have persuaded me that it is not worth the trouble. Underscore names are a good idea. We already have __array__ which is a protocol for returning an array object: Currently Numeric3 already implements this protocol minus name differences. So, let's come up with names. I'm happy with __array__XXXXX type names as it does dovetail nicely with the already established __array__ name which Numeric3 expects will return an actual array object. As I've already said, it would be easy to check for the more specialized attributes at object creation time to boot-strap an array from an arbitrary object. In addition, to what you state. Why not also have the protocol look at the object itself to expose the PyBufferProcs protocol if it doesn't expose a .__array__data method? >The reference count on the PyObject pointer is different than the number of >users using the memory. In Python you could have: > > Your examples explaining this are good, but I did realize this, that's why I stated that the check in arr.resize is overkill and will disallow situations that could actually work. 
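A small sketch to make that distinction concrete (the counts are indicative; sys.getrefcount includes its own temporary argument reference):

    import sys, array

    a = array.array('d', [1.0])
    b = a              # a second reference...
    d = {'key': a}     # ...and a third, via a dictionary
    # Three references exist, yet no C code holds a raw pointer into the
    # buffer, so a reallocating resize would still be perfectly safe here.
    print sys.getrefcount(a)    # typically prints 4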
Do you think the Numeric3 arrayobject should have a "memory pointer count" added to the PyArrayObject structure? >Please don't think I'm offering you resistance. I'm only trying to point >out some things that I think you might have overlooked. Lots of people >ignore my suggestions all the time. You'd be in good company if you did >too, and I wouldn't even hold a grudge against you. > > I very much appreciate the pointers. I had overlooked some things and I believe your suggestions are better. > class Numarray: > # > # lots of array implementing code > # > > # Down here at the end, add the "well-known" interface > # (I haven't embraced the @property decorator syntax yet.) > > def __get_shape(self): > return self._shape > __array_shape__ = property(__get_shape) > > def __get_data(self): > # Note that they use a different name internally > return self._buffer > __array_data__ = property(__get_data) > > def __get_itemtype(self): > # Perform an on the fly conversion from the class > # hierarchy type to the struct module typecode that > # closest matches > return self._type._to_typecode() > __array_itemtype__ = property(__get_itemtype) > > >Changing class Numarray to a PyBufferProcs supporting object would be >harder. > > I think they just did this, though... >The C version for Numeric3 arrays would be similar, and there is no wasted >space on a per instance basis in either case. > > Doing this in C would be extremely easy a simple binding of a name to an already available function (and disallowing any set attribute). >The real advantage to the struct module typecodes comes in two forms. >First and most important is that it's already documented and in place - a >defacto standard. Second is that Python script code could use those >typecodes directly with the struct module to pull apart pieces of data. >The disadvantage is that a few new typecodes would be needed... > > >I would even go as far as to recommend their '>' '<' prefix codes for >big-endian and little-endian for just this reason... > > Hmm.. an interesting idea. I don't know if I agree or not. >This would be wonderful. Third party libraries could produce data that is >sufficiently ndarray like without hassle, and users of that library could >promote it to a Numeric3 array with no headaches. > > >By the way, it looks like the "bytes" concept has been revisited recently. >there is a new PEP dated Aug 11, 2004: > > http://www.python.org/peps/pep-0332.html > > Thanks for the pointer. >Thanks for your attention and patience with me on this. I really >appreciate the work you are doing. I wish I could explain my understanding >of things more clearly. > > As I said before, you do a really good job of explaining. I'm pretty much on your side now :-) Let's go ahead and get some __array__XXXXX attribute names decided on. I'll put them in the Numeric3 code base (I could also put them in old Numeric and make a 24.0 release as well --- I need to do that because of a horrible bug in the new empty method: Numeric.empty(, 'O'). 
-Travis From oliphant at ee.byu.edu Mon Mar 28 01:39:02 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 01:39:02 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: <4247D072.3090406@ee.byu.edu> > Plus I would add the scipy_distutils ability to build sources on-fly > feature (build_src command). That's a very fundamental feature useful > whenever swig or f2py is used, or when building sources from templates > or dynamically during a build process. I'd like to use this feature in Numeric3 (which has code-generation). > > Btw, I have started scipy_core clean up. The plan is to create the > following package tree under Numeric3 source tree: This is great news. I'm thrilled to have Pearu's help in doing this. He understands a lot of these issues very well. I'm sure he will be open to suggestions. > > scipy.distutils - contains cpuinfo, exec_command, system_info, etc > scipy.distutils.fcompiler - contains Fortran compiler support > scipy.distutils.command - contains build_src and config_compiler > commands plus few enhancements to build_ext, build_clib, etc > commands > scipy.base - useful modules from scipy_base > scipy.testing - enhancements to unittest module, actually > current scipy_test contains one useful module (testing.py) that > could also go under scipy.base and so getting rid of scipy.testing > scipy.weave - > scipy.f2py - not sure yet how to incorporate f2py2e or weave sources > here. As a first instance people are assumed to download them to > Numeric3/scipy/ directory but in future their sources could be added > to Numeric3 repository. For Numeric3 f2py and weave are optional. > scipy.lib.lapack - wrappers to Atlas/Lapack libraries, by default > f2c generated wrappers are used as in current Numeric. > > For backwards compatibility, there will be > Packages/{FFT,MA,RNG,dotblas+packages from numarray}/ > and > Lib/{LinearAlgebra,..}.py > under Numeric3 that will use modules from scipy. This looks like a good break down. Where will the ndarray object and the ufunc code go in this breakdown? In scipy.base? -Travis From konrad.hinsen at laposte.net Mon Mar 28 02:25:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:25:04 2005 Subject: [Numpy-discussion] Re: Trying out Numeric3 In-Reply-To: <4244963B.5010103@csun.edu> References: <20050323105807.59603.qmail@web50208.mail.yahoo.com> <4241C781.8080001@ee.byu.edu> <4244963B.5010103@csun.edu> Message-ID: <79bf1f715a3fa3de61ca8ebf45cd6c0f@laposte.net> On 25.03.2005, at 23:52, Stephen Walton wrote: > where str is a string is not allowed because strings are immutable. > But if I type "b=7" followed by "b=3", do I really care whether the 3 > gets stuck in the same memory location previously occupied by the 7 > (mutable) or the symbol b points to a new location containing a 3 > (immutable)? What are some circumstances where this might matter? > The most important one in practice is a = some_array[0] b = a a += 3 If the indexing operation returns a scalar (immutable), then "a" and "b" will have different values. If it returns a rank-0 (mutable), then "a" and "b" will be the same. This matters for code that is written with scalars in mind and which then gets fed rank-0 arrays as arguments. Konrad. 
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Mon Mar 28 02:30:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:30:07 2005 Subject: [Numpy-discussion] zeros() default type in Numeric3 In-Reply-To: <4245C512.1030701@csun.edu> References: <4245C512.1030701@csun.edu> Message-ID: <57e36543b3cad1b8680667aa61f5166c@laposte.net> On 26.03.2005, at 21:24, Stephen Walton wrote: > zeros() in Numeric3 defaults to typecode='d' while in numarray it > defaults to typecode=None, which in practice means 'i' by default. Is > this deliberate? Is this desirable? I'd vote for zeros(), ones() and > the like to default to 'i' or 'f' rather than 'd' in the interest of > space and speed. My main argument is a different one: consistency. I see zeros() as an array constructor, a shorthand for calling array() with an explicit list argument. From that point of view, zeros((n,)) should return the same value as array(n*[0]) i.e. an integer array. If people feel a need for a compact float-array generator, I'd rather have an additional function "fzeros()" than a modification of zeros(), whose behaviour in current Numeric and numarray is both consistent and well established. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Mon Mar 28 02:32:13 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Mar 28 02:32:13 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: <42464443.8050402@ims.u-tokyo.ac.jp> References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> Message-ID: On 27.03.2005, at 07:27, Michiel Jan Laurens de Hoon wrote: > 3) Fortran support. Very useful, and I'd like to see them in Python's > distutils. Another option would be to put this in SciPy.fortran or > something similar. But since Python's distutils already has a > language= option for C++ and Objective-C, the cleanest way would be to > add this to Python's distutils and enable language="fortran". I agree in principle, but I wonder how stable the Fortran support in SciPy distutils is. If it contains compiler-specific data, then it might not be a good idea to restrict modifications and additions to new Python releases. Konrad. 
-- 
------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
------------------------------------------------------------------------------

From mdehoon at ims.u-tokyo.ac.jp Mon Mar 28 02:58:17 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 28 02:58:17 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: 
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp>
Message-ID: <4247E431.5050006@ims.u-tokyo.ac.jp>

Pearu Peterson wrote:
> Btw, I have started scipy_core clean up. The plan is to create the
> following package tree under Numeric3 source tree:
> ...
>
> For backwards compatibility, there will be
> Packages/{FFT,MA,RNG,dotblas+packages from numarray}/
> and
> Lib/{LinearAlgebra,..}.py
> under Numeric3 that will use modules from scipy.

Just for clarification: Is this scipy_core or Numeric3 that you're working on? Or are they the same?

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From pearu at scipy.org Mon Mar 28 05:21:18 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Mon Mar 28 05:21:18 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <4247E431.5050006@ims.u-tokyo.ac.jp>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp>
Message-ID: 

On Mon, 28 Mar 2005, Michiel Jan Laurens de Hoon wrote:

> Pearu Peterson wrote:
>> Btw, I have started scipy_core clean up. The plan is to create the
>> following package tree under Numeric3 source tree:
>> ...
>>
>> For backwards compatibility, there will be
>> Packages/{FFT,MA,RNG,dotblas+packages from numarray}/
>> and
>> Lib/{LinearAlgebra,..}.py
>> under Numeric3 that will use modules from scipy.
>
> Just for clarification: Is this scipy_core or Numeric3 that you're working
> on? Or are they the same?

The idea was to merge tools from scipy_core (that basically contains scipy_distutils and scipy_base) to Numeric3. The features of scipy_distutils have been stated in previous messages, some of these features will be used to build Numeric3. scipy_base contains enhancements to Numeric (now to be a natural part of Numeric3) plus a few useful python modules. Which scipy_core modules exactly should be included in Numeric3 or left out of it depends on how crucial they are for building/maintaining Numeric3 and whether they are useful in general for Numeric3 users. This is completely open for discussion. No part of scipy_core should be blindly copied to the Numeric3 project.
Pearu From mdehoon at ims.u-tokyo.ac.jp Mon Mar 28 05:31:25 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 28 05:31:25 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> Message-ID: <424807D9.7090800@ims.u-tokyo.ac.jp> Pearu Peterson wrote: > > The idea was merge tools from scipy_core (that basically contains > scipy_distutils and scipy_base) to Numeric3. The features of > scipy_distutils have been stated in previous messages, some of these > features will be used to build Numeric3. scipy_base contains > enhancements to Numeric (now to be natural part of Numeric3) plus few > useful python modules. Which scipy_core modules exactly should be > included to Numeric3 or left out of it, depends on how crusial are they > for building/maintaining Numeric3 and whether they are useful in general > for Numeric3 users. This is completely open for discussion. No part of > scipy_core should be blindly copied to Numeric3 project. > Sounds good to me. Thanks, Pearu. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From faltet at carabos.com Mon Mar 28 07:17:10 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Mar 28 07:17:10 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <4247CEC9.1030903@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID: <200503281713.33850.faltet@carabos.com> Hi Travis, Scott, I've been following your discussions and I'm very happy that Travis has finally decided to go with adopting the bytes object in Numeric3. It's also very important that from the discussions, you finally reached an almost complete agreement on how to support the __array__ protocol. I do think that this idea is both very simple and powerful. I do hope this would be a *major* step towards interchanging data between differents applications and packages and, perhaps, this would render almost a non-sense the final goal of including a specific ndarray object in the Python standard library: this simply should be not necessary at all! A Dilluns 28 Mar? 2005 11:30, Travis Oliphant va escriure: [snip] > As I've already said, it would be easy to check for the more specialized > attributes at object creation time to boot-strap an array from an > arbitrary object. [snip] > Let's go ahead and get some __array__XXXXX attribute names decided on. > I'll put them in the Numeric3 code base (I could also put them in old > Numeric and make a 24.0 release as well --- I need to do that because of > a horrible bug in the new empty method: Numeric.empty(, 'O'). Very nice! From what you stated above I deduce that you will be including a case in the Numeric.array constructor so that it can create a properly defined array if the sequence that is passed to it fulfils the __array__ protocol. 
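As a sketch of the sort of object such a protocol-aware constructor could accept - the double-underscore spellings below are the ones being floated in this thread, and were not yet final at this point:

    import array

    class Wrapped(object):
        # Not an array and not a buffer itself; it merely advertises
        # where its data lives and how to interpret it.
        def __init__(self):
            self._buf = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

        __array_shape__ = property(lambda self: (2, 3))
        __array_itemtype__ = property(lambda self: 'd')
        __array_data__ = property(lambda self: self._buf)

    w = Wrapped()
    # A protocol-aware array() could now build a 2x3 Float64 array from w
    # without going through an intermediate Python list.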
In addition, if the numarray people would be willing to do the same thing, I envision a very easy (and very efficient) way to convert from/to Numeric to/from numarray (until Numeric3 would be ready for production), something like:

NumericArray = Numeric.array(numarrayArray)
numarrayArray = numarray.array(NumericArray)

Internally, one should decide which is the optimum way to convert from one object to the other. Based on suggestions from Todd Miller on how to do this as efficiently as possible, I have arrived at the conclusion that the following conversions are the most efficient ones:

In [69]:na = numarray.arange(100*1000,shape=(100,1000))
In [70]:num = Numeric.arange(100*1000);num=num.resize((100,1000))

In [72]:t1=time();num2=Numeric.fromstring(na._data, typecode=na.typecode());num2=num2.resize(na.shape);time()-t1
Out[72]:0.0017759799957275391
In [73]:t1=time();na2=numarray.fromstring(num.tostring(),type=num.typecode(),shape=num.shape);time()-t1
Out[73]:0.0039050579071044922

Both ways, although very efficient, still copy the data area in the conversion process. In the future, when Numeric3 supports the bytes object, there will be no copy of memory at all for interchanging data with another package (i.e. numarray). Until then, the __array__ protocol may contribute to sharing data (well, at least contiguous data) efficiently between applications right now.

A big thanks to Scott for suggesting and heartily defending the bytes object and to Travis for unrecklessly becoming a convert. We, the developers of extensions, will be grateful forever :-)

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From rkern at ucsd.edu Mon Mar 28 07:44:18 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Mon Mar 28 07:44:18 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <4245C512.1030701@csun.edu>
References: <4245C512.1030701@csun.edu>
Message-ID: <4248262D.5060407@ucsd.edu>

Stephen Walton wrote:
> zeros() in Numeric3 defaults to typecode='d' while in numarray it
> defaults to typecode=None, which in practice means 'i' by default. Is
> this deliberate? Is this desirable? I'd vote for zeros(), ones() and
> the like to default to 'i' or 'f' rather than 'd' in the interest of
> space and speed.

For zeros() and ones(), I don't think space and speed are going to be affected by the default typecode. In my use of these functions, I almost always need a specific typecode. If I use the default, it's because I actually need the default typecode. Unfortunately, I almost always want Float and not Int, so all of my code is littered with

    zeros(shape, Float)

I'll bet Travis's code looks the same.

I would *love* to be able to spell these things like

    Float.zeros(shape)
    UInt8.ones(shape)
    Complex32.array(other)
    ...

Then we could leave zeros() and ones() defaults as they are for backwards compatibility, and deprecate the functions.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From perry at stsci.edu Mon Mar 28 08:18:16 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Mar 28 08:18:16 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <4248262D.5060407@ucsd.edu>
References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu>
Message-ID: <09347872476f2c45aaa5d80d2c856088@stsci.edu>

On Mar 28, 2005, at 10:43 AM, Robert Kern wrote:
> Stephen Walton wrote:
>> zeros() in Numeric3 defaults to typecode='d' while in numarray it
>> defaults to typecode=None, which in practice means 'i' by default.
>> Is this deliberate? Is this desirable? I'd vote for zeros(), ones()
>> and the like to default to 'i' or 'f' rather than 'd' in the interest
>> of space and speed.
>
> For zeros() and ones(), I don't think space and speed are going to be
> affected by the default typecode. In my use of these functions, I
> almost always need a specific typecode. If I use the default, it's
> because I actually need the default typecode. Unfortunately, I almost
> always want Float and not Int, so all of my code is littered with
>
> zeros(shape, Float)
>
> I'll bet Travis's code looks the same.
>
> I would *love* to be able to spell these things like
>
> Float.zeros(shape)
> UInt8.ones(shape)
> Complex32.array(other)
> ...

This is an odd thought but why not:

Float(shape) # defaults to 0
UInt(shape, value=1)

I forget if it was proposed to make the type object a constructor for arrays in which case this may conflict with the usage of converting the argument of the callable form to an array, i.e.,

Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name of the type parameter becomes

From faltet at carabos.com Mon Mar 28 08:52:18 2005
From: faltet at carabos.com (Francesc Altet)
Date: Mon Mar 28 08:52:18 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503281713.33850.faltet@carabos.com>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <200503281713.33850.faltet@carabos.com>
Message-ID: <200503281847.15157.faltet@carabos.com>

A Dilluns 28 Març 2005 17:13, Francesc Altet va escriure:
[snip]
> Based on suggestions from Todd Miller on how
> to do this as efficiently as possible, I have arrived at the
> conclusion that the following conversions are the most efficient ones:
>
> In [69]:na = numarray.arange(100*1000,shape=(100,1000))
> In [70]:num = Numeric.arange(100*1000);num=num.resize((100,1000))
>
> In [72]:t1=time();num2=Numeric.fromstring(na._data,
> typecode=na.typecode());num2=num2.resize(na.shape);time()-t1
> Out[72]:0.0017759799957275391
> In [73]:t1=time();na2=numarray.fromstring(num.tostring(),type=num.typecode(),shape=num.shape);time()-t1
> Out[73]:0.0039050579071044922

Er, sorry, there is in fact a more efficient way to convert from a Numeric object to a numarray object that doesn't require any data copy at all. This is:

In [212]:num=Numeric.arange(100*1000, typecode="i");num=num.resize((100,1000))
In [213]:num[0,:5]
Out[213]:array([0, 1, 2, 3, 4],'i')
In [214]:t1=time();na2=numarray.array(numarray.memory.writeable_buffer(num),type=num.typecode(),shape=num.shape);time()-t1
Out[214]:0.0001010894775390625 # takes just 100 us!
In [215]:na2[0,4] = 1 # modify a cell
In [216]:num[0,:5]
Out[216]:array([0, 1, 2, 3, 1],'i')
In [217]:na2[0,:5]
Out[217]:array([0, 1, 2, 3, 1])
# na2 has been modified as well, so the
# data area is shared between num and na2

In fact, its speed is independent of the array size (as it should be for a non-data-copying procedure):

# Create a Numeric object 10x larger
In [218]:num=Numeric.arange(1000*1000, typecode="i");num=num.resize((1000,1000))
In [219]:t1=time();na2=numarray.array(numarray.memory.writeable_buffer(num),type=num.typecode(),shape=num.shape);time()-t1
Out[219]:0.00010204315185546875 # 100 us again!

This is because numarray has chosen to use a buffer object internally, and the Numeric object can be wrapped by a buffer object without any actual data copy. That drives me to think that, if the bytes object (that seems to be implemented by Numeric3) could wrap the buffer object where numarray objects hold their data, the conversion between Numeric3 <--> numarray (or, in general, between those packages that deal with bytes objects and other packages that deal with buffer objects) can be done with a cost of 1 (that is, independent of the data size). If this cannot be done (I mean, to get a safe bytes object from a buffer object and vice-versa), well, it would be a pity. Do you think that would be possible at all?

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From faltet at carabos.com Mon Mar 28 09:03:47 2005
From: faltet at carabos.com (Francesc Altet)
Date: Mon Mar 28 09:03:47 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <09347872476f2c45aaa5d80d2c856088@stsci.edu>
References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu>
Message-ID: <200503281902.42770.faltet@carabos.com>

A Dilluns 28 Març 2005 18:18, Perry Greenfield va escriure:
> This is an odd thought but why not:
>
> Float(shape) # defaults to 0
> UInt(shape, value=1)
>
> I forget if it was proposed to make the type object a constructor for
> arrays in which case this may conflict with the usage of converting the
> argument of the callable form to an array, i.e.,
>
> Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name
> of the type parameter becomes

Well, why not:

Array(shape, type=Float, defvalue=None)

In the end, all three parameters are used to uniquely determine the Array object. Moreover, "defvalue = None" would be a synonym for the recently introduced "empty" factory. However, this looks suspiciously similar to the "array" factory. Perhaps it would be nice to add this "defvalue" or "value" parameter to the "array" factory and that's all.

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From rkern at ucsd.edu Mon Mar 28 09:38:22 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Mon Mar 28 09:38:22 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <200503281902.42770.faltet@carabos.com>
References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com>
Message-ID: <42483F5C.7050002@ucsd.edu>

Francesc Altet wrote:
> A Dilluns 28 Març 2005 18:18, Perry Greenfield va escriure:
>
>> This is an odd thought but why not:
>>
>> Float(shape) # defaults to 0
>> UInt(shape, value=1)
>>
>> I forget if it was proposed to make the type object a constructor for
>> arrays in which case this may conflict with the usage of converting the
>> argument of the callable form to an array, i.e.,
>>
>> Float((2,3)) --> array([2.,3.], typecode=Float) # or whatever the name
>> of the type parameter becomes
>
> Well, why not:
>
> Array(shape, type=Float, defvalue=None)
>
> In the end, all three parameters are used to uniquely determine the
> Array object. Moreover, "defvalue = None" would be a synonym for the
> recently introduced "empty" factory.

My thought was to not deal with typecode keywords at all and put more responsibility on the typecode objects for general constructor-type operations. In this vein, though, I suggest the spelling

    Float.new(shape, value=None)  # empty
    Float.new(shape, value=0)     # zeros
    Float.new(shape, value=1)     # ones

value defaults to None.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From stephen.walton at csun.edu Mon Mar 28 09:39:29 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Mon Mar 28 09:39:29 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <57e36543b3cad1b8680667aa61f5166c@laposte.net>
References: <4245C512.1030701@csun.edu> <57e36543b3cad1b8680667aa61f5166c@laposte.net>
Message-ID: <424840F1.4000500@csun.edu>

konrad.hinsen at laposte.net wrote:
> My main argument is a different one: consistency.
>
> I see zeros() as an array constructor, a shorthand for calling
> array() with an explicit list argument.

Ah, but array(10*[0]) returns an integer array, and array(10*[0.]) returns a double. Which should zeros() be equivalent to?

From xscottg at yahoo.com Mon Mar 28 10:30:22 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Mar 28 10:30:22 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <4247CEC9.1030903@ee.byu.edu>
Message-ID: <20050328182929.50411.qmail@web50205.mail.yahoo.com>

--- Travis Oliphant wrote:
>
> Thank you for your detailed explanations. This is starting to make more
> sense to me. It is obvious that you understand what we are trying to
> do, and I pretty much agree with you in how you think it should be
> done. I think you do a great job of explaining things.
>
> I agree we should come up with a set of names for the interface to
> arrayobjects. I'm even convinced that offset should be an optional part
> of the interface (implied 0 if it's not there).
>

Very cool! You just made my day.

I wish I had time to do a good writeup, but I need to catch a flight in a couple hours, and I won't be back behind my computer until Wednesday night. Here is an initial stab:

__array_shape__
    Required, a sequence (typically tuple) of non-negative int/longs

__array_storage__
    Required, a buffer or possibly sequence object (list)

    (Required unless the object supports PyBufferProcs directly? I
    don't have a strong opinion on that one...)

    A slightly different name to indicate it could be a buffer or
    sequence object (like a list). Typically buffer.

__array_itemtype__
    Suggested, but Optional if __array_itemsize__ is present.

    This attribute probably warrants some discussion... A struct module
    format string or one of the additional ones that needs to be added.
    Need to discuss "long double" and "Object". (Capital 'O' for
    Object, Capital 'D' for long double, Capital 'X' for bit?)

    If not present or the empty string '', indicates that the array
    elements can only be treated as blobs and the real data
    representation must be gotten from some other means.

    I think doubling the typecode as a convention to denote complex
    numbers makes some sense (for instance 'ff' is complex float). The
    struct module convention for denoting native, portable big endian,
    and portable little endian is concise and documented.
(Capital 'O' for Object, Captial 'D' for long double, Capital 'X' for bit?) If not present or the empty string '', indicates that the array elements can only be treated as blobs and the real data representation must be gotten from some other means. I think doubling the typecode as a convention to denote complex numbers makes some sense (for instance 'ff' is complex float). The struct module convention for denoting native, portable big endian, and portable little endian is concise and documented. __array_itemsize__ Optional if __array_itemtype is present and the value can calculated from struct.calcsize(__array_itemtype__) __array_strides__ Optional if the array data is in a contiguous C layout. Required otherwise. Same length as __array_shape__. Indicates how much to multiply subscripts by to get to the desired position in the storage. A sequence (typically tuple) of ints/longs. These are in byte offsets (not element_size offsets) for most arrays. Special exceptions made for: Tightly packed (8 bits to a byte) bitmask arrays, where they offsets are bit indexes PyObject arrays (lists) where the offsets are indexes They should be byte offsets to handle non-aligned data or data with odd packing. Fortran arrays might be common enough to warrant special casing. We could discuss whether a __array_fortran__ attribute indicates that the array is in contiguous Fortran layout __array_offset__ Optional and defaults to zero. An int/long indicating the offset to treat as the zeroth element __array_complicated__ Optional and defaults to zero/false. This is a kluge to indicate that while yes the data is an array, the storage layout can not be easily described by the shape/strides/offset combination alone. This could warrant some discussion. __array_fortran__ Optional and defaults to zero/false. If you want to represent Fortran arrays without creating a strides for them, this would be necessary. I'd vote to leave it out and stick with strides... These are all just suggestions. Is something important missing? Predicates like iscontiguous(a) and isfortran(a) can all be easily determined from the above. The ndims or rank is simply len(a.__array_shape__). I wish I had more time to respond to some of the other things in your message, but I'm gone until Wednesday night... Cheers, -Scott From oliphant at ee.byu.edu Mon Mar 28 12:05:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Mar 28 12:05:08 2005 Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3 In-Reply-To: References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> Message-ID: <42486323.9080801@ee.byu.edu> >> Just for clarification: Is this scipy_core or Numeric3 that you're >> working on? Or are they the same? > > > The idea was merge tools from scipy_core (that basically contains > scipy_distutils and scipy_base) to Numeric3. The features of > scipy_distutils have been stated in previous messages, some of these > features will be used to build Numeric3. scipy_base contains > enhancements to Numeric (now to be natural part of Numeric3) plus few > useful python modules. Which scipy_core modules exactly should be > included to Numeric3 or left out of it, depends on how crusial are > they for building/maintaining Numeric3 and whether they are useful in > general for Numeric3 users. This is completely open for discussion. 
No > part of scipy_core should be blindly copied to Numeric3 project. My understanding is that scipy_core and Numeric3 are the same thing. I'm using the terminology Numeric3 in emails to avoid confusion, but I would rather see one package emerge from this like scipy_core. I would prefer not to have a "Numeric3" package and a separate "scipy_core" package, unless there is a good reason to have two packages. -Travis From perry at stsci.edu Mon Mar 28 13:54:10 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Mar 28 13:54:10 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: <4247CEC9.1030903@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> Message-ID: On Mar 28, 2005, at 4:30 AM, Travis Oliphant wrote: > Scott, > > Thank you for your detailed explanations. This is starting to make > more sense to me. It is obvious that you understand what we are > trying to do, and I pretty much agree with you in how you think it > should be done. I think you do a great job of explaining things. > I agree we should come up with a set of names for the interface to > arrayobjects. I'm even convinced that offset should be an optional > part of the interface (implied 0 if it's not there). > Just to add my two cents, I don't think I ever thought it was necessary to bundle the metadata with the memory object for the reasons Scott outlined. It isn't needed functionally, and there are cases where the same memory may be used in different contexts (as is done with our record arrays). Numarray, when it uses the buffer object, always gets a fresh pointer for the buffer object for every data access. But Scott is right that that pointer is good so long as there isn't a chance for something else to change it. In practice, I don't think that ever happens with the buffers that numarray happens to use, but it's still a flaw of the current buffer object that there is no way to ensure it won't change. I'm not sure how the support for large data sets should be handled. I generally think that it will be very awkward to handle these until Python does as well. Speaking of which... I had been in occasional contact with Martin von Loewis about his work to update Python to handle 64-bit addressing. We weren't planning to handle this in nummarray (nor Numeric3, right Travis or do I have that wrong?) until Python did. A few months ago Martin said he was mostly done. I had a chance to talk to him at Pycon about where that work stood. Unfortunately, it is not turning out to be as easy as he hoped. This is too bad. I have a feeling that this work is going to stall without help on our (numpy community) part to help make the changes or drum beating to make it a higher priority. At the moment the Numeric3 effort should be the most important focus, but I think that after that, this should become a high priority. Perry From perry at stsci.edu Mon Mar 28 14:04:24 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Mar 28 14:04:24 2005 Subject: [Numpy-discussion] What is Numeric3 anyway? In-Reply-To: <4246732D.2080908@ims.u-tokyo.ac.jp> References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> <4246732D.2080908@ims.u-tokyo.ac.jp> Message-ID: <5d690db672d4a406a82e3b8b6c0da541@stsci.edu> On Mar 27, 2005, at 3:47 AM, Michiel Jan Laurens de Hoon wrote: > As far as I can tell from their setup.py, neither Numerical Python nor > numarray currently does code generation on the fly from setup.py. 
From perry at stsci.edu  Mon Mar 28 14:04:24 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Mar 28 14:04:24 2005
Subject: [Numpy-discussion] What is Numeric3 anyway?
In-Reply-To: <4246732D.2080908@ims.u-tokyo.ac.jp>
References: <4244AFB6.40601@ee.byu.edu> <42464639.6050207@ims.u-tokyo.ac.jp> <424655B1.4000503@ee.byu.edu> <4246732D.2080908@ims.u-tokyo.ac.jp>
Message-ID: <5d690db672d4a406a82e3b8b6c0da541@stsci.edu>

On Mar 27, 2005, at 3:47 AM, Michiel Jan Laurens de Hoon wrote:

> As far as I can tell from their setup.py, neither Numerical Python
> nor numarray currently does code generation on the fly from setup.py.
> (This was one of the reasons that I started to worry if Numeric3 is
> more than Numerical Python + numarray).

Numarray definitely does code generation (and so did Numeric originally; eventually the generated code was hand-edited). Code generation is the way to go (with C, anyway).

Perry

From rkern at ucsd.edu  Mon Mar 28 14:18:09 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Mon Mar 28 14:18:09 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To:
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp>
Message-ID: <42488268.3030505@ucsd.edu>

konrad.hinsen at laposte.net wrote:
> On 27.03.2005, at 07:27, Michiel Jan Laurens de Hoon wrote:
>
>> 3) Fortran support. Very useful, and I'd like to see them in Python's
>> distutils. Another option would be to put this in SciPy.fortran or
>> something similar. But since Python's distutils already has a
>> language= option for C++ and Objective-C, the cleanest way would be
>> to add this to Python's distutils and enable language="fortran".
>
> I agree in principle, but I wonder how stable the Fortran support in
> SciPy distutils is. If it contains compiler-specific data, then it
> might not be a good idea to restrict modifications and additions to
> new Python releases.

Case in point: Pearu just added g95 support last week.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From pearu at scipy.org  Mon Mar 28 14:47:13 2005
From: pearu at scipy.org (Pearu Peterson)
Date: Mon Mar 28 14:47:13 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42486323.9080801@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu>
Message-ID:

On Mon, 28 Mar 2005, Travis Oliphant wrote:

>>> Just for clarification: Is this scipy_core or Numeric3 that you're
>>> working on? Or are they the same?
>>
>> The idea was to merge tools from scipy_core (that basically contains
>> scipy_distutils and scipy_base) into Numeric3. The features of
>> scipy_distutils have been stated in previous messages; some of these
>> features will be used to build Numeric3. scipy_base contains
>> enhancements to Numeric (now to be a natural part of Numeric3) plus
>> a few useful Python modules. Which scipy_core modules exactly should
>> be included in Numeric3 or left out of it depends on how crucial
>> they are for building/maintaining Numeric3 and whether they are
>> useful in general for Numeric3 users. This is completely open for
>> discussion. No part of scipy_core should be blindly copied into the
>> Numeric3 project.
>
> My understanding is that scipy_core and Numeric3 are the same thing.
> I'm using the terminology Numeric3 in emails to avoid confusion, but
> I would rather see one package emerge from this, like scipy_core. I
> would prefer not to have a "Numeric3" package and a separate
> "scipy_core" package, unless there is a good reason to have two
> packages.

In that case the ndarray object and ufunc code should go under scipy.base. We can postpone this move until scipy.distutils is ready.
And if I understand you correctly, then

    from scipy.base import *

will replace

    from Numeric import *

or

    from numarray import *

roughly speaking.

Pearu

From oliphant at ee.byu.edu  Mon Mar 28 15:16:20 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Mar 28 15:16:20 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To:
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu>
Message-ID: <42489027.8030903@ee.byu.edu>

Pearu Peterson wrote:

> from scipy.base import *
>
> will replace
>
> from Numeric import *
>
> or
>
> from numarray import *
>
> roughly speaking.
>
> Pearu

This is exactly what I would like to see. We will need, however, to provide that "import Numeric" and friends still work for backward compatibility, but this should be deprecated.

Best,

-Travis
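Such a backward-compatibility layer could be as small as a stub module. Here is a sketch; the module body is an assumption about how the deprecation might be arranged, not an actual file from either package:

    # Numeric.py -- hypothetical backward-compatibility stub
    import warnings
    warnings.warn("'import Numeric' is deprecated; use scipy.base instead",
                  DeprecationWarning)
    from scipy.base import *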
From oliphant at ee.byu.edu  Mon Mar 28 15:26:32 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Mar 28 15:26:32 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To:
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu>
Message-ID: <42489275.7060600@ee.byu.edu>

> Just to add my two cents, I don't think I ever thought it was
> necessary to bundle the metadata with the memory object for the
> reasons Scott outlined. It isn't needed functionally, and there are
> cases where the same memory may be used in different contexts (as is
> done with our record arrays).

I'm glad we've worked that one out.

> Numarray, when it uses the buffer object, always gets a fresh pointer
> for the buffer object for every data access. But Scott is right that
> that pointer is good only so long as there isn't a chance for
> something else to change it. In practice, I don't think that ever
> happens with the buffers that numarray happens to use, but it's still
> a flaw of the current buffer object that there is no way to ensure it
> won't change.

One could see it as a "flaw" in the buffer object, but I prefer to see it as problems with objects that use the PyBufferProcs protocol. It is, at worst, a "limitation" of the buffer interface that should be advertised (in my mind the problem lies with the objects that make use of the buffer protocol and also reallocate memory willy-nilly, since Python does not allow for this). To me, an analogous situation occurs when an extension module writes into memory it does not own and causes a seg-fault. I suppose a casual observer could say this is a Python flaw, but clearly the problem is with the extension object.

It certainly does not mean at all that something like a buffer object should never exist or that the buffer protocol should not be used. I get the feeling sometimes that some naive (to Numeric and numarray) people on python-dev feel that way.

> I'm not sure how the support for large data sets should be handled. I
> generally think that it will be very awkward to handle these until
> Python does as well. Speaking of which...
>
> I had been in occasional contact with Martin von Loewis about his
> work to update Python to handle 64-bit addressing. We weren't
> planning to handle this in numarray (nor Numeric3, right Travis or do
> I have that wrong?) until Python did. A few months ago Martin said he
> was mostly done. I had a chance to talk to him at Pycon about where
> that work stood. Unfortunately, it is not turning out to be as easy
> as he hoped. This is too bad. I have a feeling that this work is
> going to stall without help on our (numpy community) part to help
> make the changes or drum beating to make it a higher priority. At the
> moment the Numeric3 effort should be the most important focus, but I
> think that after that, this should become a high priority.

I would be interested to hear what the problems are. Why can't you just change the protocol, replacing all ints with Py_intptr_t? Is backward compatibility the problem? This seems like it's on the extension-code level (and then only on 64-bit systems), and so it would be easier to force through the change in Python 2.5.

Numeric3 will suffer limitations whenever the sequence protocol is used. We can work around it as much as possible (by not using the sequence protocol whenever possible), but the limitation lies firmly in the Python sequence protocol.

-Travis

From stephen.walton at csun.edu  Mon Mar 28 15:40:17 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Mon Mar 28 15:40:17 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <42483F5C.7050002@ucsd.edu>
References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com> <42483F5C.7050002@ucsd.edu>
Message-ID: <424895BC.7030504@csun.edu>

Robert Kern wrote:

> Float.new(shape, value=None) # empty
> Float.new(shape, value=0)    # zeros
> Float.new(shape, value=1)    # ones

Uhm, my first reaction to this kind of thing is "ugh," but maybe I'm just not thinking in the correct OO mode. Is this any better than zeros() and ones()? For that matter, is it any better than

    x = zeros(shape)
    x = any_old_scalar

Having said that, the main reason I use zeros() in MATLAB is to preallocate space. MATLAB can dynamically grow arrays, so the following is legal:

    x = [];
    for k = 1:100
        x(:,k) = a_vector_of_100_values;
    end

and produces a 100 by 100 array. While legal, it is much faster to preallocate x by changing the first line to "x = zeros(100,100);". Since NumPy arrays can't grow dynamically, perhaps this is a small issue.
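For comparison, the same preallocate-then-fill pattern in Numeric might look like the sketch below; the column values are arbitrary stand-ins:

    from Numeric import zeros, ones, Float

    x = zeros((100, 100), Float)        # preallocate once
    for k in range(100):
        x[:, k] = k * ones(100, Float)  # then fill column by column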
From oliphant at ee.byu.edu  Mon Mar 28 16:00:14 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Mar 28 16:00:14 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com>
Message-ID: <42489A65.2030201@ee.byu.edu>

> I wish I had time to do a good writeup, but I need to catch a flight
> in a couple hours, and I won't be back behind my computer until
> Wednesday night. Here is an initial stab:
>
> __array_shape__
>     Required, a sequence (typically tuple) of non-negative int/longs

Great. I agree.

> __array_storage__
>     Required, a buffer or possibly sequence object (list)
>
>     (Required unless the object supports PyBufferProcs directly?
>     I don't have a strong opinion on that one...)
>
>     A slightly different name to indicate it could be a buffer or
>     sequence object (like a list). Typically buffer.

I prefer __array_data__ (it's a common name for Numeric and numarray; it can be interpreted as a sequence object if desired).

> __array_itemtype__
>     Suggested, but optional if __array_itemsize__ is present.

I say this one defaults to 'V' for void * if not present. And __array_itemsize__ is necessary if it is 'S' (string), 'U' (unicode), or 'V'. I also like __array_typestr__ or __array_typechar__ better as a name.

>     A struct module format string or one of the additional ones that
>     needs to be added. Need to discuss "long double" and "Object".
>     (Capital 'O' for Object, capital 'D' for long double, capital 'X'
>     for bit?)

Don't like 'D' for long double; complex floats are already using it. I'm not sure I like the idea of moving to two-character typecodes at this point, because it indicates more internal changes to Numeric3 (otherwise we have two typecharacter standards, which is not a good thing). What is wrong with 'g' and 'G' for long double and complex long double, respectively?

>     If not present or the empty string '', indicates that the array
>     elements can only be treated as blobs and the real data
>     representation must be gotten by some other means.

Again, a void * type handles this well.

>     The struct module convention for denoting native, portable big
>     endian, and portable little endian is concise and documented.

So, you think we should put the byte-order in the typecharacter interface. Don't know.... could be persuaded.

> __array_itemsize__
>     Optional if __array_itemtype__ is present and the value can be
>     calculated from struct.calcsize(__array_itemtype__)

I think it is only optional if the typechar is not 'S', 'U', or 'V'.

> __array_strides__
>     Optional if the array data is in a contiguous C layout. Required
>     otherwise. Same length as __array_shape__. Indicates how much to
>     multiply subscripts by to get to the desired position in the
>     storage.
>
>     A sequence (typically tuple) of ints/longs. These are in byte
>     offsets (not element_size offsets) for most arrays. Special
>     exceptions made for:
>
>         Tightly packed (8 bits to a byte) bitmask arrays, where the
>         offsets are bit indexes
>
>         PyObject arrays (lists), where the offsets are indexes
>
>     They should be byte offsets to handle non-aligned data or data
>     with odd packing.
>
>     Fortran arrays might be common enough to warrant special casing.
>     We could discuss whether an __array_fortran__ attribute indicates
>     that the array is in contiguous Fortran layout

I don't think it is necessary in the interface.

> __array_offset__
>     Optional and defaults to zero. An int/long indicating the offset
>     to treat as the zeroth element
>
> __array_complicated__
>     Optional and defaults to zero/false. This is a kluge to indicate
>     that while yes the data is an array, the storage layout cannot be
>     easily described by the shape/strides/offset combination alone.
>
>     This could warrant some discussion.

I don't see the utility here, I guess. If it can't be described by a shape/strides combination, then how can it participate in the protocol?

> __array_fortran__
>     Optional and defaults to zero/false. If you want to represent
>     Fortran arrays without creating strides for them, this would be
>     necessary. I'd vote to leave it out and stick with strides...

Me too. We should make the interface as minimal as possible, initially.

My proposal:

__array_data__ (optional object that exposes the PyBuffer protocol or a sequence object; if not present, the object itself is used)
__array_shape__ (required tuple of ints/longs that gives the shape of the array)
__array_strides__ (optional; provides how to step through the memory in bytes (or bits if a bit-array); default is C-contiguous)
__array_typestr__ (optional struct-like string showing the type --- optional endianness indicator + Numeric3 typechars; default is 'V')
__array_itemsize__ (required if the above is 'S', 'U', or 'V')
__array_offset__ (optional offset to the start of the buffer; defaults to 0)

So, you could define an array interface with only two additional attributes if your object exposed the buffer or sequence protocol. We should figure out a way to work around the 32-bit limitations of the sequence and buffer protocols as well.

-Travis
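A consumer of this proposal only ever needs the required attributes plus the stated defaults. A rough sketch of that fallback logic follows; the helper name and the handling of a struct-style byte-order prefix are illustrative assumptions:

    import struct

    def unpack_interface(obj):
        # collect the proposed attributes, applying the stated defaults
        shape   = obj.__array_shape__
        data    = getattr(obj, '__array_data__', obj)
        typestr = getattr(obj, '__array_typestr__', 'V')
        offset  = getattr(obj, '__array_offset__', 0)
        kind = typestr.lstrip('<>=!')          # drop any byte-order prefix
        if kind in ('S', 'U', 'V'):
            itemsize = obj.__array_itemsize__  # required for these kinds
        else:
            itemsize = struct.calcsize(kind)
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:
            # default: C-contiguous, counted in bytes
            strides = [itemsize]
            for dim in shape[:0:-1]:
                strides.insert(0, strides[0] * dim)
            strides = tuple(strides)
        return data, shape, strides, typestr, itemsize, offset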
From oliphant at ee.byu.edu  Mon Mar 28 16:07:08 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Mar 28 16:07:08 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com>
Message-ID: <42489BF7.4040401@ee.byu.edu>

Scott Gilbert wrote:

> __array_itemtype__
>     Suggested, but optional if __array_itemsize__ is present.
>
>     This attribute probably warrants some discussion...
>
>     A struct module format string or one of the additional ones that
>     needs to be added. Need to discuss "long double" and "Object".
>     (Capital 'O' for Object, capital 'D' for long double, capital 'X'
>     for bit?)
>
>     If not present or the empty string '', indicates that the array
>     elements can only be treated as blobs and the real data
>     representation must be gotten by some other means.
>
>     I think doubling the typecode as a convention to denote complex
>     numbers makes some sense (for instance 'ff' is complex float).
>
>     The struct module convention for denoting native, portable big
>     endian, and portable little endian is concise and documented.

After more thought, I think here we need to also allow the "c-type"-independent way of describing an array (i.e. the numarray-introduced 'c4' for a complex-valued 4-byte-itemsize array). So, perhaps __array_ctypestr__ and __array_typestr__ should be two ways to get the information (or overload the __array_typestr__ interface and require consumers to accept either style).

-Travis

From mdehoon at ims.u-tokyo.ac.jp  Mon Mar 28 18:18:44 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 28 18:18:44 2005
Subject: [SciPy-dev] Re: [Numpy-discussion] Trying out Numeric3
In-Reply-To: <42486323.9080801@ee.byu.edu>
References: <423A6F69.8020803@ims.u-tokyo.ac.jp> <4242BA03.5050204@ims.u-tokyo.ac.jp> <42435D18.809@ee.byu.edu> <4243D4A5.9050004@ims.u-tokyo.ac.jp> <42464443.8050402@ims.u-tokyo.ac.jp> <4247E431.5050006@ims.u-tokyo.ac.jp> <42486323.9080801@ee.byu.edu>
Message-ID: <4248BBC3.9080102@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> My understanding is that scipy_core and Numeric3 are the same thing.
> I'm using the terminology Numeric3 in emails to avoid confusion, but
> I would rather see one package emerge from this like scipy_core. I
> would prefer not to have a "Numeric3" package and a separate
> "scipy_core" package, unless there is a good reason to have two
> packages.

Right now, I think it's probably better to call it scipy_core instead of Numeric3, since we'll be doing

>>> from scipy.base import *

instead of

>>> from Numeric import *

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From rkern at ucsd.edu  Mon Mar 28 23:37:16 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Mon Mar 28 23:37:16 2005
Subject: [Numpy-discussion] zeros() default type in Numeric3
In-Reply-To: <424895BC.7030504@csun.edu>
References: <4245C512.1030701@csun.edu> <4248262D.5060407@ucsd.edu> <09347872476f2c45aaa5d80d2c856088@stsci.edu> <200503281902.42770.faltet@carabos.com> <42483F5C.7050002@ucsd.edu> <424895BC.7030504@csun.edu>
Message-ID: <42490437.6050002@ucsd.edu>

Stephen Walton wrote:
> Robert Kern wrote:
>
>> Float.new(shape, value=None) # empty
>> Float.new(shape, value=0)    # zeros
>> Float.new(shape, value=1)    # ones
>
> Uhm, my first reaction to this kind of thing is "ugh," but maybe I'm
> just not thinking in the correct OO mode. Is this any better than
> zeros() and ones()? For that matter, is it any better than
>
>     x = zeros(shape)
>     x = any_old_scalar

x[:] = any_old_scalar, you mean? Perhaps not, *if* I need the default type, which I rarely do. And when I do need the default type, and I'm coding carefully, I will add the type anyway to be explicit.

I *do* think that

    x = CFloat.new(shape, 2j*pi)

is better than

    x = empty(shape, type=CFloat)
    x[:] = 2j*pi

I don't think there's much OO in it. The implementations won't change, really. It's more a matter of aesthetics of the API. I like it for much the same reasons that transpose(array) et al. were folded into methods of arrays. Also, with Perry's and Francesc's suggestions, it collapses three very similar functions into one.

> Having said that, the main reason I use zeros() in MATLAB is to
> preallocate space.

I use it the same way in Python. Sometimes I'm going to be replacing all of the values (in which case I would use empty()), but often I only need to sparsely replace values. Usually the "background" value ought to be 0, but occasionally things get weirder. But, this isn't a particularly important issue.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From rkern at ucsd.edu  Tue Mar 29 03:46:19 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Tue Mar 29 03:46:19 2005
Subject: [Numpy-discussion] searching a list of arrays
In-Reply-To: <200503271820.51307.dd55@cornell.edu>
References: <200503271820.51307.dd55@cornell.edu>
Message-ID: <42493FBD.1060002@ucsd.edu>

Darren Dale wrote:
> Hi,
>
> I have a list of numeric-23.8 arrays:
>
> a = [array([0,1]),
>      array([0,1]),
>      array([1,0]),
>      array([1,0])]
>
> b = [array([0,1,0]),
>      array([0,1,0]),
>      array([1,0,0]),
>      array([1,0,0])]
>
> and I want to make a new list out of b:
>
> c = [array([0,1,2]),
>      array([1,0,2])]
>
> where the last index in each array is the result of
>
> b.count([0,1,0]) # or [1,0,0]
>
> The problem is that the result of b.count(array([1,0,0])) is 4, not
> 2, and b.remove(array([1,0,0])) indiscriminately removes arrays from
> the list. a.count and a.remove work the way I expected.

This is a result of rich comparisons. (array1 == array2) yields an array, not a boolean.
In [1]:a = [array([0,1]),
   ...:     array([0,1]),
   ...:     array([1,0]),
   ...:     array([1,0])]

In [2]:b = [array([0,1,0]),
   ...:     array([0,1,0]),
   ...:     array([1,0,0]),
   ...:     array([1,0,0])]

In [3]:b.count(array([0,1,0]))
Out[3]:4

In [4]:[x == array([0,1,0]) for x in b]
Out[4]:
[array([1, 1, 1],'b'),
 array([1, 1, 1],'b'),
 array([0, 0, 1],'b'),
 array([0, 0, 1],'b')]

To replace b.count(), you can do

In [12]:sum(alltrue(equal(b, array([0,1,0])), axis=-1))
Out[12]:2

To replace b.remove(), you can do

In [14]:[x for x in b if not alltrue(x == array([0,1,0]))]
Out[14]:[array([1, 0, 0]), array([1, 0, 0])]

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From faltet at carabos.com  Tue Mar 29 05:24:24 2005
From: faltet at carabos.com (Francesc Altet)
Date: Tue Mar 29 05:24:24 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To:
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu>
Message-ID: <200503291523.18309.faltet@carabos.com>

On Monday 28 March 2005 23:54, Perry Greenfield wrote:
> Numarray, when it uses the buffer object, always gets a fresh pointer
> for the buffer object for every data access. But Scott is right that
> that pointer is good only so long as there isn't a chance for
> something else to change it. In practice, I don't think that ever
> happens with the buffers that numarray happens to use, but it's still
> a flaw of the current buffer object that there is no way to ensure it
> won't change.

However, having to update the pointer for the buffer object on every data access does impact performance quite a lot. This issue has been brought up on this list some months ago (see [1]). I, for one, have stopped calling NA_updateDataPtr() during table reads in PyTables, and this sped up the reading process by 70%, which is not a joke. And this speed-up could theoretically be achieved in every piece of code that reads like:

    for i in range(n):
        a = numarrayobject[i]

that is, whenever a single element in an array is accessed.

If the bytes object suggested by Scott makes the call to NA_updateDataPtr() unnecessary, then this is an added advantage of bytes over buffer.

[1] http://sourceforge.net/mailarchive/message.php?msg_id=8848962

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Càrabos Coop. V.   Enjoy Data
 ""
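The loop Francesc is talking about is easy to time from pure Python. A minimal harness might look like this sketch (the function name is made up, and arr must have at least n elements):

    import time

    def time_getitem(arr, n=1000*1000, niter=5):
        # average per-element cost of arr[i], as in the loop above
        start = time.clock()
        for j in xrange(niter):
            for i in xrange(n):
                a = arr[i]
        return (time.clock() - start) / (float(n) * niter)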
From magnus at hetland.org  Tue Mar 29 07:21:39 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Mar 29 07:21:39 2005
Subject: [Numpy-discussion] Linear programming
Message-ID: <20050329151958.GA28688@idi.ntnu.no>

Is there some standard Python (i.e., numarray/Numeric) mapping for some linear programming package out there? Might be rather useful...

-- 
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org       [Japanese proverb]

From perry at stsci.edu  Tue Mar 29 07:46:47 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Mar 29 07:46:47 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <42489275.7060600@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu>
Message-ID: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>

On Mar 28, 2005, at 6:25 PM, Travis Oliphant wrote:

> One could see it as a "flaw" in the buffer object, but I prefer to
> see it as problems with objects that use the PyBufferProcs protocol.
> It is, at worst, a "limitation" of the buffer interface that should
> be advertised (in my mind the problem lies with the objects that make
> use of the buffer protocol and also reallocate memory willy-nilly,
> since Python does not allow for this). To me, an analogous situation
> occurs when an extension module writes into memory it does not own
> and causes a seg-fault. I suppose a casual observer could say this is
> a Python flaw, but clearly the problem is with the extension object.
>
> It certainly does not mean at all that something like a buffer object
> should never exist or that the buffer protocol should not be used. I
> get the feeling sometimes that some naive (to Numeric and numarray)
> people on python-dev feel that way.

Certainly there needs to be something like this (that's why we used it for numarray, after all).

>> I'm not sure how the support for large data sets should be handled.
>> I generally think that it will be very awkward to handle these until
>> Python does as well. Speaking of which...
>>
>> I had been in occasional contact with Martin von Loewis about his
>> work to update Python to handle 64-bit addressing. We weren't
>> planning to handle this in numarray (nor Numeric3, right Travis or
>> do I have that wrong?) until Python did. A few months ago Martin
>> said he was mostly done. I had a chance to talk to him at Pycon
>> about where that work stood. Unfortunately, it is not turning out to
>> be as easy as he hoped. This is too bad. I have a feeling that this
>> work is going to stall without help on our (numpy community) part to
>> help make the changes or drum beating to make it a higher priority.
>> At the moment the Numeric3 effort should be the most important
>> focus, but I think that after that, this should become a high
>> priority.
>
> I would be interested to hear what the problems are. Why can't you
> just change the protocol, replacing all ints with Py_intptr_t? Is
> backward compatibility the problem? This seems like it's on the
> extension-code level (and then only on 64-bit systems), and so would
> be easier to force through the change in Python 2.5.

As Martin explained it, there is a lot of code that uses int declarations. If you are saying that it would be easy just to replace all int declarations in Python, I doubt it is that simple, since there are calls to many other libraries that must use ints. So it means that there are thousands (so Martin says) of declarations that one must change by hand. It has to be changed for strings, lists, tuples and everything that uses them (Guido was open to doing this, but everything had to be updated at once, not just strings or certain objects, and he is certainly right about that).

Martin also said that we would need a system with enough memory to test all of these. Lists in particular would need a system with 16 GB of memory to test lists that use more than the current limit (because of the size of list objects). I'm not sure I agree with that. It would be nice to have that kind of test, but I think it would be reasonable to test on the largest memory systems available at the time. If there are latent list sequence bugs that surface when 16 GB systems become available, then the bugs can be dealt with at that time (IMHO). (Anybody out there have a system with that much memory available for test purposes? :-)

Of course, this change will change the C API for Python too, as far as sequence use goes (or is there some way around that?
A compatibility API and a new one that supports extended indices?) It would be nice if there were some way of handling that gracefully without requiring all extensions to change to match this. I imagine that this is going to be the biggest objection to making any changes unless the old API is supported for a while. Perhaps someone has thought this all out already; I haven't thought about it at all.

Perry

From perry at stsci.edu  Tue Mar 29 07:53:23 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Mar 29 07:53:23 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <42489A65.2030201@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu>
Message-ID: <7805a039fbad32679dcc101cde7b9be8@stsci.edu>

On Mar 28, 2005, at 6:59 PM, Travis Oliphant wrote:

>> The struct module convention for denoting native, portable big
>> endian, and portable little endian is concise and documented.
>
> So, you think we should put the byte-order in the typecharacter
> interface. Don't know.... could be persuaded.

I think we need to think about what the typecharacter is supposed to represent. Is it the value as the user will see it, or does it indicate what the internal representation is? These are two different things. Then again, I'm not sure how this info is exposed to the user; if it is appropriately handled by intermediate code, it may not matter. For example, if this corresponds to what the user will see for the type, I think it is bad. Most of the time they don't care what the internal representation is; they just want to know if it is Int16 or whatever. With the two combined, they have to test for both variants.

Perry

From rkern at ucsd.edu  Tue Mar 29 07:58:22 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Tue Mar 29 07:58:22 2005
Subject: [Numpy-discussion] Linear programming
In-Reply-To: <20050329151958.GA28688@idi.ntnu.no>
References: <20050329151958.GA28688@idi.ntnu.no>
Message-ID: <42497AD6.2030700@ucsd.edu>

Magnus Lie Hetland wrote:
> Is there some standard Python (i.e., numarray/Numeric) mapping for
> some linear programming package out there? Might be rather useful...

My Google-fu does not reveal an obvious one. There does seem to be a recent one in which the authors wrote their own matrix object!

http://www.ee.ucla.edu/~vandenbe/cvxopt/

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From pearu at cens.ioc.ee  Tue Mar 29 12:08:27 2005
From: pearu at cens.ioc.ee (pearu at cens.ioc.ee)
Date: Tue Mar 29 12:08:27 2005
Subject: [Numpy-discussion] Linear programming
In-Reply-To: <20050329151958.GA28688@idi.ntnu.no>
Message-ID:

On Tue, 29 Mar 2005, Magnus Lie Hetland wrote:

> Is there some standard Python (i.e., numarray/Numeric) mapping for
> some linear programming package out there? Might be rather useful...

It is certainly not a standard one, but some years ago I wrote a wrapper to cddlib:

http://cens.ioc.ee/projects/polyhedron/

I haven't used it with recent versions of Numeric or Python though.
Pearu

From magnus at hetland.org  Tue Mar 29 13:07:20 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Mar 29 13:07:20 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <20050328182929.50411.qmail@web50205.mail.yahoo.com>
References: <4247CEC9.1030903@ee.byu.edu> <20050328182929.50411.qmail@web50205.mail.yahoo.com>
Message-ID: <20050329210615.GA4743@idi.ntnu.no>

> __array_storage__

How about __array_data__?

-- 
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org       [Japanese proverb]

From magnus at hetland.org  Tue Mar 29 13:11:20 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Mar 29 13:11:20 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <42489A65.2030201@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu>
Message-ID: <20050329211013.GB4743@idi.ntnu.no>

Travis Oliphant :
[snip]
> My proposal:
>
> __array_data__ (optional object that exposes the PyBuffer protocol or
> a sequence object; if not present, the object itself is used)
> __array_shape__ (required tuple of ints/longs that gives the shape of
> the array)
> __array_strides__ (optional; provides how to step through the memory
> in bytes (or bits if a bit-array); default is C-contiguous)
> __array_typestr__ (optional struct-like string showing the type ---
> optional endianness indicator + Numeric3 typechars; default is 'V')
> __array_itemsize__ (required if the above is 'S', 'U', or 'V')
> __array_offset__ (optional offset to the start of the buffer;
> defaults to 0)
>
> So, you could define an array interface with only two additional
> attributes if your object exposed the buffer or sequence protocol.

Wohoo! Niiice :)

(Okay, a bit "me too"-ish, but I just wanted to contribute some enthusiasm ;)

-- 
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org       [Japanese proverb]

From magnus at hetland.org  Tue Mar 29 13:15:19 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Mar 29 13:15:19 2005
Subject: [Numpy-discussion] Linear programming
In-Reply-To: <42497AD6.2030700@ucsd.edu>
References: <20050329151958.GA28688@idi.ntnu.no> <42497AD6.2030700@ucsd.edu>
Message-ID: <20050329211417.GC4743@idi.ntnu.no>

Robert Kern :
>
> Magnus Lie Hetland wrote:
> > Is there some standard Python (i.e., numarray/Numeric) mapping for
> > some linear programming package out there? Might be rather useful...
>
> My Google-fu does not reveal an obvious one.

Neither did mine ;) I did find pysimplex, though... But that's not really what I'm after, I guess.

> There does seem to be a recent one in which the authors wrote their
> own matrix object!

Oh, no! 8-|

Hm. Maybe this is a use case for the new buffer stuff? Exposing the bytes of their arrays shouldn't be so hard... Easier than introducing numpy arrays of some sort, I should think ;)

> http://www.ee.ucla.edu/~vandenbe/cvxopt/

Hm. They support sparse matrices too. Interesting. Thanks for the tip!

-- 
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org       [Japanese proverb]

From faltet at carabos.com  Tue Mar 29 13:55:37 2005
From: faltet at carabos.com (Francesc Altet)
Date: Tue Mar 29 13:55:37 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503291523.18309.faltet@carabos.com>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <200503291523.18309.faltet@carabos.com>
Message-ID: <200503292354.20352.faltet@carabos.com>
On Tuesday 29 March 2005 15:23, Francesc Altet wrote:
> This issue has been brought up on this list some months ago (see
> [1]). I, for one, have stopped calling NA_updateDataPtr() during
> table reads in PyTables, and this sped up the reading process by 70%,
> which is not a joke. And this speed-up could theoretically be
> achieved in every piece of code that reads like:
>
>     for i in range(n):
>         a = numarrayobject[i]
>
> that is, whenever a single element in an array is accessed.

Well, the statement above is not exactly true. The overhead introduced by NA_updateDataPtr (and other functions related to the buffer object) is mainly important when you call the __getitem__ method from *extensions*, and less important (but still significant!) when you are in pure Python.

This evening I wanted to evaluate how much the acceleration would be if it were not necessary to call NA_updateDataPtr and companions (i.e. getting rid of the buffer object), found some interesting results, and ended up writing a quite long report that took this sunny Spring evening away from me :(

Despite its rather serious format, please don't take it as a serious demonstration of anything. It was made basically because I need maximum performance on __getitem__ operations and was curious about what Numeric/numarray/Numeric3 can offer in that regard. I'm publishing it here because it could be of help to somebody.

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Càrabos Coop. V.   Enjoy Data
 ""

A note on __getitem__ performance of Numeric/numarray in Python
extensions (with a small follow-up on Numeric3)
===============================================================

Francesc Altet
2005-03-29

Abstract
========

Numeric [1] and numarray [2] are Python packages that provide very convenient containers to deal with large amounts of data in memory in an efficient way. The fact that they have quite different implementations leads naturally to areas where one package is better suited than the other, and vice versa. In fact, we are lucky to have such a duality, because competition is essential in every (sane) software ecosystem. The best way of determining which package is better adapted to a certain task is benchmarking. In this report, I have made use of psyco [3] and oprofile [4] in order to decide which is the best candidate for accessing the data in the containers from C extensions. In the appendix, some attention has been dedicated as well to Numeric3, a newborn contender for Numeric and numarray.

Motivation
==========

I need peak performance when accessing data belonging to Numeric/numarray objects in my extensions, so I decided to do some profiling on the following code, which is representative of my own needs:

    niter = 5
    N = 1000*1000

    def matrix_loop(object):
        for j in xrange(niter):
            for i in xrange(N):
                p = object[i]

This basically exercises the __getitem__ special method of Numeric/numarray objects.

The benchmark
=============

In order to get some comparisons done, I've made a small script (getitem-numarrayVSNumeric.py) that checks the speed for both kinds of objects: Numeric and numarray. Also, in order to reduce the Python overhead, I've used psyco [3] so that the results get as close as possible to what these tests would do running inside a Python extension (made in C). Moreover, I've used oprofile [4] so as to get an idea of where the CPU is wasted in this loop.
First of all, I've made a calibration test to measure the time of the empty loop, that is:

    def null_loop():
        for j in xrange(niter):
            for i in xrange(N):
                pass

This time is almost negligible when running with psyco (and the same happens inside a C extension), but it takes a *significant* time if psyco is not active. Once this time has been measured, it is subtracted from the loops that actually exercise __getitem__.

First (naive) timings
=====================

Now, let's see some of the timings that I've done. My platform is a Pentium4 @ 2 GHz laptop, using Debian GNU/Linux with kernel 2.6.9 and gcc 3.3.5. First of all, I'll list the results without psyco:

    $ python2.3 bench/getitem-numarrayVSNumeric.py
    Psyco not active
    Numeric version: 23.8
    numarray version: 1.2.3
    Calibration loop: 0.11173081398
    Time for numarray(getitem)/iter: 3.82528972626e-07
    Time for Numeric(getitem)/iter: 2.51150989532e-07
    getitem in Numeric is 1.52310358537 times faster

We can see how the time per iteration for numarray is 380 ns while for Numeric it is 250 ns, which accounts for a 1.5x speed-up of Numeric vs numarray.

Using psyco to reduce Python overhead
=====================================

However, even though we have subtracted the time for the calibration loop, there may remain other places where time is wasted in Python space. Psyco is a good way to optimize loops and make them go almost as fast as in C. Now, the figures using psyco:

    $ python2.3 bench/getitem-numarrayVSNumeric.py
    Psyco active
    Numeric version: 23.8
    numarray version: 1.2.3
    Calibration loop: 0.0015878200531
    Time for numarray(getitem)/iter: 2.4246096611e-07
    Time for Numeric(getitem)/iter: 1.19336557388e-07
    getitem in Numeric is 2.0317409134 times faster

We can see how the time for the calibration loop has improved by a factor of 100x. Not too bad for a silly loop. Also, the time per iteration for numarray has dropped to 242 ns and to 119 ns for Numeric. This accounts for a 2x speedup.

The first conclusion is that numarray is considerably slower than Numeric when accessing its data. Besides, when using psyco, part of the Python overhead evaporates, making the gap between the Numeric and numarray loops grow.

Introducing oprofile: getting a broad view of what's going on
=============================================================

In order to measure the exact difference of the __getitem__ method without the Python overhead (in an extension, for example), I've used oprofile against the psyco version of the benchmark. Here is the result for the run with psyco, profiled with oprofile:

    # opreport /usr/bin/python2.3
    samples|      %|
    ------------------
        586 34.1293 libnumarray.so
        454 26.4415 python2.3
        331 19.2778 _numpy.so
        206 11.9977 _ndarray.so
        102  5.9406 memory.so
         22  1.2813 libc-2.3.2.so
          9  0.5242 ld-2.3.2.so
          4  0.2330 multiarray.so
          2  0.1165 _sort.so
          1  0.0582 _psyco.so

The libnumarray.so, _ndarray.so, memory.so and _sort.so shared libraries all belong to the numarray package. _numpy.so and multiarray.so belong to Numeric. The time spent in Python space is very little (just 26%, in great part thanks to the psyco acceleration). libc-2.3.2.so and ld-2.3.2.so belong to the C runtime library, and it is not possible to decide whether this time has been used by numarray, Numeric or Python itself, but as the time consumed is very little, we can safely ignore it.
So, if we sum the samples where the CPU was in C space (the shared libs) for numarray and compare against the time in C space for Numeric, we get 894 against 331, which means that Numeric is 2.7x faster than numarray for __getitem__. Of course, this is more than the 1.5x and 2x factors that we got earlier, because of the time spent in Python space. However, the 2.7x factor is probably the more accurate one when you want to exercise __getitem__ in C extensions.

Most CPU-intensive functions using oprofile
===========================================

If we want to look at the most CPU-consuming functions in numarray:

    # opstack -t 1 /usr/bin/python2.3 | sort -nr| head -10
    454      26.6432 python2.3 (no symbols)
    331      19.4249 _numpy.so (no symbols)
    145       8.5094 libnumarray.so NA_getPythonScalar
    115       6.7488 libnumarray.so NA_getByteOffset
    101       5.9272 libnumarray.so isBufferWriteable
     98       5.7512 _ndarray.so _ndarray_subscript
     91       5.3404 _ndarray.so _simpleIndexingCore
     73       4.2840 libnumarray.so NA_updateDataPtr
     64       3.7559 memory.so memory_getbuf
     60       3.5211 libnumarray.so getReadBufferDataPtr

_numpy.so was stripped of debugging info, so we can't see where the time was spent in Numeric. However, we can estimate the cost of getting a fresh pointer for the data buffer on every data access in numarray: isBufferWriteable + NA_updateDataPtr + memory_getbuf + getReadBufferDataPtr gives a total of 298 samples, which is almost as much as all the time spent in the Numeric shared library (331). So we can conclude that having a buffer object in our array object can be a serious drawback if we want to get maximum performance when accessing the data. Another point worth looking at is NA_getByteOffset, which takes 115 samples by itself. This is perhaps a little too much.

Conclusions
===========

To sum up, we can expect the __getitem__ method in Numeric to be 1.5x faster than numarray in pure Python code, 2x faster when using psyco, and 2.7x faster when used in C extensions.

One factor that (partially) explains why numarray is slower in this area is that it is based on the buffer interface to keep its data. This feature, while very convenient for certain tasks (like sharing data with other Python packages or extensions), has a limitation that makes an extension crash if the memory buffer is reallocated. Other solutions (like the "bytes" object [5]) have been proposed to overcome this limitation (and others) of the buffer interface. Numeric3 might choose this to avoid that kind of contention problem created by the buffer interface.

Finally, we have seen how using oprofile can be of invaluable help in determining where the hot spots are, not only in our extensions, but also in other shared libraries on our system. If the shared libraries also have debugging info in them, then it is possible to track down even the most expensive routines in our application.

Appendix
========

Even though it is in the very early stages of existence, I was curious about how Numeric3 would perform in comparison with Numeric. By slightly changing getitem-numarrayVSNumeric.py, I've come up with getitem-NumericVSNumeric3.py, which does the comparison I wanted. When running without psyco, I got:

    $ python2.3 bench/getitem-NumericVSNumeric3.py
    Psyco not active
    Numeric version: 23.8
    Numeric3 version: Very early alpha release...!
    Calibration loop: 0.107951593399
    Time for Numeric3(getitem)/iter: 1.18472018242e-06
    Time for Numeric(getitem)/iter: 2.45458602905e-07
    getitem in Numeric is 4.82655799551 times faster

Oops, Numeric3 is almost 5 times slower than Numeric. So it really seems to be still very alpha (you know, premature optimization is the root of all evil). Never mind, this is just an exercise. So, let's continue with the psyco version:

    $ python2.3 bench/getitem-NumericVSNumeric3.py
    Psyco active
    Numeric version: 23.8
    Numeric3 version: Very early alpha release...!
    Calibration loop: 0.00171356201172
    Time for Numeric3(getitem)/iter: 1.04013824463e-06
    Time for Numeric(getitem)/iter: 1.19578647614e-07
    getitem in Numeric is 8.69836099828 times faster

The gap has increased to 8.6x, as expected. Let's have a look at the most consuming shared libs using oprofile:

    # opreport /usr/bin/python2.3
    samples|      %|
    ------------------
       1841 33.7365 multiarray.so
       1701 31.1710 libc-2.3.2.so
       1586 29.0636 python2.3
        318  5.8274 _numpy.so
          6  0.1100 ld-2.3.2.so
          3  0.0550 multiarray.so
          2  0.0367 _psyco.so

God! Two libraries alone are getting more than half of the CPU: multiarray.so and libc-2.3.2.so. As we already know that the Numeric3 __getitem__ takes much more time than its counterpart in Numeric, we can conclude that Numeric3 comes with its own multiarray.so, and that it is responsible for taking one third (33.7%) of the time. Moreover, multiarray.so is probably responsible for calling the libc routines so much, because in our previous benchmarks the libc calls never took more than 5% of the time, and here they take more than 30%.

To conclude, let's see which are the most CPU-consuming routines in Numeric3 for this exercise:

    # opstack -t 1 /usr/bin/python2.3 | sort -nr| head -20
    1586     30.1750 python2.3 (no symbols)
     669     12.7283 libc-2.3.2.so __GI___strcasecmp
     618     11.7580 multiarray.so PyArray_MapIterNew
     374      7.1157 multiarray.so array_subscript
     318      6.0502 _numpy.so (no symbols)
     260      4.9467 libc-2.3.2.so __realloc
     190      3.6149 libc-2.3.2.so _int_malloc
     172      3.2725 multiarray.so PyArray_New
     152      2.8919 libc-2.3.2.so __strncasecmp
     123      2.3402 libc-2.3.2.so malloc_consolidate
     121      2.3021 libc-2.3.2.so __memalign_internal
     118      2.2451 multiarray.so array_dealloc
     102      1.9406 libc-2.3.2.so _int_realloc
      93      1.7694 multiarray.so fancy_indexing_check
      86      1.6362 multiarray.so arraymapiter_dealloc
      79      1.5030 multiarray.so PyArray_Scalar
      76      1.4460 multiarray.so LONG_copyswapn
      62      1.1796 multiarray.so PyArray_UpdateFlags
      57      1.0845 multiarray.so PyArray_DescrFromType

While we can see that a lot of time is spent inside the multiarray.so of Numeric3, it also catches our attention that a lot of time is spent in the __GI___strcasecmp libc call. This is very strange, because our arrays are made of integers, and calling strcasecmp on each iteration seems quite unnecessary. In order to know who is calling strcasecmp (i.e. to get the call tree), oprofile needs a specially patched version of the Linux kernel. But this is material for another story.

References
==========

[1] http://numpy.sourceforge.net/
[2] http://stsdas.stsci.edu/numarray/
[3] http://psyco.sourceforge.net/
[4] http://oprofile.sourceforge.net/
[5] http://www.python.org/peps/pep-0296.html
From oliphant at ee.byu.edu  Tue Mar 29 18:13:35 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Mar 29 18:13:35 2005
Subject: [Numpy-discussion] large file and array support
In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu>
Message-ID: <424A0AE6.9090209@ee.byu.edu>

There are two distinct issues with regard to large arrays.

1) How do you support >2 GB memory-mapped arrays on 32-bit systems, and other large-object arrays only a part of which is in memory at any given time (there is an equivalent problem for >8 EB (exabytes) on 64-bit systems; an exabyte is 2^60 bytes, or a giga-gigabyte).

2) Supporting the sequence protocol for in-memory objects on 64-bit systems.

Part 2 can be fixed using the recommendations Martin is making, and that will likely happen (though it could definitely be done faster). Handling part 1 is more difficult.

One idea is to define some kind of "super object" that mediates between the large file and the in-memory portion. In other words, the ndarray is an in-memory object, while the super object handles interfacing it with a larger structure.

Thoughts?

-Travis

From perry at stsci.edu  Tue Mar 29 18:26:35 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Mar 29 18:26:35 2005
Subject: [Numpy-discussion] Re: large file and array support
In-Reply-To: <424A0AE6.9090209@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <424A0AE6.9090209@ee.byu.edu>
Message-ID: <776140fae84b09d015d2508955611c5b@stsci.edu>

On Mar 29, 2005, at 9:11 PM, Travis Oliphant wrote:

> There are two distinct issues with regard to large arrays.
>
> 1) How do you support >2 GB memory-mapped arrays on 32-bit systems,
> and other large-object arrays only a part of which is in memory at
> any given time (there is an equivalent problem for >8 EB (exabytes)
> on 64-bit systems; an exabyte is 2^60 bytes, or a giga-gigabyte).
>
> 2) Supporting the sequence protocol for in-memory objects on 64-bit
> systems.
>
> Part 2 can be fixed using the recommendations Martin is making, and
> that will likely happen (though it could definitely be done faster).
> Handling part 1 is more difficult.
>
> One idea is to define some kind of "super object" that mediates
> between the large file and the in-memory portion. In other words, the
> ndarray is an in-memory object, while the super object handles
> interfacing it with a larger structure.
>
> Thoughts?

Maybe I'm missing something, but isn't it possible to mmap part of a large file? In that case one just limits the memory maps to what can be handled on a 32-bit system, leaving it up to the user software to determine which part of the file to mmap. Did you have something more automatic in mind?

As for other large-object arrays, I'm not sure what other examples there are besides memory mapping. Do you have any?

Perry
From pjssilva at ime.usp.br  Tue Mar 29 18:45:31 2005
From: pjssilva at ime.usp.br (Paulo J. S. Silva)
Date: Tue Mar 29 18:45:31 2005
Subject: [Numpy-discussion] Linear programming
In-Reply-To: <20050329151958.GA28688@idi.ntnu.no>
References: <20050329151958.GA28688@idi.ntnu.no>
Message-ID: <1112150674.8038.10.camel@localhost.localdomain>

On Tue, 2005-03-29 at 17:19 +0200, Magnus Lie Hetland wrote:
> Is there some standard Python (i.e., numarray/Numeric) mapping for
> some linear programming package out there? Might be rather useful...

Hello,

I have written a very simple wrapper to COIN/CLP (www.coin-or.org) based on SWIG. I am using this code in my own research. It is simple, but it is good enough for me. I will clean it up a little and "release" it this week. Please get in contact with me by Friday.

Here is some sample code using the wrapper:

    --- Sample code ---
    from numarray import *
    import Coin

    s = Coin.OsiSolver()

    # Define objective and variable bounds
    ncols = 2
    obj = array([-1.0, -1.0])
    col_lb = array([0.0, 0.0])
    col_ub = s.getInfinity()*array([1.0, 1.0])

    # Define constraints
    nrows = 2
    row_lb = -s.getInfinity()*array([1.0, 1.0])
    row_ub = array([3.0, 3.0])
    matrix = Coin.CoinPackedMatrix(0, 0, 0)
    matrix.setDimensions(0, ncols)
    row1 = Coin.CoinPackedVector()
    row1.insert(0, 1.0)
    row1.insert(1, 2.0)
    matrix.appendRow(row1)
    row2 = Coin.CoinPackedVector()
    row2.insert(0, 2.0)
    row2.insert(1, 1.0)
    matrix.appendRow(row2)

    # Load problem
    s.loadProblem(matrix, col_lb, col_ub, obj, row_lb, row_ub)

    # Write mps model.
    s.writeMps('example')

    # Solve problem
    s.initialSolve()

    # Print optimal value.
    print 'Optimal value: ', s.getObjValue()
    print 'Solution: ', s.getColSolution()
    --- End sample ---

Note that I am using COIN's sparse matrix and vector so as to exploit sparsity in the CLP solver.

Best,

Paulo

-- 
Paulo José da Silva e Silva
Professor Assistente do Dep. de Ciência da Computação
(Assistant Professor of the Computer Science Dept.)
Universidade de São Paulo - Brazil
e-mail: pjssilva at ime.usp.br    Web: http://www.ime.usp.br/~pjssilva

Teoria é o que não entendemos o suficiente para chamar de prática.
(Theory is what we don't understand well enough to call practice.)

From faltet at carabos.com  Wed Mar 30 02:45:00 2005
From: faltet at carabos.com (Francesc Altet)
Date: Wed Mar 30 02:45:00 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <42489A65.2030201@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu>
Message-ID: <200503301240.55483.faltet@carabos.com>

On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
> My proposal:
>
> __array_data__ (optional object that exposes the PyBuffer protocol or
> a sequence object; if not present, the object itself is used)
> __array_shape__ (required tuple of ints/longs that gives the shape of
> the array)
> __array_strides__ (optional; provides how to step through the memory
> in bytes (or bits if a bit-array); default is C-contiguous)
> __array_typestr__ (optional struct-like string showing the type ---
> optional endianness indicator + Numeric3 typechars; default is 'V')
> __array_itemsize__ (required if the above is 'S', 'U', or 'V')
> __array_offset__ (optional offset to start of buffer, defaults to 0)

Considering that heterogeneous data is to be supported as well, and there is some tradition of assigning names to the different fields, I wonder if it would not be good to add something like:

__array_names__ (optional comma-separated names for record fields)

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Càrabos Coop. V.   Enjoy Data
 ""
From oliphant at ee.byu.edu  Wed Mar 30 11:39:02 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 11:39:02 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503301240.55483.faltet@carabos.com>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com>
Message-ID: <424AFFE9.40300@ee.byu.edu>

> On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>> My proposal:
>>
>> __array_data__ (optional object that exposes the PyBuffer protocol
>> or a sequence object; if not present, the object itself is used)
>> __array_shape__ (required tuple of ints/longs that gives the shape
>> of the array)
>> __array_strides__ (optional; provides how to step through the memory
>> in bytes (or bits if a bit-array); default is C-contiguous)
>> __array_typestr__ (optional struct-like string showing the type ---
>> optional endianness indicator + Numeric3 typechars; default is 'V')
>> __array_itemsize__ (required if the above is 'S', 'U', or 'V')
>> __array_offset__ (optional offset to start of buffer, defaults to 0)
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields, I
> wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)

I'm O.K. with that.

After more thought, I think using the struct-like typecharacters is not a good idea for the array protocol. I think that the character codes used by the numarray record array -- kind_character + byte_width -- are better.
From oliphant at ee.byu.edu Wed Mar 30 11:39:02 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 11:39:02 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503301240.55483.faltet@carabos.com>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com>
Message-ID: <424AFFE9.40300@ee.byu.edu>

> On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>
>> My proposal:
>>
>> __array_data__ (optional object that exposes the PyBuffer protocol or a
>> sequence object, if not present, the object itself is used).
>> __array_shape__ (required tuple of int/longs that gives the shape of the
>> array)
>> __array_strides__ (optional provides how to step through the memory in
>> bytes (or bits if a bit-array), default is C-contiguous)
>> __array_typestr__ (optional struct-like string showing the type ---
>> optional endianness indicator + Numeric3 typechars, default is 'V')
>> __array_itemsize__ (required if above is 'S', 'U', or 'V')
>> __array_offset__ (optional offset to start of buffer, defaults to 0)
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields, I
> wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)
>

I'm O.K. with that.

After more thought, I think using the struct-like typecharacters is not
a good idea for the array protocol. I think that the character codes
used by the numarray record array (kind_character + byte_width) are
better. Commas can separate heterogeneous data. The problem is that if
the data buffer originally came from a different machine or was saved
with a different compiler (e.g. a mmap'ed file), then the struct-like
typecodes only tell you the c-type that machine thought the data was.
It does not tell you how to interpret the data on this machine.

So, I think we should use the __array_typestr__ method to pass type
information using the kind_character + byte_width method. I'm also
going to use this type information for pickles, so that arrays pickled
on one machine type will be able to be interpreted on another with ease.

Bool             -- "b%d" % sizeof(bool)
Signed Integer   -- "i%d" % sizeof(<int>)
Unsigned Integer -- "u%d" % sizeof(<uint>)
Float            -- "f%d" % sizeof(<float>)
Complex          -- "c%d" % sizeof(<complex>)
Object           -- "O%d" % sizeof(PyObject *)
                    --- this would only be useful on shared memory
String           -- "S%d" % itemsize
Unicode          -- "U%d" % itemsize
Void             -- "V%d" % itemsize

I also think that rather than attach < or > to the start of the string,
it would be easier to have another protocol for endianness. Perhaps
something like:

__array_endian__ (optional Python integer with the value 1 in it).
If it is not 1, then a byteswap is necessary.

-Travis

From oliphant at ee.byu.edu Wed Mar 30 11:49:03 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 11:49:03 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <424AFFE9.40300@ee.byu.edu>
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> <424AFFE9.40300@ee.byu.edu>
Message-ID: <424B022B.3040004@ee.byu.edu>

> After more thought, I think using the struct-like typecharacters is
> not a good idea for the array protocol. I think that the character
> codes used by the numarray record array (kind_character + byte_width)
> are better. Commas can separate heterogeneous data. The problem is
> that if the data buffer originally came from a different machine or
> was saved with a different compiler (e.g. a mmap'ed file), then the
> struct-like typecodes only tell you the c-type that machine thought
> the data was. It does not tell you how to interpret the data on this
> machine.
> So, I think we should use the __array_typestr__ method to pass type
> information using the kind_character + byte_width method. I'm also
> going to use this type information for pickles, so that arrays pickled
> on one machine type will be able to be interpreted on another with ease.
>
> Bool             -- "b%d" % sizeof(bool)
> Signed Integer   -- "i%d" % sizeof(<int>)
> Unsigned Integer -- "u%d" % sizeof(<uint>)
> Float            -- "f%d" % sizeof(<float>)
> Complex          -- "c%d" % sizeof(<complex>)
> Object           -- "O%d" % sizeof(PyObject *)
>                     --- this would only be useful on shared memory
> String           -- "S%d" % itemsize
> Unicode          -- "U%d" % itemsize
> Void             -- "V%d" % itemsize

Of course with this protocol for the typestr, the __array_itemsize__ is
redundant and can disappear. Another reason to like it.

> I also think that rather than attach < or > to the start of the string,
> it would be easier to have another protocol for endianness. Perhaps
> something like:
> __array_endian__ (optional Python integer with the value 1 in it).
> If it is not 1, then a byteswap is necessary.

I'm mixed on this. I could be persuaded either way.

-Travis
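A rough sketch of how a consumer might take such a type string apart,
with hypothetical helper names; the kind/width pairs follow the table
above, and commas (when present) separate record fields. The record
itemsize then falls out as the sum of the widths.

    def parse_typestr(typestr):
        # Split a concatenated kind+width string like 'i4f8' into
        # (kind, byte_width) pairs; commas, if present, are skipped.
        fields = []
        i = 0
        while i < len(typestr):
            if typestr[i] == ',':
                i = i + 1
                continue
            kind = typestr[i]
            j = i + 1
            while j < len(typestr) and typestr[j].isdigit():
                j = j + 1
            fields.append((kind, int(typestr[i+1:j])))
            i = j
        return fields

    print parse_typestr('f8')       # [('f', 8)]
    print parse_typestr('i4,f8')    # [('i', 4), ('f', 8)]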
From cookedm at physics.mcmaster.ca Wed Mar 30 13:06:42 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Mar 30 13:06:42 2005
Subject: [Numpy-discussion] Re: Bytes Object and Metadata
In-Reply-To: <200503301240.55483.faltet@carabos.com> (Francesc Altet's message of "Wed, 30 Mar 2005 12:40:55 +0200")
References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com>
Message-ID:

Francesc Altet writes:

> On Tuesday 29 March 2005 01:59, Travis Oliphant wrote:
>> My proposal:
>>
>> __array_data__ (optional object that exposes the PyBuffer protocol or a
>> sequence object, if not present, the object itself is used).
>> __array_shape__ (required tuple of int/longs that gives the shape of the
>> array)
>> __array_strides__ (optional provides how to step through the memory in
>> bytes (or bits if a bit-array), default is C-contiguous)
>> __array_typestr__ (optional struct-like string showing the type ---
>> optional endianness indicator + Numeric3 typechars, default is 'V')
>> __array_itemsize__ (required if above is 'S', 'U', or 'V')
>> __array_offset__ (optional offset to start of buffer, defaults to 0)
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields, I
> wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)

A sequence (list or tuple) of strings would be preferable. That removes
all worrying about using commas in the names.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Mar 30 15:34:45 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed Mar 30 15:34:45 2005
Subject: [Numpy-discussion] Pickle complete (new ideas for Python arrays)
Message-ID: <424B3730.4040408@ee.byu.edu>

Hi all,

Pickling is now implemented for scipy.base (we were calling it Numeric3).

Anybody wanting to tackle a function to read old Numeric and/or numarray
pickles is welcome. I think this could be done entirely in Python.
Ideally, we should be able to read these pickles without having those
packages installed.

I think the PEP for Python should be converted to a bare-bones protocol
(e.g. the one that is emerging). Optionally, we could create a very
simple default arrayobject for Python that just has a default pickle
implementation and knows how to get data through the buffer interface
from other objects. That way any array implementation just has to talk
the Python array protocol to be interoperable with any other array
implementation.

-Travis

From oliphant at ee.byu.edu Thu Mar 31 15:53:01 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Mar 31 15:53:01 2005
Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core
Message-ID: <424C8D05.7030006@ee.byu.edu>

To all interested in the future of arrays...

I'm still very committed to Numeric3 as I want to bring the numarray and
Numeric people together behind a single array object for scientific
computing. But I've been thinking about the array protocol and thinking
that it would be a good thing if this became universal. One of the ways
to make it universal is by having something that follows it in the
Python core.

So, what if we proposed for the Python core not something like Numeric3
(which would still exist in scipy.base and be everybody's favorite
array :-) ), but a very minimal array object (scaled back even from
Numeric) that followed the array protocol and had some C-API associated
with it.

This minimal array object would support 5 basic types ('bool',
'integer', 'float', 'complex', 'Object'). (Maybe a void type could be
defined, and a void "scalar" introduced, which would be the bytes
object.) These types correspond to scalars already available in Python,
and so the whole 0-dim array versus Python scalar argument could be
ignored.

Math could be done without ufuncs initially (people really needing speed
would use scipy.base anyway). But more people in the Python community
would be able to use arrays and get used to them. And we would have a
reference array_protocol object so that extension writers could write
to it.

I would not try a project like this until after scipy_core is out, but
it's an interesting thing to think about. I mainly wanted feedback on
the basic concept. An alternative would be to "add" multidimensionality
to the array object already part of Python, fix its problem of
reallocating while a buffer is exposed, and add the array protocol.

-Travis
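For scale, here is a toy illustration of how little such a minimal
object needs in order to speak the protocol. This is purely a sketch,
with guessed names and defaults, not a proposed implementation: no math,
no ufuncs, no indexing, just bytes plus the __array_*__ metadata.

    import array

    class MiniArray(object):
        # A bare-bones N-dimensional array: consumers find everything
        # they need in the protocol attributes set up below.
        def __init__(self, shape, kind='f', itemsize=8):
            self.__array_shape__ = tuple(shape)
            self.__array_typestr__ = '%s%d' % (kind, itemsize)
            count = 1
            for dim in self.__array_shape__:
                count = count * dim
            # Zero-filled backing store; a C version would malloc instead.
            self.__array_data__ = array.array('B', [0]) * (count * itemsize)

    a = MiniArray((3, 4))
    print a.__array_shape__, a.__array_typestr__   # (3, 4) f8
    print len(a.__array_data__)                    # 96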
From xscottg at yahoo.com Thu Mar 31 20:14:15 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Mar 31 20:14:15 2005
Subject: [Numpy-discussion] Array Metadata
Message-ID: <20050401041204.18335.qmail@web50208.mail.yahoo.com>

I got back late last night, and there were lots of things I wanted to
comment on. I've put parts of several threads into this one message
since they're all dealing with the same general topic:

Perry Greenfield wrote:
>
> I'm not sure how the support for large data sets should be handled.
> I generally think that it will be very awkward to handle these
> until Python does as well. Speaking of which...
>

I agree that it's going to be difficult to have general support for
large PyBufferProcs objects until the Python core is made 64 bit clean.
But specific support can be added for buffer types that are known in
advance. For instance, the bytes object PEP proposes an alternate way to
get a 64 bit length, and similar support could easily be added to
Numarray.memory, mmap.mmap, and whatever else on a case by case basis.
So you could get a 64 bit pointer from some types of buffers before the
rest of Python becomes 64 bit clean. If the ndarray consumer (wxWindows
for instance) doesn't recognize the particular implementation, it has to
stick with the limitations of the standard PyBufferProcs and assume a 32
bit length would suffice.

Travis Oliphant wrote:
>
> I prefer __array_data__ (it's a common name for Numeric and
> numarray. It can be interpreted as a sequence object if desired).
> So long as everyone agrees, it doesn't matter what name it is.
>

Sounds like __array_data__ works for everyone.

>
> I also like __array_typestr__ or __array_typechar__ better as a name.
> A name is a name as far as I'm concerned.
>

The name __array_typestr__ works for me. The name __array_typechar__
implies a single character, and that won't be true.

>
> Don't like 'D' for long double. Complex floats are already
> using it. I'm not sure I like the idea of moving to two
> character typecodes at this point because it indicates more
> internal changes to Numeric3 (otherwise we have two typecharacter
> standards, which is not a good thing). What is wrong with 'g'
> and 'G' for long double and complex long double respectively?
>

Nothing in this array protocol should *require* internal changes to
either Numeric3 or Numarray. I suspect Numarray is going to keep its
type hierarchy, and Numeric3 can use single character codes for its
representation if it wants. However, both Numeric3 and Numarray might
(probably would) have to translate their internal array type specifiers
into the agreed upon "type code string" when reporting out this
attribute.

The important qualities __array_typestr__ should have are:

1) Everyone should agree on the interpretation. It needs to be
documented somewhere. Third party libraries should get the same
__array_typestr__ from Numarray as they do from Numeric3.

2) It should be sufficiently general in its capabilities to describe a
wide category of array types. Simple things should be simple, and harder
things should be possible. An ndarray of double should have a simple,
common, well recognized value for __array_typestr__. An ndarray of
multi-field structs should be representable too.

> > > __array_complicated__
> >
> I don't see the utility here I guess. If it can't be described by a
> shape/strides combination, then how can it participate in the protocol?
>

I'm not married to this one. I don't know if Numarray or Numeric3 will
ever do such a thing, but I can imagine more complicated schemes of
arranging the data than offset/shape/strides are capable of
representing. So this is forward compatibility with "Numarric4" :-).
Pretty hypothetical, but imagine that typically Numarric4 can represent
its data with offset/shape/strides, but for more advanced operations
that falls apart. I could bore you with a detailed example...
The idea is that if array consumers like wxPython were aware that more
complicated implementations could occur in the future, they could
gracefully bow out and raise an exception instead of incorrectly
interpreting the data. If consumers aren't prepared for it from the
start, you can't easily add it after the fact. Take it or leave it I
guess - it's possibly a YAGNI.

>
> After more thought, I think here we need to also allow the
> "c-type" independent way of describing an array (i.e. numarray
> introduced 'c4' for a complex-valued 4 byte itemsize array).
> So, perhaps __array_ctypestr_ and __array_typestr__ should be
> two ways to get the information (or overload the __array_typestr__
> interface and require consumers to accept either style).
>

I don't understand what you are proposing here. Why would you want to
represent the same information two different ways?

Perry Greenfield wrote:
>
> I think we need to think about what the typecharacter is supposed
> to represent. Is it the value as the user will see it or to indicate
> what the internal representation is? These are two different things.
>

I think __array_typestr__ should accurately represent the internal
representation. It is not intended for typical end users. The whole of
the __array_*metadata*__ stuff is intended for third party libraries
like wxPython or PIL to be able to grab a pointer to the data, calculate
offsets, and cast it to the appropriate type without writing lots of
special case code to handle the differences between Numeric, Numarray,
Numeric3, and whatever else.

>
> Then again, I'm not sure how this info is exposed to the user; if it
> is appropriately handled by intermediate code it may not matter. For
> example, if this corresponds to what the user will see for the type,
> I think it is bad. Most of the time they don't care what the internal
> representation is, they just want to know if it is Int16 or whatever;
> with the two combined, they have to test for both variants.
>

Typical users would call whatever attribute or method you prefer
(.type() or .typecode() for instance), and the type representation could
be classes or typecodes or whatever you think is best. The
__array_typestr__ attribute is not for typical users (unless they start
to care about the details under the hood). It's for libraries that need
to know what's going on in a generic fashion. You don't have to store
this attribute as separate data; it can be a property style attribute
that calculates its value dynamically from your own internal
representation.

Francesc Altet wrote:
>
> Considering that heterogeneous data is to be supported as well, and
> there is some tradition of assigning names to the different fields,
> I wonder if it would not be good to add something like:
>
> __array_names__ (optional comma-separated names for record fields)
>

I really like this idea. Although I agree with David M. Cooke that it
should be a tuple of names. Unless there is a use case I'm not
considering, it would be preferable if the names were restricted to
valid Python identifiers.

Travis Oliphant wrote:
>
> After more thought, I think using the struct-like typecharacters
> is not a good idea for the array protocol. I think that the
> character codes used by the numarray record array (kind_character
> + byte_width) are better. Commas can separate heterogeneous data.
> The problem is that if the data buffer originally came from a
> different machine or was saved with a different compiler (e.g. a
> mmap'ed file), then the struct-like typecodes only tell you the
> c-type that machine thought the data was. It does not tell you how
> to interpret the data on this machine.
>

The struct module has a portable set of typecodes. They call it
"standard", but it's the same thing. The struct module lets you specify
either standard or native. For instance, the typecode for "standard
long" ("=l") is always 4 bytes, while a "native long" ("@l") is likely
to be 4 or 8 bytes depending on the platform. The __array_typestr__
codes should require the "standard" sizes. There is a table at the
bottom of the documentation that goes into detail:

    http://docs.python.org/lib/module-struct.html

The only problem with the struct module is that it's missing a few
types... (long double, PyObject, unicode, bit).
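The standard/native distinction is easy to see from the interpreter.
The native sizes below are what a typical 32-bit box prints; an LP64
platform would report 8 for '@l'.

    import struct

    # '=' forces the "standard" (portable) size; '@' (the default) uses
    # the native size and alignment of the platform's C compiler.
    print struct.calcsize('=l')    # always 4
    print struct.calcsize('@l')    # 4 or 8, depending on the platform

    # Native mode also inserts C-style padding inside composite formats:
    print struct.calcsize('=bi')   # 1 + 4 = 5
    print struct.calcsize('@bi')   # typically 8: 3 pad bytes align the int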
>
> I also think that rather than attach < or > to the start of the
> string, it would be easier to have another protocol for endianness.
> Perhaps something like:
>
> __array_endian__ (optional Python integer with the value 1 in it).
> If it is not 1, then a byteswap is necessary.
>

This has the problem you were just describing. Specifying "byteswapped"
like this only tells you if the data was reversed on the machine it came
from. It doesn't tell you what is correct for the current machine.
Assuming you represented little endian as 0 and big endian as 1, you
could always figure out whether to byteswap like this:

    byteswap = data_endian ^ host_endian

Do you want to have an __array_endian__ where 0 indicates "little
endian", 1 indicates "big endian", and the default is whatever the
current host machine uses? I think this would work for a lot of cases. A
limitation of this approach is that it can't adequately represent
struct/record arrays where some fields are big endian and others are
little endian.

>
> Bool             -- "b%d" % sizeof(bool)
> Signed Integer   -- "i%d" % sizeof(<int>)
> Unsigned Integer -- "u%d" % sizeof(<uint>)
> Float            -- "f%d" % sizeof(<float>)
> Complex          -- "c%d" % sizeof(<complex>)
> Object           -- "O%d" % sizeof(PyObject *)
>                     --- this would only be useful on shared memory
> String           -- "S%d" % itemsize
> Unicode          -- "U%d" % itemsize
> Void             -- "V%d" % itemsize
>

The above is a nice start at reinventing the struct module typecodes. If
you and Perry agree to it, that would be great. A few additions though:

I think you're proposing that "struct" or "record" arrays would be a
concatenation of the above strings. If so, you'll need an indicator for
padding bytes. (You probably know this, but structs in C frequently have
wasted bytes inserted by the compiler to make sure data is aligned on
the machine addressable boundaries.)

I also assume that you intend the ("c%d" % itemsize) to always represent
complex floating point numbers. That leaves my favorite example of
complex short integer data with no way to be represented... I guess I
could get by with "i2i2". How about not having a complex type
explicitly, but representing complex data as something like:

    __array_typestr__ = "f4f4"
    __array_names__ = ("real", "imag")

Just a thought... I do like it though.

I think that both Numarray and Numeric3 are planning on storing booleans
in a full byte. A typecode for tightly packed bits wouldn't go unused
however...

>
> 1) How do you support > 2Gb memory mapped arrays on 32 bit systems
> and other large-object arrays only a part of which are in memory at
> any given time?
>

Doing this well is a lot like implementing mmap in user space. I think
this is a modification to the buffer protocol, not the array protocol.
It would add a bit of complexity if you want to deal with it, but it is
doable. Instead of just grabbing a pointer to the whole thing, you need
to ask the object to "page in" ranges of the data and give you a pointer
that is only valid in that range. Then when you're done with the
pointer, you need to explicitly tell the object so that it can write
back if necessary and release the memory for other requests. Do you
think Numeric3 or Numarray would support this? I think it would be very
cool functionality to have.

>
> (there is an equivalent problem for > 8 Eb (exabytes) on 64 bit
> systems, an Exabyte is 2^60 bytes or a giga-giga-byte).
>

I think it will be at least 10-20 years before we could realistically
exceed a 64 bit address space. Probably a lot longer. That's a billion
times more RAM than any machine I've ever worked on, and it's a million
times more bytes than any RAID set I've worked with. Are there any super
computers approaching this level? Even at Moore's law rates, I'm not
worried about that one just yet.

>
> But, I've been thinking about the array protocol and thinking that
> it would be a good thing if this became universal. One of the ways
> to make it universal is by having something that follows it in the
> Python core.
>
> So, what if we proposed for the Python core not something like
> Numeric3 (which would still exist in scipy.base and be everybody's
> favorite array :-) ), but a very minimal array object (scaled back
> even from Numeric) that followed the array protocol and had some
> C-API associated with it.
>
> This minimal array object would support 5 basic types ('bool',
> 'integer', 'float', 'complex', 'Object'). (Maybe a void type
> could be defined, and a void "scalar" introduced, which would be
> the bytes object.) These types correspond to scalars already
> available in Python, and so the whole 0-dim array versus Python
> scalar argument could be ignored.
>

I really like this idea. It could easily be implemented in C or Python
script. Since half its purpose is for documentation, the Python script
implementation might make more sense.

Additionally, a module that understood the defaults and did the right
thing with the metadata attributes would be useful:

    def get_ndims(a):
        return len(a.__array_shape__)

    def get_offset(a):
        if hasattr(a, "__array_offset__"):
            return a.__array_offset__
        return 0

    def _itemsize(a):
        # sketch: works for single-field type strings like 'f8'
        return int(a.__array_typestr__[1:])

    def _contig_strides(shape, itemsize):
        # C-contiguous strides in bytes: the last axis steps one item,
        # each earlier axis steps the product of all later dimensions
        strides = [itemsize] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return strides

    def get_strides(a):
        if hasattr(a, "__array_strides__"):
            return a.__array_strides__
        # build the default strides from the shape
        return tuple(_contig_strides(a.__array_shape__, _itemsize(a)))

    def is_c_contiguous(a):
        # contiguous if the strides match the C-contiguous defaults
        expected = _contig_strides(a.__array_shape__, _itemsize(a))
        return list(get_strides(a)) == expected

    def is_fortran_contiguous(a):
        # same test with the axes reversed: the first axis varies fastest
        shape = list(a.__array_shape__)
        shape.reverse()
        expected = _contig_strides(shape, _itemsize(a))
        expected.reverse()
        return list(get_strides(a)) == expected

    etc...

These functions could be useful for third party libraries to work with
*any* of the array packages.
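Wired up against the MiniArray toy from the sketch earlier in this
section (again, just an illustration of the defaults at work):

    a = MiniArray((3, 4))             # shape (3, 4), typestr 'f8'
    print get_ndims(a)                # 2
    print get_offset(a)               # 0 (attribute absent, so the default)
    print get_strides(a)              # (32, 8) -- C-contiguous float64
    print is_c_contiguous(a)          # True
    print is_fortran_contiguous(a)    # False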
>
> An alternative would be to "add" multidimensionality to the array
> object already part of Python, fix its problem of reallocating while
> a buffer is exposed, and add the array protocol.
>

I'd recommend not breaking backward compatibility on the array.array
object, but adding the __array_*metadata*__ attributes wouldn't hurt
anything. (The __array_shape__ would always be a tuple of length one,
but that's allowed...)

Magnus Lie Hetland wrote:
>
> Wohoo! Niiice :)
>
> (Okay, a bit "me too"-ish, but I just wanted to contribute some
> enthusiasm ;)
>

I completely agree! :-)

Cheers,
    -Scott

From konrad.hinsen at laposte.net Thu Mar 31 23:23:01 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Thu Mar 31 23:23:01 2005
Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core
In-Reply-To: <424C8D05.7030006@ee.byu.edu>
References: <424C8D05.7030006@ee.byu.edu>
Message-ID: <324ad11b79f2594d6589ce4dec7ee1e4@laposte.net>

On 01.04.2005, at 01:51, Travis Oliphant wrote:

> So, what if we proposed for the Python core not something like
> Numeric3 (which would still exist in scipy.base and be everybody's
> favorite array :-) ), but a very minimal array object (scaled back
> even from Numeric) that followed the array protocol and had some C-API
> associated with it.

What would that minimal array object have in common with the full-size
one? A subset of both the Python API and the C API? The data layout?
Would the full one be a subtype of the minimal one?

I like the idea in principle, but I would like to be sure that it
doesn't create additional overhead in the full array or in extension
modules that use arrays, in the form of additional typechecks and
compatibility criteria. Once there is a minimal array type in the core,
objects of that type will be circulating and must somehow be handled.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
-------------------------------------------------------------------------------