From verveer at embl-heidelberg.de Fri Apr 1 00:40:06 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Fri Apr 1 00:40:06 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: Good idea; for many applications such an extension would be 'good enough'. 1) Python code using such arrays should be 100% compatible with numarray/Numeric/scipy. Should be possible if a sub-set of Numeric/numarray/scipy is used. 2) Extensions written in C should handle such arrays transparently (without unnecessary copying). Should also be possible given a compatible data layout. Peter > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal. One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix its problem of reallocating with an > exposed buffer, and add the array protocol.
From oliphant at ee.byu.edu Fri Apr 1 01:30:38 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 01:30:38 2005 Subject: [Numpy-discussion] __array_typestr__ Message-ID: <424D14E9.70607@ee.byu.edu> For the most part, it seems the array protocol is easy to agree on. The one difficulty is typestr. For what it's worth, here are my opinions on what has been said regarding the typestr. * Endian-ness should be included in the typestr --- it is how the data is viewed and an intrinsic part of the type as much as int, or float. * I like the fact that struct character codes are documented, but they are hard to remember. The simpler division into basic types and byte-widths that the numarray record module uses is easier to remember. * I'm mixed on whether or not support for describing complex data types should be used or if their description as a record is good enough.
On the one hand we think of complex numbers as additional types, but on the other hand, in terms of machine layout they really are just two floats, so perhaps it is better to look at them that way in a protocol whose purpose is just describing how to interpret a block of memory. Especially since complex numbers could conceivably be built on top of any of the other types. In addition, it is conceivable that a rational array might be supported by some array object in the future, and that would most easily be handled by a record array where the names were now something like ("numer", "denom"). The typestr argument should just help us specify what is in the memory chunk at each array element (how it should be described). * I'm wondering about including multiple types in the typestr. On the one hand we could describe complicated structures by packing all the information into the typestr. On the other hand, it may be better if we just use 'V8' to describe an 8-byte memory buffer with an additional attribute that contains both the names and the typestr: __array_recinfo__ = (('real','f4'),('imag','f4')) or for a "rational type" __array_recinfo__ = (('numer','i4'),('denom','i4')) so that the detail of the typecode for a "record" type is handled by another special method using tuples. On this level, we could add the possibility of specifying a shape for a small array inside (just like the record array of numarray does). -Travis
From faltet at carabos.com Fri Apr 1 02:01:11 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:01:11 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <20050401041204.18335.qmail@web50208.mail.yahoo.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> Message-ID: <200504011146.44549.faltet@carabos.com> I'm very much with the opinions of Scott. Just some remarks. On Friday 01 April 2005 06:12, Scott Gilbert wrote: > > __array_names__ (optional comma-separated names for record fields) > > I really like this idea. Although I agree with David M. Cooke that it > should be a tuple of names. Unless there is a use case I'm not > considering, it would be preferable if the names were restricted to valid > Python identifiers. Ok. I was thinking of easing the life of C extension writers, but I agree that a tuple of names should be relatively easy to deal with in C as well. However, as the __array_typestr__ would be a plain string, an __array_names__ that is a plain string would be consistent with that. Also, it would be worth knowing how to express a record of different shaped fields. I mean, how to represent a record like: [array(Int32,shape=(2,3)), array(Float64,shape=(3,))] The possibilities are: __array_shapes__ = ((2,3),(3,)) __array_typestr__ = ('i','d') Another possibility would be an extension of the current struct format: __array_typestr__ = "(2,3)i(3,)d" more on that later on. > The struct module has a portable set of typecodes. They call it > "standard", but it's the same thing. The struct module lets you specify > either standard or native. For instance, the typecode for "standard long" > ("=l") is always 4 bytes while a "native long" ("@l") is likely to be 4 or > 8 bytes depending on the platform. The __array_typestr__ codes should > require the "standard" sizes. There is a table at the bottom of the > documentation that goes into detail: > > http://docs.python.org/lib/module-struct.html I fully agree with Scott here.
Struct typecodes offer a way to stay close to the Python standard, and this is a good thing for the many developers who know nothing of array packages and their different typecodes. IMO, the portable set of typecodes in the struct module should only be abandoned if they cannot fulfil all the requirements of Numeric3/numarray. But I'm pretty confident that they eventually will. > The only problem with the struct module is that it's missing a few types... > (long double, PyObject, unicode, bit). Well, bit is not used in Numeric/numarray either, and I think few people would complain about this (they can always pack bits into bytes). PyObject and unicode can be reduced to a sequence of bytes, and some other metadata can be added to the array protocol to complement their meaning (say __array_str_encoding__ = "UTF-8" or similar). long double is the only type that should be added to the struct typecodes, but convincing the Python crew to do that should not be difficult, I guess. > > I also think that rather than attach < or > to the start of the > > string it would be easier to have another protocol for endianness. > > Perhaps something like: > > > > __array_endian__ (optional Python integer with the value 1 in it). > > If it is not 1, then a byteswap must be necessary. > > A limitation of this approach is that it can't adequately represent > struct/record arrays where some fields are big endian and others are little > endian. Having a mix of data values with different endianness in the same record would be a bit ill-advised. In fact, numarray does not support this: a recarray should be all little or big endian. I think that '<' and '>' would be more than enough to represent this. > > Bool -- "b%d" % sizeof(bool) > > Signed Integer -- "i%d" % sizeof() > > Unsigned Integer -- "u%d" % sizeof() > > Float -- "f%d" % sizeof() > > Complex -- "c%d" % sizeof() > > Object -- "O%d" % sizeof(PyObject *) > > --- this would only be useful on shared memory > > String -- "S%d" % itemsize > > Unicode -- "U%d" % itemsize > > Void -- "V%d" % itemsize > > The above is a nice start at reinventing the struct module typecodes. If > you and Perry agree to it, that would be great. A few additions though: Again, I think it would be better not to move away from the struct typecodes. But if you end up doing it, well, I would like to propose a couple of additions to the new protocol: 1.- Support shapes for record specification. I'm listing two possibilities: A) __array_typestr__ = "(2,3)i(3,)d" This would be an easy extension of the struct string type definition. B) __array_typestr__ = ("i4","f8") __array_shapes__ = ((2,3),(3,)) This is more 'à la numarray'. 2.- Allow nested datatypes. Although numarray does not support this yet, I think it could be very advantageous to be able to express: [array(Int32,shape=(5,)),[array(Int16,shape=(2,)),array(Float32,shape=(3,4))]] i.e., the first field would be an array of ints with 5 elements, while the second field would actually be another record made of 2 fields: one array of short ints, and another array of single precision floats. I'm not sure how exactly to implement this, but what about: A) __array_typestr__ = "(5,)i[(2,)h(3,4)f]" B) __array_typestr__ = ("i4",("i2","f4")) __array_shapes__ = ((5,),((2,),(3,4))) Because I'm suggesting we adhere to the struct specification, I prefer option A), although I guess option B would be easier to use for developers (even for extension developers); a sketch of how a consumer might walk option B follows.
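For concreteness, a minimal sketch of how a consumer might walk the option-B representation (nested tuples in __array_typestr__ plus a parallel __array_shapes__) to compute the byte size of one record. Both attribute spellings are only the proposal above, not part of any released package, and record_itemsize is an invented name:

    def record_itemsize(typestr, shapes):
        # Recursively compute the size in bytes of one record described
        # by nested (typestr, shapes) tuples (option-B style, hypothetical).
        if isinstance(typestr, tuple):      # a nested record: recurse over fields
            return sum(record_itemsize(t, s) for t, s in zip(typestr, shapes))
        nelems = 1
        for dim in shapes:                  # a leaf field: shapes is a tuple of ints
            nelems *= dim
        return nelems * int(typestr[1:])    # e.g. 'i4' -> 4 bytes per element

    # The nested example from above: Int32 x (5,), then a sub-record
    # of Int16 x (2,) and Float32 x (3,4):
    typestr = ("i4", ("i2", "f4"))
    shapes = ((5,), ((2,), (3, 4)))
    print(record_itemsize(typestr, shapes))   # 5*4 + 2*2 + 12*4 = 72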
> > So, what if we proposed for the Python core not something like > > Numeric3 (which would still exist in scipy.base and be everybody's > > favorite array :-) ), but a very minimal array object (scaled back > > even from Numeric) that followed the array protocol and had some > > C-API associated with it. > > > > This minimal array object would support 5 basic types ('bool', > > 'integer', 'float', 'complex', 'Object'). (Maybe a void type > > could be defined and a void "scalar" introduced (which would be > > the bytes object)). These types correspond to scalars already > > available in Python and so the whole 0-dim array Python scalar > > arguments could be ignored. > > I really like this idea. It could easily be implemented in C or Python > script. Since half its purpose is for documentation, the Python script > implementation might make more sense. Yeah, I fully agree with this also. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
From faltet at carabos.com Fri Apr 1 02:17:36 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:17:36 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <200504011215.52914.faltet@carabos.com> On Friday 01 April 2005 11:31, Travis Oliphant wrote: > * I'm wondering about including multiple types in the typestr. On the > one hand we could describe complicated structures by packing all the > information into the typestr. On the other hand, it may be better if > we just use 'V8' to describe an 8-byte memory buffer with an additional > attribute that contains both the names and the typestr: > > __array_recinfo__ = (('real','f4'),('imag','f4')) > > or for a "rational type" > > __array_recinfo__ = (('numer','i4'),('denom','i4')) > > so that the detail of the typecode for a "record" type is handled by > another special method using tuples. On this level, we could add the > possibility of specifying a shape for a small array inside (just like > the record array of numarray does). Like: __array_recinfo__ = (('numer','i4', (3,4)),('denom','i4', (2,))) ? Also, this can be easily extended to nested types: __array_recinfo__ = (('a','i4',(3,4)),(('b','i4',(2,)),('c','f4',(10,2)))) Well, this looks pretty good to me. It has nothing to do with struct format, but is much more usable, of course. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
From cjw at sympatico.ca Fri Apr 1 04:57:57 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 04:57:57 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> Message-ID: <424D4504.4030606@sympatico.ca> David M. Cooke wrote: >Francesc Altet writes: > > > >>On Tuesday 29 March 2005 01:59, Travis Oliphant wrote: >> >> >>>My proposal: >>> >>>__array_data__ (optional object that exposes the PyBuffer protocol or a >>>sequence object, if not present, the object itself is used).
>>>__array_shape__ (required tuple of int/longs that gives the shape of the >>>array) >>>__array_strides__ (optional provides how to step through the memory in >>>bytes (or bits if a bit-array), default is C-contiguous) >>>__array_typestr__ (optional struct-like string showing the type --- >>>optional endianness indicator + Numeric3 typechars, default is 'V') >>>__array_itemsize__ (required if above is 'S', 'U', or 'V') >>>__array_offset__ (optional offset to start of buffer, defaults to 0) >>> >>> >>Considering that heterogeneous data is to be supported as well, and >>there is some tradition of assigning names to the different fields, I >>wonder if it would not be good to add something like: >> >>__array_names__ (optional comma-separated names for record fields) >> >> > >A sequence (list or tuple) of strings would be preferable. That >removes all worrying about using commas in the names. > > > As I understand it, record arrays can be heterogeneous. If so, wouldn't it make sense for this to be a sequence of tuples? For example: [('Name', charStringType), ('Age', _nt.Int8), ...] Where _nt is defined by something like: import numarray.numerictypes as _nt Colin W.
From cjw at sympatico.ca Fri Apr 1 05:49:53 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 05:49:53 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <424D5136.8060703@sympatico.ca> Travis Oliphant wrote: > > For the most part, it seems the array protocol is easy to agree on. > The one difficulty is typestr. > > For what it's worth, here are my opinions on what has been said > regarding the typestr. > > * Endian-ness should be included in the typestr --- it is how the data > is viewed and an intrinsic part of the type as much as int, or float. In most cases, endian-ness is associated with the machine being used, rather than the data element. It seems to me that numarray's numeric types provide a good model, which may need enhancing for records, strings etc. numarray has: Numeric type objects: Bool Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64 Complex32 Complex64 Numeric type classes: NumericType BooleanType SignedType UnsignedType IntegralType SignedIntegralType UnsignedIntegralType FloatingType ComplexType > > * I like the fact that struct character codes are documented, but they > are hard to remember. This is the problem. numerictypes provides mnemonic names and, if one uses an editor with autocompletion, a prompt from the editor. For those interfacing to existing code, there could be a helper function: def toType(eltType='i'): => an instance of NumericType It should also be possible to derive the typeCode from the eltType; numarray doesn't seem to provide this. Colin W.
From cjw at sympatico.ca Fri Apr 1 06:07:38 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 06:07:38 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: <424D5557.5010806@sympatico.ca> Travis Oliphant wrote: > > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > Good. > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal.
One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > I thought that your original Numeric3 proposal was in this direction - a simple multidimensional array class/type which could eventually replace Python's array module. In addition, and separately, there were to be a collection of ufuncs. Later, discussion seemed to drift from the basic Numeric3 towards SciPy. > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. Could this be subclassed so that provision could be made for Int8 (or even Int1)? How would an array of records be handled? > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. It would be good if the user could write his/her own ufuncs in Python. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > The concept looks good. Regarding timing, it seems better to build the foundation before building the house. Colin W. > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix its problem of reallocating with an > exposed buffer, and add the array protocol. > > > > -Travis
From oliphant at ee.byu.edu Fri Apr 1 12:10:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:10:00 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef050401104875650ddd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> Message-ID: <424DAA16.10007@ee.byu.edu> >>I'm still very committed to Numeric3 as I want to bring the numarray and >>Numeric people together behind a single array object for scientific >>computing. >> >> Notice that regardless of what I said about what goes into standard Python, something like Numeric3 will always exist for use by scientific users. It may just be a useful add-on package like Numeric has always been. There is no way I'm going to abandon use of a more capable Numeric. >Right. I believe that, among all libraries related to numeric arrays, >eventually only one library in the Python core will survive no matter >how many advanced functions are available, because of the strong >compatibility with other packages. > > I don't think this is true. Things will survive based on utility. What we are trying to do with the Python core is define a standard protocol that is flexible enough to handle anybody's concept of an advanced array (in particular the advanced array that will be in scipy.base). >Totally agree. I doubt that Guido will accept a large and complex >library into the standard Python core.
>I think Numeric is already too >complex, and numarray is far more complex to be a standard lib in the >Python core. Numeric3 must shift its focus from better Numeric to >scaled-down Numeric. > > I disagree about "shifting focus." Personally, I'm not going to work on something like that until we have a single array package that fulfills the needs of all Numeric and most numarray users. I'm just pointing out that what goes into the Python core should probably be a scaled-down object with a souped-up "protocol" so that the array object in scipy.base can be used through the array protocol by any other package without worrying about having scipy_core at compile time. >For example, how many Python users care about masked arrays? How many >Python users want the advanced type from the Python core? I think the >advanced array type should be in some extension lib, not in the core array >lib. > Perhaps you do see my point of view. Not all Python users care about an advanced array object, but nearly all technical (scientific and engineering) users will. We just need interoperability. >If we make clear our target: becoming a standard library in the >Python core, we may have no problem in determining what functions >should be in the core array lib and what functions should be in >extension libraries using the core array type. > > >Today, the array type in the Python core is almost useless. >If Numeric3 offers just much faster performance on numeric types, many >Python users will start to use the new array type in their applications. >Once that happens, we can create a bunch of extension libraries for more >advanced operations on the new array type. > > The "bunch of extension libraries" is already happening and in progress. I think we've overshot the mark for the Python core, however. No need to wait 'til something happens. >With all my heart I hope that Numeric3 moves in this direction before > > >we get the tragedy of having Numeric4, Numeric5, and so on. > > I'm coming to see that what is most important for the Python core is "protocols". Then, there can be a "million" different array types that can all share each other's memory without hassle or much overhead. I'm still personally interested in a better Numeric, however, and so won't be abandoning the concept of Numeric3 (notice I now call it scipy.base --- not a change of focus, just a change of name). I just wanted to encourage some discussion on the array protocol. -Travis
From oliphant at ee.byu.edu Fri Apr 1 12:23:19 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:23:19 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424D5557.5010806@sympatico.ca> References: <424C8D05.7030006@ee.byu.edu> <424D5557.5010806@sympatico.ca> Message-ID: <424DAD00.1050203@ee.byu.edu> > I thought that your original Numeric3 proposal was in this direction - > a simple multidimensional array class/type which could > eventually replace Python's array module. In addition, and > separately, there were to be a collection of ufuncs. No, that's a misunderstanding. Original Numeric3 was never about "simplifying." Because we can't "simplify" and still support the uses that Numeric and numarray have enjoyed. I'm more interested in using something like Numeric and will always install it should it exist. I was interested in getting it into the Python core for standardization. I now believe that "universal" standardization should occur around a "protocol" and perhaps a simple implementation.
I'm still interested in a more "local standardization" for numarray and Numeric users (not all Python users), which is the focus of scipy.base (what I used to call Numeric3). In the process we are generating good ideas that can be used for "global standardization" among all Python users. But, I can't do it all. I have to keep focused on what I'm doing with the current Numeric arrayobject (and that has never been about "getting rid of functionality"). > > Later, discussion seemed to drift from the basic Numeric3 towards SciPy. The context of the problem as I see it intimately involves scipy and the collection of packages surrounding numarray. The small community we have built up was diverging in the creation of external packages. This is what troubled me most deeply. So, there is no Numeric3 separate from the larger issue of "a collection of standard scientific packages" that scipy has tried to be. That is why reference to scipy is made. I see no "drifting" occurring. There is a separate issue of a good array module for Python. I now see the solution there as being more of a "good array protocol" for Python with a default very simple implementation that is improved by extension modules. > > Could this be subclassed so that provision could be made for Int8 (or > even Int1)? I suppose, but this is kind of missing the point, because Numeric3 will support those types. If you need a more advanced array, you install scipy.base. > > How would an array of records be handled? By installing a more advanced array. > The concept looks good. Regarding timing, it seems better to build > the foundation before building the house. The problem with your analogy is that the "sprawling mansion in the suburbs" is already built (Numeric has been around for a long time). The question is what kind of housing to build for the city dwellers and what kind of transportation system we establish so people can move back and forth easily. -Travis
From sdhyok at gmail.com Fri Apr 1 12:59:07 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Fri Apr 1 12:59:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <371840ef05040112574b6a86bd@mail.gmail.com> On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: snip > I disagree about "shifting focus." Personally, I'm not going to work on > something like that until we have a single array package that fulfills > the needs of all Numeric and most numarray users. I'm just pointing > out that what goes into the Python core should probably be a scaled > down object with a souped-up "protocol" so that the array object in > scipy.base can be used through the array protocol by any other package > without worrying about having scipy_core at compile time. Would you tell me what exactly you mean by "protocol"? Do you mean a standard definition of a series of "interfaces" for an array type in Python?
-- Daehyok Shin Geography Department University of North Carolina-Chapel Hill USA
From oliphant at ee.byu.edu Fri Apr 1 15:14:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 15:14:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef05040112574b6a86bd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> <371840ef05040112574b6a86bd@mail.gmail.com> Message-ID: <424DD56E.6070801@ee.byu.edu> Daehyok Shin wrote: >On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: > >snip > > > >>I disagree about "shifting focus." Personally, I'm not going to work on >>something like that until we have a single array package that fulfills >>the needs of all Numeric and most numarray users. I'm just pointing >>out that what goes into the Python core should probably be a scaled >>down object with a souped-up "protocol" so that the array object in >>scipy.base can be used through the array protocol by any other package >>without worrying about having scipy_core at compile time. >> >> > >Would you tell me what exactly you mean by "protocol"? >Do you mean a standard definition of a series of "interfaces" for an array >type in Python? > > Yes, pretty much. I would even go so far as to say a set of hooks in the typeobject (like the sequence, mapping, and buffer protocols). -Travis
From steve at shrogers.com Sat Apr 2 06:50:58 2005 From: steve at shrogers.com (Steven H. Rogers) Date: Sat Apr 2 06:50:58 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <424EB08F.90909@shrogers.com> First, thanks for doing this, Travis. Travis Oliphant wrote: > > I'm coming to see that what is most important for the Python core is > "protocols". Then, there can be a "million" different array types that > can all share each other's memory without hassle or much overhead. > I'm still personally interested in a better Numeric, however, and so > won't be abandoning the concept of Numeric3 (notice I now call it > scipy.base --- not a change of focus, just a change of name). I just > wanted to encourage some discussion on the array protocol. > Your array protocol idea sounds good. It should not only make it easier to interoperate with other Python packages, but also with foreign entities like APL/J, Matlab, and LabVIEW. Regards, Steve -- Steven H. Rogers, Ph.D., steve at shrogers.com Weblog: http://shrogers.com/weblog "Reach low orbit and you're half way to anywhere in the Solar System." -- Robert A. Heinlein
From oliphant at ee.byu.edu Sat Apr 2 21:30:03 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 2 21:30:03 2005 Subject: [Numpy-discussion] scipy.base (Numeric3) now has math Message-ID: <424F7F06.4090200@ee.byu.edu> I've updated scipy.base (Numeric3) so math is now supported (it uses the old ufunc apparatus with newly added type support). There is still some work to be done, so this is still very alpha (but at least math operations work): - update the ufunc apparatus to use buffers to avoid copying an entire array just for type casting (and to support unaligned and non-byteswapped arrays) - update the way error handling is done. - update the coercion strategy like numarray does - fix all the bugs.
I've also fixed things so Numeric extension modules should compile --- please report warnings and bugs with this as well. Thanks for all your help, -Travis
From oliphant at ee.byu.edu Sun Apr 3 01:06:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:06:16 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <200504011215.52914.faltet@carabos.com> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> Message-ID: <424FB19B.4060800@ee.byu.edu> Hello all, I've updated the numeric web site and given special prominence to the array interface, which I believe should be pushed. Numeric 24.0 will support it, as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release. Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further. Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation. -Travis
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 01:24:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 01:24:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB19B.4060800@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> Message-ID: <424FB72F.4020201@ims.u-tokyo.ac.jp> There are two questions that I have about the array interface: 1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference? 2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface? --Michiel. Travis Oliphant wrote: > > Hello all, > > I've updated the numeric web site and given special prominence to the > array interface, which I believe should be pushed. Numeric 24.0 will > support it, as will scipy.base (Numeric3). I hope that numarray will > also support it in an upcoming release. > > Please read through the interface and feel free to comment. However, > unless there is a glaring problem, I'm more interested that you feel > free to start using the interface than that we debate it further. > > Scott has expressed interest in implementing a very basic Python-only > implementation of an object exporting the interface. I suggest he and > anyone else interested look at numarray for a starting point for a > Python implementation, and Numeric for a C implementation. > > -Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From oliphant at ee.byu.edu Sun Apr 3 01:41:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:41:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB72F.4020201@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> Message-ID: <424FB9FA.1090109@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > There are two questions that I have about the array interface: > > 1) To what degree will the new array interface look different to users > of the existing Numerical Python? If I were to install the new array > interface on the computer of a current Numerical Python user and I > didn't tell them, would they notice a difference? Nothing will look different. For now there is nothing to "install", so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0 (if a user were to call array() on an object that supported the array interface, then Numeric could return an array without copying data). Older versions of Numeric won't benefit from the interface but won't be harmed either. > 2) To what degree is the new array interface compatible with Numerical > Python for the purpose of C extension modules? Do C extension modules > need to be modified in order to use the new array interface? It is completely compatible. C-extensions don't need to be modified at all to make use of the interface (of course they should be re-compiled if using Numeric 24.0). Only two things will be modified in Numeric 24.0. 1) PyArray_FromObject and friends will be expanded so that if an object exposes the array interface the right thing will be done to use its memory. 2) Attributes will be added so that Numeric arrays expose the array interface so other objects can use their memory intelligently. -Travis
From cjw at sympatico.ca Sun Apr 3 05:23:12 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 3 05:23:12 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem Message-ID: <424FE002.6010800@sympatico.ca> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install running install running build running config error: The .NET Framework SDK needs to be installed before building extensions for Python. Is there any chance that a Windows binary could be made available for testing? Colin W.
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:35:05 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:35:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE002.6010800@sympatico.ca> References: <424FE002.6010800@sympatico.ca> Message-ID: <424FE3D8.7040200@ims.u-tokyo.ac.jp> You can use Cygwin's MinGW compiler by adding --compiler=mingw after the setup command. --Michiel.
Colin J. Williams wrote: > C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install > running install > running build > running config > error: The .NET Framework SDK needs to be installed before building > extensions for Python. > > Is there any chance that a Windows binary could be made available for > testing? > > Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:46:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:46:04 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE3D8.7040200@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE3D8.7040200@ims.u-tokyo.ac.jp> Message-ID: <424FE64F.7030706@ims.u-tokyo.ac.jp> Sorry, that should be --compiler=mingw32. Michiel Jan Laurens de Hoon wrote: > You can use Cygwin's MinGW compiler by adding --compiler=mingw after the > setup command. > > --Michiel. > > Colin J. Williams wrote: > >> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install >> running install >> running build >> running config >> error: The .NET Framework SDK needs to be installed before building >> extensions for Python. >> >> Is there any chance that a Windows binary could be made available for >> testing? >> >> Colin W. >> > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From gruben at bigpond.net.au Sun Apr 3 06:32:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun Apr 3 06:32:09 2005 Subject: [Numpy-discussion] array slicing question Message-ID: <424FF03A.4060107@bigpond.net.au> This may be relevant to Numeric 3, but is possibly just a general question about array slicing which will either reveal a deficiency in specifying slices or in my knowledge of slicing with numpy. A while ago I was trying to reimplement some Matlab image processing code in Numeric, and this revealed a deficiency in the way slices are defined. Suppose I have an n x m array and want to slice off the first and last p rows and columns, where p can range from 0 to some number. Matlab provides a clean way of doing this, but in numpy it's a bit of a mess.
You might think you could do >>> p=1 >>> b = a[p:-p] but if p=0, this fails. My final solution involved getting the array shape and explicitly calculating start and stop columns, but is there a better way? Gary R.
From oliphant at ee.byu.edu Sun Apr 3 08:36:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 08:36:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> Message-ID: <42500D03.3030809@ee.byu.edu> I don't know if you have followed the array interface discussion. It is defined at http://numeric.scipy.org I have implemented consumer and exporter interfaces for Numeric and an exporter interface for numarray. The consumer interface needs a little help but shouldn't take too long for someone who understands numarray better. Now Numeric arrays can share data with numarray (no data copy). scipy.base arrays will also implement the array interface. I think the array interface is a good direction to go. -Travis
From konrad.hinsen at laposte.net Sun Apr 3 13:03:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sun Apr 3 13:03:19 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> References: <424FF03A.4060107@bigpond.net.au> Message-ID: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> On 03.04.2005, at 15:31, Gary Ruben wrote: > You might think you could do > >>> p=1 > >>> b = a[p:-p] > > but if p=0, this fails. b = a[p:len(a)-p] works even for p=0. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------
From oliphant at ee.byu.edu Sun Apr 3 21:21:15 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:21:15 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <4250C0A4.9070707@ee.byu.edu> Magnus Lie Hetland wrote: >Travis Oliphant : > > >>I don't know if you have followed the array interface discussion. It >>is defined at http://numeric.scipy.org >> >> > >This is very, very good! The numeric future of Python is looking very >bright, IMO :) > >Some tiny points: > > - Shouldn't the regexp for __array_typestr__ be > '[<>]?[tbiufcOSUV][0-9]+'? > > Probably, since I guess you can only have one of < or >. Thanks. > - What are the semantics when __array_typestr__ isn't V[0-9]+ and > __array_descr__ is set? Is __array_typestr__ ignored? Or... What > would it be used for? > > I would say that the __array_descr__ always gives more information, but not every array implementation will support looking at it. For example, current Numeric (24.0 in CVS) ignores __array_descr__ and just looks at the typestr (and doesn't support 'V'). So, I suspect that another array package that knows this may choose something else besides 'V' if it really wants Numeric to still understand it.
Suppose you have a complex short int array with __array_descr__ = 'V8 > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > Bytes is still needed because the buffer object is not very good and we need a good buffer object in Python for lots of other reasons. It would be very useful, for example, to be able to allocate memory using the Python bytes object. But, it does mean less pressure to get it to work. > - Why the parentheses around "buffer protocol-satisfying object" in > the description of __array_mask__? And why must it be 'b1'? What > if I happen to have mask data from a non-array-protocol source, > which happens to be, say, b8 (not unreasonable, I think)? Wouldn't > it be good to allow any size of these, and just use zero/non-zero > as the criterion? Some of the point of this protocol is to avoid > copying and using the original data, after all...? (Same goes for > the requirement that it be C-contiguous. I guess I'm basically > saying that perhaps __array_mask__ should be an array itself. Or, > at least, that it could be *allowed* to be...) > > I added the mask late last night. It is probably the least thought-out portion. Everything else has been through the wringer a couple more times. My whole thinking is that I just didn't want to explode the protocol with another special name for the mask type. But, saying that the mask object itself can support the array interface doesn't do that, so I think that is a good call. Last night, using the numarray exporter interface and the Numeric consumer interface, I was able to share data between a Numeric array and a numarray array with no copying of the data buffers. It was very nice. -Travis
From oliphant at ee.byu.edu Sun Apr 3 21:29:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:29:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <4250C276.5090300@ee.byu.edu> >> > Probably, since I guess you can only have one of < or >. Thanks. > >> - What are the semantics when __array_typestr__ isn't V[0-9]+ and >> __array_descr__ is set? Is __array_typestr__ ignored? Or... What >> would it be used for? >> >> > I would say that the __array_descr__ always gives more information, but > not every array implementation will support looking at it. For > example, current Numeric (24.0 in CVS) ignores __array_descr__ and > just looks at the typestr (and doesn't support 'V'). So, I suspect > that another array package that knows this may choose something else > besides 'V' if it really wants Numeric to still understand it. > Suppose you have a complex short int array with > > __array_descr__ = 'V8 Let me finish this example: Suppose you have a complex short int array with __array_descr__ = [('real','i2'),('imag','i2')] you could describe this as __array_typestr__ = 'V4' or think of it as a 4-byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that __array_typestr__ = 'i4' is the right thing to do for packages that ignore the __array_descr__ attribute.
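To make the finished example concrete, here is a minimal sketch of an object exporting such a complex short int array through the interface. The class, the sample data, and the buffer layout are all invented for illustration (only the __array_* names come from the interface), and a real exporter would probably also prefix the typestr with '<' or '>' per the endianness discussion above:

    import struct

    class ComplexShortArray(object):
        # Toy exporter: three "complex" values (1+2j), (3+4j), (5+6j)
        # stored as interleaved int16 real/imag parts in one buffer.
        def __init__(self):
            data = struct.pack('6h', 1, 2, 3, 4, 5, 6)   # native-endian here
            self.__array_shape__ = (3,)
            self.__array_descr__ = [('real', 'i2'), ('imag', 'i2')]
            # Fallback for consumers that ignore __array_descr__:
            # each element is just 4 bytes, advertised as an integer.
            self.__array_typestr__ = 'i4'
            self.__array_data__ = data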
-Travis
From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 01:17:15 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 01:17:15 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB9FA.1090109@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> Message-ID: <4250F8E5.9020701@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> 1) To what degree will the new array interface look different to users >> of the existing Numerical Python? > > Nothing will look different. For now there is nothing to "install", so > the array interface is just something to expect from other objects. > The only thing that would be different is in Numeric 24.0 (if a user > were to call array() on an object that supported the array > interface, then Numeric could return an array without copying data). > Older versions of Numeric won't benefit from the interface but won't be > harmed either. Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface? Some comments on the array interface: 1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method (and rename the other methods for consistency, if desired). 2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method. 3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope). Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable. Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle. But maybe such a single array package is not attainable given the different needs of the different communities. --Michiel.
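The defaults question in point 3 can be answered with a few lines of consumer-side code. This is only a sketch of the idea (the function name and dict layout are made up), using the defaults from Travis's proposal quoted earlier in the thread, where __array_data__ defaults to the object itself, strides default to C-contiguous, the typestr defaults to 'V', and the offset defaults to 0:

    def array_interface_info(obj):
        # Gather the array interface attributes from obj, substituting
        # the documented defaults for any optional ones it omits.
        return {
            'shape':   obj.__array_shape__,                      # required
            'typestr': getattr(obj, '__array_typestr__', 'V'),   # default 'V'
            'data':    getattr(obj, '__array_data__', obj),      # default: obj itself
            'strides': getattr(obj, '__array_strides__', None),  # None: C-contiguous
            'offset':  getattr(obj, '__array_offset__', 0),
            'descr':   getattr(obj, '__array_descr__', None),
        }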
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
From magnus at hetland.org Mon Apr 4 02:05:28 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:05:28 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <20050404090356.GB21527@idi.ntnu.no> Travis Oliphant : > [snip] > Last night, using the numarray exporter interface and the Numeric > consumer interface I was able to share data between a Numeric array and > numarray array with no copying of the data buffers. It was very nice. Wow -- a historic moment :) Now, if we can only get the stdlib's array module to support this protocol (and sprout some more dimensions), as you mentioned... That would really be cool. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
From magnus at hetland.org Mon Apr 4 02:15:10 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:15:10 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C276.5090300@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> <4250C276.5090300@ee.byu.edu> Message-ID: <20050404091311.GC21527@idi.ntnu.no> Travis Oliphant : > [snip] > > Let me finish this example: > > Suppose you have a complex short int array with > > __array_descr__ = [('real','i2'),('imag','i2')] > > you could describe this as > > __array_typestr__ = 'V4' Sure -- I can see how using 'V' makes sense... You're just telling the host program how many bytes you've got, and that's it. That makes sense to me. What I wondered about was what happened when you use a more specific (and conflicting) type for the typestr... > or think of it as a 4 byte integer if you want to make sure that another > array package that may not support void pointers can still manipulate > the data, and so the creator of the complex short int array may decide that > > __array_typestr__ = 'i4' This is basically what I'm wondering about. It would make sense (to me) to say that the data type was 'V4', because that's simply less specific, in a way. But saying 'i4' is just as specific as the complex example, above -- but it means something else! You're basically giving the program permission to interpret a four-byte complex number as a four-byte integer, aren't you? Sounds almost like a recipe for disaster to me :} On the other hand -- there is no complex integer type in the interface, and using 'c4' probably would be completely wrong as well. I would almost be tempted to say that if __array_descr__ is in use, __array_typestr__ *has* to use the 'V' type. (Or, one could make some more complicated rules, perhaps, in order to allow other types.) As for not supporting the 'V' type -- would that really be considered a conforming implementation?
According to the spec, "Objects wishing to support an N-dimensional array in application code should look for these attributes and use the information provided appropriately". The typestr is required, so... Perhaps the spec should be explicit about the shoulds/musts/mays of the specific typecodes? What must be supported, what may be supported etc.? Or perhaps that doesn't make sense? It just seems almost too bad that one package would have to know what another package supports in order to formulate its own typestr... It sort of throws part of the interoperability out the window. > is the right thing to do for packages that ignore the __array_descr__ > attribute. > > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
From magnus at hetland.org Mon Apr 4 02:25:17 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:25:17 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <20050404092421.GD21527@idi.ntnu.no> Michiel Jan Laurens de Hoon : > [snip] > 1) The "__array_shape__" method is identical to the existing "shape" method > in Numerical Python and numarray (except that "shape" does a little bit > better checking, but it can be added easily to "__array_shape__"). To avoid > code duplication, it might be better to keep that method. (and rename the > other methods for consistency, if desired). Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)? > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python > itself, so that it can return integers larger than 32 bits. So we > can avoid adding a new method. That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :) > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If the support of these attributes is optional, that would have to be the case. > If so, I have a slight preference to make all methods required, > since it's not a big effort to return the defaults, and there will > be more extension modules than array packages (or so I hope). But isn't the point that you should be able to export other things (such as images or sounds or what-have-you) *as* arrays? As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something... > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array > implementations, it also enables and perhaps encourages the creation > of different array modules, while our original goal was to create a > single array module that satisfies the needs of both Numerical > Python and numarray users. I still think such a solution would be > preferable. I agree.
But what I think would be cool is if such a standardized package could
take any object conforming to this protocol and use it (possibly as the
argument to the array() constructor) -- with all the ufuncs and
operators it has. Because then I could implement specialized arrays
where the specialized behaviour lies just in the data itself, not the
behaviour. For example, I might want to create a thin array wrapper
around a memory-mapped, compressed video file, and treat it as a
three-dimensional array of rgb triples... (And so forth.)

> Inconsistencies other than the array interface (e.g. one implements
> argmax(x) while another implements x.argmax()) may mean that an
> extension module can work with one array implementation but not with
> another,

This does *not* sound like a good thing -- I agree. Certainly not what
I would hope this protocol is used for.

> even though they both conform to the array interface. We may end up
> with several array packages (we already have Numerical Python,
> numarray, and scipy), and extension modules that work with one
> package and not with another. So in a sense, the array interface is
> letting the genie out of the bottle.

Well, perhaps -- but the current APIs of e.g., Numeric or numarray
could be used in the same way (i.e., writing your own array
implementations with the same interface). As (I think) Travis has said,
there is still a goal (somewhat separate from the protocol) of getting
one standard heavy-duty numerical array package. I think that would be
very beneficial. The point (as I see it) is just to make it easier for
various array implementations (i.e., the data, not the ufuncs/operators
etc.) to interoperate with it.

> But maybe such a single array package is not attainable given the
> different needs of the different communities.

I would certainly hope it is.

> --Michiel.

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From gruben at bigpond.net.au Mon Apr 4 05:14:09 2005
From: gruben at bigpond.net.au (Gary Ruben)
Date: Mon Apr 4 05:14:09 2005
Subject: [Numpy-discussion] array slicing question
In-Reply-To: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net>
References: <424FF03A.4060107@bigpond.net.au> <9d9c98344e25f20ac8509e76f3917ec6@laposte.net>
Message-ID: <42512F57.2050007@bigpond.net.au>

Thanks Konrad,

Sorry, my example was too simple. The actual example, representing an
image, should have been 2-D and not necessarily square. Therefore I
used shape instead of len, and it seemed messy doing it this way.

Gary

konrad.hinsen at laposte.net wrote:
> On 03.04.2005, at 15:31, Gary Ruben wrote:
>
>> You might think you could do
>> >>> p=1
>> >>> b = a[p:-p]
>>
>> but if p=0, this fails.
>
> b = a[p:len(a)-p] works even for p=0.
>
> Konrad.
> --
> -------------------------------------------------------------------------------
> Konrad Hinsen
> Laboratoire Leon Brillouin, CEA Saclay,
> 91191 Gif-sur-Yvette Cedex, France
> Tel.: +33-1 69 08 79 25
> Fax: +33-1 69 08 82 61
> E-Mail: khinsen at cea.fr
> -------------------------------------------------------------------------------

From oliphant at ee.byu.edu Mon Apr 4 12:16:09 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 12:16:09 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp>
Message-ID: <4251920B.6060708@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:
> Travis Oliphant wrote:
>>> 1) To what degree will the new array interface look different to
>>> users of the existing Numerical Python?
>>
>> Nothing will look different. For now there is nothing to "install",
>> so the array interface is just something to expect from other
>> objects. The only thing that would be different is in Numeric 24.0
>> (if a user were to call array() on an object that supported the array
>> interface, then Numeric could return an array without copying data).
>> Older versions of Numeric won't benefit from the interface but
>> won't be harmed either.
>
> Very nice. Thanks, Travis.
> I'm not sure what you mean by "the array interface could become part
> of the Python standard as early as Python 2.5", since there is nothing
> to install. Or does this mean that Python's array will conform to the
> array interface?

The latter is what I mean... I think it is important to have something
in Python itself that "conforms to the interface." I wonder if it would
also be nice to have some protocol slots in the object type so that
extension writers can avoid converting some objects. There is also the
possibility that a very simple N-d array type could be included in
Python 2.5 that conforms to the interface, if somebody wants to
champion that.

I think it is important to realize what the array interface is trying
to accomplish. From my perspective, I still think it is better for the
scientific community to build off of a single array object that is
"best of breed." The purpose of the array interface is to allow us
scientific users to share information with other Python extension
writers who may be wary of requiring scipy.base for their users but who
really should be able to interoperate with scipy.base arrays. I'm
thinking of extensions like wxPython, PIL, and so forth.

There are also lots of uses for arrays that don't necessarily need the
complexity of the scipy.base array (or uses that need even more types).
At some point we may be able to accommodate dynamic type additions to
the scipy.base array. But, right now it requires enough work that
others may want to design their own simple arrays. It's very useful if
all such arrays could speak together with a common basic language.

The fact that numarray and Numeric arrays can talk to each other more
seamlessly was not the main goal of the array interface but it is a
nice side benefit. I'd still like to see the scientific community use a
single array. But, others may not see it that way. The array interface
lets us share more easily.
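To make that concrete, the whole "consumer side" of the protocol can be
as small as this (an untested sketch; the function name is invented,
and it ignores the optional attributes such as __array_strides__, so it
assumes contiguous data):

    def from_any_array(obj):
        # sketch of a consumer: accept anything exporting the protocol
        for name in ('__array_shape__', '__array_typestr__', '__array_data__'):
            if not hasattr(obj, name):
                raise TypeError("object does not export the array interface")
        # hand back (shape, typestr, buffer) for lower-level code to consume
        return obj.__array_shape__, obj.__array_typestr__, obj.__array_data__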
> Some comments on the array interface:
>
> 1) The "__array_shape__" method is identical to the existing "shape"
> method in Numerical Python and numarray (except that "shape" does a
> little bit better checking, but it can be added easily to
> "__array_shape__"). To avoid code duplication, it might be better to
> keep that method. (and rename the other methods for consistency, if
> desired).

There is no code duplication. In these cases it is just another name
for .shape. What "better checking" are you referring to?

> 2) The __array_datalen__ is introduced to get around the 32-bit int
> limitation of len(). Another option is to fix len() in Python itself,
> so that it can return integers larger than 32 bits. So we can avoid
> adding a new method.

Python len() will never return a 64-bit number on a 32-bit platform.

> 3) Where do default values come from? Is it the responsibility of the
> extension module writer to find out if the array module implements
> e.g. __array_strides__, and substitute the default values if it
> doesn't? If so, I have a slight preference to make all methods
> required, since it's not a big effort to return the defaults, and
> there will be more extension modules than array packages (or so I hope).

Optional attributes let modules that care talk to each other on a
"higher level" without creating noise for simpler extensions. Both the
consumer and exporter have to use it for it to matter. The defaults are
just clarifying what is being assumed if an attribute isn't there.

> Whereas the array interface certainly helps extension writers to
> create an extension module that works with all array implementations,
> it also enables and perhaps encourages the creation of different array
> modules, while our original goal was to create a single array module
> that satisfies the needs of both Numerical Python and numarray users.
> I still think such a solution would be preferable.

I agree with you. I would like a single array module for scientific
users. But, satisfying everybody is probably impossible with a single
array object. Yes, there could be a proliferation of array objects, but
sometimes we need multiple array objects to learn from each other. It's
nice to have actual code that implements some idea rather than just
words in a mailing list. The interface allows us to talk to each other
while we learn from each other's actual working implementations.

In a way this is like the old argument between the 1920-era communists
and the free-marketers. The communists say that we should have only one
company that produces some product because having multiple companies is
"wasteful" of resources, while the free-marketers point out that
satisfying consumers is tricky business, and there is not only "one
right way to do it." Therefore, having multiple companies each trying
to satisfy consumers actually creates wealth as new and better ideas
are tried by the different companies. The successful ideas are emulated
by the rest. In mature markets there tends to be a reduction in the
number of producers, while in developing markets there are all kinds of
companies producing basically the same thing. Of course software
creates its own issues that aren't addressed by that simple analogy,
but I think it's been shown repeatedly that good interfaces (http, smtp
anyone?) create a lot of utility.
> Inconsistencies other than the array interface (e.g. one implements
> argmax(x) while another implements x.argmax()) may mean that an
> extension module can work with one array implementation but not with
> another, even though they both conform to the array interface. We may
> end up with several array packages (we already have Numerical Python,
> numarray, and scipy), and extension modules that work with one package
> and not with another. So in a sense, the array interface is letting
> the genie out of the bottle.

I think this genie is out of the bottle already. We need to try and get
our wishes from it now.

-Travis

From xscottg at yahoo.com Mon Apr 4 19:09:30 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:30 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: 6667
Message-ID: <20050404233322.61350.qmail@web50208.mail.yahoo.com>

--- Michiel Jan Laurens de Hoon wrote:
>
> I'm not sure what you mean by "the array interface could become
> part of the Python standard as early as Python 2.5", since there
> is nothing to install. Or does this mean that Python's array will
> conform to the array interface?
>

It would be nice to have the Python array module support the protocol
for the 1-dimensional arrays that it implements. It would also be nice
to add a *simple* ndarray object in the core that supports
multi-dimensional arrays. I think breaking backward compatibility of
the existing Python array module to support multiple dimensions would
be a mistake and unlikely to get accepted.

A PEP would likely be required to make the changes to the array module
or to add an ndarray module, and that PEP would likely document the
interface. In that regard, it could "make it into the core" for Python
2.5. But you're right that external packages could support this
interface today. There is nothing to install...

> 1) The "__array_shape__" method is identical to the existing "shape"
> method in Numerical Python and numarray (except that "shape" does a
> little bit better checking, but it can be added easily
> to "__array_shape__"). To avoid code duplication, it might be better
> to keep that method. (and rename the other methods for consistency,
> if desired).
>

The intent is that all array packages would have the required/optional
protocol attributes. Of course at a higher level, this information will
probably be presented to the users, but they might choose a different
mechanism. So while A.__array_shape__ always returns a tuple of longs,
A.shape is free to return a ShapeObject or be an assignable attribute
that changes the shape of the object. With the property mechanism,
there is no need to store duplicated data (__array_shape__ can be a
property method that returns a dynamically generated tuple).

Separating the low level description of the array data in memory from
the high level interface that particular packages like scipy.base or
numarray present to their users is a good thing.

> 3) Where do default values come from? Is it the responsibility of the
> extension module writer to find out if the array module implements e.g.
> __array_strides__, and substitute the default values if it doesn't? If
> so, I have a slight preference to make all methods required, since it's
> not a big effort to return the defaults, and there will be more extension
> modules than array packages (or so I hope).
>

If we can get a *simple* package into the core, in addition to
implementing an ndarray object, this module could have helper functions
that do this sort of thing.
For instance:

    def get_strides(A):
        if hasattr(A, "__array_strides__"):
            return A.__array_strides__
        # compute the default C-contiguous strides from the shape
        shape = A.__array_shape__
        size = get_itemsize(A)
        strides = []
        for i in range(len(shape)-1, -1, -1):
            strides.append(size)
            size *= shape[i]
        strides.reverse()   # the loop built them last-dimension first
        return tuple(strides)

    def get_itemsize(A):
        typestr = A.__array_typestr__
        # skip the endian
        if typestr[0] in '<>':
            typestr = typestr[1:]
        # skip the char code
        typestr = typestr[1:]
        return long(typestr)

    def is_contiguous(A):
        # etc....

Those are probably buggy and need work, but you get the idea... A C
implementation of the above would be easy to do and useful, and it
could be done inline in a single include file (no linking headaches).

Cheers,
-Scott

From xscottg at yahoo.com Mon Apr 4 19:09:34 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:34 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: 6667
Message-ID: <20050404233447.26327.qmail@web50204.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> I would almost be tempted to say that if __array_descr__ is in use,
> __array_typestr__ *has* to use the 'V' type. (Or, one could make some
> more complicated rules, perhaps, in order to allow other types.)
>

Yup, having multiple ways to spell the same information will likely
cause problems. Wouldn't be bad for the protocol to say "thou shalt use
the specific typestr when possible". Or to say that the __array_descr__
is only for 'V' typestrs.

> As for not supporting the 'V' type -- would that really be considered
> a conforming implementation? According to the spec, "Objects wishing
> to support an N-dimensional array in application code should look for
> these attributes and use the information provided appropriately". The
> typestr is required, so...
>

I think the intent is that libraries like wxPython or PIL can recognize
data that they *want* to work with. They can raise an exception when
passed anything that is more complicated than they're willing to deal
with. I think many packages will simply punt when they see a 'V'
typestr and not look at the more complicated description at all.
Nothing wrong with that... The packages that produce more complicated
data structures have a way to express it and pass it to the packages
that are capable of consuming it. Easy things are easy, and hard things
are possible.

> Perhaps the spec should be explicit about the shoulds/musts/mays of
> the specific typecodes? What must be supported, what may be supported
> etc.? Or perhaps that doesn't make sense? It just seems almost too bad
> that one package would have to know what another package supports in
> order to formulate its own typestr... It sort of throws part of the
> interoperability out the window.
>

Being very precise in the language describing the protocol is probably
a good thing, but I don't see anything that requires packages to
formulate their typestrs differently. The little bit of ambiguity that
is in the __array_typestr__ and __array_descr__ attributes can be
easily clarified.

Cheers,
-Scott

From xscottg at yahoo.com Mon Apr 4 19:09:38 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 4 19:09:38 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404092421.GD21527@idi.ntnu.no>
Message-ID: <20050404233620.70070.qmail@web50209.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> Why not just use 'shape' as an alias for '__array_shape__' (or vice
> versa)?
>

The protocol just describes the layout and format of the data in
memory.
As such, most users won't use it directly, just as most users don't
call obj.__add__ directly... If an array implementation has a .shape
attribute, it can be whatever the implementor wants. Perhaps it's
assignable. Maybe it's a method that returns a ShapeObject with methods
and attributes of its own. Features like these are the things that make
the high level array packages like Numeric and Numarray enjoyable to
use. The low level __array_*metadata__ interface should be simple,
precisely defined, and just for data interchange.

> > 3) Where do default values come from? Is it the responsibility of the
> > extension module writer to find out if the array module implements e.g.
> > __array_strides__, and substitute the default values if it doesn't?
>
> If the support of these attributes is optional, that would have to be
> the case.
>
> As for implementing the defaults: How about having some utility
> functions (or a wrapper object or whatever) that does just this -- so
> neither array nor client code need think about it? This could,
> perhaps, be put in the stdlib array module or something...
>

There will be a simple Python module or C include file for such things.
Hopefully it will eventually be included in the Python standard
distribution, but even if that doesn't happen, it will be easier than
requiring and linking against the Numeric/Numarray/scipy.base libraries
directly.

> But what I think would be cool is if such a standardized package could
> take any object conforming to this protocol and use it (possibly as
> the argument to the array() constructor) -- with all the ufuncs and
> operators it has. Because then I could implement specialized arrays
> where the specialized behaviour lies just in the data itself, not the
> behaviour. For example, I might want to create a thin array wrapper
> around a memory-mapped, compressed video file, and treat it as a
> three-dimensional array of rgb triples... (And so forth.)
>

If you want the ufuncs, you probably want one of the full featured
library packages like scipy.base or numarray. It looks like Travis is
able to promote any "array protocol object" to a full blown
scipy.base.array already.

> > Inconsistencies other than the array interface (e.g. one implements
> > argmax(x) while another implements x.argmax()) may mean that an
> > extension module can work with one array implementation but not with
> > another,
>
> This does *not* sound like a good thing -- I agree. Certainly not what
> I would hope this protocol is used for.
>

Things like argmax(x) are not part of this protocol. The high level
array packages and libraries will have all sorts of crazy and useful
features. The protocol only describes the layout and format of the
data. It enables higher level packages to work seamlessly with all the
different array objects. That said, this protocol would allow a version
of argmax(x) to be written in such a way as to handle *any* array
object.
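For instance, here is a rough, untested sketch of such a generic
argmax. Everything here is made up for illustration: it assumes rank-1,
contiguous, native-endian data, handles only a small table of typestrs,
and leans on the struct module to decode the exporter's buffer.

    import struct

    # map a few array protocol typestrs onto struct format codes
    # (native sizes assumed: 'i' is 4 bytes on the common platforms)
    _fmt = {'i1': 'b', 'i2': 'h', 'i4': 'i', 'f4': 'f', 'f8': 'd'}

    def any_argmax(A):
        shape = A.__array_shape__
        if len(shape) != 1:
            raise TypeError("this sketch only handles rank-1 arrays")
        typestr = A.__array_typestr__
        if typestr[0] in '<>':
            typestr = typestr[1:]   # assume the byte order is native
        n = shape[0]
        # pull the raw bytes out of the exporter's buffer and decode them
        values = struct.unpack('%d%s' % (n, _fmt[typestr]),
                               str(A.__array_data__))
        best = 0
        for i in range(1, n):
            if values[i] > values[best]:
                best = i
        return best

A real version would consult __array_strides__ and the full type table,
but the point is that it never needs to know which package produced the
array.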
Cheers,
-Scott

From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:13:33 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Apr 4 19:13:33 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404092421.GD21527@idi.ntnu.no>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no>
Message-ID: <4251F40C.6000402@ims.u-tokyo.ac.jp>

Magnus Lie Hetland wrote:
> Michiel Jan Laurens de Hoon:
>> 2) The __array_datalen__ is introduced to get around the 32-bit int
>> limitation of len(). Another option is to fix len() in Python
>> itself, so that it can return integers larger than 32 bits. So we
>> can avoid adding a new method.
>
> That would be good, IMO. But how realistic is it? (I have no idea --
> this is not a rhetorical question :)

Actually, why is __array_datalen__ needed at all? Can't it be
calculated trivially from __array_shape__?

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:56:23 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Apr 4 19:56:23 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251920B.6060708@ee.byu.edu>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu>
Message-ID: <4251F384.7080506@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
>> Some comments on the array interface:
>>
>> 1) The "__array_shape__" method is identical to the existing "shape"
>> method in Numerical Python and numarray (except that "shape" does a
>> little bit better checking, but it can be added easily to
>> "__array_shape__"). To avoid code duplication, it might be better to
>> keep that method. (and rename the other methods for consistency, if
>> desired).
>
> There is no code duplication. In these cases it is just another name
> for .shape. What "better checking" are you referring to?

The method __array_shape__ is

    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i = 0; i < self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }

while the method shape is

    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;

        if ((s = PyTuple_New(self->nd)) == NULL) return NULL;

        for (i = self->nd; --i >= 0;) {
            if ((o = PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s, i, o) == -1) return NULL;
        }
        return s;
    }

so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
fine.

--Michiel.
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Mon Apr 4 20:37:07 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 20:37:07 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251F40C.6000402@ims.u-tokyo.ac.jp>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp>
Message-ID: <4252078C.3050300@ee.byu.edu>

> Actually, why is __array_datalen__ needed at all? Can't it be
> calculated trivially from __array_shape__?

Lovely point. I've taken away the __array_datalen__ from the interface
description.

-Travis

From cookedm at physics.mcmaster.ca Mon Apr 4 21:17:19 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Mon Apr 4 21:17:19 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4251F384.7080506@ims.u-tokyo.ac.jp> (Michiel Jan Laurens de Hoon's message of "Tue, 05 Apr 2005 11:10:12 +0900")
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> <4251F384.7080506@ims.u-tokyo.ac.jp>
Message-ID:

Michiel Jan Laurens de Hoon writes:

> Travis Oliphant wrote:
>>> Some comments on the array interface:
>>>
>>> 1) The "__array_shape__" method is identical to the existing
>>> "shape" method in Numerical Python and numarray (except that
>>> "shape" does a little bit better checking, but it can be added
>>> easily to "__array_shape__"). To avoid code duplication, it might
>>> be better to keep that method. (and rename the other methods for
>>> consistency, if desired).
>> There is no code duplication. In these cases it is just another
>> name for .shape. What "better checking" are you referring to?
>
> The method __array_shape__ is
>
>     if (strcmp(name, "__array_shape__") == 0) {
>         PyObject *res;
>         int i;
>         res = PyTuple_New(self->nd);
>         for (i = 0; i < self->nd; i++) {
>             PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
>         }
>         return res;
>     }
>
> while the method shape is
>
>     if (strcmp(name, "shape") == 0) {
>         PyObject *s, *o;
>         int i;
>
>         if ((s = PyTuple_New(self->nd)) == NULL) return NULL;
>
>         for (i = self->nd; --i >= 0;) {
>             if ((o = PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
>             if (PyTuple_SetItem(s, i, o) == -1) return NULL;
>         }
>         return s;
>     }
>
> so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
> don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
> fine.

The #1 rule of thumb when using the Python C API: _always_ check your
returned results (this usually means checking for NULL). In this case,
PyInt_FromLong _can_ fail (if there's an error creating the int free
list). I've fixed this in CVS.

You're right on PyTuple_SET_ITEM: the space for it is guaranteed to
exist after the PyTuple_New.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M.
Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Mon Apr 4 22:18:23 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 4 22:18:23 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <20050403165914.GC10730@idi.ntnu.no>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no>
Message-ID: <42521F76.5080309@ee.byu.edu>

Magnus Lie Hetland wrote:

> - Does the description of __array_data__ mean that the discussed
> bytes type is no longer needed? (If we can use buffers, that
> sounds very good to me.)

We can use the buffer object now, and it works as far as it goes. But,
there are very important reasons for the creation of a good bytes
object.

Probably, THE most important reason for the bytes object is Pickle
support without always making an intermediate string (and the
accompanying copy that is involved). Right now, a string is the only
way to Pickle array data. A bytes object would allow a way to Pickle
without making a copy.

-Travis

From Chris.Barker at noaa.gov Tue Apr 5 00:32:17 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 00:32:17 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42521F76.5080309@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu>
Message-ID: <42523EC0.5000303@noaa.gov>

Travis Oliphant wrote:
> Right now, a string is the only
> way to Pickle array data. A bytes object would allow a way to Pickle
> without making a copy.

So could the new array protocol allow us to make a Python String from
an array without copying? That could be pretty handy.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT           (206) 526-6959   voice
7600 Sand Point Way NE     (206) 526-6329   fax
Seattle, WA 98115          (206) 526-6317   main reception

Chris.Barker at noaa.gov

From magnus at hetland.org Tue Apr 5 01:49:25 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:49:25 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404233620.70070.qmail@web50209.mail.yahoo.com>
References: <20050404092421.GD21527@idi.ntnu.no> <20050404233620.70070.qmail@web50209.mail.yahoo.com>
Message-ID: <20050405084839.GD29671@idi.ntnu.no>

Scott Gilbert:
> [snip]
> > > Inconsistencies other than the array interface (e.g. one implements
> > > argmax(x) while another implements x.argmax()) may mean that an
> > > extension module can work with one array implementation but not with
> > > another,
> >
> > This does *not* sound like a good thing -- I agree. Certainly not what
> > I would hope this protocol is used for.
>
> Things like argmax(x) are not part of this protocol. The high level array
> packages and libraries will have all sorts of crazy and useful features.

Sure -- I realise that. I just mean that I hope there won't be several
scientific array modules that implement similar concepts with different
APIs, just because they can (because of the new array API).

> The protocol only describes the layout and format of the data. It enables
> higher level packages to work seamlessly with all the different array
> objects.

Exactly.
> That said, this protocol would allow a version of argmax(x) to be
> written in such a way as to handle *any* array object.

... given that you can compare the values in the array, of course. But,
yes. This would be (IMO) the ideal situation. Instead of spawning
several equivalent-but-different scientific array modules (i.e. the
ones implementing such functionality as argmax()) we would have *one*
main, standard such module, whose operations would work with almost any
conceivable array object (e.g. from wxPython or PIL). That seems like a
very, very good situation, IMO.

> Cheers,
> -Scott

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:51:35 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:51:35 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42521F76.5080309@ee.byu.edu>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu>
Message-ID: <20050405085041.GE29671@idi.ntnu.no>

Travis Oliphant:
>
> Magnus Lie Hetland wrote:
>
>> - Does the description of __array_data__ mean that the discussed
>> bytes type is no longer needed? (If we can use buffers, that
>> sounds very good to me.)
>
> We can use the buffer object now, and it works as far as it goes. But,
> there are very important reasons for the creation of a good bytes object.
>
> Probably, THE most important reason for the bytes object is Pickle
> support without always making an intermediate string (and the
> accompanying copy that is involved). Right now, a string is the only
> way to Pickle array data. A bytes object would allow a way to Pickle
> without making a copy.

Ah. Very good argument, of course. But, as I understand it, the
protocol as it stands could work with buffers until we get bytes
objects?

> -Travis

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:52:09 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:52:09 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <42523EC0.5000303@noaa.gov>
References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> <42523EC0.5000303@noaa.gov>
Message-ID: <20050405085108.GF29671@idi.ntnu.no>

Chris Barker:
>
> Travis Oliphant wrote:
>> Right now, a string is the only
>> way to Pickle array data. A bytes object would allow a way to Pickle
>> without making a copy.
>
> So could the new array protocol allow us to make a Python String from an
> array without copying? That could be pretty handy.

Or treat a string as an array... Yay! :)
> -Chris

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:52:25 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:52:25 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <4252078C.3050300@ee.byu.edu>
References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> <4252078C.3050300@ee.byu.edu>
Message-ID: <20050405085138.GG29671@idi.ntnu.no>

Travis Oliphant:
>
>> Actually, why is __array_datalen__ needed at all? Can't it be
>> calculated trivially from __array_shape__?
>
> Lovely point. I've taken away the __array_datalen__ from the
> interface description.

This is only getting prettier and prettier :)

> -Travis

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 01:57:12 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 01:57:12 2005
Subject: [Numpy-discussion] Array interface
In-Reply-To: <20050404233447.26327.qmail@web50204.mail.yahoo.com>
References: <20050404233447.26327.qmail@web50204.mail.yahoo.com>
Message-ID: <20050405085642.GH29671@idi.ntnu.no>

Scott Gilbert:
> [snip]
> I think the intent is that libraries like wxPython or PIL can
> recognize data that they *want* to work with. They can raise an
> exception when passed anything that is more complicated than they're
> willing to deal with.

Sure. I'm just saying that it would be good to have a baseline -- a
basic, mandatory level of conformance, so that if I expose an array
using only that part of the API (or, with the rest being optional
information) I know that any conforming array consumer will understand
me. Without such a baseline, I have to know the capabilities of my
consumer before I can write an appropriate typestr, for example. E.g.,
one application may only accept b1, while another would only accept i1
etc. Who knows -- there may well be sets of consumer applications that
have mutually exclusive sets of accepted typestrings unless a minimum
is mandated.

That's really what I was after here. In addition to saying that typestr
*must* be supported, one might say something about what typestrs must
be supported.

On the other hand -- perhaps such requirements should only be made on
the array side? What requirements can/should one really make on the
consumer side? I mean -- even though we have a strict sequence
protocol, there is nothing wrong with creating something sequence-like
(e.g., supporting floats as indices) and having consumer functions that
aren't as strict as the official protocol... I just think it's
something that it might be worth being explicit about.
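To make the problem concrete, today each consumer has to do something
like this (an untested sketch; the SUPPORTED set here is invented --
and that is exactly the problem, since every consumer will invent its
own):

    # typestrs this (hypothetical) consumer is willing to interpret
    SUPPORTED = ('b1', 'i1', 'i2', 'i4', 'u1', 'f4', 'f8')

    def check_typestr(A):
        # strip an explicit byte-order character, then insist that
        # what remains is in our private baseline set
        typestr = A.__array_typestr__
        if typestr[0] in '<>':
            typestr = typestr[1:]
        if typestr not in SUPPORTED:
            raise TypeError("unsupported typestr: %s" % A.__array_typestr__)
        return typestr

With a mandated minimum, an exporter that stuck to it would know it
falls inside every consumer's SUPPORTED set.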
--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From magnus at hetland.org Tue Apr 5 02:00:24 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Tue Apr 5 02:00:24 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050404233322.61350.qmail@web50208.mail.yahoo.com>
References: <20050404233322.61350.qmail@web50208.mail.yahoo.com>
Message-ID: <20050405085905.GI29671@idi.ntnu.no>

Scott Gilbert:
>
> --- Michiel Jan Laurens de Hoon wrote:
>>
>> I'm not sure what you mean by "the array interface could become
>> part of the Python standard as early as Python 2.5", since there
>> is nothing to install. Or does this mean that Python's array will
>> conform to the array interface?
>>
>
> It would be nice to have the Python array module support the protocol for
> the 1-Dimensional arrays that it implements. It would also be nice to add
> a *simple* ndarray object in the core that supports multi-dimensional
> arrays. I think breaking backward compatibility of the existing Python
> array module to support multiple dimensions would be a mistake and unlikely
> to get accepted.

Do we really have to break backward compatibility in order to add more
dimensions to the array module?

There may be some issues with, e.g., typecode, but still...

--
Magnus Lie Hetland       Fall seven times, stand up eight
http://hetland.org                          [Japanese proverb]

From a.schmolck at gmx.net Tue Apr 5 05:28:13 2005
From: a.schmolck at gmx.net (Alexander Schmolck)
Date: Tue Apr 5 05:28:13 2005
Subject: [Numpy-discussion] array slicing question
In-Reply-To: <424FF03A.4060107@bigpond.net.au> (Gary Ruben's message of "Sun, 03 Apr 2005 23:31:38 +1000")
References: <424FF03A.4060107@bigpond.net.au>
Message-ID:

Gary Ruben writes:

> This may be relevant to Numeric 3, but is possibly just a general question
> about array slicing which will either reveal a deficiency in specifying slices
> or in my knowledge of slicing with numpy.
> A while ago I was trying to reimplement some Matlab image processing code in
> Numeric and revealed a deficiency in the way slices are defined. Suppose I
> have an n x m array and want to slice off the first and last p rows and
> columns where p can range from 0 to some number. Matlab provides a clean way
> of doing this, but in numpy it's a bit of a mess.
>
> You might think you could do
> >>> p=1
> >>> b = a[p:-p]

b = a[p:-p or None]

'as

From werner.bruhin at free.fr Tue Apr 5 11:26:36 2005
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Tue Apr 5 11:26:36 2005
Subject: [Numpy-discussion] AttributeError: _NumErrorMode instance has no attribute 'dividebyzero'
Message-ID: <4252D77F.10600@free.fr>

If I use "Numeric.Error.setMode(all='Raise')" I get the above
AttributeError.

I found this on 1.1.1 but just downloaded
"numarray-1.2.3.win32-py2.4.exe" and I still find the same problem.

I use numarray with wx.lib.plot.py to generate some simple charts. I
would like to catch the exceptions and display an appropriate message
to the user.

Is the above the right approach or am I going about this the wrong way
round?

Any hints are appreciated.
Werner

From xscottg at yahoo.com Tue Apr 5 13:35:37 2005
From: xscottg at yahoo.com (Scott Gilbert)
Date: Tue Apr 5 13:35:37 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: 6667
Message-ID: <20050405203434.38638.qmail@web50204.mail.yahoo.com>

--- Magnus Lie Hetland wrote:
>
> Do we really have to break backward compatibility in order to add more
> dimensions to the array module?
>

You're right. The Python array module could change in a backwards
compatible way, possibly using keyword arguments to specify parameters
that have never been there before.

We could probably make sense out of array.insert(), array.append(),
array.extend(), array.pop(), and array.reverse() by giving those an
"axis" keyword. Even array.remove() could be made to work for more
dimensions, but it probably wouldn't get used often. Maybe some of
these would just raise an exception for ndims > 1. Then we'd have to
add some additional typecodes for complex and a few others.

Under the hood, it would basically be a complete reimplementation, but
maybe that is the way to go... It does keep the number of array modules
down. I wonder which way would meet less resistance in getting accepted
in the core. I think creating a new ndarray object would be less risk
of breaking existing applications.

> There may be some issues with, e.g., typecode, but still...
>

The .typecode attribute could return the same values it always has. The
.__array_typestr__ attribute would return the new style values. That's
confusing, but probably unavoidable. It would be nice if there was only
one set of typecodes for all of Python, but I think we're stuck with
many (array module typecodes, struct module typecodes, array protocol
typecodes).

Cheers,
-Scott

From oliphant at ee.byu.edu Tue Apr 5 14:28:39 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 5 14:28:39 2005
Subject: [Numpy-discussion] Questions about ufuncs now.
Message-ID: <4253028D.4090407@ee.byu.edu>

The arrayobject for scipy.base seems to be working. Currently the
Numeric3 CVS tree is using the "old-style" ufuncs modified with new
code for the newly added types. It should be quite functional now for
the brave at heart.

I'm now working on modifying the ufunc object for scipy.base. These are
the changes I'm working on:

1) a thread-specific? context that allows "buffer-size" level trapping
of errors and retrieving of flags set. Similar to the decimal.context
specification, but it uses the floating point sticky bits to implement.

2) implementation of buffers so that type-conversions (and byteswapping
and alignment if necessary) never create temporaries larger than the
buffer-size (the buffer-size is user settable).

3) a reworking of the general N-dimensional loop to use array iterators
with optimizations applied for contiguous arrays.

4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) do
not dictate coercion rules. Also, change so that certain mixed-type
operations are computed in the larger type for both.

Most of this is pretty straightforward. But, I do have one additional
question. Do the new array scalars count as "non-coercing" scalars
(i.e. like the Python scalars), or do they cause coercion? My
preference is that ALL scalars (anything that becomes 0-dimensional
arrays internally) cause only "kind-casting" (i.e. int to float, float
to complex, etc.)
but not "type-casting" -Travis From oliphant at ee.byu.edu Tue Apr 5 16:02:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 16:02:34 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42531880.3060600@ee.byu.edu> I'd like to release a Numeric 24.0 to get the array interface out there. There are also some other bug fixes in Numeric 24.0 Here is the list so far from Numeric 23.7 [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d of Int16 [unreported] Added array interface [unreported] Allow Long Integers to be used in slices [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. [unreported] Return error info in ranlib instead of printing it to stderr [1151892] dot() would quit python with zero-sized arrays when using dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. [unreported] Fixed empty for Object arrays Version 23.8 March 2005 [Cooke] Fixed more 64-bit issues (patch 117603) [unreported] Changed arrayfnsmodule back to PyArray_INT where the code typecasts to (int *). Changed CanCastSafely to check if sizeof(long) == sizeof(int) I'll wait a little bit to allow last minute bug fixes to go in, but I'd realy like to see this release get out there. For users of Numeric >23.7 try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is fixed in CVS. -Travis From cookedm at physics.mcmaster.ca Tue Apr 5 16:13:31 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 5 16:13:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> (Travis Oliphant's message of "Tue, 05 Apr 2005 17:00:16 -0600") References: <42531880.3060600@ee.byu.edu> Message-ID: Travis Oliphant writes: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 > > Here is the list so far from Numeric 23.7 > > [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a > is 2-d of Int16 > [unreported] Added array interface > [unreported] Allow Long Integers to be used in slices > [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. > [unreported] Return error info in ranlib instead of printing it to stderr > [1151892] dot() would quit python with zero-sized arrays when using > dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. > [unreported] Fixed empty for Object arrays > > Version 23.8 March 2005 > [Cooke] Fixed more 64-bit issues (patch 117603) > [unreported] Changed arrayfnsmodule back to PyArray_INT where the code > typecasts to (int *). Changed CanCastSafely to check > if sizeof(long) == sizeof(int) > > > I'll wait a little bit to allow last minute bug fixes to go in, but > I'd realy like to see this release get out there. For users of > Numeric >23.7 try > Numeric.empty((10,20),'O') if you want to see an *interesting* bug > that is fixed in CVS. Can you hold on? I've got some bugs I'm working on. There's some 64-bit things I'm working (various places that a long is cast to an int). For instance, a = Numeric.array((3,)) a.resize((2**32,)) gives a.shape == (1,) instead of an error. Stuff like this happens in the new array interface too :-) I'd suggest, before releasing with a bumped version number to 24.0, we release a beta version first. Shake out bugs in the array interface, and potentially allow for some changes if necessary. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From mdehoon at ims.u-tokyo.ac.jp Tue Apr 5 20:34:03 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Tue Apr 5 20:34:03 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <42531880.3060600@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu>
Message-ID: <4253597F.1090501@ims.u-tokyo.ac.jp>

Travis Oliphant wrote:
> I'd like to release a Numeric 24.0 to get the array interface out
> there. There are also some other bug fixes in Numeric 24.0

Thanks for the notification, Travis. I have committed patch #732520
(Eigenvalues on cygwin bug fix), which fixes bug #706716 (eigenvalues
is broken). It's great to be a Numerical Python developer, I get to
accept my own patches :-). The same patch was previously accepted by
numarray.

About the array interface, my feeling is that while it may be helpful
in the short run, it is likely to damage SciPy in the long run. The
array interface allows different array implementations to move in
different directions. These different implementations will be
compatible with respect to the array interface, but incompatible
otherwise (depending on the level of self-restraint of the developers
of the different array implementations). So in the end, extension
modules will be written for a specific array implementation anyway. At
this point, Numerical Python is the most established and has most
users. Numarray, as far as I can tell, keeps closer to the Numerical
Python tradition, so maybe extension modules can work with either one
without further modification (e.g., pygist seems to work with both
Numerical Python and numarray). But SciPy has been moving away (e.g. by
replacing functions by methods). As extension module writers are
usually busy people, they may not be willing to modify their code so
that it works with SciPy, and even less to maintain two versions of
their code, one for Numerical Python/numarray and one for SciPy. Users
who could previously choose to install SciPy as an addition to
Numerical Python now find that they have to choose between SciPy and
Numerical Python. As Numerical Python has many more extension packages,
I expect that SciPy will end up losing users.

Personally I use Numerical Python, and I plan to continue to use it for
years to come, so it doesn't matter much to me. I'm just warning that
the array interface may be a Trojan horse for the SciPy project.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From oliphant at ee.byu.edu Tue Apr 5 22:26:38 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 5 22:26:38 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <4253597F.1090501@ims.u-tokyo.ac.jp>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp>
Message-ID: <425372A4.7020900@ee.byu.edu>

Michiel Jan Laurens de Hoon wrote:
> Travis Oliphant wrote:
>
>> I'd like to release a Numeric 24.0 to get the array interface out
>> there. There are also some other bug fixes in Numeric 24.0
>
> About the array interface, my feeling is that while it may be helpful
> in the short run, it is likely to damage SciPy in the long run.

Well, I guess we'll just have to see. Again, I see the array interface
as important for talking to other modules that may not need or want the
"full power" of a packed array module like scipy.base.
> The array interface allows different array implementations to move in
> different directions. These different implementations will be
> compatible with respect to the array interface, but incompatible
> otherwise (depending on the level of self-restraint of the developers
> of the different array implementations). So in the end, extension
> modules will be written for a specific array implementation anyway. At
> this point, Numerical Python is the most established and has most
> users. Numarray, as far as I can tell, keeps closer to the Numerical
> Python tradition, so maybe extension modules can work with either one
> without further modification (e.g., pygist seems to work with both
> Numerical Python and numarray).
> But SciPy has been moving away (e.g. by replacing functions by methods).

Michiel, you seem to want to create this impression that "SciPy" is
"moving away." I'm not sure of your motivations. But, since this is a
public forum, I have to restate emphatically that "SciPy" is not
"moving away from Numeric." It is all about bringing together the
communities. For the 5 years that scipy has been in development, it has
always been about establishing a library of common routines that we
could all share. It has built on Numeric from the beginning. Now, there
is another "library" of routines that is developing around numarray. It
is this very real break that I'm trying to help fix. I have no other
"desire" to "move away" or "create a break" or any other such notions
that you seem to want to spread. That is precisely why I have publicly
discussed practically every step of my work. You seem to be the only
vocal one who thinks that scipy.base is not just a replacement for
Numeric, but something else entirely.

So, I repeat: **scipy.base is just a new version of Numeric with a few
minor compatibility issues and a lot of added functionality and
features**

For example, despite your claims, I have not "replaced" functions by
methods. The functions are still all there just like before. I've
simply noticed that numarray has a lot of methods, and so I've added
similar methods to the Numeric object to help numarray users make the
transition back. Everything else that I've changed, I've done to bring
Numeric up-to-date with modern Python versions, and to fix old warts
that have sat around for years. If there are problems with my changes,
speak up. Tell me what to do to make the new Numeric better.

> As extension module writers are usually busy people, they may not be
> willing to modify their code so that it works with SciPy, and even
> less to maintain two versions of their code, one for Numerical
> Python/numarray and one for SciPy.

It's comments like this that make me wonder what you are thinking. It
seems to me that you are the only one I've talked to who wants to
maintain the notion of a "split". Everybody else I'm in contact with is
in full support of merging the two communities behind a single
scientific array object. Every extension module that compiles for
Numeric should compile for scipy.base. Notice that full scipy already
has a huge number of extension modules that need to compile for
scipy.base. So, I have every motivation to make that a painless
process.
> Users who could previously choose to install SciPy as an addition to
> Numerical Python, now find that they have to choose between SciPy and
> Numerical Python. As Numerical Python has many more extension
> packages, I expect that SciPy will end up losing users.

Again, scipy.base should *replace* Numerical Python for all users
(except the most adamant who don't seem to want to go with the rest of
the community). scipy.base is a new version of Numeric. On the C-level
I don't know of any incompatibilities; on the Python level there are
very few (most of them rarely-used typecode character issues which a
simple search and replace will fix).

I should emphasize this next point, since I don't seem to be coming
across very clearly to some people. As head Numeric developer, I'm
stating that **Numeric 24 is the last release that will be called
Numeric**. New releases of Numeric will be called scipy.base. Of
course, I realize that people can do whatever they want with the old
Numeric code base, but then they will be the ones responsible for
continuing a "split," because the Numerical Python project at
sourceforge will point people to install scipy.base.

Help me make the transition as painless as possible, that's all I'm
asking. People transitioning from Numeric should have no trouble at
all, as I repeatedly point out. People transitioning from numarray will
have a *little* harder time, which is why the array interface should
help out during that process. It is helping people transition back from
numarray that is 90% of the reason I've made any changes to the
internals of Numeric. I've been a happy and quiet Numeric user and
developer for years, but I respect the problems that Perry, Rick, Paul,
and Todd have pointed out with their numarray implementation, and I saw
a way to support their needs inside of Numeric. That is the whole
reason for my efforts.

I wish people would stop trying to make it seem to casual readers of
this forum that I'm trying to create a "whole new" incompatible system.
Help me fix the obviously unnecessary incompatibilities where they may
exist, and help me make automatic transition scripts to help people
upgrade painlessly to the newer Numeric.

I very much appreciate all who voice your concerns. Michiel, you are
particularly appreciated because yours is the voice of a solid Numeric
user. I just think that such concerns would be more productive in the
context of accepting the fact that an upgrade from Numeric to
scipy.base is going to happen, rather than trying to make it look like
some new "split" is occurring.

I've received a lot of offline support for the Numeric/numarray
unification effort that scipy.base is. It would help if more people
could provide public support on this forum so that others can see that
I'm not just some outsider pushing some random ideas, but am simply
someone who decided to sacrifice some time for what I think is a very
important effort. It would also help if other people who have concerns
would voice them (I'm very grateful for those who have expressed their
concerns) so that we can all address them and get on the same page for
future development.

Right now, the CVS version of Numeric3 works reasonably. It compiles
and uses the old ufunc objects (which have only been extended to
support the new types). I could use a lot of help in finding bugs. You
can also try out the new array scalars to see how they work (math works
on them now) and also see what may still be missing in their
implementation.

> Personally I use Numerical Python, and I plan to continue to use it
> for years to come, so it doesn't matter much to me. I'm just warning
> that the array interface may be a Trojan horse for the SciPy project.
As long as you realize that, as far as I know, the other developers of
Numerical Python are going to be moving to scipy.base, and so you will
be using obsolete technology, you are free to do as you wish. But, I
really hope we can persuade you to join us. It is much better if we
work together.

-Travis

From Fernando.Perez at colorado.edu Tue Apr 5 22:43:33 2005
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Tue Apr 5 22:43:33 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42537690.5040400@colorado.edu>

Travis Oliphant wrote:
> Michiel Jan Laurens de Hoon wrote:
>> But SciPy has been moving away (e.g. by replacing functions by methods).
>
> Michiel, you seem to want to create this impression that "SciPy" is
> "moving away." I'm not sure of your motivations. But, since this is a
> public forum, I have to restate emphatically, that "SciPy" is not
> "moving away from Numeric." It is all about bringing together the
> communities. For the 5 years that scipy has been in development, it has
> always been about establishing a library of common routines that we
> could all share. It has built on Numeric from the beginning. Now,
> there is another "library" of routines that is developing around
> numarray. It is this very real break that I'm trying to help fix. I
> have no other "desire" to "move away" or "create a break" or any other
> such notions that you seem to want to spread.

FWIW, I think you (Travis) have been exceedingly clear in explaining
this process, and in pointing out how this is:

a) NOT a further split, but rather the EXACT OPPOSITE (numarray users
will have a transition path back into a project which will provide the
best of the old Numeric, along with all the critical enhancements which
Perry, Todd et al. added to numarray).

b) a way, via the array protocol, to provide third-party low-level
libraries an easy way to, AT THE C LEVEL, interact easily and
efficiently (without unnecessary copies) with numeri* arrays.

I fail to see where Michiel gets his split/Trojan horse arguments, or
what line of reasoning can connect your detailed explanations with such
a conclusion. In particular, the comments on the whole 'trojan' issue
seem to me absolutely unfounded. Nobody in their right mind will use
this protocol to invent a scipy.base competitor, which most likely
would end up (if done right) being simply a copy. What it provides is a
minimal, compact, low-level API which will be a huge boon for
interoperability with things like PIL, WX or other similar libraries.
This protocol has been extensively debated, and Scott's extensive
comments have made this discussion a very productive one (along with
the help of others, of course). I can only see this as a GREAT step
forward for numerical python support and reliability 'in the wild'.

I hesitated to send this message, but since you (Travis) have sunk an
enormous amount of your time into this effort, which I can only applaud
and rejoice in, I figure the least I can do is contribute a little to
dispel some unnecessary confusion. Users with less knowledge of the
details may become afraid of using Python for scientific computing by
reading Michiel's comments, which I think would be a shame.

Michiel, please note that none of what I said is meant to be a personal
attack.
I simply feel it is necessary to clarify, in no uncertain terms, how your recent comments of impending doom are unfounded.

Best to all, and again thanks to Travis for this much needed hard work,

f

From Chris.Barker at noaa.gov Tue Apr 5 23:59:31 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 23:59:31 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42538880.7010301@noaa.gov>

Travis Oliphant wrote:
> It would help
> if more people could provide public support on this forum

Easy enough. I, for one, am very happy about what Travis is doing. It seems to be exactly what is needed to mend the Numeric-numarray split, which has been an annoyance for a couple years now.

I'm also VERY happy about the proposed array protocol. While I suppose it could facilitate the creation of other array packages, that is only speculation, and unlikely, in my judgment. What I'm quite sure is going to happen is that other packages that do not provide an array implementation will be able to efficiently take arrays as input without creating a dependence on any particular package. I intend to make sure wxPython can efficiently take Numeric24 arrays, for instance. (Now that I think about it, it would be great if we could get this into wxPython2.6, which will be out pretty darn soon. I'm very pressed for time right now... can anyone help?)

> It would also help if other
> people who have concerns would voice them (I'm very grateful for those
> who have expressed their concerns) so that we can all address them and
> get on the same page for future development.

My only concern is versioning. Particularly when under rapid development (but really this applies anytime), I'd really love to be able to have more than one version of Numeric (or SciPy.base, or whatever) installed at once, and be able to select which one is used at runtime, in code (before importing the first time, of course). This would facilitate testing, but also allow me to have a working environment for older apps that will continue to work, without modification or re-compiling, after installing a newer version. Something like wxPython's wxversion is what I have in mind.

http://wiki.wxpython.org/index.cgi/MultiVersionInstalls

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From magnus at hetland.org Wed Apr 6 00:30:48 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 00:30:48 2005
Subject: [Numpy-discussion] Possible example application of the array interface
Message-ID: <20050406072854.GA12700@idi.ntnu.no>

I was just thinking about some experimental designs, and whether I could, perhaps, do the statistics in Python. I remembered having used RPy [1] briefly at some time (there may be other similar bindings out there -- I don't remember) and started thinking about whether I could, perhaps, combine it with numpy in some way.

My first thought was to reimplement the relevant statistical functions; then I thought about how to convert data back and forth -- but then it occurred to me that R also uses arrays extensively, and that it could, perhaps, be possible to expose those (through something like RPy) through the array interface/protocol!
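(A minimal sketch of what that might look like: a wrapper object need only grow a few attributes. The RArrayProxy class below is invented for illustration -- it is not RPy's actual API -- but the double-underscore attribute names are the ones under discussion in this thread:)

    import struct

    class RArrayProxy:
        """Sketch: expose a foreign memory block via the array interface."""
        def __init__(self, rawdata, shape):
            # rawdata must support the buffer protocol (e.g. a string)
            self.__array_shape__ = shape        # e.g. (2, 3)
            self.__array_typestr__ = '<f8'      # little-endian 8-byte floats
            self.__array_data__ = rawdata       # the block consumers read
            # __array_strides__ omitted: default is C-style contiguous

    # Any array-interface-aware consumer could then use the wrapped data
    # without copying:
    raw = struct.pack('<6d', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    a = RArrayProxy(raw, (2, 3))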
This would be (IMO) a good example of the benefits of the array protocol; it's not a matter of "getting yet another array module". RPy is an external library/language with *lots* of features that might be useful to numpy users, many of which aren't likely to be implemented in Python for quite a while, I'd guess (unless, perhaps, someone writes a translator from R, which I'm sure is doable).

I don't know enough (at least yet ;) about the implementation of RPy and the R library to say for sure whether this would even be possible, but it does seem like it could be really useful...

[1] rpy.sf.net

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] Numeric 24.0
Message-ID:

Hi Travis,

Could you look at bug
[ 635104 ] segfault unpickling Numeric 'O' array
[ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of previous one)

I proposed a (rather simple) solution that I put in the comment of bug [ 635104 ]. But apparently, nobody is looking at those bugs...

> >I'd like to release a Numeric 24.0 to get the array interface out there.
> >There are also some other bug fixes in Numeric 24.0
> >
> >Here is the list so far from Numeric 23.7
> >
> >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d
> >of Int16

This is quite disturbing. In fact, for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So

type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
type(zeros((5,2), Float32 )[0,0]) => <type 'array'>

But

type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>

Notice too the weird difference between Int <> Int32 and Float == Float64.

However, when indexing a one-dimensional array (rank == 1), we get back scalars for indexing operations on all types. So, when you say "return the same type", do you mean scalar or array (it smells like a recent discussion on Numeric3 ...)?

>[unreported] Added array interface
>[unreported] Allow Long Integers to be used in slices
>[1123145] Handle mu==0.0 appropriately in ranlib/ignpoi.
>[unreported] Return error info in ranlib instead of printing it to stderr
>[1151892] dot() would quit python with zero-sized arrays when using
> dotblas. The BLAS routines *gemv and *gemm need LDA >= 1.
>[unreported] Fixed empty for Object arrays
>
>Version 23.8 March 2005
>[Cooke] Fixed more 64-bit issues (patch 117603)
>[unreported] Changed arrayfnsmodule back to PyArray_INT where the code
> typecasts to (int *). Changed CanCastSafely to check
> if sizeof(long) == sizeof(int)
>
>I'll wait a little bit to allow last minute bug fixes to go in, but I'd
>really like to see this release get out there. For users of Numeric 23.7,
>try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is
>fixed in CVS.
>
>-Travis
>

From nwagner at mecha.uni-stuttgart.de Wed Apr 6 01:01:42 2005
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Apr 6 01:01:42 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
Message-ID: <42539706.3000503@mecha.uni-stuttgart.de>

Hi all,

Using Numeric 24.0
>>> scipy.__version__
'0.3.3_303.4599'

scipy.test() results in

======================================================================
ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense
    b = mmread(fn).todense()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense
    csc = self.tocsc()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc
    return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__
    self._check()
  File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check
    if (nnz>0) and (max(self.rowind[:nnz]) >= M):
IndexError: invalid slice

[snip -- the remaining 30 errors are near-identical and are elided here: every check_* test in test_csc and test_csr (add, elmul, getelement, matmat, matvec, setelement, tocoo, tocsc, tocsr, todense, constructor1-3) and in test_dok (elmul, matmat, tocoo, mult) fails with the same "IndexError: invalid slice" raised from Sparse.py line 375 in _check, reached through the various csc_matrix constructor paths.]

----------------------------------------------------------------------
Ran 1173 tests in 3.113s

FAILED (errors=31)

>>>

From cookedm at physics.mcmaster.ca Wed Apr 6 02:23:11 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:23:11 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: References: Message-ID: <20050406092143.GA31688@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 07:33:56AM +0000, Sébastien de Menten wrote:
>
> Hi Travis,
>
> Could you look at bug
> [ 635104 ] segfault unpickling Numeric 'O' array
> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of
> previous one)
>
> I proposed a (rather simple) solution that I put in the comment of bug
> [ 635104 ]. But apparently, nobody is looking at those bugs...

This is too true. Travis added myself and Michiel de Hoon recently to the developers, so there's some new blood, and we've been banging on things, though. I'll have a look at it if I've got time. I personally really hate bugs that crash my interpreter :-)

> >I'd like to release a Numeric 24.0 to get the array interface out there.
> >There are also some other bug fixes in Numeric 24.0
> >
> >Here is the list so far from Numeric 23.7
> >
> >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is
> >2-d of Int16
>
> This is quite disturbing. In fact, for all types that are not exactly
> equivalent to a python type, indexing a multidimensional array (rank > 1)
> returns arrays even if the final shape is ().
> So
> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
> type(zeros((5,2), Float32 )[0,0]) => <type 'array'>
> But
> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>
> Notice too the weird difference between Int <> Int32 and Float == Float64.

That's because Int is *not* Int32. Int32 is the first typecode of '1sil' that has 32 bits. For (all?) platforms I've seen, that'll be 'i'. Int corresponds to a Python integer, and Float corresponds to a Python float. Now, a Python integer is actually a C long, and a Python float is actually a C double. I've made a table:

Numeric type   typecode   Python type   C type   Array type
Int            'l'        int           long     PyArray_LONG
Int32          'i' [1]    N/A           int      PyArray_INT
Float          'd'        float         double   PyArray_DOUBLE
Float32        'f'        N/A           float    PyArray_FLOAT
Float64        'd'        float         double   PyArray_DOUBLE

[1] assuming sizeof(int)==4, which is true on most platforms. There are some 64-bit platforms where this won't be true, I think.

On (all? most?) 32-bit platforms, sizeof(int) == sizeof(long) == 4, so both Int and Int32 will be 32-bit quantities. Not so on some 64-bit platforms (Linux on an Athlon 64, like the one I'm typing at now), where sizeof(long) == 8. I've been fixing oodles of assumptions in Numeric where ints and longs have been used interchangeably, hence the extended discussion :-)

[I haven't addressed here why you get an array sometimes and a Python type the others. This is the standard, old, behaviour -- it's likely not going to change in Numeric. Whether it's a *good* thing is another question. scipy.base and numarray do it differently.]

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Wed Apr 6 02:46:55 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:46:55 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
In-Reply-To: <42539706.3000503@mecha.uni-stuttgart.de>
References: <42539706.3000503@mecha.uni-stuttgart.de>
Message-ID: <20050406094438.GA32297@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 10:00:06AM +0200, Nils Wagner wrote:
> Hi all,
>
> Using Numeric 24.0
> >>> scipy.__version__
> '0.3.3_303.4599'
>
> scipy.test() results in
>
> ======================================================================
> ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense
>     b = mmread(fn).todense()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense
>     csc = self.tocsc()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc
>     return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__
>     self._check()
>   File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check
>     if (nnz>0) and (max(self.rowind[:nnz]) >= M):
> IndexError: invalid slice

(etc. -- note to self: use scipy for regression testing :-)

nnz is coming from nnz = self.indptr[-1] where self.indptr is an array of Int32.
Hmm, this corresponds to the behaviour I just responded to Sebastien de Menten about. The problem is that nnz is *not* a Python integer; it's an array, so the slice fails. I think I was wrong in that email about saying this was expected behaviour :-)

This comes from the recent fix of a[0,0] and a[0][0] returning the same type. Either change that back, or else we need to spruce up the slicing logic to consider 0-dimensional integer arrays as scalars. A minimal test case:

a = Numeric.array([5,6,7,8])
b = Numeric.array([0,1,2,3], 'i')
n = b[-1]
assert a[:n] == 8

(I'm not tackling this right now)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From magnus at hetland.org Wed Apr 6 02:59:18 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 02:59:18 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
Message-ID: <20050406095639.GA16810@idi.ntnu.no>

Scott Gilbert :
>
> --- Magnus Lie Hetland wrote:
> >
> > Do we really have to break backward compatibility in order to add more
> > dimensions to the array module?
>
> You're right. The Python array module could change in a backwards
> compatible way. Possibly using keyword arguments to specify parameters
> that have never been there before.
>
> We could probably make sense out of array.insert(), array.append(),
> array.extend(), array.pop(), and array.reverse() by giving those an "axis"
> keyword. Even array.remove() could be made to work for more dimensions,
> but it probably wouldn't get used often. Maybe some of these would just
> raise an exception for ndims > 1.

Sure. I guess basically the extend/pop/reverse/etc. methods and the ndim-functionality would sort of be two quite different ways of using arrays, so keeping them mutually exclusive doesn't seem like a problem to me. This might speak in favour of separating the functionality into two different classes, but I think there's merit to keeping it gathered, because this is partly for basic use(rs) who just want to get an array and do things to it that make sense. Appending to a multidimensional array (as long as we don't tempt them with an axis keyword) just doesn't make sense -- so people (hopefully) won't do it.

> Then we'd have to add some additional typecodes for complex and a
> few others.

Yeah; the question is how compatible the typecode system is with the new array protocol -- some overlap and some differences, I believe (without checking right now)? So -- this might look a bit like patchwork. But I think we might get that if we have two modules (or classes) too -- one, called array, with the existing functionality, and one, called (e.g.) ndarray, with a similar but incompatible interface... It *may* be better, but I'm not quite sure I think so.

In my experience (which may be very biased and selective here ;) the array module isn't exactly among the "hottest" features of Python or the standard libs. In fact, it seems almost a bit pointless to me. It claims to have "efficient arrays of numeric values" but is the efficiency really that great, if you write your code in Python? (Using lists and psyco would, quite possibly, be just as good, for example.)
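(A minimal sketch of the one pattern where the stdlib array module clearly does pull its weight alongside Numeric -- a point Robert Kern makes later in this thread: collect values of unknown count, then convert once. Numeric.fromstring and array.tostring are the standard APIs of the period; the values below are arbitrary:)

    import array
    import Numeric

    # Collect values with amortized O(1) appends...
    buf = array.array('d')
    for i in range(1000):
        buf.append(i * 0.5)

    # ...then convert once to a Numeric array for vectorized math.
    a = Numeric.fromstring(buf.tostring(), 'd')
    assert a.shape == (1000,)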
So -- at *least* adding the array protocol to it would be doing it a favour, i.e., making it a useful module, and sort of a prototypical example of the protocol and such. Adding more dimensions might simply make it more useful. (I've many times been asked by people how to create e.g. two-dimensional arrays in Python. It would be nice if there was actually some basic support for it.)

> Under the hood, it would basically be a complete reimplementation,

Sure; except for the (possibly minor?) work involved, I don't see that this is a problem? (Well... The inherent instability of new code, perhaps... But still.)

> but maybe that is the way to go... It does keep the number of array
> modules down.

Yes.

> I wonder which way would meet less resistance in getting accepted in
> the core. I think creating a new ndarray object would be less risk
> of breaking existing applications.

I guess that's true.

> > > There may be some issues with, e.g., typecode, but still...
>
> The .typecode attribute could return the same values it always has.

Sure. But we might end up with, e.g., a constructor that looks almost exactly like the numpy array() constructor -- but whose typecodes are different... :/

> The .__array_typestr__ attribute would return the new style values.
> That's confusing, but probably unavoidable.

Yes, if we do use this approach. If we only allow one-dimensional arrays here (i.e., only add the protocol to the existing functionality) there might be less confusion? Oh, I don't know. Having a separate module or class/type might be just as good an idea. Perhaps I'm just being silly :->

> It would be nice if there was only one set of typecodes for all of
> Python,

Yeah -- or some similar system (using type objects).

> but I think we're stuck with many (array module typecodes, struct
> module typecodes, array protocol typecodes). :(

Yes, lots of history here. Oh, well. Not the greatest of problems, I guess. But using different typecodes in the explicit user-part of the ND-array interface in the stdlibs from those in scipy, for example, seems like a decidedly Bad Idea(tm). So ... that might be a good enough reason for using a separate ndarray entity, unless there can be some upward compatibility somehow.

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 03:12:32 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:12:32 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
Message-ID:

Hi,

I follow with great interest the threads around Numeric3/scipy.base. As Travis suggested ("It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development."), I voice my concern :-)

Sometimes it is quite useful to treat data at a higher level than just an "array of numbers of some type". Adding metadata to arrays (I call them "augmented arrays") is a simple way to add sense to an array. I see different use cases like:

1) attaching a physical unit to array data (see for instance Unum http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very useful to manipulate time series easily.
3) masked arrays as in the MA module of Numeric
4) arrays for interval arithmetic where one keeps another array with the precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)

The current solution for those situations is nicely summarized by quoting Konrad: "but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives most flexibility in use."

However, I disagree with the "pretty straightforward to implement". In fact, if one wants to inherit most of the functionalities of Numeric, it becomes quite cumbersome. Looking at the MA module, I see that it needs to:

1) redefine all methods (__add__, ...)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, ...)

For other purposes, the same burden may apply. A general solution to this problem is not straightforward and may be out of reach (computationally and/or conceptually). However, a quite-general-enough elegant solution could solve most practical problems. Looking at threads in this list, I think that there is enough brain power to get to something usable in the medium term.

An embryo of an idea would be to add hooks in the machinery to allow an object to interact with a ufunc. Currently, this is done by calling __array__ to extract a "naked array" (== Numeric.array vs "augmented array") but the result is then always a "naked array". In pseudocode, this looks like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array = augmented_array.__array__()
    return ufunc.apply(augmented_array)

where I would prefer something like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array, constructor = augmented_array.__array_constructor__()
    else:
        constructor = lambda x: x
    return constructor(ufunc.apply(augmented_array))

For array functions and methods, I have even fewer clues to a solution :-) But calling hooks specified by some protocol would be a path:

a) __array_constructor__
b) __array_binary_op__ (would be called for __add__, __sub__, ...)
c) __array_rbinary_op__ (would be called for __radd__, __rsub__, ...)

If I miss a point and there is an easy way to do this, I'll be pleased to know it. Otherwise, any feedback on this ability to easily increase array functionalities by appending metadata and related behavior.

Sebastien

From cjw at sympatico.ca Wed Apr 6 03:15:13 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:15:13 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <424FE8E7.4040904@ee.byu.edu>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu>
Message-ID: <4253B691.5030902@sympatico.ca>

Travis Oliphant wrote:
> Colin J. Williams wrote:
>
>> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install
>> running install
>> running build
>> running config
>> error: The .NET Framework SDK needs to be installed before building
>> extensions for Python.
>>
>> Is there any chance that a Windows binary could be made available for
>> testing?
>
> Probably not in the near term (but you could ask Michiel).
>
> I'm assuming you have mingw32 installed which would allow you to build
> it provided you have created an exports file for python2.4 (look on
> the net for how to compile extensions with mingw32 using a MSVC
> compiled python).
> You have to tell distutils what compiler to use:
>
> python setup.py config --compiler=mingw32
> python setup.py build --compiler=mingw32
> python setup.py install
>
> -Travis

Thanks to Michiel and Travis for their suggestions. I am using Windows XP and get the following result:

C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py config --compiler=minw32
running config
error: don't know how to compile C/C++ code on platform 'nt' with 'minw32' compiler

C:\Python24\Lib\site-packages\Numeric3\Download>

I would welcome any comments.

Colin W.

From cookedm at physics.mcmaster.ca Wed Apr 6 03:31:40 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 03:31:40 2005
Subject: [Numpy-discussion] array interface nitpicks
Message-ID:

Just some small nitpicks in the array interface document (http://numeric.scipy.org/array_interface.html):

As written:
"""
__array_shape__ (required)
Tuple showing size in each dimension. Each entry in the tuple must be a Python (long) integer. Note that these integers could be larger than the platform "int" or "long" could hold. Use Py_LONG_LONG if accessing the entries of this tuple in C.
"""

Since this is supposed to be an interface, not an implementation (duck-typing and all that), I think this is too strict: __array_shape__ should just be a sequence of integers, not necessarily a tuple. I'd suggest something like this:

'''
__array_shape__ (required)
Sequence whose elements are the size in each dimension. Each entry is an integer (a Python int or long). Note that these integers could be larger than the platform "int" or "long" could hold (a Python int is a C long). It is up to the calling code to handle this appropriately; either by raising an error when overflow is possible, or by using Py_LONG_LONG as the C type for the shapes.
'''

This is clearer about the user's responsibility -- note that Numeric is taking the first approach (error), as the dimensions in PyArrayObject are ints.

Similar comments about __array_strides__. I'd reword it along the lines of:

'''
__array_strides__ (optional)
Sequence of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int or long). As with __array_shape__, the values may be larger than can be represented by a C "int" or "long"; the calling code should handle this appropriately, either by raising an error, or by using Py_LONG_LONG in C. Default is a strides tuple which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default __array_strides__ tuple for an object whose array entries are 8 bytes long and whose __array_shape__ is (10,20,30) would be (4800, 240, 8).

Default: C-style contiguous
'''

I'm mostly worried about the use of Python longs; it shouldn't be necessary in almost all cases, and adds extra complications (in normal usage, you don't see Python longs all that much).

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cjw at sympatico.ca Wed Apr 6 03:33:05 2005
From: cjw at sympatico.ca (Colin J.
Williams)
Date: Wed Apr 6 03:33:05 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: References: Message-ID: <4253BAA1.7010403@sympatico.ca>

Sébastien de Menten wrote:
> Hi,
>
> I follow with great interest the threads around Numeric3/scipy.base.
> As Travis suggested ("It would also help if other people who have
> concerns would voice them (I'm very grateful for those who have
> expressed their concerns) so that we can all address them and get on
> the same page for future development."), I voice my concern :-)
>
> Sometimes it is quite useful to treat data at a higher level than just
> an "array of numbers of some type". Adding metadata to arrays (I call
> them "augmented arrays") is a simple way to add sense to an array. I
> see different use cases like:
> 1) attaching a physical unit to array data (see for instance Unum
> http://home.tiscali.be/be052320/Unum.html )
> 2) description of axes (see
> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
> useful to manipulate time series easily.

Does the record array provide a means of addressing this need?

> 3) masked arrays as in the MA module of Numeric
> 4) arrays for interval arithmetic where one keeps another array with
> the precision of the data
> 5) record arrays (currently being integrated in scipy.base as a base
> type)

Yes, and there is numarray's array of objects.

> The current solution for those situations is nicely summarized by
> quoting Konrad:
> "but rather a class written using arrays than a variety of the basic
> array type.
> It's actually pretty straightforward to implement, the most difficult
> choice being the form of the constructor that gives most flexibility
> in use."
> [snip]

Colin W.

From rkern at ucsd.edu Wed Apr 6 03:36:51 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Wed Apr 6 03:36:51 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050406095639.GA16810@idi.ntnu.no>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com> <20050406095639.GA16810@idi.ntnu.no>
Message-ID: <4253BB73.5000605@ucsd.edu>

Magnus Lie Hetland wrote:
> So -- at *least* adding the array protocol to it would be doing it a
> favour, i.e., making it a useful module, and sort of a prototypical
> example of the protocol and such. Adding more dimensions might simply
> make it more useful. (I've many times been asked by people how to
> create e.g. two-dimensional arrays in Python. It would be nice if
> there was actually some basic support for it.)

Re-implementing the stdlib-array module to support multiple dimensions is almost certainly a non-starter. You can't easily do it without breaking its pre-allocation strategy. It preallocates memory for elements using the same algorithm that lists do, so .append() has reasonable amortized time behaviour. python-dev will not appreciate changing the algorithmic complexity of a long-existing component to accommodate a half-arsed implementation of N-D arrays.

OTOH, it is the one reason for stdlib-array's use in a Numeric world: sometimes, you just need to append values; you can't pre-allocate with Numeric.empty() and index in values. Using stdlib-array to collect the values, then using the buffer interface (soon-to-be __array__ interface) to convert to a Numeric array is faster than the alternatives.

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From sdementen at hotmail.com Wed Apr 6 03:59:35 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:59:35 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: <4253BAA1.7010403@sympatico.ca>
Message-ID:

>>1) attaching a physical unit to array data (see for instance Unum
>>http://home.tiscali.be/be052320/Unum.html )
>>2) description of axes (see
>>http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
>>useful to manipulate time series easily.
>
>Does the record array provide a means of addressing this need?

Not really; when I say axis, I am speaking about indexing. For an array (named a) with shape (10, 5, 33), I would like to attach 3 arrays (or lists or tuples, named axis_information[0], axis_information[1] and axis_information[2]) of size (10,), (5,) and (33,) which give sense to the first, second and third index. For instance,

A[i,j,k] => means the element of A at (axis_information[0][i], axis_information[1][j], axis_information[2][k])

instead of

A[i,j,k] => means the element of A at index position [i,j,k]

which makes less sense (you always need to track the meaning of i,j,k in parallel).

>>3) masked arrays as in the MA module of Numeric

Maybe this one could be implemented using a record array with a record like (data, mask). However, it would be cumbersome to use. E.g.

a.field("data")[:] = cos( a.field("data")[:] )

instead of

a[:] = cos(a[:])

with the current MA module.

>>4) arrays for interval arithmetic where one keeps another array with
>>the precision of the data
>>5) record arrays (currently being integrated in scipy.base as a base type)
>
>Yes, and there is numarray's array of objects.

This is overkill as it eats way too much memory. E.g. your data represents instantaneous speeds and so is tagged with a "m/s" information (a complex object) valid for the full array. Distributing this information to each component of an array via an array object is not practical.

From mdehoon at ims.u-tokyo.ac.jp Wed Apr 6 04:22:52 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Apr 6 04:22:52 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <4253B691.5030902@sympatico.ca>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca>
Message-ID: <4253C73E.4030703@ims.u-tokyo.ac.jp>

Colin J. Williams wrote:
> Thanks to Michiel and Travis for their suggestions. I am using Windows
> XP and get the following result:
>
> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py
> config --compiler=minw32
> running config
> error: don't know how to compile C/C++ code on platform 'nt' with
> 'minw32' compiler
>
> C:\Python24\Lib\site-packages\Numeric3\Download>
>
> I would welcome any comments.

--mingw32 contains a 'g'. Also, make sure you have Cygwin installed, with all the necessary packages.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From steve at shrogers.com Wed Apr 6 05:12:39 2005
From: steve at shrogers.com (Steven H.
Rogers)
Date: Wed Apr 6 05:12:39 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <4253D1B9.90709@shrogers.com>

Travis Oliphant wrote:
> Again, scipy.base should *replace* Numerical Python for all users
> (except the most adamant who don't seem to want to go with the rest of
> the community). scipy.base is a new version of Numeric. On the
> C-level I don't know of any incompatibilities; on the Python level
> there are very few (most of them rarely-used typecode character issues
> which a simple search and replace will fix).
>
> I should emphasize this next point, since I don't seem to be coming
> across very clearly to some people. As head Numeric developer, I'm
> stating that **Numeric 24 is the last release that will be called
> Numeric**. New releases of Numeric will be called scipy.base.

I'm happy with the direction you're taking to rejoin Numeric and Numarray. However, changing the name from Numeric to scipy.base may contribute to the confusion/concern. Is it really necessary?

Steve

--
Steven H. Rogers, Ph.D., steve at shrogers.com
Weblog: http://shrogers.com/weblog
"Reach low orbit and you're half way to anywhere in the Solar System."
-- Robert A. Heinlein

From konrad.hinsen at laposte.net Wed Apr 6 07:49:45 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Apr 6 07:49:45 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: References: Message-ID: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net>

On Apr 6, 2005, at 12:10, Sébastien de Menten wrote:

> However, I disagree with the "pretty straightforward to implement". In
> fact, if one wants to inherit most of the functionalities of Numeric,
> it becomes quite cumbersome. Looking at the MA module, I see that it
> needs to:

It is straightforward AND cumbersome. Lots of work, but nothing difficult. I agree of course that it would be nice to improve the situation.

> An embryo of an idea would be to add hooks in the machinery to allow an
> object to interact with a ufunc. Currently, this is done by calling
> __array__ to extract a "naked array" (== Numeric.array vs "augmented
> array") but the result is then always a "naked array".
> In pseudocode, this looks like:
>
> def ufunc( augmented_array ):
>     if not isarray(augmented_array):
>         augmented_array = augmented_array.__array__()
>     return ufunc.apply(augmented_array)

The current behaviour of Numeric is more like

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    elif is_array_like(object):
        return array_func(array(object))
    else:
        return object.ufunc()

A more general version, which should cover your case as well, would be:

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    else:
        try:
            return object.applyUfunc(ufunc)
        except AttributeError:
            if is_array_like(object):
                return array_func(array(object))
            else:
                raise ValueError

There are two advantages:

1) Classes can handle ufuncs in any way they like, even if they implement array-like objects.
2) Classes must implement only one method, not one per ufunc.
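(A minimal sketch of what a class built around that single applyUfunc hook might look like. The UnitArray class, its unit-preserving behaviour, and the apply_ufunc dispatcher below are invented for illustration; only the hook idea comes from the proposal above:)

    import Numeric

    class UnitArray:
        """Sketch of an 'augmented array': data plus a physical unit."""
        def __init__(self, data, unit):
            self.data = Numeric.asarray(data)
            self.unit = unit

        def applyUfunc(self, ufunc):
            # One hook serves every ufunc; keeping the unit unchanged is
            # only sensible for unit-preserving operations, of course.
            return UnitArray(ufunc(self.data), self.unit)

    # A dispatcher along the lines of the pseudocode above:
    def apply_ufunc(ufunc, obj):
        try:
            return obj.applyUfunc(ufunc)
        except AttributeError:
            return ufunc(Numeric.asarray(obj))

    v = UnitArray([1.0, 2.0, 3.0], "m/s")
    w = apply_ufunc(Numeric.negative, v)   # result still carries "m/s"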
Compared to the approach that you suggested: > where I would prefer something like > > def ufunc( augmented_array ): > if not isarray(augmented_array): > augmented_array, constructor = > augmented_array.__array_constructor__() > else: > constructor = lambda x:x > return constructor(ufunc.apply(augmented_array)) mine has the advantage of also covering classes that are not array-like at all. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Wed Apr 6 08:16:33 2005 From: cjw at sympatico.ca (cjw at sympatico.ca) Date: Wed Apr 6 08:16:33 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <4253FCD1.2090808@sympatico.ca> Sébastien de Menten wrote: >>> 1) attaching a physical unit to array data (see for instance Unum >>> http://home.tiscali.be/be052320/Unum.html ) >>> 2) description of axis (see >>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). >>> Very useful for manipulating time series easily. >> >> >> Does the record array provide a means of addressing this need? >> > > Not really; when I say axis, I am speaking about indexing. Fair enough, I was thinking one dimensionally. > For an array (named A) with shape (10, 5, 33), I would like to attach > 3 arrays or lists or tuples (named axis_information[0], > axis_information[1] and axis_information[2]) of size (10,), (5,) and > (33,) which give meaning to the first, second and third index. > For instance, > A[i,j,k] => means the element of A at (axis_information[0][i], > axis_information[1][j], axis_information[2][k]) > instead of > A[i,j,k] => means the element of A at index position [i,j,k] which > makes less sense (you always need to track the meaning of i,j,k in > parallel). > >>> 3) masked arrays as in MA module of Numeric >> > > Maybe this one could be implemented using a record array with a record > like (data, mask). > However, it would be cumbersome to use. > E.g. a.field("data")[:] = cos( a.field("data")[:] ) > instead of > a[:] = cos(a[:]) > with the current MA module Assuming "data" is the name of a field in a record array "a", why not have a.data to represent a view (or copy, depending on the convention adopted) of a column in a or a.data.Cos to provide the cosines of the values in the data column? "Cos" is used in place of "cos" to distinguish the method from the function. The former requires no parentheses. This assumes that the values in data are of the appropriate numeric type (with its appropriate typecode). Colin W. > > >>> 4) arrays for interval arithmetic where one keeps another array with >>> precision of data >>> 5) record arrays (currently being integrated in scipy.base as a base >>> type) >>> >> Yes, and there is numarray's array of objects. >> > > This is overkill, as it eats way too much memory. > E.g. your data represents instantaneous speeds and so is tagged with > "m/s" information (a complex object) valid for the full array. > Distributing this information to each component of an array via an > object array is not practical.
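As an aside, the attribute-style field access Colin suggests is easy to prototype on top of a field() method; a toy sketch, where the Record class is invented for illustration and is not the scipy.base record array:

    import math

    class Record:
        # toy record "array": a dictionary of named columns
        def __init__(self, **fields):
            self._fields = fields
        def field(self, name):
            return self._fields[name]
        def __getattr__(self, name):
            # unknown attributes fall back to field lookup,
            # so r.data behaves like r.field("data")
            try:
                return self._fields[name]
            except KeyError:
                raise AttributeError(name)

    r = Record(data=[0.0, math.pi], mask=[0, 1])
    print r.field("data")    # explicit access to the column
    print r.data             # attribute access to the same column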
> From sdementen at hotmail.com Wed Apr 6 08:52:05 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 08:52:05 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: >> >>Maybe this one could be implemented using a record array with a record like >>(data, mask). However, it would be cumbersome to use. E.g. >>a.field("data")[:] = cos( a.field("data")[:] ) instead of a[:] = cos(a[:]) >>with the current MA module > >Assuming "data" is the name of a field in a record array "a", why not have >a.data to represent a view (or copy, depending on the convention adopted) >of a column in a or a.data.Cos to provide the cosines of the values in the >data column? > >"Cos" is used in place of "cos" to distinguish the method from the >function. The former requires no parentheses. > Well, I think the whole point is to be able to use "without changes" any library that manipulates arrays with "augmented arrays": same code for all arrays independently of them being "naked" or "augmented". The "without changes" and "any library" should be taken with a pinch of salt, as operations that are accepted for any array will not necessarily mean something for some "augmented arrays". On a side note, I rather prefer to keep mathematical notation instead of OO notation ( cos as function vs method ) From sdementen at hotmail.com Wed Apr 6 09:07:07 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 09:07:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: > >>However, I disagree with the "pretty straightforward to implement". In >>fact, if one wants to inherit most of the functionalities of Numeric, it >>becomes quite cumbersome. Looking at the MA module, I see that it needs to: > >It is straightforward AND cumbersome. Lots of work, but nothing difficult. >I agree of course that it would be nice to improve the situation. My fault, I misunderstood your answer (... but it was a little bit misleading :-) >The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > >A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > >There are two advantages: > >1) Classes can handle ufuncs in any way they like, even if they implement > array-like objects. >2) Classes must implement only one method, not one per ufunc. > >Compared to the approach that you suggested: > >>where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >>augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > >mine has the advantage of also covering classes that are not array-like at >all. > Yes!! That's an elegant solution for the ufunc part. Do you think it is possible to integrate a similar mechanism into array functions (like searchsorted, argmax, ...)?
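A toy, pure-Python version of the __array_constructor__ round trip quoted above may help; the UnitArray class and the list-based "naked array" are invented for illustration:

    import math

    class UnitArray:
        """Toy augmented array: data tagged with a physical unit."""
        def __init__(self, data, unit):
            self.data, self.unit = data, unit
        def __array_constructor__(self):
            # naked data, plus a callable that rebuilds the wrapper
            return self.data, lambda d: UnitArray(d, self.unit)

    def fabs_ufunc(obj):
        if hasattr(obj, '__array_constructor__'):
            naked, constructor = obj.__array_constructor__()
        else:
            naked, constructor = obj, lambda d: d
        return constructor([math.fabs(x) for x in naked])

    speeds = fabs_ufunc(UnitArray([-1.5, 2.0], 'm/s'))
    print speeds.data, speeds.unit    # [1.5, 2.0] m/s -- unit survives

The constructor returned alongside the naked data is what lets the metadata (here the unit) survive the operation.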
If we can register functions taking one array as an argument within scipy.base and let it dispatch those functions as ufuncs, we could use a similar strategy. For instance, let "sort" and "argmax" be registered as gfuncs (general functions on an array <> ufuncs), then any class that would like to override any of them could do it too with the same trick Konrad described above. If another function uses those gfuncs and ufuncs, it inherits the genericity of the latter. Konrad, do you think it is tricky to have a prototype of your suggestion (i.e. the modification does not need a full understanding of Numeric and you can locate it approximately in the source code)? Seb >Konrad. >-- From mike_lists at yahoo.com.au Wed Apr 6 10:12:39 2005 From: mike_lists at yahoo.com.au (Michael Sorich) Date: Wed Apr 6 10:12:39 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: 6667 Message-ID: <20050406171008.58480.qmail@web53602.mail.yahoo.com> I think that this is a great idea! While I have a strong preference for python, I generally use R for statistical analyses due to the large number of mature libraries available. There are also some aspects of the R data types (eg data-frames and column/row names for 2D arrays) that are really nice for spreadsheet like data. I hope that scipy.base record arrays will be as easily manipulated as data-frames are. While RPy works well for small simple problems, there are data conversion limitations between R and Python. If one could efficiently convert between the major R data types and python scipy.base data types without loss of data, it would become possible to do most of the data manipulation in python and freely mix in R functions when required. This may encourage the use of python for the development of statistical routines. From my meager understanding of RPy: R vectors are converted to python lists. It may make more sense to convert them to an array (either stdlib or scipy.base version) - without copying data if possible. R arrays and matrices are converted to Numeric arrays. Eg In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) Out[8]: array([[1, 3, 5], [2, 4, 6]]) However, column and row names (or dimnames for arrays with >2 dimensions) are lost in R->Py conversion. I do not know whether these conversions require copying of the data. R data-frames are currently converted to python dictionaries and I don't think that there is any simple way to convert a python object to an R data frame. This is the biggest limitation of rpy in my opinion. In [16]: r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) Out[16]: {'col2': ['one', 'two', 'three', 'four'], 'col1': [1, 2, 3, 4]} If it were possible to convert between an R data-frame and a scipy.base record array without copying or losing data, RPy would become more useful. I wish I understood C, scipy.base and R well enough to give this a go. However, this is Way over my head! Mike --- Magnus Lie Hetland wrote: > I was just thinking about some experimental designs, > and whether I > could, perhaps, do the statistics in Python. I > remembered having used > RPy [1] briefly at some time (there may be other > similar bindings out > there -- I don't remember) and started thinking > about whether I could, > perhaps, combine it with numpy in some way. My first
My first > thought was to > reimplement the relevant statistical functions; then > I thought about > how to convert data back and forth -- but then it > occurred to me that > R also uses arrays extensively, and that it could, > perhaps, be > possible to expose those (through something like > RPy) through the > array interface/protocol! > > This would be (IMO) a good example of the benefits > of the array > protocol; it's not a matter of "getting yet another > array module". RPy > is an external library/language with *lots* of > features that might be > useful to numpy users, many of which aren't likely > to be implemented > in Python for quite a while, I'd guess (unless, > perhaps, someone > writes a translator from R, which I'm sure is > doable). > > I don't know enough (at least yet ;) about the > implementation of RPy > and the R library to say for sure whether this would > even be possible, > but it does seem like it could be really useful... > > [1] rpy.sf.net > > -- > Magnus Lie Hetland Fall seven > times, stand up eight > http://hetland.org > [Japanese proverb] > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT > Products from real users. > Discover which products truly live up to the hype. > Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > Find local movie times and trailers on Yahoo! Movies. http://au.movies.yahoo.com From bsouthey at gmail.com Wed Apr 6 11:38:37 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 6 11:38:37 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: Hi, I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object orientated language) uses a lot of meta-data. Then there is the IEEE stuff (NaN etc) that would also need to be handled in numerical python. You probably could get RPy or RSPython to use numerical python rather than just baisc Python. What statistical functions would you want in numerical python? Regards Bruce On Apr 6, 2005 12:10 PM, Michael Sorich wrote: > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. There are also some aspects of > the R data types (eg data-frames and column/row names > for 2D arrays) that are really nice for spreadsheet > like data. I hope that scipy.base record arrays will > be as easily manipulated as data-frames are. > > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. > > From my meager understanding of RPy: > > R vectors are converted to python lists. 
It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. > > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! > > Mike > > --- Magnus Lie Hetland wrote: > > I was just thinking about some experimental designs, > > and whether I > > could, perhaps, do the statistics in Python. I > > remembered having used > > RPy [1] briefly at some time (there may be other > > similar bindings out > > there -- I don't remember) and started thinking > > about whether I could, > > perhaps, combine it with numpy in some way. My first > > thought was to > > reimplement the relevant statistical functions; then > > I thought about > > how to convert data back and forth -- but then it > > occurred to me that > > R also uses arrays extensively, and that it could, > > perhaps, be > > possible to expose those (through something like > > RPy) through the > > array interface/protocol! > > > > This would be (IMO) a good example of the benefits > > of the array > > protocol; it's not a matter of "getting yet another > > array module". RPy > > is an external library/language with *lots* of > > features that might be > > useful to numpy users, many of which aren't likely > > to be implemented > > in Python for quite a while, I'd guess (unless, > > perhaps, someone > > writes a translator from R, which I'm sure is > > doable). > > > > I don't know enough (at least yet ;) about the > > implementation of RPy > > and the R library to say for sure whether this would > > even be possible, > > but it does seem like it could be really useful... > > > > [1] rpy.sf.net > > > > -- > > Magnus Lie Hetland Fall seven > > times, stand up eight > > http://hetland.org > > [Japanese proverb] > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT > > Products from real users. > > Discover which products truly live up to the hype. > > Start reading now. > > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > Find local movie times and trailers on Yahoo! Movies. > http://au.movies.yahoo.com > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. 
Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From oliphant at ee.byu.edu Wed Apr 6 12:28:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:28:50 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537C6D.8040900@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537C6D.8040900@ims.u-tokyo.ac.jp> Message-ID: <425437E2.4090000@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Again, scipy.base should *replace* Numerical Python for all users > > > Sorry, I give up. I have been very happy with Numerical Python so far > and the new Numerical Python just looks too much like SciPy to me. > It's even called scipy.base. In practical terms, what I've noticed is > that what used to work with Numerical Python no longer works with > Numeric3. For example: It's apparent you have negative preconceptions about scipy (even though scipy has always just built on top of Numeric, so I'm not sure what your difficulties have been). This is unfortunate. scipy.base is going to be a lot more like Numeric than scipy was. So, I think you can relax. > > >>> from ndarray import * > >>> argmax > Traceback (most recent call last): > File "", line 1, in ? > NameError: name 'argmax' is not defined This is only because the conversion hasn't completely taken place (I'm not importing the numeric.py module in __init__ yet because it hasn't been adjusted). Remember ndarray is just a place-holder while development happens, so of course quite a few things aren't there yet. I've been swamped so far. from ndarray import * won't even be the way to use it. The package won't be called ndarray. This is all just for temporary development purposes. All of what you believe should work will still continue to work. So, relax..... > >>> > > From what I understand from the discussion, "from Numeric import *" > will still work, but it will be deprecated, which means that I will > have to change my code at some point. Not to mention the other > packages (LinearAlgebra, RandomArray, etc.). It's just too much trouble. Deprecated means new documentation won't teach that approach; that's pretty much it. The approach will still be supported for quite a while so people can switch when and if they want. I don't see "the trouble" at all. > Anyway, I am about to change jobs (I will be moving to Columbia > University soon), so I have decided to take some time off the > Numerical Python project and see where we stand in a few months time. > Hopefully, the situation will have cleared up by then. Sounds like an exciting move. Perhaps I can meet you in person if I'm in New York or if you are ever in Utah. I sincerely hope you will find the new scipy.base to your liking. I can promise you that your concerns are near the top of my list. It's too bad you can't help us get there more quickly.
-Travis From oliphant at ee.byu.edu Wed Apr 6 12:41:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:41:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <42543B1B.3090209@ee.byu.edu> Sébastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug > [ 635104 ]. But apparently, nobody is looking at those bugs... One thing I don't like about the sourceforge bug tracker is that I don't get any email notification of bugs. Is there an option for that? I check my email far more often than I check a website. Sourceforge can be quite slow to manipulate around in. Now that you've mentioned it, I'll look into it. I'm not sure that object arrays could ever be pickled correctly. -Travis > >> >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 >> >> Here is the list so far from Numeric 23.7 >> >> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a >> is 2-d of Int16 > > This is quite disturbing. In fact for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So, what should it do? This is the crux of a long-standing wart in Numerical Python that nobody has had a good solution to (I think the array scalars that have been introduced for scipy.base are the best solution yet). Right now, the point is that different things are done for different indexing strategies. Is this a good thing? Maybe it is. We can certainly leave it the way it is now and back-out the change. The current behavior is: Subscripting always produces a rank-0 array if the type doesn't match a basic Python type. Item getting always produces a basic Python type (even if there is no match). So a[0,0] and a[0][0] will return different things if a is an array of shorts, for example. This may be what we live with and just call it a "feature" > So > type(zeros((5,2,4), Int8 )[0,0,0]) => > type(zeros((5,2,4), Int32 )[0,0,0]) => > type(zeros((5,2), Float32 )[0,0]) => > But > type(zeros((5,2,4), Int )[0,0,0]) => > type(zeros((5,2,4), Float64)[0,0,0]) => > type(zeros((5,2,4), Float)[0,0,0]) => > type(zeros((5,2,4), PyObject)[0,0,0]) => > > Notice too the weird difference between Int <> Int32 and Float == > Float64. This has been in Numeric for a long time (the coercion problem was one of the big reasons for it). If you return a Python integer when indexing an Int8 array and then use that for multiplication, you get undesired up-casting. There is no scalar Int8 type to return (thus a 0-dimensional array that can act like a scalar is returned). In scipy.base there are now scalar-like objects for all of the supported array types, which is one solution to this problem that was made possible by the ability to inherit in C that is now part of Python. What platform are you on? Notice that Int is interpreted as C-long (PyArray_LONG) while Int32 is PyArray_INT. This has been another wart in Numerical Python. By the way, I've fixed PyArray_Return so that if sizeof(long)==sizeof(int) then PyArray_INT also returns a Python integer. I think for places where sizeof(long)==sizeof(int) PyArray_LONG and PyArray_INT should be treated identically.
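For readers following the typecode details, the up-casting wart looks roughly like this at an interactive prompt (a sketch: the printed typecodes reflect my understanding of current Numeric behaviour, so treat the exact output as an assumption worth verifying):

    >>> from Numeric import zeros, Int8
    >>> a = zeros((5,2,4), Int8)
    >>> (a * a[0,0,0]).typecode()     # rank-0 Int8 scalar: no coercion
    '1'
    >>> (a * a[0][0][0]).typecode()   # a[0][0][0] is a Python int: up-cast
    'l'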
> > However, when indexing a one-dimensional array (rank == 1), then we get > back scalars for indexing operations on all types. > > So, when you say "return the same type", do you think scalar or array > (it smells like a recent discussion on Numeric3 ...) ? I just think the behavior ought to be the same for a[0,0] or a[0][0] but maybe I'm wrong and we should keep the dichotomy to satisfy both groups of people. Because of the problems I alluded to, sometimes a 0-dimensional array should be returned. -Travis From tchur at optushome.com.au Wed Apr 6 14:00:52 2005 From: tchur at optushome.com.au (Tim Churches) Date: Wed Apr 6 14:00:52 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <42544D54.7040507@optushome.com.au> Michael Sorich wrote: > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. That's exactly what we do in our project (http://www.netepi.org) which uses NumPy, RPy and R. The Python<->R interface provided by RPy has a few wrinkles but overall is remarkably seamless and remarkably robust. > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. RPy directly converts (by copying) NumPy arrays to R arrays and vice versa. C code is used to do this and it is quite fast. No Python lists are involved. You do need to have NumPy installed (including its header files) when you compile RPy for this to work - otherwise RPy *does* convert R arrays to Python lists. > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! You can extend the conversion routines of RPy (in either direction) using a very simple interface, using just Python and R. No knowledge of C is necessary. For example, if you want to convert an R data.frame into a custom class which you have written in Python, it is quite easy to add that to Rpy. There is an example for doing this with data.frames given in the Rpy documentation. (More comments below).
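Until such a lossless conversion exists, the dictionary that RPy currently returns for a data-frame (as shown earlier in the thread) can at least be post-processed on the Python side; a hedged sketch, with the helper name invented here:

    from Numeric import array

    def frame_columns_to_arrays(frame_dict):
        """Turn RPy's {column name: list of values} data-frame result
        into Numeric arrays for the numeric columns; leave the rest."""
        converted = {}
        for name, column in frame_dict.items():
            if column and isinstance(column[0], (int, float)):
                converted[name] = array(column)   # copies, but compact
            else:
                converted[name] = column          # e.g. string columns
        return converted

    print frame_columns_to_arrays({'col1': [1, 2, 3, 4],
                                   'col2': ['one', 'two', 'three', 'four']})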
> --- Magnus Lie Hetland wrote: > >>I was just thinking about some experimental designs, >>and whether I >>could, perhaps, do the statistics in Python. I >>remembered having used >>RPy [1] briefly at some time (there may be other >>similar bindings out >>there -- I don't remember) There is also RSPython, which allows Python to be called from R as well as R to be called from Python. However, it is far more experimental than RPy, and much harder to build and rather less robust, but more ambitious in its scope. RPy only allows calling of R functions (almost everything is done via functions in R) from Python, although as noted above it has good facilities for converting R objects back into Python objects, and also allows R objects to be returned to Python as native, unconverted R objects - so you can store native R objects in a Python list or dictionary if you wish. You can't see inside those native R objects with Python, but you can use them as arguments to R functions called via RPy. However, the default action in RPy is to do its best to convert R objects into Python data structures when R functions called via RPy return. That conversion is easily customisable as noted above. >> and started thinking >>about whether I could, >>perhaps, combine it with numpy in some way. My first >>thought was to >>reimplement the relevant statistical functions; then >>I thought about >>how to convert data back and forth -- but then it >>occurred to me that >>R also uses arrays extensively, and that it could, >>perhaps, be >>possible to expose those (through something like >>RPy) through the >>array interface/protocol! It seems that the new NumPy array interface could indeed be used to allow Python and R to share the same array data, rather than making copies as happens at present (albeit very quickly). >>This would be (IMO) a good example of the benefits >>of the array >>protocol; it's not a matter of "getting yet another >>array module". RPy >>is an external library/language with *lots* of >>features that might be >>useful to numpy users, many of which aren't likely >>to be implemented >>in Python for quite a while, I'd guess (unless, >>perhaps, someone >>writes a translator from R, which I'm sure is >>doable). R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that's a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it. That said, there is no reason not to port R libraries or specific R functions to NumPy where that provides performance gains, or where the data are large and already handled in NumPy. Our approach in NetEpi (http://www.netepi.org) is to do the data selection and reduction (usually summarisation) in NumPy (where we store data on disc as memory-mapped NumPy arrays) and then pass the much smaller summarised results to R for plotting or fitting complex statistical models. However, we do calculation of elementary statistics (means, quantiles and other measures of location, variance etc) in NumPy wherever possible to avoid copying large amounts of data to R via RPy. >>I don't know enough (at least yet ;) about the >>implementation of RPy >>and the R library to say for sure whether this would >>even be possible, >>but it does seem like it could be really useful... 
>> >>[1] rpy.sf.net I have copied this message to the RPy list - hopefully some fruitful discussion can ensue. Tim C From gregory.r.warnes at pfizer.com Wed Apr 6 14:02:05 2005 From: gregory.r.warnes at pfizer.com (Warnes, Gregory R) Date: Wed Apr 6 14:02:05 2005 Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example application of the array interface] Message-ID: <915D2D65A9986440A277AC5C98AA466F978DC2@groamrexm02.amer.pfizer.com> Hi All, It is possible to establish conversion functions so that R dataframe, list, and vector objects are better translated into python equivalents. I've made several aborted stabs at this, but my time has been extremely limited. The basic task is to create a functionally equivalent python class [The tricky bit here is that R list and vector objects have both order and names. It is possible to emulate this in python by creating a base object that maintains a dictionary of names alongside the vector/matrix data.] See the example in the RPy documentation at http://rpy.sourceforge.net/rpy/doc/manual_html/DataFrame-class.html#DataFrame%20class. This shouldn't be very hard if someone can dedicate a bit of time to it. -Greg (Current RPy maintainer) > -----Original Message----- > From: rpy-list-admin at lists.sourceforge.net > [mailto:rpy-list-admin at lists.sourceforge.net]On Behalf Of Tim Churches > Sent: Wednesday, April 06, 2005 4:22 PM > To: rpy-list at lists.sourceforge.net > Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example > application > of the array interface] > > > The following discussion occurred on the Numeric Python mailing list. > Others may wish to join the conversation. > > Tim C > > -------- Original Message -------- > Subject: Re: [Numpy-discussion] Possible example application of the > array interface > Date: Thu, 7 Apr 2005 03:10:08 +1000 (EST) > From: Michael Sorich > To: numpy-discussion at lists.sourceforge.net > > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. There are also some aspects of > the R data types (eg data-frames and column/row names > for 2D arrays) that are really nice for spreadsheet > like data. I hope that scipy.base record arrays will > be as easily manipulated as data-frames are. > > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. > > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. > > R arrays and matrices are converted to Numeric arrays. > Eg > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame.
This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is Way over my head! > > Mike > > --- Magnus Lie Hetland wrote: > > I was just thinking about some experimental designs, > > and whether I > > could, perhaps, do the statistics in Python. I > > remembered having used > > RPy [1] briefly at some time (there may be other > > similar bindings out > > there -- I don't remember) and started thinking > > about whether I could, > > perhaps, combine it with numpy in some way. My first > > thought was to > > reimplement the relevant statistical functions; then > > I thought about > > how to convert data back and forth -- but then it > > occurred to me that > > R also uses arrays extensively, and that it could, > > perhaps, be > > possible to expose those (through something like > > RPy) through the > > array interface/protocol! > > > > This would be (IMO) a good example of the benefits > > of the array > > protocol; it's not a matter of "getting yet another > > array module". RPy > > is an external library/language with *lots* of > > features that might be > > useful to numpy users, many of which aren't likely > > to be implemented > > in Python for quite a while, I'd guess (unless, > > perhaps, someone > > writes a translator from R, which I'm sure is > > doable). > > > > I don't know enough (at least yet ;) about the > > implementation of RPy > > and the R library to say for sure whether this would > > even be possible, > > but it does seem like it could be really useful... > > > > [1] rpy.sf.net > > > > -- > > Magnus Lie Hetland Fall seven > > times, stand up eight > > http://hetland.org > > [Japanese proverb] > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from > real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_ide95&alloc_id396&op=click > _______________________________________________ > rpy-list mailing list > rpy-list at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rpy-list > > LEGAL NOTICE Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. From cookedm at physics.mcmaster.ca Wed Apr 6 14:04:36 2005 From: cookedm at physics.mcmaster.ca (David M. 
Cooke) Date: Wed Apr 6 14:04:36 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42543B1B.3090209@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 13:40:11 -0600") References: <42543B1B.3090209@ee.byu.edu> Message-ID: Travis Oliphant writes: > Sébastien de Menten wrote: > >> >> Hi Travis, >> >> Could you look at bug >> [ 635104 ] segfault unpickling Numeric 'O' array >> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of >> previous one) >> >> I proposed a (rather simple) solution that I put in the comment of >> bug [ 635104 ]. But apparently, nobody is looking at those bugs... > > One thing I don't like about the sourceforge bug tracker is that I don't > get any email notification of bugs. Is there an option for that? I > check my email far more often than I check a website. Sourceforge > can be quite slow to manipulate around in. I think if the bug is assigned to you, you get email. > >> So >> type(zeros((5,2,4), Int8 )[0,0,0]) => >> type(zeros((5,2,4), Int32 )[0,0,0]) => >> type(zeros((5,2), Float32 )[0,0]) => >> But >> type(zeros((5,2,4), Int )[0,0,0]) => >> type(zeros((5,2,4), Float64)[0,0,0]) => >> type(zeros((5,2,4), Float)[0,0,0]) => >> type(zeros((5,2,4), PyObject)[0,0,0]) => >> >> Notice too the weird difference between Int <> Int32 and Float == >> Float64. > > By the way, I've fixed PyArray_Return so that if > sizeof(long)==sizeof(int) then PyArray_INT also returns a Python > integer. I think for places where sizeof(long)==sizeof(int) > PyArray_LONG and PyArray_INT should be treated identically. I don't think this is good -- it's just papering over the problem. It leads to different behaviour on machines where sizeof(long) != sizeof(int) (specifically, the problem reported by Nils Wagner *won't* be fixed by this on my machine). On some machines x[0] will give you an int (where x is an array of Int32), on others an array: not fun. I see you already beat me to changing PyArray_PyIntAsInt to support rank-0 integer arrays. How about changing that to instead use anything that int() can handle (using PyNumber_AsInt)? This would include anything int-like (rank-0 integer arrays, scipy.base array scalars, etc.). The side-effect is that you can index using floats (since int() of a float truncates it towards 0). If this is a big deal, I can special-case floats to raise an error. This would make (almost) all Numeric behaviour consistent with regards to using Python ints, Python longs, and rank-0 integer arrays, and other int-like objects. >> However, when indexing a one-dimensional array (rank == 1), then we >> get back scalars for indexing operations on all types. >> >> So, when you say "return the same type", do you think scalar or >> array (it smells like a recent discussion on Numeric3 ...) ? > > I just think the behavior ought to be the same for a[0,0] or a[0][0] > but maybe I'm wrong and we should keep the dichotomy to satisfy both > groups of people. Because of the problems I alluded to, sometimes a > 0-dimensional array should be returned. I'd prefer having a[0,0] and a[0][0] return the same thing: it's not a special case of how to do two indices; it's the special-casing of rank-1 arrays as compared to rank-n arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 6 14:42:38 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:42:38 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric Message-ID: I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it). Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS. I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Wed Apr 6 15:05:47 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:05:47 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <200504011146.44549.faltet@carabos.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> <200504011146.44549.faltet@carabos.com> Message-ID: <00c3ccc871b2107c78efa7cb3758fe8c@stsci.edu> Coming in very late... On Apr 1, 2005, at 4:46 AM, Francesc Altet wrote: > I'm very much with the opinions of Scott. Just some remarks. > > A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: >>> I also think that rather than attach < or > to the start of the >>> string it would be easier to have another protocol for endianness. >>> Perhaps something like: >>> >>> __array_endian__ (optional Python integer with the value 1 in it). >>> If it is not 1, then a byteswap must be necessary. >> >> A limitation of this approach is that it can't adequately represent >> struct/record arrays where some fields are big endian and others are >> little >> endian. > > Having a mix of different endianness data values in the same data > record would be a bit ill-minded. In fact, numarray does not support > this: a recarray should be all little or big endian. I think that '<' > and '>' would be more than enough to represent this. > Nothing intrinsically prevents numarray from allowing this for records, but I'd agree that I have a hard time understanding when a given record array would have mixed endianness. >>> So, what if we proposed for the Python core not something like >>> Numeric3 (which would still exist in scipy.base and be everybody's >>> favorite array :-) ), but a very minimal array object (scaled back >>> even from Numeric) that followed the array protocol and had some >>> C-API associated with it. >>> >>> This minimal array object would support 5 basic types ('bool', >>> 'integer', 'float', 'complex', 'Object'). (Maybe a void type >>> could be defined and a void "scalar" introduced (which would be >>> the bytes object)). These types correspond to scalars already >>> available in Python and so the whole 0-dim array Python scalar >>> arguments could be ignored. >> >> I really like this idea. It could easily be implemented in C or >> Python >> script. Since half its purpose is for documentation, the Python >> script >> implementation might make more sense. > > Yeah, I fully agree with this also. > > I'm not against it, but I wonder if it is the most important thing to do next.
I can imagine that there are many other issues that deserve more attention than this. But I won't tell Travis what to do, obviously. Likewise about working on the current Python array module. Perry From perry at stsci.edu Wed Apr 6 15:09:11 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:11 2005 Subject: [Numpy-discussion] Questions about ufuncs now. In-Reply-To: <4253028D.4090407@ee.byu.edu> References: <4253028D.4090407@ee.byu.edu> Message-ID: <0d2b3dd0b5f97750022b47de6f1fad33@stsci.edu> On Apr 5, 2005, at 5:26 PM, Travis Oliphant wrote: > > The arrayobject for scipy.base seems to be working. Currently the > Numeric3 CVS tree is using the "old-style" ufuncs modified with new > code for the newly added types. It should be quite functional > now for the brave at heart. > > I'm now working on modifying the ufunc object for scipy.base. > > These are the changes I'm working on: > > 1) a thread-specific? context that allows "buffer-size" level > trapping > of errors and retrieving of flags set. Similar to the > decimal.context specification, but it uses the floating point > sticky bits to implement. > > 2) implementation of buffers so that type-conversions (and > byteswapping and alignment if necessary) never create temporaries > larger than the buffer-size (the buffer-size is user settable). > > 3) a reworking of the general N-dimensional loop to use array > iterators with optimizations > applied for contiguous arrays. > > 4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) > do not dictate coercion rules > Also, change so that certain mixed-type operations are computed in > the larger type for both. > > Most of this is pretty straightforward. But, I do have one additional > question. Do the new array scalars count as "non-coercing" scalars > (i.e. like the Python scalars), or do they cause coercion? > > My preference is that ALL scalars (anything that becomes > 0-dimensional arrays internally) cause only "kind-casting" (i.e. int > to float, float to complex, etc.) but not "type-casting" > Seems reasonable. One could argue that since they have their own precision, normal coercion rules should apply, but so long as Python scalar literals don't, having different coercion rules for what look like scalars taken from arrays than for python scalars is bound to lead to great confusion. So I agree. Perry From perry at stsci.edu Wed Apr 6 15:09:51 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:51 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537690.5040400@colorado.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537690.5040400@colorado.edu> Message-ID: <7779a4425dd6f32659e9c5f15b48e180@stsci.edu> I'll echo Fernando's comments. On Apr 6, 2005, at 1:41 AM, Fernando Perez wrote: > Travis Oliphant wrote: >> Michiel Jan Laurens de Hoon wrote: > >>> But SciPy has been moving away (e.g. by replacing functions by >>> methods). >> Michiel, you seem to want to create this impression that "SciPy" is >> "moving away." I'm not sure of your motivations. But, since this >> is a public forum, I have to restate emphatically that "SciPy" is >> not "moving away from Numeric." It is all about bringing together >> the communities. For the 5 years that scipy has been in development, >> it has always been about establishing a library of common routines >> that we could all share. It has built on Numeric from the
Now, there is another "library" of routines that is >> developing around numarray. It is this very real break that I'm >> trying to help fix. I have no other "desire" to "move away" or >> "create a break" or any other such notions that you seem to want to >> spread. > > FWIW, I think you (Travis) have been exceedingly clear in explaining > this process, and in pointing out how this is: > > a) NOT a further split, but rather the EXACT OPPOSITE (numarray users > will have a transition path back into a project which will provide the > best of the old Numeric, along with all the critical enhancements > which Perry, Todd et al. added to numarray). > > b) a way, via the array protocol, to provide third-party low-level > libraries an easy way to, AT THE C LEVEL, interact easily and > efficiently (without unnecessary copies) with numeri* arrays. > > [...] From Chris.Barker at noaa.gov Wed Apr 6 15:37:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:37:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <4253C73E.4030703@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> Message-ID: <42546439.5060301@noaa.gov> Michiel Jan Laurens de Hoon wrote: > Also, make sure you have Cygwin installed, with all the necessary packages. MinGw is NOT Cygwin. You need to have MinGw installed, with all the necessary packages. I don't remember which ones, but I think there is not a single large package that gives you the whole pile. I do remember it being pretty easy for me last time I did it. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cookedm at physics.mcmaster.ca Wed Apr 6 15:44:36 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 15:44:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> (konrad hinsen's message of "Wed, 6 Apr 2005 16:48:30 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: konrad.hinsen at laposte.net writes: > On Apr 6, 2005, at 12:10, S?bastien de Menten wrote: > >> However, I disagree with the "pretty straightforward to >> implement". In fact, if one wants to inherit most of the >> functionalities of Numeric, it becomes quite cumbersome. Looking at >> MA module, I see that it needs to: > > It is straightforward AND cumbersome. Lots of work, but nothing > difficult. I agree of course that it would be nice to improve the > situation. > >> An embryo of idea would be to add hooks in the machinery to allow an >> object to interact with an ufunc. Currently, this is done by calling >> __array__ to extract a "naked array" (== Numeric.array vs >> "augmented array") but the result is then always a "naked >> array". 
>> In pseudocode, this looks like: >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array = augmented_array.__array__() >> return ufunc.apply(augmented_array) > > The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > > A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > > There are two advantages: > > 1) Classes can handle ufuncs in any way they like, even if they > implement > array-like objects. > 2) Classes must implement only one method, not one per ufunc. I like this! It's got namespace goodness all over it (last Python zen line in 'import this': Namespaces are one honking great idea -- let's do more of those!) I'd propose making the special method __ufunc__. > Compared to the approach that you suggested: > >> where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >> augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > > mine has the advantage of also covering classes that are not > array-like at all. ... like your derivative classes, which are very useful. There are two different uses that ufuncs apply to, however. 1) arrays. Here, we want efficient computation of functions applied to lots of elements. That's where the output arguments and special methods (.reduce, .accumulate, and .outer) are useful. 2) polymorphic functions. Output arguments aren't useful here. The special methods are useful for binary ufuncs only. For #2, just returning a callable from __ufunc__ would be fine. I'd suggest two levels of an informal ufunc interface corresponding to these two uses. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Chris.Barker at noaa.gov Wed Apr 6 15:49:44 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:49:44 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: Message-ID: <42546709.1050600@noaa.gov> David M. Cooke wrote: > I've always found the Numeric setup.py to be not very user-friendly. > So, I rewrote it. It's available as patch #1178095 > http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 From that file: # If use_system_lapack is false, f2c'd versions of the required routines # will be used, except on Mac OS X, where the vecLib framework will be used # if found. Just to be clear, this does mean that vecLib will be used by default on OS-X? Very nice, setup.py has annoyed me too. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 6 15:51:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:51:17 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: References: Message-ID: <42546766.5060802@noaa.gov> Hi all, (but mostly Travis), I've taken a look at http://numeric.scipy.org/array_interface.html to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works. Here's a first stab on how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by coordinates in the data: [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints) Here's what I have: def DrawPointList(self, points, pens=None): ... # some checking code on the pens ... if (hasattr(points,'__array_shape__') and hasattr(points,'__array_typestr__') and len(points.__array_shape__) == 2 and points.__array_shape__[1] == 2 and points.__array_typestr__ == 'i4' ): # this means we have a compliant array # return the array protocol version return self._DrawPointArray(points.__array_data__, pens,[]) #This needs to be written now! else: #return the generic python sequence version return self._DrawPointList(points, pens, []) Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens,[]) That takes a buffer object, and does the drawing. My questions: 1) Is this what you had in mind for how to use this? 2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them. 3) A number of the attributes are optional, but will always be there with SciPy arrays (I assume). Have you documented them anywhere? 4) a wxWidgets wxPoint is defined as such: class WXDLLEXPORT wxPoint { public: int x, y; etc. As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int) ? 5) Why is __array_data__ optional? Isn't that the whole point of this? 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it stands, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right? 7) An alternative to the above: A __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. That is the most common use, and it would be nice to just check that flag and not have to take all other options into account. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Wed Apr 6 15:53:05 2005 From: efiring at hawaii.edu (Eric Firing) Date: Wed Apr 6 15:53:05 2005 Subject: [Numpy-discussion] masked arrays and NaNs Message-ID: <425467BB.305@hawaii.edu> Travis, I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you! I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab.
This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan.

Eric

From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov>

Chris Barker wrote:
> there is not a single large package

OOPS. There IS a single large package.

-Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu>

David M. Cooke wrote:
>Travis Oliphant writes:
>
>>David M. Cooke wrote:
>>
>>>I've always found the Numeric setup.py to be not very user-friendly.
>>>So, I rewrote it. It's available as patch #1178095
>>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>>>
>>>Basically, all the editing you need to do is in customize.py, instead
>>>of touching setup.py. No more commenting out files for lapack_lite
>>>(just tell it to use the system LAPACK, and tell it where to find it).
>>>
>>>Also, you could now use GSL's cblas interface for dotblas. Useful if
>>>you've already taken the trouble to link that with an optimized
>>>Fortran BLAS.
>>>
>>>I didn't want to just throw this into CVS without feedback first :-)
>>>If it looks good, this can go in Numeric 24.0.
>>>
>>I like the new changes. I also think the setup.py file is unfriendly.
>>Put them in...
>>
>
>While I'm at it, I'm also thinking of writing a 'cblas_lite' for
>dotblas. This would mean that dotblas would be enabled all the time.
>You could use a C BLAS if you've got one (from ATLAS, say), or a
>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it
>would use the existing blas_lite.c if you don't.
>

This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss.
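To make the configuration side concrete, here is a hypothetical sketch of what a customize.py in the spirit of David's patch could look like. Only use_system_lapack appears in the comment quoted earlier; every other name below is an illustrative guess, not the contents of the actual file:

    # Hypothetical customize.py sketch -- variable names are guesses,
    # not the real patch.

    # Use the system LAPACK instead of the bundled f2c'd routines?
    use_system_lapack = 0
    lapack_library_dirs = []      # e.g. ['/usr/lib/atlas']
    lapack_libraries = []         # e.g. ['lapack', 'f77blas', 'atlas']

    # Use a CBLAS interface (ATLAS, GSL, reference CBLAS) for dotblas?
    use_dotblas = 0
    dotblas_include_dirs = []     # e.g. ['/usr/include/atlas']
    dotblas_library_dirs = []
    dotblas_libraries = []        # e.g. ['cblas', 'atlas'] or ['gslcblas']

The point is that a user edits a handful of flags in one small file, and setup.py itself never needs touching.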
-Travis

From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu>

Chris Barker wrote:
>
> David M. Cooke wrote:
>
>> I've always found the Numeric setup.py to be not very user-friendly.
>> So, I rewrote it. It's available as patch #1178095
>> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> From that file:
>
> # If use_system_lapack is false, f2c'd versions of the required routines
> # will be used, except on Mac OS X, where the vecLib framework will be used
> # if found.
>
> Just to be clear, this does mean that vecLib will be used by default on
> OS-X?

I haven't tried it, yet, but my examination of it suggests that this is so.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu>

Chris Barker wrote:
> Hi all, (but mostly Travis),
>
> I've taken a look at http://numeric.scipy.org/array_interface.html to
> try and see how I would use this with wxPython. I have a few
> questions, and a little code I'd like you to look at to see if I
> understand how this works.

Great, fantastic!!!

> Here's a first stab on how I might use this for the wxPython
> DrawPointsList method. The method takes a sequence of length-2
> sequences of numbers, and draws a point at each point described by
> coordinates in the data:
>
> [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints)
>
> Here's what I have:
>
> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'
>        ):  # this means we have a compliant array
>         # return the array protocol version

You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...)

>         return self._DrawPointArray(points.__array_data__, pens, [])  # This needs to be written now!
>     else:
>         # return the generic python sequence version
>         return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
>
> _DrawPointArray(points.__array_data__, pens, [])
>
> That takes a buffer object, and does the drawing.
>
> My questions:
>
> 1) Is this what you had in mind for how to use this?

Yes, pretty much.

> 2) As __array_strides__ is optional, I'd kind of like to have a
> __contiguous__ flag that I could just check, rather than checking for
> the existence of strides, then calculating what the strides should be,
> then checking them.

I don't want to add too much.
The other approach is to establish a set of helper functions in Python to check this sort of thing. Thus, if you can't handle a general array you check:

    ndarray.iscontiguous(obj)

where obj exports the array interface. But, it could really go either way. What do others think?

I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

> 3) A number of the attributes are optional, but will always be there
> with SciPy arrays... (I assume) have you documented them anywhere?

No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

> 4) a wxWidgets wxPoint is defined as such:
>
> class WXDLLEXPORT wxPoint
> {
> public:
>     int x, y;
>
> etc.
>
> As wxWidgets is using "int", I'd like to be able to use "int". If I
> define it as a 4 byte integer, I'm losing platform independence,
> aren't I? Or can I use something like sizeof(int)?

Ah, yes... here is where we need some standard Python functions to help establish the array interface. Sometimes you want to match a particular c-type, other times you want to match a particular bit width. So, what do you do? I had considered having an additional interface called ctypestr but decided against it for fear of creep. I think in general we need to have in Python some constants to make this conversion easy, e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check:

    __array_typestr__ == 'i%d' % array.array('i', [0]).itemsize

But, on most platforms these days an int is 4 bytes; the above would be just to make sure.

> 5) Why is __array_data__ optional? Isn't that the whole point of this?

Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object. But, really all that is needed is something that exposes the buffer interface: remember the difference between the buffer object and the buffer interface. So, the correct consumer usage for grabbing the data is

    data = getattr(obj, '__array_data__', obj)

Then, in C you use the Buffer *Protocol* to get a pointer to memory. For example, the function:

    int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

Of course this approach has the 32-bit limit until we get this changed in Python.

> 6) Should __array_offset__ be optional? I'd rather it were required,
> but default to zero. This way I have to check for it, then use it.
> Also, I assume it is an integer number of bytes, is that right?

A consumer has to check for most of the optional stuff if they want to support all types of arrays. Again a simple:

    getattr(obj, '__array_offset__', 0)

works fine.

> 7) An alternative to the above: a __simple__ flag, that means the data
> is a simple, C array of contiguous data of a single type. The most
> common use, and it would be nice to just check that flag and not have
> to take all other options into account.

I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough.
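To make the helper-function idea concrete, here is a minimal sketch of what such a consumer-side check could look like. The names iscontiguous and expected_strides are illustrative, not a settled API; only the __array_*__ attributes come from the protocol:

    import re

    def expected_strides(shape, itemsize):
        # Strides, in bytes, of a C-contiguous array of this shape.
        strides = [itemsize] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return tuple(strides)

    def iscontiguous(obj):
        # True if obj's data can be treated as one C-contiguous block.
        shape = obj.__array_shape__
        itemsize = int(re.search(r'\d+', obj.__array_typestr__).group())
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:    # by the convention just adopted above
            return True
        return tuple(strides) == expected_strides(shape, itemsize)

With something like this in a standard place, consumers like the wxPython code never touch the optional attributes directly.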
-Travis

From oliphant at ee.byu.edu Wed Apr 6 17:17:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 17:17:13 2005 Subject: [Numpy-discussion] masked arrays and NaNs In-Reply-To: <425467BB.305@hawaii.edu> References: <425467BB.305@hawaii.edu> Message-ID: <42547B2B.4030700@ee.byu.edu>

Eric Firing wrote:
> Travis,
>
> I am whole-heartedly in favor of your efforts to end the
> Numeric/numarray split by combining the best of both. I am encouraged
> by the progress you have made, and by the depth and clarity of the
> accompanying technical discussions. Thank you!
>
> I am a long-time Matlab user in Physical Oceanography, and I have been
> trying to find a practical way to phase out Matlab. One key is
> matplotlib, which is coming along wonderfully. A second is the
> availability of a Num* (or scipy.base) module that provides the
> functionality and ease-of-use I presently get from Matlab. This leads
> to a request which I suspect and hope is consistent with your present
> plans: efficient handling of NaNs and/or masked arrays.

I think both options will be available. With the new error handling numarray showed, nans will be allowed if you set the error mode correctly. A version of masked arrays will also be available (either in python or C).

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 17:18:51 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 17:18:51 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: (David M. Cooke's message of "Wed, 06 Apr 2005 17:41:50 -0400") References: Message-ID:

cookedm at physics.mcmaster.ca (David M. Cooke) writes:
> I've always found the Numeric setup.py to be not very user-friendly.
> So, I rewrote it. It's available as patch #1178095
> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> Basically, all the editing you need to do is in customize.py, instead
> of touching setup.py. No more commenting out files for lapack_lite
> (just tell it to use the system LAPACK, and tell it where to find it).
>
> Also, you could now use GSL's cblas interface for dotblas. Useful if
> you've already taken the trouble to link that with an optimized
> Fortran BLAS.
>
> I didn't want to just throw this into CVS without feedback first :-)
> If it looks good, this can go in Numeric 24.0.

I've checked it in. Highlights:

* You only need to edit customize.py
* You don't need to edit it if you're on OS X (>= 10.2): the vecLib framework for optimized BLAS and LAPACK will be used if found.
* If you have an incomplete ATLAS library (one without LAPACK), you can use it for BLAS (instead of blas_lite.c), and the included f2c'd routines for LAPACK will be used.
* Use whatever CBLAS interface you've got (ATLAS, GSL, the reference one available from netlib).

There's also an INSTALL file now, although it could use some comments about the 'python setup.py config' option.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Apr 6 18:14:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 18:14:33 2005 Subject: [Numpy-discussion] New array interface helper file Message-ID: <4254890F.6080205@ee.byu.edu>

At http://numeric.scipy.org/array_interface.py you will find the start of a set of helper functions for the array interface that can make it easier to deal with.
It also documents the array interface with docstrings. I tried to attach these to properties, but then I don't know how to "see" them from Python. This is the kind of thing I think should go into Python.

If anybody would like to try their hand at converter functions to go back and forth between the struct module strings and the __array_descr__ string, make my day.

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 21:41:12 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 21:41:12 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546CC7.40408@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 17:12:07 -0600") References: <425458F7.9020307@ee.byu.edu> <42546CC7.40408@ee.byu.edu> Message-ID:

Travis Oliphant writes:
> David M. Cooke wrote:
>>While I'm at it, I'm also thinking of writing a 'cblas_lite' for
>>dotblas. This would mean that dotblas would be enabled all the time.
>>You could use a C BLAS if you've got one (from ATLAS, say), or a
>>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it
>>would use the existing blas_lite.c if you don't.
>>
> This is a good idea, but for more than just dotblas.

Hmm, like for what? dotblas is the only thing (in Numeric & numarray) that uses the cblas_* functions. Unless you're thinking of using them in more places, like ufuncs? cblas_lite would be thin shims with minimal error-checking, probably not much use outside of dotblas.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From rkern at ucsd.edu Wed Apr 6 21:47:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 21:47:30 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254890F.6080205@ee.byu.edu> References: <4254890F.6080205@ee.byu.edu> Message-ID: <4254BB2B.2000406@ucsd.edu>

Travis Oliphant wrote:
>
> At http://numeric.scipy.org/array_interface.py
>
> you will find the start of a set of helper functions for the array
> interface that can make it easier to deal with. It also documents
> the array interface with docstrings. I tried to attach these to
> properties, but then I don't know how to "see" them from Python.

Get it from the property object on the class itself. E.g.

    expanded.__array_shape__.__doc__

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Wed Apr 6 22:13:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 22:13:04 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254BB2B.2000406@ucsd.edu> References: <4254890F.6080205@ee.byu.edu> <4254BB2B.2000406@ucsd.edu> Message-ID: <4254C141.9040502@ee.byu.edu>

Robert Kern wrote:
> Travis Oliphant wrote:
>
>> At http://numeric.scipy.org/array_interface.py
>>
>> you will find the start of a set of helper functions for the array
>> interface that can make it easier to deal with. It also documents
>> the array interface with docstrings. I tried to attach these to
>> properties, but then I don't know how to "see" them from Python.
>
> Get it from the property object on the class itself.
> E.g.
>
> expanded.__array_shape__.__doc__

Thank you.
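In case it helps anyone trying the same thing, a minimal sketch of the pattern Robert describes; the class and docstring here are stand-ins, not the actual array_interface.py code:

    class expanded(object):
        # The docstring lives on the property object, which is a class
        # attribute; accessing it on an instance runs the getter instead.
        def _get_shape(self):
            return self._shape
        __array_shape__ = property(_get_shape,
                                   doc="Tuple of array dimensions.")

    print expanded.__array_shape__.__doc__   # -> Tuple of array dimensions.

So pydoc and help() will find the docstrings as long as you look them up on the class rather than on an instance.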
-Travis

From Chris.Barker at noaa.gov Wed Apr 6 23:36:36 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 23:36:36 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254778A.1070100@ee.byu.edu> References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> Message-ID: <4254D4A8.5020007@noaa.gov>

Travis Oliphant wrote:
> You should account for the '<' or '>' that might be present in
> __array_typestr__ (Numeric won't put it there, but scipy.base and
> numarray will---since they can have byteswapped arrays internally).

Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.

> A more generic interface would handle multiple integer types if possible

I'd like to support doubles as well...

> (but this is a good start...)

Right. I want to get _something_ working, before I try to make it universal!

> I think one idea here is that if __array_strides__ returns None, then
> C-style contiguousness is assumed. In fact, I like that idea so much
> that I just changed the interface. Thanks for the suggestion.

You're welcome. I like that too.

> No, they won't always be there for SciPy arrays (currently 4 of them
> are). Only record-arrays will provide __array_descr__ for example and
> __array_offset__ is unnecessary for SciPy arrays. I actually don't much
> like the __array_offset__ parameter myself, but Scott convinced me that
> it could be useful for very complicated array classes.

I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

> e.g. ndarray.cint (gives 'iX' on the correct platform).
> For now, I would check:
>     __array_typestr__ == 'i%d' % array.array('i', [0]).itemsize

I can see that that would work, but it does feel like a hack. Besides, I might be doing this in C++ anyway, so it would probably be easier to use sizeof()

> But, on most platforms these days an int is 4 bytes; the above would
> be just to make sure.

Right. Making that assumption will just lead to weird bugs way down the line. Of course, I wouldn't be surprised if wxWidgets and/or python makes that assumption in other places anyway!

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?
>
> Because the object itself might expose the buffer interface. We could
> make __array_data__ required and prefer that it return a buffer object.

Couldn't it be required, and return a reference to itself if that works? Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).

> So, the correct consumer usage for grabbing the data is
>
> data = getattr(obj, '__array_data__', obj)

Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int
> *buffer_len)

I'm starting to get this.

> Of course this approach has the 32-bit limit until we get this changed
> in Python.

That's the least of my worries!

>> 6) Should __array_offset__ be optional? I'd rather it were required,
>> but default to zero. This way I have to check for it, then use it.
>> Also, I assume it is an integer number of bytes, is that right?
> > A consumer has to check for most of the optional stuff if they want to
> > support all types of arrays.

That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case; they should be as easy as possible to support.

> Again a simple:
>
> getattr(obj, '__array_offset__', 0)
>
> works fine.

not too bad. Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....

>> 7) An alternative to the above: a __simple__ flag, that means the data
>> is a simple, C array of contiguous data of a single type. The most
>> common use, and it would be nice to just check that flag and not have
>> to take all other options into account.

> I think if __array_strides__ returns None (and if an object doesn't
> expose it you can assume it) it is probably good enough.

That and __array_typestr__.

Travis Oliphant wrote:
>
> At http://numeric.scipy.org/array_interface.py
>
> you will find the start of a set of helper functions for the array
> interface that can make it easier to deal with.

Ah! This may well address my concerns. Good idea.

Thanks for all your work on this Travis. By the way, a quote from Robin Dunn about this: "Sweet!" Thought you might appreciate that.

-Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From konrad.hinsen at laposte.net Wed Apr 6 23:55:02 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Apr 6 23:55:02 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: <2701da761c9f34fc1dc72fc97e87e788@laposte.net>

On 07.04.2005, at 00:43, David M. Cooke wrote:

> I like this! It's got namespace goodness all over it (last Python zen
> line in 'import this': Namespaces are one honking great idea -- let's
> do more of those!)

Sounds like a good principle!

> 1) arrays. Here, we want efficient computation of functions applied to
> lots of elements. That's where the output arguments and special
> methods (.reduce, .accumulate, and .outer) are useful

All that is accessible if the class gets passed the ufunc object.

> 2) polymorphic functions. Output arguments aren't useful here. The
> special methods are useful for binary ufuncs only.

Fine, then they just call the ufunc. And the rare cases that need explicit code for each ufunc (my Derivatives, for example) can retrieve the name of the ufunc and dispatch on it.

Konrad.
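P.S. For concreteness, a rough sketch of the kind of non-array class meant here, dispatching on the ufunc's name. The derivative rules are invented for illustration; this is not the actual Derivatives code:

    import math

    class Variable:
        # Toy forward-mode derivative: tracks a value and its derivative.
        def __init__(self, value, deriv=1.0):
            self.value = value
            self.deriv = deriv

        def __ufunc__(self, ufunc):
            name = getattr(ufunc, '__name__', str(ufunc))
            if name == 'sin':
                return Variable(math.sin(self.value),
                                math.cos(self.value) * self.deriv)
            if name == 'cos':
                return Variable(math.cos(self.value),
                                -math.sin(self.value) * self.deriv)
            raise TypeError("don't know how to apply %s" % name)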
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------

From konrad.hinsen at laposte.net Thu Apr 7 00:24:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 00:24:04 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <1986f60349f1d4d146c6ddb727362fd9@laposte.net>

On 06.04.2005, at 18:06, Sébastien de Menten wrote:

> Do you think it is possible to integrate a similar mechanism in array
> functions (like searchsorted, argmax, ...).

That is less obvious. A generic interface for ufuncs is possible because of the uniform calling interface. Actually, there should perhaps be two ufunc application methods, for unary and for binary ufuncs. The other array functions each have a peculiar calling pattern. They can certainly be implemented through delegation to a method, but that would be one method per function. But I think that is inevitable if you want full flexibility.

> If we can register functions taking one array as argument within
> scipy.base and let it dispatch those functions as ufunc, we could use
> a similar strategy.
>
> For instance, let "sort" and "argmax" be registered as gfunc (general
> functions on an array <> ufunc), then any class that would like to
> override any of them could do it too with the same trick Konrad exposed
> here above.

Does that make sense in practice? Suppose you write a class that implements tables, i.e. arrays plus axis labels. You would want sort() to return an object of the same class, but argmax() to return a plain integer. The generic gfunc handler could do little else than dispatch on the name of the gfunc.

> Konrad, do you think it is tricky to have a prototype of your
> suggestion (i.e. the modification does not need a full understanding
> of Numeric and you can locate it approximately in the source code)?

I haven't looked at the Numeric code in ages, but my guess is that the ufunc part should be easy to do, as it is just a modification of a generic handler that already exists.

Konrad.

-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ -------

From cookedm at physics.mcmaster.ca Thu Apr 7 00:55:37 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 00:55:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> (Chris Barker's message of "Wed, 06 Apr 2005 23:35:20 -0700") References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> <4254D4A8.5020007@noaa.gov> Message-ID:

"Chris Barker" writes:

> Travis Oliphant wrote:
>
>> You should account for the '<' or '>' that might be present in
>> __array_typestr__ (Numeric won't put it there, but scipy.base and
>> numarray will---since they can have byteswapped arrays internally).
>
> Good point, but a pain. Maybe they should be required, that way I
> don't have to first check for the presence of '<' or '>', then check
> if they have the right value.

I'll second this.
Pulling out more Python Zen: Explicit is better than implicit.

>> A more generic interface would handle multiple integer types if
>> possible
>
> I'd like to support doubles as well...
>
>> (but this is a good start...)
>
> Right. I want to get _something_ working, before I try to make it universal!
>
>> I think one idea here is that if __array_strides__ returns None,
>> then C-style contiguousness is assumed. In fact, I like that idea
>> so much that I just changed the interface. Thanks for the
>> suggestion.
>
> You're welcome. I like that too.
>
>> No, they won't always be there for SciPy arrays (currently 4 of them
>> are). Only record-arrays will provide __array_descr__ for example
>> and __array_offset__ is unnecessary for SciPy arrays. I actually
>> don't much like the __array_offset__ parameter myself, but Scott
>> convinced me that it could be useful for very complicated
>> array classes.
>
> I can see that it would, but then, we're stuck with checking for all
> these optional attributes. If I don't bother to check for it, one day,
> someone is going to pass a weird array in with an offset, and a
> strange bug will show up.

Here's a summary:

    Attributes           required by          required
                         array-like object    to be checked
    __array_shape__      yes                  yes
    __array_typestr__    yes                  yes
    __array_descr__      no                   no
    __array_data__       no                   yes
    __array_strides__    no                   yes
    __array_mask__       no                   no?
    __array_offset__     no                   yes

In the "required to be checked" column I'm assuming a user of the array that's interested in looking at all of the elements, so we have to consider all possible situations where forgetting to consider an attribute could lead to invalid memory accesses. __array_strides__ and __array_offset__ in particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you should check it, and raise an error if it exists and is not None, unless you can handle missing elements.

It's certainly not required that all users of an array object need to understand all array types! Since we have to check a bunch anyways, I think that's a good enough reason for having them to exist? There are suitable defaults defined in the protocol document (__array_strides__ in particular) that make it easy to add them in simple cases.

>> So, the correct consumer usage for grabbing the data is
>> data = getattr(obj, '__array_data__', obj)
>
> Ah! I hadn't noticed the default parameter to getattr(). That makes it
> much easier. Is there an equivalent in C? It doesn't look like it to
> me, but I'm kind of a newbie with the C API.

You'd want something like

    adata = PyObject_GetAttrString(array_obj, "__array_data__");
    if (!adata) {
        /* attribute not present */
        PyErr_Clear();
        adata = array_obj;
    }

>> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int
>> *buffer_len)
>
> I'm starting to get this.
>
>> Of course this approach has the 32-bit limit until we get this
>> changed in Python.
>
> That's the least of my worries!
>
>>> 6) Should __array_offset__ be optional? I'd rather it were
>>> required, but default to zero. This way I have to check for it,
>>> then use it. Also, I assume it is an integer number of bytes, is
>>> that right?
>> A consumer has to check for most of the optional stuff if they want
>> to support all types of arrays.
> That's not quite true. I'm happy to support only the simple types of
> arrays (contiguous, single type elements, zero offset), but I have to
> check all that stuff to make sure that I have a simple array.
> The simplest arrays are the most common case; they should be as easy as
> possible to support.
>
>> Again a simple:
>> getattr(obj, '__array_offset__', 0)
>> works fine.
>
> not too bad.
>
> Also, what if we find the need for another optional attribute later?
> Any older code won't check for it. Or maybe I'm being paranoid....

This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)

That would also be a good argument for it existing, I think.

Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From magnus at hetland.org Thu Apr 7 01:05:03 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 7 01:05:03 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <20050407080429.GB20252@idi.ntnu.no>

Bruce Southey :
>
> Hi,
> I don't see that it is feasible to link R and numerical python in this
> way. As you point out, R objects (R is an object orientated language)
> use a lot of meta-data. Then there is the IEEE stuff (NaN etc) that
> would also need to be handled in numerical python.

Too bad. (I seem to recall seeing something about numpy conversion on the Web pages of RPy, though; perhaps, if one can stand a bit of copying, the two can be used together after all?)

> You probably could get RPy or RSPython to use numerical python rather
> than just basic Python.
>
> What statistical functions would you want in numerical python?

I think I'd want most of the standard, parametrized probability distributions (as well as automatic estimation from data, perhaps) and a handful of common statistical tests (t-test, z-test, Fisher, chi-squared, what-have-you). Perhaps some support for factorial experiments (not sure if R has anything specific there, though).

And another thing: R seems to have very fancy (although difficult to use) plotting capabilities... Until SciPy catches up (it hasn't yet, has it? ;) that might be a reason for using R(Py) as well, I guess.

-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]

From cookedm at physics.mcmaster.ca Thu Apr 7 01:08:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 01:08:11 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> (konrad hinsen's message of "Thu, 7 Apr 2005 08:53:06 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID:

konrad.hinsen at laposte.net writes:

> On 07.04.2005, at 00:43, David M. Cooke wrote:
>
>> I like this!
>> It's got namespace goodness all over it (last Python zen
>> line in 'import this': Namespaces are one honking great idea -- let's
>> do more of those!)
>
> Sounds like a good principle!
>
>> 1) arrays. Here, we want efficient computation of functions applied to
>> lots of elements. That's where the output arguments and special
>> methods (.reduce, .accumulate, and .outer) are useful
>
> All that is accessible if the class gets passed the ufunc object.
>
>> 2) polymorphic functions. Output arguments aren't useful here. The
>> special methods are useful for binary ufuncs only.
>
> Fine, then they just call the ufunc. And the rare cases that need
> explicit code for each ufunc (my Derivatives, for example) can
> retrieve the name of the ufunc and dispatch on it.

Hmm, I had misread your previous code. Here it is again, made more specific, and I'll assume this function lives in the ndarray package (as there is more than one package that defines ufuncs):

    def cos(obj):
        if ndarray.isarray(obj):
            return ndarray.array_cos(obj)
        else:
            try:
                return obj.__ufunc__(cos)
            except AttributeError:
                if ndarray.is_array_like(obj):
                    a = ndarray.array(obj)
                    return ndarray.array_cos(a)
                else:
                    raise ValueError

The thing is obj.__ufunc__ must understand about the *particular* object cos: the ndarray one. I was thinking more along the lines of obj.__ufunc__('cos'), where the name is passed instead.

For binary ufuncs, you could use (with arguments obj1 and obj2):

    obj1.__ufunc__('add', obj2)

Output argument (obj3):

    obj1.__ufunc__('add', obj2, obj3)

Special methods:

    obj1.__ufunc__('add.reduce')
    obj1.__ufunc__('add.accumulate')
    obj1.__ufunc__('add.outer', obj2)

Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument...

Alternatively, __ufunc__ could be an object of implemented ufuncs:

    obj.__ufunc__.cos()
    obj1.__ufunc__.add(obj2)
    obj1.__ufunc__.add(obj2, obj3)
    obj1.__ufunc__.add.reduce()
    obj1.__ufunc__.add.accumulate()
    obj1.__ufunc__.add.outer(obj2)

It depends where you want to do the dispatch. I think this version is better: it's easier to discover what __ufunc__'s are supported with generic tools (IPython tab completion, pydoc, etc.).

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From konrad.hinsen at laposte.net Thu Apr 7 01:34:37 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 01:34:37 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net>

On Apr 7, 2005, at 10:06, David M. Cooke wrote:

> Hmm, I had misread your previous code. Here it is again, made more
> specific, and I'll assume this function lives in the ndarray package
> (as there is more than one package that defines ufuncs)

At the moment, there is one in Numeric and one in numarray. The Python API of both is nearly or fully identical.

> The thing is obj.__ufunc__ must understand about the *particular*
> object cos: the ndarray one. I was thinking more along the lines of

No, it must only know the interface. In most cases, it would do something like

    class MyArray:
        def __ufunc__(self, ufunc):
            return MyArray(apply(ufunc, self.data))

> obj.__ufunc__('cos'), where the name is passed instead.

That's also an interesting option.
It would require the implementing class to choose an appropriate function from an appropriate module. Alternatively, it would work if ufuncs were also accessible as methods on array objects.

> For binary ufuncs, you could use (with arguments obj1 and obj2):
> obj1.__ufunc__('add', obj2)

Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs.

> Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> Special methods:
> obj1.__ufunc__('add.reduce')
> obj1.__ufunc__('add.accumulate')
> obj1.__ufunc__('add.outer', obj2)
>
> Basically, special methods are just another ufunc. This suggests that
> add.outer should optionally take an output argument...

But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument.

I'd probably prefer a few explicit methods:

    object.__unary__(cos)
    object.__binary__(add, other)
    object.__binary_reduce__(add)

etc.

Konrad.

-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:26:28 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:26:28 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be>

> > On Apr 7, 2005, at 10:06, David M. Cooke wrote:
> >
> > > Hmm, I had misread your previous code. Here it is again, made more
> > > specific, and I'll assume this function lives in the ndarray package
> > > (as there is more than one package that defines ufuncs)
> >
> > At the moment, there is one in Numeric and one in numarray.
> > The Python
> > API of both is nearly or fully identical.
> >
> > > The thing is obj.__ufunc__ must understand about the *particular*
> > > object cos: the ndarray one. I was thinking more along the lines of
> >
> > No, it must only know the interface. In most cases, it would do
> > something like
> >
> > class MyArray:
> >     def __ufunc__(self, ufunc):
> >         return MyArray(apply(ufunc, self.data))

Exactly! I see this as a very common use (masked arrays and all the other examples could live with that). Or more precisely (just to be explicit, as the previous MyArray example is the simplest (purest) one):

    class MyArray:
        def __ufunc__(self, ufunc):
            metadata = process(self.metadata, ufunc)
            data = apply(ufunc, self.data)
            return MyArray(data, metadata)

Or variations on this same theme.

BTW, looking at Numeric3, the presence of a __mask_array__ in the array protocol looks like we want to add a specific case of "augmented array" to the core protocol. Hmmm, I would rather build a more generic mechanism as well as a clean interface for interacting with "augmented array".

> > obj.__ufunc__('cos'), where the name is passed instead.
>
> That's also an interesting option. It would require the implementing
> class to choose an appropriate function from an appropriate module.
> Alternatively, it would work if ufuncs were also accessible
> as methods
> on array objects.

Why not have the ability to ask the name of an ufunc to be able to dispatch on it?
> > For binary ufuncs, you could use (with arguments obj1 and obj2):
> > obj1.__ufunc__('add', obj2)
>
> Except that it would perhaps be better to have a different method, as
> otherwise nearly every implementation would have to start with a
> condition test to distinguish unary from binary ufuncs.
>
> > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> > Special methods:
> > obj1.__ufunc__('add.reduce')
> > obj1.__ufunc__('add.accumulate')
> > obj1.__ufunc__('add.outer', obj2)
> >
> > Basically, special methods are just another ufunc. This suggests that
> > add.outer should optionally take an output argument...
>
> But they are not just another ufunc, because a standard unary ufunc
> always returns an array of the same shape as its argument.
>
> I'd probably prefer a few explicit methods:
>
> object.__unary__(cos)
> object.__binary__(add, other)
> object.__binary_reduce__(add)

What about:

    object.__unary__(cos, mode="reduce")
    object.__binary__(cos, other, mode="reduce")

or

    object.__unary__(cos.reduce)
    object.__binary__(cos.apply, other)

or

    object.__binary__(cos.__call__, other)

with the ability to ask the first argument its type (with cos.mode or cos.reduce.mode ...)

However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

From konrad.hinsen at laposte.net Thu Apr 7 02:42:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 02:42:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> References: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> Message-ID:

On Apr 7, 2005, at 11:25, Sebastien.deMentendeHorne at electrabel.com wrote:

> Why not have the ability to ask the name of an ufunc to be able to
> dispatch on it?

That's already possible.

> What about:
>
> object.__unary__(cos, mode="reduce")
> object.__binary__(cos, other, mode="reduce")

What does "reduce" mode mean for cos? What does a binary ufunc in reduce mode do with its second argument?

> However, for binary operations, how is the call dispatched if one of
> the operands is of one type while the other is of another type? This
> problem is related to multimethods
> http://www.artima.com/weblogs/viewpost.jsp?thread=101605

No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

Konrad.
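P.S. To spell out that convention, a sketch of first-argument dispatch with the same fallback Python uses for __add__/__radd__. All names here are invented; the NotImplemented return value is what hands control to the other operand:

    class MyArray:
        def __init__(self, data):
            self.data = data

        def __binary__(self, ufunc, other):
            if not isinstance(other, MyArray):
                return NotImplemented   # decline; let the other operand try
            return MyArray(ufunc(self.data, other.data))

    def apply_binary(ufunc, a, b):
        result = NotImplemented
        if hasattr(a, '__binary__'):
            result = a.__binary__(ufunc, b)
        if result is NotImplemented and hasattr(b, '__binary__'):
            result = b.__binary__(ufunc, a)   # like __radd__: operands reversed
        if result is NotImplemented:
            raise TypeError("unsupported operand types for %s" % ufunc)
        return result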
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:54:57 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:54:57 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AB@seebex02.eib.electrabel.be>

> > Why not have the ability to ask the name of an ufunc to be able to
> > dispatch on it?
>
> That's already possible.
>
> > What about:
> >
> > object.__unary__(cos, mode="reduce")
> > object.__binary__(cos, other, mode="reduce")
>
> What does "reduce" mode mean for cos?
> What does a binary ufunc in reduce mode do with its second argument?

raise a ValueError :-) It was an example of a way to pass arguments; the focus was on cos.reduce or "cos.reduce" or cos, "reduce".

> > However, for binary operations, how is the call dispatched
> > if one of
> > the operands is of one type while the other is of another type? This
> > problem is related to multimethods
> > http://www.artima.com/weblogs/viewpost.jsp?thread=101605
>
> No need to be innovative: Python always dispatches on the first
> argument, and everybody is familiar with that approach even though it
> isn't perfect. If Python 3000 has multimethods, we can still adapt.

The problem is related to multimethods, even if the implementation need not be. In a call like object.__binary__(add, other), if other is not of the same type as object, the latter could throw an exception such as ImplementationError to hand control to other.__binary__(add, binary) or to other.__binary__(radd, binary) or similar (i.e. those expressions may not make sense but the idea is to have a convention to hand control to the other operand; python does this already when one overloads an operator like __add__ (__radd__)). So if we can keep this same protocol for binary ufunc, that would be great. Otherwise, I think it is not that a big deal.

Sebastien

From xscottg at yahoo.com Thu Apr 7 04:35:49 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:35:49 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: <4254778A.1070100@ee.byu.edu> Message-ID: <20050407113421.49329.qmail@web50202.mail.yahoo.com>

--- Travis Oliphant wrote:
>
> > 2) As __array_strides__ is optional, I'd kind of like to have a
> > __contiguous__ flag that I could just check, rather than checking for
> > the existence of strides, then calculating what the strides should be,
> > then checking them.
>
> I don't want to add too much. The other approach is to establish a set
> of helper functions in Python to check this sort of thing: Thus, if
> you can't handle a general array you check:
>
> ndarray.iscontiguous(obj)
>
> where obj exports the array interface.
>
> But, it could really go either way. What do others think?

I think this should definitely be done in the helper functions. Having extra attributes encode redundant information is a recipe for trouble.

Cheers, -Scott

From xscottg at yahoo.com Thu Apr 7 04:43:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:43:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> Message-ID: <20050407114157.23887.qmail@web50209.mail.yahoo.com>

--- Chris Barker wrote:
>
> I can see that it would, but then, we're stuck with checking for all
> these optional attributes. If I don't bother to check for it, one day,
> someone is going to pass a weird array in with an offset, and a strange
> bug will show up.

Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset or you need to create a new buffer that encodes the offset for you.

    A = reshape(arange(9), (3,3))

    0, 1, 2
    3, 4, 5
    6, 7, 8

    B = A[2]    # create a view into A

    6, 7, 8     # Shared with the data above

Unless you're going to create a new buffer (which I guess is what Numeric is doing), the offset for B would be 6 in this very simple case. I think specifying the offset is much more elegant than creating a new buffer object with a hidden offset that refers to the old buffer object.

I guess all I'm saying is that I wouldn't assume the offset is zero...

> Couldn't it be required, and return a reference to itself if that works?
>
> Maybe I'm just being lazy, but it feels clunky and prone to errors to
> keep having to check if an attribute exists, then use it (or not).

The problem is that you aren't being lazy enough. :-) The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions). In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements.

It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumers any benefit that they can't get from a well written helper library, and it does add some burden to array producers.

Cheers, -Scott

From mrmaple at gmail.com Thu Apr 7 04:44:27 2005 From: mrmaple at gmail.com (James Carroll) Date: Thu Apr 7 04:44:27 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID:

Hi Chris, Travis, ...

Great conversation you've started. I have two questions at the moment... I do love the idea that an abstraction can bring the different but similar num* worlds together.
Which sourceforge CVS repository will the interface (and an implementation) show up on first? My guess is numpy/numeric3. I see Travis has been updating it while I sleep.

> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'
>        ):  # this means we have a compliant array
>         # return the array protocol version
>         return self._DrawPointArray(points.__array_data__, pens, [])  # This needs to be written now!

This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice? If not, then great, you would painlessly reshape into something that had a different set of strides that just accessed the data that complied in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that described which parts of the array to use for the sequence of 2-tuples to use for plotting the (x,y)s of a scatter plot. (or whatever)

I do think we can accept more than just i4 for a datatype. Especially since a last-minute cast to i4 is inexpensive for almost every data type.

>     else:
>         # return the generic python sequence version
>         return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
>
> _DrawPointArray(points.__array_data__, pens, [])

Looks great.

-Jim

From xscottg at yahoo.com Thu Apr 7 04:52:11 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:52:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: Message-ID: <20050407115141.96479.qmail@web50204.mail.yahoo.com>

--- "David M. Cooke" wrote:
>
> > Good point, but a pain. Maybe they should be required, that way I
> > don't have to first check for the presence of '<' or '>', then check
> > if they have the right value.
>
> I'll second this. Pulling out more Python Zen: Explicit is better than
> implicit.

I'll third.

> This is a good point; all good protocols embed a version somewhere.
> Not doing it now could lead to grief/pain later.
>
> I'd suggest adding to __array_data__: If __array_data__ is None, then
> the array is implementing a newer version of the interface, and you'd
> either need to support that (maybe the new version uses
> __array_data2__ or something), or use the sequence protocol on the
> original object. The sequence protocol should definitely be safe all
> the time, whereas the buffer protocol may not. (Put it this way: I
> understand the sequence protocol well, but not the buffer one :-)
>
> That would also be a good argument for it existing, I think.
>
> Alternatively, we could add an __array_version__ attribute (required
> to exist, required to check) which is set to 1 for this protocol.

I like this, although I think having __array_data__ return None is confusing. I think __array_version__ (or __array_protocol__?) is the better choice. How about having it optional and default to 1? If it's present and greater than 1 then it means there is something new going on...

Cheers, -Scott

From cjw at sympatico.ca Thu Apr 7 05:57:36 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Apr 7 05:57:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> Message-ID: <42552DD2.2040200@sympatico.ca>

konrad.hinsen at laposte.net wrote:
> On Apr 7, 2005, at 10:06, David M. Cooke wrote:
>
>> Hmm, I had misread your previous code. Here it is again, made more
>> specific, and I'll assume this function lives in the ndarray package
>> (as there is more than one package that defines ufuncs)
>
> At the moment, there is one in Numeric and one in numarray. The Python
> API of both is nearly or fully identical.
>
>> The thing is obj.__ufunc__ must understand about the *particular*
>> object cos: the ndarray one. I was thinking more along the lines of
>
> No, it must only know the interface. In most cases, it would do
> something like
>
> class MyArray:
>     def __ufunc__(self, ufunc):
>         return MyArray(apply(ufunc, self.data))
>
>> obj.__ufunc__('cos'), where the name is passed instead.
>
> That's also an interesting option. It would require the implementing
> class to choose an appropriate function from an appropriate module.
> Alternatively, it would work if ufuncs were also accessible as methods
> on array objects.

Yes, perhaps with a slightly different name (say Cos vs cos) to distinguish between methods and functions. Since they don't require arguments, the methods would not require parentheses.

Colin W.

From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID:

Hi,

> > What statistical functions would you want in numerical python?
>
> I think I'd want most of the standard...

Most of this is in SciPy already, based on Gary's code. I have not looked at it in great detail because it doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so.

I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area and I want to use symmetric matrices as the requirements of these models grow really fast.

I would be willing to try to address and contribute to the statistical needs if people are interested because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python since the main core of these functions are written in C and Fortran.

> And another thing: R seems to have very fancy (although difficult to
> use) plotting capabilities... Until SciPy catches up (it hasn't yet,
> has it?
;) that might be a reason for using R(Py) as well, I guess. > > -- > Magnus Lie Hetland Fall seven times, stand up eight > http://hetland.org [Japanese proverb] >

Yeah, S/S+/R provides some nice graphs until you need to change from the defaults. Regards Bruce

From Gilles.Simond at obs.unige.ch Thu Apr 7 07:55:08 2005 From: Gilles.Simond at obs.unige.ch (SIMOND Gilles) Date: Thu Apr 7 07:55:08 2005 Subject: [Numpy-discussion] Quite curious behaviour in Numeric Message-ID: <1112885601.15142.53.camel@obssf5>

2.6.8-1-686-smp (dilinger at toaster.hq.voxel.net) (gcc version 3.3.4 (Debian 1:3.3.4-9)) #1 SMP Sat Aug 28 12:51:43 EDT 2004: and python2.3

>>> a=Numeric.ones((2,3),'i')
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: Array can not be safely cast to required type
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'i'

and the following works:

>>> a=Numeric.ones((2,3))
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'l'
>>> type(1)
<type 'int'>
>>> Numeric.__version__
'23.6'

It seems that itemsize() does not return the correct value, which should be 8 for a 'l' type array. This is quite annoying since this function is the only way to know the actual format of the array. Gilles Simond

From rkern at ucsd.edu Thu Apr 7 08:17:44 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 08:17:44 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: <42554EC6.9090807@ucsd.edu>

Magnus Lie Hetland wrote: > Bruce Southey : >>What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though).

Except for factorial designs, scipy.stats has all of that. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Thu Apr 7 08:23:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 08:23:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407115141.96479.qmail@web50204.mail.yahoo.com> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> Message-ID: <4255502D.6060306@ee.byu.edu>

Scott Gilbert wrote: >--- "David M. Cooke" wrote: > > >>>Good point, but a pain. Maybe they should be required, that way I >>>don't have to first check for the presence of '<' or '>', then check >>>if they have the right value. >>> >>> >>I'll second this. Pulling out more Python Zen: Explicit is better than >>implicit. >> >> >> > >I'll third. > >

O.K. It's done....
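With the byte-order character now required in __array_typestr__, a consumer can take the string apart trivially; a minimal sketch (the function name is made up for illustration):

    def split_typestr(typestr):
        # e.g. '<i4' -> ('<', 'i', 4): byte order, kind, item size
        endian, kind = typestr[0], typestr[1]
        if endian not in ('<', '>'):
            raise ValueError("typestr must start with '<' or '>'")
        return endian, kind, int(typestr[2:])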
basically, the program I am profiling has a function like this: foo(): # some code # a call to astype() for i in xrange(N): # some other code and NO explicit call to astype() the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of N+1, as if it was called inside the loop. So now the first doubt I have is that astype() gets listed because called from some function called by foo(), even if this should not happen. Here is the list of numarray functions called in foo() Function called... generic.py:651(getshape)(14) 0.070 generic.py:918(reshape)(2) 0.000 generic.py:1013(where)(2) 0.050 generic.py:1069(concatenate)(2) 4.270 morphology.py:150(binary_erosion)(2) 0.070 numarraycore.py:698(__del__)(120032) 3.240 numarraycore.py:817(astype)(12002) 37.290 numarraycore.py:857(is_c_array)(36000) 10.450 numarraycore.py:878(type)(4) 0.000 numarraycore.py:964(__mul__)(12) 0.340 numarraycore.py:981(__div__)(8) 0.010 numarraycore.py:1068(__pow__)(8) 0.000 numarraycore.py:1180(__imul__)(12000) 0.930 numarraycore.py:1250(__eq__)(2) 0.080 numarraycore.py:1400(zeros)(54) 0.060 numarraycore.py:1409(ones)(8) 0.020 The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? curzio From jmiller at stsci.edu Thu Apr 7 10:51:38 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 7 10:51:38 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4255664F.2070107@unibas.ch> References: <4255664F.2070107@unibas.ch> Message-ID: <1112896207.2437.34.camel@halloween.stsci.edu> astype() is used in a bunch of places, including the C-API, so it's hard to guess how it's getting called with the information here. In general, astype() gets called to "match up types" based on a particular parameterization of a function call, i.e. the c-code underlying some function call needs a different type than was passed in so astype() is used to convert an array to a workable type. One possibility for debugging this might be to drop N to something reasonable, like say 2, and then run under pdb with a breakpoint set on astype(). Something like this is what I have in mind; it may not be exactly right but with fiddling this approach might work: >>> from yourmodule import newfoo # you redefined foo to accept N as a parameter >>> import pdb >>> pdb.run("newfoo(N=2)") (pdb) s # step along a little to get into newfoo() ... step output (pdb) import numarray.numarraycore as nc (pdb) break nc.astype (pdb) c ... breakpoint output (pdb) where ... function traceback showing where astype() got called from (pdb) c ... breakpoint output (pdb) where ... more function traceback, eventually you should find it... ... Regards, Todd On Thu, 2005-04-07 at 12:56, Curzio Basso wrote: > Hi all, > > I have a problem trying to profile a program using numarray, maybe someone with more experience can > give me a hint... > > basically, the program I am profiling has a function like this: > > foo(): > # some code > # a call to astype() > for i in xrange(N): > # some other code and NO explicit call to astype() > > the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of > N+1, as if it was called inside the loop. > So now the first doubt I have is that astype() gets listed because called from some function called > by foo(), even if this should not happen. Here is the list of numarray functions called in foo() > > Function called... 
> generic.py:651(getshape)(14) 0.070 > generic.py:918(reshape)(2) 0.000 > generic.py:1013(where)(2) 0.050 > generic.py:1069(concatenate)(2) 4.270 > morphology.py:150(binary_erosion)(2) 0.070 > numarraycore.py:698(__del__)(120032) 3.240 > numarraycore.py:817(astype)(12002) 37.290 > numarraycore.py:857(is_c_array)(36000) 10.450 > numarraycore.py:878(type)(4) 0.000 > numarraycore.py:964(__mul__)(12) 0.340 > numarraycore.py:981(__div__)(8) 0.010 > numarraycore.py:1068(__pow__)(8) 0.000 > numarraycore.py:1180(__imul__)(12000) 0.930 > numarraycore.py:1250(__eq__)(2) 0.080 > numarraycore.py:1400(zeros)(54) 0.060 > numarraycore.py:1409(ones)(8) 0.020 > > The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? > > curzio

From Chris.Barker at noaa.gov Thu Apr 7 11:38:43 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 11:38:43 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407114157.23887.qmail@web50209.mail.yahoo.com> References: <20050407114157.23887.qmail@web50209.mail.yahoo.com> Message-ID: <42557DE3.3010804@noaa.gov>

Scott Gilbert wrote: > I think __array_version__ (or __array_protocol__?) is the > better choice. How about have it optional and default to 1? If it's > present and greater than 1 then it means there is something new going on...

Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people to not check it, and that's a recipe for bugs later on down the road.

> Everyone seems to think that an offset is so weird. I haven't looked at > the internals of Numeric/scipy.base in a while so maybe it doesn't apply > there. However, if you subscript an array and return a view to the data, > you need an offset or you need to create a new buffer that encodes the > offset for you. > I guess all I'm saying is that I wouldn't assume the offset is zero...

Good point. All the more reason to have the offset be mandatory.

> The fact that a lot of these attributes are optional should be hidden in > helper functions like those in Travis's array_interface.py module, or a > C/C++ include file (with inline functions).

Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method.

> In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements.

Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no?

> It wouldn't be a horrible mistake to have all the attributes be mandatory, > but it doesn't get array consumers any benefit that they can't get from a > well written helper library, and it does add some burden to array > producers.

Hardly any.
I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption. After all, small array support has turned out to be very important for Numeric. As a rule of thumb, I think there will be more consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
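A minimal sketch of the kind of base/mixin class Chris describes, assuming the attribute defaults discussed in this thread (the class names are hypothetical):

    class base_array(object):
        # Defaults for the optional protocol attributes; a subclass
        # only has to supply __array_data__, __array_shape__ and
        # __array_typestr__.
        __array_offset__ = 0
        __array_version__ = 1

    class file_array(base_array):
        # Example producer: expose a binary file's contents.
        def __init__(self, filename, typestr, shape):
            self.__array_data__ = open(filename, 'rb').read()
            self.__array_typestr__ = typestr
            self.__array_shape__ = shape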
From Chris.Barker at noaa.gov Thu Apr 7 12:20:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 12:20:05 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: References: <42546766.5060802@noaa.gov> Message-ID: <42558796.4070607@noaa.gov>

James Carroll wrote: >> def DrawPointList(self, points, pens=None): >> ... >> # some checking code on the pens) >> ... >> if (hasattr(points,'__array_shape__') and >> hasattr(points,'__array_typestr__') and >> len(points.__array_shape__) == 2 and >> points.__array_shape__[1] == 2 and >> points.__array_typestr__ == 'i4' >> ): # this means we have a compliant array >> # return the array protocol version >> return self._DrawPointArray(points.__array_data__, pens,[]) >> #This needs to be written now! > > This means that whenever you have some complex multivalued > multidimensional structure with the data you want to plot, you have to > reshape it into the above 'compliant' array before passing it on. I'm > a newbie, but is this reshape something where the data has to be > copied and takes up memory twice?

Probably. It depends on two things: 1) What structure the data is in at the moment 2) Whether we write the code to handle more "complex" arrangements of data: discontiguous arrays, for instance. But the idea is to require a data structure that makes sense for the data. For example, a natural way to store a whole set of coordinates is to use an NX2 NumPy array of doubles. This is exactly the data structure that I want the above function to accept. If the points are somehow a subset of a larger array, then they will be in a discontiguous array, and I'm not sure if I want to bother to try to handle that. You can always use the generic sequence interface to access the data, but that will be a lot slower. We're interfacing with a static language here, we can get optimum performance only by specifying a particular data structure.

> If not, then great, you would > painlessly reshape into something that had a different set of strides > that just accessed the data that complied in the big blob of data. If > the reshape is expensive, then maybe we need the array abstraction, > and then a second 'thing' that described which parts of the array to > use for the sequence of 2-tuples to use for plotting the x,y's of a > scatter plot. (or whatever)

The proposed array interface does provide a certain level of abstraction; that's what __array_shape__, __array_typestr__, __array_descr__, __array_strides__ and __array_offset__ are all about. We could certainly write the wxPy_LIST_helper functions to handle a larger variety of options than the simple contiguous C array, but I want to start with the simple case, and I'm not sure directly handling the more complex cases is worth it. I'm imagining that the user will need to do something like:

dc.DrawPointList(asarray(points, Int))

It's easier to use the utility functions that Numeric provides than re-write similar code in wxPython.

> I do think we can accept more than just i4 for a datatype. Especially > since a last-minute cast to i4 is inexpensive for almost every data > type.

Sure, but we're interfacing with a static language, so for each data type supported, we need to cast the data pointer to the right type, then have code to convert it to the type needed by wx. It's not a big deal, but I'd rather keep it simple. I do want to support at least doubles and ints. Users can use Numeric's astype() method to convert if need be. I've noticed that there is a wxRealPoint class that uses doubles, but it doesn't look like it can be used as input to any of the wxDC methods. Too bad. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
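The conversion step Chris sketches could be wrapped up for users roughly like so (a sketch against the Numeric API; as_point_array is not a real function):

    import Numeric

    def as_point_array(points):
        # Coerce input to the integer typecode the fast path expects;
        # asarray() avoids a copy when points is already an array, and
        # astype() only runs when a conversion is actually needed.
        a = Numeric.asarray(points)
        if a.typecode() != 'i':
            a = a.astype('i')
        return a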
From xscottg at yahoo.com Thu Apr 7 14:13:32 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 14:13:32 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050407211227.82679.qmail@web50206.mail.yahoo.com>

--- Chris Barker wrote: > > Again, I'm uncomfortable with something that I have to check being > optional. If it is, we're encouraging people to not check it, and that's > a recipe for bugs later on down the road. > [snip] > > > I guess all I'm saying is that I wouldn't assume the offset is zero... > > Good point. All the more reason to have the offset be mandatory. >

Lots of protocols have optional parts. The helper functions would hide this level of detail.

> > Yes, if there is a C/C++ version of all these helper functions, I'll be > a lot happier. And you're right, the same information should not be > encoded in two places, so my "iscontiguous" attribute should be a helper > function or maybe a method. > > > In a short while, you shouldn't have to check any __array_metadata__ > > attributes directly. There should even be a helper function for > > getting the array elements. > > Cool. How would that work? A C++ iterator? I'm thinking not, as this is > all C, no? >

I think this will take shape as an include file with static/inline functions. No linking required, just #include and call the functions. It would be nice but not necessary that this was distributed with Python. I would be in favor of having some C++ iterator interfaces (possibly a template class) inside of a #ifdef __cplusplus block. Python doesn't seem to have a lot of C++ in the core so I wonder if this would meet resistance (even when it's inside of a #ifdef block).

> > > It wouldn't be a horrible mistake to have all the attributes be > > mandatory, but it doesn't get array consumers any benefit that they > > can't get from a well written helper library, and it does add some > > burden to array > > producers. > > Hardly any. I'm assuming that there will be a base_array class that can > be used as a base class or mixin, so it wouldn't be any work at all to > have a full set of attributes with defaults. It would take up a little > bit of memory. I'm assuming that the whole point of this is to support > large datasets, but maybe that isn't a valid assumption. After all, > small array support has turned out to be very important for Numeric. >

If the protocol can make things easy without the use of a mixin or base class, all the better to my way of thinking. I don't think the memory use is very relevant as the attributes would only require storage in the class object, not the instances. There is something elegant about making array creation as easy as:

    class easy_array:
        def __init__(self, filename):
            data = open(filename, 'rb').read()
            self.__array_data__ = data
            self.__array_shape__ = (len(data)/4,)
            self.__array_typestr__ = '>i4'

Like I said, I don't think it would be *horrible* to require all the attributes, but I don't see how it will benefit you at all. And even if all the attributes are mandatory, there are still a number of details to get right in reading the memory. You'll likely want to use the helper libraries/modules regardless. (Once they're completed of course...)

> > As a rule of thumb, I think there will be more consumers of arrays > than producers, so I'd rather make it easy on the consumers than the > producers, if we need to make such a trade off. Maybe I'm biased, > because I'm a consumer. >

I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little). This has to be easier than the situation you have today right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module. Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 14:31:11 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 14:31:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211227.82679.qmail@web50206.mail.yahoo.com> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> Message-ID: <4255A635.9010309@cox.net>

Scott Gilbert wrote: >--- Chris Barker wrote: > > [SNIP] > >>As a rule of thumb, I think there will be more consumers of arrays >>than producers, so I'd rather make it easy on the consumers than the >>producers, if we need to make such a trade off. Maybe I'm biased, >>because I'm a consumer. >> >> >> > >I don't see the trade off. It will be easy for you either way, but harder >for array producers (admittedly only a little). > >

I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide complexity of dealing with missing attributes through the various helper functions. The cost will be in speed and will probably be most noticeable in C extensions using small arrays where the extra code to check if an attribute is present will be significant. How significant this will be, I'm not sure. And frankly I don't care all that much since I generally only use large arrays. However, since one of the big faultlines between Numarray and Numeric involves the former's relatively poor small array performance, I suspect someone might care. -tim

>This has to be easier than the situation you have today right? Imagine the >code you'd have to write to special case Numeric, scipy.base, Numarray, and >Python's array module.
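For concreteness, the Python shape of the helper function both sides are arguing about might be (illustrative only; the defaults follow the optional-attribute reading):

    def get_array_info(obj):
        # Required attributes: if these are missing, the object simply
        # isn't an array, so let the AttributeError propagate.
        info = {'shape': tuple(obj.__array_shape__),
                'typestr': obj.__array_typestr__,
                'data': obj.__array_data__}
        # Optional attributes fall back to cheap defaults.
        info['offset'] = getattr(obj, '__array_offset__', 0)
        strides = getattr(obj, '__array_strides__', None)
        if strides is not None:
            strides = tuple(strides)
        info['strides'] = strides  # None means C-contiguous
        return info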
From oliphant at ee.byu.edu Thu Apr 7 15:47:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 15:47:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211501.60155.qmail@web50203.mail.yahoo.com> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> Message-ID: <4255B7D6.9000109@ee.byu.edu>

Scott Gilbert wrote: >I agree, we need a road map of some sort. It could be multiple PEPs >depending, but it should include most of the following: > > - Get the bytes object submitted. There are only a few small > things in PEP 296 that should be changed. > > #4 > - I'm not particularly interested in implementing the new bytes > literal and other features discussed in PEP 332, but it is > related to this topic. (The proposal is for b"xxxxxx" to be a > bytes literal.) We should make note that while this is not > part of the numpy roadmap, nothing prohibits that from being > implemented by another user. > > - Add an ndarray module. This module will contain the ndarray > object as well as a superset of your helper functions. I > think implementing it in pure Python on top of the bytes > object is the right course. It's partly for documentation. > > - Add an include file to make this protocol easily accessible > from C. It's not much code, and the entire thing could be > done with inline/static functions in the .h file. It would > be nice if this went into Python too, but not strictly > required. > > I put these together at #1 > - Add the array protocol attributes to the existing array > object. > > #2 > - Flesh out the "locked buffer" stuff in PEP 298. Add support > for locking the buffer to the existing array object, the > bytes object, the mmap object, and anything else (string?) > that doesn't meet too much resistance. > > #3 > - Fix the existing buffer object to regrab its pointer > every time it's needed. Could also add support to use > the "locked buffer" interface where possible. I gather > that you are using this particular object in scipy.base > (is that true??). Several shortcomings of it could be > easily fixed at the Python level, but I don't feel > strongly that this would have to be done... Then again > it isn't much work. > > #5

I can't think of anything you've missed. I'm very supportive of this, but I have to finish scipy.base first. I think Perry is supportive as well. I know he's been playing catch-up in the reading. I'm not sure of Todd's opinion. I suspect he would welcome these changes to Python. My preference order is 1) the ndarray module and ndarray.h header with these interface definitions and methods. 2) Add array interface attributes to array module 3) Flesh out locked buffer API 4) Bytes object (with Pickling support) 5) Fix current buffer object. -Travis

From strawman at astraw.com Thu Apr 7 15:56:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Thu Apr 7 15:56:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255502D.6060306@ee.byu.edu> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> <4255502D.6060306@ee.byu.edu> Message-ID: <4255BA56.7000001@astraw.com>

Travis Oliphant wrote: > Scott Gilbert wrote: > >> --- "David M. Cooke" wrote: >> >>>> Good point, but a pain. Maybe they should be required, that way I >>>> don't have to first check for the presence of '<' or '>', then check >>>> if they have the right value. >>>> >>> >>> I'll second this. Pulling out more Python Zen: Explicit is better than >>> implicit.
>>> >>> >> >> I'll third. >> >> O.K. It's done....

Here's a bit of weirdness which has prevented me from using '<' or '>' in the past with the struct module. I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.

In [1]:import struct
In [2]:from numarray.ieeespecial import nan
In [3]:nan
Out[3]:nan
In [4]:struct.pack('<d',nan)
---------------------------------------------------------------------------
exceptions.SystemError Traceback (most recent call last)
/home/astraw/
SystemError: frexp() result out of range
In [5]:struct.pack('d',nan)
Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

From Chris.Barker at noaa.gov Thu Apr 7 16:01:03 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 16:01:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255A635.9010309@cox.net> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> <4255A635.9010309@cox.net> Message-ID: <4255BA80.4090201@noaa.gov>

Tim Hochberg wrote: > Scott Gilbert wrote: >> --- Chris Barker wrote: >> I don't see the trade off.

I wasn't sure it applied in this case, but if there were a trade off, we should make things easiest for the consumers of arrays.

> I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant.

Actually, that is one I'm worried about. You're quite right, if I'm dealing with a 2X2 array, those helper functions are going to take much longer to run than accessing (and maybe using) the data. Like Tim, I'm mostly interested in using this for large data sets, but I think the small array thing might crop up unexpectedly. For example, with the current numarray, if you pass in an NX2 array to wxPython (to draw a polygon, for instance), it's very slow. It turns out that that's because a whole set of (2,) arrays are created when extracting the data, so even though you're dealing with a large data set, you end up dealing with a LOT of small arrays. Of course, the whole point of this is to avoid that, but I don't think we should assume that any overhead is negligible.

> >> This has to be easier than the situation you have today right?

well, sure. Though it seems to be harder than using the Numeric API directly. However, I'll shut up now, as it seems that the proposed utility functions will address my issues. -Chris PS to Tim: Want to help out with the wxPython integration? -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From xscottg at yahoo.com Thu Apr 7 20:05:48 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 20:05:48 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408030336.54970.qmail@web50209.mail.yahoo.com>

--- Andrew Straw wrote: > > Here's a bit of weirdness which has prevented me from using '<' or '>' > in the past with the struct module. I'm not guru enough to know what's > going on, but it has prevented me from being explicit rather than > implicit.
> > In [1]:import struct > > In [2]:from numarray.ieeespecial import nan > > In [3]:nan > Out[3]:nan > > In [4]:struct.pack('<d',nan) > --------------------------------------------------------------------------- > exceptions.SystemError Traceback (most > recent call last) > > /home/astraw/ > > SystemError: frexp() result out of range > > In [5]:struct.pack('d',nan) > Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >

No clue why that is, but it certainly looks like a bug in the struct module. It shouldn't make any difference about whether or not the array protocol reports the endian though. It's using a different notation for typecodes. Cheers, -Scott

From rkern at ucsd.edu Thu Apr 7 20:24:38 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 20:24:38 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408030336.54970.qmail@web50209.mail.yahoo.com> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> Message-ID: <4255F79D.4000501@ucsd.edu>

Scott Gilbert wrote: > --- Andrew Straw wrote: > >>Here's a bit of weirdness which has prevented me from using '<' or '>' >>in the past with the struct module. I'm not guru enough to know what's >>going on, but it has prevented me from being explicit rather than >>implicit. >> >>In [1]:import struct >> >>In [2]:from numarray.ieeespecial import nan >> >>In [3]:nan >>Out[3]:nan >> >>In [4]:struct.pack('<d',nan) >> >>--------------------------------------------------------------------------- >> >>exceptions.SystemError Traceback (most >>recent call last) >> >>/home/astraw/ >> >>SystemError: frexp() result out of range >> >>In [5]:struct.pack('d',nan) >>Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >> >

This behavior is explained by Tim Peters: http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From xscottg at yahoo.com Thu Apr 7 21:07:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:07:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408040601.86838.qmail@web50203.mail.yahoo.com>

--- Tim Hochberg wrote: > > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. > > How significant this will be, I'm not sure. And frankly I don't care all > that much since I generally only use large arrays. However, since one of > the big faultlines between Numarray and Numeric involves the former's > relatively poor small array performance, I suspect someone might care. >

You must check the return value of the PyObject_GetAttr (or PyObject_GetAttrString) calls regardless. Otherwise the extension will die with an ugly segfault the first time one passes a float where an array was expected.
If we're talking about small light-weight arrays and a C/C++ function that wants to work with them very efficiently, I'm not convinced that requiring the attributes be present will make things faster.

As we're talking about small light weight arrays, it's unlikely the individual arrays will have __array_shape__ or __array_strides__ already stored as tuples. They'll probably store them as a C array as part of their PyObject structure.

In the world where some of these attributes are optional: If an attribute like __array_offset__ or __array_shape__ isn't present, the C code will know to use zero or the default C-contiguous layout. So the check failed, but the failure case is probably very fast (since a temporary tuple object doesn't have to be built by the array on the fly).

In the world where all of the attributes are required: The array object will have to generate the __array_offset__ int/long or __array_shape__ tuple from its own internal representation. Then the C/C++ consumer code will bust apart the tuple to get the values. So the check succeeded, but the success code needs to grab the parts of the tuple.

The C helper code could look like:

    struct PyNDArrayInfo {
        int ndims;
        int endian;
        char itemcode;
        size_t itemsize;
        Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
        Py_LONG_LONG offset;
        Py_LONG_LONG strides[40];
        /* More Array Info goes here */
    };

    int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
        PyObject* shape;
        PyObject* offset;
        PyObject* strides;
        int ii, len;

        info->itemsize = too_long_for_this_example(obj);

        shape = PyObject_GetAttrString(obj, "__array_shape__");
        if (!shape) return 0;
        len = PySequence_Size(shape);
        if (len < 0) return 0;
        if (len > 40) return 0; /* This needs work */
        info->ndims = len;
        for (ii = 0; ii < len; ii++) {
            PyObject* val = PySequence_GetItem(shape, ii);
            info->shape[ii] = PyLong_AsLongLong(val);
            Py_DECREF(val);
        }
        Py_DECREF(shape);

        offset = PyObject_GetAttrString(obj, "__array_offset__");
        if (offset) {
            /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
            info->offset = PyLong_AsLongLong(offset);
            Py_DECREF(offset);
        } else {
            PyErr_Clear();
            info->offset = 0;
        }

        strides = PyObject_GetAttrString(obj, "__array_strides__");
        if (strides) {
            /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
            for (ii = 0; ii < len; ii++) {
                PyObject* val = PySequence_GetItem(strides, ii);
                info->strides[ii] = PyLong_AsLongLong(val);
                Py_DECREF(val);
            }
            Py_DECREF(strides);
        } else {
            /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
            size_t size = info->itemsize;
            PyErr_Clear();
            for (ii = info->ndims-1; ii>=0; ii--) {
                info->strides[ii] = size;
                size *= info->shape[ii];
            }
        }

        /* More code goes here */
    }

I have no idea how expensive PyErr_Clear() is. We'd have to profile it to see for certain. If PyErr_Clear() is not expensive, then we could make a strong argument that *not* requiring the attributes will be more efficient. It could also be so close that it doesn't matter - in which case it's back to being a matter of taste... Cheers, -Scott
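In Python terms, the C-contiguous fallback in Scott's code computes this (a sketch for illustration only):

    def default_strides(shape, itemsize):
        # Strides for a C-contiguous layout: the last axis moves by one
        # item, each earlier axis by the size of everything after it.
        strides = [itemsize] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return tuple(strides)

    # e.g. default_strides((3, 4), 8) == (32, 8)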
From xscottg at yahoo.com Thu Apr 7 21:16:06 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:16:06 2005 Subject: [Numpy-discussion] Questions about the array interface. Message-ID: <20050408041417.61390.qmail@web50210.mail.yahoo.com>

Oops, sent too fast. Quick correction... > > In the world where some of these attributes are optional: If an > attribute like __array_offset__ or __array_shape__ isn't present, > the C code will know to use zero or the default C-contiguous layout. > So the check failed, but the failure case is probably very fast > (since a temporary tuple object doesn't have to be built by the array > on the fly). >

I meant to say "__array_offset__ or __array_strides__". The __array_shape__ attribute would always be required for arrays... Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 23:56:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 23:56:10 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408040601.86838.qmail@web50203.mail.yahoo.com> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> Message-ID: <42562AC5.3040502@cox.net>

Scott Gilbert wrote: >--- Tim Hochberg wrote: > > >>I think there is a trade off, but not the one that Chris is worried >>about. It should be easy to hide complexity of dealing with missing >>attributes through the various helper functions. The cost will be in >>speed and will probably be most noticeable in C extensions using small >>arrays where the extra code to check if an attribute is present will be >>significant. >> >>How significant this will be, I'm not sure. And frankly I don't care all >>that much since I generally only use large arrays. However, since one of >>the big faultlines between Numarray and Numeric involves the former's >>relatively poor small array performance, I suspect someone might care. >> >> >> > >You must check the return value of the PyObject_GetAttr (or >PyObject_GetAttrString) calls regardless. Otherwise the extension will die >with an ugly segfault the first time one passes a float where an array was >expected. > >If we're talking about small light-weight arrays and a C/C++ function that >wants to work with them very efficiently, I'm not convinced that requiring >the attributes be present will make things faster. > > >As we're talking about small light weight arrays, it's unlikely the >individual arrays will have __array_shape__ or __array_strides__ already >stored as tuples. They'll probably store them as a C array as part of >their PyObject structure. > > >In the world where some of these attributes are optional: If an attribute >like __array_offset__ or __array_shape__ isn't present, the C code will >know to use zero or the default C-contiguous layout. So the check failed, >but the failure case is probably very fast (since a temporary tuple object >doesn't have to be built by the array >on the fly). > > >In the world where all of the attributes are required: The array object >will have to generate the __array_offset__ int/long or __array_shape__ >tuple from its own internal representation. Then the C/C++ consumer code >will bust apart the tuple to get the values. So the check succeeded, but >the success code needs to grab the parts of the tuple. > > >The C helper code could look like: > >

I'm not convinced it's legit to assume that a failure to get the attribute means that it's not present and call PyErr_Clear. Just as a for instance, what if the attribute in question is implemented as a descriptor in which there is some internal error. Then you're burying the error and most likely doing the wrong thing. As far as I can tell, the only correct way to do this is to use PyObject_HasAttrString, then PyObject_GetAttrString if that succeeds. The point about not passing around the tuples probably being faster is a good one. Another thought is that requiring tuples instead of general sequences would make the helper faster (since one could use PyTuple_GET_ITEM, which I believe is much faster than PySequence_GetItem). This would possibly shift more pain onto the implementer of the object though.
I suspect that the best strategy, orthogonal to requiring all attributes or not, is to use PySequence_Fast to get a fast sequence and work with that. This means that objects that return tuples for strides, etc would run at maximum possible speed, while other sequences would still work. Back to requiring attributes or not. I suspect that the fastest correct way is to require all attributes, but allow them to be None, in which case the default value is used. Then any errors are easily bubbled up and a fast check for None chooses whether to use the defaults or not. It's late, so I hope that's not too incoherent. Or too wrong. Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort of error checking (it can allegedly raise errors). I suppose that means one is supposed to call PyErr_Occurred after every call? That's sort of painful! -tim

> struct PyNDArrayInfo {
>     int ndims;
>     int endian;
>     char itemcode;
>     size_t itemsize;
>     Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
>     Py_LONG_LONG offset;
>     Py_LONG_LONG strides[40];
>     /* More Array Info goes here */
> };
>
> int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
>     PyObject* shape;
>     PyObject* offset;
>     PyObject* strides;
>     int ii, len;
>
>     info->itemsize = too_long_for_this_example(obj);
>
>     shape = PyObject_GetAttrString(obj, "__array_shape__");
>     if (!shape) return 0;
>     len = PySequence_Size(shape);
>     if (len < 0) return 0;
>     if (len > 40) return 0; /* This needs work */
>     info->ndims = len;
>     for (ii = 0; ii < len; ii++) {
>         PyObject* val = PySequence_GetItem(shape, ii);
>         info->shape[ii] = PyLong_AsLongLong(val);
>         Py_DECREF(val);
>     }
>     Py_DECREF(shape);
>
>     offset = PyObject_GetAttrString(obj, "__array_offset__");
>     if (offset) {
>         /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
>         info->offset = PyLong_AsLongLong(offset);
>         Py_DECREF(offset);
>     } else {
>         PyErr_Clear();
>         info->offset = 0;
>     }
>
>     strides = PyObject_GetAttrString(obj, "__array_strides__");
>     if (strides) {
>         /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
>         for (ii = 0; ii < len; ii++) {
>             PyObject* val = PySequence_GetItem(strides, ii);
>             info->strides[ii] = PyLong_AsLongLong(val);
>             Py_DECREF(val);
>         }
>         Py_DECREF(strides);
>     } else {
>         /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
>         size_t size = info->itemsize;
>         PyErr_Clear();
>         for (ii = info->ndims-1; ii>=0; ii--) {
>             info->strides[ii] = size;
>             size *= info->shape[ii];
>         }
>     }
>
>     /* More code goes here */
> }

>I have no idea how expensive PyErr_Clear() is. We'd have to profile it to >see for certain. If PyErr_Clear() is not expensive, then we could make a >strong argument that *not* requiring the attributes will be more efficient. > >It could also be so close that it doesn't matter - in which case it's back >to being a matter of taste... > >Cheers, > -Scott > > > > > > >

From cookedm at physics.mcmaster.ca Fri Apr 8 00:43:08 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 00:43:08 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> <42562AC5.3040502@cox.net> Message-ID: <20050408074129.GA16479@arbutus.physics.mcmaster.ca>

On Thu, Apr 07, 2005 at 11:55:01PM -0700, Tim Hochberg wrote: > Scott Gilbert wrote: > > >--- Tim Hochberg wrote: > > > > > >>I think there is a trade off, but not the one that Chris is worried > >>about. It should be easy to hide complexity of dealing with missing > >>attributes through the various helper functions.
The cost will be in > >>speed and will probably be most noticable in C extensions using small > >>arrays where the extra code to check if an attribute is present will be > >>signifigant. > >> > >>How signifigant this will be, I'm not sure. And frankly I don't care all > >>that much since I generally only use large arrays. However, since one of > >>the big faultlines between Numarray and Numeric involves the former's > >>relatively poor small array performance, I suspect someone might care. > >> > >> > >> > > > >You must check the return value of the PyObject_GetAttr (or > >PyObject_GetAttrString) calls regardless. Otherwise the extension will die > >with an ugly segfault the first time one passes an float where an array was > >expected. > > > >If we're talking about small light-weight arrays and a C/C++ function that > >wants to work with them very efficiently, I'm not convinced that requiring > >the attributes be present will make things faster. > > > > > >As we're talking about small light weight arrays, it's unlikely the > >individual arrays will have __array_shape__ or __array_strides__ already > >stored as tuples. They'll probably store them as a C array as part of > >their PyObject structure. > > > > > >In the world where some of these attributes are optional: If an attribute > >like __array_offset__ or __array_shape__ isn't present, the C code will > >know to use zero or the default C-contiguous layout. So the check failed, > >but the failure case is probably very fast (since a temporary tuple object > >doesn't have to be built by the array on the fly). > > > >In the world where all of the attributes are required: The array object > >will have to generate the __array_offset__ int/long or __array_shape___ > >tuple from it's own internal representation. Then the C/C++ consumer code > >will bust apart the tuple to get the values. So the check succeeded, but > >the success code needs to grab the parts of the tuple. > > > >The C helper code could look like: > > I'm not convinced it's legit to assume that a failure to get the > attribute means that it's not present and call PyErrorClear. Just as a > for instance, what if the attribute in question is implemented as a > descriptor in which there is some internal error. Then your burying the > error and most likely doing the wrong thing. As far as I can tell, the > only correct way to do this is to use PyObject_HasAttrString, then > PyObject_GetAttrString if that succeeds. No point: PyObject_HasAttrString *calls* PyObject_GetAttrString, then clears the error if there is one. [Side note: hasattr() in Python works the same way, which is why using properties is a pain when you've got code that's using it] > The point about not passing around the tuples probably being faster is a > good one. Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > *PyTuple_GET_**ITEM*, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc would run at maximum possible speed, > while other sequences would still work. How about objects that use a lightweight array as the strides sequence? 
I'm thinking that if you've got a fast 1-d array object, you'd be tempted to use an instance of that as the shape or strides attribute. You'd be saving on temporary tuple creation (but you'd still be losing some in making Python ints). I haven't benchmarked it, but I'm looking at the code for PySequence_GetItem(): it does a few pointer derefences to get the sq_item() method in the tp_as_sequence struct of an object implementing the sequence protocol, which for the tuple does an array indexing of the tuple's data. You've got about two function calls more compared to using PyTuple_GET_ITEM. It really depends on how big the arrays you expect to get passed to you. If they're big, this is all amortized: you'll hardly see it. It also depends on how your routines get used. If the routine is buried below a few layers of API, you'd likely be better off doing a typecast higher up to your own representation, or something. If it's at the border, so the user will call it directly *often*, you're going to be screwed for speed anyways (giving the user the option of casting arrays to something else would probably help a lot here also). > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None choses whether to use the defaults or not. > > It's late, so I hope that's not too incoherent. Or too wrong. > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > of error checking (it can allegedly raise errors). I suppose that means > one is supposed to call PyError_Occurred after every call? That's sort > of painful! Yes! Check all C API functions that may return errors! That includes PySequence_GetItem() and PyLong_AsLongLong. > > struct PyNDArrayInfo { > > int ndims; > > int endian; > > char itemcode; > > size_t itemsize; > > Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */ > > Py_LONG_LONG offset; > > Py_LONG_LONG strides[40]; > > /* More Array Info goes here */ > > }; > > > > int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) { > > PyObject* shape; > > PyObject* offset; > > PyObject* strides; > > int ii, len; > > > > info->itemsize = too_long_for_this_example(obj); > > > > shape = PyObject_GetAttrString(obj, "__array_shape__"); > > if (!shape) return 0; > > len = PySequence_Size(shape); > > if (len < 0) return 0; > > if (len > 40) return 0; /* This needs work */ > > info->ndims = len; > > for (ii = 0; ii > PyObject* val = PySequence_GetItem(shape, ii); Like here > > info->shape[ii] = PyLong_AsLongLong(val); and here > > Py_DECREF(val); (if you don't check PySequence_GetItem -- not a good idea anyways -- this should be Py_XDECREF) [snip more code that needs checks :-)] > >I have no idea how expensive PyErr_Clear() is. We'd have to profile it to > >see for certain. If PyErr_Clear() is not expensive, then we could make a > >strong argument that *not* requiring the attributes will be more efficient. Not much; it's about three Py_XDECREF's. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Fri Apr 8 01:22:09 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 01:22:09 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? 
Message-ID: <20050408082147.GA16977@arbutus.physics.mcmaster.ca>

It seems that people are worried about speed of the attribute-based array interface when using small arrays in C. Here's an alternative: Define some attribute (for now, call it __array_c__), which returns a CObject whose value (which you get with PyCObject_GetVoidPtr) would be a pointer to a struct describing the array. It would look something like

typedef struct {
    int version;
    int nd;
    Py_LONG_LONG *shape;
    char typecode;
    Py_LONG_LONG *strides;
    Py_LONG_LONG offset;
    void *data;
} SimpleCArray;

(The order here follows that of the array interface spec; if somebody's got any comments on what mixing int's, Py_LONG_LONG, and char's in a struct does to the packing and potential alignment problems I'd like to know.)

version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types). Then:

- the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
- typecode follows that of the __array_typestr__ attribute
- shape and strides are pointers to arrays of at least nd elements.
- this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
- you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.

Creating one from a Numeric PyArrayObject would go like this:

PyObject *create_SimpleCArray(PyArrayObject *a)
{
    int i;
    SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
    ca->version = 0xDECAF;
    ca->nd = a->nd;
    ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->shape[i] = a->dimensions[i];
    }
    ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->strides[i] = a->strides[i];
    }
    ca->offset = 0;
    ca->data = a->data;

    Py_INCREF(a);
    PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
    return co;
}

where

void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
{
    PyMem_Free(ca->shape);
    PyMem_Free(ca->strides);
    PyMem_Free(ca);
    Py_DECREF(a);
}

Some points:

- you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
- I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the descr argument in CObjects can easily play that role.
- The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).

This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From curzio.basso at unibas.ch Fri Apr 8 06:30:05 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Fri Apr 8 06:30:05 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <1112896207.2437.34.camel@halloween.stsci.edu> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> Message-ID: <4256873B.2060501@unibas.ch> Todd Miller wrote: > astype() is used in a bunch of places, including the C-API, so it's > hard to guess how it's getting called with the information here. In ok, so probably C functions are somehow 'transparent' to the profiler which does not report them, but reports the python functions called by the C one... >>>>from yourmodule import newfoo # you redefined foo to accept N as a parameter >>>>import pdb >>>>pdb.run("newfoo(N=2)") > > (pdb) s # step along a little to get into newfoo() > ... step output > (pdb) import numarray.numarraycore as nc > (pdb) break nc.astype strange, what I get now is: > (Pdb) b nc.astype > *** The specified object 'nc.astype' is not a function > or was not found along sys.path. and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather than just the function) under ipython, starting it with > %run -d myprog.py maybe this could mess up things? curzio From jmiller at stsci.edu Fri Apr 8 06:45:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 06:45:13 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4256873B.2060501@unibas.ch> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> <4256873B.2060501@unibas.ch> Message-ID: <1112967803.5142.29.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 09:29, Curzio Basso wrote: > Todd Miller wrote: > > > astype() is used in a bunch of places, including the C-API, so it's > > hard to guess how it's getting called with the information here. In > > ok, so probably C functions are somehow 'transparent' to the profiler which does not report them, > but reports the python functions called by the C one... > > >>>>from yourmodule import newfoo # you redefined foo to accept N as a parameter > >>>>import pdb > >>>>pdb.run("newfoo(N=2)") > > > > (pdb) s # step along a little to get into newfoo() > > ... step output > > (pdb) import numarray.numarraycore as nc > > (pdb) break nc.astype > > strange, what I get now is: > > > (Pdb) b nc.astype > > *** The specified object 'nc.astype' is not a function > > or was not found along sys.path. > > and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather > than just the function) under ipython, starting it with > > > %run -d myprog.py > > maybe this could mess up things? No. I should have said "b nc.NumArray.astype". I just tried this out with an astype() callback from numarray.convolve's C-code and it worked OK for me. Regards, Todd From strawman at astraw.com Fri Apr 8 08:00:13 2005 From: strawman at astraw.com (Andrew Straw) Date: Fri Apr 8 08:00:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255F79D.4000501@ucsd.edu> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> <4255F79D.4000501@ucsd.edu> Message-ID: <42569C4D.2080904@astraw.com> Robert Kern wrote: > Scott Gilbert wrote: > >> --- Andrew Straw wrote: >> >>> Here's a bit of weirdness which has prevented me from using '<' or >>> '>' in the past with the struct module. 
I'm not guru enough to know >>> what's going on, but it has prevented me from being explicit rather >>> than >>> implicit. >>> >>> In [1]:import struct >>> >>> In [2]:from numarray.ieeespecial import nan >>> >>> In [3]:nan >>> Out[3]:nan >>> >>> In [4]:struct.pack('>> >> >> --------------------------------------------------------------------------- >> >> >>> exceptions.SystemError Traceback (most >>> recent call last) >>> >>> /home/astraw/ >>> >>> SystemError: frexp() result out of range >>> >>> In [5]:struct.pack('d',nan) >>> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff' >>> >> >> >> >> No clue why that is, but it certainly looks like a bug in the struct >> module. It shouldn't make any difference about whether or not the array >> protocol reports the endian though. It's using a different notation for >> typecodes. > > > This behavior is expplained by Tim Peters: > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > I feared it was something like that. (No platform independent way to represent special values like nan, inf, and so on.) So I think if we're going to require an encoding character such as '<' or '>' we should also include one that means native which CAN handle these special values... And document why it's needed and why it may get one into trouble. From jmiller at stsci.edu Fri Apr 8 10:14:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 10:14:04 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <1112980431.5142.116.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 04:21, David M. Cooke wrote: > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. I was a little worried too, but think the array protocol idea is a good one in any case. Thinking about this, I'm wondering if what we used to do in early numarray (0.2) wouldn't work here. Our "consumer interface" / helper function looked more like this: int getSimpleCArray(PyObject *o, SimpleCArray *info); It basically just fills in the caller's SimpleCArray struct using information from o and returns 0, or -1 with an exception set if there's some problem. In numarray's SimpleCArray struct, the shape and strides arrays were fully allocated (i.e. Py_LONG_LONG shape[MAXDIM];) so the struct could be placed in an auto variable with nothing to free() later. In this interface, there is no implied getattr at all, since the helper function getSimpleCArray() can be made as smart (i.e. given knowledge about specific types) as people are motivated to make it. So, for a Numeric array or a numarray or a Numeric3 array, getSimpleCArray would presumably just copy from struct to struct, but for other types, it might fall back on the many-getattr approach. Regards, Todd > Here's an alternative: Define some attribute (for now, call it > __array_c__), which returns a CObject whose value (which you get with > PyCObject_GetVoidPtr) would be a pointer to a struct describing the > array. 
It would look something like > > typedef struct { > int version; > int nd; > Py_LONG_LONG *shape; > char typecode; > Py_LONG_LONG *strides; > Py_LONG_LONG offset; > void *data; > } SimpleCArray; > > (The order here follows that of the array interface spec; if somebody's > got any comments on what mixing int's, Py_LONG_LONG, and char's in a > struct does to the packing and potential alignment problems I'd like to > know.) > > version is there as a sanity check: I'd say for this version it's > something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily > a check that you've got the right thing (since CObjects are > intrinsically opaque types). > > Then: > - the array object guarantees that the data, etc. remains alive, > probably by passing itself as the desc parameter to the CObject. > The array data would have to stay at the same location and the same > size while the reference is held. > > - typecode follows that of the __array_typestr__ attribute > > - shape and strides are pointers to arrays of at least nd elements. > > - this doesn't handle byteswapped as-is. Maybe a flags, or endian, > attribute could be added. > > - you can still have the full attribute-based array interface > (__array_strides__, etc.) to fall back on. If the typecode is 'V', > you'll have to look at __array_descr__. > > Creating one from a Numeric PyArrayObject would go like this: > > PyObject *create_SimpleCArray(PyArrayObject *a) > { > SimpleCArray *ca = PyMem_New(SimpleCArray, 1); > PyObject *co; > int i; > ca->version = 0xDECAF; > ca->nd = a->nd; > ca->shape = PyMem_New(Py_LONG_LONG, ca->nd); > for (i = 0; i < ca->nd; i++) { > ca->shape[i] = a->dimensions[i]; > } > ca->strides = PyMem_New(Py_LONG_LONG, ca->nd); > for (i = 0; i < ca->nd; i++) { > ca->strides[i] = a->strides[i]; > } > ca->offset = 0; > ca->data = a->data; /* the array's own buffer */ > > Py_INCREF(a); > co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray); > return co; > } > > where > void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a) > { > PyMem_Free(ca->shape); > PyMem_Free(ca->strides); > PyMem_Free(ca); > Py_DECREF(a); > } > > Some points: > - you have to keep the CObject around: destroying it will potentially > destroy the array you're looking at. > - I was thinking that maybe adding a PyObject *owner could make it > easier to keep track of the owner; I'm not sure, as the descr argument > in CObjects can easily play that role. > - The creator of the SimpleCArray is free to add elements to the end > (as long as they don't affect the padding/alignment of the previous > ones: haven't thought about this). You could put the real owner of the > array data there, for example (say, if it was wrapping a Blitz++ > array). Or have a small _strides[30] array at the end, and strides > would point to that (saving you a memory allocation). > > This simple C interface would, I think, alleviate many of the worries about > speed for small arrays, and even for large arrays. -- From xscottg at yahoo.com Fri Apr 8 11:06:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:06:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> Message-ID: <20050408180523.95022.qmail@web50207.mail.yahoo.com> --- Tim Hochberg wrote: > > The point about not passing around the tuples probably being faster is a > good one.
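[One way to answer David's packing question above empirically is to mirror the proposed struct with ctypes (a third-party package at the time of this thread, not yet in the standard library) and inspect the field offsets; a small probe, not part of any proposal:

    import ctypes

    class SimpleCArray(ctypes.Structure):
        # field order mirrors the proposed C struct above
        _fields_ = [("version",  ctypes.c_int),
                    ("nd",       ctypes.c_int),
                    ("shape",    ctypes.POINTER(ctypes.c_longlong)),
                    ("typecode", ctypes.c_char),
                    ("strides",  ctypes.POINTER(ctypes.c_longlong)),
                    ("offset",   ctypes.c_longlong),
                    ("data",     ctypes.c_void_p)]

    # any gap between one field's offset+size and the next field's
    # offset is padding inserted for alignment (e.g. after typecode)
    for name, _ in SimpleCArray._fields_:
        f = getattr(SimpleCArray, name)
        print name, f.offset, f.size
    print "total size:", ctypes.sizeof(SimpleCArray)
]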
Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work. > I hadn't seen this "fast" sequence stuff before. Thanks for the pointer. > > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None chooses whether to use the defaults or not. > How about saying that, for all the optional attributes, if they return None that's to be treated the same way as if they weren't present at all? In other words, they're still optional, but people in the know would know that returning None was probably faster... From xscottg at yahoo.com Fri Apr 8 11:14:27 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:14:27 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> Message-ID: <20050408181314.89274.qmail@web50205.mail.yahoo.com> --- "David M. Cooke" wrote: > > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > > of error checking (it can allegedly raise errors). I suppose that means > > one is supposed to call PyErr_Occurred after every call? That's sort > > of painful! > > Yes! Check all C API functions that may return errors! That includes > PySequence_GetItem() and PyLong_AsLongLong. > Sorry, I should have been clear that I was writing example code. I only put the error checking in where I thought it was demonstrating the point. I'd be surprised if it even compiled... Note that the additional error checking is required in the "success" path where the attributes are present. In other words, mandating the attributes be there when they aren't strictly required could make things slower... Cheers, -Scott From xscottg at yahoo.com Fri Apr 8 12:24:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 12:24:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: 6667 Message-ID: <20050408192312.91215.qmail@web50206.mail.yahoo.com> --- "David M. Cooke" wrote: > > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. > I'm really not worried about it... I just don't want "performance" to be used as an argument for given design decisions when the proposed change won't actually make things faster. > > Here's an alternative: Define some attribute (for now, call it > [snip] > This would definitely be faster. Faster yet would be doing a PyNumeric_Check (or PyNumarray_Check, or whatever they're called) and just casting the pointer to the underlying representation. If you must go fast, go as fast as possible... I'd rather we didn't add a lot of complexity to the array protocol to just go at a medium speed.
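[The None-means-absent convention is easy to picture from the consumer side; a minimal Python sketch, where the helper name and the default-strides logic are illustrative only, not part of any agreed spec:

    def get_array_info(obj):
        # required attributes of the proposed protocol
        shape = tuple(obj.__array_shape__)
        typestr = obj.__array_typestr__
        itemsize = int(typestr[2:])   # assumes a '<f8'-style typestr
        # optional attributes: a None return is treated exactly like
        # a missing attribute, so one getattr covers both cases
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:
            # default: C-contiguous layout
            strides = [itemsize] * len(shape)
            for i in range(len(shape) - 2, -1, -1):
                strides[i] = strides[i + 1] * shape[i + 1]
        offset = getattr(obj, '__array_offset__', None)
        if offset is None:
            offset = 0
        return shape, typestr, tuple(strides), offset
]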
Cheers, -Scott From oliphant at ee.byu.edu Fri Apr 8 13:55:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 8 13:55:27 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <4256EF45.6070004@ee.byu.edu> David M. Cooke wrote: >It seems that people are worried about speed of the attribute-based >array interface when using small arrays in C. > > I think what we are talking about here is an *array protocol* (i.e. like the buffer protocol and sequence protocol). So far we have just described the Python level interface. I would like to see an array protocol added (perhaps to the buffer protocol table). This could be done just as David describes --- we don't even need to use the C-pointer (just return a void *pointer which has a version as the first entry). I think this is how the C-level should be handled. Yes, it does not require changes to Python to implement the __array_c__ attribute. But, ultimately, it would be better if we used the C-level protocol concept that Python already uses for other objects. -Travis From perry at stsci.edu Fri Apr 8 14:05:05 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 8 14:05:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255B7D6.9000109@ee.byu.edu> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> <4255B7D6.9000109@ee.byu.edu> Message-ID: <819eb85df29878341dd00521bbba280d@stsci.edu> On Apr 7, 2005, at 6:44 PM, Travis Oliphant wrote: > > I can't think of anything you've missed. > > I'm very supportive of this, but I have to finish scipy.base first. > I think Perry is supportive as well. I know he's been playing > catch-up in the reading. I'm not sure of Todd's opinion. I suspect > he would welcome these changes to Python. > > My preference order is > > 1) the ndarray module and ndarray.h header with these interface > definitions and methods. 2) Add array interface attributes to array > module > 3) Flesh out locked buffer API > 4) Bytes object (with Pickling support) > 5) Fix current buffer object. > I agree as well (I think). Just to be sure I'll restate. These issues are all important, and the discussion has been very useful to flesh out the proposed array protocol. Nevertheless, I'd put the priority of getting these into Python, or accepted by the Python Dev community, lower than actually implementing Numeric3 (aka scipy.base) to the point that it is acceptable to both the Numeric and numarray communities. True, subsequent changes forced by the acceptance process may require reworking in scipy.base, but I put unification far ahead of getting these various components finished and into Python. I think that's what Travis is getting at too. I've been tied up in other things, but frankly, I haven't seen that much that I have objected to so far in the array protocol discussions to warrant comments from me. I think it has been pretty well done (and I'm about to leave town so I'm going to be out of touch for a week or so, at least mostly). Perry From xscottg at yahoo.com Fri Apr 8 14:43:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 14:43:02 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: 6667 Message-ID: <20050408214214.45907.qmail@web50206.mail.yahoo.com> --- Andrew Straw wrote: > > > > This behavior is explained by Tim Peters: > > > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > > > I feared it was something like that. (No platform independent way to > represent special values like nan, inf, and so on.) So I think if we're > going to require an encoding character such as '<' or '>' we should also > include one that means native which CAN handle these special values... > And document why it's needed and why it may get one into trouble. > The data is either big endian or little endian (or possibly a single byte in which case it doesn't matter). Whether or not the (hardware, operating system, C runtime library, C compiler, or Python implementation) can handle NaNs or Infs is not a property of the data. What does an additional code or two get you? Let's say we used ']' for big endian native, and '[' for little endian native? Does that just indicate the possible presence of NaNs for Infs in the data? Adding those codes doesn't have any affect on whether or not libraries can deal with them. I guess I'm not understanding something. Cheers, -Scott From cookedm at physics.mcmaster.ca Fri Apr 8 14:52:02 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 14:52:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <4256EF45.6070004@ee.byu.edu> (Travis Oliphant's message of "Fri, 08 Apr 2005 14:53:25 -0600") References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> <4256EF45.6070004@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: > >>It seems that people are worried about speed of the attribute-based >>array interface when using small arrays in C. >> >> > I think we are talking about here an *array protocol* (i.e. like the > buffer protocol and sequence > protocol). > > So far we have just described the Python level interface. I would > like to see an array protocol added (perhaps to the buffer protocol > table). This could be done just as David describes --- we don't even > need to use the C-pointer (just return a void *pointer which has a > version as the first entry). The purpose of the CObject was to make it possible to pass it through Python (through the attribute access). > I think this is how the C-level should be handled, I think. Yes, it > does not require changes to Python to implement the __array_c__ > attribute. But, ultimately, it would be better if we used the C-level > protocol concept that Python already uses for other objects. Ah, ok, so you'd have a slot in the type object (like the number, sequence, or buffer protocols), with the appropriate (C-level) functions. This would require it to be in the Python core, though, and would only work for a new version of Python. Alternatively, you have a special attribute/method that returns an object with the right C API -- much like CObjects are used for wrapping Numeric's C API. I would really like to see something working at the C level (so you're not passing dimensions back-and-forth as Python tuples with Python ints), but the Python-level array interface you've proposed will work for now. This should be revisited once people are using the new array interface, and we have an idea of how it's being used, and the performance costs. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From xscottg at yahoo.com Fri Apr 8 16:06:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 16:06:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408230455.35465.qmail@web50209.mail.yahoo.com> --- Scott Gilbert wrote: > > --- Andrew Straw wrote: > > > > I feared it was something like that. (No platform independent way to > > represent special values like nan, inf, and so on.) So I think if > > we're going to require an encoding character such as '<' or '>' we > > should also include one that means native which CAN handle these > > special values... And document why it's needed and why it may get one > > into trouble. > > > > Let's say we used ']' for big endian native, and '[' for little endian > native? Does that just indicate the possible presence of NaNs for Infs > in the data? > > Adding those codes doesn't have any affect on whether or not libraries > can deal with them. I guess I'm not understanding something. > I think I'm understanding my problem in understanding :-). There IS a platform independant way to represent NaNs and Infs. It's pretty clearly spelled out in IEEE-754: http://stevehollasch.com/cgindex/coding/ieeefloat.html I think something we've been assuming is that the array data is basically IEEE-754 compliant (maybe it needs to be byteswapped). If that's not true, then we're going to need some new typecodes. We're not supporting the ability to pass VAX floating point around (Are we????). The problem is that you can't make any safe assumptions about whether your current platform will deal with IEEE-754 data in any predictable way if it contains NaNs or Infs. So additional typecodes won't really solve anything. Tim Peter's explanation is a good representation of Python's official position regarding floating point issues, but a much simpler explanation is possible... The struct module in "standard mode" decodes the data one character at a time and builds a float from them. You can see this in the _PyFloat_Unpack8 function in the floatobject.c file. In other words, this routine probably works on a VAX too (taking a IEEE-754 double and building a VAX floating point as it goes). You can also see the comment in there that says it doesn't handle NaNs or Infs. I don't think we need another indicator for '>' big-endian or '<' for little-endian. Cheers, -Scott From konrad.hinsen at laposte.net Fri Apr 8 23:46:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Apr 8 23:46:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408230455.35465.qmail@web50209.mail.yahoo.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> Message-ID: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> On 09.04.2005, at 01:04, Scott Gilbert wrote: > I think something we've been assuming is that the array data is > basically > IEEE-754 compliant (maybe it needs to be byteswapped). If that's not > true, > then we're going to need some new typecodes. We're not supporting the > ability to pass VAX floating point around (Are we????). This discussion has been coming up regularly for a few years. Until now the concensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. Of course the computing world is changing, and IEEE format may well be ubiquitous by now. 
Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible to their ancestors. Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From xscottg at yahoo.com Sat Apr 9 09:36:05 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 9 09:36:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050409163525.93733.qmail@web50201.mail.yahoo.com> --- konrad.hinsen at laposte.net wrote: > > This discussion has been coming up regularly for a few years. Until now > the concensus has always been that Python should make no assumptions > that go beyond what a C compiler can promise. Which means no > assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be > ubiquitous by now. Vaxes must be in the museum by now. But how about > mainframes? IBM mainframes didn't use IEEE when I used them (last time > 15 years ago), and they are still around, possibly compatible to their > ancestors. > I've been following this mailing list for a few years now, but I skip a lot of threads. I almost certainly skipped this topic in the past since it wasn't relevant to me. I'm only interested in it now since it's relevant to this data interchange business, so I'm sorry if this is a rehash... Trying to stay portable is a good goal, and I can understand why Python proper would try to adhere to the restrictions it does. Despite the claim, Python makes plenty of assumptions that a standards conformant C compiler could break. If numpy doesn't make some assumptions about floating point representation, it's going to kill the possibity of passing data across machines, and that's pretty unacceptable. I'm not comfortable saying "ubiquitous" since I don't know what the mainframe or super computing community is making use of, and I don't know what sort of little machines Python is running on. The closest thing to a mainframe that I've ever used was a Convex, and I never knew what it's floating point representation was. However, I know that x86, PPC, AMD-64, IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use IEEE-754 format. That's probably 99.999% of all machines capable of running Python, and at least that percentage of users. It would be a shame to gum up this typecode thing for situations that don't occur in practice. If it has to be done, then I recommend we use the '@' code in place of the '<' or '>' for platforms that are out of the ordinary. It's important to specify that '@' is only to be used on floating point data that is not IEEE-754. In this case it doesn't mean "native" like it does in the struct module, it means "weird" :-). > > Another detail to consider is that although most machines use the IEEE > representation, hardly any respects the IEEE rules for floating point > operations in all detail. 
In particular, trusting that Inf and NaN will > be treated as IEEE postulates is a risky business. > See that's the thing. Why burden how you label the data with the restrictions of the current machine? You can take the data off the machine. Whether or not I can rely on what NaN*Inf will give me, I know that I can take NaN and Inf to another machine and get the same interpretation of the data. This whole thread started because Andrew Straw showed that struct.pack('<d', nan) blows up while the native struct.pack('d', nan) does not. From oliphant at ee.byu.edu Sat Apr 9 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 9 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> Message-ID: <425808B4.8070005@ee.byu.edu> konrad.hinsen at laposte.net wrote: > On 09.04.2005, at 01:04, Scott Gilbert wrote: >> I think something we've been assuming is that the array data is >> basically >> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >> not true, >> then we're going to need some new typecodes. We're not supporting the >> ability to pass VAX floating point around (Are we????). > No, in moving from the struct module's character codes we are trying to do something more platform independent because it is very likely that different platforms will want to exchange binary data. IEEE-754 is a great standard to build an interface around. Data sharing was the whole reason the standard emerged and a lot of companies got on board. > > This discussion has been coming up regularly for a few years. Until now the consensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be ubiquitous by now. Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible to their ancestors. I found the following piece, written about 6 years ago, interesting: http://www.research.ibm.com/journal/rd/435/schwarz.html Basically, it states that chips in newer IBM mainframes support the IEEE 754 standard. > > Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. But, this can be handled with platform-dependent C-code when and if problems arise. -Travis From strawman at astraw.com Sat Apr 9 12:36:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 9 12:36:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Here's an email Todd Miller sent me (I hoped he'd send it directly to the list, but I'll forward it. Todd, I hope you don't mind.) Todd Miller wrote: > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: >> Hi Todd, >> >> Could you join in on this thread? I think you wrote the ieeespecial >> stuff in numarray, so it's clear you have a much better understanding >> of >> the issues than I do... >> >> Cheers! >> Andrew > > My own understanding is limited, but I can say a few things that might > make the status of numarray clearer. My assumptions for numarray were > that: > > 1.
Floating point values are 32-bit or 64-bit entities which are stored > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > so I expect it simply won't work on a VAX. There's no checking for > this. > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > point. ieeespecial has been tested to work there. > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > unsigned > integers, and contiguous ranges on those integers are used to > represent > special values like NAN and INF. Platform byte ordering for the > IEEE-754 floating point numbers mirrors byte ordering for integers so > the ieeespecial NAN detection code works in a cross platform way *and* > values exported from one IEEE-754 platform will work with ieeespecial > when imported on another. It's important to note that special values > are not unique: there is no single NAN value; it's a bit range. > > 4. numarray leaks IEEE-754 special values out into Python floating > point > scalars. This may be bad form. I do this because (1) they repr > understandably if not in a platform independent way and (2) people need > to get at them. I noticed recently that ieeespecial.nan == > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > correct ones (False) for Python-2.4. I haven't looked at what the > array > version does yet: array(nan) == array(nan). The point to be taken > from > this is that the level at which numarray ieee special value handling > works or doesn't work is really restricted to (1) detecting certain > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > for array code (no guarantees) (3) the behavior of Python itself > (improving). > > In the context of the array protocol (looking very nice by the way) my > thinking is that non-IEEE-754 floating point could be described with > bit > fields and that the current type codes should mean IEEE-754. > > Some minor things I noticed in the array interface: > > 1. The packing order of bit fields is not clear. In C, my experience > is that some compilers pack bit structs towards the higher order bits > of > an integer, and some towards the lower. More info to clarify that > would be helpful. > > 2. I saw no mention that we're talking about a protocol. I'm sure > that's clear to everyone following this discussion closely, but I > didn't see it in the spec. It might make sense to allude to the C > helper functions and potential for additions to the Python type struct > even if they're not spelled out. > > Regards, > Todd On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > konrad.hinsen at laposte.net wrote: > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: >> >>> I think something we've been assuming is that the array data is >>> basically >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >>> not true, >>> then we're going to need some new typecodes. We're not supporting >>> the >>> ability to pass VAX floating point around (Are we????). >> > > No, in moving from the struct modules character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build an interface around. Data sharing was the > whole reason the standard emerged and a lot of companies got on board. > >> >> This discussion has been coming up regularly for a few years. 
Until >> now the concensus has always been that Python should make no >> assumptions that go beyond what a C compiler can promise. Which >> means no assumptions about floating-point representation. >> >> Of course the computing world is changing, and IEEE format may well >> be ubiquitous by now. Vaxes must be in the museum by now. But how >> about mainframes? IBM mainframes didn't use IEEE when I used them >> (last time 15 years ago), and they are still around, possibly >> compatible to their ancestors. > > I found the following piece, written about 6 years ago interesting: > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > Basically, it states that chips in newer IBM mainframes support the > IEEE 754 standard. > >> >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependendent C-code when and if > problems arise. > -Travis > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real > users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From jmiller at stsci.edu Sat Apr 9 16:18:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sat Apr 9 16:18:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Message-ID: <1113088643.5363.8.camel@jaytmiller.comcast.net> On Sat, 2005-04-09 at 12:35 -0700, Andrew Straw wrote: > Here's an email Todd Miller sent me (I hoped he'd send it directly to > the list, but I'll forward it. Todd, I hope you don't mind.) No, I don't mind. I intended to send it to the list but left in a rush this morning. Todd > > > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: > >> Hi Todd, > >> > >> Could you join in on this thread? I think you wrote the ieeespecial > >> stuff in numarray, so it's clear you have a much better understanding > >> of > >> the issues than I do... > >> > >> Cheers! > >> Andrew > > > > My own understanding is limited, but I can say a few things that might > > make the status of numarray clearer. My assumptions for numarray were > > that: > > > > 1. Floating point values are 32-bit or 64-bit entities which are stored > > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > > so I expect it simply won't work on a VAX. There's no checking for > > this. > > > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > > point. ieeespecial has been tested to work there. > > > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > > unsigned > > integers, and contiguous ranges on those integers are used to > > represent > > special values like NAN and INF. 
Platform byte ordering for the > > IEEE-754 floating point numbers mirrors byte ordering for integers so > > the ieeespecial NAN detection code works in a cross platform way *and* > > values exported from one IEEE-754 platform will work with ieeespecial > > when imported on another. It's important to note that special values > > are not unique: there is no single NAN value; it's a bit range. > > > > 4. numarray leaks IEEE-754 special values out into Python floating > > point > > scalars. This may be bad form. I do this because (1) they repr > > understandably if not in a platform independent way and (2) people need > > to get at them. I noticed recently that ieeespecial.nan == > > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > > correct ones (False) for Python-2.4. I haven't looked at what the > > array > > version does yet: array(nan) == array(nan). The point to be taken > > from > > this is that the level at which numarray ieee special value handling > > works or doesn't work is really restricted to (1) detecting certain > > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > > for array code (no guarantees) (3) the behavior of Python itself > > (improving). > > > > In the context of the array protocol (looking very nice by the way) my > > thinking is that non-IEEE-754 floating point could be described with > > bit > > fields and that the current type codes should mean IEEE-754. > > > > Some minor things I noticed in the array interface: > > > > 1. The packing order of bit fields is not clear. In C, my experience > > is that some compilers pack bit structs towards the higher order bits > > of > > an integer, and some towards the lower. More info to clarify that > > would be helpful. > > > > 2. I saw no mention that we're talking about a protocol. I'm sure > > that's clear to everyone following this discussion closely, but I > > didn't see it in the spec. It might make sense to allude to the C > > helper functions and potential for additions to the Python type struct > > even if they're not spelled out. > > > > Regards, > > Todd > > > On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > > > konrad.hinsen at laposte.net wrote: > > > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: > >> > >>> I think something we've been assuming is that the array data is > >>> basically > >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's > >>> not true, > >>> then we're going to need some new typecodes. We're not supporting > >>> the > >>> ability to pass VAX floating point around (Are we????). > >> > > > > No, in moving from the struct modules character codes we are trying to > > do something more platform independent because it is very likely that > > different platforms will want to exchange binary data. IEEE-754 is a > > great standard to build an interface around. Data sharing was the > > whole reason the standard emerged and a lot of companies got on board. > > > >> > >> This discussion has been coming up regularly for a few years. Until > >> now the concensus has always been that Python should make no > >> assumptions that go beyond what a C compiler can promise. Which > >> means no assumptions about floating-point representation. > >> > >> Of course the computing world is changing, and IEEE format may well > >> be ubiquitous by now. Vaxes must be in the museum by now. But how > >> about mainframes? 
IBM mainframes didn't use IEEE when I used them > >> (last time 15 years ago), and they are still around, possibly > >> compatible to their ancestors. > > > > I found the following piece, written about 6 years ago interesting: > > > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > > > Basically, it states that chips in newer IBM mainframes support the > > IEEE 754 standard. > > > >> > >> Another detail to consider is that although most machines use the > >> IEEE representation, hardly any respects the IEEE rules for floating > >> point operations in all detail. In particular, trusting that Inf and > >> NaN will be treated as IEEE postulates is a risky business. > > > > But, this can be handled with platform-dependendent C-code when and if > > problems arise. > > -Travis > > > > > > > > > > ------------------------------------------------------- > > SF email is sponsored by - The IT Product Guide > > Read honest & candid reviews on hundreds of IT Products from real > > users. > > Discover which products truly live up to the hype. Start reading now. > > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From tchur at optushome.com.au Sat Apr 9 17:25:43 2005 From: tchur at optushome.com.au (Tim Churches) Date: Sat Apr 9 17:25:43 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array Message-ID: <4258721E.1080905@optushome.com.au> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit Linux): >>> import Numeric as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 OK, it is an elementary mistake, but the silent overflow caught me unawares. casting the array to Float64 before summing it avoids the error, but in my instance the actual data is a rank-1 array of 21 million integers with a mean value of about 140 (which adds up more than sys.maxint), and casting to Float64 will use quite a lot of memory (as well as taking some time). Any advice for catching or avoiding such overflow without always incurring a performance and memory hit by always casting to Float64? Shouldn't add.reduce() be checking for overflow and raising an error? Then it would be easy to upcast only when overflow (or underflow) occurs, rather than always. 
Tim C From jmiller at stsci.edu Sun Apr 10 07:25:08 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sun Apr 10 07:25:08 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <4258721E.1080905@optushome.com.au> References: <4258721E.1080905@optushome.com.au> Message-ID: <1113143026.5359.35.camel@jaytmiller.comcast.net> On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit > Linux): > > >>> import Numeric as N > >>> a = N.array((2000000000,1000000000),typecode=N.Int32) > >>> N.add.reduce(a) > -1294967296 > > OK, it is an elementary mistake, but the silent overflow caught me > unawares. Casting the array to Float64 before summing it avoids the > error, but in my instance the actual data is a rank-1 array of 21 > million integers with a mean value of about 140 (which adds up to more than > sys.maxint), and casting to Float64 will use quite a lot of memory (as > well as taking some time). > > Any advice for catching or avoiding such overflow without always > incurring a performance and memory hit by always casting to Float64? Here's what numarray does: >>> import numarray as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 So basic reductions in numarray have the same "careful while you're shaving" behavior as Numeric; it's fast but easy to screw up. But: >>> a.sum() 3000000000L >>> a.sum(type='d') 3000000000.0 a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces along one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. Regards, Todd From pearu at cens.ioc.ee Mon Apr 11 00:59:14 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Mon Apr 11 00:59:14 2005 Subject: [Numpy-discussion] scipy.base Message-ID: Hi Travis, I have committed scipy.{distutils,base} to the Numeric3 CVS repository. scipy.distutils is a reviewed version of scipy_distutils, and as one of its new features there is a Configuration class that allows one to write much simpler setup.py files for subpackages. See the setup.py files under the Numeric3/scipy directory for examples. scipy.base is a very minimal copy of scipy_base plus the ndarray modules. When using setup_scipy.py for building, the ndarray package is installed as scipy.base and from scipy.base import * should work equivalently to from ndarray import * for instance. I have used information from Numeric3/setup.py to implement Numeric3/scipy/base/setup.py and it should be updated whenever Numeric3/setup.py is changed. However, I would recommend starting to use scipy.base instead of ndarray, as using both may cause unexpected behaviour when the installed ndarray is older than the scipy.base installation (see [*]). In the Numeric3 CVS repository that would mean replacing setup.py with setup_scipy.py, and any modification to the ndarray setup scripts should be done in scipy/base/setup.py. We can apply this step whenever you feel confident with the new setup.py files. Let me know if you have any troubles with them. To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, and CodeGenerators directories should be moved under the scipy/base directory.
However, this step can be omitted if you would prefer working with files at the top directory of Numeric3. Current setup.py scripts fully support this approach as well. There are also a few open issues and questions. First, how to name the Numeric3 project when it installs scipy.base, scipy.distutils, Numeric packages, etc? This name will be used when creating source distributions and also as part of the path where header files will be installed. At the moment setup_scipy.py uses the name 'ndarray'. And so `setup_scipy.py sdist`, for example, produces an ndarray-30.0.tar.gz file; `setup_scipy.py install` installs header files under the /include/ndarray/ directory. Though this is fine with me, I am not sure that this is an ideal situation. I think we should choose the name now and stick to it forever, especially since 3rd party extension modules need to know where to look for ndarray header files. This name cannot be 'numarray', obviously, but there are options like 'ndarray', 'numpy', and maybe others. In fact, 'Numeric' (with version 3x.x) would also be an option, but that would certainly cause some problems when one wants both Numeric 2x.x and Numeric 3x.x to be installed in the system; the header files would end up in the same directory, for instance. As a workaround, we could force installing Numeric3 header files to /include/Numeric/3/ or something. I actually like this idea but I wonder what others think about this. Second, is it already possible to use the ndarray C/API as a replacement for the Numeric C/API, i.e. would simple replacement of #include "Numeric/arrayobject.h" with #include "ndarray/arrayobject.h" work? And if not, will it ever be? This would be interesting to know as an extension writer. [*] Due to keeping changes to Numeric3 sources minimal, the scipy.base multiarray and umath modules first try to import ndarray and then scipy.base whenever ndarray is missing. One should remove the ndarray installation from the system before using scipy.base. Regards, Pearu From konrad.hinsen at laposte.net Mon Apr 11 02:30:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Apr 11 02:30:28 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: On Apr 9, 2005, at 18:54, Travis Oliphant wrote: > No, in moving from the struct module's character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build For data exchange between platforms, i.e. through files and network connections, XDR is arguably a better choice. It actually uses IEEE for floats, but XDR libraries provide conversion code for other platforms. It also takes care of byte ordering. > an interface around. Data sharing was the whole reason the standard > emerged and a lot of companies got on board. I think the main reason was standardization of precision, range, and operations, to make floating-point code more portable. This has had moderate success, as 100% IEEE platforms are rare if they exist at all. >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business.
> But, this can be handled with platform-dependent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on whatever proposal about IEEE before implementing it. Konrad. From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. Casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up to more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without always >>incurring a performance and memory hit by always casting to Float64? > > Here's what numarray does: > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > -1294967296 > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how is one supposed to be careful? It seems that for any array of two or more integers which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) an incorrect answer, and short of adding up the array by a safer but much slower method there is no way of determining whether the answer provided (quickly) by add.reduce is right or wrong. That seems to make it fast but useless, for integer arrays at least. Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > >>>>a.sum() > 3000000000L > >>>>a.sum(type='d') > 3000000000.0 > a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces along one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful.
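[For Numeric users in Tim's position, the blockwise-upcast idea Todd describes for numarray's sum() can be approximated in pure Python, trading one small temporary per block for the full Float64 copy; a rough sketch, where the function name and block size are made up for illustration:

    import Numeric as N

    def safe_sum(a, blocksize=65536):
        # upcast one block at a time, so only a single block is ever
        # copied to Float64; the running total is an ordinary Python number
        flat = N.ravel(a)          # note: copies if a is discontiguous
        total = 0
        for i in range(0, len(flat), blocksize):
            block = flat[i:i + blocksize].astype(N.Float64)
            total = total + N.add.reduce(block)
        return total

With the example that started the thread, safe_sum(N.array((2000000000, 1000000000), N.Int32)) gives 3000000000.0 instead of the wrapped-around result.]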
Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up Numeric3 CVS repository completely then Include, Src, Lib, >CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would be also an option but that >would be certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system, the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to /include/Numeric/3/ or >something. I acctually like this idea but I wonder what other think about >this. > > How about include/scipy? >Second, is it already possible to use ndarray C/API as a replacement of >Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take I think now uses a standardized PyArrayObject * as one of it's arguments instead of a PyObject *). This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. 
Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example:

    import ndarray

    class subclass(ndarray.ndarray):
        def __new__(cls, shape, *args, **kwds):
            # allocate the underlying array; 'V4' means 4-byte void elements
            self = ndarray.ndarray.__new__(cls, shape, 'V4')
            return self
        def __init__(self, shape, *args, **kwds):
            self.dict = kwds        # stash the keyword arguments as metadata
            return
        def _update_meta(self, obj):
            # called whenever a view of obj is created; carry the metadata over
            self.dict = obj.dict

Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, > >and CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'll move other directories as well. > >First, how to name the Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would also be an option, but that > >would certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system; the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to /include/Numeric/3/ or > >something. I actually like this idea but I wonder what others think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that the #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use the ndarray C/API as a replacement for > >the Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing.
A lot of the work I do deals with transforming images, either with affine transformations or vector fields. Numarray seems somewhat well equipped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transform and geometric_transform) computes input coordinates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unnecessarily. It might have a dramatic effect for the geometric transformation, which will most often be slower than the affine one. Is there any way around this, e.g. is it possible to tell numarray to use the same interpolation coefficients for the last "n" dimensions of the array, or to tell numarray to compute interpolation coefficients only once and apply those separately for each channel? thx for any help / info. alex. From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that a certain amount of work is duplicated if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but at least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > > I've been looking at numarray to do some image processing. A lot of > the work I do deals with transforming images, either with affine > transformations or vector fields. Numarray seems somewhat well equipped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transform and > geometric_transform) computes input coordinates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unnecessarily. It might have a > dramatic effect for the geometric transformation, which will most often > be slower than the affine one. Is there any way around this, e.g. is it > possible to tell numarray to use the same interpolation > coefficients for the last "n" dimensions of the array, or to tell > numarray to compute interpolation coefficients only once and apply those > separately for each channel? > > thx for any help / info. > > alex.
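[A sketch of what Peter suggests for the RGB case: compute the output-to-input coordinate array once, then hand the same coordinates to map_coordinates once per channel. The helper name is invented, the matrix/offset convention (output coordinates mapped to input coordinates) is an assumption, and Numeric-style helpers like indices/dot/reshape are assumed to be available in numarray:

    import numarray as N
    from numarray import nd_image

    def affine_rgb(image, matrix, offset, output_shape):
        # image: (rows, cols, 3); matrix: 2x2 array; offset: length-2 array
        rows, cols = output_shape
        grid = N.indices((rows, cols)).astype(N.Float64)  # (2, rows, cols)
        flat = N.reshape(grid, (2, rows * cols))
        coords = N.dot(matrix, flat) + N.reshape(offset, (2, 1))
        coords = N.reshape(coords, (2, rows, cols))
        # the coordinate computation above happens only once; just the
        # interpolation itself is repeated for each channel
        out = N.zeros((rows, cols, 3), N.Float64)
        for c in range(3):
            out[:, :, c] = nd_image.map_coordinates(image[:, :, c], coords)
        return out

The spline prefiltering inside map_coordinates is still done per channel, which matches Peter's caveat that some duplication of work remains.]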
Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS 1. Migration of NumArray.__del__ to C (tp_dealloc). Improves overall performance. 2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default. 3. Modifications to support scipy.special. 4. Removal of an unnecessary getattr() from the ufunc calling sequence. Improves ufunc performance. II. BUGS FIXED / CLOSED 1179355 average() broken in numarray 1.2.3 1167184 Floating point exception in numarray's dot() 1151892 Bug in matrixmultiply with zero size arrays 1160184 RecArray reversal 1156172 Incorect error message for shape incompatability 1155538 Incorrect error message when multiplying arrays See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. III. CAUTIONS This release should be backward binary compatible with numarray 1.1.1 and 1.2.3. WHERE ----------- Numarray-1.3.0 windows executable installers, source code, and manual are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID: A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html. Improvements since 0.70 -- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo -- QT backend Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend -- Unicode strings are rendered in the agg and postscript backends.
Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py -- map and projections A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap -- Auto-legends The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code. -- Quiver (direction fields) Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py -- Performance optimizations Substantial optimizations in line marker drawing in agg -- Robust log plots Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust -- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html http://matplotlib.sourceforge.net JDH From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com> Hi, I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves ? Alternatively, are we able to convert to and from Numeric (scipy) array's without a memcopy ? thankyou, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID: Dear all, I encountered the following puzzling behaviour of the modulo operator %: In [1]: import Numeric In [2]: print Numeric.__version__ 23.8 In [3]: x=Numeric.arange(10.0) In [4]: print x%4 [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [5]: print 3.0%4 3.0 In [6]: print (-x)%4 [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] # <====== In [7]: print (-3.0)%4 # vs. 1.0 # <====== (OK) In [8]: print Numeric.fmod(x,4) [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [9]: print Numeric.fmod(-x,4) [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation: "5.6. Binary arithmetic operations" """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand.""" I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars. 
I am aware that (according to Kernighan/Ritchie) the C standard does not define the result of % when either of the operands is negative. So can someone help me: is the different behaviour of % for scalars and arrays a bug, a feature, or what should I tell my students? ;-) Many thanks, Arnd P.S.: BTW: the documentation for fmod and remainder is pretty short on this: In [3]:fmod? Type: ufunc String Form: Namespace: Interactive Docstring: fmod(x,y) is remainder(x,y) In [4]:remainder? Type: ufunc String Form: Namespace: Interactive Docstring: returns remainder of division elementwise Are contributions of more detailed doc-strings welcome? P.P.S.: for numarray one gets even less information: In [1]: import numarray In [2]: numarray.fmod? Type: _BinaryUFunc Base Class: String Form: Namespace: Interactive Docstring: Class for ufuncs with 2 input and 1 output arguments In [3]: numarray.remainder? Type: _BinaryUFunc Base Class: String Form: Namespace: Interactive Docstring: Class for ufuncs with 2 input and 1 output arguments In [4]: print numarray.__version__ 1.1.1 P^3.S: scipy's mod seems to be an alternative: In [1]: import scipy In [2]: scipy.mod? Type: function Base Class: String Form: Namespace: Interactive File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py Definition: scipy.mod(x, y) Docstring: x - y*floor(x/y) For numeric arrays, x % y has the same sign as x while mod(x,y) has the same sign as y. In [3]: x=-scipy.arange(10) In [4]: x%4 Out[4]: array([ 0, -1, -2, -3, 0, -1, -2, -3, 0, -1]) In [5]: scipy.mod(x,4) Out[5]: array([ 0., 3., 2., 1., 0., 3., 2., 1., 0., 3.]) In [6]: scipy.mod?? Type: function Base Class: String Form: Namespace: Interactive File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py Definition: scipy.mod(x, y) Source:

def mod(x,y):
    """ x - y*floor(x/y)

        For numeric arrays, x % y has the same sign as x while
        mod(x,y) has the same sign as y.
    """
    return x - y*Numeric.floor(x*1.0/y)

From jmiller at stsci.edu Fri Apr 15 03:46:37 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 15 03:46:37 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <1113561843.5030.9.camel@jaytmiller.comcast.net> On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > Is this in the pipeline, No. Most of the add-on subpackages in numarray, with the exception of convolve, image, and nd_image, are ports from Numeric. > or do we go ahead and add the dpotrs based functionality ourselves ? > > Alternatively, are we able to > convert to and from Numeric (scipy) array's without a memcopy ? Unless Numeric has been adapted to support the new array interface, I think this (converting from numarray to Numeric) has still not been properly addressed. Regards, Todd From luszczek at cs.utk.edu Fri Apr 15 07:11:20 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 07:11:20 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <425FCAFC.3010603@cs.utk.edu> Hi all, the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I apologize if everybody knows that).
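For readers who want the flavor of it: dpotrs solves A x = b given a precomputed Cholesky factor of A. A rough, illustrative sketch of the same computation with numarray's existing pieces (assuming cholesky_decomposition() returns the lower-triangular factor L with A = L L^T, and leaning on the generic solver twice instead of a dedicated triangular solve, so it is a sketch rather than an efficient implementation):

import numarray as na
import numarray.linear_algebra as la

A = na.array([[4.0, 2.0],
              [2.0, 3.0]])        # symmetric positive definite
b = na.array([1.0, 2.0])
L = la.cholesky_decomposition(A)  # A = L * transpose(L)
y = la.solve_linear_equations(L, b)                # forward solve: L y = b
x = la.solve_linear_equations(na.transpose(L), y)  # back solve: L^T x = y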
I'm on the LAPACK team right now and we were wondering if we should provide bindings for Python. It is almost trivial to do with Pyrex. But Numeric and numarray already have some of this functionality in them. Also, I don't know about the popularity of PyLapack. So my question is whether there is a need for the specialized LAPACK routines. And if so, which API it should use (Numeric, numarray, Numeric3, scipy_core, standard array, minimum standard array implementation or array protocol meta info). Any comments are appreciated, Piotr Luszczek Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves ? Alternatively, are we able to > convert to and from Numeric (scipy) array's without a memcopy ? > > thankyou, > > Simon. From perry at stsci.edu Fri Apr 15 07:21:23 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 15 07:21:23 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: On Apr 15, 2005, at 10:09 AM, Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). > > Any comments are appreciated, > > Piotr Luszczek > If you don't need anything unusual, using the Numeric C-API should be safe. There is the intent to preserve backward compatibility for that in numarray and Numeric3 for the most part (numarray's ufunc api is different however, but it isn't clear you need to use that). Numeric3 and numarray will/do have other capabilities not part of the Numeric api, but again, I suspect that for a first version, one can probably avoid needing those. I'd also like to hear what Travis thinks about this. Perry Greenfield From pjssilva at ime.usp.br Fri Apr 15 08:00:44 2005 From: pjssilva at ime.usp.br (Paulo J. S. Silva) Date: Fri Apr 15 08:00:44 2005 Subject: [Numpy-discussion] Pycoin - Python interface to COIN/CLP Linear Programming solver Message-ID: <1113577115.9013.9.camel@localhost.localdomain> Hello, I am finally releasing the code I have to interface the COIN/CLP linear programming solver with Python/Numarray. You can download the code at: http://www.ime.usp.br/~pjssilva/pycoin/index.html On the page you can see sample client code. The interface is very simple, consisting mostly of swig interface files, but it is very useful to me. It can also be used as an example of how to interface C++ and Python/Numarray using swig. I plan to make this interface grow to something much better, with an interface to the full Clp, another to OsiClp (only this one is available right now) and maybe other COIN optimization libraries like IPOPT. Please, download, use, test, comment. Best, Paulo -- Paulo José da Silva e Silva Professor Assistente do Dep.
de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva Teoria é o que não entendemos o (Theory is something we don't) suficiente para chamar de prática. (understand well enough to call) (practice) From cookedm at physics.mcmaster.ca Fri Apr 15 10:48:55 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 15 10:48:55 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> (Piotr Luszczek's message of "Fri, 15 Apr 2005 10:09:00 -0400") References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: Piotr Luszczek writes: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array implementation > or array protocol meta info). You'll probably first want to look at scipy, which already wraps (all? most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It uses f2py to make the process much easier. Since you mention you're on the LAPACK team ... I've been working on redoing the f2c'd LAPACK wrappers in Numeric, updating them to the current version...except: what *is* the current version? The patches on netlib are 2-3 years old, and you have to grab them separately, file-by-file (can I say how insanely stupid that is?). Also ... they break: with some test cases (derived from ones posted to our bug tracker) some routines segfault. Is it the LAPACK 3e? If that's the case, we can't use it unless there are C versions (Numeric only requires Python and a C compiler; throwing a F90 compiler in there is *not* an option -- we don't even require a F77 compiler). I ended up using the source from Debian unstable from the lapack3 package, and those work fine. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From haase at msg.ucsf.edu Fri Apr 15 12:38:51 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Apr 15 12:38:51 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? Message-ID: <200504151235.48573.haase@msg.ucsf.edu> Hi, I'm using memmap to read my MRC-imagedata files. I just thought this might be a case of general interest - see below: >>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", cval=0.0, origin=0, output_type=None) Traceback (most recent call last): File "", line 1, in ? File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in boxcar_filter cval = cval, output_type = output_type) File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in boxcar_filter1d cval, origin, _ni_support._type_to_num[output_type]) TypeError: NA_IoArray: I/O numarray must be writable NumArrays.
>>> na.__version__ '1.2.3' >>> Thanks, Sebastian Haase From verveer at embl.de Fri Apr 15 12:55:33 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Apr 15 12:55:33 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? In-Reply-To: <200504151235.48573.haase@msg.ucsf.edu> References: <200504151235.48573.haase@msg.ucsf.edu> Message-ID: <9396f2dea14c14fb7a6bd04f6077c448@embl.de> You may have run in an older bug which I fixed. Please try upgrading to the new numarray 1.3 and see if the problem disappears. If not let me know. Note: the function you are using (boxcar_filter) has been renamed in 1.3 to uniform_filter (to be more in line with common image processing terminology.) Cheers, Peter On Apr 15, 2005, at 9:35 PM, Sebastian Haase wrote: > Hi, > I'm using memmap to read my MRC-imagedata files. > I just thought this might be a case of general interest - see below: > >>>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", > cval=0.0, origin=0, output_type=None) > Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in > boxcar_filter > cval = cval, output_type = output_type) > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in > boxcar_filter1d > cval, origin, _ni_support._type_to_num[output_type]) > TypeError: NA_IoArray: I/O numarray must be writable NumArrays. >>>> na.__version__ > '1.2.3' >>>> > > > Thanks, > Sebastian Haase From luszczek at cs.utk.edu Fri Apr 15 20:41:05 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 20:41:05 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <426088F5.90602@cs.utk.edu> David M. Cooke wrote: > Piotr Luszczek writes: > > >>Hi all, >> >>the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I >>apologize if every body knows that). >> >>I'm on the LAPACK team right now and we were wondering if we should >>provide bindings for Python. It is almost trivial to do with Pyrex. >>But Numeric and numarray already have some functionality in it. >>Also, I don't know about popularity of PyLapack. >> >>So my question is if there is a need for the specialized LAPACK >>routines. And if so, which API it should use (Numeric, numarray, >>Numeric3, scipy_core, standard array, minimum standard array implementation >>or array protocol meta info). > > > You'll probably first want to look at scipy, which already wraps (all? > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It seems to have almost all routines. > It uses f2py to make the process much easier. > > > Since you mention you're on the LAPACK team ... > > I've been working on redoing the f2c'd LAPACK wrappers in Numeric, > updating them to the current version...except: what *is* the current Current version is 3.0. > version? The patches on netlib are 2-3 years old, and you have to grab After funding ran out there were only volunteers left. It's hard to get free open-source developers these days. > them separately, file-by-file (can I say how insanely stupid that Frankly, I had the same comment when I first saw it. Hopefully, next update will straighten things out. > is?). Also ... they break: with some test cases (derived from ones > posted to our bug tracker) some routines segfault. Yes I know. We have postings about it on the mailing list almost weekly. > Is it the LAPACK 3e? 
> If that's the case, we can't use it unless there

LAPACK 3E is only somewhat related to LAPACK. But it's not the "current version". > are C versions (Numeric only requires Python and a C compiler; > throwing a F90 compiler in there is *not* an option -- we don't even > require a F77 compiler). We've been thinking about languages for a while. The CLAPACK user base is too strong to ignore. So we think of keeping F77 as the base language. Or maybe we should do f90toC. f2c and f2j are on Netlib already and f2py has some F90 support. > I ended up using the source from Debian unstable from the lapack3 > package, and those work fine. Again, it's hard to get grant money for support. Thanks for the comments. Piotr From pearu at cens.ioc.ee Fri Apr 15 23:09:01 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Fri Apr 15 23:09:01 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <426088F5.90602@cs.utk.edu> Message-ID: On Fri, 15 Apr 2005, Piotr Luszczek wrote: > > You'll probably first want to look at scipy, which already wraps (all? > > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) > > It seems to have almost all routines. You should look at the scipy.lib.lapack package, which has more wrappers than scipy.linalg and will be used in scipy.linalg in the future. scipy.lib.lapack certainly does not wrap all of LAPACK, but adding new wrappers is easy and is done on a demand basis. What's wrapped and what's not in scipy.lib.lapack is well documented in the headers of the .pyf.src files. My current plan is to add the CLAPACK sources to scipy.lib.lapack so that it can be included in the Numeric3 project, because that has a requirement that everything should compile with only a C compiler available. > We've been thinking about languages for a while. The CLAPACK user base > is too strong to ignore. So we think of keeping F77 as the base language. > Or maybe we should do f90toC. f2c and f2j are on Netlib already and > f2py has some F90 support. f2py will have limited support for F90 derived types as soon as I get a chance to review Jeffrey Hagelberg's patches on this. However, keeping F77 as the base language is a good idea, imho; free F90 compilers are still rare these days. Pearu From florian.proff.schulze at gmx.net Sat Apr 16 03:25:37 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Sat Apr 16 03:25:37 2005 Subject: [Numpy-discussion] bytes object info Message-ID: Hi! I just discovered this: http://members.dsl-only.net/~daniels/Block.html I didn't try it out, but maybe it's helpful to you. Regards, Florian Schulze From cjw at sympatico.ca Sat Apr 16 11:29:01 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Apr 16 11:29:01 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <426158FD.8060507@sympatico.ca> Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html Ugh! Letter codes to identify data types - I thought that we had moved beyond that. ;-) Colin W. > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze > > From oliphant at ee.byu.edu Sat Apr 16 21:16:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 16 21:16:07 2005 Subject: [Numpy-discussion] numarray cholesky solver ?
In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <4261E2A5.1060109@ee.byu.edu> Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some of this functionality in them. > Also, I don't know about the popularity of PyLapack. Scipy already has extensive bindings for LAPACK. There is even a lot of development that has been done for c-compiled bindings. Right now, scipy_core is being developed to be a single replacement for Numeric/numarray. Lapack bindings are a huge part of that effort. But, as I said, the work has been done (using f2py). The biggest issue is supporting f2c'd versions of Lapack so that folks without Fortran compilers can still install it. scipy_core will allow this. Again, most of the effort is accomplished through f2py and scipy_distutils, which are really good tools. Pyrex is nice, but f2py is really, really nice (it even supports wrapping basic c-code). > > So my question is whether there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). I think if LAPACK were going to go through the trouble, it would be best for LAPACK to provide "array protocol" style wrappers. That way any Python array user could take advantage of them. While current scipy users and future scipy_core users do not need LAPACK-provided Python wrappers, we would welcome any native support by the LAPACK team. Again, though, I think this should be done through the array_protocol API. A C-API is likely in the near future as well (which will provide a little speed-up for many small arrays). -Travis From simon at arrowtheory.com Sun Apr 17 20:44:16 2005 From: simon at arrowtheory.com (Simon Burton) Date: Sun Apr 17 20:44:16 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <1113561843.5030.9.camel@jaytmiller.comcast.net> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> Message-ID: <20050418134337.1b3f8ae8.simon@arrowtheory.com> On Fri, 15 Apr 2005 06:44:02 -0400 Todd Miller wrote: > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > Hi, > > > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > > Is this in the pipeline, > > No. Most of the add-on subpackages in numarray, with the exception of > convolve, image, and nd_image, are ports from Numeric. > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this that would be much appreciated. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Mon Apr 18 00:30:10 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 18 00:30:10 2005 Subject: [Numpy-discussion] scipy.base - % and fmod segfault Message-ID: Hi (in particular Travis), concerning my recent question on % and fmod for Numeric and numarray, I was curious to see how scipy.base behaves.
With a CVS check-out this morning I get: In [1]: from scipy.base import * In [2]: x=arange(10) In [3]: print x%4 array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1], 'l') In [4]: print (-x)%4 zsh: 12391 segmentation fault ipython (The same holds for fmod, and also for x=arange(10.0).) Personally I would prefer if in the end % behaves the same way for arrays as for scalars. Do you think that this is possible with scipy.base? Best, Arnd P.S.: I haven't tested much more of scipy.base this time (but the few things concerning array operations I looked at seem fine. Ah, there is one: doing import scipy.base scipy.base.fmod? in ipython gives a segmentation fault (the same with .sin, .exp etc. ...) ) From jmiller at stsci.edu Mon Apr 18 06:38:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 18 06:38:21 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050418134337.1b3f8ae8.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> <20050418134337.1b3f8ae8.simon@arrowtheory.com> Message-ID: <1113831328.29165.30.camel@halloween.stsci.edu> On Sun, 2005-04-17 at 23:43, Simon Burton wrote: > On Fri, 15 Apr 2005 06:44:02 -0400 > Todd Miller wrote: > > > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > > Hi, > > > > > > I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. > > > Is this in the pipeline, > > > > No. Most of the add-on subpackages in numarray, with the exception of > > convolve, image, and nd_image, are ports from Numeric. > > > > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this > that would be much appreciated. If you're doing a port of something that already works for Numeric, chances are good that numarray's Numeric compatibility API will make things "just work." In any case, be sure to use the compatibility API since it's the easiest path forward to Numeric3 should that effort prove successful (which I think it will). Usually what's involved in porting from Numeric to numarray is just making sure that the numarray header files can be used rather than the Numeric header files. I think the style we used for matplotlib, while not fully general, is the simplest and best compromise:

#ifdef NUMARRAY
#include "numarray/arrayobject.h"
#else
#include "Numeric/arrayobject.h"
#endif

In setup.py, you have to pass extra_compile_args=["-DNUMARRAY=1"] or similar to the Extension() constructions to build for numarray. There are more details we could discuss if you want to build for both Numeric and numarray simultaneously. Two limitations of the numarray Numeric compatible C-API are: (1) a partially compatible array descriptor structure (PyArray_Descr) and (2) the UFunc C-API. Generally, neither of those is an issue, but for large projects (e.g. scipy) they matter. Good luck porting. Feel free to ask questions either on the list or privately if you run into trouble. Regards, Todd From haase at msg.ucsf.edu Mon Apr 18 09:16:15 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Mon Apr 18 09:16:15 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <200504180914.33383.haase@msg.ucsf.edu> Hey, this _really_ is no SPAM ... ;-) (Maybe different wording next time) Thanks, Sebastian Haase On Saturday 16 April 2005 03:22, Florian Schulze wrote: > Hi!
> > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze From oliphant at ee.byu.edu Mon Apr 18 17:09:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 18 17:09:49 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42644B7C.9030907@ee.byu.edu> I am going to release Numeric 24.0 today or tomorrow unless I hear from anybody about some changes that need to get made. -Travis From faltet at carabos.com Tue Apr 19 03:05:27 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 19 03:05:27 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42644B7C.9030907@ee.byu.edu> References: <42644B7C.9030907@ee.byu.edu> Message-ID: <200504191202.52097.faltet@carabos.com> Hi, I was curious about the newly introduced array protocol in Numeric 24.0 (as well as in current numarray CVS), and wanted to check if there is better speed during Numeric <-> numarray object conversion. The answer is "partially" affirmative: >>> import numarray >>> import Numeric >>> print numarray.__version__ 1.4.0 >>> print Numeric.__version__ 24.0 >>> from time import time >>> a = numarray.arange(100*1000) >>> t1=time();b=Numeric.array(a);time()-t1 # numarray --> Numeric 0.0021419525146484375 # It was 1.58109998703 with Numeric 23.8 ! So, numarray --> Numeric speed has been improved quite a lot. The other way round, Numeric to numarray, is not as efficient: >>> Na = Numeric.arange(100*1000) >>> t1=time();c=numarray.array(Na);time()-t1 # Numeric --> numarray 0.15217900276184082 # It is much slower than numarray --> Numeric I guess that the Numeric --> numarray conversion can be sped up because: >>> t1=time();Nb=numarray.array(buffer(Na),typecode=Na.typecode(),shape=Na.shape);time()-t1 0.00017499923706054688 # Numeric --> numarray using the buffer protocol So, I guess CVS numarray is still refining the array protocol. But the thing that mostly shocks me is that the array protocol still allows conversions with memory copies because, as you can see in the last example, which uses the buffer protocol, a non-copy memory conversion is indeed possible for Numeric --> numarray. So the question is: would the array protocol bring numarray <-> Numeric <-> Numeric3 conversions without memory copies, or is this more a wish on my part than an actual possibility? Thanks, and keep up the nice work! -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From eric at enthought.com Tue Apr 19 22:48:17 2005 From: eric at enthought.com (eric jones) Date: Tue Apr 19 22:48:17 2005 Subject: [Numpy-discussion] job openings at Enthought Message-ID: <4265ECEF.6050004@enthought.com> Hey group, We have a number of scientific/python related jobs open. If you have any interest, please see: http://www.enthought.com/careers.htm thanks, eric From cjw at sympatico.ca Wed Apr 20 00:45:21 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Wed Apr 20 00:45:21 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler Message-ID: <42660855.4090600@sympatico.ca> I have tried: python setup.py install build_ext --compiler=bcpp It seems that the distutils call uses scipy.distutils, rather than the standard, and that the scipy version is based on an older version of distutils. Is there some way to work around this? Colin W. From pearu at cens.ioc.ee Wed Apr 20 12:00:34 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 20 12:00:34 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler In-Reply-To: <42660855.4090600@sympatico.ca> Message-ID: On Wed, 20 Apr 2005, Colin J. Williams wrote: > I have tried: > > python setup.py install build_ext --compiler=bcpp > > It seems that the distutils call uses scipy.distutils, rather than the > standard, and that the scipy version is based on an older version of > distutils. > > Is there some way to work around this? So, what problems exactly do you experience with the above command? Using scipy.distutils should not be much different compared to std distutils when building std extension modules. Pearu From oliphant at ee.byu.edu Wed Apr 20 12:05:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 20 12:05:30 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <4266A7AD.5090600@ee.byu.edu> I've released Numeric 24.0 as a beta (2nd version) release. Right now it's just a tar file. Please find any bugs. I'll wait a week or two and release a final version unless I hear reports of problems. Thanks to those who have found bugs already. David Cooke has been especially active in helping fix problems. Many kudos to him. -Travis From jmiller at stsci.edu Thu Apr 21 08:12:30 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 21 08:12:30 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.1 Message-ID: <1114096238.4446.18.camel@jaytmiller.comcast.net> Release Notes for numarray-1.3.1 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS None. 1.3.1 fixes the problem with gcc-3.4.3 II. BUGS FIXED / CLOSED 1152323 /usr/include/fenv.h:96: error: conflicting types for 'fegete 1185024 numarray-1.2.3 fails to compile with gcc-3.4.3 1187162 Numarray 1.3.0 installation failure See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. From oliphant at ee.byu.edu Fri Apr 22 03:51:14 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 22 03:51:14 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: <4266A7AD.5090600@ee.byu.edu> Message-ID: <4268D6BD.9000100@ee.byu.edu> Alexander Schmolck wrote: >Travis Oliphant writes: > > > >>I've released Numeric 24.0 as a beta (2nd version) release. Right now it's >>just a tar file. >> >>Please find any bugs. I'll wait a week or two and release a final version >>unless I hear reports of problems. >> >> > > >I suspect some other problems I haven't tried to track down yet are due to >this: > > >>> a = num.array([[1],[2],[3]]) > >>> ~(a==a) > array([[-2], > [-2], > [-2]]) > > What is wrong with this? ~ is bit-wise not and gives the correct answer here.
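To spell out why -2 is the bitwise-correct answer (a worked example, not from the original mail; standard two's-complement arithmetic assumed): ~x equals -x - 1, so the comparison a == a yields an array of ones and ~ turns each 1 into -2. The elementwise logical negation is spelled logical_not:

>>> import Numeric as num
>>> a = num.array([[1],[2],[3]])
>>> ~(a == a)                # bitwise: ~1 == -(1)-1 == -2
array([[-2],
       [-2],
       [-2]])
>>> num.logical_not(a == a)  # logical, not bitwise, negation
array([[0],
       [0],
       [0]])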
> >Object array comparisons still produce haphazard behaviour: > > >>> a = num.array(["ab", "cd", "efg"], 'O') > >>> a == 'ab' > 0 > > You are mixing Object arrays and character arrays here and expecting too much. String arrays in Numeric and their relationship with object arrays have never been too useful. You need to be explicit about how 'ab' is going to be interpreted and do a == array('ab','O') to get what you were probably expecting. >Finally -- not necessarily a bug, but a change of behaviour that seems undocumented (I'm >pretty sure this used to give a float array as return value): > > >>> num.zeros((2.0,)) > *** TypeError: an integer is required > > > >'as > > I don't think this worked as you think it did (I looked at Numeric 21.3). num.zeros(2.0) works but it shouldn't. This is a bug that I'll fix. Shapes should be integers, not floats. If this was not checked before then that was a bug. It looks like it's always been checked differently for single-element tuples and scalars. So, in short, I see only one small bug here. Thanks for testing things out. -Travis From stephen.walton at csun.edu Mon Apr 25 11:50:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 25 11:50:28 2005 Subject: [Numpy-discussion] Value selections? Message-ID: <426D3BA8.6020500@csun.edu> I'm trying out Numeric 24b2. In numarray, the following code will plot the values of an array which are not equal to 'flag': f = array!=flag plot(array[f]) What is the equivalent in Numeric 24b2? From rkern at ucsd.edu Mon Apr 25 11:59:03 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Apr 25 11:59:03 2005 Subject: [Numpy-discussion] Value selections? In-Reply-To: <426D3BA8.6020500@csun.edu> References: <426D3BA8.6020500@csun.edu> Message-ID: <426D3D4C.5070302@ucsd.edu> Stephen Walton wrote: > I'm trying out Numeric 24b2. In numarray, the following code will plot > the values of an array which are not equal to 'flag': > > f = array!=flag > plot(array[f]) > > What is the equivalent in Numeric 24b2? compress(f, array) is the lowest common denominator. I'm not sure if Numeric 24 gets fancier like numarray. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
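A concrete rendering of the compress() idiom Robert describes, sketched with made-up data (the flag value and array below are illustrative, not from the thread):

import Numeric

flag = -999.0
data = Numeric.array([1.0, flag, 2.0, 3.0, flag])
mask = Numeric.not_equal(data, flag)   # 1 where the element is valid, else 0
good = Numeric.compress(mask, data)    # array([ 1., 2., 3.])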
From jswhit at fastmail.fm Tue Apr 26 07:58:36 2005 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Tue Apr 26 07:58:36 2005 Subject: [Numpy-discussion] numarray problems on AIX Message-ID: <426E5637.1080305@fastmail.fm> Hi: I'm having problems with numarray 1.3.1/Python 2.4.1 on AIX 5.2: Python 2.4.1 (#3, Apr 26 2005, 10:34:56) [C] on aix5 Type "help", "copyright", "credits" or "license" for more information. >>> import numarray Traceback (most recent call last): File "", line 1, in ? File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/__init__.py", line 42, in ? from numarrayall import * File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarrayall.py", line 2, in ? from generic import * File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/generic.py", line 1116, in ? import numarraycore as _nc File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarraycore.py", line 1751, in ? import ufunc File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/ufunc.py", line 13, in ? import _converter ImportError: dynamic module does not define init function (init_converter) it works with AIX 4 - anyone seen this before? -Jeff -- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-124 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg From faltet at carabos.com Tue Apr 26 10:45:02 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 26 10:45:02 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms Message-ID: <200504261942.46011.faltet@carabos.com> Hi, I'm having problems converting numarray objects into Numeric on 64-bit platforms, and I think this is numarray's fault, but I'm not completely sure. The problem can be easily visualized in an example (I'm using numarray 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3],'i') # The conversion has finished correctly On 64-bit platforms (AMD64, Linux): >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) Traceback (most recent call last): File "", line 1, in ? TypeError: typecode argument must be a valid type. The problem is that, for 32-bit platforms, na.typecode() == 'i' as it should be, but for 64-bit platforms na.typecode() == 'N', which is not a valid type in Numeric. I guess that na.typecode() should be mapped to 'l' on 64-bit platforms so that Numeric can recognize the Int64 correctly. Any suggestion? -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From jmiller at stsci.edu Tue Apr 26 13:57:14 2005 From: jmiller at stsci.edu (Todd Miller) Date: Tue Apr 26 13:57:14 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504261942.46011.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> Message-ID: <1114548937.24120.97.camel@halloween.stsci.edu> On Tue, 2005-04-26 at 13:42, Francesc Altet wrote: > Hi, > > I'm having problems converting numarray objects into Numeric on 64-bit > platforms, and I think this is numarray's fault, but I'm not completely > sure.
> > The problem can be easily visualized in an example (I'm using numarray > 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') # The conversion has finished correctly > > On 64-bit platforms (AMD64, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: typecode argument must be a valid type. > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > valid type in Numeric. I guess that na.typecode() should be mapped to > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > correctly. > > Any suggestion? I agree that since the typecode() method exists for backward compatibility, returning 'N' rather than 'l' on an LP64 platform can be considered a bug. However, there are two problems I see: 1. Returning 'l' doesn't handle the case of converting a numarray Int64 array on a 32-bit platform. AFAIK, there is no typecode that will work for that case. So, we're only getting a partial solution. 2. numarray uses typecodes internally to encode type signatures. There, platform-independent typecodes are useful and making this change will add confusion. I think we may be butting up against the absolute/relative type definition problem. Comments? Todd From faltet at carabos.com Wed Apr 27 05:40:35 2005 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 27 05:40:35 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <1114548937.24120.97.camel@halloween.stsci.edu> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> Message-ID: <200504271432.46852.faltet@carabos.com> A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > valid type in Numeric. I guess that na.typecode() should be mapped to > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > correctly. > > I agree that since the typecode() method exists for backward > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > considered a bug. However, there are two problems I see: > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > array on a 32-bit platform. AFAIK, there is no typecode that will work > for that case. So, we're only getting a partial solution. One can always do a separate case for 64-bit platforms. This solution is already used in Lib/numerictypes.py. > 2. numarray uses typecodes internally to encode type signatures. There, > platform-independent typecodes are useful and making this change will > add confusion. Well, this is the root of the problem for 'l' (long int) types: their meaning depends on the platform. Anyway, I've tried the following patch, and everything seems to work well (i.e. it does what is intended):

--------------------------------------------------------------
--- Lib/numerictypes.py Wed Apr 27 07:13:08 2005
+++ Lib/numerictypes.py.modif Wed Apr 27 07:21:48 2005
@@ -389,7 +389,11 @@
 # at code generation / installation time.
 from codegenerator.ufunccode import typecode
 for tname, tcode in typecode.items():
-    typecode[ eval(tname)] = tcode
+    if tname == "Int64" and numinclude.LP64:
+        typecode[ eval(tname)] = 'l'
+    else:
+        typecode[ eval(tname)] = tcode
+
 if numinclude.hasUInt64:
     _MaximumType = {
---------------------------------------------------------------

With that, we have on 64-bit platforms: >>> import Numeric >>> Num=Numeric.array((3,),typecode='l') >>> import numarray >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3]) >>> Numeric.array(na,typecode=na.typecode()).typecode() 'l' and on 32-bit: >>> Num=Numeric.array((3,),typecode='l') >>> na=numarray.array(Num,typecode=Num.typecode()) >>> Numeric.array(na,typecode=na.typecode()) array([3],'i') >>> Numeric.array(na,typecode=na.typecode()).typecode() 'i' Which should be the correct behaviour. > I think we may be butting up against the absolute/relative type > definition problem. Comments? That may add some confusion, but if we want to be consistent with the 'l' (long int) meaning for different platforms, I think the suggested patch (or another, more elegant one) is the way to go, IMHO. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From jmiller at stsci.edu Wed Apr 27 08:36:09 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Apr 27 08:36:09 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504271432.46852.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> <200504271432.46852.faltet@carabos.com> Message-ID: <1114615773.28309.95.camel@halloween.stsci.edu> On Wed, 2005-04-27 at 08:32, Francesc Altet wrote: > A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > > valid type in Numeric. I guess that na.typecode() should be mapped to > > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > > correctly. > > > > I agree that since the typecode() method exists for backward > > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > > considered a bug. However, there are two problems I see: > > > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > > array on a 32-bit platform. AFAIK, there is no typecode that will work > > for that case. So, we're only getting a partial solution. > > One can always do a separate case for 64-bit platforms. This solution > is already used in Lib/numerictypes.py. True. I'm just pointing out that doing this is still "half broken". On the other hand, it is also "half fixed". > if numinclude.hasUInt64: > _MaximumType = { > --------------------------------------------------------------- > > With that, we have on 64-bit platforms: > > >>> import Numeric > >>> Num=Numeric.array((3,),typecode='l') > >>> import numarray > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3]) > >>> Numeric.array(na,typecode=na.typecode()).typecode() > 'l' > > and on 32-bit: > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') > >>> Numeric.array(na,typecode=na.typecode()).typecode() > 'i' > > Which should be the correct behaviour.
My point was that if you have a numarray Int64 array, there's nothing in 32-bit Numeric to convert it to. Round tripping from Numeric-to-numarray works, but not from numarray-to-Numeric. In this case, I think "half-fixed" still has some merit; I just wanted it to be clear what we're not doing. > > I think we may be butting up against the absolute/relative type > > definition problem. Comments? > > That may add some confusion, but if we want to be consistent with the > 'l' (long int) meaning for different platforms, I think the suggested > patch (or another, more elegant one) is the way to go, IMHO. I logged this on Source Forge and will get something in for numarray-1.4 so that the typecode() method gives a workable answer on LP64. Interested parties should stick to using the typecode() method rather than any of numarray's typecode-related mappings. Cheers, Todd From simon at arrowtheory.com Thu Apr 28 17:38:08 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 28 17:38:08 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX Message-ID: <20050429103116.092907a7.simon@arrowtheory.com> Hi, I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink), who has managed to bomb on this little code example: >>> import numarray as na >>> import numarray.random_array as ra >>> a = ra.random(shape=(257,256)) >>> b = ra.random(shape=(1,256)) >>> na.innerproduct(a, b) He gets a blas error: ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0 Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From rkern at ucsd.edu Thu Apr 28 18:05:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:05:30 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <20050429103116.092907a7.simon@arrowtheory.com> References: <20050429103116.092907a7.simon@arrowtheory.com> Message-ID: <42718719.1010206@ucsd.edu> Simon Burton wrote: > Hi, > > I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink), > who has managed to bomb on this little code example: > > >>>>import numarray as na >>>>import numarray.random_array as ra >>>>a = ra.random(shape=(257,256)) >>>>b = ra.random(shape=(1,256)) >>>>na.innerproduct(a, b) > > > He gets a blas error: > > ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect > Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0 On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed Python with vecLib as the BLAS, I don't get an error. I don't get a result that's sensible to me, either; I get a (257,1)-shape array with only the first and last entries non-zero. Your colleague might want to reconsider whether he wants innerproduct() or dot(), with the appropriate change of shape for b. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die."
-- Richard Harter From rkern at ucsd.edu Thu Apr 28 18:09:53 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:09:53 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <42718719.1010206@ucsd.edu> References: <20050429103116.092907a7.simon@arrowtheory.com> <42718719.1010206@ucsd.edu> Message-ID: <427188D1.201@ucsd.edu> Robert Kern wrote: > Simon Burton wrote: > >> Hi, >> >> I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from >> fink) >> who has managed to bomb on this little code example: >> >> >>>>> import numarray as na >>>>> import numarray.random_array as ra >>>>> a = ra.random(shape=(257,256)) >>>>> b = ra.random(shape=(1,256)) >>>>> na.innerproduct(a, b) >> >> >> >> He gets a blas error: >> >> ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine >> cblas_dgemm was incorrect >> Mac OS BLAS parameter error in cblas_dgemm, parameter #0, >> (unavailable), is 0 > > > On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed > Python with vecLib as the BLAS, I don't get an error. > > I don't get a result that's sensible to me, either; I get a > (257,1)-shape array with only the first and last entries non-zero. Oh yes, and apparently a segfault on exit, too. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From edcjones at comcast.net Fri Apr 29 11:26:05 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:26:05 2005 Subject: [Numpy-discussion] numarray: problem with numarray.records Message-ID: <42727B35.9050401@comcast.net>

#! /usr/bin/env python
import numarray, numarray.strings, numarray.records

doubles = numarray.array([1.0], 'Float64')
strings = numarray.strings.array('abcdefgh', itemsize=8,
                                 kind=numarray.strings.RawCharArray)
print numarray.records.array(buffer=[strings, strings])
print
print numarray.records.array(buffer=[doubles, doubles])
print
print numarray.records.array(buffer=[strings, doubles])

"""
The output is:

RecArray[ ('abcdefgh'), ('abcdefgh') ]

RecArray[ (1.0, 1.0) ]

Traceback (most recent call last):
  File "./mess.py", line 12, in ?
    print numarray.records.array(buffer=[strings, doubles])
  File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 397, in array
    byteorder=byteorder, aligned=aligned)
  File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 106, in fromrecords
    raise ValueError, "inconsistent data at row %d,field %d" % (row, col)
ValueError: inconsistent data at row 1,field 0

The numarray docs (11.2) say: The first argument, buffer, may be any one
of the following: ... (5) a list of numarrays. There must be one such
numarray for each field.

What is going on here?
"""

From edcjones at comcast.net Fri Apr 29 11:32:07 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:32:07 2005 Subject: [Numpy-discussion] numarray: lexicographical sort Message-ID: <42727D37.8070700@comcast.net> Suppose arr is a two dimensional numarray. Can the following be done entirely within numarray?

alist = arr.tolist()
alist.sort()
arr = numarray.array(alist, arr.type())

From jmiller at stsci.edu Fri Apr 29 12:42:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 29 12:42:22 2005 Subject: [Numpy-discussion] numarray: lexicographical sort In-Reply-To: <42727D37.8070700@comcast.net> References: <42727D37.8070700@comcast.net> Message-ID: <1114803546.21036.30.camel@halloween.stsci.edu> On Fri, 2005-04-29 at 14:30, Edward C.
Jones wrote: > Suppose arr is a two dimensional numarray. Can the following be done > entirely within numarray? > > alist = arr.tolist() > alist.sort() > arr = numarray.array(alist, arr.type()) > I'm pretty sure the answer is no. The comparisons in numarray's sort() functions are all single element numerical comparisons. The list sort() is using a polymorphic comparison which in this case is the comparison of two lists. There's nothing like that in numarray so I don't think it's possible. Todd
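In practice, then, the workaround is the round trip through Python lists that the question itself uses, since list.sort() compares whole rows lexicographically. Packaged as a helper it might look like this (a sketch only; sortrows is not a numarray function):

import numarray

def sortrows(arr):
    # Sort the rows of a 2-D numarray lexicographically by going
    # through a Python list, whose sort() compares row against row.
    alist = arr.tolist()
    alist.sort()
    return numarray.array(alist, arr.type())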
From faltet at carabos.com Fri Apr 1 02:01:11 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:01:11 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <20050401041204.18335.qmail@web50208.mail.yahoo.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> Message-ID: <200504011146.44549.faltet@carabos.com> I'm very much with the opinions of Scott. Just some remarks. A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: > > __array_names__ (optional comma-separated names for record fields) > > I really like this idea. Although I agree with David M. Cooke that it > should be a tuple of names. Unless there is a use case I'm not > considering, it would be preferrable if the names were restricted to valid > Python identifiers. Ok. I was thinking on easing the life of C extension writers, but I agree that a tuple of names should be relatively easily dealed in C as well. However, as the __array_typestr__ would be a plain string, then an __array_names__ being a plain string would be consistent with that. Also, it would be worth to know how to express a record of different shaped fields. I mean, how to represent a record like: [array(Int32,shape=(2,3)), array(Float64,shape=(3,))] The possibilities are: __array_shapes__ = ((2,3),(3,)) __array_typestr__ = (i,d) Other possibility could be an extension of the current struct format: __array_typestr__ = "(2,3)i(3,)d" more on that later on. > The struct module has a portable set of typecodes. They call it > "standard", but it's the same thing. The struct module let's you specify > either standard or native.
For instance, the typecode for "standard long" > ("=l") is always 4 bytes while a "native long" ("@l") is likely to be 4 or > 8 bytes depending on the platform. The __array_typestr__ codes should > require the "standard" sizes. There is a table at the bottom of the > documentation that goes into detail: > > http://docs.python.org/lib/module-struct.html I fully agree with Scott here. Struct typecodes offer a way to approach the Python standards, and this is a good thing for many developers who know nothing of array packages and their different typecodes. IMO, the portable set of typecodes in the struct module should only be abandoned if they cannot fulfil all the requirements of Numeric3/numarray. But I'm pretty confident that they eventually will. > The only problem with the struct module is that it's missing a few types... > (long double, PyObject, unicode, bit). Well, bit is not used either in Numeric/numarray and I think few people would complain about this (they can always pack bits into bytes). PyObject and unicode can be reduced to a sequence of bytes, and some other metadata can be added to the array protocol to complement its meaning (say, __array_str_encoding__ = "UTF-8" or similar). long double is the only type that should be added to struct typecodes, but convincing the Python crew to do that should not be difficult, I guess. > > I also think that rather than attach < or > to the start of the > > string it would be easier to have another protocol for endianness. > > Perhaps something like: > > > > __array_endian__ (optional Python integer with the value 1 in it). > > If it is not 1, then a byteswap must be necessary. > > A limitation of this approach is that it can't adequately represent > struct/record arrays where some fields are big endian and others are little > endian. Having a mix of different endianness data values in the same data record would be a bit ill-advised. In fact, numarray does not support this: a recarray should be all little or big endian. I think that '<' and '>' would be more than enough to represent this. > > Bool -- "b%d" % sizeof(bool) > > Signed Integer -- "i%d" % sizeof() > > Unsigned Integer -- "u%d" % sizeof() > > Float -- "f%d" % sizeof() > > Complex -- "c%d" % sizeof() > > Object -- "O%d" % sizeof(PyObject *) > > --- this would only be useful on shared memory > > String -- "S%d" % itemsize > > Unicode -- "U%d" % itemsize > > Void -- "V%d" % itemsize > > The above is a nice start at reinventing the struct module typecodes. If > you and Perry agree to it, that would be great. A few additions though: Again, I think it would be better to not get away from the struct typecodes. But if you end up doing it, well, I would like to propose a couple of additions to the new protocol: 1.- Support shapes for record specification. I'm listing two possibilities: A) __array_typestr__ = "(2,3)i(3,)d" This would be an easy extension of the struct string type definition. B) __array_typestr__ = ("i4","f8") __array_shapes__ = ((2,3),(3,)) This is more 'à la numarray'. 2.- Allow nested datatypes. Although numarray does not support this yet, I think it could be very advantageous to be able to express: [array(Int32,shape=(5,)),[array(Int16,shape=(2,)),array(Float32,shape=(3,4))]] i.e., the first field would be an array of ints with 5 elements, while the second field would actually be another record made of 2 fields: one array of short ints, and another array of single precision floats.
I'm not sure how exactly to implement this, but, what about: A) __array_typestr__ = "(5,)i[(2,)h(3,4)f]" B) __array_typestr__ = ("i4",("i2","f8")) __array_shapes__ = ((5,),((2,),(3,4))) Because I'm suggesting we adhere to the struct specification, I prefer option A), although I guess option B would be easier to use for developers (even for extension developers). > > So, what if we proposed for the Python core not something like > > Numeric3 (which would still exist in scipy.base and be everybody's > > favorite array :-) ), but a very minimal array object (scaled back > > even from Numeric) that followed the array protocol and had some > > C-API associated with it. > > > > This minimal array object would support 5 basic types ('bool', > > 'integer', 'float', 'complex', 'Object'). (Maybe a void type > > could be defined and a void "scalar" introduced (which would be > > the bytes object)). These types correspond to scalars already > > available in Python and so the whole 0-dim array Python scalar > > arguments could be ignored. > > I really like this idea. It could easily be implemented in C or Python > script. Since half it's purpose is for documentation, the Python script > implementation might make more sense. Yeah, I fully agree with this also. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" From faltet at carabos.com Fri Apr 1 02:17:36 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Apr 1 02:17:36 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <200504011215.52914.faltet@carabos.com> A Divendres 01 Abril 2005 11:31, Travis Oliphant va escriure: > * I'm wondering about including multiple types in the typestr. On the > one hand we could describe complicated structures by packing all the > information into the typestr. On the other hand, it may be better if > we just use 'V8' to describe an 8-byte memory buffer with an additional > attribute that contains both the names and the typestr: > > __array_recinfo__ = (('real','f4'),('imag','f4')) > > or for a "rational type" > > __array_recinfo__ = (('numer','i4'),('denom','i4')) > > so that the detail of the typecode for a "record" type is handled by > another special method using tuples. On this level, we could add the > possibility of specifying a shape for a small array inside (just like > the record array of numarray does). Like: __array_recinfo__ = (('numer','i4', (3,4)),('denom','i4', (2,))) ? Also, this can be easily extended to nested types: __array_recinfo__ = (('a','i4',(3,4)),(('b','i4',(2,)),('c','f4',(10,2)))) Well, this looks pretty good to me. It has nothing to do with struct format, but is much more usable, of course. Cheers, -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" From cjw at sympatico.ca Fri Apr 1 04:57:57 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 04:57:57 2005 Subject: [Numpy-discussion] Re: Bytes Object and Metadata In-Reply-To: References: <20050328182929.50411.qmail@web50205.mail.yahoo.com> <42489A65.2030201@ee.byu.edu> <200503301240.55483.faltet@carabos.com> Message-ID: <424D4504.4030606@sympatico.ca> David M. Cooke wrote: >Francesc Altet writes: > > > >>A Dimarts 29 Març 2005 01:59, Travis Oliphant va escriure: >> >> >>>My proposal: >>> >>>__array_data__ (optional object that exposes the PyBuffer protocol or a >>>sequence object, if not present, the object itself is used).
>>>__array_shape__ (required tuple of int/longs that gives the shape of the >>>array) >>>__array_strides__ (optional provides how to step through the memory in >>>bytes (or bits if a bit-array), default is C-contiguous) >>>__array_typestr__ (optional struct-like string showing the type --- >>>optional endianness indicater + Numeric3 typechars, default is 'V') >>>__array_itemsize__ (required if above is 'S', 'U', or 'V') >>>__array_offset__ (optional offset to start of buffer, defaults to 0) >>> >>> >>Considering that heterogenous data is to be suported as well, and >>there is some tradition of assigning names to the different fields, I >>wonder if it would not be good to add something like: >> >>__array_names__ (optional comma-separated names for record fields) >> >> > >A sequence (list or tuple) of strings would be preferable. That >removes all worrying about using commas in the names. > > > As I understand it, record arrays can be heterogeneous. If so, wouldn't it make sense for this to be a sequence of tuples? For example: [('Name', charStringType), ('Age', _nt.Int8), ...] Where _nt is defined by something like: import numarray.numerictypes as _nt Colin W. From cjw at sympatico.ca Fri Apr 1 05:49:53 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 05:49:53 2005 Subject: [Numpy-discussion] __array_typestr__ In-Reply-To: <424D14E9.70607@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> Message-ID: <424D5136.8060703@sympatico.ca> Travis Oliphant wrote: > > For the most part, it seems the array protocol is easy to agree on. > The one difficulty is typestr. > > For what it's worth, here are my opinions on what has been said > regarding the typestr. > > * Endian-ness should be included in the typestr --- it is how the data > is viewed and an intrinsic part of the type as much as int, or float. In most cases, endian-ness is associated with the machine being used, rather than the data element. It seems to me that numarray's numeric types provide a good model, which may need enhancing for records, strings etc. numarray has: Numeric type objects: Bool Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64 Complex32 Complex64 Numeric type classes: NumericType BooleanType SignedType UnsignedType IntegralType SignedIntegralType UnsignedIntegralType FloatingType ComplexType > > * I like the fact that struct character codes are documented, but it > is hard to remember. This is the problem. numerictypes provides mnemonic names and, if one uses an editor with autocompletion, a prompt from the editor. For those interfacing with existing code, there could be a helper function: def toType(eltType='i'): # => an instance of NumericType It should also be possible to derive the typeCode from the eltType; numarray doesn't seem to provide this. Colin W. From cjw at sympatico.ca Fri Apr 1 06:07:38 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Apr 1 06:07:38 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424C8D05.7030006@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> Message-ID: <424D5557.5010806@sympatico.ca> Travis Oliphant wrote: > > To all interested in the future of arrays... > > I'm still very committed to Numeric3 as I want to bring the numarray > and Numeric people together behind a single array object for > scientific computing. > Good. > But, I've been thinking about the array protocol and thinking that it > would be a good thing if this became universal.
One of the ways to > make it universal is by having something that follows it in the Python > core. > > > So, what if we proposed for the Python core not something like > Numeric3 (which would still exist in scipy.base and be everybody's > favorite array :-) ), but a very minimal array object (scaled back > even from Numeric) that followed the array protocol and had some C-API > associated with it. > I thought that your original Numeric3 proposal was in this direction - a simple multidimensional array class/type which could eventually replace Python's array module. In addition, and separately, there were to be a collection of ufuncs. Later, discussion seemed to drift from the basic Numeric3 towards SciPy. > > This minimal array object would support 5 basic types ('bool', > 'integer', 'float', 'complex', 'Object'). (Maybe a void type could > be defined and a void "scalar" introduced (which would be the bytes > object)). These types correspond to scalars already available in > Python and so the whole 0-dim array Python scalar arguments could be > ignored. Could this be subclassed so that provision could be made for Int8 (or even Int1)? How would an array of records be handled? > > Math could be done without ufuncs initially (people really needing > speed would use scipy.base anyway). But, more people in the Python > community would be able to use arrays and get used to them. And we > would have a reference array_protocol object so that extension writers > could write to it. It would be good if the user could write his/her ufunc in Python. > > > I would not try a project like this until after scipy_core is out, but > it's an interesting thing to think about. I mainly wanted feedback on > the basic concept. > The concept looks good. Regarding timing, it seems better to build the foundation before building the house. Colin W. > > An alternative would be to "add" multidimensionality to the array > object already part of Python, fix it's reallocating with an exposed > buffer problem, and add the array protocol. > > > > -Travis From oliphant at ee.byu.edu Fri Apr 1 12:10:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:10:00 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef050401104875650ddd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> Message-ID: <424DAA16.10007@ee.byu.edu> >>I'm still very committed to Numeric3 as I want to bring the numarray and >>Numeric people together behind a single array object for scientific >>computing. >> >> Notice that regardless of what I said about what goes into standard Python, something like Numeric3 will always exist for use by scientific users. It may just be a useful add on package like Numeric has always been. There is no way I'm going to abandon use of a more capable Numeric. >Right. I believe that, among all libraries related with numeric array, >eventually only one library in the Python core will survive no matter >how much advanced functions are available, because of the strong >compatibility with other packages. > > I don't think this is true. Things will survive based on utility. What we are trying to do with the Python core is define a standard protocol that is flexible enough to handle anybody's concept of an advanced array (in particular the advanced array that will be in scipy.base). >Totally agree. I doubt that Guido will accept a large and complex >library into the standard Python core. 
I think Numeric is already too >complex, and numarray is far more complex to be a standard lib in the >Python core. Numeric3 must shift its focus from better Numeric to >scale-downed Numeric. > > I disagree about "shifting focus." Personally, I'm not going to work on something like that until we have a single array package that fulfills the needs of all Numeric and most numarray users. I'm just pointing out that what goes in to the Python core should probably be a scaled down object with a souped-up "protocol" so that the array object in scipy.base can be used through the array protocol by any other package without worrying about having scipy_core at compile time. >For example, how many Python users care about masked arrays? How many >Python users want the advanced type from the Python core? I think the >advanced array type should in some extension lib, not in core array >lib. > Perhaps you do see my point of view. Not all Python users care about an advanced array object but nearly all technical (scientific and engineering users) will. We just need interoperability. >If we make clear our target ? becoming a standard library in the >Python core, we may have no problem in determining what functions >should be in the core array lib and what functions should be in >extension libraries using the core array type. > > >Today, the array type in the Python core is almost useless. >If Numeric3 offers just much faster performance on numeric types, many >Python users will start to use new array type in their applications. >Once it happens, we can create a bunch of extension libraries for more >advanced operations on the new array type. > > The "bunch of extension libraries" is already happening and is already in progress. I think we've overshot the mark for the Python core, however. No need to wait "til something happens" >With all my heart I hope that Numeric3 gears to this direction before > > >we get the tragedy to have Numeric4, Numeric5, and so on. > > I'm coming to see that what is most important for the Python core is "protocols". Then, there can be a "million" different array types that can all share each other's memory without hassle and much overhead. I'm still personally interested in a better Numeric, however, and so won't be abandoning the concept of Numeric3 (notice I now call it scipy.base --- not a change of focus just a change of name). I just wanted to encourage some discussion on the array protocol. -Travis From oliphant at ee.byu.edu Fri Apr 1 12:23:19 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 12:23:19 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424D5557.5010806@sympatico.ca> References: <424C8D05.7030006@ee.byu.edu> <424D5557.5010806@sympatico.ca> Message-ID: <424DAD00.1050203@ee.byu.edu> > I thought that your original Numeric3 proposal was in this direction - > a simple multidimensional array class/type which could > eventually replace Python's array module. In addition, and > separately, there were to be a collection of ufuncs. No, that's a misunderstanding. Original Numeric3 was never about "simplyifying." Because, we can't "simplify" and still support the uses that Numeric and numarray have enjoyed. I'm more interested in using something like Numeric and will always install it should it exist. I was iunterested in getting it into the Python core for standardization. I now believe that "universal" standardization should occur around a "protocol" and perhaps a simple implementation. 
I'm still interested in a more "local standardization" for numarray and Numeric users (not all Python users) which is the focus of scipy.base (used to call it Numeric3). In the process we are generating good ideas that can be used for "global standardization" among all Python users. But, I can't do it all. I have to keep focused on what I'm doing with the current Numeric arrayobject (and that has never been about "getting rid of functionality"). > > Later, discussion seemed to drift from the basic Numeric3 towards SciPy. The context of the problem as I see it intimately involves scipy and the collection of packages surrounding numarray. The small community we have built up was diverging in the creation of external packages. This is what troubled me most deeply. So, there is no Numeric3 separate from the larger issue of "a collection of standard scientific packages" that scipy has tried to be. That is why reference to scipy is made. I see no "drifting" occurring. There is a separate issue of a good array module for Python. I now see the solution there as being more of a "good array protocol" for Python with a default very simple implementation that is improved by extension modules. > > Could this be subclassed so that provision could be made for Int8 (or > even Int1)? I suppose, but this is kind of missing the point, because Numeric3 will support those types. If you need a more advanced array you install scipy.base. > > How would an array of records be handled? By installing a more advanced array. > The concept looks good. Regarding timing, it seems better to build > the foundation before building the house. The problem with your analogy is that the "sprawling mansion in the suburbs" is already built (Numeric has been around for a long time). The question is what kind of housing to build for the city dwellers and what kind of transportation system do we establish so people can move back and forth easily. -Travis From sdhyok at gmail.com Fri Apr 1 12:59:07 2005 From: sdhyok at gmail.com (Daehyok Shin) Date: Fri Apr 1 12:59:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <371840ef05040112574b6a86bd@mail.gmail.com> On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: snip > I disagree about "shifting focus." Personally, I'm not going to work on > something like that until we have a single array package that fulfills > the needs of all Numeric and most numarray users. I'm just pointing > out that what goes into the Python core should probably be a scaled > down object with a souped-up "protocol" so that the array object in > scipy.base can be used through the array protocol by any other package > without worrying about having scipy_core at compile time. Would you tell me what exactly you mean by "protocol"? Do you mean a standard definition of a series of "interfaces" for array type in Python?
-- Daehyok Shin Geography Department University of North Carolina-Chapel Hill USA From oliphant at ee.byu.edu Fri Apr 1 15:14:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 1 15:14:07 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <371840ef05040112574b6a86bd@mail.gmail.com> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> <371840ef05040112574b6a86bd@mail.gmail.com> Message-ID: <424DD56E.6070801@ee.byu.edu> Daehyok Shin wrote: >On Apr 1, 2005 8:07 PM, Travis Oliphant wrote: > >snip > > > >>I disagree about "shifting focus." Personally, I'm not going to work on >>something like that until we have a single array package that fulfills >>the needs of all Numeric and most numarray users. I'm just pointing >>out that what goes into the Python core should probably be a scaled >>down object with a souped-up "protocol" so that the array object in >>scipy.base can be used through the array protocol by any other package >>without worrying about having scipy_core at compile time. >> >> > >Would you tell me what exactly you mean by "protocol"? >Do you mean a standard definition of a series of "interfaces" for array >type in Python? > > Yes, pretty much. I would even go so far as to say a set of hooks in the typeobject (like the sequence, mapping, and buffer protocols). -Travis From steve at shrogers.com Sat Apr 2 06:50:58 2005 From: steve at shrogers.com (Steven H. Rogers) Date: Sat Apr 2 06:50:58 2005 Subject: [Numpy-discussion] Thoughts on getting "something" in the Python core In-Reply-To: <424DAA16.10007@ee.byu.edu> References: <424C8D05.7030006@ee.byu.edu> <371840ef050401104875650ddd@mail.gmail.com> <424DAA16.10007@ee.byu.edu> Message-ID: <424EB08F.90909@shrogers.com> First, thanks for doing this Travis. Travis Oliphant wrote: > > I'm coming to see that what is most important for the Python core is > "protocols". Then, there can be a "million" different array types that > can all share each other's memory without hassle and much overhead. > I'm still personally interested in a better Numeric, however, and so > won't be abandoning the concept of Numeric3 (notice I now call it > scipy.base --- not a change of focus just a change of name). I just > wanted to encourage some discussion on the array protocol. > Your array protocol idea sounds good. It should not only make it easier to interoperate with other Python packages, but also with foreign systems like APL/J, Matlab, and LabVIEW. Regards, Steve -- Steven H. Rogers, Ph.D., steve at shrogers.com Weblog: http://shrogers.com/weblog "Reach low orbit and you're half way to anywhere in the Solar System." -- Robert A. Heinlein From oliphant at ee.byu.edu Sat Apr 2 21:30:03 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 2 21:30:03 2005 Subject: [Numpy-discussion] scipy.base (Numeric3) now has math Message-ID: <424F7F06.4090200@ee.byu.edu> I've updated scipy.base (Numeric3) so math is now supported (it uses the old ufunc apparatus with support for the newly added types). There is still some work to be done so this is still very alpha (but at least math operations work): - update the ufunc apparatus to use buffers to avoid copying an entire array just for type casting (and to support unaligned and non-byteswapped arrays; a rough sketch of the buffering idea follows this list) - update the way error handling is done. - update the coercion strategy like numarray does - fix all the bugs.
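The first bullet is the key performance point: instead of casting a whole operand up front (a full temporary copy), the ufunc machinery can cast one fixed-size chunk at a time. A rough sketch of that idea in pure Python (the real work happens in C; add_with_buffering and BUFSIZE are made up for illustration):

import numarray

BUFSIZE = 8192   # elements per buffer, not bytes

def add_with_buffering(x, y, result_type):
    # Walk both 1-D operands in small chunks, casting each chunk as we
    # go rather than materializing full casted copies of x and y.
    n = len(x)
    result = numarray.zeros(n, result_type)
    start = 0
    while start < n:
        stop = min(start + BUFSIZE, n)
        result[start:stop] = x[start:stop].astype(result_type) + \
                             y[start:stop].astype(result_type)
        start = stop
    return result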
I've also fixed things so Numeric extension modules should compile --- Please report warnings and bugs with this as well. Thanks for all your help, -Travis From oliphant at ee.byu.edu Sun Apr 3 01:06:16 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:06:16 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <200504011215.52914.faltet@carabos.com> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> Message-ID: <424FB19B.4060800@ee.byu.edu> Hello all, I've updated the numeric web site and given special prominence to the array interface which I believe should be pushed. Numeric 24.0 will support it as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release. Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further. Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation. -Travis From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 01:24:07 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 01:24:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB19B.4060800@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> Message-ID: <424FB72F.4020201@ims.u-tokyo.ac.jp> There are two questions that I have about the array interface: 1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference? 2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface? --Michiel. Travis Oliphant wrote: > > Hello all, > > I've updated the numeric web site and given special prominence to the > array interface which I believe should be pushed. Numeric 24.0 will > support it as will scipy.base (Numeric3). I hope that numarray will > also support it in an upcoming release. > > Please read through the interface and feel free to comment. However, > unless there is a glaring problem, I'm more interested that you feel > free to start using the interface than that we debate it further. > > Scott has expressed interest in implementing a very basic Python-only > implementation of an object exporting the interface. I suggest he and > anyone else interested look at numarray for a starting point for a > Python implementation, and Numeric for a C implementation. > > -Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Sun Apr 3 01:41:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 01:41:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB72F.4020201@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> Message-ID: <424FB9FA.1090109@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > There are two questions that I have about the array interface: > > 1) To what degree will the new array interface look different to users > of the existing Numerical Python? If I were to install the new array > interface on the computer of a current Numerical Python user and I > didn't tell them, would they notice a difference? Nothing will look different. For now there is nothing to "install" so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0 (if a user were to call array() on an object that supported the array interface, then Numeric could return an array without copying data). Older versions of Numeric won't benefit from the interface but won't be harmed either. > 2) To what degree is the new array interface compatible with Numerical > Python for the purpose of C extension modules? Do C extension modules > need to be modified in order to use the new array interface? It is completely compatible. C-extensions don't need to be modified at all to make use of the interface (of course they should be re-compiled if using Numeric 24.0). Only two things will be modified in Numeric 24.0. 1) PyArray_FromObject and friends will be expanded so that if an object exposes the array interface the right thing will be done to use its memory. 2) Attributes will be added so that Numeric arrays expose the array interface so other objects can use their memory intelligently. -Travis From cjw at sympatico.ca Sun Apr 3 05:23:12 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 3 05:23:12 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem Message-ID: <424FE002.6010800@sympatico.ca> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install running install running build running config error: The .NET Framework SDK needs to be installed before building extensions for Python. Is there any chance that a Windows binary could be made available for testing? Colin W. From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:35:05 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:35:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE002.6010800@sympatico.ca> References: <424FE002.6010800@sympatico.ca> Message-ID: <424FE3D8.7040200@ims.u-tokyo.ac.jp> You can use Cygwin's MinGW compiler by adding --compiler=mingw after the setup command. --Michiel. Colin J.
Williams wrote: > C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install > running install > running build > running config > error: The .NET Framework SDK needs to be installed before building > extensions for Python. > > Is there any chance that a Windows binary could be made available for > testing? > > Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:46:04 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sun Apr 3 05:46:04 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <424FE3D8.7040200@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE3D8.7040200@ims.u-tokyo.ac.jp> Message-ID: <424FE64F.7030706@ims.u-tokyo.ac.jp> Sorry, that should be --compiler=mingw32. Michiel Jan Laurens de Hoon wrote: > You can use Cygwin's MinGW compiler by adding --compiler=mingw after the > setup command. > > --Michiel. > > Colin J. Williams wrote: >> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install >> running install >> running build >> running config >> error: The .NET Framework SDK needs to be installed before building >> extensions for Python. >> >> Is there any chance that a Windows binary could be made available for >> testing? >> >> Colin W. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From gruben at bigpond.net.au Sun Apr 3 06:32:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun Apr 3 06:32:09 2005 Subject: [Numpy-discussion] array slicing question Message-ID: <424FF03A.4060107@bigpond.net.au> This may be relevant to Numeric 3, but is possibly just a general question about array slicing which will either reveal a deficiency in specifying slices or in my knowledge of slicing with numpy. A while ago I was trying to reimplement some Matlab image processing code in Numeric and revealed a deficiency in the way slices are defined. Suppose I have an n x m array and want to slice off the first and last p rows and columns where p can range from 0 to some number. Matlab provides a clean way of doing this, but in numpy it's a bit of a mess.
You might think you could do >>> p=1 >>> b = a[p:-p] but if p=0, this fails. My final solution involved getting the array shape and explicitly calculating start and stop columns, but is there a better way? Gary R. From oliphant at ee.byu.edu Sun Apr 3 08:36:35 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 08:36:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> Message-ID: <42500D03.3030809@ee.byu.edu> I don't know if you have followed the array interface discussion. It is defined at http://numeric.scipy.org I have implemented consumer and exporter interfaces for Numeric and an exporter interface for numarray. The consumer interface needs a little help but shouldn't take too long for someone who understands numarray better. Now Numeric arrays can share data with numarray (no data copy). scipy.base arrays will also implement the array interface. I think the array interface is a good direction to go. -Travis From konrad.hinsen at laposte.net Sun Apr 3 13:03:19 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sun Apr 3 13:03:19 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> References: <424FF03A.4060107@bigpond.net.au> Message-ID: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> On 03.04.2005, at 15:31, Gary Ruben wrote: > You might think you could do > >>> p=1 > >>> b = a[p:-p] > > but if p=0, this fails. b = a[p:len(a)-p] works even for p=0. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From oliphant at ee.byu.edu Sun Apr 3 21:21:15 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:21:15 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <4250C0A4.9070707@ee.byu.edu> Magnus Lie Hetland wrote: >Travis Oliphant : > > >>I don't know if you have followed the array interface discussion. It >>is defined at http://numeric.scipy.org >> >> > >This very, very good! The numeric future of Python is looking very >bright, IMO :) > >Some tiny points: > > - Shouldn't the regexp for __array_typestr__ be > '[<>]?[tbiufcOSUV][0-9]+'? > > Probably. Since, I guess you can only have one of < or > . Thanks.. > - What are the semantics when __array_typestr__ isn't V[0-9]+ and > __array_descr__ is set? Is __array_typestr__ ignored? Or... What > would it be used for? > > I would say that the __array_descr__ always gives more information but not every array implementation will support looking at it. For example, current Numeric (24.0 in CVS) ignores __array_descr__ and just looks at the typestr (and doesn't support 'V'). So, I suspect that another array package that knows this may choose something else besides 'V' if it really wants Numeric to still understand it. 
Suppose you have a complex short int array with __array_descr__ = 'V8 > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > Bytes is still needed because the buffer object is not very good and we need a good buffer object in Python for lots of other reasons. It would be very useful, for example, to be able to allocate memory using the Python bytes object. But, it does mean less pressure to get it to work. > - Why the parentheses around "buffer protocol-satisfying object" in > the description of __array_mask__? And why must it be 'b1'? What > if I happen to have mask data from a non-array-protocol source, > which happens to be, say, b8 (not unreasonable, I think)? Wouldn't > it be good to allow any size of these, and just use zero/non-zero > as the criterion? Some of the point of this protocol is to avoid > copying and using the original data, after all...? (Same goes for > the requirement that it be C-contiguous. I guess I'm basically > saying that perhaps __array_mask__ should be an array itself. Or, > at least, that it could be *allowed* to be...) > > I added the mask late last night. It is probably the least thought out portion. Everything else has been through the wringer a couple more times. My whole thinking is that I just didn't want to explode the protocol with another special name for the mask type. But, saying that the mask object itself can support the array interface doesn't do that, so I think that is a good call. Last night, using the numarray exporter interface and the Numeric consumer interface I was able to share data between a Numeric array and numarray array with no copying of the data buffers. It was very nice. -Travis From oliphant at ee.byu.edu Sun Apr 3 21:29:12 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sun Apr 3 21:29:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <4250C276.5090300@ee.byu.edu> >> > Probably. Since, I guess you can only have one of < or > . Thanks.. > >> - What are the semantics when __array_typestr__ isn't V[0-9]+ and >> __array_descr__ is set? Is __array_typestr__ ignored? Or... What >> would it be used for? >> >> > I would say that the __array_descr__ always gives more information but > not every array implementation will support looking at it. For > example, current Numeric (24.0 in CVS) ignores __array_descr__ and > just looks at the typestr (and doesn't support 'V'). So, I suspect > that another array package that knows this may choose something else > besides 'V' if it really wants Numeric to still understand it. > Suppose you have a complex short int array with > > __array_descr__ = 'V8 Let me finish this example: Suppose you have a complex short int array with __array_descr__ = [('real','i2'),('imag','i2')] you could describe this as __array_typestr__ = 'V4' or think of it as a 4 byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that __array_typestr__ = 'i4' is the right thing to do for packages that ignore the __array_descr__ attribute.
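Spelled out as a tiny exporter, the trade-off might look like this (a sketch only; ComplexInt16Exporter is hypothetical and simply publishes the attribute names from the published interface):

class ComplexInt16Exporter:
    def __init__(self, buf, n):
        self.__array_data__ = buf      # n elements, 4 bytes each
        self.__array_shape__ = (n,)
        self.__array_descr__ = [('real', 'i2'), ('imag', 'i2')]
        # Descr-aware consumers get the full structure above; for
        # everyone else 'V4' keeps each element opaque, while
        # advertising 'i4' instead would let packages without a void
        # type (like current Numeric) still manipulate the memory.
        self.__array_typestr__ = 'V4'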
-Travis From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 01:17:15 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 01:17:15 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <424FB9FA.1090109@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> Message-ID: <4250F8E5.9020701@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> 1) To what degree will the new array interface look different to users >> of the existing Numerical Python? > > Nothing will look different. For now there is nothing to "install" so > the array interface is just something to expect from other objects. > The only thing that would be different is in Numeric 24.0 (if a user > were to call array() on an object that supported the array > interface, then Numeric could return an array without copying data). > Older versions of Numeric won't benefit from the interface but won't be > harmed either. Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface? Some comments on the array interface: 1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method (and rename the other methods for consistency, if desired). 2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method. 3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope). Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable. Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle. But maybe such a single array package is not attainable given the different needs of the different communities. --Michiel.
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From magnus at hetland.org Mon Apr 4 02:05:28 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:05:28 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C0A4.9070707@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> Message-ID: <20050404090356.GB21527@idi.ntnu.no> Travis Oliphant : > [snip] > Last night, using the numarray exporter interface and the Numeric > consumer interface I was able to share data between a Numeric array and > numarray array with no copying of the data buffers. It was very nice. Wow -- a historic moment :) Now, if we can only get the stdlib's array module to support this protocol (and sprout some more dimensions), as you mentioned... That would really be cool. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Mon Apr 4 02:15:10 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:15:10 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <4250C276.5090300@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <4250C0A4.9070707@ee.byu.edu> <4250C276.5090300@ee.byu.edu> Message-ID: <20050404091311.GC21527@idi.ntnu.no> Travis Oliphant : > [snip] > > Let me finish this example: > > Suppose you have a complex short int array with > > __array_descr__ = [('real','i2'),('imag','i2')] > > you could describe this as > > __array_typestr__ = 'V4' Sure -- I can see how using 'V' makes sense... You're just telling the host program how many bytes you've got, and that's it. That makes sense to me. What I wondered about was what happened when you use a more specific (and conflicting) type for the typestr... > or think of it as a 4 byte integer if you want to make sure that another > array package that may not support void pointers can still manipulate > the data, and so the creator of the complex short int array may decide that > > __array_typestr__ = 'i4' This is basically what I'm wondering about. It would make sense (to me) to say that the data type was 'V4', because that's simply less specific, in a way. But saying 'i4' is just as specific as the complex example, above -- but it means something else! You're basically giving the program permission to interpret a four-byte complex number as a four-byte integer, aren't you? Sounds almost like a recipe for disaster to me :} On the other hand -- there is no complex integer type in the interface, and using 'c4' probably would be completely wrong as well. I would almost be tempted to say that if __array_descr__ is in use, __array_typestr__ *has* to use the 'V' type. (Or, one could make some more complicated rules, perhaps, in order to allow other types.) As for not supporting the 'V' type -- would that really be considered a conforming implementation? 
According to the spec, "Objects wishing to support an N-dimensional array in application code should look for these attributes and use the information provided appropriately". The typestr is required, so... Perhaps the spec should be explicit about the shoulds/musts/mays of the specific typecodes? What must be supported, what may be supported etc.? Or perhaps that doesn't make sense? It just seems almost too bad that one package would have to know what another package supports in order to formulate its own typestr... It sort of throws part of the interoperability out the window. > is the right thing to do for packages that ignore the __array_descr__ > attribute. > > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Mon Apr 4 02:25:17 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Mon Apr 4 02:25:17 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <20050404092421.GD21527@idi.ntnu.no> Michiel Jan Laurens de Hoon : > [snip] > 1) The "__array_shape__" method is identical to the existing "shape" method > in Numerical Python and numarray (except that "shape" does a little bit > better checking, but it can be added easily to "__array_shape__"). To avoid > code duplication, it might be better to keep that method (and rename the > other methods for consistency, if desired). Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)? > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python > itself, so that it can return integers larger than 32 bits. So we > can avoid adding a new method. That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :) > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If the support of these attributes is optional, that would have to be the case. > If so, I have a slight preference to make all methods required, > since it's not a big effort to return the defaults, and there will > be more extension modules than array packages (or so I hope). But isn't the point that you should be able to export other things (such as images or sounds or what-have-you) *as* arrays? As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something... > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array > implementations, it also enables and perhaps encourages the creation > of different array modules, while our original goal was to create a > single array module that satisfies the needs of both Numerical > Python and numarray users. I still think such a solution would be > preferable. I agree.
But what I think would be cool is if such a standardized package could take any object conforming to this protocol and use it (possibly as the argument to the array() constructor) -- with all the ufuncs and operators it has. Because then I could implement specialized arrays where the specialization lies just in the data itself, not the behaviour. For example, I might want to create a thin array wrapper around a memory-mapped, compressed video file, and treat it as a three-dimensional array of rgb triples... (And so forth.) > Inconsistencies other than the array interface (e.g. one implements > argmax(x) while another implements x.argmax()) may mean that an > extension module can work with one array implementation but not with > another, This does *not* sound like a good thing -- I agree. Certainly not what I would hope this protocol is used for. > even though they both conform to the array interface. We may end up > with several array packages (we already have Numerical Python, > numarray, and scipy), and extension modules that work with one > package and not with another. So in a sense, the array interface is > letting the genie out of the bottle. Well, perhaps -- but the current APIs of e.g., Numeric or numarray could be used in the same way (i.e., writing your own array implementations with the same interface). As (I think) Travis has said, there is still a goal (somewhat separate from the protocol) of getting one standard heavy-duty numerical array package. I think that would be very beneficial. The point (as I see it) is just to make it easier for various array implementations (i.e., the data, not the ufuncs/operators etc.) to interoperate with it. > But maybe such a single array package is not attainable given the > different needs of the different communities. I would certainly hope it is. > --Michiel. -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From gruben at bigpond.net.au Mon Apr 4 05:14:09 2005 From: gruben at bigpond.net.au (Gary Ruben) Date: Mon Apr 4 05:14:09 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> References: <424FF03A.4060107@bigpond.net.au> <9d9c98344e25f20ac8509e76f3917ec6@laposte.net> Message-ID: <42512F57.2050007@bigpond.net.au> Thanks Konrad, Sorry, my example was too simple. The actual example representing an image should have been 2-D and not necessarily square. Therefore I used shape instead of len and it seemed messy doing it this way. Gary konrad.hinsen at laposte.net wrote: > On 03.04.2005, at 15:31, Gary Ruben wrote: >> You might think you could do >> >>> p=1 >> >>> b = a[p:-p] >> >> but if p=0, this fails. > > b = a[p:len(a)-p] works even for p=0. > > Konrad.
> -- > ------------------------------------------------------------------------ > ------- > Konrad Hinsen > Laboratoire Leon Brillouin, CEA Saclay, > 91191 Gif-sur-Yvette Cedex, France > Tel.: +33-1 69 08 79 25 > Fax: +33-1 69 08 82 61 > E-Mail: khinsen at cea.fr > ------------------------------------------------------------------------ > ------- > > From oliphant at ee.byu.edu Mon Apr 4 12:16:09 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 12:16:09 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4250F8E5.9020701@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> Message-ID: <4251920B.6060708@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >>> 1) To what degree will the new array interface look different to >>> users of the existing Numerical Python? >> >> >> Nothing will look different. For now there is nothing to "install" >> so the array interface is just something to expect from other >> objects. The only thing that would be different is in Numeric 24.0 >> (if a user were to call array() on an object that supported the >> array interface, then Numeric could return an array without copying >> data). Older versions of Numeric won't benefit from the interface but >> won't be harmed either. > > > Very nice. Thanks, Travis. > I'm not sure what you mean by "the array interface could become part > of the Python standard as early as Python 2.5", since there is nothing > to install. Or does this mean that Python's array will conform to the > array interface? The latter is what I mean... I think it is important to have something in Python itself that "conforms to the interface." I wonder if it would also be nice to have some protocol slots in the object type so that extension writers can avoid converting some objects. There is also the possibility that a very simple N-d array type could be included in Python 2.5 that conforms to the interface, if somebody wants to champion that. I think it is important to realize what the array interface is trying to accomplish. From my perspective, I still think it is better for the scientific community to build off of a single array object that is "best of breed." The purpose of the array interface is to allow us scientific users to share information with other Python extension writers who may be wary to require scipy.base for their users but who really should be able to interoperate with scipy.base arrays. I'm thinking of extensions like wxPython, PIL, and so forth. There are also lots of uses for arrays that don't necessarily need the complexity of the scipy.base array (or uses that need even more types). At some point we may be able to accommodate dynamic type additions to the scipy.base array. But, right now it requires enough work that others may want to design their own simple arrays. It's very useful if all such arrays could speak together with a common basic language. The fact that numarray and Numeric arrays can talk to each other more seamlessly was not the main goal of the array interface but it is a nice side benefit. I'd still like to see the scientific community use a single array. But, others may not see it that way. The array interface lets us share more easily.
> > Some comments on the array interface: > > 1) The "__array_shape__" method is identical to the existing "shape" > method in Numerical Python and numarray (except that "shape" does a > little bit better checking, but it can be added easily to > "__array_shape__"). To avoid code duplication, it might be better to > keep that method. (and rename the other methods for consistency, if > desired). There is no code duplication. In these cases it is just another name for .shape. What "better checking" are you referring to? > > 2) The __array_datalen__ is introduced to get around the 32-bit int > limitation of len(). Another option is to fix len() in Python itself, > so that it can return integers larger than 32 bits. So we can avoid > adding a new method. Python len() will never return a 64-bit number on a 32-bit platform. > > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements > e.g. __array_strides__, and substitute the default values if it > doesn't? If so, I have a slight preference to make all methods > required, since it's not a big effort to return the defaults, and > there will be more extension modules than array packages (or so I hope). Optional attributes let modules that care talk to each other on a "higher level" without creating noise for simpler extensions. Both the consumer and the exporter have to use it for it to matter. The defaults are just clarifying what is being assumed if it isn't there. > > Whereas the array interface certainly helps extension writers to > create an extension module that works with all array implementations, > it also enables and perhaps encourages the creation of different array > modules, while our original goal was to create a single array module > that satisfies the needs of both Numerical Python and numarray users. > I still think such a solution would be preferable. I agree with you. I would like a single array module for scientific users. But, satisfying everybody is probably impossible with a single array object. Yes, there could be a proliferation of array objects but sometimes we need multiple array objects to learn from each other. It's nice to have actual code that implements some idea rather than just words in a mailing list. The interface allows us to talk to each other while we learn from each other's actual working implementations. In a way this is like the old argument between the 1920-era communists and the free-marketers. The communists say that we should have only one company that produces some product because having multiple companies is "wasteful" of resources, while the free-marketers point out that satisfying consumers is tricky business, and there is not only "one right way to do it." Therefore, having multiple companies each trying to satisfy consumers actually creates wealth as new and better ideas are tried by the different companies. The successful ideas are emulated by the rest. In mature markets there tend to be a reduction in the number of producers while in developing markets there are all kinds of companies producing basically the same thing. Of course software creates its own issues that aren't addressed by that simple analogy, but I think it's been shown repeatedly that good interfaces (http, smtp anyone?) create a lot of utility. > Inconsistencies other than the array interface (e.g. 
one implements > argmax(x) while another implements x.argmax()) may mean that an > extension module can work with one array implementation but not with > another, even though they both conform to the array interface. We may > end up with several array packages (we already have Numerical Python, > numarray, and scipy), and extension modules that work with one package > and not with another. So in a sense, the array interface is letting > the genie out of the bottle. I think this genie is out of the bottle already. We need to try and get our wishes from it now. -Travis From xscottg at yahoo.com Mon Apr 4 19:09:30 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:30 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: 6667 Message-ID: <20050404233322.61350.qmail@web50208.mail.yahoo.com> --- Michiel Jan Laurens de Hoon wrote: > > I'm not sure what you mean by "the array interface could become > part of the Python standard as early as Python 2.5", since there > is nothing to install. Or does this mean that Python's array will > conform to the array interface? > It would be nice to have the Python array module support the protocol for the 1-Dimensional arrays that it implements. It would also be nice to add a *simple* ndarray object in the core that supports multi-dimensional arrays. I think breaking backward compatibility of the existing Python array module to support multiple dimensions would be a mistake and unlikely to get accepted. A PEP would likely be required to make the changes to the array module, and the PEP to add an ndarray module would likely document the interface. In that regard, it could "make it into the core" for Python 2.5. But you're right that external packages could support this interface today. There is nothing to install... > > 1) The "__array_shape__" method is identical to the existing "shape" > method in Numerical Python and numarray (except that "shape" does a > little bit better checking, but it can be added easily > to "__array_shape__"). To avoid code duplication, it might be better > to keep that method. (and rename the other methods for consistency, > if desired). > The intent is that all array packages would have the required/optional protocol attributes. Of course at a higher level, this information will probably be presented to the users, but they might choose a different mechanism. So while A.__array_shape__ always returns a tuple of longs, A.shape is free to return a ShapeObject or be an assignable attribute that changes the shape of the object. With the property mechanism, there is no need to store duplicated data (__array_shape__ can be a property method that returns a dynamically generated tuple). Separating the low level description of the array data in memory from the high level interface that particular packages like scipy.base or numarray present to their users is a good thing. > > 3) Where do default values come from? Is it the responsibility of the > extension module writer to find out if the array module implements e.g. > __array_strides__, and substitute the default values if it doesn't? If > so, I have a slight preference to make all methods required, since it's > not a big effort to return the defaults, and there will be more extension > modules than array packages (or so I hope). > If we can get a *simple* package into the core, in addition to implementing an ndarray object, this module could have helper functions that do this sort of thing.
For instance:

    def get_strides(A):
        if hasattr(A, "__array_strides__"):
            return A.__array_strides__
        # default: C-contiguous strides computed from the shape and item size
        strides = []
        shape = A.__array_shape__
        size = get_itemsize(A)
        for i in range(len(shape)-1, -1, -1):
            strides.append(size)
            size *= shape[i]
        # the loop built them last-dimension first, so restore C order
        strides.reverse()
        return tuple(strides)

    def get_itemsize(A):
        typestr = A.__array_typestr__
        # skip the endian
        if typestr[0] in '<>':
            typestr = typestr[1:]
        # skip the char code
        typestr = typestr[1:]
        return long(typestr)

    def is_contiguous(A):
        # etc....

Those are probably buggy and need work, but you get the idea... A C implementation of the above would be easy to do and useful, and it could be done inline in a single include file (no linking headaches). Cheers, -Scott From xscottg at yahoo.com Mon Apr 4 19:09:34 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:34 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: 6667 Message-ID: <20050404233447.26327.qmail@web50204.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > I would almost be tempted to say that if __array_descr__ is in use, > __array_typestr__ *has* to use the 'V' type. (Or, one could make some > more complicated rules, perhaps, in order to allow other types.) > Yup, having multiple ways to spell the same information will likely cause problems. Wouldn't be bad for the protocol to say "thou shalt use the specific typestr when possible". Or to say that the __array_descr__ is only for 'V' typestrs. > > As for not supporting the 'V' type -- would that really be considered > a conforming implementation? According to the spec, "Objects wishing > to support an N-dimensional array in application code should look for > these attributes and use the information provided appropriately". The > typestr is required, so... > I think the intent is that libraries like wxPython or PIL can recognize data that they *want* to work with. They can raise an exception when passed anything that is more complicated than they're willing to deal with. I think many packages will simply punt when they see a 'V' typestr and not look at the more complicated description at all. Nothing wrong with that... The packages that produce more complicated data structures have a way to express it and pass it to the packages that are capable of consuming it. Easy things are easy, and hard things are possible. > > Perhaps the spec should be explicit about the shoulds/musts/mays of > the specific typecodes? What must be supported, what may be supported > etc.? Or perhaps that doesn't make sense? It just seems almost too bad > that one package would have to know what another package supports in > order to formulate its own typestr... It sort of throws part of the > interoperability out the window. > Being very precise in the language describing the protocol is probably a good thing, but I don't see anything that requires packages to formulate their typestrs differently. The little bit of ambiguity that is in the __array_typestr__ and __array_descr__ attributes can be easily clarified. Cheers, -Scott From xscottg at yahoo.com Mon Apr 4 19:09:38 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 4 19:09:38 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404092421.GD21527@idi.ntnu.no> Message-ID: <20050404233620.70070.qmail@web50209.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > Why not just use 'shape' as an alias for '__array_shape__' (or vice > versa)? > The protocol just describes the layout and format of the data in memory.
As such, most users won't use it directly just as most users don't call obj.__add__ directly... If an array implementation has a .shape attribute, it can be whatever the implementor wants. Perhaps it's assignable. Maybe it's a method that returns a ShapeObject with methods and attributes of its own. Features like these are the things that make the high level array packages like Numeric and Numarray enjoyable to use. The low level __array_*metadata__ interface should be simple and precisely defined and just for data interchange. > > > 3) Where do default values come from? Is it the responsibility of the > > extension module writer to find out if the array module implements e.g. > > __array_strides__, and substitute the default values if it doesn't? > > If the support of these attributes is optional, that would have to be > the case. > > As for implementing the defaults: How about having some utility > functions (or a wrapper object or whatever) that does just this -- so > neither array nor client code need think about it? This could, > perhaps, be put in the stdlib array module or something... > There will be a simple Python module or C include file for such things. Hopefully it will eventually be included in the Python standard distribution, but even if that doesn't happen, it will be easier than requiring and linking against the Numeric/Numarray/scipy.base libraries directly. > > But what I think would be cool is if such a standardized package could > take any object conforming to this protocol and use it (possibly as > the argument to the array() constructor) -- with all the ufuncs and > operators it has. Because then I could implement specialized arrays > where the specialization lies just in the data itself, not the > behaviour. For example, I might want to create a thin array wrapper > around a memory-mapped, compressed video file, and treat it as a > three-dimensional array of rgb triples... (And so forth.) > If you want the ufuncs, you probably want one of the full featured library packages like scipy.base or numarray. It looks like Travis is able to promote any "array protocol object" to a full blown scipy.base.array already. > > > Inconsistencies other than the array interface (e.g. one implements > > argmax(x) while another implements x.argmax()) may mean that an > > extension module can work with one array implementation but not with > > another, > > This does *not* sound like a good thing -- I agree. Certainly not what > I would hope this protocol is used for. > Things like argmax(x) are not part of this protocol. The high level array packages and libraries will have all sorts of crazy and useful features. The protocol only describes the layout and format of the data. It enables higher level packages to work seamlessly with all the different array objects. That said, this protocol would allow a version of argmax(x) to be written in such a way as to handle *any* array object.
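Here, in fact, is one way such a generic argmax(x) might look. This is only a sketch under strong assumptions -- a contiguous single-segment exporter, an explicit byte-order character, and a handful of simple typestrs -- and the helper table is made up for illustration; it is not part of the protocol:

    import struct

    # Illustrative map from a few typestrs (minus the byte-order character)
    # to struct format codes; deliberately not exhaustive.
    _STRUCT_CODE = {'i2': 'h', 'i4': 'i', 'f4': 'f', 'f8': 'd'}

    def argmax(a):
        typestr = a.__array_typestr__
        shape = a.__array_shape__
        data = a.__array_data__          # assumed to support the buffer interface

        byteorder = typestr[0]           # '<' or '>'
        code = _STRUCT_CODE[typestr[1:]] # punt (KeyError) on anything fancier
        n = 1
        for dim in shape:
            n = n * dim
        values = struct.unpack(byteorder + str(n) + code, str(buffer(data)))
        best = 0
        for i in range(1, n):
            if values[i] > values[best]:
                best = i
        return best                      # index into the flattened data

A real version would also honour __array_strides__ and the optional defaults, but even this crude one works with any conforming exporter of the simple types.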
Cheers, -Scott From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:13:33 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 19:13:33 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404092421.GD21527@idi.ntnu.no> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> Message-ID: <4251F40C.6000402@ims.u-tokyo.ac.jp> Magnus Lie Hetland wrote: > Michiel Jan Laurens de Hoon : >>2) The __array_datalen__ is introduced to get around the 32-bit int >>limitation of len(). Another option is to fix len() in Python >>itself, so that it can return integers larger than 32 bits. So we >>can avoid adding a new method. > > > That would be good, IMO. But how realistic is it? (I have no idea -- > this is not a rhetorical question :) Actually, why is __array_datalen__ needed at all? Can't it be calculated trivially from __array_shape__? --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From mdehoon at ims.u-tokyo.ac.jp Mon Apr 4 19:56:23 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Apr 4 19:56:23 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251920B.6060708@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> Message-ID: <4251F384.7080506@ims.u-tokyo.ac.jp> Travis Oliphant wrote: >> Some comments on the array interface: >> >> 1) The "__array_shape__" method is identical to the existing "shape" >> method in Numerical Python and numarray (except that "shape" does a >> little bit better checking, but it can be added easily to >> "__array_shape__"). To avoid code duplication, it might be better to >> keep that method. (and rename the other methods for consistency, if >> desired). > > > > There is no code duplication. In these cases it is just another name > for .shape. What "better checking" are you referring to? The method __array_shape__ is

    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i=0; i<self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }

while the method shape is

    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;

        if ((s=PyTuple_New(self->nd)) == NULL) return NULL;

        for(i=self->nd; --i >= 0;) {
            if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s,i,o) == -1) return NULL;
        }
        return s;
    }

so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be fine. --Michiel.
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Mon Apr 4 20:37:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 20:37:07 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251F40C.6000402@ims.u-tokyo.ac.jp> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> Message-ID: <4252078C.3050300@ee.byu.edu> > Actually, why is __array_datalen__ needed at all? Can't it be > calculated trivially from __array_shape__? Lovely point. I've taken away the __array_datalen__ from the interface description. -Travis From cookedm at physics.mcmaster.ca Mon Apr 4 21:17:19 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Mon Apr 4 21:17:19 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4251F384.7080506@ims.u-tokyo.ac.jp> (Michiel Jan Laurens de Hoon's message of "Tue, 05 Apr 2005 11:10:12 +0900") References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <4251920B.6060708@ee.byu.edu> <4251F384.7080506@ims.u-tokyo.ac.jp> Message-ID: Michiel Jan Laurens de Hoon writes: > Travis Oliphant wrote: >>> Some comments on the array interface: >>> >>> 1) The "__array_shape__" method is identical to the existing >>> "shape" method in Numerical Python and numarray (except that >>> "shape" does a little bit better checking, but it can be added >>> easily to "__array_shape__"). To avoid code duplication, it might >>> be better to keep that method. (and rename the other methods for >>> consistency, if desired). >> There is no code duplication. In these cases it is just another >> name for .shape. What "better checking" are you referring to? >
> The method __array_shape__ is
>
>     if (strcmp(name, "__array_shape__") == 0) {
>         PyObject *res;
>         int i;
>         res = PyTuple_New(self->nd);
>         for (i=0; i<self->nd; i++) {
>             PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
>         }
>         return res;
>     }
>
> while the method shape is
>
>     if (strcmp(name, "shape") == 0) {
>         PyObject *s, *o;
>         int i;
>
>         if ((s=PyTuple_New(self->nd)) == NULL) return NULL;
>
>         for(i=self->nd; --i >= 0;) {
>             if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
>             if (PyTuple_SetItem(s,i,o) == -1) return NULL;
>         }
>         return s;
>     }
>
> so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I
> don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be
> fine.

The #1 rule of thumb when using the Python C API: _always_ check your returned results (this usually means checking for NULL). In this case, PyInt_FromLong _can_ fail (if there's an error creating the int free list). I've fixed this in CVS. You're right on PyTuple_SET_ITEM: the space for it is guaranteed to exist after the PyTuple_New. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant at ee.byu.edu Mon Apr 4 22:18:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 4 22:18:23 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050403165914.GC10730@idi.ntnu.no> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> Message-ID: <42521F76.5080309@ee.byu.edu> Magnus Lie Hetland wrote: > - Does the description of __array_data__ mean that the discussed > bytes type is no longer needed? (If we can use buffers, that > sounds very good to me.) > > We can use the buffer object now, and it works as far as it goes. But, there are very important reasons for the creation of a good bytes object. Probably, THE most important reason for the bytes object is Pickle support without always making an intermediate string (and the accompanying copy that is involved). Right now, a string is the only way to Pickle array data. A bytes object would allow a way to Pickle without making a copy. -Travis From Chris.Barker at noaa.gov Tue Apr 5 00:32:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Apr 5 00:32:17 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42521F76.5080309@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> Message-ID: <42523EC0.5000303@noaa.gov> Travis Oliphant wrote: > Right now, a string is the only > way to Pickle array data. A bytes object would allow a way to Pickle > without making a copy. So could the new array protocol allow us to make a Python String from an array without copying? That could be pretty handy. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From magnus at hetland.org Tue Apr 5 01:49:25 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:49:25 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404233620.70070.qmail@web50209.mail.yahoo.com> References: <20050404092421.GD21527@idi.ntnu.no> <20050404233620.70070.qmail@web50209.mail.yahoo.com> Message-ID: <20050405084839.GD29671@idi.ntnu.no> Scott Gilbert : > [snip] > > > Inconsistencies other than the array interface (e.g. one implements > > > argmax(x) while another implements x.argmax()) may mean that an > > > extension module can work with one array implementation but not with > > > another, > > > > This does *not* sound like a good thing -- I agree. Certainly not what > > I would hope this protocol is used for. > > > > Things like argmax(x) are not part of this protocol. The high level array > packages and libraries will have all sorts of crazy and useful features. Sure -- I realise that. I just mean that I hope there won't be several scientific array modules that implement similar concepts with different APIs, just because they can (because of the new array API). > The protocol only describes the layout and format of the data. It enables > higher level packages to work seamlessly with all the different array > objects. Exactly. 
> That said, this protocol would allow a version of argmax(x) to be > written in such a way as to handle *any* array object. ... given that you can compare the values in the array, of course. But, yes. This would be (IMO) the ideal situation. Instead of spawning several equivalent-but-different scientific array modules (i.e. the ones implementing such functionality as argmax()) we would have *one* main, standard such module, whose operations would work with almost any conceivable array object (e.g. from wxPython or PIL). That seems like a very, very good situation, IMO. > Cheers, > -Scott -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:51:35 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:51:35 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42521F76.5080309@ee.byu.edu> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> Message-ID: <20050405085041.GE29671@idi.ntnu.no> Travis Oliphant : > > Magnus Lie Hetland wrote: > > > - Does the description of __array_data__ mean that the discussed > > bytes type is no longer needed? (If we can use buffers, that > > sounds very good to me.) > > > > > > We can use the buffer object now, and it works as far as it goes. But, > there are very important reasons for the creation of a good bytes object. > > Probably, THE most important reason for the bytes object is Pickle > support without always making an intermediate string (and the > accompanying copy that is involved). Right now, a string is the only > way to Pickle array data. A bytes object would allow a way to Pickle > without making a copy. Ah. Very good argument, of course. But, as I understand it, the protocol as it stands could work with buffers until we get bytes objects? > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:52:09 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:52:09 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <42523EC0.5000303@noaa.gov> References: <20050328020731.85506.qmail@web50202.mail.yahoo.com> <4247CEC9.1030903@ee.byu.edu> <42489275.7060600@ee.byu.edu> <5dd884d6dc28bd85af323bb3e42567a7@stsci.edu> <42500D03.3030809@ee.byu.edu> <20050403165914.GC10730@idi.ntnu.no> <42521F76.5080309@ee.byu.edu> <42523EC0.5000303@noaa.gov> Message-ID: <20050405085108.GF29671@idi.ntnu.no> Chris Barker : > > Travis Oliphant wrote: > >Right now, a string is the only > >way to Pickle array data. A bytes object would allow a way to Pickle > >without making a copy. > > So could the new array protocol allow us to make a Python String from an > array without copying? That could be pretty handy. Or treat a string as an array... Yay! 
:) > -Chris -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:52:25 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:52:25 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <4252078C.3050300@ee.byu.edu> References: <424D14E9.70607@ee.byu.edu> <200504011215.52914.faltet@carabos.com> <424FB19B.4060800@ee.byu.edu> <424FB72F.4020201@ims.u-tokyo.ac.jp> <424FB9FA.1090109@ee.byu.edu> <4250F8E5.9020701@ims.u-tokyo.ac.jp> <20050404092421.GD21527@idi.ntnu.no> <4251F40C.6000402@ims.u-tokyo.ac.jp> <4252078C.3050300@ee.byu.edu> Message-ID: <20050405085138.GG29671@idi.ntnu.no> Travis Oliphant : > > > >Actually, why is __array_datalen__ needed at all? Can't it be > >calculated trivially from __array_shape__? > > Lovely point. I've taken away the __array_datalen__ from the > interface description. This is only getting prettier and prettier :) > -Travis -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 01:57:12 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 01:57:12 2005 Subject: [Numpy-discussion] Array interface In-Reply-To: <20050404233447.26327.qmail@web50204.mail.yahoo.com> References: <20050404233447.26327.qmail@web50204.mail.yahoo.com> Message-ID: <20050405085642.GH29671@idi.ntnu.no> Scott Gilbert : > [snip] > I think the intent is that libraries like wxPython or PIL can > recognize data that they *want* to work with. They can raise an > exception when passed anything that is more complicated than they're > willing to deal with. Sure. I'm just saying that it would be good to have a baseline -- a basic, mandatory level of conformance, so that if I expose an array using only that part of the API (or, with the rest being optional information) I know that any conforming array consumer will understand me. As long as we don't have this, I have to know the capabilities of my consumer before I can write an appropriate typestr, for example. E.g., one application may only accept b1, while another would only accept i1 etc. Who knows -- there may well be sets of consumer applications that have mutually exclusive sets of accepted typestrings unless a minimum is mandated. That's really what I was after here. In addition to saying that typestr *must* be supported, one might say something about what typestrs must be supported. On the other hand -- perhaps such requirements should only be made on the array side? What requirements can/should one really make on the consumer side? I mean -- even though we have a strict sequence protocol, there is nothing wrong with creating something sequence-like (e.g., supporting floats as indices) and having consumer functions that aren't as strict as the official protocol... I just think it's something it might be worth being explicit about. 
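The consumer-side "punt" behaviour Scott describes is at least cheap to state in code. A sketch -- the set of supported codes is of course whatever a given package actually handles, and the helper name is made up:

    # Hypothetical consumer-side guard: accept only the typestrs this
    # package knows how to handle, and refuse everything else loudly.
    SUPPORTED_TYPESTRS = ('i1', 'i2', 'i4', 'f4', 'f8')

    def check_typestr(typestr):
        if typestr[1:] not in SUPPORTED_TYPESTRS:
            raise TypeError("unsupported array typestr: %r" % typestr)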
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From magnus at hetland.org Tue Apr 5 02:00:24 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Tue Apr 5 02:00:24 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: <20050404233322.61350.qmail@web50208.mail.yahoo.com> References: <20050404233322.61350.qmail@web50208.mail.yahoo.com> Message-ID: <20050405085905.GI29671@idi.ntnu.no> Scott Gilbert : > > --- Michiel Jan Laurens de Hoon wrote: > > > > I'm not sure what you mean by "the array interface could become > > part of the Python standard as early as Python 2.5", since there > > is nothing to install. Or does this mean that Python's array will > > conform to the array interface? > > > > It would be nice to have the Python array module support the protocol for > the 1-Dimensional arrays that it implements. It would also be nice to add > a *simple* ndarray object in the core that supports multi-dimensional > arrays. I think breaking backward compatibility of the existing Python > array module to support multiple dimensions would be a mistake and unlikely > to get accepted. Do we really have to break backward compatibility in order to add more dimensions to the array module? There may be some issues with, e.g., typecode, but still... -- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb] From a.schmolck at gmx.net Tue Apr 5 05:28:13 2005 From: a.schmolck at gmx.net (Alexander Schmolck) Date: Tue Apr 5 05:28:13 2005 Subject: [Numpy-discussion] array slicing question In-Reply-To: <424FF03A.4060107@bigpond.net.au> (Gary Ruben's message of "Sun, 03 Apr 2005 23:31:38 +1000") References: <424FF03A.4060107@bigpond.net.au> Message-ID: Gary Ruben writes: > This may be relevant to Numeric 3, but is possibly just a general question > about array slicing which will either reveal a deficiency in specifying slices > or in my knowledge of slicing with numpy. > A while ago I was trying to reimplement some Matlab image processing code in > Numeric and revealed a deficiency in the way slices are defined. Suppose I > have an n x m array and want to slice off the first and last p rows and > columns where p can range from 0 to some number. Matlab provides a clean way > of doing this, but in numpy it's a bit of a mess. > > You might think you could do > >>> p=1 > >>> b = a[p:-p]

b = a[p:-p or None]

'as

From werner.bruhin at free.fr Tue Apr 5 11:26:36 2005 From: werner.bruhin at free.fr (Werner F. Bruhin) Date: Tue Apr 5 11:26:36 2005 Subject: [Numpy-discussion] AttributeError: _NumErrorMode instance has no attribute 'dividebyzero' Message-ID: <4252D77F.10600@free.fr> If I use "Numeric.Error.setMode(all='Raise')" I get the above AttributeError. I found this on 1.1.1 but just downloaded "numarray-1.2.3.win32-py2.4.exe" and I still find the same problem. I use numarray with wx.lib.plot.py to generate some simple charts. I would like to catch the exceptions and display an appropriate message to the user. Is the above the right approach or am I going about this the wrong way round? Any hints are appreciated. 
Werner From xscottg at yahoo.com Tue Apr 5 13:35:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Tue Apr 5 13:35:37 2005 Subject: [Numpy-discussion] The array interface published In-Reply-To: 6667 Message-ID: <20050405203434.38638.qmail@web50204.mail.yahoo.com> --- Magnus Lie Hetland wrote: > > Do we really have to break backward compatibility in order to add more > dimensions to the array module? > You're right. The Python array module could change in a backwards compatible way. Possibly using keyword arguments to specify parameters that have never been there before. We could probably make sense out of array.insert(), array.append(), array.extend(), array.pop(), and array.reverse() by giving those an "axis" keyword. Even array.remove() could be made to work for more dimensions, but it probably wouldn't get used often. Maybe some of these would just raise an exception for ndims > 1. Then we'd have to add some additional typecodes for complex and a few others. Under the hood, it would basically be a complete reimplementation, but maybe that is the way to go... It does keep the number of array modules down. I wonder which way would meet less resistance in getting accepted in the core. I think creating a new ndarray object would pose less risk of breaking existing applications. > > There may be some issues with, e.g., typecode, but still... > The .typecode attribute could return the same values it always has. The .__array_typestr__ attribute would return the new style values. That's confusing, but probably unavoidable. It would be nice if there was only one set of typecodes for all of Python, but I think we're stuck with many (array module typecodes, struct module typecodes, array protocol typecodes). Cheers, -Scott From oliphant at ee.byu.edu Tue Apr 5 14:28:39 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 14:28:39 2005 Subject: [Numpy-discussion] Questions about ufuncs now. Message-ID: <4253028D.4090407@ee.byu.edu> The arrayobject for scipy.base seems to be working. Currently the Numeric3 CVS tree is using the "old-style" ufuncs modified with new code for the newly added types. It should be quite functional now for the brave at heart. I'm now working on modifying the ufunc object for scipy.base. These are the changes I'm working on:

1) a thread-specific? context that allows "buffer-size" level trapping of errors and retrieving of flags set. Similar to the decimal.context specification, but it uses the floating point sticky bits to implement.

2) implementation of buffers so that type-conversions (and byteswapping and alignment if necessary) never create temporaries larger than the buffer-size (the buffer-size is user settable).

3) a reworking of the general N-dimensional loop to use array iterators with optimizations applied for contiguous arrays.

4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) do not dictate coercion rules. Also, change so that certain mixed-type operations are computed in the larger type for both.

Most of this is pretty straightforward. But, I do have one additional question. Do the new array scalars count as "non-coercing" scalars (i.e. like the Python scalars), or do they cause coercion? My preference is that ALL scalars (anything that becomes 0-dimensional arrays internally) cause only "kind-casting" (i.e. int to float, float to complex, etc.) 
but not "type-casting" -Travis From oliphant at ee.byu.edu Tue Apr 5 16:02:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 16:02:34 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42531880.3060600@ee.byu.edu> I'd like to release a Numeric 24.0 to get the array interface out there. There are also some other bug fixes in Numeric 24.0 Here is the list so far from Numeric 23.7 [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d of Int16 [unreported] Added array interface [unreported] Allow Long Integers to be used in slices [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. [unreported] Return error info in ranlib instead of printing it to stderr [1151892] dot() would quit python with zero-sized arrays when using dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. [unreported] Fixed empty for Object arrays Version 23.8 March 2005 [Cooke] Fixed more 64-bit issues (patch 117603) [unreported] Changed arrayfnsmodule back to PyArray_INT where the code typecasts to (int *). Changed CanCastSafely to check if sizeof(long) == sizeof(int) I'll wait a little bit to allow last minute bug fixes to go in, but I'd realy like to see this release get out there. For users of Numeric >23.7 try Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is fixed in CVS. -Travis From cookedm at physics.mcmaster.ca Tue Apr 5 16:13:31 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 5 16:13:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> (Travis Oliphant's message of "Tue, 05 Apr 2005 17:00:16 -0600") References: <42531880.3060600@ee.byu.edu> Message-ID: Travis Oliphant writes: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 > > Here is the list so far from Numeric 23.7 > > [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a > is 2-d of Int16 > [unreported] Added array interface > [unreported] Allow Long Integers to be used in slices > [1123145] Handle mu==0.0 appropiately in ranlib/ignpoi. > [unreported] Return error info in ranlib instead of printing it to stderr > [1151892] dot() would quit python with zero-sized arrays when using > dotblas. The BLAS routines *gemv and *gemm need LDA >= 1. > [unreported] Fixed empty for Object arrays > > Version 23.8 March 2005 > [Cooke] Fixed more 64-bit issues (patch 117603) > [unreported] Changed arrayfnsmodule back to PyArray_INT where the code > typecasts to (int *). Changed CanCastSafely to check > if sizeof(long) == sizeof(int) > > > I'll wait a little bit to allow last minute bug fixes to go in, but > I'd realy like to see this release get out there. For users of > Numeric >23.7 try > Numeric.empty((10,20),'O') if you want to see an *interesting* bug > that is fixed in CVS. Can you hold on? I've got some bugs I'm working on. There's some 64-bit things I'm working (various places that a long is cast to an int). For instance, a = Numeric.array((3,)) a.resize((2**32,)) gives a.shape == (1,) instead of an error. Stuff like this happens in the new array interface too :-) I'd suggest, before releasing with a bumped version number to 24.0, we release a beta version first. Shake out bugs in the array interface, and potentially allow for some changes if necessary. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From mdehoon at ims.u-tokyo.ac.jp Tue Apr 5 20:34:03 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Apr 5 20:34:03 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42531880.3060600@ee.byu.edu> References: <42531880.3060600@ee.byu.edu> Message-ID: <4253597F.1090501@ims.u-tokyo.ac.jp> Travis Oliphant wrote: > I'd like to release a Numeric 24.0 to get the array interface out > there. There are also some other bug fixes in Numeric 24.0 Thanks for the notification, Travis. I have committed patch #732520 (Eigenvalues on cygwin bug fix), which fixes bug #706716 (eigenvalues is broken). It's great to be a Numerical Python developer, I get to accept my own patches :-). The same patch was previously accepted by numarray. About the array interface, my feeling is that while it may be helpful in the short run, it is likely to damage SciPy in the long run. The array interface allows different array implementations to move in different directions. These different implementations will be compatible with respect to the array interface, but incompatible otherwise (depending on the level of self-restraint of the developers of the different array implementations). So in the end, extension modules will be written for a specific array implementation anyway. At this point, Numerical Python is the most established and has the most users. Numarray, as far as I can tell, keeps closer to the Numerical Python tradition, so maybe extension modules can work with either one without further modification (e.g., pygist seems to work with both Numerical Python and numarray). But SciPy has been moving away (e.g. by replacing functions by methods). As extension module writers are usually busy people, they may not be willing to modify their code so that it works with SciPy, and even less to maintain two versions of their code, one for Numerical Python/numarray and one for SciPy. Users who could previously choose to install SciPy as an addition to Numerical Python, now find that they have to choose between SciPy and Numerical Python. As Numerical Python has many more extension packages, I expect that SciPy will end up losing users. Personally I use Numerical Python, and I plan to continue to use it for years to come, so it doesn't matter much to me. I'm just warning that the array interface may be a Trojan horse for the SciPy project. --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From oliphant at ee.byu.edu Tue Apr 5 22:26:38 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 5 22:26:38 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <4253597F.1090501@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> Message-ID: <425372A4.7020900@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 > > > > About the array interface, my feeling is that while it may be helpful > in the short run, it is likely to damage SciPy in the long run. Well, I guess we'll just have to see. Again, I see the array interface as important for talking to other modules that may not need or want the "full power" of a packed array module like scipy.base is. 
> The array interface allows different array implementations to move in > different directions. These different implementations will be > compatible with respect to the array interface, but incompatible > otherwise (depending on the level of self-restraint of the developers > of the different array implementations). So in the end, extension > modules will be written for a specific array implementation anyway. At > this point, Numerical Python is the most established and has the most > users. Numarray, as far as I can tell, keeps closer to the Numerical > Python tradition, so maybe extension modules can work with either one > without further modification (e.g., pygist seems to work with both > Numerical Python and numarray). > But SciPy has been moving away (e.g. by replacing functions by methods). Michiel, you seem to want to create this impression that "SciPy" is "moving away." I'm not sure of your motivations. But, since this is a public forum, I have to restate emphatically that "SciPy" is not "moving away from Numeric." It is all about bringing together the communities. For the 5 years that scipy has been in development, it has always been about establishing a library of common routines that we could all share. It has built on Numeric from the beginning. Now, there is another "library" of routines that is developing around numarray. It is this very real break that I'm trying to help fix. I have no other "desire" to "move away" or "create a break" or any other such notions that you seem to want to spread. That is precisely why I have publicly discussed practically every step of my work. You seem to be the only vocal one who thinks that scipy.base is not just a replacement for Numeric, but something else entirely. So, I repeat: **scipy.base is just a new version of Numeric with a few minor compatibility issues and a lot of added functionality and features.** For example, despite your claims, I have not "replaced" functions by methods. The functions are still all there just like before. I've simply noticed that numarray has a lot of methods and so I've added similar methods to the Numeric object to help numarray users make the transition back. Everything else that I've changed, I've done to bring Numeric up-to-date with modern Python versions, and to fix old warts that have sat around for years. If there are problems with my changes, speak up. Tell me what to do to make the new Numeric better. > As extension module writers are usually busy people, they may not be > willing to modify their code so that it works with SciPy, and even > less to maintain two versions of their code, one for Numerical > Python/numarray and one for SciPy. It's comments like this that make me wonder what you are thinking. It seems to me that you are the only one I've talked to that wants to maintain the notion of a "split". Everybody else I'm in contact with is in full support of merging the two communities behind a single scientific array object. Every extension module that compiles for Numeric should compile for scipy.base. Notice that full scipy already has a huge number of extension modules that need to compile for scipy.base. So, I have every motivation to make that a painless process. > Users who could previously choose to install SciPy as an addition to > Numerical Python, now find that they have to choose between SciPy and > Numerical Python. As Numerical Python has many more extension > packages, I expect that SciPy will end up losing users. 
Again, scipy.base should *replace* Numerical Python for all users (except the most adamant who don't seem to want to go with the rest of the community). scipy.base is a new version of Numeric. On the C-level I don't know of any incompatibilities; on the Python level there are a very few (most of them rarely-used typecode character issues which a simple search and replace will fix). I should emphasize this next point, since I don't seem to be coming across very clearly to some people. As head Numeric developer, I'm stating that **Numeric 24 is the last release that will be called Numeric**. New releases of Numeric will be called scipy.base. Of course, I realize that people can do whatever they want with the old Numeric code base, but then they will be the ones responsible for continuing a "split," because the Numerical Python project at sourceforge will point people to install scipy.base. Help me make the transition as painless as possible, that's all I'm asking. People transitioning from Numeric should have no trouble at all as I repeatedly point out. People transitioning from numarray will have a *little* harder time which is why the array interface should help out during that process. It is helping people transition back from numarray that is 90% of the reason I've made any changes to the internals of Numeric. I've been a happy and quiet Numeric user and developer for years, but I respect the problems that Perry, Rick, Paul, and Todd have pointed out with their numarray implementation, and I saw a way to support their needs inside of Numeric. That is the whole reason for my efforts. I wish people would stop trying to make it seem to casual readers of this forum that I'm trying to create a "whole new" incompatible system. Help me fix the obviously unnecessary incompatibilities where they may exist, and help me make automatic transition scripts to help people upgrade painlessly to the newer Numeric. I very much appreciate all who voice your concerns. Michiel, you are particularly appreciated because you are voice from a solid Numeric user. I just think that such concerns would be more productive in the context of accepting the fact that an upgrade from Numeric to scipy.base is going to happen, rather than trying to make it look like some new "split" is occurring. I've received a lot of offline support for the Numeric/numarray unification effort that scipy.base is. It would help if more people could provide public support on this forum so that others can see that I'm not just some outsider pushing some random ideas, but I am simply someone who decided to sacrifice some time for what I think is a very important effort. It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development. Right now, the CVS version of Numeric3 works reasonably. It compiles and uses the old ufunc objects (which have only been extended to support the new types). I could use a lot of help in finding bugs. You can also try out the new array scalars to see how they work (math works on them now) and also see what may still be missing in their implementation. > > Personally I use Numerical Python, and I plan to continue to use it > for years to come, so it doesn't matter much to me. I'm just warning > that the array interface may be a Trojan horse for the SciPy project. 
As long as you realize that as far as I know the other developers of Numerical Python are going to be moving to scipy.base, and so you will be using obsolete technology, you are free to do as you wish. But, I really hope we can persuade you to join us. It is much better if we work together. -Travis From Fernando.Perez at colorado.edu Tue Apr 5 22:43:33 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Tue Apr 5 22:43:33 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <425372A4.7020900@ee.byu.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> Message-ID: <42537690.5040400@colorado.edu> Travis Oliphant wrote: > Michiel Jan Laurens de Hoon wrote: >>But SciPy has been moving away (e.g. by replacing functions by methods). > > > > Michiel, you seem to want to create this impression that "SciPy" is > "moving away." I'm not sure of your motivations. But, since this is a > public forum, I have to restate emphatically, that "SciPy" is not > "moving away from Numeric." It is all about bringing together the > communities. For the 5 years that scipy has been in development, it has > always been about establishing a library of common routines that we > could all share. It has built on Numeric from the beginning. Now, > there is another "library" of routines that is developing around > numarray. It is this very real break that I'm trying to help fix. I > have no other "desire" to "move away" or "create a break" or any other > such notions that you seem to want to spread. FWIW, I think you (Travis) have been exceedingly clear in explaining this process, and in pointing out how this is: a) NOT a further split, but rather the EXACT OPPOSITE (numarray users will have a transition path back into a project which will provide the best of the old Numeric, along with all the critical enhancements which Perry, Todd et al. added to numarray). b) a way, via the array protocol, to provide third-party low-level libraries an easy way to, AT THE C LEVEL, interact easily and efficiently (without unnecessary copies) with numeri* arrays. I fail to see where Michiel gets his split/Trojan horse arguments, or what line of reasoning can connect your detailed explanations with such a conclusion. In particular, the comments on the whole 'trojan' issue seem to me absolutely unfounded. Nobody in their sane mind will use this protocol to invent a scipy.base competitor, which most likely would end up (if done right) being simply a copy. What it provides is a minimal, compact, low-level API which will be a huge boon for interoperability with things like PIL, WX or other similar libraries. This protocol has been extensively debated, and Scott's extensive comments have made this discussion a very productive one (along with the help of others, of course). I can only see this as a GREAT step forward for numerical python support and reliability 'in the wild'. I hesitated to send this message, but since you (Travis) have sunk an enormous amount of your time into this effort, which I can only applaud and rejoice in, I figure the least I can do is contribute a little to dispel some unnecessary confusion. Users with less knowledge of the details may become afraid of using Python for scientific computing by reading Michiel's comments, which I think would be a shame. Michiel, please note that none of what I said is meant to be a personal attack. 
I simply feel it is necessary to clarify, in no uncertain terms, how your recent comments of impending doom are unfounded.

Best to all, and again thanks to Travis for this much needed hard work,

f

From Chris.Barker at noaa.gov Tue Apr 5 23:59:31 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Apr 5 23:59:31 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <42538880.7010301@noaa.gov>

Travis Oliphant wrote:
> It would help
> if more people could provide public support on this forum

Easy enough. I, for one, am very happy about what Travis is doing. It seems to be exactly what is needed to mend the Numeric-numarray split, which has been an annoyance for a couple of years now.

I'm also VERY happy about the proposed array protocol. While I suppose it could facilitate the creation of other array packages, that is only speculation, and unlikely, in my judgment. What I'm quite sure is going to happen is that other packages that do not provide an array implementation will be able to efficiently take arrays as input without creating a dependence on any particular package. I intend to make sure wxPython can efficiently take Numeric24 arrays, for instance. (Now that I think about it, it would be great if we could get this into wxPython2.6, which will be out pretty darn soon. I'm very pressed for time right now... can anyone help?)

> It would also help if other
> people who have concerns would voice them (I'm very grateful for those
> who have expressed their concerns) so that we can all address them and
> get on the same page for future development.

My only concern is versioning. Particularly when under rapid development (but really this applies anytime), I'd really love to be able to have more than one version of Numeric (or SciPy.base, or whatever) installed at once, and be able to select which one is used at runtime, in code (before importing the first time, of course). This would facilitate testing, but also allow me to have a working environment for older apps that will continue to work, without modification or re-compiling, after installing a newer version. Something like wxPython's wxversion is what I have in mind.

http://wiki.wxpython.org/index.cgi/MultiVersionInstalls

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From magnus at hetland.org Wed Apr 6 00:30:48 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 00:30:48 2005
Subject: [Numpy-discussion] Possible example application of the array interface
Message-ID: <20050406072854.GA12700@idi.ntnu.no>

I was just thinking about some experimental designs, and whether I could, perhaps, do the statistics in Python. I remembered having used RPy [1] briefly at some time (there may be other similar bindings out there -- I don't remember) and started thinking about whether I could, perhaps, combine it with numpy in some way. My first thought was to reimplement the relevant statistical functions; then I thought about how to convert data back and forth -- but then it occurred to me that R also uses arrays extensively, and that it could, perhaps, be possible to expose those (through something like RPy) through the array interface/protocol!
This would be (IMO) a good example of the benefits of the array protocol; it's not a matter of "getting yet another array module". RPy is an external library/language with *lots* of features that might be useful to numpy users, many of which aren't likely to be implemented in Python for quite a while, I'd guess (unless, perhaps, someone writes a translator from R, which I'm sure is doable).

I don't know enough (at least yet ;) about the implementation of RPy and the R library to say for sure whether this would even be possible, but it does seem like it could be really useful...

[1] rpy.sf.net

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] Numeric 24.0
Message-ID:

Hi Travis,

Could you look at bug
[ 635104 ] segfault unpickling Numeric 'O' array
[ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of previous one)

I proposed a (rather simple) solution that I put in the comment of bug [ 635104 ]. But apparently, nobody is looking at those bugs...

>
> I'd like to release a Numeric 24.0 to get the array interface out there.
> There are also some other bug fixes in Numeric 24.0
>
> Here is the list so far from Numeric 23.7
>
> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is 2-d
> of Int16

This is quite disturbing. In fact, for all types that are not exactly equivalent to a Python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So

type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
type(zeros((5,2), Float32 )[0,0]) => <type 'array'>

But

type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>

Notice too the weird difference between Int <> Int32 and Float == Float64.

However, when indexing a one-dimensional array (rank == 1), we get back scalars for indexing operations on all types. So, when you say "return the same type", do you think scalar or array (it smells like a recent discussion on Numeric3 ...)?

> [unreported] Added array interface
> [unreported] Allow Long Integers to be used in slices
> [1123145] Handle mu==0.0 appropriately in ranlib/ignpoi.
> [unreported] Return error info in ranlib instead of printing it to stderr
> [1151892] dot() would quit python with zero-sized arrays when using
> dotblas. The BLAS routines *gemv and *gemm need LDA >= 1.
> [unreported] Fixed empty for Object arrays
>
> Version 23.8 March 2005
> [Cooke] Fixed more 64-bit issues (patch 117603)
> [unreported] Changed arrayfnsmodule back to PyArray_INT where the code
> typecasts to (int *). Changed CanCastSafely to check
> if sizeof(long) == sizeof(int)
>
>
> I'll wait a little bit to allow last minute bug fixes to go in, but I'd
> really like to see this release get out there. For users of Numeric
> 23.7 try
> Numeric.empty((10,20),'O') if you want to see an *interesting* bug that is
> fixed in CVS.
> >-Travis > > From nwagner at mecha.uni-stuttgart.de Wed Apr 6 01:01:42 2005 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Wed Apr 6 01:01:42 2005 Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical Message-ID: <42539706.3000503@mecha.uni-stuttgart.de> Hi all, Using Numeric 24.0 >>> scipy.__version__ '0.3.3_303.4599' scipy.test() results in ====================================================================== ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py", line 152, in check_simple_todense b = mmread(fn).todense() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense csc = self.tocsc() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1]) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_add (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_getelement (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File 
"/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matvec (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_setelement (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsc (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsr (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_todense (scipy.sparse.Sparse.test_Sparse.test_csc) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor1 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor2 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor3 (scipy.sparse.Sparse.test_Sparse.test_csc) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_add (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File 
"/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_getelement (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matvec (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_setelement (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_csr) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsc (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocsr (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_todense (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor1 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and 
(max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor2 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_constructor3 (scipy.sparse.Sparse.test_Sparse.test_csr) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 30, in setUp self.datsp = self.spmatrix(self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 712, in __init__ ocsc = csc_matrix(transpose(s)) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_elmul (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 60, in check_elmul c = a ** b File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 186, in __pow__ return csc ** other File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 485, in __pow__ return csc_matrix(c,(rowc,ptrc),M=M,N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_matmat (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 71, in check_matmat assert_array_almost_equal((asp*bsp).todense(),dot(asp.todense(),bsp.todense())) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1184, in __mul__ return self.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 239, in matmat res = csc.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 568, in matmat return csc_matrix(c, (rowc, ptrc), M=M, N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_tocoo (scipy.sparse.Sparse.test_Sparse.test_dok) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 75, in check_tocoo assert_array_almost_equal(a.todense(),self.dat) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 254, in todense csc = self.tocsc() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1437, in tocsc return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1]) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ====================================================================== ERROR: check_mult (scipy.sparse.Sparse.test_Sparse.test_dok) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.3/site-packages/scipy/sparse/tests/test_Sparse.py", line 155, in check_mult D = A*A.T File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 1184, in __mul__ return self.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 239, in matmat res = csc.matmat(other) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 568, in matmat return csc_matrix(c, (rowc, ptrc), M=M, N=N) File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 357, in __init__ self._check() File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line 375, in _check if (nnz>0) and (max(self.rowind[:nnz]) >= M): IndexError: invalid slice ---------------------------------------------------------------------- Ran 1173 tests in 3.113s FAILED (errors=31) >>> From cookedm at physics.mcmaster.ca Wed Apr 6 02:23:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 02:23:11 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <20050406092143.GA31688@arbutus.physics.mcmaster.ca> On Wed, Apr 06, 2005 at 07:33:56AM +0000, S?bastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug [ > 635104 ]. But apparently, nobody is looking at those bugs... This is too true. Travis added myself and Michiel de Hoon recently to the developers, so there's some new blood, and we've been banging on things, though. I'll have a look at it if I've got time. I personally really hate bugs that crash my interpreter :-) > >I'd like to release a Numeric 24.0 to get the array interface out there. > >There are also some other bug fixes in Numeric 24.0 > > > >Here is the list so far from Numeric 23.7 > > > >[Greenfield] Changed so a[0,0] and a[0][0] returns same type when a is > >2-d of Int16 > > This is quite disturbing. In fact for all types that are not exactly > equivalent to python type, indexing a multidimensional array (rank > 1) > return arrays even if the final shape is (). 
> So
> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'>
> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'>
> type(zeros((5,2), Float32 )[0,0]) => <type 'array'>
> But
> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'>
> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'>
> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'>
> Notice too the weird difference between Int <> Int32 and Float == Float64.

That's because Int is *not* Int32. Int32 is the first typecode of '1sil' that has 32 bits. For (all?) platforms I've seen, that'll be 'i'. Int corresponds to a Python integer, and Float corresponds to a Python float. Now, a Python integer is actually a C long, and a Python float is actually a C double. I've made a table:

Numeric type   typecode   Python type   C type   Array type
Int            'l'        int           long     PyArray_LONG
Int32          'i' [1]    N/A           int      PyArray_INT
Float          'd'        float         double   PyArray_DOUBLE
Float32        'f'        N/A           float    PyArray_FLOAT
Float64        'd'        float         double   PyArray_DOUBLE

[1] assuming sizeof(int)==4, which is true on most platforms. There are some 64-bit platforms where this won't be true, I think.

On (all? most?) 32-bit platforms, sizeof(int) == sizeof(long) == 4, so both Int and Int32 will be 32-bit quantities. Not so on some 64-bit platforms (Linux on an Athlon 64, like the one I'm typing at now), where sizeof(long) == 8. I've been fixing oodles of assumptions in Numeric where ints and longs have been used interchangeably, hence the extended discussion :-)

[I haven't addressed here why you get an array sometimes and a Python type the others. This is the standard, old, behaviour -- it's likely not going to change in Numeric. Whether it's a *good* thing is another question. scipy.base and numarray do it differently.]

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Wed Apr 6 02:46:55 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 02:46:55 2005
Subject: [Numpy-discussion] errors=31 in scipy.test() with latest cvs versions of scipy and Numerical
In-Reply-To: <42539706.3000503@mecha.uni-stuttgart.de>
References: <42539706.3000503@mecha.uni-stuttgart.de>
Message-ID: <20050406094438.GA32297@arbutus.physics.mcmaster.ca>

On Wed, Apr 06, 2005 at 10:00:06AM +0200, Nils Wagner wrote:
> Hi all,
>
> Using Numeric 24.0
> >>> scipy.__version__
> '0.3.3_303.4599'
>
> scipy.test() results in
>
> ======================================================================
> ERROR: check_simple_todense (scipy.io.mmio.test_mmio.test_mmio_coordinate)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/usr/lib/python2.3/site-packages/scipy/io/tests/test_mmio.py",
> line 152, in check_simple_todense
> b = mmread(fn).todense()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 254, in todense
> csc = self.tocsc()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 1437, in tocsc
> return csc_matrix(a, (rowa, ptra), M=self.shape[0], N=self.shape[1])
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 357, in __init__
> self._check()
> File "/usr/lib/python2.3/site-packages/scipy/sparse/Sparse.py", line
> 375, in _check
> if (nnz>0) and (max(self.rowind[:nnz]) >= M):
> IndexError: invalid slice

(etc. -- note to self: use scipy for regression testing :-)

nnz is coming from

nnz = self.indptr[-1]

where self.indptr is an array of Int32.
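To see the problem in isolation (a sketch -- the exact repr may differ, but under the new Numeric 24.0 behaviour the indexing result is a rank-0 array rather than a Python int):

    >>> import Numeric
    >>> indptr = Numeric.zeros(4, 'i')   # an Int32 array, like self.indptr
    >>> nnz = indptr[-1]
    >>> type(nnz)                        # a rank-0 array, not a Python int
    <type 'array'>
    >>> Numeric.arange(10)[:nnz]         # raises "IndexError: invalid slice"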
Hmm, this corresponds to the behaviour I just responded to Sébastien de Menten about. The problem is that nnz is *not* a Python integer; it's an array, so the slice fails. I think I was wrong in that email about saying this was expected behaviour :-)

This comes from the recent fix of a[0,0] and a[0][0] returning the same type. Either change that back, or else we need to spruce up the slicing logic to consider 0-dimensional integer arrays as scalars.

A minimal test case:

a = Numeric.array([5,6,7,8])
b = Numeric.array([0,1,2,3], 'i')
n = b[-1]          # n is now a rank-0 array, not a Python int
assert a[n:] == 8  # fails with "IndexError: invalid slice"

(I'm not tackling this right now)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From magnus at hetland.org Wed Apr 6 02:59:18 2005
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 6 02:59:18 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com>
Message-ID: <20050406095639.GA16810@idi.ntnu.no>

Scott Gilbert:
>
> --- Magnus Lie Hetland wrote:
> >
> > Do we really have to break backward compatibility in order to add more
> > dimensions to the array module?
> >
>
> You're right. The Python array module could change in a backwards
> compatible way. Possibly using keyword arguments to specify parameters
> that have never been there before.
>
> We could probably make sense out of array.insert(), array.append(),
> array.extend(), array.pop(), and array.reverse() by giving those an "axis"
> keyword. Even array.remove() could be made to work for more dimensions,
> but it probably wouldn't get used often. Maybe some of these would just
> raise an exception for ndims > 1.

Sure. I guess basically the extend/pop/reverse/etc. methods and the ndim-functionality would sort of be two quite different ways of using arrays, so keeping them mutually exclusive doesn't seem like a problem to me. This might speak in favour of separating the functionality into two different classes, but I think there's merit to keeping it gathered, because this is partly for basic use(rs) who just want to get an array and do things to it that make sense. Appending to a multidimensional array (as long as we don't tempt them with an axis keyword) just doesn't make sense -- so people (hopefully) won't do it.

> Then we'd have to add some additional typecodes for complex and a
> few others.

Yeah; the question is how compatible the typecode system is with the new array protocol -- some overlap and some differences, I believe (without checking right now)? So -- this might look a bit like patchwork. But I think we might get that if we have two modules (or classes) too -- one, called array, with the existing functionality, and one, called (e.g.) ndarray, with a similar but incompatible interface... It *may* be better, but I'm not quite sure I think so.

In my experience (which may be very biased and selective here ;) the array module isn't exactly among the "hottest" features of Python or the standard libs. In fact, it seems almost a bit pointless to me. It claims to have "efficient arrays of numeric values" but is the efficiency really that great, if you write your code in Python? (Using lists and psyco would, quite possibly, be just as good, for example.)
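(For reference, all the module gives you today is a flat, typed sequence -- a tiny sketch:)

    import array

    a = array.array('d', [1.0, 2.0, 3.0])   # typecode 'd' = C double; 1-D only
    a.append(4.0)                            # list-style growth is its main feature
    print a.typecode, len(a)                 # no shape, no extra dimensions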
So -- at *least* adding the array protocol to it would be doing it a favour, i.e., making it a useful module, and sort of a prototypical example of the protocol and such. Adding more dimensions might simply make it more useful. (I've many times been asked by people how to create e.g. two-dimensional arrays in Python. It would be nice if there was actually some basic support for it.)

> Under the hood, it would basically be a complete reimplementation,

Sure; except for the (possibly minor?) work involved, I don't see that this is a problem? (Well... The inherent instability of new code, perhaps... But still.)

> but maybe that is the way to go... It does keep the number of array
> modules down.

Yes.

> I wonder which way would meet less resistance in getting accepted in
> the core. I think creating a new ndarray object would be less risk
> of breaking existing applications.

I guess that's true.

> > There may be some issues with, e.g., typecode, but still...
>
> The .typecode attribute could return the same values it always has.

Sure. But we might end up with, e.g., a constructor that looks almost exactly like the numpy array() constructor -- but whose typecodes are different... :/

> The .__array_typestr__ attribute would return the new style values.
> That's confusing, but probably unavoidable.

Yes, if we do use this approach. If we only allow one-dimensional arrays here (i.e., only add the protocol to the existing functionality) there might be less confusion? Oh, I don't know. Having a separate module or class/type might be just as good an idea. Perhaps I'm just being silly :->

> It would be nice if there was only one set of typecodes for all of
> Python,

Yeah -- or some similar system (using type objects).

> but I think we're stuck with many (array module typecodes, struct
> module typecodes, array protocol typecodes). :(

Yes, lots of history here. Oh, well. Not the greatest of problems, I guess. But using different typecodes in the explicit user-part of the ND-array interface in the stdlibs from those in scipy, for example, seems like a decidedly Bad Idea(tm). So ... that might be a good enough reason for using a separate ndarray entity, unless there can be some upward compatibility somehow.

--
Magnus Lie Hetland Fall seven times, stand up eight
http://hetland.org [Japanese proverb]

From sdementen at hotmail.com Wed Apr 6 00:36:39 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 00:36:39 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
Message-ID:

Hi,

I follow with great interest the threads around Numeric3/scipy.base. As Travis suggested ("It would also help if other people who have concerns would voice them (I'm very grateful for those who have expressed their concerns) so that we can all address them and get on the same page for future development."), I voice my concern :-)

Sometimes it is quite useful to treat data at a higher level than just an "array of numbers of some type". Adding metadata to an array (I call them "augmented arrays") is a simple way to add sense to an array. I see different use cases like:

1) attaching a physical unit to array data (see for instance Unum http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very useful to manipulate easily time series.
3) masked arrays as in the MA module of Numeric
4) arrays for interval arithmetic, where one keeps another array with the precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)

The current solution for those situations is nicely summarized by quoting Konrad: "but rather a class written using arrays than a variety of the basic array type. It's actually pretty straightforward to implement, the most difficult choice being the form of the constructor that gives most flexibility in use."

However, I disagree with the "pretty straightforward to implement". In fact, if one wants to inherit most of the functionalities of Numeric, it becomes quite cumbersome. Looking at the MA module, I see that it needs to:

1) redefine all methods (__add__, ...)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, ...)

For other purposes, the same burden may apply. A general solution to this problem is not straightforward and may be out of reach (computationally and/or conceptually). However, a quite-general-enough elegant solution could solve most practical problems. Looking at threads in this list, I think that there is enough brain power to get to something usable in the medium term.

An embryo of an idea would be to add hooks in the machinery to allow an object to interact with a ufunc. Currently, this is done by calling __array__ to extract a "naked array" (== Numeric.array vs "augmented array") but the result is then always a "naked array". In pseudocode, this looks like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array = augmented_array.__array__()
    return ufunc.apply(augmented_array)

where I would prefer something like:

def ufunc( augmented_array ):
    if not isarray(augmented_array):
        augmented_array, constructor = augmented_array.__array_constructor__()
    else:
        constructor = lambda x: x
    return constructor(ufunc.apply(augmented_array))

For array functions and methods, I have even fewer clues about a solution :-). But calling hooks specified by some protocol would be a path:

a) __array_constructor__
b) __array_binary_op__ (would be called for __add__, __sub__, ...)
c) __array_rbinary_op__ (would be called for __radd__, __rsub__, ...)

If I miss a point and there is an easy way to do this, I'll be pleased to know it. Otherwise, any feedback on this ability to easily increase array functionalities by appending metadata and related behavior would be welcome.

Sebastien

From cjw at sympatico.ca Wed Apr 6 03:15:13 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:15:13 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <424FE8E7.4040904@ee.byu.edu>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu>
Message-ID: <4253B691.5030902@sympatico.ca>

Travis Oliphant wrote:
> Colin J. Williams wrote:
>
>> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py install
>> running install
>> running build
>> running config
>> error: The .NET Framework SDK needs to be installed before building
>> extensions for Python.
>>
>> Is there any chance that a Windows binary could be made available for
>> testing?
>
> Probably not in the near term (but you could ask Michiel).
>
> I'm assuming you have mingw32 installed which would allow you to build
> it provided you have created an exports file for python2.4 (look on
> the net for how to compile extensions with mingw32 using a MSVC
> compiled python).
> You have to tell distutils what compiler to use:
>
> python setup.py config --compiler=mingw32
> python setup.py build --compiler=mingw32
> python setup.py install
>
> -Travis

Thanks to Michiel and Travis for their suggestions. I am using Windows XP and get the following result:

C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py config --compiler=minw32
running config
error: don't know how to compile C/C++ code on platform 'nt' with 'minw32' compiler

C:\Python24\Lib\site-packages\Numeric3\Download>

I would welcome any comments.

Colin W.

From cookedm at physics.mcmaster.ca Wed Apr 6 03:31:40 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 6 03:31:40 2005
Subject: [Numpy-discussion] array interface nitpicks
Message-ID:

Just some small nitpicks in the array interface document (http://numeric.scipy.org/array_interface.html):

As written:

"""
__array_shape__ (required)
Tuple showing size in each dimension. Each entry in the tuple must be a Python (long) integer. Note that these integers could be larger than the platform "int" or "long" could hold. Use Py_LONG_LONG if accessing the entries of this tuple in C.
"""

Since this is supposed to be an interface, not an implementation (duck-typing and all that), I think this is too strict: __array_shape__ should just be a sequence of integers, not necessarily a tuple. I'd suggest something like this:

'''
__array_shape__ (required)
Sequence whose elements are the size in each dimension. Each entry is an integer (a Python int or long). Note that these integers could be larger than the platform "int" or "long" could hold (a Python int is a C long). It is up to the calling code to handle this appropriately; either by raising an error when overflow is possible, or by using Py_LONG_LONG as the C type for the shapes.
'''

This is clearer about the user's responsibility -- note that Numeric is taking the first approach (error), as the dimensions in PyArrayObject are ints.

Similar comments about __array_strides__. I'd reword it along the lines of:

'''
__array_strides__ (optional)
Sequence of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int or long). As with __array_shape__, the values may be larger than can be represented by a C "int" or "long"; the calling code should handle this appropriately, either by raising an error, or by using Py_LONG_LONG in C. Default is a strides tuple which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default __array_strides__ tuple for an object whose array entries are 8 bytes long and whose __array_shape__ is (10,20,30) would be (4800, 240, 8).

Default: C-style contiguous
'''

I'm mostly worried about the use of Python longs; it shouldn't be necessary in almost all cases, and adds extra complications (in normal usage, you don't see Python longs all that much).

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cjw at sympatico.ca Wed Apr 6 03:33:05 2005
From: cjw at sympatico.ca (Colin J. Williams)
Date: Wed Apr 6 03:33:05 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To:
References:
Message-ID: <4253BAA1.7010403@sympatico.ca>

Sébastien de Menten wrote:
> Hi,
>
> I follow with great interest the threads around Numeric3/scipy.base.
> As Travis suggested ("It would also help if other people who have
> concerns would voice them (I'm very grateful for those who have
> expressed their concerns) so that we can all address them and get on
> the same page for future development."), I voice my concern :-)
>
> Sometimes it is quite useful to treat data at a higher level than just
> an "array of numbers of some type". Adding metadata to an array (I call
> them "augmented arrays") is a simple way to add sense to an array. I
> see different use cases like:
> 1) attaching a physical unit to array data (see for instance Unum
> http://home.tiscali.be/be052320/Unum.html )
> 2) description of axes (see
> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
> useful to manipulate easily time series.

Does the record array provide a means of addressing this need?

> 3) masked arrays as in the MA module of Numeric
> 4) arrays for interval arithmetic, where one keeps another array with
> the precision of the data
> 5) record arrays (currently being integrated in scipy.base as a base
> type)

Yes, and there is numarray's array of objects.

> The current solution for those situations is nicely summarized by
> quoting Konrad: "but rather a class written using arrays than a variety
> of the basic array type. It's actually pretty straightforward to
> implement, the most difficult choice being the form of the constructor
> that gives most flexibility in use."
> [snip]

Colin W.

From rkern at ucsd.edu Wed Apr 6 03:36:51 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Wed Apr 6 03:36:51 2005
Subject: [Numpy-discussion] The array interface published
In-Reply-To: <20050406095639.GA16810@idi.ntnu.no>
References: <20050405203434.38638.qmail@web50204.mail.yahoo.com> <20050406095639.GA16810@idi.ntnu.no>
Message-ID: <4253BB73.5000605@ucsd.edu>

Magnus Lie Hetland wrote:
> So -- at *least* adding the array protocol to it would be doing it a
> favour, i.e., making it a useful module, and sort of a prototypical
> example of the protocol and such. Adding more dimensions might simply
> make it more useful. (I've many times been asked by people how to
> create e.g. two-dimensional arrays in Python. It would be nice if
> there was actually some basic support for it.)

Re-implementing the stdlib-array module to support multiple dimensions is almost certainly a non-starter. You can't easily do it without breaking its pre-allocation strategy. It preallocates memory for elements using the same algorithm that lists do, so .append() has reasonable amortized time behaviour. python-dev will not appreciate changing the algorithmic complexity of a long-existing component to accommodate a half-arsed implementation of N-D arrays.

OTOH, it is the one reason for stdlib-array's use in a Numeric world: sometimes, you just need to append values; you can't pre-allocate with Numeric.empty() and index in values. Using stdlib-array to collect the values, then using the buffer interface (soon-to-be __array__ interface) to convert to a Numeric array is faster than the alternatives.

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From sdementen at hotmail.com Wed Apr 6 03:59:35 2005
From: sdementen at hotmail.com (Sébastien de Menten)
Date: Wed Apr 6 03:59:35 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To: <4253BAA1.7010403@sympatico.ca>
Message-ID:

>> 1) attaching a physical unit to array data (see for instance Unum
>> http://home.tiscali.be/be052320/Unum.html )
>> 2) description of axes (see
>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very
>> useful to manipulate easily time series.
>
> Does the record array provide a means of addressing this need?

Not really; when I say axis, I am speaking about indexing. For an array (named a) with shape (10, 5, 33), I would like to attach 3 arrays (or lists or tuples), named axis_information[0], axis_information[1] and axis_information[2], of size (10,), (5,) and (33,), which give meaning to the first, second and third index. For instance,

A[i,j,k] => means the element of A at (axis_information[0][i], axis_information[1][j], axis_information[2][k])

instead of

A[i,j,k] => means the element of A at index position [i,j,k]

which makes less sense (you always need to track the meaning of i,j,k in parallel).

>> 3) masked arrays as in the MA module of Numeric

Maybe this one could be implemented using a record array with a record like (data, mask). However, it would be cumbersome to use. E.g.

a.field("data")[:] = cos( a.field("data")[:] )

instead of

a[:] = cos(a[:])

with the current MA module.

>> 4) arrays for interval arithmetic, where one keeps another array with
>> the precision of the data
>> 5) record arrays (currently being integrated in scipy.base as a base type)
>
> Yes, and there is numarray's array of objects.

This is overkill as it eats way too much memory. E.g. your data represents instantaneous speeds and so is tagged with "m/s" information (a complex object) valid for the full array. Distributing this information to each component of an array via an object array is not practical.

From mdehoon at ims.u-tokyo.ac.jp Wed Apr 6 04:22:52 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Apr 6 04:22:52 2005
Subject: [Numpy-discussion] Numeric3 - a Windows Problem
In-Reply-To: <4253B691.5030902@sympatico.ca>
References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca>
Message-ID: <4253C73E.4030703@ims.u-tokyo.ac.jp>

Colin J. Williams wrote:
> Thanks to Michiel and Travis for their suggestions. I am using Windows
> XP and get the following result:
>
> C:\Python24\Lib\site-packages\Numeric3\Download>python setup.py
> config --compiler=minw32
> running config
> error: don't know how to compile C/C++ code on platform 'nt' with
> 'minw32' compiler
>
> C:\Python24\Lib\site-packages\Numeric3\Download>
>
> I would welcome any comments.

--mingw32 contains a 'g'. Also, make sure you have Cygwin installed, with all the necessary packages.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From steve at shrogers.com Wed Apr 6 05:12:39 2005
From: steve at shrogers.com (Steven H. Rogers)
Date: Wed Apr 6 05:12:39 2005
Subject: [Numpy-discussion] Numeric 24.0
In-Reply-To: <425372A4.7020900@ee.byu.edu>
References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu>
Message-ID: <4253D1B9.90709@shrogers.com>

Travis Oliphant wrote:
>
> Again, scipy.base should *replace* Numerical Python for all users
> (except the most adamant who don't seem to want to go with the rest of
> the community). scipy.base is a new version of Numeric. On the
> C-level I don't know of any incompatibilities, on the Python level
> there are a very few (most of them rarely-used typecode character issues
> which a simple search and replace will fix).
>
> I should emphasize this next point, since I don't seem to be coming
> across very clearly to some people. As head Numeric developer, I'm
> stating that **Numeric 24 is the last release that will be called
> Numeric**. New releases of Numeric will be called scipy.base.

I'm happy with the direction you're taking to rejoin Numeric and Numarray. However, changing the name from Numeric to scipy.base may contribute to the confusion/concern. Is it really necessary?

Steve

--
Steven H. Rogers, Ph.D., steve at shrogers.com
Weblog: http://shrogers.com/weblog
"Reach low orbit and you're half way to anywhere in the Solar System."
-- Robert A. Heinlein

From konrad.hinsen at laposte.net Wed Apr 6 07:49:45 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Apr 6 07:49:45 2005
Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
In-Reply-To:
References:
Message-ID: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net>

On Apr 6, 2005, at 12:10, Sébastien de Menten wrote:

> However, I disagree with the "pretty straightforward to implement". In
> fact, if one wants to inherit most of the functionalities of Numeric,
> it becomes quite cumbersome. Looking at the MA module, I see that it
> needs to:

It is straightforward AND cumbersome. Lots of work, but nothing difficult. I agree of course that it would be nice to improve the situation.

> An embryo of an idea would be to add hooks in the machinery to allow an
> object to interact with a ufunc. Currently, this is done by calling
> __array__ to extract a "naked array" (== Numeric.array vs "augmented
> array") but the result is then always a "naked array".
> In pseudocode, this looks like:
>
> def ufunc( augmented_array ):
>     if not isarray(augmented_array):
>         augmented_array = augmented_array.__array__()
>     return ufunc.apply(augmented_array)

The current behaviour of Numeric is more like

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    elif is_array_like(object):
        return array_ufunc(array(object))
    else:
        return object.ufunc()

A more general version, which should cover your case as well, would be:

def ufunc(object):
    if isarray(object):
        return array_ufunc(object)
    else:
        try:
            return object.applyUfunc(ufunc)
        except AttributeError:
            if is_array_like(object):
                return array_ufunc(array(object))
            else:
                raise ValueError

There are two advantages:

1) Classes can handle ufuncs in any way they like, even if they implement
   array-like objects.
2) Classes must implement only one method, not one per ufunc.
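For instance, an "augmented array" class would need just one hook (a sketch only -- "UnitArray" is a made-up name, and a real version would also have to transform the unit for ufuncs like sqrt):

    import Numeric

    class UnitArray:
        # Hypothetical augmented array: raw data plus a physical unit tag.
        def __init__(self, data, unit):
            self.data = Numeric.asarray(data)
            self.unit = unit
        def applyUfunc(self, ufunc):
            # The single hook: apply the ufunc to the naked array,
            # then rewrap the result with the same metadata.
            return UnitArray(ufunc(self.data), self.unit)

Under the dispatch sketched above, Numeric.cos(UnitArray([0.0, 1.0], 'rad')) would then come back as a UnitArray instead of losing its unit.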
Compared to the approach that you suggested: > where I would prefer something like > > def ufunc( augmented_array ): > if not isarray(augmented_array): > augmented_array, constructor = > augmented_array.__array_constructor__() > else: > constructor = lambda x:x > return constructor(ufunc.apply(augmented_array)) mine has the advantage of also covering classes that are not array-like at all. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Wed Apr 6 08:16:33 2005 From: cjw at sympatico.ca (cjw at sympatico.ca) Date: Wed Apr 6 08:16:33 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <4253FCD1.2090808@sympatico.ca> Sébastien de Menten wrote: >>> 1) attaching a physical unit to array data (see for instance Unum >>> http://home.tiscali.be/be052320/Unum.html ) >>> 2) description of axis (see >>> http://sourceforge.net/mailarchive/message.php?msg_id=11051806). >>> Very useful to manipulate easily time series. >> >> >> Does the record array provide a means of addressing this need? >> > > Not really; when I say axis, I am speaking about indexing. Fair enough, I was thinking one dimensionally. > For an array (named a) with shape (10, 5, 33), I would like to attach > three arrays, lists, or tuples (named axis_information[0], > axis_information[1] and axis_information[2]) of size (10,), (5,) and > (33,) which give meaning to the first, second and third index. > For instance, > a[i,j,k] => means the element of a at (axis_information[0][i], > axis_information[1][j], axis_information[2][k]) > instead of > a[i,j,k] => means the element of a at index position [i,j,k] which > makes less sense (you always need to track the meaning of i,j,k in > parallel). > >>> 3) masked arrays as in MA module of Numeric >> > > Maybe this one could be implemented using a record array with a record > like (data, mask). > However, it would be cumbersome to use. > E.g. a.field("data")[:] = cos( a.field("data")[:] ) > instead of > a[:] = cos(a[:]) > with the current MA module Assuming "data" is the name of a field in a record array "a", why not have a.data to represent a view (or copy, depending on the convention adopted) of a column in a, or a.data.Cos to provide the cosines of the values in the data column? "Cos" is used in place of "cos" to distinguish the method from the function. The former requires no parentheses. This assumes that the values in data are of the appropriate numeric type (with its appropriate typecode). Colin W. > > >>> 4) arrays for interval arithmetic where one keeps another array with >>> precision of data >>> 5) record arrays (currently being integrated in scipy.base as a base >>> type) >>> >> Yes, and there is numarray's array of objects. >> > > This is overkill, as it eats way too much memory. > E.g. your data represents instantaneous speeds and so it is tagged with > "m/s" information (a complex object) valid for the full array. > Distributing this information to each component of an array via an > object array is not practical.
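To make the axis_information idea quoted above concrete, here is a minimal sketch (the LabelledArray class and by_label method are invented for illustration; only the one-label-sequence-per-axis layout comes from the message):

    import Numeric

    class LabelledArray:
        # An array plus one sequence of labels per axis.
        def __init__(self, data, axis_information):
            self.data = Numeric.array(data)
            self.axis_information = axis_information
        def by_label(self, *labels):
            # Translate each axis label to its integer position,
            # then do ordinary integer indexing.
            index = tuple([list(axis).index(label) for axis, label
                           in zip(self.axis_information, labels)])
            return self.data[index]

    # A (2, 3) array indexed by date along axis 0 and station along axis 1.
    a = LabelledArray([[1, 2, 3], [4, 5, 6]],
                      (("2005-04-05", "2005-04-06"),
                       ("obs1", "obs2", "obs3")))
    print a.by_label("2005-04-06", "obs2")   # -> 5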
> From sdementen at hotmail.com Wed Apr 6 08:52:05 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 08:52:05 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: >> >>Maybe this one could be implemented using a record array with a record like >>(data, mask). However, it would be cumbersome to use. E.g. >>a.field("data")[:] = cos( a.field("data")[:] ) instead of a[:] = cos(a[:]) >>with the current MA module > >Assuming "data" is the name of a field in a record array "a", why not have >a.data to represent a view (or copy, depending on the convention adopted) >of a column in a, or a.data.Cos to provide the cosines of the values in the >data column? > >"Cos" is used in place of "cos" to distinguish the method from the >function. The former requires no parentheses. > Well, I think the whole point is to be able to use "without changes" any library that manipulates arrays with "augmented arrays": the same code for all arrays, independently of whether they are "naked" or "augmented". The "without changes" and "any library" should be taken with a pinch of salt, as operations that are accepted for any array will not necessarily mean something for some "augmented arrays". On a side note, I rather prefer to keep mathematical notation instead of OO notation (cos as function vs. method). From sdementen at hotmail.com Wed Apr 6 09:07:07 2005 From: sdementen at hotmail.com (Sébastien de Menten) Date: Wed Apr 6 09:07:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: > >>However, I disagree with the "pretty straightforward to implement". In >>fact, if one wants to inherit most of the functionalities of Numeric, it >>becomes quite cumbersome. Looking at MA module, I see that it needs to: > >It is straightforward AND cumbersome. Lots of work, but nothing difficult. >I agree of course that it would be nice to improve the situation. My fault, I misunderstood your answer (... but it was a little bit misleading :-) >The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_ufunc(array(object)) > else: > return object.ufunc() > >A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_ufunc(array(object)) > else: > raise ValueError > >There are two advantages: > >1) Classes can handle ufuncs in any way they like, even if they implement > array-like objects. >2) Classes must implement only one method, not one per ufunc. > >Compared to the approach that you suggested: > >>where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, constructor = >>augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > >mine has the advantage of also covering classes that are not array-like at >all. > Yes!! That's an elegant solution for the ufunc part. Do you think it is possible to integrate a similar mechanism in array functions (like searchsorted, argmax, ...)?
If we can register functions taking one array as argument within scipy.base and let it dispatch those functions as ufuncs, we could use a similar strategy. For instance, let "sort" and "argmax" be registered as gfuncs (general functions on an array <> ufuncs); then any class that would like to override any of them could do it too, with the same trick Konrad exposed above. If another function uses those gfuncs and ufuncs, it inherits the genericity of the latter. Konrad, do you think it is tricky to have a prototype of your suggestion (i.e. the modification does not need a full understanding of Numeric and you can locate it approximately in the source code)? Seb >Konrad. >-- From mike_lists at yahoo.com.au Wed Apr 6 10:12:39 2005 From: mike_lists at yahoo.com.au (Michael Sorich) Date: Wed Apr 6 10:12:39 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: 6667 Message-ID: <20050406171008.58480.qmail@web53602.mail.yahoo.com> I think that this is a great idea! While I have a strong preference for python, I generally use R for statistical analyses due to the large number of mature libraries available. There are also some aspects of the R data types (eg data-frames and column/row names for 2D arrays) that are really nice for spreadsheet like data. I hope that scipy.base record arrays will be as easily manipulated as data-frames are. While RPy works well for small simple problems, there are data conversion limitations between R and Python. If one could efficiently convert between the major R data types and python scipy.base data types without loss of data, it would become possible to do most of the data manipulation in python and freely mix in R functions when required. This may encourage the use of python for the development of statistical routines. From my meager understanding of RPy: R vectors are converted to python lists. It may make more sense to convert them to an array (either stdlib or scipy.base version) - without copying data if possible. R arrays and matrices are converted to Numeric arrays. E.g. In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) Out[8]: array([[1, 3, 5], [2, 4, 6]]) However, column and row names (or dimnames for arrays with >2 dimensions) are lost in R->Py conversion. I do not know whether these conversions require copying of the data. R data-frames are currently converted to python dictionaries and I don't think that there is any simple way to convert a python object to an R data frame. This is the biggest limitation of rpy in my opinion. In [16]: r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) Out[16]: {'col2': ['one', 'two', 'three', 'four'], 'col1': [1, 2, 3, 4]} If it were possible to convert between an R data-frame and a scipy.base record array without copying or losing data, RPy would become more useful. I wish I understood C, scipy.base and R well enough to give this a go. However, this is way over my head! Mike --- Magnus Lie Hetland wrote: > I was just thinking about some experimental designs, > and whether I > could, perhaps, do the statistics in Python. I > remembered having used > RPy [1] briefly at some time (there may be other > similar bindings out > there -- I don't remember) and started thinking > about whether I could, > perhaps, combine it with numpy in some way.
My first > thought was to > reimplement the relevant statistical functions; then > I thought about > how to convert data back and forth -- but then it > occurred to me that > R also uses arrays extensively, and that it could, > perhaps, be > possible to expose those (through something like > RPy) through the > array interface/protocol! > > This would be (IMO) a good example of the benefits > of the array > protocol; it's not a matter of "getting yet another > array module". RPy > is an external library/language with *lots* of > features that might be > useful to numpy users, many of which aren't likely > to be implemented > in Python for quite a while, I'd guess (unless, > perhaps, someone > writes a translator from R, which I'm sure is > doable). > > I don't know enough (at least yet ;) about the > implementation of RPy > and the R library to say for sure whether this would > even be possible, > but it does seem like it could be really useful... > > [1] rpy.sf.net > > -- > Magnus Lie Hetland Fall seven > times, stand up eight > http://hetland.org > [Japanese proverb] From bsouthey at gmail.com Wed Apr 6 11:38:37 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 6 11:38:37 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: Hi, I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object-oriented language) use a lot of meta-data. Then there is the IEEE stuff (NaN etc.) that would also need to be handled in numerical python. You probably could get RPy or RSPython to use numerical python rather than just basic Python. What statistical functions would you want in numerical python? Regards Bruce On Apr 6, 2005 12:10 PM, Michael Sorich wrote: > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. > [...] From oliphant at ee.byu.edu Wed Apr 6 12:28:50 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:28:50 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537C6D.8040900@ims.u-tokyo.ac.jp> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537C6D.8040900@ims.u-tokyo.ac.jp> Message-ID: <425437E2.4090000@ee.byu.edu> Michiel Jan Laurens de Hoon wrote: > Travis Oliphant wrote: > >> Again, scipy.base should *replace* Numerical Python for all users > > > Sorry, I give up. I have been very happy with Numerical Python so far > and the new Numerical Python just looks too much like SciPy to me. > It's even called scipy.base. In practical terms, what I've noticed is > that what used to work with Numerical Python no longer works with > Numeric3. For example: It's apparent you have negative pre-conceptions about scipy (even though scipy has always just built on top of Numeric so I'm not sure what your difficulties have been). This is unfortunate. scipy.base is going to be a lot more like Numeric than scipy was. So, I think you can relax. > > >>> from ndarray import * > >>> argmax > Traceback (most recent call last): > File "<stdin>", line 1, in ? > NameError: name 'argmax' is not defined This is only because the conversion hasn't completely taken place (I'm not importing the numeric.py module in __init__ yet because it hasn't been adjusted). Remember ndarray is just a place-holder while development happens, so of course quite a few things aren't there yet. I've been swamped so far. from ndarray import * won't even be the name to use. The package won't be called ndarray. This is all just for temporary development purposes. All of what you believe should work will still continue to work. So, relax..... > >>> > > From what I understand from the discussion, "from Numeric import *" > will still work, but it will be deprecated, which means that I will > have to change my code at some point. Not to mention the other > packages (LinearAlgebra, RandomArray, etc.). It's just too much trouble. Deprecated means new documentation won't teach that approach, that's pretty much it. The approach will still be supported for quite a while so people can switch when and if they want. I don't see "the trouble" at all. > Anyway, I am about to change jobs (I will be moving to Columbia > University soon), so I have decided to take some time off the > Numerical Python project and see where we stand in a few months time. > Hopefully, the situation will have cleared up by then. Sounds like an exciting move. Perhaps I can meet you in person if I'm in New York or if you are ever in Utah. I sincerely hope you will find the new scipy.base to your liking. I can promise you that your concerns are near the top of my list. It's too bad you can't help us get there more quickly.
-Travis From oliphant at ee.byu.edu Wed Apr 6 12:41:31 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 12:41:31 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: Message-ID: <42543B1B.3090209@ee.byu.edu> Sébastien de Menten wrote: > > Hi Travis, > > Could you look at bug > [ 635104 ] segfault unpickling Numeric 'O' array > [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of > previous one) > > I proposed a (rather simple) solution that I put in the comment of bug > [ 635104 ]. But apparently, nobody is looking at those bugs... One thing I don't like about the sourceforge bug tracker is that I don't get any email notification of bugs. Is there an option for that? I check my email far more often than I check a website. Sourceforge can be quite slow to manipulate around in. Now that you've mentioned it, I'll look into it. I'm not sure that object arrays could ever be pickled correctly. -Travis > >> >> I'd like to release a Numeric 24.0 to get the array interface out >> there. There are also some other bug fixes in Numeric 24.0 >> >> Here is the list so far from Numeric 23.7 >> >> [Greenfield] Changed so a[0,0] and a[0][0] returns same type when a >> is 2-d of Int16 > > This is quite disturbing. In fact, for all types that are not exactly equivalent to a python type, indexing a multidimensional array (rank > 1) returns arrays even if the final shape is (). So, what should it do? This is the crux of a long-standing wart in Numerical Python that nobody has had a good solution to (I think the array scalars that have been introduced for scipy.base are the best solution yet). Right now, the point is that different things are done for different indexing strategies. Is this a good thing? Maybe it is. We can certainly leave it the way it is now and back out the change. The current behavior is: Subscripting always produces a rank-0 array if the type doesn't match a basic Python type. Item getting always produces a basic Python type (even if there is no match). So a[0,0] and a[0][0] will return different things if a is an array of shorts, for example. This may be what we live with and just call it a "feature" > So > type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'> > type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'> > type(zeros((5,2), Float32 )[0,0]) => <type 'array'> > But > type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'> > type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'> > type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'> > type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'> > > Notice too the weird difference between Int <> Int32 and Float == > Float64. This has been in Numeric for a long time (the coercion problem was one of the big reasons for it). If you return a Python integer when indexing an Int8 array and then use that for multiplication, you get undesired up-casting. There is no scalar Int8 type to return (thus a 0-dimensional array that can act like a scalar is returned). In scipy.base there are now scalar-like objects for all of the supported array types, which is one solution to this problem that was made possible by the ability to inherit in C that is now part of Python. What platform are you on? Notice that Int is interpreted as C-long (PyArray_LONG) while Int32 is PyArray_INT. This has been another wart in Numerical Python. By the way, I've fixed PyArray_Return so that if sizeof(long)==sizeof(int) then PyArray_INT also returns a Python integer. I think for places where sizeof(long)==sizeof(int) PyArray_LONG and PyArray_INT should be treated identically.
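To illustrate the dichotomy in a sketch (following the rules stated above -- rank-0 array from full subscripting, Python scalar from repeated item getting; the exact typecodes may vary with Numeric version and platform):

    from Numeric import zeros, Int8

    a = zeros((5, 2, 4), Int8)

    x = a[0, 0, 0]   # subscripting: rank-0 Int8 array (no Int8 scalar exists)
    y = a[0][0][0]   # item getting: plain Python int

    # The rank-0 result is what avoids the up-casting mentioned above:
    print (a * x).typecode()   # rank-0 Int8 operand: result stays Int8
    print (a * y).typecode()   # Python int operand: result is up-cast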
> However, when indexing a one-dimensional array (rank == 1), then we get back scalars for indexing operations on all types. > So, when you say "return the same type", do you think scalar or array (it smells like a recent discussion on Numeric3 ...)? I just think the behavior ought to be the same for a[0,0] or a[0][0], but maybe I'm wrong and we should keep the dichotomy to satisfy both groups of people. Because of the problems I alluded to, sometimes a 0-dimensional array should be returned. -Travis From tchur at optushome.com.au Wed Apr 6 14:00:52 2005 From: tchur at optushome.com.au (Tim Churches) Date: Wed Apr 6 14:00:52 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <42544D54.7040507@optushome.com.au> Michael Sorich wrote: > While RPy works well for small simple problems, there > are data conversion limitations between R and Python. > If one could efficiently convert between the major R > data types and python scipy.base data types without > loss of data, it would become possible to do most of > the data manipulation in python and freely mix in R > functions when required. This may encourage the use of > python for the development of statistical routines. That's exactly what we do in our project (http://www.netepi.org) which uses NumPy, RPy and R. The Python<->R interface provided by RPy has a few wrinkles but overall is remarkably seamless and remarkably robust. > From my meager understanding of RPy: > > R vectors are converted to python lists. It may make > more sense to convert them to an array (either stdlib > or scipy.base version) - without copying data if > possible. RPy directly converts (by copying) NumPy arrays to R arrays and vice versa. C code is used to do this and it is quite fast. No Python lists are involved. You do need to have NumPy installed (including its header files) when you compile RPy for this to work - otherwise RPy *does* convert R arrays to Python lists. > R arrays and matrices are converted to Numeric arrays. > E.g. > > In [8]: r.array([1,2,3,4,5,6],dim=[2,3]) > Out[8]: > array([[1, 3, 5], > [2, 4, 6]]) > > However, column and row names (or dimnames for arrays > with >2 dimensions) are lost in R->Py conversion. I do > not know whether these conversions require copying of > the data. > > R data-frames are currently converted to python > dictionaries and I don't think that there is any > simple way to convert a python object to an R data > frame. This is the biggest limitation of rpy in my > opinion. > > In [16]: > r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four']) > Out[16]: {'col2': ['one', 'two', 'three', 'four'], > 'col1': [1, 2, 3, 4]} > > If it were possible to convert between an R data-frame > and a scipy.base record array without copying or > losing data, RPy would become more useful. > > I wish I understood C, scipy.base and R well enough to > give this a go. However, this is way over my head! You can extend the conversion routines of RPy (in either direction) using a very simple interface, using just Python and R. No knowledge of C is necessary. For example, if you want to convert an R data.frame into a custom class which you have written in Python, it is quite easy to add that to RPy. There is an example for doing this with data.frames given in the RPy documentation. (More comments below).
> --- Magnus Lie Hetland wrote: > >>I was just thinking about some experimental designs, >>and whether I >>could, perhaps, do the statistics in Python. I >>remembered having used >>RPy [1] briefly at some time (there may be other >>similar bindings out >>there -- I don't remember) There is also RSPython, which allows Python to be called from R as well as R to be called from Python. However, it is far more experimental than RPy, and much harder to build and rather less robust, but more ambitious in its scope. RPy only allows calling of R functions (almost everything is done via functions in R) from Python, although as noted above it has good facilities for converting R objects back into Python objects, and also allows R objects to be returned to Python as native, unconverted R objects - so you can store native R objects in a Python list or dictionary if you wish. You can't see inside those native R objects with Python, but you can use them as arguments to R functions called via RPy. However, the default action in RPy is to do its best to convert R objects into Python data structures when R functions called via RPy return. That conversion is easily customisable as noted above. >> and started thinking >>about whether I could, >>perhaps, combine it with numpy in some way. My first >>thought was to >>reimplement the relevant statistical functions; then >>I thought about >>how to convert data back and forth -- but then it >>occurred to me that >>R also uses arrays extensively, and that it could, >>perhaps, be >>possible to expose those (through something like >>RPy) through the >>array interface/protocol! It seems that the new NumPy array interface could indeed be used to allow Python and R to share the same array data, rather than making copies as happens at present (albeit very quickly). >>This would be (IMO) a good example of the benefits >>of the array >>protocol; it's not a matter of "getting yet another >>array module". RPy >>is an external library/language with *lots* of >>features that might be >>useful to numpy users, many of which aren't likely >>to be implemented >>in Python for quite a while, I'd guess (unless, >>perhaps, someone >>writes a translator from R, which I'm sure is >>doable). R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that's a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it. That said, there is no reason not to port R libraries or specific R functions to NumPy where that provides performance gains, or where the data are large and already handled in NumPy. Our approach in NetEpi (http://www.netepi.org) is to do the data selection and reduction (usually summarisation) in NumPy (where we store data on disc as memory-mapped NumPy arrays) and then pass the much smaller summarised results to R for plotting or fitting complex statistical models. However, we do calculation of elementary statistics (means, quantiles and other measures of location, variance etc) in NumPy wherever possible to avoid copying large amounts of data to R via RPy. >>I don't know enough (at least yet ;) about the >>implementation of RPy >>and the R library to say for sure whether this would >>even be possible, >>but it does seem like it could be really useful... 
>> >>[1] rpy.sf.net I have copied this message to the RPy list - hopefully some fruitful discussion can ensue. Tim C From gregory.r.warnes at pfizer.com Wed Apr 6 14:02:05 2005 From: gregory.r.warnes at pfizer.com (Warnes, Gregory R) Date: Wed Apr 6 14:02:05 2005 Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example application of the array interface] Message-ID: <915D2D65A9986440A277AC5C98AA466F978DC2@groamrexm02.amer.pfizer.com> Hi All, It is possible to establish conversion functions so that R data-frame, list, and vector objects are better translated into python equivalents. I've made several aborted stabs at this, but my time has been extremely limited. The basic task is to create a functionally equivalent python class. [The tricky bit here is that R list and vector objects have both order and names. It is possible to emulate this in python by creating a base object that maintains a dictionary of names alongside the vector/matrix data.] See the example in the RPy documentation at http://rpy.sourceforge.net/rpy/doc/manual_html/DataFrame-class.html#DataFrame%20class. This shouldn't be very hard if someone can dedicate a bit of time to it. -Greg (Current RPy maintainer) > -----Original Message----- > From: rpy-list-admin at lists.sourceforge.net > [mailto:rpy-list-admin at lists.sourceforge.net]On Behalf Of Tim Churches > Sent: Wednesday, April 06, 2005 4:22 PM > To: rpy-list at lists.sourceforge.net > Subject: [Rpy] [Fwd: Re: [Numpy-discussion] Possible example > application of the array interface] > > > The following discussion occurred on the Numeric Python mailing list. > Others may wish to join the conversation. > > Tim C > > -------- Original Message -------- > Subject: Re: [Numpy-discussion] Possible example application of the > array interface > Date: Thu, 7 Apr 2005 03:10:08 +1000 (EST) > From: Michael Sorich > To: numpy-discussion at lists.sourceforge.net > > I think that this is a great idea! While I have a > strong preference for python, I generally use R for > statistical analyses due to the large number of mature > libraries available. > [...] From cookedm at physics.mcmaster.ca Wed Apr 6 14:04:36 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:04:36 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42543B1B.3090209@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 13:40:11 -0600") References: <42543B1B.3090209@ee.byu.edu> Message-ID: Travis Oliphant writes: > Sébastien de Menten wrote: > >> >> Hi Travis, >> >> Could you look at bug >> [ 635104 ] segfault unpickling Numeric 'O' array >> [ 567796 ] unpickling of 'O' arrays causes segfault (duplicate of >> previous one) >> >> I proposed a (rather simple) solution that I put in the comment of >> bug [ 635104 ]. But apparently, nobody is looking at those bugs... > > > One thing I don't like about the sourceforge bug tracker is that I don't > get any email notification of bugs. Is there an option for that? I > check my email far more often than I check a website. Sourceforge > can be quite slow to manipulate around in. I think if the bug is assigned to you, you get email. > >> So >> type(zeros((5,2,4), Int8 )[0,0,0]) => <type 'array'> >> type(zeros((5,2,4), Int32 )[0,0,0]) => <type 'array'> >> type(zeros((5,2), Float32 )[0,0]) => <type 'array'> >> But >> type(zeros((5,2,4), Int )[0,0,0]) => <type 'int'> >> type(zeros((5,2,4), Float64)[0,0,0]) => <type 'float'> >> type(zeros((5,2,4), Float)[0,0,0]) => <type 'float'> >> type(zeros((5,2,4), PyObject)[0,0,0]) => <type 'int'> >> >> Notice too the weird difference between Int <> Int32 and Float == >> Float64. > > By the way, I've fixed PyArray_Return so that if > sizeof(long)==sizeof(int) then PyArray_INT also returns a Python > integer. I think for places where sizeof(long)==sizeof(int) > PyArray_LONG and PyArray_INT should be treated identically. I don't think this is good -- it's just papering over the problem. It leads to different behaviour on machines where sizeof(long) != sizeof(int) (specifically, the problem reported by Nils Wagner *won't* be fixed by this on my machine). On some machines x[0] will give you an int (where x is an array of Int32), on others an array: not fun. I see you already beat me to changing PyArray_PyIntAsInt to support rank-0 integer arrays. How about changing that to instead accept anything that int() can handle (using PyNumber_AsInt)? This would include anything int-like (rank-0 integer arrays, scipy.base array scalars, etc.). The side-effect is that you can index using floats (since int() of a float truncates it towards 0). If this is a big deal, I can special-case floats to raise an error. This would make (almost) all Numeric behaviour consistent with regards to using Python ints, Python longs, rank-0 integer arrays, and other int-like objects. >> However, when indexing a one-dimensional array (rank == 1), then we >> get back scalars for indexing operations on all types. >> >> So, when you say "return the same type", do you think scalar or >> array (it smells like a recent discussion on Numeric3 ...)? > > I just think the behavior ought to be the same for a[0,0] or a[0][0], > but maybe I'm wrong and we should keep the dichotomy to satisfy both > groups of people. Because of the problems I alluded to, sometimes a > 0-dimensional array should be returned. I'd prefer having a[0,0] and a[0][0] return the same thing: it's not the special case of how to do two indices; it's the special-casing of rank-1 arrays as compared to rank-n arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 6 14:42:38 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke) Date: Wed Apr 6 14:42:38 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric Message-ID: I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it). Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS. I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Wed Apr 6 15:05:47 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:05:47 2005 Subject: [Numpy-discussion] Re: Array Metadata In-Reply-To: <200504011146.44549.faltet@carabos.com> References: <20050401041204.18335.qmail@web50208.mail.yahoo.com> <200504011146.44549.faltet@carabos.com> Message-ID: <00c3ccc871b2107c78efa7cb3758fe8c@stsci.edu> Coming in very late... On Apr 1, 2005, at 4:46 AM, Francesc Altet wrote: > I'm very much with the opinions of Scott. Just some remarks. > > A Divendres 01 Abril 2005 06:12, Scott Gilbert va escriure: >>> I also think that rather than attach < or > to the start of the >>> string it would be easier to have another protocol for endianness. >>> Perhaps something like: >>> >>> __array_endian__ (optional Python integer with the value 1 in it). >>> If it is not 1, then a byteswap must be necessary. >> >> A limitation of this approach is that it can't adequately represent >> struct/record arrays where some fields are big endian and others are >> little >> endian. > > Having a mix of different endianness data values in the same data > record would be a bit ill-minded. In fact, numarray does not support > this: a recarray should be all little or big endian. I think that '<' > and '>' would be more than enough to represent this. > Nothing intrinsically prevents numarray from allowing this for records, but I'd agree that I have a hard time understanding when a given record array would have mixed endianness. >>> So, what if we proposed for the Python core not something like >>> Numeric3 (which would still exist in scipy.base and be everybody's >>> favorite array :-) ), but a very minimal array object (scaled back >>> even from Numeric) that followed the array protocol and had some >>> C-API associated with it. >>> >>> This minimal array object would support 5 basic types ('bool', >>> 'integer', 'float', 'complex', 'Object'). (Maybe a void type >>> could be defined and a void "scalar" introduced (which would be >>> the bytes object)). These types correspond to scalars already >>> available in Python and so the whole 0-dim array Python scalar >>> arguments could be ignored. >> >> I really like this idea. It could easily be implemented in C or >> Python >> script. Since half its purpose is for documentation, the Python >> script >> implementation might make more sense. > > Yeah, I fully agree with this also. > > I'm not against it, but I wonder if it is the most important thing to do next.
I can imagine that there are many other issues that deserve more attention than this. But I won't tell Travis what to do, obviously. Likewise about working on the current Python array module. Perry From perry at stsci.edu Wed Apr 6 15:09:11 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:11 2005 Subject: [Numpy-discussion] Questions about ufuncs now. In-Reply-To: <4253028D.4090407@ee.byu.edu> References: <4253028D.4090407@ee.byu.edu> Message-ID: <0d2b3dd0b5f97750022b47de6f1fad33@stsci.edu> On Apr 5, 2005, at 5:26 PM, Travis Oliphant wrote: > > The arrayobject for scipy.base seems to be working. Currently the > Numeric3 CVS tree is using the "old-style" ufuncs modified with new > code for the newly added types. It should be quite functional > now for the brave at heart. > > I'm now working on modifying the ufunc object for scipy.base. > > These are the changes I'm working on: > > 1) a thread-specific? context that allows "buffer-size" level > trapping > of errors and retrieving of flags set. Similar to the > decimal.context specification, but it uses the floating point > sticky bits to implement. > > 2) implementation of buffers so that type-conversions (and > byteswapping and alignment if necessary) never create temporaries > larger than the buffer-size (the buffer-size is user settable). > > 3) a reworking of the general N-dimensional loop to use array > iterators with optimizations > applied for contiguous arrays. > > 4) Alteration of coercion rules so that scalars (i.e. rank-0 arrays) > do not dictate coercion rules. > Also, a change so that certain mixed-type operations are computed in > the larger type for both. > > Most of this is pretty straightforward. But, I do have one additional > question. Do the new array scalars count as "non-coercing" scalars > (i.e. like the Python scalars), or do they cause coercion? > > My preference is that ALL scalars (anything that becomes > 0-dimensional arrays internally) cause only "kind-casting" (i.e. int > to float, float to complex, etc.) but not "type-casting" > Seems reasonable. One could argue that since they have their own precision, normal coercion rules should apply, but so long as Python scalar literals don't, having different coercion rules for what look like scalars taken from arrays than for Python scalars is bound to lead to great confusion. So I agree. Perry From perry at stsci.edu Wed Apr 6 15:09:51 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 6 15:09:51 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42537690.5040400@colorado.edu> References: <42531880.3060600@ee.byu.edu> <4253597F.1090501@ims.u-tokyo.ac.jp> <425372A4.7020900@ee.byu.edu> <42537690.5040400@colorado.edu> Message-ID: <7779a4425dd6f32659e9c5f15b48e180@stsci.edu> I'll echo Fernando's comments. On Apr 6, 2005, at 1:41 AM, Fernando Perez wrote: > Travis Oliphant wrote: >> Michiel Jan Laurens de Hoon wrote: > >>> But SciPy has been moving away (e.g. by replacing functions by >>> methods). >> Michiel, you seem to want to create this impression that "SciPy" is >> "moving away." I'm not sure of your motivations. But, since this >> is a public forum, I have to restate emphatically, that "SciPy" is >> not "moving away from Numeric." It is all about bringing together >> the communities. For the 5 years that scipy has been in development, >> it has always been about establishing a library of common routines >> that we could all share. It has built on Numeric from the >> beginning.
Now, there is another "library" of routines that is >> developing around numarray. It is this very real break that I'm >> trying to help fix. I have no other "desire" to "move away" or >> "create a break" or any other such notions that you seem to want to >> spread. > > FWIW, I think you (Travis) have been exceedingly clear in explaining > this process, and in pointing out how this is: > > a) NOT a further split, but rather the EXACT OPPOSITE (numarray users > will have a transition path back into a project which will provide the > best of the old Numeric, along with all the critical enhancements > which Perry, Todd et al. added to numarray). > > b) a way, via the array protocol, to provide third-party low-level > libraries an easy way to, AT THE C LEVEL, interact easily and > efficiently (without unnecessary copies) with numeri* arrays. > > [...] From Chris.Barker at noaa.gov Wed Apr 6 15:37:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:37:05 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <4253C73E.4030703@ims.u-tokyo.ac.jp> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> Message-ID: <42546439.5060301@noaa.gov> Michiel Jan Laurens de Hoon wrote: > Also, make sure you have Cygwin installed, with all the necessary packages. MinGW is NOT Cygwin. You need to have MinGW installed, with all the necessary packages. I don't remember which ones, but I think there is not a single large package that gives you the whole pile. I do remember it being pretty easy for me last time I did it. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cookedm at physics.mcmaster.ca Wed Apr 6 15:44:36 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 15:44:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> (konrad hinsen's message of "Wed, 6 Apr 2005 16:48:30 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: konrad.hinsen at laposte.net writes: > On Apr 6, 2005, at 12:10, Sébastien de Menten wrote: > >> However, I disagree with the "pretty straightforward to >> implement". In fact, if one wants to inherit most of the >> functionalities of Numeric, it becomes quite cumbersome. Looking at >> MA module, I see that it needs to: > > It is straightforward AND cumbersome. Lots of work, but nothing > difficult. I agree of course that it would be nice to improve the > situation. >
>> In pseudocode, this looks like: >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array = augmented_array.__array__() >> return ufunc.apply(augmented_array) > > The current behaviour of Numeric is more like > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > elif is_array_like(object): > return array_func(array(object)) > else: > return object.ufunc() > > A more general version, which should cover your case as well, would be: > > def ufunc(object): > if isarray(object): > return array_ufunc(object) > else: > try: > return object.applyUfunc(ufunc) > except AttributeError: > if is_array_like(object): > return array_func(array(object)) > else: > raise ValueError > > There are two advantages: > > 1) Classes can handle ufuncs in any way they like, even if they > implement > array-like objects. > 2) Classes must implement only one method, not one per ufunc. I like this! It's got namespace goodness all over it (last Python zen line in 'import this': Namespaces are one honking great idea -- let's do more of those!) I'd propose making the special method __ufunc__. > Compared to the approach that you suggested: > >> where I would prefer something like >> >> def ufunc( augmented_array ): >> if not isarray(augmented_array): >> augmented_array, contructor = >> augmented_array.__array_constructor__() >> else: >> constructor = lambda x:x >> return constructor(ufunc.apply(augmented_array)) > > mine has the advantage of also covering classes that are not > array-like at all. ... like your derivative classes, which are very useful. There are two different uses that ufuncs apply to, however. 1) arrays. Here, we want efficient computation of functions applied to lots of elements. That's where the output arguments and special methods (.reduce, .accumulate, and .outer) are useful 2) polymorphic functions. Output arguments aren't useful here. The special methods are useful for binary ufuncs only. For #2, just returning a callable from __ufunc__ would be fine. I'd suggest two levels of an informal ufunc interface corresponding to these two uses. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Chris.Barker at noaa.gov Wed Apr 6 15:49:44 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:49:44 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: Message-ID: <42546709.1050600@noaa.gov> David M. Cooke wrote: > I've always found the Numeric setup.py to be not very user-friendly. > So, I rewrote it. It's available as patch #1178095 > http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 From that file: # If use_system_lapack is false, f2c'd versions of the required routines # will be used, except on Mac OS X, where the vecLib framework will be used # if found. Just to be clear, this does mean that vecLib will be used by default on OS-X? Very nice, setup.py has annoyed me too. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 6 15:51:17 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 15:51:17 2005 Subject: [Numpy-discussion] Questions about the array interface. 
In-Reply-To: References: Message-ID: <42546766.5060802@noaa.gov> Hi all, (but mostly Travis), I've taken a look at: http://numeric.scipy.org/array_interface.html to try and see how I would use this with wxPython. I have a few questions, and a little code I'd like you to look at to see if I understand how this works. Here's a first stab on how I might use this for the wxPython DrawPointsList method. The method takes a sequence of length-2 sequences of numbers, and draws a point at each point described by coordinates in the data: [(x,y), (x2,y2), (x3,y3), ...] (or an Nx2 NumPy array of Ints) Here's what I have:

    def DrawPointList(self, points, pens=None):
        ...
        # some checking code on the pens
        ...
        if (hasattr(points, '__array_shape__') and
            hasattr(points, '__array_typestr__') and
            len(points.__array_shape__) == 2 and
            points.__array_shape__[1] == 2 and
            points.__array_typestr__ == 'i4'):
            # this means we have a compliant array;
            # return the array protocol version
            return self._DrawPointArray(points.__array_data__, pens, [])
            # This needs to be written now!
        else:
            # return the generic python sequence version
            return self._DrawPointList(points, pens, [])

Then we'll need a function (in C++): _DrawPointArray(points.__array_data__, pens, []) that takes a buffer object, and does the drawing. My questions: 1) Is this what you had in mind for how to use this? 2) As __array_strides__ is optional, I'd kind of like to have a __contiguous__ flag that I could just check, rather than checking for the existence of strides, then calculating what the strides should be, then checking them. 3) A number of the attributes are optional, but will always be there with SciPy arrays... (I assume). Have you documented them anywhere? 4) a wxWidgets wxPoint is defined as such: class WXDLLEXPORT wxPoint { public: int x, y; etc. As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)? 5) Why is __array_data__ optional? Isn't that the whole point of this? 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. This way I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right? 7) An alternative to the above: a __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. The most common use, and it would be nice to just check that flag and not have to take all other options into account. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Wed Apr 6 15:53:05 2005 From: efiring at hawaii.edu (Eric Firing) Date: Wed Apr 6 15:53:05 2005 Subject: [Numpy-discussion] masked arrays and NaNs Message-ID: <425467BB.305@hawaii.edu> Travis, I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you! I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab. This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan. Eric From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov> Chris Barker wrote: > there is not a single large package OOPS. There IS a single large package. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > >>David M. Cooke wrote: >> >> >>>I've always found the Numeric setup.py to be not very user-friendly. >>>So, I rewrote it. It's available as patch #1178095 >>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >>> >>>Basically, all the editing you need to do is in customize.py, instead >>>of touching setup.py. No more commenting out files for lapack_lite >>>(just tell it to use the system LAPACK, and tell it where to find it). >>> >>>Also, you could now use GSL's cblas interface for dotblas. Useful if >>>you've already taken the trouble to link that with an optimized >>>Fortran BLAS. >>> >>>I didn't want to just throw this into CVS without feedback first :-) >>>If it looks good, this can go in Numeric 24.0. >>> >>> >>> >>I like the new changes. I also think the setup.py file is unfriendly. >>Put them in... >> >> > >While I'm at it, I'm also thinking of writing a 'cblas_lite' for >dotblas. This would mean that dotblas would be enabled all the time. >You could use a C BLAS if you've got one (from ATLAS, say), or a >Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >would use the existing blas_lite.c if you don't. > > > This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss. -Travis From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu> Chris Barker wrote: > > > David M. Cooke wrote: > >> I've always found the Numeric setup.py to be not very user-friendly. >> So, I rewrote it. It's available as patch #1178095 >> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >> > > > From that file: > > # If use_system_lapack is false, f2c'd versions of the required routines > # will be used, except on Mac OS X, where the vecLib framework will be used > # if found. > > Just to be clear, this does mean that vecLib will be used by default on > OS-X? I haven't tried it yet, but my examination of it suggests that this is so. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu> Chris Barker wrote: > Hi all, (but mostly Travis), > > I've taken a look at: > > http://numeric.scipy.org/array_interface.html > > to try and see how I would use this with wxPython. I have a few > questions, and a little code I'd like you to look at to see if I > understand how this works. Great, fantastic!!! > > Here's a first stab on how I might use this for the wxPython > DrawPointsList method. The method takes a sequence of length-2 > sequences of numbers, and draws a point at each point described by > coordinates in the data: > > [(x,y), (x2,y2), (x3,y3), ...] (or an Nx2 NumPy array of Ints) > > Here's what I have: > > def DrawPointList(self, points, pens=None): > ... > # some checking code on the pens > ... > if (hasattr(points,'__array_shape__') and > hasattr(points,'__array_typestr__') and > len(points.__array_shape__) == 2 and > points.__array_shape__[1] == 2 and > points.__array_typestr__ == 'i4'): > # this means we have a compliant array > # return the array protocol version You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...) > return self._DrawPointArray(points.__array_data__, pens, []) > # This needs to be written now! > else: > # return the generic python sequence version > return self._DrawPointList(points, pens, []) > > Then we'll need a function (in C++): > _DrawPointArray(points.__array_data__, pens, []) > that takes a buffer object, and does the drawing. > > My questions: > > 1) Is this what you had in mind for how to use this? Yes, pretty much. > > 2) As __array_strides__ is optional, I'd kind of like to have a > __contiguous__ flag that I could just check, rather than checking for > the existence of strides, then calculating what the strides should be, > then checking them. I don't want to add too much.
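For reference, the check Chris describes in question 2 might look something like this sketch (the attribute names follow the array-interface document quoted above; the helper function is invented for illustration, and it assumes, as Chris does, that a missing __array_strides__ means a C-contiguous layout):

    def is_c_contiguous(obj, itemsize):
        # If __array_strides__ is absent, the protocol's default is a
        # C-contiguous block, so there is nothing to compute.
        shape = obj.__array_shape__
        strides = getattr(obj, '__array_strides__', None)
        if strides is None:
            return True
        # Otherwise compare against the strides a contiguous C-order
        # array of this shape and itemsize would have.
        expected = [0] * len(shape)
        step = itemsize
        for i in range(len(shape) - 1, -1, -1):
            expected[i] = step
            step = step * shape[i]
        return tuple(strides) == tuple(expected)

For the Nx2 array of 'i4' points above, this amounts to checking that the strides are (8, 4), which is what the proposed __contiguous__ (or __simple__) flag would assert directly.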
This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays. In Physical Oceanography, and I suspect in many other fields, data sets are almost always full of holes. Matlab's ability to use NaN as a bad value flag provides a wonderfully simple and efficient way of dealing with missing or bad data values. A similar ease and transparency would be good in scipy.base. In addition, or as a way of implementing NaN-handling internally, it might be best to have masked arrays incorporated at the C level--with the functionality available by default--rather than bolted on as a pure-python package. I hope that inclusion of __array_mask__ in the protocol means that this is part of the plan. Eric From Chris.Barker at noaa.gov Wed Apr 6 16:00:09 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 16:00:09 2005 Subject: [Numpy-discussion] Numeric3 - a Windows Problem In-Reply-To: <42546439.5060301@noaa.gov> References: <424FE002.6010800@sympatico.ca> <424FE8E7.4040904@ee.byu.edu> <4253B691.5030902@sympatico.ca> <4253C73E.4030703@ims.u-tokyo.ac.jp> <42546439.5060301@noaa.gov> Message-ID: <425469AA.2030703@noaa.gov> Chris Barker wrote: > there is not a single large package OOPS. There IS a single large package. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Wed Apr 6 16:13:08 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:13:08 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: References: <425458F7.9020307@ee.byu.edu> Message-ID: <42546CC7.40408@ee.byu.edu> David M. Cooke wrote: >Travis Oliphant writes: > > > >>David M. Cooke wrote: >> >> >> >>>I've always found the Numeric setup.py to be not very user-friendly. >>>So, I rewrote it. It's available as patch #1178095 >>>http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >>> >>>Basically, all the editing you need to do is in customize.py, instead >>>of touching setup.py. No more commenting out files for lapack_lite >>>(just tell it to use the system LAPACK, and tell it where to find it). >>> >>>Also, you could now use GSL's cblas interface for dotblas. Useful if >>>you've already taken the trouble to link that with an optimized >>>Fortran BLAS. >>> >>>I didn't want to just through this into CVS without feedback first :-) >>>If it looks good, this can go in Numeric 24.0. >>> >>> >>> >>I like the new changes. I also think the setup.py file is unfriendly. >>Put them in... >> >> > >While I'm at it, I'm also thinking of writing a 'cblas_lite' for >dotblas. This would mean that dotblas would be enabled all the time. >You could use a C BLAS if you've got one (from ATLAS, say), or a >Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >would use the existing blas_lite.c if you don't. > > > This is a good idea, but for more than just dotblas. It is the essential problem that must be solved to make scipy.base installable everywhere yet use fast libraries for users who have them without much fuss. 
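A minimal sketch of the kind of build-time fallback logic being discussed follows; the file names, library names, and search paths here are illustrative assumptions, not the contents of the actual patch:

import os

# Illustrative only: probe for an optimized system BLAS and fall back to
# the bundled f2c'd reference sources when none is found.
def choose_blas(library_dirs=('/usr/lib', '/usr/local/lib')):
    """Return (sources, libraries) for building the dot/BLAS extension."""
    for d in library_dirs:
        for lib in ('cblas', 'blas', 'atlas'):
            for ext in ('.so', '.a', '.dylib'):
                if os.path.exists(os.path.join(d, 'lib' + lib + ext)):
                    # a thin shim forwards calls to the real cblas_* routines
                    return ['Src/cblas_shim.c'], [lib]
    # nothing optimized found: compile the portable reference implementation
    return ['Src/blas_lite.c'], []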
-Travis From rkern at ucsd.edu Wed Apr 6 16:28:40 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 16:28:40 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546709.1050600@noaa.gov> References: <42546709.1050600@noaa.gov> Message-ID: <42547060.30204@ucsd.edu> Chris Barker wrote: > > > David M. Cooke wrote: > >> I've always found the Numeric setup.py to be not very user-friendly. >> So, I rewrote it. It's available as patch #1178095 >> http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369 >> > > > From that file: > > # If use_system_lapack is false, f2c'd versions of the required routines > # will be used, except on Mac OS X, where the vecLib framework will be used > # if found. > > Just to be clear, this does mean that vecLib will be used by default on > OS-X? I haven't tried it, yet, but my examination of it suggests that this is so. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 16:59:05 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 16:59:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: <4254778A.1070100@ee.byu.edu> Chris Barker wrote: > Hi all, (but mostly Travis), > > I've taken a look at: > > http://numeric.scipy.org/array_interface.html) > > to try and see how I would use this with wxPython. I have a few > questions, and a little code I'd like you to look at to see if I > understand how this works. Great, fantastic!!! > > Here's a first stab on how I might use this for the wxPython > DrawPointsList method. The method takes a sequence of length-2 > sequences of numbers, and draws a point at each point described by > coordinates in the data: > > [(x,y), (x2,y2), (x3,y3), ...] (or a NX2 NumPy array of Ints) > > Here's what I have: > > def DrawPointList(self, points, pens=None): > ... > # some checking code on the pens) > ... > if (hasattr(points,'__array_shape__') and > hasattr(points,'__array_typestr__') and > len(points.__array_shape__) == 2 and > points.__array_shape__[1] == 2 and > points.__array_typestr__ == 'i4' and > ): # this means we have a compliant array > # return the array protocol version You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally). A more generic interface would handle multiple integer types if possible (but this is a good start...) > return self._DrawPointArray(points.__array_data__, pens,[]) > #This needs to be written now! > else: > #return the generic python sequence version > return self._DrawPointList(points, pens, []) > > Then we'll need a function (in C++): > _DrawPointArray(points.__array_data__, pens,[]) > That takes a buffer object, and does the drawing. > > My questions: > > 1) Is this what you had in mind for how to use this? Yes, pretty much. > > 2) As __array_strides__ is optional, I'd kind of like to have a > __contiguous__ flag that I could just check, rather than checking for > the existence of strides, then calculating what the strides should be, > then checking them. I don't want to add too much. 
The other approach is to establish a set of helper functions in Python to check this sort of thing. Thus, if you can't handle a general array you check:

ndarray.iscontiguous(obj)

where obj exports the array interface. But, it could really go either way. What do others think?

I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

>> 3) A number of the attributes are optional, but will always be there with SciPy arrays... (I assume) have you documented them anywhere?

No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

>> 4) a wxWidgets wxPoint is defined as such:
>>
>> class WXDLLEXPORT wxPoint
>> {
>> public:
>>     int x, y;
>>
>> etc.
>>
>> As wxWidgets is using "int", I'd like to be able to use "int". If I define it as a 4-byte integer, I'm losing platform independence, aren't I? Or can I use something like sizeof(int)?

Ah, yes.. here is where we need some standard Python functions to help establish the array interface. Sometimes you want to match a particular c-type, other times you want to match a particular bit width. So, what do you do? I had considered having an additional interface called ctypestr but decided against it for fear of creep. I think in general we need to have in Python some constants to make this conversion easy, e.g. ndarray.cint (gives 'iX' on the correct platform). For now, I would check:

(__array_typestr__ == 'i%d' % array.array('i',[0]).itemsize)

But, on most platforms these days an int is 4 bytes; the above would just make sure.

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?

Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object. But, really all that is needed is something that exposes the buffer interface: remember the difference between the buffer object and the buffer interface. So, the correct consumer usage for grabbing the data is

data = getattr(obj, '__array_data__', obj)

Then, in C you use the Buffer *Protocol* to get a pointer to memory. For example, the function:

int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

Of course this approach has the 32-bit limit until we get this changed in Python.

>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?

A consumer has to check for most of the optional stuff if they want to support all types of arrays. Again a simple:

getattr(obj, '__array_offset__', 0)

works fine.

>> 7) An alternative to the above: a __simple__ flag, that means the data is a simple, C array of contiguous data of a single type. The most common use, and it would be nice to just check that flag and not have to take all other options into account.

I think if __array_strides__ returns None (and if an object doesn't expose it you can assume it) it is probably good enough.
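As an illustration of the consumer-side checks under discussion, here is a sketch (not part of the interface document) of a helper that decides whether an exporting object is a simple, contiguous, zero-offset C array; it assumes a typestr of the form '<i4' (byte order, kind, item size):

def is_simple(obj):
    # required attributes of the array interface
    shape = obj.__array_shape__
    typestr = obj.__array_typestr__          # e.g. '<i4'
    # optional attributes, with the defaults the protocol specifies
    strides = getattr(obj, '__array_strides__', None)
    offset = getattr(obj, '__array_offset__', 0)
    if offset != 0:
        return False
    if strides is None:                      # None means C-contiguous
        return True
    # otherwise compare against the strides a C-contiguous array would have
    itemsize = int(typestr[2:])
    expected, acc = [], itemsize
    for dim in shape[::-1]:
        expected.insert(0, acc)
        acc = acc * dim
    return tuple(strides) == tuple(expected)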
-Travis

From oliphant at ee.byu.edu Wed Apr 6 17:17:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 17:17:13 2005 Subject: [Numpy-discussion] masked arrays and NaNs In-Reply-To: <425467BB.305@hawaii.edu> References: <425467BB.305@hawaii.edu> Message-ID: <42547B2B.4030700@ee.byu.edu>

Eric Firing wrote:

> Travis,
>
> I am whole-heartedly in favor of your efforts to end the Numeric/numarray split by combining the best of both. I am encouraged by the progress you have made, and by the depth and clarity of the accompanying technical discussions. Thank you!
>
> I am a long-time Matlab user in Physical Oceanography, and I have been trying to find a practical way to phase out Matlab. One key is matplotlib, which is coming along wonderfully. A second is the availability of a Num* (or scipy.base) module that provides the functionality and ease-of-use I presently get from Matlab. This leads to a request which I suspect and hope is consistent with your present plans: efficient handling of NaNs and/or masked arrays.

I think both options will be available. With the new error handling that numarray introduced, NaNs will be allowed if you set the error mode correctly. A version of masked arrays will also be available (either in Python or C).

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 6 17:18:51 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 17:18:51 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: (David M. Cooke's message of "Wed, 06 Apr 2005 17:41:50 -0400") References: Message-ID:

cookedm at physics.mcmaster.ca (David M. Cooke) writes:

> I've always found the Numeric setup.py to be not very user-friendly. So, I rewrote it. It's available as patch #1178095 http://sf.net/tracker/index.php?func=detail&aid=1178095&group_id=1369&atid=301369
>
> Basically, all the editing you need to do is in customize.py, instead of touching setup.py. No more commenting out files for lapack_lite (just tell it to use the system LAPACK, and tell it where to find it).
>
> Also, you could now use GSL's cblas interface for dotblas. Useful if you've already taken the trouble to link that with an optimized Fortran BLAS.
>
> I didn't want to just throw this into CVS without feedback first :-) If it looks good, this can go in Numeric 24.0.

I've checked it in. Highlights:

* You only need to edit customize.py
* You don't need to edit anything if you're on OS X (>= 10.2): the vecLib framework for optimized BLAS and LAPACK will be used if found.
* If you have an incomplete ATLAS library (one without LAPACK), you can use it for BLAS (instead of blas_lite.c), and the included f2c'd routines for LAPACK will be used.
* Use whatever CBLAS interface you've got (ATLAS, GSL, the reference one available from netlib).

There's also an INSTALL file now, although it could use some comments about the 'python setup.py config' option.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant at ee.byu.edu Wed Apr 6 18:14:33 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 18:14:33 2005 Subject: [Numpy-discussion] New array interface helper file Message-ID: <4254890F.6080205@ee.byu.edu>

At http://numeric.scipy.org/array_interface.py you will find the start of a set of helper functions for the array interface that can make it easier to deal with.
It also documents the array interface with docstrings. I tried to attach these to properties, but then I don't know how to "see" them from Python. This is the kind of thing I think should go into Python If anybody would like to try their hand at converter functions to go back and forth between the struct module strings and the __array_descr__ string, make my day. -Travis From cookedm at physics.mcmaster.ca Wed Apr 6 21:41:12 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 6 21:41:12 2005 Subject: [Numpy-discussion] Request for comments on a new setup.py for Numeric In-Reply-To: <42546CC7.40408@ee.byu.edu> (Travis Oliphant's message of "Wed, 06 Apr 2005 17:12:07 -0600") References: <425458F7.9020307@ee.byu.edu> <42546CC7.40408@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: >>While I'm at it, I'm also thinking of writing a 'cblas_lite' for >>dotblas. This would mean that dotblas would be enabled all the time. >>You could use a C BLAS if you've got one (from ATLAS, say), or a >>Fortran BLAS (like the cxml library on an Alpha running Tru64), or it >>would use the existing blas_lite.c if you don't. >> > This is a good idea, but for more than just dotblas. Hmm, like for what? dotblas is the only thing (in Numeric & numarray) that uses the cblas_* functions. Unless you're thinking of using them in more places, like ufuncs? cblas_lite would be thin shims with minimal error-checking, probably not much use outside of dotblas. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From rkern at ucsd.edu Wed Apr 6 21:47:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Apr 6 21:47:30 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254890F.6080205@ee.byu.edu> References: <4254890F.6080205@ee.byu.edu> Message-ID: <4254BB2B.2000406@ucsd.edu> Travis Oliphant wrote: > > At http://numeric.scipy.org/array_interface.py > > you will find the start of a set of helper functions for the array > interface that can make it more easy to deal with. It also documents > the array interface with docstrings. I tried to attach these to > properties, but then I don't know how to "see" them from Python. Get it from the property object on the class itself. E.g. expanded.__array_shape__.__doc__ -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Wed Apr 6 22:13:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 6 22:13:04 2005 Subject: [Numpy-discussion] New array interface helper file In-Reply-To: <4254BB2B.2000406@ucsd.edu> References: <4254890F.6080205@ee.byu.edu> <4254BB2B.2000406@ucsd.edu> Message-ID: <4254C141.9040502@ee.byu.edu> Robert Kern wrote: > Travis Oliphant wrote: > >> >> At http://numeric.scipy.org/array_interface.py >> >> you will find the start of a set of helper functions for the array >> interface that can make it more easy to deal with. It also >> documents the array interface with docstrings. I tried to attach >> these to properties, but then I don't know how to "see" them from >> Python. > > > Get it from the property object on the class itself. > E.g. > > expanded.__array_shape__.__doc__ > Thank you. 
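For readers following along, here is a tiny self-contained demonstration of Robert's tip (the class is made up for illustration): property docstrings live on the property object, which is only reachable through the class, since instance access returns the computed value instead.

# Docstrings attached to properties are reached via the class, not the
# instance; instance access invokes the getter and returns its result.
class Expanded(object):
    def _get_shape(self):
        return (3, 4)
    __array_shape__ = property(_get_shape,
                               doc="Tuple of array dimensions.")

print(Expanded.__array_shape__.__doc__)   # -> Tuple of array dimensions.
print(Expanded().__array_shape__)         # -> (3, 4)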
-Travis

From Chris.Barker at noaa.gov Wed Apr 6 23:36:36 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Apr 6 23:36:36 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254778A.1070100@ee.byu.edu> References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> Message-ID: <4254D4A8.5020007@noaa.gov>

Travis Oliphant wrote:

> You should account for the '<' or '>' that might be present in __array_typestr__ (Numeric won't put it there, but scipy.base and numarray will---since they can have byteswapped arrays internally).

Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.

> A more generic interface would handle multiple integer types if possible

I'd like to support doubles as well...

> (but this is a good start...)

Right. I want to get _something_ working, before I try to make it universal!

> I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.

You're welcome. I like that too.

> No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.

I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

> e.g. ndarray.cint (gives 'iX' on the correct platform).
> For now, I would check (__array_typestr__ == 'i%d' % array.array('i',[0]).itemsize)

I can see that that would work, but it does feel like a hack. Besides, I might be doing this in C++ anyway, so it would probably be easier to use sizeof()

> But, on most platforms these days an int is 4 bytes; the above would just make sure.

Right. Making that assumption will just lead to weird bugs way down the line. Of course, I wouldn't be surprised if wxWidgets and/or Python makes that assumption in other places anyway!

>> 5) Why is __array_data__ optional? Isn't that the whole point of this?
>
> Because the object itself might expose the buffer interface. We could make __array_data__ required and prefer that it return a buffer object.

Couldn't it be required, and return a reference to itself if that works? Maybe I'm just being lazy, but it feels clunky and prone to errors to keep having to check if an attribute exists, then use it (or not).

> So, the correct consumer usage for grabbing the data is
> data = getattr(obj, '__array_data__', obj)

Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)

I'm starting to get this.

> Of course this approach has the 32-bit limit until we get this changed in Python.

That's the least of my worries!

>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
> > A consumer has to check for most of the optional stuff if they want to > support all types of arrays. That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset(, but I have to check all that stuff to make sure that I have a simple array. The simplest arrays are the most common case, they should be as easy as possible to support. > Again a simple: > > getattr(obj, '__array_offset__', 0) > > works fine. not too bad. Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid.... >> 7) An alternative to the above: A __simple_ flag, that means the data >> is a simple, C array of contiguous data of a single type. The most >> common use, and it would be nice to just check that flag and not have >> to take all other options into account. > I think if __array_strides__ returns None (and if an object doesn't > expose it you can assume it) it is probably good enough. That and __array_typestr__ Travis Oliphant wrote: > > At http://numeric.scipy.org/array_interface.py > > you will find the start of a set of helper functions for the array > interface that can make it more easy to deal with. Ah! this may well address my concerns. Good idea. Thanks for all your work on this Travis. By the way, a quote form Robin Dunn about this: "Sweet!" Thought you might appreciate that. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From konrad.hinsen at laposte.net Wed Apr 6 23:55:02 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Apr 6 23:55:02 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> Message-ID: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> On 07.04.2005, at 00:43, David M. Cooke wrote: > I like this! It's got namespace goodness all over it (last Python zen > line in 'import this': Namespaces are one honking great idea -- let's > do more of those!) Sounds like a good principle! > 1) arrays. Here, we want efficient computation of functions applied to > lots of elements. That's where the output arguments and special > methods (.reduce, .accumulate, and .outer) are useful All that is accessible if the class gets passed the ufunc object. > 2) polymorphic functions. Output arguments aren't useful here. The > special methods are useful for binary ufuncs only. Fine, then they just call the ufunc. And the rare cases that need explicit code for each ufunc (my Derivatives, for example) can retrieve the name of the ufunc and dispatch on it. Konrad. 
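A sketch of what Konrad's name-based dispatch could look like if the proposed __ufunc__ hook existed; the hook, the class, and the set of supported functions are all hypothetical:

import math

class Derivative:
    """Carries a value together with its derivative."""
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __ufunc__(self, ufunc):
        # dispatch explicitly on the name of the ufunc
        name = getattr(ufunc, '__name__', str(ufunc))
        if name == 'cos':   # chain rule: (cos f)' = -sin(f) * f'
            return Derivative(math.cos(self.value),
                              -math.sin(self.value) * self.deriv)
        if name == 'sin':   # chain rule: (sin f)' = cos(f) * f'
            return Derivative(math.sin(self.value),
                              math.cos(self.value) * self.deriv)
        raise TypeError("unsupported ufunc: " + name)

d = Derivative(0.0, 1.0)             # x = 0, dx/dx = 1
print(d.__ufunc__(math.cos).deriv)   # -sin(0) * 1 = -0.0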
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Apr 7 00:24:04 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 00:24:04 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: Message-ID: <1986f60349f1d4d146c6ddb727362fd9@laposte.net> On 06.04.2005, at 18:06, S?bastien de Menten wrote: > Do you think it is possible to integrate a similar mechanism in array > functions (like searchsorted, argmax, ...). That is less obvious. A generic interface for ufuncs is possible because of the uniform calling interface. Actually, there should perhaps be two ufunc application methods, for unary and for binary ufuncs. The other array functions each have a peculiar calling pattern. They can certainly be implemented through delegation to a method, but that would be one method per function. But I think that is inevitable if you want full flexibility. > If we can register functions taking one array as argument within > scipy.base and let it dispatch those functions as ufunc, we could use > a similar strategy. > > For instance, let "sort" and "argmax" be registered as gfunc (general > functions on an array <> ufunc), then any class that would like to > overide any of them could do it too with the same trick Konrad exposed > here above. Does that make sense in practice? Suppose you write a class that implements tables, i.e. arrays plus axis labels. You would want sort() to return an object of the same class, but argmax() to return a plain integer. The generic gfunc handler could do little else than dispatch on the name of the gfunc. > Konrad, do you think it is tricky to have a prototype of your > suggestion (i.e. the modification does not need a full understanding > of Numeric and you can locate it approximately in the source code) ? I haven't looked at the Numeric code in ages, but my guess is that the ufunc part should be easy to do, as it is just a modification of a generic handler that already exists. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------ ------- From cookedm at physics.mcmaster.ca Thu Apr 7 00:55:37 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 00:55:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> (Chris Barker's message of "Wed, 06 Apr 2005 23:35:20 -0700") References: <42546766.5060802@noaa.gov> <4254778A.1070100@ee.byu.edu> <4254D4A8.5020007@noaa.gov> Message-ID: "Chris Barker" writes: > Travis Oliphant wrote: > >> You should account for the '<' or '>' that might be present in >> __array_typestr__ (Numeric won't put it there, but scipy.base and >> numarray will---since they can have byteswapped arrays internally). > > Good point, but a pain. Maybe they should be required, that way I > don't have to first check for the presence of '<' or '>', then check > if they have the right value. I'll second this. 
Pulling out more Python Zen: Explicit is better than implicit.

>> A more generic interface would handle multiple integer types if possible
>
> I'd like to support doubles as well...
>
>> (but this is a good start...)
>
> Right. I want to get _something_ working, before I try to make it universal!
>
>> I think one idea here is that if __array_strides__ returns None, then C-style contiguousness is assumed. In fact, I like that idea so much that I just changed the interface. Thanks for the suggestion.
>
> You're welcome. I like that too.
>
>> No, they won't always be there for SciPy arrays (currently 4 of them are). Only record-arrays will provide __array_descr__ for example and __array_offset__ is unnecessary for SciPy arrays. I actually don't much like the __array_offset__ parameter myself, but Scott convinced me that it could be useful for very complicated array classes.
>
> I can see that it would, but then, we're stuck with checking for all these optional attributes. If I don't bother to check for it, one day, someone is going to pass a weird array in with an offset, and a strange bug will show up.

Here's a summary:

Attribute            required by          required
                     array-like object    to be checked
__array_shape__      yes                  yes
__array_typestr__    yes                  yes
__array_descr__      no                   no
__array_data__       no                   yes
__array_strides__    no                   yes
__array_mask__       no                   no?
__array_offset__     no                   yes

I'm assuming in the "required to be checked" column a user of the array that's interested in looking at all of the elements, so we have to consider all possible situations where forgetting to consider an attribute could lead to invalid memory accesses. __array_strides__ and __array_offset__ in particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you should check it, and raise an error if it exists and is not None, unless you can handle missing elements. It's certainly not required that all users of an array object need to understand all array types!

Since we have to check a bunch anyways, I think that's a good enough reason for having them all exist? There are suitable defaults defined in the protocol document (__array_strides__ in particular) that make it easy to add them in simple cases.

>> So, the correct consumer usage for grabbing the data is
>> data = getattr(obj, '__array_data__', obj)
>
> Ah! I hadn't noticed the default parameter to getattr(). That makes it much easier. Is there an equivalent in C? It doesn't look like it to me, but I'm kind of a newbie with the C API.

You'd want something like

adata = PyObject_GetAttrString(array_obj, "__array_data__");
if (!adata) {
    /* error */
    PyErr_Clear();
    adata = array_obj;
}

>> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len)
>
> I'm starting to get this.
>
>> Of course this approach has the 32-bit limit until we get this changed in Python.
>
> That's the least of my worries!
>
>>> 6) Should __array_offset__ be optional? I'd rather it were required, but default to zero. As it is, I have to check for it, then use it. Also, I assume it is an integer number of bytes, is that right?
>>
>> A consumer has to check for most of the optional stuff if they want to support all types of arrays.
>
> That's not quite true. I'm happy to support only the simple types of arrays (contiguous, single type elements, zero offset), but I have to check all that stuff to make sure that I have a simple array.
> The simplest arrays are the most common case, they should be as easy as possible to support.
>
>> Again a simple:
>> getattr(obj, '__array_offset__', 0)
>> works fine.
>
> not too bad.
>
> Also, what if we find the need for another optional attribute later? Any older code won't check for it. Or maybe I'm being paranoid....

This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)

That would also be a good argument for it existing, I think.

Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From magnus at hetland.org Thu Apr 7 01:05:03 2005 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 7 01:05:03 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050406171008.58480.qmail@web53602.mail.yahoo.com> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> Message-ID: <20050407080429.GB20252@idi.ntnu.no>

Bruce Southey :

> Hi,
> I don't see that it is feasible to link R and numerical python in this way. As you point out, R objects (R is an object orientated language) use a lot of meta-data. Then there is the IEEE stuff (NaN etc) that would also need to be handled in numerical python.

Too bad. (I seem to recall seeing something about numpy conversion on the Web pages of RPy, though; perhaps, if one can stand a bit of copying, the two can be used together after all?)

> You probably could get RPy or RSPython to use numerical python rather than just basic Python.
>
> What statistical functions would you want in numerical python?

I think I'd want most of the standard, parametrized probability distributions (as well as automatic estimation from data, perhaps) and a handful of common statistical tests (t-test, z-test, Fisher, chi-squared, what-have-you). Perhaps some support for factorial experiments (not sure if R has anything specific there, though).

And another thing: R seems to have very fancy (although difficult to use) plotting capabilities... Until SciPy catches up (it hasn't yet, has it? ;) that might be a reason for using R(Py) as well, I guess.

-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]

From cookedm at physics.mcmaster.ca Thu Apr 7 01:08:11 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 7 01:08:11 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <2701da761c9f34fc1dc72fc97e87e788@laposte.net> (konrad hinsen's message of "Thu, 7 Apr 2005 08:53:06 +0200") References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID:

konrad.hinsen at laposte.net writes:

> On 07.04.2005, at 00:43, David M. Cooke wrote:
>
>> I like this!
It's got namespace goodness all over it (last Python zen >> line in 'import this': Namespaces are one honking great idea -- let's >> do more of those!) > > Sounds like a good principle! > >> 1) arrays. Here, we want efficient computation of functions applied to >> lots of elements. That's where the output arguments and special >> methods (.reduce, .accumulate, and .outer) are useful > > All that is accessible if the class gets passed the ufunc object. > >> 2) polymorphic functions. Output arguments aren't useful here. The >> special methods are useful for binary ufuncs only. > > Fine, then they just call the ufunc. And the rare cases that need > explicit code for each ufunc (my Derivatives, for example) can > retrieve the name of the ufunc and dispatch on it. Hmm, I had misread your previous code. Here it is again, made more specific, and I'll assume this function lives in the ndarray package (as there is more than one package that defines ufuncs) def cos(obj): if ndarray.isarray(obj): return ndarray.array_cos(obj) else: try: return obj.__ufunc__(cos) except AttributeError: if ndarray.is_array_like(obj): a = ndarray.array(obj) return ndarray.array_cos(a) else: raise ValueError The thing is obj.__ufunc__ must understand about the *particular* object cos: the ndarray one. I was thinking more along the lines of obj.__ufunc__('cos'), where the name is passed instead. For binary ufuncs, you could use (with arguments obj1 and obj2), obj1.__ufunc__('add', obj2) Output argument (obj3): obj1.__ufunc__('add', obj2, obj3) Special methods: obj1.__ufunc__('add.reduce') obj1.__ufunc__('add.accumulate') obj1.__ufunc__('add.outer', obj2) Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument... Alternatively, __ufunc__ could be an object of implemented ufuncs: obj.__ufunc__.cos() obj1.__ufunc__.add(obj2) obj1.__ufunc__.add(obj2, obj3) obj1.__ufunc__.add.reduce() obj1.__ufunc__.add.accumulate() obj1.__ufunc__.add.outer(obj2) It depends where you want to do the dispatch. I think this version is better: it's easier to discover what __ufunc__'s are supported with generic tools (IPython tab completion, pydoc, etc.). -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From konrad.hinsen at laposte.net Thu Apr 7 01:34:37 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 01:34:37 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> Message-ID: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> On Apr 7, 2005, at 10:06, David M. Cooke wrote: > Hmm, I had misread your previous code. Here it is again, made more > specific, and I'll assume this function lives in the ndarray package > (as there is more than one package that defines ufuncs) At the moment, there is one in Numeric and one in numarray. The Python API of both is nearly or fully identical. > The thing is obj.__ufunc__ must understand about the *particular* > object cos: the ndarray one. I was thinking more along the lines of No, it must only know the interface. In most cases, it would do something like class MyArray: def __ufunc__(self, ufunc): return MyArray(apply(ufunc, self.data)) > obj.__ufunc__('cos'), where the name is passed instead. That's also an interesting option. 
It would require the implementing class to choose an appropriate function from an appropriate module. Alternatively, it would work if ufuncs were also accessible as methods on array objects. > For binary ufuncs, you could use (with arguments obj1 and obj2), > obj1.__ufunc__('add', obj2) Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs. > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3) > Special methods: > obj1.__ufunc__('add.reduce') > obj1.__ufunc__('add.accumulate') > obj1.__ufunc__('add.outer', obj2) > > Basically, special methods are just another ufunc. This suggests that > add.outer should optionally take an output argument... But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument. I'd probably prefer a few explicit methods: object.__unary__(cos) object.__binary__(add, other) object.__binary_reduce__(add) etc. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire L?on Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr --------------------------------------------------------------------- From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:26:28 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:26:28 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> > > On Apr 7, 2005, at 10:06, David M. Cooke wrote: > > > Hmm, I had misread your previous code. Here it is again, made more > > specific, and I'll assume this function lives in the ndarray package > > (as there is more than one package that defines ufuncs) > > At the moment, there is one in Numeric and one in numarray. > The Python > API of both is nearly or fully identical. > > > The thing is obj.__ufunc__ must understand about the *particular* > > object cos: the ndarray one. I was thinking more along the lines of > > No, it must only know the interface. In most cases, it would do > something like > > class MyArray: > def __ufunc__(self, ufunc): > return MyArray(apply(ufunc, self.data)) Exactly ! I see this as a very common use (masked arrays and all the other examples could live with that). Or more precisely (just to be explicity as the previous MyArray example is the simplest (purest) one), class MyArray: def __ufunc__(self, ufunc): metadata= process(self.metadata, ufunc) data = apply(ufunc, self.data) return MyArray(data, metadata) Or variations on this same theme. BTW, looking at Numeric3, the presence of a __mask_array__ in the array protocol looks like we want to add a specific case of "augmented array" to the core protocol. Hmmm, rather prefer to build a more generic mechanism as well as a clean interface for interacting with "augmented array". > > > obj.__ufunc__('cos'), where the name is passed instead. > > That's also an interesting option. It would require the implementing > class to choose an appropriate function from an appropriate module. > Alternatively, it would work if ufuncs were also accessible > as methods > on array objects. > Why not have the ability to ask the name of an ufunc to be able to dispatch on it ? 
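Filled out into something runnable, the MyArray fragment above might look like the following sketch; the metadata scheme (a simple history of applied functions) is invented purely for illustration:

import math

class MyArray:
    def __init__(self, data, metadata=None):
        self.data = list(data)
        self.metadata = metadata if metadata is not None else {'history': []}

    def __ufunc__(self, ufunc):
        # apply the ufunc elementwise and record it in the metadata
        new_meta = {'history': self.metadata['history'] + [ufunc.__name__]}
        return MyArray([ufunc(x) for x in self.data], new_meta)

a = MyArray([0.0, math.pi])
b = a.__ufunc__(math.cos)
print(b.data)       # [1.0, -1.0]
print(b.metadata)   # {'history': ['cos']}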
> > For binary ufuncs, you could use (with arguments obj1 and obj2),
> > obj1.__ufunc__('add', obj2)
>
> Except that it would perhaps be better to have a different method, as otherwise nearly every implementation would have to start with a condition test to distinguish unary from binary ufuncs.
>
> > Output argument (obj3): obj1.__ufunc__('add', obj2, obj3)
> > Special methods:
> > obj1.__ufunc__('add.reduce')
> > obj1.__ufunc__('add.accumulate')
> > obj1.__ufunc__('add.outer', obj2)
> >
> > Basically, special methods are just another ufunc. This suggests that add.outer should optionally take an output argument...
>
> But they are not just another ufunc, because a standard unary ufunc always returns an array of the same shape as its argument.
>
> I'd probably prefer a few explicit methods:
>
> object.__unary__(cos)
> object.__binary__(add, other)
> object.__binary_reduce__(add)

What about:

object.__unary__(cos, mode = "reduce")
object.__binary__(cos, other, mode = "reduce")

or

object.__unary__(cos.reduce)
object.__binary__(cos.apply, other)

or

object.__binary__(cos.__call__, other)

with the ability to ask the first argument its type (with cos.mode or cos.reduce.mode ...).

However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

From konrad.hinsen at laposte.net Thu Apr 7 02:42:07 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Apr 7 02:42:07 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> References: <6E48F3D185CF644788F55917A0D50A9314A9AA@seebex02.eib.electrabel.be> Message-ID:

On Apr 7, 2005, at 11:25, Sebastien.deMentendeHorne at electrabel.com wrote:

> Why not have the ability to ask the name of an ufunc to be able to dispatch on it ?

That's already possible.

> What about :
>
> object.__unary__(cos, mode = "reduce")
> object.__binary__(cos, other, mode = "reduce")

What does "reduce" mode mean for cos? What does a binary ufunc in reduce mode do with its second argument?

> However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods http://www.artima.com/weblogs/viewpost.jsp?thread=101605

No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

Konrad.
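For reference, the hand-off convention Sebastien describes already exists for Python's binary operators: returning NotImplemented makes the interpreter try the other operand's reflected method. A minimal demonstration:

class Tagged:
    def __init__(self, value, tag):
        self.value, self.tag = value, tag

    def __add__(self, other):
        if isinstance(other, Tagged):
            return Tagged(self.value + other.value, self.tag)
        return NotImplemented           # let type(other).__radd__ try

    def __radd__(self, other):          # handles e.g. 1 + Tagged(2, 'm')
        if isinstance(other, (int, float)):
            return Tagged(other + self.value, self.tag)
        return NotImplemented

x = 1 + Tagged(2, 'm')    # int.__add__ fails, so Tagged.__radd__ runs
print(x.value)            # 3
print(x.tag)              # m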
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ---------------------------------------------------------------------

From Sebastien.deMentendeHorne at electrabel.com Thu Apr 7 02:54:57 2005 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Thu Apr 7 02:54:57 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) Message-ID: <6E48F3D185CF644788F55917A0D50A9314A9AB@seebex02.eib.electrabel.be>

> > Why not have the ability to ask the name of an ufunc to be able to dispatch on it ?
>
> That's already possible.
>
> > What about :
> >
> > object.__unary__(cos, mode = "reduce")
> > object.__binary__(cos, other, mode = "reduce")
>
> What does "reduce" mode mean for cos?
> What does a binary ufunc in reduce mode do with its second argument?

raise a ValueError :-) It was an example of a way to pass an argument; the focus was on cos.reduce or "cos.reduce" or cos, "reduce".

> > However, for binary operations, how is the call dispatched if one of the operands is of one type while the other is of another type? This problem is related to multimethods
> > http://www.artima.com/weblogs/viewpost.jsp?thread=101605
>
> No need to be innovative: Python always dispatches on the first argument, and everybody is familiar with that approach even though it isn't perfect. If Python 3000 has multimethods, we can still adapt.

The problem is related to multimethods; the implementation need not be. In a call like object.__binary__(add, other), if other is not of the same type as object, the latter could throw an exception such as ImplementationError to hand off to other.__binary__(add, binary) or to other.__binary__(radd, binary) or similar (i.e. those expressions may not make sense, but the idea is to have a convention that gives the other operand a chance; Python does this already when one overloads an operator like __add__ (__radd__)). So if we can keep this same protocol for binary ufuncs, that would be great. Otherwise, I think it is not that big a deal.

Sebastien

From xscottg at yahoo.com Thu Apr 7 04:35:49 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:35:49 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: <4254778A.1070100@ee.byu.edu> Message-ID: <20050407113421.49329.qmail@web50202.mail.yahoo.com> --- Travis Oliphant wrote: > > > > 2) As __array_strides__ is optional, I'd kind of like to have a > > __contiguous__ flag that I could just check, rather than checking for > > the existence of strides, then calculating what the strides should be, > > then checking them. > > > I don't want to add too much. The other approach is to establish a set > of helper functions in Python to check this sort of thing: Thus, if > you can't handle a general array you check: > > ndarray.iscontiguous(obj) > > where obj exports the array interface. > > But, it could really go either way. What do others think? > I think this should definitely be done in the helper functions. Having extra attributes encode redundant information is a recipe for trouble. Cheers, -Scott From xscottg at yahoo.com Thu Apr 7 04:43:37 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:43:37 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4254D4A8.5020007@noaa.gov> Message-ID: <20050407114157.23887.qmail@web50209.mail.yahoo.com> --- Chris Barker wrote: > > I can see that it would, but then, we're stuck with checking for all > these optional attributes. If I don't bother to check for it, one day, > someone is going to pass a weird array in with an offset, and a strange > bug will show up. > Everyone seems to think that an offset is so weird. I haven't looked at the internals of Numeric/scipy.base in a while so maybe it doesn't apply there. However, if you subscript an array and return a view to the data, you need an offset or you need to create a new buffer that encodes the offset for you. A = reshape(arange(9), (3,3)) 0, 1, 2 3, 4, 5 6, 7, 8 B = A[2] # create a view into A 6, 7, 8 # Shared with the data above Unless you're going to create a new buffer (which I guess is what Numeric is doing), the offset for B would be 6 in this very simple case. I think specifying the offset is much more elegant than creating a new buffer object with a hidden offset that refers to the old buffer object. I guess all I'm saying is that I wouldn't assume the offset is zero... > > Couldn't it be required, and return a reference to itself if that works? > > Maybe I'm just being lazy, but it feels clunky and prone to errors to > keep having to check if a attribute exists, then use it (or not). > The problem is that you aren't being lazy enough. :-) The fact that a lot of these attributes are optional should be hidden in helper functions like those in Travis's array_interface.py module, or a C/C++ include file (with inline functions). In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements. It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumes any benefit that they can't get from a well written helper library, and it does add some burden to array producers. Cheers, -Scott From mrmaple at gmail.com Thu Apr 7 04:44:27 2005 From: mrmaple at gmail.com (James Carroll) Date: Thu Apr 7 04:44:27 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: <42546766.5060802@noaa.gov> References: <42546766.5060802@noaa.gov> Message-ID: Hi Chris, Travis, ... Great conversation you've started. I have two questions at the moment... I do love the idea that an abstraction can bring the different but similar num* worlds together. 
Which sourceforge CVS repository does the interface (and an implementation) show up on first? My guess is numpy/numeric3; I see Travis has been updating it while I sleep.

> def DrawPointList(self, points, pens=None):
>     ...
>     # some checking code on the pens
>     ...
>     if (hasattr(points, '__array_shape__') and
>         hasattr(points, '__array_typestr__') and
>         len(points.__array_shape__) == 2 and
>         points.__array_shape__[1] == 2 and
>         points.__array_typestr__ == 'i4'):
>         # this means we have a compliant array:
>         # return the array protocol version
>         return self._DrawPointArray(points.__array_data__, pens, [])
>         # This needs to be written now!

This means that whenever you have some complex multivalued multidimensional structure with the data you want to plot, you have to reshape it into the above 'compliant' array before passing it on. I'm a newbie, but is this reshape something where the data has to be copied and take up memory twice? If not, then great, you would painlessly reshape into something that had a different set of strides that just accessed the compliant data in the big blob of data. If the reshape is expensive, then maybe we need the array abstraction, and then a second 'thing' that describes which parts of the array to use for the sequence of 2-tuples to use for plotting the x,y's of a scatter plot. (or whatever)

I do think we can accept more than just i4 for a datatype. Especially since a last-minute cast to i4 is inexpensive for almost every data type.

> else:
>     # return the generic python sequence version
>     return self._DrawPointList(points, pens, [])
>
> Then we'll need a function (in C++):
> _DrawPointArray(points.__array_data__, pens, [])

Looks great.

-Jim

From xscottg at yahoo.com Thu Apr 7 04:52:11 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 04:52:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: Message-ID: <20050407115141.96479.qmail@web50204.mail.yahoo.com>

--- "David M. Cooke" wrote:

>> Good point, but a pain. Maybe they should be required, that way I don't have to first check for the presence of '<' or '>', then check if they have the right value.
>
> I'll second this. Pulling out more Python Zen: Explicit is better than implicit.

I'll third.

> This is a good point; all good protocols embed a version somewhere. Not doing it now could lead to grief/pain later.
>
> I'd suggest adding to __array_data__: If __array_data__ is None, then the array is implementing a newer version of the interface, and you'd either need to support that (maybe the new version uses __array_data2__ or something), or use the sequence protocol on the original object. The sequence protocol should definitely be safe all the time, whereas the buffer protocol may not. (Put it this way: I understand the sequence protocol well, but not the buffer one :-)
>
> That would also be a good argument for it existing, I think.
>
> Alternatively, we could add an __array_version__ attribute (required to exist, required to check) which is set to 1 for this protocol.

I like this, although I think having __array_data__ return None is confusing. I think __array_version__ (or __array_protocol__?) is the better choice. How about having it optional and defaulting to 1? If it's present and greater than 1 then it means there is something new going on...

Cheers,
-Scott

From cjw at sympatico.ca Thu Apr 7 05:57:36 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Apr 7 05:57:36 2005 Subject: [Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3) In-Reply-To: <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> References: <8ae5a9fc6ceed6dd991adfe776d47df4@laposte.net> <2701da761c9f34fc1dc72fc97e87e788@laposte.net> <9d8cfa0b284c9b9be787970030e6b3de@laposte.net> Message-ID: <42552DD2.2040200@sympatico.ca> konrad.hinsen at laposte.net wrote: > On Apr 7, 2005, at 10:06, David M. Cooke wrote: > >> Hmm, I had misread your previous code. Here it is again, made more >> specific, and I'll assume this function lives in the ndarray package >> (as there is more than one package that defines ufuncs) > > > At the moment, there is one in Numeric and one in numarray. The Python > API of both is nearly or fully identical. > >> The thing is obj.__ufunc__ must understand about the *particular* >> object cos: the ndarray one. I was thinking more along the lines of > > > No, it must only know the interface. In most cases, it would do > something like > > class MyArray: > def __ufunc__(self, ufunc): > return MyArray(apply(ufunc, self.data)) > >> obj.__ufunc__('cos'), where the name is passed instead. > > > That's also an interesting option. It would require the implementing > class to choose an appropriate function from an appropriate module. > Alternatively, it would work if ufuncs were also accessible as methods > on array objects. > Yes, perhaps with a slightly different name (say Cos vs cos) to distinguish between methods and functions. Since they don't require arguments, the methods would not require parentheses. Colin W. From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: Hi, > > What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fishcher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Most of this is in SciPy already based Gary's code. I have not looked at it in great detail because is doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so. I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area and I want to use symmetric matrices as the requirements of these models grow really fast. I would be willing to try to address and contribute to the statistical needs if people are interested because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python since the main core of these functions are written in C and Fortran. > And another thing: R seems to have vary fancy (although difficult to > use) plotting capabilities... Until SciPy catches up (it hasn't yet, > has it? 
From bsouthey at gmail.com Thu Apr 7 06:45:32 2005 From: bsouthey at gmail.com (Bruce Southey) Date: Thu Apr 7 06:45:32 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: Hi, > > What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Most of this is in SciPy already, based on Gary's code. I have not looked at it in great detail because it doesn't meet my immediate needs. One of my major needs is to be able to handle missing values. Perhaps one day it will handle that or I will get the time to do so. I have been working on code with another person to do general linear models (along the lines of R's lm function and SAS's glm procedure) that would address factorial and other experimental designs. R just doesn't do enough for me in this aspect. Two real problems are data storage and model declaration. The mixed model component is really only for my area, and I want to use symmetric matrices as the requirements of these models grow really fast. I would be willing to try to address and contribute to the statistical needs if people are interested, because I prefer a 'pure python' approach. The other way is to directly call some of the R functions from Python, since the main core of these functions is written in C and Fortran. > And another thing: R seems to have very fancy (although difficult to > use) plotting capabilities... Until SciPy catches up (it hasn't yet, > has it? ;) that might be a reason for using R(Py) as well, I guess. > > -- > Magnus Lie Hetland Fall seven times, stand up eight > http://hetland.org [Japanese proverb] > Yeah, S/S+/R provides some nice graphs until you need to change from the defaults. Regards Bruce

From Gilles.Simond at obs.unige.ch Thu Apr 7 07:55:08 2005 From: Gilles.Simond at obs.unige.ch (SIMOND Gilles) Date: Thu Apr 7 07:55:08 2005 Subject: [Numpy-discussion] Quite curious behaviour in Numeric Message-ID: <1112885601.15142.53.camel@obssf5> With Linux 2.6.8-1-686-smp (dilinger at toaster.hq.voxel.net) (gcc version 3.3.4 (Debian 1:3.3.4-9)) #1 SMP Sat Aug 28 12:51:43 EDT 2004 and python2.3:

>>> import Numeric
>>> a=Numeric.ones((2,3),'i')
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Array can not be safely cast to required type
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'i'

and the following works:

>>> a=Numeric.ones((2,3))
>>> b=Numeric.sum(a)+1
>>> a[1]=b+1
>>> a.itemsize()
4
>>> b.itemsize()
4
>>> a.typecode()
'l'
>>> type(1)
<type 'int'>
>>> Numeric.__version__
'23.6'

It seems that itemsize() does not return the correct value, which should be 8 for a 'l' type array. This is quite annoying since this function is the only way to know the actual format of the array. Gilles Simond
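The usual workaround for this kind of refused assignment, sketched here on the assumption that Gilles's diagnosis is right and sum() on an 'i' array comes back as a 'l' array (standard Numeric API, but the snippet itself is illustrative, not from the thread):

import Numeric

a = Numeric.ones((2, 3), 'i')
b = Numeric.sum(a) + 1
# a[1] = b + 1 raises "Array can not be safely cast to required type",
# so cast the right-hand side back down to a's typecode explicitly:
a[1] = (b + 1).astype(a.typecode())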
From rkern at ucsd.edu Thu Apr 7 08:17:44 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 08:17:44 2005 Subject: [Numpy-discussion] Possible example application of the array interface In-Reply-To: <20050407080429.GB20252@idi.ntnu.no> References: <20050406171008.58480.qmail@web53602.mail.yahoo.com> <20050407080429.GB20252@idi.ntnu.no> Message-ID: <42554EC6.9090807@ucsd.edu> Magnus Lie Hetland wrote: > Bruce Southey : >>What statistical functions would you want in numerical python? > > I think I'd want most of the standard, parametrized probability > distributions (as well as automatic estimation from data, perhaps) and > a handful of common statistical tests (t-test, z-test, Fisher, > chi-squared, what-have-you). Perhaps some support for factorial > experiments (not sure if R has anything specific there, though). Except for factorial designs, scipy.stats has all of that. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From oliphant at ee.byu.edu Thu Apr 7 08:23:13 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 08:23:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407115141.96479.qmail@web50204.mail.yahoo.com> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> Message-ID: <4255502D.6060306@ee.byu.edu> Scott Gilbert wrote: >--- "David M. Cooke" wrote: > >>>Good point, but a pain. Maybe they should be required, that way I >>>don't have to first check for the presence of '<' or '>', then check >>>if they have the right value. >>> >>I'll second this. Pulling out more Python Zen: Explicit is better than >>implicit. >> > >I'll third. > > O.K. It's done....

From curzio.basso at unibas.ch Thu Apr 7 09:58:40 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Apr 7 09:58:40 2005 Subject: [Numpy-discussion] profile reveals calls to astype() Message-ID: <4255664F.2070107@unibas.ch> Hi all, I have a problem trying to profile a program using numarray, maybe someone with more experience can give me a hint... basically, the program I am profiling has a function like this:

def foo():
    # some code
    # a call to astype()
    for i in xrange(N):
        # some other code and NO explicit call to astype()

the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of N+1, as if it was called inside the loop. So now the first doubt I have is that astype() gets listed because it is called from some function called by foo(), even if this should not happen. Here is the list of numarray functions called in foo():

Function called...
generic.py:651(getshape)(14)             0.070
generic.py:918(reshape)(2)               0.000
generic.py:1013(where)(2)                0.050
generic.py:1069(concatenate)(2)          4.270
morphology.py:150(binary_erosion)(2)     0.070
numarraycore.py:698(__del__)(120032)     3.240
numarraycore.py:817(astype)(12002)      37.290
numarraycore.py:857(is_c_array)(36000)  10.450
numarraycore.py:878(type)(4)             0.000
numarraycore.py:964(__mul__)(12)         0.340
numarraycore.py:981(__div__)(8)          0.010
numarraycore.py:1068(__pow__)(8)         0.000
numarraycore.py:1180(__imul__)(12000)    0.930
numarraycore.py:1250(__eq__)(2)          0.080
numarraycore.py:1400(zeros)(54)          0.060
numarraycore.py:1409(ones)(8)            0.020

The second thing I can think of is that astype() is implicitly called by some conversion. Can this be? curzio

From jmiller at stsci.edu Thu Apr 7 10:51:38 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 7 10:51:38 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4255664F.2070107@unibas.ch> References: <4255664F.2070107@unibas.ch> Message-ID: <1112896207.2437.34.camel@halloween.stsci.edu> astype() is used in a bunch of places, including the C-API, so it's hard to guess how it's getting called with the information here. In general, astype() gets called to "match up types" based on a particular parameterization of a function call, i.e. the c-code underlying some function call needs a different type than was passed in, so astype() is used to convert an array to a workable type. One possibility for debugging this might be to drop N to something reasonable, like say 2, and then run under pdb with a breakpoint set on astype(). Something like this is what I have in mind; it may not be exactly right but with fiddling this approach might work:

>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
>>> import pdb
>>> pdb.run("newfoo(N=2)")
(pdb) s   # step along a little to get into newfoo()
... step output
(pdb) import numarray.numarraycore as nc
(pdb) break nc.astype
(pdb) c
... breakpoint output
(pdb) where
... function traceback showing where astype() got called from
(pdb) c
... breakpoint output
(pdb) where
... more function traceback, eventually you should find it...
...

Regards, Todd On Thu, 2005-04-07 at 12:56, Curzio Basso wrote: > Hi all, > > I have a problem trying to profile a program using numarray, maybe someone with more experience can > give me a hint... > > basically, the program I am profiling has a function like this: >
> def foo():
>     # some code
>     # a call to astype()
>     for i in xrange(N):
>         # some other code and NO explicit call to astype()
>
> the problem is that when I print the 'callees' of foo(), astype() gets listed with an occurrence of > N+1, as if it was called inside the loop. > So now the first doubt I have is that astype() gets listed because it is called from some function called > by foo(), even if this should not happen. Here is the list of numarray functions called in foo(): > > Function called...
> generic.py:651(getshape)(14)             0.070
> generic.py:918(reshape)(2)               0.000
> generic.py:1013(where)(2)                0.050
> generic.py:1069(concatenate)(2)          4.270
> morphology.py:150(binary_erosion)(2)     0.070
> numarraycore.py:698(__del__)(120032)     3.240
> numarraycore.py:817(astype)(12002)      37.290
> numarraycore.py:857(is_c_array)(36000)  10.450
> numarraycore.py:878(type)(4)             0.000
> numarraycore.py:964(__mul__)(12)         0.340
> numarraycore.py:981(__div__)(8)          0.010
> numarraycore.py:1068(__pow__)(8)         0.000
> numarraycore.py:1180(__imul__)(12000)    0.930
> numarraycore.py:1250(__eq__)(2)          0.080
> numarraycore.py:1400(zeros)(54)          0.060
> numarraycore.py:1409(ones)(8)            0.020
>
> The second thing I can think of is that astype() is implicitly called by some conversion. Can this be?
>
> curzio
--

From Chris.Barker at noaa.gov Thu Apr 7 11:38:43 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 11:38:43 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407114157.23887.qmail@web50209.mail.yahoo.com> References: <20050407114157.23887.qmail@web50209.mail.yahoo.com> Message-ID: <42557DE3.3010804@noaa.gov> Scott Gilbert wrote: > I think __array_version__ (or __array_protocol__?) is the > better choice. How about have it optional and default to 1? If it's > present and greater than 1 then it means there is something new going on... Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people to not check it, and that's a recipe for bugs later on down the road. > Everyone seems to think that an offset is so weird. I haven't looked at > the internals of Numeric/scipy.base in a while so maybe it doesn't apply > there. However, if you subscript an array and return a view to the data, > you need an offset or you need to create a new buffer that encodes the > offset for you. > I guess all I'm saying is that I wouldn't assume the offset is zero... Good point. All the more reason to have the offset be mandatory. > The fact that a lot of these attributes are optional should be hidden in > helper functions like those in Travis's array_interface.py module, or a > C/C++ include file (with inline functions). Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method. > In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements. Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no? > It wouldn't be a horrible mistake to have all the attributes be mandatory, > but it doesn't get array consumers any benefit that they can't get from a > well written helper library, and it does add some burden to array > producers. Hardly any.
I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption. After all, small array support has turned out to be very important for Numeric. As a rule of thumb, I think there will be more consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Apr 7 12:20:05 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 12:20:05 2005 Subject: [Numpy-discussion] Re: Questions about the array interface. In-Reply-To: References: <42546766.5060802@noaa.gov> Message-ID: <42558796.4070607@noaa.gov> James Carroll wrote:

>> def DrawPointList(self, points, pens=None):
>>     ...
>>     # some checking code on the pens
>>     ...
>>     if (hasattr(points,'__array_shape__') and
>>         hasattr(points,'__array_typestr__') and
>>         len(points.__array_shape__) == 2 and
>>         points.__array_shape__[1] == 2 and
>>         points.__array_typestr__ == 'i4'
>>        ):  # this means we have a compliant array
>>         # return the array protocol version
>>         return self._DrawPointArray(points.__array_data__, pens, [])
>>         # This needs to be written now!
>
> This means that whenever you have some complex multivalued > multidimensional structure with the data you want to plot, you have to > reshape it into the above 'compliant' array before passing it on. I'm > a newbie, but is this reshape something where the data has to be > copied and take up memory twice?

Probably. It depends on two things:

1) What structure the data is in at the moment
2) Whether we write the code to handle more "complex" arrangements of data: discontiguous arrays, for instance.

But the idea is to require a data structure that makes sense for the data. For example, a natural way to store a whole set of coordinates is to use an NX2 NumPy array of doubles. This is exactly the data structure that I want the above function to accept. If the points are somehow a subset of a larger array, then they will be in a discontiguous array, and I'm not sure if I want to bother to try to handle that. You can always use the generic sequence interface to access the data, but that will be a lot slower. We're interfacing with a static language here; we can get optimum performance only by specifying a particular data structure. > If not, then great, you would > painlessly reshape into something that had a different set of strides > that just accessed the data that complied in the big blob of data. If > the reshape is expensive, then maybe we need the array abstraction, > and then a second 'thing' that described which parts of the array to > use for the sequence of 2-tuples to use for plotting the x,y's of a > scatter plot.
(or whatever)

The proposed array interface does provide a certain level of abstraction; that's what

__array_shape__
__array_typestr__
__array_descr__
__array_strides__
__array_offset__

are all about. We could certainly write the wxPy_LIST_helper functions to handle a larger variety of options than the simple contiguous C array, but I want to start with the simple case, and I'm not sure directly handling the more complex cases is worth it. I'm imagining that the user will need to do something like:

dc.DrawPointList(asarray(points, Int))

It's easier to use the utility functions that Numeric provides than re-write similar code in wxPython. > I do think we can accept more than just i4 for a datatype. Especially > since a last-minute cast to i4 is inexpensive for almost every data > type. Sure, but we're interfacing with a static language, so for each data type supported, we need to cast the data pointer to the right type, then have code to convert it to the type needed by wx. It's not a big deal, but I'd rather keep it simple. I do want to support at least doubles and ints. Users can use Numeric's astype() method to convert if need be. I've noticed that there is a wxRealPoint class that uses doubles, but it doesn't look like it can be used as input to any of the wxDC methods. Too bad. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
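For what it's worth, here is a sketch of what that call looks like from the user's side (standard Numeric API; DrawPointList is the wxPython draft method quoted above, so the wx side is hypothetical):

import Numeric

def draw_points(dc, xy_pairs):
    # Build the NX2 contiguous integer array the draft method treats
    # as 'compliant'; asarray is a no-op when the data already has
    # that form, and copies (e.g. a discontiguous slice) otherwise.
    points = Numeric.asarray(xy_pairs, Numeric.Int)
    dc.DrawPointList(points)    # dc: a wx device context with the draft method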
From xscottg at yahoo.com Thu Apr 7 14:13:32 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 14:13:32 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050407211227.82679.qmail@web50206.mail.yahoo.com> --- Chris Barker wrote: > > Again, I'm uncomfortable with something that I have to check being > optional. If it is, we're encouraging people to not check it, and that's > a recipe for bugs later on down the road. > [snip] > > I guess all I'm saying is that I wouldn't assume the offset is zero... > > Good point. All the more reason to have the offset be mandatory. > Lots of protocols have optional parts. The helper functions would hide this level of detail. > > Yes, if there is a C/C++ version of all these helper functions, I'll be > a lot happier. And you're right, the same information should not be > encoded in two places, so my "iscontiguous" attribute should be a helper > function or maybe a method. > > In a short while, you shouldn't have to check any __array_metadata__ > attributes directly. There should even be a helper function for getting > the array elements. > > Cool. How would that work? A C++ iterator? I'm thinking not, as this is > all C, no? > I think this will take shape as an include file with static/inline functions. No linking required, just #include and call the functions. It would be nice but not necessary that this was distributed with Python. I would be in favor of having some C++ iterator interfaces (possibly a template class) inside of a #ifdef __cplusplus block. Python doesn't seem to have a lot of C++ in the core, so I wonder if this would meet resistance (even when it's inside of a #ifdef block). > > It wouldn't be a horrible mistake to have all the attributes be > mandatory, but it doesn't get array consumers any benefit that they > can't get from a well written helper library, and it does add some > burden to array producers. > > Hardly any. I'm assuming that there will be a base_array class that can > be used as a base class or mixin, so it wouldn't be any work at all to > have a full set of attributes with defaults. It would take up a little > bit of memory. I'm assuming that the whole point of this is to support > large datasets, but maybe that isn't a valid assumption. After all, > small array support has turned out to be very important for Numeric. > If the protocol can make things easy without the use of a mixin or base class, all the better to my way of thinking. I don't think the memory use is very relevant, as the attributes would only require storage in the class object, not the instances. There is something elegant about making array creation as easy as:

class easy_array:
    def __init__(self, filename):
        data = open(filename, 'r').read()
        self.__array_data__ = data
        self.__array_shape__ = (len(data)/4,)
        self.__array_typestr__ = '>i4'

Like I said, I don't think it would be *horrible* to require all the attributes, but I don't see how it will benefit you at all. And even if all the attributes are mandatory, there are still a number of details to get right in reading the memory. You'll likely want to use the helper libraries/modules regardless. (Once they're completed of course...) > > As a rule of thumb, I think there will be [more] consumers of arrays > than producers, so I'd rather make it easy on the consumers than the > producers, if we need to make such a trade off. Maybe I'm biased, > because I'm a consumer. > I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little). This has to be easier than the situation you have today right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module. Cheers, -Scott
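A consumer for that toy class is nearly as short. This sketch (an illustration, not code from the thread) pulls the values back out using only the three attributes easy_array defines, with the struct module doing the byte interpretation:

import struct

def as_list(arr):
    # Works for any 1-d producer exposing the three attributes above
    # with a big-endian 4-byte integer typestr.
    assert arr.__array_typestr__ == '>i4'
    (n,) = arr.__array_shape__
    data = arr.__array_data__
    return list(struct.unpack('>%di' % n, data[:4 * n]))

An offset or strides attribute, if the producer supplied one, would have to be honored here as well -- which is exactly the detail the proposed helper functions would take over.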
From tim.hochberg at cox.net Thu Apr 7 14:31:11 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 14:31:11 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211227.82679.qmail@web50206.mail.yahoo.com> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> Message-ID: <4255A635.9010309@cox.net> Scott Gilbert wrote: >--- Chris Barker wrote: > > [SNIP] > >>As a rule of thumb, I think there will be [more] consumers of arrays >>than producers, so I'd rather make it easy on the consumers than the >>producers, if we need to make such a trade off. Maybe I'm biased, >>because I'm a consumer. >> > >I don't see the trade off. It will be easy for you either way, but harder >for array producers (admittedly only a little). > I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide the complexity of dealing with missing attributes through the various helper functions. The cost will be in speed, and will probably be most noticeable in C extensions using small arrays, where the extra code to check if an attribute is present will be significant. How significant this will be, I'm not sure. And frankly I don't care all that much since I generally only use large arrays. However, since one of the big faultlines between Numarray and Numeric involves the former's relatively poor small array performance, I suspect someone might care. -tim >This has to be easier than the situation you have today right? Imagine the >code you'd have to write to special case Numeric, scipy.base, Numarray, and >Python's array module.

From oliphant at ee.byu.edu Thu Apr 7 15:47:04 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 7 15:47:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050407211501.60155.qmail@web50203.mail.yahoo.com> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> Message-ID: <4255B7D6.9000109@ee.byu.edu> Scott Gilbert wrote:

>I agree, we need a road map of some sort. It could be multiple PEPs
>depending, but it should include most of the following:
>
>  - Get the bytes object submitted. There are only a few small
>    things in PEP 296 that should be changed.

#4

>  - I'm not particularly interested in implementing the new bytes
>    literal and other features discussed in PEP 332, but it is
>    related to this topic. (The proposal is for b"xxxxxx" to be a
>    bytes literal.) We should make note that while this is not
>    part of the numpy roadmap, nothing prohibits that from being
>    implemented by another user.
>
>  - Add an ndarray module. This module will contain the ndarray
>    object as well as a superset of your helper functions. I
>    think implementing it in pure Python on top of the bytes
>    object is the right course. It's partly for documentation.
>
>  - Add an include file to make this protocol easily accessible
>    from C. It's not much code, and the entire thing could be
>    done with inline/static functions in the .h file. It would
>    be nice if this went into Python too, but not strictly
>    required.

I put these together at #1

>  - Add the array protocol attributes to the existing array
>    object.

#2

>  - Flesh out the "locked buffer" stuff in PEP 298. Add support
>    for locking the buffer to the existing array object, the
>    bytes object, the mmap object, and anything else (string?)
>    that doesn't meet too much resistance.

#3

>  - Fix the existing buffer object to regrab its pointer
>    every time it's needed. Could also add support to use
>    the "locked buffer" interface where possible. I gather
>    that you are using this particular object in scipy.base
>    (is that true??). Several shortcomings of it could be
>    easily fixed at the Python level, but I don't feel
>    strongly that this would have to be done... Then again
>    it isn't much work.

#5

I can't think of anything you've missed. I'm very supportive of this, but I have to finish scipy.base first. I think Perry is supportive as well. I know he's been playing catch-up in the reading. I'm not sure of Todd's opinion. I suspect he would welcome these changes to Python. My preference order is:

1) the ndarray module and ndarray.h header with these interface definitions and methods
2) Add array interface attributes to the array module
3) Flesh out the locked buffer API
4) Bytes object (with Pickling support)
5) Fix the current buffer object

-Travis

From strawman at astraw.com Thu Apr 7 15:56:03 2005 From: strawman at astraw.com (Andrew Straw) Date: Thu Apr 7 15:56:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255502D.6060306@ee.byu.edu> References: <20050407115141.96479.qmail@web50204.mail.yahoo.com> <4255502D.6060306@ee.byu.edu> Message-ID: <4255BA56.7000001@astraw.com> Travis Oliphant wrote: > Scott Gilbert wrote: > >> --- "David M. Cooke" wrote: >> >>>> Good point, but a pain. Maybe they should be required, that way I >>>> don't have to first check for the presence of '<' or '>', then check >>>> if they have the right value. >>>> >>> I'll second this. Pulling out more Python Zen: Explicit is better than
>>> implicit. >> >> I'll third. > > O.K. It's done....

Here's a bit of weirdness which has prevented me from using '<' or '>' in the past with the struct module. I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.

In [1]:import struct

In [2]:from numarray.ieeespecial import nan

In [3]:nan
Out[3]:nan

In [4]:struct.pack('<d',nan)
---------------------------------------------------------------------------
exceptions.SystemError                Traceback (most recent call last)

/home/astraw/

SystemError: frexp() result out of range

In [5]:struct.pack('d',nan)
Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

From Chris.Barker at noaa.gov Thu Apr 7 16:01:03 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Apr 7 16:01:03 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255A635.9010309@cox.net> References: <20050407211227.82679.qmail@web50206.mail.yahoo.com> <4255A635.9010309@cox.net> Message-ID: <4255BA80.4090201@noaa.gov> Tim Hochberg wrote: > Scott Gilbert wrote: >> --- Chris Barker wrote: >> I don't see the trade off. I wasn't sure it applied in this case, but if there were a trade off, we should make things easiest for the consumers of arrays. > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide the complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed, and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. Actually, that is one I'm worried about. You're quite right: if I'm dealing with a 2X2 array, those helper functions are going to take much longer to run than accessing (and maybe using) the data. Like Tim, I'm mostly interested in using this for large data sets, but I think the small array thing might crop up unexpectedly. For example, with the current numarray, if you pass an NX2 array to wxPython (to draw a polygon, for instance), it's very slow. It turns out that that's because a whole set of (2,) arrays are created when extracting the data, so even though you're dealing with a large data set, you end up dealing with a LOT of small arrays. Of course, the whole point of this is to avoid that, but I don't think we should assume that any overhead is negligible. > >> This has to be easier than the situation you have today right? Well, sure. Though it seems to be harder than using the Numeric API directly. However, I'll shut up now, as it seems that the proposed utility functions will address my issues. -Chris PS to Tim: Want to help out with the wxPython integration? -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From xscottg at yahoo.com Thu Apr 7 20:05:48 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 20:05:48 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408030336.54970.qmail@web50209.mail.yahoo.com> --- Andrew Straw wrote: > > Here's a bit of weirdness which has prevented me from using '<' or '>' > in the past with the struct module. I'm not guru enough to know what's > going on, but it has prevented me from being explicit rather than > implicit.
> In [1]:import struct
>
> In [2]:from numarray.ieeespecial import nan
>
> In [3]:nan
> Out[3]:nan
>
> In [4]:struct.pack('<d',nan)
> ---------------------------------------------------------------------------
> exceptions.SystemError                Traceback (most recent call last)
>
> /home/astraw/
>
> SystemError: frexp() result out of range
>
> In [5]:struct.pack('d',nan)
> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'

No clue why that is, but it certainly looks like a bug in the struct module. It shouldn't make any difference about whether or not the array protocol reports the endian though. It's using a different notation for typecodes. Cheers, -Scott

From rkern at ucsd.edu Thu Apr 7 20:24:38 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 7 20:24:38 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408030336.54970.qmail@web50209.mail.yahoo.com> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> Message-ID: <4255F79D.4000501@ucsd.edu> Scott Gilbert wrote: > --- Andrew Straw wrote: > >>Here's a bit of weirdness which has prevented me from using '<' or '>' >>in the past with the struct module. I'm not guru enough to know what's >>going on, but it has prevented me from being explicit rather than >>implicit. >>
>>In [1]:import struct
>>
>>In [2]:from numarray.ieeespecial import nan
>>
>>In [3]:nan
>>Out[3]:nan
>>
>>In [4]:struct.pack('<d',nan)
>>---------------------------------------------------------------------------
>>exceptions.SystemError                Traceback (most recent call last)
>>
>>/home/astraw/
>>
>>SystemError: frexp() result out of range
>>
>>In [5]:struct.pack('d',nan)
>>Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'
>>
> No clue why that is, but it certainly looks like a bug in the struct > module. It shouldn't make any difference about whether or not the array > protocol reports the endian though. It's using a different notation for > typecodes.

This behavior is explained by Tim Peters: http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From xscottg at yahoo.com Thu Apr 7 21:07:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:07:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408040601.86838.qmail@web50203.mail.yahoo.com> --- Tim Hochberg wrote: > > I think there is a trade off, but not the one that Chris is worried > about. It should be easy to hide the complexity of dealing with missing > attributes through the various helper functions. The cost will be in > speed, and will probably be most noticeable in C extensions using small > arrays where the extra code to check if an attribute is present will be > significant. > > How significant this will be, I'm not sure. And frankly I don't care all > that much since I generally only use large arrays. However, since one of > the big faultlines between Numarray and Numeric involves the former's > relatively poor small array performance, I suspect someone might care. > You must check the return value of the PyObject_GetAttr (or PyObject_GetAttrString) calls regardless. Otherwise the extension will die with an ugly segfault the first time one passes a float where an array was expected.
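In Python the two worlds of Scott's argument collapse into a couple of getattr calls. A sketch of what the consumer-side helper being debated might look like (the function name is invented; the defaults follow the optional-attribute rules from this thread):

def get_array_layout(arr):
    # Required attributes: let the AttributeError propagate -- the
    # Python analogue of checking PyObject_GetAttrString's result.
    shape = tuple(arr.__array_shape__)
    typestr = arr.__array_typestr__
    itemsize = int(typestr[2:])     # e.g. '>i4' -> 4
    # Optional attributes: missing means offset 0 and a C-contiguous
    # layout -- the cheap "failure" path described above.
    offset = getattr(arr, '__array_offset__', 0)
    strides = getattr(arr, '__array_strides__', None)
    if strides is None:
        strides = []
        size = itemsize
        for dim in shape[::-1]:
            strides.insert(0, size)
            size = size * dim
    return shape, typestr, tuple(strides), offset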
If we're talking about small light-weight arrays and a C/C++ function that wants to work with them very efficiently, I'm not convinced that requiring the attributes be present will make things faster. As we're talking about small light-weight arrays, it's unlikely the individual arrays will have __array_shape__ or __array_strides__ already stored as tuples. They'll probably store them as a C array as part of their PyObject structure. In the world where some of these attributes are optional: If an attribute like __array_offset__ or __array_shape__ isn't present, the C code will know to use zero or the default C-contiguous layout. So the check failed, but the failure case is probably very fast (since a temporary tuple object doesn't have to be built by the array on the fly). In the world where all of the attributes are required: The array object will have to generate the __array_offset__ int/long or __array_shape__ tuple from its own internal representation. Then the C/C++ consumer code will bust apart the tuple to get the values. So the check succeeded, but the success code needs to grab the parts of the tuple. The C helper code could look like:

struct PyNDArrayInfo {
    int ndims;
    int endian;
    char itemcode;
    size_t itemsize;
    Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
    Py_LONG_LONG offset;
    Py_LONG_LONG strides[40];
    /* More Array Info goes here */
};

int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
    PyObject* shape;
    PyObject* offset;
    PyObject* strides;
    int ii, len;

    info->itemsize = too_long_for_this_example(obj);

    shape = PyObject_GetAttrString(obj, "__array_shape__");
    if (!shape) return 0;
    len = PySequence_Size(shape);
    if (len < 0) return 0;
    if (len > 40) return 0;    /* This needs work */
    info->ndims = len;
    for (ii = 0; ii < len; ii++) {
        PyObject* val = PySequence_GetItem(shape, ii);
        info->shape[ii] = PyLong_AsLongLong(val);
        Py_DECREF(val);
    }
    Py_DECREF(shape);

    offset = PyObject_GetAttrString(obj, "__array_offset__");
    if (offset) {
        /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
        info->offset = PyLong_AsLongLong(offset);
        Py_DECREF(offset);
    } else {
        PyErr_Clear();
        info->offset = 0;
    }

    strides = PyObject_GetAttrString(obj, "__array_strides__");
    if (strides) {
        /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
        for (ii = 0; ii < len; ii++) {
            PyObject* val = PySequence_GetItem(strides, ii);
            info->strides[ii] = PyLong_AsLongLong(val);
            Py_DECREF(val);
        }
        Py_DECREF(strides);
    } else {
        /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
        size_t size = info->itemsize;
        PyErr_Clear();
        for (ii = info->ndims - 1; ii >= 0; ii--) {
            info->strides[ii] = size;
            size *= info->shape[ii];
        }
    }

    /* More code goes here */
}

I have no idea how expensive PyErr_Clear() is. We'd have to profile it to see for certain. If PyErr_Clear() is not expensive, then we could make a strong argument that *not* requiring the attributes will be more efficient. It could also be so close that it doesn't matter - in which case it's back to being a matter of taste... Cheers, -Scott

From xscottg at yahoo.com Thu Apr 7 21:16:06 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 7 21:16:06 2005 Subject: [Numpy-discussion] Questions about the array interface. Message-ID: <20050408041417.61390.qmail@web50210.mail.yahoo.com> Oops, sent too fast. Quick correction... > > In the world where some of these attributes are optional: If an > attribute like __array_offset__ or __array_shape__ isn't present, > the C code will know to use zero or the default C-contiguous layout. > So the check failed, but the failure case is probably very fast > (since a temporary tuple object doesn't have to be built by the array > on the fly). >
I meant to say "__array_offset__ or __array_strides__". The __array_shape__ attribute would always be required for arrays... Cheers, -Scott

From tim.hochberg at cox.net Thu Apr 7 23:56:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 7 23:56:10 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408040601.86838.qmail@web50203.mail.yahoo.com> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> Message-ID: <42562AC5.3040502@cox.net> Scott Gilbert wrote: >--- Tim Hochberg wrote: > >>I think there is a trade off, but not the one that Chris is worried >>about. It should be easy to hide the complexity of dealing with missing >>attributes through the various helper functions. The cost will be in >>speed, and will probably be most noticeable in C extensions using small >>arrays where the extra code to check if an attribute is present will be >>significant. >> >>How significant this will be, I'm not sure. And frankly I don't care all >>that much since I generally only use large arrays. However, since one of >>the big faultlines between Numarray and Numeric involves the former's >>relatively poor small array performance, I suspect someone might care. >> > >You must check the return value of the PyObject_GetAttr (or >PyObject_GetAttrString) calls regardless. Otherwise the extension will die >with an ugly segfault the first time one passes a float where an array was >expected. > >If we're talking about small light-weight arrays and a C/C++ function that >wants to work with them very efficiently, I'm not convinced that requiring >the attributes be present will make things faster. > >As we're talking about small light-weight arrays, it's unlikely the >individual arrays will have __array_shape__ or __array_strides__ already >stored as tuples. They'll probably store them as a C array as part of >their PyObject structure. > >In the world where some of these attributes are optional: If an attribute >like __array_offset__ or __array_shape__ isn't present, the C code will >know to use zero or the default C-contiguous layout. So the check failed, >but the failure case is probably very fast (since a temporary tuple object >doesn't have to be built by the array on the fly). > >In the world where all of the attributes are required: The array object >will have to generate the __array_offset__ int/long or __array_shape__ >tuple from its own internal representation. Then the C/C++ consumer code >will bust apart the tuple to get the values. So the check succeeded, but >the success code needs to grab the parts of the tuple. > >The C helper code could look like: >

I'm not convinced it's legit to assume that a failure to get the attribute means that it's not present and call PyErr_Clear. Just as a for instance, what if the attribute in question is implemented as a descriptor in which there is some internal error? Then you're burying the error and most likely doing the wrong thing. As far as I can tell, the only correct way to do this is to use PyObject_HasAttrString, then PyObject_GetAttrString if that succeeds. The point about not passing around the tuples probably being faster is a good one. Another thought is that requiring tuples instead of general sequences would make the helper faster (since one could use PyTuple_GET_ITEM, which I believe is much faster than PySequence_GetItem). This would possibly shift more pain onto the implementer of the object though.
I suspect that the best strategy, orthogonal to requiring all attributes or not, is to use PySequence_Fast to get a fast sequence and work with that. This means that objects that return tuples for strides, etc. would run at maximum possible speed, while other sequences would still work. Back to requiring attributes or not: I suspect that the fastest correct way is to require all attributes, but allow them to be None, in which case the default value is used. Then any errors are easily bubbled up, and a fast check for None chooses whether to use the defaults or not. It's late, so I hope that's not too incoherent. Or too wrong. Oh, one other nitpicky thing: I think PyLong_AsLongLong needs some sort of error checking (it can allegedly raise errors). I suppose that means one is supposed to call PyErr_Occurred after every call? That's sort of painful! -tim

> struct PyNDArrayInfo {
>     int ndims;
>     int endian;
>     char itemcode;
>     size_t itemsize;
>     Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
>     Py_LONG_LONG offset;
>     Py_LONG_LONG strides[40];
>     /* More Array Info goes here */
> };
>
> int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
>     PyObject* shape;
>     PyObject* offset;
>     PyObject* strides;
>     int ii, len;
>
>     info->itemsize = too_long_for_this_example(obj);
>
>     shape = PyObject_GetAttrString(obj, "__array_shape__");
>     if (!shape) return 0;
>     len = PySequence_Size(shape);
>     if (len < 0) return 0;
>     if (len > 40) return 0;    /* This needs work */
>     info->ndims = len;
>     for (ii = 0; ii < len; ii++) {
>         PyObject* val = PySequence_GetItem(shape, ii);
>         info->shape[ii] = PyLong_AsLongLong(val);
>         Py_DECREF(val);
>     }
>     Py_DECREF(shape);
>
>     offset = PyObject_GetAttrString(obj, "__array_offset__");
>     if (offset) {
>         /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
>         info->offset = PyLong_AsLongLong(offset);
>         Py_DECREF(offset);
>     } else {
>         PyErr_Clear();
>         info->offset = 0;
>     }
>
>     strides = PyObject_GetAttrString(obj, "__array_strides__");
>     if (strides) {
>         /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
>         for (ii = 0; ii < len; ii++) {
>             PyObject* val = PySequence_GetItem(strides, ii);
>             info->strides[ii] = PyLong_AsLongLong(val);
>             Py_DECREF(val);
>         }
>         Py_DECREF(strides);
>     } else {
>         /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
>         size_t size = info->itemsize;
>         PyErr_Clear();
>         for (ii = info->ndims - 1; ii >= 0; ii--) {
>             info->strides[ii] = size;
>             size *= info->shape[ii];
>         }
>     }
>
>     /* More code goes here */
> }
>
>I have no idea how expensive PyErr_Clear() is. We'd have to profile it to >see for certain. If PyErr_Clear() is not expensive, then we could make a >strong argument that *not* requiring the attributes will be more efficient. > >It could also be so close that it doesn't matter - in which case it's back >to being a matter of taste... > >Cheers, > -Scott

From cookedm at physics.mcmaster.ca Fri Apr 8 00:43:08 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 00:43:08 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> References: <20050408040601.86838.qmail@web50203.mail.yahoo.com> <42562AC5.3040502@cox.net> Message-ID: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> On Thu, Apr 07, 2005 at 11:55:01PM -0700, Tim Hochberg wrote: > Scott Gilbert wrote: > >--- Tim Hochberg wrote: > > > >>I think there is a trade off, but not the one that Chris is worried > >>about. It should be easy to hide the complexity of dealing with missing > >>attributes through the various helper functions.
> >>The cost will be in speed, and will probably be most noticeable in C extensions using small > >>arrays where the extra code to check if an attribute is present will be > >>significant. > >> > >>How significant this will be, I'm not sure. And frankly I don't care all > >>that much since I generally only use large arrays. However, since one of > >>the big faultlines between Numarray and Numeric involves the former's > >>relatively poor small array performance, I suspect someone might care. > > > >You must check the return value of the PyObject_GetAttr (or > >PyObject_GetAttrString) calls regardless. Otherwise the extension will die > >with an ugly segfault the first time one passes a float where an array was > >expected. > > > >If we're talking about small light-weight arrays and a C/C++ function that > >wants to work with them very efficiently, I'm not convinced that requiring > >the attributes be present will make things faster. > > > >As we're talking about small light-weight arrays, it's unlikely the > >individual arrays will have __array_shape__ or __array_strides__ already > >stored as tuples. They'll probably store them as a C array as part of > >their PyObject structure. > > > >In the world where some of these attributes are optional: If an attribute > >like __array_offset__ or __array_shape__ isn't present, the C code will > >know to use zero or the default C-contiguous layout. So the check failed, > >but the failure case is probably very fast (since a temporary tuple object > >doesn't have to be built by the array on the fly). > > > >In the world where all of the attributes are required: The array object > >will have to generate the __array_offset__ int/long or __array_shape__ > >tuple from its own internal representation. Then the C/C++ consumer code > >will bust apart the tuple to get the values. So the check succeeded, but > >the success code needs to grab the parts of the tuple. > > > >The C helper code could look like: > > I'm not convinced it's legit to assume that a failure to get the > attribute means that it's not present and call PyErr_Clear. Just as a > for instance, what if the attribute in question is implemented as a > descriptor in which there is some internal error? Then you're burying the > error and most likely doing the wrong thing. As far as I can tell, the > only correct way to do this is to use PyObject_HasAttrString, then > PyObject_GetAttrString if that succeeds.

No point: PyObject_HasAttrString *calls* PyObject_GetAttrString, then clears the error if there is one. [Side note: hasattr() in Python works the same way, which is why using properties is a pain when you've got code that's using it]

> The point about not passing around the tuples probably being faster is a > good one. Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work.

How about objects that use a lightweight array as the strides sequence?
I'm thinking that if you've got a fast 1-d array object, you'd be tempted to use an instance of that as the shape or strides attribute. You'd be saving on temporary tuple creation (but you'd still be losing some in making Python ints). I haven't benchmarked it, but I'm looking at the code for PySequence_GetItem(): it does a few pointer dereferences to get the sq_item() method in the tp_as_sequence struct of an object implementing the sequence protocol, which for the tuple does an array indexing of the tuple's data. You've got about two function calls more compared to using PyTuple_GET_ITEM. It really depends on how big the arrays you expect to get passed to you are. If they're big, this is all amortized: you'll hardly see it. It also depends on how your routines get used. If the routine is buried below a few layers of API, you'd likely be better off doing a typecast higher up to your own representation, or something. If it's at the border, so the user will call it directly *often*, you're going to be screwed for speed anyways (giving the user the option of casting arrays to something else would probably help a lot here also).

> Back to requiring attributes or not: I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up, > and a fast check for None chooses whether to use the defaults or not. > > It's late, so I hope that's not too incoherent. Or too wrong. > > Oh, one other nitpicky thing: I think PyLong_AsLongLong needs some sort > of error checking (it can allegedly raise errors). I suppose that means > one is supposed to call PyErr_Occurred after every call? That's sort > of painful!

Yes! Check all C API functions that may return errors! That includes PySequence_GetItem() and PyLong_AsLongLong.

> > struct PyNDArrayInfo {
> >     int ndims;
> >     int endian;
> >     char itemcode;
> >     size_t itemsize;
> >     Py_LONG_LONG shape[40];    /* assume 40 is the max for now... */
> >     Py_LONG_LONG offset;
> >     Py_LONG_LONG strides[40];
> >     /* More Array Info goes here */
> > };
> >
> > int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
> >     PyObject* shape;
> >     PyObject* offset;
> >     PyObject* strides;
> >     int ii, len;
> >
> >     info->itemsize = too_long_for_this_example(obj);
> >
> >     shape = PyObject_GetAttrString(obj, "__array_shape__");
> >     if (!shape) return 0;
> >     len = PySequence_Size(shape);
> >     if (len < 0) return 0;
> >     if (len > 40) return 0;    /* This needs work */
> >     info->ndims = len;
> >     for (ii = 0; ii < len; ii++) {
> >         PyObject* val = PySequence_GetItem(shape, ii);

Like here

> >         info->shape[ii] = PyLong_AsLongLong(val);

and here

> >         Py_DECREF(val);

(if you don't check PySequence_GetItem -- not a good idea anyways -- this should be Py_XDECREF)

[snip more code that needs checks :-)]

> >I have no idea how expensive PyErr_Clear() is. We'd have to profile it to > >see for certain. If PyErr_Clear() is not expensive, then we could make a > >strong argument that *not* requiring the attributes will be more efficient.

Not much; it's about three Py_XDECREF's. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Fri Apr 8 01:22:09 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 01:22:09 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed?
Message-ID: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> It seems that people are worried about speed of the attribute-based array interface when using small arrays in C. Here's an alternative: Define some attribute (for now, call it __array_c__), which returns a CObject whose value (which you get with PyCObject_GetVoidPtr) would be a pointer to a struct describing the array. It would look something like:

typedef struct {
    int version;
    int nd;
    Py_LONG_LONG *shape;
    char typecode;
    Py_LONG_LONG *strides;
    Py_LONG_LONG offset;
    void *data;
} SimpleCArray;

(The order here follows that of the array interface spec; if somebody's got any comments on what mixing ints, Py_LONG_LONGs, and chars in a struct does to the packing and potential alignment problems, I'd like to know.) version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types). Then:

- the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
- typecode follows that of the __array_typestr__ attribute
- shape and strides are pointers to arrays of at least nd elements.
- this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
- you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.

Creating one from a Numeric PyArrayObject would go like this:

PyObject *create_SimpleCArray(PyArrayObject *a)
{
    int i;
    SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
    ca->version = 0xDECAF;
    ca->nd = a->nd;
    ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->shape[i] = a->dimensions[i];
    }
    ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
    for (i = 0; i < ca->nd; i++) {
        ca->strides[i] = a->strides[i];
    }
    ca->offset = 0;
    ca->data = a->data;

    Py_INCREF(a);
    PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
    return co;
}

where

void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
{
    PyMem_Free(ca->shape);
    PyMem_Free(ca->strides);
    PyMem_Free(ca);
    Py_DECREF(a);
}

Some points:

- you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
- I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the desc argument in CObjects can easily play that role.
- The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).

This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From curzio.basso at unibas.ch Fri Apr 8 06:30:05 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Fri Apr 8 06:30:05 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <1112896207.2437.34.camel@halloween.stsci.edu> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> Message-ID: <4256873B.2060501@unibas.ch> Todd Miller wrote: > astype() is used in a bunch of places, including the C-API, so it's > hard to guess how it's getting called with the information here. In ok, so probably C functions are somehow 'transparent' to the profiler, which does not report them but reports the Python functions called by the C one...

>>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
>>>> import pdb
>>>> pdb.run("newfoo(N=2)")
> (pdb) s   # step along a little to get into newfoo()
> ... step output
> (pdb) import numarray.numarraycore as nc
> (pdb) break nc.astype

strange, what I get now is:

> (Pdb) b nc.astype
> *** The specified object 'nc.astype' is not a function
> or was not found along sys.path.

and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather than just the function) under ipython, starting it with

> %run -d myprog.py

maybe this could mess up things? curzio

From jmiller at stsci.edu Fri Apr 8 06:45:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 06:45:13 2005 Subject: [Numpy-discussion] profile reveals calls to astype() In-Reply-To: <4256873B.2060501@unibas.ch> References: <4255664F.2070107@unibas.ch> <1112896207.2437.34.camel@halloween.stsci.edu> <4256873B.2060501@unibas.ch> Message-ID: <1112967803.5142.29.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 09:29, Curzio Basso wrote: > Todd Miller wrote: > > > astype() is used in a bunch of places, including the C-API, so it's > > hard to guess how it's getting called with the information here. In > > ok, so probably C functions are somehow 'transparent' to the profiler, which does not report them > but reports the Python functions called by the C one... >
> >>>> from yourmodule import newfoo   # you redefined foo to accept N as a parameter
> >>>> import pdb
> >>>> pdb.run("newfoo(N=2)")
> > (pdb) s   # step along a little to get into newfoo()
> > ... step output
> > (pdb) import numarray.numarraycore as nc
> > (pdb) break nc.astype
>
> strange, what I get now is:
>
> > (Pdb) b nc.astype
> > *** The specified object 'nc.astype' is not a function
> > or was not found along sys.path.
>
> and in fact if I look at nc.__dict__ there is no 'astype' key. I'm running the whole program (rather > than just the function) under ipython, starting it with
>
> > %run -d myprog.py
>
> maybe this could mess up things?

No. I should have said "b nc.NumArray.astype". I just tried this out with an astype() callback from numarray.convolve's C-code and it worked OK for me. Regards, Todd

From strawman at astraw.com Fri Apr 8 08:00:13 2005 From: strawman at astraw.com (Andrew Straw) Date: Fri Apr 8 08:00:13 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255F79D.4000501@ucsd.edu> References: <20050408030336.54970.qmail@web50209.mail.yahoo.com> <4255F79D.4000501@ucsd.edu> Message-ID: <42569C4D.2080904@astraw.com> Robert Kern wrote: > Scott Gilbert wrote: > >> --- Andrew Straw wrote: >> >>> Here's a bit of weirdness which has prevented me from using '<' or >>> '>' in the past with the struct module.
>>> I'm not guru enough to know what's going on, but it has prevented me from being explicit rather than implicit.
>>>
>>> In [1]:import struct
>>>
>>> In [2]:from numarray.ieeespecial import nan
>>>
>>> In [3]:nan
>>> Out[3]:nan
>>>
>>> In [4]:struct.pack('<d',nan)
>>> ---------------------------------------------------------------------------
>>> exceptions.SystemError                Traceback (most recent call last)
>>>
>>> /home/astraw/
>>>
>>> SystemError: frexp() result out of range
>>>
>>> In [5]:struct.pack('d',nan)
>>> Out[5]:'\x00\x00\x00\x00\x00\x00\xf8\xff'
>
> This behavior is explained by Tim Peters:
> http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a

I feared it was something like that. (No platform independent way to represent special values like nan, inf, and so on.) So I think if we're going to require an encoding character such as '<' or '>' we should also include one that means native, which CAN handle these special values... And document why it's needed and why it may get one into trouble.

From jmiller at stsci.edu Fri Apr 8 10:14:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 8 10:14:04 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <1112980431.5142.116.camel@halloween.stsci.edu> On Fri, 2005-04-08 at 04:21, David M. Cooke wrote: > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. I was a little worried too, but I think the array protocol idea is a good one in any case. Thinking about this, I'm wondering if what we used to do in early numarray (0.2) wouldn't work here. Our "consumer interface" / helper function looked more like this:

int getSimpleCArray(PyObject *o, SimpleCArray *info);

It basically just fills in the caller's SimpleCArray struct using information from o and returns 0, or -1 with an exception set if there's some problem. In numarray's SimpleCArray struct, the shape and strides arrays were fully allocated (i.e. Py_LONG_LONG shape[MAXDIM];) so the struct could be placed in an auto variable with nothing to free() later. In this interface, there is no implied getattr at all, since the helper function getSimpleCArray() can be made as smart (i.e. given knowledge about specific types) as people are motivated to make it. So, for a Numeric array or a numarray or a Numeric3 array, getSimpleCArray would presumably just copy from struct to struct, but for other types, it might fall back on the many-getattr approach. Regards, Todd

> Here's an alternative: Define some attribute (for now, call it > __array_c__), which returns a CObject whose value (which you get with > PyCObject_GetVoidPtr) would be a pointer to a struct describing the > array.
It would look something like
>
> typedef struct {
>     int version;
>     int nd;
>     Py_LONG_LONG *shape;
>     char typecode;
>     Py_LONG_LONG *strides;
>     Py_LONG_LONG offset;
>     void *data;
> } SimpleCArray;
>
> (The order here follows that of the array interface spec; if somebody's got any comments on what mixing int's, Py_LONG_LONG, and char's in a struct does to the packing and potential alignment problems I'd like to know.)
>
> version is there as a sanity check: I'd say for this version it's something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily a check that you've got the right thing (since CObjects are intrinsically opaque types).
>
> Then:
> - the array object guarantees that the data, etc. remains alive, probably by passing itself as the desc parameter to the CObject. The array data would have to stay at the same location and the same size while the reference is held.
> - typecode follows that of the __array_typestr__ attribute.
> - shape and strides are pointers to arrays of at least nd elements.
> - this doesn't handle byteswapped as-is. Maybe a flags, or endian, attribute could be added.
> - you can still have the full attribute-based array interface (__array_strides__, etc.) to fall back on. If the typecode is 'V', you'll have to look at __array_descr__.
>
> Creating one from a Numeric PyArrayObject would go like this:
>
> PyObject *create_SimpleCArray(PyArrayObject *a)
> {
>     int i;
>     SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
>     ca->version = 0xDECAF;
>     ca->nd = a->nd;
>     ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
>     for (i = 0; i < ca->nd; i++) {
>         ca->shape[i] = a->dimensions[i];
>     }
>     ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
>     for (i = 0; i < ca->nd; i++) {
>         ca->strides[i] = a->strides[i];
>     }
>     ca->offset = 0;
>     ca->data = a->data;   /* the array's actual buffer */
>
>     Py_INCREF(a);
>     PyObject *co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
>     return co;
> }
>
> where
>
> void free_numeric_simplecarray(SimpleCArray *ca, PyArrayObject *a)
> {
>     PyMem_Free(ca->shape);
>     PyMem_Free(ca->strides);
>     PyMem_Free(ca);
>     Py_DECREF(a);
> }
>
> Some points:
> - you have to keep the CObject around: destroying it will potentially destroy the array you're looking at.
> - I was thinking that maybe adding a PyObject *owner could make it easier to keep track of the owner; I'm not sure, as the descr argument in CObjects can easily play that role.
> - The creator of the SimpleCArray is free to add elements to the end (as long as they don't affect the padding/alignment of the previous ones: haven't thought about this). You could put the real owner of the array data there, for example (say, if it was wrapping a Blitz++ array). Or have a small _strides[30] array at the end, and strides would point to that (saving you a memory allocation).
>
> This simple C interface would, I think, alleviate many worries about speed for small arrays, and even for large arrays. -- From xscottg at yahoo.com Fri Apr 8 11:06:04 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:06:04 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <42562AC5.3040502@cox.net> Message-ID: <20050408180523.95022.qmail@web50207.mail.yahoo.com> --- Tim Hochberg wrote: > > The point about not passing around the tuples probably being faster is a > good one.
Another thought is that requiring tuples instead of general > sequences would make the helper faster (since one could use > PyTuple_GET_ITEM, which I believe is much faster than > PySequence_GetItem). This would possibly shift more pain onto the > implementer of the object though. I suspect that the best strategy, > orthogonal to requiring all attributes or not, is to use PySequence_Fast > to get a fast sequence and work with that. This means that objects that > return tuples for strides, etc. would run at maximum possible speed, > while other sequences would still work. > I hadn't seen this "fast" sequence stuff before. Thanks for the pointer. > > Back to requiring attributes or not. I suspect that the fastest correct > way is to require all attributes, but allow them to be None, in which > case the default value is used. Then any errors are easily bubbled up > and a fast check for None chooses whether to use the defaults or not. > How about saying that, for all the optional attributes, if they return None that's to be treated the same way as if they weren't present at all? In other words, they're still optional, but people in the know would know that returning None was probably faster... From xscottg at yahoo.com Fri Apr 8 11:14:27 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 11:14:27 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408074129.GA16479@arbutus.physics.mcmaster.ca> Message-ID: <20050408181314.89274.qmail@web50205.mail.yahoo.com> --- "David M. Cooke" wrote: > > > Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort > > of error checking (it can allegedly raise errors). I suppose that means > > one is supposed to call PyErr_Occurred after every call? That's sort > > of painful! > > Yes! Check all C API functions that may return errors! That includes > PySequence_GetItem() and PyLong_AsLongLong. > Sorry, I should have been clear that I was writing example code. I only put the error checking in where I thought it was demonstrating the point. I'd be surprised if it even compiled... Note that the additional error checking is required in the "success" path where the attributes are present. In other words, mandating the attributes be there when they aren't strictly required could make things slower... Cheers, -Scott From xscottg at yahoo.com Fri Apr 8 12:24:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 12:24:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: 6667 Message-ID: <20050408192312.91215.qmail@web50206.mail.yahoo.com> --- "David M. Cooke" wrote: > > It seems that people are worried about speed of the attribute-based > array interface when using small arrays in C. > I'm really not worried about it... I just don't want "performance" to be used as an argument for a given design decision when the proposed change won't actually make things faster. > > Here's an alternative: Define some attribute (for now, call it > [snip] > This would definitely be faster. Faster yet would be doing a PyNumeric_Check (or PyNumarray_Check, or whatever they're called) and just casting the pointer to the underlying representation. If you must go fast, go as fast as possible... I'd rather we didn't add a lot of complexity to the array protocol to just go at a medium speed.
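For reference, here is a rough Python-level sketch of the kind of consumer all of this is trying to speed up. The attribute names follow this thread's draft protocol, and the "None is treated like a missing attribute" convention from the earlier message is an assumption here, not a settled part of the spec:

def array_info(obj):
    # Minimal, illustrative consumer of the draft array protocol.
    shape = tuple(obj.__array_shape__)      # normalize any sequence to a tuple once
    typestr = obj.__array_typestr__
    data = obj.__array_data__               # buffer-like object
    strides = getattr(obj, '__array_strides__', None)
    if strides is not None:                 # None treated the same as absent
        strides = tuple(strides)
    offset = getattr(obj, '__array_offset__', None)
    if offset is None:
        offset = 0
    return shape, typestr, strides, offset, data

A C version would do the same dance with getattr calls and sequence access, which is exactly the per-element cost being debated.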
Cheers, -Scott From oliphant at ee.byu.edu Fri Apr 8 13:55:27 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 8 13:55:27 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> Message-ID: <4256EF45.6070004@ee.byu.edu> David M. Cooke wrote: >It seems that people are worried about speed of the attribute-based >array interface when using small arrays in C. > > I think what we are talking about here is an *array protocol* (i.e. like the buffer protocol and sequence protocol). So far we have just described the Python level interface. I would like to see an array protocol added (perhaps to the buffer protocol table). This could be done just as David describes --- we don't even need to use the C-pointer (just return a void *pointer which has a version as the first entry). This is how the C-level should be handled, I think. Yes, it does not require changes to Python to implement the __array_c__ attribute. But, ultimately, it would be better if we used the C-level protocol concept that Python already uses for other objects. -Travis From perry at stsci.edu Fri Apr 8 14:05:05 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 8 14:05:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <4255B7D6.9000109@ee.byu.edu> References: <20050407211501.60155.qmail@web50203.mail.yahoo.com> <4255B7D6.9000109@ee.byu.edu> Message-ID: <819eb85df29878341dd00521bbba280d@stsci.edu> On Apr 7, 2005, at 6:44 PM, Travis Oliphant wrote: > > I can't think of anything you've missed. > > I'm very supportive of this, but I have to finish scipy.base first. > I think Perry is supportive as well. I know he's been playing > catch-up in the reading. I'm not sure of Todd's opinion. I suspect > he would welcome these changes to Python. > > My preference order is > > 1) the ndarray module and ndarray.h header with these interface > definitions and methods. 2) Add array interface attributes to array > module > 3) Flesh out locked buffer API > 4) Bytes object (with Pickling support) > 5) Fix current buffer object. > I agree as well (I think). Just to be sure I'll restate. These issues are all important, and the discussion has been very useful to flesh out the proposed array protocol. Nevertheless, I'd put the priority of getting these into Python, or accepted by the Python Dev community, lower than actually implementing Numeric3 (aka scipy.base) to the point that it is acceptable to both Numeric and numarray communities. True, subsequent changes forced by the acceptance process may require reworking in scipy.base, but I put unification far ahead of getting these various components finished and into Python. I think that's what Travis is getting at too. I've been tied up in other things, but frankly, I haven't seen that much that I have objected to so far in the array protocol discussions to warrant comments from me. I think it has been pretty well done (and I'm about to leave town so I'm going to be out of touch for a week or so, at least mostly) Perry From xscottg at yahoo.com Fri Apr 8 14:43:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 14:43:02 2005 Subject: [Numpy-discussion] Questions about the array interface.
In-Reply-To: 6667 Message-ID: <20050408214214.45907.qmail@web50206.mail.yahoo.com> --- Andrew Straw wrote: > > > > This behavior is explained by Tim Peters: > > > > http://groups-beta.google.com/group/comp.lang.python/msg/16dbf848c050405a > > > I feared it was something like that. (No platform independent way to > represent special values like nan, inf, and so on.) So I think if we're > going to require an encoding character such as '<' or '>' we should also > include one that means native which CAN handle these special values... > And document why it's needed and why it may get one into trouble. > The data is either big endian or little endian (or possibly a single byte in which case it doesn't matter). Whether or not the (hardware, operating system, C runtime library, C compiler, or Python implementation) can handle NaNs or Infs is not a property of the data. What does an additional code or two get you? Let's say we used ']' for big endian native, and '[' for little endian native? Does that just indicate the possible presence of NaNs or Infs in the data? Adding those codes doesn't have any effect on whether or not libraries can deal with them. I guess I'm not understanding something. Cheers, -Scott From cookedm at physics.mcmaster.ca Fri Apr 8 14:52:02 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 8 14:52:02 2005 Subject: [Numpy-discussion] Alternate C-only array protocol for speed? In-Reply-To: <4256EF45.6070004@ee.byu.edu> (Travis Oliphant's message of "Fri, 08 Apr 2005 14:53:25 -0600") References: <20050408082147.GA16977@arbutus.physics.mcmaster.ca> <4256EF45.6070004@ee.byu.edu> Message-ID: Travis Oliphant writes: > David M. Cooke wrote: > >>It seems that people are worried about speed of the attribute-based >>array interface when using small arrays in C. >> >> > I think what we are talking about here is an *array protocol* (i.e. like the > buffer protocol and sequence > protocol). > > So far we have just described the Python level interface. I would > like to see an array protocol added (perhaps to the buffer protocol > table). This could be done just as David describes --- we don't even > need to use the C-pointer (just return a void *pointer which has a > version as the first entry). The purpose of the CObject was to make it possible to pass it through Python (through the attribute access). > This is how the C-level should be handled, I think. Yes, it > does not require changes to Python to implement the __array_c__ > attribute. But, ultimately, it would be better if we used the C-level > protocol concept that Python already uses for other objects. Ah, ok, so you'd have a slot in the type object (like the number, sequence, or buffer protocols), with the appropriate (C-level) functions. This would require it to be in the Python core, though, and would only work for a new version of Python. Alternatively, you have a special attribute/method that returns an object with the right C API -- much like CObjects are used for wrapping Numeric's C API. I would really like to see something working at the C level (so you're not passing dimensions back-and-forth as Python tuples with Python ints), but the Python-level array interface you've proposed will work for now. This should be revisited once people are using the new array interface, and we have an idea of how it's being used, and the performance costs. -- |>|\/|< /--------------------------------------------------------------------------\ |David M.
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From xscottg at yahoo.com Fri Apr 8 16:06:02 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Fri Apr 8 16:06:02 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050408230455.35465.qmail@web50209.mail.yahoo.com> --- Scott Gilbert wrote: > > --- Andrew Straw wrote: > > > > I feared it was something like that. (No platform independent way to > > represent special values like nan, inf, and so on.) So I think if > > we're going to require an encoding character such as '<' or '>' we > > should also include one that means native which CAN handle these > > special values... And document why it's needed and why it may get one > > into trouble. > > > > Let's say we used ']' for big endian native, and '[' for little endian > native? Does that just indicate the possible presence of NaNs or Infs > in the data? > > Adding those codes doesn't have any effect on whether or not libraries > can deal with them. I guess I'm not understanding something. > I think I'm understanding my problem in understanding :-). There IS a platform independent way to represent NaNs and Infs. It's pretty clearly spelled out in IEEE-754: http://stevehollasch.com/cgindex/coding/ieeefloat.html I think something we've been assuming is that the array data is basically IEEE-754 compliant (maybe it needs to be byteswapped). If that's not true, then we're going to need some new typecodes. We're not supporting the ability to pass VAX floating point around (Are we????). The problem is that you can't make any safe assumptions about whether your current platform will deal with IEEE-754 data in any predictable way if it contains NaNs or Infs. So additional typecodes won't really solve anything. Tim Peters's explanation is a good representation of Python's official position regarding floating point issues, but a much simpler explanation is possible... The struct module in "standard mode" decodes the data one byte at a time and builds a float from them. You can see this in the _PyFloat_Unpack8 function in the floatobject.c file. In other words, this routine probably works on a VAX too (taking an IEEE-754 double and building a VAX floating point as it goes). You can also see the comment in there that says it doesn't handle NaNs or Infs. I don't think we need another indicator for '>' big-endian or '<' for little-endian. From konrad.hinsen at laposte.net Fri Apr 8 23:46:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Apr 8 23:46:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <20050408230455.35465.qmail@web50209.mail.yahoo.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> Message-ID: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> On 09.04.2005, at 01:04, Scott Gilbert wrote: > I think something we've been assuming is that the array data is > basically > IEEE-754 compliant (maybe it needs to be byteswapped). If that's not > true, > then we're going to need some new typecodes. We're not supporting the > ability to pass VAX floating point around (Are we????). This discussion has been coming up regularly for a few years. Until now the consensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation. Of course the computing world is changing, and IEEE format may well be ubiquitous by now.
Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible with their ancestors. Another detail to consider is that although most machines use the IEEE representation, hardly any respects the IEEE rules for floating point operations in all detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------------- From xscottg at yahoo.com Sat Apr 9 09:36:05 2005 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 9 09:36:05 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: 6667 Message-ID: <20050409163525.93733.qmail@web50201.mail.yahoo.com> --- konrad.hinsen at laposte.net wrote: > > This discussion has been coming up regularly for a few years. Until now > the consensus has always been that Python should make no assumptions > that go beyond what a C compiler can promise. Which means no > assumptions about floating-point representation. > > Of course the computing world is changing, and IEEE format may well be > ubiquitous by now. Vaxes must be in the museum by now. But how about > mainframes? IBM mainframes didn't use IEEE when I used them (last time > 15 years ago), and they are still around, possibly compatible with their > ancestors. > I've been following this mailing list for a few years now, but I skip a lot of threads. I almost certainly skipped this topic in the past since it wasn't relevant to me. I'm only interested in it now since it's relevant to this data interchange business, so I'm sorry if this is a rehash... Trying to stay portable is a good goal, and I can understand why Python proper would try to adhere to the restrictions it does. Despite the claim, Python makes plenty of assumptions that a standards-conformant C compiler could break. If numpy doesn't make some assumptions about floating point representation, it's going to kill the possibility of passing data across machines, and that's pretty unacceptable. I'm not comfortable saying "ubiquitous" since I don't know what the mainframe or super computing community is making use of, and I don't know what sort of little machines Python is running on. The closest thing to a mainframe that I've ever used was a Convex, and I never knew what its floating point representation was. However, I know that x86, PPC, AMD-64, IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use IEEE-754 format. That's probably 99.999% of all machines capable of running Python, and at least that percentage of users. It would be a shame to gum up this typecode thing for situations that don't occur in practice. If it has to be done, then I recommend we use the '@' code in place of the '<' or '>' for platforms that are out of the ordinary. It's important to specify that '@' is only to be used on floating point data that is not IEEE-754. In this case it doesn't mean "native" like it does in the struct module, it means "weird" :-). > > Another detail to consider is that although most machines use the > IEEE representation, hardly any respects the IEEE rules for floating > point operations in all detail.
In particular, trusting that Inf and NaN will > be treated as IEEE postulates is a risky business. > See, that's the thing. Why burden how you label the data with the restrictions of the current machine? You can take the data off the machine. Whether or not I can rely on what NaN*Inf will give me, I know that I can take NaN and Inf to another machine and get the same interpretation of the data. This whole thread started because Andrew Straw showed that struct.pack('<d', nan) fails. Cheers, -Scott From oliphant at ee.byu.edu Sat Apr 9 09:54:00 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 9 09:54:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <95b362f578483f1a9ee3e850e108c6d8@laposte.net> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> Message-ID: <425808B4.8070005@ee.byu.edu> konrad.hinsen at laposte.net wrote: > On 09.04.2005, at 01:04, Scott Gilbert wrote: >> I think something we've been assuming is that the array data is >> basically >> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >> not true, >> then we're going to need some new typecodes. We're not supporting the >> ability to pass VAX floating point around (Are we????). >>
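As a concrete illustration of the representation-versus-semantics point, here is a minimal sketch. It assumes an IEEE-754 platform, a Python 2 with native 'Q' support in struct, and the Python 2.3/2.4-era behavior where standard-mode packing of a NaN raised SystemError while native mode just copied the bits:

import struct

inf = 1e308 * 1e308                  # overflows to IEEE +inf on IEEE-754 doubles
nan = inf - inf                      # inf - inf yields a quiet NaN
packed = struct.pack('d', nan)       # native mode: just copies the 8 bytes
bits = struct.unpack('Q', packed)[0] # reinterpret those same bytes as an integer
exponent = (bits >> 52) & 0x7FF      # float64 layout: 1 sign, 11 exponent, 52 fraction bits
fraction = bits & ((1L << 52) - 1)
print exponent == 0x7FF and fraction != 0   # True: NaN is a bit *range*, not one value
# struct.pack('<d', nan) is the standard-mode call that raised
# "SystemError: frexp() result out of range" on Python 2.3/2.4.

The bit pattern is perfectly well defined and portable; it is only the packing/unpacking code that chokes on it.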
Floating point values are 32-bit or 64-bit entities which are stored > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > so I expect it simply won't work on a VAX. There's no checking for > this. > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > point. ieeespecial has been tested to work there. > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > unsigned > integers, and contiguous ranges on those integers are used to > represent > special values like NAN and INF. Platform byte ordering for the > IEEE-754 floating point numbers mirrors byte ordering for integers so > the ieeespecial NAN detection code works in a cross platform way *and* > values exported from one IEEE-754 platform will work with ieeespecial > when imported on another. It's important to note that special values > are not unique: there is no single NAN value; it's a bit range. > > 4. numarray leaks IEEE-754 special values out into Python floating > point > scalars. This may be bad form. I do this because (1) they repr > understandably if not in a platform independent way and (2) people need > to get at them. I noticed recently that ieeespecial.nan == > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > correct ones (False) for Python-2.4. I haven't looked at what the > array > version does yet: array(nan) == array(nan). The point to be taken > from > this is that the level at which numarray ieee special value handling > works or doesn't work is really restricted to (1) detecting certain > ieee-754 bit ranges (2) the basic behavior of C code for C89 complilers > for array code (no guarantees) (3) the behavior of Python itself > (improving). > > In the context of the array protocol (looking very nice by the way) my > thinking is that non-IEEE-754 floating point could be described with > bit > fields and that the current type codes should mean IEEE-754. > > Some minor things I noticed in the array interface: > > 1. The packing order of bit fields is not clear. In C, my experience > is that some compilers pack bit structs towards the higher order bits > of > an integer, and some towards the lower. More info to clarify that > would be helpful. > > 2. I saw no mention that we're talking about a protocol. I'm sure > that's clear to everyone following this discussion closely, but I > didn't see it in the spec. It might make sense to allude to the C > helper functions and potential for additions to the Python type struct > even if they're not spelled out. > > Regards, > Todd On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > konrad.hinsen at laposte.net wrote: > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: >> >>> I think something we've been assuming is that the array data is >>> basically >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's >>> not true, >>> then we're going to need some new typecodes. We're not supporting >>> the >>> ability to pass VAX floating point around (Are we????). >> > > No, in moving from the struct modules character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build an interface around. Data sharing was the > whole reason the standard emerged and a lot of companies got on board. > >> >> This discussion has been coming up regularly for a few years. 
Until >> now the consensus has always been that Python should make no >> assumptions that go beyond what a C compiler can promise. Which >> means no assumptions about floating-point representation. >> >> Of course the computing world is changing, and IEEE format may well >> be ubiquitous by now. Vaxes must be in the museum by now. But how >> about mainframes? IBM mainframes didn't use IEEE when I used them >> (last time 15 years ago), and they are still around, possibly >> compatible with their ancestors. > > I found the following piece, written about 6 years ago, interesting: > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > Basically, it states that chips in newer IBM mainframes support the > IEEE 754 standard. > >> >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependent C-code when and if > problems arise. > -Travis From jmiller at stsci.edu Sat Apr 9 16:18:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sat Apr 9 16:18:00 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> <7bbd3fb27f77a4058fd8675bf53de12e@astraw.com> Message-ID: <1113088643.5363.8.camel@jaytmiller.comcast.net> On Sat, 2005-04-09 at 12:35 -0700, Andrew Straw wrote: > Here's an email Todd Miller sent me (I hoped he'd send it directly to > the list, but I'll forward it. Todd, I hope you don't mind.) No, I don't mind. I intended to send it to the list but left in a rush this morning. Todd > > > On Fri, 2005-04-08 at 15:46 -0700, Andrew Straw wrote: > >> Hi Todd, > >> > >> Could you join in on this thread? I think you wrote the ieeespecial > >> stuff in numarray, so it's clear you have a much better understanding > >> of > >> the issues than I do... > >> > >> Cheers! > >> Andrew > > > > My own understanding is limited, but I can say a few things that might > > make the status of numarray clearer. My assumptions for numarray were > > that: > > > > 1. Floating point values are 32-bit or 64-bit entities which are stored > > in IEEE-754 format. This is a basic assumption of numarray.ieeespecial > > so I expect it simply won't work on a VAX. There's no checking for > > this. > > > > 2. The platforms that I care about, AMD/Intel Windows/Linux, PowerPC > > OS-X, and Ultra-SPARC Solaris, all seem to provide IEEE-754 floating > > point. ieeespecial has been tested to work there. > > > > 3. I viewed IEEE-754 floating point numbers as 32-bit or 64-bit > > unsigned > > integers, and contiguous ranges on those integers are used to > > represent > > special values like NAN and INF.
Platform byte ordering for the > > IEEE-754 floating point numbers mirrors byte ordering for integers so > > the ieeespecial NAN detection code works in a cross platform way *and* > > values exported from one IEEE-754 platform will work with ieeespecial > > when imported on another. It's important to note that special values > > are not unique: there is no single NAN value; it's a bit range. > > > > 4. numarray leaks IEEE-754 special values out into Python floating > > point > > scalars. This may be bad form. I do this because (1) they repr > > understandably if not in a platform independent way and (2) people need > > to get at them. I noticed recently that ieeespecial.nan == > > ieeespecial.nan returns incorrect answers (True!) for Python-2.3 and > > correct ones (False) for Python-2.4. I haven't looked at what the > > array > > version does yet: array(nan) == array(nan). The point to be taken > > from > > this is that the level at which numarray ieee special value handling > > works or doesn't work is really restricted to (1) detecting certain > > ieee-754 bit ranges (2) the basic behavior of C code for C89 compilers > > for array code (no guarantees) (3) the behavior of Python itself > > (improving). > > > > In the context of the array protocol (looking very nice by the way) my > > thinking is that non-IEEE-754 floating point could be described with > > bit > > fields and that the current type codes should mean IEEE-754. > > > > Some minor things I noticed in the array interface: > > > > 1. The packing order of bit fields is not clear. In C, my experience > > is that some compilers pack bit structs towards the higher order bits > > of > > an integer, and some towards the lower. More info to clarify that > > would be helpful. > > > > 2. I saw no mention that we're talking about a protocol. I'm sure > > that's clear to everyone following this discussion closely, but I > > didn't see it in the spec. It might make sense to allude to the C > > helper functions and potential for additions to the Python type struct > > even if they're not spelled out. > > > > Regards, > > Todd > > On Apr 9, 2005, at 9:54 AM, Travis Oliphant wrote: > > > konrad.hinsen at laposte.net wrote: > > > >> On 09.04.2005, at 01:04, Scott Gilbert wrote: > >> > >>> I think something we've been assuming is that the array data is > >>> basically > >>> IEEE-754 compliant (maybe it needs to be byteswapped). If that's > >>> not true, > >>> then we're going to need some new typecodes. We're not supporting > >>> the > >>> ability to pass VAX floating point around (Are we????). > >> > > > > No, in moving from the struct module's character codes we are trying to > > do something more platform independent because it is very likely that > > different platforms will want to exchange binary data. IEEE-754 is a > > great standard to build an interface around. Data sharing was the > > whole reason the standard emerged and a lot of companies got on board. > > > >> > >> This discussion has been coming up regularly for a few years. Until > >> now the consensus has always been that Python should make no > >> assumptions that go beyond what a C compiler can promise. Which > >> means no assumptions about floating-point representation. > >> > >> Of course the computing world is changing, and IEEE format may well > >> be ubiquitous by now. Vaxes must be in the museum by now. But how > >> about mainframes?
IBM mainframes didn't use IEEE when I used them > >> (last time 15 years ago), and they are still around, possibly > >> compatible with their ancestors. > > > > I found the following piece, written about 6 years ago, interesting: > > > > http://www.research.ibm.com/journal/rd/435/schwarz.html > > > > Basically, it states that chips in newer IBM mainframes support the > > IEEE 754 standard. > > > >> > >> Another detail to consider is that although most machines use the > >> IEEE representation, hardly any respects the IEEE rules for floating > >> point operations in all detail. In particular, trusting that Inf and > >> NaN will be treated as IEEE postulates is a risky business. > > > > But, this can be handled with platform-dependent C-code when and if > > problems arise. > > -Travis From tchur at optushome.com.au Sat Apr 9 17:25:43 2005 From: tchur at optushome.com.au (Tim Churches) Date: Sat Apr 9 17:25:43 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array Message-ID: <4258721E.1080905@optushome.com.au> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit Linux): >>> import Numeric as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 OK, it is an elementary mistake, but the silent overflow caught me unawares. Casting the array to Float64 before summing it avoids the error, but in my instance the actual data is a rank-1 array of 21 million integers with a mean value of about 140 (which adds up to more than sys.maxint), and casting to Float64 will use quite a lot of memory (as well as taking some time). Any advice for catching or avoiding such overflow without incurring a performance and memory hit by always casting to Float64? Shouldn't add.reduce() be checking for overflow and raising an error? Then it would be easy to upcast only when overflow (or underflow) occurs, rather than always.
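(For what it's worth, one blockwise workaround is sketched below; it assumes Numeric 23.x, a rank-1 Int32 array, and a block size small enough that each Float64 partial sum stays below 2**53 and is therefore exact. Only one block is ever upcast at a time, and the running total is a Python long, so the result is exact without a full-array copy.)

import Numeric as N

def blockwise_sum(a, blocksize=1000000):
    # Sum an Int32 array exactly without typecasting the whole thing:
    # upcast one block at a time and accumulate in a Python long.
    total = 0L
    for i in xrange(0, len(a), blocksize):
        block = a[i:i+blocksize].astype(N.Float64)   # small temporary
        total = total + long(N.add.reduce(block))    # partials exact below 2**53
    return total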
Tim C From jmiller at stsci.edu Sun Apr 10 07:25:08 2005 From: jmiller at stsci.edu (Todd Miller) Date: Sun Apr 10 07:25:08 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <4258721E.1080905@optushome.com.au> References: <4258721E.1080905@optushome.com.au> Message-ID: <1113143026.5359.35.camel@jaytmiller.comcast.net> On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit > Linux): > > >>> import Numeric as N > >>> a = N.array((2000000000,1000000000),typecode=N.Int32) > >>> N.add.reduce(a) > -1294967296 > > OK, it is an elementary mistake, but the silent overflow caught me > unawares. Casting the array to Float64 before summing it avoids the > error, but in my instance the actual data is a rank-1 array of 21 > million integers with a mean value of about 140 (which adds up to more than > sys.maxint), and casting to Float64 will use quite a lot of memory (as > well as taking some time). > > Any advice for catching or avoiding such overflow without > incurring a performance and memory hit by always casting to Float64? Here's what numarray does: >>> import numarray as N >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >>> N.add.reduce(a) -1294967296 So basic reductions in numarray have the same "careful while you're shaving" behavior as Numeric; it's fast but easy to screw up. But: >>> a.sum() 3000000000L >>> a.sum(type='d') 3000000000.0 a.sum() blockwise upcasts to the largest type of its kind on the fly, in this case, Int64. This avoids the storage overhead of typecasting the entire array. A better name for the method would have been sumall() since it sums all elements of a multi-dimensional array. The flattening process reduces on one dimension before flattening, preventing a full copy of a discontiguous array. It could be smarter about choosing the dimension of the initial reduction. Regards, Todd From pearu at cens.ioc.ee Mon Apr 11 00:59:14 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Mon Apr 11 00:59:14 2005 Subject: [Numpy-discussion] scipy.base Message-ID: Hi Travis, I have committed scipy.{distutils,base} to Numeric3 CVS repository. scipy.distutils is a reviewed version of scipy_distutils and as one of its new features there is a Configuration class that allows one to write much simpler setup.py files for subpackages. See setup.py files under Numeric3/scipy directory for examples (and the sketch below). scipy.base is a very minimal copy of scipy_base plus ndarray modules. When using setup_scipy.py for building, the ndarray package is installed as scipy.base and from scipy.base import * should work equivalently to from ndarray import * for instance. I have used information from Numeric3/setup.py to implement Numeric3/scipy/base/setup.py and it should be updated whenever Numeric3/setup.py is changed. However, I would recommend starting to use scipy.base instead of ndarray as using both may cause unexpected behaviour when the installed ndarray is older than the scipy.base installation (see [*]). In Numeric3 CVS repository that would mean replacing setup.py with setup_scipy.py and any modification to ndarray setup scripts should be done in scipy/base/setup.py. We can apply this step whenever you feel confident with new setup.py files. Let me know if you have any troubles with them. To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, and CodeGenerators directories should be moved under the scipy/base directory.
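(To illustrate the "much simpler setup.py files" mentioned above, here is a sketch of a subpackage setup.py. The module paths and the Configuration signature shown here follow the form this code later took in numpy.distutils, so treat those details as assumptions about the 2005 version rather than its documented API:)

from scipy.distutils.misc_util import Configuration
from scipy.distutils.core import setup

def configuration(parent_package='', top_path=None):
    # One Configuration object per subpackage; it works out paths,
    # headers, and package data, keeping each setup.py a few lines long.
    config = Configuration('base', parent_package, top_path)
    return config

if __name__ == '__main__':
    setup(**configuration(top_path='').todict())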
However, this step can be omitted if you would prefer working with files at the top directory of Numeric3. Current setup.py scripts fully support this approach as well. There are also a few open issues and questions. First, how to name the Numeric3 project when it installs scipy.base, scipy.distutils, Numeric packages, etc? This name will be used when creating source distributions and also as part of the path where header files will be installed. At the moment setup_scipy.py uses the name 'ndarray'. And so `setup_scipy.py sdist`, for example, produces an ndarray-30.0.tar.gz file; `setup_scipy.py install` installs header files under the <prefix>/include/ndarray/ directory. Though this is fine with me, I am not sure that this is an ideal situation. I think we should choose the name now and stick to it forever, especially since 3rd party extension modules need to know where to look for ndarray header files. This name cannot be 'numarray', obviously, but there are options like 'ndarray', 'numpy', and maybe others. In fact, 'Numeric' (with version 3x.x) would also be an option but that would certainly cause some problems when one wants both Numeric 2x.x and Numeric 3x.x to be installed in the system; the header files would end up in the same directory, for instance. As a workaround, we could force installing Numeric3 header files to <prefix>/include/Numeric/3/ or something. I actually like this idea but I wonder what others think about this. Second, is it already possible to use the ndarray C/API as a replacement for the Numeric C/API, i.e. would simple replacement of #include "Numeric/arrayobject.h" with #include "ndarray/arrayobject.h" work? And if not, will it ever be? This would be interesting to know as an extension writer. [*] Due to keeping changes to Numeric3 sources minimal, scipy.base multiarray and umath modules first try to import ndarray and then scipy.base whenever ndarray is missing. One should remove the ndarray installation from the system before using scipy.base. Regards, Pearu From konrad.hinsen at laposte.net Mon Apr 11 02:30:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Apr 11 02:30:28 2005 Subject: [Numpy-discussion] Questions about the array interface. In-Reply-To: <425808B4.8070005@ee.byu.edu> References: <20050408230455.35465.qmail@web50209.mail.yahoo.com> <95b362f578483f1a9ee3e850e108c6d8@laposte.net> <425808B4.8070005@ee.byu.edu> Message-ID: On Apr 9, 2005, at 18:54, Travis Oliphant wrote: > No, in moving from the struct module's character codes we are trying to > do something more platform independent because it is very likely that > different platforms will want to exchange binary data. IEEE-754 is a > great standard to build For data exchange between platforms, i.e. through files and network connections, XDR is arguably a better choice. It actually uses IEEE for floats, but XDR libraries provide conversion code for other platforms. It also takes care of byte ordering. > an interface around. Data sharing was the whole reason the standard > emerged and a lot of companies got on board. I think the main reason was standardization of precision, range, and operations, to make floating-point code more portable. This has had moderate success, as 100% IEEE platforms are rare if they exist at all. >> Another detail to consider is that although most machines use the >> IEEE representation, hardly any respects the IEEE rules for floating >> point operations in all detail. In particular, trusting that Inf and >> NaN will be treated as IEEE postulates is a risky business. > > But, this can be handled with platform-dependent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on any IEEE proposal before implementing it. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen at cea.fr ------------------------------------------------------------------------------- From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. Casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up to more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without >>incurring a performance and memory hit by always casting to Float64? > > > Here's what numarray does: > > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > > -1294967296 > > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how does one be careful? It seems that for any array of two integers or more which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) the incorrect answer, and short of adding up the array using a safer but much slower method, there is no way of determining if the answer provided (quickly) by add.reduce is right or wrong? Which seems to make it fast but useless (for integer arrays, at least)? Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > > >>>>a.sum() > > 3000000000L > >>>>a.sum(type='d') > > 3000000000.0 > > a.sum() blockwise upcasts to the largest type of its kind on the fly, in > this case, Int64. This avoids the storage overhead of typecasting the > entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all > elements of a multi-dimensional array. The flattening process reduces > on one dimension before flattening, preventing a full copy of a > discontiguous array. It could be smarter about choosing the dimension > of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful. Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is a Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, >and CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name the Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would also be an option but that >would certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system; the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to <prefix>/include/Numeric/3/ or >something. I actually like this idea but I wonder what others think about >this. > > How about include/scipy? >Second, is it already possible to use the ndarray C/API as a replacement for >the Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take, I think) now uses a standardized PyArrayObject * as one of its arguments instead of a PyObject *. This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove the ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example:

import ndarray

class subclass(ndarray.ndarray):
    def __new__(cls, shape, *args, **kwds):
        return ndarray.ndarray.__new__(cls, shape, 'V4')
    def __init__(self, shape, *args, **kwds):
        self.dict = kwds
    def _update_meta(self, obj):
        self.dict = obj.dict

Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up the Numeric3 CVS repository completely, the Include, Src, Lib, > >and CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'll move other directories as well. > >First, how to name the Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would also be an option but that > >would certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system; the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to <prefix>/include/Numeric/3/ or > >something. I actually like this idea but I wonder what others think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that the #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use the ndarray C/API as a replacement for > >the Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing. A lot of the work I do deals with transforming images, either with affine transformations, or vector field. Numarray seems somewhat well equipped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transform and geometric_transform) computes input coordinates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unnecessarily. It might have a dramatic effect for the geometric transformation which will most often be slower than affine. Is there any way around this, e.g. is it possible to specify numarray to use the same interpolation coefficients for the last "n" dimensions of the array, or to tell numarray to only compute interpolation coefficients and apply those separately for each channel? thx for any help / info. alex. From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that there is an amount of work duplicated, if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but at least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > > I've been looking at numarray to do some image processing. A lot of > the work I do deals with transforming images, either with affine > transformations, or vector field. Numarray seems somewhat well equipped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transform and > geometric_transform) computes input coordinates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unnecessarily. It might have a > dramatic effect for the geometric transformation which will most often > be slower than affine. Is there any way around this, e.g. is it > possible to specify numarray to use the same interpolation > coefficients for the last "n" dimensions of the array, or to tell > numarray to only compute interpolation coefficients and apply those > separately for each channel? > > thx for any help / info. > > alex. From jmiller at stsci.edu Thu Apr 14 07:47:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 14 07:47:02 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.0 Message-ID: <1113489855.29880.14.camel@halloween.stsci.edu> Release Notes for numarray-1.3.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files. I. ENHANCEMENTS 1. Migration of NumArray.__del__ to C (tp_dealloc). Overall performance. 2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default. 3. Modifications to support scipy.special. 4. Removal of an unnecessary getattr() from ufunc calling sequence. Ufunc performance. II. BUGS FIXED / CLOSED 1179355 average() broken in numarray 1.2.3 1167184 Floating point exception in numarray's dot() 1151892 Bug in matrixmultiply with zero size arrays 1160184 RecArray reversal 1156172 Incorrect error message for shape incompatibility 1155538 Incorrect error message when multiplying arrays See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. III. CAUTIONS This release should be backward binary compatible with numarray 1.1.1 and 1.2.3. WHERE ----------- Numarray-1.3.0 windows executable installers, source code, and manual is here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID: A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html. Improvements since 0.70 -- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo -- QT backend Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend -- Unicode strings are rendered in the agg and postscript backends. Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py -- map and projections A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap -- Auto-legends The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code. -- Quiver (direction fields) Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py -- Performance optimizations Substantial optimizations in line marker drawing in agg -- Robust log plots Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust -- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html http://matplotlib.sourceforge.net JDH From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com> Hi, I see there is a cholesky_decomposition routine in numarray, but we are also needing the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy? thank you, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID: Dear all, I encountered the following puzzling behaviour of the modulo operator %: In [1]: import Numeric In [2]: print Numeric.__version__ 23.8 In [3]: x=Numeric.arange(10.0) In [4]: print x%4 [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [5]: print 3.0%4 3.0 In [6]: print (-x)%4 [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] # <====== In [7]: print (-3.0)%4 # vs. 1.0 # <====== (OK) In [8]: print Numeric.fmod(x,4) [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.] In [9]: print Numeric.fmod(-x,4) [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.] So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation: "5.6. Binary arithmetic operations" """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand.""" I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars.
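A possible workaround for the class, sketched against Numeric 23.x for floating point arrays and assuming a positive divisor: build the Python-style modulo out of fmod, so that arrays and scalars agree.

import Numeric as N

def pymod(a, n):
    # Python-style modulo for arrays: the result takes the sign of the
    # divisor, matching the scalar % operator (assumes n > 0).
    r = N.fmod(a, n)
    return N.where(N.less(r, 0), r + n, r) + 0.0   # "+ 0.0" turns -0.0 into 0.0

x = N.arange(10.0)
print pymod(-x, 4)   # [ 0. 3. 2. 1. 0. 3. 2. 1. 0. 3.]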
> > But, this can be handled with platform-dependendent C-code when and if > problems arise. Can it? I have faint memories about Tim Peters explaining why and how handling IEEE in C code is a pain. Anyway, it would be a good idea to get his opinion on whatever proposal about IEEE before implementing it. Konrad. From tchur at optushome.com.au Mon Apr 11 13:52:19 2005 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 11 13:52:19 2005 Subject: [Numpy-discussion] Silent overflow of Int32 array In-Reply-To: <1113143026.5359.35.camel@jaytmiller.comcast.net> References: <4258721E.1080905@optushome.com.au> <1113143026.5359.35.camel@jaytmiller.comcast.net> Message-ID: <425AE33C.30403@optushome.com.au> Todd Miller wrote: > On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote: > >>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit >>Linux): >> >> >>> import Numeric as N >> >>> a = N.array((2000000000,1000000000),typecode=N.Int32) >> >>> N.add.reduce(a) >>-1294967296 >> >>OK, it is an elementary mistake, but the silent overflow caught me >>unawares. casting the array to Float64 before summing it avoids the >>error, but in my instance the actual data is a rank-1 array of 21 >>million integers with a mean value of about 140 (which adds up more than >>sys.maxint), and casting to Float64 will use quite a lot of memory (as >>well as taking some time). >> >>Any advice for catching or avoiding such overflow without always >>incurring a performance and memory hit by always casting to Float64? > > > Here's what numarray does: > > >>>>import numarray as N >>>>a = N.array((2000000000,1000000000),typecode=N.Int32) >>>>N.add.reduce(a) > > -1294967296 > > So basic reductions in numarray have the same "careful while you're > shaving" behavior as Numeric; it's fast but easy to screw up. Sure, but how does one be careful? It seems that for any array of two integers or more which could sum to more than sys.maxint or less than -sys.maxint, add.reduce() in both NumPy and Numeric will give either a) the correct answer or b) the incorrect answer, and short of adding up the array using a safer but much slower method, there is no way of determining if the answer provided (quickly) by add.reduce is right or wrong? Which seems to make it fast but useless (for integer arrays, at least? Is that an unfair summary? Can anyone point me towards a method for using add.reduce() on small arrays of large integers with values in the billions, or on large arrays of fairly small integer values, which will not suddenly and without warning give the wrong answer? > > But: > > >>>>a.sum() > > 3000000000L > >>>>a.sum(type='d') > > 3000000000.0 > > a.sum() blockwise upcasts to the largest type of kind on the fly, in > this case, Int64. This avoids the storage overhead of typecasting the > entire array. That's on a 64-bit platform, right? The same method could be used to cast the accumulator to a Float64 on a 32-bit platform to avoid casting the entire array? > A better name for the method would have been sumall() since it sums all > elements of a multi-dimensional array. The flattening process reduces > on one dimension before flattening preventing a full copy of a > discontiguous array. It could be smarter about choosing the dimension > of the initial reduction. OK, thanks. Unfortunately it is not possible for us to port our application to numarray at the moment. But the insight is most helpful. 
Tim C From oliphant at ee.byu.edu Mon Apr 11 17:12:25 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 11 17:12:25 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: References: Message-ID: <425B1182.7060102@ee.byu.edu> Pearu Peterson wrote: >Hi Travis, > >I have committed scipy.{distutils,base} to Numeric3 CVS repository. >scipy.distutils is a reviewed version of scipy_distutils and >as one of its new features there is Configuration class that allows >one to write much simpler setup.py files for subpackages. See setup.py >files under Numeric3/scipy directory for examples. scipy.base is a >very minimal copy of scipy_base plus ndarray modules. > > Thank you, thank you for your help with this. >When using setup_scipy.py for building, the ndarray package is installed >as scipy.base and > > from scipy.base import * > >should work equivalently to > > from ndarray import * > >for instance. > > I don't like from ndarray import *. It's only been a place-holder. Let's get rid of it as soon as possible. >To clean up Numeric3 CVS repository completely then Include, Src, Lib, >CodeGenerators directories should be moved under the scipy/base directory. >However, this step can be omitted if you would prefer working with files >at the top directory of Numeric3. > I have no preference here. Whatever works best. >First, how to name Numeric3 project when it installs scipy.base, >scipy.distutils, Numeric packages, etc? This name will be used when >creating source distributions and also as part of the path where header >files will be installed. At the moment setup_scipy.py uses the name >'ndarray'. > I don't like the name ndarray -- it's too limiting. Why not scipy_core? >In fact, 'Numeric' (with version 3x.x) would be also an option but that >would be certainly cause some problems when one wants both Numeric 2x.x >and Numeric 3x.x to be installed in the system, the header files would end >up in the same directory, for instance. As a workaround, we could force >installing Numeric3 header files to /include/Numeric/3/ or >something. I acctually like this idea but I wonder what other think about >this. > > How about include/scipy? >Second, is it already possible to use ndarray C/API as a replacement of >Numeric C/API, i.e. would simple replacement of > > #include "Numeric/arrayobject.h" > >with > > #include "ndarray/arrayobject.h" > >work? And if not, will it ever be? This would be interesting to know as an >extension writer. > > This should work fine. All of the old C-API is there (there are some new calls, but the old ones should still work). The only issue is that one of the calls (PyArray_Take I think now uses a standardized PyArrayObject * as one of it's arguments instead of a PyObject *). This shouldn't be a problem, since you always had to call it with an array. It's just now more explicit, but could lead to a warning. >[*] Due to keeping changes to Numeric3 sources minimal, scipy.base >multiarray and umath modules first try to import ndarray and then >scipy.base whenever ndarray is missing. One should remove ndarray >installation from the system before using scipy.base. > > I don't mind changing the package names entirely at this point. -Travis From oliphant at ee.byu.edu Tue Apr 12 16:39:23 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 12 16:39:23 2005 Subject: [Numpy-discussion] Subclassing and metadata Message-ID: <425C5BDF.1010802@ee.byu.edu> I think I've found a possible solution for subclasses that want to handle metadata. 
Essentially, any subclass that defines the method _update_meta(self, other) will get that method called when an array is sliced, or subscripted. Anytime an array is created where a subtype is the caller, this method will be called if it is available. Here is a simple example: import ndarray class subclass(ndarray.ndarray): def __new__(self, shape, *args, **kwds): self = ndarray.ndarray.__new__(subclass, shape, 'V4') return self def __init__(self, shape, *args, **kwds): self.dict = kwds return def _update_meta(self, obj): self.dict = obj.dict Comments? -Travis From pearu at cens.ioc.ee Wed Apr 13 04:06:00 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 13 04:06:00 2005 Subject: [Numpy-discussion] scipy.base In-Reply-To: <425B1182.7060102@ee.byu.edu> Message-ID: On Mon, 11 Apr 2005, Travis Oliphant wrote: > >When using setup_scipy.py for building, the ndarray package is installed > >as scipy.base and > > > > from scipy.base import * > > > >should work equivalently to > > > > from ndarray import * > > > >for instance. > > > > > I don't like from ndarray import *. It's only been a place-holder. > Let's get rid of it as soon as possible. Done in CVS. > >To clean up Numeric3 CVS repository completely then Include, Src, Lib, > >CodeGenerators directories should be moved under the scipy/base directory. > >However, this step can be omitted if you would prefer working with files > >at the top directory of Numeric3. > > > I have no preference here. Whatever works best. Directory Include/ndarray/ is now moved to scipy/base/Include/scipy/base/. I'l move other directories as well. > >First, how to name Numeric3 project when it installs scipy.base, > >scipy.distutils, Numeric packages, etc? This name will be used when > >creating source distributions and also as part of the path where header > >files will be installed. At the moment setup_scipy.py uses the name > >'ndarray'. > > > I don't like the name ndarray -- it's too limiting. Why not scipy_core? > > >In fact, 'Numeric' (with version 3x.x) would be also an option but that > >would be certainly cause some problems when one wants both Numeric 2x.x > >and Numeric 3x.x to be installed in the system, the header files would end > >up in the same directory, for instance. As a workaround, we could force > >installing Numeric3 header files to /include/Numeric/3/ or > >something. I acctually like this idea but I wonder what other think about > >this. > > > > > How about include/scipy? Without going into details of distutils restrictions for various options, I found that #include "scipy/base/arrayobject.h" option works best. And the name of the Numeric3 package is now scipy_core. All this is implemented in Numeric3 CVS now. > >Second, is it already possible to use ndarray C/API as a replacement of > >Numeric C/API, i.e. would simple replacement of > > > > #include "Numeric/arrayobject.h" > > > >with > > > > #include "ndarray/arrayobject.h" > > > >work? And if not, will it ever be? This would be interesting to know as an > >extension writer. > > > > > This should work fine. Great! Thanks, Pearu From alexandre.guimond at mirada-solutions.com Wed Apr 13 18:10:47 2005 From: alexandre.guimond at mirada-solutions.com (Alexandre Guimond) Date: Wed Apr 13 18:10:47 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images Message-ID: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Hi all. I've been looking at numarray to do some image processing. 
A lot of the work I do deal with transforming images, either with affine transformations, or vector field. Numarray seems somewhat well equiped to address these issues, but I am concerned about one aspect. It seems that the transformation code (affine_transforrm and geometric_transform) computes input coordonates for every output coordinate in the resulting array. If I have an RGB image for which the transformation is the same for all 3 RGB channels, I would assume that this will triple the workload unncessarily. It might have a dramatic effect for the geometric transformation which will most often be slower then affine. Is there any way around this, e.g. is it possible to specify numarray to use the same interpolation coefficients for the last "n" dimention of the array, or to tell numarray to only compute interpolation coefficients and apply those seperatly for each channel? thx for any help / info. alex. -------------- next part -------------- An HTML attachment was scrubbed... URL: From verveer at embl-heidelberg.de Thu Apr 14 02:45:45 2005 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Thu Apr 14 02:45:45 2005 Subject: [Numpy-discussion] numarray, nd_image transforms, and multi-channel images In-Reply-To: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> References: <4926A5BE4AFE7C4A83D5CF5CDA7B7754B1F9B0@oxcore01.mirada-solutions.com> Message-ID: <14ba52860a6e1f838975c3c04a0dafc9@embl-heidelberg.de> Hi Alex, It is correct that there is an amount of work duplicated, if you do an identical interpolation operation on several arrays. There is currently no way to avoid this. This can be fixed and I will have a look to see how easy that is to do. If it is not easy to factor out that part of the code, I will most likely not be able to spend the time to do it though... You could at least use the map_coordinates function that will allow you to use a pre-calculated coordinate mapping. There will still be duplication of work, but al least you avoid the duplication of the calculation of the coordinate transformation. Peter > Hi all. > ? > I've been looking at numarray to do some image processing. A lot of > the work I do deal with transforming images, either with affine > transformations, or vector field. Numarray seems somewhat well equiped > to address these issues, but I am concerned about one aspect. It seems > that the transformation code (affine_transforrm and > geometric_transform) computes input coordonates for every output > coordinate in the resulting array. If I have an RGB image for which > the transformation is the same for all 3 RGB channels, I would assume > that this will triple the workload unncessarily. It might have a > dramatic effect for the geometric transformation which will most often > be slower then affine. Is there any way around this, e.g. is it > possible to specify numarray to use the same interpolation > coefficients for the last "n" dimention of the array, or to tell > numarray to only compute interpolation coefficients and apply those > seperatly for each channel? > ? > thx for any help / info. > ? > alex. From jmiller at stsci.edu Thu Apr 14 07:47:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 14 07:47:02 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.0 Message-ID: <1113489855.29880.14.camel@halloween.stsci.edu> Release Notes for numarray-1.3.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. 
Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files.

I. ENHANCEMENTS

1. Migration of NumArray.__del__ to C (tp_dealloc). Overall performance.

2. Removal of dictionary update from array view creation improves performance of view/slice/subarray creation. This should e.g. improve the performance of wxPython sequence protocol access to Nx2 arrays. Subclasses now need to do a.flags |= numarray.generic._UPDATEDICT to ensure that dictionary based attributes are inherited by views. NumArrays no longer do this by default.

3. Modifications to support scipy.special.

4. Removal of an unnecessary getattr() from ufunc calling sequence. Ufunc performance.

II. BUGS FIXED / CLOSED

1179355 average() broken in numarray 1.2.3
1167184 Floating point exception in numarray's dot()
1151892 Bug in matrixmultiply with zero size arrays
1160184 RecArray reversal
1156172 Incorect error message for shape incompatability
1155538 Incorrect error message when multiplying arrays

See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details.

III. CAUTIONS

This release should be backward binary compatible with numarray 1.1.1 and 1.2.3.

WHERE
-----------

Numarray-1.3.0 windows executable installers, source code, and manual are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
------------------------------

numarray-1.3.0 requires Python 2.2.2 or greater. Python-2.3.4 or Python-2.4.1 is recommended.

AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Chuck Harris, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Kuepper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, Rory Yorke, and everyone else who has contributed with comments and feedback.

Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details.

-- Todd Miller jmiller at stsci.edu

From jdhunter at ace.bsd.uchicago.edu Thu Apr 14 14:14:13 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Thu Apr 14 14:14:13 2005 Subject: [Numpy-discussion] ANN: matplotlib-0.80 Message-ID:

A lot of development has gone into matplotlib since the last major release, which I'll summarize here. For details, see the notes for the incremental releases at http://matplotlib.sf.net/whats_new.html.

Improvements since 0.70

-- contouring: Lots of new contour functionality with line and polygon contours provided by contour and contourf. Automatic inline contour labeling with clabel. See http://matplotlib.sourceforge.net/screenshots.html#pcolor_demo

-- QT backend: Sigve Tjoraand, Ted Drain and colleagues at the JPL collaborated on a QTAgg backend

-- Unicode strings are rendered in the agg and postscript backends.
Currently, all the symbols in the unicode string have to be in the active font file. In later releases we'll try and support symbols from multiple ttf files in one string. See examples/unicode_demo.py

-- map and projections: A new release of the basemap toolkit - See http://matplotlib.sourceforge.net/screenshots.html#plotmap

-- Auto-legends: The automatic placement of legends is now supported with loc='best'; see examples/legend_auto.py. We did this at the matplotlib sprint at pycon -- Thanks John Gill and Phil! Note that your legend will move if you interact with your data and you force data under the legend line. If this is not what you want, use a designated location code.

-- Quiver (direction fields): Ludovic Aubry contributed a patch for the matlab compatible quiver method. This makes a direction field with arrows. See examples/quiver_demo.py

-- Performance optimizations: Substantial optimizations in line marker drawing in agg

-- Robust log plots: Lots of work making log plots "just work". You can toggle log y Axes with the 'l' command -- nonpositive data are simply ignored and no longer raise exceptions. log plots should be a lot faster and more robust

-- Many more plotting functions, bugfixes, and features, detailed in the 0.71, 0.72, 0.73 and 0.74 point release notes at http://matplotlib.sourceforge.net/whats_new.html

http://matplotlib.sourceforge.net

JDH

From simon at arrowtheory.com Thu Apr 14 23:07:03 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 14 23:07:03 2005 Subject: [Numpy-discussion] numarray cholesky solver ? Message-ID: <20050415160425.42cb20a6.simon@arrowtheory.com>

Hi,

I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy?

thankyou,

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com

From arnd.baecker at web.de Thu Apr 14 23:58:08 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 14 23:58:08 2005 Subject: [Numpy-discussion] % and fmod Message-ID:

Dear all,

I encountered the following puzzling behaviour of the modulo operator %:

    In [1]: import Numeric
    In [2]: print Numeric.__version__
    23.8
    In [3]: x=Numeric.arange(10.0)
    In [4]: print x%4
    [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.]
    In [5]: print 3.0%4
    3.0
    In [6]: print (-x)%4
    [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.]  # <======
    In [7]: print (-3.0)%4   # vs.
    1.0  # <====== (OK)
    In [8]: print Numeric.fmod(x,4)
    [ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1.]
    In [9]: print Numeric.fmod(-x,4)
    [-0. -1. -2. -3. -0. -1. -2. -3. -0. -1.]

So it seems that for arrays % behaves like fmod! This seems in contrast to what one finds in the python 2.3 documentation, "5.6. Binary arithmetic operations": """The % (modulo) operator yields the remainder from the division of the first argument by the second. [...] The arguments may be floating point numbers, e.g., 3.14%0.7 equals 0.34 (since 3.14 equals 4*0.7 + 0.34.) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand."""

I am presently teaching a course on computational physics with python and the students have huge difficulties with % behaving differently for arrays and scalars.
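[A minimal sketch of the two sign conventions at issue, in plain Python; the helper names are illustrative only, not part of any library:]

    import math

    def c_style_remainder(x, y):
        # sign follows the dividend, like C's fmod (and Numeric's array %)
        return math.fmod(x, y)

    def python_remainder(x, y):
        # sign follows the divisor, like Python's scalar %
        return x - y * math.floor(x / y)

    print c_style_remainder(-3.0, 4)   # -3.0
    print python_remainder(-3.0, 4)    # 1.0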
I am aware that (according to Kernighan/Ritchie) the C standard does not define the result of % when any of the operands is negative. So can someone help me: is the different behaviour of % for scalars and arrays a bug, a feature, or what should I tell my students? ;-)

Many thanks, Arnd

P.S.: BTW: the documentation for fmod and remainder is pretty short on this:

    In [3]:fmod?
    Type: ufunc
    String Form:
    Namespace: Interactive
    Docstring: fmod(x,y) is remainder(x,y)

    In [4]:remainder?
    Type: ufunc
    String Form:
    Namespace: Interactive
    Docstring: returns remainder of division elementwise

Are contributions of more detailed doc-strings welcome?

P.P.S.: for numarray one gets even less information:

    In [1]: import numarray
    In [2]: numarray.fmod?
    Type: _BinaryUFunc
    Base Class:
    String Form:
    Namespace: Interactive
    Docstring: Class for ufuncs with 2 input and 1 output arguments

    In [3]: numarray.remainder?
    Type: _BinaryUFunc
    Base Class:
    String Form:
    Namespace: Interactive
    Docstring: Class for ufuncs with 2 input and 1 output arguments

    In [4]: print numarray.__version__
    1.1.1

P^3.S: scipy's mod seems to be an alternative:

    In [1]: import scipy
    In [2]: scipy.mod?
    Type: function
    Base Class:
    String Form:
    Namespace: Interactive
    File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py
    Definition: scipy.mod(x, y)
    Docstring: x - y*floor(x/y) For numeric arrays, x % y has the same sign
    as x while mod(x,y) has the same sign as y.

    In [3]: x=-scipy.arange(10)
    In [4]: x%4
    Out[4]: array([ 0, -1, -2, -3, 0, -1, -2, -3, 0, -1])
    In [5]: scipy.mod(x,4)
    Out[5]: array([ 0., 3., 2., 1., 0., 3., 2., 1., 0., 3.])
    In [6]: scipy.mod??
    Type: function
    Base Class:
    String Form:
    Namespace: Interactive
    File: /usr/lib/python2.3/site-packages/scipy_base/function_base.py
    Definition: scipy.mod(x, y)
    Source:
    def mod(x,y):
        """ x - y*floor(x/y)

        For numeric arrays, x % y has the same sign as x while
        mod(x,y) has the same sign as y.
        """
        return x - y*Numeric.floor(x*1.0/y)

From jmiller at stsci.edu Fri Apr 15 03:46:37 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 15 03:46:37 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <1113561843.5030.9.camel@jaytmiller.comcast.net>

On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote:
> Hi,
>
> I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver.
> Is this in the pipeline,

No. Most of the add-on subpackages in numarray, with the exception of convolve, image, and nd_image, are ports from Numeric.

> or do we go ahead and add the dpotrs based functionality ourselves?
> Alternatively, are we able to convert to and from Numeric (scipy) arrays without a memcopy?

Unless Numeric has been adapted to support the new array interface, I think this (converting from numarray to Numeric) has still not been properly addressed.

Regards, Todd

From luszczek at cs.utk.edu Fri Apr 15 07:11:20 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 07:11:20 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050415160425.42cb20a6.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> Message-ID: <425FCAFC.3010603@cs.utk.edu>

Hi all,

the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I apologize if everybody knows that).
I'm on the LAPACK team right now and we were wondering if we should provide bindings for Python. It is almost trivial to do with Pyrex. But Numeric and numarray already have some functionality in it. Also, I don't know about popularity of PyLapack.

So my question is if there is a need for the specialized LAPACK routines. And if so, which API it should use (Numeric, numarray, Numeric3, scipy_core, standard array, minimum standard array implementation or array protocol meta info).

Any comments are appreciated,

Piotr Luszczek

Simon Burton wrote: > Hi, > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > Is this in the pipeline, or do we go ahead and add the dpotrs based functionality ourselves? Alternatively, are we able to > convert to and from Numeric (scipy) arrays without a memcopy? > > thankyou, > > Simon.

From perry at stsci.edu Fri Apr 15 07:21:23 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 15 07:21:23 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID:

On Apr 15, 2005, at 10:09 AM, Piotr Luszczek wrote:

> Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack. > > So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info). > > Any comments are appreciated, > > Piotr Luszczek >

If you don't need anything unusual, using the Numeric C-API should be safe. There is the intent to preserve backward compatibility for that in numarray and Numeric3 for the most part (numarray's ufunc api is different however, but it isn't clear you need to use that). Numeric3 and numarray will/do have other capabilities not part of the Numeric api, but again, I suspect that for a first version, one can probably avoid needing those. I'd also like to hear what Travis thinks about this.

Perry Greenfield

From pjssilva at ime.usp.br Fri Apr 15 08:00:44 2005 From: pjssilva at ime.usp.br (Paulo J. S. Silva) Date: Fri Apr 15 08:00:44 2005 Subject: [Numpy-discussion] Pycoin - Python interface to COIN/CLP Linear Programming solver Message-ID: <1113577115.9013.9.camel@localhost.localdomain>

Hello,

I am finally releasing the code I have to interface COIN/CLP linear programming solver with Python/Numarray. You can download the code at:

http://www.ime.usp.br/~pjssilva/pycoin/index.html

In the page you can see sample client code. The interface is very simple, consisting mostly of swig interface files, but it is very useful to me. It also can be used as an example of how to interface C++ and Python/Numarray using swig.

I plan to make this interface grow to something much better, with an interface to full Clp, another to OsiClp (only this one is available right now) and maybe other COIN optimization libraries like IPOPT.

Please, download, use, test, comment.

Best,

Paulo

-- Paulo José da Silva e Silva Professor Assistente do Dep.
de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil

e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva

Teoria é o que não entendemos o suficiente para chamar de prática. (Theory is something we don't understand well enough to call practice.)

From cookedm at physics.mcmaster.ca Fri Apr 15 10:48:55 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 15 10:48:55 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <425FCAFC.3010603@cs.utk.edu> (Piotr Luszczek's message of "Fri, 15 Apr 2005 10:09:00 -0400") References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID:

Piotr Luszczek writes:

> Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack. > > So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array implementation > or array protocol meta info).

You'll probably first want to look at scipy, which already wraps (all? most?) of LAPACK in its scipy.linalg package (including dpotrs :-) It uses f2py to make the process much easier.

Since you mention you're on the LAPACK team ...

I've been working on redoing the f2c'd LAPACK wrappers in Numeric, updating them to the current version...except: what *is* the current version? The patches on netlib are 2-3 years old, and you have to grab them separately, file-by-file (can I say how insanely stupid that is?). Also ... they break: with some test cases (derived from ones posted to our bug tracker) some routines segfault. Is it the LAPACK 3e? If that's the case, we can't use it unless there are C versions (Numeric only requires Python and a C compiler; throwing a F90 compiler in there is *not* an option -- we don't even require a F77 compiler).

I ended up using the source from Debian unstable from the lapack3 package, and those work fine.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From haase at msg.ucsf.edu Fri Apr 15 12:38:51 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Apr 15 12:38:51 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? Message-ID: <200504151235.48573.haase@msg.ucsf.edu>

Hi,

I'm using memmap to read my MRC-imagedata files. I just thought this might be a case of general interest - see below:

    >>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", cval=0.0, origin=0, output_type=None)
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in boxcar_filter
        cval = cval, output_type = output_type)
      File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in boxcar_filter1d
        cval, origin, _ni_support._type_to_num[output_type])
    TypeError: NA_IoArray: I/O numarray must be writable NumArrays.
    >>> na.__version__
    '1.2.3'
    >>>

Thanks, Sebastian Haase

From verveer at embl.de Fri Apr 15 12:55:33 2005 From: verveer at embl.de (Peter Verveer) Date: Fri Apr 15 12:55:33 2005 Subject: [Numpy-discussion] Why does nd_image require writable input array ? In-Reply-To: <200504151235.48573.haase@msg.ucsf.edu> References: <200504151235.48573.haase@msg.ucsf.edu> Message-ID: <9396f2dea14c14fb7a6bd04f6077c448@embl.de>

You may have run into an older bug which I fixed. Please try upgrading to the new numarray 1.3 and see if the problem disappears. If not let me know.

Note: the function you are using (boxcar_filter) has been renamed in 1.3 to uniform_filter (to be more in line with common image processing terminology).

Cheers, Peter

On Apr 15, 2005, at 9:35 PM, Sebastian Haase wrote: > Hi, > I'm using memmap to read my MRC-imagedata files. > I just thought this might be a case of general interest - see below: > >>>> s = U.nd.boxcar_filter(Y.vd(1), size=3, output=None, mode="nearest", > cval=0.0, origin=0, output_type=None) > Traceback (most recent call last): > File "", line 1, in ? > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 314, in > boxcar_filter > cval = cval, output_type = output_type) > File "/jws30/haase/PrLin0/numarray/nd_image/filters.py", line 261, in > boxcar_filter1d > cval, origin, _ni_support._type_to_num[output_type]) > TypeError: NA_IoArray: I/O numarray must be writable NumArrays. >>>> na.__version__ > '1.2.3' >>>> > > > Thanks, > Sebastian Haase

From luszczek at cs.utk.edu Fri Apr 15 20:41:05 2005 From: luszczek at cs.utk.edu (Piotr Luszczek) Date: Fri Apr 15 20:41:05 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <426088F5.90602@cs.utk.edu>

David M. Cooke wrote:

> Piotr Luszczek writes: > > >>Hi all, >> >>the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I >>apologize if everybody knows that). >> >>I'm on the LAPACK team right now and we were wondering if we should >>provide bindings for Python. It is almost trivial to do with Pyrex. >>But Numeric and numarray already have some functionality in it. >>Also, I don't know about popularity of PyLapack. >> >>So my question is if there is a need for the specialized LAPACK >>routines. And if so, which API it should use (Numeric, numarray, >>Numeric3, scipy_core, standard array, minimum standard array implementation >>or array protocol meta info).
>
> You'll probably first want to look at scipy, which already wraps (all?
> most?) of LAPACK in its scipy.linalg package (including dpotrs :-)

It seems to have almost all routines.

> It uses f2py to make the process much easier.
>
> Since you mention you're on the LAPACK team ...
>
> I've been working on redoing the f2c'd LAPACK wrappers in Numeric,
> updating them to the current

Current version is 3.0.

> version? The patches on netlib are 2-3 years old, and you have to grab

After funding ran out there were only volunteers left. It's hard to get free open-source developers these days.

> them separately, file-by-file (can I say how insanely stupid that

Frankly, I had the same comment when I first saw it. Hopefully, the next update will straighten things out.

> is?). Also ... they break: with some test cases (derived from ones
> posted to our bug tracker) some routines segfault.

Yes I know. We have postings about it on the mailing list almost weekly.

> Is it the LAPACK 3e?
> If that's the case, we can't use it unless there

LAPACK 3E is only somewhat related to LAPACK. But it's not "current version".

> are C versions (Numeric only requires Python and a C compiler; > throwing a F90 compiler in there is *not* an option -- we don't even > require a F77 compiler).

We've been thinking about languages for a while. CLAPACK user base is too strong to ignore. So we think of keeping F77 as the base language. Or maybe we should do f90toC. f2c and f2j are on Netlib already and f2py has some F90 support.

> I ended up using the source from Debian unstable from the lapack3 > package, and those work fine.

Again, it's hard to get grant money for support.

Thanks for the comments.

Piotr

From pearu at cens.ioc.ee Fri Apr 15 23:09:01 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Fri Apr 15 23:09:01 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <426088F5.90602@cs.utk.edu> Message-ID:

On Fri, 15 Apr 2005, Piotr Luszczek wrote:

> > You'll probably first want to look at scipy, which already wraps (all? > > most?) of LAPACK in its scipy.linalg package (including dpotrs :-) > > It seems to have almost all routines.

You should look at scipy.lib.lapack package that has more wrappers than in scipy.linalg and it will be used in scipy.linalg in future. scipy.lib.lapack certainly does not wrap all of LAPACK but adding new wrappers is easy and is done on a demand basis. What's wrapped and what's not in scipy.lib.lapack is well documented in the headers of .pyf.src files.

My current plan is to add CLAPACK sources to scipy.lib.lapack so that it could be included in the Numeric3 project because it has a requirement that everything should compile having only a C compiler available.

> We've been thinking about languages for a while. CLAPACK user base > is too strong to ignore. So we think of keeping F77 as the base language. > Or maybe we should do f90toC. f2c and f2j are on Netlib already and > f2py has some F90 support.

f2py will have limited support for F90 derived types as soon as I get a chance to review Jeffrey Hagelberg's patches on this. However, keeping F77 as the base language is a good idea, imho, free F90 compilers are still rare these days.

Pearu

From florian.proff.schulze at gmx.net Sat Apr 16 03:25:37 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Sat Apr 16 03:25:37 2005 Subject: [Numpy-discussion] bytes object info Message-ID:

Hi!

I just discovered this: http://members.dsl-only.net/~daniels/Block.html

I didn't try it out, but maybe it's helpful to you.

Regards, Florian Schulze

From cjw at sympatico.ca Sat Apr 16 11:29:01 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Apr 16 11:29:01 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <426158FD.8060507@sympatico.ca>

Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html

Ugh! Letter codes to identify data types - I thought that we had moved beyond that. ;-)

Colin W.

> > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze > >

From oliphant at ee.byu.edu Sat Apr 16 21:16:07 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Sat Apr 16 21:16:07 2005 Subject: [Numpy-discussion] numarray cholesky solver ?
In-Reply-To: <425FCAFC.3010603@cs.utk.edu> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <425FCAFC.3010603@cs.utk.edu> Message-ID: <4261E2A5.1060109@ee.byu.edu>

Piotr Luszczek wrote: > Hi all, > > the Cholesky routine that's been mentioned (dpotrs) is from LAPACK (I > apologize if everybody knows that). > > I'm on the LAPACK team right now and we were wondering if we should > provide bindings for Python. It is almost trivial to do with Pyrex. > But Numeric and numarray already have some functionality in it. > Also, I don't know about popularity of PyLapack.

Scipy already has extensive bindings for LAPACK. There is even a lot of development that has been done for c-compiled bindings. Right now, scipy_core is being developed to be a single replacement for Numeric/numarray. Lapack bindings are a huge part of that effort. But, as I said, the work has been done (using f2py). The biggest issue is supporting f2c'd versions of Lapack so that folks without Fortran compilers can still install it. scipy_core will allow this. Again, most of the effort is accomplished through f2py and scipy_distutils which are really good tools. Pyrex is nice, but f2py is really, really nice (it even supports wrapping basic c-code).

> So my question is if there is a need for the specialized LAPACK > routines. And if so, which API it should use (Numeric, numarray, > Numeric3, scipy_core, standard array, minimum standard array > implementation > or array protocol meta info).

I think if LAPACK were going to go through the trouble, it would be best for LAPACK to provide "array protocol" style wrappers. That way any Python array user could take advantage of them. While current scipy users and future scipy_core users do not need LAPACK-provided Python wrappers, we would welcome any native support by the LAPACK team. Again, though, I think this should be done through the array_protocol API. A C-API is likely in the near future as well (which will provide a little speed up for many small arrays).

-Travis

From simon at arrowtheory.com Sun Apr 17 20:44:16 2005 From: simon at arrowtheory.com (Simon Burton) Date: Sun Apr 17 20:44:16 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <1113561843.5030.9.camel@jaytmiller.comcast.net> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> Message-ID: <20050418134337.1b3f8ae8.simon@arrowtheory.com>

On Fri, 15 Apr 2005 06:44:02 -0400 Todd Miller wrote:

> On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > Hi, > > > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > > Is this in the pipeline, > > No. Most of the add-on subpackages in numarray, with the exception of > convolve, image, and nd_image, are ports from Numeric. >

Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this that would be much appreciated.

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com
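[A sketch of the interim workaround, not from the thread: given numarray's existing cholesky_decomposition, an A x = b solve can be emulated with two generic solves from numarray.linear_algebra. The matrix and vector below are made-up examples; a real dpotrs binding would exploit the triangular structure instead of calling the generic solver twice.]

    import numarray
    import numarray.linear_algebra as la

    A = numarray.array([[4.0, 2.0],
                        [2.0, 3.0]])   # symmetric positive definite
    b = numarray.array([2.0, 1.0])

    L = la.cholesky_decomposition(A)   # A == L * transpose(L)
    y = la.solve_linear_equations(L, b)                       # L y = b
    x = la.solve_linear_equations(numarray.transpose(L), y)   # L^T x = y
    print x                            # [ 0.5  0. ]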
From arnd.baecker at web.de Mon Apr 18 00:30:10 2005 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 18 00:30:10 2005 Subject: [Numpy-discussion] scipy.base - % and fmod segfault Message-ID:

Hi (in particular Travis),

concerning my recent question on % and fmod for Numeric and numarray I was curious to see how scipy.base behaves. With a CVS check-out this morning I get:

    In [1]: from scipy.base import *
    In [2]: x=arange(10)
    In [3]: print x%4
    array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1], 'l')
    In [4]: print (-x)%4
    zsh: 12391 segmentation fault ipython

(The same holds for fmod, and also for x=arange(10.0).)

Personally I would prefer if in the end % behaves the same way for arrays as for scalars. Do you think that this is possible with scipy.base?

Best, Arnd

P.S.: I haven't tested much more of scipy.base this time (but the few things concerning array operations I looked at, seem fine. Ah there is one: Doing

    import scipy.base
    scipy.base.fmod?

in ipython gives a segmentation fault (the same with .sin, .exp etc. ...) )

From jmiller at stsci.edu Mon Apr 18 06:38:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 18 06:38:21 2005 Subject: [Numpy-discussion] numarray cholesky solver ? In-Reply-To: <20050418134337.1b3f8ae8.simon@arrowtheory.com> References: <20050415160425.42cb20a6.simon@arrowtheory.com> <1113561843.5030.9.camel@jaytmiller.comcast.net> <20050418134337.1b3f8ae8.simon@arrowtheory.com> Message-ID: <1113831328.29165.30.camel@halloween.stsci.edu>

On Sun, 2005-04-17 at 23:43, Simon Burton wrote: > On Fri, 15 Apr 2005 06:44:02 -0400 > Todd Miller wrote: > > > On Fri, 2005-04-15 at 16:04 +1000, Simon Burton wrote: > > > Hi, > > > > > > I see there is a cholesky_decomposition routine in numarray, but we also need the corresponding cholesky solver. > > > Is this in the pipeline, > > > > No. Most of the add-on subpackages in numarray, with the exception of > > convolve, image, and nd_image, are ports from Numeric. > > > > Ok, thanks Todd; we will have a go at porting this solver then. If you have any more advice on how to get started with this > that would be much appreciated.

If you're doing a port of something that already works for Numeric chances are good that numarray's Numeric compatibility API will make things "just work." In any case, be sure to use the compatibility API since it's the easiest path forward to Numeric3 should that effort prove successful (which I think it will).

Usually what's involved in porting from Numeric to numarray is just making sure that the numarray header files can be used rather than the Numeric header files. I think the style we used for matplotlib, while not fully general, is the simplest and best compromise:

    #ifdef NUMARRAY
    #include "numarray/arrayobject.h"
    #else
    #include "Numeric/arrayobject.h"
    #endif

In setup.py, you have to pass extra_compile_args=["-DNUMARRAY=1"] or similar to the Extension() constructions to build for numarray. There are more details we could discuss if you want to build for both Numeric and numarray simultaneously.

Two limitations of the numarray Numeric compatible C-API are: (1) a partially compatible array descriptor structure (PyArray_Descr) and (2) the UFunc C-API. Generally, neither of those is an issue, but for large projects (e.g. scipy) they matter.

Good luck porting. Feel free to ask questions either on the list or privately if you run into trouble.

Regards, Todd
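[To make the build half of that recipe concrete, a hypothetical setup.py fragment; the module and file names are placeholders, not from the thread:]

    from distutils.core import setup, Extension

    ext = Extension(
        "choleskysolver",                     # placeholder module name
        sources=["choleskysolver.c"],         # placeholder C source
        extra_compile_args=["-DNUMARRAY=1"],  # selects the numarray branch of the #ifdef
    )

    setup(name="choleskysolver", version="0.1", ext_modules=[ext])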
From haase at msg.ucsf.edu Mon Apr 18 09:16:15 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Mon Apr 18 09:16:15 2005 Subject: [Numpy-discussion] bytes object info In-Reply-To: References: Message-ID: <200504180914.33383.haase@msg.ucsf.edu>

Hey, this _really_ is no SPAM ... ;-) (Maybe different wording next time)

Thanks, Sebastian Haase

On Saturday 16 April 2005 03:22, Florian Schulze wrote: > Hi! > > I just discovered this: > http://members.dsl-only.net/~daniels/Block.html > > I didn't try it out, but maybe it's helpful to you. > > Regards, > Florian Schulze

From oliphant at ee.byu.edu Mon Apr 18 17:09:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 18 17:09:49 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <42644B7C.9030907@ee.byu.edu>

I am going to release Numeric 24.0 today or tomorrow unless I hear from anybody about some changes that need to get made.

-Travis

From faltet at carabos.com Tue Apr 19 03:05:27 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 19 03:05:27 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: <42644B7C.9030907@ee.byu.edu> References: <42644B7C.9030907@ee.byu.edu> Message-ID: <200504191202.52097.faltet@carabos.com>

Hi,

I was curious about the newly introduced array protocol in Numeric 24.0 (as well as in current numarray CVS), and wanted to check if there is better speed during Numeric <-> numarray objects conversion. The answer is "partially" affirmative:

    >>> import numarray
    >>> import Numeric
    >>> print numarray.__version__
    1.4.0
    >>> print Numeric.__version__
    24.0
    >>> from time import time
    >>> a = numarray.arange(100*1000)
    >>> t1=time();b=Numeric.array(a);time()-t1   # numarray --> Numeric
    0.0021419525146484375   # It was 1.58109998703 with Numeric 23.8 !

So, numarray --> Numeric speed has been improved quite a lot. On the other way round, Numeric to numarray is not as efficient:

    >>> Na = Numeric.arange(100*1000)
    >>> t1=time();c=numarray.array(Na);time()-t1   # Numeric --> numarray
    0.15217900276184082   # It is much slower than numarray --> Numeric

I guess that the Numeric --> numarray case can be sped up because:

    >>> t1=time();Nb=numarray.array(buffer(Na),typecode=Na.typecode(),shape=Na.shape);time()-t1
    0.00017499923706054688   # Numeric --> numarray using the buffer protocol

So, I guess CVS numarray is still refining the array protocol. But the thing that mostly shocks me is why the array protocol still allows conversions with memory copies because, as you can see in the last example that uses the buffer protocol, a non-copy memory conversion is indeed possible for Numeric --> numarray. So the question is: Would the array protocol bring numarray <-> Numeric <-> Numeric3 conversions without memory copies, or is this more a wish on my part than an actual possibility?

Thanks and keep the nice work!

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From eric at enthought.com Tue Apr 19 22:48:17 2005 From: eric at enthought.com (eric jones) Date: Tue Apr 19 22:48:17 2005 Subject: [Numpy-discussion] job openings at Enthought Message-ID: <4265ECEF.6050004@enthought.com>

Hey group,

We have a number of scientific/python related jobs open. If you have any interest, please see: http://www.enthought.com/careers.htm

thanks, eric

From cjw at sympatico.ca Wed Apr 20 00:45:21 2005 From: cjw at sympatico.ca (Colin J.
Williams) Date: Wed Apr 20 00:45:21 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler Message-ID: <42660855.4090600@sympatico.ca>

I have tried:

    python setup.py install build_ext --compiler=bcpp

It seems that the distutils call uses scipy.distutils, rather than the standard, and that the scipy version is based on an older version of distutils.

Is there some way to work around this?

Colin W.

From pearu at cens.ioc.ee Wed Apr 20 12:00:34 2005 From: pearu at cens.ioc.ee (pearu at cens.ioc.ee) Date: Wed Apr 20 12:00:34 2005 Subject: [Numpy-discussion] Installing Numeric3 using the Borland Compiler In-Reply-To: <42660855.4090600@sympatico.ca> Message-ID:

On Wed, 20 Apr 2005, Colin J. Williams wrote: > I have tried: > > python setup.py install build_ext --compiler=bcpp > > It seems that the distutils call uses scipy.distutils, rather than the > standard, and that the scipy version is based on an older version of > distutils. > > Is there some way to work around this?

So, what problems exactly do you experience with the above command? Using scipy.distutils should not be much different compared to std distutils when building std extension modules.

Pearu

From oliphant at ee.byu.edu Wed Apr 20 12:05:30 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 20 12:05:30 2005 Subject: [Numpy-discussion] Numeric 24.0 Message-ID: <4266A7AD.5090600@ee.byu.edu>

I've released Numeric 24.0 as a beta (2nd version) release. Right now it's just a tar file.

Please find any bugs. I'll wait a week or two and release a final version unless I hear reports of problems.

Thanks to those who have found bugs already. David Cooke has been especially active in helping fix problems. Many kudos to him.

-Travis

From jmiller at stsci.edu Thu Apr 21 08:12:30 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 21 08:12:30 2005 Subject: [Numpy-discussion] ANN: numarray-1.3.1 Message-ID: <1114096238.4446.18.camel@jaytmiller.comcast.net>

Release Notes for numarray-1.3.1

Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, arrays of heterogeneous records, string arrays, and in-place operation on memory mapped files.

I. ENHANCEMENTS

None. 1.3.1 fixes the problem with gcc-3.4.3

II. BUGS FIXED / CLOSED

1152323 /usr/include/fenv.h:96: error: conflicting types for 'fegete
1185024 numarray-1.2.3 fails to compile with gcc-3.4.3
1187162 Numarray 1.3.0 installation failure

See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details.

From oliphant at ee.byu.edu Fri Apr 22 03:51:14 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 22 03:51:14 2005 Subject: [Numpy-discussion] Numeric 24.0 In-Reply-To: References: <4266A7AD.5090600@ee.byu.edu> Message-ID: <4268D6BD.9000100@ee.byu.edu>

Alexander Schmolck wrote:

>Travis Oliphant writes: > > >>I've released Numeric 24.0 as a beta (2nd version) release. Right now it's >>just a tar file. >> >>Please find any bugs. I'll wait a week or two and release a final version >>unless I hear reports of problems. >> >> > >I suspect some other problems I haven't tried to track down yet are due to >this:
>
> >>> a = num.array([[1],[2],[3]])
> >>> ~(a==a)
> array([[-2],
>        [-2],
>        [-2]])

What is wrong with this? ~ is bit-wise not and gives the correct answer, here.
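[A two-line illustration of the point above, an editor's sketch: Numeric comparisons return 0/1 integer arrays, and ~ is two's-complement bitwise not, so ~1 == -2. A logical negation would use Numeric's logical_not ufunc instead:]

    >>> ~1   # two's complement: ~x == -(x+1)
    -2
    >>> ~0
    -1
    >>> # logical negation of a 0/1 comparison result:
    >>> # Numeric.logical_not(a == a)  ->  array of 0s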
> >Object array comparisons still produce haphazard behaviour:
>
> >>> a = num.array(["ab", "cd", "efg"], 'O')
> >>> a == 'ab'
> 0

You are mixing Object arrays and character arrays here and expecting too much. String arrays in Numeric and their relationship with object arrays have never been too useful. You need to be explicit about how 'ab' is going to be interpreted and do a == array('ab','O') to get what you were probably expecting.

>Finally -- not necessarily a bug, but a change of behaviour that seems undocumented (I'm
>pretty sure this used to give a float array as return value):
>
> >>> num.zeros((2.0,))
> *** TypeError: an integer is required
>
>'as

I don't think this worked as you think it did (I looked at Numeric 21.3). num.zeros(2.0) works but it shouldn't. This is a bug that I'll fix. Shapes should be integers, not floats. If this was not checked before then that was a bug. It looks like it's always been checked differently for single-element tuples and scalars.

So, in short, I see only one small bug here. Thanks for testing things out.

-Travis

From stephen.walton at csun.edu Mon Apr 25 11:50:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 25 11:50:28 2005 Subject: [Numpy-discussion] Value selections? Message-ID: <426D3BA8.6020500@csun.edu>

I'm trying out Numeric 24b2. In numarray, the following code will plot the values of an array which are not equal to 'flag':

    f = array!=flag
    plot(array[f])

What is the equivalent in Numeric 24b2?

From rkern at ucsd.edu Mon Apr 25 11:59:03 2005 From: rkern at ucsd.edu (Robert Kern) Date: Mon Apr 25 11:59:03 2005 Subject: [Numpy-discussion] Value selections? In-Reply-To: <426D3BA8.6020500@csun.edu> References: <426D3BA8.6020500@csun.edu> Message-ID: <426D3D4C.5070302@ucsd.edu>

Stephen Walton wrote: > I'm trying out Numeric 24b2. In numarray, the following code will plot > the values of an array which are not equal to 'flag': > > f = array!=flag > plot(array[f]) > > What is the equivalent in Numeric 24b2?

compress(f, array) is the lowest common denominator. I'm not sure if Numeric 24 gets fancier like numarray.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter
From jswhit at fastmail.fm Tue Apr 26 07:58:36 2005 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Tue Apr 26 07:58:36 2005 Subject: [Numpy-discussion] numarray problems on AIX Message-ID: <426E5637.1080305@fastmail.fm>

Hi: I'm having problems with numarray 1.3.1/Python 2.4.1 on AIX 5.2:

    Python 2.4.1 (#3, Apr 26 2005, 10:34:56) [C] on aix5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numarray
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/__init__.py", line 42, in ?
        from numarrayall import *
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarrayall.py", line 2, in ?
        from generic import *
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/generic.py", line 1116, in ?
        import numarraycore as _nc
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/numarraycore.py", line 1751, in ?
        import ufunc
      File "/u/wx20wj/home/blue/lib/python2.4/site-packages/numarray/ufunc.py", line 13, in ?
        import _converter
    ImportError: dynamic module does not define init function (init_converter)

it works with AIX 4 - anyone seen this before?

-Jeff

-- Jeffrey S. Whitaker Phone : (303)497-6313 Meteorologist FAX : (303)497-6449 NOAA/OAR/CDC R/CDC1 Email : Jeffrey.S.Whitaker at noaa.gov 325 Broadway Office : Skaggs Research Cntr 1D-124 Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg

From faltet at carabos.com Tue Apr 26 10:45:02 2005 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 26 10:45:02 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms Message-ID: <200504261942.46011.faltet@carabos.com>

Hi,

I'm having problems converting numarray objects into Numeric on 64-bit platforms, and I think this is numarray's fault, but I'm not completely sure.
The problem can be easily visualized in an example (I'm using numarray 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux):

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3],'i')   # The conversion has finished correctly

On 64-bit platforms (AMD64, Linux):

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: typecode argument must be a valid type.

The problem is that, for 32-bit platforms, na.typecode() == 'i' as it should be, but for 64-bit platforms na.typecode() == 'N', which is not a valid type in Numeric. I guess that na.typecode() should be mapped to 'l' on 64-bit platforms so that Numeric can recognize the Int64 correctly.

Any suggestion?

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From jmiller at stsci.edu Tue Apr 26 13:57:14 2005 From: jmiller at stsci.edu (Todd Miller) Date: Tue Apr 26 13:57:14 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504261942.46011.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> Message-ID: <1114548937.24120.97.camel@halloween.stsci.edu>

On Tue, 2005-04-26 at 13:42, Francesc Altet wrote: > Hi, > > I'm having problems converting numarray objects into Numeric on 64-bit > platforms, and I think this is numarray's fault, but I'm not completely > sure. > > The problem can be easily visualized in an example (I'm using numarray > 1.3.1 and Numeric 24.0b2). On a 32-bit platform (Intel32, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > array([3],'i') # The conversion has finished correctly > > On 64-bit platforms (AMD64, Linux): > > >>> Num=Numeric.array((3,),typecode='l') > >>> na=numarray.array(Num,typecode=Num.typecode()) > >>> Numeric.array(na,typecode=na.typecode()) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: typecode argument must be a valid type. > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > valid type in Numeric. I guess that na.typecode() should be mapped to > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > correctly. > > Any suggestion?

I agree that since the typecode() method exists for backward compatibility, returning 'N' rather than 'l' on an LP64 platform can be considered a bug. However, there are two problems I see:

1. Returning 'l' doesn't handle the case of converting a numarray Int64 array on a 32-bit platform. AFAIK, there is no typecode that will work for that case. So, we're only getting a partial solution.

2. numarray uses typecodes internally to encode type signatures. There, platform-independent typecodes are useful and making this change will add confusion.

I think we may be butting up against the absolute/relative type definition problem. Comments?

Todd

From faltet at carabos.com Wed Apr 27 05:40:35 2005 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 27 05:40:35 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <1114548937.24120.97.camel@halloween.stsci.edu> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> Message-ID: <200504271432.46852.faltet@carabos.com>

A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > valid type in Numeric. I guess that na.typecode() should be mapped to > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > correctly. > > I agree that since the typecode() method exists for backward > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > considered a bug. However, there are two problems I see: > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > array on a 32-bit platform. AFAIK, there is no typecode that will work > for that case. So, we're only getting a partial solution.

One can always do a separate case for 64-bit platforms. This solution is already used in Lib/numerictypes.py

> 2. numarray uses typecodes internally to encode type signatures. There, > platform-independent typecodes are useful and making this change will > add confusion.

Well, this is the root of the problem for 'l' (long int) types, that their meaning depends on the platform. Anyway, I've tried the following patch, and everything seems to work well (i.e. it does what is intended):

    --------------------------------------------------------------
    --- Lib/numerictypes.py       Wed Apr 27 07:13:08 2005
    +++ Lib/numerictypes.py.modif Wed Apr 27 07:21:48 2005
    @@ -389,7 +389,11 @@
     # at code generation / installation time.
     from codegenerator.ufunccode import typecode
     for tname, tcode in typecode.items():
    -    typecode[ eval(tname)] = tcode
    +    if tname == "Int64" and numinclude.LP64:
    +        typecode[ eval(tname)] = 'l'
    +    else:
    +        typecode[ eval(tname)] = tcode
    +
     if numinclude.hasUInt64:
         _MaximumType = {
    ---------------------------------------------------------------

With that, we have on 64-bit platforms:

    >>> import Numeric
    >>> Num=Numeric.array((3,),typecode='l')
    >>> import numarray
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3])
    >>> Numeric.array(na,typecode=na.typecode()).typecode()
    'l'

and on 32-bit:

    >>> Num=Numeric.array((3,),typecode='l')
    >>> na=numarray.array(Num,typecode=Num.typecode())
    >>> Numeric.array(na,typecode=na.typecode())
    array([3],'i')
    >>> Numeric.array(na,typecode=na.typecode()).typecode()
    'i'

Which should be the correct behaviour.

> I think we may be butting up against the absolute/relative type > definition problem. Comments?

That may add some confusion, but if we want to be consistent with the 'l' (long int) meaning for different platforms, I think the suggested patch (or other more elegant) is the way to go, IMHO.

Cheers,

-- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

From jmiller at stsci.edu Wed Apr 27 08:36:09 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Apr 27 08:36:09 2005 Subject: [Numpy-discussion] numarray, Numeric and 64-bit platforms In-Reply-To: <200504271432.46852.faltet@carabos.com> References: <200504261942.46011.faltet@carabos.com> <1114548937.24120.97.camel@halloween.stsci.edu> <200504271432.46852.faltet@carabos.com> Message-ID: <1114615773.28309.95.camel@halloween.stsci.edu>

On Wed, 2005-04-27 at 08:32, Francesc Altet wrote: > A Dimarts 26 Abril 2005 22:55, Todd Miller va escriure: > > > The problem is that, for 32-bit platforms, na.typecode() == 'i' as it > > > should be, but for 64-bit platforms na.typecode() == 'N', which is not a > > > valid type in Numeric. I guess that na.typecode() should be mapped to > > > 'l' on 64-bit platforms so that Numeric can recognize the Int64 > > > correctly. > > > > I agree that since the typecode() method exists for backward > > compatibility, returning 'N' rather than 'l' on an LP64 platform can be > > considered a bug. However, there are two problems I see: > > > > 1. Returning 'l' doesn't handle the case of converting a numarray Int64 > > array on a 32-bit platform. AFAIK, there is no typecode that will work > > for that case. So, we're only getting a partial solution. > > One can always do a separate case for 64-bit platforms. This solution > is already used in Lib/numerictypes.py

True. I'm just pointing out that doing this is still "half broken". On the other hand, it is also "half fixed".

> if numinclude.hasUInt64: > _MaximumType = { > ---------------------------------------------------------------
>
> With that, we have on 64-bit platforms:
>
> >>> import Numeric
> >>> Num=Numeric.array((3,),typecode='l')
> >>> import numarray
> >>> na=numarray.array(Num,typecode=Num.typecode())
> >>> Numeric.array(na,typecode=na.typecode())
> array([3])
> >>> Numeric.array(na,typecode=na.typecode()).typecode()
> 'l'
>
> and on 32-bit:
>
> >>> Num=Numeric.array((3,),typecode='l')
> >>> na=numarray.array(Num,typecode=Num.typecode())
> >>> Numeric.array(na,typecode=na.typecode())
> array([3],'i')
> >>> Numeric.array(na,typecode=na.typecode()).typecode()
> 'i'
>
> Which should be the correct behaviour.
My point was that if you have a numarray Int64 array, there's nothing in 32-bit Numeric to convert it to. Round tripping from Numeric-to-numarray works, but not from numarray-to-Numeric. In this case, I think "half-fixed" still has some merit, I just wanted it to be clear what we're not doing.

> > I think we may be butting up against the absolute/relative type > > definition problem. Comments? > > That may add some confusion, but if we want to be consistent with the > 'l' (long int) meaning for different platforms, I think the suggested > patch (or other more elegant) is the way to go, IMHO.

I logged this on Source Forge and will get something in for numarray-1.4 so that the typecode() method gives a workable answer on LP64. Interested parties should stick to using the typecode() method rather than any of numarray's typecode related mappings.

Cheers, Todd

From simon at arrowtheory.com Thu Apr 28 17:38:08 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 28 17:38:08 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX Message-ID: <20050429103116.092907a7.simon@arrowtheory.com>

Hi,

I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink) who has managed to bomb on this little code example:

    >>> import numarray as na
    >>> import numarray.random_array as ra
    >>> a = ra.random(shape=(257,256))
    >>> b = ra.random(shape=(1,256))
    >>> na.innerproduct(a, b)

He gets a blas error:

    ldc must be >= MAX(N,1): ldc=256 N=257
    Parameter 14 to routine cblas_dgemm was incorrect
    Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0

Simon.

-- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com

From rkern at ucsd.edu Thu Apr 28 18:05:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:05:30 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <20050429103116.092907a7.simon@arrowtheory.com> References: <20050429103116.092907a7.simon@arrowtheory.com> Message-ID: <42718719.1010206@ucsd.edu>

Simon Burton wrote: > Hi, > > I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from fink) > who has managed to bomb on this little code example: > > >>>>import numarray as na >>>>import numarray.random_array as ra >>>>a = ra.random(shape=(257,256)) >>>>b = ra.random(shape=(1,256)) >>>>na.innerproduct(a, b) > > > He gets a blas error: > > ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine cblas_dgemm was incorrect > Mac OS BLAS parameter error in cblas_dgemm, parameter #0, (unavailable), is 0

On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed Python with vecLib as the BLAS, I don't get an error.

I don't get a result that's sensible to me, either; I get a (257,1)-shape array with only the first and last entries non-zero. Your colleague might want to reconsider whether he wants innerproduct() or dot(), with the appropriate change of shape for b.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die."
-- Richard Harter

From rkern at ucsd.edu Thu Apr 28 18:09:53 2005 From: rkern at ucsd.edu (Robert Kern) Date: Thu Apr 28 18:09:53 2005 Subject: [Numpy-discussion] numarray dotblas problem on OSX In-Reply-To: <42718719.1010206@ucsd.edu> References: <20050429103116.092907a7.simon@arrowtheory.com> <42718719.1010206@ucsd.edu> Message-ID: <427188D1.201@ucsd.edu>

Robert Kern wrote: > Simon Burton wrote: > >> Hi, >> >> I have a colleague running Mac OS 10.3, running numarray-1.3.1 (from >> fink) >> who has managed to bomb on this little code example: >> >> >>>>> import numarray as na >>>>> import numarray.random_array as ra >>>>> a = ra.random(shape=(257,256)) >>>>> b = ra.random(shape=(1,256)) >>>>> na.innerproduct(a, b) >> >> >> >> He gets a blas error: >> >> ldc must be >= MAX(N,1): ldc=256 N=257 Parameter 14 to routine >> cblas_dgemm was incorrect >> Mac OS BLAS parameter error in cblas_dgemm, parameter #0, >> (unavailable), is 0 > > > On OS X 10.3, numarray 1.3.0, self-compiled for the Apple-installed > Python with vecLib as the BLAS, I don't get an error. > > I don't get a result that's sensible to me, either; I get a > (257,1)-shape array with only the first and last entries non-zero.

Oh yes, and apparently a segfault on exit, too.

-- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter

From edcjones at comcast.net Fri Apr 29 11:26:05 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:26:05 2005 Subject: [Numpy-discussion] numarray: problem with numarray.records Message-ID: <42727B35.9050401@comcast.net>

    #! /usr/bin/env python
    import numarray, numarray.strings, numarray.records

    doubles = numarray.array([1.0], 'Float64')
    strings = numarray.strings.array('abcdefgh', itemsize=8, kind=numarray.strings.RawCharArray)

    print numarray.records.array(buffer=[strings, strings])
    print
    print numarray.records.array(buffer=[doubles, doubles])
    print
    print numarray.records.array(buffer=[strings, doubles])

    """
    The output is:

    RecArray[ ('abcdefgh'), ('abcdefgh') ]

    RecArray[ (1.0, 1.0) ]

    Traceback (most recent call last):
      File "./mess.py", line 12, in ?
        print numarray.records.array(buffer=[strings, doubles])
      File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 397, in array
        byteorder=byteorder, aligned=aligned)
      File "/usr/local/lib/python2.4/site-packages/numarray/records.py", line 106, in fromrecords
        raise ValueError, "inconsistent data at row %d,field %d" % (row, col)
    ValueError: inconsistent data at row 1,field 0

    The numarray docs (11.2) say: The first argument, buffer, may be any
    one of the following: ... (5) a list of numarrays. There must be one
    such numarray for each field.

    What is going on here?
    """

From edcjones at comcast.net Fri Apr 29 11:32:07 2005 From: edcjones at comcast.net (Edward C. Jones) Date: Fri Apr 29 11:32:07 2005 Subject: [Numpy-discussion] numarray: lexicographical sort Message-ID: <42727D37.8070700@comcast.net>

Suppose arr is a two dimensional numarray. Can the following be done entirely within numarray?

    alist = arr.tolist()
    alist.sort()
    arr = numarray.array(alist, arr.type())

From jmiller at stsci.edu Fri Apr 29 12:42:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Apr 29 12:42:22 2005 Subject: [Numpy-discussion] numarray: lexicographical sort In-Reply-To: <42727D37.8070700@comcast.net> References: <42727D37.8070700@comcast.net> Message-ID: <1114803546.21036.30.camel@halloween.stsci.edu>

On Fri, 2005-04-29 at 14:30, Edward C.
Jones wrote:
> Suppose arr is a two dimensional numarray. Can the following be done
> entirely within numarray?
>
> alist = arr.tolist()
> alist.sort()
> arr = numarray.array(alist, arr.type())

I'm pretty sure the answer is no. The comparisons in numarray's sort() functions are all single element numerical comparisons. The list sort() is using a polymorphic comparison which in this case is the comparison of two lists. There's nothing like that in numarray so I don't think it's possible.

Todd
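[An editor's sketch, not from the thread: for arrays of small non-negative integers one can approximate a lexicographic row sort inside numarray by argsorting a single combined key. The radix constant below assumes all values are < 1000; general arrays still need the tolist()/sort() round trip shown above.]

    import numarray

    arr = numarray.array([[2, 7], [1, 9], [1, 3]])
    # combine the columns into one sortable key; column 0 is most significant
    key = arr[:, 0] * 1000 + arr[:, 1]
    arr = numarray.take(arr, numarray.argsort(key))
    # rows are now [[1, 3], [1, 9], [2, 7]]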