I've committed the data-type change discussed at the end of last week to the SVN repository. Now the concept of a data type for an array has been replaced with a "data-descriptor". This data-descriptor is flexible enough to handle an arbitrary record specification with fields that include records and arrays or arrays of records. While nesting may not be the best data-layout for a new design, when memory-mapping an arbitrary fixed-record-length file, this capability allows you to handle even the most obsure record file.

While the basic core tests pass for me, there may be lurking problems, and so testing of the SVN trunk of scipy core will be appreciated. I've bumped up the version number because the C-API has changed (a few new functions and some functions becoming macros). I'd like to make a release of the new version by the end of the week (as soon as Chris Hanley at STScI and I get records.py working better), so please test.

Recently some Intel C compiler tests were failing on a 64-bit platform. It would be nice to figure out why that is happening as well, but I will probably not have time for that this week.

Thanks,

-Travis
Travis Oliphant wrote:
I've committed the data-type change discussed at the end of last week to the SVN repository. Now the concept of a data type for an array has been replaced with a "data-descriptor". This data-descriptor is flexible enough to handle an arbitrary record specification with fields that include records and arrays or arrays of records. While nesting may not be the best data-layout for a new design, when memory-mapping an arbitrary fixed-record-length file, this capability allows you to handle even the most obsure record file.
Does this mean that the dtype parameter is changed? obscure??
Colin J. Williams wrote:
Does this mean that the dtype parameter is changed? obscure??
No, it's not changed. The dtype parameter is still used and it is still called the same thing. It's just that what constitutes a data-type has changed significantly. For example, now tuples and dictionaries can be used to describe a data-type. These definitions are recursive, so that wherever a data-type is called for, it means anything that can be interpreted as a data-type. (And I really mean data-descriptor, but data-type is in such common usage that I still use it.)

Tuple:
========

(fixed-size-data-type, shape)
(generic-size-data-type, itemsize)
(base-type-data-type, new-type-data-type)

Examples:

dtype=(int32, (5,5))  --- a 5x5 array of int32 is the description of this item.
dtype=(str, 10)       --- a length-10 string.
dtype=(int16, {'real':(int8,0), 'imag':(int8,4)})  --- a descriptor that acts like an int16 array mathematically (in ufuncs) but has real and imag fields.

Dictionary (defaults to a dtypechar == 'V')
==========

format1:

{"names"   : list-of-field-names,
 "formats" : list of data-types,
 <optionally>
 "offsets" : list of start-of-the-field offsets,
 "titles"  : extra field names}

format2 (and how it's stored internally):

{key1 : (data-type1, offset1 [, title1]),
 key2 : (data-type2, offset2 [, title2]),
 ...
 keyn : (data-typen, offsetn [, titlen])}

Other objects not already covered:
=====================

???? Right now, it just passes the tp_dict of the typeobject to the dictionary-conversion routine. I'm open for ideas here and will probably have better ideas once I see what the actual record data-type (not data-descriptor, but an actual subclass of the scipy.void data type) looks like.

All of these can be used as the dtype parameter wherever it is taken (of course you can't always do something useful with every data-descriptor). When an ndarray has an associated type descriptor with fields (that's where the field information is stored), then those fields can be accessed using string or unicode keys in the getitem call. Thus, you can do something like this:
>>> a = ones((4,3), dtype=(int16, {'real':(int8, 0), 'imag':(int8, 1)}))
>>> a['imag'] = 2
>>> a['real'] = 1
>>> a.tostring()
'\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02'
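For comparison, the same two-field layout could be spelled with the format1 dictionary listed above. This is an untested sketch against the SVN trunk: it assumes format1 is accepted anywhere a data-type is, and it reuses the field names and offsets from the example.

>>> # format1 spelling of the descriptor used above (sketch, untested);
>>> # "names"/"formats"/"offsets" follow the format1 layout listed earlier.
>>> descr = {"names"   : ['real', 'imag'],
...          "formats" : [int8, int8],
...          "offsets" : [0, 1]}
>>> b = ones((4,3), dtype=(int16, descr))
>>> b['imag'] = 2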
Note that there are now three distinct but interacting Python objects:

1) the N-dimensional array of a fixed itemsize,
2) a Python object representing one element of the array,
3) the data-descriptor object describing the data-type of the array.

These three things were always there under the covers (the PyArray_Descr* has been there since Numeric), and the standard Python types were always filling in for number 2. Now we are just being more explicit about it: all three things are present and accounted for.

I'm really quite happy with the resulting infrastructure. I think it will allow some really neat possibilities. I'm thinking the record array subclass will allow attribute-based look-up and register a nice record type for the actual "element" of the record array.

-Travis
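A minimal interactive sketch of the three objects, using only constructs that appear in this thread (the exact reprs, and the attribute for reaching the descriptor from the array, are not shown in the posts, so none are assumed here):

>>> a = ones((4,3), dtype=(int16, {'real':(int8, 0), 'imag':(int8, 1)}))
>>> type(a)        # 1) the N-dimensional array object
>>> elem = a[0, 0] # 2) a Python object standing in for one element
>>> type(elem)     # with plain dtypes, a standard Python type fills this role
>>> # 3) the data-descriptor is whatever the dtype argument above was
>>> #    interpreted as; it travels with the array and describes each item.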
Travis Oliphant wrote:
Colin J. Williams wrote:
Does this mean that the dtype parameter is changed? obscure??
No, it's not changed. The dtype parameter is still used and it is still called the same thing. It's just that what constitutes a data-type has changed significantly.
For example now tuples and dictionaries can be used to describe a data-type. These definitions are recursive so that whenever data-type is used it means anything that can be interpreted as a data-type. And I really mean data-descriptor, but data-type is in such common usage that I still use it.
This would appear to be a good step forward, but with all of the immutable types (int8, FloatType, TupleType, etc.) the data is stored in the ArrayType instance (array_data?), whereas with a dictionary it would appear to be necessary to store the items outside the array. Is that desirable? Even a tuple can have its content modified, as the example below shows:
>>> a = []
>>> b = (a, [2, 3])
>>> b[0]
[]
>>> b[0] = 99
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment   <<< GOOD
>>> b[1][0]
2
>>> b[1][0] = 99
>>> b
([], [99, 3])   <<< HERE WE CHANGE THE VALUE OF THE TUPLE
Tuple:
========

(fixed-size-data-type, shape)
(generic-size-data-type, itemsize)
(base-type-data-type, new-type-data-type)

Examples:

dtype=(int32, (5,5)) --- a 5x5 array of int32 is the description of this item.
dtype=(str, 10) --- a length-10 string
So dtype now contains both the data type of each element and the shape of the array? This seems a significant change from numarray or Numeric.
dtype=(int16, {'real':(int8,0), 'imag':(int8,4)}) --- a descriptor that acts like an int16 array mathematically (in ufuncs) but has real and imag fields.
This adds complexity; is there a compensating benefit? Do all of the complex operations apply?
Dictionary (defaults to a dtypechar == 'V')
==========
Why not clean things up by dropping the typechar? It seemed to be one of the warts in numarray, carried forward only for compatibility reasons. Could the compatibility objectives of the project not be achieved outside the ArrayType object, with a wrapper of some sort?
format1:

{"names"   : list-of-field-names,
 "formats" : list of data-types,
 <optionally>
 "offsets" : list of start-of-the-field offsets,
 "titles"  : extra field names}
Couldn't the use of records avoid the cumbersome use of keys?
format2 (and how it's stored internally):

{key1 : (data-type1, offset1 [, title1]),
 key2 : (data-type2, offset2 [, title2]),
 ...
 keyn : (data-typen, offsetn [, titlen])}
This is cleaner, but couldn't this information be contained within the Record instance?
All of these can be used as the dtype parameter wherever it is taken (of course you can't always do something useful with every data-descriptor). When an ndarray has an associated type descriptor with fields (that's where the field information is stored), then those fields can be accessed using string or unicode keys to the getitem call.
I've used ArrayType in place of ndarray (or maybe it should have been ndbigarray?) above, as it appears to be more descriptive and fits the Python convention on class naming.
Thus, you can do something like this:
>>> a = ones((4,3), dtype=(int16, {'real':(int8, 0), 'imag':(int8, 1)}))
>>> a['imag'] = 2
>>> a['real'] = 1
>>> a.tostring()
'\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02'
Or, one could have something like:

class SmallComplex(Record):
    ''' This class typically has no instances in user code. '''
    real = (int8, )
    imag = (int8, )
    def __init__(self):
        pass  # to be worked out
    def __new__(self):
        pass  # to be worked out
>>> a = ones((4,3), dtype=SmallComplex)
>>> a.imag = 2
>>> a.real = 1
>>> a.tostring()
'\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02'
Note that there are now three distinct but interacting Python objects:
1) the N-dimensional array of a fixed itemsize,
2) a Python object representing one element of the array,
3) the data-descriptor object describing the data-type of the array.
This looks cleaner. Perhaps 2) and 3) could be phrased a little differently:

2) a Python object which is one element of the array,
3) the data-descriptor object describing the data-type of the array element.
I'm thinking the record array subclass will allow attribute-based look-up and register a nice record type for the actual "element" in of the record array.
This is good, but the major structure is the array, which can have elements of various types such as ComplexType, NoneType, int8, or a variety of user-defined immutable records.

Colin W.

PS My Record sketch above needs a lot more thinking through.
Hi all,

I was wondering if I'm missing a numpy/numarray function that would return the indices of the minimum/maximum element of an array. argmin/argmax can only do this one axis at a time. Alternatively, one can find the index for the flattened array and then play with modulo arithmetic, which quickly gets annoying for > 2 dimensions. Since this is a rather frequent operation, I would think there's already a function for it?...

Thanks,
Christos
Why not:

ind = where(arr == arr.max())

and use the first element of the index array(s)?

Perry Greenfield
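A concrete sketch of Perry's suggestion in numarray (written from the behavior described later in this thread; the outputs shown are what numarray 1.x should produce):

>>> import numarray
>>> arr = numarray.array([[1, 2], [3, 4]])
>>> # where(condition) with a single argument returns the indices of the
>>> # elements where the condition holds, as one index array per axis.
>>> ind = numarray.where(arr == arr.max())
>>> # The first element of each index array gives the coordinates of
>>> # (the first occurrence of) the maximum:
>>> tuple([ix[0] for ix in ind])
(1, 1)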
On Tue, Dec 06, 2005 at 12:17:14PM -0500, Perry Greenfield wrote:
Why not:
ind = where(arr == arr.max())
and use the first element of the index array(s)
Thanks, that worked nicely! I was stuck with the old Numeric definition of where(). Could it perhaps be made more obvious in numarray's where() documentation that it can also be called with one parameter only? (I.e., put this information in the section header, not mid-way through the description.) I'm looking at the April 20, 2005 documentation release.

Would it be safe to assume that the numarray definition of where() will be the one propagated into scipy.core?

As a side note, I am wondering if there is a semantic asymmetry in using min() and max() to signify the min/max element of the entire array while argmin() and argmax() signify the min/max element along each axis. At the same time, as far as I can tell, there is no min()/max() method to provide the min/max element along each axis, and there is no method to do the equivalent of "argmin/max_all()", as implemented by where(arr == arr.min/max()).

Apologies if this has been discussed before.

Thank you,
Christos
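To make the missing "argmin/max_all()" concrete, it could be spelled as a pair of small helpers built on the one-argument where() discussed above. The names are made up; this is a sketch of the idea, not an existing API:

import numarray
from numarray import where

def argmax_all(arr):
    # Hypothetical helper: one index array per axis, covering every
    # element equal to the global maximum.
    return where(arr == arr.max())

def argmin_all(arr):
    # Same idea for the global minimum.
    return where(arr == arr.min())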
Christos Siopis wrote:
On Tue, Dec 06, 2005 at 12:17:14PM -0500, Perry Greenfield wrote:
Why not:
ind = where(arr == arr.max())
and use the first element of the index array(s)
Would it be safe to assume that the numarray definition of where() will be the one propagated into scipy.core?
Yes, it is already there. -Travis
On Tue, 6 Dec 2005, Christos Siopis apparently wrote:
As a side note, i am wondering if there is a semantic asymmetry in using min() and max() to signify the min/max element in the entire array while argmin() and argmax() signify the min/max element along each axis.
SciPy arrays function as "expected" in this sense:
>>> import scipy
>>> x = scipy.array([[1,2],[3,4]])
>>> x.max()
4
>>> x.argmax()
3
Note that, as I understand it, argmax gives the index into x.flat. Also, the scipy ufuncs max and argmax have the symmetry you seek, if I understand correctly.

hth,
Alan Isaac
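For completeness, a sketch of the modulo arithmetic Christos mentioned, turning the flat index from x.argmax() back into per-axis coordinates (the helper name is made up):

def unflatten_index(flat_index, shape):
    # Walk the axes from the last (fastest-varying) one backwards,
    # peeling off one coordinate per axis with divmod.
    coords = []
    for dim in reversed(shape):
        flat_index, rem = divmod(flat_index, dim)
        coords.insert(0, rem)
    return tuple(coords)

# For the 2x2 example above: unflatten_index(3, (2, 2)) -> (1, 1)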
On Tue, Dec 06, 2005 at 06:37:11PM -0500, Alan G Isaac wrote:
On Tue, 6 Dec 2005, Christos Siopis apparently wrote:
As a side note, i am wondering if there is a semantic asymmetry in using min() and max() to signify the min/max element in the entire array while argmin() and argmax() signify the min/max element along each axis.
SciPy arrays function as "expected" in this sense:
>>> import scipy
>>> x = scipy.array([[1,2],[3,4]])
>>> x.max()
4
>>> x.argmax()
3
Note that, as I understand, argmax gives the index from x.flat
Thanks for the note. I do not have SciPy installed to test this, and I am not sure which version of SciPy you are using. I believe in the (remote?) past SciPy used Numeric as its core, but using the latest Numeric available on Gentoo AMD64 I obtain different results:

Python 2.4.2 (#1, Nov 19 2005, 12:30:12)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
>>> import Numeric
>>> Numeric.__version__
'23.7'
>>> x = Numeric.array([[1,2],[3,4]])
>>> x.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: max
>>> x.argmax()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: argmax
i.e., these methods are not supported for Numeric arrays. The max() function does not exist either:
>>> Numeric.max(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'max'
(I think one would have to use MA.Maximum() to find the global array maximum back in those days...). And:
>>> Numeric.argmax(x)
array([1, 1])
i.e., it does not flatten the array.

Now, using numarray:
>>> import numarray
>>> numarray.__version__
'1.3.1'
>>> x = numarray.array([[1,2],[3,4]])
>>> x.max()
4
>>> x.argmax()
array([1, 1])
i.e., these methods on the array object now do exist, but the behavior of argmax() is not the same as in your SciPy. My conjecture is that your SciPy uses some "intermediate" version between the old Numeric and the current scipy.core which, as I understand from what Travis said, supports the numarray behavior shown above.
Also, the scipy ufuncs max and argmax have the symmetry you seek, if I understand correctly.
With so many versions of things floating around, I think it's hard to tell what has what any more... One more reason to look forward to the outcome of Travis' work, and to hope that things (or at least the API) will stabilize...

Thanks,
Christos
participants (6):
- Alan G Isaac
- Christos Siopis
- Colin J. Williams
- Perry Greenfield
- Travis Oliphant
- Travis Oliphant