Example of power of new data-type descriptors.
I'd like more people to know about the new power in scipy core that comes from the general data-type descriptors that can now be used to define numeric arrays. Towards that end, here is a simple example (be sure to use the latest SVN; a couple of minor usability improvements were made recently). Notice that this example does not use a special "record" array subclass. This is just a regular array. I'm kind of intrigued by (though not motivated to pursue) the possibility of accessing (or defining) databases directly in scipy_core arrays using the record functionality.

    # Define a new data-type descriptor
    import scipy
    dtype = scipy.dtypedescr({'names': ['name', 'age', 'weight'],
                              'formats': ['S30', 'i2', 'f4']})
    # The argument to dtypedescr could also have been passed here directly as the dtype argument
    a = scipy.array([('Bill', 31, 260), ('Fred', 15, 135)], dtype=dtype)

    print a['name']
    [Bill Fred]
    print a['age']
    [31 15]
    print a['weight']
    [ 260. 135.]
    print a[0]
    ('Bill', 31, 260.0)
    print a[1]
    ('Fred', 15, 135.0)
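scipy_core later became NumPy. For readers on a current NumPy under Python 3 (an assumption about the reader's setup, not part of the original message), here is a minimal sketch of the same session: the descriptor is built with np.dtype, the 'S30' field holds bytes (hence the b"..." literals), and the last lines show a recarray view, which supplies the attribute access described in the next paragraph.

```python
import numpy as np

# Same three-field layout as the session above:
# a 30-byte string, a 16-bit integer and a 32-bit float.
dt = np.dtype({"names": ["name", "age", "weight"],
               "formats": ["S30", "i2", "f4"]})

# A plain ndarray (no record-array subclass) built with that descriptor.
# Under Python 3, 'S30' fields store bytes, so the names are bytes literals.
a = np.array([(b"Bill", 31, 260), (b"Fred", 15, 135)], dtype=dt)

print(a["name"])    # one field across all elements
print(a["age"])
print(a["weight"])
print(a[0])         # one element, shown as a tuple-like record
print(dt.itemsize)  # 30 + 2 + 4 = 36 bytes per element (no padding)

# Optional: a recarray view adds attribute access over the same memory.
r = a.view(np.recarray)
print(r.age)
```

The recarray view copies nothing; it only changes how the same buffer is exposed, which is the point of the next paragraph: plain arrays already cover most of what a record array does.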
It seems to me there are some very interesting possibilities with this new ability. The record array subclass adds an improved scalar type (the record) and attribute access to the fields (e.g. a.name, a.age, and a.weight). But if you don't need attribute access, you can use regular arrays to do much of what you might otherwise need a record array for. I'd love to see what people come up with using this new facility.

The new array PEP for Python basically proposes adding to Python a very simple array object (just the basic PyArrayObject * of Numeric with a bare-bones type-object table), plus this new data-type descriptor object and a very few built-in data-type descriptors (perhaps just object initially). This would essentially add the array interface to Python directly and allow people to start using it generally. The PEP is slow going because it is not on my priority list right now: it is not essential to making scipy_core work well. But I would love to have more people ruminating on the basic ideas, which I think are crystallizing.

Best wishes for a new year,

-Travis Oliphant
On Monday 26 December 2005 10:00, Travis Oliphant wrote:
I'd like more people to know about the new power that is in scipy core due to the general data-type descriptors that can now be used to define numeric arrays. Towards that effort here is a simple example (be sure to use latest SVN -- there were a couple of minor changes that improve usability made recently). Notice this example does not use a special "record" array subclass. This is just a regular array.
IMO, this is very good stuff, and it opens the door to supporting homogeneous data, heterogeneous data and character strings in just one object. That makes the inclusion of such an object in Python a very big improvement, because people will finally have a very effective container for virtually *any* kind of large dataset in an easy way. I'm personally very excited about this new functionality :-)

Just a few quirks (using scipy_core 0.9.0.1713):
print a[0]
('Bill', 31, 260.0)
For me, this prints:

    In [87]: a[0]
    Out[87]: ('Bill\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 31, 260.0)

which looks a bit ugly. However:

    In [86]: a['name']
    Out[86]: array([Bill, Fred], dtype=(string,30))

seems fine.

Also, I find the name of the .getfield() method a bit confusing:

    In [71]: a.getfield?
    Type:           builtin_function_or_method
    Base Class:     <type 'builtin_function_or_method'>
    String Form:    <built-in method getfield of scipy.ndarray object at 0x8298678>
    Namespace:      Interactive
    Docstring:
        m.getfield(dtype, offset) returns a field of the given array as a
        certain type. A field is a view of the array's data with each
        itemsize determined by the given type and the offset into the
        current array.

So anyone who creates a heterogeneous array may be tempted to call getfield() to get an actual field of the array, only to be disappointed. I suggest renaming this method to .viewas() or just .as(), and keeping the name getfield() for heterogeneous datasets.

Cheers,
--
Francesc Altet | Cárabos Coop. V. | http://www.carabos.com/ | Enjoy Data
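The distinction Francesc draws can be seen in a short sketch, assuming a current NumPy, where ndarray.getfield(dtype, offset) still carries the semantics quoted in the docstring above. Named-field indexing looks the type and byte offset up in the descriptor; getfield only reinterprets raw bytes at an offset you supply, and lands on 'age' here solely because we pass that field's type and offset by hand.

```python
import numpy as np

dt = np.dtype({"names": ["name", "age", "weight"],
               "formats": ["S30", "i2", "f4"]})
a = np.array([(b"Bill", 31, 260), (b"Fred", 15, 135)], dtype=dt)

# Named-field access: NumPy finds the field's type and offset itself.
by_name = a["age"]

# dt.fields maps each field name to (dtype, byte offset[, title]).
age_type, age_offset = dt.fields["age"][:2]

# getfield: view every element's bytes, starting `age_offset` bytes in,
# as `age_type`. It knows nothing about field names.
by_offset = a.getfield(age_type, age_offset)

print(age_offset)  # 30, right after the 30-byte name field
print((by_name == by_offset).all())
```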
On 12/30/05, Francesc Altet <faltet@carabos.com> wrote:
IMO, this is very good stuff, and it opens the door to supporting homogeneous data, heterogeneous data and character strings in just one object. That makes the inclusion of such an object in Python a very big improvement, because people will finally have a very effective container for virtually *any* kind of large dataset in an easy way. I'm personally very excited about this new functionality :-)
I wonder if it couldn't be adapted to read binary C structures off disk. There are lots of data-acquisition programs that store their data that way, even though doing so introduces all sorts of packing and alignment problems due to differences between compilers and architectures.

Chuck
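Chuck's idea maps naturally onto structured descriptors in today's NumPy. The sketch below is illustrative only: the C record `struct sample { uint8_t flag; double value; }` is hypothetical, and its bytes are synthesized with the stdlib struct module instead of read from a real file. Describing the layout explicitly (byte order, offsets, total size) sidesteps the compiler and architecture differences he mentions, because nothing depends on how the machine running the script would pad the struct. np.dtype(..., align=True) is the alternative when the file was written on the same kind of machine, since it pads fields the way a local C compiler would.

```python
import struct
import numpy as np

# Hypothetical record written by a C program:
#   struct sample { uint8_t flag; double value; };

# Packed layout (e.g. #pragma pack(1)): 1 + 8 = 9 bytes, little-endian.
packed = np.dtype([("flag", "u1"), ("value", "<f8")])

# A common aligned layout: 7 padding bytes after `flag`, so `value`
# starts at offset 8. Offsets are spelled out explicitly so the result
# does not depend on the alignment rules of the current platform.
aligned = np.dtype({"names": ["flag", "value"],
                    "formats": ["u1", "<f8"],
                    "offsets": [0, 8],
                    "itemsize": 16})

# Stand-in file contents: '<' means no automatic padding, 'x' is a pad byte.
packed_bytes = struct.pack("<Bd", 1, 2.5) + struct.pack("<Bd", 0, -1.0)
aligned_bytes = struct.pack("<B7xd", 1, 2.5) + struct.pack("<B7xd", 0, -1.0)

# With a real file, np.fromfile(path, dtype=packed) would read it directly.
recs_packed = np.frombuffer(packed_bytes, dtype=packed)
recs_aligned = np.frombuffer(aligned_bytes, dtype=aligned)

print(packed.itemsize, aligned.itemsize)  # 9 16
print(recs_packed["value"], recs_aligned["value"])
```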
I was looking at these new data-type mechanisms, and was unable to do a string comparison on an array. Example (using Travis's code):

    In [49]: a['name']
    Out[49]: array([Bill, Fred], dtype=(string,30))

    In [50]: a['name']=='Bill'
    Out[50]: False

    In [51]: a['name'].__eq__('Bill')
    Out[51]: NotImplemented

I expected that a['name']=='Bill' would return [True, False]. Am I doing this the wrong way? Is this related to chararray, and should I use methods from that class?

Hugo Gamboa
Hugo Gamboa wrote:
I expected that a['name']=='Bill' would return [True, False]
Problem is, nobody has implemented that yet for strings. Comparisons go through universal functions (ufuncs), and right now the ufuncs have no support for arrays of flexible-length types such as strings. We could special-case the (rich) comparisons for strings and unicode rather easily (and I think we should), but that hasn't been done yet.

Right now, the chararray does implement equality testing (in Python, so more slowly). Use a.view(numpy.chararray) to get a chararray. But note that the chararray has not been well tested yet.

-Travis
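For comparison, in a current NumPy (scipy_core's successor; Python 3 and bytes literals assumed) the element-wise string comparison Hugo expected does work. A minimal sketch, also showing the numpy.char function form and using the resulting boolean mask to select whole records:

```python
import numpy as np

dt = np.dtype({"names": ["name", "age", "weight"],
               "formats": ["S30", "i2", "f4"]})
a = np.array([(b"Bill", 31, 260), (b"Fred", 15, 135)], dtype=dt)

names = a["name"]

# Element-wise equality of a fixed-width byte-string array with a scalar.
mask = names == b"Bill"

# The numpy.char module offers the same test as a function.
mask_char = np.char.equal(names, b"Bill")

# A boolean mask then selects whole records from the structured array.
bills = a[mask]
print(mask, bills["age"])
```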
participants (4)

- Charles R Harris
- Francesc Altet
- Hugo Gamboa
- Travis Oliphant