For the most part, it seems the array protocol is easy to agree on. The one difficulty is typestr.

For what it's worth, here are my opinions on what has been said regarding the typestr.

* Endian-ness should be included in the typestr --- it is how the data is viewed and an intrinsic part of the type as much as int, or float.

* I like the fact that struct character codes are documented, but it is hard to remember. The simpler division into basic types and byte-widths that the numarray record module uses is easier to remember.

* I'm mixed on whether or not support for describing complex data types should be used or if their description as a record is good enough. On the one hand we think of complex numbers as additional types, but on the other hand, in terms of machine layout they really are just two floats, so perhaps it is better to look at them that way in a protocol whose purpose is just describing how to interpret a block of memory. Especially since complex numbers could conceivably be built on top of any of the other types. In addition, it is conceivable that a rational array might be supported by some array object in the future and that would most easily be handled by a record array where the names were now something like ("numer", "denom"). The typestr argument should just help us specify what is in the memory chunk at each array element (how should it be described).

* I'm wondering about including multiple types in the typestr. On the one hand we could describe complicated structures by packing all the information into the typestr. On the other hand, it may be better if we just use 'V8' to describe an 8-byte memory buffer with an additional attribute that contains both the names and the typestr:

    __array_recinfo__ = (('real','f4'),('imag','f4'))

  or for a "rational type"

    __array_recinfo__ = (('numer','i4'),('denom','i4'))

  so that the detail of the typecode for a "record" type is handled by another special method using tuples. On this level, we could add the possibility of specifying a shape for a small array inside (just like the record array of numarray does).

-Travis
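To make the "basic type plus byte-width" idea concrete, here is a minimal sketch of splitting a typestr into its parts. The optional leading byte-order character ('<', '>' or '|') and the '=' placeholder for "not specified" are assumptions made for illustration; the thread only shows strings such as 'f4', 'i4' and 'V8'. The function name parse_typestr is hypothetical.

```python
def parse_typestr(typestr):
    """Split a typestr into (byteorder, kind, width_in_bytes).

    Assumption: an optional first character '<', '>' or '|' carries the
    byte order; '=' is returned here when none is given.
    """
    byteorder = '='
    if typestr[0] in '<>|':
        byteorder, typestr = typestr[0], typestr[1:]
    kind = typestr[0]          # e.g. 'i' for integer, 'f' for float, 'V' for raw bytes
    width = int(typestr[1:])   # number of bytes per element
    return byteorder, kind, width


print(parse_typestr('>i4'))
print(parse_typestr('V8'))
```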
On Friday 01 April 2005 11:31, Travis Oliphant wrote:
* I'm wondering about including multiple types in the typestr. On the one hand we could describe complicated structures by packing all the information into the typestr. On the other hand, it may be better if we just use 'V8' to describe an 8-byte memory buffer with an additional attribute that contains both the names and the typestr:
__array_recinfo__ = (('real','f4'),('imag','f4'))
or for a "rational type"
__array_recinfo__ = (('numer','i4'),('denom','i4'))
so that the detail of the typecode for a "record" type is handled by another special method using tuples. On this level, we could add the possibility of specifying a shape for a small array inside (just like the record array of numarray does).
Like:

    __array_recinfo__ = (('numer','i4', (3,4)),('denom','i4', (2,)))

?

Also, this can be easily extended to nested types:

    __array_recinfo__ = (('a','i4',(3,4)),(('b','i4',(2,)),('c','f4',(10,2))))

Well, this looks pretty good to me. It has nothing to do with struct format, but is much more usable, of course.

Cheers,
--
Francesc Altet    http://www.carabos.com/
Cárabos Coop. V.  "Enjoy Data"
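One way to sanity-check a record description like the ones above is to compute how many bytes it describes, which should match the width in a 'V<n>' typestr. The sketch below is only an illustration: the helper name record_nbytes is hypothetical, it assumes each typestr is a one-letter kind followed by a byte width, and it assumes a nested record is written as a tuple of field tuples, as in the nested example.

```python
from math import prod


def record_nbytes(recinfo):
    """Total bytes described by a tuple of (name, typestr[, shape]) fields."""
    total = 0
    for entry in recinfo:
        if isinstance(entry[0], str):
            # Plain field: name, typestr and an optional small-array shape.
            typestr = entry[1]
            shape = entry[2] if len(entry) > 2 else ()
            total += int(typestr[1:]) * prod(shape)  # prod(()) is 1
        else:
            # Nested record: a tuple of further field descriptions.
            total += record_nbytes(entry)
    return total


complex_info = (('real', 'f4'), ('imag', 'f4'))
print(record_nbytes(complex_info))  # 8, consistent with 'V8'
```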
Travis Oliphant wrote:
For the most part, it seems the array protocol is easy to agree on. The one difficulty is typestr.
For what it's worth, here are my opinions on what has been said regarding the typestr.
* Endian-ness should be included in the typestr --- it is how the data is viewed and an intrinsic part of the type as much as int, or float.
In most cases, endian-ness is associated with the machine being used, rather than the data element. It seems to me that numarray's numeric types provide a good model, which may need enhancing for records, strings etc.

numarray has:

Numeric type objects:
    Bool
    Int8 Int16 Int32 Int64
    UInt8 UInt16 UInt32 UInt64
    Float32 Double64
    Complex32 Complex64

Numeric type classes:
    NumericType
    BooleanType
    SignedType UnsignedType
    IntegralType SignedIntegralType UnsignedIntegralType
    FloatingType
    ComplexType
* I like the fact that struct character codes are documented, but it is hard to remember.
This is the problem. numerictypes provides mnemonic names and, if one uses an editor with autocompletion, a prompt from the editor.

For those interfacing to existing code, there could be a helper function:

    def toType(eltType= 'i'): => an instance of NumericType

It should also be possible to derive the typeCode from the eltType; numarray doesn't seem to provide this.

Colin W.
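A rough sketch of what such a helper pair might look like follows. It does not use numarray: the type "objects" are plain strings named after the bit width (for example 'Int32'), and the function names to_type_name and to_type_code are hypothetical stand-ins for the toType idea above. Only the struct module's standard sizes are relied on.

```python
import struct

# Name prefix for each struct character code covered by this sketch.
_PREFIX = {
    'b': 'Int', 'h': 'Int', 'i': 'Int', 'q': 'Int',
    'B': 'UInt', 'H': 'UInt', 'I': 'UInt', 'Q': 'UInt',
    'f': 'Float', 'd': 'Float',
}


def to_type_name(code='i'):
    """Map a struct character code to a mnemonic name such as 'Int32'."""
    bits = 8 * struct.calcsize('=' + code)  # '=' selects standard sizes
    return _PREFIX[code] + str(bits)


def to_type_code(name):
    """Inverse lookup: derive the character code from the mnemonic name."""
    for code in _PREFIX:
        if to_type_name(code) == name:
            return code
    raise KeyError(name)


print(to_type_name('i'), to_type_code('UInt16'))
```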
Hello all,

I've updated the numeric web site and given special prominence to the array interface which I believe should be pushed. Numeric 24.0 will support it as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release.

Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further.

Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation.

-Travis
There are two questions that I have about the array interface: 1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference? 2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface? --Michiel. Travis Oliphant wrote:
Hello all,
I've updated the numeric web site and given special prominence to the array interface which I believe should be pushed. Numeric 24.0 will support it as will scipy.base (Numeric3). I hope that numarray will also support it in an upcoming release.
Please read through the interface and feel free to comment. However, unless there is a glaring problem, I'm more interested that you feel free to start using the interface than that we debate it further.
Scott has expressed interest in implementing a very basic Python-only implementation of an object exporting the interface. I suggest he and anyone else interested look at numarray for a starting point for a Python implementation, and Numeric for a C implementation.
-Travis
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
Michiel Jan Laurens de Hoon wrote:
There are two questions that I have about the array interface:
1) To what degree will the new array interface look different to users of the existing Numerical Python? If I were to install the new array interface on the computer of a current Numerical Python user and I didn't tell them, would they notice a difference?
Nothing will look different. For now there is nothing to "install" so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0: if a user were to call array(<someobj>) and <someobj> supported the array interface, then Numeric could return an array without copying data. Older versions of Numeric won't benefit from the interface but won't be harmed either.
2) To what degree is the new array interface compatible with Numerical Python for the purpose of C extension modules? Do C extension modules need to be modified in order to use the new array interface?
It is completely compatible. C-extensions don't need to be modified at all to make use of the interface (of course they should be re-compiled if using Numeric 24.0). Only two things will be modified in Numeric 24.0.

1) PyArray_FromObject and friends will be expanded so that if an object exposes the array interface the right thing will be done to use its memory.

2) Attributes will be added so that Numeric arrays expose the array interface so other objects can use their memory intelligently.

-Travis
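The two sides described here, an object that exposes its memory and a consumer that uses that memory without copying, can be sketched in pure Python. This is only an illustration and not how Numeric does it: the class Samples and the function share are hypothetical, and the attribute names __array_typestr__ and __array_data__ are assumed here (the thread itself names __array_shape__ and __array_strides__).

```python
import array


class Samples:
    """A toy exporter holding 8-byte floats in an array.array buffer."""

    def __init__(self, values):
        self._buf = array.array('d', values)
        self.__array_shape__ = (len(values),)
        self.__array_typestr__ = 'f8'      # assumed attribute name
        self.__array_data__ = self._buf    # assumed attribute name


def share(obj):
    """Return a writable view on the exporter's memory; no data is copied."""
    if obj.__array_typestr__ != 'f8':
        raise TypeError('this sketch only handles 8-byte floats')
    return memoryview(obj.__array_data__).cast('B').cast('d')


s = Samples([1.0, 2.0, 3.0])
view = share(s)
view[0] = 9.0          # writes through to the exporter's buffer
print(s._buf[0])
```

Because the consumer works on a view rather than a copy, a change made through the view is visible to the exporter.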
Travis Oliphant wrote:
1) To what degree will the new array interface look different to users of the existing Numerical Python?
Nothing will look different. For now there is nothing to "install" so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0: if a user were to call array(<someobj>) and <someobj> supported the array interface, then Numeric could return an array without copying data. Older versions of Numeric won't benefit from the interface but won't be harmed either.
Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface?

Some comments on the array interface:

1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method. (and rename the other methods for consistency, if desired).

2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method.

3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope).

Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable. Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle.

But maybe such a single array package is not attainable given the different needs of the different communities.

--Michiel.
Michiel Jan Laurens de Hoon <mdehoon@ims.u-tokyo.ac.jp>:
[snip]
1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method. (and rename the other methods for consistency, if desired).
Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)?
2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method.
That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :)
3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't?
If the support of these attributes is optional, that would have to be the case.
If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope).
But isn't the point that you should be able to export other things (such as images or sounds or what-have-you) *as* arrays? As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something...
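A utility of the kind suggested here could be quite small. The sketch below fills in strides when an exporter leaves __array_strides__ out. It assumes that the default means a contiguous, row-major (C-order) layout, which the thread does not spell out; the function names default_strides and get_strides are hypothetical.

```python
def default_strides(shape, itemsize):
    """Byte strides for a contiguous row-major array (assumed default)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)   # last axis moves by one element
        step *= dim            # earlier axes skip over whole sub-blocks
    return tuple(reversed(strides))


def get_strides(obj, itemsize):
    """Use the exporter's strides if present, else substitute the default."""
    strides = getattr(obj, '__array_strides__', None)
    if strides is None:
        strides = default_strides(obj.__array_shape__, itemsize)
    return strides


class Image:
    """Toy exporter that only states its shape."""
    __array_shape__ = (3, 4)


print(get_strides(Image(), 4))
```

With a helper like this, neither the exporter nor the client code has to repeat the default logic.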
Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable.
I agree. But what I think would be cool is if such a standardized package could take any object conforming to this protocol and use it (possibly as the argument to the array() constructor) -- with all the ufuncs and operators it has. Because then I could implement specialized arrays where the specialized behaviour lies just in the data itself, not the behaviour. For example, I might want to create a thin array wrapper around a memory-mapped, compressed video file, and treat it as a three-dimensional array of rgb triples... (And so forth.)
Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another,
This does *not* sound like a good thing -- I agree. Certainly not what I would hope this protocol is used for.
even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle.
Well, perhaps -- but the current APIs of e.g., Numeric or numarray could be used in the same way (i.e., writing your own array implementations with the same interface). As (I think) Travis has said, there is still a goal (somewhat separate from the protocol) of getting one standard heavy-duty numerical array package. I think that would be very beneficial. The point (as I see it) is just to make it easier for various array implementations (i.e., the data, not the ufuncs/operators etc.) to interoperate with it.
But maybe such a single array package is not attainable given the different needs of the different communities.
I would certainly hope it is.
--Michiel.
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
Michiel Jan Laurens de Hoon wrote:
Travis Oliphant wrote:
1) To what degree will the new array interface look different to users of the existing Numerical Python?
Nothing will look different. For now there is nothing to "install" so the array interface is just something to expect from other objects. The only thing that would be different is in Numeric 24.0: if a user were to call array(<someobj>) and <someobj> supported the array interface, then Numeric could return an array without copying data. Older versions of Numeric won't benefit from the interface but won't be harmed either.
Very nice. Thanks, Travis. I'm not sure what you mean by "the array interface could become part of the Python standard as early as Python 2.5", since there is nothing to install. Or does this mean that Python's array will conform to the array interface?
The latter is what I mean... I think it is important to have something in Python itself that "conforms to the interface." I wonder if it would also be nice to have some protocol slots in the object type so that extension writers can avoid converting some objects. There is also the possibility that a very simple N-d array type could be included in Python 2.5 that conforms to the interface, if somebody wants to champion that.

I think it is important to realize what the array interface is trying to accomplish. From my perspective, I still think it is better for the scientific community to build off of a single array object that is "best of breed." The purpose of the array interface is to allow us scientific users to share information with other Python extension writers who may be wary of requiring scipy.base for their users but who really should be able to interoperate with scipy.base arrays. I'm thinking of extensions like wxPython, PIL, and so forth.

There are also lots of uses for arrays that don't necessarily need the complexity of the scipy.base array (or uses that need even more types). At some point we may be able to accommodate dynamic type additions to the scipy.base array. But, right now it requires enough work that others may want to design their own simple arrays. It's very useful if all such arrays could speak together with a common basic language.

The fact that numarray and Numeric arrays can talk to each other more seamlessly was not the main goal of the array interface but it is a nice side benefit. I'd still like to see the scientific community use a single array. But, others may not see it that way. The array interface lets us share more easily.
Some comments on the array interface:
1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method. (and rename the other methods for consistency, if desired).
There is no code duplication. In these cases it is just another name for .shape. What "better checking" are you referring to?
2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method.
Python len() will never return a 64-bit number on a 32-bit platform.
3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't? If so, I have a slight preference to make all methods required, since it's not a big effort to return the defaults, and there will be more extension modules than array packages (or so I hope).
Optional attributes let modules that care talk to each other on a "higher level" without creating noise for simpler extensions. Both the consumer and exporter have to use it to matter. The defaults are just clarifying what is being assumed if it isn't there.
Whereas the array interface certainly helps extension writers to create an extension module that works with all array implementations, it also enables and perhaps encourages the creation of different array modules, while our original goal was to create a single array module that satisfies the needs of both Numerical Python and numarray users. I still think such a solution would be preferable.
I agree with you. I would like a single array module for scientific users. But, satisfying everybody is probably impossible with a single array object. Yes, there could be a proliferation of array objects but sometimes we need multiple array objects to learn from each other. It's nice to have actual code that implements some idea rather than just words in a mailing list. The interface allows us to talk to each other while we learn from each other's actual working implementations.

In a way this is like the old argument between the 1920-era communists and the free-marketers. The communists say that we should have only one company that produces some product because having multiple companies is "wasteful" of resources, while the free-marketers point out that satisfying consumers is tricky business, and there is not only "one right way to do it." Therefore, having multiple companies each trying to satisfy consumers actually creates wealth as new and better ideas are tried by the different companies. The successful ideas are emulated by the rest. In mature markets there tends to be a reduction in the number of producers while in developing markets there are all kinds of companies producing basically the same thing.

Of course software creates its own issues that aren't addressed by that simple analogy, but I think it's been shown repeatedly that good interfaces (http, smtp anyone?) create a lot of utility.
Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another, even though they both conform to the array interface. We may end up with several array packages (we already have Numerical Python, numarray, and scipy), and extension modules that work with one package and not with another. So in a sense, the array interface is letting the genie out of the bottle.
I think this genie is out of the bottle already. We need to try and get our wishes from it now. -Travis
--- Magnus Lie Hetland <magnus@hetland.org> wrote:
Why not just use 'shape' as an alias for '__array_shape__' (or vice versa)?
The protocol just describes the layout and format of the data in memory. As such, most users won't use it directly just as most users don't call obj.__add__ directly...

If an array implementation has a .shape attribute, it can be whatever the implementor wants. Perhaps it's assignable. Maybe it's a method that returns a ShapeObject with methods and attributes of its own. Features like these are the things that make the high level array packages like Numeric and Numarray enjoyable to use. The low level __array_*metadata__ interface should be simple and precisely defined and just for data interchange.
3) Where do default values come from? Is it the responsibility of the extension module writer to find out if the array module implements e.g. __array_strides__, and substitute the default values if it doesn't?
If the support of these attributes is optional, that would have to be the case.
As for implementing the defaults: How about having some utility functions (or a wrapper object or whatever) that does just this -- so neither array nor client code need think about it? This could, perhaps, be put in the stdlib array module or something...
There will be a simple Python module or C include file for such things. Hopefully it will eventually be included in the Python standard distribution, but even if that doesn't happen, it will be easier than requiring and linking against the Numeric/Numarray/scipy.base libraries directly.
But what I think would be cool is if such a standardized package could take any object conforming to this protocol and use it (possibly as the argument to the array() constructor) -- with all the ufuncs and operators it has. Because then I could implement specialized arrays where the specialized behaviour lies just in the data itself, not the behaviour. For example, I might want to create a thin array wrapper around a memory-mapped, compressed video file, and treat it as a three-dimensional array of rgb triples... (And so forth.)
If you want the ufuncs, you probably want one of the full featured library packages like scipy.base or numarray. It looks like Travis is able to promote any "array protocol object" to a full blown scipy.base.array already.
Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another,
This does *not* sound like a good thing -- I agree. Certainly not what I would hope this protocol is used for.
Things like argmax(x) are not part of this protocol. The high level array packages and libraries will have all sorts of crazy and useful features.

The protocol only describes the layout and format of the data. It enables higher level packages to work seamlessly with all the different array objects. That said, this protocol would allow a version of argmax(x) to be written in such a way as to handle *any* array object.

Cheers,
-Scott
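As a hedged illustration of that last point, the sketch below writes one argmax that reads only the protocol attributes, so it works on two unrelated exporter classes. Several things are assumptions rather than part of the thread: the attribute names __array_typestr__ and __array_data__, native byte order, contiguous data, and the small table of supported typestrs. The classes FloatSamples and PackedInts are hypothetical.

```python
import array
import struct
from math import prod

# typestr -> struct code, for the handful of types this sketch supports
_CODES = {'i4': 'i', 'f4': 'f', 'f8': 'd'}


def argmax(obj):
    """Flat index of the largest element of any contiguous exporter."""
    count = prod(obj.__array_shape__)
    code = _CODES[obj.__array_typestr__]
    values = struct.unpack_from('=%d%s' % (count, code), obj.__array_data__)
    return max(range(count), key=values.__getitem__)


class FloatSamples:
    """Exporter backed by an array.array of 8-byte floats."""
    __array_shape__ = (4,)
    __array_typestr__ = 'f8'
    __array_data__ = array.array('d', [1.0, 5.0, 3.0, 2.0])


class PackedInts:
    """Exporter backed by a plain bytes object of 4-byte ints."""
    __array_shape__ = (3,)
    __array_typestr__ = 'i4'
    __array_data__ = struct.pack('=3i', 7, -1, 9)


print(argmax(FloatSamples()), argmax(PackedInts()))
```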
Magnus Lie Hetland wrote:
Michiel Jan Laurens de Hoon <mdehoon@ims.u-tokyo.ac.jp>:
2) The __array_datalen__ is introduced to get around the 32-bit int limitation of len(). Another option is to fix len() in Python itself, so that it can return integers larger than 32 bits. So we can avoid adding a new method.
That would be good, IMO. But how realistic is it? (I have no idea -- this is not a rhetorical question :)
Actually, why is __array_datalen__ needed at all? Can't it be calculated trivially from __array_shape__?

--Michiel.
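The calculation in question is a product of the dimensions times the element size. A minimal sketch, assuming a typestr of the form kind letter plus byte width (such as 'i4') with no byte-order prefix, and using the hypothetical name data_length:

```python
from math import prod


def data_length(shape, typestr):
    """Number of bytes occupied by a contiguous array of this shape."""
    itemsize = int(typestr[1:])   # 'i4' -> 4 bytes per element
    return prod(shape) * itemsize


print(data_length((3, 4), 'i4'))
```

Python integers are not limited to 32 bits, so a result computed this way does not run into the len() limitation discussed earlier in the thread.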
Travis Oliphant wrote:
Some comments on the array interface:
1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method. (and rename the other methods for consistency, if desired).
There is no code duplication. In these cases it is just another name for .shape. What "better checking" are you referring to?
The method __array_shape__ is

    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i=0; i<self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }

while the method shape is

    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;
        if ((s=PyTuple_New(self->nd)) == NULL) return NULL;
        for(i=self->nd; --i >= 0;) {
            if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s,i,o) == -1) return NULL;
        }
        return s;
    }

so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be fine.

--Michiel.
Michiel Jan Laurens de Hoon <mdehoon@ims.u-tokyo.ac.jp> writes:
Travis Oliphant wrote:
Some comments on the array interface:
1) The "__array_shape__" method is identical to the existing "shape" method in Numerical Python and numarray (except that "shape" does a little bit better checking, but it can be added easily to "__array_shape__"). To avoid code duplication, it might be better to keep that method. (and rename the other methods for consistency, if desired). There is no code duplication. In these cases it is just another name for .shape. What "better checking" are you referring to?
The method __array_shape__ is
    if (strcmp(name, "__array_shape__") == 0) {
        PyObject *res;
        int i;
        res = PyTuple_New(self->nd);
        for (i=0; i<self->nd; i++) {
            PyTuple_SET_ITEM(res, i, PyInt_FromLong((long)self->dimensions[i]));
        }
        return res;
    }
while the method shape is
    if (strcmp(name, "shape") == 0) {
        PyObject *s, *o;
        int i;
        if ((s=PyTuple_New(self->nd)) == NULL) return NULL;
        for(i=self->nd; --i >= 0;) {
            if ((o=PyInt_FromLong(self->dimensions[i])) == NULL) return NULL;
            if (PyTuple_SetItem(s,i,o) == -1) return NULL;
        }
        return s;
    }
so it checks if PyInt_FromLong and PyTuple_SetItem are successful. I don't see how PyTuple_SetItem can fail, so PyTuple_SET_ITEM should be fine.
The #1 rule of thumb when using the Python C API: _always_ check your returned results (this usually means checking for NULL). In this case, PyInt_FromLong _can_ fail (if there's an error creating the int free list). I've fixed this in CVS.

You're right on PyTuple_SET_ITEM: the space for it is guaranteed to exist after the PyTuple_New.

--
David M. Cooke  http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
Scott Gilbert <xscottg@yahoo.com>:
[snip]
Inconsistencies other than the array interface (e.g. one implements argmax(x) while another implements x.argmax()) may mean that an extension module can work with one array implementation but not with another,
This does *not* sound like a good thing -- I agree. Certainly not what I would hope this protocol is used for.
Things like argmax(x) are not part of this protocol. The high level array packages and libraries will have all sorts of crazy and useful features.
Sure -- I realise that. I just mean that I hope there won't be several scientific array modules that implement similar concepts with different APIs, just because they can (because of the new array API).
The protocol only describes the layout and format of the data. It enables higher level packages to work seamlessly with all the different array objects.
Exactly.
That said, this protocol would allow a version of argmax(x) to be written in such a way as to handle *any* array object.
... given that you can compare the values in the array, of course. But, yes. This would be (IMO) the ideal situation. Instead of spawning several equivalent-but-different scientific array modules (i.e. the ones implementing such functionality as argmax()) we would have *one* main, standard such module, whose operations would work with almost any conceivable array object (e.g. from wxPython or PIL). That seems like a very, very good situation, IMO.
Cheers, -Scott
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
Travis Oliphant <oliphant@ee.byu.edu>:
Actually, why is __array_datalen__ needed at all? Can't it be calculated trivially from __array_shape__?
Lovely point. I've taken away the __array_datalen__ from the interface description.
This is only getting prettier and prettier :)
-Travis
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
participants (7)
- Colin J. Williams
- cookedm@physics.mcmaster.ca
- Francesc Altet
- Magnus Lie Hetland
- Michiel Jan Laurens de Hoon
- Scott Gilbert
- Travis Oliphant