Performance of the array protocol

Hi, I'm trying to start using the array protocol for conversion between Numeric <--> numarray (and newcore in the future), but I'm a bit disappointed because of its performance. For numarray --> Numeric we have:
t1=timeit.Timer("num=Numeric.array(na)", "import numarray; import Numeric; na=numarray.arange(10)")
t1.repeat(3,10000)
[0.59375977516174316, 0.57908082008361816, 0.56574010848999023]
t2=timeit.Timer("num=Numeric.fromstring(na._data,typecode=na.typecode())", "import numarray; import Numeric; na=numarray.arange(10)")
t2.repeat(3,10000)
[0.11653494834899902, 0.1140749454498291, 0.1141819953918457]
i.e. the array protocol seems 5x slower than the fromstring() method. Conversely, for Numeric --> numarray:
t3=timeit.Timer("na=numarray.array(num)", "import numarray; import Numeric; num=Numeric.arange(10)")
t3.repeat(3,10000)
[1.3475611209869385, 1.3277668952941895, 1.3417830467224121]
t4=timeit.Timer("na=numarray.array(buffer(num),type=num.typecode(),shape=num.shape)", "import numarray; import Numeric; num=Numeric.arange(10)")
t4.repeat(3,10000)
[0.42027187347412109, 0.41690587997436523, 0.41626906394958496]
In this case, the array protocol is 3x slower than using the buffer interface. I'm wondering whether this relatively poor performance of the present implementation of the array protocol is surmountable or is an intrinsic limitation of it.
Thanks,
--
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

Francesc Altet wrote:
Hi,
I'm trying to start using the array protocol for conversion between Numeric <--> numarray (and newcore in the future), but I'm a bit disappointed because of its performance. For numarray --> Numeric we have:
t1=timeit.Timer("num=Numeric.array(na)", "import numarray; import Numeric; na=numarray.arange(10)")
t1.repeat(3,10000)
[0.59375977516174316, 0.57908082008361816, 0.56574010848999023]
t2=timeit.Timer("num=Numeric.fromstring(na._data,typecode=na.typecode())", "import numarray; import Numeric; na=numarray.arange(10)")
t2.repeat(3,10000)
[0.11653494834899902, 0.1140749454498291, 0.1141819953918457]
i.e. the array protocol seems 5x slower than the fromstring() method.
If you are going to be copying the data anyway, then there may be no advantage to the array protocol (in fact because it has to look up several attributes of the input object it can be slower). When you use Numeric.array(na) it makes a copy of the data by default. The idea is to be able to use the array protocol to not have to make copies of the data. Try using num = Numeric.array(na,copy=0) in your first timing runs and see what that provides.
Conversely, for Numeric --> numarray:
t3=timeit.Timer("na=numarray.array(num)", "import numarray; import Numeric; num=Numeric.arange(10)")
t3.repeat(3,10000)
[1.3475611209869385, 1.3277668952941895, 1.3417830467224121]
t4=timeit.Timer("na=numarray.array(buffer(num),type=num.typecode(),shape=num.shape)", "import numarray; import Numeric; num=Numeric.arange(10)")
t4.repeat(3,10000)
[0.42027187347412109, 0.41690587997436523, 0.41626906394958496]
in this case, the array protocol is 3x slower than using the buffer interface.
Again, you are making copies of the data. I'm not sure how numarray handles the array protocol on consumption of the interface, so I can't comment further. -Travis
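The copy versus no-copy distinction made in this reply can be sketched with the standard library alone. This is only an illustration under stated assumptions: array.array and memoryview stand in for Numeric and numarray (neither of which is assumed to be installed), and the function names convert_with_copy and convert_without_copy are hypothetical.

```python
import array
import timeit

# Stand-in for a numeric array: 100000 C longs in one contiguous buffer.
src = array.array("l", range(100000))


def convert_with_copy(a):
    # Duplicates every byte, which is what a fromstring()-style
    # conversion or array(..., copy=1) has to do.
    return array.array(a.typecode, a.tobytes())


def convert_without_copy(a):
    # Wraps the existing buffer; no element is duplicated.
    return memoryview(a)


copied = convert_with_copy(src)
shared = convert_without_copy(src)

# Changing the source shows which result shares memory with it.
src[0] = -1
print(copied[0], shared[0])

# Timings depend on the machine, so no expected values are given here.
print(timeit.repeat(lambda: convert_with_copy(src), number=200, repeat=3))
print(timeit.repeat(lambda: convert_without_copy(src), number=200, repeat=3))
```

The copy keeps the old first element while the view sees the new one, which is the behaviour that separates a copying conversion from a shared-data one.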

On Tue, 1 Nov 2005 at 09:11 -0700, Travis Oliphant wrote:
If you are going to be copying the data anyway, then there may be no advantage to the array protocol (in fact because it has to look up several attributes of the input object it can be slower). When you use Numeric.array(na) it makes a copy of the data by default.
The idea is to be able to use the array protocol to not have to make copies of the data.
Yes, I don't want to do a copy. And, in fact, I want to use moderately large array conversions (10**4 ~ 10**6 elements).
Try using num = Numeric.array(na,copy=0) in your first timing runs and see what that provides.
Good! Using copy=0 and larger arrays (10**5 elements), I now get:
t1_2=timeit.Timer("num=Numeric.array(na,copy=0)", "import numarray; import Numeric; na=numarray.arange(100000)")
t1_2.repeat(3,1000)
[0.064317941665649414, 0.060917854309082031, 0.07666015625]
t2_2=timeit.Timer("num=Numeric.fromstring(na._data,typecode=na.typecode())", "import numarray; import Numeric; na=numarray.arange(100000)")
t2_2.repeat(3,1000)
[4.8582658767700195, 4.8404099941253662, 4.8652839660644531]
So the implementation of the array protocol in the numarray --> Numeric direction is performing astonishingly well :-) For the record, using the array protocol with the default copy gives:
t1=timeit.Timer("num=Numeric.array(na)", "import numarray; import Numeric; na=numarray.arange(100000)")
t1.repeat(3,1000)
[5.014805793762207, 4.9959368705749512, 5.0420081615447998]
i.e. almost as fast as the fromstring() method, which is very good as well! BTW, I'm wondering whether a False value for copy should be used as the default instead of True. IMO, many people would want to use the array protocol just to access the data easily, and making a copy() behind the scenes just for this could be a performance killer, especially for large objects.
Conversely, for Numeric --> numarray: Again, you are making copies of the data. I'm not sure how numarray handles the array protocol on consumption of the interface, so I can't comment further.
Mmmm, I've tried disabling the copy, but unfortunately I can't get the same figures as above:
t3=timeit.Timer("na=numarray.array(num,copy=0)", "import numarray; import Numeric; num=Numeric.arange(100000)")
t3.repeat(3,10)
[1.6356601715087891, 1.6529910564422607, 1.6299269199371338]
t4=timeit.Timer("na=numarray.array(buffer(num),type=num.typecode(),shape=num.shape)", "import numarray; import Numeric; num=Numeric.arange(100000)")
t4.repeat(3,1000)
[0.045578956604003906, 0.043890953063964844, 0.043296098709106445]
So, in the Numeric --> numarray direction, the array protocol is more than three orders of magnitude slower per call than the buffer interface (note the fewer iterations for the first repeat loop). Maybe Todd can comment more on this.
Thanks!
--
Francesc Altet
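Because the repeat loops in this message use different iteration counts, the figures are easier to compare per call. The arithmetic below uses only the best-of-three timings quoted above; the variable names are chosen for illustration.

```python
# Best-of-three timings quoted in the message, in seconds per repeat loop,
# paired with the number of iterations in each loop.
t1_2_best, t1_2_loops = 0.060917854309082031, 1000  # Numeric.array(na, copy=0)
t2_2_best, t2_2_loops = 4.8404099941253662, 1000    # Numeric.fromstring(...)
t3_best, t3_loops = 1.6299269199371338, 10          # numarray.array(num, copy=0)
t4_best, t4_loops = 0.043296098709106445, 1000      # numarray.array(buffer(num), ...)

# numarray --> Numeric: how much faster the no-copy protocol call is.
speedup = (t2_2_best / t2_2_loops) / (t1_2_best / t1_2_loops)

# Numeric --> numarray: how much slower the protocol call is than buffer().
slowdown = (t3_best / t3_loops) / (t4_best / t4_loops)

print("numarray --> Numeric speedup: %.0fx" % speedup)
print("Numeric --> numarray slowdown: %.0fx" % slowdown)
```

The second ratio comes out in the thousands, which is where the "three orders of magnitude" figure comes from.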

Francesc Altet wrote:
BTW, I'm wondering whether a False value for copy should be used as the default instead of True. IMO, many people would want to use the array protocol just to access the data easily, and making a copy() behind the scenes just for this could be a performance killer, especially for large objects.
That's exactly what the asarray function is for. It is equivalent to array except that its default for copy is False. -Travis

Francesc Altet wrote:
Hi,
I'm trying to start using the array protocol for conversion between Numeric <--> numarray (and newcore in the future), but I'm a bit disappointed because of its performance. For numarray --> Numeric we have:
The other factor in this discussion is that you are creating and sharing relatively small arrays. The fromstring approach is not a problem if all you need is a copy of the data for small arrays. The array protocol approach uses attribute lookups on an object. This has overhead that only pays off for large arrays (or for arrays that you want to share the same data region). So, I guess the answer to your question is that for small arrays it is an intrinsic limitation of the use of Python attributes in the array protocol. -Travis
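The trade-off described here (a fixed per-call cost for attribute lookups against a copy cost that grows with size) can be illustrated with a toy producer. Everything in this sketch is an assumption made for illustration: the Producer class and its attribute names are hypothetical and are not the attribute names of the actual array protocol, and the timings will not reproduce the figures in the thread.

```python
import array
import timeit


class Producer:
    """Toy object that exposes its buffer through plain Python attributes."""

    def __init__(self, n):
        self._buf = array.array("l", range(n))
        self.shape = (n,)
        self.typestr = "l"
        self.data = memoryview(self._buf)


def consume_via_attributes(obj):
    # Fixed cost: three attribute lookups, no per-element work.
    return obj.shape, obj.typestr, obj.data


def consume_via_copy(obj):
    # Cost grows with the number of bytes copied.
    return obj._buf.tobytes()


for n in (10, 100000):
    p = Producer(n)
    t_attr = min(timeit.repeat(lambda: consume_via_attributes(p), number=500, repeat=3))
    t_copy = min(timeit.repeat(lambda: consume_via_copy(p), number=500, repeat=3))
    print(n, t_attr, t_copy)
```

The attribute path should cost roughly the same at both sizes, while the copy path scales with the array length, so the lookups only pay off once the array is large enough.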

Travis Oliphant wrote:
So, I guess the answer to your question is that for small arrays it is an intrinsic limitation of the use of Python attributes in the array protocol.
IIRC, in the early discussion of the array protocol, we had talked about defining a C struct, and a set of utilities to query that struct. Now, I guess it uses Python attributes. Do I recall incorrectly, or has there been a design change, or is this a prototype implementation? I guess I'd like to see an array protocol that is as fast as fromstring(), even for small arrays, though it's probably not a big deal. Also, when writing C code to work with an array, it might be easier, as well as faster, to not have to query Python attributes. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker wrote:
Travis Oliphant wrote:
So, I guess the answer to your question is that for small arrays it is an intrinsic limitation of the use of Python attributes in the array protocol.
IIRC, in the early discussion of the array protocol, we had talked about defining a C struct, and a set of utilities to query that struct. Now, I guess it uses Python attributes. Do I recall incorrectly, or has there been a design change, or is this a prototype implementation? I guess I'd like to see an array protocol that is as fast as fromstring(), even for small arrays, though it's probably not a big deal. Also, when writing C code to work with an array, it might be easier, as well as faster, to not have to query Python attributes.
You are correct that it would be good to have a C-protocol that did not require attribute lookups (the source of any speed difference with fromstring). Not much progress has been made on a C-version of the protocol, though. I don't know how to do it without adding something to Python itself. At SciPy 2005, I outlined my vision for how we could proceed in that direction. There is a PEP in a subversion repository at http://svn.scipy.org/svn/PEP that explains my view. Basically, I think we should push for a simple (C-based) Nd array object that is nothing more than the current C-structure of the N-d array. Then, arrays would inherit from this base class, but all of Python would be able to understand and query its C-structure. If we could also get an array interface into the typeobject table, it would be a simple thing to populate this structure even with objects that didn't inherit from the base object. I am still interested in other ideas for how to implement the array interface in C without adding something to the type-object table (we could push for that, but it might take more political effort). -Travis

"Chris Barker" <Chris.Barker@noaa.gov> writes:
Travis Oliphant wrote:
So, I guess the answer to your question is that for small arrays it is an intrinsic limitation of the use of Python attributes in the array protocol.
IIRC, in the early discussion of the array protocol, we had talked about defining a C struct, and a set of utilities to query that struct. Now, I guess it uses Python attributes. Do I recall incorrectly, or has there been a design change, or is this a prototype implementation? I guess I'd like to see an array protocol that is as fast as fromstring(), even for small arrays, though it's probably not a big deal. Also, when writing C code to work with an array, it might be easier, as well as faster, to not have to query Python attributes.
I had posted an idea before: http://thread.gmane.org/gmane.comp.python.numeric.general/2159 It would still be one attribute lookup, but then everything would be C-based from there on. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca

David M. Cooke wrote:
I had posted an idea before:
http://thread.gmane.org/gmane.comp.python.numeric.general/2159
It would still be one attribute lookup, but then everything would be C-based from there on.
Thanks for reminding us of your idea again. This is a very good idea, that I think we could add. My only question is should Py_LONG_LONG be used or Py_intptr_t? Since the array has to be in memory anyway there does not seem any advantage to using Py_LONG_LONG. I also think a flags member of the structure is useful along with a typestr instead of typecode. I would say the C-struct should look like the following. We could push to get something like this in Python core, I think, so this Array_Interface header was available to everybody.
typedef struct {
    int nd;
    char typekind;
    int itemsize;
    int flags;
    Py_intptr_t *shape;
    Py_intptr_t *strides;
    Py_intptr_t offset;
    void *data;
} PyArray_Interface;
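To make the proposed layout concrete, the struct can be mirrored with ctypes and filled in for a one-dimensional array.array. This is a sketch under assumptions, not the eventual implementation: Py_intptr_t is approximated by ctypes.c_ssize_t (the same size on common platforms), flags is left at 0 because the thread does not define its bits, and the describe helper is hypothetical.

```python
import array
import ctypes


class PyArrayInterface(ctypes.Structure):
    # Field order follows the struct proposed above.
    _fields_ = [
        ("nd", ctypes.c_int),
        ("typekind", ctypes.c_char),
        ("itemsize", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("offset", ctypes.c_ssize_t),
        ("data", ctypes.c_void_p),
    ]


def describe(buf):
    """Fill the struct for a 1-D array.array without copying its data."""
    address, length = buf.buffer_info()
    shape = (ctypes.c_ssize_t * 1)(length)
    strides = (ctypes.c_ssize_t * 1)(buf.itemsize)
    info = PyArrayInterface(
        nd=1,
        typekind=b"i",  # signed integer data
        itemsize=buf.itemsize,
        flags=0,
        shape=shape,
        strides=strides,
        offset=0,
        data=address,
    )
    # Keep the source buffer and the small helper arrays alive.
    info._keepalive = (buf, shape, strides)
    return info


a = array.array("l", [10, 20, 30])
info = describe(a)

# Read the second element back through the raw pointer and the stride.
elem = ctypes.c_long.from_address(info.data + info.offset + 1 * info.strides[0])
print(info.nd, info.shape[0], elem.value)
```

A consumer holding this struct needs no further Python attribute lookups to find the data, which is the point of the single-lookup design discussed here.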

Travis Oliphant <oliphant@ee.byu.edu> writes:
David M. Cooke wrote:
I had posted an idea before:
http://thread.gmane.org/gmane.comp.python.numeric.general/2159
It would still be one attribute lookup, but then everything would be C-based from there on.
Thanks for reminding us of your idea again. This is a very good idea, that I think we could add. My only question is should Py_LONG_LONG be used or Py_intptr_t? Since the array has to be in memory anyway there does not seem any advantage to using Py_LONG_LONG.
Ok, I see how that works. I probably wasn't aware of the existence of Py_intptr_t at the time :-)
I also think a flags member of the structure is useful along with a typestr instead of typecode.
The point of a typecode instead of a typestr was to make it easy for code to determine the type. Endianness would be part of the flags, and the number of bytes for the type would be in itemsize.
I would say the C-struct should look like the following. We could push to get something like this in Python core, I think, so this Array_Interface header was available to everybody.
typedef struct {
    int nd;
    char typekind;
    int itemsize;
    int flags;
    Py_intptr_t *shape;
    Py_intptr_t *strides;
    Py_intptr_t offset;
    void *data;
} PyArray_Interface;
If it's in the Python core, then this is fine. If we did it ourselves as an informal protocol (like the array interface spec), I'd still add the version member as a sanity check. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca

Travis Oliphant wrote:
like. We could push to get something like this in Python core, I think, so this Array_Interface header was available to everybody.
That would be great. Until then, it would still be a tiny header that others could easily include with their code. David's version number would help keep things sorted out. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker wrote:
Travis Oliphant wrote:
like. We could push to get something like this in Python core, I think, so this Array_Interface header was available to everybody.
That would be great. Until then, it would still be a tiny header that others could easily include with their code. David's version number would help keep things sorted out.
I've placed an updated array interface description that includes this struct-based access on http://numeric.scipy.org, and somebody will add support for this in scipy soon too. -Travis

Francesc Altet wrote:
Hi,
I'm trying to start using the array protocol for conversion between Numeric <--> numarray (and newcore in the future), but I'm a bit disappointed because of its performance. For numarray --> Numeric we have:
t1=timeit.Timer("num=Numeric.array(na)", "import numarray; import Numeric; na=numarray.arange(10)")
t1.repeat(3,10000)
[0.59375977516174316, 0.57908082008361816, 0.56574010848999023]
t2=timeit.Timer("num=Numeric.fromstring(na._data,typecode=na.typecode())", "import numarray; import Numeric; na=numarray.arange(10)")
t2.repeat(3,10000)
[0.11653494834899902, 0.1140749454498291, 0.1141819953918457]
i.e. the array protocol seems 5x slower than the fromstring() method.
Conversely, for Numeric --> numarray:
t3=timeit.Timer("na=numarray.array(num)", "import numarray; import Numeric; num=Numeric.arange(10)")
t3.repeat(3,10000)
[1.3475611209869385, 1.3277668952941895, 1.3417830467224121]
t4=timeit.Timer("na=numarray.array(buffer(num),type=num.typecode(),shape=num.shape)", "import numarray; import Numeric; num=Numeric.arange(10)")
t4.repeat(3,10000)
[0.42027187347412109, 0.41690587997436523, 0.41626906394958496]
in this case, the array protocol is 3x slower than using the buffer interface.
I'm wondering whether this relatively poor performance in present implementation of the array protocol is surmountable or is an intrinsic limitation of it.
Thanks,
I was working on improving this for numarray yesterday so some improvements are in CVS now. I moved some of the Python array interface properties down to C attributes and the performance ratio for my debug Python is now <2x for numarray-->Numeric. numarray's array interface for Numeric-->numarray was degenerating to fromlist(); I added array interface "consumption" support for numerical arrays by beefing up numarray's array() function. I tweaked your benchmarks a little to support profiling as well as timing and attached the result.
% python arrayif_bench.py
numarray-->Numeric array_if: [0.35534501075744629, 0.36865997314453125, 0.36826896667480469]
numarray-->Numeric fromstring: [0.36841487884521484, 0.21085405349731445, 0.20747494697570801]
Numeric-->numarray array_if: [0.73384881019592285, 0.6396629810333252, 0.60234308242797852]
Numeric-->numarray buffer_if: [0.36455512046813965, 0.24601507186889648, 0.23759102821350098]

On Tue, 1 Nov 2005 at 12:27 -0500, Todd Miller wrote:
I was working on improving this for numarray yesterday so some improvements are in CVS now.
I moved some of the Python array interface properties down to C attributes and the performance ratio for my debug Python is now <2x for numarray-->Numeric.
Good! So for numarray-->Numeric we have really good performance now.
numarray's array interface for Numeric-->numarray was degenerating to fromlist(); I added array interface "consumption" support for numerical arrays by beefing up numarray's array() function.
I'm not sure what you are saying here. Do you mean that CVS already has code for Numeric-->numarray that no longer degenerates to fromlist()? If so, that's great news, of course. I'll try it as soon as I can.
Thanks,
--
Francesc Altet

Francesc Altet wrote:
On Tue, 1 Nov 2005 at 12:27 -0500, Todd Miller wrote:
numarray's array interface for Numeric-->numarray was degenerating to fromlist(); I added array interface "consumption" support for numerical arrays by beefing up numarray's array() function.
I'm not sure what you are saying here. Do you mean that CVS already has code for Numeric-->numarray that no longer degenerates to fromlist()?
Yes. I added a check in array() for suppliers of the array-interface which are not NDArrays. It only works for numerical arrays. Todd
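The kind of check described here (take a bulk path when the supplier exposes numerical data with a known layout, otherwise fall back to element-by-element conversion) can be sketched with the standard library. This is an illustration only: to_long_array is a hypothetical helper, the buffer protocol via memoryview stands in for the array interface, and the code is not numarray's actual array() logic.

```python
import array


def to_long_array(obj):
    """Return (array of C longs, name of the path taken)."""
    try:
        view = memoryview(obj)
    except TypeError:
        # The supplier exports no buffer: degenerate to the slow,
        # list-like path that converts one element at a time.
        return array.array("l", [int(x) for x in obj]), "sequence"
    if view.format == "l" and view.ndim == 1 and view.contiguous:
        # Numerical data with a known layout: one bulk copy of raw bytes.
        return array.array("l", view.tobytes()), "buffer"
    # A buffer of some other type: still convert element by element.
    return array.array("l", [int(x) for x in view.tolist()]), "sequence"


fast, fast_path = to_long_array(array.array("l", [1, 2, 3]))
slow, slow_path = to_long_array([1, 2, 3])
print(fast_path, slow_path)
```

Both calls produce the same values; only the route differs, which is the distinction between consuming the interface and degenerating to a list conversion.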

On Tue, 1 Nov 2005 at 13:23 -0500, Todd Miller wrote:
Yes. I added a check in array() for suppliers of the array-interface which are not NDArrays. It only works for numerical arrays.
Sounds good. I suppose that you will add the same check to asarray() in order to avoid the copy, won't you?
--
Francesc Altet
participants (5): Chris Barker, cookedm@physics.mcmaster.ca, Francesc Altet, Todd Miller, Travis Oliphant