
I spent some time seeing what I could do in the way of speeding up wxPoint_LIST_helper by tweaking the numarray code. My first suspect was _universalIndexing by way of _ndarray_item. However, due to some new-style machinations, _ndarray_item was never getting called. Instead, _ndarray_subscript was being called. So, I added a special case to _ndarray_subscript. This sped things up by 50% or so (I don't recall exactly). The code for that is at the end of this message; it's not guaranteed to be 100% correct; it's all experimental.

After futzing around some more I figured out a way to trick Python into using _ndarray_item. I added "type->tp_as_sequence->sq_item = _ndarray_item;" to _ndarray new. I then optimized _ndarray_item (code at end). This halved the execution time of my arbitrary benchmark. This trick may have horrible, unforeseen consequences, so use at your own risk.

Finally, I commented out the __del__ method in numarraycore. This resulted in an additional speedup of 64%, for a total speedup of 240%. Still not close to 10x, but a large improvement. This is obviously not viable for real use, but it's enough of a speedup that I'll try to see if there's any way to move the shadow stuff back to tp_dealloc.

In summary:

Version              Time    Rel Speedup   Abs Speedup
Stock                0.398   ----          ----
_ndarray_item mod    0.192   107%          107%
del __del__          0.117   64%           240%

There were a couple of other things I tried that resulted in additional small speedups, but the tactics I used were too horrible to reproduce here. The main one of interest is that all of the calls to NA_updateDataPtr seem to burn some time. However, I don't have any idea what one could do about that.

That's all for now.
-tim

static PyObject*
_ndarray_subscript(PyArrayObject* self, PyObject* key)
{
    PyObject *result;
#ifdef TAH
    if (PyInt_CheckExact(key)) {
        long ikey = PyInt_AsLong(key);
        long offset;
        if (NA_getByteOffset(self, 1, &ikey, &offset) < 0)
            return NULL;
        if (!NA_updateDataPtr(self))
            return NULL;
        return _simpleIndexingCore(self, offset, 1, Py_None);
    }
#endif
#if _PYTHON_CALLBACKS
    result = PyObject_CallMethod(
        (PyObject *) self, "_universalIndexing", "(OO)", key, Py_None);
#else
    result = _universalIndexing(self, key, Py_None);
#endif
    return result;
}

static PyObject *
_ndarray_item(PyArrayObject *self, int i)
{
#ifdef TAH
    long offset;
    if (NA_getByteOffset(self, 1, &i, &offset) < 0)
        return NULL;
    if (!NA_updateDataPtr(self))
        return NULL;
    return _simpleIndexingCore(self, offset, 1, Py_None);
#else
    PyObject *result;
    PyObject *key = PyInt_FromLong(i);
    if (!key)
        return NULL;
    result = _universalIndexing(self, key, Py_None);
    Py_DECREF(key);
    return result;
#endif
}