I'd bet a case of beer (or cash equivalent) that one of the main bottlenecks is the path PySequence_GetItem->_ndarray_item->_universalIndexing->_simpleIndexing->_simpleIndexingCore. The path through _universalIndexing in particular, if I deciphered it correctly, looks very slow. I don't think it needs to be that way though, _universalIndexing could probably be sped up, but more promising I think _ndarray_item could be made to call _simpleIndexingCore without all that much work. It appears that this would save the creation of several intermediate objects and it also looks like a couple of calls back to python! I'm not familiar with this code though, so I could easily be missing something that makes calling _simpleIndexingCore harder than it looks. -tim