Here's an idea Fernando and I have briefly talked about off-list, but which perhaps bears talking about here: Is there speed to be gained by an alternative, very simple, very optimized ndarray constructor? The idea would be a special-case constructor with very limited functionality designed purely for speed. It wouldn't support (m)any of the fantastic things Travis has done, but would be useful only in specialized use cases, such as creating indices.
I'm not familiar enough with what the normal constructor does to know if we could implement something (in C, perhaps) that would do nothing but create a simple, contiguous array significantly faster than what is currently done. Or does the current constructor already create a new instance about as fast as possible? I know Travis has optimized it, but it's a general-purpose constructor, and I'm thinking these extra features may take some extra CPU cycles.
I think the indexing code will be slower because it is more sophisticated than Numeric's. Basically, it has to check for fancy indexing before defaulting to the old way. I see this as more of a slow-down than array creation. It might be possible to improve it; more eyeballs are always helpful. But I'm not sure how at this point.
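To make the "check for fancy indexing first" point concrete, here is a rough pure-Python sketch of that kind of dispatch. It is only an illustration under stated assumptions: the names `is_fancy` and `getitem` are hypothetical, the data is a plain 1-D list, and none of this is the actual C code in the array package.

```python
# Hypothetical sketch of "check for fancy indexing, else fall back".
# Not the real implementation; names and structure are assumptions.

def is_fancy(index):
    """Treat a list or tuple of positions/booleans as a 'fancy' index."""
    return isinstance(index, (list, tuple))


def getitem(data, index):
    """Index a 1-D list, checking for fancy indexing before the simple path."""
    # This check runs on every call, even when the index is a plain int,
    # which is where the extra per-call cost would come from.
    if is_fancy(index):
        if index and all(isinstance(i, bool) for i in index):
            # Boolean mask: keep the elements where the mask is True.
            return [x for x, keep in zip(data, index) if keep]
        # Integer positions: gather the elements in the order requested.
        return [data[i] for i in index]
    # "Old way": plain integer or slice, handed straight to the container.
    return data[index]


data = [10, 20, 30, 40]
simple = getitem(data, 2)
sliced = getitem(data, slice(1, 3))
gathered = getitem(data, [3, 0])
masked = getitem(data, [True, False, True, False])
```

Even in this toy version, the simple integer case pays for one extra type check before it reaches the fast path. How large that cost is in the real code would need to be measured.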
-Travis