My guess:

First of all, you are actually manipulating twice as much data as an in-place
sort does: argsort has to read the values and also build an index array of the
same length.
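A small sketch of the "twice as much data" point. It assumes float64 values; the index array returned by np.argsort uses the platform integer type (np.intp), which is 8 bytes on a typical 64-bit build, so the indices alone are as large as the values. The helper name `working_set_bytes` is hypothetical.

```python
import numpy as np

def working_set_bytes(n, seed=0):
    """Hypothetical helper: bytes held by an in-place sort vs. an argsort."""
    rng = np.random.RandomState(seed)
    a = rng.uniform(size=n)           # float64 values, roughly uniform, random order
    idx = np.argsort(a)               # np.intp index array, same length as a
    inplace = a.nbytes                # a.sort() only rearranges the values
    indirect = a.nbytes + idx.nbytes  # argsort keeps the values plus the indices
    return inplace, indirect

inplace, indirect = working_set_bytes(1000)
print(inplace, indirect)
```

On a 64-bit platform the second number should be twice the first.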

Moreover, an in-place sort gains locality as the data becomes sorted, whereas
argsort keeps reaching the values through the index array, which for random
input means essentially random memory accesses.
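To get a feel for the locality argument, one rough proxy (an assumption, not what argsort does internally) is to compare a sequential pass over the array with a gather through the argsort permutation, `a[idx]`, which jumps around memory for random input. The function name `gather_vs_stream` is hypothetical, and absolute timings will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def gather_vs_stream(n=10**6, repeat=3, seed=0):
    """Hypothetical helper: time a sequential copy vs. a random-order gather."""
    rng = np.random.RandomState(seed)
    a = rng.uniform(size=n)
    idx = np.argsort(a)  # for random input, a scattered permutation of 0..n-1
    # Sequential: one streaming pass over a.
    t_stream = min(timeit.repeat(lambda: a.copy(), number=1, repeat=repeat))
    # Gather: read a in the order given by idx, i.e. effectively random order.
    t_gather = min(timeit.repeat(lambda: a[idx], number=1, repeat=repeat))
    return t_stream, t_gather

t_stream, t_gather = gather_vs_stream()
print("sequential copy: %.4f s, gather via idx: %.4f s" % (t_stream, t_gather))
```

For large n the gather is usually noticeably slower than the sequential copy, even though both move the same number of values.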

Currently using NumPy 1.6.1.

What's the fastest argsort for a 1-d array with around 28 million
elements, roughly uniformly distributed, in random order?

Is there a reason that np.argsort is almost 3 times slower than np.sort?

I'm doing semi-systematic timing for a stats(models) algorithm.
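A minimal timing sketch for this comparison, assuming only the documented `kind` argument of `np.sort` and `np.argsort` ("quicksort", "mergesort", "heapsort"). It makes no claim about which kind wins on a given machine. Note that `np.sort` returns a sorted copy, so its timing includes one extra copy of the data. The function name `time_sorts` is hypothetical, and the default `n` mirrors the 28 million elements in the question (about 224 MB of float64 values, plus the index array).

```python
import timeit
import numpy as np

def time_sorts(n=28 * 10**6, repeat=3, seed=0):
    """Hypothetical helper: best-of-`repeat` times for np.sort vs. np.argsort."""
    rng = np.random.RandomState(seed)
    a = rng.uniform(size=n)  # roughly uniform values in random order
    results = {}
    for kind in ("quicksort", "mergesort", "heapsort"):
        # timeit.repeat runs immediately, so the lambdas see the current `kind`.
        results[("sort", kind)] = min(timeit.repeat(
            lambda: np.sort(a, kind=kind), number=1, repeat=repeat))
        results[("argsort", kind)] = min(timeit.repeat(
            lambda: np.argsort(a, kind=kind), number=1, repeat=repeat))
    return results

for (func, kind), t in sorted(time_sorts(n=10**6).items()):
    print("%-8s %-10s %.4f s" % (func, kind, t))
```

Taking the minimum over several repeats reduces noise from other processes; run it with the full `n` to reproduce the 28 million element case.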

