Right - sorry about that last bit!
I couldn't figure out how to use timeit/perf for this because the list has to be reinitialized each time it gets sorted. With time.time() I can just wrap the sort itself and exclude the initialization (and I can always put it in a for loop to repeat it as many times as necessary). But with timeit/perf, as I understand it, you give it a code snippet and a number of iterations and it does the looping for you, so I don't see how to keep the initialization out of the measurement. There's an option to provide setup code, of course, but I need to set up before each trial, not just once before the loop.
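For what it's worth, `timeit.repeat` may actually cover this case: its setup string is executed once per timing run, so with `number=1` the list is rebuilt before every individual sort and the initialization stays outside the measured time. A minimal sketch (the list size and repeat count are arbitrary choices, not from the benchmark above):

```python
import timeit

# With number=1, timeit runs the setup before each of the `repeat`
# timing runs, so each trial sorts a freshly built list and the
# initialization is excluded from the measurement.
times = timeit.repeat(
    stmt="lst.sort()",
    setup="import random; lst = [random.random() for _ in range(10_000)]",
    number=1,
    repeat=5,
)
print(min(times))  # best-of-5 time for a single sort
```

`min(times)` is the conventional statistic here, since the fastest run is the one least disturbed by system noise.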
Regardless, one could always guard the whole contribution with a length test and disable it for lists with fewer than, say, 1000 elements. And though, for the reason above, I couldn't figure out how to benchmark small lists properly, it's clear that for large lists this gives a real improvement with very little change to the code: it really doesn't make sense to do all this type checking n log n times when we can just get it over with once at the beginning. The cost is very, very small, as demonstrated by the last benchmark (which used a 1e7-element list, so it's probably more valid even with my crude technique), so heterogeneous lists don't get slowed down perceptibly.
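To make the shape of that guard concrete, here is a hypothetical pure-Python sketch; the threshold value, the function name, and the homogeneity scan are all illustrative assumptions, and the real change would of course live in C inside list.sort, dispatching to a specialized compare when the scan succeeds:

```python
def guarded_sort(lst, threshold=1000):
    # Hypothetical length gate: only pay the one-time O(n) homogeneity
    # scan when the list is large enough for the per-comparison savings
    # to outweigh it.
    specialize = False
    if len(lst) >= threshold:
        first = type(lst[0])
        specialize = all(type(x) is first for x in lst)
    # In the real patch, `specialize` would select a type-specific
    # compare for the sort; pure Python can only fall through here.
    lst.sort()
    return lst
```

Small and heterogeneous lists take the unspecialized path unchanged, which is why the worst-case cost stays a single length comparison (plus, for large heterogeneous lists, one linear scan).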