Warning: the contents of this message may be dangerous for readers with heart conditions.
It appears some people are offended by your exuberance. Ignore them ;-)
... And it turns out that if you do that, PyFloat_RichCompare becomes ONE LINE (after we set op=Py_LT)!!! Just
Right. And this is why floats will likely be the best case for your approach: all the overheads of finding "the right" bottom-level comparison are weighed against what's basically a single machine instruction to do the actual work of comparing two floats.
return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) == 0 ? Py_False : Py_True;
In real life, this is better written as:
return (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;
That is, the comparison to 0 was an artificial complication left over from simplifying the code.
However, that code is buggy, in a way that may not show up for a long time. Keeping track of reference counts is crucial. It needs to be more like:
PyObject *res;
res = (PyFloat_AS_DOUBLE(v) < PyFloat_AS_DOUBLE(w)) ? Py_True : Py_False;
Py_INCREF(res);  /* Py_True and Py_False are real objects with refcounts */
return res;
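You can even see from pure Python that Py_True and Py_False are ordinary refcounted objects (a minimal, CPython-specific illustration; the exact count is implementation-dependent, and recent CPython versions make these singletons immortal):

import sys
print(sys.getrefcount(True))   # a large, implementation-dependent number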
So I thought, wow, this will give some nice numbers! But I underestimated the power of this optimization. You have no idea. It's crazy.
It's in the ballpark of what I expected :-)
Since we're dealing with floats, I used Tim Peters' benchmark: Lib/test/sortperf.py, just modifying one function:
It's actually Guido's file, although I probably changed it much more recently than he did.
As I explained in a different msg, it's _not_ a good benchmark. It's good at distinguishing among "really really good", "not a huge change", and "really really bad", but that's all. At the start, such gross distinctions are all anyone should care about.
One thing I'd like to see: do exactly the same, but comment out your float specialization. It's important to see whether sortperf.py says "not a huge change" then (which I expect, but it needs confirmation). That is, it's not enough for some cases to get 4x faster if other cases get much slower.
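To make the concern concrete, here is a rough Python-level sketch of the "check once, then specialize" dispatch (hypothetical and illustrative only - not the actual C implementation): the O(n) type-homogeneity pre-pass is pure overhead in every case where no specialization applies.

def choose_lt(keys):
    # one O(n) scan; wasted work if the list turns out heterogeneous
    if keys and all(type(k) is float for k in keys):
        return float.__lt__   # every compare will be float < float
    return None               # fall back to generic rich comparison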
...

# My function:
def doit(L):
    F = FastList(L)
    f0 = time.perf_counter()
    F.fastsort()
    f1 = time.perf_counter()
    F = FastList(L)
    t0 = time.perf_counter()
    F.sort()
    t1 = time.perf_counter()
    print("%6.2f%%" % (100*(1-(f1-f0)/(t1-t0))), end=' ')
    flush()
Numbers in benchmarks always need to be explained, because they're only clear to the person who wrote the code. From your code, you're essentially computing
(old_time - new_time) / old_time * 100.0
but in an obfuscated way. So long as the new way is at least as fast, the only possible values are between 0% and 100%. That's important to note. Other people, e.g., measure these things by putting "new_time" in the denominator. Or just compute a ratio, like old_time / new_time.
The way you're doing it, e.g., "75%" means the new way took only a quarter of the time of the old way - it means new_time is 75% smaller than old_time. I'm comfortable with that, but it's not the _clearest_ way to display timing differences so dramatic; old_time / new_time would be 4.0 in this case, which is easier to grasp at a glance.
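To make the different conventions concrete (the timings here are made up for illustration):

old_time, new_time = 4.0, 1.0                    # hypothetical seconds
pct = (old_time - new_time) / old_time * 100.0   # 75.0 -> printed as "75%"
ratio = old_time / new_time                      # 4.0  -> "4x as fast"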
So the benchmarking here is valid.
No, it sucks ;-) But it's perfectly adequate for what I wanted to see from it :-)
I didn't write it. All I did was modify it to print percent improvement instead of sort time. The numbers below are percent improvement of my sort vs default sort (just clone my repo and run python sortperf.py to verify):
 i    2**i   *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   32768  44.11% 54.69% 47.41% 57.61% 50.17% 75.24% 68.03% 65.16% 82.16%
16   65536  14.14% 75.38% 63.44% 56.56% 67.99% 66.19% 50.72% 61.55% 61.87%
17  131072  73.54% 60.52% 60.97% 61.63% 52.55% 49.84% 68.68% 84.12% 65.90%
18  262144  54.19% 55.34% 54.67% 54.13% 55.62% 52.88% 69.30% 74.86% 72.66%
19  524288  55.12% 53.32% 53.77% 54.27% 52.97% 53.53% 67.55% 76.60% 78.56%
20 1048576  55.05% 51.09% 60.05% 50.69% 62.98% 50.20% 66.24% 71.47% 61.40%
If it _didn't_ suck, all the numbers in a column would be about the same :-)
A meta-note: when Guido first wrote sortperf.py, machines - and Python! - were much slower. Now sorting 2**15 elements goes so fast that this gross timing approach is really only good for making the grossest distinctions. On my box today, even the slowest case (*sort on 2**20 elements) takes under half a second. That's "why" the numbers in each column are much more stable across the last 3 rows than across the first 3 rows - the cases in the first 3 rows take so little time that any timing glitch can skew them badly.
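For what it's worth, a standard way to damp that kind of glitch (just an illustration of the general technique, not something sortperf.py does) is to report the best of several runs:

import random, time

def best_of(n, func, *args):
    # the minimum over n runs filters out one-off timing glitches
    best = float("inf")
    for _ in range(n):
        t0 = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - t0)
    return best

L = [random.random() for _ in range(2**15)]
print(best_of(5, sorted, L))   # small inputs especially need repetition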
Note that you can pass arguments to sortperf.py to check any range of power-of-2 values you like, but if you're aware of the pitfalls, the output as-is is fine for me.
This is just insane. This is crazy.
Yet nevertheless wholly expected ;-)
... This is so cool.
It is! Congratulations :-)