Tim Hochberg wrote:
We may want to reconsider this, at least partially. I tried implementing a few 1-argument functions two ways: first as a table lookup, and second using a dedicated opcode. The first gave me a speedup of almost 60%, but the latter gave me a speedup of 100%. The difference surprised me, but I suspect it's due to the fact that the x86 supports some functions directly, so the function call gets optimized away for sin and cos just as it does for +, -, *, and /. That implies that some functions should get their own opcodes, while others are not worthy. This little quote from Wikipedia lists the functions we would want to give their own opcodes to:
x86 (since the 80486DX processor) assembly language includes a stack based floating point unit which can perform addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two. The operations also include conversion instructions which can load or store a value from memory in any of the following formats: Binary coded decimal, 32-bit integer, 64-bit integer, 32-bit floating point, 64-bit floating point or 80-bit floating point (upon loading, the value is converted to the currently floating point mode). The x86 also includes a number of transcendental functions including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e
So, that's my new proposal: some functions get their own opcodes (sin, cos, ln, log10, etc.), while others get shunted to table lookup (not sure what's in that list yet, but I'm sure there's lots).
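To make the two dispatch styles in the proposal above concrete, here is a minimal sketch of a toy interpreter. It is written in Python purely for illustration and is not the actual implementation; the opcode names (OP_SIN, OP_FUNC1, ...), the FUNC_TABLE contents, and the run() helper are all hypothetical. Python will not reproduce the speed difference reported above; in a C interpreter the dedicated branch is where the compiler gets the chance to inline the call or emit a hardware instruction.

```python
import math

# Hypothetical opcode numbers, invented for this sketch only.
OP_SIN, OP_COS, OP_LOG, OP_LOG10, OP_FUNC1 = range(5)

# "Less worthy" one-argument functions share the single OP_FUNC1 opcode
# and are selected through an index into this table.
FUNC_TABLE = [math.sinh, math.cosh, math.tanh]
FUNC_INDEX = {"sinh": 0, "cosh": 1, "tanh": 2}


def run(program, values):
    """Apply each (opcode, arg) instruction, in order, to every element."""
    out = list(values)
    for opcode, arg in program:
        if opcode == OP_SIN:        # dedicated opcode: function named directly
            out = [math.sin(x) for x in out]
        elif opcode == OP_COS:
            out = [math.cos(x) for x in out]
        elif opcode == OP_LOG:
            out = [math.log(x) for x in out]
        elif opcode == OP_LOG10:
            out = [math.log10(x) for x in out]
        elif opcode == OP_FUNC1:    # table lookup: indirect call through arg
            func = FUNC_TABLE[arg]
            out = [func(x) for x in out]
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return out


# Example: log10 through its own opcode, then tanh through the table.
program = [(OP_LOG10, None), (OP_FUNC1, FUNC_INDEX["tanh"])]
result = run(program, [1.0, 10.0])
```

The trade-off is that every dedicated opcode adds a branch to the interpreter, while the table can grow without touching the dispatch code.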
For the same reason, I think these same functions should get their own ufunc loops instead of using the default loop with function pointers. Thanks for providing this link. It's a useful list.
-Travis