David M. Cooke wrote:
Tim Hochberg <tim.hochberg@cox.net> writes:
That makes sense. One thought I had with respect to the various numpy functions (sin, cos, pow, etc) was to just have the bytecodes:
call_unary_function, function_id, store_in, source call_binary_function, function_id, store_in, source1, source2 call_trinary_function, function_id, store_in, source1, source2, source3
Then just store pointers to the functions in relevant tables. In it's most straightforward form, you'd need 6 character chunks of bytecode instead of four. However, if that turns out to slow everything else down I think it could be packed down to 4 again. The function_ids could probably be packed into the opcode (as long as we stay below 200 or so functions, which is probably safe), the other way to pack things down is to require that one of the sources for trinary functions is always a certain register (say register-0). That requires a bit more cleverness at the compiler level, but is probably feasible.
That's along the lines I'm thinking of. It seems to me that if evaluating the function requires a function call (and not an inlined machine instruction like the basic ops), then we may as well dispatch like this (plus, it's easier :). This could also allow for user extensions. Binary and trinary (how many of those do we have??) could maybe be handled by storing the extra arguments in a separate array.
We may want to reconsider this at least partially. I tried implementing a few 1-argument functions two way. First as a table lookup and second using a dedicated opcode. The first gave me a speedup of almost 60%, but the latter gave me a speedup of 100%. The difference suprised me, but I suspect it's do to the fact that the x86 supports some functions directly, so the function call gets optimized away for sin and cos just as it does for +-*/. That implies that some functions should get there own opcodes, while others are not worthy. This little quote from wikipedia lists the functions we would want to give there own opcodes to: x86 (since the 80486DX processor) assembly language includes a stack based floating point unit which can perform addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two. The operations also include conversion instructions which can load or store a value from memory in any of the following formats: Binary coded decimal, 32-bit integer, 64-bit integer, 32-bit floating point, 64-bit floating point or 80-bit floating point (upon loading, the value is converted to the currently floating point mode). The x86 also includes a number of transcendental functions including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e <http://www.answers.com/main/ntquery;jsessionid=aal3fk2ck8cuc?method=4&dsid=2222&dekey=E+%28mathematical+constant%29&gwp=8&curtab=2222_1&sbid=lc04b>. So, that's my new proposal: some functions get there own opcodes (sin, cos, ln, log10, etc), while others get shunted to table lookup (not sure what's in that list yet, but I'm sure there's lots). -tim
I'm going to look at adding more smarts to the compiler, too. Got a couple books on them :-)
Different data types could be handled by separate input arrays, and a conversion opcode ('int2float', say).