Oct. 15, 2014
1:46 a.m.
Hi Alex,

On 14 October 2014 18:42, Alex Gaynor <alex.gaynor@gmail.com> wrote:
> I'm talking about an explicit "vector_3f_add()" primitive; it would be
> pretty straightforward to do the mapping. IIRC it took us less than a
> day to add the READ_TIMESTAMP instruction.

I agree with Maciej on this point. The problem with "vector_3f_add()" is that it needs to use more than the lower 8 bytes of the xmm registers. We currently have no support for initializing, allocating, saving, or restoring such special xmm registers.

The alternative would be to have a "vector_3f_add()" that takes 4 xmm registers, combines them into two, adds them, and re-splits the result. It's unclear whether that's really faster than doing two separate additions, though.

A bientôt,

Armin.
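
As a rough illustration of the "combine, add, re-split" alternative described above, here is a minimal sketch using SSE2 intrinsics in C. It is not PyPy backend code: the helper `add_pair_packed` and the struct `pair_result` are hypothetical names, and the sketch assumes an x86 or x86-64 target and that each of the four input registers holds one double in its lower 8 bytes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 or x86-64 target */

/* Hypothetical result type: the two sums, split back into scalars. */
typedef struct {
    double r0;
    double r1;
} pair_result;

/* Hypothetical helper: computes (a0 + b0, a1 + b1) with one packed add. */
static pair_result add_pair_packed(double a0, double a1, double b0, double b1)
{
    /* Four "scalar" registers: each holds one double in its lower 8 bytes,
       with the upper 8 bytes zeroed. */
    __m128d xa0 = _mm_set_sd(a0);
    __m128d xa1 = _mm_set_sd(a1);
    __m128d xb0 = _mm_set_sd(b0);
    __m128d xb1 = _mm_set_sd(b1);

    /* Combine four registers into two by packing the low halves together. */
    __m128d a = _mm_unpacklo_pd(xa0, xa1);  /* [a0, a1] */
    __m128d b = _mm_unpacklo_pd(xb0, xb1);  /* [b0, b1] */

    /* One packed addition over the full 16 bytes. */
    __m128d sum = _mm_add_pd(a, b);         /* [a0 + b0, a1 + b1] */

    /* Re-split: move the upper half down so each result is a low-8-byte scalar. */
    __m128d hi = _mm_unpackhi_pd(sum, sum); /* [a1 + b1, a1 + b1] */

    pair_result out;
    out.r0 = _mm_cvtsd_f64(sum);
    out.r1 = _mm_cvtsd_f64(hi);
    return out;
}
```

The two unpack steps before the add and the one after it are the extra work in question: the scalar version is just two separate low-half additions, which is why it is unclear that the packed form wins.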