
On Tue, Aug 23, 2011 at 08:15, Antoine Pitrou <solipsis@pitrou.net> wrote:
> So why would you need three separate implementations of the unrolled loop? You already have a macro named WRITE_FLEXIBLE_OR_WSTR.
The WRITE_FLEXIBLE_OR_WSTR macro checks the kind and then writes, so the check is repeated for every character. Using this macro for the fast path would be inefficient. To have a real fast path, you would need an outer if that checks the kind once and, in each branch, the matching access to the string (1, 2, or 4 bytes), with the write repeated 4 or 8 times (guarded by #ifdef, depending on the platform). As all these cases bloated the C code, we went for the simple solution, with the goal of profiling the code again afterwards to see where the new performance bottlenecks are.
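To make the shape of such a fast path concrete, here is a rough sketch. It is not the code from the patch: all sketch_* names are hypothetical, the per-character helper only imitates what a WRITE_FLEXIBLE_OR_WSTR-style macro does, the unroll factor is fixed at 4 (no platform #ifdef), and the unrolled writes are folded into a macro to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three character widths. */
enum sketch_kind { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Slow path: the kind is tested for every single character written. */
static void sketch_write_checked(enum sketch_kind kind, void *buf,
                                 size_t i, uint32_t ch)
{
    switch (kind) {
    case SKETCH_1BYTE: ((uint8_t *)buf)[i] = (uint8_t)ch; break;
    case SKETCH_2BYTE: ((uint16_t *)buf)[i] = (uint16_t)ch; break;
    case SKETCH_4BYTE: ((uint32_t *)buf)[i] = ch; break;
    default: assert(0 && "unknown kind"); break;
    }
}

/* True if the four bytes starting at p are all ASCII. */
#define SKETCH_ALL_ASCII4(p) \
    ((((p)[0] | (p)[1] | (p)[2] | (p)[3]) & 0x80) == 0)

/* Four unrolled writes for one output width. */
#define SKETCH_COPY4(TYPE, dst, src, i)      \
    do {                                     \
        TYPE *out_ = (TYPE *)(dst) + (i);    \
        out_[0] = (src)[(i)];                \
        out_[1] = (src)[(i) + 1];            \
        out_[2] = (src)[(i) + 2];            \
        out_[3] = (src)[(i) + 3];            \
    } while (0)

/* Copy the leading ASCII run of src into dst, widening to the given kind.
   Returns the number of bytes consumed. */
static size_t sketch_copy_ascii(enum sketch_kind kind, void *dst,
                                const unsigned char *src, size_t n)
{
    size_t i = 0;

    /* Outer check: the kind is tested once, and each branch has its own
       unrolled loop with the matching store width. */
    switch (kind) {
    case SKETCH_1BYTE:
        while (n - i >= 4 && SKETCH_ALL_ASCII4(src + i)) {
            SKETCH_COPY4(uint8_t, dst, src, i);
            i += 4;
        }
        break;
    case SKETCH_2BYTE:
        while (n - i >= 4 && SKETCH_ALL_ASCII4(src + i)) {
            SKETCH_COPY4(uint16_t, dst, src, i);
            i += 4;
        }
        break;
    case SKETCH_4BYTE:
        while (n - i >= 4 && SKETCH_ALL_ASCII4(src + i)) {
            SKETCH_COPY4(uint32_t, dst, src, i);
            i += 4;
        }
        break;
    default:
        assert(0 && "unknown kind");
        break;
    }

    /* Tail: fall back to the per-character kind check. */
    while (i < n && src[i] < 0x80) {
        sketch_write_checked(kind, dst, i, src[i]);
        i++;
    }
    return i;
}
```

Even with the macro, the loop appears three times; written out by hand and combined with a 4-or-8 #ifdef, that is the bloat I mean.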
> Even without taking into account the unrolled loop, I wonder how much slower UTF-8 decoding becomes with that approach, by the way. Instead of testing the "kind" variable at each loop iteration, using a stringlib-like approach may be a better deal IMO.
To me, this feels like it would complicate the C source code and decrease readability. For each function you would need a wrapper that does the kind-checking logic and then, in a separate file, the implementation of the function, which then gets included three times, once for each character width.

Regards,
Torsten
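
P.S. For reference, a minimal sketch of the wrapper-plus-specializations layout described above. Everything here is an assumption for illustration: the sketch_* names are hypothetical, a simple find function stands in for a real string operation, and a function-generating macro takes the place of the separate file that would be included three times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In a stringlib-style layout this body would live in its own file and be
   #included once per character width. A macro that generates the function
   stands in for that here so the sketch fits in one file. */
#define SKETCH_DEFINE_FIND(NAME, CHAR_T)                                \
    static ptrdiff_t NAME(const CHAR_T *s, size_t len, uint32_t ch)     \
    {                                                                   \
        for (size_t i = 0; i < len; i++) {                              \
            if ((uint32_t)s[i] == ch)                                   \
                return (ptrdiff_t)i;                                    \
        }                                                               \
        return -1;                                                      \
    }

/* The three "inclusions", one per character width. */
SKETCH_DEFINE_FIND(sketch_find_1byte, uint8_t)
SKETCH_DEFINE_FIND(sketch_find_2byte, uint16_t)
SKETCH_DEFINE_FIND(sketch_find_4byte, uint32_t)

/* Wrapper: the kind is checked once here, never inside the loops.
   kind is the character width in bytes (1, 2, or 4). */
static ptrdiff_t sketch_find(int kind, const void *data, size_t len,
                             uint32_t ch)
{
    switch (kind) {
    case 1: return sketch_find_1byte(data, len, ch);
    case 2: return sketch_find_2byte(data, len, ch);
    case 4: return sketch_find_4byte(data, len, ch);
    default:
        assert(0 && "unknown kind");
        return -1;
    }
}
```

The inner loops never look at the kind, which is the benefit; the cost is the extra wrapper and the indirection through the template for every such function.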