Aligning an array on Windows
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
Hi, Some time ago I made an improvement in speed on the numexpr version of PyTables so as to accelerate the operations with unaligned arrays (objects that can appear quite commonly when dealing with columns of recarrays, as PyTables does). This improvement has demostrated to work correctly and flawlessly in Linux machines (using GCC 4.x and in both 32-bit and 64-bit Linux boxes) for several weeks of intensive testing. Moreover, its speed-up is ranging from a 40% on modern processors and up to a 70% in older ones, so I'd like to keep it. The surprise came today when I tried to compile the same code on a Windows box (Win XP Service Pack 2) using MSVC 7.1, through the free (as in beer) Toolkit 2003. The compilation process went fine, but the problem is that I'm getting crashes from time to time when running the numexpr test suite. After some in-depth investigation, I'm pretty sure that the problem is in a concrete part of the code that I'd modified for this improvement. IMO, the affected code is in numexpr/interp_body.c and reads like: case OP_COPY_II: VEC_ARG1(memcpy(dest, x1, sizeof(int)); dest += sizeof(int); x1 += stride1); case OP_COPY_LL: VEC_ARG1(memcpy(dest, x1, sizeof(long long)); dest += sizeof(long long); x1 += stride1); case OP_COPY_FF: VEC_ARG1(memcpy(dest, x1, sizeof(double)); dest += sizeof(double); x1 += stride1); case OP_COPY_CC: VEC_ARG1(memcpy(dest, x1, sizeof(double)*2); dest += sizeof(double)*2; x1 += stride1); This might seem complicated, but it is not. Each of the OP_COPY_XX is a function that has to copy source (x1) to destination (dest) for int, long long, double and complex data types (this function will be called in a loop for copying all the data in array). The code for the original numexpr reads as: case OP_COPY_BB: VEC_ARG1(b_dest = b1); case OP_COPY_II: VEC_ARG1(i_dest = i1); case OP_COPY_FF: VEC_ARG1(f_dest = f1); case OP_COPY_CC: VEC_ARG1(cr_dest = c1r; ci_dest = c1i); i.e. the copy is done through direct assignment. This can be done because, in the original numexpr, an array is always guaranteed to reach this part of the code (the computing kernel) in the aligned form. But in my code, this is not guaranteed (the copy is made precisely for alignment purposes), so this is why I need to make use of memcpy/memmove calls. The thing I don't see is why my version of the code can create problems on Windows platforms and work perfectly on Linux ones. I've tried to use memmove instead of memcpy, but the problem persists. I've had a look at how numpy makes an 'aligned' copy of an unaligned array, and it seems to me that it uses memcpy/memmove (not sure when you should use one or another) just as I use it above. It might be possible that the problem is in another place, but my tests reaffirm me in the possibility that something is wrong with my copy code above (but again, I can't see where). Of course, we can get rid of this optimization but it is a bit depressing to have renounce to it just because it doesn't work on Windows :( Thanks in advance for any hint you may provide! -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
A Dijous 24 Maig 2007 20:33, Francesc Altet escrigué:
Just for the record: I've found the culprit. The problem here was the use of the stride1 variable that was declared just above the main switch for opcodes as: intp stride1 = params.memsteps[arg1]; Unfortunately, this assignment gave problems because arg1 can take values out of the range of memsteps array. The solution was to use another variable, that was initialized as: intp sb1 = params.memsteps[arg1]; but in the VEC_ARG* macros, just after the BOUNDS_CHECK(arg1) call, so that it checks that arg1 doesn't get out of range. All in all, a very subtle bug that would have evident for Numexpr main authors, but not for me. Anyway, you can find the details of the fix in: http://www.pytables.org/trac/changeset/2939 I don't know exactly why, this wasn't giving problems with Linux boxes. Fortunately, Windows platform is much more finicky in terms of memory problems and brought this bug to my attention. Oh, thanks god for letting Windows be! ;) Cheers, --
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 5/25/07, Francesc Altet <faltet@carabos.com> wrote:
A Dijous 24 Maig 2007 20:33, Francesc Altet escrigué:
[SNIP] Just for the record: I've found the culprit. The problem here was the use
Don't feel bad, when I had a very similar problem early on when we were first adding multiple types and it mystified me for considerably longer than this seems to have stumped you. I don't know exactly why, this wasn't giving problems with Linux boxes.
;-) -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
El dv 25 de 05 del 2007 a les 14:19 -0700, en/na Timothy Hochberg va escriure:
Well, I wouldn't say the truth if I say that this doesn't help ;) Anyway, I think that this piece of code is dangerous enough and in order to avoid someone (including me!) tripping over it again, it would be nice to apply the next 'patch': Index: interp_body.c =================================================================== --- interp_body.c (revision 3053) +++ interp_body.c (working copy) @@ -89,6 +89,9 @@ unsigned int arg2 = params.program[pc+3]; #define arg3 params.program[pc+5] #define store_index params.index_data[store_in] + /* WARNING: From now on, only do references to params.mem[arg[123]] + & params.memsteps[arg[123]] inside the VEC_ARG[123] macros, + or you will risk accessing invalid addresses. */ #define reduce_ptr (dest + flat_index(&store_index, j)) #define i_reduce *(long *)reduce_ptr #define f_reduce *(double *)reduce_ptr Cheers, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
[Much time passes] I went ahead and added this warning. I kept thinking I should write a more detailed explanation about why this is a problem, but I never got around to it. At least this let's people know that there are some dragons to be wary of. -tim On 5/28/07, Francesc Altet <faltet@carabos.com> wrote:
-- . __ . |-\ . . tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
A Dijous 24 Maig 2007 20:33, Francesc Altet escrigué:
Just for the record: I've found the culprit. The problem here was the use of the stride1 variable that was declared just above the main switch for opcodes as: intp stride1 = params.memsteps[arg1]; Unfortunately, this assignment gave problems because arg1 can take values out of the range of memsteps array. The solution was to use another variable, that was initialized as: intp sb1 = params.memsteps[arg1]; but in the VEC_ARG* macros, just after the BOUNDS_CHECK(arg1) call, so that it checks that arg1 doesn't get out of range. All in all, a very subtle bug that would have evident for Numexpr main authors, but not for me. Anyway, you can find the details of the fix in: http://www.pytables.org/trac/changeset/2939 I don't know exactly why, this wasn't giving problems with Linux boxes. Fortunately, Windows platform is much more finicky in terms of memory problems and brought this bug to my attention. Oh, thanks god for letting Windows be! ;) Cheers, --
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 5/25/07, Francesc Altet <faltet@carabos.com> wrote:
A Dijous 24 Maig 2007 20:33, Francesc Altet escrigué:
[SNIP] Just for the record: I've found the culprit. The problem here was the use
Don't feel bad, when I had a very similar problem early on when we were first adding multiple types and it mystified me for considerably longer than this seems to have stumped you. I don't know exactly why, this wasn't giving problems with Linux boxes.
;-) -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
El dv 25 de 05 del 2007 a les 14:19 -0700, en/na Timothy Hochberg va escriure:
Well, I wouldn't say the truth if I say that this doesn't help ;) Anyway, I think that this piece of code is dangerous enough and in order to avoid someone (including me!) tripping over it again, it would be nice to apply the next 'patch': Index: interp_body.c =================================================================== --- interp_body.c (revision 3053) +++ interp_body.c (working copy) @@ -89,6 +89,9 @@ unsigned int arg2 = params.program[pc+3]; #define arg3 params.program[pc+5] #define store_index params.index_data[store_in] + /* WARNING: From now on, only do references to params.mem[arg[123]] + & params.memsteps[arg[123]] inside the VEC_ARG[123] macros, + or you will risk accessing invalid addresses. */ #define reduce_ptr (dest + flat_index(&store_index, j)) #define i_reduce *(long *)reduce_ptr #define f_reduce *(double *)reduce_ptr Cheers, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
[Much time passes] I went ahead and added this warning. I kept thinking I should write a more detailed explanation about why this is a problem, but I never got around to it. At least this let's people know that there are some dragons to be wary of. -tim On 5/28/07, Francesc Altet <faltet@carabos.com> wrote:
-- . __ . |-\ . . tim.hochberg@ieee.org
participants (2)
-
Francesc Altet
-
Timothy Hochberg