Where are the benefits of ldexp and/or "array times 2"?
Hi all,

I expected to get some speedup by using ldexp, or by multiplying an array by a power of 2 (doesn't it just have to perform a simple shift of the mantissa?), but I don't see one. Have I done something wrong? See the code below.

from scipy import rand
from numpy import dot, ones, zeros, array, ldexp
from time import time

N = 1500
A = rand(N, N)
b = rand(N)
b2 = 2 * ones(A.shape, 'int32')
I = 100

t = time()
for i in xrange(I):
    dot(A, b)       # N^2 multiplications + some sum operations
    #A * 2.1        # N^2 multiplications, so it should take no longer than the 1st line
    #ldexp(A, b2)   # it should take no longer than the previous line, shouldn't it?
print 'time elapsed:', time() - t

# 1st case: 0.62811088562
# 2nd case: 2.00850605965
# 3rd case: 6.79027700424

Let me also note:
1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup vs b = rand(N);
2) using A * 2.0 (or just 2) instead of 2.1 doesn't yield any speedup, even though it is an exact integer power of 2.
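[Editor's note: a modernized sketch of the benchmark above, for readers trying it today. scipy.rand, xrange and the print statement are gone in Python 3 / current NumPy, so numpy.random is used instead; absolute timings will differ by machine.]

```python
import time
import numpy as np

N = 1500
rng = np.random.default_rng(0)
A = rng.random((N, N))
b = rng.random(N)
b2 = 2 * np.ones(A.shape, dtype=np.int32)  # integer exponents for ldexp
reps = 100

def bench(label, fn):
    t = time.perf_counter()
    for _ in range(reps):
        fn()
    print(f"{label}: {time.perf_counter() - t:.3f} s")

bench("dot(A, b)   ", lambda: A.dot(b))         # N^2 multiply-adds
bench("A * 2.1     ", lambda: A * 2.1)          # N^2 multiplications
bench("ldexp(A, b2)", lambda: np.ldexp(A, b2))  # N^2 exponent additions
```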
On Thu, May 21, 2009 at 10:26, dmitrey <dmitrey.kroshko@scipy.org> wrote:
Hi all, I expected to have some speedup via using ldexp or multiplying an array by a power of 2 (doesn't it have to perform a simple shift of mantissa?),
Addition of the exponent, not shift of the mantissa.
but I don't see one.
I said there *might* be a speedup, but it was probably going to be insignificant. The overhead of using frexp and ldexp probably outweighs any benefits.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
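[Editor's note: to make the frexp/ldexp pair concrete, a minimal sketch of what the round trip Robert mentions actually computes. The array values are illustrative, not from the thread.]

```python
import numpy as np

x = np.array([0.1, 1.5, -3.25, 1024.0])

# frexp splits x into a mantissa m with |m| in [0.5, 1) and an integer
# exponent e, such that x == m * 2**e; ldexp is the inverse combination.
m, e = np.frexp(x)
y = np.ldexp(m, e)
assert np.array_equal(x, y)  # the round trip is exact (no rounding)

# Multiplying by an exact power of two only touches the exponent,
# so ldexp and plain multiplication agree exactly:
assert np.array_equal(np.ldexp(x, 3), x * 8.0)
```

The round trip is exact, but it is still two extra function calls per array compared with a single multiply, which is the overhead Robert refers to.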
dmitrey wrote:
Hi all, I expected to get some speedup by using ldexp, or by multiplying an array by a power of 2 (doesn't it just have to perform a simple shift of the mantissa?), but I don't see one.
# Let me also note:
# 1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup vs b = rand(N);
# 2) using A * 2.0 (or just 2) instead of 2.1 doesn't yield any speedup, even though it is an exact integer power of 2.
On recent processors multiplication is very fast and takes 1.5 clock cycles (float, double precision), independent of the values. There is very little gain by using bit shift operators. Gregor
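[Editor's note: a small sketch of Gregor's point (variable names are mine). For integers a left shift really is multiplication by a power of two; floats have no shift operator at all, so the choice is between an ordinary multiply and ldexp, and the multiply is already cheap.]

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
# For integers, a left shift by k is identical to multiplying by 2**k.
assert np.array_equal(a << 3, a * 8)

# Floats have no shift operator; multiplying by an exact power of two
# and ldexp give bit-identical results.
x = np.linspace(0.0, 1.0, 10)
assert np.array_equal(x * 8.0, np.ldexp(x, 3))
```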
On Friday 22 May 2009 11:55:31 Gregor Thalhammer wrote:
dmitrey wrote:
Hi all, I expected to get some speedup by using ldexp, or by multiplying an array by a power of 2 (doesn't it just have to perform a simple shift of the mantissa?), but I don't see one.
# Let me also note:
# 1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup vs b = rand(N);
# 2) using A * 2.0 (or just 2) instead of 2.1 doesn't yield any speedup, even though it is an exact integer power of 2.
On recent processors multiplication is very fast and takes 1.5 clock cycles (float, double precision), independent of the values. There is very little gain by using bit shift operators.
...unless you use the vectorization capabilities of modern Intel-compatible processors and process the data in bunches of up to 4 elements (i.e. the number of floats that fit in a 128-bit SSE2 register), in which case you can reach up to 0.25 cycles/element. That does require dealing with SSE2 instructions in your code, but with recent GCC, ICC or MSVC compilers this is not that difficult.

Cheers,

--
Francesc Alted
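[Editor's note: a rough way to estimate the per-element throughput Francesc is talking about from Python, assuming NumPy was built with SIMD support. This is only a sketch; the measured figure includes Python and memory-traffic overhead, so it will not match raw cycle counts.]

```python
import time
import numpy as np

N = 1500
A = np.random.default_rng(0).random((N, N))
reps = 100

t = time.perf_counter()
for _ in range(reps):
    A * 2.1  # elementwise multiply, vectorized internally by NumPy
dt = time.perf_counter() - t

ns_per_elem = dt / (reps * A.size) * 1e9
print(f"~{ns_per_elem:.3f} ns per element")
```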
participants (4)
- dmitrey
- Francesc Alted
- Gregor Thalhammer
- Robert Kern