For a pivoted algorithm, I have to perform an operation that in fully vectorized form can be expressed as: pivot = tableau[locat,:]/tableau[locat,cand] tableau -= tableau[:,cand:cand+1]*pivot tableau[locat,:] = pivot tableau is a rather large bidimensional array, and I'd like to avoid the allocation of a temporary array of the same size holding the result of the right-hand side expression in the second line of code (the outer product of tableau[:,cand] and pivot). On the other hand, if I replace that line with: for i in xrange(tableau.shape[0]): tableau[i] -= tableau[i,cand]*pivot ...I incur some CPU overhead for the "for" loop -- and this part of code is the botteneck of the whole algorithm. Is there any smarter (i.e., more time-efficient) way of achieving my goal? TIA -- Enzo
On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli <enzomich@gmail.com> wrote:
For a pivoted algorithm, I have to perform an operation that in fully vectorized form can be expressed as:
pivot = tableau[locat,:]/tableau[locat,cand] tableau -= tableau[:,cand:cand+1]*pivot tableau[locat,:] = pivot
tableau is a rather large bidimensional array, and I'd like to avoid the allocation of a temporary array of the same size holding the result of the right-hand side expression in the second line of code (the outer product of tableau[:,cand] and pivot). On the other hand, if I replace that line with:
for i in xrange(tableau.shape[0]): tableau[i] -= tableau[i,cand]*pivot
...I incur some CPU overhead for the "for" loop -- and this part of code is the botteneck of the whole algorithm. Is there any smarter (i.e., more time-efficient) way of achieving my goal?
just a generic answer: Working in batches can be a good compromise in some cases. I instead of working in a loop with one row at a time, loop and handle, for example, 1000 rows at a time. Josef
TIA --
Enzo
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sun, Dec 26, 2010 at 7:34 AM, <josef.pktd@gmail.com> wrote:
On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli <enzomich@gmail.com> wrote:
For a pivoted algorithm, I have to perform an operation that in fully vectorized form can be expressed as:
pivot = tableau[locat,:]/tableau[locat,cand] tableau -= tableau[:,cand:cand+1]*pivot tableau[locat,:] = pivot
tableau is a rather large bidimensional array, and I'd like to avoid the allocation of a temporary array of the same size holding the result of the right-hand side expression in the second line of code (the outer product of tableau[:,cand] and pivot). On the other hand, if I replace that line with:
for i in xrange(tableau.shape[0]): tableau[i] -= tableau[i,cand]*pivot
...I incur some CPU overhead for the "for" loop -- and this part of code is the botteneck of the whole algorithm. Is there any smarter (i.e., more time-efficient) way of achieving my goal?
just a generic answer:
Working in batches can be a good compromise in some cases. I instead of working in a loop with one row at a time, loop and handle, for example, 1000 rows at a time.
Josef
TIA --
Enzo
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
If this is really such a big bottleneck, then I would look into using Cython for this part. With just a few cdef's, I bet that that you could speed up the for loop tremendously. Depending on the details of your algorithm, you might want to make a Cython function that takes tableau, cand and pivot as inputs and just does the for loop part.
From the error message issued by g++, it would appear that blitz can't
Many thanks to Josef and Justin for their replies. Josef's hint sounds like a good way of reducing peak memory allocation especially when the row size is large, which makes the "for" overhead for each iteration comparatively lower. However, time is still spent in back-and-forth conversions between numpy arrays and the native BLAS data structures, and copying data from the temporary array holding the intermediate results and tableau. Regarding Justin's suggestion, before trying Cython (which, according to http://wiki.cython.org/tutorials/numpy , seems to require a bit of work to handle numpy arrays properly) I was looking at weave.blitz . Unfortunately, this doesn't seems to like my code. Running code containing: expr = "tableau = tableau - tableau[:,cand:cand+1]*pivot" weave.blitz(expr) ...elicits: /---------------------------------------------------------- distutils.errors.CompileError: error: Command "g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.o" failed with exit status 1 \---------------------------------------------------------- figure out the type of cand: /---------------------------------------------------------- C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py>g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.o In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:37, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h: In member function 'bool blitz::Range::isAscendingContiguous() const': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h:120: warning: suggest parentheses around '&&' within '||' c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp: In function 'PyObject* compiled_func(PyObject*, PyObject*)': c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: error: ambiguous overload for 'operator+' in 'cand + 1' c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: candidates are: operator+(PyObject*, int) <built-in> c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(int, int) <built-in> c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(float, int) <built-in> c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(double, int) <built-in> c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(char*, int) <built-in> \---------------------------------------------------------- Using a temporary variable to keep cand out of blitz: tmp = tableau[:,cand:cand+1] expr = "tableau = tableau - tmp*pivot" weave.blitz(expr) ...produces an even uglier error message, which makes me think that blitz doesn't understand that the product between a (n,1)-shaped array and an (n,)-shaped one is meant to be an outer product: /---------------------------------------------------------- C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py>LCPSolve.py Found executable C:\Program Files\pythonxy\mingw\bin\g++.exe In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:37, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h: In member function 'bool blitz::Range::isAscendingContiguous() const': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h:120: warning: suggest parentheses around '&&' within '||' In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:2504, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h: In member function 'typename P_op::T_numtype blitz::_bz_ArrayExprBinaryOp<P_expr1, P_expr2, P_op>::operator()(const blitz::TinyVector<int, N_rank>&) [with int N_rank = 2, P_expr1 = blitz::FastArrayIterator<double, 2>, P_expr2 = blitz::FastArrayIterator<double, 1>, P_op = blitz::Multiply<double, double>]': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:144: instantiated from 'typename P_expr::T_numtype blitz::_bz_ArrayExpr<P_expr>::operator()(const blitz::TinyVector<int, N_rank>&) [with int N_rank = 2, P_expr = blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> >]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:486: instantiated from 'typename P_op::T_numtype blitz::_bz_ArrayExprBinaryOp<P_expr1, P_expr2, P_op>::operator()(const blitz::TinyVector<int, N_rank>&) [with int N_rank = 2, P_expr1 = blitz::FastArrayIterator<double, 2>, P_expr2 = blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> >
, P_op = blitz::Subtract<double, double>]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:144: instantiated from 'typename P_expr::T_numtype blitz::_bz_ArrayExpr<P_expr>::operator()(const blitz::TinyVector<int, N_rank>&) [with int N_rank = 2, P_expr = blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> > , blitz::Subtract<double, double> >]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/eval.cc:670: instantiated from 'blitz::Array<P_numtype, N_rank>& blitz::Array<T, N>::evaluateWithIndexTraversal1(T_expr, T_update) [with T_expr = blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> > , blitz::Subtract<double, double> > >, T_update = blitz::_bz_update<double, double>, P_numtype = double, int N_rank = 2]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/eval.cc:171: instantiated from 'blitz::Array<P_numtype, N_rank>& blitz::Array<T, N>::evaluate(T_expr, T_update) [with T_expr = blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> > >, blitz::Subtract<double, double> > >, T_update = blitz::_bz_update<double, double>, P_numtype = double, int N_rank = 2]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/ops.cc:45: instantiated from 'blitz::Array<P_numtype, N_rank>& blitz::Array<T, N>::operator=(const blitz::ETBase<T_expr>&) [with T_expr = blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::_bz_ArrayExpr<blitz::_bz_ArrayExprBinaryOp<blitz::FastArrayIterator<double, 2>, blitz::FastArrayIterator<double, 1>, blitz::Multiply<double, double> > , blitz::Subtract<double, double>
, P_numtype = double, int N_rank = 2]' c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:732: instantiated from here C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:486: error: no match for call to '(blitz::FastArrayIterator<double, 1>) (const blitz::TinyVector<int, 2>&)' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:74: note: candidates are: P_numtype blitz::FastArrayIterator<T_numtype, N_rank>::operator()(const blitz::TinyVector<int, N_destRank>&) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:202: note: P_numtype& blitz::FastArrayIterator<T_numtype, N_rank>::operator()(int) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:208: note: P_numtype& blitz::FastArrayIterator<T_numtype, N_rank>::operator()(int, int) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:214: note: P_numtype& blitz::FastArrayIterator<T_numtype, N_rank>::operator()(int, int, int) [with P_numtype = double, int N_rank = 1] Traceback (most recent call last): File "C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py\LCPSolve.py", line 132, in <module> w, z, retcode = LCPSolve(M,q) File "C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py\LCPSolve.py", line 94, in LCPSolve weave.blitz(expr) File "C:\Python26\lib\site-packages\scipy\weave\blitz_tools.py", line 65, in blitz **kw) File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line 482, in compile_function verbose=verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\ext_tools.py", line 367, in compile verbose = verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 273, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 169, in setup raise SystemExit, "error: " + str(msg) distutils.errors.CompileError: error: Command "g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.o" failed with exit status 1 \----------------------------------------------------------
So, for the time being, no speed breakthrough... Enzo ----- Original Message ----- From: "Justin Peel" <jpscipy@gmail.com> To: "Discussion of Numerical Python" <numpy-discussion@scipy.org> Sent: Monday, December 27, 2010 2:51 PM Subject: Re: [Numpy-discussion] Optimization suggestion sought On Sun, Dec 26, 2010 at 7:34 AM, <josef.pktd@gmail.com> wrote:
On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli <enzomich@gmail.com> wrote:
For a pivoted algorithm, I have to perform an operation that in fully vectorized form can be expressed as:
pivot = tableau[locat,:]/tableau[locat,cand] tableau -= tableau[:,cand:cand+1]*pivot tableau[locat,:] = pivot
tableau is a rather large bidimensional array, and I'd like to avoid the allocation of a temporary array of the same size holding the result of the right-hand side expression in the second line of code (the outer product of tableau[:,cand] and pivot). On the other hand, if I replace that line with:
for i in xrange(tableau.shape[0]): tableau[i] -= tableau[i,cand]*pivot
...I incur some CPU overhead for the "for" loop -- and this part of code is the botteneck of the whole algorithm. Is there any smarter (i.e., more time-efficient) way of achieving my goal?
just a generic answer:
Working in batches can be a good compromise in some cases. I instead of working in a loop with one row at a time, loop and handle, for example, 1000 rows at a time.
Josef
TIA --
Enzo
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
If this is really such a big bottleneck, then I would look into using Cython for this part. With just a few cdef's, I bet that that you could speed up the for loop tremendously. Depending on the details of your algorithm, you might want to make a Cython function that takes tableau, cand and pivot as inputs and just does the for loop part. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mon, Dec 27, 2010 at 6:20 AM, Enzo Michelangeli <enzomich@gmail.com> wrote:
Many thanks to Josef and Justin for their replies.
Josef's hint sounds like a good way of reducing peak memory allocation especially when the row size is large, which makes the "for" overhead for each iteration comparatively lower. However, time is still spent in back-and-forth conversions between numpy arrays and the native BLAS data structures, and copying data from the temporary array holding the intermediate results and tableau.
Regarding Justin's suggestion, before trying Cython (which, according to http://wiki.cython.org/tutorials/numpy , seems to require a bit of work to handle numpy arrays properly)
Cython doesn't have to be that complicated. For your example, you just have to unroll the vectorization (and account for the fact that the result is mutated in place, which was your original goal). cimport numpy def do_it(numpy.ndarray[double, ndim=2] tableau, int locat, int cand, bint vectorize=True): cdef numpy.ndarray[double, ndim=1] pivot pivot = tableau[locat,:]/tableau[locat,cand] if vectorize: tableau -= tableau[:,cand:cand+1]*pivot else: for i in range(tableau.shape[0]): for j in range(tableau.shape[1]): if j != cand: tableau[i,j] -= tableau[i,cand] * pivot[j] tableau[:,cand] = 0 tableau[locat,:] = pivot return tableau - Robert
----- Original Message ----- From: "Robert Bradshaw" <robertwb@math.washington.edu> Sent: Wednesday, December 29, 2010 4:47 PM [...]
Regarding Justin's suggestion, before trying Cython (which, according to http://wiki.cython.org/tutorials/numpy , seems to require a bit of work to handle numpy arrays properly)
Cython doesn't have to be that complicated. For your example, you just have to unroll the vectorization (and account for the fact that the result is mutated in place, which was your original goal).
Thanks, but the full de-vectorization forces to give up any use of BLAS (I suppose that for array products numpy relies on its routines). In my tests, the performance in terms of speed is more or less the same as the original pure-numpy code (which may be made less memory-hungry with the chunking suggested by Josef). Instead, it would be nice to have a native function able to perform evaluation of arbitrary numpy expressions without converting the intermediate results in Python format (a sort of "better weave.blitz", able to understand slicing, broadcasting rules etc.). That would give us the best of both worlds: code execution at BLAS speeds, and savings in unnecessary conversions and temporary variable allocations. Such "numpy calculator" could also be a simple interpreter, avoiding the complexities and site dependencies deriving from the use of a C compiler: it should build temporary C data structures for the parameters in input, call the relevant C ATLAS/BLAS/LAPACK functions in the right order (possibly allocating temporary C arrays), and convert only the final result back to a Python object. Enzo
participants (4)
-
Enzo Michelangeli -
josef.pktd@gmail.com -
Justin Peel -
Robert Bradshaw