[SciPy-User] fast small matrix multiplication with cython?

Thu Dec 9 16:33:41 EST 2010

On Wed, Dec 8, 2010 at 11:28 PM,  <josef.pktd at gmail.com> wrote:
>>
>> It looks like I don't save too much time with just Python/scipy
>> optimizations.  Apparently ~75% of the time is spent in l-bfgs-b,
>> judging by its user time output and the profiler's CPU time output(?).
>>  Non-cython versions:
>>
>> Brief and rough profiling on my laptop for ARMA(2,2) with 1000
>> observations.  Optimization uses fmin_l_bfgs_b with m = 12 and iprint
>> = 0.
>
> Completely different idea: How costly are the numerical derivatives in l-bfgs-b?
> With l-bfgs-b, you should be able to replace the derivatives with the
> complex step derivatives that calculate the loglike function value and
> the derivatives in one iteration.
>
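
For reference, the complex-step idea mentioned above can be sketched in a few lines. This is only an illustration with a toy function standing in for the log-likelihood; complex_step and toy_loglike are hypothetical names, not part of scipy or statsmodels. One complex evaluation returns the function value in its real part and one derivative in its imaginary part. With several parameters, one such evaluation is needed per parameter.

```python
import cmath
import math


def complex_step(f, x, h=1e-20):
    """Return (f(x), f'(x)) from one complex evaluation of f.

    f must accept complex input and be analytic around x.
    """
    fx = f(complex(x, h))
    # Real part: function value (up to O(h**2)).
    # Imaginary part divided by h: first derivative, with no
    # subtractive cancellation, so h can be extremely small.
    return fx.real, fx.imag / h


def toy_loglike(z):
    # Hypothetical stand-in for a log-likelihood: exp(z) * z**2
    return cmath.exp(z) * z * z


value, deriv = complex_step(toy_loglike, 1.5)

# Analytic derivative for comparison: exp(x) * (x**2 + 2*x)
exact = math.exp(1.5) * (1.5 ** 2 + 2 * 1.5)
print(value, deriv, exact)
```

The catch is that every operation inside the function has to work with complex numbers, which a real log-likelihood may not satisfy without changes.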

I couldn't figure out how to use it without some hacks.
fmin_l_bfgs_b calls both f and fprime as (x, *args), but
approx_fprime and approx_fprime_cs expect to be called as
approx_fprime(x, func, args=args) and then call func(x, *args)
themselves.  I changed fmin_l_bfgs_b to make the gradient call that
way, and I get (on a different computer)

Using approx_fprime_cs
-----------------------------------
         861609 function calls (861525 primitive calls) in 3.337 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       70    1.942    0.028    3.213    0.046 kalmanf.py:504(loglike)
   840296    1.229    0.000    1.229    0.000 {numpy.core._dotblas.dot}
       56    0.038    0.001    0.038    0.001 {numpy.linalg.lapack_lite.zgesv}
      270    0.025    0.000    0.025    0.000 {sum}
       90    0.019    0.000    0.019    0.000 {numpy.linalg.lapack_lite.dgesdd}
       46    0.013    0.000    0.014    0.000 function_base.py:494(asarray_chkfinite)
      162    0.012    0.000    0.014    0.000 arima.py:117(_transparams)

Using approx_grad = True
---------------------------------------
         1097454 function calls (1097370 primitive calls) in 3.615 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       90    2.316    0.026    3.489    0.039 kalmanf.py:504(loglike)
  1073757    1.164    0.000    1.164    0.000 {numpy.core._dotblas.dot}
      270    0.025    0.000    0.025    0.000 {sum}
       90    0.020    0.000    0.020    0.000 {numpy.linalg.lapack_lite.dgesdd}
      182    0.014    0.000    0.016    0.000 arima.py:117(_transparams)
       46    0.013    0.000    0.014    0.000 function_base.py:494(asarray_chkfinite)
       46    0.008    0.000    0.023    0.000 decomp_svd.py:12(svd)
       23    0.004    0.000    0.004    0.000 {method 'var' of 'numpy.ndarray' objects}

Definitely fewer function calls (861609 vs. 1097454, with 70 instead
of 90 calls to loglike) and a little faster (3.337 vs. 3.615 CPU
seconds), but I had to write some hacks to get it to work.
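
A possible alternative to patching fmin_l_bfgs_b is a small closure that adapts the two calling conventions. The sketch below is an assumption about how that could look, not what was actually run for the timings above. make_fprime and objective are hypothetical names, and approx_grad_cs is a pure-Python stand-in that only mimics the (x, func, args=args) calling convention of approx_fprime_cs.

```python
def approx_grad_cs(x, func, args=(), h=1e-20):
    """Stand-in complex-step gradient, called as (x, func, args=args)."""
    grad = []
    for i in range(len(x)):
        xc = [complex(v) for v in x]
        xc[i] += 1j * h  # perturb only parameter i along the imaginary axis
        grad.append(func(xc, *args).imag / h)
    return grad


def make_fprime(func, grad_helper):
    """Adapt grad_helper(x, func, args=args) to the fprime(x, *args) form."""
    def fprime(x, *args):
        return grad_helper(x, func, args=args)
    return fprime


def objective(params, scale):
    # Toy objective standing in for the negative log-likelihood:
    # scale * (a**2 + 3*a*b)
    a, b = params
    return scale * (a * a + 3 * a * b)


fprime = make_fprime(objective, approx_grad_cs)

# Called the way the optimizer calls it: fprime(x, *args)
grad = fprime([1.0, 2.0], 2.0)
print(grad)  # analytic gradient is [scale*(2a + 3b), scale*3a] = [16, 6]
```

The resulting fprime could then be passed along with args, e.g. fmin_l_bfgs_b(objective, x0, fprime=fprime, args=(scale,)). In real use the helper would be approx_fprime_cs operating on numpy arrays rather than this list-based stand-in.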

Skipper