[SciPy-User] [SciPy-user] fast small matrix multiplication with cython?

phubaba phubaba at gmail.com
Tue Jun 7 12:53:19 EDT 2011


Hello Skipper,

Is there any chance you could explain the fast recursion algorithm or supply
the Cython code you used to implement it?

Thanks,
Rob



jseabold wrote:
> 
> On Thu, Dec 9, 2010 at 4:33 PM, Skipper Seabold <jsseabold at gmail.com>
> wrote:
>> On Wed, Dec 8, 2010 at 11:28 PM,  <josef.pktd at gmail.com> wrote:
>>>>
>>>> It looks like I don't save too much time with just Python/scipy
>>>> optimizations.  Apparently ~75% of the time is spent in l-bfgs-b,
>>>> judging by its user time output and the profiler's CPU time output(?).
>>>>  Non-cython versions:
>>>>
>>>> Brief and rough profiling on my laptop for ARMA(2,2) with 1000
>>>> observations.  Optimization uses fmin_l_bfgs_b with m = 12 and iprint
>>>> = 0.
>>>
>>> Completely different idea: How costly are the numerical derivatives in
>>> l-bfgs-b?
>>> With l-bfgs-b, you should be able to replace the derivatives with the
>>> complex step derivatives that calculate the loglike function value and
>>> the derivatives in one iteration.
>>>
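For anyone unfamiliar with the complex-step idea mentioned above, here is a minimal NumPy sketch. It is not the statsmodels approx_fprime_cs code; the names complex_step_grad and toy_objective are made up for illustration, and it assumes the objective is written so that it accepts complex input (no abs(), no comparisons on the parameters). Each perturbed evaluation gives one partial derivative in its imaginary part, and its real part equals f(x) up to O(h**2).

```python
import numpy as np

def complex_step_grad(func, x, args=(), h=1e-20):
    """Approximate the gradient of a real-valued func at x via the complex step.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        xc = x.astype(complex)      # fresh complex copy of the parameters
        xc[i] += 1j * h             # perturb one parameter along the imaginary axis
        grad[i] = func(xc, *args).imag / h
    return grad

def toy_objective(x, scale):
    # Stand-in for a log-likelihood; it must work with complex x.
    return scale * np.sum(x ** 2) + np.sin(x[0])

x0 = np.array([0.5, -1.0, 2.0])
g = complex_step_grad(toy_objective, x0, args=(3.0,))
print(g)
```

Because there is no subtraction of nearly equal function values, the step h can be made tiny without the cancellation error that limits forward differences.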
>>
>> I couldn't figure out how to use it without some hacks.  fmin_l_bfgs_b
>> calls both f and fprime as (x, *args), but approx_fprime and
>> approx_fprime_cs have to be called as approx_fprime(x, func, args=args),
>> and they in turn call func(x, *args).  I changed fmin_l_bfgs_b to make
>> the call like this for the gradient, and I get (on a different computer):
>>
>>
>> Using approx_fprime_cs
>> -----------------------------------
>>         861609 function calls (861525 primitive calls) in 3.337 CPU seconds
>>
>>   Ordered by: internal time
>>
>>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>       70    1.942    0.028    3.213    0.046 kalmanf.py:504(loglike)
>>   840296    1.229    0.000    1.229    0.000 {numpy.core._dotblas.dot}
>>       56    0.038    0.001    0.038    0.001 {numpy.linalg.lapack_lite.zgesv}
>>      270    0.025    0.000    0.025    0.000 {sum}
>>       90    0.019    0.000    0.019    0.000 {numpy.linalg.lapack_lite.dgesdd}
>>       46    0.013    0.000    0.014    0.000 function_base.py:494(asarray_chkfinite)
>>      162    0.012    0.000    0.014    0.000 arima.py:117(_transparams)
>>
>>
>> Using approx_grad = True
>> ---------------------------------------
>>         1097454 function calls (1097370 primitive calls) in 3.615 CPU seconds
>>
>>   Ordered by: internal time
>>
>>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>       90    2.316    0.026    3.489    0.039 kalmanf.py:504(loglike)
>>  1073757    1.164    0.000    1.164    0.000 {numpy.core._dotblas.dot}
>>      270    0.025    0.000    0.025    0.000 {sum}
>>       90    0.020    0.000    0.020    0.000 {numpy.linalg.lapack_lite.dgesdd}
>>      182    0.014    0.000    0.016    0.000 arima.py:117(_transparams)
>>       46    0.013    0.000    0.014    0.000 function_base.py:494(asarray_chkfinite)
>>       46    0.008    0.000    0.023    0.000 decomp_svd.py:12(svd)
>>       23    0.004    0.000    0.004    0.000 {method 'var' of 'numpy.ndarray' objects}
>>
>>
>> Definitely fewer function calls and a little faster, but I had to write
>> some hacks to get it to work.
>>
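The signature mismatch described above can also be bridged with a small closure instead of patching the optimizer. The sketch below is only an illustration under stated assumptions: approx_fprime_fd is a hypothetical forward-difference stand-in for a gradient routine called as (x, func, args=args), and make_fprime and toy_objective are made-up names. It is not the change Skipper made to fmin_l_bfgs_b.

```python
import numpy as np

def approx_fprime_fd(x, func, epsilon=1e-8, args=()):
    """Forward-difference gradient with an (x, func, args=...) style signature.

    Hypothetical stand-in for a numerical-gradient routine.
    """
    x = np.asarray(x, dtype=float)
    f0 = func(x, *args)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = epsilon
        grad[i] = (func(x + step, *args) - f0) / epsilon
    return grad

def make_fprime(func, gradient=approx_fprime_fd):
    """Wrap gradient(x, func, args=args) as fprime(x, *args)."""
    def fprime(x, *args):
        return gradient(x, func, args=args)
    return fprime

def toy_objective(x, scale):
    # Stand-in for the negative log-likelihood.
    return scale * np.sum((x - 1.0) ** 2)

fprime = make_fprime(toy_objective)
# Called the same way an optimizer that uses fprime(x, *args) would call it:
g = fprime(np.array([0.0, 2.0]), 3.0)
print(g)
# The wrapped function could then be passed along the lines of
# scipy.optimize.fmin_l_bfgs_b(toy_objective, x0, fprime=fprime, args=(3.0,))
```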
> 
> This is more like it!  With fast recursions in Cython:
> 
>          15186 function calls (15102 primitive calls) in 0.750 CPU seconds
> 
>    Ordered by: internal time
> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        18    0.622    0.035    0.625    0.035 kalman_loglike.pyx:15(kalman_loglike)
>       270    0.024    0.000    0.024    0.000 {sum}
>        90    0.019    0.000    0.019    0.000 {numpy.linalg.lapack_lite.dgesdd}
>       156    0.013    0.000    0.013    0.000 {numpy.core._dotblas.dot}
>        46    0.013    0.000    0.014    0.000 function_base.py:494(asarray_chkfinite)
>       110    0.008    0.000    0.010    0.000 arima.py:118(_transparams)
>        46    0.008    0.000    0.023    0.000 decomp_svd.py:12(svd)
>        23    0.004    0.000    0.004    0.000 {method 'var' of 'numpy.ndarray' objects}
>        26    0.004    0.000    0.004    0.000 tsatools.py:109(lagmat)
>        90    0.004    0.000    0.042    0.000 arima.py:197(loglike_css)
>        81    0.004    0.000    0.004    0.000 {numpy.core.multiarray._fastCopyAndTranspose}
> 
> I can live with this for now.
> 
> Skipper

