[SciPy-User] Curve fitting questions

Tue Oct 19 22:00:02 EDT 2010

On Tue, Oct 19, 2010 at 4:17 PM,  <josef.pktd at gmail.com> wrote:
> On Tue, Oct 19, 2010 at 3:20 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
>> On Tue, Oct 19, 2010 at 1:04 PM,  <josef.pktd at gmail.com> wrote:
>>> I think you could take logs and you would have a linear function in
>>> param=log(x), and you could use linalg to solve for param, and then
>>> transform back exp(param).  Or this would give you a starting value if
>>> you want the non-linear optimization.
>>
>> Using 3 and 5 data-points the curve_fit usually does a good job, even without
>> the initial estimates provided. When it's necessary we usually constrain the
>> initial parameters with max CCN concentration for C (param[0]) and a typical
>> k (param[1]) values.
>>
>> This even works with 2 data-points:
>>
>> I[34]: ccn_ss1 = [0.27, 0.34]
>>
>> I[35]: ccn_conc1 = np.array([383.51237409766452, 424.82669523141652])
>>
>> I[36]: tfit2, pcov2 = curve_fit(my_ck, ccn_ss1, ccn_conc1, p0=(424,
>> 0.5), ftol=1)
>>
>> provides me reasonable estimations. However, having another data-point
>> would surely
>> improve the quality of the fit and estimations.
>>
>>>> ccn_ss1 = 0.27
>>>> ccn_conc1 = 383.51237409766452
>>>> # One data point estimation fails with IndexError: index out of range for array
>>>> tfit3, pcov3 = curve_fit(my_ck, ccn_ss1, ccn_conc1, p0=tfit1, ftol=1)
>>>
>>> If you have one parameter to estimate and only one observations, then
>>> you should be able to solve it exactly with one of the solvers/
>>> rootfinders in scipy. optimize.
>>>
>>> Josef
>>
>> I want to estimate two parameters using one observation (which is a
>> data-pair for my case --one for ccn_ss1 and one for ccn_conc1.)
>> Probably, in this current version fsolve can't do give me any roots.
>
> I remembered curve_fit wrongly, I didn't remember it switched data and
> parameters in the argument list compare to leastsq.
>
> I need to reread your example (later today).
>
> Taking logs and using linalg is still more efficient (unless you
> insist on an additive error term).

>>> ccn_ss1 = [0.27, 0.34, 0.57]
>>> ccn_conc1 = np.array([383.51237409766452, 424.82669523141652, 511.48197391304342])

>>> def my_ck(x, a, b):
   return a*x**b

>>> tfit1, pcov1 = curve_fit(my_ck, ccn_ss1, ccn_conc1)
>>> tfit1
array([  6.33851519e+02,   3.78527717e-01])

>>> stats.linregress(np.log(ccn_ss1), np.log(ccn_conc1))
(0.38096158507713485, 6.4541006630438478, 0.99864456413652103,
0.033150010788760682, 0.019855348412039904)
>>> np.exp(6.4541006630438478)
635.30211858377766
>>> stats.linregress(np.log(ccn_ss1[:-1]), np.log(ccn_conc1[:-1]))
(0.44381311635631338, 6.5304711876039025, 1.0, nan, nan)
>>> np.exp(6.5304711876039025)
685.72123873003966
>>> stats.linregress(np.log(ccn_ss1[:-2]), np.log(ccn_conc1[:-2]))
(nan, nan, 0.0, nan, nan)

strangely leastsq/curve_fit has a better fit than linregress for exact
solution (2 observations)

>>> tfit1, pcov1 = curve_fit(my_ck, ccn_ss1[:-1], ccn_conc1[:-1], p0=(1,1))
>>> my_ck(ccn_ss1[:-1], *tfit1)
array([ 383.5123741 ,  424.82669523])
>>> my_ck(ccn_ss1[:-1], *tfit1) - ccn_conc1[:-1]
array([ 0.,  0.])
>>> my_ck(np.asarray(ccn_ss1[:-1]), np.exp(6.5304711876039025), 0.44381311635631338) - ccn_conc1[:-1]
array([  1.70530257e-13,   3.41060513e-13])

If you have reasonably good information about the function or the
range of starting values, then this always works better and faster for
non-linear optimization. An interesting alternative that James was
using for distribution estimation, is to use a global optimizer
(differential evolution) in combination with a non-linear optimizer.
You could also just draw several random starting values. Since your
optimization problem is very small, it would still be fast.

Josef

>
> Josef
>
>>
>> def my_ck(x, a, b):
>>    return a*x**b
>>
>> fsolve(my_ck, x0=tfit1, args=(ccn_ss1, ccn_conc1), xtol=1)
>>
>> rather gives a couple of overflow warnings:
>> Warning: overflow encountered in power
>>
>> In one data-pair situation my function looks like:
>>
>> a*x**b = 383.5
>>
>> Now there are two unknowns providing the x as ccn_ss1 as a*0.27**b =
>> 383.5. I should make one more assumption otherwise it is still
>> unsolvable. Probably making an assumption for a, then I can hand solve
>> this easily. OK, with a = 350 assumption in 350*0.27**b == 383.5, here
>> solving for b results ~ -0.065
>>
>> With some modifications on the original fitfunc:
>>
>> def my_ck(x):
>>    return 350*0.27**x - 383
>>
>> fsolve nicely estimates what I want.
>>
>> fsolve(my_ck, x0=0.5)
>> -0.068815047568104443
>>
>>
>>
>>
>>
>> --
>> Gökhan
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>