Gregor, many thanks for your input. I noted a few useful points:

1) The option to solve the normal equations directly is indeed useful when m >> n.

2) I read the tech report, and the approach looks really good, considering how often the approximation is sought as a linear combination of basis functions (which in turn depend on tunable parameters).

I didn't understand the reason to store the Jacobian in transposed form. What significant difference does it make?

Subject: Re: [SciPy-Dev] GSOC Optimization Project
From: gregor.thalhammer@gmail.com
Date: Mon, 9 Mar 2015 19:42:45 +0100
CC: n59_ru@hotmail.com
To: scipy-dev@scipy.org

On 09.03.2015, at 13:06, Nikolay Mayorov <n59_ru@hotmail.com> wrote:

> Hi! I was tricked by stackedit, so I repeat my message with the working link. Sorry for that.
>
> I did some research on suitable algorithmic approaches and want to present my pre-proposal:
> https://stackedit.io/viewer#!provider=gist&gistId=d806ad6048c7e3df4797&filename=GSOC_proposal.md
>
> I have to admit that I'm a bit overwhelmed by the amount of papers, books, etc. on this subject. I tried my best to come up with some kind of practical approach. Now I need feedback / review. I'm also eager to hear a word from Pauli Virtanen, as we don't know yet whether this project can happen at all.

Dear Nikolay,

I hope you will be successful with your proposal. In the past I have been unhappy with the current MINPACK-based implementation and ended up writing my own Python-based implementation of the LM algorithm, specialized for my needs. Several other translations are floating around, so it seems there is a real need for a more flexible (class-based) implementation that provides easy customization by users. Years ago, my use case was fitting of images (2D Gaussians or slightly more complicated models), directly calculating the Jacobian (no numeric differentiation). Speed was important. I just want to share some findings and ideas I would be happy to see covered by future improvements to the scipy code:

* In my case (many observations, few parameters) directly solving the normal equations was a lot faster; this was the main reason for me not to use the MINPACK implementation (see the sketch below).
* For the QR decomposition, using the scipy implementation instead of MINPACK's gives better performance, especially when using optimized libraries (MKL, ATLAS, ...).
* Calculating the fitting function and the Jacobian at the same time gave another speed boost, since the function value can be reused to speed up the calculation of the Jacobian.
* If I remember correctly, MINPACK expects the derivative of the function with respect to the k-th parameter to be stored in the k-th column of a 2D array. This does not fit nicely with the standard C-contiguous layout of numpy. Storing it in J[k] instead simplifies the code and makes extension to 2D functions easier (no shape manipulation to squeeze a 2D array into a 1D shape).
* In my use case several parameters acted as linear parameters, e.g., a scaling factor and an offset. For this special case the method of 'separable nonlinear least squares', or 'variable projection', reduced the number of iterations a lot and improved the robustness (see the sketch after this message). It essentially only modifies how the function and Jacobian are calculated, so it does not require changing the LM algorithm itself. But providing some means to support this in a high-level interface would be nice.

My starting point was a technical report by H. B. Nielsen [1]. I hope some of these ideas are useful for your proposal.
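A minimal sketch of the normal-equations step from the first point above, written with a damped (Levenberg-Marquardt style) system; the function name and the lam parameter are illustrative assumptions, not taken from Gregor's code:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lm_step_normal_equations(J, r, lam):
    """One damped Gauss-Newton step: solve (J^T J + lam*I) p = -J^T r.

    When m >> n this is fast because the linear system is only n x n,
    at the price of squaring the condition number compared to QR.
    """
    A = np.dot(J.T, J)                 # n x n Gram matrix
    A[np.diag_indices_from(A)] += lam  # damping on the diagonal
    g = np.dot(J.T, r)                 # gradient of 0.5 * ||r||^2
    c, lower = cho_factor(A)           # A is SPD for lam > 0
    return -cho_solve((c, lower), g)   # the step p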
If you are interested, I put my incomplete and poorly documented code at https://gist.github.com/geggo/92c77159a9b8db5aae73

Gregor

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.6004&rep=rep1&type=pdf
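To illustrate the 'variable projection' point from Gregor's list, here is a minimal sketch assuming a model with one nonlinear parameter tau and two linear parameters (an amplitude and an offset); the model and all names are illustrative, not from Gregor's gist:

import numpy as np
from scipy.optimize import leastsq

def separable_residuals(theta, t, y):
    # Model: y = a * exp(-t / tau) + b. Only tau is exposed to the
    # optimizer; the linear parameters a and b are recovered by
    # linear least squares at every evaluation.
    tau = theta[0]
    Phi = np.column_stack([np.exp(-t / tau), np.ones_like(t)])
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]  # best a, b for this tau
    return np.dot(Phi, coef) - y

# Only the nonlinear parameter enters the LM iteration:
t = np.linspace(0.0, 5.0, 200)
y = 2.0 * np.exp(-t / 1.5) + 0.3
tau_opt, ier = leastsq(separable_residuals, [1.0], args=(t, y))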
On 09.03.2015, at 21:05, Nikolay Mayorov <n59_ru@hotmail.com> wrote:
> Gregor, many thanks for your input. I noted a few useful points:
>
> 1) The option to solve the normal equations directly is indeed useful when m >> n.
>
> 2) I read the tech report, and the approach looks really good, considering how often the approximation is sought as a linear combination of basis functions (which in turn depend on tunable parameters).
>
> I didn't understand the reason to store the Jacobian in transposed form. What significant difference does it make?
This is only a minor optimization that improves the memory access pattern. I forgot the details, but the current scipy leastsq also offers the option to switch to transposed storage to improve performance (see the col_deriv argument, off by default). Linear algebra (qr, dot) is faster with Fortran-contiguous arrays, or with C-contiguous arrays in transposed storage. A simple example showing that the memory access pattern can make a big difference:

In [30]: a = arange(10000000)

In [31]: b = a[::5]

In [32]: c = a[:2000000]

In [33]: %timeit dot(b,b)
100 loops, best of 3: 3.81 ms per loop

In [34]: %timeit dot(c,c)
1000 loops, best of 3: 1.09 ms per loop

And I discovered that the code usually gets simpler. All this might be irrelevant for small problems or for functions that are expensive to compute.

Gregor
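A small sketch of the transposed layout (the one-exponential model and all names here are assumed for illustration, not taken from the thread's code); it also shows the function value being reused for the Jacobian, as mentioned in Gregor's earlier message:

import numpy as np

def model_and_jacobian_transposed(x, t):
    # Parameters x = (a, tau); model f(t) = a * exp(-t / tau).
    a, tau = x
    e = np.exp(-t / tau)          # computed once, reused below
    f = a * e                     # function value
    Jt = np.empty((2, t.size))    # shape (n_params, m_observations)
    Jt[0] = e                     # df/da, a C-contiguous row
    Jt[1] = a * t / tau ** 2 * e  # df/dtau, reuses e
    return f, Jt

# With this layout the Gram matrix J^T J becomes dot(Jt, Jt.T), a
# product of C-contiguous arrays with a favorable access pattern.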
participants (2)

- Gregor Thalhammer
- Nikolay Mayorov