Mailman 3 September 2008 - SciPy-Dev

SLSQP Constrained Optimizer Status
by Rob Falck 25 Nov '13

I'm currently implementing the Sequential Least Squares Quadratic Programming (SLSQP) optimizer, by Dieter Kraft, for use in Scipy. The Fortran code being wrapped with F2PY is here: http://www.netlib.org/toms/733 (its use within Scipy has been cleared).

SLSQP provides for bounds on the independent variables, as well as equality and inequality constraint functions, which is a capability that doesn't exist in scipy.optimize.

Currently the code works, although the constraint normals are being generated by approximation. I'm working on a way to pass these in. I think the most elegant way will be a single function that returns the matrix of constraint normals.

For a demonstration of what the code can do, here is an optimization of

f(x,y) = 2xy + 2x - x**2 - 2y**2

Example 14.2 in Chapra & Canale gives the maximum as x=2.0, y=1.0. The unbounded optimization tests find this solution. As expected, it's faster when derivatives are provided rather than approximated.

Unbounded optimization. Derivatives approximated.
Elapsed time: 1.45792961121 ms
Results [[1.9999999515712266, 0.99999996181577444], -1.9999999999999984, 4, 0, 'Optimization terminated successfully.']

Unbounded optimization. Derivatives provided.
Elapsed time: 1.03211402893 ms
Results [[1.9999999515712266, 0.99999996181577444], -1.9999999999999984, 4, 0, 'Optimization terminated successfully.']

The following example uses an equality constraint to find the optimum when x=y.

Bound optimization. Derivatives approximated.
Elapsed time: 1.384973526 ms
Results [[0.99999996845920858, 0.99999996845920858], -0.99999999999999889, 4, 0, 'Optimization terminated successfully.']

I've tried to conform to the syntax used by the other optimizers in scipy.optimize. The function definition and doc string are below. If anyone is interested in testing it out, let me know.

def fmin_slsqp( func, x0 , eqcons=[], f_eqcons=None, ieqcons=[], f_ieqcons=None,
                bounds = [], fprime = None, fprime_cons=None,args = (),
                iter = 100, acc = 1.0E-6, iprint = 1, full_output = 0,
                epsilon = _epsilon ):
    """
    Minimize a function using Sequential Least SQuares Programming

    Description:
        Python interface function for the SLSQP Optimization subroutine
        originally implemented by Dieter Kraft.

    Inputs:
        func        - Objective function (in the form func(x, *args))
        x0          - Initial guess for the independent variable(s).
        eqcons      - A list of functions of length n such that
                      eqcons[j](x0,*args) == 0.0 in a successfully
                      optimized problem
        f_eqcons    - A function of the form f_eqcons(x, *args) that
                      returns an array in which each element must equal
                      0.0 in a successfully optimized problem. If
                      f_eqcons is specified, eqcons is ignored.
        ieqcons     - A list of functions of length n such that
                      ieqcons[j](x0,*args) >= 0.0 in a successfully
                      optimized problem
        f_ieqcons   - A function of the form f_ieqcons(x0, *args) that
                      returns an array in which each element must be
                      greater or equal to 0.0 in a successfully
                      optimized problem. If f_ieqcons is specified,
                      ieqcons is ignored.
        bounds      - A list of tuples specifying the lower and upper
                      bound for each independent variable
                      [ (xl0, xu0), (xl1, xu1), ...]
        fprime      - A function that evaluates the partial derivatives
                      of func
        fprime_cons - A function of the form f(x, *args) that returns
                      the m by n array of constraint normals. If not
                      provided, the normals will be approximated.
                      Equality constraint normals precede inequality
                      constraint normals.
        args        - A sequence of additional arguments passed to func
                      and fprime
        iter        - The maximum number of iterations (int)
        acc         - Requested accuracy (float)
        iprint      - The verbosity of fmin_slsqp.
                      iprint <= 0 : Silent operation
                      iprint == 1 : Print summary upon completion (default)
                      iprint >= 2 : Print status of each iterate and summary
        full_output - If 0, return only the minimizer of func (default).
                      Otherwise, output final objective function and
                      summary information.
        epsilon     - The step size for finite-difference derivative
                      estimates.

    Outputs: ( x, { fx, gnorm, its, imode, smode })
        x     - The final minimizer of func.
        fx    - The final value of the objective function.
        its   - The number of iterations.
        imode - The exit mode from the optimizer, as an integer.
        smode - A string describing the exit mode from the optimizer.

    Exit modes are defined as follows:
        -1 : Gradient evaluation required (g & a)
         0 : Optimization terminated successfully.
         1 : Function evaluation required (f & c)
         2 : Number of equality constraints larger than number of
             independent variables
         3 : More than 3*n iterations in LSQ subproblem
         4 : Inequality constraints incompatible
         5 : Singular matrix E in LSQ subproblem
         6 : Singular matrix C in LSQ subproblem
         7 : Rank-deficient equality constraint subproblem HFTI
         8 : Positive directional derivative for linesearch
         9 : Iteration limit exceeded
    """

--
- Rob Falck
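
The test problem in the message above can be checked by hand. The following is a minimal pure-Python sketch, not the fmin_slsqp wrapper itself; the names objective, gradient, approx_gradient and constraint_normal, and the step size 1.5e-8, are illustrative assumptions. It writes the objective negated (which is why the reported optimum values are negative), compares the analytic gradient with a forward-difference approximation of the kind used when derivatives are not provided, and checks the first-order condition at the equality-constrained optimum, where the objective gradient is a multiple of the constraint normal.

```python
def objective(x):
    """Negated f(x, y) = 2xy + 2x - x**2 - 2y**2, so minimizing finds the maximum."""
    x0, x1 = x
    return -(2.0 * x0 * x1 + 2.0 * x0 - x0 ** 2 - 2.0 * x1 ** 2)


def gradient(x):
    """Analytic partial derivatives of the negated objective."""
    x0, x1 = x
    return [-(2.0 * x1 + 2.0 - 2.0 * x0), -(2.0 * x0 - 4.0 * x1)]


def approx_gradient(func, x, epsilon=1.5e-8):
    """Forward-difference estimate of the gradient (step size is an assumption)."""
    f0 = func(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += epsilon
        grad.append((func(shifted) - f0) / epsilon)
    return grad


def constraint_normal(x):
    """Gradient of the equality constraint c(x) = x0 - x1 (the x = y case)."""
    return [1.0, -1.0]


unconstrained = [2.0, 1.0]
constrained = [1.0, 1.0]

print(objective(unconstrained), gradient(unconstrained))
print(approx_gradient(objective, unconstrained))
print(objective(constrained), gradient(constrained), constraint_normal(constrained))
```

At (2, 1) both gradients vanish up to finite-difference noise; at (1, 1) the objective gradient is -2 times the constraint normal, which is the stationarity condition a constrained optimizer looks for.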

Re: [SciPy-dev] [Numpy-discussion] Proposal: scipy.spatial
by Anne Archibald 13 Oct '08

2008/9/30 Peter <numpy-discussion(a)maubp.freeserve.co.uk>:
> On Tue, Sep 30, 2008 at 5:10 AM, Christopher Barker
> <Chris.Barker(a)noaa.gov> wrote:
>>
>> Anne Archibald wrote:
>>> I suggest the creation of
>>> a new submodule of scipy, scipy.spatial,
>>
>> +1
>>
>> Here's one to consider:
>> http://pypi.python.org/pypi/Rtree
>> and perhaps other stuff from:
>> http://trac.gispython.org/spatialindex/wiki
>> which I think is LGPL -- can scipy use that?
>
> There is also a KDTree module in Biopython (which is under a BSD/MIT
> style licence),
> http://biopython.org/SRC/biopython/Bio/KDTree/
>
> The current version is in C, there is an older version available in
> the CVS history in C++ too,
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?…

I think the profusion of different implementations is an argument for including this in scipy. I think it is also an argument for providing a standard interface with (at least potentially) several different implementations. At the moment, that proposed interface looks like:

T = KDTree(data)

distances, indices = T.query(xs) # single nearest neighbor

distances, indices = T.query(xs, k=10) # ten nearest neighbors

distances, indices = T.query(xs, k=None, distance_upper_bound=1.0) # all within 1 of x

In the first two cases, missing neighbors are represented with an infinite distance and an invalid index. In the last case, distances and indices are both either lists (if there's only one query point) or object arrays of lists (if there are many query points). If only one neighbor is requested, the array does not have a dimension of length 1 in the which-neighbor position. If (potentially) many neighbors are returned, they are sorted by distance, nearest first.

What do you think of this interface?

It may make sense to provide additional kinds of query - nearest neighbors between two trees, for example - some of which would be available only for some implementations.

Anne
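
The return conventions described above can be illustrated without any tree at all. The following is a brute-force sketch for a single query point only; the class name BruteForceTree, the use of len(data) as the "invalid index", and the inclusive comparison against distance_upper_bound are assumptions made for illustration, not part of the proposal.

```python
import math


def _distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class BruteForceTree:
    """Hypothetical stand-in with the proposed query signature (no actual tree)."""

    def __init__(self, data):
        self.data = [tuple(p) for p in data]
        self.n = len(self.data)

    def query(self, x, k=1, distance_upper_bound=math.inf):
        # Sort all points by distance, nearest first, then apply the bound.
        hits = sorted((_distance(x, p), i) for i, p in enumerate(self.data))
        hits = [(d, i) for d, i in hits if d <= distance_upper_bound]
        if k is None:
            # Variable-length result: plain lists of every point within the bound.
            return [d for d, _ in hits], [i for _, i in hits]
        hits = hits[:k]
        # Pad missing neighbors with an infinite distance and an invalid index.
        hits += [(math.inf, self.n)] * (k - len(hits))
        if k == 1:
            # No length-1 "which neighbor" dimension for a single neighbor.
            return hits[0]
        return [d for d, _ in hits], [i for _, i in hits]


tree = BruteForceTree([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
print(tree.query((0.0, 0.0)))
print(tree.query((0.0, 0.0), k=5))
print(tree.query((0.0, 0.0), k=None, distance_upper_bound=1.0))
```

A fixed k always yields exactly k entries, which is what makes the padded form easy to store in a regular array; k=None gives up that regularity in exchange for returning only real neighbors.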

Fwd: tests and partial patch for scipy.stats.distributions
by josef.pktd＠gmail.com 06 Oct '08

removed patch because of file size, mail bounced

---------- Forwarded message ----------
From: josef.pktd(a)gmail.com
Date: Mon, 29 Sep 2008 00:00:25 -0400
Subject: tests and partial patch for scipy.stats.distributions
To: scipy-dev(a)scipy.org

To help in bug hunting and patching of the discrete distributions in scipy.stats, I wrote several test scripts that could be incorporated in scipy tests.

The attachment contains the corrected scipy.stats.distributions.py file and 2 test files. The tests check all discrete distributions at predefined (not randomly chosen) values. Some distributions have some wrong results for some other range of parameters.

To try out the patch, it is possible just to backup the current scipy.stats.distributions.py and copy the patched version in its place.

Right now, I still left many comments (and some commented out failed attempts for fixing some persistent bugs). In direct comparison with the current trunk file this might help in understanding what the underlying bug is.

Since there are a large number of bug fixes in this patch, if any of them get incorporated in scipy, then they can be selectively copied. Many bugs only show up once other bugs are fixed.

In terms of importance and sequence the bug fix categories are roughly:

* nin correction for vectorize with *args: without this, some methods raise exceptions, e.g. _ppf (I think after many shaky attempts, I found the correct solution)
* _ppf did not handle infinite support properly and requires a .tolist in call to ``place`` (no idea why, but this works)
* _drv2_moment(self, n, *args) was completely broken, it didn't even have a return and is the base for generic moment calculations

individual fixes:

* recipinvgauss_gen: rvs added, didn't work before
* def moment: for 2nd moment returned the mean (both discrete and continuous distributions)
* randint: -1 missing in _ppf, definition differs from that in the description pdf
* dlaplace: _ppf gives wrong results in tests, replace with generic calculation

(Note: the first vectorize nin correction in continuous distributions is still at the wrong spot. I was only looking at discrete distributions in the last few days, since I figured out how vectorize nin works)

If you want a cleaner patch or more information, I can provide it next week.

Because I never used numpy/scipy heavily, it took me a long time to figure out what is going on, but it should be more obvious to an expert.

But overall my impression is that using features in scipy.stats.distributions that are not commonly used is pretty risky: until recently they didn't have any stringent tests, and they do not have any reasonably large test coverage. For serious work, I would want to make sure that the results are actually correct.

There are still several things that are not covered, e.g. handling of loc and scale, other methods, correctness over full range of parameters. I haven't checked the continuous distributions since the initial fuzz tests, but the test coverage for all methods looks pretty low.

Josef

Test of Basic Properties
------------------------

The first test just tests for basic properties and their consistency, e.g. cdf, pmf, ppf, stats and moments and the internal methods e.g. _ppf. The tests check quite a bit for internal consistency (private methods) since I used the tests for debugging and trying out fixes.

>python C:\Programs\Python25\Scripts\nosetests-script.py -s test_discrete_basic.py
...

with current trunk, I get for this test:

Ran 112 tests in 0.140s
FAILED (errors=41, failures=15)

after the bugfixes, I get:

Ran 112 tests in 8.672s
FAILED (failures=5)

some failures in binom, dlaplace, zipf are still remaining

Chisquare Test
--------------

the second test is a chisquare test for the random variables to be close to the theoretical distribution as defined by .cdf

with current trunk I get for this test:

>python C:\Programs\Python25\Scripts\nosetests-script.py -s test_ndiscrete.py
Ran 12 tests in 0.453s
FAILED (errors=5)

after patching scipy.stats.distribution the only test failure left is in logser, where I think the numpy.random random numbers are wrong

>python C:\Programs\Python25\Scripts\nosetests-script.py -s test_ndiscrete.py
bernoulli. binom. boltzmann. dlaplace. geom. hypergeom. logserF nbinom. planck. poisson. randint. zipf.
======================================================================
FAIL: test_ndiscrete.test_discrete_rvscdf('logser', (0.59999999999999998,))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.10.3-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\Josef\work-oth\sort\pypi\test_ndiscrete.py", line 76, in checkchisquare_discrete
    assert (pval > alpha), 'chisquare - test for %s at arg = %s' % (distname,str(arg))
AssertionError: chisquare - test for logser at arg = (0.59999999999999998,)
----------------------------------------------------------------------
Ran 12 tests in 19.781s
FAILED (failures=1)

scipy.stats tests: no failures
-----------------------------------------

running the current trunk test for scipy.stats on the pre- and post patch distributions.py gives no errors:

>>> from scipy import stats
>>> stats.test()
Running unit tests for scipy.stats
NumPy version 1.2.0rc2
NumPy is installed in C:\Programs\Python25\lib\site-packages\numpy
SciPy version 0.7.0.dev
SciPy is installed in C:\Josef\_progs\virtualpy25\envscipy\lib\site-packages\scipy
Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
nose version 0.10.3
................................................................................
C:\Josef\_progs\virtualpy25\envscipy\lib\site-packages\scipy\stats\morestats.py:618: UserWarning: Ties preclude use of exact statistic.
  warnings.warn("Ties preclude use of exact statistic.")
............................................................C:\Programs\Python25\lib\site-packages\numpy\lib\function_base.py:343: Warning:
The semantics of histogram has been modified in the current release to fix long-standing issues with outliers handling. The main changes concern
1. the definition of the bin edges, now including the rightmost edge, and
2. the handling of upper outliers, now ignored rather than tallied in the rightmost bin.
The previous behaviour is still accessible using `new=False`, but is scheduled to be deprecated in the next release (1.3).
*This warning will not printed in the 1.3 release.*
Use `new=True` to bypass this warning.
Please read the docstring for more information.
  """, Warning)
................................................................................
.............
----------------------------------------------------------------------
Ran 233 tests in 2.266s

OK
<nose.result.TextTestResult run=233 errors=0 failures=0>
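
The kind of consistency check described in the message can be sketched in a few lines of plain Python. This is not the attached test file; it is a minimal illustration on a Poisson distribution, with hypothetical helper names (pmf, cdf, ppf, moment) and an assumed truncation point for the sums. It checks that a generic ppf inverts the cdf and that generic moment sums reproduce the known mean and second moment, which would catch a moment routine that returns the mean for n = 2.

```python
import math


def pmf(k, mu):
    """Poisson probability mass, computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def cdf(k, mu):
    """P(X <= k) by direct summation of the pmf."""
    return sum(pmf(j, mu) for j in range(k + 1))


def ppf(q, mu, upper=1000):
    """Generic inverse cdf: smallest k with cdf(k) >= q (search capped at `upper`)."""
    k = 0
    while cdf(k, mu) < q and k < upper:
        k += 1
    return k


def moment(n, mu, upper=200):
    """n-th raw moment by summation, truncated at `upper` (an assumption)."""
    return sum(k ** n * pmf(k, mu) for k in range(upper))


mu = 3.5
print([ppf(cdf(k, mu), mu) for k in range(10)])
print(moment(1, mu), moment(2, mu))
```

For a Poisson distribution the mean is mu and the second raw moment is mu + mu**2, so the two printed moments should differ clearly.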

PyArray_FromDims and friends
by Zachary Pincus 06 Oct '08

Hi all,

In testing out svn scipy and numpy, I noticed some run-time errors from scipy.interpolate because the _fitpack module's C sources use PyArray_FromDims and PyArray_FromDimsAndData, which are now deprecated in numpy svn.

I opened a ticket and made a patch for this particular case, but I'm not sure if there's an overall strategy (someone writes a very good regexp, say, and lets it loose), or if these will be fixed piecemeal. If the latter, here's the patch: http://scipy.org/scipy/scipy/ticket/723

Zach

Question about Kaiser Implementation in firwin
by Buehler, Eric (AGRE) 04 Oct '08

Hello,

I have been using the signal.firwin FIR design tool and I have a question about the implementation.

Scipy-0.4.8
Numpy-0.9.6

I am generating a kaiser window with the following parameters:

N = 128
Cutoff = ~4e7
width = .3

All of the special kaiser generation parameters in the function follow Oppenheim and Schafer well. Even the sinc function for linear phase follows it exactly. However, the last line of the firwin function is causing me some heartburn.

filter_design.py
1538 win = get_window(window,N,fftbins=1)
1539 alpha = N//2
1540 m = numpy.arange(0,N)
1541 h = win*special.sinc(cutoff*(m-alpha))
1542 return h / sum(h)

Line 1542 of filter_design.py, "return h / sum(h)", normalizes the function where it doesn't seem necessary, at least in the kaiser window case. Without the normalization, the kaiser window already returns a value of 1 at the zero frequency point. This normalization scales all of the data, making the window difficult to use in the frequency domain.

Can someone point me to the rationale for this line? Looking at the code, this seems to be a pretty recent change (within the last year/year and a half).

Thanks,
Eric Buehler
eric <dot> buehler <at> smiths-aerospace <dot> com
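
One way to see what the division on line 1542 does: the zero-frequency (DC) gain of an FIR filter is the sum of its taps, so dividing by sum(h) forces that gain to exactly 1 whatever window is used. The sketch below mirrors the quoted lines in plain Python under stated assumptions: a symmetric Hamming window stands in for get_window, sinc is the normalized sin(pi x)/(pi x), and the cutoff value 0.25 is purely illustrative. The function names are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with the removable singularity filled in."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def hamming(n_taps):
    """Symmetric Hamming window (a stand-in for get_window; an assumption)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (n_taps - 1))
            for m in range(n_taps)]


def windowed_sinc(n_taps, cutoff, normalize=True):
    """Windowed-sinc taps following the structure of the quoted firwin lines."""
    alpha = n_taps // 2
    window = hamming(n_taps)
    h = [w * sinc(cutoff * (m - alpha)) for m, w in enumerate(window)]
    if normalize:
        total = sum(h)
        h = [tap / total for tap in h]
    return h


def dc_gain(h):
    """Frequency response at zero frequency: H(0) = sum of the taps."""
    return sum(h)


raw = windowed_sinc(128, 0.25, normalize=False)
scaled = windowed_sinc(128, 0.25, normalize=True)
print(dc_gain(raw), dc_gain(scaled))
```

Comparing the two printed gains shows how much the final division rescales the taps for a given window and cutoff; whether that rescaling is wanted depends on whether the taps are meant as a unity-gain filter or as a raw window-times-sinc product.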

Next scipy release 0.7
by Nils Wagner 03 Oct '08

Hi Jarrod, Just curious. Is there a new deadline for scipy 0.7 ? Cheers, Nils

Proposal: scipy.spatial
by Anne Archibald 01 Oct '08

Hi,

Once again there has been a thread on the numpy/scipy mailing lists requesting (essentially) some form of spatial data structure. Pointers have been posted to ANN (sadly LGPLed and in C++) as well as a handful of pure-python implementations of kd-trees. I suggest the creation of a new submodule of scipy, scipy.spatial, to contain spatial data structures and algorithms. Specifically, I propose it contain a kd-tree implementation providing nearest-neighbor, approximate nearest-neighbor, and all-points-near queries. I have a few other suggestions for things it might contain, but kd-trees seem like a good first step.

2008/9/27 Nathan Bell <wnbell(a)gmail.com>:
> On Sat, Sep 27, 2008 at 11:18 PM, Anne Archibald
> <peridot.faceted(a)gmail.com> wrote:
>>
>> I think a kd-tree implementation would be a valuable addition to
>> scipy, perhaps in a submodule scipy.spatial that might eventually
>> contain other spatial data structures and algorithms. What do you
>> think? Should we have one? Should it be based on Sturla Molden's code,
>> if the license permits? I am willing to contribute one, if not.
>
> +1

Judging that your vote and mine are enough in the absence of dissenting voices, I have written an implementation based on yours, Sturla Molden's, and the algorithms described by the authors of the ANN library. Before integrating it into scipy, though, I'd like to send it around for comments.

Particular issues:

* It's pure python right now, but with some effort it could be partially or completely cythonized. This is probably a good idea in the long run. In the meantime I have crudely vectorized it so that users can at least avoid looping in their own code.
* It is able to return the r nearest neighbors, optionally subject to a maximum distance limit, as well as approximate nearest neighbors.
* It is not currently set up to conveniently return all points within some fixed distance of the target point, but this could easily be added.
* It returns distances and indices of nearest neighbors in the original array.
* The testing code is, frankly, a mess. I need to look into using nose in a sensible fashion.
* The license is the scipy license.

I am particularly concerned about providing a convenient return format. The natural return from a query is a list of neighbors, since it may have variable length (there may not be r elements in the tree, or you might have supplied a maximum distance which doesn't contain r points). For a single query, it's simple to return a python list (should it be sorted? currently it's a heap). But if you want to vectorize the process, storing the results in an array becomes cumbersome. One option is an object array full of lists; another, which I currently use, is an array with nonexistent points marked with an infinite distance and an invalid index. A third would be to return masked arrays. How do you recommend handling this variable (or potentially-variable) sized output?

> If you're implementing one, I would highly recommend the
> "left-balanced" kd-tree.
> http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/2535/pdf/imm2535.pdf

Research suggests that at least in high dimension a more geometric balancing criterion can produce vastly better results. I used the "sliding midpoint" rule, which doesn't allow such a numerical balancing but does ensure that you don't have too many long skinny cells (since a sphere tends to cut very many of these).

I've also been thinking about what else would go in scipy.spatial. I think it would be valuable to have a reasonably efficient distance matrix function (we seem to get that question a lot, and the answer's not trivial) as well as a sparse distance matrix function based on the kd-trees. The module could potentially also swallow the (currently sandboxed?) delaunay code.

Anne
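
For readers unfamiliar with the "sliding midpoint" rule mentioned above, here is a rough sketch of a single split step as the rule is usually described: cut the cell's longest side at its midpoint, and if every point falls on one side, slide the cut toward the points until it reaches the nearest one. This is not the code under discussion; the function name, argument layout, and tie handling are assumptions.

```python
def sliding_midpoint_split(points, mins, maxes):
    """Split the points of one cell; returns (dim, split, less, greater).

    points       : list of coordinate tuples lying inside the cell
    mins, maxes  : per-dimension lower and upper bounds of the cell
    """
    # Cut across the longest side of the cell (not of the data), which is
    # what keeps cells from becoming long and skinny.
    dim = max(range(len(mins)), key=lambda d: maxes[d] - mins[d])
    split = 0.5 * (mins[dim] + maxes[dim])
    less = [p for p in points if p[dim] <= split]
    greater = [p for p in points if p[dim] > split]

    if not less:
        # Everything is above the midpoint: slide the cut up to the lowest point.
        split = min(p[dim] for p in points)
        less = [p for p in points if p[dim] <= split]
        greater = [p for p in points if p[dim] > split]
    elif not greater:
        # Everything is below the midpoint: slide the cut down to the highest point.
        split = max(p[dim] for p in points)
        less = [p for p in points if p[dim] < split]
        greater = [p for p in points if p[dim] >= split]
    # If all points share one coordinate along `dim`, a side can still be
    # empty; a caller would typically make the cell a leaf in that case.
    return dim, split, less, greater


pts = [(0.1, 0.2), (0.2, 0.9), (0.3, 0.5)]
print(sliding_midpoint_split(pts, (0.0, 0.0), (4.0, 1.0)))
print(sliding_midpoint_split(pts, (0.0, 0.0), (0.4, 1.0)))
```

In the first call the midpoint of the long side leaves one side empty, so the cut slides to the outermost point; in the second the plain midpoint already separates the points.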

NNLS.f
by Nils Wagner 30 Sep '08

Hi all,

I cannot build scipy from svn due to

g77:f77: scipy/optimize/slsqp/slsqp_optmz.f
scipy/optimize/slsqp/slsqp_optmz.f: In subroutine `nnls':
In file included from scipy/optimize/slsqp/slsqp_optmz.f:0:
scipy/optimize/slsqp/slsqp_optmz.f:1058: warning: 'izmax' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f: In subroutine `hfti':
scipy/optimize/slsqp/slsqp_optmz.f:1244: warning: 'hmax' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f: In subroutine `ldl':
scipy/optimize/slsqp/slsqp_optmz.f:1455: warning: 'tp' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f: In function `linmin':
scipy/optimize/slsqp/slsqp_optmz.f:1556: warning: 'd' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1556: warning: 'e' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1556: warning: 'u' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1556: warning: 'w' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1556: warning: 'x' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1557: warning: 'fv' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1557: warning: 'fw' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f:1557: warning: 'fx' might be used uninitialized in this function
scipy/optimize/slsqp/slsqp_optmz.f: In function `dnrm2_':
scipy/optimize/slsqp/slsqp_optmz.f:1855: warning: 'xmax' might be used uninitialized in this function
/usr/bin/g77 -g -Wall -g -Wall -shared build/temp.linux-x86_64-2.5/build/src.linux-x86_64-2.5/scipy/optimize/slsqp/_slsqpmodule.o build/temp.linux-x86_64-2.5/build/src.linux-x86_64-2.5/fortranobject.o build/temp.linux-x86_64-2.5/scipy/optimize/slsqp/slsqp_optmz.o -Lbuild/temp.linux-x86_64-2.5 -lg2c -o build/lib.linux-x86_64-2.5/scipy/optimize/_slsqp.so
building 'scipy.optimize._nnls' extension
compiling C sources
C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC

creating build/temp.linux-x86_64-2.5/build/src.linux-x86_64-2.5/scipy/optimize/nnls
compile options: '-Ibuild/src.linux-x86_64-2.5 -I/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/include -I/data/home/nwagner/local/include/python2.5 -c'
gcc: build/src.linux-x86_64-2.5/scipy/optimize/nnls/_nnlsmodule.c
compiling Fortran sources
Fortran f77 compiler: /usr/bin/g77 -g -Wall -fno-second-underscore -fPIC -O3 -funroll-loops -march=nocona -mmmx -msse2 -msse -msse3 -fomit-frame-pointer
creating build/temp.linux-x86_64-2.5/nnls
compile options: '-Ibuild/src.linux-x86_64-2.5 -I/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/include -I/data/home/nwagner/local/include/python2.5 -c'
error: nnls/NNLS.f: No such file or directory

Cheers,
Nils

Re: [SciPy-dev] tests and partial patch for scipy.stats.distributions
by josef.pktd＠gmail.com 30 Sep '08

Error in timeseries Date class
by Dave 28 Sep '08

It appears that the strftime function as applied to the timeseries Date class gives incorrect results for the day of week, c.f.

In [1]: import scikits.timeseries as ts

In [2]: from datetime import datetime

In [3]: ts.Date('D', datetime=datetime.now()).strftime('%a %d-%b-%Y')
Out[3]: 'Thu 26-Sep-2008'

-Dave
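
The standard library gives an independent cross-check of the reported output. This small sketch uses only datetime.date (not scikits.timeseries), takes the date from the output above, and uses a hypothetical DAY_NAMES list to avoid depending on the locale.

```python
from datetime import date

# English day names indexed by date.weekday(), where Monday is 0.
DAY_NAMES = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

reported = date(2008, 9, 26)
print(DAY_NAMES[reported.weekday()], reported.isoformat())
```

The standard library places 26 September 2008 on a Friday, so the 'Thu' in the reported string is one day early.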
