From kwgoodman at gmail.com Wed Dec 1 12:08:13 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Wed, 1 Dec 2010 09:08:13 -0800
Subject: [SciPy-User] [ANN] Bottleneck 0.1
Message-ID:

This is the first release of Bottleneck, a collection of fast NumPy array functions written in Cython.

The three categories of Bottleneck functions:

- Faster replacements for NumPy and SciPy functions
- Moving window functions
- Group functions that bin calculations by like-labeled elements

Function signatures (using nanmean as an example):

Functions      nanmean(arr, axis=None)
Moving window  move_mean(arr, window, axis=0)
Group by       group_nanmean(arr, label, order=None, axis=0)

Let's give it a try. Create a NumPy array:

>>> import numpy as np
>>> arr = np.array([1, 2, np.nan, 4, 5])

Find the nanmean:

>>> import bottleneck as bn
>>> bn.nanmean(arr)
3.0

Moving window nanmean:

>>> bn.move_nanmean(arr, window=2)
array([ nan,  1.5,  2. ,  4. ,  4.5])

Group nanmean:

>>> label = ['a', 'a', 'b', 'b', 'a']
>>> bn.group_nanmean(arr, label)
(array([ 2.66666667,  4.        ]), ['a', 'b'])

Fast
====

Bottleneck is fast:

>>> arr = np.random.rand(100, 100)
>>> timeit np.nanmax(arr)
10000 loops, best of 3: 99.6 us per loop
>>> timeit bn.nanmax(arr)
100000 loops, best of 3: 15.3 us per loop

Let's not forget to add some NaNs:

>>> arr[arr > 0.5] = np.nan
>>> timeit np.nanmax(arr)
10000 loops, best of 3: 146 us per loop
>>> timeit bn.nanmax(arr)
100000 loops, best of 3: 15.2 us per loop

Bottleneck comes with a benchmark suite that compares the performance of the Bottleneck functions that have a NumPy/SciPy equivalent. To run the benchmark:

>>> bn.benchit(verbose=False)
Bottleneck performance benchmark
    Bottleneck  0.1.0dev
    Numpy       1.5.1
    Scipy       0.8.0
    Speed is numpy (or scipy) time divided by Bottleneck time
    NaN means all NaNs
   Speed   Test                  Shape       dtype    NaN?
   2.4019  median(a, axis=-1)    (500,500)   float64
   2.2668  median(a, axis=-1)    (500,500)   float64  NaN
   4.1235  median(a, axis=-1)    (10000,)    float64
   4.3498  median(a, axis=-1)    (10000,)    float64  NaN
   9.8184  nanmax(a, axis=-1)    (500,500)   float64
   7.9157  nanmax(a, axis=-1)    (500,500)   float64  NaN
   9.2306  nanmax(a, axis=-1)    (10000,)    float64
   8.1635  nanmax(a, axis=-1)    (10000,)    float64  NaN
   6.7218  nanmin(a, axis=-1)    (500,500)   float64
   7.9112  nanmin(a, axis=-1)    (500,500)   float64  NaN
   6.4950  nanmin(a, axis=-1)    (10000,)    float64
   8.0791  nanmin(a, axis=-1)    (10000,)    float64  NaN
  12.3650  nanmean(a, axis=-1)   (500,500)   float64
  42.0738  nanmean(a, axis=-1)   (500,500)   float64  NaN
  12.2769  nanmean(a, axis=-1)   (10000,)    float64
  22.1285  nanmean(a, axis=-1)   (10000,)    float64  NaN
   9.5515  nanstd(a, axis=-1)    (500,500)   float64
  68.9192  nanstd(a, axis=-1)    (500,500)   float64  NaN
   9.2174  nanstd(a, axis=-1)    (10000,)    float64
  26.1753  nanstd(a, axis=-1)    (10000,)    float64  NaN

Faster
======

Under the hood Bottleneck uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in bn.nanmax(), for example, is in checking that the axis is within range, converting non-array data to an array, and selecting the function to use to calculate the maximum.
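(Conceptually, that per-call dispatch is something like the sketch below -- a plain-Python illustration with made-up names, not Bottleneck's actual internals, which are generated Cython functions:)

import numpy as np

# One specialized kernel per (ndim, dtype, axis) combination; this dict
# lookup stands in for Bottleneck's generated Cython functions.
def _nanmax_2d_float64_axis0(a):
    return np.nanmax(a, axis=0)  # placeholder for the fast Cython kernel

_dispatch = {(2, np.dtype('float64'), 0): _nanmax_2d_float64_axis0}

def nanmax(arr, axis=None):
    a = np.asarray(arr)  # convert non-array input to an array
    if axis is not None and not -a.ndim <= axis < a.ndim:
        raise ValueError("axis out of range")  # check the axis
    return _dispatch[(a.ndim, a.dtype, axis)](a)  # select and call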
You can get rid of the overhead by doing all this before you, say, enter an inner loop:

>>> arr = np.random.rand(10,10)
>>> func, a = bn.func.nanmax_selector(arr, axis=0)
>>> func

Let's see how much faster this runs:

>> timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
>> timeit bn.nanmax(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
>> timeit func(a)
100000 loops, best of 3: 2.5 us per loop

Note that func is faster than NumPy's non-NaN version of max:

>> timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop

So adding NaN protection to your inner loops comes at a negative cost!

Functions
=========

Bottleneck is in the prototype stage. Bottleneck contains the following functions:

median
nanmean
nanvar
nanstd
nanmin
nanmax
move_nanmean
group_nanmean

Currently only 1d, 2d, and 3d NumPy arrays with dtype int32, int64, and float64 are supported.

License
=======

Bottleneck is distributed under a Simplified BSD license. Parts of NumPy, SciPy and numpydoc, all of which have BSD licenses, are included in Bottleneck. See the LICENSE file, which is distributed with Bottleneck, for details.

URLs
====

download      http://pypi.python.org/pypi/Bottleneck
docs          http://berkeleyanalytics.com/bottleneck
code          http://github.com/kwgoodman/bottleneck
mailing list  http://groups.google.com/group/bottle-neck

Install
=======

Requirements:

Bottleneck  Python, NumPy 1.5.1+, SciPy 0.8.0+
Unit tests  nose
Compile     gcc or MinGW

**GNU/Linux, Mac OS X, et al.**

To install Bottleneck:

$ python setup.py build
$ sudo python setup.py install

Or, if you wish to specify where Bottleneck is installed, for example inside /usr/local:

$ python setup.py build
$ sudo python setup.py install --prefix=/usr/local

**Windows**

In order to compile the C code in Bottleneck you need a Windows version of the gcc compiler. MinGW (Minimalist GNU for Windows) contains gcc and has been used to successfully compile Bottleneck on Windows. Install MinGW and add it to your system path. Then install Bottleneck with the commands:

python setup.py build --compiler=mingw32
python setup.py install

**Post install**

After you have installed Bottleneck, run the suite of unit tests:

>>> import bottleneck as bn
>>> bn.test()

Ran 10 tests in 13.756s
OK

From pauloa.herrera at gmail.com Wed Dec 1 13:15:19 2010
From: pauloa.herrera at gmail.com (Paulo Herrera)
Date: Wed, 1 Dec 2010 19:15:19 +0100
Subject: [SciPy-User] Announcement: Self-contained Python module to write binary VTK files.
In-Reply-To:
References: <0F37073C-2AE8-4C65-A254-0943317B8FF1@gmail.com> <826B8A8A-0C10-4FF8-BA13-096E79999BB6@gmail.com>
Message-ID:

Hi,

I just changed the license of the files in the repository to a FreeBSD license. I hope this will make it easier to use the module.

Paulo

On Tue, Nov 30, 2010 at 9:19 AM, Matthew Brett wrote:
> Hi,
>
>>>> PyEVTK is released under the GPL 3 open source license. A copy of the license is
>>>> included in the src directory.
>>>
>>> Would you consider changing to a more permissive license? We
>>> (nipy.org) would have good use of your package, I believe, but we're
>>> using the BSD license.
>>
>> I'd like to release it with a license that is compatible with the GPL license. It seems that the FreeBSD license satisfies that requirement (http://en.wikipedia.org/wiki/BSD_licenses). Would the FreeBSD be useful for you?
>
> That's great - thank you. We use the 3-clause BSD license mainly [1],
> and the MIT license in one project, but the 2-clause 'simplified' BSD
> that FreeBSD uses is ideal.
>
> Thanks again,
>
> Matthew
>
> [1] http://www.opensource.org/licenses/bsd-license.php
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From tjhnson at gmail.com Wed Dec 1 18:49:19 2010
From: tjhnson at gmail.com (T J)
Date: Wed, 1 Dec 2010 15:49:19 -0800
Subject: [SciPy-User] Bottleneck
In-Reply-To:
References: <1291160954.1783.5.camel@Portable-s2m.cnrs-mrs.fr> <1291165744.1783.9.camel@Portable-s2m.cnrs-mrs.fr> <1291167742.3733.3.camel@Portable-s2m.cnrs-mrs.fr> <1291171140.3733.6.camel@Portable-s2m.cnrs-mrs.fr>
Message-ID:

On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman wrote:
> I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
> 1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
> project.

If SciPy is only used in the benchmarks/tests, then why not make it an optional benchmark/test that runs only if SciPy is present? nose.SkipTest should be useful here. I frequently run software on machines that only have NumPy installed.

From kwgoodman at gmail.com Wed Dec 1 19:09:10 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Wed, 1 Dec 2010 16:09:10 -0800
Subject: [SciPy-User] Bottleneck
In-Reply-To:
References: <1291160954.1783.5.camel@Portable-s2m.cnrs-mrs.fr> <1291165744.1783.9.camel@Portable-s2m.cnrs-mrs.fr> <1291167742.3733.3.camel@Portable-s2m.cnrs-mrs.fr> <1291171140.3733.6.camel@Portable-s2m.cnrs-mrs.fr>
Message-ID:

On Wed, Dec 1, 2010 at 3:49 PM, T J wrote:
> On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman wrote:
>> I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
>> 1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
>> project.
>
> If SciPy is only used in the benchmarks/tests, then why not make it an
> optional benchmark/test that runs only if SciPy is present?
> nose.SkipTest should be useful here. I frequently run software on
> machines that only have NumPy installed.

Seems like a strange discussion to have on the scipy list :)

I don't want to have a hole in my unit test coverage. But I could copy over the nan functions in scipy stats. And I guess the benchmark could use those too. And then skip moving window benchmarks against scipy.ndimage for those who don't have scipy installed.

From Chris.Barker at noaa.gov Wed Dec 1 19:19:04 2010
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Wed, 01 Dec 2010 16:19:04 -0800
Subject: [SciPy-User] Bottleneck
In-Reply-To:
References: <1291160954.1783.5.camel@Portable-s2m.cnrs-mrs.fr> <1291165744.1783.9.camel@Portable-s2m.cnrs-mrs.fr> <1291167742.3733.3.camel@Portable-s2m.cnrs-mrs.fr> <1291171140.3733.6.camel@Portable-s2m.cnrs-mrs.fr>
Message-ID: <4CF6E5F8.5030403@noaa.gov>

On 12/1/10 4:09 PM, Keith Goodman wrote:
>> I frequently run software on
>> machines that only have NumPy installed.
>
> Seems like a strange discussion to have on the scipy list :)

True -- and yet I didn't have scipy on this machine yet, either...

> I don't want to have a hole in my unit test coverage. But I could copy
> over the nan functions in scipy stats. And I guess the benchmark could
> use those too. And then skip moving window benchmarks against
> scipy.ndimage for those who don't have scipy installed.

I'd vote to have unit tests that don't require scipy, but I think it's fine that the benchmarks do -- that's kind of the point of them -- comparing bottleneck to the raw scipy functions.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From kwgoodman at gmail.com Wed Dec 1 19:21:33 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Wed, 1 Dec 2010 16:21:33 -0800
Subject: [SciPy-User] Bottleneck
In-Reply-To: <4CF6E5F8.5030403@noaa.gov>
References: <1291160954.1783.5.camel@Portable-s2m.cnrs-mrs.fr> <1291165744.1783.9.camel@Portable-s2m.cnrs-mrs.fr> <1291167742.3733.3.camel@Portable-s2m.cnrs-mrs.fr> <1291171140.3733.6.camel@Portable-s2m.cnrs-mrs.fr> <4CF6E5F8.5030403@noaa.gov>
Message-ID:

On Wed, Dec 1, 2010 at 4:19 PM, Christopher Barker wrote:
> On 12/1/10 4:09 PM, Keith Goodman wrote:
>>> I frequently run software on
>>> machines that only have NumPy installed.
>>
>> Seems like a strange discussion to have on the scipy list :)
>
> True -- and yet I didn't have scipy on this machine yet, either...
>
>> I don't want to have a hole in my unit test coverage. But I could copy
>> over the nan functions in scipy stats. And I guess the benchmark could
>> use those too. And then skip moving window benchmarks against
>> scipy.ndimage for those who don't have scipy installed.
>
> I'd vote to have unit tests that don't require scipy, but I think it's
> fine that the benchmarks do -- that's kind of the point of them --
> comparing bottleneck to the raw scipy functions.

Well, now I have a most requested feature. OK, I'll do it for 0.2.
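(A minimal sketch of the conditional skip T J suggests, assuming nose as the test runner; the test name is made up, and this is not from Bottleneck's actual test suite:)

import numpy as np
import nose

def test_nanmean_matches_scipy():
    try:
        from scipy import stats
    except ImportError:
        # skip instead of fail on machines that only have NumPy
        raise nose.SkipTest("SciPy not installed")
    import bottleneck as bn
    arr = np.random.rand(10, 10)
    np.testing.assert_array_almost_equal(bn.nanmean(arr, axis=0),
                                         stats.nanmean(arr, axis=0))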
From bsder at allcaps.org Thu Dec 2 11:17:20 2010
From: bsder at allcaps.org (Andrew Lentvorski)
Date: Thu, 02 Dec 2010 08:17:20 -0800
Subject: [SciPy-User] Accurate Frequency Measurement
In-Reply-To: <4CF4A0D1.4040709@silveregg.co.jp>
References: <20101130014157.GA9408@spirit> <4CF4A0D1.4040709@silveregg.co.jp>
Message-ID: <4CF7C690.2080400@allcaps.org>

On 11/29/10 10:59 PM, David wrote:
> You may want to look at something like CLAM (http://clam-project.org) to
> analyse those signals if you want to track frequency changes. I believe
> they have some python bindings.

That looks like a really nice project, but it looks really dead. Is there anything that uses CLAM that is up to date?

-a

From ptittmann at gmail.com Thu Dec 2 14:55:57 2010
From: ptittmann at gmail.com (Peter Tittmann)
Date: Thu, 2 Dec 2010 11:55:57 -0800
Subject: [SciPy-User] ancova with optimize.curve_fit
Message-ID:

Greetings,

I'm attempting to conduct analysis of covariance (ANCOVA) using a non-linear regression with curve_fit in optimize. The doc string states:

"xdata : An N-length sequence or an (k,N)-shaped array for functions with k predictors. The independent variable where the data is measured."

I am hoping that this means that if I pass the independent variable and a categorical variable, the resulting covariance matrix will reflect the variability in the equation coefficients among the categorical variables.

1. Is this the case?
2. If so, I'm having a problem with the input array for xdata. The following extracts data from a relational database (that's the SQL). The eqCoeff() function works fine, however when I add a second dimension to the xdata in the ancova() function (indHtPlot), curve_fit produces an error which seems to be related to the structure of my input array. I've tried column_stack and vstack to form the arrays. Any assistance would be gratefully received.
import birdseye_db as db
import numpy as np
from scipy.optimize import curve_fit

def getDiam(ht, a, b):
    dbh = a * ht**b
    return dbh

def eqCoeff():
    '''estimates coefficients a and b in dbh = a * h**b using all trees where height was measured'''
    species = [i[0].strip(' ') for i in db.query('select distinct species from plots')]
    res3d = db.query('select dbh, height, species from plots where ht_code=1')
    indHt = [i[1] for i in res3d]
    depDbh = [i[0] for i in res3d]
    estimated_params, err_est = curve_fit(getDiam, indHt, depDbh)
    return estimated_params, err_est

def ancova():
    res = db.query('select dbh, height, plot, species from plots where ht_code=1')
    indHtPlot = np.column_stack(([i[1] for i in res], [i[2] for i in res]))
    depDbh = [i[0] for i in res]
    estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh)
    return estimated_params, err_est

Thanks in advance

--
Peter Tittmann
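(For reference, the (k,N) convention in the curve_fit docstring means xdata is handed to the model function as-is, one row per predictor. A self-contained sketch with synthetic data and made-up coefficients, not the poster's actual model:)

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    ht, plot = x                  # x has shape (2, N): one row per predictor
    return a * ht**b + c * plot   # returns a 1d array of length N

ht = np.linspace(1.0, 30.0, 50)
plot = np.repeat([0.0, 1.0], 25)
dbh = 0.5 * ht**1.2 + 2.0 * plot
xdata = np.vstack((ht, plot))     # shape (2, N), not column_stack's (N, 2)
params, cov = curve_fit(model, xdata, dbh)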
From josef.pktd at gmail.com Thu Dec 2 16:01:19 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 2 Dec 2010 16:01:19 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To:
References:
Message-ID:

On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote:
> Greetings,
> I'm attempting to conduct analysis of covariance (ANCOVA) using a non-linear
> regression with curve_fit in optimize. The doc string states:
> "xdata : An N-length sequence or an (k,N)-shaped array for functions with k
> predictors. The independent variable where the data is measured."
> I am hoping that this means that if I pass the independent variable and a
> categorical variable, the resulting covariance matrix will reflect the
> variability in the equation coefficients among the categorical variables.
> 1. Is this the case?
> 2. If so, I'm having a problem with the input array for xdata. The following
> extracts data from a relational database (that's the SQL). The eqCoeff()
> function works fine, however when I add a second dimension to the xdata in
> the ancova() function (indHtPlot), curve_fit produces an error which seems to
> be related to the structure of my input array. I've tried column_stack and
> vstack to form the arrays. Any assistance would be gratefully received.
>
> import birdseye_db as db
> import numpy as np
> from scipy.optimize import curve_fit
>
> def getDiam(ht, a, b):
>     dbh = a * ht**b
>     return dbh
>
> def eqCoeff():
>     '''estimates coefficients a and b in dbh = a * h**b using all trees where
> height was measured'''
>     species = [i[0].strip(' ') for i in db.query('select distinct species from
> plots')]
>     res3d = db.query('select dbh, height, species from plots where ht_code=1')
>     indHt = [i[1] for i in res3d]
>     depDbh = [i[0] for i in res3d]
>     estimated_params, err_est = curve_fit(getDiam, indHt, depDbh)
>     return estimated_params, err_est
>
> def ancova():
>     res = db.query('select dbh, height, plot, species from plots where
> ht_code=1')
>     indHtPlot = np.column_stack(([i[1] for i in res], [i[2] for i in res]))
>     depDbh = [i[0] for i in res]
>     estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh)
>     return estimated_params, err_est

Can you post the actual traceback?

My first guess, but I have to look it up, is that you need to transpose xdata, e.g. indHtPlot.T

curve_fit(getDiam, indHtPlot.T, depDbh)

Josef

> Thanks in advance
> --
> Peter Tittmann
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From ptittmann at gmail.com Thu Dec 2 16:10:33 2010
From: ptittmann at gmail.com (Peter Tittmann)
Date: Thu, 2 Dec 2010 13:10:33 -0800
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To:
References:
Message-ID: <0DD9408C0B1B40B99E1863F77126D466@gmail.com>

Thanks very much for your reply Josef, here are the tracebacks from both the original and from your suggestion:

with: curve_fit(getDiam, indHtPlot, depDbh)

In [20]: ancova()
------------------------------------------------------------
Traceback (most recent call last):
  File "", line 1, in
  File "", line 5, in ancova
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 422, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 273, in leastsq
    m = check_func(func,x0,args,n)[0]
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 13, in check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 343, in _general_function
    return function(xdata, *params) - ydata
ValueError: shape mismatch: objects cannot be broadcast to a single shape

with curve_fit(getDiam, indHtPlot.T, depDbh)

In [22]: ancova()
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 2028, in excepthook
    self.showtraceback((etype,value,tb),tb_offset=0)
  File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 1729, in showtraceback
    self.InteractiveTB(etype,value,tb,tb_offset=tb_offset)
  File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 998, in __call__
    print >> out, self.text(etype, evalue, etb)
  File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 1012, in text
    return FormattedTB.text(self,etype,value,tb,context=5,mode=mode)
  File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 937, in text
    if len(elist) > self.tb_offset:
TypeError: object of type 'NoneType' has no len()

Original exception was:
ValueError: object too deep for desired array
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 2257, in runcode
    exec code_obj in self.user_global_ns, self.user_ns
  File "", line 1, in
  File "", line 5, in ancova
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 422, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", line 281, in leastsq
    maxfev, epsfcn, factor, diag)
error: Result from function call is not a proper array of floats.

--
Peter Tittmann

On Thursday, December 2, 2010 at 1:01 PM, josef.pktd at gmail.com wrote:
> On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote:
> > Greetings,
> > I'm attempting to conduct analysis of covariance (ANCOVA) using a non-linear
> > regression with curve_fit in optimize. The doc string states:
> > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k
> > predictors. The independent variable where the data is measured."
> > I am hoping that this means that if I pass the independent variable and a > > categorical variable, the resulting covariance matrix will reflect the > > variability in the equation coefficients among the categorical variables. > > 1. Is tis the case? > > 2. If so, i'm having a problem with the input array for xdata. The following > > extracts data from a relational database (thats the sql). the eqCoeff() > > function works fine, however when I add a second dimension to the xdata in > > th ancova() function (indHtPlot), curve fit produces an error which seems to > > be related to the structure of my input array. I've tried column_stack and > > vstack to form the arrays. Any assistance would be gratefully received. > > > > import birdseye_db as db > > import numpy as np > > from scipy.optimize import curve_fit > > def getDiam(ht, a, b): > > dbh = a * ht**b > > return dbh > > def eqCoeff(): > > '''estimates coefficients a and b in dbh= a* h**b using all trees where > > height was measured''' > > species=[i[0].strip(' ') for i in db.query('select distinct species from > > plots')] > > res3d=db.query('select dbh, height, species from plots where ht_code=1') > > indHt=[i[1] for i in res3d] > > depDbh=[i[0] for i in res3d] > > estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > > return estimated_params, err_est > > > > def ancova(): > > res=db.query('select dbh, height, plot, species from plots where > > ht_code=1') > > indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > > depDbh=[i[0] for i in res] > > estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > > return estimated_params, err_est > > > > > > > Can you post the actual traceback? > > My first guess, but I have to look it up, is that you need to > transpose xdata, e.g. indHtPlot.T > > curve_fit(getDiam, indHtPlot.T, depDbh) > > Josef > > > > > Thanks in advance > > -- > > Peter Tittmann > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Dec 2 16:31:37 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 2 Dec 2010 16:31:37 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: <0DD9408C0B1B40B99E1863F77126D466@gmail.com> References: <0DD9408C0B1B40B99E1863F77126D466@gmail.com> Message-ID: On Thu, Dec 2, 2010 at 4:10 PM, Peter Tittmann wrote: > Thanks very much for your reply Josef, here are the tracebacks from both the > original and from your suggestion: > with:?curve_fit(getDiam, indHtPlot, depDbh) > In [20]: ancova() > ------------------------------------------------------------ > Traceback (most recent call last): > ??File "", line 1, in > ??File "", line 5, in ancova > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 422, in curve_fit > ?? ?res = leastsq(func, p0, args=args, full_output=1, **kw) > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 273, in leastsq > ?? ?m = check_func(func,x0,args,n)[0] > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 13, in check_func > ?? 
?res = atleast_1d(thefunc(*((x0[:numinputs],)+args))) > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 343, in _general_function > ?? ?return function(xdata, *params) - ydata > ValueError: shape mismatch: objects cannot be broadcast to a single shape > > with?curve_fit(getDiam, indHtPlot.T, depDbh) > In [22]: ancova() > Error in sys.excepthook: > Traceback (most recent call last): > ??File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 2028, in > excepthook > ?? ?self.showtraceback((etype,value,tb),tb_offset=0) > ??File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 1729, in > showtraceback > ?? ?self.InteractiveTB(etype,value,tb,tb_offset=tb_offset) > ??File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 998, in > __call__ > ?? ?print >> out, self.text(etype, evalue, etb) > ??File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 1012, in text > ?? ?return FormattedTB.text(self,etype,value,tb,context=5,mode=mode) > ??File "/usr/lib/pymodules/python2.6/IPython/ultraTB.py", line 937, in text > ?? ?if len(elist) > self.tb_offset: > TypeError: object of type 'NoneType' has no len() > Original exception was: > ValueError: object too deep for desired array > ------------------------------------------------------------ > Traceback (most recent call last): > ??File "/usr/lib/pymodules/python2.6/IPython/iplib.py", line 2257, in > runcode > ?? ?exec code_obj in self.user_global_ns, self.user_ns > ??File "", line 1, in > ??File "", line 5, in ancova > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 422, in curve_fit > ?? ?res = leastsq(func, p0, args=args, full_output=1, **kw) > ??File "/usr/local/lib/python2.6/dist-packages/scipy/optimize/minpack.py", > line 281, in leastsq > ?? ?maxfev, epsfcn, factor, diag) > error: Result from function call is not a proper array of floats. > > -- > Peter Tittmann > > On Thursday, December 2, 2010 at 1:01 PM, josef.pktd at gmail.com wrote: > > On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote: > > Greetings, > Im attempting to conduct analysis of covariance (ANCOVA) using a non-linear > regression with curve_fit in optimize. The doc string states: > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k > predictors. The independent variable where the data is measured." > I am hoping that this means that if I pass the independent variable and a > categorical variable, the resulting covariance matrix will reflect the > variability in the equation coefficients among the categorical variables. > 1. Is tis the case? > 2. If so, i'm having a problem with the input array for xdata. The following > extracts data from a relational database (thats the sql). the eqCoeff() > function works fine, however when I add a second dimension to the xdata in > th ancova() function (indHtPlot), curve fit produces an error which seems to > be related to the structure of my input array. I've tried column_stack and > vstack to form the arrays. Any assistance would be gratefully received. > > import birdseye_db as db > import numpy as np > from scipy.optimize import curve_fit > def getDiam(ht, a, b): > ?? ?dbh = a * ht**b > ?? ?return dbh > def eqCoeff(): > ?? ?'''estimates coefficients a and b in dbh= a* h**b using all trees where > height was measured''' > ?? ?species=[i[0].strip(' ') for i in db.query('select distinct species from > plots')] > ?? ?res3d=db.query('select dbh, height, species from plots where ht_code=1') > ?? ?indHt=[i[1] for i in res3d] > ?? 
?depDbh=[i[0] for i in res3d] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > ?? ?return estimated_params, err_est > > def ancova(): > ?? ?res=db.query('select dbh, height, plot, species from plots where > ht_code=1') > ?? ?indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > ?? ?depDbh=[i[0] for i in res] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > ?? ?return estimated_params, err_est > > > Can you post the actual traceback? > > My first guess, but I have to look it up, is that you need to > transpose xdata, e.g. indHtPlot.T > > curve_fit(getDiam, indHtPlot.T, depDbh) > > Josef > > > Thanks in advance > -- > Peter Tittmann > Can you post a small sample of your data to replicate? Skipper From josef.pktd at gmail.com Thu Dec 2 16:33:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Dec 2010 16:33:51 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: Message-ID: On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote: > Greetings, > Im attempting to conduct analysis of covariance (ANCOVA) using a non-linear > regression with curve_fit in optimize. The doc string states: > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k > predictors. The independent variable where the data is measured." > I am hoping that this means that if I pass the independent variable and a > categorical variable, the resulting covariance matrix will reflect the > variability in the equation coefficients among the categorical variables. > 1. Is tis the case? > 2. If so, i'm having a problem with the input array for xdata. The following > extracts data from a relational database (thats the sql). the eqCoeff() > function works fine, however when I add a second dimension to the xdata in > th ancova() function (indHtPlot), curve fit produces an error which seems to > be related to the structure of my input array. I've tried column_stack and > vstack to form the arrays. Any assistance would be gratefully received. > > import birdseye_db as db > import numpy as np > from scipy.optimize import curve_fit > def getDiam(ht, a, b): > ?? ?dbh = a * ht**b > ?? ?return dbh if ht is 2dimensional, then dbh is also two dimensional (n,k). there should be a reduce, e.g. sum in here so that the return is 1d. Josef > def eqCoeff(): > ?? ?'''estimates coefficients a and b in dbh= a* h**b using all trees where > height was measured''' > ?? ?species=[i[0].strip(' ') for i in db.query('select distinct species from > plots')] > ?? ?res3d=db.query('select dbh, height, species from plots where ht_code=1') > ?? ?indHt=[i[1] for i in res3d] > ?? ?depDbh=[i[0] for i in res3d] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > ?? ?return estimated_params, err_est > > def ancova(): > ?? ?res=db.query('select dbh, height, plot, species from plots where > ht_code=1') > ?? ?indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > ?? ?depDbh=[i[0] for i in res] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > ?? 
?return estimated_params, err_est > Thanks in advance > -- > Peter Tittmann > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ptittmann at gmail.com Thu Dec 2 16:51:31 2010 From: ptittmann at gmail.com (Peter Tittmann) Date: Thu, 2 Dec 2010 13:51:31 -0800 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: Message-ID: here is some of the data. Josef, i'm not sure I understand your suggestion. dbh is the dependent variable and is id (actually a list). The independent variable is height and the categorical variable to test for covariance with is plot. Maybe I'm confused and am trying to do something that cant be done this way... Thanks for any further assistance.. Peter -- Peter Tittmann On Thursday, December 2, 2010 at 1:33 PM, josef.pktd at gmail.com wrote: > On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote: > > > Greetings, > > Im attempting to conduct analysis of covariance (ANCOVA) using a non-linear > > regression with curve_fit in optimize. The doc string states: > > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k > > predictors. The independent variable where the data is measured." > > I am hoping that this means that if I pass the independent variable and a > > categorical variable, the resulting covariance matrix will reflect the > > variability in the equation coefficients among the categorical variables. > > 1. Is tis the case? > > 2. If so, i'm having a problem with the input array for xdata. The following > > extracts data from a relational database (thats the sql). the eqCoeff() > > function works fine, however when I add a second dimension to the xdata in > > th ancova() function (indHtPlot), curve fit produces an error which seems to > > be related to the structure of my input array. I've tried column_stack and > > vstack to form the arrays. Any assistance would be gratefully received. > > > > import birdseye_db as db > > import numpy as np > > from scipy.optimize import curve_fit > > def getDiam(ht, a, b): > > dbh = a * ht**b > > return dbh > > > > > > if ht is 2dimensional, then dbh is also two dimensional (n,k). there > should be a reduce, e.g. sum in here so that the return is 1d. 
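(In other words, the model function handed to curve_fit has to return one value per observation. A sketch of the reduced version Josef describes, with hypothetical names, following his sum() suggestion:)

import numpy as np

def getDiamSum(x, a, b):
    # x has shape (nobs, 2); summing across columns reduces the
    # (nobs, 2) result of a * x**b to the (nobs,) shape curve_fit expects
    return np.sum(a * x**b, axis=1)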
> > Josef > > > > > def eqCoeff(): > > '''estimates coefficients a and b in dbh= a* h**b using all trees where > > height was measured''' > > species=[i[0].strip(' ') for i in db.query('select distinct species from > > plots')] > > res3d=db.query('select dbh, height, species from plots where ht_code=1') > > indHt=[i[1] for i in res3d] > > depDbh=[i[0] for i in res3d] > > estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > > return estimated_params, err_est > > > > def ancova(): > > res=db.query('select dbh, height, plot, species from plots where > > ht_code=1') > > indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > > depDbh=[i[0] for i in res] > > estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > > return estimated_params, err_est > > Thanks in advance > > -- > > Peter Tittmann > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: db_out.csv Type: application/octet-stream Size: 1738 bytes Desc: not available URL: From josef.pktd at gmail.com Thu Dec 2 17:43:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Dec 2010 17:43:16 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: Message-ID: On Thu, Dec 2, 2010 at 4:51 PM, Peter Tittmann wrote: > here is some of the data. Josef, i'm not sure I understand your suggestion. > dbh is the dependent variable and is id (actually a list). The independent > variable is height and the categorical variable to test for covariance with > is plot. > Maybe I'm confused and am trying to do something that cant be done this > way... I don't understand what your getDiam function is supposed to do. In ancova, indHtPlot is 2d (nobs,2) with variables dbh and plot. getdiam calculates [a * dbh**b, a * plot**b] a (nobs,2) array, but it should produce instead a 1d (nobs,) array. Maybe you want to sum the functions with dbh and plot for each observation sum([a * dbh**b, a * plot**b], axis=1) This should solve the curvefit problem, but if plot is categorical, then this wouldn't be correct since it is just treated as metric variable. Maybe you want some dummy variables for plot instead. ??? Josef > Thanks for any further assistance.. > Peter > > -- > Peter Tittmann > > > On Thursday, December 2, 2010 at 1:33 PM, josef.pktd at gmail.com wrote: > > On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote: > > Greetings, > Im attempting to conduct analysis of covariance (ANCOVA) using a non-linear > regression with curve_fit in optimize. The doc string states: > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k > predictors. The independent variable where the data is measured." > I am hoping that this means that if I pass the independent variable and a > categorical variable, the resulting covariance matrix will reflect the > variability in the equation coefficients among the categorical variables. > 1. Is tis the case? > 2. If so, i'm having a problem with the input array for xdata. The following > extracts data from a relational database (thats the sql). 
the eqCoeff() > function works fine, however when I add a second dimension to the xdata in > th ancova() function (indHtPlot), curve fit produces an error which seems to > be related to the structure of my input array. I've tried column_stack and > vstack to form the arrays. Any assistance would be gratefully received. > > import birdseye_db as db > import numpy as np > from scipy.optimize import curve_fit > def getDiam(ht, a, b): > ?? ?dbh = a * ht**b > ?? ?return dbh > > if ht is 2dimensional, then dbh is also two dimensional (n,k). there > should be a reduce, e.g. sum in here so that the return is 1d. > > Josef > > > def eqCoeff(): > ?? ?'''estimates coefficients a and b in dbh= a* h**b using all trees where > height was measured''' > ?? ?species=[i[0].strip(' ') for i in db.query('select distinct species from > plots')] > ?? ?res3d=db.query('select dbh, height, species from plots where ht_code=1') > ?? ?indHt=[i[1] for i in res3d] > ?? ?depDbh=[i[0] for i in res3d] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > ?? ?return estimated_params, err_est > > def ancova(): > ?? ?res=db.query('select dbh, height, plot, species from plots where > ht_code=1') > ?? ?indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > ?? ?depDbh=[i[0] for i in res] > ?? ?estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > ?? ?return estimated_params, err_est > Thanks in advance > -- > Peter Tittmann > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ptittmann at gmail.com Thu Dec 2 17:59:53 2010 From: ptittmann at gmail.com (Peter Tittmann) Date: Thu, 2 Dec 2010 14:59:53 -0800 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: Message-ID: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> getDiam is a predictor to get dbh from height. It works with curve_fit to find coefficients a and b given datasetset of known dbh/height pairs. You are right, what I want is dummy variables for each plot. I'll see if I can get that worked out by revising getDiam.. Thanks again On Thursday, December 2, 2010 at 2:43 PM, josef.pktd at gmail.com wrote: > On Thu, Dec 2, 2010 at 4:51 PM, Peter Tittmann wrote: > > > here is some of the data. Josef, i'm not sure I understand your suggestion. > > dbh is the dependent variable and is id (actually a list). The independent > > variable is height and the categorical variable to test for covariance with > > is plot. > > Maybe I'm confused and am trying to do something that cant be done this > > way... > > > > > > I don't understand what your getDiam function is supposed to do. In > ancova, indHtPlot is 2d (nobs,2) with variables dbh and plot. getdiam > calculates [a * dbh**b, a * plot**b] a (nobs,2) array, but it should > produce instead a 1d (nobs,) array. > Maybe you want to sum the functions with dbh and plot for each > observation sum([a * dbh**b, a * plot**b], axis=1) > > This should solve the curvefit problem, but if plot is categorical, > then this wouldn't be correct since it is just treated as metric > variable. Maybe you want some dummy variables for plot instead. ??? 
> > Josef > > > > > Thanks for any further assistance.. > > Peter > > > > -- > > Peter Tittmann > > > > > > On Thursday, December 2, 2010 at 1:33 PM, josef.pktd at gmail.com wrote: > > > > On Thu, Dec 2, 2010 at 2:55 PM, Peter Tittmann wrote: > > > > Greetings, > > Im attempting to conduct analysis of covariance (ANCOVA) using a non-linear > > regression with curve_fit in optimize. The doc string states: > > "xdata : An N-length sequence or an (k,N)-shaped array for functions with k > > predictors. The independent variable where the data is measured." > > I am hoping that this means that if I pass the independent variable and a > > categorical variable, the resulting covariance matrix will reflect the > > variability in the equation coefficients among the categorical variables. > > 1. Is tis the case? > > 2. If so, i'm having a problem with the input array for xdata. The following > > extracts data from a relational database (thats the sql). the eqCoeff() > > function works fine, however when I add a second dimension to the xdata in > > th ancova() function (indHtPlot), curve fit produces an error which seems to > > be related to the structure of my input array. I've tried column_stack and > > vstack to form the arrays. Any assistance would be gratefully received. > > > > import birdseye_db as db > > import numpy as np > > from scipy.optimize import curve_fit > > def getDiam(ht, a, b): > > dbh = a * ht**b > > return dbh > > > > if ht is 2dimensional, then dbh is also two dimensional (n,k). there > > should be a reduce, e.g. sum in here so that the return is 1d. > > > > Josef > > > > > > def eqCoeff(): > > '''estimates coefficients a and b in dbh= a* h**b using all trees where > > height was measured''' > > species=[i[0].strip(' ') for i in db.query('select distinct species from > > plots')] > > res3d=db.query('select dbh, height, species from plots where ht_code=1') > > indHt=[i[1] for i in res3d] > > depDbh=[i[0] for i in res3d] > > estimated_params, err_est = curve_fit(getDiam, indHt, depDbh) > > return estimated_params, err_est > > > > def ancova(): > > res=db.query('select dbh, height, plot, species from plots where > > ht_code=1') > > indHtPlot= np.column_stack(([i[1] for i in res],[i[2] for i in res] )) > > depDbh=[i[0] for i in res] > > estimated_params, err_est = curve_fit(getDiam, indHtPlot, depDbh) > > return estimated_params, err_est > > Thanks in advance > > -- > > Peter Tittmann > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > @gmail.com> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
From jsseabold at gmail.com Thu Dec 2 19:11:04 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Thu, 2 Dec 2010 19:11:04 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
Message-ID:

On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann wrote:
> getDiam is a predictor to get dbh from height. It works with curve_fit to
> find coefficients a and b given a dataset of known dbh/height pairs. You
> are right, what I want is dummy variables for each plot. I'll see if I can
> get that worked out by revising getDiam..
> Thanks again
>

I think it would be easier to create your dummy variables before you pass it in.

You might find some of the tools in statsmodels to be helpful here. We don't yet have an ANCOVA model, but you could definitely do something like the following. Not sure if it's exactly what you want, but it should give you an idea.

import numpy as np
import scikits.statsmodels as sm

dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None)
plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True, dictnames=True)

plot_dummies will be dummy variables for all of the "plot" categories, and col_map is a map from the column number to the plot just so you can be sure you know what's what.

I don't see how to use your objective function though with dummy variables. What happens if the effect of one of the plots is negative, then you run into 0 ** -1.5 == inf.

You could linearize your objective function to be

b*ln(ht)

and do something like

indHtPlot = dta['height']
depDbh = dta['dbh']
X = np.column_stack((np.log(indHtPlot), plot_dummies))
Y = np.log(depDbh)
res = sm.OLS(Y,X).fit()
res.params
array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099,
       -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883,
       -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093,
       -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084,
       -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831,
       -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304  ])

Note that your coefficient on height is now an elasticity. I'm sure I'm missing something here, but that might help you along the way.

Skipper

From david at silveregg.co.jp Thu Dec 2 19:54:21 2010
From: david at silveregg.co.jp (David)
Date: Fri, 03 Dec 2010 09:54:21 +0900
Subject: [SciPy-User] Accurate Frequency Measurement
In-Reply-To: <4CF7C690.2080400@allcaps.org>
References: <20101130014157.GA9408@spirit> <4CF4A0D1.4040709@silveregg.co.jp> <4CF7C690.2080400@allcaps.org>
Message-ID: <4CF83FBD.4070002@silveregg.co.jp>

On 12/03/2010 01:17 AM, Andrew Lentvorski wrote:
> On 11/29/10 10:59 PM, David wrote:
>> You may want to look at something like CLAM (http://clam-project.org) to
>> analyse those signals if you want to track frequency changes. I believe
>> they have some python bindings.
>
> That looks like a really nice project, but it looks really dead.

I don't know about that - I know it was quite alive 1-2 years ago (and the project was already a few years old). I have not followed the project much since I left academia, though.
cheers, David From jsseabold at gmail.com Thu Dec 2 19:57:38 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 2 Dec 2010 19:57:38 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> Message-ID: On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold wrote: > On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann wrote: >> getDiam is a predictor to get dbh from height. It works with curve_fit to >> find coefficients a and b given datasetset of known dbh/height pairs. You >> are right, what I want is dummy variables for each plot. I'll see if I can >> get that worked out by revising getDiam.. >> Thanks again >> > > I think it would be easier to create your dummy variables before you pass it in. > > You might find some of the tools in statsmodels to be helpful here. > We don't yet have an ANCOVA model, but you could definitely do > something like the following. ?Not sure if it's exactly what you want, > but it should give you an idea. > > import numpy as np > import scikits.statsmodels as sm > > dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None) > plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True, > dictnames=True) > > plot_dummies will be dummy variables for all of the "plot" categories, > and col_map is a map from the column number to the plot just so you > can be sure you know what's what. > > I don't see how to use your objective function though with dummy > variables. ?What happens if the effect of one of the plots is > negative, then you run into 0 ** -1.5 == inf. > If you want to do NLLS and not linearize then something like this might work and still keep the dummy variables as shift parameters def getDiam(ht, *b): return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1) X = np.column_stack((indHtPlot, plot_dummies)) Y = depDbh coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape[1]) > You could linearize your objective function to be > > b*ln(ht) > > and do something like > > indHtPlot = dta['height'] > depDbh = dta['dbh'] > X = np.column_stack((np.log(indHtPlot), plot_dummies)) > Y = np.log(depDbh) > res = sm.OLS(Y,X).fit() > res.params > array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099, > ? ? ? -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883, > ? ? ? -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093, > ? ? ? -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084, > ? ? ? -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831, > ? ? ? -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304 ?]) > > Note that your coefficient on height is now an elasticity. ?I'm sure > I'm missing something here, but that might help you along the way. > > Skipper > From josef.pktd at gmail.com Thu Dec 2 20:03:53 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Dec 2010 20:03:53 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> Message-ID: On Thu, Dec 2, 2010 at 7:57 PM, Skipper Seabold wrote: > On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold wrote: >> On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann wrote: >>> getDiam is a predictor to get dbh from height. It works with curve_fit to >>> find coefficients a and b given datasetset of known dbh/height pairs. You >>> are right, what I want is dummy variables for each plot. I'll see if I can >>> get that worked out by revising getDiam.. 
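(The hazard is easy to reproduce; with NumPy floats, zero raised to a negative power silently yields inf, plus a RuntimeWarning, where a plain Python float would raise ZeroDivisionError:)

>>> import numpy as np
>>> np.float64(0.0) ** -1.5
inf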
>>> Thanks again >>> >> >> I think it would be easier to create your dummy variables before you pass it in. >> >> You might find some of the tools in statsmodels to be helpful here. >> We don't yet have an ANCOVA model, but you could definitely do >> something like the following. ?Not sure if it's exactly what you want, >> but it should give you an idea. >> >> import numpy as np >> import scikits.statsmodels as sm >> >> dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None) >> plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True, >> dictnames=True) >> >> plot_dummies will be dummy variables for all of the "plot" categories, >> and col_map is a map from the column number to the plot just so you >> can be sure you know what's what. >> >> I don't see how to use your objective function though with dummy >> variables. ?What happens if the effect of one of the plots is >> negative, then you run into 0 ** -1.5 == inf. >> > > If you want to do NLLS and not linearize then something like this > might work and still keep the dummy variables as shift parameters > > def getDiam(ht, *b): > ? ?return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1) > > X = np.column_stack((indHtPlot, plot_dummies)) > Y = depDbh > coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape[1]) In the sample file there are 11 levels of the `plot` that have only a single observation each. I tried to use onewaygls, but statsmodels.OLS doesn't work if y is a scalar. I don't know whether curvefit or optimize.leastsq will converge in this case, good starting values might be necessary. Josef > > >> You could linearize your objective function to be >> >> b*ln(ht) >> >> and do something like >> >> indHtPlot = dta['height'] >> depDbh = dta['dbh'] >> X = np.column_stack((np.log(indHtPlot), plot_dummies)) >> Y = np.log(depDbh) >> res = sm.OLS(Y,X).fit() >> res.params >> array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099, >> ? ? ? -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883, >> ? ? ? -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093, >> ? ? ? -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084, >> ? ? ? -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831, >> ? ? ? -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304 ?]) >> >> Note that your coefficient on height is now an elasticity. ?I'm sure >> I'm missing something here, but that might help you along the way. 
>> >> Skipper >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From nwagner at iam.uni-stuttgart.de Fri Dec 3 03:09:27 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 03 Dec 2010 09:09:27 +0100 Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures Message-ID: ====================================================================== FAIL: test_cast_to_fp (test_recaster.TestRecaster) ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/io/tests/test_recaster.py", line 73, in test_cast_to_fp 'Expected %s from %s, got %s' % (outp, inp, dtt) AssertionError: Expected from , got ====================================================================== FAIL: line-search Newton conjugate gradient optimization routine ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/optimize/tests/test_optimize.py", line 177, in test_ncg assert_(self.gradcalls == 18, self.gradcalls) # 0.8.0 File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: 16 ====================================================================== FAIL: test_basic (test_signaltools.TestMedFilt) ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/signal/tests/test_signaltools.py", line 284, in test_basic [ 0, 7, 11, 7, 4, 4, 19, 19, 24, 0]]) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/testing/utils.py", line 686, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/testing/utils.py", line 618, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 8.0%) x: array([[ 0., 50., 50., 50., 42., 15., 15., 18., 27., 0.], [ 0., 50., 50., 50., 50., 42., 19., 21., 29., 0.], [ 50., 50., 50., 50., 50., 47., 34., 34., 46., 35.],... y: array([[ 0, 50, 50, 50, 42, 15, 15, 18, 27, 0], [ 0, 50, 50, 50, 50, 42, 19, 21, 29, 0], [50, 50, 50, 50, 50, 47, 34, 34, 46, 35],... ---------------------------------------------------------------------- Ran 4794 tests in 75.009s FAILED (KNOWNFAIL=13, SKIP=17, failures=4) From charlesr.harris at gmail.com Fri Dec 3 12:05:03 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Dec 2010 10:05:03 -0700 Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 1:09 AM, Nils Wagner wrote: > > ====================================================================== > FAIL: test_cast_to_fp (test_recaster.TestRecaster) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > > "/data/home/nwagner/local/lib/python2.5/site-packages/scipy/io/tests/test_recaster.py", > line 73, in test_cast_to_fp > 'Expected %s from %s, got %s' % (outp, inp, dtt) > AssertionError: Expected from 'numpy.float64'>, got > > Recaster is gone from the repository but I found a copy in the build directory. Try deleting the build and installation directories. 
Chuck

From josef.pktd at gmail.com Sat Dec 4 09:27:08 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Dec 2010 09:27:08 -0500
Subject: [SciPy-User] stats.distributions moments and expect - another round
Message-ID:

I spent most of a day fixing the expect function for stats distributions and checking mean, variance, skew and kurtosis, and trying to get it to work for almost all distributions.

expect, which uses integrate.quad, doesn't always work well, but the results look mostly good except for some fat-tailed distributions.

Below are the comparisons between the stats method of the distributions and the outcome of expect. The first failure for mean is at 4 decimals. The tables are discrepancies at two decimals:

invgamma might still have some numerical integration problems that I haven't figured out yet. Some other ones still look like incorrect formulas for skew and kurtosis.

As an aside: I think, finally, that we can add the doc templates to many distributions, so we have a way of pointing out differences in parameterization and known numerical problems. For example, I'm still not sure whether dist.stats contains the correct information about whether the moments don't exist or are infinite.

Josef

Variance

>>> print SimpleTable(var_, headers=['distname', 'diststats', 'expect'])
==========================================
 distname      diststats        expect
------------------------------------------
 pareto       1.60340572053    1.57246536012
 tukeylambda  0.304764722791   0.0268724542194
 fatiguelife  884942.25        884940.75504
 t            3.69051720817    3.66604272097
 powerlaw     0.858574961358   0.0641248702779
 invgamma     13.1319516251    5.5288716506
 rdist        1.78002848501    0.526315790145
------------------------------------------

Skew

>>> print SimpleTable(skew, headers=['distname', 'diststats', 'expect'])
===========================================
 distname      diststats         expect
-------------------------------------------
 mielke       7.59540085257     6.91088104518
 fisk         38.7938857832     12.9246301568
 foldnorm     0.971407236222    0.202188769695
 gilbrat      6.18487713863     6.17333365849
 loglaplace   16.9237038681     11.2535633383
 fatiguelife  0.0408718910942   3.93239048725
 powerlaw     -0.906960466124   -0.420181302329
 ncf          40747519832.7     8.94481163856
 f            1.93130205529     1.80641432186
 invgamma     -0.477729689377   655.942282864
-------------------------------------------

Kurtosis

>>> print SimpleTable(kurt, headers=['distname', 'diststats', 'expect'])
===========================================
 distname      diststats         expect
-------------------------------------------
 mielke       -149.405089743    362.442518204
 fisk         -224.659270348    2099.30241789
 foldnorm     2.70517285483     -0.294828589688
 tukeylambda  -2.98365209914    -0.897302898918
 dweibull     1.90893020344     -1.06484211833
 gilbrat      110.936392176     107.859563214
 loglaplace   -164.332555303    1330.2395564
 genpareto    14.8285714286     14.8119563403
 lognorm      81.1353811489     79.6180870122
 burr         112616.270172     6.21265707889
 ncf          -239984516633.0   13492.9902378
 f            7.9138697318      7.06539159862
 nct          -409040.407062    0.605963897342
 invgamma     -2.866573514      2116889.58176
 rdist        -2.56785479799    -1.53846154515
-------------------------------------------

From josef.pktd at gmail.com Sat Dec 4 10:35:55 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Dec 2010 10:35:55 -0500
Subject: [SciPy-User] stats.distributions moments and expect - another round
In-Reply-To:
References:
Message-ID:

On Sat, Dec 4, 2010 at 9:27 AM, wrote:
> I spent most of a day fixing the expect function for
> distributions and checking mean, variance, skew and kurtosis,
> trying to get it to work for almost all distributions.
>
> [Variance, Skew and Kurtosis comparison tables snipped -- see the
> original message above]

one down: invgamma is correct if a > 4, the requirement for its kurtosis to exist

Josef

From josef.pktd at gmail.com Sat Dec 4 11:29:15 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Dec 2010 11:29:15 -0500
Subject: [SciPy-User] stats.distributions moments and expect - another round
In-Reply-To: References: Message-ID: 

On Sat, Dec 4, 2010 at 10:35 AM, wrote:
> On Sat, Dec 4, 2010 at 9:27 AM, wrote:
>> [comparison tables snipped]
>
> one down: invgamma is correct if a>4, requirement for kurtosis to exist

I'm giving up; staring at the kurtosis of nct and burr without the direct references looks like more fun than I have time for. I think I have a patch for ncf. Any volunteers, or references?

At least we are down to a reasonably short list that needs checking or bugfixing.

Josef

From ralf.gommers at googlemail.com Sun Dec 5 10:15:47 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 5 Dec 2010 23:15:47 +0800
Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures
In-Reply-To: References: Message-ID: 

On Fri, Dec 3, 2010 at 4:09 PM, Nils Wagner wrote:
>
> FAIL: line-search Newton conjugate gradient optimization routine
> [traceback snipped -- see Nils' original message]
>     assert_(self.gradcalls == 18, self.gradcalls)  # 0.8.0
> AssertionError: 16

This number of calls has been changing before apparently, and now has differences between platforms or python versions. For 0.8.0 it had an issue on Windows due to == comparison with floating point numbers.

Since converging faster is not exactly a bug, can we just change the comparison to <= ?

> FAIL: test_basic (test_signaltools.TestMedFilt)
> [traceback snipped]
> AssertionError: Arrays are not equal (mismatch 8.0%)

If you change the assert_array_equal calls in TestMedfilt to assert_array_almost_equal does the test pass?

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
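As a quick illustration of the kind of check discussed in this thread -- an editor's sketch, assuming the expect method as it was being added to the scipy 0.9 dev branch; invgamma and the shape value are just an example:

import numpy as np
from scipy import stats

a = 5.0  # invgamma shape; the kurtosis only exists for a > 4, as noted above
m = stats.invgamma.expect(lambda x: x, args=(a,))           # mean by integration
v = stats.invgamma.expect(lambda x: (x - m)**2, args=(a,))  # variance by integration
print m, v
print stats.invgamma.stats(a, moments='mv')  # compare with the closed-form stats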
From nwagner at iam.uni-stuttgart.de Sun Dec 5 12:13:21 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Sun, 05 Dec 2010 18:13:21 +0100
Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures
In-Reply-To: References: Message-ID: 

On Sun, 5 Dec 2010 23:15:47 +0800 Ralf Gommers wrote:
> [quoted tracebacks snipped -- see the messages above]
> If you change the assert_array_equal calls in TestMedfilt to
> assert_array_almost_equal does the test pass?

Hi Ralf,

Unfortunately, the test didn't pass.

======================================================================
FAIL: test_basic (test_signaltools.TestMedFilt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 284, in test_basic
    [ 0, 7, 11, 7, 4, 4, 19, 19, 24, 0]])
  File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/utils.py", line 774, in assert_array_almost_equal
    header='Arrays are not almost equal')
  File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/utils.py", line 618, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not almost equal

(mismatch 8.0%)
 x: array([[  0.,  50.,  50.,  50.,  42.,  15.,  15.,  18.,  27.,   0.],
       [  0.,  50.,  50.,  50.,  50.,  42.,  19.,  21.,  29.,   0.],
       [ 50.,  50.,  50.,  50.,  50.,  47.,  34.,  34.,  46.,  35.],...
 y: array([[ 0, 50, 50, 50, 42, 15, 15, 18, 27,  0],
       [ 0, 50, 50, 50, 50, 42, 19, 21, 29,  0],
       [50, 50, 50, 50, 50, 47, 34, 34, 46, 35],...

----------------------------------------------------------------------
Ran 312 tests in 3.010s

FAILED (failures=1)

Nils

From ralf.gommers at googlemail.com Sun Dec 5 19:13:08 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 6 Dec 2010 08:13:08 +0800
Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 1:13 AM, Nils Wagner wrote:
> [failure output snipped -- see above]
> Unfortunately, the test didn't pass.

Then can you investigate a bit? Are there nans/infs in one of the outputs? The parts of the arrays that are printed look exactly the same.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
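A minimal sketch of the probe Ralf is asking for -- illustrative only, not the actual unit-test data; it feeds the same random integers through medfilt as float and as int and reports nans/infs and the positions where the two outputs differ:

import numpy as np
from scipy import signal

x = np.random.randint(0, 51, (10, 10))
d = signal.medfilt(x.astype(np.float64))  # float input, like the test's 'x' array
e = signal.medfilt(x)                     # int input, like the test's 'y' array
print np.isnan(d).any(), np.isinf(d).any()
print np.argwhere(d != e)                 # indices of any mismatches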
From cimrman3 at ntc.zcu.cz Mon Dec 6 06:20:50 2010
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Mon, 06 Dec 2010 12:20:50 +0100
Subject: [SciPy-User] ANN: SfePy 2010.4
Message-ID: <4CFCC712.9050508@ntc.zcu.cz>

I am pleased to announce release 2010.4 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems of coupled partial differential equations by the finite element method. The code is based on the NumPy and SciPy packages. It is distributed under the new BSD license.

Home page: http://sfepy.org
Mailing lists, issue tracking: http://code.google.com/p/sfepy/
Git (source) repository: http://github.com/sfepy
Documentation: http://docs.sfepy.org/doc

Highlights of this release
--------------------------

- higher order elements
- refactoring of geometries (reference mappings)
- transparent DOF vector synchronization with variables
- interface variables defined on a surface region

For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2010.4_RELEASE_NOTES.txt (full release notes, rather long and technical).

Best regards,
Robert Cimrman and Contributors (*)

(*) Contributors to this release (alphabetical order): Vladimír Lukeš, Logan Sorenson, Olivier Verdier

From almar.klein at gmail.com Mon Dec 6 06:45:51 2010
From: almar.klein at gmail.com (Almar Klein)
Date: Mon, 6 Dec 2010 12:45:51 +0100
Subject: [SciPy-User] ANN: IEP 2.3 (the Interactive Editor for Python)
In-Reply-To: References: Message-ID: 

On 27 November 2010 20:02, Almar Klein wrote:
> On 26 November 2010 20:31, Christian K. wrote:
>> Hi Almar,
>>
>> Am 26.11.10 16:47, schrieb Almar Klein:
>> > Hi all,
>> >
>> > I am pleased to announce version 2.3 of IEP, the interactive Editor for
>> > Python.
>> >
>> > IEP is a cross-platform Python IDE focused on interactivity and
>> > introspection, which makes it very suitable for scientific computing.
>> > Its practical design is aimed at simplicity and efficiency.
>> >
>> > website: http://code.google.com/p/iep/
>> > downloads: http://code.google.com/p/iep/downloads/list
>> > (binaries are available for Windows, Linux and Mac)
>>
>> the mac binary does not work here. It looks for a python 3.1
>> installation in some special place which I do not have:
>>
>> Dyld Error Message:
>> Library not loaded: /opt/local/Library/Frameworks/Python.framework/Versions/3.1/Python
>> Referenced from: /Applications/iep.app/Contents/MacOS/iep
>> Reason: image not found
>
> A bug report has been filed: http://code.google.com/p/iep/issues/detail?id=18

A working binary is now available (there are remaining problems with Mac OS 10.5). On a related topic: the 32bit Linux binaries are now available with anti-aliased fonts. I'm now working on the 64bit Linux binaries (takes a day or two of recompiling stuff).

Almar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dandavison7 at gmail.com Mon Dec 6 07:14:36 2010
From: dandavison7 at gmail.com (Dan Davison)
Date: Mon, 06 Dec 2010 12:14:36 +0000
Subject: [SciPy-User] Installing on OSX 10.6 Snow Leopard
Message-ID: <87d3pfhzdf.fsf@gmail.com>

Hi,

I'm failing to install scipy on Mac OSX 10.6. I would be happy to use the binary .dmg installer from sourceforge, but it detects the system python that Apple ships and refuses to install. I do have python26 installed -- how do I use the .dmg installer?
In many places on the web it is said that the scipy installers "work with the python from python.org" rather than Apple's python, but I haven't seen any instruction as to how one accomplishes that.

Thanks very much,

Dan

From ralf.gommers at googlemail.com Mon Dec 6 07:58:40 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 6 Dec 2010 20:58:40 +0800
Subject: [SciPy-User] Installing on OSX 10.6 Snow Leopard
In-Reply-To: <87d3pfhzdf.fsf@gmail.com>
References: <87d3pfhzdf.fsf@gmail.com>
Message-ID: 

On Mon, Dec 6, 2010 at 8:14 PM, Dan Davison wrote:
> I'm failing to install scipy on Mac OSX 10.6. [...] I do have python26
> installed -- how do I use the .dmg installer?

Download the dmg from http://www.python.org/ftp/python/2.6.6/

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lionel.roubeyrie at gmail.com Mon Dec 6 09:22:22 2010
From: lionel.roubeyrie at gmail.com (Lionel Roubeyrie)
Date: Mon, 6 Dec 2010 15:22:22 +0100
Subject: [SciPy-User] kriging module
In-Reply-To: <9eab2628-df85-4248-b0fa-0ba1a52b86af@p30g2000prb.googlegroups.com>
References: <4CEA2731.63BA.009B.1@twdb.state.tx.us> <4CEB7F2F.63BA.009B.1@twdb.state.tx.us> <9eab2628-df85-4248-b0fa-0ba1a52b86af@p30g2000prb.googlegroups.com>
Message-ID: 

Hi all,
I'm not really familiar with Git; I just saw that I had given a bad address. So, if anyone wants to test, here is the good one: git at github.com:LionelR/krige.git
I'd really appreciate any comments, thanks.

2010/11/24 Anand Patil :
> Hi everyone,
>
> I'm the author of PyMC's GP module. Sorry to come late to this thread.
> The discussion of my module has been on target, and thanks very much
> for the kind words... as everyone here knows it's nice when people
> notice code that you've worked hard on. I have a couple of hopefully
> relevant things to say about it.
>
> First, the GP module is broader in scope than what people typically
> mean by GP regression and kriging. The statistical model underlying
> typical GPR/K says that the data are normally distributed with
> expectations equal to the GP's value at particular, known locations.
> Further, the mean and covariance parameters of the field, as well as
> the variance of the data, are typically fixed before starting the
> regression.
>
> With the GP module, the mean and covariance parameters can be unknown,
> and the data can depend on the field in any way; as a random example,
> each data point could be Gamma distributed, with parameters determined
> by a nonlinear transformation of the field's value at several unknown
> locations.
>
> That said, the module has a very pronounced fast path that restricts
> its practical model space to Bayesian geostatistics, which means the
> aforementioned locations have to be known before starting the
> regression. This is still a superset of GPR/K. There are numerous
> examples of the GP module in use for Bayesian geostatistics at
> github.com/malaria-atlas-project.
>
> Second, the parts of the GP module that would help with GPR/K are not
> very tightly bound to either the rest of PyMC or the Bayesian
> paradigm, and could be pulled out. These parts are the Mean,
> Covariance and Realization objects, functions like observe and
> point_predict, and their components; but not the GP submodels and step
> methods mentioned in the user guide.
>
> Any questions on the GP module are welcome at groups.google.com/p/pymc.
> I'm looking forward to checking out the work in progress on the scikit.
>
> Cheers,
> Anand
>
> On Nov 23, 2:45 pm, "Dharhas Pothina" wrote:
>> We were planning to project our irregular data onto a cartesian grid and
>> try to use matplotlib to visualize the variograms. I don't think I know
>> enough about the math of kriging to be of much help in the coding, but I
>> might be able to give your module a try if I can find time between deadlines.
>>
>> - dharhas
>>
>> >>> Lionel Roubeyrie 11/22/2010 9:15 AM >>>
>> I have tried hpgl and had some discussions with one of the main
>> developers, but hpgl works only on a cartesian (regular) grid, whereas I
>> want to have the possibility of predictions on irregular points and the
>> possibility to visualize variograms.
>>
>> 2010/11/22 Dharhas Pothina :
>> > What about this package? http://hpgl.sourceforge.net/
>> > I was looking for a kriging module recently and came across this. I
>> > haven't tried it out yet but am getting ready to. It uses numpy arrays
>> > and also is able to read/write GSLib files. GSLib seems to be a fairly
>> > established command line library in the Geostats world.
>> > - dharhas
>> > On Sat, Nov 20, 2010 at 12:56 PM, Lionel Roubeyrie <
>> > lionel.roubey... at gmail.com> wrote:
>> >> Hi all,
>> >> I have written a simple module for kriging computation (ordinary
>> >> kriging for the moment). It's not optimized and maybe some minor
>> >> errors are inside, but I think it delivers correct results. Are there
>> >> some people here who can help me optimize the code or just give it a
>> >> try? I don't know the policy of this mailing list on attached files,
>> >> so I don't send it here for now.
>> >> Thanks

--
Lionel Roubeyrie
lionel.roubeyrie at gmail.com
http://youarealegend.blogspot.com
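For readers wondering what an ordinary kriging module computes, here is a small self-contained sketch of the standard equations in plain NumPy -- an editor's illustration, not code from Lionel's module; the exponential variogram and its parameters are arbitrary choices:

import numpy as np

def ordinary_krige(xy, z, xy0, sill=1.0, rng=1.0):
    # exponential semivariogram, no nugget
    gamma = lambda h: sill * (1.0 - np.exp(-h / rng))
    n = len(z)
    # pairwise distances between known points, and to the prediction point
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    d0 = np.sqrt(((xy - xy0) ** 2).sum(-1))
    # ordinary kriging system: [gamma(d) 1; 1' 0] [w; mu] = [gamma(d0); 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.append(gamma(d0), 1.0)
    w = np.linalg.solve(A, b)
    zhat = np.dot(w[:n], z)            # prediction at xy0
    var = np.dot(b[:n], w[:n]) + w[n]  # kriging variance
    return zhat, var

xy = np.random.rand(20, 2)
z = np.sin(3 * xy[:, 0]) + xy[:, 1]
print ordinary_krige(xy, z, np.array([0.5, 0.5]))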
From carovi at utu.fi Mon Dec 6 13:47:12 2010
From: carovi at utu.fi (Carolin Villforth)
Date: Mon, 06 Dec 2010 13:47:12 -0500
Subject: [SciPy-User] changing covariance factor in scipy.stats.kde.gaussian_kde
Message-ID: <6C70B2C8-A065-4EE5-8C55-E50F80F1D5B2@utu.fi>

Hello,

I have a question concerning the usage of gaussian_kde. I am trying to change the covariance factor for the KDE; this is how I am doing it at the moment:

myKDE = scipy.stats.kde.gaussian_kde(data)
myKDE.covariance_factor = myKDE.silverman_factor
myKDE._compute_covariance()

The last line seems to be necessary for the changes to take effect.

While the above code works, this might create quite an overhead, since _compute_covariance is executed twice: once in __init__ and then again after the covariance factor has been changed. If I understood correctly, this is not really necessary, since silverman_factor does not depend on outputs from _compute_covariance. Also, I always assumed that one should avoid calling '_functions' from outside the class.

Is there another way to change the covariance factor?

Thanks

Greetings

Carolin

----------------------------------------------------------
Carolin Villforth
PhD Student
Tuorla Observatory Finland and
Space Telescope Science Institute
3700 San Martin Drive
21218 Baltimore, MD
USA
phone: +1-410-338-4334
email: carovi at utu.fi, villfort at stsci.edu
----------------------------------------------------------

From robert.kern at gmail.com Mon Dec 6 13:48:39 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 6 Dec 2010 12:48:39 -0600
Subject: [SciPy-User] changing covariance factor in scipy.stats.kde.gaussian_kde
In-Reply-To: <6C70B2C8-A065-4EE5-8C55-E50F80F1D5B2@utu.fi>
References: <6C70B2C8-A065-4EE5-8C55-E50F80F1D5B2@utu.fi>
Message-ID: 

On Mon, Dec 6, 2010 at 12:47, Carolin Villforth wrote:
> [...]
> Is there another way to change the covariance factor?

Subclass.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From nwagner at iam.uni-stuttgart.de Mon Dec 6 13:55:42 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Mon, 06 Dec 2010 19:55:42 +0100
Subject: [SciPy-User] scipy 0.9.0.dev6984 test failures
In-Reply-To: References: Message-ID: 

>> Then can you investigate a bit? Are there nans/infs in one of the outputs?
> The parts of the arrays that are printed look exactly the same.

O.k., the arrays d and e differ in a few places -- most visibly 88 versus 11:

[[  0.  50.  50.  50.  42.  15.  15.  18.  27.   0.]
 [  0.  50.  50.  50.  50.  42.  19.  21.  29.   0.]
 [ 50.  50.  50.  50.  50.  47.  34.  34.  46.  35.]
 [ 50.  50.  50.  50.  50.  50.  46.  47.  65.  42.]
 [ 50.  50.  50.  50.  50.  50.  46.  55.  66.  35.]
 [ 33.  50.  50.  50.  50.  47.  46.  47.  58.  26.]
 [ 32.  50.  50.  50.  50.  50.  46.  45.  58.  26.]
 [  7.  46.  50.  50.  47.  46.  46.  43.  45.  21.]
 [  0.  32.  33.  39.  32.  32.  43.  43.  43.   0.]
 [  0.   7.  88.   7.   4.   4.  19.  19.  24.   0.]]

[[  0.  50.  50.  50.  42.  15.  15.  18.  27.   0.]
 [  0.  50.  50.  50.  50.  42.  19.  21.  29.   0.]
 [ 50.  50.  50.  50.  50.  47.  34.  34.  46.  35.]
 [ 50.  50.  50.  50.  50.  50.  42.  47.  64.  42.]
 [ 50.  50.  50.  50.  50.  50.  46.  55.  64.  35.]
 [ 33.  50.  50.  50.  50.  47.  46.  43.  55.  26.]
 [ 32.  50.  50.  50.  50.  47.  46.  45.  55.  26.]
 [  7.  46.  50.  50.  47.  46.  46.  43.  45.  21.]
 [  0.  32.  33.  39.  32.  32.  43.  43.  43.   0.]
 [  0.   7.  11.   7.   4.   4.  19.  19.  24.   0.]]

Nils

From carovi at utu.fi Mon Dec 6 14:07:52 2010
From: carovi at utu.fi (Carolin Villforth)
Date: Mon, 06 Dec 2010 14:07:52 -0500
Subject: [SciPy-User] changing covariance factor in scipy.stats.kde.gaussian_kde
In-Reply-To: References: <6C70B2C8-A065-4EE5-8C55-E50F80F1D5B2@utu.fi>
Message-ID: <01EA5DE5-29B5-4DCD-B9FE-863C35AB33E1@utu.fi>

Do you mean I should inherit gaussian_kde into my own class and then override covariance_factor in the inherited class?

Thanks

On Dec 6, 2010, at 1:48 PM, Robert Kern wrote:
> [...]
> Subclass.

From josef.pktd at gmail.com Mon Dec 6 14:26:30 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 6 Dec 2010 14:26:30 -0500
Subject: [SciPy-User] changing covariance factor in scipy.stats.kde.gaussian_kde
In-Reply-To: <01EA5DE5-29B5-4DCD-B9FE-863C35AB33E1@utu.fi>
References: <6C70B2C8-A065-4EE5-8C55-E50F80F1D5B2@utu.fi> <01EA5DE5-29B5-4DCD-B9FE-863C35AB33E1@utu.fi>
Message-ID: 

On Mon, Dec 6, 2010 at 2:07 PM, Carolin Villforth wrote:
> Do you mean I should inherit gaussian_kde into my own class and then
> override covariance_factor in the inherited class?

yes

http://stackoverflow.com/questions/2678425/fitting-gaussian-kde-in-numpy-scipy-in-python
http://mail.scipy.org/pipermail/scipy-user/2010-January/023877.html

I never checked if there are redundant calculations.

Josef

> [rest of the quoted exchange snipped]
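A minimal sketch of the subclass Robert and Josef suggest -- assuming the gaussian_kde internals of this scipy generation, where covariance_factor defaults to scotts_factor and is called once from __init__ via _compute_covariance:

import numpy as np
from scipy import stats

class SilvermanKDE(stats.kde.gaussian_kde):
    # overriding covariance_factor means __init__ picks up the Silverman
    # bandwidth directly, so _compute_covariance runs only once and no
    # private method needs to be called from outside the class
    def covariance_factor(self):
        return self.silverman_factor()

data = np.random.randn(100)  # stand-in for Carolin's data
myKDE = SilvermanKDE(data)
print myKDE.covariance_factor()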
From jsseabold at gmail.com Mon Dec 6 17:34:19 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Mon, 6 Dec 2010 17:34:19 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
Message-ID: 

I'm wondering if anyone might have a look at my cython code that does matrix multiplication and see where I can speed it up or offer some pointers/reading. I'm new to Cython and my knowledge of C is pretty basic, based on trial and (mostly) error, so I am sure the code is still very naive.

import numpy as np
from matmult import dotAB, multAB

A = np.array([[ 1.,  3.,  4.],
              [ 5.,  6.,  3.]])
B = A.T.copy()

timeit dotAB(A,B)
# 1 loops, best of 3: 826 ms per loop

timeit multAB(A,B)
# 1 loops, best of 3: 1.16 s per loop

As you can see, my multAB achieves a "speedup" factor of only about 0.7 relative to dotAB -- i.e., it is actually slower.

I compile the cython code with

cython -a matmult.pyx
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include/ -o matmult.so matmult.c

Cython code is attached and inlined below.
Profile is here (some of which I don't understand why there are bottlenecks): http://eagle1.american.edu/~js2796a/matmult/matmult.html

-----------------------------------------------------------

from numpy cimport float64_t, ndarray, NPY_DOUBLE, npy_intp
cimport cython
from numpy import dot

ctypedef float64_t DOUBLE

cdef extern from "numpy/arrayobject.h":
    cdef void import_array()
    cdef object PyArray_SimpleNew(int nd, npy_intp *dims, int typenum)

import_array()

@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline object matmult(ndarray[DOUBLE, ndim=2, mode='c'] A,
                           ndarray[DOUBLE, ndim=2, mode='c'] B):
    cdef int lda = A.shape[0]
    cdef int n = B.shape[1]
    cdef npy_intp *dims = [lda, n]
    cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE)
    cdef int i,j,k
    cdef double s
    for i in xrange(lda):
        for j in xrange(n):
            s = 0
            for k in xrange(A.shape[1]):
                s += A[i,k] * B[k,j]
            out[i,j] = s
    return out

def multAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
    for i in xrange(1000000):
        C = matmult(A,B)
    return C

def dotAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
    for i in xrange(1000000):
        C = dot(A,B)
    return C

Skipper
-------------- next part --------------
A non-text attachment was scrubbed...
Name: matmult.pyx
Type: application/octet-stream
Size: 1249 bytes
Desc: not available
URL: 

From pav at iki.fi Mon Dec 6 19:11:12 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 7 Dec 2010 00:11:12 +0000 (UTC)
Subject: [SciPy-User] fast small matrix multiplication with cython?
References: Message-ID: 

On Mon, 06 Dec 2010 17:34:19 -0500, Skipper Seabold wrote:
> I'm wondering if anyone might have a look at my cython code that does
> matrix multiplication and see where I can speed it up or offer some
> pointers/reading. [...]

You'll be hard pressed to do better than Numpy's dot. In the raw data handling, BLAS is very likely faster than most things you can code manually. Moreover, the Cython routine you write must have as much overhead as dot() --- dealing with refcounting, allocating/deallocating PyArrayObjects (which is expensive) etc.

If you are willing to give up wrapping each small matrix in a separate Numpy ndarray, then you can expect to get additional speed gains. (Although even in that case it could make more sense to call BLAS routines to do the multiplication instead, unless your matrices are small and of fixed size, in which case the C compiler may be able to produce some tightly optimized code.)

However, in many cases the small matrices can be just stuffed into a single Numpy array. At the moment there is no "vectorized" matrix multiplication routine, however, so that could be written e.g. in Cython.

-- 
Pauli Virtanen

From robert.kern at gmail.com Mon Dec 6 19:23:12 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 6 Dec 2010 18:23:12 -0600
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 18:11, Pauli Virtanen wrote:
> [...]
> You'll be hard pressed to do better than Numpy's dot.

The main thing for his use case is reducing the overhead when called from Cython. This started in a Cython-users thread where he was directly calling the Python numpy.dot() from Cython. I suggested that, given the small dimensions (only up to 10x10), writing the matmult directly in Cython might be better. Unfortunately, the buffer syntax adds a bunch of overhead. Not the *same* overhead, mind, and I was hoping it would be less, but it turns out to be more.

Getting access to the C BLAS implementations would be best. I guess you could get descr.f.dotfunc and use that.

> If you are willing to give up wrapping each small matrix in a separate
> Numpy ndarray, then you can expect to get additional speed gains. [...]
> However, in many cases the small matrices can be just stuffed into a
> single Numpy array.

His use case (Kalman filters) prevents this.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From jsseabold at gmail.com Mon Dec 6 19:30:29 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Mon, 6 Dec 2010 19:30:29 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 7:11 PM, Pauli Virtanen wrote:
> [...]
> However, in many cases the small matrices can be just stuffed into a
> single Numpy array. At the moment there is no "vectorized" matrix
> multiplication routine, however, so that could be written e.g. in Cython.

Ah, I see. I didn't think about the overhead of PyArrayObject.

Skipper
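Pauli's "stuff the small matrices into a single Numpy array" suggestion can be sketched without Cython at all; an editor's illustration using broadcasting only, which works on the NumPy of this era (np.einsum in later releases does the same thing in one call):

import numpy as np

m = 1000                      # a batch of m small matrix products
A = np.random.rand(m, 2, 3)
B = np.random.rand(m, 3, 2)
# C[i] = np.dot(A[i], B[i]) for all i at once: broadcast the elementwise
# products, then sum over the shared axis k
C = (A[:, :, :, None] * B[:, None, :, :]).sum(axis=2)
print C.shape                 # (m, 2, 2)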
From ptittmann at gmail.com Mon Dec 6 19:31:13 2010
From: ptittmann at gmail.com (Peter Tittmann)
Date: Mon, 6 Dec 2010 16:31:13 -0800
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
Message-ID: <2061813969B94BD39B4BFFDABFFC2D00@gmail.com>

thanks both of you,

Josef, the data that I sent is only the first 100 rows of about 1500; there should be sufficient sampling in each plot.

Skipper, I have attempted to deploy your suggestion for not linearizing the data. It seems to work. I'm a little confused by your modification of the getDiam function and I wonder if you could help me understand. The form of the equation being fit is:

Y = a*X^b

your version of the getDiam function:

def getDiam(ht, *b):
    return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)

I'm sorry if this is an obvious question, but I don't understand how this works, as it seems that the "a" coefficient is missing.

Thanks again!

-- 
Peter Tittmann

On Thursday, December 2, 2010 at 5:03 PM, josef.pktd at gmail.com wrote:
> On Thu, Dec 2, 2010 at 7:57 PM, Skipper Seabold wrote:
> > On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold wrote:
> > > On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann wrote:
> > >> getDiam is a predictor to get dbh from height. It works with curve_fit to
> > >> find coefficients a and b given a dataset of known dbh/height pairs. You
> > >> are right, what I want is dummy variables for each plot. I'll see if I can
> > >> get that worked out by revising getDiam.
> > >> Thanks again
> > >
> > > I think it would be easier to create your dummy variables before you pass it in.
> > >
> > > You might find some of the tools in statsmodels to be helpful here.
> > > We don't yet have an ANCOVA model, but you could definitely do
> > > something like the following. Not sure if it's exactly what you want,
> > > but it should give you an idea.
> > >
> > > import numpy as np
> > > import scikits.statsmodels as sm
> > >
> > > dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None)
> > > plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True,
> > > dictnames=True)
> > >
> > > plot_dummies will be dummy variables for all of the "plot" categories,
> > > and col_map is a map from the column number to the plot just so you
> > > can be sure you know what's what.
> > >
> > > I don't see how to use your objective function though with dummy
> > > variables. What happens if the effect of one of the plots is
> > > negative, then you run into 0 ** -1.5 == inf.
> >
> > If you want to do NLLS and not linearize then something like this
> > might work and still keep the dummy variables as shift parameters
> >
> > def getDiam(ht, *b):
> >     return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
> >
> > X = np.column_stack((indHtPlot, plot_dummies))
> > Y = depDbh
> > coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape[1])
>
> In the sample file there are 11 levels of the `plot` that have only a
> single observation each. I tried to use onewaygls, but statsmodels.OLS
> doesn't work if y is a scalar.
>
> I don't know whether curve_fit or optimize.leastsq will converge in
> this case, good starting values might be necessary.
> Josef
>
> > > You could linearize your objective function to be
> > >
> > > b*ln(ht)
> > >
> > > and do something like
> > >
> > > indHtPlot = dta['height']
> > > depDbh = dta['dbh']
> > > X = np.column_stack((np.log(indHtPlot), plot_dummies))
> > > Y = np.log(depDbh)
> > > res = sm.OLS(Y,X).fit()
> > > res.params
> > > array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099,
> > >        -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883,
> > >        -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093,
> > >        -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084,
> > >        -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831,
> > >        -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304  ])
> > >
> > > Note that your coefficient on height is now an elasticity. I'm sure
> > > I'm missing something here, but that might help you along the way.
> > >
> > > Skipper

From kwgoodman at gmail.com Mon Dec 6 19:31:28 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Mon, 6 Dec 2010 16:31:28 -0800
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote:
> @cython.boundscheck(False)
> @cython.wraparound(False)
> cdef inline object matmult(ndarray[DOUBLE, ndim=2, mode='c'] A,
>                            ndarray[DOUBLE, ndim=2, mode='c'] B):
>     cdef int lda = A.shape[0]
>     cdef int n = B.shape[1]
>     cdef npy_intp *dims = [lda, n]
>     cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE)
>     cdef int i,j,k
>     cdef double s

Do the cdef's above take a sizeable fraction of the time given that your input arrays are small? If so, then you could do those before you enter the inner loop where the dot product is needed. You wouldn't end up with a reusable matmult function, but you'd get rid of some overhead. So in your inner loop, you'd only have:

>     for i in xrange(lda):
>         for j in xrange(n):
>             s = 0
>             for k in xrange(A.shape[1]):
>                 s += A[i,k] * B[k,j]
>             out[i,j] = s
>     return out

From jsseabold at gmail.com Mon Dec 6 19:31:26 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Mon, 6 Dec 2010 19:31:26 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 7:23 PM, Robert Kern wrote:
> On Mon, Dec 6, 2010 at 18:11, Pauli Virtanen wrote:
>> [...]
>> You'll be hard pressed to do better than Numpy's dot. [...]
>
> The main thing for his use case is reducing the overhead when called
> from Cython. [...]

Sorry for the cross-post. I figured this was better hashed out over here.

> Getting access to the C BLAS implementations would be best. I guess
> you could get descr.f.dotfunc and use that.

Thanks, I will see what I can come up with. I know it can be sped up, since other software in C++ solves the whole optimization almost instantaneously when mine takes ~5 seconds for the same case, and my profiling says that most of the time is spent in the loglikelihood loop.

>> However, in many cases the small matrices can be just stuffed into a
>> single Numpy array.
>
> His use case (Kalman filters) prevents this.

For posterity's sake, more akin to my actual problem: http://groups.google.com/group/cython-users/browse_thread/thread/a605a70626a455d

> -- 
> Robert Kern
> [signature snipped]

From josef.pktd at gmail.com Mon Dec 6 19:33:13 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 6 Dec 2010 19:33:13 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: 

On Mon, Dec 6, 2010 at 5:34 PM, Skipper Seabold wrote:
> I'm wondering if anyone might have a look at my cython code that does
> matrix multiplication and see where I can speed it up or offer some
> pointers/reading. [...]
> Cython code is attached and inlined below.
> [inlined Cython code snipped -- see Skipper's original message -- except:]
>
> def multAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
>     for i in xrange(1000000):
>         C = matmult(A,B)
>     return C

Does this generate C code, since it's not a cdef? (I haven't updated cython in a while.) I guess you would want to have the entire loop in C.

Josef

> def dotAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
>     for i in xrange(1000000):
>         C = dot(A,B)
>     return C
>
> Skipper

From jsseabold at gmail.com Mon Dec 6 19:41:09 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Mon, 6 Dec 2010 19:41:09 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: <2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> <2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
Message-ID: 

On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote:
> thanks both of you,
> [...]
> I'm sorry if this is an obvious question, but I don't understand how this
> works, as it seems that the "a" coefficient is missing.

Right. I took out the 'a', because as I read it when I linearized (I might be misunderstanding ancova, I never recall the details), if you include 'a' and also all of the dummy variables for the plot, then you will have the problem of multicollinearity. You could also include 'a' and drop one of the plot dummies, but then 'a' is just your reference category that you dropped. So now b[0] is the nonlinear effect of your main variable and b[1:] contains linear shift effects of all the plots. Hmm, thinking about it some more, though, I think you could include 'a' in the non-linear version above (call it b[0] and shift everything else over by one), because now 'a' would be the effect when the current b[0] is zero. I was just unsure how you meant 'a' when you had a*ht**b and were trying to include in ht the plot variable dummies.

Skipper
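As a concrete sketch of the parameterizations discussed in this subthread -- an editor's illustration reusing the sm.OLS pattern from Skipper's earlier message; dta and plot_dummies are assumed to be defined as they are there:

import numpy as np
import scikits.statsmodels as sm

# dta and plot_dummies as constructed in Skipper's message above (assumed)
lht = np.log(dta['height'])
ldbh = np.log(dta['dbh'])

# (a) a separate slope for every plot: all dummy*log(height) interactions,
#     dropping the common log(height) column
X_a = np.column_stack((plot_dummies, plot_dummies * lht[:, None]))
res_a = sm.OLS(ldbh, X_a).fit()

# (b) a common slope plus per-plot deviations: keep log(height) and drop
#     one interaction as the reference category
X_b = np.column_stack((plot_dummies, lht, plot_dummies[:, 1:] * lht[:, None]))
res_b = sm.OLS(ldbh, X_b).fit()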
I was just unsure how you meant 'a' when you had a*ht**b and were
trying to include in ht the plot variable dummies.

Skipper

From josef.pktd at gmail.com  Mon Dec  6 19:55:04 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 6 Dec 2010 19:55:04 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: 
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
	<2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
Message-ID: 

On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold wrote:
> On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote:
>> thanks both of you,
>> Josef, the data that I sent is only the first 100 rows of about 1500, there
>> should be sufficient sampling in each plot.
>> Skipper, I have attempted to deploy your suggestion for not linearizing the
>> data. It seems to work. I'm a little confused at your modification of the
>> getDiam function and I wonder if you could help me understand. The form of
>> the equation that is being fit is:
>> Y = a*X^b
>> your version of the getDiam function:
>>
>> def getDiam(ht, *b):
>>     return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
>>
>> I'm sorry if this is an obvious question but I don't understand how this
>> works as it seems that the "a" coefficient is missing.
>> Thanks again!
>
> Right. I took out the 'a', because as I read it when I linearized (I
> might be misunderstanding ancova, I never recall the details), if you
> include 'a' and also all of the dummy variables for the plot, then you
> will have the problem of multicollinearity. You could also include
> 'a' and drop one of the plot dummies, but then 'a' is just your
> reference category that you dropped. So now b[0] is the nonlinear
> effect of your main variable and b[1:] contains linear shift effects
> of all the plots. Hmm, thinking about it some more, though, I think
> you could include 'a' in the non-linear version above (call it b[0]
> and shift everything else over by one), because now 'a' would be the
> effect when the current b[0] is zero. I was just unsure how you meant
> 'a' when you had a*ht**b and were trying to include in ht the plot
> variable dummies.

As I understand it, the intention is to estimate equality of the slope
coefficients, so the continuous variable is multiplied with the dummy
variables. In this case, the constant should still be added. The
normalization question is whether to include all dummy-cont.variable
products and drop the continuous variable, or include the continuous
variable and drop one of the dummy-cont levels.

Unless there is a strong reason to avoid log-normality of errors, I
would work (first) with the linear version.

Josef

>
> Skipper
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From ptittmann at gmail.com  Mon Dec  6 23:00:19 2010
From: ptittmann at gmail.com (Peter Tittmann)
Date: Mon, 6 Dec 2010 20:00:19 -0800
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: 
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
	<2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
Message-ID: <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com>

Gentlemen,

I've decided to switch to the OLS method, though I did get the NNLS
method that Skipper proposed working. I was not prepared to spend more
time trying to make sense of the resulting array for ancova, etc.
(also could not figure out how to interpret the resulting coefficient
array, as I was expecting a 2d array representing the a and b
coefficient values but it returned a 1d array). I have some hopefully
simple follow-up questions:

1. Is there a method to define explicitly the function used in OLS? I
know numpy.linalg.lstsq is the way OLS works but is there another
function where I can define the form?

2. I'm still interested in interpreting the results of the NNLS method,
so if either of you can suggest what the resulting arrays mean I'd be
grateful. I've attached the output of NNLS

warm regards,

Peter

Here is the working version of NNLS:

import numpy as np
import scikits.statsmodels as sm
from scipy.optimize import curve_fit

def getDiam2(ht, *b):
    return b[0] * ht[:,1]**b[1] + np.sum(b[2:]*ht[:,2:], axis=1)

dt = np.genfromtxt('/home/peter/Desktop/db_out.csv', delimiter=",",
                   names=True, dtype=None)

indHt = dt['height']
indPlot = dt['plot']
depDbh = dt['dbh']

def nnlsDummies():
    '''this function returns coefficients and covariance arrays'''
    plot_dummies, col_map = sm.tools.categorical(indPlot, drop=True,
                                                 dictnames=True)
    X = np.column_stack((indHt, plot_dummies))
    Y = depDbh
    coefs, cov = curve_fit(getDiam2, X, Y, p0=[0.]*X.shape[1])
    return coefs, cov

--
Peter Tittmann

On Monday, December 6, 2010 at 4:55 PM, josef.pktd at gmail.com wrote:
> On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold wrote:
> > On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote:
> > > thanks both of you,
> > > Josef, the data that I sent is only the first 100 rows of about 1500, there
> > > should be sufficient sampling in each plot.
> > > Skipper, I have attempted to deploy your suggestion for not linearizing the
> > > data. It seems to work. I'm a little confused at your modification of the
> > > getDiam function and I wonder if you could help me understand. The form of
> > > the equation that is being fit is:
> > > Y = a*X^b
> > > your version of the getDiam function:
> > >
> > > def getDiam(ht, *b):
> > >     return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
> > >
> > > I'm sorry if this is an obvious question but I don't understand how this
> > > works as it seems that the "a" coefficient is missing.
> > > Thanks again!
> >
> > Right. I took out the 'a', because as I read it when I linearized (I
> > might be misunderstanding ancova, I never recall the details), if you
> > include 'a' and also all of the dummy variables for the plot, then you
> > will have the problem of multicollinearity. You could also include
> > 'a' and drop one of the plot dummies, but then 'a' is just your
> > reference category that you dropped. So now b[0] is the nonlinear
> > effect of your main variable and b[1:] contains linear shift effects
> > of all the plots. Hmm, thinking about it some more, though, I think
> > you could include 'a' in the non-linear version above (call it b[0]
> > and shift everything else over by one), because now 'a' would be the
> > effect when the current b[0] is zero. I was just unsure how you meant
> > 'a' when you had a*ht**b and were trying to include in ht the plot
> > variable dummies.
>
> As I understand it, the intention is to estimate equality of the slope
> coefficients, so the continuous variable is multiplied with the dummy
> variables. In this case, the constant should still be added. The
> normalization question is whether to include all dummy-cont.variable
> products and drop the continuous variable, or include the continuous
> variable and drop one of the dummy-cont levels.
>
> Unless there is a strong reason to avoid log-normality of errors, I
> would work (first) with the linear version.
>
> Josef
>
> >
> > Skipper
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nnls_out.rtf
Type: application/octet-stream
Size: 5564 bytes
Desc: not available
URL: 

From josef.pktd at gmail.com  Tue Dec  7 01:10:04 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 7 Dec 2010 01:10:04 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com>
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
	<2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
	<80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com>
Message-ID: 

On Mon, Dec 6, 2010 at 11:00 PM, Peter Tittmann wrote:
> Gentlemen,
> I've decided to switch to the OLS method, though I did get the NNLS method
> that Skipper proposed working. I was not prepared to spend more time trying
> to make sense of the resulting array for ancova, etc. (also could not figure
> out how to interpret the resulting coefficient array, as I was expecting a 2d
> array representing the a and b coefficient values but it returned a 1d
> array). I have some hopefully simple follow-up questions:
> 1. Is there a method to define explicitly the function used in OLS? I know
> numpy.linalg.lstsq is the way OLS works but is there another function where
> I can define the form?
> 2. I'm still interested in interpreting the results of the NNLS method, so
> if either of you can suggest what the resulting arrays mean I'd be grateful.
> I've attached the output of NNLS
> warm regards,
> Peter
>
> Here is the working version of NNLS:
>
> def getDiam2(ht, *b):
>     return b[0] * ht[:,1]**b[1] + np.sum(b[2:]*ht[:,2:], axis=1)
>
> dt = np.genfromtxt('/home/peter/Desktop/db_out.csv', delimiter=",",
>                    names=True, dtype=None)
>
> indHt = dt['height']
> indPlot = dt['plot']
> depDbh = dt['dbh']
>
> def nnlsDummies():
>     '''this function returns coefficients and covariance arrays'''
>     plot_dummies, col_map = sm.tools.categorical(indPlot, drop=True,
>                                                  dictnames=True)
>     X = np.column_stack((indHt, plot_dummies))
>     Y = depDbh
>     coefs, cov = curve_fit(getDiam2, X, Y, p0=[0.]*X.shape[1])
>     return coefs, cov

(can you post at the bottom instead of the top, that's the custom on
this mailing list)

getDiam2 is just a linear model in this case, the first coefficient
should be the effect/slope of indHt, the remaining are the
constants/intercept for each (level of) "plot". According to your
output there would be 301 or 302 unique values in your plot array.
np.unique(indPlot)

The hypothesis that there are no differences across plots means that
all the coefficients (except the first) are the same. An f-test would
be the usual to check this.
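A rough, untested sketch of the restriction matrix for that test,
assuming coefs from the curve_fit above (coefs[0] the slope, coefs[1:]
the per-plot constants):

import numpy as np

k = len(coefs)
# pairwise contrasts among the per-plot constants,
# H0: const_1 = const_2 = ... = const_{k-1}
R = np.zeros((k - 2, k))
for i in range(k - 2):
    R[i, i + 1] = 1.
    R[i, i + 2] = -1.
# R can then go into an f-test on a regression result, e.g. something
# like res.f_test(R) (the exact interface depends on the statsmodels
# version you have).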
If instead you want to check that the effect/coefficient of Ht is
independent of plot, then you should use the product
indHt[:,None]*plot_dummies (all plot dummies, use drop=False).

If you already have statsmodels, then you could estimate the original
linear model that Skipper described, take

y = np.log(depDbh)
x = sm.add_constant(np.log(indHt)[:,None]*plot_dummies)

then you can estimate

res = sm.OLS(y, x).fit()

res.params are the parameters. Then you can do an f_test, which
depends on the version of statsmodels that you have.

You can also do an f_test with the results from the non-linear
curve_fit. I guess the easiest will be to estimate the model with and
without dummies, and compare the residual sum of squares with
scipy.stats.f_anova (?).

Josef

>
> --
> Peter Tittmann
>
> On Monday, December 6, 2010 at 4:55 PM, josef.pktd at gmail.com wrote:
>
> On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold wrote:
>
> On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote:
>> thanks both of you,
>> Josef, the data that I sent is only the first 100 rows of about 1500,
>> there
>> should be sufficient sampling in each plot.
>> Skipper, I have attempted to deploy your suggestion for not linearizing
>> the
>> data. It seems to work. I'm a little confused at your modification of the
>> getDiam function and I wonder if you could help me understand. The form of
>> the equation that is being fit is:
>> Y = a*X^b
>> your version of the getDiam function:
>>
>> def getDiam(ht, *b):
>>     return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
>>
>> I'm sorry if this is an obvious question but I don't understand how this
>> works as it seems that the "a" coefficient is missing.
>> Thanks again!
>
> Right. I took out the 'a', because as I read it when I linearized (I
> might be misunderstanding ancova, I never recall the details), if you
> include 'a' and also all of the dummy variables for the plot, then you
> will have the problem of multicollinearity. You could also include
> 'a' and drop one of the plot dummies, but then 'a' is just your
> reference category that you dropped. So now b[0] is the nonlinear
> effect of your main variable and b[1:] contains linear shift effects
> of all the plots. Hmm, thinking about it some more, though, I think
> you could include 'a' in the non-linear version above (call it b[0]
> and shift everything else over by one), because now 'a' would be the
> effect when the current b[0] is zero. I was just unsure how you meant
> 'a' when you had a*ht**b and were trying to include in ht the plot
> variable dummies.
>
> As I understand it, the intention is to estimate equality of the slope
> coefficients, so the continuous variable is multiplied with the dummy
> variables. In this case, the constant should still be added. The
> normalization question is whether to include all dummy-cont.variable
> products and drop the continuous variable, or include the continuous
> variable and drop one of the dummy-cont levels.
>
> Unless there is a strong reason to avoid log-normality of errors, I
> would work (first) with the linear version.
> > Josef > > > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From fperez.net at gmail.com Tue Dec 7 01:56:17 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 6 Dec 2010 22:56:17 -0800 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: Hi Skipper, On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote: > I'm wondering if anyone might have a look at my cython code that does > matrix multiplication and see where I can speed it up or offer some > pointers/reading. ?I'm new to Cython and my knowledge of C is pretty > basic based on trial and (mostly) error, so I am sure the code is > still very naive. a few years ago I had a similar problem, and I ended up getting a very significant speedup by hand-coding a very unsafe, but very fast pure C extension just to compute these inner products. This was basically a replacement for dot() that would only work with double precision inputs of compatible dimensions and would happily segfault with anything else, but it ran very fast. The inner loop is implemented completely naively, but it still beats calls to BLAS (even linked with ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). I'm attaching the code in case you find it useful, please keep in mind I haven't compiled it in years, so it may have bit-rotted a little. Cheers, f -------------- next part -------------- A non-text attachment was scrubbed... Name: flinalg.c Type: text/x-csrc Size: 8658 bytes Desc: not available URL: From dagss at student.matnat.uio.no Tue Dec 7 03:51:59 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 07 Dec 2010 09:51:59 +0100 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: <4CFDF5AF.8060103@student.matnat.uio.no> On 12/07/2010 07:56 AM, Fernando Perez wrote: > Hi Skipper, > > On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote: > >> I'm wondering if anyone might have a look at my cython code that does >> matrix multiplication and see where I can speed it up or offer some >> pointers/reading. I'm new to Cython and my knowledge of C is pretty >> basic based on trial and (mostly) error, so I am sure the code is >> still very naive. >> > a few years ago I had a similar problem, and I ended up getting a very > significant speedup by hand-coding a very unsafe, but very fast pure C > extension just to compute these inner products. This was basically a > replacement for dot() that would only work with double precision > inputs of compatible dimensions and would happily segfault with > anything else, but it ran very fast. The inner loop is implemented > completely naively, but it still beats calls to BLAS (even linked with > ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). 
> Another idea: If the matrices are more in the intermediate range, here's a Cython library for calling BLAS more directly: http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ For intermediate-size matrices the use of SSE instructions should be able to offset any call overhead. Try to stay clear of using NumPy for slicing though, instead one should do pointer arithmetic... Dag Sverre From charlesr.harris at gmail.com Tue Dec 7 09:54:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 07:54:39 -0700 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez wrote: > Hi Skipper, > > On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold > wrote: > > I'm wondering if anyone might have a look at my cython code that does > > matrix multiplication and see where I can speed it up or offer some > > pointers/reading. I'm new to Cython and my knowledge of C is pretty > > basic based on trial and (mostly) error, so I am sure the code is > > still very naive. > > a few years ago I had a similar problem, and I ended up getting a very > significant speedup by hand-coding a very unsafe, but very fast pure C > extension just to compute these inner products. This was basically a > replacement for dot() that would only work with double precision > inputs of compatible dimensions and would happily segfault with > anything else, but it ran very fast. The inner loop is implemented > completely naively, but it still beats calls to BLAS (even linked with > ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). > > I'm attaching the code in case you find it useful, please keep in mind > I haven't compiled it in years, so it may have bit-rotted a little. > > Blas adds quite a bit of overhead for multiplying small matrices, but so does calling from python. For implementing Kalman filters it might be better to write a whole Kalman class so that operations can be combined at the c level. Skipper, what kind of Kalman filter are you trying to implement? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Tue Dec 7 10:35:54 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 7 Dec 2010 07:35:54 -0800 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote: > I'm wondering if anyone might have a look at my cython code that does > matrix multiplication and see where I can speed it up or offer some > pointers/reading. ?I'm new to Cython and my knowledge of C is pretty > basic based on trial and (mostly) error, so I am sure the code is > still very naive. > > > > ? ?cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE) I'd like to reduce the overhead in creating the empty array. Using PyArray_SimpleNew in Cython is faster than using np.empty but both are slower than using np.empty without Cython. Have I done something wrong? 
I suspect it has something to do with this line in the code below:
"cdef npy_intp *dims = [r, c]"

PyArray_SimpleNew:

>> timeit matmult(2,2)
1000000 loops, best of 3: 773 ns per loop

np.empty in cython:

>> timeit matmult2(2,2)
1000000 loops, best of 3: 1.62 us per loop

np.empty in python:

>> timeit np.empty((2,2))
1000000 loops, best of 3: 465 ns per loop

Code:

import numpy as np
from numpy cimport float64_t, ndarray, NPY_DOUBLE, npy_intp

ctypedef float64_t DOUBLE

cdef extern from "numpy/arrayobject.h":
    cdef void import_array()
    cdef object PyArray_SimpleNew(int nd, npy_intp *dims, int typenum)

# initialize numpy
import_array()

def matmult(int r, int c):
    cdef npy_intp *dims = [r, c] # Is there a faster way to do this?
    cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE)
    return out

def matmult2(int r, int c):
    cdef ndarray[DOUBLE, ndim=2] out = np.empty((r, c), dtype=np.float64)
    return out

From robert.kern at gmail.com  Tue Dec  7 10:37:37 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 7 Dec 2010 09:37:37 -0600
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: 
References: 
Message-ID: 

On Tue, Dec 7, 2010 at 08:54, Charles R Harris wrote:

> Blas adds quite a bit of overhead for multiplying small matrices, but so
> does calling from python. For implementing Kalman filters it might be better
> to write a whole Kalman class so that operations can be combined at the c
> level.

As I said, he's writing the Kalman filter in Cython.

> Skipper, what kind of Kalman filter are you trying to implement?

Does this help?

http://groups.google.com/group/cython-users/browse_thread/thread/a605a70626a455d?pli=1

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From bsouthey at gmail.com  Tue Dec  7 16:58:41 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 07 Dec 2010 15:58:41 -0600
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com>
References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com>
	<2061813969B94BD39B4BFFDABFFC2D00@gmail.com>
	<80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com>
Message-ID: <4CFEAE11.7060408@gmail.com>

On 12/06/2010 10:00 PM, Peter Tittmann wrote:
>
> Gentlemen,
>
> I've decided to switch to the OLS method, though I did get the NNLS
> method that Skipper proposed working. I was not prepared to spend more
> time trying to make sense of the resulting array for ancova, etc.
> (also could not figure out how to interpret the resulting coefficient
> array, as I was expecting a 2d array representing the a and b
> coefficient values but it returned a 1d array). I have some hopefully
> simple follow-up questions:
>
> 1. Is there a method to define explicitly the function used in OLS? I
> know numpy.linalg.lstsq is the way OLS works but is there another
> function where I can define the form?
>
> 2. I'm still interested in interpreting the results of the NNLS
> method, so if either of you can suggest what the resulting arrays mean
> I'd be grateful.
I've attached the output of NNLS > > warm regards, > > Peter > > > Here is the working version of NNLS: > > def getDiam2(ht,*b): > return b[0] * ht[:,1]**b[1] + np.sum(b[2:]*ht[:,2:], axis=1) > > dt = np.genfromtxt('/home/peter/Desktop/db_out.csv', delimiter=",", > names=True, dtype=None) > indHtPlot = adt['height'] > depDbh = adt['dbh'] > plot_dummies, col_map = sm.tools.categorical(dt['plot], drop=True, > dictnames=True) > > > def nnlsDummies(): > '''this function returns coefficients and covariance arrays''' > plot_dummies, col_map = sm.tools.categorical(indPlot, drop=True, > dictnames=True) > X = np.column_stack((indHt, plot_dummies)) > Y = depDbh > coefs, cov = curve_fit(getDiam2, X, Y, p0= [0.]*X.shape[1]) > return coefs, cov > > > -- > Peter Tittmann > > > On Monday, December 6, 2010 at 4:55 PM, josef.pktd at gmail.com wrote: > >> On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold > > wrote: >>> On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote: >>> > thanks both of you, >>> > Josef, the data that I sent is only the first 100 rows of about >>> 1500, there >>> > should be sufficient sampling in each plot. >>> > Skipper, I have attempted to deploy your suggestion for not >>> linearizing the >>> > data. It seems to work. I'm a little confused at your modification >>> if the >>> > getDiam function and I wonder if you could help me understand. The >>> form of >>> > the equation that is being fit is: >>> > Y= a*X^b >>> > your version of the detDaim function: >>> > >>> > def getDiam(ht, *b): >>> > return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1) >>> > >>> > Im sorry if this is an obvious question but I don't understand how >>> this >>> > works as it seems that the "a" coefficient is missing. >>> > Thanks again! >>> >>> Right. I took out the 'a', because as I read it when I linearized (I >>> might be misunderstanding ancova, I never recall the details), if you >>> include 'a' and also all of the dummy variables for the plot, then you >>> will have a the problem of multicollinearity. You could also include >>> 'a' and drop one of the plot dummies, but then 'a' is just your >>> reference category that you dropped. So now b[0] is the nonlinear >>> effect of your main variable and b[1:] contains linear shift effects >>> of all the plots. Hmm, thinking about it some more, though I think >>> you could include 'a' in the non-linear version above (call it b[0] >>> and shift everything else over by one), because now 'a' would be the >>> effect when the current b[0] is zero. I was just unsure how you meant >>> 'a' when you had a*ht**b and were trying to include in ht the plot >>> variable dummies. >> >> As I understand it, the intention is to estimate equality of the slope >> coefficients, so the continuous variable is multiplied with the dummy >> variables. In this case, the constant should still be added. The >> normalization question is whether to include all dummy-cont.variable >> products and drop the continuous variable, or include the continuous >> variable and drop one of the dummy-cont levels. >> >> Unless there is a strong reason to avoid log-normality of errors, I >> would work (first) with the linear version. 
>> >> Josef >> >> >>> >>> Skipper >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I do think this is starting to be an off-list discussion because this is really about statistics and not about numpy/scipy (you can contact me off-list if you want). I am not sure what all the variables are so please excuse me but I presume you want to model dbh as a function of height, plot and species. Following usual biostatistics interpretation, 'plot' is probably treated as random effect but you probably have to use R/SAS etc for that for both linear and nonlinear models or some spatial models. Really you need to determine whether or not a nonlinear model is required. With the few data points you provided, I only see a linear relationship between dbh and height with some outliers and perhaps some heterogeneity of variance. Often doing a simple polynomial/spline can help to see if there is any evidence for a nonlinear relationship in the full data - a linear model or polynomial with the data provided does not suggest a nonlinear model. Obviously a linear model is easier to fit and interpret especially if you create the design matrix as estimable functions (which is rather trivial once you understand using dummy variables). The most general nonlinear/multilevel model proposed is of the form: dbh= C + A*height^B Obviously if B=1 then it is a linear model and the parameters A, B and C can be modeled with a linear function of intercept, plot and species. Although, if 'plot' is what I think it is then you probably would not model the parameters A and B with it. Without C you are forcing the curve through zero which is biological feasible if you expect dbh=0 when height is zero. However, dbh can be zero if height is not zero just due to the model itself or what dbh actually is (it may take a minimum height before dbh is greater than zero). With the data you provided, there are noticeable differences between species for dbh and height so you probably need C in your model. For this general model you probably should just fit the curve for each species alone but I would use a general stats package to do this. This will give you a good starting point to know how well the curve fits each species as well as the similarity of parameters and residual variation. Getting convergence with a model that has B varying across species may be rather hard so I would suggest modeling A and C first. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 7 11:10:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 09:10:22 -0700 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 8:37 AM, Robert Kern wrote: > On Tue, Dec 7, 2010 at 08:54, Charles R Harris > wrote: > > > Blas adds quite a bit of overhead for multiplying small matrices, but so > > does calling from python. For implementing Kalman filters it might be > better > > to write a whole Kalman class so that operations can be combined at the c > > level. 
> > As I said, he's writing the Kalman filter in Cython. > > > Skipper, what kind of Kalman filter are you trying to implement? > > Does this help? > > > http://groups.google.com/group/cython-users/browse_thread/thread/a605a70626a455d?pli=1 > > A bit, but it isn't a class. Since the Kalman filter is basically weighted linear least squares with a noisy change of variable, Skipper's function could probably be implemented that way also. Since he seems to be doing a lot of observation updates in a single go the information Kalman filter, which basically implements the usual least squares, might be a faster way to go. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Dec 7 11:33:25 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 7 Dec 2010 11:33:25 -0500 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 1:56 AM, Fernando Perez wrote: > Hi Skipper, > > On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote: >> I'm wondering if anyone might have a look at my cython code that does >> matrix multiplication and see where I can speed it up or offer some >> pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >> basic based on trial and (mostly) error, so I am sure the code is >> still very naive. > > a few years ago I had a similar problem, and I ended up getting a very > significant speedup by hand-coding a very unsafe, but very fast pure C > extension just to compute these inner products. ?This was basically a > replacement for dot() that would only work with double precision > inputs of compatible dimensions and would happily segfault with > anything else, but it ran very fast. ?The inner loop is implemented > completely naively, but it still beats calls to BLAS (even linked with > ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). > > I'm attaching the code in case you find it useful, please keep in mind > I haven't compiled it in years, so it may have bit-rotted a little. > > Cheers, > > f > Thanks. This was my next step and would've taken me some time. Skipper From josef.pktd at gmail.com Tue Dec 7 11:35:38 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 7 Dec 2010 11:35:38 -0500 Subject: [SciPy-User] ancova with optimize.curve_fit In-Reply-To: <4CFEAE11.7060408@gmail.com> References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> <2061813969B94BD39B4BFFDABFFC2D00@gmail.com> <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com> <4CFEAE11.7060408@gmail.com> Message-ID: On Tue, Dec 7, 2010 at 4:58 PM, Bruce Southey wrote: > On 12/06/2010 10:00 PM, Peter Tittmann wrote: > > Gentlemen, > I've decided to switch to the OLS method, thought I did get the NNLS method > that Skipper proposed working. I was not prepared to spend more time trying > to make sense of the resulting array for ancova, etc. (also could not figure > out how to interpret the resulting coefficient array as I was expecting a 2d > array representing the a and b coefficient values but it returned a 1d > array). I have hopefully simple follow up questions: > 1. Is there a method to define explicitly the function used in OLS? I know > numpy.linalg.lstsq is the way OLS works but is there another function where > I can define the form? > 2. I'm still interested in interpreting the results of the NNLS method, so > if either of you can suggest what the resulting arrays mean id be grateful. 
> I've attached the output of NNLS > warm regards, > Peter > > Here is the working version of NNLS: > def getDiam2(ht,*b): > ?? ?return b[0] * ht[:,1]**b[1] + np.sum(b[2:]*ht[:,2:], axis=1) > dt = np.genfromtxt('/home/peter/Desktop/db_out.csv', delimiter=",", > names=True, dtype=None) > > indHtPlot = adt['height'] > depDbh = adt['dbh'] > plot_dummies, col_map = sm.tools.categorical(dt['plot], drop=True, > dictnames=True) > > def nnlsDummies(): > ?? ?'''this function returns coefficients and covariance arrays''' > ?? ?plot_dummies, col_map = sm.tools.categorical(indPlot, drop=True, > dictnames=True) > ?? ?X = np.column_stack((indHt, plot_dummies)) > ?? ?Y = depDbh > ?? ?coefs, cov = curve_fit(getDiam2, X, Y, p0= [0.]*X.shape[1]) > ?? ?return coefs, cov > > -- > Peter Tittmann > > On Monday, December 6, 2010 at 4:55 PM, josef.pktd at gmail.com wrote: > > On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold wrote: > > On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote: >> thanks both of you, >> Josef, the data that I sent is only the first 100 rows of about 1500, >> there >> should be sufficient sampling in each plot. >> Skipper, I have attempted to deploy your suggestion for not linearizing >> the >> data. It seems to work. I'm a little confused at your modification if the >> getDiam function and I wonder if you could help me understand. The form of >> the equation that is being fit is: >> Y= a*X^b >> your version of the detDaim function: >> >> def getDiam(ht, *b): >> ? ?return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1) >> >> Im sorry if this is an obvious question but I don't understand how this >> works as it seems that the "a" coefficient is missing. >> Thanks again! > > Right. ?I took out the 'a', because as I read it when I linearized (I > might be misunderstanding ancova, I never recall the details), if you > include 'a' and also all of the dummy variables for the plot, then you > will have a the problem of multicollinearity. ?You could also include > 'a' and drop one of the plot dummies, but then 'a' is just your > reference category that you dropped. ?So now b[0] is the nonlinear > effect of your main variable and b[1:] contains linear shift effects > of all the plots. ?Hmm, thinking about it some more, though I think > you could include 'a' in the non-linear version above (call it b[0] > and shift everything else over by one), because now 'a' would be the > effect when the current b[0] is zero. ?I was just unsure how you meant > 'a' when you had a*ht**b and were trying to include in ht the plot > variable dummies. > > As I understand it, the intention is to estimate equality of the slope > coefficients, so the continuous variable is multiplied with the dummy > variables. In this case, the constant should still be added. The > normalization question is whether to include all dummy-cont.variable > products and drop the continuous variable, or include the continuous > variable and drop one of the dummy-cont levels. > > Unless there is a strong reason to avoid log-normality of errors, I > would work (first) with the linear version. 
> > Josef > > > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > I do think this is starting to be an off-list discussion because this is > really about statistics and not about numpy/scipy (you can contact me > off-list if you want). If it's too much stats, we can continue on the statsmodels list. Last time there was a similar question, I programmed most of the tests for the linear case, and it should soon be possible to do it for the non-linear case also. The target is to be able to do this in 5 (or so) lines of code for the linear case and maybe 10 lines for the non-linear case. Josef > > I am not sure what all the variables are so please excuse me but I presume > you want to model dbh as a function of height, plot and species. Following > usual biostatistics interpretation, 'plot' is probably treated as random > effect but you probably have to use R/SAS etc for that for both linear and > nonlinear models or some spatial models. > > Really you need to determine whether or not a nonlinear model is required. > With the few data points you provided, I only see a linear relationship > between dbh and height with some outliers and perhaps some heterogeneity of > variance. Often doing a simple polynomial/spline can help to see if there is > any evidence for a nonlinear relationship in the full data - a linear model > or polynomial with the data provided does not suggest a nonlinear model. > Obviously a linear model is easier to fit and interpret especially if you > create the design matrix as estimable functions (which is rather trivial > once you understand using dummy variables). > > The most general nonlinear/multilevel model proposed is of the form: > dbh= C + A*height^B > Obviously if B=1 then it is a linear model and the parameters A, B and C can > be modeled with a linear function of intercept, plot and species. Although, > if 'plot' is what I think it is then you probably would not model the > parameters A and B with it. > > Without C you are forcing the curve through zero which is biological > feasible if you expect dbh=0 when height is zero. However, dbh can be zero > if height is not zero just due to the model itself or what dbh actually is > (it may take a minimum height before dbh is greater than zero). With the > data you provided, there are noticeable differences between species for dbh > and height so you probably need C in your model. > > For this general model you probably should just fit the curve for each > species alone but I would use a general stats package to do this. This will > give you a good starting point to know how well the curve fits each species > as well as the similarity of parameters and residual variation. Getting > convergence with a model that has B varying across species may be rather > hard so I would suggest modeling A and C first. 
> > Bruce > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jsseabold at gmail.com Tue Dec 7 11:37:34 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 7 Dec 2010 11:37:34 -0500 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: <4CFDF5AF.8060103@student.matnat.uio.no> References: <4CFDF5AF.8060103@student.matnat.uio.no> Message-ID: On Tue, Dec 7, 2010 at 3:51 AM, Dag Sverre Seljebotn wrote: > On 12/07/2010 07:56 AM, Fernando Perez wrote: >> Hi Skipper, >> >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold ?wrote: >> >>> I'm wondering if anyone might have a look at my cython code that does >>> matrix multiplication and see where I can speed it up or offer some >>> pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >>> basic based on trial and (mostly) error, so I am sure the code is >>> still very naive. >>> >> a few years ago I had a similar problem, and I ended up getting a very >> significant speedup by hand-coding a very unsafe, but very fast pure C >> extension just to compute these inner products. ?This was basically a >> replacement for dot() that would only work with double precision >> inputs of compatible dimensions and would happily segfault with >> anything else, but it ran very fast. ?The inner loop is implemented >> completely naively, but it still beats calls to BLAS (even linked with >> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). >> > > Another idea: If the matrices are more in the intermediate range, here's > a Cython library for calling BLAS more directly: > > http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ > I actually tried to use tokyo, but I couldn't get it to build against the ATLAS I compiled a few days ago out of the box. A few changes to setup.py didn't fix it, so I gave up. > For intermediate-size matrices the use of SSE instructions should be > able to offset any call overhead. Try to stay clear of using NumPy for > slicing though, instead one should do pointer arithmetic... Right. Thanks. Skipper From jsseabold at gmail.com Tue Dec 7 11:47:21 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 7 Dec 2010 11:47:21 -0500 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 9:54 AM, Charles R Harris wrote: > > > On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez > wrote: >> >> Hi Skipper, >> >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold >> wrote: >> > I'm wondering if anyone might have a look at my cython code that does >> > matrix multiplication and see where I can speed it up or offer some >> > pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >> > basic based on trial and (mostly) error, so I am sure the code is >> > still very naive. >> >> a few years ago I had a similar problem, and I ended up getting a very >> significant speedup by hand-coding a very unsafe, but very fast pure C >> extension just to compute these inner products. ?This was basically a >> replacement for dot() that would only work with double precision >> inputs of compatible dimensions and would happily segfault with >> anything else, but it ran very fast. ?The inner loop is implemented >> completely naively, but it still beats calls to BLAS (even linked with >> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). 
>> >> I'm attaching the code in case you find it useful, please keep in mind >> I haven't compiled it in years, so it may have bit-rotted a little. >> > > Blas adds quite a bit of overhead for multiplying small matrices, but so > does calling from python. For implementing Kalman filters it might be better > to write a whole Kalman class so that operations can be combined at the c > level. > > Skipper, what kind of Kalman filter are you trying to implement? > It's just a linear Gaussian filter. I use it to get the loglikelihood of a univariate ARMA series with exact initial conditions. As it stands it is fairly inflexible, but if I can make it fast I would like to generalize it. There is a fair amount of scratch work in here, and some attempts at generalized state space models, but all the action for my purposes is in KalmanFilter.loglike http://bazaar.launchpad.net/~jsseabold/statsmodels/statsmodels-skipper/annotate/head%3A/scikits/statsmodels/tsa/kalmanf/kalmanf.py#L505 It's not terribly slow, but I have to maximize the likelihood using numerical derivatives, so it's getting called quite a few times. A 1000 observation ARMA(2,2) series takes about 5-6 seconds on my machine with fmin_l_bfgs_b. Skipper From dagss at student.matnat.uio.no Tue Dec 7 11:52:43 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 07 Dec 2010 17:52:43 +0100 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: <4CFE665B.70202@student.matnat.uio.no> On 12/07/2010 04:35 PM, Keith Goodman wrote: > On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold wrote: > >> I'm wondering if anyone might have a look at my cython code that does >> matrix multiplication and see where I can speed it up or offer some >> pointers/reading. I'm new to Cython and my knowledge of C is pretty >> basic based on trial and (mostly) error, so I am sure the code is >> still very naive. >> >> >> >> cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE) >> > I'd like to reduce the overhead in creating the empty array. Using > PyArray_SimpleNew in Cython is faster than using np.empty but both are > slower than using np.empty without Cython. Have I done something > wrong? I suspect is has something to do with this line in the code > below: "cdef npy_intp *dims = [r, c]" > Nope, unless something very strange is going on, that line would be ridiculously fast compared to the rest. Basically just copying two integers on the stack. Try PyArray_EMPTY? Dag Sverre > PyArray_SimpleNew: > >>> timeit matmult(2,2) >>> > 1000000 loops, best of 3: 773 ns per loop > > np.empty in cython: > >>> timeit matmult2(2,2) >>> > 1000000 loops, best of 3: 1.62 us per loop > > np.empty in python: > >>> timeit np.empty((2,2)) >>> > 1000000 loops, best of 3: 465 ns per loop > > Code: > > import numpy as np > from numpy cimport float64_t, ndarray, NPY_DOUBLE, npy_intp > > ctypedef float64_t DOUBLE > > cdef extern from "numpy/arrayobject.h": > cdef void import_array() > cdef object PyArray_SimpleNew(int nd, npy_intp *dims, int typenum) > > # initialize numpy > import_array() > > def matmult(int r, int c): > cdef npy_intp *dims = [r, c] # Is there a faster way to do this? 
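>     # PyArray_SimpleNew on the next line allocates an uninitialized
>     # r-by-c float64 array directly through the NumPy C API declared above.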
> cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE) > return out > > def matmult2(int r, int c): > cdef ndarray[DOUBLE, ndim=2] out = np.empty((r, c), dtype=np.float64) > return out > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Tue Dec 7 11:53:23 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 09:53:23 -0700 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 9:47 AM, Skipper Seabold wrote: > On Tue, Dec 7, 2010 at 9:54 AM, Charles R Harris > wrote: > > > > > > On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez > > wrote: > >> > >> Hi Skipper, > >> > >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold > >> wrote: > >> > I'm wondering if anyone might have a look at my cython code that does > >> > matrix multiplication and see where I can speed it up or offer some > >> > pointers/reading. I'm new to Cython and my knowledge of C is pretty > >> > basic based on trial and (mostly) error, so I am sure the code is > >> > still very naive. > >> > >> a few years ago I had a similar problem, and I ended up getting a very > >> significant speedup by hand-coding a very unsafe, but very fast pure C > >> extension just to compute these inner products. This was basically a > >> replacement for dot() that would only work with double precision > >> inputs of compatible dimensions and would happily segfault with > >> anything else, but it ran very fast. The inner loop is implemented > >> completely naively, but it still beats calls to BLAS (even linked with > >> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). > >> > >> I'm attaching the code in case you find it useful, please keep in mind > >> I haven't compiled it in years, so it may have bit-rotted a little. > >> > > > > Blas adds quite a bit of overhead for multiplying small matrices, but so > > does calling from python. For implementing Kalman filters it might be > better > > to write a whole Kalman class so that operations can be combined at the c > > level. > > > > Skipper, what kind of Kalman filter are you trying to implement? > > > > It's just a linear Gaussian filter. I use it to get the loglikelihood > of a univariate ARMA series with exact initial conditions. As it > stands it is fairly inflexible, but if I can make it fast I would like > to generalize it. > > There is a fair amount of scratch work in here, and some attempts at > generalized state space models, but all the action for my purposes is > in KalmanFilter.loglike > > > http://bazaar.launchpad.net/~jsseabold/statsmodels/statsmodels-skipper/annotate/head%3A/scikits/statsmodels/tsa/kalmanf/kalmanf.py#L505 > > It's not terribly slow, but I have to maximize the likelihood using > numerical derivatives, so it's getting called quite a few times. A > 1000 observation ARMA(2,2) series takes about 5-6 seconds on my > machine with fmin_l_bfgs_b. > > Just a guess here, but the numerical derivative bit makes it sounds like you are implementing a generalized Kalman filter. Have you looked at unscented Kalman filters? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Tue Dec 7 12:05:21 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 7 Dec 2010 12:05:21 -0500 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 11:53 AM, Charles R Harris wrote: > > > On Tue, Dec 7, 2010 at 9:47 AM, Skipper Seabold wrote: >> >> On Tue, Dec 7, 2010 at 9:54 AM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez >> > wrote: >> >> >> >> Hi Skipper, >> >> >> >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold >> >> wrote: >> >> > I'm wondering if anyone might have a look at my cython code that does >> >> > matrix multiplication and see where I can speed it up or offer some >> >> > pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >> >> > basic based on trial and (mostly) error, so I am sure the code is >> >> > still very naive. >> >> >> >> a few years ago I had a similar problem, and I ended up getting a very >> >> significant speedup by hand-coding a very unsafe, but very fast pure C >> >> extension just to compute these inner products. ?This was basically a >> >> replacement for dot() that would only work with double precision >> >> inputs of compatible dimensions and would happily segfault with >> >> anything else, but it ran very fast. ?The inner loop is implemented >> >> completely naively, but it still beats calls to BLAS (even linked with >> >> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). >> >> >> >> I'm attaching the code in case you find it useful, please keep in mind >> >> I haven't compiled it in years, so it may have bit-rotted a little. >> >> >> > >> > Blas adds quite a bit of overhead for multiplying small matrices, but so >> > does calling from python. For implementing Kalman filters it might be >> > better >> > to write a whole Kalman class so that operations can be combined at the >> > c >> > level. >> > >> > Skipper, what kind of Kalman filter are you trying to implement? >> > >> >> It's just a linear Gaussian filter. ?I use it to get the loglikelihood >> of a univariate ARMA series with exact initial conditions. ?As it >> stands it is fairly inflexible, but if I can make it fast I would like >> to generalize it. >> >> There is a fair amount of scratch work in here, and some attempts at >> generalized state space models, but all the action for my purposes is >> in KalmanFilter.loglike >> >> >> http://bazaar.launchpad.net/~jsseabold/statsmodels/statsmodels-skipper/annotate/head%3A/scikits/statsmodels/tsa/kalmanf/kalmanf.py#L505 >> >> It's not terribly slow, but I have to maximize the likelihood using >> numerical derivatives, so it's getting called quite a few times. ?A >> 1000 observation ARMA(2,2) series takes about 5-6 seconds on my >> machine with fmin_l_bfgs_b. >> > > Just a guess here, but the numerical derivative bit makes it sounds like you > are implementing a generalized Kalman filter. Have you looked at unscented > Kalman filters? It's still a linear filter, non-linear optimization comes in because the exact loglikelihood function for ARMA is non-linear in the coefficients. (There might be a way to calculate the derivative in the same loop, but that's a different issue.) 
Josef > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From kwgoodman at gmail.com Tue Dec 7 12:08:51 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 7 Dec 2010 09:08:51 -0800 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: <4CFE665B.70202@student.matnat.uio.no> References: <4CFE665B.70202@student.matnat.uio.no> Message-ID: On Tue, Dec 7, 2010 at 8:52 AM, Dag Sverre Seljebotn wrote: > On 12/07/2010 04:35 PM, Keith Goodman wrote: >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold ?wrote: >> >>> I'm wondering if anyone might have a look at my cython code that does >>> matrix multiplication and see where I can speed it up or offer some >>> pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >>> basic based on trial and (mostly) error, so I am sure the code is >>> still very naive. >>> >>> >>> >>> ? ? cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE) >>> >> I'd like to reduce the overhead in creating the empty array. Using >> PyArray_SimpleNew in Cython is faster than using np.empty but both are >> slower than using np.empty without Cython. Have I done something >> wrong? I suspect is has something to do with this line in the code >> below: "cdef npy_intp *dims = [r, c]" >> > > Nope, unless something very strange is going on, that line would be > ridiculously fast compared to the rest. Basically just copying two > integers on the stack. > > Try PyArray_EMPTY? PyArray_EMPTY is a little faster (but np.empty is still much faster): PyArray_SimpleNew >> timeit matmult(2,2) 1000000 loops, best of 3: 778 ns per loop PyArray_EMPTY >> timeit matmult3(2,2) 1000000 loops, best of 3: 763 ns per loop np.empty in python >> timeit np.empty((2,2)) 1000000 loops, best of 3: 470 ns per loop def matmult3(int r, int c): cdef npy_intp *dims = [r, c] cdef ndarray[DOUBLE, ndim=2] out = PyArray_EMPTY(2, dims, NPY_FLOAT64, 0) return out From jsseabold at gmail.com Tue Dec 7 12:15:46 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 7 Dec 2010 12:15:46 -0500 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 12:05 PM, wrote: > On Tue, Dec 7, 2010 at 11:53 AM, Charles R Harris > wrote: >> >> >> On Tue, Dec 7, 2010 at 9:47 AM, Skipper Seabold wrote: >>> >>> On Tue, Dec 7, 2010 at 9:54 AM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez >>> > wrote: >>> >> >>> >> Hi Skipper, >>> >> >>> >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold >>> >> wrote: >>> >> > I'm wondering if anyone might have a look at my cython code that does >>> >> > matrix multiplication and see where I can speed it up or offer some >>> >> > pointers/reading. ?I'm new to Cython and my knowledge of C is pretty >>> >> > basic based on trial and (mostly) error, so I am sure the code is >>> >> > still very naive. >>> >> >>> >> a few years ago I had a similar problem, and I ended up getting a very >>> >> significant speedup by hand-coding a very unsafe, but very fast pure C >>> >> extension just to compute these inner products. ?This was basically a >>> >> replacement for dot() that would only work with double precision >>> >> inputs of compatible dimensions and would happily segfault with >>> >> anything else, but it ran very fast. 
?The inner loop is implemented >>> >> completely naively, but it still beats calls to BLAS (even linked with >>> >> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15). >>> >> >>> >> I'm attaching the code in case you find it useful, please keep in mind >>> >> I haven't compiled it in years, so it may have bit-rotted a little. >>> >> >>> > >>> > Blas adds quite a bit of overhead for multiplying small matrices, but so >>> > does calling from python. For implementing Kalman filters it might be >>> > better >>> > to write a whole Kalman class so that operations can be combined at the >>> > c >>> > level. >>> > >>> > Skipper, what kind of Kalman filter are you trying to implement? >>> > >>> >>> It's just a linear Gaussian filter. ?I use it to get the loglikelihood >>> of a univariate ARMA series with exact initial conditions. ?As it >>> stands it is fairly inflexible, but if I can make it fast I would like >>> to generalize it. >>> >>> There is a fair amount of scratch work in here, and some attempts at >>> generalized state space models, but all the action for my purposes is >>> in KalmanFilter.loglike >>> >>> >>> http://bazaar.launchpad.net/~jsseabold/statsmodels/statsmodels-skipper/annotate/head%3A/scikits/statsmodels/tsa/kalmanf/kalmanf.py#L505 >>> >>> It's not terribly slow, but I have to maximize the likelihood using >>> numerical derivatives, so it's getting called quite a few times. ?A >>> 1000 observation ARMA(2,2) series takes about 5-6 seconds on my >>> machine with fmin_l_bfgs_b. >>> >> >> Just a guess here, but the numerical derivative bit makes it sounds like you >> are implementing a generalized Kalman filter. Have you looked at unscented >> Kalman filters? > > It's still a linear filter, non-linear optimization comes in because > the exact loglikelihood function for ARMA is non-linear in the > coefficients. > (There might be a way to calculate the derivative in the same loop, > but that's a different issue.) > Right. The derivative is of the whole likelihood function with respect to the parameters that make up the system matrices in the state equation. You can calculate the derivatives as part of the recursions but the literature I've seen suggests that numerical derivatives of the state equation matrices are the way to go. Skipper From charlesr.harris at gmail.com Tue Dec 7 12:17:05 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 10:17:05 -0700 Subject: [SciPy-User] fast small matrix multiplication with cython? In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 10:05 AM, wrote: > On Tue, Dec 7, 2010 at 11:53 AM, Charles R Harris > wrote: > > > > > > On Tue, Dec 7, 2010 at 9:47 AM, Skipper Seabold > wrote: > >> > >> On Tue, Dec 7, 2010 at 9:54 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Mon, Dec 6, 2010 at 11:56 PM, Fernando Perez > > >> > wrote: > >> >> > >> >> Hi Skipper, > >> >> > >> >> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold > > >> >> wrote: > >> >> > I'm wondering if anyone might have a look at my cython code that > does > >> >> > matrix multiplication and see where I can speed it up or offer some > >> >> > pointers/reading. I'm new to Cython and my knowledge of C is > pretty > >> >> > basic based on trial and (mostly) error, so I am sure the code is > >> >> > still very naive. 
From charlesr.harris at gmail.com Tue Dec 7 12:17:05 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Dec 2010 10:17:05 -0700
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 10:05 AM, wrote:

> It's still a linear filter; non-linear optimization comes in because
> the exact loglikelihood function for ARMA is non-linear in the
> coefficients.
> (There might be a way to calculate the derivative in the same loop,
> but that's a different issue.)
>

The unscented Kalman filter is a better way to estimate the covariance
of a non-linear process; think of it as a better integrator. If the
propagation is easy to compute, which seems to be the case here, it will
probably save you some time. You might even be able to use the basic idea
and skip the Kalman part altogether.

My general aim here is to optimize the algorithm first before getting
caught up in the details of matrix multiplication in C. Premature
optimization and all that.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
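To make the object under discussion concrete, here is a rough, generic NumPy sketch of the linear Gaussian Kalman recursion for the loglikelihood. This is a textbook prediction-form filter under an assumed state-space model, not the kalmanf.py code linked above, and all names are illustrative:

import numpy as np

def kalman_loglike(y, T, Z, R, Q, H, a, P):
    # Assumed model (a common textbook form):
    #   a[t+1] = T a[t] + R eta[t],  eta ~ N(0, Q)
    #   y[t]   = Z a[t] + eps[t],    eps ~ N(0, H)
    # y: (nobs, k) array of observation vectors; a, P: initial state
    # mean and covariance.
    a, P = a.copy(), P.copy()
    loglike = 0.0
    for yt in y:
        v = yt - np.dot(Z, a)                      # innovation
        F = np.dot(Z, np.dot(P, Z.T)) + H          # innovation covariance
        Fi = np.linalg.inv(F)
        loglike -= 0.5 * (len(v) * np.log(2 * np.pi)
                          + np.log(np.linalg.det(F))
                          + np.dot(v, np.dot(Fi, v)))
        K = np.dot(T, np.dot(P, np.dot(Z.T, Fi)))  # Kalman gain
        a = np.dot(T, a) + np.dot(K, v)            # one-step prediction
        P = (np.dot(T, np.dot(P, T.T))
             - np.dot(K, np.dot(Z, np.dot(P, T.T)))
             + np.dot(R, np.dot(Q, R.T)))
    return loglike

The per-step inversion of F is where the small-matrix costs concentrate, which is exactly the overhead this thread is trying to reduce.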
From jsseabold at gmail.com Tue Dec 7 12:39:28 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Tue, 7 Dec 2010 12:39:28 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 12:17 PM, Charles R Harris wrote:
>
> The unscented Kalman filter is a better way to estimate the covariance
> of a non-linear process; think of it as a better integrator. If the
> propagation is easy to compute, which seems to be the case here, it will
> probably save you some time. You might even be able to use the basic idea
> and skip the Kalman part altogether.
>
> My general aim here is to optimize the algorithm first before getting
> caught up in the details of matrix multiplication in C. Premature
> optimization and all that.
>

Hmm, I haven't seen this mentioned much in what I've been reading or in
the documentation on existing software for ARMA processes, so I never
thought much about it. I will have a closer look. Well, google turns
up this thread...

There is another optimization that I could employ by switching to fast
recursions when the state variance converges to its steady state, but
this makes it less general for future enhancements (i.e., time varying
coefficients). Maybe I will go ahead and try it.

Skipper

From seb.haase at gmail.com Tue Dec 7 13:00:11 2010
From: seb.haase at gmail.com (Sebastian Haase)
Date: Tue, 7 Dec 2010 19:00:11 +0100
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> <2061813969B94BD39B4BFFDABFFC2D00@gmail.com> <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com> <4CFEAE11.7060408@gmail.com>
Message-ID:

On Tue, Dec 7, 2010 at 5:35 PM, wrote:
> On Tue, Dec 7, 2010 at 4:58 PM, Bruce Southey wrote:
>> On 12/06/2010 10:00 PM, Peter Tittmann wrote:
>>
>> Gentlemen,
>> I've decided to switch to the OLS method, though I did get the NNLS method
>> that Skipper proposed working. I was not prepared to spend more time trying
>> to make sense of the resulting array for ancova, etc. (also could not figure
>> out how to interpret the resulting coefficient array, as I was expecting a 2d
>> array representing the a and b coefficient values but it returned a 1d
>> array). I have hopefully simple follow up questions:
>> 1. Is there a method to define explicitly the function used in OLS? I know
>> numpy.linalg.lstsq is the way OLS works, but is there another function where
>> I can define the form?
>> 2. I'm still interested in interpreting the results of the NNLS method, so
>> if either of you can suggest what the resulting arrays mean I'd be grateful.
>> I've attached the output of NNLS.
>> warm regards,
>> Peter
>>
>> Here is the working version of NNLS:
>>
>> def getDiam2(ht, *b):
>>     return b[0] * ht[:,1]**b[1] + np.sum(b[2:]*ht[:,2:], axis=1)
>>
>> dt = np.genfromtxt('/home/peter/Desktop/db_out.csv', delimiter=",",
>>                    names=True, dtype=None)
>>
>> indHtPlot = adt['height']
>> depDbh = adt['dbh']
>> plot_dummies, col_map = sm.tools.categorical(dt['plot'], drop=True,
>>                                              dictnames=True)
>>
>> def nnlsDummies():
>>     '''this function returns coefficients and covariance arrays'''
>>     plot_dummies, col_map = sm.tools.categorical(indPlot, drop=True,
>>                                                  dictnames=True)
>>     X = np.column_stack((indHt, plot_dummies))
>>     Y = depDbh
>>     coefs, cov = curve_fit(getDiam2, X, Y, p0=[0.]*X.shape[1])
>>     return coefs, cov
>>
>> --
>> Peter Tittmann
>>
>> On Monday, December 6, 2010 at 4:55 PM, josef.pktd at gmail.com wrote:
>>
>> On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold wrote:
>>
>> On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann wrote:
>>> thanks both of you,
>>> Josef, the data that I sent is only the first 100 rows of about 1500; there
>>> should be sufficient sampling in each plot.
>>> Skipper, I have attempted to deploy your suggestion for not linearizing the
>>> data. It seems to work. I'm a little confused by your modification of the
>>> getDiam function and I wonder if you could help me understand. The form of
>>> the equation that is being fit is:
>>> Y = a*X^b
>>> your version of the getDiam function:
>>>
>>> def getDiam(ht, *b):
>>>     return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
>>>
>>> I'm sorry if this is an obvious question, but I don't understand how this
>>> works, as it seems that the "a" coefficient is missing.
>>> Thanks again!
>>
>> Right. I took out the 'a', because as I read it when I linearized (I
>> might be misunderstanding ancova, I never recall the details), if you
>> include 'a' and also all of the dummy variables for the plot, then you
>> will have the problem of multicollinearity. You could also include
>> 'a' and drop one of the plot dummies, but then 'a' is just your
>> reference category that you dropped. So now b[0] is the nonlinear
>> effect of your main variable and b[1:] contains linear shift effects
>> of all the plots. Hmm, thinking about it some more, though, I think
>> you could include 'a' in the non-linear version above (call it b[0]
>> and shift everything else over by one), because now 'a' would be the
>> effect when the current b[0] is zero. I was just unsure how you meant
>> 'a' when you had a*ht**b and were trying to include in ht the plot
>> variable dummies.
>>
>> As I understand it, the intention is to estimate equality of the slope
>> coefficients, so the continuous variable is multiplied with the dummy
>> variables. In this case, the constant should still be added. The
>> normalization question is whether to include all dummy-cont.variable
>> products and drop the continuous variable, or include the continuous
>> variable and drop one of the dummy-cont levels.
>>
>> Unless there is a strong reason to avoid log-normality of errors, I
>> would work (first) with the linear version.
>>
>> Josef
>>
>> Skipper
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> I do think this is starting to be an off-list discussion, because this is
>> really about statistics and not about numpy/scipy (you can contact me
>> off-list if you want).
>
> If it's too much stats, we can continue on the statsmodels list. Last
> time there was a similar question, I programmed most of the tests for
> the linear case, and it should soon be possible to do it for the
> non-linear case also.
> The target is to be able to do this in 5 (or so) lines of code for the
> linear case and maybe 10 lines for the non-linear case.
>
> Josef
>
>> I am not sure what all the variables are, so please excuse me, but I presume
>> you want to model dbh as a function of height, plot and species. Following
>> the usual biostatistics interpretation, 'plot' is probably treated as a random
>> effect, but you probably have to use R/SAS etc. for that, for both linear and
>> nonlinear models or some spatial models.
>>
>> Really you need to determine whether or not a nonlinear model is required.
>> With the few data points you provided, I only see a linear relationship
>> between dbh and height, with some outliers and perhaps some heterogeneity of
>> variance. Often doing a simple polynomial/spline can help to see if there is
>> any evidence for a nonlinear relationship in the full data - a linear model
>> or polynomial with the data provided does not suggest a nonlinear model.
>> Obviously a linear model is easier to fit and interpret, especially if you
>> create the design matrix as estimable functions (which is rather trivial
>> once you understand using dummy variables).
>>
>> The most general nonlinear/multilevel model proposed is of the form:
>> dbh = C + A*height^B
>> Obviously if B=1 then it is a linear model, and the parameters A, B and C can
>> be modeled with a linear function of intercept, plot and species. Although,
>> if 'plot' is what I think it is, then you probably would not model the
>> parameters A and B with it.
>>
>> Without C you are forcing the curve through zero, which is biologically
>> feasible if you expect dbh=0 when height is zero. However, dbh can be zero
>> when height is not zero, just due to the model itself or due to what dbh
>> actually is (it may take a minimum height before dbh is greater than zero).
>> With the data you provided, there are noticeable differences between species
>> for dbh and height, so you probably need C in your model.
>>
>> For this general model you probably should just fit the curve for each
>> species alone, but I would use a general stats package to do this. This will
>> give you a good starting point to know how well the curve fits each species
>> as well as the similarity of parameters and residual variation. Getting
>> convergence with a model that has B varying across species may be rather
>> hard, so I would suggest modeling A and C first.
>>
>> Bruce
>>

I have never heard of a "statsmodels list" -- where and what is that!?
Thanks,
- Sebastian Haase

From jsseabold at gmail.com Tue Dec 7 13:06:21 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Tue, 7 Dec 2010 13:06:21 -0500
Subject: [SciPy-User] ancova with optimize.curve_fit
In-Reply-To: References: <2EDB4EB07FAA4F78B5ED0C2801524D5F@gmail.com> <2061813969B94BD39B4BFFDABFFC2D00@gmail.com> <80C1F8072EB74C4F903BC96C09FB1D6A@gmail.com> <4CFEAE11.7060408@gmail.com>
Message-ID:

On Tue, Dec 7, 2010 at 1:00 PM, Sebastian Haase wrote:
> I have never heard of a "statsmodels list" -- where and what is that!?
> Thanks,
> - Sebastian Haase

http://groups.google.com/group/pystatsmodels/

Mostly discussion on the statsmodels scikit and stats with Python,
including development chatter, so the signal to noise ratio might be
low...

Skipper

From xunchen.liu at gmail.com Tue Dec 7 14:25:25 2010
From: xunchen.liu at gmail.com (Xunchen Liu)
Date: Tue, 7 Dec 2010 12:25:25 -0700
Subject: [SciPy-User] least square fit with weight in linalg.lstsq
Message-ID:

Hello,

Is there an option to have weighting in a least square fit? Attached is
the code where I use linalg.lstsq to fit 4 parameters from 6 measurements.
Also, how can I read the uncertainty of each fitted parameter?

Thanks a lot!

Xunchen

==============
freq = np.transpose(np.array([(3609.283, 3611.204, 3611.897,
                               3619.310, 3620.270, 3620.98)]))
JK = np.array([(1, -2, 0,   4),
               (1,  2, 0,  -4),
               (1,  4, 0, -32),
               (1,  0, 1,   0),
               (1,  2, 1,  -4),
               (1,  4, 1, -32)])
c, resid, rank, sigma = linalg.lstsq(JK, freq)
==============
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Tue Dec 7 14:33:36 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Dec 2010 12:33:36 -0700
Subject: [SciPy-User] least square fit with weight in linalg.lstsq
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 12:25 PM, Xunchen Liu wrote:
> Hello,
>
> Is there an option to have weighting in a least square fit? Attached is
> the code where I use linalg.lstsq to fit 4 parameters from 6 measurements.
> Also, how can I read the uncertainty of each fitted parameter?
>

There is no option for weights at the moment, nor for the covariance of
the fitted parameters. We should fix that up someday, but at the moment
you need to write your own implementation.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
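A sketch of the kind of hand-rolled implementation Chuck suggests: weighted least squares by row rescaling, with a covariance estimate for the parameter uncertainties Xunchen asked about. The function name is invented, and treating w as inverse variances of the measurements is an assumption:

import numpy as np

def wlstsq(A, y, w):
    # Weighted least squares by rescaling rows: minimizes
    # ||W^(1/2) (A x - y)||^2 with W = diag(w), w ~ 1/variance (assumed).
    sw = np.sqrt(w)
    Aw = A * sw[:, np.newaxis]
    yw = y * sw
    x, resid, rank, sv = np.linalg.lstsq(Aw, yw)
    # Covariance of the fitted parameters: sigma2 * (A' W A)^-1, with
    # sigma2 estimated from the weighted residuals.
    dof = A.shape[0] - A.shape[1]
    sigma2 = np.sum((yw - Aw.dot(x))**2) / dof
    cov = sigma2 * np.linalg.inv(Aw.T.dot(Aw))
    return x, cov

# For the JK/freq example above, parameter uncertainties would be the
# square roots of the diagonal:
#   x, cov = wlstsq(JK, freq.ravel(), np.ones(6))
#   err = np.sqrt(np.diag(cov))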
From laserson at mit.edu Tue Dec 7 15:06:25 2010
From: laserson at mit.edu (Uri Laserson)
Date: Tue, 7 Dec 2010 15:06:25 -0500
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow Leopard): libarpack
Message-ID:

Hi all,

I am on a MacMini with an Intel processor. I just installed OS X 10.6 and
the latest Xcode that I could download, which included gcc 4.2. I am using
python 2.7 built from source using homebrew. I installed the gfortran 4.2.3
binaries from http://r.research.att.com/tools/.

I am trying to install numpy and scipy. numpy installs fine with or without
switching to g++-4.0. I have successfully installed it using pip and also
directly from source from the git repository.

Scipy is giving me errors on install (the same errors whether I use pip or
try the svn repository). I installed it successfully yesterday on a new
Macbook Air using pip, after changing the symlinks to point to g++-4.0.
However, today on my MacMini, I am getting errors after following the same
protocol.

The errors I am getting are here: https://gist.github.com/732293

I also don't know why it's referencing temp.macosx-10.4-x86_64-2.7 while I
am on 10.6. Please help!

Thanks!
Uri

...................................................................................
Uri Laserson
Graduate Student, Biomedical Engineering
Harvard-MIT Division of Health Sciences and Technology
M +1 917 742 8019
laserson at mit.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com Tue Dec 7 15:51:52 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 7 Dec 2010 15:51:52 -0500
Subject: [SciPy-User] least square fit with weight in linalg.lstsq
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 2:33 PM, Charles R Harris wrote:
> There is no option for weights at the moment, nor for the covariance of
> the fitted parameters. We should fix that up someday, but at the moment
> you need to write your own implementation.
>

In writing your own implementation, you could use optimize.curve_fit as
a pattern for the weight handling. Or you can just use
scikits.statsmodels.WLS

http://pypi.python.org/pypi/scikits.statsmodels

Josef

From gus.is.here at gmail.com Tue Dec 7 16:26:59 2010
From: gus.is.here at gmail.com (Gus Ishere)
Date: Tue, 7 Dec 2010 16:26:59 -0500
Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz
Message-ID:

I currently have a call like:

    integrate.odeint(dX_dt, X0, t, full_output=True)

One approach to speed up the integration time is to make dX_dt into a weave
function. Is there a better way to speed it up and avoid the function call
overhead in Python? Any examples would be very much appreciated.
dX_dt is very simple.

Thanks,
Gustavo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rob.clewley at gmail.com Tue Dec 7 17:48:52 2010
From: rob.clewley at gmail.com (Rob Clewley)
Date: Tue, 7 Dec 2010 17:48:52 -0500
Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz
In-Reply-To: References: Message-ID:

Gus,

On Tue, Dec 7, 2010 at 4:26 PM, Gus Ishere wrote:
> Is there a better way to speed it up and avoid the function call
> overhead in Python?

There are different compromises in the different possible approaches; I
wouldn't say any is "best". One approach is to use my PyDSTool package
(pydstool.sourceforge.net), which I think will save you work and provide
powerful options in the long term, but requires some extra installation
and a learning curve for the more sophisticated user interface. It will
automatically compile the dX_dt function you specify into C and link it
with the C integrator. So there is very little overhead in interaction
between your function and the integrator, and once my code is installed
this happens quite transparently to the user. PyDSTool even gives you the
opportunity to hand-optimize the C code for your function before final
compilation, if that floats your speedboat.

--
Robert Clewley, Ph.D.
Assistant Professor
Neuroscience Institute and
Department of Mathematics and Statistics
Georgia State University
PO Box 5030
Atlanta, GA 30302, USA

tel: 404-413-6420 fax: 404-413-5446
http://www2.gsu.edu/~matrhc
http://neuroscience.gsu.edu/rclewley.html

From charlesr.harris at gmail.com Tue Dec 7 18:45:22 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Dec 2010 16:45:22 -0700
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 10:39 AM, Skipper Seabold wrote:

> Hmm, I haven't seen this mentioned much in what I've been reading or in
> the documentation on existing software for ARMA processes, so I never
> thought much about it. I will have a closer look. Well, google turns
> up this thread...
>

I've started reading up a bit on what you are doing, and the application
doesn't use extended Kalman filters, so the suggestion to use unscented
Kalman filters is irrelevant. Sorry about that ;) I'm still wading through
the various statistical notation thickets to see if there might be a
better form to use for the problem, but I don't see one at the moment.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From josef.pktd at gmail.com Tue Dec 7 19:09:12 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 7 Dec 2010 19:09:12 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 6:45 PM, Charles R Harris wrote:
>
> I've started reading up a bit on what you are doing, and the application
> doesn't use extended Kalman filters, so the suggestion to use unscented
> Kalman filters is irrelevant. Sorry about that ;) I'm still wading through
> the various statistical notation thickets to see if there might be a
> better form to use for the problem, but I don't see one at the moment.
>

There are faster ways to get the likelihood for a simple ARMA process
than using a Kalman filter. I think the main advantage, and the reason
for the popularity of the Kalman filter for this, is that it is easier to
extend. So using too many tricks that are specific to the simple ARMA
might take away much of the advantage of getting a fast Kalman filter.

I didn't read much of the details of the Kalman filter for this, but
that was my conclusion from the non-Kalman-filter literature.

Josef

From charlesr.harris at gmail.com Tue Dec 7 19:37:31 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 7 Dec 2010 17:37:31 -0700
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 5:09 PM, wrote:

> There are faster ways to get the likelihood for a simple ARMA process
> than using a Kalman filter. I think the main advantage, and the reason
> for the popularity of the Kalman filter for this, is that it is easier to
> extend. So using too many tricks that are specific to the simple ARMA
> might take away much of the advantage of getting a fast Kalman filter.
>
Well, there are five forms of the standard Kalman filter that I am
somewhat familiar with, and some are better suited to some applications
than others. But at this point I don't see that there is any reason not
to use the common form for the ARMA case. It would be interesting to see
some profiling, since the matrix inversions are likely to dominate as
the number of variables goes up.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gus.is.here at gmail.com Tue Dec 7 21:16:09 2010
From: gus.is.here at gmail.com (Gus Ishere)
Date: Tue, 7 Dec 2010 21:16:09 -0500
Subject: [SciPy-User] g++ Compilation error using weave
Message-ID:

I'm trying to use a very simple code snippet in weave:
http://codepad.org/zAaKKVhG

But I get the following error: "error: cannot convert `float' to
`PyObject*' in return"
I pasted a more verbose error in the link.

I'd appreciate any light on this.

Thanks,
Gustavo

From warren.weckesser at enthought.com Tue Dec 7 21:47:27 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Tue, 7 Dec 2010 20:47:27 -0600
Subject: [SciPy-User] g++ Compilation error using weave
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 8:16 PM, Gus Ishere wrote:

> I'm trying to use a very simple code snippet in weave
> http://codepad.org/zAaKKVhG
>
> But I get the following error: "error: cannot convert `float' to
> `PyObject*' in return"
> I pasted a more verbose error in the link.
>

Hi Gustavo,

Don't use a 'return' statement; instead, assign the return value to the
variable 'return_val':

import scipy.weave
from scipy.weave import converters

def a():
    # weave for integration
    code = \
    """
    return_val = 1.0f;
    """
    return scipy.weave.inline(code, [],
                              type_converters=converters.blitz,
                              compiler='gcc')

print a()

Warren
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
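Building on Warren's fix, Python variables can be passed into the C snippet by listing their names; a small sketch (the function name and values are invented here, but the inline call pattern is the standard weave usage):

import scipy.weave

def weighted_sum(x, y):
    # 'x' and 'y' are looked up by name in the caller's scope and
    # converted to C doubles; the C code assigns its result to
    # return_val instead of using a C return statement.
    code = """
           return_val = 2.0 * x + y;
           """
    return scipy.weave.inline(code, ['x', 'y'])

# print weighted_sum(1.5, 2.0)   # -> 5.0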
From sebastian.walter at gmail.com Wed Dec 8 05:08:07 2010
From: sebastian.walter at gmail.com (Sebastian Walter)
Date: Wed, 8 Dec 2010 11:08:07 +0100
Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz
In-Reply-To: References: Message-ID:

On Tue, Dec 7, 2010 at 11:48 PM, Rob Clewley wrote:
> It will automatically compile the dX_dt function you
> specify into C and link it with the C integrator.

Does that mean that your tool translates Python to C?

From ralf.gommers at googlemail.com Wed Dec 8 06:18:55 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Wed, 8 Dec 2010 19:18:55 +0800
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow Leopard): libarpack
In-Reply-To: References: Message-ID:

On Wed, Dec 8, 2010 at 4:06 AM, Uri Laserson wrote:

> The errors I am getting are here:
> https://gist.github.com/732293
>

The error indicates that 32 and 64 bit binaries are being mixed. Can you
tell us the following:
- what build command you used
- what Python you are using (from python.org, from Apple, self-compiled?)
- the output of "gcc -v", "g++ -v" and "gfortran -v"

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mathieu at mblondel.org Wed Dec 8 08:38:57 2010
From: mathieu at mblondel.org (Mathieu Blondel)
Date: Wed, 8 Dec 2010 22:38:57 +0900
Subject: [SciPy-User] Ax = b for symmetric positive definite matrix
Message-ID:

Hi everyone,

I want to solve equations of the form Ax = b or AX = B, where A is a
dense symmetric positive definite matrix. I want to be able to support
a potentially large A, so I find it convenient to store A as a 1d-vector
containing the upper part of the matrix only. It's easy to convert
this 1d representation to a 2d representation, and vice-versa.

I believe functions like linalg.solve (when passed sym_pos=True) and
linalg.cholesky would benefit from accepting 1d-arrays of this kind.

Is there a memory efficient way of solving my equations for a large A?

Thanks,
Mathieu
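One workaround, until packed-storage support exists, is to unpack the 1d upper triangle into a temporarily allocated full matrix and use the Cholesky-based solvers. A rough sketch, assuming the 1d vector stores the upper triangle row by row (the function name and storage convention are assumptions for illustration):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_packed_upper(ap, b):
    # ap: 1d array holding the upper triangle of an (n x n) SPD matrix,
    # stored row by row (an assumed convention -- adjust to taste).
    # Recover n from len(ap) = n*(n+1)/2.
    n = int(round((np.sqrt(8 * ap.size + 1) - 1) / 2))
    A = np.zeros((n, n))
    A[np.triu_indices(n)] = ap
    A = A + A.T - np.diag(A.diagonal())   # symmetrize
    c, low = cho_factor(A)                # Cholesky factorization
    return cho_solve((c, low), b)

This of course gives up the memory savings while solving; staying packed end to end needs the LAPACK ?ppsv/?pptrf routines discussed in the reply below.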
From dagss at student.matnat.uio.no Wed Dec 8 08:47:57 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Wed, 08 Dec 2010 14:47:57 +0100
Subject: [SciPy-User] Ax = b for symmetric positive definite matrix
In-Reply-To: References: Message-ID: <4CFF8C8D.9080602@student.matnat.uio.no>

On 12/08/2010 02:38 PM, Mathieu Blondel wrote:
> Is there a memory efficient way of solving my equations for a large A?
>

Perhaps not the answer you're looking for, but:

It is not supported in SciPy yet, but if you want to play with calling
LAPACK directly, this is supported through sppsv/dppsv/cppsv/zppsv. For
instance you could modify the scipy/linalg/*.pyf files to include the
function and rebuild SciPy. Or, search for "tokyo cython lapack" and play
with that.

http://www.netlib.org/lapack/double/dppsv.f

Dag Sverre

From josef.pktd at gmail.com Wed Dec 8 09:33:36 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 8 Dec 2010 09:33:36 -0500
Subject: [SciPy-User] comparing statistical test in scipy.stats and R.stats, what do we have?
Message-ID:

I'm looking a bit at the status of "Statistics in Python", or "what do
they have that we don't?".

Here are raw tables of contents of R.stats and scipy.stats for
statistical tests. Some additional ones are in scikits.statsmodels,
but several are missing. (It doesn't contain the latest additions to
scipy.stats like fisherexact.)

Is anyone interested in adding some missing ones? (BSD compatible and
hopefully verified or verifiable, so that they don't have to linger for
a year or two in the ticket queue.)

I will keep adding the ones I'm interested in to scikits.statsmodels.
Josef

From the scraped table of contents of R library(stats), the functions
with "test" in the name:

>>> pprint([item for item in statstoc if 'test' in item[0].lower()])
[['ansari.test', 'Ansari-Bradley Test '],
 ['bartlett.test', 'Bartlett Test of Homogeneity of Variances '],
 ['binom.test', 'Exact Binomial Test '],
 ['Box.test', 'Box-Pierce and Ljung-Box Tests '],
 ['chisq.test', "Pearson's Chi-squared Test for Count Data "],
 ['cor.test', 'Test for Association/Correlation Between Paired Samples '],
 ['fisher.test', "Fisher's Exact Test for Count Data "],
 ['fligner.test', 'Fligner-Killeen Test of Homogeneity of Variances '],
 ['friedman.test', 'Friedman Rank Sum Test '],
 ['kruskal.test', 'Kruskal-Wallis Rank Sum Test '],
 ['ks.test', 'Kolmogorov-Smirnov Tests '],
 ['mantelhaen.test',
  'Cochran-Mantel-Haenszel Chi-Squared Test for Count Data '],
 ['mauchly.test', "Mauchly's Test of Sphericity "],
 ['mcnemar.test', "McNemar's Chi-squared Test for Count Data "],
 ['mood.test', 'Mood Two-Sample Test of Scale '],
 ['oneway.test', 'Test for Equal Means in a One-Way Layout '],
 ['pairwise.prop.test', 'Pairwise comparisons for proportions '],
 ['pairwise.t.test', 'Pairwise t tests '],
 ['pairwise.wilcox.test', 'Pairwise Wilcoxon rank sum tests '],
 ['poisson.test', 'Exact Poisson tests '],
 ['power.anova.test',
  'Power calculations for balanced one-way analysis of variance tests '],
 ['power.prop.test', 'Power calculations two sample test for proportions '],
 ['power.t.test', 'Power calculations for one and two sample t tests '],
 ['PP.test', 'Phillips-Perron Test for Unit Roots '],
 ['print.power.htest', 'Print method for power calculation object '],
 ['prop.test', 'Test of Equal or Given Proportions '],
 ['prop.trend.test', 'Test for trend in proportions '],
 ['quade.test', 'Quade Test '],
 ['shapiro.test', 'Shapiro-Wilk Normality Test '],
 ['t.test', "Student's t-Test "],
 ['var.test', 'F Test to Compare Two Variances '],
 ['wilcox.test', 'Wilcoxon Rank Sum and Signed Rank Tests ']]

In scipy.stats, the objects with "test" in their docs:

>>> [item for item in dir(stats) if (getattr(stats, item).__doc__ and 'test' in getattr(stats, item).__doc__)]
['Tester', 'anderson', 'ansari', 'bartlett', 'binom_test',
 'chisquare', 'f_oneway', 'fligner', 'friedmanchisquare', 'glm',
 'kendalltau', 'kruskal', 'ks_2samp', 'ksone', 'ksprob', 'kstest',
 'kstwobign', 'kurtosis', 'kurtosistest', 'levene', 'linregress',
 'mannwhitneyu', 'mood', 'normaltest', 'obrientransform', 'oneway',
 'pearsonr', 'percentileofscore', 'ranksums', 'rv_discrete', 'shapiro',
 'skew', 'skewtest', 'spearmanr', 'statlib', 'stats', 'test',
 'tiecorrect', 'ttest_1samp', 'ttest_ind', 'ttest_rel', 'wilcoxon']

Josef
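To give a sense of how small some of the missing pieces are, here is a rough, unverified numpy/scipy sketch of one entry from the R list above: Box.test in its Ljung-Box form (the function name is invented; this is not an existing scipy.stats function):

import numpy as np
from scipy import stats

def ljung_box(x, lags):
    # Ljung-Box Q statistic: Q = n(n+2) * sum_k acf(k)^2 / (n-k),
    # k = 1..lags, referred to a chi-squared distribution with
    # `lags` degrees of freedom.
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    denom = np.sum(x**2)
    acf = np.array([np.sum(x[k:] * x[:-k]) / denom
                    for k in range(1, lags + 1)])
    q = n * (n + 2) * np.sum(acf**2 / (n - np.arange(1, lags + 1)))
    pvalue = stats.chi2.sf(q, lags)
    return q, pvalue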
From alexander.borghgraef.rma at gmail.com Wed Dec 8 09:40:19 2010
From: alexander.borghgraef.rma at gmail.com (Alexander Borghgraef)
Date: Wed, 8 Dec 2010 15:40:19 +0100
Subject: [SciPy-User] Installed scikits.learn, .pth file doesn't add egg to path
Message-ID:

Hi all,

I've installed the latest scikits.learn package via easy_install locally
into my home directory. The local libs are located in
$HOME/local/libs/python2.6/site-packages, which I added to my PYTHONPATH
variable. Now, easy_install puts the library into an egg within
site-packages, but adds an easy_install.pth file which should make it
visible for import, so I shouldn't have to put every egg into my
PYTHONPATH manually. However, "import scikits" doesn't work. It does work
when I add the egg directory (scikits.learn-0.5-py2.6-linux-x86_64.egg)
to the PYTHONPATH.

I'm not familiar with .pth files, though they seem simple, and at a first
glance it should work. The file contains the following code:

import sys; sys.__plen = len(sys.path)
./site-packages/scikits.learn-0.5-py2.6-linux-x86_64.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:];
p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

I also fiddled a bit with it, removing the leading ./, or adding the full
path of the egg, but no results. Any suggestions?

--
Alex Borghgraef
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mathieu at mblondel.org Wed Dec 8 10:16:18 2010
From: mathieu at mblondel.org (Mathieu Blondel)
Date: Thu, 9 Dec 2010 00:16:18 +0900
Subject: [SciPy-User] Ax = b for symmetric positive definite matrix
In-Reply-To: <4CFF8C8D.9080602@student.matnat.uio.no>
References: <4CFF8C8D.9080602@student.matnat.uio.no>
Message-ID:

On Wed, Dec 8, 2010 at 10:47 PM, Dag Sverre Seljebotn wrote:
> It is not supported in SciPy yet, but if you want to play with calling
> LAPACK directly, this is supported through sppsv/dppsv/cppsv/zppsv. For
> instance you could modify the scipy/linalg/*.pyf files to include the
> function and rebuild SciPy. Or, search for "tokyo cython lapack" and play
> with that.

Thanks!

Ideally, I would like to include the relevant Fortran or C files
directly in my project so that it is easy to compile for other users.
In this regard, maybe CLAPACK + Cython is a better choice than
LAPACK + f2py?

Mathieu

From nwagner at iam.uni-stuttgart.de Wed Dec 8 11:53:38 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Wed, 08 Dec 2010 17:53:38 +0100
Subject: [SciPy-User] Ax = b for symmetric positive definite matrix
In-Reply-To: <4CFF8C8D.9080602@student.matnat.uio.no>
References: <4CFF8C8D.9080602@student.matnat.uio.no>
Message-ID:

On Wed, 08 Dec 2010 14:47:57 +0100 Dag Sverre Seljebotn wrote:
> It is not supported in SciPy yet, but if you want to play with calling
> LAPACK directly, this is supported through sppsv/dppsv/cppsv/zppsv.
>

I have filed a ticket

http://projects.scipy.org/scipy/ticket/456

for that purpose.
Nils

From dagss at student.matnat.uio.no Wed Dec 8 13:03:39 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Wed, 08 Dec 2010 19:03:39 +0100
Subject: [SciPy-User] Ax = b for symmetric positive definite matrix
In-Reply-To: References: <4CFF8C8D.9080602@student.matnat.uio.no>
Message-ID: <4CFFC87B.3050308@student.matnat.uio.no>

On 12/08/2010 04:16 PM, Mathieu Blondel wrote:
> Ideally, I would like to include the relevant Fortran or C files
> directly in my project so that it is easy to compile for other users.
> In this regard, maybe CLAPACK + Cython is a better choice than
> LAPACK + f2py?
>

To have any performance at all you *need* to link to whatever LAPACK
the user is using (ATLAS, Intel MKL, etc.). The most pain-free way to
do that currently is to patch/improve SciPy, I believe.

Dag Sverre

From solomon.negusse at twdb.state.tx.us Wed Dec 8 15:18:02 2010
From: solomon.negusse at twdb.state.tx.us (Solomon M Negusse)
Date: Wed, 8 Dec 2010 12:18:02 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] scipy/numpy on CentOS 5/ Dependency problem
In-Reply-To: References: <261cc8ff0809270758p30f1f259r2481497402539c2d@mail.gmail.com>
Message-ID: <30409488.post@talk.nabble.com>

Hello there,

I hit the same dependency problem while trying to install numpy and scipy
on my Linux box this morning. Can you please help with installing these
packages? My search on Google didn't come up with a useful result.

Thank you,
-Solomon

Matthieu Brucher-2 wrote:
>
> Hi,
>
> Start by installing refblas3, refblas3-dev, lapack3 and lapack3-dev.
> Last time I had this issue, installing the dependencies by hand first
> solved it.
>
> Matthieu
>
> 2008/9/27 Matthias Blaicher:
>> Hello,
>>
>> I want to install SciPy and numpy on a CentOS 5 system. It contains a
>> fresh install with up-to-date packages. I use the "official" Suse
>> Build repos by David Cournapeau.
>>
>> It fails with a conflict between refblas3 and blas.
>> >> [root at localhost yum.repos.d]# yum install python-numpy python-scipy >> Loading "fastestmirror" plugin >> Loading mirror speeds from cached hostfile >> * home_ashigabou: download.opensuse.org >> * extras: mirrors.tummy.com >> * updates: dds.gina.alaska.edu >> * base: mirror.hmc.edu >> * addons: mirrors.tummy.com >> extras 100% |=========================| 1.1 kB >> 00:00 >> updates 100% |=========================| 951 B >> 00:00 >> base 100% |=========================| 1.1 kB >> 00:00 >> addons 100% |=========================| 951 B >> 00:00 >> Setting up Install Process >> Parsing package install arguments >> Resolving Dependencies >> --> Running transaction check >> ---> Package python-scipy.i386 0:0.6.0-2.1 set to be updated >> --> Processing Dependency: libblas.so.3 for package: python-scipy >> --> Processing Dependency: libblas.so.3 for package: python-scipy >> --> Processing Dependency: liblapack.so.3 for package: python-scipy >> --> Processing Dependency: libgfortran.so.1 for package: python-scipy >> --> Processing Dependency: liblapack.so.3 for package: python-scipy >> ---> Package python-numpy.i386 0:1.2.0-1.1 set to be updated >> --> Processing Dependency: gcc-gfortran for package: python-numpy >> --> Processing Dependency: refblas3 for package: python-numpy >> --> Processing Dependency: lapack3 < 3.1 for package: python-numpy >> --> Running transaction check >> ---> Package gcc-gfortran.i386 0:4.1.2-42.el5 set to be updated >> --> Processing Dependency: gcc = 4.1.2-42.el5 for package: gcc-gfortran >> ---> Package blas.i386 0:3.0-37.el5 set to be updated >> ---> Package lapack3.i386 0:3.0-19.1 set to be updated >> ---> Package lapack.i386 0:3.0-37.el5 set to be updated >> ---> Package refblas3.i386 0:3.0-11.1 set to be updated >> ---> Package libgfortran.i386 0:4.1.2-42.el5 set to be updated >> --> Running transaction check >> ---> Package gcc.i386 0:4.1.2-42.el5 set to be updated >> --> Processing Dependency: libgomp.so.1 for package: gcc >> --> Processing Dependency: glibc-devel >= 2.2.90-12 for package: gcc >> --> Processing Dependency: libgomp = 4.1.2-42.el5 for package: gcc >> --> Running transaction check >> ---> Package glibc-devel.i386 0:2.5-24 set to be updated >> --> Processing Dependency: glibc-headers for package: glibc-devel >> --> Processing Dependency: glibc-headers = 2.5-24 for package: >> glibc-devel >> ---> Package libgomp.i386 0:4.1.2-42.el5 set to be updated >> --> Running transaction check >> ---> Package glibc-headers.i386 0:2.5-24 set to be updated >> --> Processing Dependency: kernel-headers for package: glibc-headers >> --> Processing Dependency: kernel-headers >= 2.2.1 for package: >> glibc-headers >> --> Running transaction check >> ---> Package kernel-headers.i386 0:2.6.18-92.1.13.el5 set to be updated >> --> Processing Conflict: refblas3 conflicts blas >> --> Finished Dependency Resolution >> Error: refblas3 conflicts with blas >> >> Is there anything I'm doing wrong here, as I don't want to compile all >> dependencies by myself.. 
>> >> Sincerly, >> >> Matthias Blaicher >> _______________________________________________ >> SciPy-user mailing list >> SciPy-user at scipy.org >> http://projects.scipy.org/mailman/listinfo/scipy-user >> > > > > -- > French PhD student > Information System Engineer > Website: http://matthieu-brucher.developpez.com/ > Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn: http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > SciPy-user mailing list > SciPy-user at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/scipy-numpy-on-CentOS-5--Dependency-problem-tp19703443p30409488.html Sent from the Scipy-User mailing list archive at Nabble.com. From rob.clewley at gmail.com Wed Dec 8 16:07:35 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Wed, 8 Dec 2010 16:07:35 -0500 Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz In-Reply-To: References: Message-ID: >> It will automatically compile the dX_dt function you >> specify into C and link it with the C integrator. > > Does that mean that your tool translates Python to C? No, it translates an almost-subset of python, specified by strings or symbolic objects, to actual python or C, using declared variable and parameter names rather than an indexed array. The two non-python statements are an "if" function and a "for" macro. You can call other user-defined functions declared in a similar way. You can also drop in arbitrary code into the internally-created function before or after these strings are coded. And you can import arbitrary libraries for both python and C targets. -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://www2.gsu.edu/~matrhc http://neuroscience.gsu.edu/rclewley.html From sebastian.walter at gmail.com Wed Dec 8 17:37:38 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 8 Dec 2010 23:37:38 +0100 Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz In-Reply-To: References: Message-ID: On Wed, Dec 8, 2010 at 10:07 PM, Rob Clewley wrote: >>> It will automatically compile the dX_dt function you >>> specify into C and link it with the C integrator. >> >> Does that mean that your tool translates Python to C? > > No, it translates an almost-subset of python, specified by strings or > symbolic objects, to actual python or C, using declared variable and > parameter names rather than an indexed array. The two non-python > statements are an "if" function and a "for" macro. Could you point me to some short code example where the "for" macro is used? > You can call other > user-defined functions declared in a similar way. You can also drop in > arbitrary code into the internally-created function before or after > these strings are coded. And you can import arbitrary libraries for > both python and C targets. > > -- > Robert Clewley, Ph.D. 
> Assistant Professor > Neuroscience Institute and > Department of Mathematics and Statistics > Georgia State University > PO Box 5030 > Atlanta, GA 30302, USA > > tel: 404-413-6420 fax: 404-413-5446 > http://www2.gsu.edu/~matrhc > http://neuroscience.gsu.edu/rclewley.html > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From oliphant at enthought.com Wed Dec 8 18:13:30 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 8 Dec 2010 17:13:30 -0600 Subject: [SciPy-User] comparing statistical test in scipy.stats and R.stats, what do we have? In-Reply-To: References: Message-ID: <5A0944AA-E790-47C6-92FA-EADC4367435B@enthought.com> It would be great to get these into scipy stats -- particularly the ones you are interested in :-) Travis -- (mobile phone of) Travis Oliphant Enthought, Inc. 1-512-536-1057 http://www.enthought.com On Dec 8, 2010, at 8:33 AM, josef.pktd at gmail.com wrote: > I'm looking a bit at the status of "Statistics in Python", or "what do > they have, and we don't". > > Here are raw tables of content of R.stats and scipy.stats for > statistical tests. Some additional ones are in scikits.statsmodels, > but several are missing. (It doesn't contain the latest additions to > scipy.stats like fisherexact) > > Is anyone interested in adding some missing ones? (BSD compatible and > hopefully verified or verifiable so that they don't have to linger for > a year or two in the ticket queue) > > I will keep adding the ones I'm interested in to scikits.statsmodels. > > Josef > > from the scraped table of content of R library(stats) functions with > "test" in name > >>>> pprint([item for item in statstoc if 'test' in item[0].lower()]) > [['ansari.test', 'Ansari-Bradley Test '], > ['bartlett.test', 'Bartlett Test of Homogeneity of Variances '], > ['binom.test', 'Exact Binomial Test '], > ['Box.test', 'Box-Pierce and Ljung-Box Tests '], > ['chisq.test', "Pearson's Chi-squared Test for Count Data "], > ['cor.test', 'Test for Association/Correlation Between Paired Samples '], > ['fisher.test', "Fisher's Exact Test for Count Data "], > ['fligner.test', 'Fligner-Killeen Test of Homogeneity of Variances '], > ['friedman.test', 'Friedman Rank Sum Test '], > ['kruskal.test', 'Kruskal-Wallis Rank Sum Test '], > ['ks.test', 'Kolmogorov-Smirnov Tests '], > ['mantelhaen.test', > 'Cochran-Mantel-Haenszel Chi-Squared Test for Count Data '], > ['mauchly.test', "Mauchly's Test of Sphericity "], > ['mcnemar.test', "McNemar's Chi-squared Test for Count Data "], > ['mood.test', 'Mood Two-Sample Test of Scale '], > ['oneway.test', 'Test for Equal Means in a One-Way Layout '], > ['pairwise.prop.test', 'Pairwise comparisons for proportions '], > ['pairwise.t.test', 'Pairwise t tests '], > ['pairwise.wilcox.test', 'Pairwise Wilcoxon rank sum tests '], > ['poisson.test', 'Exact Poisson tests '], > ['power.anova.test', > 'Power calculations for balanced one-way analysis of variance tests '], > ['power.prop.test', 'Power calculations two sample test for proportions '], > ['power.t.test', 'Power calculations for one and two sample t tests '], > ['PP.test', 'Phillips-Perron Test for Unit Roots '], > ['print.power.htest', 'Print method for power calculation object '], > ['prop.test', 'Test of Equal or Given Proportions '], > ['prop.trend.test', 'Test for trend in proportions '], > ['quade.test', 'Quade Test '], > ['shapiro.test', 'Shapiro-Wilk Normality Test '], > ['t.test', "Student's t-Test "], > 
['var.test', 'F Test to Compare Two Variances '], > ['wilcox.test', 'Wilcoxon Rank Sum and Signed Rank Tests ']] > > > in scipy.stats objects with "test" in docs > >>>> [item for item in dir(stats) if (getattr(stats, item).__doc__ and 'test' in getattr(stats, item).__doc__)] > ['Tester', 'anderson', 'ansari', 'bartlett', 'binom_test', > 'chisquare', 'f_oneway', 'fligner', 'friedmanchisquare', 'glm', > 'kendalltau', 'kruskal', 'ks_2samp', 'ksone', 'ksprob', 'kstest', > 'kstwobign', 'kurtosis', 'kurtosistest', 'levene', 'linregress', > 'mannwhitneyu', 'mood', 'normaltest', 'obrientransform', 'oneway', > 'pearsonr', 'percentileofscore', 'ranksums', 'rv_discrete', 'shapiro', > 'skew', 'skewtest', 'spearmanr', 'statlib', 'stats', 'test', > 'tiecorrect', 'ttest_1samp', 'ttest_ind', 'ttest_rel', 'wilcoxon'] > > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Wed Dec 8 18:37:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Dec 2010 18:37:42 -0500 Subject: [SciPy-User] comparing statistical test in scipy.stats and R.stats, what do we have? In-Reply-To: <5A0944AA-E790-47C6-92FA-EADC4367435B@enthought.com> References: <5A0944AA-E790-47C6-92FA-EADC4367435B@enthought.com> Message-ID: On Wed, Dec 8, 2010 at 6:13 PM, Travis Oliphant wrote: > It would be great to get these into scipy stats -- particularly the ones you are interested in :-) Right now, most of the tests that I'm working on are in support of regression models, especially diagnostic and specification tests, so they require statsmodels. Josef > > Travis > > -- > (mobile phone of) > Travis Oliphant > Enthought, Inc. > 1-512-536-1057 > http://www.enthought.com > > On Dec 8, 2010, at 8:33 AM, josef.pktd at gmail.com wrote: > >> I'm looking a bit at the status of "Statistics in Python", or "what do >> they have, and we don't". >> >> Here are raw tables of content of R.stats and scipy.stats for >> statistical tests. Some additional ones are in scikits.statsmodels, >> but several are missing. (It doesn't contain the latest additions to >> scipy.stats like fisherexact) >> >> Is anyone interested in adding some missing ones? (BSD compatible and >> hopefully verified or verifiable so that they don't have to linger for >> a year or two in the ticket queue) >> >> I will keep adding the ones I'm interested in to scikits.statsmodels. 
>> >> Josef
>>
>> [clip: quoted R and scipy.stats listings, identical to the message above]
>>
>> Josef
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From jsseabold at gmail.com Wed Dec 8 20:08:43 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 8 Dec 2010 20:08:43 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 7:37 PM, Charles R Harris wrote: > > > On Tue, Dec 7, 2010 at 5:09 PM, wrote: >> >> On Tue, Dec 7, 2010 at 6:45 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Dec 7, 2010 at 10:39 AM, Skipper Seabold >> > wrote: >> >> >> >> On Tue, Dec 7, 2010 at 12:17 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Tue, Dec 7, 2010 at 10:05 AM, wrote: >> >> >> >> >> >> >> >> >> It's still a linear filter, non-linear optimization comes in because >> >> >> the exact loglikelihood function for ARMA is non-linear in the >> >> >> coefficients. >> >> >> (There might be a way to calculate the derivative in the same loop, >> >> >> but that's a different issue.) >> >> >> >> >> > >> >> > The unscented Kalman filter is a better way to estimate the >> >> > covariance >> >> > of a >> >> > non-linear process, think of it as a better integrator. If the >> >> > propagation >> >> > is easy to compute, which seems to be the case here, it will probably >> >> > save >> >> > you some time. You might even be able to use the basic idea and skip >> >> > the >> >> > Kalman part altogether. >> >> > >> >> > My general aim here is to optimize the algorithm first before getting >> >> > caught >> >> > up in the details of matrix multiplication in c. Premature >> >> > optimization >> >> > and >> >> > all that. >> >> > >> >> >> >> Hmm I haven't seen this mentioned much in what I've been reading or >> >> the documentation on existing software for ARMA processes, so I never >> >> thought much about it. ?I will have a closer look. ?Well, google turns >> >> up this thread... >> >> >> > >> > I've started reading up a bit on what you are doing and the application >> > doesn't use extended Kalman filters, so the suggestion to use unscented >> > Kalman filters is irrelevant. Sorry about that ;) I'm still wading >> > through >> > the various statistical notation thickets to see if there might be a >> > better >> > form to use for the problem but I don't see one at the moment. >> >> There are faster ways to get the likelihood for a simple ARMA process >> than using a Kalman Filter. I think the main advantage and reason for >> the popularity of Kalman Filter for this is that it is easier to >> extend. So using too many tricks that are specific to the simple ARMA >> might take away much of the advantage of getting a fast Kalman Filter. >> >> I didn't read much of the details for the Kalman Filter for this, but >> that was my conclusion from the non-Kalman Filter literature. >> That's the idea. The advantages of the KF are that it's inherently structural and it's *very* general. The ARMA case was just a jumping off point, but has also proved to be a sticking point. I'd like to have a fast and general linear Gaussian KF available for larger state space models, as it's the baseline workhorse for estimating linearized large macroeconomic models at the moment. > > Well, there are five forms of the standard Kalman filter that I am somewhat > familiar with and some are better suited to some applications than others. > But at this point I don't see that there is any reason not to use the common > form for the ARMA case. It would be interesting to see some profiling since > the matrix inversions are likely to dominate as the number of variables go > up. > If interested, I am using Durbin and Koopman's "Time Series Analysis by State Space Methods" and Andrew Harvey's "Forecasting, Structural Time Series Models, and the Kalman Filter" as my main references for this. 
The former is nice and concise but has a lot of details, suggestions,
and use cases.

I have looked some more and it does seem that the filter converges to
its steady state after maybe 2% of the iterations, depending on the
properties of the series, so for the ARMA case I can switch to the
fast recursions, only updating the state (not quite sure on the time
savings yet), but I am moving away from my goal of a fast and general
KF implementation...

About the matrix inversions, the ARMA model right now is only
univariate, so there is no real inverting of matrices. The suggestion
of Durbin and Koopman for larger, multivariate cases is to split it
into a series of univariate problems in order to avoid inversion.
They provide in the book some benchmarks on computational efficiency,
in terms of multiplications needed, based on their experience writing
http://www.ssfpack.com/index.html.

Skipper

From jsseabold at gmail.com Wed Dec 8 23:15:55 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 8 Dec 2010 23:15:55 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Wed, Dec 8, 2010 at 8:08 PM, Skipper Seabold wrote:
> [clip: full quote of the previous message]

It looks like I don't save too much time with just Python/scipy
optimizations. Apparently ~75% of the time is spent in l-bfgs-b,
judging by its user time output and the profiler's CPU time output(?).
Non-cython versions:

Brief and rough profiling on my laptop for ARMA(2,2) with 1000
observations. Optimization uses fmin_l_bfgs_b with m = 12 and iprint
= 0.
Full Kalman Filter, starting parameters found via iterations in Python
-----------------------------------------------------------------------
1696041 function calls (1695957 primitive calls) in 7.622 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    114    3.226    0.028    5.123    0.045  kalmanf.py:504(loglike)
1368384    1.872    0.000    1.872    0.000  {numpy.core._dotblas.dot}
    102    1.736    0.017    2.395    0.023  arima.py:196(loglike_css)
 203694    0.622    0.000    0.622    0.000  {sum}
     90    0.023    0.000    0.023    0.000  {numpy.linalg.lapack_lite.dgesdd}
    218    0.020    0.000    0.024    0.000  arima.py:117(_transparams)
     46    0.015    0.000    0.016    0.000  function_base.py:494(asarray_chkfinite)
   1208    0.013    0.000    0.013    0.000  {numpy.core.multiarray.array}
 102163    0.012    0.000    0.012    0.000  {method 'append' of 'list' objects}
     46    0.010    0.000    0.028    0.001  decomp_svd.py:12(svd)

Full Kalman Filter, starting parameters found with scipy.signal.lfilter
------------------------------------------------------------------------
1249493 function calls (1249409 primitive calls) in 4.596 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    102    2.862    0.028    4.437    0.043  kalmanf.py:504(loglike)
1224360    1.556    0.000    1.556    0.000  {numpy.core._dotblas.dot}
    270    0.029    0.000    0.029    0.000  {sum}
     90    0.025    0.000    0.025    0.000  {numpy.linalg.lapack_lite.dgesdd}
    194    0.018    0.000    0.021    0.000  arima.py:117(_transparams)
     46    0.016    0.000    0.017    0.000  function_base.py:494(asarray_chkfinite)
     46    0.011    0.000    0.029    0.001  decomp_svd.py:12(svd)

Kalman Filter with fast recursions, starting parameters with lfilter
---------------------------------------------------------------------
1097454 function calls (1097370 primitive calls) in 4.465 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     90    2.860    0.032    4.305    0.048  kalmanf.py:504(loglike)
1073757    1.431    0.000    1.431    0.000  {numpy.core._dotblas.dot}
    270    0.029    0.000    0.029    0.000  {sum}
     90    0.025    0.000    0.025    0.000  {numpy.linalg.lapack_lite.dgesdd}
    182    0.016    0.000    0.019    0.000  arima.py:117(_transparams)
     46    0.016    0.000    0.018    0.000  function_base.py:494(asarray_chkfinite)
     46    0.011    0.000    0.030    0.001  decomp_svd.py:12(svd)

Skipper

From josef.pktd at gmail.com Wed Dec 8 23:28:13 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 8 Dec 2010 23:28:13 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

> It looks like I don't save too much time with just Python/scipy
> optimizations. Apparently ~75% of the time is spent in l-bfgs-b,
> judging by its user time output and the profiler's CPU time output(?).
> Non-cython versions:
>
> Brief and rough profiling on my laptop for ARMA(2,2) with 1000
> observations. Optimization uses fmin_l_bfgs_b with m = 12 and iprint
> = 0.

Completely different idea: How costly are the numerical derivatives in l-bfgs-b?
With l-bfgs-b, you should be able to replace the derivatives with the
complex step derivatives that calculate the loglike function value and
the derivatives in one iteration.

Josef

From jsseabold at gmail.com Thu Dec 9 16:33:41 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Thu, 9 Dec 2010 16:33:41 -0500
Subject: [SciPy-User] fast small matrix multiplication with cython?
In-Reply-To: References: Message-ID:

On Wed, Dec 8, 2010 at 11:28 PM, wrote:
>> It looks like I don't save too much time with just Python/scipy
>> optimizations. Apparently ~75% of the time is spent in l-bfgs-b,
>> judging by its user time output and the profiler's CPU time output(?).
>> Non-cython versions:
>>
>> Brief and rough profiling on my laptop for ARMA(2,2) with 1000
>> observations. Optimization uses fmin_l_bfgs_b with m = 12 and iprint
>> = 0.
>
> Completely different idea: How costly are the numerical derivatives in l-bfgs-b?
> With l-bfgs-b, you should be able to replace the derivatives with the
> complex step derivatives that calculate the loglike function value and
> the derivatives in one iteration.

I couldn't figure out how to use it without some hacks. The
fmin_l_bfgs_b will call both f and fprime as (x, *args), but
approx_fprime or approx_fprime_cs actually need approx_fprime(x, func,
args=args) and call func(x, *args). I changed fmin_l_bfgs_b to make
the call like this for the gradient, and I get (different computer)

Using approx_fprime_cs
-----------------------------------
861609 function calls (861525 primitive calls) in 3.337 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     70    1.942    0.028    3.213    0.046  kalmanf.py:504(loglike)
 840296    1.229    0.000    1.229    0.000  {numpy.core._dotblas.dot}
     56    0.038    0.001    0.038    0.001  {numpy.linalg.lapack_lite.zgesv}
    270    0.025    0.000    0.025    0.000  {sum}
     90    0.019    0.000    0.019    0.000  {numpy.linalg.lapack_lite.dgesdd}
     46    0.013    0.000    0.014    0.000  function_base.py:494(asarray_chkfinite)
    162    0.012    0.000    0.014    0.000  arima.py:117(_transparams)

Using approx_grad = True
---------------------------------------
1097454 function calls (1097370 primitive calls) in 3.615 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     90    2.316    0.026    3.489    0.039  kalmanf.py:504(loglike)
1073757    1.164    0.000    1.164    0.000  {numpy.core._dotblas.dot}
    270    0.025    0.000    0.025    0.000  {sum}
     90    0.020    0.000    0.020    0.000  {numpy.linalg.lapack_lite.dgesdd}
    182    0.014    0.000    0.016    0.000  arima.py:117(_transparams)
     46    0.013    0.000    0.014    0.000  function_base.py:494(asarray_chkfinite)
     46    0.008    0.000    0.023    0.000  decomp_svd.py:12(svd)
     23    0.004    0.000    0.004    0.000  {method 'var' of 'numpy.ndarray' objects}

Definitely fewer function calls and a little faster, but I had to write
some hacks to get it to work.

This is more like it! With fast recursions in Cython:

15186 function calls (15102 primitive calls) in 0.750 CPU seconds

Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     18    0.622    0.035    0.625    0.035  kalman_loglike.pyx:15(kalman_loglike)
    270    0.024    0.000    0.024    0.000  {sum}
     90    0.019    0.000    0.019    0.000  {numpy.linalg.lapack_lite.dgesdd}
    156    0.013    0.000    0.013    0.000  {numpy.core._dotblas.dot}
     46    0.013    0.000    0.014    0.000  function_base.py:494(asarray_chkfinite)
    110    0.008    0.000    0.010    0.000  arima.py:118(_transparams)
     46    0.008    0.000    0.023    0.000  decomp_svd.py:12(svd)
     23    0.004    0.000    0.004    0.000  {method 'var' of 'numpy.ndarray' objects}
     26    0.004    0.000    0.004    0.000  tsatools.py:109(lagmat)
     90    0.004    0.000    0.042    0.000  arima.py:197(loglike_css)
     81    0.004    0.000    0.004    0.000  {numpy.core.multiarray._fastCopyAndTranspose}

I can live with this for now.

Skipper
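For reference, here is the complex-step trick discussed in this thread in
a minimal, self-contained sketch. The helper below is illustrative only --
it is not statsmodels' actual approx_fprime_cs -- and it assumes the
objective is analytic and written so that it accepts complex-valued
parameter arrays:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def fprime_cs(x, f, args=(), h=1.0e-20):
        # Complex-step derivative: for analytic f,
        # f(x + i*h*e_j) = f(x) + i*h*df/dx_j + O(h**2), so
        # df/dx_j ~= Im(f(x + i*h*e_j)) / h. There is no subtractive
        # cancellation, so h can be tiny, and f(x) is the real part for free.
        grad = np.zeros(len(x))
        for j in range(len(x)):
            xc = np.asarray(x).astype(complex)
            xc[j] += 1j * h
            grad[j] = f(xc, *args).imag / h
        return grad

    # Toy objective standing in for the ARMA loglikelihood:
    def negloglike(params):
        return np.sum((params - np.arange(3.0)) ** 2)

    xopt, fval, info = fmin_l_bfgs_b(negloglike, np.zeros(3),
                                     fprime=lambda x: fprime_cs(x, negloglike),
                                     m=12, iprint=0)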
From raultron at gmail.com Thu Dec 9 20:42:28 2010
From: raultron at gmail.com (=?ISO-8859-1?Q?Raul_Acu=F1a?=)
Date: Thu, 9 Dec 2010 21:12:28 -0430
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
Message-ID:

Hi,

I am using the iterative methods of scipy.sparse.linalg for solving a
linear system of equations Ax = b. My matrix A is non-symmetric. I've
been using the scipy.sparse.linalg.cg() function, multiplying both the
matrix "A" and "b" by the transpose of A so that the matrix becomes
symmetric:

Asym = matrix(dot(A.T,A))
bsym = matrix(dot(A.T,b))
sol = cg(A,b,tol = 1e-10,maxiter=30)

I've also been reading about the biconjugate gradient method, and if I
am not mistaken the literature says that this method works on
non-symmetric matrices, but when I try to use scipy.sparse.linalg.bicg()
it won't work:

sol = bicg(A,b,tol = 1e-10,maxiter=30)

File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\iterative.py", line 74, in bicg
    A,M,x,b,postprocess = make_system(A,M,x0,b,xtype)
File "C:\Python26\lib\site-packages\scipy\sparse\linalg\isolve\utils.py", line 65, in make_system
    raise ValueError('expected square matrix (shape=%s)' % shape)
NameError: global name 'shape' is not defined

Any help will be greatly appreciated. I am comparing these methods for
my data, with an emphasis on speed, for my master's thesis, so any
discrepancy with the theory would be a real problem for me.

Thanks in advance,

Raúl Acuña.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raultron at gmail.com Thu Dec 9 20:49:54 2010
From: raultron at gmail.com (=?ISO-8859-1?Q?Raul_Acu=F1a?=)
Date: Thu, 9 Dec 2010 21:19:54 -0430
Subject: Re: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
In-Reply-To: References: Message-ID:

I made a typing error in the previous post; the first code segment is
this one instead:

Asym = matrix(dot(A.T,A))
bsym = matrix(dot(A.T,b))
sol = cg(Asym,bsym,tol = 1e-10,maxiter=30)

On Thu, Dec 9, 2010 at 9:12 PM, Raul Acuña wrote:
> [clip: full quote of the previous message]

--
Ing. Raúl Acuña
Profesor @Universidad Simón Bolívar
Grupo de Mecatrónica
Departamento de Electrónica y Circuitos
Tel. +58-212-4121983 / Cel +58-412-5840317

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pav at iki.fi Fri Dec 10 07:51:13 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 10 Dec 2010 12:51:13 +0000 (UTC)
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
References: Message-ID:

Thu, 09 Dec 2010 21:12:28 -0430, Raul Acuña wrote:
> I am using the iterative methods of scipy.sparse.linalg for solving a
> linear system of equations Ax = b. My matrix A is non-symmetric. I've
> been using the scipy.sparse.linalg.cg() function, multiplying both the
> matrix "A" and "b" by the transpose of A so that the matrix becomes
> symmetric:
>
> Asym = matrix(dot(A.T,A))
> bsym = matrix(dot(A.T,b))
> sol = cg(A,b,tol = 1e-10,maxiter=30)
[clip]
> sol = bicg(A,b,tol = 1e-10,maxiter=30)
[clip]
> raise ValueError('expected square matrix (shape=%s)' % shape)
[clip]

The error is saying that your matrix is not a square matrix.
For a non-square matrix A, (A.T*A) is a square matrix, so that's why the
CG algorithm works. So it's not about symmetry of the matrix.

You should check that you have the same number of equations as unknowns.
Otherwise the solution either does not exist or is not unique.

--
Pauli Virtanen
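As a two-line illustration of the point about shapes, with an arbitrary
5-by-3 A (the numbers here are made up):

    import numpy as np

    A = np.random.rand(5, 3)        # non-square: 5 equations, 3 unknowns
    print(np.dot(A.T, A).shape)     # (3, 3) -- square, which is why CG can run on it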
From dagss at student.matnat.uio.no Fri Dec 10 07:57:25 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Fri, 10 Dec 2010 13:57:25 +0100
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
In-Reply-To: References: Message-ID: <4D0223B5.5070104@student.matnat.uio.no>

On 12/10/2010 01:51 PM, Pauli Virtanen wrote:
> [clip: full quote of the previous message]

Note the NameError in the original post as well though -- that is,
there may be a bug in the exception raising code as well.

Dag Sverre

From pav at iki.fi Fri Dec 10 08:30:10 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 10 Dec 2010 13:30:10 +0000 (UTC)
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
References: <4D0223B5.5070104@student.matnat.uio.no> Message-ID:

Fri, 10 Dec 2010 13:57:25 +0100, Dag Sverre Seljebotn wrote:
[clip]
> Note the NameError in the original post as well though -- that is,
> there may be a bug in the exception raising code as well.

Seems to have been fixed in r6780

From raultron at gmail.com Fri Dec 10 10:01:39 2010
From: raultron at gmail.com (=?ISO-8859-1?Q?Raul_Acu=F1a?=)
Date: Fri, 10 Dec 2010 10:31:39 -0430
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
In-Reply-To: References: <4D0223B5.5070104@student.matnat.uio.no> Message-ID:

Thank you all, I saw my mistake.

On Fri, Dec 10, 2010 at 9:00 AM, Pauli Virtanen wrote:
> Fri, 10 Dec 2010 13:57:25 +0100, Dag Sverre Seljebotn wrote:
> [clip]
> > Note the NameError in the original post as well though -- that is,
> > there may be a bug in the exception raising code as well.
>
> Seems to have been fixed in r6780
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

--
Ing. Raúl Acuña
Profesor @Universidad Simón Bolívar
Grupo de Mecatrónica
Departamento de Electrónica y Circuitos
Tel. +58-212-4121983 / Cel +58-412-5840317

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rob.clewley at gmail.com Fri Dec 10 17:58:25 2010
From: rob.clewley at gmail.com (Rob Clewley)
Date: Fri, 10 Dec 2010 17:58:25 -0500
Subject: [SciPy-User] Speeding up integrate.odeint with weave/blitz
In-Reply-To: References: Message-ID:

Sebastian,

> Could you point me to some short code example where the "for" macro is used?

Actually I don't use it often myself. The syntax for defining variables
x_a through x_b (inclusive) using dummy index "i" is:

for(i, a, b, expr_in_i)

where expr_in_i contains a mathematical definition for x_i, involving
any declared parameters and variables, including x_a through x_b,
referred to using the syntax x[f(i)], where f(i) is an integer
arithmetical function of i. The for macro then creates a sequence of
expressions in which each occurrence of `[f(i)]` is replaced with the
appropriate integer in square brackets. E.g. a specification of
variables x1, x2 and x3 coupled to their neighbours in a ring could
look like

specs = {'x[i]': 'for(i, 1, 3, x[i-1] + 2*x[i])', 'x0': 'x2 + 2*x0'}

This requires the special end-case definition for x0, which wraps back
to x2, in order to be valid. When parsed, this will create a right-hand
side dX_dt function containing individual assignments for each of x0,
x1, x2, and x3. For instance, the one for x1 will be an encoding of
'x0 + 2*x1'.

This works in essentially the same way as specifications in the older
program XPP (X-PhasePlane), for anyone familiar with that package's
syntax. Hope that's what you wanted to know. BTW, this syntax structure
just enjoyed a small feature enhancement in the most recent update of
PyDSTool posted on sourceforge, thanks to your interest!

--
Robert Clewley, Ph.D.
Assistant Professor
Neuroscience Institute and
Department of Mathematics and Statistics
Georgia State University
PO Box 5030
Atlanta, GA 30302, USA

tel: 404-413-6420 fax: 404-413-5446
http://www2.gsu.edu/~matrhc
http://neuroscience.gsu.edu/rclewley.html
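To spell out what the parser does with that ring example, the generated
right-hand side is equivalent to something like the plain-Python sketch
below. The function and variable names are illustrative; PyDSTool's
actual generated code differs in form:

    # Hand expansion of
    #   specs = {'x[i]': 'for(i, 1, 3, x[i-1] + 2*x[i])', 'x0': 'x2 + 2*x0'}
    # where each occurrence of [f(i)] is replaced by the corresponding integer.
    def dX_dt(x0, x1, x2, x3):
        d_x0 = x2 + 2*x0    # explicit end case, wrapping back to x2
        d_x1 = x0 + 2*x1    # i = 1
        d_x2 = x1 + 2*x2    # i = 2
        d_x3 = x2 + 2*x3    # i = 3
        return d_x0, d_x1, d_x2, d_x3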
From dominique.orban at gmail.com Fri Dec 10 20:35:23 2010
From: dominique.orban at gmail.com (Dominique Orban)
Date: Fri, 10 Dec 2010 20:35:23 -0500
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
Message-ID:

> ---------- Forwarded message ----------
> From: "Raul Acuña"
> To: scipy-user at scipy.org
> Date: Thu, 9 Dec 2010 21:19:54 -0430
> Subject: Re: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
> [clip: full quote of Raul's two messages above]

Hi Raúl,

You can't use Bi-CG or Bi-CGSTAB to solve non-square systems directly. Your
system is either under- or over-determined. What you may be meaning to solve
here is a related linear least-squares problem. If A has more rows than
columns, you have an over-determined system (more equations than unknowns)
and a relevant problem is to

    minimize 1/2 * ||Ax - b||^2.

Conversely, if A has more columns than rows, you have an under-determined
system (more unknowns than conditions imposed on them) and you may want to

    minimize 1/2 * ||x||^2 subject to Ax = b

i.e., find the least-norm solution among the infinitely many possibilities.
In both cases, you can use MINRES, available in SciPy I think, or else in
PyKrylov (https://github.com/dpo/pykrylov) and in NLPy (http://nlpy.sf.net),
by solving the larger system

    [ I    A ] [ r ]   [ b ]
    [ A.T  0 ] [ x ] = [ 0 ]

(for the first problem) or

    [ I    A.T ] [ x ]   [ 0 ]
    [ A    0   ] [ r ] = [ b ]

(for the second problem). Though the coefficient matrix above is symmetric,
it is indefinite, so you can't use CG! To use MINRES, you just need to write
a function which computes the product of a vector with the coefficient
matrix.

In the first case, you can also use LSQR (available in NLPy and also
probably in SciPy). In this case, you'll just need to be able to do A*x and
A.T*y.

--
Dominique

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
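A short sketch of the LSQR route for the over-determined case, under the
assumption that a sufficiently recent SciPy provides
scipy.sparse.linalg.lsqr (the shapes and tolerances below are made up for
illustration):

    import numpy as np
    from scipy.sparse.linalg import lsqr

    m, n = 100, 10                  # over-determined: more equations than unknowns
    A = np.random.rand(m, n)
    b = np.random.rand(m)

    # LSQR minimizes ||Ax - b|| using only the products A*x and A.T*y;
    # it never forms dot(A.T, A), which would square the condition number.
    x = lsqr(A, b, atol=1e-10, btol=1e-10)[0]

    # For comparison, the explicit normal equations used earlier in the thread:
    x_ne = np.linalg.solve(np.dot(A.T, A), np.dot(A.T, b))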
From raultron at gmail.com Fri Dec 10 21:40:31 2010
From: raultron at gmail.com (=?ISO-8859-1?Q?Raul_Acu=F1a?=)
Date: Fri, 10 Dec 2010 22:10:31 -0430
Subject: Re: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
In-Reply-To: References: Message-ID:

Hi Dominique,

Thank you for your reply, it was very helpful. I definitely have an
over-determined system, and I need to find the solution in the least
amount of time possible. I've used the SVD method and now I'm
experimenting with the iterative methods; someone told me that they are
faster.

I used the Conjugate Gradient function from SciPy, using a trick to
convert the original system Ax = b to a system with a square matrix:
dot(A.T,A)x = dot(A.T,b). It is working and it finds a good solution
faster than the SVD method. However, I don't know if this is the ideal
method.

Now, looking at your reply and your advice of using MINRES: I read that
this method has guaranteed convergence, but in terms of speed is MINRES
a better method? Also, sorry about my ignorance, but I didn't understand
this:

    [ I    A ] [ r ]   [ b ]
    [ A.T  0 ] [ x ] = [ 0 ]

What is "r" in that system? I want to try this method with my system
using the function from SciPy and also try the one in PyKrylov.

Thanks again, best regards,

Raúl

On Fri, Dec 10, 2010 at 9:05 PM, Dominique Orban wrote:
> [clip: full quote of the previous message]

--
Ing. Raúl Acuña
Profesor @Universidad Simón Bolívar
Grupo de Mecatrónica
Departamento de Electrónica y Circuitos
Tel. +58-212-4121983 / Cel +58-412-5840317

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dominique.orban at gmail.com Sat Dec 11 13:44:33 2010
From: dominique.orban at gmail.com (Dominique Orban)
Date: Sat, 11 Dec 2010 13:44:33 -0500
Subject: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
Message-ID:

On Sat, Dec 11, 2010 at 1:00 PM, wrote:
> ---------- Forwarded message ----------
> From: "Raul Acuña"
> To: SciPy Users List
> Date: Fri, 10 Dec 2010 22:10:31 -0430
> Subject: Re: [SciPy-User] scipy.sparse.linalg.bicg for Non Symmetric Matrix.
> [clip: full quote of the previous message]

Hi Raúl,

Sorry, perhaps I'm using notation that may not be clear to everyone. The
notation

    [ I    A ] [ r ]   [ b ]
    [ A.T  0 ] [ x ] = [ 0 ]

means that you're solving a linear system with a coefficient matrix defined
by blocks (in Matlab notation, it would be something like [I, A ; A', 0]).
If your A is m-by-n, this matrix is (m+n)-by-(m+n). It is quite a bit
larger, but it is very sparse. The right-hand side is a vector of length
m+n with your vector b in its first m components, and zero elsewhere.
Similarly, the solution will be composed of two parts. The first segment of
length m, which I call r, is the residual vector, because you can deduce
from the system that r = b - A*x. The second segment, x, will be the
solution you are looking for.

In your case, you will also want to try LSQR (available in NLPy). On paper,
it is equivalent to applying CG to the square, symmetric and positive
semi-definite system dot(A.T,A)x = dot(A.T,b), but it should be more stable
(and it never forms the matrix dot(A.T,A)). It may turn out to be more
efficient than MINRES.

Keep in mind that iterative methods are appropriate for large systems but
they often require a good preconditioner to be effective. For an excellent
account of all this, you may want to look at Yousef Saad's book "Iterative
Methods for Sparse Linear Systems."

Good luck,

--
Dominique

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
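In code, the augmented system described above might be assembled as in the
sketch below. The shapes are illustrative, and it assumes a SciPy that
provides scipy.sparse.bmat and scipy.sparse.linalg.minres:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import minres

    m, n = 100, 10
    A = sp.csr_matrix(np.random.rand(m, n))
    b = np.random.rand(m)

    # K = [ I    A ]  is symmetric but indefinite, so MINRES applies
    #     [ A.T  0 ]  where plain CG would not.
    K = sp.bmat([[sp.eye(m, m), A], [A.T, None]], format='csr')
    rhs = np.concatenate([b, np.zeros(n)])

    sol, info = minres(K, rhs, tol=1e-10)
    r, x = sol[:m], sol[m:]    # first m entries are r = b - A*x, the rest are x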
From stephens.js at gmail.com Sat Dec 11 16:15:54 2010
From: stephens.js at gmail.com (Scott Stephens)
Date: Sat, 11 Dec 2010 15:15:54 -0600
Subject: [SciPy-User] Basic Question About numpy.dot
Message-ID:

Why does this work:

>>> import numpy as np
>>> np.dot(np.array([1.,2.]),np.array([[10.,20.],[0.1,0.2]]))
array([ 10.2,  20.4])

But this doesn't:

>>> import numpy as np
>>> np.dot(np.array([1.,2.]),np.array([[10.]]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: matrices are not aligned

Thanks,

Scott

From asmund.hjulstad at gmail.com Sun Dec 12 08:41:38 2010
From: asmund.hjulstad at gmail.com (=?ISO-8859-1?Q?=C5smund_Hjulstad?=)
Date: Sun, 12 Dec 2010 16:41:38 +0300
Subject: [SciPy-User] Basic Question About numpy.dot
In-Reply-To: References: Message-ID:

2010/12/12 Scott Stephens
> [clip: quoted question]

np.dot is matrix multiplication, so the last dimension of the first
array must match the first dimension of the second array. In your first
example you have arrays with shapes (2,) and (2,2), so the dimensions
match. Your second example has shapes (2,) and (1,1), so they don't
match. (The matrices are not aligned.)

Hope it answers your question. (You may also look at help(np.dot).)

Åsmund Hjulstad

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From anand.prabhakar.patil at gmail.com Mon Dec 13 06:34:15 2010
From: anand.prabhakar.patil at gmail.com (Anand Patil)
Date: Mon, 13 Dec 2010 03:34:15 -0800 (PST)
Subject: [SciPy-User] JOB: Contract to bundle NumPy/CUDA environment on an Amazon machine image
Message-ID: <83329922-69e0-4f48-abb6-149caca60adb@k11g2000vbf.googlegroups.com>

Dear all,

My research group in the department of zoology at Oxford University
(www.map.ox.ac.uk) is seeking a contractor to do the following on Amazon
EC2's new Cluster GPU instance type:

- Install Python 2.6 or 2.7, NumPy and SciPy (with multithreaded linear
algebra), Matplotlib (make sure the TKAgg and PDF backends work), Basemap,
PyTables, and PyCUDA.
- Install MAGMA, http://icl.cs.utk.edu/magma/ as well as CUBLAS and CURAND
(http://developer.nvidia.com/object/cuda_3_2_downloads.html) as shared
libraries, so that they can be used from PyCUDA.
- Test that the libraries listed above can be used from PyCUDA, and that
they are producing correct results.
- Take a snapshot of the instance as an Amazon Machine Image.
- Keep comprehensive, reproducible notes on the build process for us.
- Be available for up to 10 hours' support if we encounter problems in the
future.

We are offering $2700 US for the work listed above. If you are interested,
please respond to Jennie Charlton (jennie.charlton at zoo.ox.ac.uk) and
attach a resume with references. Please feel free to circulate this email
to any colleagues who might be interested.

Best wishes,
Anand

From jeremy at jeremysanders.net Mon Dec 13 09:06:40 2010
From: jeremy at jeremysanders.net (Jeremy Sanders)
Date: Mon, 13 Dec 2010 14:06:40 +0000
Subject: [SciPy-User] ANN: Veusz 1.10
Message-ID:

I'm pleased to announce Veusz 1.10, a Python-based scientific plotting
package. Please see below for the release notes.
Jeremy Veusz 1.10 ---------- Velvet Ember Under Sky Zenith ----------------------------- http://home.gna.org/veusz/ Veusz is Copyright (C) 2003-2010 Jeremy Sanders Licenced under the GPL (version 2 or greater). Veusz is a Qt4 based scientific plotting package. It is written in Python, using PyQt4 for display and user-interfaces, and numpy for handling the numeric data. Veusz is designed to produce publication-ready Postscript/PDF/SVG output. The user interface aims to be simple, consistent and powerful. Veusz provides a GUI, command line, embedding and scripting interface (based on Python) to its plotting facilities. It also allows for manipulation and editing of datasets. Data can be captured from external sources such as internet sockets or other programs. Changes in 1.10: * Box plot widget added, which can be given statistics to plot or calculated from datasets * Polar plot widget added * Datasets are now easier to construct and edit in the Data->Edit dialog box * CSV reader will assume a text dataset if it cannot convert first item to a number * Add color sequence plugin for making a range of widget colors * Import plugin for QDP files added * Date and times can be also written in local formats * Reload data dialog box can reload at intervals and is now non-modal * 2D datasets can be created based on expressions of other 2D datasets Minor changes: * Option to change size of ends of error bars * Margin size option added for key widget * Add --listen option to veusz command to replace veusz_listen. * Add --quiet option to run commands without displaying a window * Add --export option to export documents to graphics files and exit * PNG export compression increased * Add option to ignore number of lines after headers in CSV files Bug fixes: * Multiple datasets can now be properly created from dataset plugin dialog * X and Y ranges of 2D datasets are now correct when converted from X,Y,Z 1D datasets * Bounding boxes of resizing rectangles, ellipses and images are fixed * min and max coordinate range now works for plotting functions of y * Remove duplicate linked files when using import plugins * Several crash reports fixed * More robust code in data->edit dialog box * veusz_listen now works in Windows (not in binary package yet) Features of package: * X-Y plots (with errorbars) * Line and function plots * Contour plots * Images (with colour mappings and colorbars) * Stepped plots (for histograms) * Bar graphs * Vector field plots * Box plots * Polar plots * Plotting dates * Fitting functions to data * Stacked plots and arrays of plots * Plot keys * Plot labels * Shapes and arrows on plots * LaTeX-like formatting for text * EPS/PDF/PNG/SVG/EMF export * Scripting interface * Dataset creation/manipulation * Embed Veusz within other programs * Text, CSV, FITS and user-plugin importing * Data can be captured from external sources * User defined functions, constants and can import external Python functions * Plugin interface to allow user to write or load code to - import data using new formats - make new datasets, optionally linked to existing datasets - arbitrarily manipulate the document Requirements for source install: Python (2.4 or greater required) http://www.python.org/ Qt >= 4.3 (free edition) http://www.trolltech.com/products/qt/ PyQt >= 4.3 (SIP is required to be installed first) http://www.riverbankcomputing.co.uk/pyqt/ http://www.riverbankcomputing.co.uk/sip/ numpy >= 1.0 http://numpy.scipy.org/ Optional: Microsoft Core Fonts (recommended for nice output) 
http://corefonts.sourceforge.net/
PyFITS >= 1.1 (optional for FITS import)
http://www.stsci.edu/resources/software_hardware/pyfits
pyemf >= 2.0.0 (optional for EMF export)
http://pyemf.sourceforge.net/
For EMF and better SVG export, PyQt >= 4.6 or better is required, to fix
a bug in the C++ wrapping

For documentation on using Veusz, see the "Documents" directory. The
manual is in PDF, HTML and text format (generated from docbook). The
examples are also useful documentation.

Please also see and contribute to the Veusz wiki:
http://barmag.net/veusz-wiki/

Issues with the current version:
* Plots can sometimes be slow using antialiasing. Go to the preferences
  dialog or right click on the plot to disable antialiasing.
* Some recent versions of PyQt/SIP will cause crashes when exporting SVG
  files. Update to 4.7.4 (if released) or a recent snapshot to solve this
  problem.

If you enjoy using Veusz, I would love to hear from you. Please join the
mailing lists at https://gna.org/mail/?group=veusz to discuss new features
or if you'd like to contribute code. The latest code can always be found
in the SVN repository.

Jeremy Sanders

From ralf.gommers at googlemail.com Mon Dec 13 11:12:32 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 14 Dec 2010 00:12:32 +0800
Subject: [SciPy-User] ANN: SciPy 0.9.0 beta 1
Message-ID:

Hi,

I am pleased to announce the availability of the first beta of SciPy
0.9.0. This will be the first SciPy release to include support for
Python 3, as well as for Python 2.7. Please try this beta and report any
problems on the scipy-dev mailing list.

Binaries, sources and release notes can be found at
http://sourceforge.net/projects/scipy/files/scipy/0.9.0b1/. Note that
not all binaries (win32-py27, *-macosx10.3) are uploaded yet, they will
follow in the next day or two.

There are still a few known issues (so no need to report these):
1. Arpack related errors on 64-bit OS X.
2. Correlate complex192 errors on Windows.
3. correlate/conjugate current behavior is deprecated and should be
removed before RC1.

Enjoy,
Ralf

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From laserson at mit.edu Mon Dec 13 11:38:54 2010
From: laserson at mit.edu (Uri Laserson)
Date: Mon, 13 Dec 2010 11:38:54 -0500
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow Leopard): libarpack
In-Reply-To: References: Message-ID:

Hi Ralf,

Sorry for my delayed response, I missed this message in my inbox. I have
since moved forward with this problem, but I now have a runtime problem.

I am using a python2.7 I built myself through the homebrew package
manager. Output from my compilers is as follows:

laserson at hobbes:~$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~89/src/configure --disable-checking
--enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib
--build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10-
--host=x86_64-apple-darwin10 --target=i686-apple-darwin10
--with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)

laserson at hobbes:~$ g++ -v
Using built-in specs.
From ralf.gommers at googlemail.com  Mon Dec 13 11:12:32 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 14 Dec 2010 00:12:32 +0800
Subject: [SciPy-User] ANN: SciPy 0.9.0 beta 1
Message-ID:

Hi,

I am pleased to announce the availability of the first beta of SciPy
0.9.0. This will be the first SciPy release to include support for
Python 3, as well as for Python 2.7. Please try this beta and report any
problems on the scipy-dev mailing list.

Binaries, sources and release notes can be found at
http://sourceforge.net/projects/scipy/files/scipy/0.9.0b1/. Note that not
all binaries (win32-py27, *-macosx10.3) are uploaded yet, they will
follow in the next day or two.

There are still a few known issues (so no need to report these):
1. Arpack related errors on 64-bit OS X.
2. Correlate complex192 errors on Windows.
3. correlate/conjugate current behavior is deprecated and should be
   removed before RC1.

Enjoy,
Ralf

From laserson at mit.edu  Mon Dec 13 11:38:54 2010
From: laserson at mit.edu (Uri Laserson)
Date: Mon, 13 Dec 2010 11:38:54 -0500
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow
 Leopard): libarpack
In-Reply-To:
References:
Message-ID:

Hi Ralf,

Sorry for my delayed response, I missed this message in my inbox. I have
since moved forward with this problem, but I now have a runtime problem.

I am using a python2.7 that I built myself through the homebrew package
manager. Output from my compilers is as follows:

laserson at hobbes:~$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~89/src/configure --disable-checking
 --enable-werror --prefix=/usr --mandir=/share/man
 --enable-languages=c,objc,c++,obj-c++
 --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib
 --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10-
 --host=x86_64-apple-darwin10 --target=i686-apple-darwin10
 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)

laserson at hobbes:~$ g++ -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~89/src/configure --disable-checking
 --enable-werror --prefix=/usr --mandir=/share/man
 --enable-languages=c,objc,c++,obj-c++
 --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib
 --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10-
 --host=x86_64-apple-darwin10 --target=i686-apple-darwin10
 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)

laserson at hobbes:~$ gfortran -v
Using built-in specs.
Target: i686-apple-darwin8
Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local
 --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/
 --build=i686-apple-darwin8 --host=i686-apple-darwin8
 --target=i686-apple-darwin8 --enable-languages=fortran
Thread model: posix
gcc version 4.2.3

I installed numpy and scipy loosely based on the directions here:
http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052227.html

More specifically, after installing gfortran, I downloaded the following
versions of numpy and scipy:
numpy 1.5.1
scipy 0.8.0

I then set the following environment variables:
export MACOSX_DEPLOYMENT_TARGET=10.6
export CFLAGS="-arch i386 -arch x86_64"
export FFLAGS="-m32 -m64"
export LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386
 -arch x86_64 -framework Accelerate"

Then I built and installed numpy as follows (note: sudo is not needed, as
I took ownership of /usr/local):
python setup.py build --fcompiler=gnu95
python setup.py install

The results of numpy.test() are:
>>> numpy.test()
Running unit tests for numpy
NumPy version 1.5.1
NumPy is installed in
 /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/numpy
Python version 2.7.1 (r271:86832, Dec 7 2010, 12:37:47) [GCC 4.2.1
 (Apple Inc. build 5664)]
nose version 0.11.4
....
Ran 3006 tests in 18.312s

OK (KNOWNFAIL=4, SKIP=1)

Then I installed scipy as follows:
python setup.py build --fcompiler=gnu95
python setup.py install

and ran the tests, giving output:
>>> scipy.test()
Running unit tests for scipy
NumPy version 1.5.1
NumPy is installed in
 /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/numpy
SciPy version 0.8.0
SciPy is installed in
 /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/scipy
Python version 2.7.1 (r271:86832, Dec 7 2010, 12:37:47) [GCC 4.2.1
 (Apple Inc. build 5664)]
nose version 0.11.4
RuntimeError: module compiled against ABI version 2000000 but this
 version of numpy is 1000009
....
Ran 4405 tests in 87.505s

OK (KNOWNFAIL=19, SKIP=28)

Note the RuntimeError listed above:
RuntimeError: module compiled against ABI version 2000000 but this
 version of numpy is 1000009

I can still import scipy fine. However, I then have a problem after
building matplotlib (GitHub). I can build it fine:

============================================================================
BUILDING MATPLOTLIB
            matplotlib: 1.0.0
                python: 2.7.1 (r271:86832, Dec 7 2010, 12:37:47)
                        [GCC 4.2.1 (Apple Inc. build 5664)]
              platform: darwin

REQUIRED DEPENDENCIES
                 numpy: 1.5.1
             freetype2: 12.0.6

OPTIONAL BACKEND DEPENDENCIES
                libpng: 1.2.44
               Tkinter: Tkinter: 81008, Tk: 8.5, Tcl: 8.5
                  Gtk+: no
                        * Building for Gtk+ requires pygtk; you must be
                        * able to "import gtk" in your build/install
                        * environment
       Mac OS X native: yes
                    Qt: no
                   Qt4: no
                 Cairo: no

However, when I import matplotlib.pyplot, I get:
>>> import matplotlib.pyplot
RuntimeError: module compiled against ABI version 2000000 but this
version of numpy is 1000009
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/pyplot.py", line 23, in
    from matplotlib.figure import Figure, figaspect
  File "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/figure.py", line 16, in
    import artist
  File "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/artist.py", line 6, in
    from transforms import Bbox, IdentityTransform, TransformedBbox, TransformedPath
  File "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/transforms.py", line 34, in
    from matplotlib._path import affine_transform
ImportError: numpy.core.multiarray failed to import

However, when I separately try to import numpy.core.multiarray, I have
no problem.

Any ideas?

Thanks!
Uri

On Wed, Dec 8, 2010 at 06:18, Ralf Gommers wrote:
> On Wed, Dec 8, 2010 at 4:06 AM, Uri Laserson wrote:
>> Hi all,
>>
>> I am on a MacMini with Intel processor. I just installed OS X 10.6
>> and the latest Xcode that I could download, which included gcc 4.2.
>> I am using python 2.7 built from source using homebrew. I installed
>> the gfortran 4.2.3 binaries from http://r.research.att.com/tools/.
>>
>> I am trying to install numpy and scipy. numpy installs fine with or
>> without switching to g++-4.0. I have successfully installed it using
>> pip and also directly from source from the git repository.
>>
>> Scipy is giving me errors on install (the same errors whether I use
>> pip or try the svn repository). I installed it successfully yesterday
>> on a new Macbook Air using pip, after changing the symlinks to point
>> to g++-4.0. However, today on my MacMini, I am getting errors after
>> following the same protocol.
>>
>> The errors I am getting are here:
>> https://gist.github.com/732293
>
> The error indicates that 32 and 64 bit binaries are being mixed. Can
> you tell us the following:
> - what build command you used
> - what Python you are using (from python.org, from Apple, self-compiled?)
> - the output of "gcc -v", "g++ -v" and "gfortran -v"
>
> Ralf

From ben.root at ou.edu  Mon Dec 13 11:43:44 2010
From: ben.root at ou.edu (Benjamin Root)
Date: Mon, 13 Dec 2010 10:43:44 -0600
Subject: [SciPy-User] [SciPy-Dev] ANN: SciPy 0.9.0 beta 1
In-Reply-To:
References:
Message-ID:

On Mon, Dec 13, 2010 at 10:12 AM, Ralf Gommers wrote:
> Hi,
>
> I am pleased to announce the availability of the first beta of SciPy
> 0.9.0. This will be the first SciPy release to include support for
> Python 3, as well as for Python 2.7. Please try this beta and report
> any problems on the scipy-dev mailing list.
>
> Binaries, sources and release notes can be found at
> http://sourceforge.net/projects/scipy/files/scipy/0.9.0b1/.
> Note that not all binaries (win32-py27, *-macosx10.3) are uploaded yet,
> they will follow in the next day or two.
>
> There are still a few known issues (so no need to report these):
> 1. Arpack related errors on 64-bit OS X.
> 2. Correlate complex192 errors on Windows.
> 3. correlate/conjugate current behavior is deprecated and should be
>    removed before RC1.
>
> Enjoy,
> Ralf

Just did a clean rebuild (after a clean rebuild of numpy) and had two
errors in the tests:

======================================================================
FAIL: test_imresize (test_pilutil.TestPILUtil)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bvr/Programs/numpy/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/home/bvr/Programs/scipy/scipy/misc/tests/test_pilutil.py", line 25, in test_imresize
    assert_equal(im1.shape,(11,22))
  File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 251, in assert_equal
    assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose)
  File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 313, in assert_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
item=0
 ACTUAL: 10
 DESIRED: 11

======================================================================
FAIL: test_basic (test_signaltools.TestMedFilt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bvr/Programs/scipy/scipy/signal/tests/test_signaltools.py", line 284, in test_basic
    [ 0, 7, 11, 7, 4, 4, 19, 19, 24, 0]])
  File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 686, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 618, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 8.0%)
 x: array([[  0.,  50.,  50.,  50.,  42.,  15.,  15.,  18.,  27.,   0.],
        [  0.,  50.,  50.,  50.,  50.,  42.,  19.,  21.,  29.,   0.],
        [ 50.,  50.,  50.,  50.,  50.,  47.,  34.,  34.,  46.,  35.],...
 y: array([[ 0, 50, 50, 50, 42, 15, 15, 18, 27,  0],
        [ 0, 50, 50, 50, 50, 42, 19, 21, 29,  0],
        [50, 50, 50, 50, 50, 47, 34, 34, 46, 35],...

----------------------------------------------------------------------
Ran 4822 tests in 199.244s

FAILED (KNOWNFAIL=12, SKIP=35, failures=2)

Don't know about the first one, but the second one looks like a
type-casting issue: the displayed values are the same, except that one
array is floating point and the other is integer.

Ben Root

From nlgunther at yahoo.com  Mon Dec 13 12:17:29 2010
From: nlgunther at yahoo.com (Nicholas Gunther)
Date: Mon, 13 Dec 2010 09:17:29 -0800 (PST)
Subject: [SciPy-User] Problems with 64-bit Scipy Stats
Message-ID: <808999.92535.qm@web43141.mail.sp1.yahoo.com>

On my Windows 7, 64-bit AMD machine:

from scipy import stats

produces:

Traceback (most recent call last):
  File "", line 1, in
    from scipy import stats
  File "C:\Python26\lib\site-packages\scipy\stats\__init__.py", line 7, in
    from stats import *
  File "C:\Python26\lib\site-packages\scipy\stats\stats.py", line 202, in
    import scipy.special as special
  File "C:\Python26\lib\site-packages\scipy\special\__init__.py", line 8, in
    from basic import *
  File "C:\Python26\lib\site-packages\scipy\special\basic.py", line 6, in
    from _cephes import *
ImportError: DLL load failed: The specified module could not be found.
~~~
Since Scipy's covariance function works outside of the stats package,
for single-variable linear regression I can create a work-around, using
cov(y,x)/var(x) for beta and the squared correlation coefficient for
R squared. This is not ideal, however. Any suggestions for other
workarounds, or for how to fix this?

Thanks!
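For concreteness, that work-around can be written with plain numpy, so
that scipy.stats is never imported. This is a sketch on made-up data (the
arrays below are not from the thread); the one subtlety is that numpy's
cov and var must use the same normalization for the identity to hold.

# sketch of the cov/var work-around for single-variable regression,
# using only numpy; x and y are made-up example data
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# beta = cov(x, y) / var(x); bias=1 makes np.cov divide by N, which
# matches np.var's default normalization
beta = np.cov(x, y, bias=1)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()       # intercept through the means
r2 = np.corrcoef(x, y)[0, 1] ** 2        # R squared = squared correlation

print beta, alpha, r2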
From cgohlke at uci.edu  Mon Dec 13 12:48:23 2010
From: cgohlke at uci.edu (Christoph Gohlke)
Date: Mon, 13 Dec 2010 09:48:23 -0800
Subject: [SciPy-User] Problems with 64-bit Scipy Stats
In-Reply-To: <808999.92535.qm@web43141.mail.sp1.yahoo.com>
References: <808999.92535.qm@web43141.mail.sp1.yahoo.com>
Message-ID: <4D065C67.2020601@uci.edu>

On 12/13/2010 9:17 AM, Nicholas Gunther wrote:
> On my Windows 7, 64-bit AMD machine:
>
> from scipy import stats produces:
> [...]
> ImportError: DLL load failed: The specified module could not be found.
>
> Since Scipy's covariance function works outside of the stats package,
> for single variable linear regression I can create a work-around, [...]
> Any suggestions for other workarounds, or for how to fix this?
> Thanks!

In case you are using Scipy binaries built with Visual C and the Intel
compiler suite, make sure to use a numpy build that uses the same
version of Intel's MKL and that the MKL and Intel runtime libraries are
found in the Windows DLL search path.

--
Christoph
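The DLL search path point can be tested directly: prepend the directory
holding the MKL/Intel runtime DLLs to PATH before importing scipy. A
minimal sketch, assuming the directory below, which is hypothetical and
must be pointed at wherever those DLLs actually live:

# sketch: make the MKL/Intel runtime DLLs findable before scipy.special
# loads _cephes; mkl_dir is a hypothetical location, not a known default
import os

mkl_dir = r'C:\Python26\Lib\site-packages\numpy\core'  # hypothetical
os.environ['PATH'] = mkl_dir + os.pathsep + os.environ['PATH']

from scipy import stats  # should now succeed if a missing DLL was the cause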
>> >> -- >> Christoph >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From nlgunther at yahoo.com Mon Dec 13 15:04:04 2010 From: nlgunther at yahoo.com (Nicholas Gunther) Date: Mon, 13 Dec 2010 12:04:04 -0800 (PST) Subject: [SciPy-User] Problems with 64-bit Scipy Stats In-Reply-To: <4D066DAE.1030701@uci.edu> Message-ID: <738915.34402.qm@web43139.mail.sp1.yahoo.com> Yes, numpy, scipy and matplotlib all from your website, for which I thanked you in an email from my gmail account. I'll try what you suggest. Regs, Nick --- On Mon, 12/13/10, Christoph Gohlke wrote: > From: Christoph Gohlke > Subject: Re: [SciPy-User] Problems with 64-bit Scipy Stats > To: "SciPy Users List" > Date: Monday, December 13, 2010, 2:02 PM > In case those binaries are from my > website, use the numpy MKL installer, > numpy-1.5.1.win-amd64-py2.6-mkl.?exe. > > Christoph > > On 12/13/2010 10:47 AM, Nicholas Gunther wrote: > > scipy.cov works; scipy.stats fails as indicated. > > I am using > > scipy-0.8.0.win-amd64-py2.6 > > numpy-1.5.1.win-amd64-py2.6 > > mingw-get-inst-20101030 > > python-2.6.6.amd64 > > Windows 7 Home Premium > > AMD V120 Processor > > > > Thanks! > > > > Best > > Nick > > > > > > > > > > > > --- On Mon, 12/13/10, Christoph Gohlke? > wrote: > > > >> From: Christoph Gohlke > >> Subject: Re: [SciPy-User] Problems with 64-bit > Scipy Stats > >> To: "SciPy Users List" > >> Date: Monday, December 13, 2010, 12:48 PM > >> > >> > >> On 12/13/2010 9:17 AM, Nicholas Gunther wrote: > >>> On my Windows 7, 64-bit AMD machine: > >>> > >>> from scipy import stats produces: > >>> > >>> Traceback (most recent call last): > >>>? ? ? File "", > line 1, > >> in > >>>? ? ? ? from scipy import > stats > >>>? ? ? File > >> > "C:\Python26\lib\site-packages\scipy\stats\__init__.py", > >> line 7, in` > >>>? ? ? ? from stats import > * > >>>? ? ? File > >> > "C:\Python26\lib\site-packages\scipy\stats\stats.py", line > >> 202, in > >>>? ? ? ? import > scipy.special as special > >>>? ? ? File > >> > "C:\Python26\lib\site-packages\scipy\special\__init__.py", > >> line 8, in > >>>? ? ? ? from basic import > * > >>>? ? ? File > >> > "C:\Python26\lib\site-packages\scipy\special\basic.py", > line > >> 6, in > >>>? ? ? ? from _cephes import > * > >>> ImportError: DLL load failed: The specified > module > >> could not be found. > >>> > >>> ~~~ > >>> Since Scipy's covariance function works > outside of the > >> stats package, for single variable linear > regression I can > >> create a work-around, using cov(y,x)/var(x) for > beta and the > >> correlation coefficient squared for R > squared.? This is > >> not ideal, however. > >>> Any suggestion of other workarounds, or how to > fix > >> this? > >>> Thanks! > >> > >> > >> In case you are using Scipy binaries built with > Visual C > >> and the Intel > >> compiler suite, make sure to use a numpy build > that uses > >> the same > >> version of Intel's MKL and that the MKL and Intel > runtime > >> libraries are > >> found in the Windows DLL search path. 
From nlgunther at yahoo.com  Mon Dec 13 15:52:14 2010
From: nlgunther at yahoo.com (Nicholas Gunther)
Date: Mon, 13 Dec 2010 12:52:14 -0800 (PST)
Subject: [SciPy-User] Problems with 64-bit Scipy Stats
In-Reply-To: <4D066DAE.1030701@uci.edu>
Message-ID: <90240.91869.qm@web43143.mail.sp1.yahoo.com>

That seemed to fix it. And, of course, now that you have pointed me to
it, your page says very clearly that SciPy requires the MKL installer
for Numpy. I missed it the first time in part out of my ignorance of
what 'MKL' means.

I apologize for wasting your time by not reading and understanding your
installation page better, and I thank you very much for making these
packages available on 64-bit architecture, and particularly for helping
me install them correctly.

Thank you again.

--- On Mon, 12/13/10, Christoph Gohlke wrote:
> In case those binaries are from my website, use the numpy MKL
> installer, numpy-1.5.1.win-amd64-py2.6-mkl.exe.
> [...]
From ferrell at diablotech.com  Mon Dec 13 17:33:33 2010
From: ferrell at diablotech.com (Robert Ferrell)
Date: Mon, 13 Dec 2010 15:33:33 -0700
Subject: [SciPy-User] Python in middle school
Message-ID:

I may have an opportunity to help a few middle school students learn
python.

Anybody have suggestions for fun, simple projects that involve tools in
numpy/scipy/EPD that might be appropriate for (smart) 7th &/or 8th
graders?

Thanks,
-robert

From alan.isaac at gmail.com  Mon Dec 13 17:50:11 2010
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Mon, 13 Dec 2010 17:50:11 -0500
Subject: [SciPy-User] Python in middle school
In-Reply-To:
References:
Message-ID: <4D06A323.6060203@gmail.com>

On 12/13/2010 5:33 PM, Robert Ferrell wrote:
> I may have an opportunity to help a few middle school students learn
> python.
>
> Anybody have suggestions for fun, simple projects that involve tools
> in numpy/scipy/EPD that might be appropriate for (smart) 7th &/or 8th
> graders?

Must it go beyond core Python?
If so, might PyGame be enough?
http://inventwithpython.com/

fwiw,
Alan Isaac

From ferrell at diablotech.com  Mon Dec 13 21:26:44 2010
From: ferrell at diablotech.com (Robert Ferrell)
Date: Mon, 13 Dec 2010 19:26:44 -0700
Subject: [SciPy-User] Python in middle school
In-Reply-To: <4D06A323.6060203@gmail.com>
References: <4D06A323.6060203@gmail.com>
Message-ID: <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com>

That looks like a great suggestion. I'll look into it a bit more.

thanks,
-robert

On Dec 13, 2010, at 3:50 PM, Alan G Isaac wrote:
> Must it go beyond core Python?
> If so, might PyGame be enough?
> http://inventwithpython.com/
> [...]

From josef.pktd at gmail.com  Mon Dec 13 22:03:53 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 13 Dec 2010 22:03:53 -0500
Subject: [SciPy-User] Python in middle school
In-Reply-To: <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com>
References: <4D06A323.6060203@gmail.com>
 <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com>
Message-ID:

On Mon, Dec 13, 2010 at 9:26 PM, Robert Ferrell wrote:
> That looks like a great suggestion. I'll look into it a bit more.
> [...]

Also, I found this story interesting:
http://learnpython.wordpress.com/2010/12/12/python-in-the-news/
(At a grade school of one of my kids they use Lego Mindstorm, which
sounds fun.)

Josef

From ferrell at diablotech.com  Mon Dec 13 23:20:58 2010
From: ferrell at diablotech.com (Robert Ferrell)
Date: Mon, 13 Dec 2010 21:20:58 -0700
Subject: [SciPy-User] Python in middle school
In-Reply-To:
References: <4D06A323.6060203@gmail.com>
 <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com>
Message-ID: <74ED4EF4-6A2D-4D66-9CFD-34622DD3DFA8@diablotech.com>

On Dec 13, 2010, at 8:03 PM, josef.pktd at gmail.com wrote:
> Also, I found this story interesting
> http://learnpython.wordpress.com/2010/12/12/python-in-the-news/
> (At a grade school of one of my kids they use Lego Mindstorm, which
> sounds fun.)

Another article in that guy's blog,
http://learnpython.wordpress.com/category/teaching/, also has some good
info.

thanks,
-robert
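As one concrete illustration of the kind of short numpy/matplotlib
project being asked about (an example sketched here, not a suggestion
made in the thread): a random walk takes a handful of lines to type in
and gives an immediate picture to talk about.

# a random walk in a few lines of numpy + matplotlib
import numpy as np
import matplotlib.pyplot as plt

steps = np.where(np.random.rand(1000) < 0.5, -1, 1)  # coin flips: -1 or +1
walk = steps.cumsum()                                # running total
plt.plot(walk)
plt.title('1000-step random walk')
plt.show()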
From ralf.gommers at googlemail.com  Tue Dec 14 06:08:02 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 14 Dec 2010 19:08:02 +0800
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow
 Leopard): libarpack
In-Reply-To:
References:
Message-ID:

On Tue, Dec 14, 2010 at 12:38 AM, Uri Laserson wrote:
> Hi Ralf,
>
> Sorry for my delayed response, I missed this message in my inbox.

No problem at all.

> I have since moved forward with this problem, but I now have a runtime
> problem.
> [...]
> I installed numpy and scipy loosely based on the directions here:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052227.html

I see that unfortunately that mail did not receive any response, I don't
remember seeing it. The info in it is not entirely correct.

> More specifically, after installing gfortran, I downloaded the
> following versions of numpy and scipy:
> numpy 1.5.1
> scipy 0.8.0

scipy 0.8.0 has one issue with python 2.7. You should either use the
0.8.x svn branch or 0.9.0b1 from svn or Sourceforge.

> I then set the following environment variables:
> export MACOSX_DEPLOYMENT_TARGET=10.6

This is OK.

> export CFLAGS="-arch i386 -arch x86_64"
> export FFLAGS="-m32 -m64"
> export LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386
> -arch x86_64 -framework Accelerate"

This is incorrect. When you build with distutils, CFLAGS/FFLAGS will
overwrite all flags, not append them. You should just leave this out;
the default flags work fine with numpy 1.5.1 + scipy as specified above.

> The results of numpy.test() are:
> >>> numpy.test()
> [...]
> OK (KNOWNFAIL=4, SKIP=1)

Looks OK. As a sanity check, run numpy.test('full'). There are some
extra distutils tests that get run like that.

> and ran the tests, giving output:
> >>> scipy.test()
> [...]
> RuntimeError: module compiled against ABI version 2000000 but this
> version of numpy is 1000009
> ....
> Ran 4405 tests in 87.505s
>
> OK (KNOWNFAIL=19, SKIP=28)
>
> Note the RuntimeError listed above:
> RuntimeError: module compiled against ABI version 2000000 but this
> version of numpy is 1000009

That looks like you have some other numpy install hanging around.

> However, when I import matplotlib.pyplot, I get:
> >>> import matplotlib.pyplot
> RuntimeError: module compiled against ABI version 2000000 but this
> version of numpy is 1000009
> [...]
> ImportError: numpy.core.multiarray failed to import
>
> However, when I separately try to import numpy.core.multiarray, I
> have no problem.

Same problem as with scipy I guess.

Cheers,
Ralf
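The stale-install diagnosis above can be checked directly. A sketch,
assuming ordinary directory-based installs, that reports which numpy
actually wins at import time and where any other copies live:

# sketch: find the numpy that gets imported, and any other copies
# shadowing it on sys.path
import os
import sys
import numpy

print numpy.__version__   # expect 1.5.1
print numpy.__file__      # the installation actually being imported

for p in sys.path:
    candidate = os.path.join(p, 'numpy')
    if os.path.isdir(candidate):
        print 'numpy package found in:', candidate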
From ralf.gommers at googlemail.com  Tue Dec 14 07:14:57 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 14 Dec 2010 20:14:57 +0800
Subject: [SciPy-User] [SciPy-Dev] ANN: SciPy 0.9.0 beta 1
In-Reply-To:
References:
Message-ID:

On Tue, Dec 14, 2010 at 12:46 AM, Pauli Virtanen wrote:
> Tue, 14 Dec 2010 00:12:32 +0800, Ralf Gommers wrote:
>> I am pleased to announce the availability of the first beta of SciPy
>> 0.9.0. This will be the first SciPy release to include support for
>> Python 3, as well as for Python 2.7.
>
> Note: scipy.weave does not work on Python 3 yet; the other parts of
> Scipy do.

Do you (or anyone else) plan to work on this before the final release?
If not I'll update the release notes.

Ralf

From laserson at mit.edu  Tue Dec 14 10:00:11 2010
From: laserson at mit.edu (Uri Laserson)
Date: Tue, 14 Dec 2010 10:00:11 -0500
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow
 Leopard): libarpack
In-Reply-To:
References:
Message-ID:

Thanks for the help. I will probably retry with scipy 0.9.0. Strangely
enough, I did something on my computer and it worked. I think that what
ended up happening was building numpy 1.5.1 with all the normal defaults
on the computer. Building MPL on top of that was then fine. And finally,
I used the special compile flags only for scipy 0.8.0. Both numpy and
scipy pass numpy.test() and scipy.test().

When I perform numpy.test('full') as you suggest, I get some
fortran-related errors, like:
ERROR: test_return_real.TestF90ReturnReal.test_all
ERROR: test_return_integer.TestF77ReturnInteger.test_all
etc. (12 in total)

Are these significant?

Uri

...................................................................................
Uri Laserson
Graduate Student, Biomedical Engineering
Harvard-MIT Division of Health Sciences and Technology
M +1 917 742 8019
laserson at mit.edu

On Tue, Dec 14, 2010 at 06:08, Ralf Gommers wrote:
> [...]
From ralf.gommers at googlemail.com  Tue Dec 14 10:20:27 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 14 Dec 2010 23:20:27 +0800
Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow
 Leopard): libarpack
In-Reply-To:
References:
Message-ID:

On Tue, Dec 14, 2010 at 11:00 PM, Uri Laserson wrote:
> Thanks for the help. I will probably retry with scipy 0.9.0.
> [...]
> When I perform numpy.test('full') as you suggest, I get some
> fortran-related errors, like:
> ERROR: test_return_real.TestF90ReturnReal.test_all
> ERROR: test_return_integer.TestF77ReturnInteger.test_all
> etc. (12 in total)
>
> Are these significant?

If it all works for you probably not, but there was an issue with those
errors a few months ago. They are related to 32/64-bit architecture
errors as well. If you have any problem with that again, please send me
your full build and test logs.

Cheers,
Ralf
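A quick way to act on the 32/64-bit hint is to check what the running
interpreter itself reports; a sketch:

# sketch: confirm the word size of the running Python and which numpy
# build it imports, since mixed 32/64-bit builds trigger exactly these
# f2py return-value test errors
import platform
import sys
import numpy

print platform.architecture()[0]  # '32bit' or '64bit' for this Python
print sys.maxsize > 2**32         # True only on a 64-bit Python
print numpy.__file__              # verify the expected numpy build is used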
>>>> - the output of "gcc -v", "g++ -v" and "gfortran -v" >>>> >>>> Ralf >>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From laserson at mit.edu Tue Dec 14 10:23:58 2010 From: laserson at mit.edu (Uri Laserson) Date: Tue, 14 Dec 2010 10:23:58 -0500 Subject: [SciPy-User] Problems installing scipy on OS X 10.6 (Snow Leopard): libarpack In-Reply-To: References: Message-ID: Will do. Thanks again for all the help! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu On Tue, Dec 14, 2010 at 10:20, Ralf Gommers wrote: > > > On Tue, Dec 14, 2010 at 11:00 PM, Uri Laserson wrote: > >> Thanks for the help. I will probably retry with scipy 0.9.0. Strangely >> enough, I did something on my computer and it worked. I think that what >> ended up happening was building numpy 1.5.1 with all the normal defaults on >> the computer. Building MPL on top of that was then fine. And finally, I >> used the special compile flags only for scipy 0.8.0. Both numpy as scipy >> pass numpy.test() and scipy.test() >> >> When I perform numpy.test('full') as you suggest, I get some >> fortran-related errors, like: >> ERROR: test_return_real.TestF90ReturnReal.test_all >> ERROR: test_return_integer.TestF77ReturnInteger.test_all >> etc. (12 in total) >> >> Are these significant? >> > > If it all works for you probably not, but there was an issue with those > errors a few months ago. They are related to 32/64-bit architecture errors > as well. If you have any problem with that again, please send me your full > build and test logs. > > Cheers, > Ralf > > > >> Uri >> >> >> ................................................................................... >> Uri Laserson >> Graduate Student, Biomedical Engineering >> Harvard-MIT Division of Health Sciences and Technology >> M +1 917 742 8019 >> laserson at mit.edu >> >> >> >> On Tue, Dec 14, 2010 at 06:08, Ralf Gommers wrote: >> >>> >>> >>> On Tue, Dec 14, 2010 at 12:38 AM, Uri Laserson wrote: >>> >>>> Hi Ralf, >>>> >>>> Sor for my delayed response, I missed this message in my inbox. >>>> >>> >>> No problem at all. >>> >>> >>>> I have since moved forward with this problem, but I now have a runtime >>>> problem. >>>> >>>> I am using python2.7 I built myself through the homebrew package >>>> manager. >>>> >>>> Output from my compilers is as follows: >>>> laserson at hobbes:~$ gcc -v >>>> Using built-in specs. 
>>>> Target: i686-apple-darwin10 >>>> Configured with: /var/tmp/gcc/gcc-5664~89/src/configure >>>> --disable-checking --enable-werror --prefix=/usr --mandir=/share/man >>>> --enable-languages=c,objc,c++,obj-c++ >>>> --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib >>>> --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- >>>> --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 >>>> --with-gxx-include-dir=/include/c++/4.2.1 >>>> Thread model: posix >>>> gcc version 4.2.1 (Apple Inc. build 5664) >>>> >>>> laserson at hobbes:~$ g++ -v >>>> Using built-in specs. >>>> Target: i686-apple-darwin10 >>>> Configured with: /var/tmp/gcc/gcc-5664~89/src/configure >>>> --disable-checking --enable-werror --prefix=/usr --mandir=/share/man >>>> --enable-languages=c,objc,c++,obj-c++ >>>> --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib >>>> --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- >>>> --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 >>>> --with-gxx-include-dir=/include/c++/4.2.1 >>>> Thread model: posix >>>> gcc version 4.2.1 (Apple Inc. build 5664) >>>> >>>> laserson at hobbes:~$ gfortran -v >>>> Using built-in specs. >>>> Target: i686-apple-darwin8 >>>> Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local >>>> --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ >>>> --build=i686-apple-darwin8 --host=i686-apple-darwin8 >>>> --target=i686-apple-darwin8 --enable-languages=fortran >>>> Thread model: posix >>>> gcc version 4.2.3 >>>> >>>> I installed numpy and scipy loosely based on the directions here: >>>> http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052227.html >>>> >>>> I see that unfortunately that mail did not receive any response, I don't >>> remember seeing it. The info in it is not entirely correct. >>> >>> >>>> More specifically, after installing gfortran, I downloaded the following >>>> versions of numpy and scipy: >>>> numpy 1.5.1 >>>> scipy 0.8.0 >>>> >>> >>> scipy 0.8.0 has one issue with python 2.7. You should either use the >>> 0.8.x svn branch or 0.9.0b1 from svn or Sourceforge. >>> >>>> >>>> I then set the following environment variables: >>>> export MACOSX_DEPLOYMENT_TARGET=10.6 >>>> >>> This is OK. >>> >>> >>>> export CFLAGS="-arch i386 -arch x86_64" >>>> export FFLAGS="-m32 -m64" >>>> export LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386 -arch >>>> x86_64 -framework Accelerate" >>>> >>> >>> This is incorrect. When you build with distutils, CFLAGS/FFLAGS will >>> overwrite all flags, not append them. You should just leave this out, >>> default flags work fine with numpy 1.5.1 + scipy as specified above. >>> >>> >>>> >>>> Then I built and installed numpy as follows (note: sudo is not needed, >>>> as I took ownership of /usr/local): >>>> python setup.py build --fcompiler=gnu95 >>>> python setup.py install >>>> >>>> The results of numpy.test() are: >>>> >>> numpy.test() >>>> >>> >>> Looks OK. As a sanity check, run numpy.test('full'). There are some extra >>> distutils tests that get run like that. >>> >>> >>>> Running unit tests for numpy >>>> NumPy version 1.5.1 >>>> NumPy is installed in >>>> /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/numpy >>>> Python version 2.7.1 (r271:86832, Dec 7 2010, 12:37:47) [GCC 4.2.1 >>>> (Apple Inc. build 5664)] >>>> nose version 0.11.4 >>>> .... 
>>>> Ran 3006 tests in 18.312s >>>> >>>> OK (KNOWNFAIL=4, SKIP=1) >>>> >>>> >>>> Then I installed scipy as follows: >>>> python setup.py build --fcompiler=gnu95 >>>> python setup.py install >>>> >>>> and ran the tests, giving output: >>>> >>> scipy.test() >>>> Running unit tests for scipy >>>> NumPy version 1.5.1 >>>> NumPy is installed in >>>> /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/numpy >>>> SciPy version 0.8.0 >>>> SciPy is installed in >>>> /usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/scipy >>>> Python version 2.7.1 (r271:86832, Dec 7 2010, 12:37:47) [GCC 4.2.1 >>>> (Apple Inc. build 5664)] >>>> nose version 0.11.4 >>>> RuntimeError: module compiled against ABI version 2000000 but this >>>> version of numpy is 1000009 >>>> .... >>>> Ran 4405 tests in 87.505s >>>> >>>> OK (KNOWNFAIL=19, SKIP=28) >>>> >>>> >>>> Note the RuntimeError listed above: >>>> RuntimeError: module compiled against ABI version 2000000 but this >>>> version of numpy is 1000009 >>>> >>> >>> That looks like you have some other numpy install hanging around. >>> >>> >>>> >>>> I can still import scipy fine. However, I then have a problem after >>>> building matplotlib (GitHub). I can build it fine: >>>> >>>> ============================================================================ >>>> BUILDING MATPLOTLIB >>>> matplotlib: 1.0.0 >>>> python: 2.7.1 (r271:86832, Dec 7 2010, 12:37:47) [GCC >>>> 4.2.1 (Apple Inc. build 5664)] >>>> platform: darwin >>>> >>>> REQUIRED DEPENDENCIES >>>> numpy: 1.5.1 >>>> freetype2: 12.0.6 >>>> >>>> OPTIONAL BACKEND DEPENDENCIES >>>> libpng: 1.2.44 >>>> Tkinter: Tkinter: 81008, Tk: 8.5, Tcl: 8.5 >>>> Gtk+: no >>>> * Building for Gtk+ requires pygtk; you must be >>>> able >>>> * to "import gtk" in your build/install >>>> environment >>>> Mac OS X native: yes >>>> Qt: no >>>> Qt4: no >>>> Cairo: no >>>> >>>> However, when I import matplotlib.pyplot, I get: >>>> >>> import matplotlib.pyplot >>>> RuntimeError: module compiled against ABI version 2000000 but this >>>> version of numpy is 1000009 >>>> Traceback (most recent call last): >>>> File "", line 1, in >>>> File >>>> "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/pyplot.py", >>>> line 23, in >>>> from matplotlib.figure import Figure, figaspect >>>> File >>>> "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/figure.py", >>>> line 16, in >>>> import artist >>>> File >>>> "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/artist.py", >>>> line 6, in >>>> from transforms import Bbox, IdentityTransform, TransformedBbox, >>>> TransformedPath >>>> File >>>> "/Users/laserson/matplotlib/lib/python2.7/site-packages/matplotlib/transforms.py", >>>> line 34, in >>>> from matplotlib._path import affine_transform >>>> ImportError: numpy.core.multiarray failed to import >>>> >>>> >>>> However, when I separately try to import numpy.core.multiarray, I have >>>> no problem. >>>> >>>> Same problem as with scipy I guess. >>> >>> Cheers, >>> Ralf >>> >>> >>> Any ideas? >>>> >>>> Thanks! >>>> Uri >>>> >>>> >>>> >>>> >>>> On Wed, Dec 8, 2010 at 06:18, Ralf Gommers >>> > wrote: >>>> >>>>> >>>>> >>>>> On Wed, Dec 8, 2010 at 4:06 AM, Uri Laserson wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I am on a MacMini with Intel processor. I just installed OS X 10.6 >>>>>> and the latest Xcode that I could download, which included gcc 4.2. I am >>>>>> using python 2.7 built from source using homebrew. I installed the gfortran >>>>>> 4.2.3 binaries from http://r.research.att.com/tools/. 
>>>>>> >>>>>> I am trying to install numpy and scipy. numpy installs fine with or >>>>>> without switching to g++-4.0. I have successfully installed it using pip >>>>>> and also directly from source from the git repository. >>>>>> >>>>>> Scipy is giving me errors on install (the same errors whether I use >>>>>> pip or try the svn repository). I installed it successfully yesterday on a >>>>>> new Macbook Air using pip, after changing the symlinks to point to g++-4.0. >>>>>> However, today on my MacMini, I am getting errors after following the same >>>>>> protocol. >>>>>> >>>>>> The errors I am getting are here: >>>>>> https://gist.github.com/732293 >>>>>> >>>>> >>>>> The error indicates that 32 and 64 bit binaries are being mixed. Can >>>>> you tell us the following: >>>>> - what build command you used >>>>> - what Python you are using (from python.org, from Apple, >>>>> self-compiled?) >>>>> - the output of "gcc -v", "g++ -v" and "gfortran -v" >>>>> >>>>> Ralf >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Tue Dec 14 11:25:35 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Tue, 14 Dec 2010 17:25:35 +0100 Subject: [SciPy-User] Python in middle school In-Reply-To: <74ED4EF4-6A2D-4D66-9CFD-34622DD3DFA8@diablotech.com> References: <4D06A323.6060203@gmail.com> <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com> <74ED4EF4-6A2D-4D66-9CFD-34622DD3DFA8@diablotech.com> Message-ID: Hi, My son (11) reads Hello World! Computer Programming for Kids and Other Beginners [Paperback] Warren Sande Warren Sande (Author) He likes it, that is, most of it. The book ends with programming a game. bye Nicky On 14 December 2010 05:20, Robert Ferrell wrote: > > On Dec 13, 2010, at 8:03 PM, josef.pktd at gmail.com wrote: > >> On Mon, Dec 13, 2010 at 9:26 PM, Robert Ferrell wrote: >>> That looks like a great suggestion. ?I'll look into it a bit more. >>> >>> thanks, >>> -robert >>> >>> On Dec 13, 2010, at 3:50 PM, Alan G Isaac wrote: >>> >>>> On 12/13/2010 5:33 PM, Robert Ferrell wrote: >>>>> I may have an opportunity to help a few middle school students learn python. >>>>> >>>>> Anybody have suggestions for fun, simple projects that involve tools in numpy/scipy/EPD that might be appropriate for (smart) 7th&/or 8th graders? >>>>> >>>> >>>> >>>> Must it go beyond core Python? >>>> If so, might PyGame be enough? >>>> http://inventwithpython.com/ >> >> Also, I found this story interesting >> http://learnpython.wordpress.com/2010/12/12/python-in-the-news/ >> (At a grade school of one of my kids they use Lego Mindstorm, which sounds fun.) 
> > Another article in that guys blog, http://learnpython.wordpress.com/category/teaching/, also has some good info. > > thanks, > -robert >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Tue Dec 14 11:39:27 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 14 Dec 2010 10:39:27 -0600 Subject: [SciPy-User] Python in middle school In-Reply-To: References: <4D06A323.6060203@gmail.com> <5BEBDDF4-B842-4F2B-881C-C5ED2A250B42@diablotech.com> <74ED4EF4-6A2D-4D66-9CFD-34622DD3DFA8@diablotech.com> Message-ID: <4D079DBF.6070208@gmail.com> On 12/14/2010 10:25 AM, nicky van foreest wrote: > Hi, > > My son (11) reads > > > Hello World! Computer Programming for Kids and Other Beginners [Paperback] > Warren Sande > Warren Sande (Author) > > He likes it, that is, most of it. The book ends with programming a game. > > bye > > Nicky > > > > On 14 December 2010 05:20, Robert Ferrell wrote: >> On Dec 13, 2010, at 8:03 PM, josef.pktd at gmail.com wrote: >> >>> On Mon, Dec 13, 2010 at 9:26 PM, Robert Ferrell wrote: >>>> That looks like a great suggestion. I'll look into it a bit more. >>>> >>>> thanks, >>>> -robert >>>> >>>> On Dec 13, 2010, at 3:50 PM, Alan G Isaac wrote: >>>> >>>>> On 12/13/2010 5:33 PM, Robert Ferrell wrote: >>>>>> I may have an opportunity to help a few middle school students learn python. >>>>>> >>>>>> Anybody have suggestions for fun, simple projects that involve tools in numpy/scipy/EPD that might be appropriate for (smart) 7th&/or 8th graders? >>>>>> >>>>> >>>>> Must it go beyond core Python? >>>>> If so, might PyGame be enough? >>>>> http://inventwithpython.com/ >>> Also, I found this story interesting >>> http://learnpython.wordpress.com/2010/12/12/python-in-the-news/ >>> (At a grade school of one of my kids they use Lego Mindstorm, which sounds fun.) >> Another article in that guys blog, http://learnpython.wordpress.com/category/teaching/, also has some good info. >> >> thanks, >> -robert Linux Format magazine (http://www.linuxformat.com/) has done various Python tutorials in the past but you need to be a subscriber to get most of these. The editors might be willing to provide these on request. There is this one: Issue 112 - Make a racing game - Expand your Python and PyGame skills with a top-down racer in under 100 lines of code. Vroom! (Mike Saunders): http://www.linuxformat.com/includes/download.php?PDF=LXF112.tut_code.pdf Bruce From josef.pktd at gmail.com Tue Dec 14 12:42:40 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Dec 2010 12:42:40 -0500 Subject: [SciPy-User] understanding machine precision Message-ID: I thought that we get deterministic results, with identical machine precision errors, but I get (with some random a0, b0) >>> for i in range(5): x = scipy.linalg.lstsq(a0,b0)[0] x2 = scipy.linalg.lstsq(a0,b0)[0] print np.max(np.abs(x-x2)) 9.99200722163e-016 9.99200722163e-016 0.0 0.0 9.99200722163e-016 >>> a0.shape (100, 10) >>> b0.shape (100, 3) Why is the result not always the same? 
just curious

Josef

From faltet at pytables.org  Tue Dec 14 13:09:03 2010
From: faltet at pytables.org (Francesc Alted)
Date: Tue, 14 Dec 2010 19:09:03 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: 
Message-ID: <201012141909.03925.faltet@pytables.org>

A Tuesday 14 December 2010 18:42:40 josef.pktd at gmail.com escrigué:
> I thought that we get deterministic results, with identical machine
> precision errors, but I get (with some random a0, b0)
>
> >>> for i in range(5):
>        x = scipy.linalg.lstsq(a0,b0)[0]
>        x2 = scipy.linalg.lstsq(a0,b0)[0]
>        print np.max(np.abs(x-x2))
>
>
> 9.99200722163e-016
> 9.99200722163e-016
> 0.0
> 0.0
> 9.99200722163e-016
>
> >>> a0.shape
>
> (100, 10)
>
> >>> b0.shape
>
> (100, 3)
>
> Why is the result not always the same? just curious

That's really funny!  Could you please come up with a self-contained
example so as to see if others can reproduce that?

-- 
Francesc Alted

From josef.pktd at gmail.com  Tue Dec 14 13:38:12 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 14 Dec 2010 13:38:12 -0500
Subject: [SciPy-User] understanding machine precision
In-Reply-To: <201012141909.03925.faltet@pytables.org>
References: 
	<201012141909.03925.faltet@pytables.org>
Message-ID: 

On Tue, Dec 14, 2010 at 1:09 PM, Francesc Alted wrote:
> A Tuesday 14 December 2010 18:42:40 josef.pktd at gmail.com escrigué:
>> I thought that we get deterministic results, with identical machine
>> precision errors, but I get (with some random a0, b0)
>>
>> >>> for i in range(5):
>>        x = scipy.linalg.lstsq(a0,b0)[0]
>>        x2 = scipy.linalg.lstsq(a0,b0)[0]
>>        print np.max(np.abs(x-x2))
>>
>>
>> 9.99200722163e-016
>> 9.99200722163e-016
>> 0.0
>> 0.0
>> 9.99200722163e-016
>>
>> >>> a0.shape
>>
>> (100, 10)
>>
>> >>> b0.shape
>>
>> (100, 3)
>>
>> Why is the result not always the same? just curious
>
> That's really funny! Could you please come up with a self-contained
> example so as to see if others can reproduce that?

Essentially all I did was

a0 = np.random.randn(100,10)
b0 = a0.sum(1)[:,None] + np.random.randn(100,3)

I copied scipy.linalg pinv, pinv2 and lstsq to a local module, and
the results were not exactly the same as with the ones in scipy.
So, I did this check for scipy also.
the attached script produces different results in each run (on WindowsXP 32), for example lstsq 1.55431223448e-015 1.55431223448e-015 0.0 1.55431223448e-015 1.55431223448e-015 pinv 5.20417042793e-017 5.20417042793e-017 0.0 0.0 0.0 pinv2 0.0 0.0 5.76795555762e-017 5.76795555762e-017 4.51028103754e-017 Thanks, Josef > > -- > Francesc Alted > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- import numpy as np import scipy.linalg np.random.seed(12345) a0 = np.random.randn(100,10) b0 = a0.sum(1)[:,None] + np.random.randn(100,3) print '\nlstsq' for i in range(5): x = scipy.linalg.lstsq(a0,b0)[0] x2 = scipy.linalg.lstsq(a0,b0)[0] print np.max(np.abs(x-x2)) print '\npinv' for i in range(5): x = scipy.linalg.pinv(a0) x2 = scipy.linalg.pinv(a0) print np.max(np.abs(x-x2)) print '\npinv2' for i in range(5): x = scipy.linalg.pinv2(a0) x2 = scipy.linalg.pinv2(a0) print np.max(np.abs(x-x2)) From kwgoodman at gmail.com Tue Dec 14 13:42:23 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 14 Dec 2010 10:42:23 -0800 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 9:42 AM, wrote: > I thought that we get deterministic results, with identical machine > precision errors, but I get (with some random a0, b0) > >>>> for i in range(5): > ? ? ? ?x = scipy.linalg.lstsq(a0,b0)[0] > ? ? ? ?x2 = scipy.linalg.lstsq(a0,b0)[0] > ? ? ? ?print np.max(np.abs(x-x2)) > > > 9.99200722163e-016 > 9.99200722163e-016 > 0.0 > 0.0 > 9.99200722163e-016 I've started a couple of threads in the past on repeatability. Most of the discussion ends up being about ATLAS. I suggest repeating the test without ATLAS. From robert.kern at gmail.com Tue Dec 14 13:47:04 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Dec 2010 12:47:04 -0600 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 12:42, Keith Goodman wrote: > On Tue, Dec 14, 2010 at 9:42 AM, ? wrote: >> I thought that we get deterministic results, with identical machine >> precision errors, but I get (with some random a0, b0) >> >>>>> for i in range(5): >> ? ? ? ?x = scipy.linalg.lstsq(a0,b0)[0] >> ? ? ? ?x2 = scipy.linalg.lstsq(a0,b0)[0] >> ? ? ? ?print np.max(np.abs(x-x2)) >> >> >> 9.99200722163e-016 >> 9.99200722163e-016 >> 0.0 >> 0.0 >> 9.99200722163e-016 > > I've started a couple of threads in the past on repeatability. Most of > the discussion ends up being about ATLAS. I suggest repeating the test > without ATLAS. On OS X with numpy linked against the builtin Accelerate.framework (which is based off of ATLAS), I get the same result every time. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josef.pktd at gmail.com Tue Dec 14 13:57:43 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Dec 2010 13:57:43 -0500 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 1:47 PM, Robert Kern wrote: > On Tue, Dec 14, 2010 at 12:42, Keith Goodman wrote: >> On Tue, Dec 14, 2010 at 9:42 AM, ? 
wrote: >>> I thought that we get deterministic results, with identical machine >>> precision errors, but I get (with some random a0, b0) >>> >>>>>> for i in range(5): >>> ? ? ? ?x = scipy.linalg.lstsq(a0,b0)[0] >>> ? ? ? ?x2 = scipy.linalg.lstsq(a0,b0)[0] >>> ? ? ? ?print np.max(np.abs(x-x2)) >>> >>> >>> 9.99200722163e-016 >>> 9.99200722163e-016 >>> 0.0 >>> 0.0 >>> 9.99200722163e-016 >> >> I've started a couple of threads in the past on repeatability. Most of >> the discussion ends up being about ATLAS. I suggest repeating the test >> without ATLAS. Is there a way to turn ATLAS off without recompiling? > > On OS X with numpy linked against the builtin Accelerate.framework > (which is based off of ATLAS), I get the same result every time. When I run the script on the commandline (with a new python each time), I get the same results each time, but within the loop the results still differ up to 1.55431223448e-015. On IDLE when I remain in the same session, results differ with each run. An explanation that ATLAS has some builtin state is enough for me, I try not to rely on numerical precision in this range. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Tue Dec 14 14:05:30 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 14 Dec 2010 14:05:30 -0500 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 1:47 PM, Robert Kern wrote: > On Tue, Dec 14, 2010 at 12:42, Keith Goodman wrote: >> On Tue, Dec 14, 2010 at 9:42 AM, ? wrote: >>> I thought that we get deterministic results, with identical machine >>> precision errors, but I get (with some random a0, b0) >>> >>>>>> for i in range(5): >>> ? ? ? ?x = scipy.linalg.lstsq(a0,b0)[0] >>> ? ? ? ?x2 = scipy.linalg.lstsq(a0,b0)[0] >>> ? ? ? ?print np.max(np.abs(x-x2)) >>> >>> >>> 9.99200722163e-016 >>> 9.99200722163e-016 >>> 0.0 >>> 0.0 >>> 9.99200722163e-016 >> >> I've started a couple of threads in the past on repeatability. Most of >> the discussion ends up being about ATLAS. I suggest repeating the test >> without ATLAS. > > On OS X with numpy linked against the builtin Accelerate.framework > (which is based off of ATLAS), I get the same result every time. > > Windows 7, numpy 64 bit mkl binaries from Christoph, I get 0.0 every time. Using 32 bit mkl binaries in IPython interpreter and from the command line I do not get reproducible results. If I do astype(float32) I seem to get 0.0 most of the times but more infrequently get something like 3e-8. Skipper From robert.kern at gmail.com Tue Dec 14 14:07:39 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Dec 2010 13:07:39 -0600 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 12:57, wrote: > On Tue, Dec 14, 2010 at 1:47 PM, Robert Kern wrote: >> On Tue, Dec 14, 2010 at 12:42, Keith Goodman wrote: >>> On Tue, Dec 14, 2010 at 9:42 AM, ? wrote: >>>> I thought that we get deterministic results, with identical machine >>>> precision errors, but I get (with some random a0, b0) >>>> >>>>>>> for i in range(5): >>>> ? ? ? ?x = scipy.linalg.lstsq(a0,b0)[0] >>>> ? ? ? 
?x2 = scipy.linalg.lstsq(a0,b0)[0]
>>>> ? ? ? ?print np.max(np.abs(x-x2))
>>>>
>>>>
>>>> 9.99200722163e-016
>>>> 9.99200722163e-016
>>>> 0.0
>>>> 0.0
>>>> 9.99200722163e-016
>>>
>>> I've started a couple of threads in the past on repeatability. Most of
>>> the discussion ends up being about ATLAS. I suggest repeating the test
>>> without ATLAS.
>
> Is there a way to turn ATLAS off without recompiling?

No.

>> On OS X with numpy linked against the builtin Accelerate.framework
>> (which is based off of ATLAS), I get the same result every time.
>
> When I run the script on the commandline (with a new python each
> time), I get the same results each time, but within the loop the
> results still differ up to 1.55431223448e-015. On IDLE when I remain
> in the same session, results differ with each run.

I mean that I get "0.0" for each iteration of each loop even if I push
the number of iterations up to 500 or so.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From faltet at pytables.org  Tue Dec 14 14:11:42 2010
From: faltet at pytables.org (Francesc Alted)
Date: Tue, 14 Dec 2010 20:11:42 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: <201012141909.03925.faltet@pytables.org>
	
Message-ID: <201012142011.42107.faltet@pytables.org>

A Tuesday 14 December 2010 19:38:12 josef.pktd at gmail.com escrigué:
> On Tue, Dec 14, 2010 at 1:09 PM, Francesc Alted wrote:
> > A Tuesday 14 December 2010 18:42:40 josef.pktd at gmail.com escrigué:
> >> I thought that we get deterministic results, with identical
> >> machine precision errors, but I get (with some random a0, b0)
> >>
> >> >>> for i in range(5):
> >>        x = scipy.linalg.lstsq(a0,b0)[0]
> >>        x2 = scipy.linalg.lstsq(a0,b0)[0]
> >>        print np.max(np.abs(x-x2))
> >>
> >>
> >> 9.99200722163e-016
> >> 9.99200722163e-016
> >> 0.0
> >> 0.0
> >> 9.99200722163e-016
> >>
> >> >>> a0.shape
> >>
> >> (100, 10)
> >>
> >> >>> b0.shape
> >>
> >> (100, 3)
> >>
> >> Why is the result not always the same? just curious
> >
> > That's really funny! Could you please come up with a
> > self-contained example so as to see if others can reproduce that?
>
> Essentially all I did was
>
> a0 = np.random.randn(100,10)
> b0 = a0.sum(1)[:,None] + np.random.randn(100,3)
>
> I copied scipy.linalg pinv, pinv2 and lstsq to a local module, and
> the results were not exactly the same as with the ones in scipy.
> So, I did this check for scipy also.
>
> the attached script produces different results in each run (on
> WindowsXP 32), for example
>
> lstsq
> 1.55431223448e-015
> 1.55431223448e-015
> 0.0
> 1.55431223448e-015
> 1.55431223448e-015
>
> pinv
> 5.20417042793e-017
> 5.20417042793e-017
> 0.0
> 0.0
> 0.0
>
> pinv2
> 0.0
> 0.0
> 5.76795555762e-017
> 5.76795555762e-017
> 4.51028103754e-017

I cannot reproduce that:

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
0.0
0.0
0.0
0.0

Using numpy without ATLAS.

So yes, as others pointed out, it seems that ATLAS is responsible for
this. I think ATLAS always performs some small calculation in order to
determine the optimal block sizes for its computations. So, my guess is
that, depending on the stress of your machine (and maybe on the phase
of the moon too), these block sizes may be slightly different, leading
to a different order in computations and, hence, preventing strict
reproducibility.
Anyway, I must confess that running these calculations for *every*
computation strikes me as a bit excessive, but provided that the
matrices can be large, it might make some sense (i.e. the cost of the
block size guessing is negligible in general).

-- 
Francesc Alted

From nwagner at iam.uni-stuttgart.de  Tue Dec 14 14:17:10 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Tue, 14 Dec 2010 20:17:10 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: <201012141909.03925.faltet@pytables.org>
	
Message-ID: 

Hi all,

I am using ATLAS

python -i try_deterministic.py

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
0.0
0.0
0.0
0.0

pinv2
0.0
0.0
0.0
0.0
0.0

>>> from numpy import show_config
>>> show_config()
atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/home/nwagner/src/ATLAS3.8.2/mybuild/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.8.2\\""')]
    language = f77
    include_dirs = ['/home/nwagner/src/ATLAS3.8.2/include']
blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/home/nwagner/src/ATLAS3.8.2/mybuild/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.8.2\\""')]
    language = c
    include_dirs = ['/home/nwagner/src/ATLAS3.8.2/include']
atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/home/nwagner/src/ATLAS3.8.2/mybuild/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.8.2\\""')]
    language = c
    include_dirs = ['/home/nwagner/src/ATLAS3.8.2/include']
lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/home/nwagner/src/ATLAS3.8.2/mybuild/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.8.2\\""')]
    language = f77
    include_dirs = ['/home/nwagner/src/ATLAS3.8.2/include']
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Nils

From faltet at pytables.org  Tue Dec 14 14:29:17 2010
From: faltet at pytables.org (Francesc Alted)
Date: Tue, 14 Dec 2010 20:29:17 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: 
Message-ID: <201012142029.18007.faltet@pytables.org>

A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigué:
>> Hi all,
>>
>> I am using ATLAS
>>
>> python -i try_deterministic.py
>>
>> lstsq
>> 0.0
>> 0.0
>> 0.0
>> 0.0
>> 0.0

That's interesting.  Maybe Josef is using a threaded ATLAS?  I
positively know that threading introduces variability in the order that
the computations are done.  However, I'm not sure why ATLAS has
decided to use several threads for so small matrices ((100, 10)) :-/

-- 
Francesc Alted

From eric at depagne.org  Tue Dec 14 14:48:20 2010
From: eric at depagne.org (=?iso-8859-1?q?=C9ric_Depagne?=)
Date: Tue, 14 Dec 2010 20:48:20 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: 
Message-ID: <201012142048.21372.eric@depagne.org>

Hi all.
I do not know if I'm using ATLAS, but I get the following:

python try_deterministic.py

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
0.0
0.0
0.0
0.0

pinv2
0.0
0.0
0.0
0.0
0.0

And some config:

In [3]: show_config()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib64']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib64']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

-- 
Un clavier azerty en vaut deux
----------------------------------------------------------
Éric Depagne                            eric at depagne.org

From bsouthey at gmail.com  Tue Dec 14 14:59:43 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 14 Dec 2010 13:59:43 -0600
Subject: [SciPy-User] understanding machine precision
In-Reply-To: <201012142029.18007.faltet@pytables.org>
References: <201012142029.18007.faltet@pytables.org>
Message-ID: <4D07CCAF.6090706@gmail.com>

On 12/14/2010 01:29 PM, Francesc Alted wrote:
> A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigué:
>> Hi all,
>>
>> I am using ATLAS
>>
>> python -i try_deterministic.py
>>
>> lstsq
>> 0.0
>> 0.0
>> 0.0
>> 0.0
>> 0.0
> That's interesting.  Maybe Josef is using a threaded ATLAS?  I
> positively know that threading introduces variability in the order that
> the computations are done.  However, I'm not sure why ATLAS has
> decided to use several threads for so small matrices ((100, 10)) :-/
>
Does this 'issue' occur with numpy's lstsq?

This is most probably due to the OS (Windows) and compiler, as Skipper's
post indicates, and probably related to the CPU as well. It may be how
the binaries are created relative to the target system, especially if
different compilers are used.

Overclocking and heat can also create issues (just as those people
finding prime numbers :-) ).
Bruce From silva at lma.cnrs-mrs.fr Tue Dec 14 15:08:34 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Tue, 14 Dec 2010 17:08:34 -0300 Subject: [SciPy-User] understanding machine precision In-Reply-To: <201012142029.18007.faltet@pytables.org> References: <201012142029.18007.faltet@pytables.org> Message-ID: <1292357314.1866.13.camel@florian-desktop> What I get: $ python -i try_deterministic.py lstsq 0.0 0.0 0.0 0.0 0.0 pinv 0.0 0.0 0.0 0.0 0.0 pinv2 0.0 0.0 0.0 0.0 0.0 >>> np.show_config() blas_info: libraries = ['blas'] library_dirs = ['/usr/lib64'] language = f77 lapack_info: libraries = ['lapack'] library_dirs = ['/usr/lib64'] language = f77 atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['blas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('NO_ATLAS_INFO', 1)] atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'blas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('NO_ATLAS_INFO', 1)] atlas_info: NOT AVAILABLE lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: NOT AVAILABLE mkl_info: NOT AVAILABLE >>> np.__version__ '1.3.0' on Ubuntu and x86_64 -- Fabrice Silva From josef.pktd at gmail.com Tue Dec 14 14:38:18 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Dec 2010 14:38:18 -0500 Subject: [SciPy-User] understanding machine precision In-Reply-To: <201012142029.18007.faltet@pytables.org> References: <201012142029.18007.faltet@pytables.org> Message-ID: On Tue, Dec 14, 2010 at 2:29 PM, Francesc Alted wrote: > A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigu?: >> Hi all, >> >> I am using ATLAS >> >> python -i try_deterministic.py >> >> lstsq >> 0.0 >> 0.0 >> 0.0 >> 0.0 >> 0.0 > > That's interesting. ?Maybe Josef is using a threaded ATLAS? ?I > positively know that threading introduces variability in the order that > the computations are done. ?However, I'm not sure on why ATLAS has > decided to use several threads for so small matrices ((100, 10)) :-/ No, I'm using an old plain ATLAS with a single CPU, but if Skipper is getting it also with 32bit mkl, then there might still be another Windows "feature" in play. Since my results are identical across python session, but not within, the results are still deterministic and cannot depend on how busy my computer is. Josef > > -- > Francesc Alted > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Tue Dec 14 15:17:25 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 14 Dec 2010 12:17:25 -0800 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: <201012142029.18007.faltet@pytables.org> Message-ID: On Tue, Dec 14, 2010 at 11:38 AM, wrote: > On Tue, Dec 14, 2010 at 2:29 PM, Francesc Alted wrote: >> A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigu?: >>> Hi all, >>> >>> I am using ATLAS >>> >>> python -i try_deterministic.py >>> >>> lstsq >>> 0.0 >>> 0.0 >>> 0.0 >>> 0.0 >>> 0.0 >> >> That's interesting. ?Maybe Josef is using a threaded ATLAS? ?I >> positively know that threading introduces variability in the order that >> the computations are done. ?However, I'm not sure on why ATLAS has >> decided to use several threads for so small matrices ((100, 10)) :-/ > > No, I'm using an old plain ATLAS with a single CPU, but if Skipper is > getting it also with 32bit mkl, then there might still be another > Windows "feature" in play. 
> > Since my results are identical across python session, but not within,
> the results are still deterministic and cannot depend on how busy my
> computer is.

Come to think of it, one of my tests is to compare the output of each
new release of one of my packages with the previous release. My
colleague runs the test on a Windows machine. He gets a difference in
output when there should be none. Even after pulling the installer
apart and installing the non-ATLAS version he sees the difference on
Windows. It would be great to figure out what is going on.

$ python -i try_deterministic.py

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
0.0
0.0
0.0
0.0

pinv2
0.0
0.0
0.0
0.0
0.0

$ uname -a
Linux kg 2.6.32-26-generic #48-Ubuntu SMP Wed Nov 24 10:14:11 UTC 2010
x86_64 GNU/Linux

>>> np.show_config()
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['f77blas', 'cblas', 'atlas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')]
    language = c
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('NO_ATLAS_INFO', 2)]
    language = f77
atlas_info:
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    library_dirs = ['/usr/local/lib']
    language = f77
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
    libraries = ['f77blas', 'cblas', 'atlas']
    library_dirs = ['/usr/local/lib']
    language = c
mkl_info:
  NOT AVAILABLE

From vickram.l at gmail.com  Wed Dec 15 01:14:47 2010
From: vickram.l at gmail.com (Vick)
Date: Wed, 15 Dec 2010 01:14:47 -0500
Subject: [SciPy-User] Help With Scipy: Integration pack, returning an
	array for an array input
Message-ID: <4D085CD7.5080104@gmail.com>

I hope this is the right way to do this, and I apologize most sincerely
if it's not. Here is a step-by-step account of what I'm trying to do
and what I typed in; it's a simplified version that produces the same
error message as my real problem.

In [1]: import numpy as np

In [2]: import scipy.integrate as inte

In [3]: C = lambda u: inte.quad(lambda s: np.exp(s),0,u)

This yields:

In [4]: C(5)
Out[4]: (147.41315910257663, 1.6366148336841205e-12)

i.e. the integral of exp(s) from 0 to 5, and gives the error, which is
great. Now define:

In [5]: D = lambda t: C(t)[0]

In [6]: D(5)
Out[6]: 147.41315910257663

Just the first element, which is all I really care about; the error
isn't what I need right now.

Now try t = np.arange(0,5) to try and get it to work for a range of
values from 0 to 5.
In [7]: t = np.arange(0,5) In [8]: D(t) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/vick/ in () /home/vick/ in (t) /home/vick/ in (u) /usr/lib/python2.6/dist-packages/scipy/integrate/quadpack.pyc in quad(func, a, b, args, full_output,epsabs, epsrel, limit, points, weight, wvar, wopts, maxp1, limlst) 183 if type(args) != type(()): args = (args,) 184 if (weight is None): --> 185 retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points) 186 else: 187 retval =_quad_weight(func,a,b,args,full_output,epsabs,epsrel,limlst,limit,maxp1,weight,wvar,wopts) /usr/lib/python2.6/dist-packages/scipy/integrate/quadpack.pyc in _quad(func, a, b, args, full_output,epsabs, epsrel, limit, points) 231 def _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points): 232 infbounds = 0 --> 233 if (b != Inf and a != -Inf): 234 pass # standard integration 235 elif (b == Inf and a != -Inf): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() I'd like to get the value of this integral over a range of t's. I don't really understand why it's doing this, and I'd like to avoid using a for loop. From jason-sage at creativetrax.com Wed Dec 15 09:58:35 2010 From: jason-sage at creativetrax.com (Jason Grout) Date: Wed, 15 Dec 2010 08:58:35 -0600 Subject: [SciPy-User] scipy.org is down Message-ID: <4D08D79B.1000804@creativetrax.com> FYI, it seems that scipy.org is down. Here is the error message from firefox: Server not found Firefox can't find the server at www.scipy.org. Thanks, Jason From gnurser at gmail.com Wed Dec 15 05:44:02 2010 From: gnurser at gmail.com (George Nurser) Date: Wed, 15 Dec 2010 10:44:02 +0000 Subject: [SciPy-User] probs making docs Message-ID: Hi, Just been making up numpy/ipython/scipy/matplotlib/pytables/netCDF4/cython etc on OS X 10.6 to go with python installed from the new 2.7.1. 10.6 x86_64/i386 installer. Most things seem to work very straightforwardly -- numpy and scipy trunk seem in very good shape, and the new python installer makes everything much easier -- and numpy and matplotlib docs made OK, but I'm having a problem making up the scipy documentation. In the scipy/doc directory I initially tried make html It failed, looking for numpydoc. I therefore installed numpydoc. The output of make html is now mkdir -p build touch build/generate-stamp mkdir -p build/html build/doctrees LANG=C sphinx-build -b html -d build/doctrees source build/html Running Sphinx v1.0.5 Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev) Extension error: Could not import extension plot_directive (exception: No module named plot_directive) make: *** [html] Error 1 This seems strange, given that plot_directive.{py,pyc} are in the installed numpydoc directory /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpydoc-0.4-py2.7.egg/numpydoc --George. From robert.kern at gmail.com Wed Dec 15 11:00:36 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Dec 2010 10:00:36 -0600 Subject: [SciPy-User] scipy.org is down In-Reply-To: <4D08D79B.1000804@creativetrax.com> References: <4D08D79B.1000804@creativetrax.com> Message-ID: On Wed, Dec 15, 2010 at 08:58, Jason Grout wrote: > FYI, it seems that scipy.org is down. It's back up. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
  -- Umberto Eco

From tmp50 at ukr.net  Wed Dec 15 10:24:20 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Wed, 15 Dec 2010 17:24:20 +0200
Subject: [SciPy-User] new quarterly OpenOpt/FuncDesigner release 0.32
In-Reply-To: <4D085CD7.5080104@gmail.com>
References: <4D085CD7.5080104@gmail.com>
Message-ID: 

Hi all,

I'm glad to inform you about the new quarterly OpenOpt/FuncDesigner
release (0.32):

OpenOpt:
* New class: LCP (and related solver)
* New QP solver: qlcp
* New NLP solver: sqlcp
* New large-scale NSP (nonsmooth) solver gsubg. Currently it still
requires lots of improvements (especially for constraints - their
handling is still very premature and often fails), but since the solver
sometimes already works better than ipopt, algencan and the other
competitors it was tried against, I decided to include it in this
release.
* Now SOCP can handle Ax <= b constraints (and a bugfix for handling
lb <= x <= ub has been committed)
* Some other fixes and improvements

FuncDesigner:
* Add new function removeAttachedConstraints
* Add new oofuns min and max (their capabilities are quite restricted
yet)
* Systems of nonlinear equations: possibility to assign a personal
tolerance for an equation
* Some fixes and improvements

For more details see our forum entry
http://forum.openopt.org/viewtopic.php?id=325

Regards, D.

From jsseabold at gmail.com  Wed Dec 15 11:05:37 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 15 Dec 2010 11:05:37 -0500
Subject: [SciPy-User] probs making docs
In-Reply-To: 
References: 
Message-ID: 

On Wed, Dec 15, 2010 at 5:44 AM, George Nurser wrote:
> Hi,
> Just been making up
> numpy/ipython/scipy/matplotlib/pytables/netCDF4/cython etc on OS X
> 10.6 to go with python installed from the new 2.7.1. 10.6 x86_64/i386
> installer.
>
> Most things seem to work very straightforwardly -- numpy and scipy
> trunk seem in very good shape, and the new python installer makes
> everything much easier -- and numpy and matplotlib docs made OK, but
> I'm having a problem making up the scipy documentation.
>
> In the scipy/doc directory I initially tried
> make html
> It failed, looking for numpydoc.
> I therefore installed numpydoc.
>
> The output of make html is now
>
> mkdir -p build
> touch build/generate-stamp
> mkdir -p build/html build/doctrees
> LANG=C sphinx-build -b html -d build/doctrees source build/html
> Running Sphinx v1.0.5
> Scipy (VERSION 0.10.dev) (RELEASE 0.10.0.dev)
>
> Extension error:
> Could not import extension plot_directive (exception: No module named
> plot_directive)
> make: *** [html] Error 1
>
> This seems strange, given that plot_directive.{py,pyc} are in the
> installed numpydoc directory
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpydoc-0.4-py2.7.egg/numpydoc
>

What numpy version?  This is an incompatibility with new sphinx (post
1.0 betas) and old numpy sphinx extensions.  You can either downgrade
sphinx or upgrade numpy.

http://projects.scipy.org/numpy/ticket/1489

I think you could really just drop in the newer numpy/doc/sphinxext/*
and it should work, but maybe not...

https://github.com/numpy/numpy/tree/master/doc/sphinxext

Skipper
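As a concrete sketch of the "drop in the newer sphinxext" option
Skipper mentions: the checkout location and the target directory below
are assumptions pieced together from George's paths above, not commands
taken from this thread, so adjust them to your own layout:

    # grab the current numpy sphinx extensions (hypothetical checkout path)
    $ git clone git://github.com/numpy/numpy.git /tmp/numpy
    # overwrite the stale copies bundled with the installed numpydoc 0.4 egg
    $ cp /tmp/numpy/doc/sphinxext/*.py \
        /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpydoc-0.4-py2.7.egg/numpydoc/
    # then rebuild the scipy docs
    $ cd scipy/doc && make html

Patching the installed egg in place like this is only a stopgap;
upgrading numpy (or downgrading sphinx to a pre-1.0 release) is the
cleaner fix.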
From Pierre.RAYBAUT at CEA.FR  Wed Dec 15 11:35:42 2010
From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR)
Date: Wed, 15 Dec 2010 17:35:42 +0100
Subject: [SciPy-User] [ANN] guidata v1.2.5
Message-ID: 

Hi all,

I am pleased to announce that `guidata` v1.2.5 has been released.
Note that the project has recently been moved to GoogleCode:
http://guidata.googlecode.com

This version of `guidata` includes brand new documentation with
examples, API reference, etc.: http://packages.python.org/guidata/

Based on the Qt Python binding module PyQt4, guidata is a Python
library generating graphical user interfaces for easy dataset editing
and display. It also provides helpers and application development tools
for PyQt4.

guidata also provides the following features:
* guidata.qthelpers: PyQt4 helpers
* guidata.disthelpers: py2exe helpers
* guidata.userconfig: .ini configuration management helpers (based on
Python standard module ConfigParser)
* guidata.configtools: library/application data management
* guidata.gettext_helpers: translation helpers (based on the GNU tool
gettext)
* guidata.guitest: automatic GUI-based test launcher
* guidata.utils: miscellaneous utilities

guidata has been successfully tested on GNU/Linux and Windows
platforms.

Python package index page:
http://pypi.python.org/pypi/guidata/
Documentation, screenshots:
http://packages.python.org/guidata/
Downloads (source + Python(x,y) plugin):
http://guidata.googlecode.com

Cheers,
Pierre

---
Dr. Pierre Raybaut
CEA - Commissariat à l'Energie Atomique et aux Energies Alternatives

From Pierre.RAYBAUT at CEA.FR  Wed Dec 15 11:35:51 2010
From: Pierre.RAYBAUT at CEA.FR (Pierre.RAYBAUT at CEA.FR)
Date: Wed, 15 Dec 2010 17:35:51 +0100
Subject: [SciPy-User] [ANN] guiqwt v2.0.8
Message-ID: 

Hi all,

I am pleased to announce that `guiqwt` v2.0.8 has been released.
Note that the project has recently been moved to GoogleCode:
http://guiqwt.googlecode.com

This version of `guiqwt` includes brand new documentation with
examples, API reference, etc.: http://packages.python.org/guiqwt/

Based on PyQwt (plotting widgets for PyQt4 graphical user interfaces)
and on the scientific modules NumPy and SciPy, guiqwt is a Python
library providing efficient 2D data-plotting features (curve/image
visualization and related tools) for interactive computing and
signal/image processing application development.

When compared to the excellent module `matplotlib`, the main advantage
of `guiqwt` is performance: see
http://packages.python.org/guiqwt/overview.html#performances.

But `guiqwt` is more than a plotting library; it also provides:
* Helper functions for data processing: see the example
http://packages.python.org/guiqwt/examples.html#curve-fitting
* Framework for signal/image processing application development: see
http://packages.python.org/guiqwt/examples.html
* And many other features like making executable Windows programs
easily (py2exe helpers): see
http://packages.python.org/guiqwt/disthelpers.html

guiqwt plotting features are the following:

guiqwt.pyplot: equivalent to matplotlib's pyplot module (pylab)

supported plot items:
* curves, error bar curves and 1-D histograms
* images (RGB images are not supported), images with non-linear x/y
scales, images with specified pixel size (e.g.
loaded from DICOM files), 2-D histograms, pseudo-color images (pcolor) * labels, curve plot legends * shapes: polygon, polylines, rectangle, circle, ellipse and segment * annotated shapes (shapes with labels showing position and dimensions): rectangle with center position and size, circle with center position and diameter, ellipse with center position and diameters (these items are very useful to measure things directly on displayed images) curves, images and shapes: * multiple object selection for moving objects or editing their properties through automatically generated dialog boxes (guidata) * item list panel: move objects from foreground to background, show/hide objects, remove objects, ... * customizable aspect ratio * a lot of ready-to-use tools: plot canvas export to image file, image snapshot, image rectangular filter, etc. curves: * interval selection tools with labels showing results of computing on selected area * curve fitting tool with automatic fit, manual fit with sliders, ... images: * contrast adjustment panel: select the LUT by moving a range selection object on the image levels histogram, eliminate outliers, ... * X-axis and Y-axis cross-sections: support for multiple images, average cross-section tool on a rectangular area, ... * apply any affine transform to displayed images in real-time (rotation, magnification, translation, horizontal/vertical flip, ...) application development helpers: * ready-to-use curve and image plot widgets and dialog boxes * load/save graphical objects (curves, images, shapes) * a lot of test scripts which demonstrate guiqwt features guiqwt has been successfully tested on GNU/Linux and Windows platforms. Python package index page: http://pypi.python.org/pypi/guiqwt/ Documentation, screenshots: http://packages.python.org/guiqwt/ Downloads (source + Python(x,y) plugin): http://guiqwt.googlecode.com Cheers, Pierre --- Dr. Pierre Raybaut CEA - Commissariat ? l'Energie Atomique et aux Energies Alternatives From josef.pktd at gmail.com Wed Dec 15 07:43:15 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Dec 2010 07:43:15 -0500 Subject: [SciPy-User] understanding machine precision In-Reply-To: References: <201012142029.18007.faltet@pytables.org> Message-ID: On Tue, Dec 14, 2010 at 3:17 PM, Keith Goodman wrote: > On Tue, Dec 14, 2010 at 11:38 AM, ? wrote: >> On Tue, Dec 14, 2010 at 2:29 PM, Francesc Alted wrote: >>> A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigu?: >>>> Hi all, >>>> >>>> I am using ATLAS >>>> >>>> python -i try_deterministic.py >>>> >>>> lstsq >>>> 0.0 >>>> 0.0 >>>> 0.0 >>>> 0.0 >>>> 0.0 >>> >>> That's interesting. ?Maybe Josef is using a threaded ATLAS? ?I >>> positively know that threading introduces variability in the order that >>> the computations are done. ?However, I'm not sure on why ATLAS has >>> decided to use several threads for so small matrices ((100, 10)) :-/ >> >> No, I'm using an old plain ATLAS with a single CPU, but if Skipper is >> getting it also with 32bit mkl, then there might still be another >> Windows "feature" in play. >> >> Since my results are identical across python session, but not within, >> the results are still deterministic and cannot depend on how busy my >> computer is. > > Come to think if it, one of my tests is to compare the output of each > new release of one of my packages with the previous release. My > colleague runs the test on a windows machine. He gets a difference in > output when there should be none. 
Even after pulling the installer
>> apart and installing the non-ATLAS version he sees the difference on
>> Windows. It would be great to figure out what is going on.

It looks like it's 32-bit Windows specific, but not specific to an
ATLAS or LAPACK version.

I still don't have a clue why, but it looks like tests that use linalg
have to be weakened to decimal=14 or decimal=15, larger than I thought
previously.

Thanks,

Josef
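To make "weakened to decimal=14" concrete, this is the kind of check
that is affected -- x and x2 being the two lstsq results from the
script attached earlier in this thread (a sketch of the tolerance
involved, not a test from any particular package):

    from numpy.testing import assert_array_almost_equal

    # roughly: passes when abs(x - x2) < 0.5 * 10**(-decimal) elementwise,
    # so decimal=14 tolerates differences up to ~5e-15, enough to cover
    # the ~1.6e-15 lstsq discrepancies reported on 32-bit Windows above
    assert_array_almost_equal(x, x2, decimal=14)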
From josef.pktd at gmail.com  Wed Dec 15 07:36:44 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 15 Dec 2010 07:36:44 -0500
Subject: [SciPy-User] understanding machine precision
In-Reply-To: <4D07CCAF.6090706@gmail.com>
References: <201012142029.18007.faltet@pytables.org> <4D07CCAF.6090706@gmail.com>
Message-ID: 

On Tue, Dec 14, 2010 at 2:59 PM, Bruce Southey wrote:
> On 12/14/2010 01:29 PM, Francesc Alted wrote:
>> A Tuesday 14 December 2010 20:17:10 Nils Wagner escrigué:
>>> Hi all,
>>>
>>> I am using ATLAS
>>>
>>> python -i try_deterministic.py
>>>
>>> lstsq
>>> 0.0
>>> 0.0
>>> 0.0
>>> 0.0
>>> 0.0
>> That's interesting. Maybe Josef is using a threaded ATLAS? I positively know that threading introduces variability in the order that the computations are done. However, I'm not sure on why ATLAS has decided to use several threads for so small matrices ((100, 10)) :-/
>>
> Does this 'issue' occur with numpy's lstsq?

good question, I didn't think about it.

same problem and my numpy is from an official installer (1.4.0) and not compiled against the old ATLAS, that I'm still using.

>
> This is most probably due to the OS (Windows) and compiler as Skipper's post indicates and probably related to the cpu as well. It may be how the binaries are created relative to the target system especially if different compilers are used.

Skipper mentioned mkl, so I guess his scipy is compiled with Microsoft Visual, while mine is compiled with MinGW.

>
> Overclocking and heat can also create issues (just as those people finding prime numbers :-) ).

I don't think my Dell factory notebook is overclocked.

Josef

>
> Bruce
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From bsouthey at gmail.com  Wed Dec 15 12:31:30 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 15 Dec 2010 11:31:30 -0600
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: <201012142029.18007.faltet@pytables.org>
Message-ID: <4D08FB72.9020600@gmail.com>

On 12/15/2010 06:43 AM, josef.pktd at gmail.com wrote:
[clip]
> It looks like it's Windows32 specific, but not specific to an ATLAS or LAPACK version.
>
> I still don't have a clue why, but it looks like tests that use linalg have to be weakened to decimal=14 or decimal=15, larger than I thought previously.

Can you please detail the versions of Python and numpy as well as how each was built? Does it use sse and does it use the gcc option '-fexcess-precision'?

Given Skipper's response, it may be due to cross-compiling (including compiler and OS differences) as well as being at the limits of numerical accuracy for that precision (loss of significance or precision). (Reading gcc bug 323 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323) is interesting or not.)

Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pav at iki.fi  Wed Dec 15 05:47:03 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 15 Dec 2010 10:47:03 +0000 (UTC)
Subject: [SciPy-User] Help With Scipy: Integration pack, returning an array for an array input
References: <4D085CD7.5080104@gmail.com>
Message-ID: 

Wed, 15 Dec 2010 01:14:47 -0500, Vick wrote:
[clip]
> I'd like to get the value of this integral over a range of t's. I don't really understand why it's doing this, and I'd like to avoid using a for loop.

There's no way to avoid a loop here. The integration algorithm is adaptive and cannot cope with vector inputs. You can do F = numpy.vectorize(D) but speedwise this will be more or less the same as a for loop.

-- 
Pauli Virtanen
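For illustration, a minimal sketch of the vectorize approach Pauli describes; the integrand here is a stand-in, not the original poster's function:

    import numpy as np
    from scipy.integrate import quad

    def D(t):
        # quad is adaptive and returns a scalar result, so D only accepts scalar t
        return quad(lambda x: np.exp(-x * x), 0.0, t)[0]

    F = np.vectorize(D)   # elementwise wrapper; internally still a Python loop
    print F(np.linspace(0.0, 2.0, 5))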
From faltet at pytables.org  Wed Dec 15 13:22:04 2010
From: faltet at pytables.org (Francesc Alted)
Date: Wed, 15 Dec 2010 19:22:04 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: 
References: <4D07CCAF.6090706@gmail.com>
Message-ID: <201012151922.04942.faltet@pytables.org>

A Wednesday 15 December 2010 13:36:44 josef.pktd at gmail.com escrigué:
[clip]
>> Does this 'issue' occur with numpy's lstsq?
>
> good question, I didn't think about it.
>
> same problem and my numpy is from an official installer (1.4.0) and not compiled against the old ATLAS, that I'm still using.

With the slightly modified script that depends only on numpy, I'm having problems of reproducibility only with pinv:

C:\Users\francesc>python x:\try_deterministic_numpy.py
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Python version: 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)]
NumPy version: 1.5.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
5.55111512313e-17
0.0
0.0
0.0

And, curiously enough, that pattern is always the same for every run. So, at least, the issue is more deterministic than we initially thought.

But, in the same machine, using Python 2.7, problem disappears completely:

C:\Users\francesc>python x:\try_deterministic_numpy.py
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Python version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
NumPy version: 1.5.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

lstsq
0.0
0.0
0.0
0.0
0.0

pinv
0.0
0.0
0.0
0.0
0.0

Well, at least we have narrowed the possibilities significantly: this seems to happen only (up to now) with Win32, Python 2.6 and NumPy (without ATLAS in the loop). It is worth noting that the compilers used for Python 2.6 and 2.7 should be the same (MSVC 1500). I think I installed NumPy from sourceforge repository, so the compiler used here should also be the same.

And also interesting is the fact that the 'lstsq' method is always reproducible on my machine. However, Josef also reported non-deterministic behaviour with 'lstsq' :-/

-- 
Francesc Alted
-------------- next part --------------
A non-text attachment was scrubbed...
Name: try_deterministic_numpy.py
Type: text/x-python
Size: 725 bytes
Desc: not available
URL: 

From laytonjb at att.net  Wed Dec 15 13:53:19 2010
From: laytonjb at att.net (Jeff Layton)
Date: Wed, 15 Dec 2010 13:53:19 -0500
Subject: [SciPy-User] scipy down again?
Message-ID: <4D090E9F.2000905@att.net>

I can't seem to access the site. Is it down again?

TIA,

Jeff

From robert.kern at gmail.com  Wed Dec 15 14:23:46 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 15 Dec 2010 13:23:46 -0600
Subject: [SciPy-User] scipy down again?
In-Reply-To: <4D090E9F.2000905@att.net>
References: <4D090E9F.2000905@att.net>
Message-ID: 

On Wed, Dec 15, 2010 at 12:53, Jeff Layton wrote:
> I can't seem to access the site. Is it down again?

Yes. We're working on it.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

From dagss at student.matnat.uio.no  Sat Dec 18 04:10:05 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Sat, 18 Dec 2010 10:10:05 +0100
Subject: [SciPy-User] understanding machine precision
In-Reply-To: <201012151922.04942.faltet@pytables.org>
References: <4D07CCAF.6090706@gmail.com> <201012151922.04942.faltet@pytables.org>
Message-ID: <4D0C7A6D.9030001@student.matnat.uio.no>

Francesc Alted wrote:
[clip]
> And also interesting is the fact that the 'lstsq' method is always reproducible on my machine. However, Josef also reported non-deterministic behaviour with 'lstsq' :-/

How about memory alignment? It may be that

a) the matrix has to be copied to new memory before passing it to LAPACK,
b) you get different results depending on whether the array is allocated on a 128-bit boundary or not.

One way to test is to stick "aligned16" (or was that "align16"?) into generic_flapack.pyf and rebuild SciPy (there's a couple of aligned8 in there you can look at). Or, see if it matters if you make your array Fortran contiguous.

Dag Sverre

From david at silveregg.co.jp  Wed Dec 15 22:25:43 2010
From: david at silveregg.co.jp (David)
Date: Thu, 16 Dec 2010 12:25:43 +0900
Subject: [SciPy-User] understanding machine precision
In-Reply-To: <201012151922.04942.faltet@pytables.org>
References: <4D07CCAF.6090706@gmail.com> <201012151922.04942.faltet@pytables.org>
Message-ID: <4D0986B7.2040601@silveregg.co.jp>

On 12/16/2010 03:22 AM, Francesc Alted wrote:
>
> And, curiously enough, that pattern is always the same for every run.
> So, at least, the issue is more deterministic than we initially thought.

The first thing that would come to my mind is FPU state changes. I don't have time to investigate the issue ATM, but that should be relatively straightforward to check (dumping the FPU state at each run, through ctypes if necessary),

cheers,

David
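A quick way to do the check David suggests on Windows, reading the x87 control word through the Microsoft C runtime (this sketch assumes msvcrt is available; calling _control87 with mask 0 only queries, it does not modify the state):

    import ctypes

    msvcrt = ctypes.cdll.msvcrt
    cw = msvcrt._control87(0, 0)   # new value 0, mask 0: read-only query
    print hex(cw)                  # compare this value between runs and sessions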
From pramo4d at gmail.com  Sun Dec 19 12:28:43 2010
From: pramo4d at gmail.com (Pramod)
Date: Sun, 19 Dec 2010 09:28:43 -0800 (PST)
Subject: [SciPy-User] I want to convert following matlab program into the python programme
Message-ID: 

Hi friends, I want to convert the following Matlab code into Python code using SciPy. Highlighted part: I am not getting how to handle the Chebyshev polynomial in SciPy. Can anyone suggest how to write this in Python?

Nmax = 50; E = zeros(3,Nmax);
for N = 1:Nmax;
  [D,x] = cheb(N);    what is the syntax in scipy for this
  v = abs(x).^3; vprime = 3*x.*abs(x);       % 3rd deriv in BV
  E(1,N) = norm(D*v-vprime,inf);
  v = exp(-x.^(-2)); vprime = 2.*v./x.^3;    % C-infinity
  E(2,N) = norm(D*v-vprime,inf);
  v = 1./(1+x.^2); vprime = -2*x.*v.^2;      % analytic in [-1,1]
  E(3,N) = norm(D*v-vprime,inf);
  v = x.^10; vprime = 10*x.^9;               % polynomial
  E(4,N) = norm(D*v-vprime,inf);
end

I tried the following in IPython and this is the error I am getting:

for i in range(1,20):
   ....:     [D,i]=scipy.special.chebyu(3)(0.2)

ERROR IS: 'numpy.float64' object is not iterable

Thanks in advance

From pramo4d at gmail.com  Sat Dec 18 02:40:09 2010
From: pramo4d at gmail.com (Pramod)
Date: Fri, 17 Dec 2010 23:40:09 -0800 (PST)
Subject: [SciPy-User] Chebshev polynomial implimetation scipy
Message-ID: <9d078896-48fe-4a98-9cfc-8280eb55d22a@n32g2000pre.googlegroups.com>

Dear friends,

How do I implement Chebyshev polynomials in SciPy? Can you please give me an example?

Thanks in advance

From charlesr.harris at gmail.com  Mon Dec 20 12:24:52 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 20 Dec 2010 10:24:52 -0700
Subject: [SciPy-User] Chebshev polynomial implimetation scipy
In-Reply-To: <9d078896-48fe-4a98-9cfc-8280eb55d22a@n32g2000pre.googlegroups.com>
References: <9d078896-48fe-4a98-9cfc-8280eb55d22a@n32g2000pre.googlegroups.com>
Message-ID: 

On Sat, Dec 18, 2010 at 12:40 AM, Pramod wrote:
> Dear friends,
>
> How do I implement Chebyshev polynomials in SciPy? Can you please give me an example?
>
> Thanks in advance

There are Chebyshev polynomials in numpy since version 1.4. Try from numpy.polynomial import Chebyshev.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
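A short sketch of the class Chuck points to; the coefficient list here is just an example, picking out the third Chebyshev polynomial:

    from numpy.polynomial import Chebyshev

    p = Chebyshev([0, 0, 0, 1])   # coefficients of T_0..T_3, i.e. p(x) = T_3(x)
    print p(0.2)                  # 4*0.2**3 - 3*0.2 = -0.568
    print p.deriv()(0.2)          # T_3'(x) = 12*x**2 - 3, so -2.52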
From charlesr.harris at gmail.com  Mon Dec 20 12:36:56 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 20 Dec 2010 10:36:56 -0700
Subject: [SciPy-User] I want to convert following matlab program into the python programme
In-Reply-To: 
References: 
Message-ID: 

On Sun, Dec 19, 2010 at 10:28 AM, Pramod wrote:
[clip]

That code is a complete mess. What is it trying to accomplish?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nwagner at iam.uni-stuttgart.de  Mon Dec 20 12:41:18 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Mon, 20 Dec 2010 18:41:18 +0100
Subject: [SciPy-User] EXTENDED MATRIX-MARKET FILE FORMAT
Message-ID: 

Hi all,

FWIW, there is an extended matrix-market file format.
http://www.staff.science.uu.nl/~bisse101/Mondriaan/Docs/extendedMM.pdf

Cheers,

Nils

From pramo4d at gmail.com  Sat Dec 18 13:35:37 2010
From: pramo4d at gmail.com (Pramod)
Date: Sat, 18 Dec 2010 10:35:37 -0800 (PST)
Subject: [SciPy-User] Chebeshev Polynomial implimentation python
Message-ID: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>

Matlab implementation:

for N = 1:Nmax;
  [D,x] = cheb(N);

How do I implement the above (written in Matlab) Chebyshev polynomial in Python?

From pav at iki.fi  Mon Dec 20 15:18:16 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 20 Dec 2010 20:18:16 +0000 (UTC)
Subject: [SciPy-User] I want to convert following matlab program into the python programme
References: 
Message-ID: 

On Sun, Dec 19, 2010 at 10:28 AM, Pramod wrote:
[clip]
> [D,x] = cheb(N);    what is the syntax in scipy for this

cheb(N) is not a standard Matlab function. Before someone can help you out here, you need to tell what this function does.

-- 
Pauli Virtanen

From seb.haase at gmail.com  Mon Dec 20 15:23:17 2010
From: seb.haase at gmail.com (Sebastian Haase)
Date: Mon, 20 Dec 2010 21:23:17 +0100
Subject: [SciPy-User] Reading TDM/TDMS Files with scipy
In-Reply-To: 
References: 
Message-ID: 

Hi Nils,

did you get anywhere with this? It sounds like a ctypes / numpy thing ... I'm also considering reading LabVIEW data in binary format. And it seems that while the TDMS structure is documented
http://zone.ni.com/devzone/cda/tut/p/id/5696
there is a paragraph saying
"""
This article does not describe how to decode DAQmx data. If you need to read a TDMS file with software that implements native support for TDMS (without using any components provided by National Instruments), you will __not__ be able to interpret this data.
"""
So I guess, the link you gave is really the only way to go ...

Cheers,
Sebastian Haase

On Sun, Nov 28, 2010 at 8:12 PM, Nils Wagner wrote:
> Hi all,
>
> Is it possible to read TDM/TDMS files with scipy ?
>
> I found a tool for Matlab
> http://zone.ni.com/devzone/cda/epd/p/id/5957
>
> Nils
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From almar.klein at gmail.com  Mon Dec 20 16:29:02 2010
From: almar.klein at gmail.com (Almar Klein)
Date: Mon, 20 Dec 2010 22:29:02 +0100
Subject: [SciPy-User] ANN: Visvis v1.4 - The object oriented approach to visualization
In-Reply-To: 
References: 
Message-ID: 

Hi all,

I am pleased to announce release 1.4 of Visvis - The object oriented approach to visualization.

Visvis is a pure Python library for visualization of 1D to 4D data in an object oriented way. Essentially, visvis is an object oriented layer of Python on top of OpenGL, thereby combining the power of OpenGL with the usability of Python. A Matlab-like interface in the form of a set of functions allows easy creation of objects (e.g. plot(), imshow(), volshow(), surf()).

website: http://code.google.com/p/visvis/
Discussion group: http://groups.google.com/group/visvis/
Documentation: http://code.google.com/p/visvis/wiki/Visvis_basics

Cheers,
Almar

=== Release notes ===

Much work has been done since the previous release. Visvis has made a few important steps towards maturity. Also, Visvis is now BSD-licensed.

The scenes are now automatically redrawn when an object's property is changed, and the Axes objects buffer their contents so they can redraw much faster when nothing has changed (for example when using multiple subplots). Using new functions movieRead and movieWrite, movies can be imported and exported to/from animated GIF, SWF (flash), AVI (needs ffmpeg), and a series of images. Visvis now uses guisupport.py, enabling seamless integration in the IEP and IPython interactive event loops. Furthermore, many docstrings have been improved and so has the script that creates the online API documentation from them. Several examples have been added and all examples are now available via the website (including images and animations). See here for more.

PS: what happened to the mailing list? Was it down for a few days?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
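Going by the functions named in the announcement, usage looks roughly like this; a sketch only, the exact call signatures are assumptions and the documentation linked above is authoritative:

    import numpy as np
    import visvis as vv

    vv.plot(np.sin(np.linspace(0, 6.3, 100)))   # curve plot
    vv.imshow(np.random.rand(64, 64))           # 2-D image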
From willardmaier at gmail.com  Mon Dec 20 19:55:42 2010
From: willardmaier at gmail.com (Willard Maier)
Date: Mon, 20 Dec 2010 18:55:42 -0600
Subject: [SciPy-User] Building scipy on Linux
Message-ID: 

I'm trying to build scipy and numpy on Linux (Mint, a derivative of Ubuntu), and I'm following the directions on http://www.scipy.org/Installing_SciPy/Linux for "Building everything from source using gfortran on Ubuntu, Nov 2010". After several hours of work I finally got to the step for building scipy, "python setup.py build", but here I get the following error:

Traceback (most recent call last):
  File "setup.py", line 85, in
    FULLVERSION += svn_version()
  File "setup.py", line 58, in svn_version
    from numpy.compat import asstr
ImportError: No module named compat

Not sure what I did wrong. I downloaded the latest numpy from github and the latest scipy with Subversion from http://svn.scipy.org/svn/scipy/trunk. I also got lapack, atlas, UMFPACK, AMD, and UFconfig, and built them from source. All previous steps succeeded after a bit of coaxing. Does anyone have any suggestions?

Bill Maier
http://bmaier.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ecarlson at eng.ua.edu  Mon Dec 20 22:55:09 2010
From: ecarlson at eng.ua.edu (Eric Carlson)
Date: Mon, 20 Dec 2010 21:55:09 -0600
Subject: [SciPy-User] Chebeshev Polynomial implimentation python
In-Reply-To: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>
References: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>
Message-ID: 

On 12/18/2010 12:35 PM, Pramod wrote:
> Matlab implementation:
> for N = 1:Nmax;
>   [D,x] = cheb(N);
>
> How do I implement the above (written in Matlab) Chebyshev polynomial in Python?

cheb is not a standard matlab function, but if this is it:

function [D,x]=cheb(N)

if N==0, D=0; x=1; return; end
x=cos(pi*(0:N)/N)';
c=[2; ones(N-1,1); 2].*(-1).^(0:N)';
X=repmat(x,1,N+1);
dX=X-X';
D=(c*(1./c)')./(dX+eye(N+1));   % off diagonals
D=D-diag(sum(D'));              % diagonals

Then a python version could be given by:

from numpy import cos,pi,linspace,array,matrix,ones,hstack,eye,diag,tile

def cheb(N):
    if N==0:
        D=0.0
        x=1.0
    else:
        x=cos(pi*linspace(0,N,N+1)/N)
        c=matrix(hstack(([2.],ones(N-1),[2.]))*(-1)**linspace(0,N,N+1)).T
        X=tile(x,[N+1,1])
        dX=(X-X.T).T
        D=array(c*(1/c).T)/(dX+eye(N+1))   # off diagonals
        D=D-diag(D.sum(axis=1))            # diagonals (numpy's sum, not the builtin)
    return D,x

(D,x)=cheb(3)
print D
(D,x)=cheb(4)
print D

I almost literally translated the matlab code, so I do not know how efficient it all is, and have given zero thought to better ways to do it. You need to be very careful with matrix and array types. Unless you KNOW you want a matrix, you probably want an array. One of the biggest transitions from matlab to python is learning to not worry about whether something is a column or row array, except when you are doing matrix multiplication.

HTH
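A quick sanity check of the translation, using the cheb above; the differentiation matrix should be exact for sampled polynomials of degree up to N:

    import numpy as np

    D, x = cheb(5)
    print np.allclose(np.dot(D, x**3), 3 * x**2)   # True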
From charlesr.harris at gmail.com  Mon Dec 20 23:17:00 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 20 Dec 2010 21:17:00 -0700
Subject: [SciPy-User] Chebeshev Polynomial implimentation python
In-Reply-To: 
References: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>
Message-ID: 

On Mon, Dec 20, 2010 at 8:55 PM, Eric Carlson wrote:
[clip]

As a guess, D is the differentiation operator for a function sampled at the Chebyshev points of the second kind. So perhaps this is intended as a method of solution for a boundary value problem.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Mon Dec 20 23:46:58 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 20 Dec 2010 21:46:58 -0700
Subject: [SciPy-User] Chebeshev Polynomial implimentation python
In-Reply-To: 
References: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>
Message-ID: 

On Mon, Dec 20, 2010 at 9:17 PM, Charles R Harris wrote:
[clip]
> As a guess, D is the differentiation operator for a function sampled at the Chebyshev points of the second kind. So perhaps this is intended as a method of solution for a boundary value problem.

If so, the following brute force method will work.

In [1]: import numpy.polynomial as poly

In [2]: def Cheb(N):
   ...:     x = poly.chebpts2(N+1)
   ...:     D = array([poly.Chebyshev.fit(x,c,N).deriv()(x) for c in eye(N+1)])
   ...:     return D, x

However, I have a module for this type of use case which I've attached. I'm not sure what shape it's in, it's old. The relevant function would be modified_derivative

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chebyshev.py
Type: text/x-python
Size: 18606 bytes
Desc: not available
URL: 

From charlesr.harris at gmail.com  Mon Dec 20 23:52:15 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 20 Dec 2010 21:52:15 -0700
Subject: [SciPy-User] Chebeshev Polynomial implimentation python
In-Reply-To: 
References: <37e1b333-6fee-4f42-9a4c-e63fdeb329c9@i25g2000prd.googlegroups.com>
Message-ID: 

On Mon, Dec 20, 2010 at 9:46 PM, Charles R Harris wrote:
[clip]
> In [2]: def Cheb(N):
>    ...:     x = poly.chebpts2(N+1)
>    ...:     D = array([poly.Chebyshev.fit(x,c,N).deriv()(x) for c in eye(N+1)])
>    ...:     return D, x

Oops, D needs a transpose.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From linuxerwang at gmail.com  Tue Dec 21 03:22:05 2010
From: linuxerwang at gmail.com (Linuxer Wang)
Date: Tue, 21 Dec 2010 00:22:05 -0800
Subject: [SciPy-User] How to get an offline scipy cookbook?
Message-ID: <4D1063AD.50200@gmail.com>

Hi, all

I am a new user of scipy. I found that the scipy cookbook is a lot more helpful for me than the official docs. But I don't know where I can download an offline version of the cookbook. Is it ok for me to mirror the whole cookbook with wget? (I don't want to overload the scipy website by mirroring it.)

Thanks.

- linuxerwang

From pav at iki.fi  Tue Dec 21 04:37:22 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 21 Dec 2010 09:37:22 +0000 (UTC)
Subject: [SciPy-User] How to get an offline scipy cookbook?
References: <4D1063AD.50200@gmail.com>
Message-ID: 

Tue, 21 Dec 2010 00:22:05 -0800, Linuxer Wang wrote:
[clip]
> Is it ok for me to mirror the whole cookbook with wget? (I don't want to overload the scipy website by mirroring it.)

There's no offline version.

The cookbook is not huge, so sensible wget-ing is probably OK. I'd maybe use --wait=1 or something to not bomb the server too much, though.

Btw, what aspects of the main docs did you find not useful? Did you check the tutorials? Are they too long? Too unfocused? Don't cover enough features?

-- 
Pauli Virtanen

From JRadinger at gmx.at  Tue Dec 21 06:06:50 2010
From: JRadinger at gmx.at (Johannes Radinger)
Date: Tue, 21 Dec 2010 12:06:50 +0100
Subject: [SciPy-User] solving integration, density function
Message-ID: <20101221110650.32180@gmx.net>

Hello,

I am really new to Python and SciPy. I want to solve an integrated function with a Python script and I think SciPy should do that :)

My task:

I do have some variables (s, m, K) which are now absolutely set, but in future I'll get the values via another process of Python.

s = 400
m = 0
K = 1

And I have the following function:

(1/((s*K)*sqrt(2*pi)))*exp(-1/2*(((x-m)/(s*K))^2))

which is the density function of the normal distribution, a symmetrical curve with the mean (m) of 0.

The total area under the curve is 1 (100%), which is for an integration from -inf to +inf. I want to know x in the case of 99%: meaning that the integral (-x to +x) of the function is 0.99. Due to the symmetry of the curve you can also set the integral from 0 to +x equal to (0.99/2):

0.99 = integral((1/((s*K)*sqrt(2*pi)))*exp(-1/2*(((x-m)/(s*K))^2)), -x, x)

resp.

(0.99/2) = integral((1/((s*K)*sqrt(2*pi)))*exp(-1/2*(((x-m)/(s*K))^2)), 0, x)

How can I solve that question in SciPy/Python so that I get x in the end? I don't know how to write the code...

thank you very much

johannes
-- 
Sicherer, schneller und einfacher. Die aktuellen Internet-Browser -
jetzt kostenlos herunterladen! http://portal.gmx.net/de/go/atbrowser

From Gregor.Thalhammer at gmail.com  Tue Dec 21 07:20:47 2010
From: Gregor.Thalhammer at gmail.com (Gregor Thalhammer)
Date: Tue, 21 Dec 2010 13:20:47 +0100
Subject: [SciPy-User] solving integration, density function
In-Reply-To: <20101221110650.32180@gmx.net>
References: <20101221110650.32180@gmx.net>
Message-ID: <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com>

Am 21.12.2010 um 12:06 schrieb Johannes Radinger:
[clip]
> How can I solve that question in SciPy/Python so that I get x in the end? I don't know how to write the code...

--->
erf(x[, out])

    y=erf(z) returns the error function of complex argument defined
    as 2/sqrt(pi)*integral(exp(-t**2),t=0..z)
---

from scipy.special import erf, erfinv
erfinv(0.99)*sqrt(2)

Gregor

> thank you very much
>
> johannes
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
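Rescaling Gregor's standard-normal result by the scale of the original density gives the 99% half-width directly; a small sketch using the values from the first post:

    from math import sqrt
    from scipy.special import erfinv

    s, K = 400.0, 1.0
    x99 = s * K * sqrt(2) * erfinv(0.99)   # the integral over (-x99, x99) is 0.99
    print x99                              # about 1030.3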
http://portal.gmx.net/de/go/atbrowser > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief! Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail From jsseabold at gmail.com Tue Dec 21 09:18:15 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 21 Dec 2010 09:18:15 -0500 Subject: [SciPy-User] solving integration, density function In-Reply-To: <20101221124827.53380@gmx.net> References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> Message-ID: On Tue, Dec 21, 2010 at 7:48 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Tue, 21 Dec 2010 13:20:47 +0100 >> Von: Gregor Thalhammer >> An: SciPy Users List >> Betreff: Re: [SciPy-User] solving integration, density function > >> >> Am 21.12.2010 um 12:06 schrieb Johannes Radinger: >> >> > Hello, >> > >> > I am really new to python and Scipy. >> > I want to solve a integrated function with a python script >> > and I think Scipy should do that :) >> > >> > My task: >> > >> > I do have some variables (s, m, K,) which are now absolutely set, but in >> future I'll get the values via another process of pyhton. >> > >> > s = 400 >> > m = 0 >> > K = 1 >> > >> > And have have following function: >> > (1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2) which is the density >> function of the normal distribution a symetrical curve with the mean (m) of >> 0. >> > >> > The total area under the curve is 1 (100%) which is for an integration >> from -inf to +inf. >> > I want to know x in the case of 99%: meaning that the integral (-x to >> +x) of the function is 0.99. Due to the symetry of the curve you can also set >> the integral from 0 to +x equal to (0.99/2): >> > >> > 0.99 = integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), -x, >> x) >> > resp. >> > (0.99/2) = integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), >> 0, x) >> > >> > How can I solve that question in Scipy/python >> > so that I get x in the end. I don't know how to write >> > the code... >> >> >> ---> >> erf(x[, out]) >> >> ? ? y=erf(z) returns the error function of complex argument defined as >> ? ? as 2/sqrt(pi)*integral(exp(-t**2),t=0..z) >> --- >> >> from scipy.special import erf, erfinv >> erfinv(0.99)*sqrt(2) >> >> >> Gregor >> > > > Thank you Gregor, > I only understand a part of your answer... I know that the integral of the density function is a error function and I know that the argument "from scipy.special import erf, erfinv" is to load the module. > > But how do I write the code including my orignial function so that I can modify it (I have also another function I want to integrate). how do i start? I want to save the whole code to a python-script I can then load e.g. into ArcGIS where I want to use the value of x for further calculations. > Are you always integrating densities? 
If so, you don't want to use integrals probably, but you could use scipy.stats erfinv(.99)*np.sqrt(2) 2.5758293035489004 from scipy import stats stats.norm.ppf(.995) 2.5758293035489004 Skipper From JRadinger at gmx.at Tue Dec 21 09:33:16 2010 From: JRadinger at gmx.at (Johannes Radinger) Date: Tue, 21 Dec 2010 15:33:16 +0100 Subject: [SciPy-User] solving integration, density function In-Reply-To: References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> Message-ID: <20101221143316.175560@gmx.net> -------- Original-Nachricht -------- > Datum: Tue, 21 Dec 2010 09:18:15 -0500 > Von: Skipper Seabold > An: SciPy Users List > Betreff: Re: [SciPy-User] solving integration, density function > On Tue, Dec 21, 2010 at 7:48 AM, Johannes Radinger > wrote: > > > > -------- Original-Nachricht -------- > >> Datum: Tue, 21 Dec 2010 13:20:47 +0100 > >> Von: Gregor Thalhammer > >> An: SciPy Users List > >> Betreff: Re: [SciPy-User] solving integration, density function > > > >> > >> Am 21.12.2010 um 12:06 schrieb Johannes Radinger: > >> > >> > Hello, > >> > > >> > I am really new to python and Scipy. > >> > I want to solve a integrated function with a python script > >> > and I think Scipy should do that :) > >> > > >> > My task: > >> > > >> > I do have some variables (s, m, K,) which are now absolutely set, but > in > >> future I'll get the values via another process of pyhton. > >> > > >> > s = 400 > >> > m = 0 > >> > K = 1 > >> > > >> > And have have following function: > >> > (1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2) which is the > density > >> function of the normal distribution a symetrical curve with the mean > (m) of > >> 0. > >> > > >> > The total area under the curve is 1 (100%) which is for an > integration > >> from -inf to +inf. > >> > I want to know x in the case of 99%: meaning that the integral (-x to > >> +x) of the function is 0.99. Due to the symetry of the curve you can > also set > >> the integral from 0 to +x equal to (0.99/2): > >> > > >> > 0.99 = integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), > -x, > >> x) > >> > resp. > >> > (0.99/2) = > integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), > >> 0, x) > >> > > >> > How can I solve that question in Scipy/python > >> > so that I get x in the end. I don't know how to write > >> > the code... > >> > >> > >> ---> > >> erf(x[, out]) > >> > >> ? ? y=erf(z) returns the error function of complex argument defined > as > >> ? ? as 2/sqrt(pi)*integral(exp(-t**2),t=0..z) > >> --- > >> > >> from scipy.special import erf, erfinv > >> erfinv(0.99)*sqrt(2) > >> > >> > >> Gregor > >> > > > > > > Thank you Gregor, > > I only understand a part of your answer... I know that the integral of > the density function is a error function and I know that the argument "from > scipy.special import erf, erfinv" is to load the module. > > > > But how do I write the code including my orignial function so that I can > modify it (I have also another function I want to integrate). how do i > start? I want to save the whole code to a python-script I can then load e.g. > into ArcGIS where I want to use the value of x for further calculations. > > > > Are you always integrating densities? 
If so, you don't want to use > integrals probably, but you could use scipy.stats > > erfinv(.99)*np.sqrt(2) > 2.5758293035489004 > > from scipy import stats > > stats.norm.ppf(.995) > 2.5758293035489004 > Skipper The second function I want to integrate is different, it is a combination of two normal distributions like: 0.99 = integrate(0.6*((1/((s1*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s1*K))^2))+0,4*((1/((s2*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s2*K))^2))) and here again I know s1, s2, m and K and want to get x in the case when the integral is 0.99. What do I write into the script I want create? I think it is better if I can explain it with a graph but I don't know if I can just attach pictures to the mail-list-mail. /j _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief! Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail From apalomba at austin.rr.com Tue Dec 21 09:47:59 2010 From: apalomba at austin.rr.com (Anthony Palomba) Date: Tue, 21 Dec 2010 08:47:59 -0600 Subject: [SciPy-User] How to get an offline scipy cookbook? In-Reply-To: References: <4D1063AD.50200@gmail.com> Message-ID: If you are on windows, you can use WebZip. It is a very handy utility. It will save any site including links to your hard drive. You specify how deep you want it to go. It can then export the site as a .chm (CHM = *C*ompiled *H* T *M* L Help) file. -ap On Tue, Dec 21, 2010 at 3:37 AM, Pauli Virtanen wrote: > Tue, 21 Dec 2010 00:22:05 -0800, Linuxer Wang wrote: > > I am a new user of scipy. I found that scipy cookbook is a lot more > > helpful for me than the official docs. But I don't know where I can > > download an offline version of the cookbook. Is it ok for me to mirror > > the whole cookbook with wget? (I don't want to blow off the scipy's > > website by mirroring it.) > > There's no offline version. > > The cookbook is not huge, so sensible wget-ing is probably OK. I'd maybe > use --wait=1 or something to not bomb the server too much, though. > > Btw, what aspects of the main docs did you find not useful? > Did you check the tutorials? Are they too long? Too unfocused? Don't > cover enough features? > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsseabold at gmail.com Tue Dec 21 09:58:35 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 21 Dec 2010 09:58:35 -0500 Subject: [SciPy-User] solving integration, density function In-Reply-To: <20101221143316.175560@gmx.net> References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> <20101221143316.175560@gmx.net> Message-ID: On Tue, Dec 21, 2010 at 9:33 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Tue, 21 Dec 2010 09:18:15 -0500 >> Von: Skipper Seabold >> An: SciPy Users List >> Betreff: Re: [SciPy-User] solving integration, density function > >> On Tue, Dec 21, 2010 at 7:48 AM, Johannes Radinger >> wrote: >> > >> > -------- Original-Nachricht -------- >> >> Datum: Tue, 21 Dec 2010 13:20:47 +0100 >> >> Von: Gregor Thalhammer >> >> An: SciPy Users List >> >> Betreff: Re: [SciPy-User] solving integration, density function >> > >> >> >> >> Am 21.12.2010 um 12:06 schrieb Johannes Radinger: >> >> >> >> > Hello, >> >> > >> >> > I am really new to python and Scipy. >> >> > I want to solve a integrated function with a python script >> >> > and I think Scipy should do that :) >> >> > >> >> > My task: >> >> > >> >> > I do have some variables (s, m, K,) which are now absolutely set, but >> in >> >> future I'll get the values via another process of pyhton. >> >> > >> >> > s = 400 >> >> > m = 0 >> >> > K = 1 >> >> > >> >> > And have have following function: >> >> > (1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2) which is the >> density >> >> function of the normal distribution a symetrical curve with the mean >> (m) of >> >> 0. >> >> > >> >> > The total area under the curve is 1 (100%) which is for an >> integration >> >> from -inf to +inf. >> >> > I want to know x in the case of 99%: meaning that the integral (-x to >> >> +x) of the function is 0.99. Due to the symetry of the curve you can >> also set >> >> the integral from 0 to +x equal to (0.99/2): >> >> > >> >> > 0.99 = integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), >> -x, >> >> x) >> >> > resp. >> >> > (0.99/2) = >> integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), >> >> 0, x) >> >> > >> >> > How can I solve that question in Scipy/python >> >> > so that I get x in the end. I don't know how to write >> >> > the code... >> >> >> >> >> >> ---> >> >> erf(x[, out]) >> >> >> >> ? ? y=erf(z) returns the error function of complex argument defined >> as >> >> ? ? as 2/sqrt(pi)*integral(exp(-t**2),t=0..z) >> >> --- >> >> >> >> from scipy.special import erf, erfinv >> >> erfinv(0.99)*sqrt(2) >> >> >> >> >> >> Gregor >> >> >> > >> > >> > Thank you Gregor, >> > I only understand a part of your answer... I know that the integral of >> the density function is a error function and I know that the argument "from >> scipy.special import erf, erfinv" is to load the module. >> > >> > But how do I write the code including my orignial function so that I can >> modify it (I have also another function I want to integrate). how do i >> start? I want to save the whole code to a python-script I can then load e.g. >> into ArcGIS where I want to use the value of x for further calculations. >> > >> >> Are you always integrating densities? 
?If so, you don't want to use >> integrals probably, but you could use scipy.stats >> >> erfinv(.99)*np.sqrt(2) >> 2.5758293035489004 >> >> from scipy import stats >> >> stats.norm.ppf(.995) >> 2.5758293035489004 > >> Skipper > > The second function I want to integrate is different, it is a combination of two normal distributions like: > > 0.99 = integrate(0.6*((1/((s1*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s1*K))^2))+0,4*((1/((s2*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s2*K))^2))) > > and here again I know s1, s2, m and K and want to get x in the case when the integral is 0.99. What do I write into the script I want create? > > I think it is better if I can explain it with a graph but I don't know if I can just attach pictures to the mail-list-mail. > The cdf is the integral of pdf and the ppf is the inverse of this function. All of these functions can take an argument for loc and scale, which in your case loc=m and scale = s1*K. I think you can get by with these. You might be able to do something like this. Not sure if this is correct with respect to your weightings, etc. I'd have to think more, but it might get you on the right track. from scipy import optimize def func(x,sigma1,sigma2,m): return .6 * stats.norm.cdf(x, loc=m, scale=sigma1) + .4 * stats.norm.cdf(x, loc=m, scale=sigma2) - .995 optimize.zeros.newton(func, 1., args=(s1*K,s2*K,m)) Skipper From josef.pktd at gmail.com Tue Dec 21 10:00:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Dec 2010 10:00:25 -0500 Subject: [SciPy-User] solving integration, density function In-Reply-To: <20101221143316.175560@gmx.net> References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> <20101221143316.175560@gmx.net> Message-ID: On Tue, Dec 21, 2010 at 9:33 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Tue, 21 Dec 2010 09:18:15 -0500 >> Von: Skipper Seabold >> An: SciPy Users List >> Betreff: Re: [SciPy-User] solving integration, density function > >> On Tue, Dec 21, 2010 at 7:48 AM, Johannes Radinger >> wrote: >> > >> > -------- Original-Nachricht -------- >> >> Datum: Tue, 21 Dec 2010 13:20:47 +0100 >> >> Von: Gregor Thalhammer >> >> An: SciPy Users List >> >> Betreff: Re: [SciPy-User] solving integration, density function >> > >> >> >> >> Am 21.12.2010 um 12:06 schrieb Johannes Radinger: >> >> >> >> > Hello, >> >> > >> >> > I am really new to python and Scipy. >> >> > I want to solve a integrated function with a python script >> >> > and I think Scipy should do that :) >> >> > >> >> > My task: >> >> > >> >> > I do have some variables (s, m, K,) which are now absolutely set, but >> in >> >> future I'll get the values via another process of pyhton. >> >> > >> >> > s = 400 >> >> > m = 0 >> >> > K = 1 >> >> > >> >> > And have have following function: >> >> > (1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2) which is the >> density >> >> function of the normal distribution a symetrical curve with the mean >> (m) of >> >> 0. >> >> > >> >> > The total area under the curve is 1 (100%) which is for an >> integration >> >> from -inf to +inf. >> >> > I want to know x in the case of 99%: meaning that the integral (-x to >> >> +x) of the function is 0.99. Due to the symetry of the curve you can >> also set >> >> the integral from 0 to +x equal to (0.99/2): >> >> > >> >> > 0.99 = integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), >> -x, >> >> x) >> >> > resp. 
>> >> > (0.99/2) = >> integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), >> >> 0, x) >> >> > >> >> > How can I solve that question in Scipy/python >> >> > so that I get x in the end. I don't know how to write >> >> > the code... >> >> >> >> >> >> ---> >> >> erf(x[, out]) >> >> >> >> ? ? y=erf(z) returns the error function of complex argument defined >> as >> >> ? ? as 2/sqrt(pi)*integral(exp(-t**2),t=0..z) >> >> --- >> >> >> >> from scipy.special import erf, erfinv >> >> erfinv(0.99)*sqrt(2) >> >> >> >> >> >> Gregor >> >> >> > >> > >> > Thank you Gregor, >> > I only understand a part of your answer... I know that the integral of >> the density function is a error function and I know that the argument "from >> scipy.special import erf, erfinv" is to load the module. >> > >> > But how do I write the code including my orignial function so that I can >> modify it (I have also another function I want to integrate). how do i >> start? I want to save the whole code to a python-script I can then load e.g. >> into ArcGIS where I want to use the value of x for further calculations. >> > >> >> Are you always integrating densities? ?If so, you don't want to use >> integrals probably, but you could use scipy.stats >> >> erfinv(.99)*np.sqrt(2) >> 2.5758293035489004 >> >> from scipy import stats >> >> stats.norm.ppf(.995) >> 2.5758293035489004 > >> Skipper > > The second function I want to integrate is different, it is a combination of two normal distributions like: > > 0.99 = integrate(0.6*((1/((s1*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s1*K))^2))+0,4*((1/((s2*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s2*K))^2))) > > and here again I know s1, s2, m and K and want to get x in the case when the integral is 0.99. What do I write into the script I want create? > > I think it is better if I can explain it with a graph but I don't know if I can just attach pictures to the mail-list-mail. The generic way for finding the ppf in stats distribution, is use scipy.integrate.quad for the integration and scipy.optimize solve for finding x (I'm still too busy for a full answer) Josef > > /j > > ?_______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief! 
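A sketch of that generic route for the two-normal mixture above; s2 here is a made-up placeholder (only s = 400 was posted), and the bracket for the root finder is an assumption; brentq is one of the scipy.optimize solvers:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    s1, s2, K, m = 400.0, 800.0, 1.0, 0.0   # s2 = 800 is only for illustration

    def pdf(x):
        # 0.6/0.4 weighted mixture of two normal densities, as in Johannes' post
        g1 = np.exp(-0.5 * ((x - m) / (s1 * K)) ** 2) / ((s1 * K) * np.sqrt(2 * np.pi))
        g2 = np.exp(-0.5 * ((x - m) / (s2 * K)) ** 2) / ((s2 * K) * np.sqrt(2 * np.pi))
        return 0.6 * g1 + 0.4 * g2

    def mass(x):
        # probability mass in (-x, x), by adaptive quadrature
        return quad(pdf, -x, x)[0]

    x99 = brentq(lambda x: mass(x) - 0.99, 0.0, 1e6)   # bracket [0, 1e6] assumed wide enough
    print x99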
> Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From JRadinger at gmx.at Tue Dec 21 10:42:41 2010 From: JRadinger at gmx.at (Johannes Radinger) Date: Tue, 21 Dec 2010 16:42:41 +0100 Subject: [SciPy-User] solving integration, density function In-Reply-To: References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> <20101221143316.175560@gmx.net> Message-ID: <20101221154241.151950@gmx.net> -------- Original-Nachricht -------- > Datum: Tue, 21 Dec 2010 09:58:35 -0500 > Von: Skipper Seabold > An: SciPy Users List > Betreff: Re: [SciPy-User] solving integration, density function > On Tue, Dec 21, 2010 at 9:33 AM, Johannes Radinger > wrote: > > > > -------- Original-Nachricht -------- > >> Datum: Tue, 21 Dec 2010 09:18:15 -0500 > >> Von: Skipper Seabold > >> An: SciPy Users List > >> Betreff: Re: [SciPy-User] solving integration, density function > > > >> On Tue, Dec 21, 2010 at 7:48 AM, Johannes Radinger > >> wrote: > >> > > >> > -------- Original-Nachricht -------- > >> >> Datum: Tue, 21 Dec 2010 13:20:47 +0100 > >> >> Von: Gregor Thalhammer > >> >> An: SciPy Users List > >> >> Betreff: Re: [SciPy-User] solving integration, density function > >> > > >> >> > >> >> Am 21.12.2010 um 12:06 schrieb Johannes Radinger: > >> >> > >> >> > Hello, > >> >> > > >> >> > I am really new to python and Scipy. > >> >> > I want to solve a integrated function with a python script > >> >> > and I think Scipy should do that :) > >> >> > > >> >> > My task: > >> >> > > >> >> > I do have some variables (s, m, K,) which are now absolutely set, > but > >> in > >> >> future I'll get the values via another process of pyhton. > >> >> > > >> >> > s = 400 > >> >> > m = 0 > >> >> > K = 1 > >> >> > > >> >> > And have have following function: > >> >> > (1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2) which is the > >> density > >> >> function of the normal distribution a symetrical curve with the mean > >> (m) of > >> >> 0. > >> >> > > >> >> > The total area under the curve is 1 (100%) which is for an > >> integration > >> >> from -inf to +inf. > >> >> > I want to know x in the case of 99%: meaning that the integral (-x > to > >> >> +x) of the function is 0.99. Due to the symetry of the curve you can > >> also set > >> >> the integral from 0 to +x equal to (0.99/2): > >> >> > > >> >> > 0.99 = > integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), > >> -x, > >> >> x) > >> >> > resp. > >> >> > (0.99/2) = > >> integral((1/((s*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s*K))^2)), > >> >> 0, x) > >> >> > > >> >> > How can I solve that question in Scipy/python > >> >> > so that I get x in the end. I don't know how to write > >> >> > the code... > >> >> > >> >> > >> >> ---> > >> >> erf(x[, out]) > >> >> > >> >> ? ? y=erf(z) returns the error function of complex argument > defined > >> as > >> >> ? ? as 2/sqrt(pi)*integral(exp(-t**2),t=0..z) > >> >> --- > >> >> > >> >> from scipy.special import erf, erfinv > >> >> erfinv(0.99)*sqrt(2) > >> >> > >> >> > >> >> Gregor > >> >> > >> > > >> > > >> > Thank you Gregor, > >> > I only understand a part of your answer... I know that the integral > of > >> the density function is a error function and I know that the argument > "from > >> scipy.special import erf, erfinv" is to load the module. 
> >> > > >> > But how do I write the code including my orignial function so that I > can > >> modify it (I have also another function I want to integrate). how do i > >> start? I want to save the whole code to a python-script I can then load > e.g. > >> into ArcGIS where I want to use the value of x for further > calculations. > >> > > >> > >> Are you always integrating densities? ?If so, you don't want to use > >> integrals probably, but you could use scipy.stats > >> > >> erfinv(.99)*np.sqrt(2) > >> 2.5758293035489004 > >> > >> from scipy import stats > >> > >> stats.norm.ppf(.995) > >> 2.5758293035489004 > > > >> Skipper > > > > The second function I want to integrate is different, it is a > combination of two normal distributions like: > > > > 0.99 = > integrate(0.6*((1/((s1*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s1*K))^2))+0,4*((1/((s2*K)*sqrt(2*pi)))*exp((-1/2*(((x-m)/s2*K))^2))) > > > > and here again I know s1, s2, m and K and want to get x in the case when > the integral is 0.99. What do I write into the script I want create? > > > > I think it is better if I can explain it with a graph but I don't know > if I can just attach pictures to the mail-list-mail. > > > > The cdf is the integral of pdf and the ppf is the inverse of this > function. All of these functions can take an argument for loc and > scale, which in your case loc=m and scale = s1*K. I think you can get > by with these. You might be able to do something like this. Not sure > if this is correct with respect to your weightings, etc. I'd have to > think more, but it might get you on the right track. > > from scipy import optimize > > def func(x,sigma1,sigma2,m): > return .6 * stats.norm.cdf(x, loc=m, scale=sigma1) + .4 * > stats.norm.cdf(x, > loc=m, scale=sigma2) - .995 > > optimize.zeros.newton(func, 1., args=(s1*K,s2*K,m)) > ooh that is what I was looking for...you helped me a lot, I just need now to get the optimization subpackage to run...my IDE (IDLE always crashes when I want to run: "from scipy import optimize" ... I am working with Pyhton 2.6.5 and the actual Scipy package (downloaded today) on Windows 7... but thanks a lot for all your help /johannes > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- GMX.at - ?sterreichs FreeMail-Dienst mit ?ber 2 Mio Mitgliedern E-Mail, SMS & mehr! 
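Skipper's mixture function plus a bracketed root finder gives a complete script for the two-normal case. The sketch below is illustrative: only s=400, m=0, K=1 are fixed in the thread, so the second scale s2 is a made-up assumption, and brentq is used instead of newton because it cannot run away from the root. The last line checks the result against the sum-of-two-error-functions form that comes from splitting the integral (linearity, as pointed out further down the thread):

import numpy as np
from scipy import stats, optimize
from scipy.special import erf

m, K = 0.0, 1.0
s1, s2 = 400.0, 200.0   # s2 is an illustrative assumption

def mix_cdf(x):
    # the CDF of a mixture is the same weighted sum as its density
    return (0.6 * stats.norm.cdf(x, loc=m, scale=s1 * K)
            + 0.4 * stats.norm.cdf(x, loc=m, scale=s2 * K))

# symmetry around m = 0 turns P(-x < X < x) = 0.99 into mix_cdf(x) = 0.995
x99 = optimize.brentq(lambda x: mix_cdf(x) - 0.995, 0.0, 10 * max(s1, s2) * K)
print x99

# cross-check: the integral from -x to x is a weighted sum of error functions
print 0.6 * erf(x99 / (s1 * K * np.sqrt(2))) \
    + 0.4 * erf(x99 / (s2 * K * np.sqrt(2)))   # ~0.99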
From josef.pktd at gmail.com Tue Dec 21 10:50:54 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 21 Dec 2010 10:50:54 -0500
Subject: [SciPy-User] solving integration, density function
In-Reply-To: References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> <20101221143316.175560@gmx.net>
Message-ID:

On Tue, Dec 21, 2010 at 9:58 AM, Skipper Seabold wrote:
> [...]
> from scipy import optimize
>
> def func(x,sigma1,sigma2,m):
>     return .6 * stats.norm.cdf(x, loc=m, scale=sigma1) + .4 * stats.norm.cdf(x,
>         loc=m, scale=sigma2) - .995
>
> optimize.zeros.newton(func, 1., args=(s1*K,s2*K,m))

I think it's better to stick with the main namespace optimize.newton. I think scipy.stats.distributions are using fsolve. Skipper's way is the most direct way to calculate this.

Johannes, scipy.stats.distributions has a lot of generic functions/methods to work with distributions, and reading the source for some of them might be useful to you. Another alternative is to subclass stats.distribution and take advantage of the generic methods directly. But compared to Skipper's direct solution, this would only be worth it if you need more properties. An old example of mine is at http://mail.scipy.org/pipermail/scipy-user/2009-May/021182.html (the discussion was more about estimation)

Josef

> Skipper
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From linuxerwang at gmail.com Tue Dec 21 12:04:17 2010
From: linuxerwang at gmail.com (Linuxer Wang)
Date: Tue, 21 Dec 2010 09:04:17 -0800
Subject: [SciPy-User] How to get an offline scipy cookbook?
In-Reply-To: References: <4D1063AD.50200@gmail.com>
Message-ID: <4D10DE11.1050903@gmail.com>

Thank you, Pauli!

Sorry I didn't express it appropriately. The main doc is actually really good. I like the cookbook because some of its examples directly match my specialty. And its structure is more suitable for jumping directly to what I am interested in (and skipping all the others).

Best Regards,
- linuxerwang

On 12/21/2010 01:37 AM, Pauli Virtanen wrote:
> Tue, 21 Dec 2010 00:22:05 -0800, Linuxer Wang wrote:
>> I am a new user of scipy. I found that scipy cookbook is a lot more
>> helpful for me than the official docs. But I don't know where I can
>> download an offline version of the cookbook. Is it ok for me to mirror
>> the whole cookbook with wget? (I don't want to blow off the scipy's
>> website by mirroring it.)
>
> There's no offline version.
>
> The cookbook is not huge, so sensible wget-ing is probably OK. I'd maybe
> use --wait=1 or something to not bomb the server too much, though.
>
> Btw, what aspects of the main docs did you find not useful?
> Did you check the tutorials? Are they too long? Too unfocused? Don't
> cover enough features?

From linuxerwang at gmail.com Tue Dec 21 12:05:56 2010
From: linuxerwang at gmail.com (Linuxer Wang)
Date: Tue, 21 Dec 2010 09:05:56 -0800
Subject: [SciPy-User] How to get an offline scipy cookbook?
In-Reply-To: References: <4D1063AD.50200@gmail.com>
Message-ID: <4D10DE74.4040408@gmail.com>

On 12/21/2010 06:47 AM, Anthony Palomba wrote:
> If you are on windows, you can use WebZip. It is a very handy
> utility. It will save any site including links to your hard drive. You
> specify how deep you want it to go. It can then export the site
> as a .chm (CHM = Compiled HTML Help) file.

Thank you. I work on linux exclusively and wget can do everything.

> -ap
>
> On Tue, Dec 21, 2010 at 3:37 AM, Pauli Virtanen wrote:
> [...]
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From david_baddeley at yahoo.com.au Tue Dec 21 15:22:07 2010
From: david_baddeley at yahoo.com.au (David Baddeley)
Date: Tue, 21 Dec 2010 12:22:07 -0800 (PST)
Subject: [SciPy-User] solving integration, density function
In-Reply-To: <20101221143316.175560@gmx.net>
References: <20101221110650.32180@gmx.net> <07515CE8-8F03-40C6-9A29-FA1AE7AE8AF1@gmail.com> <20101221124827.53380@gmx.net> <20101221143316.175560@gmx.net>
Message-ID: <534022.44467.qm@web113419.mail.gq1.yahoo.com>

For your 2nd function, you should be able to use the linearity of integration and integrate the two terms separately, giving you the sum of two error functions. For a general case when you can't / don't want to find an analytic solution, the stuff for numeric integration is in scipy.integrate, as Josef mentioned.

cheers,
David

----- Original Message ----
From: Johannes Radinger .
To: SciPy Users List
Sent: Wed, 22 December, 2010 3:33:16 AM
Subject: Re: [SciPy-User] solving integration, density function

[...]
/j _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief! Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From scott.sinclair.za at gmail.com Wed Dec 22 06:44:38 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 22 Dec 2010 13:44:38 +0200 Subject: [SciPy-User] Building scipy on Linux In-Reply-To: References: Message-ID: On 21 December 2010 02:55, Willard Maier wrote: > I'm trying to build scipy and numpy on Linux (Mint, a derivative of Ubuntu), > and I'm following the directions on > http://www.scipy.org/Installing_SciPy/Linux > for "Building everything from source using gfortran on Ubuntu, Nov 2010. > After several hours of work I finally got to the step for building scipy, > "python setup.py build", but here I get the following error: > > Traceback (most recent call last): > ? File "setup.py", line 85, in > ??? FULLVERSION += svn_version() > ? File "setup.py", line 58, in svn_version > ??? from numpy.compat import asstr > ImportError: No module named compat It sounds like you don't have Numpy installed. Building (and running) Scipy relies on Numpy... Cheers, Scott From dejan.org at gmail.com Wed Dec 22 12:47:19 2010 From: dejan.org at gmail.com (otrov) Date: Wed, 22 Dec 2010 18:47:19 +0100 Subject: [SciPy-User] Identify unique sequence data from array Message-ID: <42523011.20101222184719@gmail.com> Hi, I tried to seek for help on three other lists, but as this problem apparently can't be easily solved in matlab/octave(!?), I thought to try scipy/numpy and maybe gain advantage from python as more feature rich descriptive language The problem: I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. I want to track this data block. Simplified problem: X = array([[1, 2], [1, 2], [2, 2], [3, 1], [2, 3], [1, 2], [1, 2], [2, 2], [3, 1], [2, 3], [1, 2], [1, 2], [2, 2], [3, 1], [2, 3], ..., [1, 2], [1, 2], [2, 2], [3, 1], [2, 3]] I would like to extract repeated sequence data: Y = array([[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]] as a result. Or presented more visually: I want to identify unique sequence data: A B C D D D A B C D D D A B C D D D |_________| |_________| |_________| | | | unique unique unique sequence sequence sequence data data data Thanks for your time From faltet at pytables.org Wed Dec 22 13:58:41 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 19:58:41 +0100 Subject: [SciPy-User] ANN: carray 0.3 released Message-ID: <201012221958.41105.faltet@pytables.org> ===================== Announcing carray 0.3 ===================== What's new ========== A lot of stuff. The most outstanding feature in this version is the introduction of a `ctable` object. A `ctable` is similar to a structured array in NumPy, but instead of storing the data row-wise, it uses a column-wise arrangement. This allows for much better performance for very wide tables, which is one of the scenarios where a `ctable` makes more sense. 
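To make the column-wise layout concrete, here is a minimal sketch. It is not part of the announcement, and the exact `ctable` constructor signature shown is an assumption, so check the manual linked further down before relying on it:

import numpy as np
import carray as ca

N = 1000 * 1000
f0 = ca.carray(np.arange(N))           # each column is its own compressed carray
f1 = ca.carray(np.random.rand(N))
t = ca.ctable((f0, f1), names=['f0', 'f1'])
print t['f0'][:5]                       # columns stay independently addressable

Because every column is compressed and stored on its own, touching a few columns of a very wide table only reads the bytes of those columns.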
Of course, as `ctable` is based on `carray` objects, it inherits all its niceties (like on-the-fly compression and fast iterators).

Also, the `carray` object itself has received many improvements, like new constructors (arange(), fromiter(), zeros(), ones(), fill()), iterators (where(), wheretrue()) or resize methods (resize(), trim()). Most of these also work with the new `ctable`.

Besides, Numexpr is supported now (but it is optional) in order to carry out stunningly fast queries on `ctable` objects. For example, doing a query on a table with one million rows and one thousand columns can be up to 2x faster than using a plain structured array, and up to 20x faster than using SQLite (using the ":memory:" backend and indexing). See 'bench/ctable-query.py' for details.

Finally, binaries for Windows (both 32-bit and 64-bit) are provided.

For more detailed info, see the release notes in: https://github.com/FrancescAlted/carray/wiki/Release-0.3

What it is
==========

carray is a container for numerical data that can be compressed in-memory. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. Having data compressed in-memory can reduce the stress of the memory subsystem. The net result is that carray operations may be faster than using a traditional ndarray object from NumPy.

carray also supports fully 64-bit addressing (both in UNIX and Windows). Below, a carray with 1 trillion rows has been created (7.3 TB total), filled with zeros, modified at some positions, and finally summed up::

>>> %time b = ca.zeros(1e12)
CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s
Wall time: 55.23 s
>>> %time b[[1, 1e9, 1e10, 1e11, 1e12-1]] = (1,2,3,4,5)
CPU times: user 2.08 s, sys: 0.00 s, total: 2.08 s
Wall time: 2.09 s
>>> b
carray((1000000000000,), float64)
  nbytes: 7450.58 GB; cbytes: 2.27 GB; ratio: 3275.35
  cparams := cparams(clevel=5, shuffle=True)
[0.0, 1.0, 0.0, ..., 0.0, 0.0, 5.0]
>>> %time b.sum()
CPU times: user 10.08 s, sys: 0.00 s, total: 10.08 s
Wall time: 10.15 s
15.0

['%time' is a magic function provided by the IPython shell]

Please note that the example above is provided for demonstration purposes only. Do not try to run this at home unless you have more than 3 GB of RAM available, or you will get into trouble.

Resources
=========

Visit the main carray site repository at: http://github.com/FrancescAlted/carray

You can download a source package from: http://carray.pytables.org/download

Manual: http://carray.pytables.org/manual

Home of Blosc compressor: http://blosc.pytables.org

User's mail list: carray at googlegroups.com http://groups.google.com/group/carray

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

---- Enjoy!
-- Francesc Alted From robert.kern at gmail.com Wed Dec 22 14:52:17 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Dec 2010 14:52:17 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: <42523011.20101222184719@gmail.com> References: <42523011.20101222184719@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 12:47, otrov wrote: > Hi, > I tried to seek for help on three other lists, but as this problem apparently can't be easily solved in matlab/octave(!?), I thought to try scipy/numpy and maybe gain advantage from python as more feature rich descriptive language > > The problem: > > I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. > I want to track this data block. for i in range(1, len(X)-1): if (X[i:] == X[:-i]).all(): break -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From dejan.org at gmail.com Wed Dec 22 15:18:03 2010 From: dejan.org at gmail.com (otrov) Date: Wed, 22 Dec 2010 21:18:03 +0100 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> Message-ID: <225079401.20101222211803@gmail.com> >> The problem: >> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. >> I want to track this data block. > for i in range(1, len(X)-1): > if (X[i:] == X[:-i]).all(): > break Just look at that python beauty! Such a great language when in hand of a smart user. Thanks for you snippet, but unfortunately it takes forever to finish the task From josef.pktd at gmail.com Wed Dec 22 15:27:36 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 22 Dec 2010 15:27:36 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: <225079401.20101222211803@gmail.com> References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 3:18 PM, otrov wrote: >>> The problem: > >>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. >>> I want to track this data block. > >> for i in range(1, len(X)-1): >> ? ? if (X[i:] == X[:-i]).all(): >> ? ? ? ? break I don't see how this works, isn't it (X[:i] == X[-i:]).all(): with an integer repeat, there should also be a restriction that n/i is an int, otherwise the repeat is not possible. if n//i != n/float(i): continue or mod == 0 ? Josef > > Just look at that python beauty! Such a great language when in hand of a smart user. 
> Thanks for you snippet, but unfortunately it takes forever to finish the task > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Wed Dec 22 15:57:07 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Dec 2010 15:57:07 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 15:27, wrote: > On Wed, Dec 22, 2010 at 3:18 PM, otrov wrote: >>>> The problem: >> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. >>>> I want to track this data block. >> >>> for i in range(1, len(X)-1): >>> ? ? if (X[i:] == X[:-i]).all(): >>> ? ? ? ? break > > I don't see how this works, isn't it > > (X[:i] == X[-i:]).all(): Not if the repeated subsequence is [1, 2, 3, 1, 2]. That said, my method probably also has a counterexample. > with an integer repeat, there should also be a restriction that n/i is > an int, otherwise the repeat is not possible. > > if n//i != n/float(i): continue > > or mod == 0 I allowed for the sequence to have some incomplete part of the repeated section at the tail end. If the sequence is a perfect multiple, then you can avoid doing the expensive test if (n % i) != 0. For a 1D sequence, you can also try reshaping: for i in range(2, len(X)//2): if (n % i) != 0: continue Y = X.reshape((-1, i)) if (Y == Y[0]).all(): break For 2D sequences (probably): rowlen = X.shape[1] for i in range(2, len(X)//2): if (n % i) != 0: continue Y = X.reshape((-1, i, rowlen)) if (Y == Y[0]).all(): break -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From paul.anton.letnes at gmail.com Wed Dec 22 16:47:30 2010 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 22 Dec 2010 22:47:30 +0100 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: <225079401.20101222211803@gmail.com> References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> Message-ID: <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> On 22. des. 2010, at 21.18, otrov wrote: >>> The problem: > >>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. >>> I want to track this data block. > >> for i in range(1, len(X)-1): >> if (X[i:] == X[:-i]).all(): >> break > > Just look at that python beauty! Such a great language when in hand of a smart user. > Thanks for you snippet, but unfortunately it takes forever to finish the task You could also check one element at a time. I think it will be faster, because it will break if comparison of the first element doesn't hold. Then, if you find such an occurrence, use Robert's method to double check that you found the true repetition period. Code: >>> a = [1,2,3,4,1,2,3,4,1,2,3,4] >>> a = numpy.array(a) >>> for i in range(1, 1+a.size/2): ... 
if (a[0] == a[::i]).all(): print 'period is ',i ... ... period is 4 Cheers Paul. From robert.kern at gmail.com Wed Dec 22 16:51:13 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Dec 2010 16:51:13 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes wrote: > > On 22. des. 2010, at 21.18, otrov wrote: > >>>> The problem: >> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consists of repeated sequences of one unique sequence, usually ~10^5 rows, but may differ in scale. Period is same for both columns, so there is not really difference if we consider 2D or 1D array. >>>> I want to track this data block. >> >>> for i in range(1, len(X)-1): >>> ? ?if (X[i:] == X[:-i]).all(): >>> ? ? ? ?break >> >> Just look at that python beauty! Such a great language when in hand of a smart user. >> Thanks for you snippet, but unfortunately it takes forever to finish the task > > You could also check one element at a time. I think it will be faster, because it will break if comparison of the first element doesn't hold. Then, if you find such an occurrence, use Robert's method to double check that you found the true repetition period. Excellent point. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From willardmaier at gmail.com Wed Dec 22 17:54:17 2010 From: willardmaier at gmail.com (Willard Maier) Date: Wed, 22 Dec 2010 16:54:17 -0600 Subject: [SciPy-User] Building scipy on Linux Message-ID: Scott, thanks for the reply. However I did build numpy first in the sequence, and it built without errors. I'm beginning to think I may have incompatible versions of the source code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 22 18:00:36 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 22 Dec 2010 16:00:36 -0700 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 2:51 PM, Robert Kern wrote: > On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes > wrote: > > > > On 22. des. 2010, at 21.18, otrov wrote: > > > >>>> The problem: > >> > >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which > consists of repeated sequences of one unique sequence, usually ~10^5 rows, > but may differ in scale. Period is same for both columns, so there is not > really difference if we consider 2D or 1D array. > >>>> I want to track this data block. > >> > >>> for i in range(1, len(X)-1): > >>> if (X[i:] == X[:-i]).all(): > >>> break > >> > >> Just look at that python beauty! Such a great language when in hand of a > smart user. > >> Thanks for you snippet, but unfortunately it takes forever to finish the > task > > > > You could also check one element at a time. I think it will be faster, > because it will break if comparison of the first element doesn't hold. 
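Paul's cheap pre-check and Robert's reshape verification combine naturally into one function. A runnable sketch, assuming the array length is an exact multiple of the period (the test data are made up):

import numpy as np

def find_period(a):
    n = a.size
    for i in range(1, n // 2 + 1):
        if n % i != 0:
            continue
        if not (a[0] == a[::i]).all():   # cheap necessary condition, fails fast
            continue
        Y = a.reshape((-1, i))           # full verification of the candidate
        if (Y == Y[0]).all():
            return i
    return n                             # no shorter repeated block found

a = np.array([1, 2, 3, 1, 2] * 4)
print find_period(a)                     # 5

The pre-check touches one element per stride, so bad candidates are rejected long before the full O(n) comparison runs.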
Then, > if you find such an occurrence, use Robert's method to double check that you > found the true repetition period. > > Excellent point. > > Why not do an FFT and look at the shape around the carrier frequency? The DC level should probably be subtracted first. It shoud also be possible to construct a Weiner filter to extract the sequences if they don't occur with strict periods. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From willardmaier at gmail.com Wed Dec 22 18:15:23 2010 From: willardmaier at gmail.com (Bill Maier) Date: Wed, 22 Dec 2010 17:15:23 -0600 Subject: [SciPy-User] Building scipy on Linux References: AANLkTim1oh5JV53V_Xa9XTwqGQ-EzcDmiscfSm7zOgkM@mail.gmail.com Message-ID: <4D12868B.8020307@gmail.com> Scott, thanks for the reply. However I did build numpy first in the sequence, and it built without errors. I'm beginning to think I may have incompatible versions of the source code. (Hoping this one makes it as a reply rather than posting as a new thread). Bill From charlesr.harris at gmail.com Wed Dec 22 18:48:29 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 22 Dec 2010 16:48:29 -0700 Subject: [SciPy-User] Building scipy on Linux In-Reply-To: References: Message-ID: On Wed, Dec 22, 2010 at 3:54 PM, Willard Maier wrote: > Scott, thanks for the reply. However I did build numpy first in the > sequence, and it built without errors. I'm beginning to think I may have > incompatible versions of the source code. > > On my system ubuntu installs a self compiled versions of numpy in /usr/local/lib/python2.6/dist-packages/ where it won't normally be found if you have already got a version installed in /usr/lib/python2.6/dist-packages/. One solution to that problem is to make a *.pth file $charris at ubuntu ~$ cat .local/lib/python2.6/site-packages/install.pth /usr/local/lib/python2.6/dist-packages Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Wed Dec 22 23:49:02 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 22 Dec 2010 23:49:02 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 6:00 PM, Charles R Harris wrote: > > > On Wed, Dec 22, 2010 at 2:51 PM, Robert Kern wrote: >> >> On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes >> wrote: >> > >> > On 22. des. 2010, at 21.18, otrov wrote: >> > >> >>>> The problem: >> >> >> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which >> >>>> consists of repeated sequences of one unique sequence, usually ~10^5 rows, >> >>>> but may differ in scale. Period is same for both columns, so there is not >> >>>> really difference if we consider 2D or 1D array. >> >>>> I want to track this data block. >> >> >> >>> for i in range(1, len(X)-1): >> >>> ? ?if (X[i:] == X[:-i]).all(): >> >>> ? ? ? ?break >> >> >> >> Just look at that python beauty! Such a great language when in hand of >> >> a smart user. >> >> Thanks for you snippet, but unfortunately it takes forever to finish >> >> the task >> > >> > You could also check one element at a time. I think it will be faster, >> > because it will break if comparison of the first element doesn't hold. Then, >> > if you find such an occurrence, use Robert's method to double check that you >> > found the true repetition period. >> >> Excellent point. 
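The FFT idea suggested above, sketched on synthetic data. Illustrative only: with an exact integer period the spectrum is nonzero only at multiples of n/period, so the smallest significant harmonic recovers the period; noisy or drifting data would need the Wiener-filter style care mentioned:

import numpy as np

np.random.seed(0)
x = np.tile(np.random.rand(100), 50)   # exact repetition, period 100
x = x - x.mean()                       # subtract the DC level first

power = np.abs(np.fft.rfft(x)) ** 2
idx = np.nonzero(power > 1e-8 * power.max())[0]
k0 = idx[idx > 0][0]                   # smallest nonzero harmonic
print len(x) // k0                     # 100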
>> > > Why not do an FFT and look at the shape around the carrier frequency? The DC > level should probably be subtracted first. It shoud also be possible to > construct a Weiner filter to extract the sequences if they don't occur with > strict periods. > Could you give an example? I've used a convolution to find the number of successive discrete events, but I'm not sure how to generalize it or if your suggestion is similar. For example, to count the number of three successes in a row for Bernoulli trials and to find where In [1]: import numpy as np In [2]: x = np.array([1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1]) In [3]: y = np.array([1,1,1]) In [4]: idx = np.convolve(x,y) In [5]: num_runs = len(np.where(idx==len(y))[0]) In [6]: # Extract runs from original array In [6]: idx = np.where(idx==len(y))[0]-(len(y)-1) In [7]: idx = np.hstack([np.arange(i,i+3) for i in idx]) In [8]: x[idx] Out[8]: array([1, 1, 1, 1, 1, 1]) Skipper From josef.pktd at gmail.com Thu Dec 23 00:12:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 23 Dec 2010 00:12:06 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: On Wed, Dec 22, 2010 at 11:49 PM, Skipper Seabold wrote: > On Wed, Dec 22, 2010 at 6:00 PM, Charles R Harris > wrote: >> >> >> On Wed, Dec 22, 2010 at 2:51 PM, Robert Kern wrote: >>> >>> On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes >>> wrote: >>> > >>> > On 22. des. 2010, at 21.18, otrov wrote: >>> > >>> >>>> The problem: >>> >> >>> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which >>> >>>> consists of repeated sequences of one unique sequence, usually ~10^5 rows, >>> >>>> but may differ in scale. Period is same for both columns, so there is not >>> >>>> really difference if we consider 2D or 1D array. >>> >>>> I want to track this data block. >>> >> >>> >>> for i in range(1, len(X)-1): >>> >>> ? ?if (X[i:] == X[:-i]).all(): >>> >>> ? ? ? ?break >>> >> >>> >> Just look at that python beauty! Such a great language when in hand of >>> >> a smart user. >>> >> Thanks for you snippet, but unfortunately it takes forever to finish >>> >> the task >>> > >>> > You could also check one element at a time. I think it will be faster, >>> > because it will break if comparison of the first element doesn't hold. Then, >>> > if you find such an occurrence, use Robert's method to double check that you >>> > found the true repetition period. >>> >>> Excellent point. >>> >> >> Why not do an FFT and look at the shape around the carrier frequency? The DC >> level should probably be subtracted first. It shoud also be possible to >> construct a Weiner filter to extract the sequences if they don't occur with >> strict periods. >> > > Could you give an example? ?I've used a convolution to find the number > of successive discrete events, but I'm not sure how to generalize it > or if your suggestion is similar. 
> > For example, to count the number of three successes in a row for > Bernoulli trials and to find where > > In [1]: import numpy as np > > In [2]: x = np.array([1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1]) > > In [3]: y = np.array([1,1,1]) > > In [4]: idx = np.convolve(x,y) > > In [5]: num_runs = len(np.where(idx==len(y))[0]) > > In [6]: # Extract runs from original array > > In [6]: idx = np.where(idx==len(y))[0]-(len(y)-1) > > In [7]: idx = np.hstack([np.arange(i,i+3) for i in idx]) > > In [8]: x[idx] > Out[8]: array([1, 1, 1, 1, 1, 1]) It's different because in the original case you want to find the periodicity, for example calculate the periodogram/fft and find the spike. This should give the frequency and then the period length. If there are small numerical problems, it would still narrow down the range to search with direct comparison. Josef (finding runs sounds like a fun problem, more than multiple tests and comparisons) > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Dec 23 00:24:50 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 23 Dec 2010 00:24:50 -0500 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: On Thu, Dec 23, 2010 at 12:12 AM, wrote: > On Wed, Dec 22, 2010 at 11:49 PM, Skipper Seabold wrote: >> On Wed, Dec 22, 2010 at 6:00 PM, Charles R Harris >> wrote: >>> >>> >>> On Wed, Dec 22, 2010 at 2:51 PM, Robert Kern wrote: >>>> >>>> On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes >>>> wrote: >>>> > >>>> > On 22. des. 2010, at 21.18, otrov wrote: >>>> > >>>> >>>> The problem: >>>> >> >>>> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which >>>> >>>> consists of repeated sequences of one unique sequence, usually ~10^5 rows, >>>> >>>> but may differ in scale. Period is same for both columns, so there is not >>>> >>>> really difference if we consider 2D or 1D array. >>>> >>>> I want to track this data block. >>>> >> >>>> >>> for i in range(1, len(X)-1): >>>> >>> ? ?if (X[i:] == X[:-i]).all(): >>>> >>> ? ? ? ?break >>>> >> >>>> >> Just look at that python beauty! Such a great language when in hand of >>>> >> a smart user. >>>> >> Thanks for you snippet, but unfortunately it takes forever to finish >>>> >> the task >>>> > >>>> > You could also check one element at a time. I think it will be faster, >>>> > because it will break if comparison of the first element doesn't hold. Then, >>>> > if you find such an occurrence, use Robert's method to double check that you >>>> > found the true repetition period. >>>> >>>> Excellent point. >>>> >>> >>> Why not do an FFT and look at the shape around the carrier frequency? The DC >>> level should probably be subtracted first. It shoud also be possible to >>> construct a Weiner filter to extract the sequences if they don't occur with >>> strict periods. >>> >> >> Could you give an example? ?I've used a convolution to find the number >> of successive discrete events, but I'm not sure how to generalize it >> or if your suggestion is similar. 
>> >> For example, to count the number of three successes in a row for >> Bernoulli trials and to find where >> >> In [1]: import numpy as np >> >> In [2]: x = np.array([1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1]) >> >> In [3]: y = np.array([1,1,1]) >> >> In [4]: idx = np.convolve(x,y) >> >> In [5]: num_runs = len(np.where(idx==len(y))[0]) >> >> In [6]: # Extract runs from original array >> >> In [6]: idx = np.where(idx==len(y))[0]-(len(y)-1) >> >> In [7]: idx = np.hstack([np.arange(i,i+3) for i in idx]) >> >> In [8]: x[idx] >> Out[8]: array([1, 1, 1, 1, 1, 1]) > > It's different because in the original case you want to find the > periodicity, for example calculate the periodogram/fft and find the > spike. This should give the frequency and then the period length. If > there are small numerical problems, it would still narrow down the > range to search with direct comparison. > > Josef > (finding runs sounds like a fun problem, more than multiple tests and > comparisons) a one liner, just for fun and not relevant for the question >>> x = np.array([1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1]) >>> np.bincount(np.diff(np.nonzero(np.diff(np.r_[[-np.inf], x, [np.inf]]))[0])) array([0, 8, 1, 2]) >>> (_*np.arange(4)).sum() == len(x) True Josef > >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From gmane.comp.python.scientific.user at christian-schmuck.de Wed Dec 22 10:51:15 2010 From: gmane.comp.python.scientific.user at christian-schmuck.de (Christian Schmuck) Date: Wed, 22 Dec 2010 15:51:15 +0000 (UTC) Subject: [SciPy-User] speeding up integrate.odeint with weave/blitz References: Message-ID: Hey, I've been trying the 'for' macro myself. I couldn't get it working though. Here the very simple code example, I tried: #************************************************** from PyDSTool import Generator, args DSargs = args(name='ODEtest') DSargs.varspecs = {'y[i]': 'for(i, 0, 2, -y[i])'} testODE = Generator.Dopri_ODEsystem(DSargs) #************************************************** And here the error: ----------------------------------------------------------------------- KeyError Traceback (most recent call last) /home/schmuck/Arbeit/py_test/t4.py in () 5 DSargs.varspecs = {'y[i]': 'for(i, 0, 2, -y[i])'} 6 ----> 7 testODE = Generator.Dopri_ODEsystem(DSargs) 8 9 /usr/local/lib/python2.6/dist-packages/PyDSTool/Generator /Dopri_ODEsystem.pyc in __init__(self, kw) 331 else: 332 nobuild = False --> 333 ODEsystem.__init__(self, kw) 334 self.diagnostics.outputStatsInfo = { 335 'last_step': 'Predicted step size of the last accepted step (useful for a subsequent call to dop853).', /usr/local/lib/python2.6/dist-packages/PyDSTool/Generator /ODEsystem.py in __init__(self, kw) 73 self.variables[xname] = Variable(indepdomain=tdepdomain, 74 depdomain=Interval(xname, ---> 75 self.xtype[xname], 76 self.xdomain[xname], 77 self._abseps)) KeyError: 'y0' WARNING: Failure executing file: Has anyone ever used the for macro successfully and could he or she give a short, working example? I'm still a newbie with python and pyDSTool but I've done some debugging with winpdb and I've got the sneaky feeling that there is a bug. 
Thanks, Christian From f.pollastri at inrim.it Thu Dec 23 10:43:28 2010 From: f.pollastri at inrim.it (Fabrizio Pollastri) Date: Thu, 23 Dec 2010 15:43:28 +0000 (UTC) Subject: [SciPy-User] pandas data frame building by outer join Message-ID: Hello, A pandas question: it is possible to build a data frame from several time series, starting with an empty data frame and reading the time series one at a time from a file and joining them in outer mode to the data frame? How I can control the column name of each added time series? Here is a coding example, not working since join wants two data frames. from pandas import * from pandas.io.parsers import parseCSV import sys global_df = DataFrame() for fname in sys.argv[1:]: current_time_series = parseCSV(fname)['col_of_interest'] global_df = global_df.join(current_time_series,how='outer') TIA, Fabrizio Pollastri From wesmckinn at gmail.com Thu Dec 23 11:27:10 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Dec 2010 11:27:10 -0500 Subject: [SciPy-User] pandas data frame building by outer join In-Reply-To: References: Message-ID: On Thu, Dec 23, 2010 at 10:43 AM, Fabrizio Pollastri wrote: > Hello, > > A pandas question: > it is possible to build a data frame from several time series, starting with an > empty data frame and reading the time series one at a time from a file and > joining them in outer mode to the data frame? How I can control the column name > of each added time series? > > Here is a coding example, not working since join wants two data frames. > > from pandas import * > from pandas.io.parsers import parseCSV > import sys > > global_df = DataFrame() > > for fname in sys.argv[1:]: > ? ?current_time_series = parseCSV(fname)['col_of_interest'] > ? ?global_df = global_df.join(current_time_series,how='outer') > > > TIA, > Fabrizio Pollastri > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > A preferable approach (faster and simpler) would be to create a dict of time series and pass that to the DataFrame constructor, e.g. data_dict = {} for fname in sys.argv[1:]: data_dict[fname] = parseCSV(fname)['col_of_interest'] df = DataFrame(data_dict) So the keys of the dict will be the column names. hth, Wes From dejan.org at gmail.com Thu Dec 23 13:59:41 2010 From: dejan.org at gmail.com (otrov) Date: Thu, 23 Dec 2010 19:59:41 +0100 Subject: [SciPy-User] Identify unique sequence data from array In-Reply-To: <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> References: <42523011.20101222184719@gmail.com> <225079401.20101222211803@gmail.com> <800F4E15-FF35-43F6-A37E-427B67EBC259@gmail.com> Message-ID: <165178387.20101223195941@gmail.com> > You could also check one element at a time. I think it will be > faster, because it will break if comparison of the first element > doesn't hold. Then, if you find such an occurrence, use Robert's > method to double check that you found the true repetition period. > Code: >>>> a = [1,2,3,4,1,2,3,4,1,2,3,4] >>>> a = numpy.array(a) >>>> for i in range(1, 1+a.size/2): > ... if (a[0] == a[::i]).all(): print 'period is ',i > ... > ... > period is 4 This works great! 
For 2D, if it's not obvious (this checks the first column only, so verify any hit with a full comparison):

for i in range(1, 1 + a.shape[0]/2):
    if (a[0,0] == a[::i,0]).all():
        print 'candidate period is ', i

Thank you

From Chris.Barker at noaa.gov Thu Dec 23 15:49:30 2010
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Thu, 23 Dec 2010 12:49:30 -0800
Subject: [SciPy-User] OT: calling Java from Python
Message-ID: <4D13B5DA.1020205@noaa.gov>

Hi folks,

This is a bit OT, but y'all tend to be a great resource for lots of stuff, and this is python-for-science related:

Are there any active projects supporting calling Java from CPython?

It seems JPE and JPype are both pretty dead, and I don't see much else except maybe Babel: https://computation.llnl.gov/casc/components/#page=home -- though that seems kind of like yet another language!

My use case:

Lately Unidata has put a lot more emphasis on the JAVA implementation of the netcdf libraries than on the C one, so it has a lot of nifty features not supported for C, and therefore for Python.

There is this project for calling the netcdf JAVA libs from MATLAB: http://sourceforge.net/apps/trac/njtbx

Which made me think that it would be nice to have something similar for Python. With the Python C API and JNI (and maybe Cython), it should be possible, but I sure don't want to start doing that from scratch.

gnu CNI looks promising, too.

There is a lot going on in the JAVA world; it would be nice to access some of that.

Are any of you doing anything like this? Any thoughts?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From cgohlke at uci.edu Thu Dec 23 16:00:14 2010
From: cgohlke at uci.edu (Christoph Gohlke)
Date: Thu, 23 Dec 2010 13:00:14 -0800
Subject: [SciPy-User] OT: calling Java from Python
In-Reply-To: <4D13B5DA.1020205@noaa.gov>
References: <4D13B5DA.1020205@noaa.gov>
Message-ID: <4D13B85E.7090906@uci.edu>

On 12/23/2010 12:49 PM, Christopher Barker wrote:
> [...]

CellProfiler calls Java libraries (bioformats, ImageJ) via JNI. Take a look at .
--
Christoph

From josef.pktd at gmail.com Thu Dec 23 17:00:49 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 23 Dec 2010 17:00:49 -0500
Subject: [SciPy-User] OT: calling Java from Python
In-Reply-To: <4D13B85E.7090906@uci.edu>
References: <4D13B5DA.1020205@noaa.gov> <4D13B85E.7090906@uci.edu>
Message-ID:

On Thu, Dec 23, 2010 at 4:00 PM, Christoph Gohlke wrote:
> [...]

http://pypi.python.org/pypi/JCC/2.7

Josef

> --
> Christoph
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

From josef.pktd at gmail.com Fri Dec 24 17:19:21 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 24 Dec 2010 17:19:21 -0500
Subject: [SciPy-User] runstest and distribution of run lengths
Message-ID:

Does anyone know the distribution of run lengths in a sequence of Bernoulli trials?

I thought I could implement a runs test as a quick exercise, but I got (kind of) stuck.

I implemented the Wald-Wolfowitz runs test (plus one- and two-sample versions) according to Wikipedia http://en.wikipedia.org/wiki/Wald%E2%80%93Wolfowitz_runs_test and the SAS manual. This test only looks at the total number of runs, and the SAS manual has both the exact distribution for small samples and the normal approximation for large samples. So, this went ok.
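For reference, the normal approximation just described can be written out directly. A minimal sketch, not from the thread itself; the moments are the standard ones from the Wikipedia page cited above, and the test data are made up:

import numpy as np
from scipy import stats

def runs_test(x):
    # Wald-Wolfowitz runs test for a 0/1 sequence, normal approximation
    x = np.asarray(x)
    n1 = (x == 1).sum()
    n0 = (x == 0).sum()
    n = n1 + n0
    r = 1 + (np.diff(x) != 0).sum()       # total number of runs
    mu = 2.0 * n1 * n0 / n + 1            # expected runs under randomness
    sigma2 = (mu - 1) * (mu - 2) / (n - 1.0)
    z = (r - mu) / np.sqrt(sigma2)
    return z, 2 * stats.norm.sf(abs(z))   # two-sided p-value

np.random.seed(1234)
x = (np.random.rand(200) > 0.5).astype(int)
print runs_test(x)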
But the runstest in the NIST manual and in dataplot, has the entire distribution of run lengths http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm They mention a book, Bradley, 1968, that I don't have, but they don't say what the formulas and distribution for the expected values and standard deviation that they use are. Does anyone have an idea or knows a more easily accessible reference? Josef From mondifero at gmail.com Sat Dec 25 16:53:01 2010 From: mondifero at gmail.com (O) Date: Sat, 25 Dec 2010 19:53:01 -0200 Subject: [SciPy-User] Decorrelation stretch and PCA, Central File Exchange Message-ID: Dear SciPy users, Can you recommend a PCA module for Python? I have multi-variable data and want to find the principal components... I'm also searching for an image decorrelation stretch capability for Python, but I can write that rather quickly myself if I can find or write PCA. (Just the sort of think I would post on a SciPy Central File Exchange wink wink nudge nudge ;-) Merry grav-mass, O -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Dec 25 18:19:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 25 Dec 2010 18:19:27 -0500 Subject: [SciPy-User] Decorrelation stretch and PCA, Central File Exchange In-Reply-To: References: Message-ID: On Sat, Dec 25, 2010 at 4:53 PM, O wrote: > > Dear SciPy users, > > Can you recommend a PCA module for Python?? I have multi-variable data and > want to find the principal components... > > I'm also searching for an image decorrelation stretch capability for Python, > but I can write that rather quickly myself if I can find or write PCA. at least matplotlib, mdp, scikits.learn and scikits.statsmodels each have a pca, and there are several versions posted to the mailing list. scipy-user, May 19, "PCA functions" might be a good starting point, maybe Zachary's version (from a quick search in my cookbook, the mailing list ) Josef > > (Just the sort of think I would post on a SciPy Central File Exchange wink > wink nudge nudge? ;-) > > Merry grav-mass, > > O > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From ryotat at gmx.de Sat Dec 25 21:51:46 2010 From: ryotat at gmx.de (Ryota Tomioka) Date: Sun, 26 Dec 2010 03:51:46 +0100 Subject: [SciPy-User] scipy.test() causes segmentation fault for test_lobpcg Message-ID: <20101226025146.315430@gmx.net> Dear Scipy users, I have recently installed numpy 1.5.1rc1 and scipy 0.8.0 on a CentOS 5.5 server. ATLAS was compiled with gfortran and I also specified gfortran for the installation of both numpy and scipy. numpy.test() ran without trouble, but scipy.test() crashed due to segmentation fault and this was in test_lobpcg.test_ElasticRod. In order to reproduce the result I copied /usr/local/lib/python2.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py to my home directory and did the following. [ryotat at cyprus ~]$ python Python 2.6.6 (r266:84292, Nov 19 2010, 22:23:00) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
From ryotat at gmx.de Sat Dec 25 21:51:46 2010
From: ryotat at gmx.de (Ryota Tomioka)
Date: Sun, 26 Dec 2010 03:51:46 +0100
Subject: [SciPy-User] scipy.test() causes segmentation fault for test_lobpcg
Message-ID: <20101226025146.315430@gmx.net>

Dear Scipy users,

I have recently installed numpy 1.5.1rc1 and scipy 0.8.0 on a CentOS 5.5 server. ATLAS was compiled with gfortran and I also specified gfortran for the installation of both numpy and scipy. numpy.test() ran without trouble, but scipy.test() crashed due to a segmentation fault, and this was in test_lobpcg.test_ElasticRod.

In order to reproduce the result I copied /usr/local/lib/python2.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py to my home directory and did the following.

[ryotat at cyprus ~]$ python
Python 2.6.6 (r266:84292, Nov 19 2010, 22:23:00)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test_lobpcg import *
>>> A,B = ElasticRod(100)
>>> compare_solutions(A,B,100)
>>> compare_solutions(A,B,80)
>>> compare_solutions(A,B,40)
>>> compare_solutions(A,B,30)
>>> compare_solutions(A,B,22)
>>> compare_solutions(A,B,21)
>>> compare_solutions(A,B,20)
Segmentation fault

So it seems to happen only around m=20. m=10 did not cause a segmentation fault but resulted in

AssertionError: Arrays are not almost equal

To see it in more detail, I tried

>>> A,B = ElasticRod(100)
>>> m = 20
>>> n = A.shape[0]
>>> numpy.random.seed(0)
>>> V = rand(n,m)
>>> X = linalg.orth(V)
>>> eigs,vecs = lobpcg(A, X, B=B, tol=1e-5, maxiter=30, verbosityLevel=10)
Solving generalized eigenvalue problem with preconditioning

matrix size 100
block size 20

No constraints

iteration 0
[ True True True True True True True True True True True True True
  True True True True True True True]
current block size: 20
eigenvalue: [ 1.785e+12  1.586e+12  1.356e+12  1.330e+12  1.212e+12  1.155e+12
  1.080e+12  9.149e+11  8.272e+11  8.229e+11  7.664e+11  6.941e+11
  6.769e+11  5.848e+11  5.553e+11  4.994e+11  4.283e+11  3.813e+11
  3.537e+11  1.058e+10]
residual norms: [ 7.223e+10  6.780e+10  7.145e+10  7.305e+10  6.290e+10
  7.085e+10  6.539e+10  5.466e+10  6.137e+10  5.374e+10  5.809e+10
  5.725e+10  5.375e+10  5.334e+10  5.052e+10  4.746e+10  4.176e+10
  3.650e+10  3.283e+10  6.905e+09]
Segmentation fault

Has anyone experienced something similar? Or could anyone suggest where I should look?

Thanks,
Ryota

From kwgoodman at gmail.com Mon Dec 27 15:04:04 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Mon, 27 Dec 2010 12:04:04 -0800
Subject: [SciPy-User] [ANN] Bottleneck 0.2
Message-ID:

Bottleneck is a collection of fast NumPy array functions written in Cython.

The second release of Bottleneck is faster, contains more functions, and supports more dtypes.

Faster:
- All functions faster (less overhead) when output is not a scalar
- Faster nanmean() for 2d, 3d arrays containing NaNs when axis is not None

New functions:
- nanargmin()
- nanargmax()
- nanmedian, 100X faster than SciPy's nanmedian for (100,100) input, axis=0

Enhancements:
- Added support for float32
- Fallback to slower, non-Cython functions for unaccelerated ndim/dtype
- Scipy is no longer a dependency
- Added support for older versions of NumPy (1.4.1)
- All functions are now templated for dtype and axis
- Added a sandbox for prototyping of new Bottleneck functions
- Rewrote benchmarking code

Breaks from 0.1.0:
- To run the benchmark use bn.bench() instead of bn.benchit()

download
    http://pypi.python.org/pypi/Bottleneck
docs
    http://berkeleyanalytics.com/bottleneck
code
    http://github.com/kwgoodman/bottleneck
mailing list
    http://groups.google.com/group/bottle-neck
mailing list 2
    http://mail.scipy.org/mailman/listinfo/scipy-user

Bottleneck comes with a benchmark suite that compares the performance of the bottleneck functions that have a NumPy/SciPy equivalent.
To run the benchmark:

>>> bn.bench(mode='fast')
Bottleneck performance benchmark
    Bottleneck 0.2.0
    Numpy (np) 1.5.1
    Scipy (sp) 0.8.0
    Speed is NumPy or SciPy time divided by Bottleneck time
    NaN means one-third NaNs; axis=0 and float64 are used
median vs np.median
    3.59  (10,10)
    2.43  (1001,1001)
    2.28  (1000,1000)
    2.16  (100,100)
nanmedian vs local copy of sp.stats.nanmedian
    102.72  (10,10)      NaN
    94.34   (10,10)
    67.89   (100,100)    NaN
    28.52   (100,100)
    6.37    (1000,1000)  NaN
    4.41    (1000,1000)
nanmax vs np.nanmax
    9.99  (100,100)    NaN
    6.12  (10,10)      NaN
    5.99  (10,10)
    5.88  (100,100)
    1.79  (1000,1000)  NaN
    1.76  (1000,1000)
nanmean vs local copy of sp.stats.nanmean
    25.95  (100,100)    NaN
    12.85  (100,100)
    12.26  (10,10)      NaN
    11.89  (10,10)
    5.15   (1000,1000)  NaN
    3.17   (1000,1000)
nanstd vs local copy of sp.stats.nanstd
    16.96  (100,100)    NaN
    15.75  (10,10)      NaN
    15.49  (10,10)
    9.51   (100,100)
    3.85   (1000,1000)  NaN
    2.82   (1000,1000)
nanargmax vs np.nanargmax
    8.60  (100,100)    NaN
    5.65  (10,10)      NaN
    5.62  (100,100)
    5.44  (10,10)
    2.84  (1000,1000)  NaN
    2.58  (1000,1000)
move_nanmean vs sp.ndimage.convolve1d based function
    window = 5
    19.52  (10,10)      NaN
    18.55  (10,10)
    10.56  (100,100)    NaN
    6.67   (100,100)
    5.19   (1000,1000)  NaN
    4.42   (1000,1000)

Under the hood Bottleneck uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in bn.nanmax(), for example, is in checking that the axis is within range, converting non-array data to an array, and selecting the function to use to calculate the maximum. You can get rid of the overhead by calling the underlying Cython function directly.

Benchmarks for the low-level Cython version of each function:

>>> bn.bench(mode='faster')
Bottleneck performance benchmark
    Bottleneck 0.2.0
    Numpy (np) 1.5.1
    Scipy (sp) 0.8.0
    Speed is NumPy or SciPy time divided by Bottleneck time
    NaN means one-third NaNs; axis=0 and float64 are used
median_selector vs np.median
    15.29  (10,10)
    14.19  (100,100)
    8.04   (1001,1001)
    7.32   (1000,1000)
nanmedian_selector vs local copy of sp.stats.nanmedian
    352.08  (10,10)      NaN
    340.27  (10,10)
    185.56  (100,100)    NaN
    138.81  (100,100)
    8.21    (1000,1000)
    8.09    (1000,1000)  NaN
nanmax_selector vs np.nanmax
    21.54  (10,10)      NaN
    19.98  (10,10)
    12.65  (100,100)    NaN
    6.82   (100,100)
    1.79   (1000,1000)  NaN
    1.76   (1000,1000)
nanmean_selector vs local copy of sp.stats.nanmean
    41.08  (10,10)      NaN
    39.05  (10,10)
    31.74  (100,100)    NaN
    15.24  (100,100)
    5.13   (1000,1000)  NaN
    3.16   (1000,1000)
nanstd_selector vs local copy of sp.stats.nanstd
    44.55  (10,10)      NaN
    43.49  (10,10)
    18.66  (100,100)    NaN
    10.29  (100,100)
    3.83   (1000,1000)  NaN
    2.82   (1000,1000)
nanargmax_selector vs np.nanargmax
    17.91  (10,10)      NaN
    17.00  (10,10)
    10.56  (100,100)    NaN
    6.50   (100,100)
    2.85   (1000,1000)  NaN
    2.59   (1000,1000)
move_nanmean_selector vs sp.ndimage.convolve1d based function
    window = 5
    55.96  (10,10)      NaN
    50.82  (10,10)
    11.77  (100,100)    NaN
    6.93   (100,100)
    5.56   (1000,1000)  NaN
    4.51   (1000,1000)

From dagss at student.matnat.uio.no Tue Dec 28 08:42:21 2010
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Tue, 28 Dec 2010 14:42:21 +0100
Subject: [SciPy-User] [ANN] Bottleneck 0.2
In-Reply-To:
References:
Message-ID: <4D19E93D.2090903@student.matnat.uio.no>

On 12/27/2010 09:04 PM, Keith Goodman wrote:
> Bottleneck is a collection of fast NumPy array functions written in Cython.
>
> The second release of Bottleneck is faster, contains more functions,
> and supports more dtypes.
>
Another special case for you if you want: It seems that you could add the case of "mode='c'" to the array declarations, in the case that the operation goes along the last axis and arr.flags.c_contiguous == True.

Dag Sverre

> [...]
From kwgoodman at gmail.com Tue Dec 28 11:57:07 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 28 Dec 2010 08:57:07 -0800
Subject: [SciPy-User] [ANN] Bottleneck 0.2
In-Reply-To: <4D19E93D.2090903@student.matnat.uio.no>
References: <4D19E93D.2090903@student.matnat.uio.no>
Message-ID:

On Tue, Dec 28, 2010 at 5:42 AM, Dag Sverre Seljebotn wrote:
> Another special case for you if you want: It seems that you could add
> the case of "mode='c'" to the array declarations, in the case that the
> operation goes along the last axis and arr.flags.c_contiguous == True.

Wow! That works great for large input arrays:

>> a = np.random.rand(1000,1000)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
1000 loops, best of 3: 1.52 ms per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
1000 loops, best of 3: 1.18 ms per loop

And for medium arrays:

>> a = np.random.rand(100,100)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
100000 loops, best of 3: 16.3 us per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
100000 loops, best of 3: 13.3 us per loop

But the overhead of checking for c contiguous slows things down for small arrays:

>> a = np.random.rand(10,10)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
1000000 loops, best of 3: 1.28 us per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
1000000 loops, best of 3: 1.55 us per loop
>> timeit a.flags.c_contiguous == True
1000000 loops, best of 3: 201 ns per loop
>> timeit a.flags.c_contiguous
10000000 loops, best of 3: 158 ns per loop

Plus I'd have to check if the axis is the last one.
That's a big speed up for hand coded functions and large input arrays. But I'm not sure how to take advantage of it for general use functions. One option is to provide the low level functions (like nanmean_2d_float64_ccontiguous_axis1) but not use them in the high-level function nanmean.

I tried using mode='c' when initializing the output array. But I did not see any speed difference, perhaps because the size of the output array is the square root of the input array size. So I tried it with a non-reducing function: move_nanmean. But I didn't see any speed difference. No idea why.

From kwgoodman at gmail.com Tue Dec 28 12:15:18 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 28 Dec 2010 09:15:18 -0800
Subject: [SciPy-User] [ANN] Bottleneck 0.2
In-Reply-To:
References: <4D19E93D.2090903@student.matnat.uio.no>
Message-ID:

On Tue, Dec 28, 2010 at 8:57 AM, Keith Goodman wrote:
> [...]
> I tried using mode='c' when initializing the output array. But I did
> not see any speed difference, perhaps because the size of the output
> array is the square root of the input array size. So I tried it with a
> non-reducing function: move_nanmean. But I didn't see any speed
> difference. No idea why.

Oh, I don't see a speed difference when I use mode='c' on the input array to move_nanmean. Could it be because the function is constantly switching at each step along the last axis between indexing into the input array and indexing into the output array, and in that case contiguous memory doesn't help?
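For the record, the kind of dispatch the high-level functions would need looks roughly like this (a sketch; only the 2d float64 ccontiguous kernel from the timings above exists, the rest of the dispatch is hypothetical):

import numpy as np
import bottleneck as bn

def nanmean_fast(arr, axis):
    # use the contiguous kernel only when the reduction runs along
    # the last axis of a C-contiguous 2d float64 array
    if (arr.ndim == 2 and arr.dtype == np.float64 and
            axis == 1 and arr.flags.c_contiguous):
        return bn.func.nanmean_2d_float64_ccontiguous_axis1(arr)
    # otherwise fall back to the regular high-level function
    return bn.nanmean(arr, axis=axis)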
From seb.haase at gmail.com Tue Dec 28 17:40:16 2010
From: seb.haase at gmail.com (Sebastian Haase)
Date: Tue, 28 Dec 2010 23:40:16 +0100
Subject: [SciPy-User] [ANN] Bottleneck 0.2
In-Reply-To:
References: <4D19E93D.2090903@student.matnat.uio.no>
Message-ID:

Congratulations! What do you mean by "templated functions" -- do you have a way of doing Cython template functions now?

- Sebastian

On Tue, Dec 28, 2010 at 6:15 PM, Keith Goodman wrote:
> [...]
From kwgoodman at gmail.com Tue Dec 28 17:56:27 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 28 Dec 2010 14:56:27 -0800
Subject: [SciPy-User] [ANN] Bottleneck 0.2
In-Reply-To:
References: <4D19E93D.2090903@student.matnat.uio.no>
Message-ID:

On Tue, Dec 28, 2010 at 2:40 PM, Sebastian Haase wrote:
> Congratulations! What do you mean by "templated functions" -- do you
> have a way of doing Cython template functions now?

Thank you! My hope is that others find it useful.

Nothing fancy (or of general use). For example, the nanmax function template:

https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/src/template/func/nanmax.py

is used to generate the nanmax pyx file:

https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/src/func/nanmax.pyx

The templating of the axis, for example, is done like this (from the looper function docstring):

Make a 3d loop template:

>>> loop = '''
.... for iINDEX0 in range(nINDEX0):
....     for iINDEX1 in range(nINDEX1):
....         amin = MAXDTYPE
....         for iINDEX2 in range(nINDEX2):
....             ai = a[INDEXALL]
....             if ai <= amin:
....                 amin = ai
....         y[INDEXPOP] = amin
.... '''

Import the looper function:

>>> from bottleneck.src.template.template import looper

Make a loop over axis=0:

>>> print looper(loop, ndim=3, axis=0)
for i1 in range(n1):
    for i2 in range(n2):
        amin = MAXDTYPE
        for i0 in range(n0):
            ai = a[i0, i1, i2]
            if ai <= amin:
                amin = ai
        y[i1, i2] = amin

Make a loop over axis=1:

>>> print looper(loop, ndim=3, axis=1)
for i0 in range(n0):
    for i2 in range(n2):
        amin = MAXDTYPE
        for i1 in range(n1):
            ai = a[i0, i1, i2]
            if ai <= amin:
                amin = ai
        y[i0, i2] = amin

From jsseabold at gmail.com Wed Dec 29 09:43:05 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Wed, 29 Dec 2010 09:43:05 -0500
Subject: [SciPy-User] runstest and distribution of run lengths
In-Reply-To:
References:
Message-ID:

On Fri, Dec 24, 2010 at 5:19 PM, wrote:
> Does anyone know the distribution of run lengths in a sequence of
> Bernoulli trials?
> [...]
> Does anyone have an idea or know a more easily accessible reference?

Only other one I've come across (via Stata):

Swed, F.S. and Eisenhart, C. 1943. "Tables for testing randomness of grouping in a sequence of alternatives." The Annals of Mathematical Statistics 14(1), 66-87.
http://scholar.google.com/scholar?cluster=3689844222480893877&hl=en&as_sdt=20000

Skipper
From yury at shurup.com Thu Dec 30 12:47:32 2010
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 30 Dec 2010 18:47:32 +0100
Subject: [SciPy-User] Sometimes fmin_l_bfgs_b tests NaN parameters and then fails to converge
Message-ID: <1293731252.6936.47.camel@mypride>

Dear Scipy experts,

I am implementing a proof of concept code for a bounded optimization problem using the Scipy optimization module, and particularly the fmin_l_bfgs_b bounded optimizer.

The problem is that my code runs fine on one machine, but on another one at some point the optimizer passes NaNs as parameter values and then after some time fails to converge normally (it terminates, but with an ABNORMAL_TERMINATION_IN_LNSRCH error message).

I am very confused about that and puzzled as to what extent I can trust the results if they are so system dependent. I would appreciate it if the developers could tell me whether this problem is reproducible and what the reason for its occurrence is. If I can provide any additional information which would be helpful in diagnosing the issue, please bear with me.

An extended description of my setup and machines and some test code to reproduce the problem are presented below:

---

I have two machines:

1) Development machine: Ubuntu Hardy / 32 bit
Linux mypride 2.6.24-28-generic #1 SMP Wed Nov 24 09:30:14 UTC 2010 i686 GNU/Linux

2) Test machine: Ubuntu Lucid / 64 bit
Linux davis 2.6.32-27-generic #49-Ubuntu SMP Thu Dec 2 00:51:09 UTC 2010 x86_64 GNU/Linux

The versions of the Python and Numpy/Scipy stack on both machines are identical.

I run ActiveState Python 2.7.1 installed in a virtualenv, where the latest versions of Numpy and Scipy are installed (using pip install numpy / pip install scipy; scipy needs a patch that I fished out of the svn in order to compile). The tests pass on both machines, with an exception for the nakagami distribution on (1) that I don't care about. The pre-requisites were automatically installed as follows:

$ sudo apt-get build-dep python-numpy python-scipy

The patch, test logs, test script and optimization logs are attached.

Please note that on machine (1) it does 5 extra steps in between, trying out NaN parameters for some reason. In this case the optimization converges, but that is not always true for my bigger problem. However, you can already see the problem of the extra iterations taking time, which can lead to trouble when the number of parameters is 100+.

I have literally compared the numbers that come out of my bigger optimization and they are slightly different, although the difference is in the 10th significant digit or so, which is why I didn't attribute any special meaning to it.

The test script actually implements the simplest function I could think of (it gives inf if x <= 0 and x + 1 if x > 0, so basically the optimizer has to get as close to zero as it can).
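In case the attachments get stripped, the test script is essentially this (a reconstruction; the starting point and bounds here are representative guesses, see the attached optimizer_test.py for the exact values):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    # inf for x <= 0, x + 1 otherwise, so the optimizer
    # has to get as close to zero as it can
    if x[0] <= 0:
        return np.inf
    return x[0] + 1.0

x, fval, info = fmin_l_bfgs_b(f, np.array([1.0]), approx_grad=True,
                              bounds=[(-1.0, 1.0)])
print x, fval, info['warnflag']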
One obvious difference between the machines is the bitness, but also, the environment on the 32-bit machine is older, so this could be a compiler or library problem. That's why I am seeking confirmation from other users of Scipy, possibly running on completely different sets of libraries and compilers.

Thanks!

--
Sincerely yours,
Yury V. Zaytsev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: scipy-0.8.0-python-2.7.patch
Type: text/x-patch
Size: 1214 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine-1-opt.log
Type: text/x-log
Size: 1199 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine-2-opt.log
Type: text/x-log
Size: 1034 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: optimizer_test.py
Type: text/x-python
Size: 328 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine-1-tests.log
Type: text/x-log
Size: 1744 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine-2-tests.log
Type: text/x-log
Size: 1013 bytes

From polish at dtgroup.com Thu Dec 30 14:15:09 2010
From: polish at dtgroup.com (Nathaniel Polish)
Date: Thu, 30 Dec 2010 14:15:09 -0500
Subject: [SciPy-User] setup question
Message-ID: <423192D815B879F4FE0658C5@MORNINGSIDE>

I am about to take the plunge into scipy/numpy/python. It seems that Python has recently split into 2.x and 3.x versions. Should I be going down the Python 3.x path or the 2.x path if I am primarily using it for scipy? Are the 2.x/3.x Python issues relevant to scipy?

I will be running on Ubuntu -- any caveats there?

Thanks.

From yury at shurup.com Thu Dec 30 14:56:28 2010
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 30 Dec 2010 20:56:28 +0100
Subject: [SciPy-User] setup question
In-Reply-To: <423192D815B879F4FE0658C5@MORNINGSIDE>
References: <423192D815B879F4FE0658C5@MORNINGSIDE>
Message-ID: <1293738988.6936.53.camel@mypride>

On Thu, 2010-12-30 at 14:15 -0500, Nathaniel Polish wrote:
> I am about to take the plunge into scipy/numpy/python. It seems that
> Python has recently split into 2.x and 3.x versions. Should I be going
> down the Python 3.x path or the 2.x path if I am primarily using it for scipy?
> Are the 2.x/3.x Python issues relevant to scipy?

If you are asking such a question, it means that you need Python 2.x.

To make a long story short, the libraries' transition to py3k is far from complete, and for now, if you don't really need py3k features as a matter of life and death, you'd better start with 2.x and migrate to py3k later on, when it, and you, are ready.

> I will be running on Ubuntu -- any caveats there?

Python 2.x, Numpy, Scipy and a whole lot of other things come pre-packaged with Ubuntu, so you will only need to install them using Synaptic. Not the latest versions, obviously, but you don't need the latest and greatest if you're just taking off. You will see when you need more recent versions as you gain more experience.

--
Sincerely yours,
Yury V. Zaytsev

From rob.clewley at gmail.com Thu Dec 30 19:26:35 2010
From: rob.clewley at gmail.com (Rob Clewley)
Date: Thu, 30 Dec 2010 19:26:35 -0500
Subject: [SciPy-User] speeding up integrate.odeint with weave/blitz
In-Reply-To:
References:
Message-ID:

On Wed, Dec 22, 2010 at 10:51 AM, Christian Schmuck wrote:
> Has anyone ever used the for macro successfully and could he or she
> give a short, working example? I'm still a newbie with python and
> pyDSTool but I've done some debugging with winpdb and I've got
> the sneaky feeling that there is a bug.

Thanks for pointing this out. This was a bug and it's now fixed on sourceforge.

-Rob

From pgarrone at optusnet.com.au Fri Dec 31 05:00:42 2010
From: pgarrone at optusnet.com.au (Peter John Garrone)
Date: Fri, 31 Dec 2010 21:00:42 +1100
Subject: [SciPy-User] VODE seems too slow for big systems.
Message-ID: <20101231100042.GA2828@bacchus>

Hi,

I am modelling water flow and erosion using a triangular mesh. (I am attempting to develop realistic terrain for a game scenario.)
I have developed a model that integrates an ODE using VODE from scipy.integrate, which seems to work the best. Using 4 states per point, the system works for models at a certain scale. I am able to differentiate the model and calculate the banded Jacobian accurately, which helps a lot.

To make it work faster and better, and looking to put it on Amazon to calculate models of finer resolution, I made my function evaluations threaded. However, for large models, using no threads, one thread, or two threads made little difference to the calculation rate. Indeed, when running threaded, the two threads that did the function evaluations, and were supposed to be occupying my dual-CPU system, were instead using only a small fraction of the CPU, and I infer that most of the time was lost in the VODE algorithm. The measurement was made with the ps utility on Linux.

Looking at the BDF algorithm that VODE employs, I would guess that it intrinsically scales linearly. However, as it uses a predictor-corrector step that employs Newton's method, which solves a linear system involving the Jacobian, I speculate that solving that system by direct factorization would be the element taking most of the time for models with tens of thousands of states. There are iterative solvers in scipy that might solve it much more quickly, but getting VODE to employ them might be a problem.

I wonder if anybody could point me to a better approach here.

Peter Garrone
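For context, my integration loop is essentially of this shape (a much simplified sketch: the real rhs and jac couple 4 states per mesh point, the bandwidths are larger, and the exact banded-storage convention expected by 'vode' should be checked against the docs):

import numpy as np
from scipy.integrate import ode

def rhs(t, y):
    # placeholder right-hand side; the real model couples
    # the states of neighbouring mesh points
    return -y

def jac(t, y):
    # user-supplied banded Jacobian, shape (lband + uband + 1, n);
    # here a trivial diagonal-only stand-in
    return -np.ones((1, y.size))

n = 40000  # e.g. 4 states per point on a 10000-point mesh
r = ode(rhs, jac)
r.set_integrator('vode', method='bdf', lband=0, uband=0, nsteps=5000)
r.set_initial_value(np.ones(n), 0.0)
while r.successful() and r.t < 1.0:
    r.integrate(1.0, step=True)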
From yury at shurup.com Fri Dec 31 13:41:02 2010
From: yury at shurup.com (Yury V. Zaytsev)
Date: Fri, 31 Dec 2010 19:41:02 +0100
Subject: [SciPy-User] setup question
In-Reply-To: <6C89A1D2DF0843937D90ACAE@[192.168.1.112]>
References: <6C89A1D2DF0843937D90ACAE@[192.168.1.112]>
Message-ID: <1293820862.6756.35.camel@mypride>

Hi!

On Fri, 2010-12-31 at 13:31 -0500, Nathaniel Polish wrote:
> As a computer scientist my urge is for the latest version but it seems that
> there are some big issues in the 2.x to 3.x path.

Numpy / Scipy are just not fully py3k-ready yet. Pretty close, but not yet. The situation with other libraries is similar. So if you are (1) just starting (2) right now (and not in a year or two), go for what is available in the Ubuntu repositories at the moment.

I personally use interpreters from ActiveState and custom builds of Numpy / Scipy in a virtualenv, but the reason for this is that I need to keep the environment consistent among the cluster, desktop and personal computers, and also that the bugs I'm hitting were only fixed recently. You'd better get to it when you realize that you need it yourself.

P.S. Please keep the conversation on list.

P.P.S. Happy New Year to all the readers!

--
Sincerely yours,
Yury V. Zaytsev

From josef.pktd at gmail.com Fri Dec 31 16:35:01 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 31 Dec 2010 16:35:01 -0500
Subject: [SciPy-User] Sometimes fmin_l_bfgs_b tests NaN parameters and then fails to converge
In-Reply-To: <1293731252.6936.47.camel@mypride>
References: <1293731252.6936.47.camel@mypride>
Message-ID:

2010/12/30 Yury V. Zaytsev :
> Dear Scipy experts,
>
> I am implementing a proof of concept code for a bounded optimization
> problem using the Scipy optimization module, and particularly the
> fmin_l_bfgs_b bounded optimizer.
> [...]
> The test script actually implements the simplest function I could think
> of (it gives inf if x <= 0 and x + 1 if x > 0, so basically the
> optimizer has to get as close to zero as it can).

I get, on Windows 32:

(array([ 3.03873549e-08]), array([ 1.00000003]), {'warnflag': 0, 'task': 'CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH', 'grad': array([ 0.99999999]), 'funcalls': 16})

But your function has a discontinuity, and I wouldn't expect a bfgs method to produce anything useful, since the method assumes smoothness, as far as I know.

Josef
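(For comparison, on a smooth bounded problem I would expect no NaN probing at all; an untested sketch with an arbitrary quadratic:)

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    return (x[0] - 2.0)**2 + 1.0   # smooth, minimum at x = 2

x, fval, info = fmin_l_bfgs_b(f, np.array([10.0]), approx_grad=True,
                              bounds=[(0.0, 5.0)])
print x, fval, info['warnflag']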
From yury at shurup.com Fri Dec 31 19:39:35 2010
From: yury at shurup.com (Yury V. Zaytsev)
Date: Sat, 01 Jan 2011 01:39:35 +0100
Subject: [SciPy-User] Sometimes fmin_l_bfgs_b tests NaN parameters and then fails to converge
In-Reply-To:
References: <1293731252.6936.47.camel@mypride>
Message-ID: <1293842375.17929.3.camel@mypride>

On Fri, 2010-12-31 at 16:35 -0500, josef.pktd at gmail.com wrote:
> But your function has a discontinuity, and I wouldn't expect a bfgs
> method to produce anything useful, since the method assumes smoothness,
> as far as I know.

You are perfectly right about the discontinuity, but that was not the point. I was rather interested in whether anyone else is seeing the optimizer trying out NaNs as function parameters, as in my case... I have this problem with a completely different (smooth and differentiable) function; the test script is just something I came up with, without thinking too much, to illustrate the problem.

Happy New Year!

--
Sincerely yours,
Yury V. Zaytsev

From bioinformed at gmail.com Thu Dec 23 23:34:07 2010
From: bioinformed at gmail.com (Kevin Jacobs)
Date: Thu, 23 Dec 2010 23:34:07 -0500
Subject: [SciPy-User] [Numpy-discussion] ANN: carray 0.3 released
In-Reply-To: <201012221958.41105.faltet@pytables.org>
References: <201012221958.41105.faltet@pytables.org>
Message-ID:

On Wed, Dec 22, 2010 at 1:58 PM, Francesc Alted wrote:
> >>> %time b = ca.zeros(1e12)
> CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s
> Wall time: 55.23 s

I know this is somewhat missing the point of your demonstration, but 55 seconds to create an empty 3 GB data structure to represent a multi-TB dense array doesn't seem all that fast to me. Compression can do a lot of things, but isn't this a case where a true sparse data structure would be the right tool for the job?

I'm more interested in seeing what a carray can do with census data, web logs, or something vaguely real-world where direct binary representations are used by default and assumed to be reasonably optimal (i.e., anything sensibly stored in sqlite tables).

-Kevin

From willardmaier at gmail.com Wed Dec 22 19:20:27 2010
From: willardmaier at gmail.com (Bill Maier)
Date: Thu, 23 Dec 2010 00:20:27 -0000
Subject: [SciPy-User] Building scipy on Linux
References: AANLkTin2H6JyD6GzC_wcCVFUimSX1VQ2bpM8hHh8aWCJ@mail.gmail.com
Message-ID: <4D1295C6.8000309@gmail.com>

Thanks for the reply, Charles. I do use the pre-built Python/numpy/scipy on my system, and it works fine. However I'm trying to get set up to do development work on scipy and numpy.

Bill

From willardmaier at gmail.com Wed Dec 22 20:47:43 2010
From: willardmaier at gmail.com (Bill Maier)
Date: Thu, 23 Dec 2010 01:47:43 -0000
Subject: [SciPy-User] Building scipy on Linux
References: AANLkTin2H6JyD6GzC_wcCVFUimSX1VQ2bpM8hHh8aWCJ@mail.gmail.com
Message-ID: <4D12AA38.8010006@gmail.com>

I do use the prebuilt binaries on my machine, and they work fine. However right now I'm trying to get set up to do development work.
From gagneja2000 at gmail.com Wed Dec 29 13:29:52 2010
From: gagneja2000 at gmail.com (aashish gagneja)
Date: Wed, 29 Dec 2010 10:29:52 -0800 (PST)
Subject: [SciPy-User] genetic algorithm
Message-ID: <177c69c0-2755-4f84-a981-a860bd0bb6a0@v17g2000prc.googlegroups.com>

Hi,

I need your kind attention and help with my problem. I have to design an FIR filter using a genetic algorithm in Matlab. Kindly help me to design such a filter using the GA toolbox, or by Matlab or C++ programming. In my proposed filter, power consumption is reduced as the Hamming distance (signal toggling between bits) is reduced, and the mean square error is also reduced, so the overall fitness function is

F = w1*fM + w2*fH

where fM and fH are the fitness contributions due to the mean square error and the Hamming-distance error, and w1, w2 are weights such that w1 + w2 = 1. Also,

F = 1/(1 + Etot)

where Etot is the total error.

Kindly guide me on how to use the GADS toolbox for this, i.e. the objective function, linear constraints, etc.

Regds,
gagneja2000 at gmail.com
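For what it's worth, the combined fitness described above can be prototyped in a few lines of Python before moving to a GA toolbox. A sketch that combines the two expressions by weighting the errors and then mapping to a fitness; the integer encoding of the coefficients and the default weights are assumptions:

import numpy as np

def fitness(h, h_desired, w1=0.5, w2=0.5):
    # h, h_desired: FIR coefficients quantized to non-negative integer words
    h = np.asarray(h)
    h_desired = np.asarray(h_desired)
    e_mse = np.mean((h - h_desired)**2)
    # Hamming distance between successive coefficient words,
    # a proxy for signal toggling between bits
    e_ham = sum(bin(int(a) ^ int(b)).count('1')
                for a, b in zip(h[:-1], h[1:]))
    e_tot = w1 * e_mse + w2 * e_ham    # weighted total error, w1 + w2 = 1
    return 1.0 / (1.0 + e_tot)         # F = 1/(1 + Etot)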