Is an integer data type the obvious default for 'empty'? I expected float. Thanks, Alan Isaac using scipy core version 0.8.4
Can one specify a bandwidth for the kernel in scipy.stats.gaussian_kde? What is the default bandwidth? I checked the source code, but it was too obscure for me. If it is fixed, what is the reasoning? Don't I *want* to be able to adjust it? -gary
On 12/30/05, Gary <pajer@iname.com> wrote:
Can one specify a bandwidth for the kernel in scipy.stats.gaussian_kde?
What is the default bandwidth? I checked the source code, but it was too obscure for me.
If it is fixed, what is the reasoning? Don't I *want* to be able to adjust it?
The default uses Scott's Rule to calculate an "optimal" bandwidth (minimum asymptotic mean integrated square error for Gaussian true densities, I believe). You can change the method that calculates the bandwidth by overriding the method covariance_factor.

The module was written for an application for which this was sufficient, and then it was contributed to scipy. There is certainly room for making it more sophisticated. There are endless numbers of ways to do bandwidth selection. This is frequently a bad thing.

Here's the TODO list for kde.py:

* Split out univariate from multivariate; there are some approaches that are much easier (or simply possible) for univariate KDE than multivariate.

* Provide more ways to select a bandwidth, including k-nearest neighbors (univariate only).

* Add more kernels besides Gaussians.

I probably won't be getting to all of these, so contributions are welcome.

-- 
Robert Kern
robert.kern@gmail.com
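To make the override Robert mentions concrete, here is a minimal stdlib-only sketch, not scipy's code: a one-dimensional Gaussian KDE whose kernel width is `covariance_factor() * stdev(data)`, with the factor defaulting to Scott's rule, n**(-1/(d+4)) for d dimensions (d = 1 here). The classes `GaussianKDE1D` and `FixedFactorKDE` are hypothetical names chosen for illustration; the real `gaussian_kde` scales the full data covariance matrix in d dimensions.

```python
import math
import statistics


class GaussianKDE1D:
    """Hypothetical 1-D Gaussian KDE; not scipy's gaussian_kde."""

    def __init__(self, data):
        self.data = [float(x) for x in data]
        self.n = len(self.data)
        if self.n < 2:
            raise ValueError("need at least two data points")
        # Kernel width = bandwidth factor * sample standard deviation (n - 1 divisor).
        self.bandwidth = self.covariance_factor() * statistics.stdev(self.data)
        if self.bandwidth <= 0:
            raise ValueError("data have zero spread or the factor is not positive")

    def covariance_factor(self):
        # Scott's rule in d dimensions is n ** (-1 / (d + 4)); here d = 1.
        return self.n ** (-1.0 / 5.0)

    def evaluate(self, x):
        # Average of Gaussian kernels of width `bandwidth` centred on the data points.
        h = self.bandwidth
        norm = 1.0 / (self.n * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in self.data)


class FixedFactorKDE(GaussianKDE1D):
    """Overrides covariance_factor to use a caller-chosen factor."""

    def __init__(self, data, factor):
        self._factor = factor  # set first: the base __init__ calls covariance_factor()
        super().__init__(data)

    def covariance_factor(self):
        return self._factor


data = [0.0, 1.0, 2.0, 3.0]
scott = GaussianKDE1D(data)
narrow = FixedFactorKDE(data, 0.5)
print(scott.bandwidth, narrow.bandwidth)
```

Setting `_factor` before calling the base constructor matters because the base `__init__` asks for `covariance_factor()` while computing the bandwidth.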
WinXP, scipy_core version 0.9.0.1713

I tried to call scipy.histogram with the default bins=10; see below. It works OK if I make my own bins.

========================================
In [288]: f
Out[288]:
array([  46.,   59.,   77.,   87.,   50.,   97.,   84.,   73.,  100.,
         34.,   86.,   67.,   68.,  100.,   74.,   81.,   94.,   66.,
         52.,   66.,   69.,   54.,   85.,   97.,   31.,   49.])

In [289]: scipy.histogram(f)
---------------------------------------------------------------------------
exceptions.TypeError                      Traceback (most recent call last)

c:\python24\Lib\site-packages\scipy\base\function_base.py in histogram(a, bins, range, normed)
    116
    117     n = a.sort().searchsorted(bins)
--> 118     n = concatenate([n, [len(a)]])
    119     n = n[1:]-n[:-1]
    120

TypeError: len() of unsized object
I encountered the same error. It seems that the variable "a" is used both to hold the data and to generate the bins. I changed the name of the variable "a" to something else in line 115 of function_base.py, and that solved the problem.

However, the resulting bins are unbalanced at the borders. Example:

In [17]: histogram(linspace(-1,1,100))
Out[17]:
(array([11, 11, 11, 11, 11, 11, 11, 11, 11,  1]),
 array([-1.        , -0.77777778, -0.55555556, -0.33333333, -0.11111111,
         0.11111111,  0.33333333,  0.55555556,  0.77777778,  1.        ]))

Each bin should have 10 counts; instead the first nine have 11 each, leaving the last bin with only 1. I will check the code to see if I can suggest a fix.

Hugo Gamboa
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.net http://www.scipy.net/mailman/listinfo/scipy-user
On Fri, 30 Dec 2005, Gary apparently wrote:
In [288]: f Out[288]: array([ 46., 59., 77., 87., 50., 97., 84., 73., 100., 34., 86., 67., 68., 100., 74., 81., 94., 66., 52., 66., 69., 54., 85., 97., 31., 49.]) In [289]: scipy.histogram(f) --------------------------------------------------------------------------- exceptions.TypeError
It is a scoping problem: in Python 2 the loop variable of a list comprehension leaks into the enclosing function, so `[a+0.0 for a in range]` rebinds `a` from the data array to the last number in `range`. (See the comments below.)

This also reminded me of a question: should a.sort() violate Pythonic expectations by returning a?

Alan Isaac

PS: A possible rewrite of `histogram` is offered at the end. It fixes this problem, eliminates the use of the built-in name `range`, and sets endpoint=False in the linspace call.

################ The Current Def with Problem Highlighted ###############
def histogram(a, bins=10, range=None, normed=False):
    a = asarray(a).ravel()              #<- here `a` is an array
    if not iterable(bins):
        if range is None:
            range = (a.min(), a.max())
        mn, mx = [a+0.0 for a in range]  #<- now `a` is a number!
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = linspace(mn, mx, bins)
    n = a.sort().searchsorted(bins)     #<- not caught here because type(a) is int32_arrtype!!
    n = concatenate([n, [len(a)]])
    n = n[1:]-n[:-1]
    if normed:
        db = bins[1] - bins[0]
        return 1.0/(a.size*db) * n, bins
    else:
        return n, bins

################ Proposed Rewrite of histogram ################
def histogram(a, bins=10, minmax=None, normed=False, copy=True):
    '''Returns `n`,`bins` as arrays, where
    `n` contains the number of items in each bin, and
    `bins` contains the bin cutoffs (cutoff<=value)
    '''
    a = array(a,copy=copy).ravel()
    if not iterable(bins):
        if minmax is None:
            minmax = (a.min(), a.max())
        mn, mx = [mi+0.0 for mi in minmax]
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = linspace(mn, mx, bins, endpoint=False)
    n = a.sort().searchsorted(bins)
    n = concatenate([n, [len(a)]])
    n = n[1:]-n[:-1]
    if normed:
        db = bins[1] - bins[0]
        return 1.0/(a.size*db) * n, bins
    else:
        return n, bins
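The binning logic of the proposed rewrite can be sketched in plain Python, assuming only the standard library: sort a copy once, find each left bin edge with `bisect.bisect_left` (a left-side search, like the default `searchsorted`), and take differences. The helper names `equal_width_edges`, `histogram_counts`, and `histogram` are hypothetical. The `endpoint` flag reproduces the imbalance Hugo reported: when the maximum is itself an edge, only bins - 1 real intervals remain and the last bin holds just the top value. The integers 0..99 stand in for Hugo's `linspace(-1, 1, 100)` so that the edges are exact in floating point.

```python
from bisect import bisect_left


def equal_width_edges(mn, mx, bins, endpoint=False):
    """Equally spaced edges starting at mn, like linspace(mn, mx, bins, endpoint=...)."""
    mn, mx = float(mn), float(mx)
    if mn == mx:  # widen a degenerate range, as the original code does
        mn -= 0.5
        mx += 0.5
    divisions = bins - 1 if endpoint else bins
    step = (mx - mn) / divisions if divisions else 0.0
    return [mn + i * step for i in range(bins)]


def histogram_counts(values, edges):
    """Count edges[i] <= v < edges[i+1]; the last bin is open above.

    Values below edges[0] are not counted, matching the n[1:] - n[:-1] step.
    """
    ordered = sorted(values)  # sorted copy; the caller's data is left alone
    positions = [bisect_left(ordered, e) for e in edges]  # like searchsorted
    positions.append(len(ordered))
    return [hi - lo for lo, hi in zip(positions, positions[1:])]


def histogram(values, bins=10, endpoint=False):
    values = list(values)
    edges = equal_width_edges(min(values), max(values), bins, endpoint)
    return histogram_counts(values, edges), edges


points = range(100)  # 100 evenly spaced values
print(histogram(points, endpoint=True)[0])   # [11, 11, 11, 11, 11, 11, 11, 11, 11, 1]
print(histogram(points, endpoint=False)[0])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Using left edges only (endpoint=False) is what gives every bin the same width; because the last bin is open above, the maximum value is still counted.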
On Fri, 30 Dec 2005, Alan G Isaac apparently wrote:
should a.sort() violate Pythonic expectations by returning a?
Rephrasing: Currently scipy core does not follow the Python or numarray convention of having the sort method sort the array in place and return None. Instead it returns a sorted copy of the old array, and leaves the old array untouched. This seems like a sure "gotcha". Perhaps it would be better to follow the Python and numarray convention for the `sort` method and have a new sorted() method handle the current behavior. fwiw, Alan Isaac
Alan G Isaac wrote:
On Fri, 30 Dec 2005, Alan G Isaac apparently wrote:
should a.sort() violate Pythonic expectations by returning a?
Rephrasing:
Currently scipy core does not follow the Python or numarray convention of having the sort method sort the array in place and return None. Instead it returns a sorted copy of the old array, and leaves the old array untouched.
This makes sense. The reason for the current behavior is that it was Numeric's behavior -- but sort was a function. So, clearly the sort function could still return a copy while the sort method does the in-place sort. I had forgotten that sort was not a method in Numeric. I think this is the right thing to do. I don't think we need another .sorted method either: sort(a) will return the sorted copy just as it did in Numeric. I'll do this as part of the change of inlining the sorting methods. -Travis
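Python's built-in list already follows the split Travis describes: the `sort` method works in place and returns `None`, while the `sorted` built-in returns a new list. This sketch uses plain lists, not scipy arrays, to show the behavior and why chaining a call onto the in-place method breaks.

```python
data = [3, 1, 2]

sorted_copy = sorted(data)  # function form: returns a new list
print(sorted_copy, data)    # [1, 2, 3] [3, 1, 2]

result = data.sort()        # method form: sorts in place
print(result, data)         # None [1, 2, 3]

# Chaining onto the in-place method fails, because it returns None.
try:
    [3, 1, 2].sort().index(2)
except AttributeError as exc:
    print("chaining failed:", exc)
```

Under the in-place convention, the `a.sort().searchsorted(bins)` idiom in `histogram` would need the function form instead, `sort(a).searchsorted(bins)`.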
On Fri, 30 Dec 2005, Travis Oliphant apparently wrote:
So, clearly the sort function could still return a copy while the sort method does the in-place sort. I had forgotten that sort was not a method in Numeric. ... I'll do this as part of the change of inlining the sorting methods.
Great! I think this is important because of Python's behavior. Making the function name `sort` instead of `sorted` also leaves Python's built-in usable even if the scipy namespace is imported, which might be useful. I will also mention another case: ravel. I do not have an opinion on this one. But a parallel makes this seem worth mentioning: Numeric only had the function, while (in conflict with the current ndarray) the numarray ravel method worked in place and returned None. Cheers, Alan Isaac
Alan G Isaac wrote:
Is an integer data type the obvious default for 'empty'? I expected float.
This question comes up occasionally. The reason for int is largely historical --- that's what was decided long ago when Numeric came out. Changing this in some places would break a lot of code, I'm afraid. And the default for empty is done for consistency. I felt it better to have one default rather than many. The default can be changed in one place in the C-code if we did decide to change it. Now's the time because version 1.0 is approaching in the next couple of months. Version 0.9 will be the first-of-the-year release. -Travis
Hi,

I just installed the latest svn versions.

1) Trying to look at the docs gives me this:

######################################################################
In [19]: ?scipy
Type:           module
Base Class:     <type 'module'>
String Form:    <module 'scipy' from '/usr/lib/python2.3/site-packages/scipy/__init__.pyc'>
Namespace:      Interactive
File:           /usr/lib/python2.3/site-packages/scipy/__init__.py
Docstring:
    SciPy Core
    ==========

    You can support the development of SciPy by purchasing documentation
    at http://www.trelgol.com

    It is being distributed for a fee for a limited time to try and raise
    money for development.

    Documentation is also available in the docstrings.

    Available subpackages
    ---------------------

    SciPy: A scientific computing package for Python
    ================================================

    Available subpackages
    ---------------------

In [20]:
######################################################################

... no info about the scipy subpackages (of course, `help(scipy)` has some info).

2) I think the docstrings of corefft and corelinalg should contain a few words explaining why they exist. New scipy users would wonder:

* why there are both, e.g., linalg.inv and corelinalg.inv, and why they are the same function from .../scipy/linalg/basic.py; the same goes for corefft.fft and fftpack.fft and so on
* what the difference between corelinalg.det and corelinalg.determinant is (there is none, as a look in .../scipy/corelinalg/linalg.py reveals), and why there are two names for the same function
* ....

Is it possible to move all the additional functionality of corefft and corelinalg into linalg and fftpack (e.g. corelinalg.eigh, which seems to have no equivalent function in linalg)?

cheers,
steve

-- 
"People like Blood Sausage too. People are Morons!" -- Phil Connors, Groundhog Day
Steve Schmerler wrote:
Available subpackages ---------------------
SciPy: A scientific computing package for Python ================================================
Available subpackages ---------------------
In [20]: #####################################################################################################################################
... no infos about scipy subpackages (of course `help(scipy)` has some infos)
These subpackage docs were at one time auto-generated. I'm not sure what the status of that is right now. It could be that the info.py file is wrong in the sub-packages.
2) I think the doc strings of corefft and corelinalg should contain some words explaining their existence. New scipy users would wonder
Good idea.
Is it possible to move all additional functionality of corefft and corelinalg to linalg and fftpack (e.g corelinalg.eigh which seems to have no equivalent function in linalg).
Should be possible. The linalg approach uses f2py blas wrappers. I'm sure there is already a wrapper around the underlying function that handles eigh. Lots of little things to do, so dive on in :-) Thanks for the feedback and suggestions. -Travis
On Fri, 30 Dec 2005, Travis Oliphant apparently wrote:
The default can be changed in one place in the C-code if we did decide to change it. Now's the time because version 1.0 is approaching in the next couple of months.
Once discovered, it does not matter much, of course. My only argument for the change is the effect on prospective users, as opposed to the existing base of knowledgeable users: most, I suspect, will come from environments where every number-containing object is filled with floats by default. (I think GAUSS and Matlab work this way.) Any ordinary user who creates an array of zeros (or empty) and then assigns floats to a few elements is likely to be surprised by the outcome, I believe. Cheers, Alan Isaac
Seeing as this question comes up fairly often, I've made a FAQ for it in the new wiki: http://new.scipy.org/Wiki/FAQ#head-c366238b249beadfb51fc716bb440a6ad527dba9 Please edit (as is the wiki way) as you see fit.
On 30/12/2005, at 7:14 PM, Travis Oliphant wrote:
Alan G Isaac wrote:
Is an integer data type the obvious default for 'empty'? I expected float.
This question comes up occasionally. The reason for int is largely historical --- that's what was decided long ago when Numeric came out. Changing this in some places would break a lot of code, I'm afraid. And the default for empty is done for consistency. I felt it better to have one default rather than many.
The default can be changed in one place in the C-code if we did decide to change it. Now's the time because version 1.0 is approaching in the next couple of months. Version 0.9 will be the first-of-the-year release.
-Travis

+5 on changing the default to float. I think we'd look back on this decision in several years as difficult but right.

Here are some ideas on how we could ease the transition:

(1) We could provide new functions intzeros(), intones(), and intempty() with the same behaviour as the current functions. That is, integer types would be the default, but this could be overridden by a dtype keyword argument. Then converting any old Numeric / numarray code would just require another global string substitution in convertcode.py.

(2) We could provide two sets of functions, intzeros() etc. and floatzeros() etc., and remove the default interpretation altogether from the standard zeros() functions. This is not ideal long term, but could be a useful temporary measure during a transition for shaking out bugs from the scicore and scipy trees.

(3) The default type could be chosen by the user as a package-level global variable. I think this would be the best solution. Then the old integer default could be turned on with one line of Python code. I suppose that Python functions using this default, given the static evaluation of default argument values, would need the "dtype=None" idiom in function headers followed by dtype=global_dtype in the function body.

-- Ed
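Ed's option (3) relies on a Python detail: default argument values are evaluated once, when the `def` statement runs, so `dtype=DEFAULT_DTYPE` in a signature would freeze whatever the global held at import time. A minimal sketch of the `dtype=None` idiom, with hypothetical names (`DEFAULT_DTYPE`, `zeros_frozen`, `zeros`) and Python's `float`/`int` standing in for array dtypes:

```python
DEFAULT_DTYPE = float  # hypothetical package-level default


def zeros_frozen(n, dtype=DEFAULT_DTYPE):
    # The default was bound when this def ran; later changes to the global are ignored.
    return [dtype(0)] * n


def zeros(n, dtype=None):
    # The "dtype=None" idiom: look the global up at call time instead.
    if dtype is None:
        dtype = DEFAULT_DTYPE
    return [dtype(0)] * n


DEFAULT_DTYPE = int  # one line of Python restores the old integer default

print(zeros_frozen(3))  # [0.0, 0.0, 0.0]
print(zeros(3))         # [0, 0, 0]
print(zeros(3, float))  # [0.0, 0.0, 0.0]
```

An explicit `dtype` argument still overrides whichever global default is in effect.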
participants (8)

- Alan G Isaac
- Andrew Straw
- Ed Schofield
- Gary
- Hugo Gamboa
- Robert Kern
- Steve Schmerler
- Travis Oliphant