Dear All, I am going through the Scipy manual and I am trying to reproduce, as an exercise to learn Python and Scipy, some nonlinear least-square optimization which I was able to carry out using another language. I am collecting some questions which hopefully will help me understand Python a bit better: (1) What is the difference between from pylab import * and import pylab? (2) the 2nd question may be a non-problem: in running some of the examples in the tutorial by Olinfant, I could not use the xplt module, no matter whether trying import scipy.xplt, from scipy.xplt import * et similia. I bumped into http://lists.debian.org/debian-science/2007/01/msg00007.html which may be the answer (I am running Debian on my box as well and I installed Scipy from the Debian repository). Can I install xplt by itself? BTW, I also visited http://www.scipy.org/Cookbook/xplt but even using from scipy.sandbox import xplt does not help. (3) Is there a way to have arrays starting with index 1 rather than zero in Python? As you can guess, I do not have a strong C background. (4) This is the main question: I am trying to fit some experimental data to a log-normal curve. I would like to follow the same steps as in the tutorial, but something seems to be going wrong. I cut and paste the code I am using and attach a .csv file so that one can reproduce step by step my work. (5) Finally, if I load scipy and then write, e.g., z=10.3, how is z handled? Is it a floating point number? What if, for instance, I need to have a very large number of significant digits because they do matter for some computation I want to run? Can I have the equivalent of format long [Matlab statement], so that every non-integer number is by default treated with a certain precision? Here is the code: #! /usr/bin/env python from pylab import plot, show, ylim, yticks from scipy import * import pylab # now I want to try reading some .csv file data = pylab.load("120km-TPM.csv",delimiter=',') vecdim=shape(data) # now I introduce a vector with the dimensions of data, the file I read print "the dimensions of data are" print vecdim # now very careful! in Python the arrays start with index zero. diam=data[0:vecdim[0],0] # it means: slice the rows from the 1st one (0) to the last one ( # (vecdim[0]) for the first column (0) print "the dimensions of diam are" print shape(diam) #plot(diam,data[:,1]) #show() # uncomment them to plot a distribution # 1st problem: if uncomment the previous two lines, I get a warning and until I close # the window, the script does not progress. # now I try performing a least-square fitting from scipy.optimize import leastsq x=diam # just a list of diameters y_meas=data[0:vecdim[0],1] # measured data, for example the 2nd column of the .csv file def residuals(p, y, x): A1,mu1,myvar1 = p err = y-log(10.0)*A1/sqrt(2.0*pi)/log(myvar1)*exp(-((log(x/mu1))**2.0)/2.0/log(myvar1)/log(myvar1)) return err def peval(x, p): return log(10.0)*p[0]/sqrt(2.0*pi)/log(p[2])*exp(-((log(x/p[1]))**2.0)/2.0/log(p[2])/log(p[2])) p0 = [50000.0,90.0, 1.59] print array(p0) # now I try actually solving the problem print "ok up to here" plsq = leastsq(residuals, p0, args=(y_meas, x)) print "ok up to here2" print plsq[0] print array([A, k, theta]) print "So far so good" which produces this output on my box: $ ./read-and-plot.py the dimensions of data are (104, 10) the dimensions of diam are (104,) [ 5.00000000e+04 9.00000000e+01 1.59000000e+00] ok up to here TypeError: array cannot be safely cast to required type Traceback (most recent call last): File "./read-and-plot.py", line 51, in ? plsq = leastsq(residuals, p0, args=(y_meas, x)) File "/usr/lib/python2.4/site-packages/scipy/optimize/minpack.py", line 266, in leastsq retval = _minpack._lmdif(func,x0,args,full_output,ftol,xtol,gtol,maxfev,epsfcn,factor,diag) minpack.error: Result from function call is not a proper array of floats. Many thanks Lorenzo
I have a cloud of points (for instance given as a (n,3) shaped array, with columns formed by the x, y and z column vectors). I would like to find the mean distance in this cloud of points. I do not need an exact value, I am just interested in a typical distance. I could do it in a brute force way: ++++++++++++++++++++++++++++++++++++++++++ from scipy import * x = arange(1, 5) points = c_[x, x, x] diffs = abs(points[newaxis, :] - points[:, newaxis]) dists = sqrt(diffs[..., 0]**2 + diffs[..., 1]**2 + diffs[..., 2]**2).ravel() dists = dists[dists>0] mean(dists) ++++++++++++++++++++++++++++++++++++++++++ This is actually not as ugly and slow as I originaly thought. Are there any better ways of doing this ? Thanks, Gaël
On 24/02/07, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
I have a cloud of points (for instance given as a (n,3) shaped array, with columns formed by the x, y and z column vectors).
I would like to find the mean distance in this cloud of points. I do not need an exact value, I am just interested in a typical distance.
This is actually quite tricky, depending on what you mean by a "typical" distance - distances can have all sorts of distributions. Imagine for example a cloud that is actually two small clouds a long way apart, or a cloud with a few very distant outliers or a Julia set (for which the distance behaves like a power law whose exponent is related to the fractal dimension)... well, you get the point.
I could do it in a brute force way:
This can be tidied slightly:
++++++++++++++++++++++++++++++++++++++++++ from scipy import * x = arange(1, 5)
points = c_[x, x, x] diffs = abs(points[newaxis, :] - points[:, newaxis]) There's no need for an absolute value here. dists = sqrt(diffs[..., 0]**2 + diffs[..., 1]**2 + diffs[..., 2]**2).ravel() sqrt(sum(diffs**2,axis=2)).ravel() will do the same. dists = dists[dists>0] mean(dists) ++++++++++++++++++++++++++++++++++++++++++
Are there any better ways of doing this ?
Well, depending what you want from "typical distance" the median might do a better job (or not). Or you might be satisfied with a random sample of 100 points (say): p = points[random.randint(shape(points)[0],size=100)] and then use the above procedure. Alternatively, if you're willing to be crude: lwh = ptp(points,axis=0) # size of the bounding box d = sqrt(sum((lwh/2)**2)) I end up using sqrt(sum(X**2,axis=Y)) rather often, I wonder if there's a tidy idiom for it? It's the L2 norm, after all... Anne
Interesting remarks. You forced me to think a bit more about what I was trying to achieve. What I am trying to do is to find out the right size to use for symbols when used on a 3D cloud of points. I am not sure what the right "typical distance" should be used. If those symbols are arrows then it seems that should be smaller than the typical inter-point distance. I have in mind something like this: if you have n points, find out the distribution of distances, divide it by n**3, then take the value at 0.2 from the smallest. I am having diffculties expressing my point, but the idea would be to consider that the typical distribution will increase as n**3 (which is not obvious, for instance if the points are along a plane) and take the lower tail of the distribution, as we are interested in having symbols smaller than the inter-point distance. Taking not the smallest value, but the value at "20%" from the bottom helps getting rid of singular situations where a few points are very close but the major part is spread out. The problem is that the "good" solution does depend on the problem, and there will never be a one size fits all solution. I am interested in other suggestions. Cheers, Gaël
Correcting a stupid mistake. It bothered me to leave it, sorry for the noise. Gaël On Sun, Feb 25, 2007 at 01:03:03AM +0100, Gael Varoquaux wrote:
Interesting remarks. You forced me to think a bit more about what I was trying to achieve.
What I am trying to do is to find out the right size to use for symbols when used on a 3D cloud of points. I am not sure what the right "typical distance" should be used. If those symbols are arrows then it seems that should be smaller than the typical inter-point distance. I have in mind something like this:
if you have n points, find out the distribution of distances, divide it by n**3, then take the value at 0.2 from the smallest.
I am having diffculties expressing my point, but the idea would be to consider that the typical distribution will increase as n**3 (which is n**(1/3.) not obvious, for instance if the points are along a plane) and take the lower tail of the distribution, as we are interested in having symbols smaller than the inter-point distance. Taking not the smallest value, but the value at "20%" from the bottom helps getting rid of singular situations where a few points are very close but the major part is spread out.
participants (3)
-
Anne Archibald -
Gael Varoquaux -
Lorenzo Isella