[SciPy-user] organizing data

Xavier Barthelemy xavier.barthelemy at cmla.ens-cachan.fr
Wed Nov 21 11:24:10 EST 2007


Gael Varoquaux wrote:
> Let's keep this on the mailing list, or else, we can switch to French
> (not that I mind English).

Sorry, I replied without looking at the address.
> 
> On Wed, Nov 21, 2007 at 03:59:39PM +0100, Xavier Barthelemy wrote:
>> I think you did not understand my problem. It is a problem of
>> organizing my data and slicing it by rows.
> 
> 
> OK, you're telling me you have already interpolated your data, and you
> have all the values you need (i.e. regular grids in the different
> directions you are interested in), but you just need to sort them out.
> 

No, I did not interpolate my data, because one cannot. I am looking for
subcritical instabilities in binary, unsteady, miscible, compressible,
viscous fluids, so each numerical experiment is the result of hours (or,
for some, days) of computation on a high-performance parallel computer.


>> My data consists of 18 values which depend on 12 parameters.
> 
>> So my data is laid out like this,
>> row by row:
>> first the 12 parameters, then the 18 values.
> 
>> Each row represents one numerical experiment. I am doing a parameter
>> exploration, so each parameter varies independently. But the
>> discretization of the parameter space is not "regular": I have refined
>> some parameters for certain values of the others.
> 
> If you want to do an image plot, it will be easier to have regular
> data, that's where interpolation comes in. It looks to me like you want
> to have an interpolation function f({P}) -> {V} where {P} is your set of
> parameters and {V} your set of values. When you want to plot the cut
> along a hyperplane HP, you simply choose a regular grid on this
> hyperplane, and apply your interpolation function f on it. That's how I
> would do it, if I understand your problem correctly.

Yes, you're right, I would do it like that, but I can't: it would take
too much computation. I am refining some regions of the parameter space
independently.

> 
>> So my problem is really what I have (badly, I guess) explained: I would
>> like to plot 2D (and 3D) graphs.
>> Let's suppose that I have corresponding X, Y and Z data. Knowing that,
>> how do you plot Y(X) at constant Z when all you have is three columns
>> of data?
> 
> In 1D that's really easy: if D is your [X, Y, Z] array, I would do
> 
> X = D[:, 0]
> Y = D[:, 1]
> Z = D[:, 2]
> 
> # Select all the data for Z = z0
> x = X[Z==z0]
> y = Y[Z==z0]
> 
> # Sort, just to make things prettier
> y = y[argsort(x)]
> x.sort()
> 
> plot(x, y)

I will try it. Maybe I was making it too complicated.
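
For reference, the selection quoted above can be written as a small
self-contained NumPy sketch. The sample array D and the value z0 are made
up for illustration, and np.isclose is used instead of == only as a
precaution against floating-point round-off; it is not part of the
original suggestion.

```python
import numpy as np

# Hypothetical sample data: the columns are X, Y, Z
D = np.array([
    [3.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
    [2.0, 8.0, 2.0],
    [2.0, 4.0, 1.0],
])
X, Y, Z = D[:, 0], D[:, 1], D[:, 2]

z0 = 1.0
# Boolean mask of the rows where Z equals z0 (within round-off)
sel = np.isclose(Z, z0)
x, y = X[sel], Y[sel]

# Apply the same permutation to both arrays so the (x, y) pairs stay matched
order = np.argsort(x)
x, y = x[order], y[order]
```

After this, x and y can be passed to plot(x, y) as in the quoted example.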
> 
>> You'll have to sort first by Z, and then by X. So when you plot that,
>> for each "X", sorted sequentially, you'll have the different "Y" for
>> each value of "Z". Consequently you will have as many plots as there
>> are different Z values.
> 
> OK, you want to do this for all values of Z.
> 
>> And my problem now is the generalization to 12 parameters, let's name
>> them from A to J. What do I do if I want to plot F(H)? The same: I will
>> sort by each of the 10 other parameters and finally by H, and I will
>> have a family of plots.
> 
> Yes, you can do this in a nice way using an array with fields and the
> "order" argument to sort.
> 
>> But now I want to cut (slice) them to get each plot independently, so
>> that I can plot them through the gnuplot interface.
> 
> You can always generalize the cutting method used in my example. If U, V,
> W are parameters (similar to Z in the example above), you can define a
> mask array:
> 
> mask = (U == u0) & (V == v0) & (W == w0)
> 
> # You don't really need the x and y arrays, you could go directly to xy
> x = X[mask]
> y = Y[mask]
> xy = empty(x.shape, dtype=dtype([('x','float'), ('y','float')]))
> xy['x'] = x
> xy['y'] = y
> 
> xy.sort(order=('x', 'y'))
> 
> then you have in xy['x'] and xy['y'] what you are interested in, if I
> understand things right.

sounds good
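
A runnable version of the mask-and-sort recipe quoted above might look
like the sketch below. The column layout (U, V, W, X, Y), the sample
values and the targets u0, v0, w0 are assumptions made for the example.
The structured array is filled directly from the masked columns, as the
quoted comment suggests is possible.

```python
import numpy as np

# Hypothetical data: the columns are U, V, W (parameters), then X, Y
data = np.array([
    [0.0, 1.0, 2.0, 3.0, 30.0],
    [0.0, 1.0, 2.0, 1.0, 10.0],
    [0.0, 1.0, 5.0, 2.0, 99.0],
    [0.0, 1.0, 2.0, 2.0, 20.0],
])
U, V, W, X, Y = data.T  # one 1-D array per column

u0, v0, w0 = 0.0, 1.0, 2.0
# Keep only the rows where all three parameters have the requested values
mask = (U == u0) & (V == v0) & (W == w0)

# Structured array with named fields, filled from the selected rows
xy = np.empty(mask.sum(), dtype=[('x', float), ('y', float)])
xy['x'] = X[mask]
xy['y'] = Y[mask]

# Sort the records by x, breaking ties with y
xy.sort(order=['x', 'y'])
```

Exact comparison with == only works if the parameter values were stored
exactly as written; for values that went through arithmetic, np.isclose
is the safer comparison.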
> 
> But I still think you need a regular grid, so either your data already
> has that structure, and I don't understand why it has been "shuffled", or
> it hasn't, and you'll need interpolating, so why bother sorting?
> (selecting might be useful to reduce the amount of points).

They are "shuffled" because I did not save a database with the logical
links. So if I want the 7th column as a function of the 5th, the rows are
"shuffled".
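
The "sort by the fixed parameters, then by the abscissa, then cut" idea
discussed earlier in this message can be sketched with np.lexsort. The
function name split_into_curves, its arguments and the sample data are
hypothetical, and the sketch assumes a non-empty 2-D float array with one
experiment per row.

```python
import numpy as np

def split_into_curves(data, x_col, y_col, fixed_cols):
    """Return one (fixed_values, x, y) tuple per combination of fixed parameters."""
    # np.lexsort treats the LAST key as the primary one, so x comes first
    # (least significant) and fixed_cols[0] comes last (most significant)
    keys = [data[:, x_col]] + [data[:, c] for c in reversed(fixed_cols)]
    s = data[np.lexsort(keys)]

    # A new curve starts wherever any fixed parameter changes between rows
    fixed = s[:, fixed_cols]
    change = np.any(fixed[1:] != fixed[:-1], axis=1)
    starts = np.flatnonzero(change) + 1

    curves = []
    for block in np.split(s, starts):
        curves.append((tuple(block[0, fixed_cols]),
                       block[:, x_col], block[:, y_col]))
    return curves

# Hypothetical shuffled data: the columns are X, Y, Z
data = np.array([
    [2.0, 10.0, 1.0],
    [1.0, 30.0, 2.0],
    [1.0, 20.0, 1.0],
    [3.0, 40.0, 2.0],
    [3.0, 50.0, 1.0],
])
# One curve Y(X) per distinct value of Z
curves = split_into_curves(data, x_col=0, y_col=1, fixed_cols=[2])
```

Each entry of curves can then be written out or plotted on its own, for
instance one gnuplot data set per entry.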


Thanks for your help, I'll try it right now.

cheers
Xavier



