[newbie] how to compare two datasets
The datasets are borehole data, so they have a borehole name, a depth, and a value. Each borehole has two datasets: one real, one modelled. I want to compare the values between them at each depth in the modelled dataset (lower resolution / fewer samples). If there is no matching depth in the real dataset, I want to linearly interpolate between the nearest values. The comparison is to be quite simple at first: the difference between values, and stats for the entire set of differences.

Example data (depth, value):

    model:
       0   15.5
     -10   17.0
     -20   18.5
     -30   20.0

    real:
       0   16.5
      -1   16.6
      -2   16.6
     ...
    -655   55.3

Not having used Python much, I don't know the best data structure (dictionary? sequence? list?), or whether there are helpful things in SciPy to help this come together (stats, methods for comparing datasets like these, linear interp methods?).

Looking for inspiration and pointers!

ben.
That sounds like a set of NumPy arrays is what you need. You can simply import each dataset into an array and perform row- and column-wise operations.

First I would interpolate the real data. I'd probably use a cubic spline, but linear is fine too. The spline function operates on a NumPy array and returns a mathematical object (the spline representation), *not* a new array. This spline can then be evaluated at the depths for which you have model data.

An example of how it could be done:

    import numpy
    import scipy.interpolate

    modeldata = numpy.genfromtxt('modeled.data')
    realdata = numpy.genfromtxt('real.data')

    # Now say depth is the first column, and value is second.
    # Note: splrep expects the depths in increasing order, so sort the
    # rows first if they are not.
    tck = scipy.interpolate.splrep(realdata[:, 0], realdata[:, 1])
    iplrealdata = scipy.interpolate.splev(modeldata[:, 0], tck)

    # You will now have an interpolated value of the real data for every
    # depth of the model data - done with a cubic spline.

    # Linear interpolation would be done by, instead of doing
    # splrep/splev, doing:
    # Interpolate:
    func = scipy.interpolate.interp1d(realdata[:, 0], realdata[:, 1])
    # Evaluate:
    iplrealdata = func(modeldata[:, 0])

Cheers;
Emil

On Tue, 2010-07-13 at 07:00 +0000, ben h wrote:
> The datasets are borehole data - so they have borehole name, depth, and a value. Each borehole has two datasets - one real, one modelled. I want to compare the values between them for each depth in modelled dataset (lower resolution / fewer samples). If there is no matching depth in real dataset I want to linearly interpolate between nearest values. Comparison to be quite simple at first, difference between values, and stats for entire set of differences.
>
> example data (depth, value):
>
>     model:
>        0   15.5
>      -10   17.0
>      -20   18.5
>      -30   20.0
>
>     real:
>        0   16.5
>       -1   16.6
>       -2   16.6
>      ...
>     -655   55.3
>
> Not having used python much, i don't know best data structure (dictionary? sequence? list?), or if there are helpful things in SciPy to help this come together (stats, methods for comparing datasets like these, linear interp methods?).
>
> Looking for inspiration and pointers!
>
> ben.
_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
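The reply above covers the interpolation step but not the differences and summary statistics asked about in the question. A minimal sketch of the whole comparison for one borehole, using linear interpolation via `numpy.interp`, might look like the following. The helper name `compare_to_model` and the choice of statistics are assumptions made for illustration, and the "real" series here is synthetic (the question elides most of the real data), so the printed numbers say nothing about an actual borehole.

```python
import numpy as np


def compare_to_model(model, real):
    """Return (differences, stats) for model minus real at the model depths.

    Both inputs are 2-D arrays with depth in column 0 and value in column 1.
    Hypothetical helper, not part of NumPy or SciPy.
    """
    # np.interp needs increasing x-coordinates, so sort the real data by depth.
    order = np.argsort(real[:, 0])
    real_depth = real[order, 0]
    real_value = real[order, 1]

    # Linearly interpolate the real values at each model depth. Model depths
    # outside the real depth range become NaN instead of being clamped.
    real_at_model = np.interp(model[:, 0], real_depth, real_value,
                              left=np.nan, right=np.nan)

    diff = model[:, 1] - real_at_model
    valid = diff[np.isfinite(diff)]  # ignore depths with no real coverage
    stats = {
        "n": valid.size,
        "mean": valid.mean(),
        "std": valid.std(ddof=1),
        "rmse": np.sqrt(np.mean(valid ** 2)),
        "max_abs": np.abs(valid).max(),
    }
    return diff, stats


# Model rows are the example values from the question.
model = np.array([[0.0, 15.5], [-10.0, 17.0], [-20.0, 18.5], [-30.0, 20.0]])

# Synthetic stand-in for the real data (assumption): one sample every 4 depth
# units, with the value rising by 0.1 per unit of depth.
real_depth = np.arange(0.0, -33.0, -4.0)
real = np.column_stack([real_depth, 16.5 - 0.1 * real_depth])

diff, stats = compare_to_model(model, real)
print(diff)
print(stats)
```

For several boreholes, one option is a dictionary keyed by borehole name whose values are such arrays, calling the helper once per name.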
Thøger Emil Juul Thorsen <thoeger <at> fys.ku.dk> writes:
> # Now say depth is the first column, and value is second:
> tck = scipy.interpolate.splrep(realdata[:, 0], realdata[:, 1])
I am unable to perform this when values in realdata are negative. I can't simply make them positive, as there are mixed positive and negative values. Any clues to a solution?

Ben.
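The thread does not record a resolution. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is the ordering of the depths rather than their sign: `splrep` expects its x values in increasing order, while depths listed as 0, -1, -2, ... are decreasing. A minimal sketch of sorting the rows by depth before fitting follows. The data is synthetic and the name `sorted_real` is made up for illustration.

```python
import numpy as np
from scipy import interpolate

# Synthetic stand-in for 'real.data' (assumption): depths run downward from 0,
# so the depth column is in decreasing order, as in the question's example.
real_depth = np.arange(0.0, -33.0, -4.0)
realdata = np.column_stack([real_depth, 16.5 - 0.1 * real_depth])

# Model depths taken from the example in the question.
model_depth = np.array([0.0, -10.0, -20.0, -30.0])

# Reorder the rows so that depth increases; negative values themselves are fine.
order = np.argsort(realdata[:, 0])
sorted_real = realdata[order]

# Fit the cubic spline on the sorted data, then evaluate at the model depths.
tck = interpolate.splrep(sorted_real[:, 0], sorted_real[:, 1])
iplrealdata = interpolate.splev(model_depth, tck)
print(iplrealdata)
```

No sign change is needed with this approach, so mixed positive and negative depths or values are handled the same way.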
participants (2)
- ben h
- Thøger Emil Juul Thorsen