[Numpy-discussion] Using matplotlib's prctile on masked arrays
Gökhan Sever
gokhansever at gmail.com
Tue Oct 27 07:56:33 EDT 2009
Hello,
Consider these two sample columns of data:
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 1693.9069
999999.9999 1676.1059
999999.9999 1621.5875
651.8040 1542.1373
691.0138 1650.4214
678.5558 1710.7311
621.5777 999999.9999
644.8341 999999.9999
696.2080 999999.9999
Putting this data into a file, say "sample.data", and loading it with:
a,b = np.loadtxt('sample.data', dtype="float").T
I[16]: a
O[16]:
array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
6.51804000e+02, 6.91013800e+02, 6.78555800e+02,
6.21577700e+02, 6.44834100e+02, 6.96208000e+02])
I[17]: b
O[17]:
array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069,
1676.1059, 1621.5875, 1542.1373, 1650.4214,
1710.7311, 999999.9999, 999999.9999, 999999.9999])
### Interestingly, the second column is displayed as it was loaded, but the
values of a are reformatted a little. Why could this be happening? Any ideas?
Anyway, back to masked arrays:
I[24]: am = ma.masked_values(a, value=999999.9999)
I[25]: am
O[25]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
mask = [ True True True True True True False False False
False False False],
fill_value = 999999.9999)
I[30]: bm = ma.masked_values(b, value=999999.9999)
I[31]: am
O[31]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
mask = [ True True True True True True False False False
False False False],
fill_value = 999999.9999)
So far so good. A few basic checks:
I[33]: am/bm
O[33]:
masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
0.39664667346 -- -- --],
mask = [ True True True True True True False False False
True True True],
fill_value = 999999.9999)
I[34]: mean(am/bm)
O[34]: 0.41266624676580849
Unfortunately, matplotlib.mlab's prctile cannot handle this division:
I[54]: prctile(am/bm, p=[5,25,50,75,95])
O[54]:
array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06,
1.00000000e+06, 1.00000000e+06])
This also results in wrong-looking box-and-whisker plots.
Testing further with scipy.stats functions yields the expected, correct results:
I[55]: stats.scoreatpercentile(am/bm, per=5)
O[55]: 0.40877012449846228
I[49]: stats.scoreatpercentile(am/bm, per=25)
O[49]:
masked_array(data = --,
mask = True,
fill_value = 1e+20)
I[56]: stats.scoreatpercentile(am/bm, per=95)
O[56]:
masked_array(data = --,
mask = True,
fill_value = 1e+20)
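For comparison, here is a minimal sketch of one possible cross-check: drop the masked entries with compressed() and compute the percentiles on what remains with np.percentile. It assumes that discarding the masked values is acceptable for the analysis, and it rebuilds the two columns inline instead of reading "sample.data". The names ratio, valid and pcts are only illustrative.

```python
import numpy as np
import numpy.ma as ma

# The two sample columns from above, typed in directly (no file needed).
a = np.array([999999.9999] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([999999.9999] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [999999.9999] * 3)

# Mask the 999999.9999 placeholder in both columns.
am = ma.masked_values(a, value=999999.9999)
bm = ma.masked_values(b, value=999999.9999)

# An entry of the ratio is masked wherever either operand is masked.
ratio = am / bm

# compressed() returns a plain 1-D ndarray holding only the unmasked entries,
# so a function that ignores masks never sees the placeholder values.
valid = ratio.compressed()

pcts = np.percentile(valid, [5, 25, 50, 75, 95])
print(valid)
print(pcts)
```

With only three unmasked ratios, every percentile necessarily falls between the smallest and largest of them. The same compressed array could also be handed to a plotting routine that is not mask-aware.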
Any confirmation?
--
Gökhan