
On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
like. And in this case I do think we can come up with an API that will make everyone happy, but that Mark's current API probably can't be incrementally evolved to become that API.)
No one could object to coming up with an API that makes everyone happy, provided that it actually gets coded up, tested, and is found to be fast and maintainable. When you say the API probably can't be evolved, do you mean that the underlying implementation also has to be redone? And if so, who will do it, and when?
Eric
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
I personally am a bit apprehensive, as I am worried about the masked array abstraction "leaking" through to users of pandas, which is something I simply will not accept (it's why I decided against using numpy.ma early on, that plus performance problems). Basically, if having an understanding of masked arrays is a prerequisite for using pandas, the whole thing is DOA to me, as it undermines the usability arguments I've been making about switching to Python (from R) for data analysis and statistical computing.
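For readers unfamiliar with the "leak" being described here, a small sketch of the existing numpy.ma behavior (this uses the real numpy.ma API, not the proposed maskna one) shows how mask-specific objects surface in ordinary operations:

```python
import numpy as np

# numpy.ma masked arrays: the mask is an explicit, visible part of the data
a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                       mask=[False, False, True, False])

# Reductions skip masked values by default (unlike R, where NA propagates)
m = a.mean()  # (1 + 2 + 4) / 3, the masked slot is ignored

# Indexing a masked slot returns the `masked` singleton, not a number --
# this is the kind of abstraction leak a downstream library has to hide
leaked = a[2] is np.ma.masked
print(m, leaked)
```

A library wrapping such arrays must intercept every place where `np.ma.masked` or the mask itself could reach the user, which is the maintenance burden being objected to.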
The missing data functionality looks far more like R than numpy.ma.
For instance
In [8]: a = arange(5, maskna=1)

In [9]: a[2] = np.NA

In [10]: a.mean()
Out[10]: NA(dtype='float64')

In [11]: a.mean(skipna=1)
Out[11]: 2.0

In [12]: a = arange(5)

In [13]: b = a.view(maskna=1)

In [14]: a.mean()
Out[14]: 2.0

In [15]: b[2] = np.NA

In [16]: b.mean()
Out[16]: NA(dtype='float64')

In [17]: b.mean(skipna=1)
Out[17]: 2.0
Chuck
I don't really agree with you.
some sample R code
arr <- rnorm(10)
arr[5:8] <- NA
arr
 [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
 [7]         NA         NA  0.3322803 -1.9201257
In your examples you had to pass maskna=True -- I suppose my only recourse would be to make sure that every array inside a DataFrame, for example, has maskna=True set. I'll have to look in more detail and see whether it's feasible/desirable. There's a memory cost to pay, but you can't get the functionality for free. I may just end up sticking with NaN; it's an impure solution, but it has worked pretty well over the last few years and has reasonably good performance characteristics in the places that matter.
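The NaN-based approach mentioned here can be sketched with plain NumPy (only float dtypes can hold NaN, which is the main impurity; the manual masking step is what helpers in pandas do internally):

```python
import numpy as np

# NaN as the missing-value marker, in the spirit of the R sample above
arr = np.array([0.5, -1.1, np.nan, 0.4, np.nan])

# A naive reduction propagates NaN, matching R's default NA behavior
propagated = arr.mean()  # nan

# Skipping missing values means masking NaNs out by hand
valid = arr[~np.isnan(arr)]
skipped = valid.mean()  # (0.5 - 1.1 + 0.4) / 3, roughly -0.0667
print(propagated, skipped)
```

The trade-off is exactly as stated: no extra mask memory and fast code paths, at the cost of NaN doubling as both "missing" and "not a number" and of integer/boolean columns needing a float upcast.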
It might be useful to have a way of setting global defaults, or something like a with statement. These are the sorts of things that can be adjusted based on experience. For instance, I'm thinking skipna=1 is the natural default for the masked arrays.

Chuck
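The with-statement idea could look something like numpy.errstate does for floating-point flags. None of the names below (na_options, get_default_skipna) exist in NumPy; this is purely a hypothetical sketch of the suggestion:

```python
from contextlib import contextmanager

# Hypothetical module-level default for skipna, per the suggestion above
_defaults = {"skipna": False}

@contextmanager
def na_options(skipna):
    """Temporarily override the default skipna setting, errstate-style."""
    old = _defaults["skipna"]
    _defaults["skipna"] = skipna
    try:
        yield
    finally:
        _defaults["skipna"] = old  # always restored, even on error

def get_default_skipna():
    return _defaults["skipna"]

with na_options(skipna=True):
    inside = get_default_skipna()   # True within the block
outside = get_default_skipna()      # restored to False afterwards
print(inside, outside)
```

Reductions would then consult get_default_skipna() when no explicit skipna argument is given, so library code could set a policy without threading the flag through every call.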