Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

July 1, 2011

On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
...
Hi,
On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root <ben.root@ou.edu> wrote:
...
On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett <matthew.brett@gmail.com>
wrote:
...
...
You can't switch between the two approaches without big changes in
your code.
...
Lluis provided a case, and it was obscure.  That switch seems like a
rare or non-existent use-case that should not guide the API.
Just to respond to this specific issue.
In matplotlib, there are often constructs like the following:
plot_something(X, Y, V)
From a module perspective, we have no clue about the nature of the input
data.  We often have to do things like np.asanyarray, np.atleast_2d and
such to establish some base-level assumptions about the input data.  Numpy
currently makes this fairly cheap by not performing a copy if it is not
needed.  So far, so good.
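
As a rough illustration of the "no copy if not needed" point, here is a
minimal sketch.  The variable names are made up for the example, and it
uses numpy.ma (not the NEP's proposed API) to stand in for masked input:

```python
import numpy as np

# Hypothetical inputs: one plain ndarray, one numpy.ma masked array.
x = np.arange(5.0)
xm = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# asanyarray hands back the very same object when the input is already
# an ndarray (or a subclass), so no data is copied.
a = np.asanyarray(x)
b = np.asanyarray(xm)

# atleast_2d on a 1-D array returns a reshaped view of the same buffer.
c = np.atleast_2d(x)

print(a is x)                  # same object, no copy
print(type(b).__name__)        # the MaskedArray subclass is preserved
print(c.shape, np.shares_memory(c, x))
```

Note that np.asarray, unlike np.asanyarray, would hand back a plain
ndarray for the masked input and so lose the mask.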
Next, some plotting functions need to broadcast the arrays together
(again, numpy makes that fairly cheap).
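
A small sketch of why broadcasting is cheap, assuming plain ndarrays with
made-up shapes: np.broadcast_arrays returns views with zero strides rather
than expanded copies.

```python
import numpy as np

# Hypothetical inputs with different but compatible shapes.
x = np.linspace(0.0, 1.0, 4)             # shape (4,)
y = np.linspace(0.0, 1.0, 3)[:, None]    # shape (3, 1)
v = np.ones((3, 4))                      # shape (3, 4)

# Each result is a view onto the original buffer; the repeated axis is
# represented by a stride of 0 instead of duplicated data.
xb, yb, vb = np.broadcast_arrays(x, y, v)

print(xb.shape, yb.shape, vb.shape)
print(xb.strides[0], yb.strides[1])      # both 0: no data was duplicated
print(np.shares_memory(xb, x))
```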
Then, we need to figure out the common elements to plot.  With something
simple like plot(), this is a straightforward or-ing of any masks.  Of
course, right now, this is not cheap because we can't assume that the
array supports masking semantics.  This is where we either cast the arrays
as masked arrays, or perform our own masking semantics.  But, essentially,
a point that was masked in X may not be masked in Y and/or V, and we
cannot change the original data (or else we would be a bad tool).
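
The or-ing step could be sketched roughly as below.  This is not
matplotlib's actual code: common_mask is a hypothetical helper, and
treating NaN as "missing" alongside numpy.ma masks is an assumption made
for the example.

```python
import numpy as np

def common_mask(*arrays):
    """Hypothetical helper: True wherever any input is masked or NaN.

    Inputs are assumed to share one shape already; none is modified.
    """
    combined = False
    for arr in arrays:
        # getmaskarray gives an all-False mask for plain ndarrays.
        mask = np.ma.getmaskarray(arr)
        data = np.ma.getdata(arr)
        if np.issubdtype(data.dtype, np.floating):
            mask = mask | np.isnan(data)   # builds a new array
        combined = combined | mask
    return combined

x = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = np.array([1.0, 2.0, np.nan, 4.0])
v = np.array([5.0, 6.0, 7.0, 8.0])

skip = common_mask(x, y, v)

# Wrap the untouched data in a new masked array that carries the shared mask.
x_plot = np.ma.masked_array(np.ma.getdata(x), mask=skip)

print(skip)
print(x_plot.compressed())   # only the points valid in X, Y and V
print(x.mask)                # the caller's own mask is unchanged
```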
For more complicated functions like pcolor() and contour(), each array
needs to know the status of the neighboring points, both in itself and in
the other arrays.  Again, either we use numpy.ma to share a common mask
across the data arrays, or we implement our own semantics to deal with
this.  And again, we cannot change any of the original data.
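
To make the "neighboring points" issue concrete, here is a hedged sketch
for a pcolor-like grid, where a cell is drawn from its four corner points.
cell_mask is a hypothetical helper, and the rule "skip a cell if any corner
is missing" is an assumption for illustration rather than a description of
matplotlib's implementation.

```python
import numpy as np

def cell_mask(point_mask):
    """Hypothetical helper: mask a grid cell if any of its 4 corners is masked.

    point_mask has shape (M, N); the result has shape (M - 1, N - 1).
    """
    m = np.asarray(point_mask, dtype=bool)
    return m[:-1, :-1] | m[1:, :-1] | m[:-1, 1:] | m[1:, 1:]

# One missing point in the corner of a 3x3 grid knocks out one cell.
corner = np.zeros((3, 3), dtype=bool)
corner[0, 0] = True
print(cell_mask(corner))

# One missing point in the middle knocks out all four surrounding cells.
center = np.zeros((3, 3), dtype=bool)
center[1, 1] = True
print(cell_mask(center))
```

The point mask passed in would itself be the or-ed mask across X, Y and V,
which is why a single shared mask is convenient here.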
This is not an obscure case.  This is existing code in matplotlib.  I
will be evaluating the current missingdata branch later today to assess
its suitability for use in matplotlib.
I think I missed why your case needs NA and IGNORE to use the same
API.  Why can't you just use masks and IGNORE here?
Best,
Matthew
The point is that matplotlib cannot make assumptions about the nature of
the input data.  From matplotlib's perspective, NAs and IGNOREs are the
same thing and should be treated the same way (i.e., skipped).  Right now,
matplotlib's code is messy and inconsistent in its treatment of masked
arrays and NaNs (some functions treat them the same, some handle only NaNs,
and some handle only masked arrays).  This is because of code cruft over
the years.  If we had one interface to rule them all, we could bring *all*
plotting functions to have similar handling code and be more consistent
across the board.

However, I think Mark's NEP provides a good way to distinguish between the
cases when needed (but I have not examined it from that perspective yet).

Ben Root