[Numpy-discussion] Who will use numpy.ma?

Sasha ndarray at mac.com
Thu Jan 12 13:30:05 EST 2006


With Paul's permission I am posting his arguments and my responses.
Numpy.ma will follow Paul's design and there is now a wiki page
dedicated to the effort to make ma work better in numpy. (See
http://projects.scipy.org/scipy/numpy/wiki/MaskedArray).

-- sasha

On 1/12/06, Sasha <ndarray at mac.com> wrote:
> Paul,
>
> Thank you very much for your insights and once again thanks for all
> the great work that you've done.  I've noticed that your reply was not
> posted on any list, do you mind if I forward it to numpy-user?  Please
> see more below.
>
> On 1/12/06, Paul F. Dubois <paul at pfdubois.com> wrote:
> > What special values? Are you sure this works on any platform? What, for
> > example, is the special value for integer arrays? For arrays of objects?
> >
> Yes, these are hard questions.  For floats nan is an obvious choice and
> IEEE support is getting better on new hardware. For objects None is
> a fine choice.  For integers some may argue for sys.maxint, and given that
> numpy integer arithmetic already handles overflow, a check for maxint will
> not add much complexity. Yet don't get me wrong: I don't see any replacement
> for ma myself.
>
> > How do replaceable mathematical operations make any difference? The
> > fundamental problem is that if array x has special values in some places
> > and array y has them in some other places, how do you create a result
> > that has special values in the correct places AND is of a type for which
> > those special values are still treated as 'missing'. How do you do this?
> >
>
> Replaceable operations would allow one to redefine all operations on integer
> arrays to treat, say, sys.maxint as invariant and cast it to nan in
> floating-point conversions without changing the logic of mainline numpy.
>
> > I converted MA to ma but did not have time to flesh out all the
> > differences with the new ndarray. I was hoping the community would do
> > that.
>
> Me too.  That was the point of my post: to find out the size of the community
> rather than to suggest an alternative.
>
> > I am retired.
> >
> You deserve it.
>
> > It is my belief that the approach you outline is not workable, but
> > perhaps I am not understanding it properly.
> >
> I don't have any workable approach other than enhancing ma to work
> better with numpy.  This is what I am doing right now.
>
> > If I, who have thought about this a lot, do not know for sure, what
> > information can you derive from a poll of the general public, who will
> > not think through these issues very carefully?
> >
> I was trying to poll the numpy community to find out how many people actually
> use ma in real projects.  This would determine how well tested the new
> features will be and how quickly any bugs will be discovered and fixed.
> Unfortunately, I have not seen a single response saying: I've been using MA
> for X years on Y projects and plan on using it as we upgrade to numpy.
> There was a lot of theoretical discussion and a pointer to a plotting
> library that has recently added MA support, but no testimony from end users.
>
> > I am close to absolutely positive that subclassing won't particularly
> > ease the task.
> >
> I thought about this a little, and I think you are right. Subclassing
> may improve speed a little, but all methods will need to be adapted
> in the same way as without subclassing.
>
> > For the reason I indicated, I don't care to engage in public discussions
> > of complex technical issues so I have not cc'd this to the group.
> >
> I respect that, but please allow me to forward at least portions of
> this correspondence to the community. Your insights are invaluable.
>
> -- sasha
>
> >
> > Sasha wrote:
> > > MA is intended to be a drop-in replacement for Numeric arrays that can
> > > explicitly handle missing observations.  With the recent improvements
> > > to the array object in NumPy, the MA library has fallen behind.  There
> > > are more than 50 methods in the ndarray object that are not present in
> > > ma.array.
> > >
> > > I would like to hear from people who work with datasets with missing
> > > observations.  Do you use MA?  Do you think that, with the support for
> > > nans and replaceable mathematical operations, missing observations
> > > should be handled in numpy using special values rather than an array
> > > of masks?
> > >
> > > Thanks.
> > >
> > > -- sasha
> > >
> > >
> > >
> >
>
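
The two approaches debated above (an array of masks versus special values)
can be contrasted in a minimal sketch. It assumes a current NumPy
installation rather than the 2006 code under discussion, and every variable
name, including SENTINEL, is illustrative. Because sys.maxint no longer
exists in Python 3, the maximum int64 value stands in for it here.

```python
import numpy as np
import numpy.ma as ma

# Two arrays with missing observations in different positions.
x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = ma.array([10.0, 20.0, 30.0, 40.0], mask=[False, False, True, False])

# Mask approach: the result is masked wherever either operand is masked.
z = x + y
z_mask = ma.getmaskarray(z)

# Special-value approach for floats: nan propagates through arithmetic.
xf = np.array([1.0, np.nan, 3.0, 4.0])
yf = np.array([10.0, 20.0, np.nan, 40.0])
zf = xf + yf
zf_missing = np.isnan(zf)

# Integers have no nan, so a sentinel has to be checked explicitly.
# SENTINEL is an assumption standing in for sys.maxint.
SENTINEL = np.iinfo(np.int64).max
xi = np.array([1, SENTINEL, 3, 4], dtype=np.int64)
yi = np.array([10, 20, SENTINEL, 40], dtype=np.int64)

# Zero out missing slots before adding so the sentinel never overflows,
# then write the sentinel back into every slot that was missing.
missing = (xi == SENTINEL) | (yi == SENTINEL)
safe_sum = np.where(missing, 0, xi) + np.where(missing, 0, yi)
zi = np.where(missing, SENTINEL, safe_sum)
```

In this sketch the mask and nan versions need no extra bookkeeping, while
the integer version has to guard every operation by hand, which is roughly
the difficulty raised in the question about integer arrays.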



