Re: [Numpy-discussion] Who will use numpy.ma?

With Paul's permission I am posting his arguments and my responses. Numpy.ma will follow Paul's design and there is now a wiki page dedicated to the effort to make ma work better in numpy. (See http://projects.scipy.org/scipy/numpy/wiki/MaskedArray).
-- sasha
On 1/12/06, Sasha <ndarray@mac.com> wrote:
Paul,
Thank you very much for your insights and once again thanks for all the great work that you've done. I've noticed that your reply was not posted on any list, do you mind if I forward it to numpy-user? Please see more below.
On 1/12/06, Paul F. Dubois <paul@pfdubois.com> wrote:
What special values? Are you sure this works on any platform? What, for example, is the special value for integer arrays? For arrays of objects?
Yes, these are hard questions. For floats, nan is an obvious choice, and IEEE support is getting better on newer hardware. For objects, None is a fine choice. For integers, some may argue for sys.maxint, and given that numpy integer arithmetic already handles overflow, a check for maxint will not add much complexity. Yet don't get me wrong: I don't see any replacement for ma myself.
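To make the special-value idea concrete, here is a minimal sketch, assuming a current NumPy release rather than the 2006 code base. The name SENTINEL is hypothetical, and np.iinfo(np.int64).max stands in for sys.maxint, which no longer exists in Python 3. It only contrasts floats and integers; object arrays are left out.

```python
import numpy as np

# Floats: NaN can mark a missing value, and it survives arithmetic.
x = np.array([1.0, np.nan, 3.0])
float_sum = x + 1.0
float_missing = np.isnan(float_sum)      # the missing slot is still detectable

# Integers: there is no NaN, so a sentinel has to be picked by convention.
# SENTINEL is a hypothetical choice standing in for sys.maxint.
SENTINEL = np.iinfo(np.int64).max
i = np.array([1, SENTINEL, 3], dtype=np.int64)
int_sum = i + 1                          # array arithmetic wraps around on overflow
int_missing = int_sum == SENTINEL        # the sentinel did not survive the addition

print(float_missing)
print(int_missing)
```

Without an explicit check for the sentinel in every integer operation, the "missing" marker is lost after a single addition, which is the extra handling the paragraph above refers to.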
How do replaceable mathematical operations make any difference? The fundamental problem is that if array x has special values in some places and array y has them in some other places, how do you create a result that has special values in the correct places AND is of a type for which those special values are still treated as 'missing'. How do you do this?
Replaceable operations would allow one to redefine all operations on integer arrays to treat, say, sys.maxint as invariant and cast it to nan in floating point conversions, without changing the logic of mainline numpy.
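For comparison, the mask-based approach answers the question of "special values in the correct places" by combining the masks of both operands. The sketch below assumes the numpy.ma module as shipped in current NumPy releases; the arrays and the fill value -1 are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

# x is missing its second element, y is missing its third.
x = ma.array([1, 2, 3, 4], mask=[False, True, False, False])
y = ma.array([10, 20, 30, 40], mask=[False, False, True, False])

z = x + y

# The result is masked wherever either operand was masked,
# and the data stays an integer array.
result_mask = ma.getmaskarray(z)
filled = z.filled(-1)    # replace masked slots with an illustrative fill value

print(result_mask)
print(filled)
```

The result keeps its integer type and its missing positions are still tracked, which a sentinel stored inside the integer data cannot guarantee without redefining the operations themselves.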
I converted MA to ma but did not have time to flesh out all the differences with the new ndarray. I was hoping the community would do that.
Me too. That was the point of my post - to find out the size of the community rather than to suggest an alternative.
I am retired.
You deserve it.
It is my belief that the approach you outline is not workable, but perhaps I am not understanding it properly.
I don't have any workable approach other than enhancing ma to work better with numpy. This is what I am doing right now.
If I, who have thought about this a lot, do not know for sure, what information can you derive from a poll of the general public, who will not think through these issues very carefully?
I was trying to poll the numpy community to find out how many people actually use ma in real projects. This would determine how well tested the new features will be and how quickly any bugs will be discovered and fixed. Unfortunately, I have not seen a single response saying - I've been using MA for X years on Y projects and plan on using it as we upgrade to numpy. There were a lot of theoretical discussions and a pointer to a plotting library that has recently added MA support, but no testimony from end users.
I am close to absolutely positive that subclassing won't particularly ease the task.
I thought about this a little, and I think you are right. Subclassing may improve speed a little, but all methods will need to be adapted in the same way as is done without subclassing.
For the reason I indicated, I don't care to engage in public discussions of complex technical issues so I have not cc'd this to the group.
I respect that, but please allow me to forward at least portions of this correspondence to the community. Your insights are invaluable.
-- sasha
Sasha wrote:
MA is intended to be a drop-in replacement for Numeric arrays that can explicitly handle missing observations. With the recent improvements to the array object in NumPy, the MA library has fallen behind. There are more than 50 methods in the ndarray object that are not present in ma.array.
I would like to hear from people who work with datasets with missing observations. Do you use MA? Do you think that, with the support for nans and replaceable mathematical operations, missing observations should be handled in numpy using special values rather than an array of masks?
Thanks.
-- sasha