
On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root <ben.root@ou.edu> wrote:
> this by making missing data front-and-center. However, my belief is that Mark's approach is easier to comprehend and is cleaner. Cleaner features means that it is more likely to be used.
Cleaner features may be easier to adopt, but whether they are used depends on whether they address the problem at hand. The implementation as it stands essentially gives us a faster and more integrated version of numpy.ma, but it has become clear from this conversation that such an approach overlooks a very common subset of mask-related problems.

We should be concerned about memory use: we often don't have much of it to spare, and accessing it is slow. Would it be workable to store 8 mask bits per byte instead? I don't think it would affect speed much, and we can always generate a full mask for the user on request.

Regards
Stéfan
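
P.S. A rough sketch of the 8-bits-per-byte idea, using np.packbits and np.unpackbits. This only illustrates the storage trade-off and is not how the proposed implementation stores its mask; the array contents and variable names are made up for the example.

```python
import numpy as np

# Hypothetical full mask, one byte per element (True = masked).
full_mask = np.array(
    [True, False, False, True, True, False, True, False, False, True],
    dtype=bool,
)

# Pack 8 mask bits into each byte; the final byte is zero-padded.
packed = np.packbits(full_mask)

# Regenerate the full mask on request, trimming the padding bits.
restored = np.unpackbits(packed)[: full_mask.size].astype(bool)

print(full_mask.nbytes, packed.nbytes)
```

The packed form needs one eighth of the memory (rounded up to a whole byte), at the cost of an unpack step whenever a full boolean mask is requested.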