[Numpy-discussion] Masking through generator arrays

Inati, Souheil (NIH/NIMH) [E] souheil.inati at nih.gov
Thu May 10 14:55:41 EDT 2012

On May 10, 2012, at 2:23 PM, Chris Barker wrote:

> On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> What would serve me? I use NumPy as a glorified "double*".
>> all I want is my glorified
>> "double*". I'm probably not a representative user.)
> Actually, I think you are representative of a LOT of users -- it
> turns out, whether Jim Hugunin originally was thinking this way or
> not, that numpy arrays are really powerful because they provide BOTH a
> nifty, full-featured array object in Python, AND a wrapper around a
> generic "double*" (actually char*, that could be any type).
> This is a really widely used feature, and has become even more so
> with Cython's numpy support.
> That is one of my concerns about the "bit pattern" idea -- we've then
> created a new binary type that no other standard software understands
> -- that looks like a lot of work to me to deal with, or even worse,
> ripe for weird, non-obvious errors in code that accesses that good-old
> char*.
> So I'm happier with a mask implementation -- more memory, yes, but it
> seems more robust and easy to deal with from outside code.
> But either way, Dag's key point is right on -- in Cython (or any other
> code) -- we need to make sure it's easy to get a regular old pointer
> to a regular old C array, and not get something else by accident.
> -Chris


As a physicist who uses numpy to develop MRI image reconstruction and data analysis methods, I really do think of numpy as a glorified double with a nice way to call useful numerical methods.  I also use external methods all the time, and it's of the utmost importance to have a pointer to a block of data that I can describe as, say, N complex doubles.  Using a separate array for a mask is not a big deal.  At worst it's a factor of 2 in memory.  It forces me to pay attention to what I'm doing, and if I want to do an SVD on my data, I had better keep track of the mask myself.
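
To make the separate-mask idea concrete, here is a minimal sketch, assuming a C-contiguous complex128 array for the data and an ordinary boolean array for the mask. The helper name make_masked_block and the array shape are hypothetical, chosen only for illustration. The data buffer stays a plain block of N complex doubles that external code can take a pointer to, and the factor of 2 in memory only arises if the mask is stored with the same dtype as the data.

```python
import numpy as np


def make_masked_block(shape, seed=0):
    """Return (data, mask): a contiguous complex128 block and a separate boolean mask.

    Hypothetical helper for illustration; not part of numpy.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    data = np.ascontiguousarray(data, dtype=np.complex128)
    mask = np.zeros(shape, dtype=bool)  # True marks a sample to ignore
    return data, mask


data, mask = make_masked_block((4, 8))
mask[1, 3] = True  # flag one bad sample; the data buffer itself is untouched

# The data is still an ordinary block of N complex doubles in memory.
n_complex_doubles = data.size
is_plain_block = bool(data.flags["C_CONTIGUOUS"])
address = data.ctypes.data  # integer address of the first element

# Memory cost of the mask relative to the data.
bool_mask_overhead = mask.nbytes / data.nbytes                    # 1 byte vs 16 bytes per element
same_dtype_overhead = mask.astype(data.dtype).nbytes / data.nbytes  # the worst case, a factor of 2 in total

print(n_complex_doubles, is_plain_block, bool_mask_overhead, same_dtype_overhead)
```

With a boolean mask the overhead is one byte per 16-byte element; the doubling is the worst case where the mask is as wide as the data.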

I am not that old, but I'm old enough to remember when MATLAB was really just this: a glorified double with a nice slicing/view interface and a thin wrapper around EISPACK and LINPACK.  (Here is a great article by Cleve Moler from 2000: http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter2000.cleve.html).  You used to read in some ints from a data file and MATLAB converted them to double, and you knew that if you got numerical precision errors it was because your algorithm was wrong or you were inverting some nearly singular matrix or something, not because of overflow.  And it made a copy of the data every time you called a function.  It had serious limitations, but what it did just worked.  And then they started to get fancy, and it took them a REALLY long time and a lot of versions and man-hours to get that all sorted out, with lazy evaluations and classes and sparse arrays and all that.

I'm not saying what the developers of numpy should do about the masked array thing, and I really can't comment on how other people use numpy.  I also don't really have much of a say about the technical implementation of the guts of numpy, but it's worth asking really simple questions like:  I want to do an SVD on a 2D array with some missing or masked data.  What should happen?  This seems like such a simple question, but really it is incredibly complicated, or rather, it's very hard for numpy, which is a foundational, framework type of code, to guess what the user means.
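
The SVD question above can be illustrated with a small sketch. It assumes the user keeps a separate boolean mask and chooses a fill policy explicitly; the function svd_with_fill and the two policies ("zero" and "column_mean") are hypothetical choices for illustration, not anything numpy does or proposes. Both are defensible, and they give different singular values, which is why the library cannot easily guess.

```python
import numpy as np


def svd_with_fill(data, mask, policy):
    """Singular values of `data` after replacing masked entries per `policy`.

    Hypothetical helper: `mask` is True where a value is missing.
    """
    if policy == "zero":
        filled = np.where(mask, 0.0, data)
    elif policy == "column_mean":
        valid = ~mask
        counts = np.maximum(valid.sum(axis=0), 1)  # avoid division by zero
        col_means = np.where(valid, data, 0.0).sum(axis=0) / counts
        filled = np.where(mask, col_means, data)   # col_means broadcasts over rows
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return np.linalg.svd(filled, compute_uv=False)


data = np.arange(1.0, 13.0).reshape(4, 3)
mask = np.zeros(data.shape, dtype=bool)
mask[2, 1] = True  # pretend this one sample is missing

s_zero = svd_with_fill(data, mask, "zero")
s_mean = svd_with_fill(data, mask, "column_mean")

print("zero fill:       ", s_zero)
print("column-mean fill:", s_mean)
```

Neither answer is "the" SVD of the masked array; each is the SVD of a different fully specified matrix, and the choice belongs to the person who knows what the missing data means.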

Anyway, that's my point of view.  I'm really happy numpy exists and works as well as it does and I'm thankful that there are developers out there that can build something so useful.


Souheil Inati, PhD
Staff Scientist
Functional MRI Facility

> -- 
> Christopher Barker, Ph.D.
> Oceanographer
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> Chris.Barker at noaa.gov
