[Numpy-discussion] NA/Missing Data Conference Call Summary

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Jul 6 16:08:34 EDT 2011


On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire
<cjordan1 at uw.edu> wrote:
>
>
> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker <Chris.Barker at noaa.gov>
> wrote:
>>
>> Christopher Jordan-Squire wrote:
>> > If we follow those rules for IGNORE for all computations, we sometimes
>> > get some weird output. For example:
>> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>> > multiply and not * with broadcasting.) Or should that sort of operation
>> > throw an error?
>>
>> That should throw an error -- matrix computation is heavily influenced
>> by the shape and size of matrices, so I think IGNORES really don't make
>> sense there.
>>
>>
>
> If the IGNORES don't make sense in basic numpy computations then I'm kinda
> confused why they'd be included at the numpy core level.
>
>>
>> Nathaniel Smith wrote:
>> > It's exactly this transparency that worries Matthew and me -- we feel
>> > that the alterNEP preserves it, and the NEP attempts to erase it. In
>> > the NEP, there are two totally different underlying data structures,
>> > but this difference is blurred at the Python level. The idea is that
>> > you shouldn't have to think about which you have, but if you work with
>> > C/Fortran, then of course you do have to be constantly aware of the
>> > underlying implementation anyway.
>>
>> I don't think this bothers me -- I think it's analogous to things in
>> numpy like Fortran order and non-contiguous arrays -- you can ignore all
>> that when working in pure python when performance isn't critical, but
>> you need a deeper understanding if you want to work with the data in C
>> or Fortran or to tune performance in python.
>>
>> So as long as there is an API to query and control how things work, I
>> like that it's hidden from simple python code.
>>
>> -Chris
>>
>>
>
> I'm similarly not too concerned about it. Performance seems finicky when
> you're dealing with missing data, since a lot of arrays will likely have to
> be copied over to other arrays containing only complete data before being
> handed over to BLAS.

Unless you know the neutral value for the computation, or you just want
to do a forward_fill in a time series. In those cases you have to ask the
user not to give you an immutable array with NAs if they don't want extra
copies.
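
A minimal sketch of the two cases mentioned above, assuming NaN stands in
for NA (np.NA is only a proposal in this thread, so it is not used here).
The helper name forward_fill is hypothetical, not an existing NumPy
function. The neutral value depends on the reduction (0 for a sum, 1 for a
product), and filling in place only works if the caller hands over a
writable array:

```python
import numpy as np

# NaN is used as a stand-in for NA (assumption; np.NA is not available).
x = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# Neutral value: substitute 0 for a sum and 1 for a product.
total = np.where(np.isnan(x), 0.0, x).sum()
product = np.where(np.isnan(x), 1.0, x).prod()


def forward_fill(a):
    """Hypothetical helper: overwrite NaNs in `a` with the last valid value.

    Writes into the caller's 1-d array, so it raises ValueError if `a` is
    read-only. A leading NaN has nothing before it and stays NaN.
    """
    valid = ~np.isnan(a)
    # Index of the most recent valid entry at each position.
    idx = np.where(valid, np.arange(a.size), 0)
    np.maximum.accumulate(idx, out=idx)
    a[:] = a[idx]
    return a


# A read-only input forces the extra copy.
frozen = x.copy()
frozen.setflags(write=False)
try:
    filled = forward_fill(frozen)
except ValueError:
    filled = forward_fill(frozen.copy())

print(total, product, filled)
```

With a writable array, forward_fill(x) overwrites x itself. The fancy
indexing still builds a temporary internally, so this is not a zero-copy
implementation, only one that avoids returning a second array to the user.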

Josef

> My primary concern is that the np.NA stuff 'just
> works'. Especially since I've never run into use cases in statistics where
> the difference between IGNORE and NA mattered.
>
>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker at noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
