[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Laurent Gautier
lgautier at gmail.com
Sat Jun 25 02:36:04 EDT 2011
On 2011-06-24 17:30, Robert Kern <robert.kern at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 10:07, Laurent Gautier<lgautier at gmail.com> wrote:
>> > On 2011-06-24 16:43, Robert Kern<robert.kern at gmail.com> wrote:
>>> >>
>>> >> On Fri, Jun 24, 2011 at 09:33, Charles R Harris
>>> >> <charlesr.harris at gmail.com> wrote:
>>>> >>>
>>>>> >>> >
>>>>> >>> > On Fri, Jun 24, 2011 at 8:06 AM, Robert Kern<robert.kern at gmail.com>
>>>>> >>> > wrote:
>>>>> >>>>
>>>>>>> >>>> >> The alternative proposal would be to add a few new dtypes that are
>>>>>>> >>>> >> NA-aware. E.g. an nafloat64 would reserve a particular NaN value
>>>>>>> >>>> >> (there are lots of different NaN bit patterns, we'd just reserve one)
>>>>>>> >>>> >> that would represent NA. An naint32 would probably reserve the most
>>>>>>> >>>> >> negative int32 value (like R does). Using the NA-aware dtypes signals
>>>>>>> >>>> >> that you are using NA values; there is no need for an additional flag.
>>>> >>>
>>>>> >>> >
>>>>> >>> > Definitely better names than r-int32. Going this way has the advantage of
>>>>> >>> > reducing the friction between R and numpy, and since R has pretty much
>>>>> >>> > become the standard software for statistics that is an important
>>>>> >>> > consideration.
>>> >>
>>> >> I would definitely steal their choices of NA value for naint32 and
>>> >> nafloat64. I have reservations about their string NA value (i.e. 'NA')
>>> >> as anyone doing business in North America and other continents may
>>> >> have issues with that....
>> >
>> > Maybe there is not so much need for reservation over the string NA, when
>> > making the distinction between:
>> > a- the internal representation of a "missing string" (what is stored in
>> > memory, and that C-level code would need to be aware of)
>> > b- the 'external' representation of a missing string (in Python, what would
>> > be returned by repr() )
>> > c- what is assumed to be a missing string value when reading from a file.
>> >
>> > a/ is not 'NA', c/ should be a parameter in the relevant functions, b/ can
>> > be configured as a module-level, class-level, or instance-level variable.
> In R, a/ happens to be 'NA', unfortunately. :-/
In a sense yes, in a sense no.
There is NA_STRING (that happens to store 'NA', but it could equally be
'foobar' or whatever) and there is "NA".
NA_STRING is set once and for all, and each time a string element in a
vector is set to NA, that element points to that single object.
A string "NA" is not the NA_STRING.
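To make the distinction concrete, here is a rough Python sketch of the same idea. It is only an illustration: the names `_NAString`, `NA_STRING` and `is_na` are hypothetical, and this is neither R's C implementation nor a proposed NumPy API. The point is that missingness is decided by identity with one shared object, not by comparing the stored characters.

```python
class _NAString(str):
    """Hypothetical sentinel type for a missing string (illustration only)."""

    def __repr__(self):
        # b/ the external representation; this could be made configurable
        return "NA"


# a/ the internal representation: a single shared object, created once.
# Its stored content happens to be "NA", but "foobar" would work equally well.
NA_STRING = _NAString("NA")


def is_na(value):
    # Identity test: only the shared sentinel counts as missing.
    return value is NA_STRING


values = ["NA", NA_STRING, "foo"]
mask = [is_na(v) for v in values]

# Comparing contents cannot tell the two apart, which is why identity is used.
same_content = (values[0] == values[1])
```

With this layout an ordinary string "NA" (North America, say) stays a valid value, while the sentinel is the only thing flagged as missing.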
> I'm not really sure how they handle datasets that use valid 'NA'
> values. Presumably, their input routines allow one to convert such
> values to something else such that it can use 'NA'==NA internally.
That's c/. For example, R's read.table(..., na.strings = "NA", ...).
A number of R design choices (that are S design choices here) are
empirical and based on experience.
Datasets can originally be in a number of different flavours and
conversion can be made when reading data into memory / R format.
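A minimal Python sketch of that kind of read-time conversion follows. The function `read_fields`, its `na_strings` parameter and the use of `None` as the in-memory missing marker are all assumptions made for illustration; they only mimic the role of `na.strings` in `read.table` and do not correspond to any existing NumPy function.

```python
NA = None  # hypothetical in-memory marker for a missing value


def read_fields(line, na_strings=("NA",), sep=","):
    """Split one delimited line, mapping any field listed in na_strings to NA."""
    fields = [field.strip() for field in line.split(sep)]
    return [NA if field in na_strings else field for field in fields]


# Default behaviour: the text "NA" in the file is treated as missing.
default_read = read_fields("NA, CA, EU")

# A dataset where "NA" is a valid value (e.g. North America) and "-" marks
# missing entries: the caller simply passes a different na_strings.
custom_read = read_fields("NA, -, EU", na_strings=("-",))
```

The file-level convention for missing values is therefore a parameter of the reader, independent of how missing strings are stored once in memory.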
L.
> -- Robert Kern