[Pandas-dev] What could a pandas 2.0 look like?

Joris Van den Bossche jorisvandenbossche at gmail.com
Mon Feb 17 06:36:54 EST 2020


>
> > This would also imply creating a nullable float dtype and making our
> datelikes use NA rather than NaT too. That seemed to be generally OK, but
> wasn't discussed too much.
>
> My understanding of the discussion is that using a mask on top of
> datetimelike arrays would not _replace_ NaT, but supplement it with
> something semantically different.
>

Yes, if we treat it like NaN for floats (where NaN is a specific float
value in the data array, while NAs are tracked in the mask array), then we
can do something similar for datetimelike arrays. And the same discussions
that we are going to have for float dtypes, about how far to distinguish
NaN from NA and whether we need to provide options, will also be relevant
for datetimelike dtypes (but then for NaT and NA).
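
To make the values-plus-mask idea concrete, here is a rough pure-Python
sketch. The class name `MaskedFloats` and its methods are made up for
illustration; this is not how pandas implements it, only a picture of the
layout described above: NaN lives in the data, NA lives in the mask.

```python
import math
from dataclasses import dataclass


@dataclass
class MaskedFloats:
    """Hypothetical masked float array: data values plus a boolean mask."""

    values: list  # raw float data; may legitimately contain NaN
    mask: list    # True marks an entry as missing (NA)

    def isna(self):
        # NA is tracked only in the mask, independent of the stored value.
        return list(self.mask)

    def isnan(self):
        # NaN is a property of the stored value, and only counts
        # where the entry is not masked out.
        return [(not m) and math.isnan(v)
                for v, m in zip(self.values, self.mask)]


# Second entry is a "real" NaN (e.g. the result of 0/0);
# third entry is missing, so its stored value is just a placeholder.
arr = MaskedFloats(values=[1.0, float("nan"), 0.0],
                   mask=[False, False, True])
print(arr.isna())   # [False, False, True]
print(arr.isnan())  # [False, True, False]
```

For datetimelike arrays the same picture would apply with NaT in place of
NaN as the in-data sentinel.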

But note that in practice, I *think* that the large majority of use cases
will use NA and not NaT in the data (e.g. when reading from files that
have missing data).

> Replacing NaT with NA breaks arithmetic consistency, as has been discussed
> ad nauseam.
>

It's not fully clear to me what you want to say with this, so a more
detailed clarification is welcome (I mean, I understand the sentence and
remember the discussion, but don't fully understand the point being made in
context, or in what direction you think more discussion is needed).

Assume we introduce a new "nullable datetime" dtype that uses a mask to
track NAs, and can still have NaT in the values. In practice, this still
means that we "replace NaT with NA" (because even though NaT is still
possible, I think you would mostly get NAs as mentioned above; eg reading a
file would now give NA instead of NaT).
So do you mean: "in my opinion, we should not do this" (what I just
described above), because in practice that would mean breaking arithmetic
consistency? Or that if we want to start using NA for datetimelike dtypes,
you think "dtype-parametrized" NA values are necessary (so you can
distinguish NA[datetime] and NA[timedelta])?
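
To illustrate what I understand by "dtype-parametrized" NA, a small
pure-Python sketch follows. `TypedNA` is a hypothetical name, not an
existing or proposed pandas object, and the subtraction rules are only my
assumption of what "arithmetic consistency" would require: the missing
value keeps track of whether it stands for a datetime or a timedelta, so
the result of an operation still has a well-defined type.

```python
from datetime import datetime, timedelta

# Result type of `left - right` for the combinations that make sense.
_SUB_RULES = {
    (datetime, datetime): timedelta,
    (datetime, timedelta): datetime,
    (timedelta, timedelta): timedelta,
}


class TypedNA:
    """Hypothetical missing-value marker that remembers its dtype."""

    def __init__(self, kind):
        self.kind = kind  # either `datetime` or `timedelta`

    def __repr__(self):
        return f"NA[{self.kind.__name__}]"

    @staticmethod
    def _kind_of(obj):
        return obj.kind if isinstance(obj, TypedNA) else type(obj)

    def __sub__(self, other):
        result = _SUB_RULES.get((self.kind, self._kind_of(other)))
        return NotImplemented if result is None else TypedNA(result)

    def __rsub__(self, other):
        # Reached for `other - self` when `other` is a plain
        # datetime/timedelta that does not know about TypedNA.
        result = _SUB_RULES.get((self._kind_of(other), self.kind))
        return NotImplemented if result is None else TypedNA(result)


stamp = datetime(2020, 2, 17)
print(TypedNA(datetime) - stamp)   # NA[timedelta]
print(stamp - TypedNA(timedelta))  # NA[datetime]
```

A single untyped NA cannot make this distinction: `stamp - NA` would not
tell you whether the result is a missing datetime or a missing timedelta.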

Joris