[Pandas-dev] tslibs 2.0 and non-nanosecond datetime64/timedelta64

Sebastian Berg sebastian at sipsolutions.net
Sat May 30 18:39:07 EDT 2020


On Fri, 2020-05-29 at 09:37 -0700, Brock Mendel wrote:
> This is a discussion of what it would take to support non-nanosecond
> datetime64/timedelta64 dtypes and what decisions would need to be made
> along the way.
> 
> The implementation would probably consist of:
> - add a NPY_DATETIMEUNIT attribute to Timestamp and Datetime64TZDtype
> - for timezone-related methods:
>     - short-term: cast to nanosecond, use existing code, cast back to
>       other unit
>     - longer-term: update existing code to support non-nano units
>       directly
> - comb through the code for all the places where we implicitly assume
>   nano units and update
> - tests, so, so many tests
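
The short-term option in the list above (cast to nanosecond, reuse the
existing code, cast back) could look roughly like the sketch below.  It
assumes plain numpy arrays only; `shift_ns` and `shift_any_unit` are
made-up names standing in for existing nanosecond-only code, not pandas
functions.

```python
import numpy as np


def shift_ns(values_ns, offset_ns):
    """Hypothetical stand-in for existing code that assumes nanosecond units."""
    assert values_ns.dtype == np.dtype("datetime64[ns]")
    return values_ns + np.timedelta64(offset_ns, "ns")


def shift_any_unit(values, offset_ns):
    """Hypothetical wrapper: cast to ns, reuse the ns-only code, cast back."""
    original_dtype = values.dtype              # e.g. datetime64[s]
    as_ns = values.astype("datetime64[ns]")    # cast to nanosecond
    shifted = shift_ns(as_ns, offset_ns)       # use existing code
    return shifted.astype(original_dtype)      # cast back to other unit


seconds = np.array(["2020-05-29T09:37:00"], dtype="datetime64[s]")
result = shift_any_unit(seconds, 3600 * 10**9)  # shift by one hour
print(result.dtype, result[0])
```

Note that the intermediate cast limits the usable range to what
datetime64[ns] can represent (roughly the years 1678 to 2262), which is
one reason this only works as a stop-gap.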
> 
> We could then consider de-duplication. Tick is already redundant with
> Timedelta, and Timestamp[H] would render Period[H] redundant.  With
> appropriate deprecation cycle, we could rip out a bunch of code.
> 
> Another possibility is to try to upstream some code to numpy, which they
> have recently been receptive to (#16266
> <https://github.com/numpy/numpy/pull/16266>, #16363
> <https://github.com/numpy/numpy/pull/16363>, #16364
> <https://github.com/numpy/numpy/pull/16364>, #16352
> <https://github.com/numpy/numpy/issues/16352>, #16195
> <https://github.com/numpy/numpy/issues/16195>).  @rgommers tells me that
> trying to implement a tz-aware datetime64 dtype in numpy would be "folly,
> that way madness lies", but that it might be more feasible once @seberg's
> dtype refactor lands.

Timezones do seem like too much complexity to add to numpy.  And with
the dtype refactor, such a dtype should hopefully soon not need to live
within NumPy at all.  The more likely discussion would be to go in the
opposite direction :).  Since:

    np.array([datetime.datetime(2019, 1, 1)])

gives an object array, NumPy datetimes should not have any long term
advantage over an externally developed datetime (except living in the
prominent numpy namespace).
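
To make that concrete, here is a small sketch using only the stdlib
datetime module and numpy.  The variable names are just for
illustration.

```python
import datetime

import numpy as np

py_dt = datetime.datetime(2019, 1, 1)

# Without an explicit dtype, NumPy does not discover datetime64 and
# falls back to a generic object array.
inferred = np.array([py_dt])

# datetime64 is only used when it is requested explicitly.
explicit = np.array([py_dt], dtype="datetime64[us]")

print(inferred.dtype)
print(explicit.dtype)
```

So a user already has to opt in to datetime64 by name, and opting in to
an externally developed dtype would not be very different.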

Having a new datetime dtype external to NumPy and with tz-info indeed
seems very desirable.  And I would be happy to have you in the loop, so
we could maybe even use it as an early test balloon by including it as
a test in NumPy, with the idea of later cutting it out as a stand-alone
package.
But that would be mostly useful if you are excited about getting a
small head start.  In the end, it would likely help me/NumPy more than
you in terms of time investment.

> More realistically short-term, if we convinced numpy
> to update NPY_DATETIMEUNIT to include the anchored quarter/year/week
> units we use for Period, we could condense a lot of confusing enum-like
> code.

At first sight, that does sound reasonable, and it probably only depends
on the complexity: it should be fine if it does not increase numpy's code
complexity too much (and obviously it would decrease pandas' quite a bit
more).  I assume that this would mainly move some fairly straightforward
and thoroughly tested code from pandas into NumPy?

Can't say I am excited about reviewing datetime code, but upstreaming
seems much better for the community than band-aids in pandas...

- Sebastian


> 
> Tangentially related: with zoneinfo (PEP 615) we should consider making
> those our canonical tzinfos and converting any dateutil/pytz tzinfos we
> encounter to those.  They are implemented in C, so I'm _hopeful_ we can
> make some of our vectorized tzconversion code unnecessary.  @pganssle has
> suggested we implement our own tzinfos, but I'm holding out hope we can
> keep that upstream.
