[Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

Brock Mendel jbrockmendel at gmail.com
Fri Nov 6 10:57:41 EST 2020


> I find the whole notion of a "timezone naive timestamp" to be nearly
> meaningless

From the perspective of, say, the dateutil parser, what would you do with
"2020-11-06 07:48"?  If you assume it's UTC you'll be wrong in this case.
If you assume it is in your local timezone, you'll be wrong in Europe.
Timezone-naive datetimes are an abstraction for exactly this case.
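
To make that concrete, here is a minimal sketch using only the standard
library's datetime module as a stand-in for the dateutil parser (which is
not called here). The +02:00 offset is an arbitrary assumption chosen for
illustration, not a claim about where the string came from:

```python
from datetime import datetime, timedelta, timezone

# Parsing a string that carries no offset yields a naive datetime.
naive = datetime.fromisoformat("2020-11-06 07:48")
print(naive.tzinfo)  # None

# Attaching a zone is a separate, explicit decision made by the caller.
as_utc = naive.replace(tzinfo=timezone.utc)
as_plus2 = naive.replace(tzinfo=timezone(timedelta(hours=2)))  # assumed offset

# The same wall-clock reading names two different moments.
print(as_utc - as_plus2)  # 2:00:00
```

The parser cannot pick between those two readings on its own, so it
returns the naive value and leaves the choice to whoever knows the source.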

>>> t0 = pd.Timestamp.now()

You can use `pd.Timestamp.now("UTC")` to get a timezone-aware timestamp.
See also
https://mail.python.org/archives/list/datetime-sig@python.org/thread/PT4JWJLYBE5R2QASVBPZLHH37ULJQR43/
and https://github.com/pandas-dev/pandas/issues/22451
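
A rough sketch of the difference, rerunning the TZ-switching example
quoted below but asking for an aware timestamp each time. It assumes a
Unix-like system where time.tzset() exists; the helper name
aware_now_across_tz_switch is made up for illustration:

```python
import os
import time

import pandas as pd


def aware_now_across_tz_switch():
    """Take two aware timestamps while the process TZ changes in between."""
    old_tz = os.environ.get("TZ")
    try:
        os.environ["TZ"] = "UTC"
        if hasattr(time, "tzset"):  # tzset() is Unix-only
            time.tzset()
        t0 = pd.Timestamp.now("UTC")

        os.environ["TZ"] = "EST-5"
        if hasattr(time, "tzset"):
            time.tzset()
        t1 = pd.Timestamp.now("UTC")
        return t0, t1
    finally:
        # Restore the original setting so the sketch has no lasting side effect.
        if old_tz is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = old_tz
        if hasattr(time, "tzset"):
            time.tzset()


t0, t1 = aware_now_across_tz_switch()
print(t0.tzinfo is not None)  # True: aware, unlike pd.Timestamp.now()
print(t1 - t0)  # a small Timedelta; the TZ change does not leak in
```

Both operands carry their zone, so the subtraction does not depend on the
environment. Convert with tz_convert only when you need a wall-clock
reading in some particular zone.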





On Fri, Nov 6, 2020 at 2:48 AM Noam Yorav-Raphael <noamraph at gmail.com>
wrote:

> Hi,
>
> I actually arrived at this by first trying to use pandas.Timestamp and
> getting very frustrated about it. With pandas, I get:
>
> >>> pd.Timestamp.now()
> Timestamp('2020-11-06 09:45:24.249851')
>
> I find the whole notion of a "timezone naive timestamp" to be nearly
> meaningless. A timestamp should mean a moment in time (as the current numpy
> documentation defines very well). A "naive timestamp" doesn't mean
> anything. It's exactly like a "unit naive length". I can have a Length type
> which just takes a number, and be very happy that it works both if my "unit
> zone" is inches or centimeters. So "Length(3)" will mean 3 cm in most of
> the world and 3 inches in the US. But then, if I get "Length(3)" from
> someone, I can't be sure what length it refers to.
>
> So currently, this happens with pandas timestamps:
>
> >>> import os, time
> ... import pandas as pd
> ... os.environ['TZ'] = 'UTC'; time.tzset()
> ... t0 = pd.Timestamp.now()
> ... time.sleep(1)
> ... os.environ['TZ'] = 'EST-5'; time.tzset()
> ... t1 = pd.Timestamp.now()
> ... t1 - t0
> Timedelta('0 days 05:00:01.001583')
>
> This is not just theoretical - I actually need to work with data from
> several devices, each in its own time zone. And I need to know that I won't
> get such meaningless results.
>
> And you can even get something like this:
>
> >>> t0 = pd.Timestamp.now()
> ... time.sleep(10)
> ... t1 = pd.Timestamp.now()
> ... t1 - t0
> Timedelta('0 days 01:00:10.001583')
>
> if the first measurement happened to be in winter time and the second
> measurement happened to be in daylight saving time.
>
> The solution is simple, and is what datetime64 used to do before the
> change - have a type that just represents a moment in time. It's not "in
> UTC" - it just stores the number of seconds that passed since an agreed
> moment in time (which is usually 1970-01-01 02:00+0200, which is more
> commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).
>
> I think it would make things clearer if I mention that there are
> operations that do not deal with timestamps. For example, it's
> meaningless to ask what is the year of a timestamp - it may depend on the
> time zone. These are always *human* related questions, that depend on
> certain human conventions. We can call them "calendar questions". For these
> types of questions, a type that includes both a timestamp and a timezone
> offset (in minutes from UTC) can be useful. Some questions even require
> full timezone information, meaning a function that defines the
> timezone offset for each moment. However, I don't think numpy should deal
> with those calendar issues. As a very simple example, even for
> "timestamp+offset" types, it's not clear how to compare them - should
> values with the same timestamp and different offsets be considered equal or
> not? And in virtually all of my data analysis, this calendar aspect has
> nothing to do with the questions I'm trying to answer.
>
> I have a suggestion. Instead of changing datetime64 (which I consider to
> be ill-defined, but never mind), add a new type called "timestamp64". It
> will have the exact same behavior as datetime64 had before the change,
> except that its only allowed units will be seconds, milliseconds,
> microseconds and nanoseconds.  Removing the longer units will make it clear
> that it doesn't deal with calendar and dates. Also, all the business day
> functionality will not be applicable to timestamp64. In order to get
> calendar information (such as the year) from timestamp64, you will have to
> manually convert it to python's datetime (or to np.datetime64) with an
> explicit timezone (utc, local, an offset, or a timezone object).
>
> What do you think?
>
> Thanks,
> Noam
>
>
>
>
>
> On Fri, Nov 6, 2020 at 1:45 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> I can try to dig up the old discussions, but datetime64 used to implement
>> both (1) and (3), and this was updated in a very intentional way.
>> Datetime64 now works like Python's own time-zone naive datetime.datetime
>> objects. The documentation referencing "Z" should be updated -- datetime64
>> can be in any timezone you like.
>>
>> Timezone aware datetime objects are certainly useful, but NumPy's
>> datetime64 was restricted to UTC. The consensus was that it was worse to
>> have UTC-only rather than timezone-naive-only. NumPy's datetime64 is often
>> used for data analysis purposes, for which automatic conversion to the
>> local timezone of the computer running the analysis is often
>> counter-productive.
>>
>> If you care about timezone conversions, I would highly recommend looking
>> into pandas's Timestamp class for this purpose. In the future, this would
>> be a good use-case for a new custom NumPy dtype. (The existing
>> np.datetime64 code cannot easily handle multiple timezones.)
>>
>> On Thu, Nov 5, 2020 at 1:04 PM Eric Wieser <wieser.eric+numpy at gmail.com>
>> wrote:
>>
>>> Without weighing in yet on how I feel about the deprecation, you can see
>>> some discussion about why this was originally deprecated in the PR that
>>> introduced the warning:
>>>
>>> https://github.com/numpy/numpy/pull/6453
>>>
>>> Eric
>>>
>>> On Thu, Nov 5, 2020, 20:13 Noam Yorav-Raphael <noamraph at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I suggest removing the deprecation warning when constructing a
>>>> datetime64 with a timezone. For example, this is the current behavior:
>>>>
>>>> >>> np.datetime64('2020-11-05 16:00+0200')
>>>> <stdin>:1: DeprecationWarning: parsing timezone aware datetimes is
>>>> deprecated; this will raise an error in the future
>>>> numpy.datetime64('2020-11-05T14:00')
>>>>
>>>> I suggest removing the deprecation warning because I find this to be a
>>>> useful behavior, and because it is a correct behavior. The manual says:
>>>> "The datetime object represents a single moment in time... Datetimes are
>>>> always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z."
>>>> So 2020-11-05T16:00+0200 is indeed the moment in time represented by
>>>> np.datetime64('2020-11-05T14:00').
>>>>
>>>> I just used this to restrict my data set to records created after a
>>>> certain moment. It was easier for me to write the moment in my local time
>>>> and add "+0200" than to figure out the moment representation in UTC.
>>>>
>>>> So this is my simple suggestion: remove the deprecation warning.
>>>>
>>>>
>>>> Beyond that, I have 3 ideas for changing the repr of datetime64 that I
>>>> would like to discuss.
>>>>
>>>> 1. Add "Z" at the end, for example,
>>>> numpy.datetime64('2020-11-05T14:00Z'). This will make it clear to which
>>>> moment it refers. I think this is significant - I had to dig quite a bit to
>>>> realize that datetime64('2020-11-05T14:00') means 14:00 UTC.
>>>>
>>>> 2. Replace the 'T' with a space. I just find it much easier to read
>>>> '2020-11-05 14:00Z' than '2020-11-05T14:00Z'. The long sequence of
>>>> characters makes it hard for my brain to parse.
>>>>
>>>> 3. This will require discussion, but will be very convenient: have the
>>>> repr display the time using the environment time zone, including a time
>>>> offset. So, in my specific time zone (+0200), I will have:
>>>>
>>>> repr(np.datetime64('2020-11-05 14:00Z')) ==
>>>> "numpy.datetime64('2020-11-05T16:00+0200')"
>>>>
>>>> I'm sure the pros and cons of having an environment-dependent repr
>>>> should be discussed. But I will list some pros:
>>>> 1. It's very convenient - it's immediately obvious to me to which
>>>> moment 2020-11-05 16:00+0200 refers.
>>>> 2. It's well defined - I may collect timestamps from machines with
>>>> different time zones, and I will be able to know to which exact moment each
>>>> timestamp refers.
>>>> 3. It's very simple - I could compare any two timestamps, I don't have
>>>> to worry about time zones.
>>>>
>>>> I would be happy to hear your thoughts.
>>>>
>>>> Thanks,
>>>> Noam
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>