datetime64: Remove deprecation warning when constructing with timezone
Hi,

I suggest removing the deprecation warning when constructing a datetime64 with a timezone. For example, this is the current behavior:

```
>>> np.datetime64('2020-11-05 16:00+0200')
<stdin>:1: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
numpy.datetime64('2020-11-05T14:00')
```

I suggest removing the deprecation warning because I find this to be useful behavior, and because it is correct behavior. The manual says: "The datetime object represents a single moment in time... Datetimes are always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z." So 2020-11-05T16:00+0200 is indeed the moment in time represented by np.datetime64('2020-11-05T14:00').

I just used this to restrict my data set to records created after a certain moment. It was easier for me to write the moment in my local time and add "+0200" than to figure out its representation in UTC.

So this is my simple suggestion: remove the deprecation warning.

Beyond that, I have three ideas for changing the repr of datetime64 that I would like to discuss.

1. Add "Z" at the end, for example numpy.datetime64('2020-11-05T14:00Z'). This makes it clear which moment it refers to. I think this is significant - I had to dig quite a bit to realize that datetime64('2020-11-05T14:00') means 14:00 UTC.
2. Replace the 'T' with a space. I find '2020-11-05 14:00Z' much easier to read than '2020-11-05T14:00Z'; the long unbroken sequence of characters is hard to parse.
3. This one will require discussion, but would be very convenient: have the repr display the time in the environment's time zone, including the UTC offset. So, in my time zone (+0200), I would get:

   repr(np.datetime64('2020-11-05 14:00Z')) == "numpy.datetime64('2020-11-05T16:00+0200')"

I'm sure the pros and cons of an environment-dependent repr should be discussed, but here are some pros:

1. It's very convenient - it's immediately obvious to me which moment 2020-11-05 16:00+0200 refers to.
2. It's well defined - I may collect timestamps from machines in different time zones, and I can still tell which exact moment each timestamp refers to.
3. It's very simple - I can compare any two timestamps without worrying about time zones.

I would be happy to hear your thoughts.

Thanks,
Noam
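For concreteness, a minimal sketch of the filtering pattern described in this message; the array name and values are invented for illustration, and the offset-bearing cutoff string is what currently triggers the warning:

```python
import numpy as np

# Hypothetical record-creation times, stored as timezone-naive (UTC-based) datetime64.
created = np.array(
    ['2020-11-05T12:30', '2020-11-05T14:30', '2020-11-05T15:45'],
    dtype='datetime64[m]',
)

# Cutoff written in local time (+0200); NumPy converts it to 2020-11-05T14:00 UTC
# (and, at the time of writing, emits the DeprecationWarning under discussion).
cutoff = np.datetime64('2020-11-05 16:00+0200')

print(created[created >= cutoff])  # records at or after 2020-11-05T14:00 UTC
```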
Without weighing in yet on how I feel about the deprecation, you can see some discussion about why this was originally deprecated in the PR that introduced the warning: https://github.com/numpy/numpy/pull/6453

Eric
I can try to dig up the old discussions, but datetime64 used to implement both (1) and (3), and this was changed in a very intentional way. Datetime64 now works like Python's own timezone-naive datetime.datetime objects. The documentation referencing "Z" should be updated -- datetime64 can be in any timezone you like.

Timezone-aware datetime objects are certainly useful, but NumPy's datetime64 was restricted to UTC. The consensus was that UTC-only was worse than timezone-naive-only. NumPy's datetime64 is often used for data analysis, for which automatic conversion to the local timezone of the computer running the analysis is often counter-productive.

If you care about timezone conversions, I would highly recommend looking into pandas's Timestamp class for this purpose. In the future, this would be a good use case for a new custom NumPy dtype. (The existing np.datetime64 code cannot easily handle multiple timezones.)
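A small sketch of the distinction described here (not part of the original message; the chosen timezone is arbitrary): datetime64 carries no zone, while a pandas Timestamp can, and can be converted between zones without changing the instant it denotes.

```python
import numpy as np
import pandas as pd

# Timezone-naive: the string is taken at face value, no conversion is applied.
naive = np.datetime64('2020-11-05T14:00')

# Timezone-aware: the same wall-clock time pinned to a specific zone,
# convertible to another zone without changing the instant.
aware = pd.Timestamp('2020-11-05 16:00', tz='Asia/Jerusalem')

print(naive)                    # 2020-11-05T14:00
print(aware.tz_convert('UTC'))  # 2020-11-05 14:00:00+00:00
```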
Hi,

I actually arrived at this by first trying to use pandas.Timestamp and getting very frustrated with it. With pandas, I get:

```
>>> pd.Timestamp.now()
Timestamp('2020-11-06 09:45:24.249851')
```

I find the whole notion of a "timezone-naive timestamp" to be nearly meaningless. A timestamp should mean a moment in time (as the current numpy documentation defines very well). A "naive timestamp" doesn't mean anything. It's exactly like a "unit-naive length": I could have a Length type which just takes a number, and be very happy that it works whether my "unit zone" is inches or centimeters. So "Length(3)" will mean 3 cm in most of the world and 3 inches in the US. But then, if I get "Length(3)" from someone, I can't be sure what length it refers to.

So currently, this happens with pandas timestamps:

```
>>> os.environ['TZ'] = 'UTC'; time.tzset()
>>> t0 = pd.Timestamp.now()
>>> time.sleep(1)
>>> os.environ['TZ'] = 'EST-5'; time.tzset()
>>> t1 = pd.Timestamp.now()
>>> t1 - t0
Timedelta('0 days 05:00:01.001583')
```

This is not just theoretical - I actually need to work with data from several devices, each in its own time zone. And I need to know that I won't get such meaningless results.

And you can even get something like this:

```
>>> t0 = pd.Timestamp.now()
>>> time.sleep(10)
>>> t1 = pd.Timestamp.now()
>>> t1 - t0
Timedelta('0 days 01:00:10.001583')
```

if the first measurement happened to fall in winter time and the second in daylight saving time.

The solution is simple, and is what datetime64 used to do before the change - have a type that just represents a moment in time. It's not "in UTC" - it just stores the number of seconds that have passed since an agreed moment in time (which is usually written 1970-01-01 02:00+0200, more commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).

I think it would make things clearer if I mention that there are operations that are not dealing with timestamps. For example, it's meaningless to ask what the year of a timestamp is - it may depend on the time zone. These are always *human*-related questions that depend on certain human conventions. We can call them "calendar questions". For these types of questions, a type that includes both a timestamp and a timezone offset (in minutes from UTC) can be useful. Some questions even require full timezone information, meaning a function that defines the timezone offset for each moment. However, I don't think numpy should deal with those calendar issues. As a very simple example, even for "timestamp+offset" types it's not clear how to compare them - should values with the same timestamp and different offsets be considered equal or not? And in virtually all of my data analysis, this calendar aspect has nothing to do with the questions I'm trying to answer.

I have a suggestion. Instead of changing datetime64 (which I consider to be ill-defined, but never mind), add a new type called "timestamp64". It will have the exact same behavior as datetime64 had before the change, except that its only allowed units will be seconds, milliseconds, microseconds and nanoseconds. Removing the longer units will make it clear that it doesn't deal with calendars and dates. Also, all the business-day functionality will not be applicable to timestamp64. In order to get calendar information (such as the year) from a timestamp64, you will have to explicitly convert it to Python's datetime (or to np.datetime64) with an explicit timezone (UTC, local, an offset, or a timezone object).

What do you think?

Thanks,
Noam
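As a rough, runnable sketch of the workflow this proposal implies, emulated with today's tools (the variable names and the fixed +02:00 offset are invented, not part of the original message): store pure moments as UTC-based datetime64 values, and convert explicitly only when asking a calendar question.

```python
from datetime import datetime, timedelta, timezone
import numpy as np

# A pure moment in time: the current instant, stored as a UTC-based datetime64
# (no timezone attached, so no DeprecationWarning is triggered).
now = datetime.now(timezone.utc).replace(tzinfo=None)
moment = np.datetime64(now, 'us')

# Arithmetic between moments is always well defined.
later = np.datetime64(datetime.now(timezone.utc).replace(tzinfo=None), 'us')
print(later - moment)  # a small, positive timedelta64

# Calendar questions require an explicit timezone, e.g. a fixed +02:00 offset.
local = (moment.item()                      # back to datetime.datetime (naive, UTC)
         .replace(tzinfo=timezone.utc)
         .astimezone(timezone(timedelta(hours=2))))
print(local.year, local.month, local.hour)
```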
> I find the whole notion of a "timezone naive timestamp" to be nearly meaningless
From the perspective of, say, the dateutil parser, what would you do with "2020-11-06 07:48"? If you assume it's UTC you'll be wrong in this case. If you assume it is in your local timezone, you'll be wrong in Europe. Timezone-naive datetimes are an abstraction for exactly this case.
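For instance (an illustration, not part of the original message), dateutil returns a timezone-naive datetime here precisely because the string carries no offset; pinning it to an instant is a separate choice layered on top:

```python
from datetime import timedelta, timezone
from dateutil import parser

dt = parser.parse("2020-11-06 07:48")
print(dt.tzinfo)  # None -- no assumption is made about the zone

# The same wall-clock reading denotes different instants depending on the chosen zone:
as_utc = dt.replace(tzinfo=timezone.utc)
as_pst = dt.replace(tzinfo=timezone(timedelta(hours=-8)))
print(as_pst - as_utc)  # 8:00:00 -- eight hours apart
```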
> t0 = pd.Timestamp.now()
You can use `pd.Timestamp.now("UTC")`. See also https://mail.python.org/archives/list/datetime-sig@python.org/thread/PT4JWJL... and https://github.com/pandas-dev/pandas/issues/22451
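A quick sketch (not from the original message) of what this looks like in practice: both readings are anchored to UTC, so the machine's local TZ setting no longer affects the difference.

```python
import time
import pandas as pd

t0 = pd.Timestamp.now(tz="UTC")
time.sleep(1)
t1 = pd.Timestamp.now(tz="UTC")
print(t1 - t0)  # ~Timedelta('0 days 00:00:01'), regardless of the local timezone
```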
On Fri, Nov 6, 2020 at 5:58 PM Brock Mendel <jbrockmendel@gmail.com> wrote:
> > I find the whole notion of a "timezone naive timestamp" to be nearly meaningless
> From the perspective of, say, the dateutil parser, what would you do with "2020-11-06 07:48"? If you assume it's UTC you'll be wrong in this case. If you assume it is in your local timezone, you'll be wrong in Europe. Timezone-naive datetimes are an abstraction for exactly this case.
I'm not sure what you mean by "the perspective of the dateutil parser". Indeed, "2020-11-06 07:48" is not a well-defined timestamp, since it doesn't identify a specific moment in time. If you ask what a timestamp type should do when constructed from such a string, I can think of two reasonable alternatives. One is simply not to allow it, and perhaps provide a .from_local() method that makes the interpretation explicit. The other is to allow it, and make it clear that when an offset is not given, the environment's timezone is used to convert the string to a timestamp. I wouldn't use the third alternative, parsing it as UTC, since it adds little convenience - it's easy enough to append a "Z" to the string.
> > t0 = pd.Timestamp.now()
> You can use `pd.Timestamp.now("UTC")`. See also https://mail.python.org/archives/list/datetime-sig@python.org/thread/PT4JWJL... , https://github.com/pandas-dev/pandas/issues/22451
Thanks for pointing this out. However, this doesn't work:
```
>>> pd.Timestamp.fromtimestamp(time.time(), 'UTC')
Traceback (most recent call last):
  ...
TypeError: fromtimestamp() takes exactly 2 positional arguments (3 given)
```
Also, this doesn't work:
```
>>> t0 = pd.Timestamp.now('UTC')
>>> t1 = pd.Timestamp.now('Asia/Jerusalem')
>>> t1 - t0
Traceback (most recent call last):
  ...
TypeError: Timestamp subtraction must have the same timezones or no timezones
```
Also, this doesn't do what it probably should:
```
>>> pd.Timestamp.now('UTC'), pd.Timestamp.now().tz_localize('UTC')
(Timestamp('2020-11-07 20:18:38.719603+0000', tz='UTC'), Timestamp('2020-11-08 01:18:38.719701+0000', tz='UTC'))
```

(I have no idea how the second result was calculated, but it's wrong. It should have been equal to the first.)

So, pd.Timestamp is crap. I think that adding np.timestamp64 may finally bring a sane timestamp type to Python.

Thanks,
Noam
Noam Yorav-Raphael wrote:

> The solution is simple, and is what datetime64 used to do before the change - have a type that just represents a moment in time. It's not "in UTC" - it just stores the number of seconds that passed since an agreed moment in time (which is usually 1970-01-01 02:00+0200, which is more commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).
I agree with this. I understand the issue of parsing arbitrary timestamps with incomplete information; however, it's not clear to me why it has become more difficult to work with ISO 8601 timestamps. For example, we use numpy.genfromtxt to load an array of UTC-offset timestamps, e.g. `2020-08-19T12:42:57.7903616-04:00`. Where loading this array took 0.0352 s without having to convert, it now takes 0.8615 s with the following converter:
```python
lambda x: dateutil.parser.parse(x).astimezone(timezone.utc).replace(tzinfo=None)
```
That's a huge performance hit to do something that should be considered a standard operation, namely loading ISO-compliant data. There may be more efficient converters out there, but it seems strange to employ an external function to remove precision from an ISO datatype. As an aside, with or without the converter, numpy.genfromtxt is consistently faster than numpy.loadtxt, despite the documentation stating otherwise.

I feel there's a lack of guidance in the documentation on this issue. In most threads I've encountered on this, the first recommendation is to use pandas. The most effective way to crack a nut should not be to use a sledgehammer. The purpose of introducing standards should be to make these sorts of operations trivial and efficient. Perhaps I'm missing the solution here...
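For reference, a self-contained sketch of the loading pattern described above; the two-column layout, field names, and sample rows are invented for illustration and are not from the original message:

```python
import io
from datetime import timezone

import dateutil.parser
import numpy as np

# Hypothetical CSV: an ISO 8601 timestamp with a UTC offset, then a measurement.
data = io.StringIO(
    "2020-08-19T12:42:57.7903616-04:00,1.0\n"
    "2020-08-19T12:43:12.1204032-04:00,2.5\n"
)

# The converter from the message: parse, shift to UTC, drop the offset so the
# value fits a timezone-naive datetime64 column.
def to_naive_utc(s):
    return dateutil.parser.parse(s).astimezone(timezone.utc).replace(tzinfo=None)

arr = np.genfromtxt(
    data,
    delimiter=",",
    converters={0: to_naive_utc},
    dtype=[("time", "datetime64[us]"), ("value", "f8")],
    encoding="utf-8",
)
print(arr["time"])  # timestamps shifted to UTC, e.g. 2020-08-19T16:42:57.790...
```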
participants (5)

- Brock Mendel
- Eric Wieser
- k1o0
- Noam Yorav-Raphael
- Stephan Hoyer