Proposing an alternative to PEP 410
I've been meditating on the whole os.stat mtime representation thing. Here's a possible alternative approach.

* Improve datetime.datetime objects so they support nanosecond resolution, in such a way that it's 100% painless to make them even more precise in the future.
* Add support to datetime objects that allows adding and subtracting ints and floats as seconds. This behavior is controllable with a flag on the object--by default this behavior is off.
* Support accepting naive datetime.datetime objects in all functions that accept a timestamp in os (utime etc.).
* Change the result of os.stat to be a custom class rather than a PyStructSequence. Support the sequence protocol on the custom class but mark it PendingDeprecation, to be removed completely in 3.5. (I can't take credit for this idea; MvL suggested it to me once while we were talking about this issue. Now that the os.stat object has named fields, who uses the struct unpacking anymore?)
* Add support for setting "stat_float_times=2" (or perhaps "stat_float_times=datetime.datetime"?) to enable returning st_[acm]time as naive datetime.datetime objects--specifically, ones that allow addition and subtraction of ints and floats. The value would be similar to calling datetime.datetime.fromtimestamp() on the current float timestamp, but would preserve all available precision.
* Add a new parameter to functions that produce stat-like timestamps to explicitly specify the type of the timestamps (float or datetime), as proposed in PEP 410.

I realize datetime objects aren't a drop-in replacement for floats (or ints). In particular their str/repr representations are much more ornate, so I'd expect some breakage. Personally I think the adding/subtracting-ints change is a tiny bit smelly--but this is a practicality-beats-purity thing. I propose making it non-default behavior just to minimize the effects of the change.
Similarly, I realize os.stat_float_times was always a bit of a hack, what with it being global state and all. However, the approach has the virtue of having worked in the past.

I disagree with PEP 410's conclusions about the suitability of datetime as a timestamp object. I think "naive" datetime objects are a perfect fit. Specifically addressing PEP 410's concerns:

* I don't propose doing anything about the other functions that have no explicit start time; I'm only proposing changing the functions that deal with timestamps. (Perhaps the right thing for epoch-less times like time.clock would be timedelta? But I think we can table this discussion for now.)
* "You can't compare naive and non-naive datetimes." So what? The existing timestamp from os.stat is a float, and you can't compare floats and non-naive datetimes. How is this an issue?

Perhaps someone else can propose something even better,

//arry/
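The add-and-subtract-numbers idea above can be sketched roughly as follows. This is purely illustrative: the proposal's per-object opt-in flag is omitted, and the class name `TimestampDatetime` is made up here, not part of any proposal.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: a datetime subclass where ints and floats on
# either side of + or - are interpreted as seconds via timedelta.
class TimestampDatetime(datetime):
    def __add__(self, other):
        if isinstance(other, (int, float)):
            other = timedelta(seconds=other)
        return super().__add__(other)

    __radd__ = __add__

    def __sub__(self, other):
        if isinstance(other, (int, float)):
            other = timedelta(seconds=other)
        return super().__sub__(other)

ts = TimestampDatetime(2012, 2, 23, 13, 28, 0)
later = ts + 1.5      # 13:28:01.500000
earlier = ts - 30     # 13:27:30
```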
I rejected datetime.datetime because I want to get nanosecond resolution for the time and os modules, not only for the os module. If we choose to only patch the os module (*stat() and *utime*() functions), datetime.datetime would be meaningful (e.g. it's easier to format a datetime for a human than an epoch timestamp).

I don't think that it's a real issue that datetime is not fully compatible with float. If os.stat() continues to return float by default, programs asking explicitly for datetime would be prepared to handle this type. I have the same rationale with Decimal :-)

I don't think that there is a need to support datetime+int or datetime-float; there is already the timedelta type, which is well defined.

For os.stat(), you should use the UTC timezone, not a naive datetime.
* Add a new parameter to functions that produce stat-like timestamps to explicitly specify the type of the timestamps (float or datetime), as proposed in PEP 410.
What is a stat-like timestamp? Which functions are concerned?
Similarly, I realize os.stat_float_times was always a bit of a hack, what with it being global state and all. However the approach has the virtue of having worked in the past.
A global switch to get timestamps as datetime or Decimal would break libraries and programs unable to handle these types. I prefer adding an argument to the os.*stat() functions to avoid side effects. Read also: http://www.python.org/dev/peps/pep-0410/#add-a-global-flag-to-change-the-tim...
Specifically addressing PEP 410's concerns:
* I don't propose doing anything about the other functions that have no explicit start time; I'm only proposing changing the functions that deal with timestamps. (Perhaps the right thing for epoch-less times like time.clock would be timedelta? But I think we can table this discussion for now.)
We may choose a different solution for os.stat()/os.utime() and for the other functions (see PEP 410 for the full list). But I would prefer a unified solution to provide nanosecond resolution in all modules. It would avoid having to support two new types, for example.

Victor
On 02/23/2012 02:35 PM, Victor Stinner wrote:
I rejected datetime.datetime because I want to get nanosecond resolution for the time and os modules, not only for the os module. If we choose to only patch the os module (*stat() and *utime*() functions), datetime.datetime would be meaningful (e.g. it's easier to format a datetime for a human than an epoch timestamp).
I think a piecemeal approach would be better. I'm aware of a specific problem with os.stat / os.utime--the loss of precision problem that's already been endlessly discussed. Is there a similar problem with these other functions?
I don't think that there is a need to support datetime+int or datetime-float, there is already the timedelta type which is well defined.
I suggest this because I myself have written (admittedly sloppy) code that assumed it could perform simple addition with st_mtime. Instead of finding out the current timestamp and writing that out properly, I occasionally read in the file's mtime, add a small integer (or even smaller float), and write it back out.
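The pattern Larry describes can be written out concretely. This is exactly the float arithmetic on st_mtime that would break if the timestamp type stopped supporting + with plain numbers (the five-second nudge is an arbitrary example):

```python
import os
import tempfile

# Read a file's mtime, nudge it by a small number of seconds, and
# write it back with os.utime() -- the "admittedly sloppy" pattern.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    st = os.stat(path)
    # Push the mtime five seconds into the future, keeping atime.
    os.utime(path, (st.st_atime, st.st_mtime + 5))
    new_mtime = os.stat(path).st_mtime
finally:
    os.unlink(path)
```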
For os.stat(), you should use the UTC timezone, not a naive datetime.
Why is that more appropriate? IIUC, timestamps ignore leap seconds and strictly represent "seconds since the epoch". In order to correctly return a time in the UTC time zone we'd have to adjust for leap seconds. Naive datetimes bask in their happy ignorance of such complexities. //arry/
On Thu, Feb 23, 2012 at 3:47 PM, Larry Hastings <larry@hastings.org> wrote:
On 02/23/2012 02:35 PM, Victor Stinner wrote:
For os.stat(), you should use the UTC timezone, not a naive datetime.
Why is that more appropriate? IIUC, timestamps ignore leap seconds and strictly represent "seconds since the epoch". In order to correctly return a time in the UTC time zone we'd have to adjust for leap seconds. Naive datetimes bask in their happy ignorance of such complexities.
You seem to have the meaning of "ignore leap seconds" backwards. POSIX timestamps are not *literally* seconds since the epoch. They are *non-leap* seconds since the epoch, which is just what you want. IOW the simple calculation ignoring leap seconds (found e.g. in calendar.py) will always produce the right value. -- --Guido van Rossum (python.org/~guido)
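Guido's point can be checked directly: because POSIX timestamps count non-leap seconds, the simple calendar arithmetic round-trips exactly with the stdlib conversion functions, with no leap-second table involved.

```python
import calendar
import time

ts = 1330000000                    # an arbitrary POSIX timestamp
utc = time.gmtime(ts)              # broken-down UTC time, leap seconds ignored
roundtrip = calendar.timegm(utc)   # the naive calculation recovers it exactly
```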
On Feb 23, 2012, at 01:28 PM, Larry Hastings wrote:
* Improve datetime.datetime objects so they support nanosecond resolution, in such a way that it's 100% painless to make them even more precise in the future.
+1
* Add support to datetime objects that allows adding and subtracting ints and floats as seconds. This behavior is controllable with a flag on the object--by default this behavior is off.
Why conditionalize this behavior? It should either be enabled or not, but making it switchable on a per-object basis seems like asking for trouble.
* Support accepting naive datetime.datetime objects in all functions that accept a timestamp in os (utime etc).
+1
* Change the result of os.stat to be a custom class rather than a PyStructSequence. Support the sequence protocol on the custom class but mark it PendingDeprecation, to be removed completely in 3.5. (I can't take credit for this idea; MvL suggested it to me once while we were talking about this issue. Now that the os.stat object has named fields, who uses the struct unpacking anymore?)
+1
* Add support for setting "stat_float_times=2" (or perhaps "stat_float_times=datetime.datetime"?) to enable returning st_[acm]time as naive datetime.datetime objects--specifically, ones that allow addition and subtraction of ints and floats. The value would be similar to calling datetime.datetime.fromtimestamp() on the current float timestamp, but would preserve all available precision.
I personally don't much like the global state represented by os.stat_float_times() in the first place. Even though it also could be considered somewhat un-Pythonic, I think it probably would have been better to accept an optional argument in os.stat() to determine the return value. Or maybe it would have been more acceptable to have os.stat(), os.stat_float(), and os.stat_datetime() methods.
* Add a new parameter to functions that produce stat-like timestamps to explicitly specify the type of the timestamps (float or datetime), as proposed in PEP 410.
+1
I disagree with PEP 410's conclusions about the suitability of datetime as a timestamp object. I think "naive" datetime objects are a perfect fit. Specifically addressing PEP 410's concerns:
* I don't propose doing anything about the other functions that have no explicit start time; I'm only proposing changing the functions that deal with timestamps. (Perhaps the right thing for epoch-less times like time.clock would be timedelta? But I think we can table this discussion for now.)
+1, and yeah, I think we've had general agreement about using timedeltas for epoch-less times.
* "You can't compare naive and non-naive datetimes." So what? The existing timestamp from os.stat is a float, and you can't compare floats and non-naive datetimes. How is this an issue?
Exactly.
Perhaps someone else can propose something even better,
If we really feel like we need to make a change to support higher resolution timestamps, this comes pretty darn close to what I'd like to see. -Barry
On Sat, Feb 25, 2012 at 1:31 PM, Barry Warsaw <barry@python.org> wrote:
On Feb 23, 2012, at 01:28 PM, Larry Hastings wrote:
* Improve datetime.datetime objects so they support nanosecond resolution, in such a way that it's 100% painless to make them even more precise in the future.
+1
And how would you do that? Given the way the API currently works you pretty much have to add a separate field 'nanosecond' with a range of 0-999, leaving the microseconds field the same. (There are no redundant fields.) This is possible but makes it quite awkward by the time we've added picosecond and femtosecond.
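The awkwardness Guido describes can be made concrete: with a separate 0-999 nanosecond field alongside the existing microsecond field, recovering the full fractional second means reassembling it by hand, and each finer unit would add another term. The `nanosecond` field here is hypothetical; no such attribute exists on datetime.

```python
# Hypothetical split of the sub-second part across two fields, as the
# no-redundant-fields API design would force:
microsecond = 123456   # existing field, 0-999999
nanosecond = 789       # hypothetical extra field, 0-999

# Reassembling the full fractional second in nanoseconds:
frac_ns = microsecond * 1000 + nanosecond
```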
* Add support to datetime objects that allows adding and subtracting ints and floats as seconds. This behavior is controllable with a flag on the object--by default this behavior is off.
Why conditionalize this behavior? It should either be enabled or not, but making it switchable on a per-object basis seems like asking for trouble.
I am guessing that Larry isn't convinced that this is always a good idea, but I agree with Barry that making it conditional is just too complex.
* Support accepting naive datetime.datetime objects in all functions that accept a timestamp in os (utime etc).
+1
What timezone would it assume? Timestamps are traditionally linked to UTC -- but naive timestamps are most frequently used for local time. Local time is awkward due to the ambiguities around DST transitions. I do think we should support APIs for going back and forth between timezone-aware datetime and timestamps.
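The back-and-forth APIs Guido asks for do exist in the stdlib: datetime.fromtimestamp() accepts a tz argument, and aware datetimes convert back with .timestamp() (the latter landed in Python 3.3, contemporaneous with this thread).

```python
from datetime import datetime, timezone

ts = 1330000000.25
dt = datetime.fromtimestamp(ts, tz=timezone.utc)  # timezone-aware UTC datetime
back = dt.timestamp()                             # back to the same POSIX float
```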
* Change the result of os.stat to be a custom class rather than a PyStructSequence. Support the sequence protocol on the custom class but mark it PendingDeprecation, to be removed completely in 3.5. (I can't take credit for this idea; MvL suggested it to me once while we were talking about this issue. Now that the os.stat object has named fields, who uses the struct unpacking anymore?)
+1
Yeah, the sequence protocol is outdated here. Would this be a mutable or an immutable object?
* Add support for setting "stat_float_times=2" (or perhaps "stat_float_times=datetime.datetime"?) to enable returning st_[acm]time as naive datetime.datetime objects--specifically, ones that allow addition and subtraction of ints and floats. The value would be similar to calling datetime.datetime.fromtimestamp() on the current float timestamp, but would preserve all available precision.
I personally don't much like the global state represented by os.stat_float_times() in the first place.
Agreed. We should just deprecate stat_float_times().
Even though it also could be considered somewhat un-Pythonthic, I think it probably would have been better to accept an optional argument in os.stat() to determine the return value.
I still really don't like this.
Or maybe it would have been more acceptable to have os.stat(), os.stat_float(), and os.stat_datetime() methods.
But I also don't like a proliferation of functions, especially since there are already so many stat() functions: stat(), fstat(), fstatat().

My proposal: add extra fields that represent the time in different types. E.g. st_atime_nsec could be an integer expressing the entire timestamp in nanoseconds; st_atime_decimal could give as much precision as happens to be available as a Decimal; st_atime_datetime could be a UTC-based datetime; and in the future we could have other forms. Plain st_atime would be a float. (It can change if and when the default floating point type changes.)

We could make these fields lazily computed so that if you never touch st_atime_decimal, the decimal module doesn't get loaded. (It would be awkward if "import os" would imply "import decimal", since the latter is huge.)
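A minimal sketch of the lazily-computed extra fields Guido proposes. The field names follow his proposal; the class itself is hypothetical, and for brevity the Decimal import is at module level (a real implementation would defer it until first attribute access, which is the whole point of the laziness).

```python
from decimal import Decimal

class StatResult:
    """Hypothetical stat result exposing one timestamp in several types."""

    def __init__(self, atime_ns):
        self._atime_ns = atime_ns    # whole timestamp in integer nanoseconds

    @property
    def st_atime(self):
        # Plain float, as today (loses precision beyond ~microseconds).
        return self._atime_ns / 1e9

    @property
    def st_atime_nsec(self):
        # Entire timestamp as an integer number of nanoseconds.
        return self._atime_ns

    @property
    def st_atime_decimal(self):
        # Full precision, computed only when accessed.
        return Decimal(self._atime_ns) / Decimal(10**9)

st = StatResult(1330000000123456789)
```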
* Add a new parameter to functions that produce stat-like timestamps to explicitly specify the type of the timestamps (float or datetime), as proposed in PEP 410.
+1
No.
I disagree with PEP 410's conclusions about the suitability of datetime as a timestamp object. I think "naive" datetime objects are a perfect fit. Specifically addressing PEP 410's concerns:
* I don't propose doing anything about the other functions that have no explicit start time; I'm only proposing changing the functions that deal with timestamps. (Perhaps the right thing for epoch-less times like time.clock would be timedelta? But I think we can table this discussion for now.)
+1, and yeah, I think we've had general agreement about using timedeltas for epoch-less times.
Scratch that, *I* don't agree. timedelta is a pretty clumsy type to use. Have you ever tried to compute the number of seconds between two datetimes? You can't just use the .seconds field, you have to combine the .days and .seconds fields. And negative timedeltas are even harder due to the requirement that seconds and microseconds are never negative; e.g. -1 second is represented as -1 days plus 86399 seconds.

For fixed-epoch timestamps, *maybe* UTC datetime makes some sense. (We did add the UTC timezone to the stdlib, right?) But still I think the flexibility of floating point wins, and there are no worries about ambiguities.
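The normalization Guido describes is easy to demonstrate: -1 second really is stored as -1 days plus 86399 seconds, so reading .seconds alone gives a misleading answer and the total has to be reassembled from both fields.

```python
from datetime import timedelta

td = timedelta(seconds=-1)
days, secs = td.days, td.seconds   # normalized: -1 days, 86399 seconds

# Recovering the intended value requires combining the fields:
total = days * 86400 + secs        # -1
```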
* "You can't compare naive and non-naive datetimes." So what? The existing timestamp from os.stat is a float, and you can't compare floats and non-naive datetimes. How is this an issue?
Exactly.
The problem is with the ambiguity of naive datetimes.
Perhaps someone else can propose something even better,
If we really feel like we need to make a change to support higher resolution timestamps, this comes pretty darn close to what I'd like to see.
I'm currently also engaged in an off-list discussion with Victor.

I still think that when you are actually interested in *using* times, the current float format is absolutely fine. Anybody who thinks they need to accurately know the absolute time that something happened with nanosecond accuracy is out of their mind; given relativity, such times have an incredibly local significance anyway. So I don't worry about not being able to represent a timestamp with nsec precision.

For *relative* times, nanoseconds may be useful, and a float has no trouble representing them. (A float can represent time intervals of many millions of seconds with nanosecond precision. There are probably only a few clocks in the world whose drift is less than a nanosecond over such a timespan.)

The one exception here is making accurate copies of filesystem metadata. This can be dealt with by making certain changes to os.stat() and os.utime(). For os.stat(), adding extra fields like I suggested above should work. For os.utime(), we could use keyword arguments, or some other API hack.

--
--Guido van Rossum (python.org/~guido)
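Guido's claim about float precision for relative times can be checked: near a few million seconds (about 46 days), the spacing between adjacent IEEE doubles is still well under a nanosecond. This sketch assumes Python 3.9+ for math.ulp.

```python
import math

interval = 4_000_000.0             # ~46 days, expressed in seconds

# Spacing between adjacent doubles at this magnitude: 2**-31 s, ~0.47 ns.
spacing = math.ulp(interval)

# Adding a single nanosecond is therefore not lost to rounding.
bumped = interval + 1e-9
```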
On 02/25/2012 03:31 PM, Guido van Rossum wrote:
On Sat, Feb 25, 2012 at 1:31 PM, Barry Warsaw<barry@python.org> wrote:
On Feb 23, 2012, at 01:28 PM, Larry Hastings wrote:
* Change the result of os.stat to be a custom class rather than a PyStructSequence. Support the sequence protocol on the custom class but mark it PendingDeprecation [...] +1 Yeah, the sequence protocol is outdated here.
Would this be a mutable or an immutable object?
Immutable, just like the current PyStructSequence object. //arry/
Scratch that, *I* don't agree. timedelta is a pretty clumsy type to use. Have you ever tried to compute the number of seconds between two datetimes? You can't just use the .seconds field, you have to combine the .days and .seconds fields. And negative timedeltas are even harder due to the requirement that seconds and microseconds are never negative; e.g -1 second is represented as -1 days plus 86399 seconds.
Guido, you should switch to Python3! timedelta has a new total_seconds() method since Python 3.2. http://docs.python.org/py3k/library/datetime.html#datetime.timedelta.total_s...
>>> datetime.timedelta(1).total_seconds()
86400.0
>>> datetime.timedelta(seconds=-1).total_seconds()
-1.0
Victor
On Sun, Feb 26, 2012 at 1:31 AM, Guido van Rossum <guido@python.org> wrote:
I still think that when you are actually interested in *using* times, the current float format is absolutely fine. Anybody who thinks they need to accurately know the absolute time that something happened with nanosecond accuracy is out of their mind; given relativity such times have an incredibly local significance anyway.
There are good scientific use cases for nanosecond time resolution (e.g. radio astronomy) where one is actually measuring time down to that level and taking into account propagation delays. I have first-hand experience of at least one radio telescope (MeerKAT) that is using Python to process these sorts of timestamps (Maciej even gave a talk on MeerKAT at PyCon 2011 :).

Often these sorts of applications just use a large integer to hold the time. Higher-level constructs like datetime tend to be too bulky and provide functionality that is not particularly relevant. There is also a lot of pressure to have all the details coded by an in-house expert (because you need complete control and understanding of them, so you might as well do it yourself rather than continually patch, say, Python, to match your instrument's view of how this should all work).

Hardware capable of generating nanosecond-accurate timestamps is, however, becoming fairly easy to get hold of (a suitable crystalline clock slaved to a decent GPS unit can get you a lot of the way) and there are probably quite a few applications where it might become relevant.

I'm not sure whether any of this is intended to be for or against any side in the current discussion. :D

Schiavo
Simon
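The large-integer convention Simon mentions is straightforward: keep timestamps as whole nanoseconds, where Python's arbitrary-precision ints lose nothing. (The stdlib later adopted the same idea as time.time_ns() and st_atime_ns in Python 3.7 and 3.3 respectively.)

```python
# A timestamp held as an integer count of nanoseconds since the epoch:
ns = 1330000000_123456789

# Splitting into whole seconds and the sub-second remainder is exact:
seconds, frac_ns = divmod(ns, 10**9)
```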
On 02/26/2012 06:51 AM, Simon Cross wrote:
There are good scientific use cases for nanosecond time resolution (e.g. radio astronomy) where one is actually measuring time down to that level and taking into account propagation delays. I have first hand experience [...] I'm not sure whether any of this is intended to be for or against any side in the current discussion. :D
It's probably neutral. But I do have one question: can you foresee the scientific community moving to a finer resolution than nanoseconds in our lifetimes? //arry/
My 2 cents: having been in electronics for over 30 years, it is forever expanding in both directions--bigger (mega, giga, tera, peta, etc.) AND smaller (nano, pico, femto, atto). But I agree that it is moot, as it is not really about the range, which is usually expressed in an exponential component of the system being used (decimal, hex, etc.); it is more a matter of the significant number of digits being operated on at that point in time. Basically the zeroes are removed and tracked separately.

Tony

On Sun, Feb 26, 2012 at 11:12 AM, Larry Hastings <larry@hastings.org> wrote:
On 02/26/2012 06:51 AM, Simon Cross wrote:
There are good scientific use cases for nanosecond time resolution (e.g. radio astronomy) where one is actually measuring time down to that level and taking into account propagation delays. I have first hand experience [...]
I'm not sure whether any of this is intended to be for or against any side in the current discussion. :D
It's probably neutral. But I do have one question: can you foresee the scientific community moving to a finer resolution than nanoseconds in our lifetimes?
//arry/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/tkoker%40gmail.com
Also, data collection will almost always be done by specialized hardware and the data stored off for deferred processing and analysis. Tony On Sun, Feb 26, 2012 at 11:34 AM, Tony Koker <tkoker@gmail.com> wrote:
my 2 cents...
being in electronics for over 30 years, it is forever expanding in both directions, bigger mega, giga, tera, peta, etc. AND smaller nano, pico, femto, atto.
but, I agree that it is moot, as it is not the range, which is usually expressed in an exponential component of the system being used (decimal, hex., etc), and it is more a matter of significant number of digits being operated on, at that point in time. Basically the zeroes are removed and tracked separately.
Tony
On Sun, Feb 26, 2012 at 11:12 AM, Larry Hastings <larry@hastings.org> wrote:
On 02/26/2012 06:51 AM, Simon Cross wrote:
There are good scientific use cases for nanosecond time resolution (e.g. radio astronomy) where one is actually measuring time down to that level and taking into account propagation delays. I have first hand experience [...]
I'm not sure whether any of this is intended to be for or against any side in the current discussion. :D
It's probably neutral. But I do have one question: can you foresee the scientific community moving to a finer resolution than nanoseconds in our lifetimes?
//arry/
On Sun, Feb 26, 2012 at 6:12 PM, Larry Hastings <larry@hastings.org> wrote:
It's probably neutral. But I do have one question: can you foresee the scientific community moving to a finer resolution than nanoseconds in our lifetimes?
I think we're already there. Even just in radio astronomy, new arrays like ALMA which operate at terahertz frequencies are looking at picosecond or possibly femtosecond timing accuracy (ALMA operates at ~1000 times higher frequency than MeerKAT, so they need ~1000 times more accurate timing). E.g. http://www.guardian.co.uk/science/2012/jan/29/alma-radio-telescope-chile-ast...

Schiavo
Simon
On Sun, Feb 26, 2012 at 10:11 AM, Simon Cross <hodgestar+pythondev@gmail.com> wrote:
On Sun, Feb 26, 2012 at 6:12 PM, Larry Hastings <larry@hastings.org> wrote:
It's probably neutral. But I do have one question: can you foresee the scientific community moving to a finer resolution than nanoseconds in our lifetimes?
I think we're already there. Even just in radio astronomy, new arrays like ALMA which operate at terahertz frequencies are looking at picosecond or possibly femtosecond timing accuracy (ALMA operates at ~1000 times higher frequency than MeerKAT, so they need ~1000 times more accurate timing).
E.g. http://www.guardian.co.uk/science/2012/jan/29/alma-radio-telescope-chile-ast...
None of that has any bearing on the precision of the timers available in the OS through Python's time and os APIs.

--
--Guido van Rossum (python.org/~guido)
participants (7)

- Barry Warsaw
- Guido van Rossum
- Larry Hastings
- Simon Cross
- Tony Koker
- Victor Stinner
- Victor Stinner